Showing 1–50 of 99 results for author: Swaminathan, A

  1. arXiv:2406.16218  [pdf, other]

    cs.AI cs.LG

    Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows

    Authors: Ching-An Cheng, Allen Nie, Adith Swaminathan

    Abstract: We study a class of optimization problems motivated by automating the design and update of AI systems like coding assistants, robots, and copilots. We propose an end-to-end optimization framework, Trace, which treats the computational workflow of an AI system as a graph akin to neural networks, based on a generalization of back-propagation. Optimization of computational workflows often involves ri…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.09743  [pdf, ps, other]

    math.CA nlin.SI

    Stability of the Toda equations related to a perturbed $R_I$ type recurrence relation

    Authors: Vinay Shukla, A. Swaminathan

    Abstract: In this manuscript, a modified $R_I$ type recurrence relation is considered whose recurrence coefficients are perturbed by addition or multiplication of a constant. The perturbed system of recurrence coefficients is represented by Toda lattice equations, which are derived. These equations are then represented in a matrix form. With the help of this matrix representation, a known Lax pair is recove…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 19 figures

    MSC Class: 42C05; 15A24; 37K10

  3. arXiv:2406.08431  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

    Authors: Benjamin Biggs, Arjun Seshadri, Yang Zou, Achin Jain, Aditya Golatkar, Yusheng Xie, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

    Abstract: We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearning with no additional memory or inference costs, since models corresponding to data shards can be added or removed by re-averaging. We show that Diffusion Soup…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.01633  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    On Overcoming Miscalibrated Conversational Priors in LLM-based Chatbots

    Authors: Christine Herlihy, Jennifer Neville, Tobias Schnabel, Adith Swaminathan

    Abstract: We explore the use of Large Language Model (LLM)-based chatbots to power recommender systems. We observe that the chatbots respond poorly when they encounter under-specified requests (e.g., they make incorrect assumptions, hedge with a long response, or refuse to answer). We conjecture that such miscalibrated response tendencies (i.e., conversational priors) can be attributed to LLM fine-tuning us…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Preprint of UAI'24 conference publication

  5. arXiv:2405.16434  [pdf, other]

    cs.AI cs.CL cs.NE

    The Importance of Directional Feedback for LLM-based Optimizers

    Authors: Allen Nie, Ching-An Cheng, Andrey Kolobov, Adith Swaminathan

    Abstract: We study the potential of using large language models (LLMs) as an interactive optimizer for solving maximization problems in a text space using natural language and numerical feedback. Inspired by the classical optimization literature, we classify the natural language feedback into directional and non-directional, where the former is a generalization of the first-order feedback to the natural lan…

    Submitted 20 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted and Presented at Foundation Models for Decision Making at NeurIPS 2023 (December 15, 2023). Work completed from June 2023 to September 2023

  6. arXiv:2405.11959  [pdf, other]

    math.CA

    A common zero at the end point of the support of measure for the quasi-natured spectrally transformed polynomials

    Authors: Vikash Kumar, A. Swaminathan

    Abstract: In this work, the explicit expressions of coefficients involved in quasi-type kernel polynomials of order one and quasi-Geronimus polynomials of order one are determined for Jacobi polynomials. These coefficients are responsible for establishing the orthogonality of quasi-spectral polynomials for Jacobi polynomials. Additionally, the orthogonality of quasi-type kernel Laguerre polynomials of order…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 14 pages, 8 figures

    MSC Class: 42C05; 33C45; 26C10

  7. arXiv:2405.09421  [pdf, ps, other]

    math.NT math.AG

    A positive proportion of monic odd-degree hyperelliptic curves of genus $g \geq 4$ have no unexpected quadratic points

    Authors: Jef Laga, Ashvin A. Swaminathan

    Abstract: Let $\mathcal{F}_g$ be the family of monic odd-degree hyperelliptic curves of genus $g$ over $\mathbb{Q}$. Poonen and Stoll have shown that for every $g \geq 3$, a positive proportion of curves in $\mathcal{F}_g$ have no rational points except the point at infinity. In this note, we prove the analogue for quadratic points: for each $g\geq 4$, a positive proportion of curves in $\mathcal{F}_g$ have…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 10 pages

    MSC Class: 11G30 (Primary) 14G05; 14H25 (Secondary)

  8. arXiv:2405.05256  [pdf, other]

    cs.CV cs.AI cs.LG

    THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models

    Authors: Prannay Kaul, Zhizhong Li, Hao Yang, Yonatan Dukler, Ashwin Swaminathan, C. J. Taylor, Stefano Soatto

    Abstract: Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended free-form responses, which we term "Type I hallucinations". Instead, they focus on hallucinations responding to very specific question formats -- typically a multiple-choice response regarding a particular object or attribute -- which we term "Typ…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: In CVPR 2024

  9. arXiv:2404.18065  [pdf, other]

    cs.CV cs.AI

    Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model

    Authors: Xiaolong Li, Jiawei Mo, Ying Wang, Chethan Parameshwara, Xiaohan Fei, Ashwin Swaminathan, CJ Taylor, Zhuowen Tu, Paolo Favaro, Stefano Soatto

    Abstract: In this paper, we propose an effective two-stage approach named Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts while achieving high fidelity by using a pre-trained multi-view diffusion model. Multi-view diffusion models, such as MVDream, have been shown to generate high-fidelity 3D assets using score distillation sampling (SDS). However, applied na…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 9 pages, 10 figures

  10. arXiv:2404.04469  [pdf, other]

    cs.CV

    Mixed-Query Transformer: A Unified Image Segmentation Architecture

    Authors: Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto

    Abstract: Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task. In this paper, we introduce the Mixed-Query Transformer (MQ-Former), a unified architecture for multi-task and multi-dataset image segmentation using a single…

    Submitted 5 April, 2024; originally announced April 2024.

  11. arXiv:2404.02883  [pdf, other]

    cs.CV cs.AI cs.LG

    On the Scalability of Diffusion-based Text-to-Image Generation

    Authors: Hao Li, Yang Zou, Ying Wang, Orchid Majumder, Yusheng Xie, R. Manmatha, Ashwin Swaminathan, Zhuowen Tu, Stefano Ermon, Stefano Soatto

    Abstract: Scaling up model and data size has been quite successful for the evolution of LLMs. However, the scaling law for diffusion-based text-to-image (T2I) models is not fully explored. It is also unclear how to efficiently scale the model for better performance at reduced cost. The different training settings and expensive training cost make a fair model comparison extremely difficult. In this work,…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  12. arXiv:2403.18920  [pdf, other]

    cs.CR cs.AI cs.CV

    CPR: Retrieval Augmented Generation for Copyright Protection

    Authors: Aditya Golatkar, Alessandro Achille, Luca Zancato, Yu-Xiang Wang, Ashwin Swaminathan, Stefano Soatto

    Abstract: Retrieval Augmented Generation (RAG) is emerging as a flexible and robust technique to adapt models to private user data without training, to handle credit attribution, and to allow efficient machine unlearning at scale. However, RAG techniques for image generation may lead to parts of the retrieved samples being copied in the model's output. To reduce risks of leaking private information contain…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  13. arXiv:2403.14003  [pdf, other]

    cs.CV cs.CL cs.LG

    Multi-Modal Hallucination Control by Visual Information Grounding

    Authors: Alessandro Favero, Luca Zancato, Matthew Trager, Siddharth Choudhary, Pramuditha Perera, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

    Abstract: Generative Vision-Language Models (VLMs) are prone to generate plausible-sounding textual answers that, however, are not always grounded in the input image. We investigate this phenomenon, usually referred to as "hallucination" and show that it stems from an excessive reliance on the language prior. In particular, we show that as more tokens are generated, the reliance on the visual prompt decreas…

    Submitted 20 March, 2024; originally announced March 2024.

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  14. arXiv:2403.11024  [pdf]

    cs.CV

    Fast Sparse View Guided NeRF Update for Object Reconfigurations

    Authors: Ziqi Lu, Jianbo Ye, Xiaohan Fei, Xiaolong Li, Jiawei Mo, Ashwin Swaminathan, Stefano Soatto

    Abstract: Neural Radiance Field (NeRF), as an implicit 3D scene representation, lacks the inherent ability to accommodate changes made to the initial static scene. If objects are reconfigured, it is difficult to update the NeRF to reflect the new state of the scene without time-consuming data re-capturing and NeRF re-training. To address this limitation, we develop the first update method for NeRFs to physical…

    Submitted 16 March, 2024; originally announced March 2024.

  15. arXiv:2403.03789  [pdf, other]

    math.CA

    Recovering orthogonality from quasi-nature of Spectral transformations

    Authors: Vikash Kumar, Francisco Marcellán, A. Swaminathan

    Abstract: In this contribution, quasi-orthogonality of polynomials generated by Geronimus and Uvarov transformations is analyzed. An attempt is made to discuss the recovery of the source orthogonal polynomial from the quasi-Geronimus and quasi-Uvarov polynomials of order one. Moreover, the discussion on the difference equation satisfied by quasi-Geronimus and quasi-Uvarov polynomials is presented. Furthermo…

    Submitted 17 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: 26 pages, 6 figures

    MSC Class: 42C05; 33C45; 26C10; 11A55

  16. arXiv:2403.01038  [pdf, other]

    cs.CR cs.AI

    AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks

    Authors: Jiacen Xu, Jack W. Stokes, Geoff McDonald, Xuesong Bai, David Marshall, Siyue Wang, Adith Swaminathan, Zhou Li

    Abstract: Large language models (LLMs) have demonstrated impressive results on natural language tasks, and security researchers are beginning to employ them in both offensive and defensive systems. In cyber-security, there have been multiple research efforts that utilize LLMs focusing on the pre-breach stage of attacks like phishing and malware generation. However, there is so far no comprehensive study r…

    Submitted 1 March, 2024; originally announced March 2024.

  17. arXiv:2402.18780  [pdf, other]

    cs.CV

    A Quantitative Evaluation of Score Distillation Sampling Based Text-to-3D

    Authors: Xiaohan Fei, Chethan Parameshwara, Jiawei Mo, Xiaolong Li, Ashwin Swaminathan, CJ Taylor, Paolo Favaro, Stefano Soatto

    Abstract: The development of generative models that create 3D content from a text prompt has made considerable strides thanks to the use of the score distillation sampling (SDS) method on pre-trained diffusion models for image generation. However, the SDS method is also the source of several artifacts, such as the Janus problem, the misalignment between the text prompt and the generated 3D model, and 3D mod…

    Submitted 28 February, 2024; originally announced February 2024.

  18. arXiv:2312.06853  [pdf, other]

    cs.AI

    LLF-Bench: Benchmark for Interactive Learning from Language Feedback

    Authors: Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan

    Abstract: We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions. Learning from language feedback (LLF) is essential for people, largely because the rich information this feedback provides can help a learner avoid much trial and error and the…

    Submitted 13 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

  19. arXiv:2312.04340  [pdf, other]

    math.CA

    Orthogonality of a new family of $q$-Sobolev type polynomials

    Authors: Neha, A. Swaminathan

    Abstract: In this work, we introduce and construct specific $q$-polynomials that are derived from the well-established families of $q$-orthogonal polynomials, namely little $q$-Jacobi polynomials and $q$-Laguerre polynomials, respectively. We examine these newly constructed $q$-polynomials and observe that they possess integral representations of little $q$-Jacobi polynomials and $q$-Laguerre polynomials. T…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 25 pages

    MSC Class: 33D15; 42C05; 33D45

  20. arXiv:2311.17921  [pdf, other]

    cs.CV

    Do text-free diffusion models learn discriminative visual representations?

    Authors: Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Tianyi Zhou, Abhinav Shrivastava

    Abstract: While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously. We identify diffusion models, a state-of-the-art method for generative tasks, as a prime candidate. Such models involve training a U-Net to iteratively predict and re…

    Submitted 29 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Website: see https://mgwillia.github.io/diffssl/ . Code: see https://github.com/soumik-kanad/diffssl . The first two authors contributed equally. 15 pages, 9 figures, 15 tables. Submission under review. (this article supersedes arXiv:2307.08702)

  21. arXiv:2310.17555  [pdf, other]

    cs.RO cs.AI cs.LG

    Interactive Robot Learning from Verbal Correction

    Authors: Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng

    Abstract: The ability to learn and refine behavior after deployment has become ever more important for robots as we design them to operate in unstructured environments like households. In this work, we design a new learning system based on a large language model (LLM), OLAF, that allows everyday users to teach a robot using verbal corrections when the robot makes mistakes, e.g., by saying "Stop what you're do…

    Submitted 26 October, 2023; originally announced October 2023.

  22. arXiv:2309.14339  [pdf, other]

    cs.CV

    Chop & Learn: Recognizing and Generating Object-State Compositions

    Authors: Nirat Saini, Hanyu Wang, Archana Swaminathan, Vinoj Jayasundara, Bo He, Kamal Gupta, Abhinav Shrivastava

    Abstract: Recognizing and generating object-state compositions has been a challenging task, especially when generalizing to unseen compositions. In this paper, we study the task of cutting objects in different styles and the resulting object state changes. We propose a new benchmark suite Chop & Learn, to accommodate the needs of learning objects and different cut styles using multiple viewpoints. We also p…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: To appear at ICCV 2023

  23. arXiv:2308.16014  [pdf, other]

    math.CA

    Inequalities involving a measure of Marcellán class and zeros of corresponding orthogonal polynomials

    Authors: Vikash Kumar, A. Swaminathan

    Abstract: Let $\tilde{\Phi}_n$ be a quasi-orthogonal polynomial of order 1 on the unit circle, obtained from an orthogonal polynomial $\Phi_n$ with measure $\mu$, which is in the Marcellán class, if there exists another measure $\tilde{\mu}$ such that $\tilde{\Phi}_n$ is a monic orthogonal polynomial. This article aims to investigate various properties related to the Marcellán class. At first, we study the behaviour of the zer…

    Submitted 10 June, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 21 pages

    MSC Class: 42C05; 46E22

  24. arXiv:2308.01937  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    Training Data Protection with Compositional Diffusion Models

    Authors: Aditya Golatkar, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

    Abstract: We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains and can be later composed to achieve performance comparable to a paragon model trained on all data s…

    Submitted 13 February, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

  25. arXiv:2307.08702  [pdf, other]

    cs.CV

    Diffusion Models Beat GANs on Image Classification

    Authors: Soumik Mukhopadhyay, Matthew Gwilliam, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Srinidhi Hegde, Tianyi Zhou, Abhinav Shrivastava

    Abstract: While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which uses a single pre-training stage to address both families of tasks simultaneously. We identify diffusion models as a prime candidate. Diffusion models have risen to prominence as a state-of-the-art method for image…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 15 pages, 7 figures, 10 tables, submission under review

  26. arXiv:2306.03727  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Towards Visual Foundational Models of Physical Scenes

    Authors: Chethan Parameshwara, Alessandro Achille, Matthew Trager, Xiaolong Li, Jiawei Mo, Matthew Trager, Ashwin Swaminathan, CJ Taylor, Dheera Venkatraman, Xiaohan Fei, Stefano Soatto

    Abstract: We describe a first step towards learning general-purpose visual representations of physical scenes using only image prediction as a training criterion. To do so, we first define "physical scene" and show that, even though different agents may maintain different representations of the same scene, the underlying physical scene that can be inferred is unique. Then, we show that NeRFs cannot represen…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: TLDR: Physical scenes are equivalence classes of sufficient statistics, and can be inferred uniquely by any agent measuring the same finite data; We formalize and implement an approach to representation learning that overturns "naive realism" in favor of an analytical approach of Russell and Koenderink. NeRFs cannot capture the physical scenes, but combined with Diffusion Models they can

  27. A Privacy-Preserving Federated Learning Approach for Kernel methods

    Authors: Anika Hannemann, Ali Burak Ünal, Arjhun Swaminathan, Erik Buchmann, Mete Akgün

    Abstract: It is challenging to implement kernel methods if the data sources are distributed and cannot be joined at a trusted third party for privacy reasons. It is even more challenging if the use case rules out privacy-preserving approaches that introduce noise. An example of such a use case is machine learning on clinical data. To realize exact privacy-preserving computation of kernel methods, we prop…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Preprint version of the full paper with supplementary material

    ACM Class: I.2; I.2; K.6.5; E.3

  28. arXiv:2304.13714  [pdf]

    cs.AI cs.CL cs.IR

    Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery

    Authors: Debadutta Dash, Rahul Thapa, Juan M. Banda, Akshay Swaminathan, Morgan Cheatham, Mehr Kashyap, Nikesh Kotecha, Jonathan H. Chen, Saurabh Gombar, Lance Downing, Rachel Pedreira, Ethan Goh, Angel Arnaout, Garret Kenn Morris, Honor Magon, Matthew P Lungren, Eric Horvitz, Nigam H. Shah

    Abstract: Despite growing interest in using large language models (LLMs) in healthcare, current explorations do not assess the real-world utility and safety of LLMs in clinical settings. Our objective was to determine whether two LLMs can serve information needs submitted by physicians as questions to an informatics consultation service in a safe and concordant manner. Sixty-six questions from an informatic…

    Submitted 30 April, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: 27 pages including supplemental information

  29. arXiv:2304.13169  [pdf, other]

    cs.LG

    SAFE: Machine Unlearning With Shard Graphs

    Authors: Yonatan Dukler, Benjamin Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto

    Abstract: We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost to remove the influence of training samples from the trained model. This process, also known as selective forgetting or unlearning, is often conducted by partitioning a dataset into shards, training fully independent models on each, then ensembling…

    Submitted 22 August, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted at ICCV 2023

  30. arXiv:2304.11869  [pdf, ps, other]

    math.CA

    Generalized co-polynomials of $R_{II}$ type and associated quadrature rules

    Authors: Vinay Shukla, A. Swaminathan

    Abstract: When the co-recursion and co-dilation in the recurrence relation of certain sequences of orthogonal polynomials are not at the same level, the behaviour of the modified orthogonal polynomials is expected to have different properties compared to the situation of the same level of perturbation. This manuscript attempts to derive structural relations between the perturbed and original $R_{II}$ type o…

    Submitted 12 May, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: 23 pages

    MSC Class: 42C05; 30B70; 15A18

  31. arXiv:2304.01050  [pdf, ps, other]

    math.NT math.DS

    Counting integral points on symmetric varieties with applications to arithmetic statistics

    Authors: Arul Shankar, Artane Siad, Ashvin A. Swaminathan

    Abstract: In this article, we combine Bhargava's geometry-of-numbers methods with the dynamical point-counting methods of Eskin--McMullen and Benoist--Oh to develop a new technique for counting integral points on symmetric varieties lying within fundamental domains for coregular representations. As applications, we study the distribution of the $2$-torsion subgroup of the class group in thin families of cub…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: 39 pages, comments welcome!

  32. arXiv:2303.15591  [pdf, other]

    cs.CV

    Learning Expressive Prompting With Residuals for Vision Transformers

    Authors: Rajshekhar Das, Yonatan Dukler, Avinash Ravichandran, Ashwin Swaminathan

    Abstract: Prompt learning is an efficient approach to adapt transformers by inserting a learnable set of parameters into the input and intermediate representations of a pre-trained model. In this work, we present Expressive Prompts with Residuals (EXPRES) which modifies the prompt learning paradigm specifically for effective adaptation of vision transformers (ViT). Our method constructs downstream representat…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR (2023)

  33. arXiv:2303.04105  [pdf, other]

    cs.LG cs.CV

    Your representations are in the network: composable and parallel adaptation for large scale models

    Authors: Yonatan Dukler, Alessandro Achille, Hao Yang, Varsha Vivek, Luca Zancato, Benjamin Bowman, Avinash Ravichandran, Charless Fowlkes, Ashwin Swaminathan, Stefano Soatto

    Abstract: We propose InCA, a lightweight method for transfer learning that cross-attends to any activation layer of a pre-trained model. During training, InCA uses a single forward pass to extract multiple activations, which are passed to external cross-attention adapters, trained anew and combined or selected for downstream tasks. We show that, even when selecting a single top-scoring adapter, InCA achieve…

    Submitted 31 October, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted to NeurIPS 2023

  34. arXiv:2303.01598  [pdf, other]

    cs.CV cs.LG

    A Meta-Learning Approach to Predicting Performance and Data Requirements

    Authors: Achin Jain, Gurumurthy Swaminathan, Paolo Favaro, Hao Yang, Avinash Ravichandran, Hrayr Harutyunyan, Alessandro Achille, Onkar Dabeer, Bernt Schiele, Ashwin Swaminathan, Stefano Soatto

    Abstract: We propose an approach to estimate the number of samples required for a model to reach a target performance. We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset (e.g., 5 samples per class) for extrapolation. This is because the log-performance error against the log-dataset size follows a nonlinear progression in the few-…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  35. arXiv:2211.10704  [pdf, ps, other]

    math.SP math.CA

    Recovering orthogonality from Quasi-type Kernel Polynomials using specific spectral transformations

    Authors: Vikash Kumar, A. Swaminathan

    Abstract: In this work, the concept of quasi-type Kernel polynomials with respect to a moment functional is introduced. The difference equation satisfied by these polynomials and the criterion for orthogonality are discussed. The process of recovering orthogonality for the linear combination of a quasi-type kernel polynomial with another orthogonal polynomial, which is identified by involving…

    Submitted 29 January, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

    Comments: 25 pages

    MSC Class: 33C45; 33C05; 42C05

  36. arXiv:2211.08729  [pdf, ps, other]

    math.NT

    The mean number of $2$-torsion elements in the class groups of cubic orders

    Authors: Ashvin Swaminathan

    Abstract: We determine the mean number of 2-torsion elements in class groups of cubic orders, when such orders are enumerated by discriminant. Specifically, we prove that when isomorphism classes of totally real (resp., complex) cubic orders are enumerated by discriminant, the average $2$-torsion in the class group is $1 + \frac{1}{4} \times \frac{\zeta(2)}{\zeta(4)}$ (resp.,…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: 22 pages

    MSC Class: 11R29; 11R45; 11E76; 11R16

  37. arXiv:2209.03506  [pdf, ps, other]

    math.CA

    Spectral properties related to generalized complementary Romanovski-Routh polynomials

    Authors: Vinay Shukla, A. Swaminathan

    Abstract: Complementary Romanovski-Routh polynomials play an important role in extracting specific properties of orthogonal polynomials. In this work, a generalized form of the Complementary Romanovski-Routh polynomials (GCRR) that has the Gaussian hypergeometric representation and satisfies a particular type of recurrence called $R_{II}$ type three term recurrence relation involving two arbitrary parameter…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: 20 pages, 3 figures

    MSC Class: 42C05; 26C10; 15A18; 33C45

  38. arXiv:2207.06272  [pdf, other]

    cs.LG stat.ML

    Hindsight Learning for MDPs with Exogenous Inputs

    Authors: Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan

    Abstract: Many resource management problems require sequential decision-making under uncertainty, where the only uncertainties affecting the decision outcomes are exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algo…

    Submitted 23 October, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: 52 pages, 6 figures

    MSC Class: 68Q32

    ACM Class: I.2.6

  39. Chain sequences and Zeros of a perturbed $R_{II}$ type recurrence relation

    Authors: Vinay Shukla, A. Swaminathan

    Abstract: In this manuscript, new algebraic and analytic aspects of the orthogonal polynomials satisfying $R_{II}$ type recurrence relation given by \begin{align*} \mathcal{P}_{n+1}(x) = (x-c_n)\mathcal{P}_n(x)-\lambda_n (x-a_n)(x-b_n)\mathcal{P}_{n-1}(x), \quad n \geq 0, \end{align*} where $\lambda_n$ is a positive chain sequence and $a_n$, $b_n$, $c_n$ are sequences of real or complex numbers with…

    Submitted 23 January, 2022; originally announced January 2022.

    Comments: 23 pages. arXiv admin note: text overlap with arXiv:2201.05422

    MSC Class: 42C05; 30C15; 15A24

    Journal ref: Journal of Computational and Applied Mathematics 422 (2023) 114916

  40. Spectral transformation associated with a perturbed $R_I$ type recurrence relation

    Authors: Vinay Shukla, A. Swaminathan

    Abstract: In this work, orthogonal polynomials satisfying $R_I$ type recurrence relation $\mathcal{P}_{n+1}(z) = (z-c_n)\mathcal{P}_n(z)-\lambda_n (z-a_n)\mathcal{P}_{n-1}(z),$ with $\mathcal{P}_{-1}(z) = 0$ and $\mathcal{P}_0(z) = 1$ are analyzed when the recurrence coefficients are modified. The structural relationship between the perturbed and the unperturbed polynomials along with the spectral properties and…

    Submitted 19 May, 2024; v1 submitted 14 January, 2022; originally announced January 2022.

    Comments: 23 pages, 6 figures

    MSC Class: 42C05; 30B70; 30C15

  41. arXiv:2110.09466  [pdf, ps, other]

    math.NT math.RT

    Geometry-of-numbers methods in the cusp

    Authors: Arul Shankar, Artane Siad, Ashvin Swaminathan, Ila Varma

    Abstract: In this article, we develop new methods for counting integral orbits having bounded invariants that lie inside the cusps of fundamental domains for coregular representations. We illustrate these methods for a representation of cardinal interest in number theory, namely that of the split orthogonal group acting on the space of quadratic forms.

    Submitted 22 June, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: 29 pages

    MSC Class: 11R29; 11R45; 11H55; 11P21; 11E76

  42. arXiv:2110.09063  [pdf, ps, other]

    math.NT

    The second moment of the size of the $2$-Selmer group of elliptic curves

    Authors: Manjul Bhargava, Arul Shankar, Ashvin Swaminathan

    Abstract: In this paper, we prove that when elliptic curves over $\mathbb{Q}$ are ordered by height, the second moment of the size of the $2$-Selmer group is at most $15$. This confirms a conjecture of Poonen and Rains.

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: 49 pages

    MSC Class: 11G05; 11R45; 11E76

  43. Hermite equivalence of polynomials

    Authors: Manjul Bhargava, Jan-Hendrik Evertse, Kálmán Győry, László Remete, Ashvin A. Swaminathan

    Abstract: In this paper, we resurrect a long-forgotten notion of equivalence for univariate polynomials with integral coefficients introduced by Hermite in the 1850s. We show that the Hermite equivalence class of a polynomial has a very natural interpretation in terms of the invariant ring and invariant ideal associated with the polynomial. We apply this interpretation to shed light on the relationship betw…

    Submitted 14 September, 2022; v1 submitted 7 September, 2021; originally announced September 2021.

    Comments: Compared with the previous version we have inserted some changes and corrections suggested by the anonymous referee. This is the final version. It will appear in a special volume of Acta Arithmetica to the memory of Professor Andrzej Schinzel

    MSC Class: 11C08

    Journal ref: Acta Arithmetica, on-line first, 2023

  44. arXiv:2108.09817  [pdf, other]

    eess.SP cs.AI cs.LG cs.NE

    Electroencephalogram Signal Processing with Independent Component Analysis and Cognitive Stress Classification using Convolutional Neural Networks

    Authors: Venkatakrishnan Sutharsan, Alagappan Swaminathan, Saisrinivasan Ramachandran, Madan Kumar Lakshmanan, Balaji Mahadevan

    Abstract: An electroencephalogram (EEG) is a recording of bio-electrical activity acquired from electrodes placed on the scalp. EEG recordings are contaminated predominantly by the electrooculogram (EOG) signal. Since this artifact has a higher magnitude than EEG signals, these noise signals have to be re…

    Submitted 22 August, 2021; originally announced August 2021.

    Comments: 16 pages, 10 figures, 2 tables, 8 equations, 16 references

    Journal ref: Lecture Notes in Networks and Systems, vol 341, 01 January 2022

  45. arXiv:2108.09797  [pdf, other]

    cs.LG cs.AI

    Wind Power Projection using Weather Forecasts by Novel Deep Neural Networks

    Authors: Alagappan Swaminathan, Venkatakrishnan Sutharsan, Tamilselvi Selvaraj

    Abstract: The transition from conventional methods of energy production to renewable energy production necessitates better prediction models of the upcoming supply of renewable energy. In wind power production, error in forecasting production is impossible to negate owing to the intermittence of wind. For successful power grid integration, it is crucial to understand the uncertainties that arise in predicti…

    Submitted 22 August, 2021; originally announced August 2021.

    Comments: 27 pages, 12 figures, 12 tables, 7 equations, 22 references

  46. arXiv:2106.02757  [pdf, other]

    cs.LG cs.AI

    Heuristic-Guided Reinforcement Learning

    Authors: Ching-An Cheng, Andrey Kolobov, Adith Swaminathan

    Abstract: We provide a framework for accelerating reinforcement learning (RL) algorithms by heuristics constructed from domain knowledge or offline data. Tabula rasa RL algorithms require environment interactions or computation that scales with the horizon of the sequential decision-making task. Using our framework, we show how heuristic-guided RL induces a much shorter-horizon subproblem that provably solv…

    Submitted 22 November, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

  47. arXiv:2106.00589  [pdf, other]

    cs.LG

    Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Reinforcement Learning

    Authors: Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan

    Abstract: We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility. Optimizing a long-term metric is challenging because the learning signal (whether the recommendations achieved their desired goals) is delayed and confounded by other user interactions with the system. Targeting immediately measurable proxies…

    Submitted 14 September, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

  48. Sufficiency for Nephroid Starlikeness using Hypergeometric Functions

    Authors: A. Swaminathan, Lateef Ahmad Wani

    Abstract: Let $\mathcal{A}$ consist of analytic functions $f:\mathbb{D}\to\mathbb{C}$ satisfying $f(0)=f'(0)-1=0$. Let $\mathcal{S}^*_{Ne}$ be the recently introduced Ma-Minda type function family associated with the $2$-cusped kidney-shaped nephroid curve $\left((u-1)^2+v^2-\frac{4}{9}\right)^3-\frac{4 v^2}{3}=0$ given by \begin{align*} \mathcal{S}^*_{Ne}:= \left\{f\in\mathcal{A}:\frac{zf'(z)}{…

    Submitted 17 April, 2021; v1 submitted 10 April, 2021; originally announced April 2021.

    Comments: 14 pages, 2 tables, 7 figures

    MSC Class: 30C45

  49. arXiv:2011.13578  [pdf, ps, other]

    math.NT

    Average $2$-Torsion in Class Groups of Rings Associated to Binary $n$-ic Forms

    Authors: Ashvin Swaminathan

    Abstract: Let $n \geq 3$ be an integer. In this paper, we study the average behavior of the $2$-torsion in class groups of number fields cut out by integral binary $n$-ic forms having any fixed odd leading coefficient. Specifically, we compute upper bounds on the average size of the $2$-torsion in the class groups of such number fields. Conditional on a uniformity estimate, we further prove that each of the…

    Submitted 19 October, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

    Comments: 39 pages

    MSC Class: 11R29; 11R65 (primary); and 11R45; 11E76 (secondary)

  50. arXiv:2007.08202  [pdf, other]

    cs.LG cs.AI stat.ML

    Provably Good Batch Reinforcement Learning Without Great Exploration

    Authors: Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

    Abstract: Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks. Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies w…

    Submitted 22 July, 2020; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: 36 pages, 7 figures