-
Vegetable Peeling: A Case Study in Constrained Dexterous Manipulation
Authors:
Tao Chen,
Eric Cousineau,
Naveen Kuppuswamy,
Pulkit Agrawal
Abstract:
Recent studies have made significant progress in addressing dexterous manipulation problems, particularly in in-hand object reorientation. However, there are few existing works that explore the potential utilization of developed dexterous manipulation controllers for downstream tasks. In this study, we focus on constrained dexterous manipulation for food peeling. Food peeling presents various cons…
▽ More
Recent studies have made significant progress in addressing dexterous manipulation problems, particularly in in-hand object reorientation. However, there are few existing works that explore the potential utilization of developed dexterous manipulation controllers for downstream tasks. In this study, we focus on constrained dexterous manipulation for food peeling. Food peeling presents various constraints on the reorientation controller, such as the requirement for the hand to securely hold the object after reorientation for peeling. We propose a simple system for learning a reorientation controller that facilitates the subsequent peeling task. Videos are available at: https://taochenshh.github.io/projects/veg-peeling.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
ROER: Regularized Optimal Experience Replay
Authors:
Changling Li,
Zhang-Wei Hong,
Pulkit Agrawal,
Divyansh Garg,
Joni Pajarinen
Abstract:
Experience replay serves as a key component in the success of online reinforcement learning (RL). Prioritized experience replay (PER) reweights experiences by the temporal difference (TD) error empirically enhancing the performance. However, few works have explored the motivation of using TD error. In this work, we provide an alternative perspective on TD-error-based reweighting. We show the conne…
▽ More
Experience replay serves as a key component in the success of online reinforcement learning (RL). Prioritized experience replay (PER) reweights experiences by the temporal difference (TD) error empirically enhancing the performance. However, few works have explored the motivation of using TD error. In this work, we provide an alternative perspective on TD-error-based reweighting. We show the connections between the experience prioritization and occupancy optimization. By using a regularized RL objective with $f-$divergence regularizer and employing its dual form, we show that an optimal solution to the objective is obtained by shifting the distribution of off-policy data in the replay buffer towards the on-policy optimal distribution using TD-error-based occupancy ratios. Our derivation results in a new pipeline of TD error prioritization. We specifically explore the KL divergence as the regularizer and obtain a new form of prioritization scheme, the regularized optimal experience replay (ROER). We evaluate the proposed prioritization scheme with the Soft Actor-Critic (SAC) algorithm in continuous control MuJoCo and DM Control benchmark tasks where our proposed scheme outperforms baselines in 6 out of 11 tasks while the results of the rest match with or do not deviate far from the baselines. Further, using pretraining, ROER achieves noticeable improvement on difficult Antmaze environment where baselines fail, showing applicability to offline-to-online fine-tuning. Code is available at \url{https://github.com/XavierChanglingLi/Regularized-Optimal-Experience-Replay}.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Focusing Optical Phased Array for Optically Enabled Probing of the Retina with Subcellular Resolution
Authors:
Pedram Hosseini,
Prachi Agrawal,
Alireza Tabatabaei Mashayekh,
Sandra Johnen,
Jeremy Witzens,
Florian Merget
Abstract:
We present a silicon-nitride-based optical phased array with built-in focusing and steering capability, that operates at 522 nm and is aimed at complementing a micro-electrode array for joint electrical and optical probing of retinal tissue. It achieves subcellular resolution with a beam diameter of 1.4 um at a focal point located above the chip. Targeted cellular excitation can be achieved by ste…
▽ More
We present a silicon-nitride-based optical phased array with built-in focusing and steering capability, that operates at 522 nm and is aimed at complementing a micro-electrode array for joint electrical and optical probing of retinal tissue. It achieves subcellular resolution with a beam diameter of 1.4 um at a focal point located above the chip. Targeted cellular excitation can be achieved by steering the beam through a combination of wavelength tuning and simplified thermo-optical phase shifters with a single electrical input for each of transverse beam steering and selection of the focal plane.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient
Authors:
Zechu Li,
Rickmer Krohn,
Tao Chen,
Anurag Ajay,
Pulkit Agrawal,
Georgia Chalvatzaki
Abstract:
Deep reinforcement learning (RL) algorithms typically parameterize the policy as a deep network that outputs either a deterministic action or a stochastic one modeled as a Gaussian distribution, hence restricting learning to a single behavioral mode. Meanwhile, diffusion models emerged as a powerful framework for multimodal learning. However, the use of diffusion policies in online RL is hindered…
▽ More
Deep reinforcement learning (RL) algorithms typically parameterize the policy as a deep network that outputs either a deterministic action or a stochastic one modeled as a Gaussian distribution, hence restricting learning to a single behavioral mode. Meanwhile, diffusion models emerged as a powerful framework for multimodal learning. However, the use of diffusion policies in online RL is hindered by the intractability of policy likelihood approximation, as well as the greedy objective of RL methods that can easily skew the policy to a single mode. This paper presents Deep Diffusion Policy Gradient (DDiffPG), a novel actor-critic algorithm that learns from scratch multimodal policies parameterized as diffusion models while discovering and maintaining versatile behaviors. DDiffPG explores and discovers multiple modes through off-the-shelf unsupervised clustering combined with novelty-based intrinsic motivation. DDiffPG forms a multimodal training batch and utilizes mode-specific Q-learning to mitigate the inherent greediness of the RL objective, ensuring the improvement of the diffusion policy across all modes. Our approach further allows the policy to be conditioned on mode-specific embeddings to explicitly control the learned modes. Empirical studies validate DDiffPG's capability to master multimodal behaviors in complex, high-dimensional continuous control tasks with sparse rewards, also showcasing proof-of-concept dynamic online replanning when navigating mazes with unseen obstacles.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Super Tiny Language Models
Authors:
Dylan Hillier,
Leon Guertler,
Cheston Tan,
Palaash Agrawal,
Chen Ruirui,
Bobby Cheng
Abstract:
The rapid advancement of large language models (LLMs) has led to significant improvements in natural language processing but also poses challenges due to their high computational and energy demands. This paper introduces a series of research efforts focused on Super Tiny Language Models (STLMs), which aim to deliver high performance with significantly reduced parameter counts. We explore innovativ…
▽ More
The rapid advancement of large language models (LLMs) has led to significant improvements in natural language processing but also poses challenges due to their high computational and energy demands. This paper introduces a series of research efforts focused on Super Tiny Language Models (STLMs), which aim to deliver high performance with significantly reduced parameter counts. We explore innovative techniques such as byte-level tokenization with a pooling mechanism, weight tying, and efficient training strategies. These methods aim to significantly reduce reduce the parameter count compared to traditional models -- in future works, we aim to build on these in a way that maintains and improves upon the performance of base transformer models. This series of papers will explore into various subproblems, including tokenizer-free models, self-play based training, and alternative training objectives. We will target models with 10M, 50M, and 100M parameters. Our ultimate goal is to make high-performance language models more accessible and practical for a wide range of applications.
△ Less
Submitted 26 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Value Augmented Sampling for Language Model Alignment and Personalization
Authors:
Seungwook Han,
Idan Shenfeld,
Akash Srivastava,
Yoon Kim,
Pulkit Agrawal
Abstract:
Aligning Large Language Models (LLMs) to cater to different human preferences, learning new skills, and unlearning harmful behavior is an important problem. Search-based methods, such as Best-of-N or Monte-Carlo Tree Search, are performant, but impractical for LLM adaptation due to their high inference cost. On the other hand, using Reinforcement Learning (RL) for adaptation is computationally eff…
▽ More
Aligning Large Language Models (LLMs) to cater to different human preferences, learning new skills, and unlearning harmful behavior is an important problem. Search-based methods, such as Best-of-N or Monte-Carlo Tree Search, are performant, but impractical for LLM adaptation due to their high inference cost. On the other hand, using Reinforcement Learning (RL) for adaptation is computationally efficient, but performs worse due to the optimization challenges in co-training the value function and the policy. We present a new framework for reward optimization, Value Augmented Sampling (VAS), that can maximize different reward functions using data sampled from only the initial, frozen LLM. VAS solves for the optimal reward-maximizing policy without co-training the policy and the value function, making the optimization stable, outperforming established baselines, such as PPO and DPO, on standard benchmarks, and achieving comparable results to Best-of-128 with lower inference cost. Unlike existing RL methods that require changing the weights of the LLM, VAS does not require access to the weights of the pre-trained LLM. Thus, it can even adapt LLMs (e.g., ChatGPT), which are available only as APIs. In addition, our algorithm unlocks the new capability of composing several rewards and controlling the extent of each one during deployment time, paving the road ahead for the future of aligned, personalized LLMs.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
DOLOMITES: Domain-Specific Long-Form Methodical Tasks
Authors:
Chaitanya Malaviya,
Priyanka Agrawal,
Kuzman Ganchev,
Pranesh Srinivasan,
Fantine Huot,
Jonathan Berant,
Mark Yatskar,
Dipanjan Das,
Mirella Lapata,
Chris Alberti
Abstract:
Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive, requiring to methodically generate structured long-form output for a given input. We develop a typology of methodical tasks structured in the form o…
▽ More
Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive, requiring to methodically generate structured long-form output for a given input. We develop a typology of methodical tasks structured in the form of a task objective, procedure, input, and output, and introduce DoLoMiTes, a novel benchmark with specifications for 519 such tasks elicited from hundreds of experts from across 25 fields. Our benchmark further contains specific instantiations of methodical tasks with concrete input and output examples (1,857 in total) which we obtain by collecting expert revisions of up to 10 model-generated examples of each task. We use these examples to evaluate contemporary language models highlighting that automating methodical tasks is a challenging long-form generation problem, as it requires performing complex inferences, while drawing upon the given context as well as domain knowledge.
△ Less
Submitted 28 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Learning Force Control for Legged Manipulation
Authors:
Tifanny Portela,
Gabriel B. Margolis,
Yandong Ji,
Pulkit Agrawal
Abstract:
Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing.…
▽ More
Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing. We showcase our method on a whole-body control platform of a quadruped robot with an arm. Such force control enables us to perform gravity compensation and impedance control, unlocking compliant whole-body manipulation. The learned whole-body controller with variable compliance makes it intuitive for humans to teleoperate the robot by only commanding the manipulator, and the robot's body adjusts automatically to achieve the desired position and force. Consequently, a human teleoperator can easily demonstrate a wide variety of loco-manipulation tasks. To the best of our knowledge, we provide the first deployment of learned whole-body force control in legged manipulators, paving the way for more versatile and adaptable legged robots.
△ Less
Submitted 20 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Rank2Reward: Learning Shaped Reward Functions from Passive Video
Authors:
Daniel Yang,
Davin Tjia,
Jacob Berg,
Dima Damen,
Pulkit Agrawal,
Abhishek Gupta
Abstract:
Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to provide raw, action-free visual data of tasks being performed. Moreover, this data can even be mined from video datasets or the web. Ideally, this data…
▽ More
Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to provide raw, action-free visual data of tasks being performed. Moreover, this data can even be mined from video datasets or the web. Ideally, this data can serve to guide robot learning for new tasks in novel environments, informing both "what" to do and "how" to do it. A powerful way to encode both the "what" and the "how" is to infer a well-shaped reward function for reinforcement learning. The challenge is determining how to ground visual demonstration inputs into a well-shaped and informative reward function. We propose a technique Rank2Reward for learning behaviors from videos of tasks being performed without access to any low-level states and actions. We do so by leveraging the videos to learn a reward function that measures incremental "progress" through a task by learning how to temporally rank the video frames in a demonstration. By inferring an appropriate ranking, the reward function is able to guide reinforcement learning by indicating when task progress is being made. This ranking function can be integrated into an adversarial imitation learning scheme resulting in an algorithm that can learn behaviors without exploiting the learned reward function. We demonstrate the effectiveness of Rank2Reward at learning behaviors from raw video on a number of tabletop manipulation tasks in both simulations and on a real-world robotic arm. We also demonstrate how Rank2Reward can be easily extended to be applicable to web-scale video datasets.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
FRACTAL: Fine-Grained Scoring from Aggregate Text Labels
Authors:
Yukti Makhija,
Priyanka Agrawal,
Rishi Saket,
Aravindan Raghuveer
Abstract:
Large language models (LLMs) are being increasingly tuned to power complex generation tasks such as writing, fact-seeking, querying and reasoning. Traditionally, human or model feedback for evaluating and further tuning LLM performance has been provided at the response level, enabling faster and more cost-effective assessments. However, recent works (Amplayo et al. [2022], Wu et al. [2023]) indica…
▽ More
Large language models (LLMs) are being increasingly tuned to power complex generation tasks such as writing, fact-seeking, querying and reasoning. Traditionally, human or model feedback for evaluating and further tuning LLM performance has been provided at the response level, enabling faster and more cost-effective assessments. However, recent works (Amplayo et al. [2022], Wu et al. [2023]) indicate that sentence-level labels may provide more accurate and interpretable feedback for LLM optimization. In this work, we introduce methods to disaggregate response-level labels into sentence-level (pseudo-)labels. Our approach leverages multiple instance learning (MIL) and learning from label proportions (LLP) techniques in conjunction with prior information (e.g., document-sentence cosine similarity) to train a specialized model for sentence-level scoring. We also employ techniques which use model predictions to pseudo-label the train-set at the sentence-level for model training to further improve performance.
We conduct extensive evaluations of our methods across six datasets and four tasks: retrieval, question answering, summarization, and math reasoning. Our results demonstrate improved performance compared to multiple baselines across most of these tasks. Our work is the first to develop response-level feedback to sentence-level scoring techniques, leveraging sentence-level prior information, along with comprehensive evaluations on multiple tasks as well as end-to-end finetuning evaluation showing performance comparable to a model trained on fine-grained human annotated labels.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
JUICER: Data-Efficient Imitation Learning for Robotic Assembly
Authors:
Lars Ankile,
Anthony Simeonov,
Idan Shenfeld,
Pulkit Agrawal
Abstract:
While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation. This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. We apply our approach to assembly tasks that require precisel…
▽ More
While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation. This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. We apply our approach to assembly tasks that require precisely grasping, reorienting, and inserting multiple parts over long horizons and multiple task phases. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. These help expand dataset support and supervise the model with locally corrective actions near bottleneck regions requiring high precision. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps directly from RGB images, outperforming imitation and data augmentation baselines. Project website: https://imitation-juicer.github.io/.
△ Less
Submitted 9 April, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…
▽ More
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
△ Less
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation
Authors:
Marcel Torne,
Anthony Simeonov,
Zechu Li,
April Chan,
Tao Chen,
Abhishek Gupta,
Pulkit Agrawal
Abstract:
Imitation learning methods need significant human supervision to learn policies robust to changes in object poses, physical disturbances, and visual distractors. Reinforcement learning, on the other hand, can explore the environment autonomously to learn robust behaviors but may require impractical amounts of unsafe real-world data collection. To learn performant, robust policies without the burde…
▽ More
Imitation learning methods need significant human supervision to learn policies robust to changes in object poses, physical disturbances, and visual distractors. Reinforcement learning, on the other hand, can explore the environment autonomously to learn robust behaviors but may require impractical amounts of unsafe real-world data collection. To learn performant, robust policies without the burden of unsafe real-world data collection or extensive human supervision, we propose RialTo, a system for robustifying real-world imitation learning policies via reinforcement learning in "digital twin" simulation environments constructed on the fly from small amounts of real-world data. To enable this real-to-sim-to-real pipeline, RialTo proposes an easy-to-use interface for quickly scanning and constructing digital twins of real-world environments. We also introduce a novel "inverse distillation" procedure for bringing real-world demonstrations into simulated environments for efficient fine-tuning, with minimal human intervention and engineering required. We evaluate RialTo across a variety of robotic manipulation problems in the real world, such as robustly stacking dishes on a rack, placing books on a shelf, and six other tasks. RialTo increases (over 67%) in policy robustness without requiring extensive human data collection. Project website and videos at https://real-to-sim-to-real.github.io/RialTo/
△ Less
Submitted 6 March, 2024;
originally announced March 2024.
-
Curiosity-driven Red-teaming for Large Language Models
Authors:
Zhang-Wei Hong,
Idan Shenfeld,
Tsun-Hsuan Wang,
Yung-Sung Chuang,
Aldo Pareja,
James Glass,
Akash Srivastava,
Pulkit Agrawal
Abstract:
Large language models (LLMs) hold great potential for many natural language applications but risk generating incorrect or toxic content. To probe when an LLM generates unwanted content, the current paradigm is to recruit a \textit{red team} of human testers to design input prompts (i.e., test cases) that elicit undesirable responses from LLMs. However, relying solely on human testers is expensive…
▽ More
Large language models (LLMs) hold great potential for many natural language applications but risk generating incorrect or toxic content. To probe when an LLM generates unwanted content, the current paradigm is to recruit a \textit{red team} of human testers to design input prompts (i.e., test cases) that elicit undesirable responses from LLMs. However, relying solely on human testers is expensive and time-consuming. Recent works automate red teaming by training a separate red team LLM with reinforcement learning (RL) to generate test cases that maximize the chance of eliciting undesirable responses from the target LLM. However, current RL methods are only able to generate a small number of effective test cases resulting in a low coverage of the span of prompts that elicit undesirable responses from the target LLM. To overcome this limitation, we draw a connection between the problem of increasing the coverage of generated test cases and the well-studied approach of curiosity-driven exploration that optimizes for novelty. Our method of curiosity-driven red teaming (CRT) achieves greater coverage of test cases while mantaining or increasing their effectiveness compared to existing methods. Our method, CRT successfully provokes toxic responses from LLaMA2 model that has been heavily fine-tuned using human preferences to avoid toxic outputs. Code is available at \url{https://github.com/Improbable-AI/curiosity_redteam}
△ Less
Submitted 29 February, 2024;
originally announced February 2024.
-
Training Neural Networks from Scratch with Parallel Low-Rank Adapters
Authors:
Minyoung Huh,
Brian Cheung,
Jeremy Bernstein,
Phillip Isola,
Pulkit Agrawal
Abstract:
The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication. Although methods like low-rank adaptation (LoRA) have reduced the cost of model finetuning, its application in model pre-training remains largely unexplored. This paper explores extending LoRA to model pre-training, identifying the inherent constraints and limitations of standard LoR…
▽ More
The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication. Although methods like low-rank adaptation (LoRA) have reduced the cost of model finetuning, its application in model pre-training remains largely unexplored. This paper explores extending LoRA to model pre-training, identifying the inherent constraints and limitations of standard LoRA in this context. We introduce LoRA-the-Explorer (LTE), a novel bi-level optimization algorithm designed to enable parallel training of multiple low-rank heads across computing nodes, thereby reducing the need for frequent synchronization. Our approach includes extensive experimentation on vision transformers using various vision datasets, demonstrating that LTE is competitive with standard pre-training.
△ Less
Submitted 26 February, 2024;
originally announced February 2024.
-
Publicly auditable privacy-preserving electoral rolls
Authors:
Prashant Agrawal,
Mahabir Prasad Jhanwar,
Subodh Vishnu Sharma,
Subhashis Banerjee
Abstract:
While existing literature on electronic voting has extensively addressed verifiability of voting protocols, the vulnerability of electoral rolls in large public elections remains a critical concern. To ensure integrity of electoral rolls, the current practice is to either make electoral rolls public or share them with the political parties. However, this enables construction of detailed voter prof…
▽ More
While existing literature on electronic voting has extensively addressed verifiability of voting protocols, the vulnerability of electoral rolls in large public elections remains a critical concern. To ensure integrity of electoral rolls, the current practice is to either make electoral rolls public or share them with the political parties. However, this enables construction of detailed voter profiles and selective targeting and manipulation of voters, thereby undermining the fundamental principle of free and fair elections. In this paper, we study the problem of designing publicly auditable yet privacy-preserving electoral rolls. We first formulate a threat model and provide formal security definitions. We then present a protocol for creation, maintenance and usage of electoral rolls that mitigates the threats. Eligible voters can verify their inclusion, whereas political parties and auditors can statistically audit the electoral roll. Further, the audit can also detect polling-day ballot stuffing and denials to eligible voters by malicious polling officers. The entire electoral roll is never revealed, which prevents any large-scale systematic voter targeting and manipulation.
△ Less
Submitted 2 June, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Can LLMs perform structured graph reasoning?
Authors:
Palaash Agrawal,
Shavak Vasania,
Cheston Tan
Abstract:
Pretrained Large Language Models (LLMs) have demonstrated various reasoning capabilities through language-based prompts alone, particularly in unstructured task settings (tasks purely based on language semantics). However, LLMs often struggle with structured tasks, because of the inherent incompatibility of input representation. Reducing structured tasks to uni-dimensional language semantics often…
▽ More
Pretrained Large Language Models (LLMs) have demonstrated various reasoning capabilities through language-based prompts alone, particularly in unstructured task settings (tasks purely based on language semantics). However, LLMs often struggle with structured tasks, because of the inherent incompatibility of input representation. Reducing structured tasks to uni-dimensional language semantics often renders the problem trivial. Keeping the trade-off between LLM compatibility and structure complexity in mind, we design various graph reasoning tasks as a proxy to semi-structured tasks in this paper, in order to test the ability to navigate through representations beyond plain text in various LLMs. Particularly, we design 10 distinct problems of graph traversal, each representing increasing levels of complexity, and benchmark 5 different instruct-finetuned LLMs (GPT-4, GPT-3.5, Claude-2, Llama-2 and Palm-2) on the aforementioned tasks. Further, we analyse the performance of models across various settings such as varying sizes of graphs as well as different forms of k-shot prompting. We highlight various limitations, biases and properties of LLMs through this benchmarking process, such as an inverse relation to the average degrees of freedom of traversal per node in graphs, the overall negative impact of k-shot prompting on graph reasoning tasks, and a positive response bias which prevents LLMs from identifying the absence of a valid solution. Finally, we introduce a new prompting technique specially designed for graph traversal tasks (PathCompare), which demonstrates a notable increase in the performance of LLMs in comparison to standard prompting techniques such as Chain-of-Thought (CoT).
△ Less
Submitted 18 April, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
Authors:
Aditya Patil,
Vikas Joshi,
Purvi Agrawal,
Rupesh Mehta
Abstract:
Even with several advancements in multilingual modeling, it is challenging to recognize multiple languages using a single neural model, without knowing the input language and most multilingual models assume the availability of the input language. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support…
▽ More
Even with several advancements in multilingual modeling, it is challenging to recognize multiple languages using a single neural model, without knowing the input language and most multilingual models assume the availability of the input language. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support switching between the languages, without any language input from the user. The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism. As the language-specific posteriors are combined, it produces a single posterior probability over all the output symbols, enabling a single beam search decoding and also allowing dynamic switching between the languages. The proposed approach outperforms the conventional bilingual baseline with 13.3%, 8.23% and 1.3% word error rate relative reduction on Hindi, English and code-mixed test sets, respectively.
△ Less
Submitted 21 January, 2024;
originally announced January 2024.
-
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
Authors:
Prabhav Agrawal,
Thilo Koehler,
Zhiping Xiu,
Prashant Serai,
Qing He
Abstract:
Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run real-time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT), and therefore, is a magnitude faster than any neural vocoder. A DSP vocoder o…
▽ More
Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run real-time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT), and therefore, is a magnitude faster than any neural vocoder. A DSP vocoder often gets a lower audio quality due to consuming over-smoothed acoustic model predictions of approximate representations for the vocal tract. In this paper, we propose an ultra-lightweight differential DSP (DDSP) vocoder that uses a jointly optimized acoustic model with a DSP vocoder, and learns without an extracted spectral feature for the vocal tract. The model achieves audio quality comparable to neural vocoders with a high average MOS of 4.36 while being efficient as a DSP vocoder. Our C++ implementation, without any hardware-specific optimization, is at 15 MFLOPS, surpasses MB-MelGAN by 340 times in terms of FLOPS, and achieves a vocoder-only RTF of 0.003 and overall RTF of 0.044 while running single-threaded on a 2GHz Intel Xeon CPU.
△ Less
Submitted 18 January, 2024;
originally announced January 2024.
-
MultiSiam: A Multiple Input Siamese Network For Social Media Text Classification And Duplicate Text Detection
Authors:
Sudhanshu Bhoi,
Swapnil Markhedkar,
Shruti Phadke,
Prashant Agrawal
Abstract:
Social media accounts post increasingly similar content, creating a chaotic experience across platforms, which makes accessing desired information difficult. These posts can be organized by categorizing and grouping duplicates across social handles and accounts. There can be more than one duplicate of a post, however, a conventional Siamese neural network only considers a pair of inputs for duplic…
▽ More
Social media accounts post increasingly similar content, creating a chaotic experience across platforms, which makes accessing desired information difficult. These posts can be organized by categorizing and grouping duplicates across social handles and accounts. There can be more than one duplicate of a post, however, a conventional Siamese neural network only considers a pair of inputs for duplicate text detection. In this paper, we first propose a multiple-input Siamese network, MultiSiam. This condensed network is then used to propose another model, SMCD (Social Media Classification and Duplication Model) to perform both duplicate text grouping and categorization. The MultiSiam network, just like the Siamese, can be used in multiple applications by changing the sub-network appropriately.
△ Less
Submitted 6 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Electroweak Phase Transition with a Double Well Done Doubly Well
Authors:
Prateek Agrawal,
Simone Blasi,
Alberto Mariotti,
Michael Nee
Abstract:
We revisit the electroweak phase transition in the scalar singlet extension of the standard model with a $\mathbb{Z}_2$ symmetry. In significant parts of the parameter space the phase transition occurs in two steps - including canonical benchmarks used in experimental projections for gravitational waves. Domain walls produced in the first step of the transition seed the final step to the electrowe…
▽ More
We revisit the electroweak phase transition in the scalar singlet extension of the standard model with a $\mathbb{Z}_2$ symmetry. In significant parts of the parameter space the phase transition occurs in two steps - including canonical benchmarks used in experimental projections for gravitational waves. Domain walls produced in the first step of the transition seed the final step to the electroweak vacuum, an effect which is typically neglected but leads to an exponentially enhanced tunnelling rate. We improve previous results obtained for the seeded transition, which made use of the thin-wall or high temperature approximations, by using the mountain pass algorithm that was recently proposed as a useful tool for seeded processes. We then determine the predictions of the seeded transition for the latent heat, bubble size and characteristic time scale of the transition. Differences compared to homogeneous transitions are most pronounced when there are relatively few domain walls per hubble patch, potentially leading to an enhanced gravitational wave signal. We also provide a derivation of the percolation criteria for a generic seeded transition, which applies to the domain wall seeds we consider as well as to strings and monopoles.
△ Less
Submitted 29 February, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization
Authors:
Alexandra Chronopoulou,
Jonas Pfeiffer,
Joshua Maynez,
Xinyi Wang,
Sebastian Ruder,
Priyanka Agrawal
Abstract:
Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there are 7000 languages in the world and many of these languages lack labeled data for real-world language generation tasks. In this paper, we propose to improve zero-shot cross-lingual transfer by composing language or task spec…
▽ More
Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there are 7000 languages in the world and many of these languages lack labeled data for real-world language generation tasks. In this paper, we propose to improve zero-shot cross-lingual transfer by composing language or task specialized parameters. Our method composes language and task PEFT modules via element-wise arithmetic operations to leverage unlabeled data and English labeled data. We extend our approach to cases where labeled data from more languages is available and propose to arithmetically compose PEFT modules trained on languages related to the target. Empirical results on summarization demonstrate that our method is an effective strategy that obtains consistent gains using minimal training of PEFT modules.
△ Less
Submitted 15 November, 2023;
originally announced November 2023.
-
Hallucination-minimized Data-to-answer Framework for Financial Decision-makers
Authors:
Sohini Roychowdhury,
Andres Alvarez,
Brian Moore,
Marko Krema,
Maria Paz Gelpi,
Federico Martin Rodriguez,
Angel Rodriguez,
Jose Ramon Cabrejas,
Pablo Martinez Serrano,
Punit Agrawal,
Arijit Mukherjee
Abstract:
Large Language Models (LLMs) have been applied to build several automation and personalized question-answering prototypes so far. However, scaling such prototypes to robust products with minimized hallucinations or fake responses still remains an open challenge, especially in niche data-table heavy domains such as financial decision making. In this work, we present a novel Langchain-based framewor…
▽ More
Large Language Models (LLMs) have been applied to build several automation and personalized question-answering prototypes so far. However, scaling such prototypes to robust products with minimized hallucinations or fake responses still remains an open challenge, especially in niche data-table heavy domains such as financial decision making. In this work, we present a novel Langchain-based framework that transforms data tables into hierarchical textual data chunks to enable a wide variety of actionable question answering. First, the user-queries are classified by intention followed by automated retrieval of the most relevant data chunks to generate customized LLM prompts per query. Next, the custom prompts and their responses undergo multi-metric scoring to assess for hallucinations and response confidence. The proposed system is optimized with user-query intention classification, advanced prompting, data scaling capabilities and it achieves over 90% confidence scores for a variety of user-queries responses ranging from {What, Where, Why, How, predict, trend, anomalies, exceptions} that are crucial for financial decision making applications. The proposed data to answers framework can be extended to other analytical domains such as sales and payroll to ensure optimal hallucination control guardrails.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Learning to See Physical Properties with Active Sensing Motor Policies
Authors:
Gabriel B. Margolis,
Xiang Fu,
Yandong Ji,
Pulkit Agrawal
Abstract:
Knowledge of terrain's physical properties inferred from color images can aid in making efficient robotic locomotion plans. However, unlike image classification, it is unintuitive for humans to label image patches with physical properties. Without labeled data, building a vision system that takes as input the observed terrain and predicts physical properties remains challenging. We present a metho…
▽ More
Knowledge of terrain's physical properties inferred from color images can aid in making efficient robotic locomotion plans. However, unlike image classification, it is unintuitive for humans to label image patches with physical properties. Without labeled data, building a vision system that takes as input the observed terrain and predicts physical properties remains challenging. We present a method that overcomes this challenge by self-supervised labeling of images captured by robots during real-world traversal with physical property estimators trained in simulation. To ensure accurate labeling, we introduce Active Sensing Motor Policies (ASMP), which are trained to explore locomotion behaviors that increase the accuracy of estimating physical parameters. For instance, the quadruped robot learns to swipe its foot against the ground to estimate the friction coefficient accurately. We show that the visual system trained with a small amount of real-world traversal data accurately predicts physical parameters. The trained system is robust and works even with overhead images captured by a drone despite being trained on data collected by cameras attached to a quadruped robot walking on the ground.
△ Less
Submitted 2 November, 2023;
originally announced November 2023.
-
Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback
Authors:
Max Balsells,
Marcel Torne,
Zihan Wang,
Samedh Desai,
Pulkit Agrawal,
Abhishek Gupta
Abstract:
Ideally, we would place a robot in a real-world environment and leave it there improving on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges - the d…
▽ More
Ideally, we would place a robot in a real-world environment and leave it there improving on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges - the difficulty of providing well "shaped" rewards, and the difficulty of continual reset-free training. In this work, we describe a system for real-world reinforcement learning that enables agents to show continual improvement by training directly in the real world without requiring painstaking effort to hand-design reward functions or reset mechanisms. Our system leverages occasional non-expert human-in-the-loop feedback from remote users to learn informative distance functions to guide exploration while leveraging a simple self-supervised learning algorithm for goal-directed policy learning. We show that in the absence of resets, it is particularly important to account for the current "reachability" of the exploration policy when deciding which regions of the space to explore. Based on this insight, we instantiate a practical learning system - GEAR, which enables robots to simply be placed in real-world environments and left to train autonomously without interruption. The system streams robot experience to a web interface only requiring occasional asynchronous feedback from remote, crowdsourced, non-expert humans in the form of binary comparative feedback. We evaluate this system on a suite of robotic tasks in simulation and demonstrate its effectiveness at learning behaviors both in simulation and the real world. Project website https://guided-exploration-autonomous-rl.github.io/GEAR/.
△ Less
Submitted 31 October, 2023;
originally announced October 2023.
-
Human-Guided Complexity-Controlled Abstractions
Authors:
Andi Peng,
Mycal Tucker,
Eoin Kenny,
Noga Zaslavsky,
Pulkit Agrawal,
Julie Shah
Abstract:
Neural networks often learn task-specific latent representations that fail to generalize to novel settings or tasks. Conversely, humans learn discrete representations (i.e., concepts or words) at a variety of abstraction levels (e.g., "bird" vs. "sparrow") and deploy the appropriate abstraction based on task. Inspired by this, we train neural models to generate a spectrum of discrete representatio…
▽ More
Neural networks often learn task-specific latent representations that fail to generalize to novel settings or tasks. Conversely, humans learn discrete representations (i.e., concepts or words) at a variety of abstraction levels (e.g., "bird" vs. "sparrow") and deploy the appropriate abstraction based on task. Inspired by this, we train neural models to generate a spectrum of discrete representations, and control the complexity of the representations (roughly, how many bits are allocated for encoding inputs) by tuning the entropy of the distribution over representations. In finetuning experiments, using only a small number of labeled examples for a new task, we show that (1) tuning the representation to a task-appropriate complexity level supports the highest finetuning performance, and (2) in a human-participant study, users were able to identify the appropriate complexity level for a downstream task using visualizations of discrete representations. Our results indicate a promising direction for rapid model finetuning by leveraging human insight.
△ Less
Submitted 27 October, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
-
Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity
Authors:
Jaedong Hwang,
Zhang-Wei Hong,
Eric Chen,
Akhilan Boopathy,
Pulkit Agrawal,
Ila Fiete
Abstract:
Deep reinforcement learning methods exhibit impressive performance on a range of tasks but still struggle on hard exploration tasks in large environments with sparse rewards. To address this, intrinsic rewards can be generated using forward model prediction errors that decrease as the environment becomes known, and incentivize an agent to explore novel states. While prediction-based intrinsic rewa…
▽ More
Deep reinforcement learning methods exhibit impressive performance on a range of tasks but still struggle on hard exploration tasks in large environments with sparse rewards. To address this, intrinsic rewards can be generated using forward model prediction errors that decrease as the environment becomes known, and incentivize an agent to explore novel states. While prediction-based intrinsic rewards can help agents solve hard exploration tasks, they can suffer from catastrophic forgetting and actually increase at visited states. We first examine the conditions and causes of catastrophic forgetting in grid world environments. We then propose a new method FARCuriosity, inspired by how humans and animals learn. The method depends on fragmentation and recall: an agent fragments an environment based on surprisal, and uses different local curiosity modules (prediction-based intrinsic reward functions) for each fragment so that modules are not trained on the entire environment. At each fragmentation event, the agent stores the current module in long-term memory (LTM) and either initializes a new module or recalls a previously stored module based on its match with the current state. With fragmentation and recall, FARCuriosity achieves less forgetting and better overall performance in games with varied and heterogeneous environments in the Atari benchmark suite of tasks. Thus, this work highlights the problem of catastrophic forgetting in prediction-based curiosity methods and proposes a solution.
△ Less
Submitted 26 October, 2023;
originally announced October 2023.
-
Beam propagation in an active nonlinear graded-index fiber
Authors:
Anuj P. Lara,
Samudra Roy,
Govind P. Agrawal
Abstract:
A theoretical model is developed by exploiting the variational technique to investigate the evolution of an optical beam inside an optically pumped graded-index fiber amplifier. The variational analysis is a semi-analytical method that provides us with a set of coupled ordinary differential equations for the beam's four parameters. Numerical solution of these equations is much faster compared to t…
▽ More
A theoretical model is developed by exploiting the variational technique to investigate the evolution of an optical beam inside an optically pumped graded-index fiber amplifier. The variational analysis is a semi-analytical method that provides us with a set of coupled ordinary differential equations for the beam's four parameters. Numerical solution of these equations is much faster compared to the underlying multidimensional nonlinear wave equation. We compare the results of the variational and full numerical simulations for the two pumping schemes used commonly for high-power fiber amplifiers. In the clad-pumping scheme, the use of a relatively wide pump beam results in a nearly uniform gain all along the fiber. In the case of edge pumping, a narrower pump beam provides gain that varies both radially and axially along the fiber's length. In both cases, the variational results are found to be in good agreement with time-consuming full numerical simulations. We also derive a single equation for the beam's width that can predict amplification-induced narrowing of the signal beam in most cases of practical interest.
△ Less
Submitted 21 October, 2023;
originally announced October 2023.
-
Advancing Perception in Artificial Intelligence through Principles of Cognitive Science
Authors:
Palaash Agrawal,
Cheston Tan,
Heena Rathore
Abstract:
Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist open problems and fundamental shortcomings related to performance and resource efficiency. Since AI researchers benchmark a significant proportion of performance standards through human intelligence, cognitive sciences-inspired AI is a promising domain of research. Studying cognitive science can provid…
▽ More
Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist open problems and fundamental shortcomings related to performance and resource efficiency. Since AI researchers benchmark a significant proportion of performance standards through human intelligence, cognitive sciences-inspired AI is a promising domain of research. Studying cognitive science can provide a fresh perspective to building fundamental blocks in AI research, which can lead to improved performance and efficiency. In this review paper, we focus on the cognitive functions of perception, which is the process of taking signals from one's surroundings as input, and processing them to understand the environment. Particularly, we study and compare its various processes through the lens of both cognitive sciences and AI. Through this study, we review all current major theories from various sub-disciplines of cognitive science (specifically neuroscience, psychology and linguistics), and draw parallels with theories and techniques from current practices in AI. We, hence, present a detailed collection of methods in AI for researchers to build AI systems inspired by cognitive science. Further, through the process of reviewing the state of cognitive-inspired AI, we point out many gaps in the current state of AI (with respect to the performance of the human brain), and hence present potential directions for researchers to develop better perception systems in AI.
△ Less
Submitted 12 October, 2023;
originally announced October 2023.
-
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
Authors:
Zhang-Wei Hong,
Aviral Kumar,
Sathwik Karnik,
Abhishek Bhandwaldar,
Akash Srivastava,
Joni Pajarinen,
Romain Laroche,
Abhishek Gupta,
Pulkit Agrawal
Abstract:
Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirica…
▽ More
Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to ``good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains in 72 imbalanced datasets, D4RL dataset, and across three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.
△ Less
Submitted 11 October, 2023; v1 submitted 6 October, 2023;
originally announced October 2023.
-
On Memorization and Privacy Risks of Sharpness Aware Minimization
Authors:
Young In Kim,
Pratiksha Agrawal,
Johannes O. Royset,
Rajiv Khanna
Abstract:
In many recent works, there is an increased focus on designing algorithms that seek flatter optima for neural network loss optimization as there is empirical evidence that it leads to better generalization performance in many datasets. In this work, we dissect these performance gains through the lens of data memorization in overparameterized models. We define a new metric that helps us identify wh…
▽ More
In many recent works, there is an increased focus on designing algorithms that seek flatter optima for neural network loss optimization as there is empirical evidence that it leads to better generalization performance in many datasets. In this work, we dissect these performance gains through the lens of data memorization in overparameterized models. We define a new metric that helps us identify which data points specifically do algorithms seeking flatter optima do better when compared to vanilla SGD. We find that the generalization gains achieved by Sharpness Aware Minimization (SAM) are particularly pronounced for atypical data points, which necessitate memorization. This insight helps us unearth higher privacy risks associated with SAM, which we verify through exhaustive empirical evaluations. Finally, we propose mitigation strategies to achieve a more desirable accuracy vs privacy tradeoff.
△ Less
Submitted 3 January, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
InFER: A Multi-Ethnic Indian Facial Expression Recognition Dataset
Authors:
Syed Sameen Ahmad Rizvi,
Preyansh Agrawal,
Jagat Sesh Challa,
Pratik Narang
Abstract:
The rapid advancement in deep learning over the past decade has transformed Facial Expression Recognition (FER) systems, as newer methods have been proposed that outperform the existing traditional handcrafted techniques. However, such a supervised learning approach requires a sufficiently large training dataset covering all the possible scenarios. And since most people exhibit facial expressions…
▽ More
The rapid advancement in deep learning over the past decade has transformed Facial Expression Recognition (FER) systems, as newer methods have been proposed that outperform the existing traditional handcrafted techniques. However, such a supervised learning approach requires a sufficiently large training dataset covering all the possible scenarios. And since most people exhibit facial expressions based upon their age group, gender, and ethnicity, a diverse facial expression dataset is needed. This becomes even more crucial while developing a FER system for the Indian subcontinent, which comprises of a diverse multi-ethnic population. In this work, we present InFER, a real-world multi-ethnic Indian Facial Expression Recognition dataset consisting of 10,200 images and 4,200 short videos of seven basic facial expressions. The dataset has posed expressions of 600 human subjects, and spontaneous/acted expressions of 6000 images crowd-sourced from the internet. To the best of our knowledge InFER is the first of its kind consisting of images from 600 subjects from very diverse ethnicity of the Indian Subcontinent. We also present the experimental results of baseline & deep FER methods on our dataset to substantiate its usability in real-world practical applications.
△ Less
Submitted 30 September, 2023;
originally announced October 2023.
-
Lifelong Robot Learning with Human Assisted Language Planners
Authors:
Meenal Parakh,
Alisha Fong,
Anthony Simeonov,
Tao Chen,
Abhishek Gupta,
Pulkit Agrawal
Abstract:
Large Language Models (LLMs) have been shown to act like planners that can decompose high-level instructions into a sequence of executable instructions. However, current LLM-based planners are only able to operate with a fixed set of skills. We overcome this critical limitation and present a method for using LLM-based planners to query new skills and teach robots these skills in a data and time-ef…
▽ More
Large Language Models (LLMs) have been shown to act like planners that can decompose high-level instructions into a sequence of executable instructions. However, current LLM-based planners are only able to operate with a fixed set of skills. We overcome this critical limitation and present a method for using LLM-based planners to query new skills and teach robots these skills in a data and time-efficient manner for rigid object manipulation. Our system can re-use newly acquired skills for future tasks, demonstrating the potential of open world and lifelong learning. We evaluate the proposed framework on multiple tasks in simulation and the real world. Videos are available at: https://sites.google.com/mit.edu/halp-robot-learning.
△ Less
Submitted 24 October, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Searching for axion forces with spin precession in atoms and molecules
Authors:
Prateek Agrawal,
Nicholas R. Hutzler,
David E. Kaplan,
Surjeet Rajendran,
Mario Reig
Abstract:
We propose to use atoms and molecules as quantum sensors of axion-mediated monopole-dipole forces. We show that electron spin precession experiments using atomic and molecular beams are well-suited for axion searches thanks to the presence of co-magnetometer states and single-shot temporal resolution. Experimental strategies to detect axion gradients from localised sources and the earth are presen…
▽ More
We propose to use atoms and molecules as quantum sensors of axion-mediated monopole-dipole forces. We show that electron spin precession experiments using atomic and molecular beams are well-suited for axion searches thanks to the presence of co-magnetometer states and single-shot temporal resolution. Experimental strategies to detect axion gradients from localised sources and the earth are presented, taking ACME III as a prototype example. Other possibilities including atomic beams, and laser-cooled atoms and molecules are discussed.
△ Less
Submitted 18 September, 2023;
originally announced September 2023.
-
Compositional Foundation Models for Hierarchical Planning
Authors:
Anurag Ajay,
Seungwook Han,
Yilun Du,
Shuang Li,
Abhi Gupta,
Tommi Jaakkola,
Josh Tenenbaum,
Leslie Kaelbling,
Akash Srivastava,
Pulkit Agrawal
Abstract:
To make effective decisions in novel environments with long-horizon goals, it is crucial to engage in hierarchical reasoning across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Compositional Foundation Models for Hierarc…
▽ More
To make effective decisions in novel environments with long-horizon goals, it is crucial to engage in hierarchical reasoning across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Compositional Foundation Models for Hierarchical Planning (HiP), a foundation model which leverages multiple expert foundation model trained on language, vision and action data individually jointly together to solve long-horizon tasks. We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model. Generated video plans are then grounded to visual-motor control, through an inverse dynamics model that infers actions from generated videos. To enable effective reasoning within this hierarchy, we enforce consistency between the models via iterative refinement. We illustrate the efficacy and adaptability of our approach in three different long-horizon table-top manipulation tasks.
△ Less
Submitted 21 September, 2023; v1 submitted 15 September, 2023;
originally announced September 2023.
-
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Authors:
Palaash Agrawal,
Haidi Azaman,
Cheston Tan
Abstract:
Understanding relations between objects is crucial for understanding the semantics of a visual scene. It is also an essential step in order to bridge visual and language models. However, current state-of-the-art computer vision models still lack the ability to perform spatial reasoning well. Existing datasets mostly cover a relatively small number of spatial relations, all of which are static rela…
▽ More
Understanding relations between objects is crucial for understanding the semantics of a visual scene. It is also an essential step in order to bridge visual and language models. However, current state-of-the-art computer vision models still lack the ability to perform spatial reasoning well. Existing datasets mostly cover a relatively small number of spatial relations, all of which are static relations that do not intrinsically involve motion. In this paper, we propose the Spatial and Temporal Understanding of Prepositions Dataset (STUPD) -- a large-scale video dataset for understanding static and dynamic spatial relationships derived from prepositions of the English language. The dataset contains 150K visual depictions (videos and images), consisting of 30 distinct spatial prepositional senses, in the form of object interaction simulations generated synthetically using Unity3D. In addition to spatial relations, we also propose 50K visual depictions across 10 temporal relations, consisting of videos depicting event/time-point interactions. To our knowledge, no dataset exists that represents temporal relations through visual settings. In this dataset, we also provide 3D information about object interactions such as frame-wise coordinates, and descriptions of the objects used. The goal of this synthetic dataset is to help models perform better in visual relationship detection in real-world settings. We demonstrate an increase in the performance of various models over 2 real-world datasets (ImageNet-VidVRD and Spatial Senses) when pretrained on the STUPD dataset, in comparison to other pretraining datasets.
△ Less
Submitted 12 September, 2023;
originally announced September 2023.
-
Can humans help BERT gain "confidence"?
Authors:
Piyush Agrawal
Abstract:
The advancements in artificial intelligence over the last decade have opened a multitude of avenues for interdisciplinary research. Since the idea of artificial intelligence was inspired by the working of neurons in the brain, it seems pretty practical to combine the two fields and take the help of cognitive data to train AI models. Not only it will help to get a deeper understanding of the techno…
▽ More
The advancements in artificial intelligence over the last decade have opened a multitude of avenues for interdisciplinary research. Since the idea of artificial intelligence was inspired by the working of neurons in the brain, it seems pretty practical to combine the two fields and take the help of cognitive data to train AI models. Not only it will help to get a deeper understanding of the technology, but of the brain as well. In this thesis, I conduct novel experiments to integrate cognitive features from the Zurich Cognitive Corpus (ZuCo) (Hollenstein et al., 2018) with a transformer-based encoder model called BERT. I show how EEG and eye-tracking features from ZuCo can help to increase the performance of the NLP model. I confirm the performance increase with the help of a robustness-checking pipeline and derive a word-EEG lexicon to use in benchmarking on an external dataset that does not have any cognitive features associated with it. Further, I analyze the internal working mechanism of BERT and explore a potential method for model explainability by correlating it with a popular model-agnostic explainability framework called LIME (Ribeiro et al., 2016). Finally, I discuss the possible directions to take this research forward.
△ Less
Submitted 31 August, 2023;
originally announced September 2023.
-
The Monodromic Axion-Photon Coupling
Authors:
Prateek Agrawal,
Arthur Platschorre
Abstract:
We consider the general form of the axion coupling to photons in the axion-Maxwell theory. On general grounds this coupling takes the form of a monodromic function of the axion, which we call $g(a)$, multiplying the Chern-Pontryagin density $F \widetilde{F}$ of the photon. We show that the non-linearity of $g(a)$ is a spurion for the shift symmetry of the axion. In this context, when…
▽ More
We consider the general form of the axion coupling to photons in the axion-Maxwell theory. On general grounds this coupling takes the form of a monodromic function of the axion, which we call $g(a)$, multiplying the Chern-Pontryagin density $F \widetilde{F}$ of the photon. We show that the non-linearity of $g(a)$ is a spurion for the shift symmetry of the axion. In this context, when $g(a) \neq \mathbb{Z}a$, the linearized coupling of the axion $g'(a)$ is not quantized and there is a correlated mass term for the axion. Singularities in $g(a)$ due to the fast rearrangement of degrees of freedom are shown to have corresponding cusps and singularities in the axion potential. We derive the general form of $g(a)$ for the QCD axion, axions with perturbatively broken shift symmetries and axions descending from extra dimensions. In all cases, we show that there is a uniform general form of the monodromic function $g(a)$ and it is connected to the axion potential.
△ Less
Submitted 10 October, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Functional Consistency across Retail Central Bank Digital Currency and Commercial Bank Money
Authors:
Lee Braine,
Shreepad Shukla,
Piyush Agrawal
Abstract:
Central banks are actively exploring central bank digital currencies (CBDCs) by conducting research, proofs of concept and pilots. However, adoption of a retail CBDC can risk fragmenting both payments markets and retail deposits if the retail CBDC and commercial bank money do not have common operational characteristics. In this paper, we focus on a potential UK retail CBDC, the 'digital pound', an…
▽ More
Central banks are actively exploring central bank digital currencies (CBDCs) by conducting research, proofs of concept and pilots. However, adoption of a retail CBDC can risk fragmenting both payments markets and retail deposits if the retail CBDC and commercial bank money do not have common operational characteristics. In this paper, we focus on a potential UK retail CBDC, the 'digital pound', and the Bank of England's 'platform model'. We first explore how the concept of functional consistency could mitigate the risk of fragmentation. We next identify the common operational characteristics that are required to achieve functional consistency across all forms of regulated retail digital money. We identify four design options based on the provision of these common operational characteristics by the central bank, payment interface providers (PIPs), technical service providers (TSPs) or a financial market infrastructure (FMI). We next identify architecturally-significant use cases and select key capabilities that support these use cases and the common operational characteristics. We evaluate the suitability of the design options to provide these key capabilities and draw insights. We conclude that no single design option could provide functional consistency across digital pounds and commercial bank money and, instead, a complete solution would need to combine the suitable design option(s) for each key capability and include common ecosystem services provided by an FMI and TSPs.
△ Less
Submitted 16 August, 2023;
originally announced August 2023.
-
Towards Benchmarking Power-Performance Characteristics of Federated Learning Clients
Authors:
Pratik Agrawal,
Philipp Wiesner,
Odej Kao
Abstract:
Federated Learning (FL) is a decentralized machine learning approach where local models are trained on distributed clients, allowing privacy-preserving collaboration by sharing model updates instead of raw data. However, the added communication overhead and increased training time caused by heterogenous data distributions results in higher energy consumption and carbon emissions for achieving simi…
▽ More
Federated Learning (FL) is a decentralized machine learning approach where local models are trained on distributed clients, allowing privacy-preserving collaboration by sharing model updates instead of raw data. However, the added communication overhead and increased training time caused by heterogenous data distributions results in higher energy consumption and carbon emissions for achieving similar model performance than traditional machine learning. At the same time, efficient usage of available energy is an important requirement for battery constrained devices. Because of this, many different approaches on energy-efficient and carbon-efficient FL scheduling and client selection have been published in recent years. However, most of this research oversimplifies power performance characteristics of clients by assuming that they always require the same amount of energy per processed sample throughout training. This overlooks real-world effects arising from operating devices under different power modes or the side effects of running other workloads in parallel. In this work, we take a first look on the impact of such factors and discuss how better power-performance estimates can improve energy-efficient and carbon-efficient FL scheduling.
△ Less
Submitted 16 August, 2023;
originally announced August 2023.
-
Trie-NLG: Trie Context Augmentation to Improve Personalized Query Auto-Completion for Short and Unseen Prefixes
Authors:
Kaushal Kumar Maurya,
Maunendra Sankar Desarkar,
Manish Gupta,
Puneet Agrawal
Abstract:
Query auto-completion (QAC) aims to suggest plausible completions for a given query prefix. Traditionally, QAC systems have leveraged tries curated from historical query logs to suggest most popular completions. In this context, there are two specific scenarios that are difficult to handle for any QAC system: short prefixes (which are inherently ambiguous) and unseen prefixes. Recently, personaliz…
▽ More
Query auto-completion (QAC) aims to suggest plausible completions for a given query prefix. Traditionally, QAC systems have leveraged tries curated from historical query logs to suggest most popular completions. In this context, there are two specific scenarios that are difficult to handle for any QAC system: short prefixes (which are inherently ambiguous) and unseen prefixes. Recently, personalized Natural Language Generation (NLG) models have been proposed to leverage previous session queries as context for addressing these two challenges. However, such NLG models suffer from two drawbacks: (1) some of the previous session queries could be noisy and irrelevant to the user intent for the current prefix, and (2) NLG models cannot directly incorporate historical query popularity. This motivates us to propose a novel NLG model for QAC, Trie-NLG, which jointly leverages popularity signals from trie and personalization signals from previous session queries. We train the Trie-NLG model by augmenting the prefix with rich context comprising of recent session queries and top trie completions. This simple modeling approach overcomes the limitations of trie-based and NLG-based approaches and leads to state-of-the-art performance. We evaluate the Trie-NLG model using two large QAC datasets. On average, our model achieves huge ~57% and ~14% boost in MRR over the popular trie-based lookup and the strong BART-based baseline methods, respectively. We make our code publicly available.
△ Less
Submitted 23 October, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
Authors:
Zechu Li,
Tao Chen,
Zhang-Wei Hong,
Anurag Ajay,
Pulkit Agrawal
Abstract:
Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale…
▽ More
Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale, resulting in a longer wall-clock training time. This paper presents a Parallel $Q$-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Different from prior works on distributed off-policy learning, such as Apex, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single workstation. In experiments, we demonstrate that $Q$-learning can be scaled to \textit{tens of thousands of parallel environments} and investigate important factors affecting learning speed. The code is available at https://github.com/Improbable-AI/pql.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback
Authors:
Marcel Torne,
Max Balsells,
Zihan Wang,
Samedh Desai,
Tao Chen,
Pulkit Agrawal,
Abhishek Gupta
Abstract:
Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks requiring expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to…
▽ More
Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks requiring expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to leverage this guidance require constant synchronous high-quality human feedback, which is expensive and impractical to obtain. In this work, we present a technique called Human Guided Exploration (HuGE), which uses low-quality feedback from non-expert users that may be sporadic, asynchronous, and noisy. HuGE guides exploration for reinforcement learning not only in simulation but also in the real world, all without meticulous reward specification. The key concept involves bifurcating human feedback and policy learning: human feedback steers exploration, while self-supervised learning from the exploration data yields unbiased policies. This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses. HuGE is able to learn a variety of challenging multi-stage robotic navigation and manipulation tasks in simulation using crowdsourced feedback from non-expert users. Moreover, this paradigm can be scaled to learning directly on real-world robots, using occasional, asynchronous feedback from human supervisors.
△ Less
Submitted 20 July, 2023;
originally announced July 2023.
-
Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation
Authors:
Andi Peng,
Aviv Netanyahu,
Mark Ho,
Tianmin Shu,
Andreea Bobu,
Julie Shah,
Pulkit Agrawal
Abstract:
Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences a…
▽ More
Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences about how the task is performed. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts. Our key idea is to generate counterfactual demonstrations that allow users to quickly identify possible task-relevant and irrelevant concepts. The knowledge of task-irrelevant concepts is then used to perform data augmentation and thus obtain a policy adapted to personalized user objectives. We present experiments validating our framework on discrete and continuous control tasks with real human users. Our method (1) enables users to better understand agent failure, (2) reduces the number of demonstrations required for fine-tuning, and (3) aligns the agent to individual user task preferences.
△ Less
Submitted 13 July, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building
Authors:
Jaedong Hwang,
Zhang-Wei Hong,
Eric Chen,
Akhilan Boopathy,
Pulkit Agrawal,
Ila Fiete
Abstract:
Animals and robots navigate through environments by building and refining maps of space. These maps enable functions including navigation back to home, planning, search and foraging. Here, we use observations from neuroscience, specifically the observed fragmentation of grid cell map in compartmentalized spaces, to propose and apply the concept of Fragmentation-and-Recall (FARMap) in the mapping o…
▽ More
Animals and robots navigate through environments by building and refining maps of space. These maps enable functions including navigation back to home, planning, search and foraging. Here, we use observations from neuroscience, specifically the observed fragmentation of grid cell map in compartmentalized spaces, to propose and apply the concept of Fragmentation-and-Recall (FARMap) in the mapping of large spaces. Agents solve the mapping problem by building local maps via a surprisal-based clustering of space, which they use to set subgoals for spatial exploration. Agents build and use a local map to predict their observations; high surprisal leads to a "fragmentation event" that truncates the local map. At these events, the recent local map is placed into long-term memory (LTM) and a different local map is initialized. If observations at a fracture point match observations in one of the stored local maps, that map is recalled (and thus reused) from LTM. The fragmentation points induce a natural online clustering of the larger space, forming a set of intrinsic potential subgoals that are stored in LTM as a topological graph. Agents choose their next subgoal from the set of near and far potential subgoals from within the current local map or LTM, respectively. Thus, local maps guide exploration locally, while LTM promotes global exploration. We demonstrate that FARMap replicates the fragmentation points observed in animal studies. We evaluate FARMap on complex procedurally-generated spatial environments and realistic simulations to demonstrate that this mapping strategy much more rapidly covers the environment (number of agent steps and wall clock time) and is more efficient in active memory usage, without loss of performance. https://jd730.github.io/projects/FARMap/
△ Less
Submitted 8 July, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement
Authors:
Anthony Simeonov,
Ankit Goyal,
Lucas Manuelli,
Lin Yen-Chen,
Alina Sarmiento,
Alberto Rodriguez,
Pulkit Agrawal,
Dieter Fox
Abstract:
We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of…
▽ More
We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal/
△ Less
Submitted 10 July, 2023;
originally announced July 2023.
-
TGRL: An Algorithm for Teacher Guided Reinforcement Learning
Authors:
Idan Shenfeld,
Zhang-Wei Hong,
Aviv Tamar,
Pulkit Agrawal
Abstract:
Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without…
▽ More
Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without a principled method to balance these objectives, prior work used heuristics and problem-specific hyperparameter searches to balance the two objectives. We present a $\textit{principled}$ approach, along with an approximate implementation for $\textit{dynamically}$ and $\textit{automatically}$ balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased and otherwise it is decreased. Our method, $\textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyper-parameter tuning.
△ Less
Submitted 19 February, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Benchmarking Large Language Model Capabilities for Conditional Generation
Authors:
Joshua Maynez,
Priyanka Agrawal,
Sebastian Gehrmann
Abstract:
Pre-trained large language models (PLMs) underlie most new developments in natural language processing. They have shifted the field from application-specific model pipelines to a single model that is adapted to a wide range of tasks. Autoregressive PLMs like GPT-3 or PaLM, alongside techniques like few-shot learning, have additionally shifted the output modality to generation instead of classifica…
▽ More
Pre-trained large language models (PLMs) underlie most new developments in natural language processing. They have shifted the field from application-specific model pipelines to a single model that is adapted to a wide range of tasks. Autoregressive PLMs like GPT-3 or PaLM, alongside techniques like few-shot learning, have additionally shifted the output modality to generation instead of classification or regression. Despite their ubiquitous use, the generation quality of language models is rarely evaluated when these models are introduced. Additionally, it is unclear how existing generation tasks--while they can be used to compare systems at a high level--relate to the real world use cases for which people have been adopting them. In this work, we discuss how to adapt existing application-specific generation benchmarks to PLMs and provide an in-depth, empirical study of the limitations and capabilities of PLMs in natural language generation tasks along dimensions such as scale, architecture, input and output language. Our results show that PLMs differ in their applicability to different data regimes and their generalization to multiple languages and inform which PLMs to use for a given generation task setup. We share best practices to be taken into consideration when benchmarking generation capabilities during the development of upcoming PLMs.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
Spatial beam dynamics in graded-index multimode fibers under Raman amplification:a variational approach
Authors:
Ashis Paul,
Anuj P. Lara,
Samudra Roy,
Govind P. Agrawal
Abstract:
We investigate the spatial beam dynamics inside a multimode graded-index fiber under Raman amplification by adopting a semi-analytical variational approach. The variational analysis provides us with four coupled ordinary differential equations that govern the beam's dynamics under Raman gain and are much faster to solve numerically compared to the full nonlinear wave equation. Their solution also…
▽ More
We investigate the spatial beam dynamics inside a multimode graded-index fiber under Raman amplification by adopting a semi-analytical variational approach. The variational analysis provides us with four coupled ordinary differential equations that govern the beam's dynamics under Raman gain and are much faster to solve numerically compared to the full nonlinear wave equation. Their solution also provides considerable physical insight and allows us to study the impact of important nonlinear phenomena such as self-focusing and cross-phase modulation. We first show that the variational results corroborate well with full numerical simulations and then use them to investigate the signal's dynamics under different initial conditions such as the initial widths of the pump and signal beams. This allows us to quantify the conditions under which the quality of a signal beam can improve, without collapse of the beam owing to self-focusing. While time-consuming full simulations may be needed when gain saturation and pump depletion must be included, the variational method is useful for gaining valuable physical insight and for studying dependence of the amplified beam's width and amplitude on various physical parameters in a faster fashion.
△ Less
Submitted 28 June, 2023; v1 submitted 24 June, 2023;
originally announced June 2023.