-
Gemma: Open Models Based on Gemini Research and Technology
Authors:
Gemma Team,
Thomas Mesnard,
Cassidy Hardin,
Robert Dadashi,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti,
Léonard Hussenot,
Pier Giuseppe Sessa,
Aakanksha Chowdhery,
Adam Roberts,
Aditya Barua,
Alex Botev,
Alex Castro-Ros,
Ambrose Slone,
Amélie Héliou,
Andrea Tacchetti,
Anna Bulanova,
Antonia Paterson,
Beth Tsai,
Bobak Shahriari,
et al. (83 additional authors not shown)
Abstract:
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.
Submitted 16 April, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee,
et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
PaLM 2 Technical Report
Authors:
Rohan Anil,
Andrew M. Dai,
Orhan Firat,
Melvin Johnson,
Dmitry Lepikhin,
Alexandre Passos,
Siamak Shakeri,
Emanuel Taropa,
Paige Bailey,
Zhifeng Chen,
Eric Chu,
Jonathan H. Clark,
Laurent El Shafey,
Yanping Huang,
Kathy Meier-Hellstern,
Gaurav Mishra,
Erica Moreira,
Mark Omernick,
Kevin Robinson,
Sebastian Ruder,
Yi Tay,
Kefan Xiao,
Yuanzhong Xu,
Yujing Zhang,
Gustavo Hernandez Abrego,
et al. (103 additional authors not shown)
Abstract:
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
Submitted 13 September, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Natural Language to Code Generation in Interactive Data Science Notebooks
Authors:
Pengcheng Yin,
Wen-Ding Li,
Kefan Xiao,
Abhishek Rao,
Yeming Wen,
Kensen Shi,
Joshua Howland,
Paige Bailey,
Michele Catasta,
Henryk Michalewski,
Alex Polozov,
Charles Sutton
Abstract:
Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists to perform data wrangling and analytic tasks. To measure the performance of AI pair programmers that automatically synthesize programs for those tasks given natural language (NL) intents from users, we build ARCADE, a benchmark of 1082 code generation problems using the pandas data analysis framework in data science notebooks. ARCADE features multiple rounds of NL-to-code problems from the same notebook. It requires a model to understand rich multi-modal contexts, such as existing notebook cells and their execution states as well as previous turns of interaction. To establish a strong baseline on this challenging task, we develop PaChiNCo, a 62B code language model (LM) for Python computational notebooks, which significantly outperforms public code LMs. Finally, we explore few-shot prompting strategies to elicit better code with step-by-step decomposition and NL explanation, showing the potential to improve the diversity and explainability of model predictions.
Submitted 19 December, 2022;
originally announced December 2022.
-
Imagining Future Digital Assistants at Work: A Study of Task Management Needs
Authors:
Yonchanok Khaokaew,
Indigo Holcombe-James,
Mohammad Saiedur Rahaman,
Jonathan Liono,
Johanne R. Trippas,
Damiano Spina,
Nicholas Belkin,
Peter Bailey,
Paul N. Bennett,
Yongli Ren,
Mark Sanderson,
Falk Scholer,
Ryen W. White,
Flora D. Salim
Abstract:
Digital Assistants (DAs) can support workers in the workplace and beyond. However, target user needs are not fully understood, and the functions that workers would ideally want a DA to support require further study. A richer understanding of worker needs could help inform the design of future DAs. We investigate user needs of future workplace DAs using data from a user study of 40 workers over a four-week period. Our qualitative analysis confirms existing research and generates new insight on the role of DAs in managing people's time, tasks, and information. Placing these insights in relation to quantitative analysis of self-reported task data, we highlight how different occupation roles require DAs to take varied approaches to these domains and the effect of task characteristics on the imagined features. Our findings have implications for the design of future DAs in work settings, and we offer some recommendations for reduction to practice.
Submitted 6 August, 2022;
originally announced August 2022.
-
$O\left(1/T\right)$ Time-Average Convergence in a Generalization of Multiagent Zero-Sum Games
Authors:
James P. Bailey
Abstract:
We introduce a generalization of zero-sum network multiagent matrix games and prove that alternating gradient descent converges to the set of Nash equilibria at rate $O(1/T)$ for this set of games. Alternating gradient descent obtains this convergence guarantee while using fixed learning rates that are four times larger than the optimistic variant of gradient descent. Experimentally, we show with 97.5% confidence that, on average, these larger learning rates result in time-averaged strategies that are 2.585 times closer to the set of Nash equilibria than optimistic gradient descent.
Submitted 5 October, 2021;
originally announced October 2021.
-
Stochastic Multiplicative Weights Updates in Zero-Sum Games
Authors:
James P. Bailey,
Sai Ganesh Nagarajan,
Georgios Piliouras
Abstract:
We study agents competing against each other in a repeated network zero-sum game while applying the multiplicative weights update (MWU) algorithm with fixed learning rates. In our implementation, agents select their strategies probabilistically in each iteration and update their weights/strategies using the realized vector payoff of all strategies, i.e., stochastic MWU with full information. We show that the system results in an irreducible Markov chain where agent strategies diverge from the set of Nash equilibria. Further, we show that agents will play pure strategies with probability 1 in the limit.
Submitted 5 October, 2021;
originally announced October 2021.
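The update described in this abstract can be illustrated with a minimal sketch. It assumes the two-strategy game Matching Pennies, a learning rate of 0.5, and a seeded random generator; the function names `softmax` and `stochastic_mwu` are hypothetical and are not taken from the paper. Each agent samples a pure strategy from its current mixed strategy, then updates every weight multiplicatively using the payoff each of its strategies would have earned against the opponent's realized choice (tracked here as an additive update on log-weights, which is equivalent).

```python
import math
import random

# Row player's payoff matrix for Matching Pennies (an assumed example game);
# the column player receives the negation of each entry.
PAYOFF = [[1.0, -1.0], [-1.0, 1.0]]


def softmax(log_weights):
    """Turn log-weights into a probability vector in a numerically stable way."""
    m = max(log_weights)
    exps = [math.exp(v - m) for v in log_weights]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_mwu(steps, learning_rate=0.5, seed=0):
    """Run stochastic MWU with full information and return both mixed strategies."""
    rng = random.Random(seed)
    row_log_w = [0.0, 0.0]
    col_log_w = [0.0, 0.0]
    for _ in range(steps):
        x = softmax(row_log_w)
        y = softmax(col_log_w)
        # Each agent samples a pure strategy from its current mixed strategy.
        a = rng.choices([0, 1], weights=x)[0]
        b = rng.choices([0, 1], weights=y)[0]
        # Multiplying a weight by exp(rate * payoff) adds rate * payoff to its log.
        for i in range(2):
            row_log_w[i] += learning_rate * PAYOFF[i][b]
        for j in range(2):
            col_log_w[j] -= learning_rate * PAYOFF[a][j]
    return softmax(row_log_w), softmax(col_log_w)


if __name__ == "__main__":
    print(stochastic_mwu(steps=2000))
```

A finite run like this can only hint at the limiting behavior stated in the abstract; it is not a substitute for the Markov chain analysis.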
-
Conditions for Stability in Strategic Matching
Authors:
James P. Bailey,
Craig A. Tovey
Abstract:
We consider the stability of matchings when individuals strategically submit preference information to a publicly known algorithm. Most pure Nash equilibria of the ensuing game yield a matching that is unstable with respect to the individuals' sincere preferences. We introduce a well-supported minimal dishonesty constraint, and obtain conditions under which every pure Nash equilibrium yields a matching that is stable with respect to the sincere preferences. The conditions on the matching algorithm are to be either fully-randomized, or monotonic and independent of non-spouses (INS), an IIA-like property. These conditions are significant because they support the use of algorithms other than the Gale-Shapley (man-optimal) algorithm for kidney exchange and other applications. We prove that the Gale-Shapley algorithm always yields the woman-optimal matching when individuals are minimally dishonest. However, we give a negative answer to one of Gusfield and Irving's open questions: there is no monotonic INS or fully-randomized stable matching algorithm that is certain to yield the egalitarian-optimal matching when individuals are strategic and minimally dishonest. Finally, we show that these results extend to the student placement problem, where women are polyandrous but must be honest, but do not extend to the admissions problem, where women are both polyandrous and strategic.
Submitted 9 August, 2021;
originally announced August 2021.
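For readers unfamiliar with the Gale-Shapley algorithm that this abstract refers to, the following is a minimal sketch of the textbook deferred-acceptance procedure with sincere preferences. It does not model the strategic, minimally dishonest behavior studied in the paper. The function name `gale_shapley` and the sample preference lists are hypothetical, and the sketch assumes equally many proposers and receivers, each with a complete preference list.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching {proposer: receiver} that is proposer-optimal.

    Assumes equal numbers of proposers and receivers and complete preference lists.
    """
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    engaged_to = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p          # receiver was unmatched: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p          # receiver trades up; old partner is free again
            free.append(current)
        else:
            free.append(p)             # receiver rejects; proposer tries the next option
    return {p: r for r, p in engaged_to.items()}


if __name__ == "__main__":
    men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
    women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
    print(gale_shapley(men, women))
```

In the sample data both men rank w1 first, and w1 prefers m2, so m1 ends up matched with w2.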
-
Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method
Authors:
Shengjun Zhang,
Yunlong Dong,
Dong Xie,
Lisha Yao,
Colleen P. Bailey,
Shengli Fu
Abstract:
This paper investigates the stochastic distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions. We solve such a problem by involving zeroth-order (ZO) information exchange. In this paper, we propose a ZO distributed primal-dual coordinate method (ZODIAC) to solve the stochastic optimization problem. Agents approximate their own local stochastic ZO oracle along with coordinates with an adaptive smoothing parameter. We show that the proposed algorithm achieves the convergence rate of $\mathcal{O}(\sqrt{p}/\sqrt{T})$ for general nonconvex cost functions. We demonstrate the efficiency of proposed algorithms through a numerical example in comparison with the existing state-of-the-art centralized and distributed ZO algorithms.
Submitted 13 October, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
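The idea of a zeroth-order oracle evaluated along coordinates can be sketched as follows. This is not the ZODIAC method itself: it is a deterministic, single-agent central-difference estimator with a fixed smoothing parameter, whereas the abstract describes a stochastic, distributed primal-dual method with an adaptive smoothing parameter. The names `zo_coordinate_gradient` and `sphere` are hypothetical.

```python
def zo_coordinate_gradient(f, x, smoothing=1e-4):
    """Estimate the gradient of f at x using only function evaluations.

    Each coordinate is perturbed by +/- smoothing and the central difference
    is taken, so no analytic derivative of f is required.
    """
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += smoothing
        backward[i] -= smoothing
        grad.append((f(forward) - f(backward)) / (2.0 * smoothing))
    return grad


def sphere(x):
    """Assumed test function: the sum of squares, whose true gradient is 2 * x."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    print(zo_coordinate_gradient(sphere, [1.0, -2.0, 0.5]))
```

The estimate costs two function evaluations per coordinate, which is the usual trade-off of coordinate-wise zeroth-order schemes.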
-
Obstacle Avoidance and Navigation Utilizing Reinforcement Learning with Reward Shaping
Authors:
Daniel Zhang,
Colleen P. Bailey
Abstract:
In this paper, we investigate the obstacle avoidance and navigation problem in the robotic control area. To solve this problem, we propose revised Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) algorithms with an improved reward shaping technique. We compare the original DDPG and PPO with their revised versions on simulations with a real mobile robot and demonstrate that the proposed algorithms achieve better results.
Submitted 9 April, 2020; v1 submitted 28 March, 2020;
originally announced March 2020.
-
Extremal Region Analysis based Deep Learning Framework for Detecting Defects
Authors:
Zelin Deng,
Xiaolong Yan,
Shengjun Zhang,
Colleen P. Bailey
Abstract:
A maximally stable extreme region (MSER) analysis based convolutional neural network (CNN) for unified defect detection framework is proposed in this paper. Our proposed framework utilizes the generality and stability of MSER to generate the desired defect candidates. Then a specific trained binary CNN classifier is adopted over the defect candidates to produce the final defect set. Defect datasets over different categories are used in the experiments. More generally, the parameter settings in MSER can be adjusted to satisfy different requirements in various industries (high precision, high recall, etc.). Extensive experimental results have shown the efficacy of the proposed framework.
Submitted 22 May, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent
Authors:
James P. Bailey,
Gauthier Gidel,
Georgios Piliouras
Abstract:
Gradient descent is arguably one of the most popular online optimization methods with a wide array of applications. However, the standard implementation where agents simultaneously update their strategies yields several undesirable properties: strategies diverge away from equilibrium and regret grows over time. In this paper, we eliminate these negative properties by introducing a different implementation to obtain finite regret via arbitrary fixed step-size. We obtain this surprising property by having agents take turns when updating their strategies. In this setting, we show that an agent that uses gradient descent obtains bounded regret -- regardless of how their opponent updates their strategies. Furthermore, we show that in adversarial settings agents' strategies are bounded and cycle when both are using the alternating gradient descent algorithm.
Submitted 9 July, 2019;
originally announced July 2019.
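The contrast between simultaneous and alternating updates in this abstract can be seen in a small sketch. It assumes the simplest unconstrained bilinear zero-sum game f(x, y) = x * y, where x is the minimizer and y the maximizer, with step size 0.1 and start point (1, 1); this toy setting and all function names are illustrative choices, not the paper's experiments. For this particular game the alternating update conserves the quantity x^2 + y^2 - eta * x * y exactly, which keeps the iterates on a closed curve, while the simultaneous update multiplies x^2 + y^2 by (1 + eta^2) at every step.

```python
def simultaneous_step(x, y, eta):
    """Both agents update from the same old iterate (gradient descent-ascent)."""
    return x - eta * y, y + eta * x


def alternating_step(x, y, eta):
    """Agents take turns: y responds to the x that was just updated."""
    x_new = x - eta * y
    y_new = y + eta * x_new
    return x_new, y_new


def run(step, steps=1000, eta=0.1, x=1.0, y=1.0):
    """Iterate the given update rule and return the final point."""
    for _ in range(steps):
        x, y = step(x, y, eta)
    return x, y


def invariant(x, y, eta):
    """Quantity conserved by the alternating update in the game f(x, y) = x * y."""
    return x * x + y * y - eta * x * y


if __name__ == "__main__":
    xs, ys = run(simultaneous_step)
    xa, ya = run(alternating_step)
    print("simultaneous squared norm:", xs * xs + ys * ys)
    print("alternating squared norm: ", xa * xa + ya * ya)
    print("alternating invariant:    ", invariant(xa, ya, 0.1))
```

The sketch says nothing about regret, which the paper bounds separately; it only illustrates the bounded, cycling trajectories.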
-
Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes
Authors:
James P. Bailey,
Georgios Piliouras
Abstract:
We show for the first time, to our knowledge, that it is possible to reconcile in online learning in zero-sum games two seemingly contradictory objectives: vanishing time-average regret and non-vanishing step sizes. This phenomenon, that we coin "fast and furious" learning in games, sets a new benchmark about what is possible both in max-min optimization as well as in multi-agent systems. Our analysis does not depend on introducing a carefully tailored dynamic. Instead we focus on the most well studied online dynamic, gradient descent. Similarly, we focus on the simplest textbook class of games, two-agent two-strategy zero-sum games, such as Matching Pennies. Even for this simplest of benchmarks the best known bound for total regret, prior to our work, was the trivial one of $O(T)$, which is immediately applicable even to a non-learning agent. Based on a tight understanding of the geometry of the non-equilibrating trajectories in the dual space we prove a regret bound of $\Theta(\sqrt{T})$ matching the well known optimal bound for adaptive step sizes in the online setting. This guarantee holds for all fixed step-sizes without having to know the time horizon in advance and adapt the fixed step-size accordingly. As a corollary, we establish that even with fixed learning rates the time-averages of mixed strategies and utilities converge to their exact Nash equilibrium values.
Submitted 11 May, 2019;
originally announced May 2019.
-
Diversifying Reply Suggestions using a Matching-Conditional Variational Autoencoder
Authors:
Budhaditya Deb,
Peter Bailey,
Milad Shokouhi
Abstract:
We consider the problem of diversifying automated reply suggestions for a commercial instant-messaging (IM) system (Skype). Our conversation model is a standard matching based information retrieval architecture, which consists of two parallel encoders to project messages and replies into a common feature representation. During inference, we select replies from a fixed response set using nearest neighbors in the feature space. To diversify responses, we formulate the model as a generative latent variable model with Conditional Variational Auto-Encoder (M-CVAE). We propose a constrained-sampling approach to make the variational inference in M-CVAE efficient for our production system. In offline experiments, M-CVAE consistently increased diversity by ~30-40% without significant impact on relevance. This translated to a 5% gain in click-rate in our online production system.
Submitted 25 March, 2019;
originally announced March 2019.
-
Multi-Agent Learning in Network Zero-Sum Games is a Hamiltonian System
Authors:
James P. Bailey,
Georgios Piliouras
Abstract:
Zero-sum games are natural, if informal, analogues of closed physical systems where no energy/utility can enter or exit. This analogy can be extended even further if we consider zero-sum network (polymatrix) games where multiple agents interact in a closed economy. Typically, (network) zero-sum games are studied from the perspective of Nash equilibria. Nevertheless, this comes in contrast with the way we typically think about closed physical systems, e.g., Earth-moon systems which move perpetually along recurrent trajectories of constant energy.
We establish a formal and robust connection between multi-agent systems and Hamiltonian dynamics -- the same dynamics that describe conservative systems in physics. Specifically, we show that no matter the size, or network structure of such closed economies, even if agents use different online learning dynamics from the standard class of Follow-the-Regularized-Leader, they yield Hamiltonian dynamics. This approach generalizes the known connection to Hamiltonians for the special case of replicator dynamics in two agent zero-sum games developed by Hofbauer. Moreover, our results extend beyond zero-sum settings and provide a type of Rosetta stone (see e.g. Table 1) that helps to translate results and techniques between online optimization, convex analysis, game theory, and physics.
Submitted 5 March, 2019;
originally announced March 2019.