Showing 1–25 of 25 results for author: Devlin, J

  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  4. arXiv:2211.07730  [pdf, other]

    cs.LG cs.AI cs.CL

    QueryForm: A Simple Zero-shot Form Entity Query Framework

    Authors: Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister

    Abstract: Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities. We present a novel query-based framework, QueryForm, that extracts entity values from form-like documents in a zero-shot fashion. QueryForm contains a dual prompting mechanism that composes both the document schema and a specific…

    Submitted 27 June, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted to Findings of ACL 2023

  5. arXiv:2211.05102  [pdf, other]

    cs.LG cs.CL

    Efficiently Scaling Transformer Inference

    Authors: Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean

    Abstract: We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths. Better understanding of the engineering tradeoffs for inference for large Transformer-based models is important as use cases of these models are growing rapidly throughout application areas. We develop a sim…

    Submitted 9 November, 2022; originally announced November 2022.

  6. arXiv:2210.11416  [pdf, other]

    cs.LG cs.CL

    Scaling Instruction-Finetuned Language Models

    Authors: Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, et al. (10 additional authors not shown)

    Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects d…

    Submitted 6 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Public checkpoints: https://huggingface.co/docs/transformers/model_doc/flan-t5

  7. arXiv:2204.02311  [pdf, other]

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran…

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  8. arXiv:2008.06341  [pdf, other]

    nlin.CG cs.OH

    Probabilistic Cellular Automata for Granular Media in Video Games

    Authors: Jonathan Devlin, Micah D. Schuster

    Abstract: Granular materials are very common in the everyday world. Media such as sand, soil, gravel, food stuffs, pharmaceuticals, etc. all have similar irregular flow since they are composed of numerous small solid particles. In video games, simulating these materials increases immersion and can be used for various game mechanics. Computationally, full scale simulation is not typically feasible except o…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: Cellular Automata, Sandpile

  9. arXiv:1906.07348  [pdf, other]

    cs.CL cs.LG

    Zero-Shot Entity Linking by Reading Entity Descriptions

    Authors: Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee

    Abstract: We present the zero-shot entity linking task, where mentions must be linked to unseen entities without in-domain labeled data. The goal is to enable robust transfer to highly specialized domains, and so no metadata or alias tables are assumed. In this setting, entities are only identified by text descriptions, and models must rely strictly on language understanding to resolve the new entities. Fir…

    Submitted 17 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  10. arXiv:1906.05416  [pdf, other]

    cs.CL

    Synthetic QA Corpora Generation with Roundtrip Consistency

    Authors: Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins

    Abstract: We introduce a novel method of generating synthetic question answering corpora by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency. By pretraining on the resulting corpora we obtain significant improvements on SQuAD2 and NQ, establishing a new state-of-the-art on the latter. Our synthetic data generation models, for both qu…

    Submitted 12 June, 2019; originally announced June 2019.

  11. arXiv:1901.09128  [pdf, other]

    cs.CL

    Language Model Pre-training for Hierarchical Document Representations

    Authors: Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin

    Abstract: Hierarchical neural architectures are often used to capture long-distance dependencies and have been applied to many document-level tasks such as summarization, document segmentation, and sentiment analysis. However, effective usage of such a large context can be difficult to learn, especially in the case where there is limited labeled data available. Building on the recent success of language mod…

    Submitted 25 January, 2019; originally announced January 2019.

  12. arXiv:1810.04805  [pdf, other]

    cs.CL

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

    Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with…

    Submitted 24 May, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

  13. arXiv:1805.04276  [pdf, other]

    cs.LG stat.ML

    Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis

    Authors: Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, Pushmeet Kohli

    Abstract: Program synthesis is the task of automatically generating a program consistent with a specification. Recent years have seen the proposal of a number of neural approaches for program synthesis, many of which adopt a sequence generation paradigm similar to neural machine translation, in which sequence-to-sequence models are trained to maximize the likelihood of known reference programs. While achieving…

    Submitted 22 May, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

    Comments: ICLR 2018

  14. arXiv:1802.05368  [pdf, other]

    cs.CL

    Universal Neural Machine Translation for Extremely Low Resource Languages

    Authors: Jiatao Gu, Hany Hassan, Jacob Devlin, Victor O. K. Li

    Abstract: In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach utilizes a transfer-learning approach to share lexical and sentence level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multilingual wo…

    Submitted 16 April, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

    Comments: NAACL-HLT 2018

  15. arXiv:1710.11054  [pdf, other]

    cs.AI cs.PL cs.SE

    Semantic Code Repair using Neuro-Symbolic Transformation Networks

    Authors: Jacob Devlin, Jonathan Uesato, Rishabh Singh, Pushmeet Kohli

    Abstract: We study the problem of semantic code repair, which can be broadly defined as automatically fixing non-syntactic bugs in source code. The majority of past work in semantic code repair assumed access to unit tests against which candidate repairs could be validated. In contrast, the goal here is to develop a strong statistical model to accurately predict both bug locations and exact fixes without ac…

    Submitted 30 October, 2017; originally announced October 2017.

  16. arXiv:1710.04157  [pdf, other]

    cs.AI

    Neural Program Meta-Induction

    Authors: Jacob Devlin, Rudy Bunel, Rishabh Singh, Matthew Hausknecht, Pushmeet Kohli

    Abstract: Most recently proposed methods for Neural Program Induction work under the assumption of having a large set of input/output (I/O) examples for learning any underlying input-output mapping. This paper aims to address the problem of data and computation efficiency of program induction by leveraging information from related tasks. Specifically, we propose two approaches for cross-task knowledge trans…

    Submitted 11 October, 2017; originally announced October 2017.

    Comments: 8 Pages + 1 page appendix

  17. arXiv:1705.01991  [pdf, other]

    cs.CL

    Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU

    Authors: Jacob Devlin

    Abstract: Attentional sequence-to-sequence models have become the new standard for machine translation, but one challenge of such models is a significant increase in training and decoding cost compared to phrase-based systems. Here, we focus on efficient decoding, with a goal of achieving accuracy close to the state-of-the-art in neural machine translation (NMT), while achieving CPU decoding speed/throughput c…

    Submitted 4 May, 2017; originally announced May 2017.

  18. arXiv:1703.07469  [pdf, other]

    cs.AI

    RobustFill: Neural Program Learning under Noisy I/O

    Authors: Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli

    Abstract: The problem of automatically generating a computer program from some specification has been studied since the early days of AI. Recently, two competing approaches for automatic program learning have received significant attention: (1) neural program synthesis, where a neural network is conditioned on input/output (I/O) examples and learns to generate a program, and (2) neural program induction, wh…

    Submitted 21 March, 2017; originally announced March 2017.

    Comments: 8 pages + 9 pages of supplementary material

  19. arXiv:1604.03968  [pdf, other]

    cs.CL cs.AI cs.CV

    Visual Storytelling

    Authors: Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell

    Abstract: We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling. The first release of this dataset, SIND v.1, includes 81,743 unique photos in 20,211 sequences, aligned to both descriptive (caption) and story language. We establish several strong baselines for the storytelling task, and motivate an automatic metric to benc…

    Submitted 13 April, 2016; originally announced April 2016.

    Comments: to appear in NAACL 2016

  20. arXiv:1603.06059  [pdf, other]

    cs.CL cs.AI cs.CV

    Generating Natural Questions About an Image

    Authors: Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende

    Abstract: There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by…

    Submitted 8 June, 2016; v1 submitted 19 March, 2016; originally announced March 2016.

    Comments: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics

  21. arXiv:1511.01042  [pdf, other]

    cs.CL cs.LG cs.NE

    Detecting Interrogative Utterances with Recurrent Neural Networks

    Authors: Junyoung Chung, Jacob Devlin, Hany Hassan Awadalla

    Abstract: In this paper, we explore different neural network architectures that can predict if a speaker of a given utterance is asking a question or making a statement. We compare the outcomes of regularization methods that are popularly used to train deep neural networks and study how different context functions can affect the classification performance. We also compare the efficacy of gated activation…

    Submitted 15 November, 2015; v1 submitted 3 November, 2015; originally announced November 2015.

    Comments: 6 pages, accepted to NIPS 2015 Workshop on Machine Learning for Spoken Language Understanding and Interaction

  22. arXiv:1506.06833  [pdf, other]

    cs.CL cs.AI cs.CV

    A Survey of Current Datasets for Vision and Language Research

    Authors: Francis Ferraro, Nasrin Mostafazadeh, Ting-Hao Huang, Lucy Vanderwende, Jacob Devlin, Michel Galley, Margaret Mitchell

    Abstract: Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing the…

    Submitted 19 August, 2015; v1 submitted 22 June, 2015; originally announced June 2015.

    Comments: To appear in EMNLP 2015, short proceedings. Dataset analysis and discussion expanded, including an initial examination into reporting bias for one of them. F.F. and N.M. contributed equally to this work

  23. arXiv:1506.00698  [pdf, other]

    cs.CL

    Statistical Machine Translation Features with Multitask Tensor Networks

    Authors: Hendra Setiawan, Zhongqiang Huang, Jacob Devlin, Thomas Lamar, Rabih Zbib, Richard Schwartz, John Makhoul

    Abstract: We present a three-pronged approach to improving Statistical Machine Translation (SMT), building on recent success in the application of neural networks to SMT. First, we propose new features based on neural networks to model various non-local translation phenomena. Second, we augment the architecture of the neural network with tensor layers that capture important higher-order interaction among th…

    Submitted 1 June, 2015; originally announced June 2015.

    Comments: 11 pages (9 content + 2 references), 2 figures, accepted to ACL 2015 as a long paper

  24. arXiv:1505.04467  [pdf, other]

    cs.CV

    Exploring Nearest Neighbor Approaches for Image Captioning

    Authors: Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick

    Abstract: We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for the query image by finding the caption that best represents the "consensus" of the set of candidate captions gathered from the nearest neighbor images. When mea…

    Submitted 17 May, 2015; originally announced May 2015.

  25. arXiv:1505.01809  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Language Models for Image Captioning: The Quirks and What Works

    Authors: Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell

    Abstract: Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a re…

    Submitted 14 October, 2015; v1 submitted 7 May, 2015; originally announced May 2015.

    Comments: See http://research.microsoft.com/en-us/projects/image_captioning for project information