
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2302.05617 [pdf, other]

Towards Human-Centred Crowd Computing: Software for Better Use of Computational Resources

Authors: Niroshinie Fernando, Chetan Arora, Seng W. Loke, Lubna Alam, Stephen La Macchia, Helen Graesser

Abstract: Internet-connected smart devices are increasing at an exponential rate. These powerful devices have created a yet-untapped pool of idle resources that can be utilised, among others, for processing data in resource-depleted environments. The idea of bringing together a pool of smart devices for "crowd computing" (CC) has been studied in the recent past from an infrastructural feasibility perspective. However, for the CC paradigm to be successful, numerous socio-technical and software engineering (SE), specifically the requirements engineering (RE)-related factors are at play and have not been investigated in the literature. In this paper, we motivate the SE-related aspects of CC and the ideas for implementing mobile apps required for CC scenarios. We present the results of a preliminary study on understanding the human aspects, incentives that motivate users, and CC app requirements, and present our future development plan in this relatively new field of research for SE applications.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2209.14375 [pdf, other]

Improving alignment of dialogue agents via targeted human judgements

Authors: Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, Laura Weidinger, Martin Chadwick, Phoebe Thacker, Lucy Campbell-Gillingham, Jonathan Uesato, Po-Sen Huang, Ramona Comanescu, Fan Yang, Abigail See, Sumanth Dathathri, Rory Greig, Charlie Chen, Doug Fritz, Jaume Sanchez Elias, Richard Green, Soňa Mokrá, Nicholas Fernando, Boxi Wu, et al. (9 additional authors not shown)

Abstract: We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behaviour and allows for more efficient rule-conditional reward models. Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time. Sparrow is preferred more often than baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time when probed. Finally, we conduct extensive analyses showing that though our model learns to follow our rules it can exhibit distributional biases.

Submitted 28 September, 2022; originally announced September 2022.

arXiv:2109.05971 [pdf, other]

Internet of Things in Space: A Review of Opportunities and Challenges from Satellite-Aided Computing to Digitally-Enhanced Space Living

Authors: Jonathan Kua, Chetan Arora, Seng W. Loke, Niroshinie Fernando, Chathurika Ranaweera

Abstract: Recent scientific and technological advancements driven by the Internet of Things (IoT), Machine Learning (ML) and Artificial Intelligence (AI), distributed computing and data communication technologies have opened up a vast range of opportunities in many scientific fields - spanning from fast, reliable and efficient data communication to large-scale cloud/edge computing and intelligent big data analytics. Technological innovations and developments in these areas have also enabled many opportunities in the space industry. The successful Mars landing of NASA's Perseverance rover on February 18, 2021 represents another giant leap for mankind in space exploration. Emerging research and developments of connectivity and computing technologies in IoT for space/non-terrestrial environments is expected to yield significant benefits in the near future. This survey paper presents a broad overview of the area and provides a look-ahead of the opportunities made possible by IoT and space-based technologies. We first survey the current developments of IoT and space industry, and identify key challenges and opportunities in these areas. We then review the state-of-the-art and discuss future opportunities for IoT developments, deployment and integration to support future endeavours in space exploration.

Submitted 10 September, 2021; originally announced September 2021.

arXiv:1112.0057 [pdf, ps, other]

Flip-OFDM for Unipolar Communication Systems

Authors: Nirmal Fernando, Yi Hong, Emanuele Viterbo

Abstract: Unipolar communications systems can transmit information using only real and positive signals. This includes a variety of physical channels ranging from optical (fiber or free-space), to RF wireless using amplitude modulation with non-coherent reception, to baseband single wire communications. Unipolar OFDM techniques enable to efficiently compensate frequency selective distortion in the unipolar communication systems. One of the leading examples of unipolar OFDM is asymmetric clipped optical OFDM (ACO-OFDM) originally proposed for optical communications. Flip-OFDM is an alternative approach that was proposed in a patent, but its performance and full potentials have never been investigated in the literature. In this paper, we first compare Flip-OFDM and ACO-OFDM, and show that both techniques have the same performance but different complexities (Flip-OFDM offers 50% saving). We then propose a new detection scheme, which enables to reduce the noise at the Flip-OFDM receiver by almost 3dB. The analytical performance of the noise filtering schemes is supported by the simulation results.

Submitted 13 December, 2011; v1 submitted 30 November, 2011; originally announced December 2011.

Comments: 19 pages, 8 pages (re-uploaded with corrected Fig 2a)

arXiv:1111.5682 [pdf, ps, other]

doi 10.1109/ITW.2011.6089566

Flip-OFDM for Optical Wireless Communications

Authors: Nirmal Fernando, Yi Hong, Emanuele Viterbo

Abstract: We consider two unipolar OFDM techniques for optical wireless communications: asymmetric clipped optical OFDM (ACO-OFDM) and Flip-OFDM. Both techniques can be used to compensate multipath distortion effects in optical wireless channels. However, ACO-OFDM has been widely studied in the literature, while the performance of Flip-OFDM has never been investigated. In this paper, we conduct the performance analysis of Flip-OFDM and propose additional modification to the original scheme in order to compare the performance of both techniques. Finally, it is shown by simulation that both techniques have the same performance but different hardware complexities. In particular, for slow fading channels, Flip-OFDM offers 50% saving in hardware complexity over ACO-OFDM at the receiver.

Submitted 24 November, 2011; originally announced November 2011.

Comments: published in IEEE Information Theory Workshop, Paraty Brazil, Sept 2011

Showing 1–7 of 7 results for author: Fernando, N