Showing 1–24 of 24 results for author: Vodrahalli, K

  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2310.01783  [pdf, other]

    cs.LG cs.AI cs.CL cs.HC

    Can large language models provide useful feedback on research papers? A large-scale empirical analysis

    Authors: Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Smith, Yian Yin, Daniel McFarland, James Zou

    Abstract: Expert feedback lays the foundation of rigorous research. However, the rapid growth of scholarly production and intricate knowledge specialization challenge the conventional scientific feedback mechanisms. High-quality peer reviews are increasingly difficult to obtain. Researchers who are more junior or from under-resourced settings have especially hard times getting timely feedback. With the brea…

    Submitted 3 October, 2023; originally announced October 2023.

  4. arXiv:2306.08141  [pdf, other]

    cs.AI cs.CV cs.HC cs.LG

    ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

    Authors: Kailas Vodrahalli, James Zou

    Abstract: As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image a…

    Submitted 17 June, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 31 pages, 27 figures, ICML 2024

  5. arXiv:2305.19496  [pdf, ps, other]

    cs.GT cs.LG

    Is Learning in Games Good for the Learners?

    Authors: William Brown, Jon Schneider, Kiran Vodrahalli

    Abstract: We consider a number of questions related to tradeoffs between reward and regret in repeated gameplay between two agents. To facilitate this, we introduce a notion of $\textit{generalized equilibrium}$ which allows for asymmetric regret constraints, and yields polytopes of feasible values for each agent and pair of regret constraints, where we show that any such equilibrium is reachable by a pair…

    Submitted 16 December, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: 22 pages

  6. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  7. arXiv:2209.09105  [pdf]

    cs.CV cs.AI eess.IV

    Development and Clinical Evaluation of an AI Support Tool for Improving Telemedicine Photo Quality

    Authors: Kailas Vodrahalli, Justin Ko, Albert S. Chiou, Roberto Novoa, Abubakar Abid, Michelle Phung, Kiana Yekrang, Paige Petrone, James Zou, Roxana Daneshjou

    Abstract: Telemedicine utilization was accelerated during the COVID-19 pandemic, and skin conditions were a common use case. However, the quality of photographs sent by patients remains a major limitation. To address this issue, we developed TrueImage 2.0, an artificial intelligence (AI) model for assessing patient photo quality for telemedicine and providing real-time feedback to patients for photo quality…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: 24 pages, 7 figures

  8. arXiv:2205.14519  [pdf, other]

    cs.LG cs.GT stat.ML

    Online Learning with Bounded Recall

    Authors: Jon Schneider, Kiran Vodrahalli

    Abstract: We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-$\textit{bounded-recall}$ if its output at time $t$ can be written as a function of the $M$ previous rewards (and not e.g. any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constr…

    Submitted 31 May, 2024; v1 submitted 28 May, 2022; originally announced May 2022.

    Comments: 13 pages, 2 figures, accepted at ICML 2024

  9. arXiv:2203.08807  [pdf]

    eess.IV cs.AI cs.CV cs.LG

    Disparities in Dermatology AI Performance on a Diverse, Curated Clinical Image Set

    Authors: Roxana Daneshjou, Kailas Vodrahalli, Roberto A Novoa, Melissa Jenkins, Weixin Liang, Veronica Rotemberg, Justin Ko, Susan M Swetter, Elizabeth E Bailey, Olivier Gevaert, Pritam Mukherjee, Michelle Phung, Kiana Yekrang, Bradley Fong, Rachna Sahasrabudhe, Johan A. C. Allerup, Utako Okata-Karigane, James Zou, Albert Chiou

    Abstract: Access to dermatological care is a major issue, with an estimated 3 billion people lacking access to care globally. Artificial intelligence (AI) may aid in triaging skin diseases. However, most AI models have not been rigorously assessed on images of diverse skin tones or uncommon diseases. To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology I…

    Submitted 15 March, 2022; originally announced March 2022.

  10. arXiv:2202.05983  [pdf, other]

    cs.AI cs.CV cs.HC cs.LG

    Uncalibrated Models Can Improve Human-AI Collaboration

    Authors: Kailas Vodrahalli, Tobias Gerstenberg, James Zou

    Abstract: In many practical applications of AI, an AI model is used as a decision aid for human users. The AI provides advice that a human (sometimes) incorporates into their decision-making process. The AI advice is often presented with some measure of "confidence" that the human can use to calibrate how much they depend on or trust the advice. In this paper, we present an initial exploration that suggests…

    Submitted 27 October, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: 21 pages, 12 figures, NeurIPS 2022

  11. arXiv:2202.00834  [pdf, other]

    cs.LG cs.AI cs.DS stat.ML

    Nonlinear Initialization Methods for Low-Rank Neural Networks

    Authors: Kiran Vodrahalli, Rakesh Shivanna, Maheswaran Sathiamoorthy, Sagar Jain, Ed H. Chi

    Abstract: We propose a novel low-rank initialization framework for training low-rank deep neural networks -- networks where the weight parameters are re-parameterized by products of two low-rank matrices. The most successful prior existing approach, spectral initialization, draws a sample from the initialization distribution for the full-rank setting and then optimally approximates the full-rank initializat…

    Submitted 19 May, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: 32 pages, 4 figures, in submission. Fixed some errors in previous versions and restructured/refocused the paper

  12. arXiv:2111.08006  [pdf, other]

    eess.IV cs.CV cs.LG

    Disparities in Dermatology AI: Assessments Using Diverse Clinical Images

    Authors: Roxana Daneshjou, Kailas Vodrahalli, Weixin Liang, Roberto A Novoa, Melissa Jenkins, Veronica Rotemberg, Justin Ko, Susan M Swetter, Elizabeth E Bailey, Olivier Gevaert, Pritam Mukherjee, Michelle Phung, Kiana Yekrang, Bradley Fong, Rachna Sahasrabudhe, James Zou, Albert Chiou

    Abstract: More than 3 billion people lack access to care for skin disease. AI diagnostic tools may aid in early skin cancer detection; however most models have not been assessed on images of diverse skin tones or uncommon diseases. To address this, we curated the Diverse Dermatology Images (DDI) dataset - the first publicly available, pathologically confirmed images featuring diverse skin tones. We show tha…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: Machine Learning for Health (ML4H) - Extended Abstract

  13. arXiv:2107.07015  [pdf, other]

    cs.AI cs.HC

    Do Humans Trust Advice More if it Comes from AI? An Analysis of Human-AI Interactions

    Authors: Kailas Vodrahalli, Roxana Daneshjou, Tobias Gerstenberg, James Zou

    Abstract: In decision support applications of AI, the AI algorithm's output is framed as a suggestion to a human user. The user may ignore this advice or take it into consideration to modify their decision. With the increasing prevalence of such human-AI interactions, it is important to understand how users react to AI advice. In this paper, we recruited over 1100 crowdworkers to characterize how humans use…

    Submitted 1 June, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: Conference on Artificial Intelligence, Ethics, and Society (AIES 2022)

  14. arXiv:2106.10189  [pdf, other]

    cs.LG

    Adversarial Training Helps Transfer Learning via Better Representations

    Authors: Zhun Deng, Linjun Zhang, Kailas Vodrahalli, Kenji Kawaguchi, James Zou

    Abstract: Transfer learning aims to leverage models pre-trained on source data to efficiently adapt to target setting, where only limited data are available for model fine-tuning. Recent works empirically demonstrate that adversarial training in the source data can improve the ability of models to transfer to new domains. However, why this happens is not known. In this paper, we provide a theoretical model…

    Submitted 18 June, 2021; originally announced June 2021.

  15. arXiv:2102.12571  [pdf, other]

    cs.AI cs.LG cs.RO

    The Logical Options Framework

    Authors: Brandon Araki, Xiao Li, Kiran Vodrahalli, Jonathan DeCastro, Micah J. Fry, Daniela Rus

    Abstract: Learning composable policies for environments with complex rules and tasks is a challenging problem. We introduce a hierarchical reinforcement learning framework called the Logical Options Framework (LOF) that learns policies that are satisfying, optimal, and composable. LOF efficiently learns policies that satisfy tasks by representing the task as an automaton and integrating it into learning and…

    Submitted 24 February, 2021; originally announced February 2021.

    Comments: 23 pages, 19 figures

    ACM Class: I.2.9; I.2.6; G.3; I.5.1

  16. arXiv:2011.13149  [pdf, other]

    cs.LG cs.NE

    Better Knowledge Retention through Metric Learning

    Authors: Ke Li, Shichong Peng, Kailas Vodrahalli, Jitendra Malik

    Abstract: In continual learning, new categories may be introduced over time, and an ideal learning system should perform well on both the original categories and the new categories. While deep neural nets have achieved resounding success in the classical supervised setting, they are known to forget about knowledge acquired in prior episodes of learning if the examples encountered in the current episode of l…

    Submitted 26 November, 2020; originally announced November 2020.

  17. arXiv:2010.02086  [pdf, other]

    cs.CV cs.CY cs.LG eess.SP

    TrueImage: A Machine Learning Algorithm to Improve the Quality of Telehealth Photos

    Authors: Kailas Vodrahalli, Roxana Daneshjou, Roberto A Novoa, Albert Chiou, Justin M Ko, James Zou

    Abstract: Telehealth is an increasingly critical component of the health care ecosystem, especially due to the COVID-19 pandemic. Rapid adoption of telehealth has exposed limitations in the existing infrastructure. In this paper, we study and highlight photo quality as a major challenge in the telehealth workflow. We focus on teledermatology, where photo quality is particularly important; the framework prop…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: 12 pages, 5 figures, Preprint of an article published in Pacific Symposium on Biocomputing © 2020 World Scientific Publishing Co., Singapore, http://psb.stanford.edu/

  18. arXiv:2009.06117  [pdf, other]

    cs.GT cs.CC cs.LG cs.MA econ.TH

    The Platform Design Problem

    Authors: Christos Papadimitriou, Kiran Vodrahalli, Mihalis Yannakakis

    Abstract: On-line firms deploy suites of software platforms, where each platform is designed to interact with users during a certain activity, such as browsing, chatting, socializing, emailing, driving, etc. The economic and incentive structure of this exchange, as well as its algorithmic nature, have not been explored to our knowledge. We model this interaction as a Stackelberg game between a Designer and…

    Submitted 12 July, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

    Comments: Updated with more results

  19. Blind interactive learning of modulation schemes: Multi-agent cooperation without co-design

    Authors: Anant Sahai, Joshua Sanz, Vignesh Subramanian, Caryn Tran, Kailas Vodrahalli

    Abstract: We examine the problem of learning to cooperate in the context of wireless communication. In our setting, two agents must learn modulation schemes that enable them to communicate across a power-constrained additive white Gaussian noise channel. We investigate whether learning is possible under different levels of information sharing between distributed agents which are not necessarily co-designed.…

    Submitted 1 April, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: 33 pages, 25 figures, code can be found at https://github.com/ml4wireless/echo, accepted for publication in IEEE Access

  20. arXiv:1909.01502  [pdf, other]

    stat.ML cs.CR cs.LG

    Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform

    Authors: Mathias Lecuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, Daniel Hsu

    Abstract: Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores. We present Sage, a differentially private (DP) ML platform that bounds the cumulative leakage of training data through models. Sage builds upon the rich literature on DP ML algorithms and contributes pragmatic solutions to two of t…

    Submitted 6 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: Extended version of a paper presented at the 27th ACM Symposium on Operating Systems Principles (SOSP '19)

  21. arXiv:1903.09139  [pdf, other]

    cs.LG stat.ML

    Harmless interpolation of noisy data in regression

    Authors: Vidya Muthukumar, Kailas Vodrahalli, Vignesh Subramanian, Anant Sahai

    Abstract: A continuing mystery in understanding the empirical success of deep neural networks is their ability to achieve zero training error and generalize well, even when the training data is noisy and there are more parameters than data points. We investigate this overparameterized regime in linear regression, where all solutions that minimize training error interpolate the data, including noise. We char…

    Submitted 9 September, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

    Comments: 52 pages, expanded version of the paper presented at ITA in San Diego in Feb 2019, ISIT in Paris in July 2019, at Simons in July, and as a plenary at ITW in Visby in August 2019

  22. arXiv:1811.12569  [pdf, other]

    cs.LG cs.CV stat.ML

    Are All Training Examples Created Equal? An Empirical Study

    Authors: Kailas Vodrahalli, Ke Li, Jitendra Malik

    Abstract: Modern computer vision algorithms often rely on very large training datasets. However, it is conceivable that a carefully selected subsample of the dataset is sufficient for training. In this paper, we propose a gradient-based importance measure that we use to empirically analyze relative importance of training images in four datasets of varying complexity. We find that in some cases, a small subs…

    Submitted 29 November, 2018; originally announced November 2018.

    Comments: 12 pages, 12 figures

  23. arXiv:1704.05579  [pdf, other]

    cs.CL cs.AI cs.LG

    A Large Self-Annotated Corpus for Sarcasm

    Authors: Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli

    Abstract: We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection. The corpus has 1.3 million sarcastic statements -- 10 times more than any previous dataset -- and many times more instances of non-sarcastic statements, allowing for learning in both balanced and unbalanced label regimes. Each statement is further…

    Submitted 22 March, 2018; v1 submitted 18 April, 2017; originally announced April 2017.

    Comments: 6 pages, 4 figures. To appear in LREC 2018

  24. arXiv:1610.03914  [pdf, other]

    q-bio.NC cs.CL cs.LG

    Mapping Between fMRI Responses to Movies and their Natural Language Annotations

    Authors: Kiran Vodrahalli, Po-Hsuan Chen, Yingyu Liang, Christopher Baldassano, Janice Chen, Esther Yong, Christopher Honey, Uri Hasson, Peter Ramadge, Ken Norman, Sanjeev Arora

    Abstract: Several research groups have shown how to correlate fMRI responses to the meanings of presented stimuli. This paper presents new methods for doing so when only a natural language annotation is available as the description of the stimulus. We study fMRI data gathered from subjects watching an episode of BBCs Sherlock [1], and learn bidirectional mappings between fMRI responses and natural language…

    Submitted 10 April, 2017; v1 submitted 12 October, 2016; originally announced October 2016.

    Comments: 19 pages, 9 figures, in submission to NeuroImage. Prior version presented at MLINI-2016 workshop, 2016 (arXiv:1701.01437) and ICML 2016 Workshop on Multi-view Representation Learning