Showing 1–50 of 133 results for author: Chi, E

  1. arXiv:2406.17038  [pdf, other]

    cs.CL

    modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models

    Authors: Nathan A. Chi, Teodor Malchev, Riley Kong, Ryan A. Chi, Lucas Huang, Ethan A. Chi, R. Thomas McCoy, Dragomir Radev

    Abstract: We introduce modeLing, a novel benchmark of Linguistics Olympiad-style puzzles which tests few-shot reasoning in AI systems. Solving these puzzles necessitates inferring aspects of a language's grammatical structure from a small number of examples. Such puzzles provide a natural testbed for language models, as they require compositional generalization and few-shot inductive reasoning. Consisting s…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.06196  [pdf, other]

    cs.CL

    LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

    Authors: Andrew M. Bean, Simi Hellsten, Harry Mayne, Jabez Magomere, Ethan A. Chi, Ryan Chi, Scott A. Hale, Hannah Rose Kirk

    Abstract: In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models. Using challenging Linguistic Olympiad puzzles, we evaluate (i) capabilities for in-context identification and generalisation of linguistic patterns in very low-resource or extinct languages, and (ii) abilities to follow complex task instructions. The LingOly benchmark cover…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures, 16 pages supplemental materials

  3. arXiv:2406.04520  [pdf, other]

    cs.CL cs.AI

    NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

    Authors: Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou

    Abstract: We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models. This eliminates the need for…

    Submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2405.19706  [pdf, other]

    cs.SE cs.CE cs.ET

    Bridging eResearch Infrastructure and Experimental Materials Science Process in the Quantum Data Hub

    Authors: Amarnath Gupta, Shweta Purawat, Subhasis Dasgupta, Pratyush Karmakar, Elaine Chi, Ilkay Altintas

    Abstract: Experimental materials science is experiencing significant growth due to automated experimentation and AI techniques. Integrated autonomous platforms are emerging, combining generative models, robotics, simulations, and automated systems for material synthesis. However, two major challenges remain: democratizing access to these technologies and creating accessible infrastructure for under-resource…

    Submitted 30 May, 2024; originally announced May 2024.

  5. arXiv:2405.16363  [pdf, other]

    cs.IR cs.AI

    LLMs for User Interest Exploration in Large-scale Recommendation Systems

    Authors: Jianling Wang, Haokai Lu, Yifan Liu, He Ma, Yueqi Wang, Yang Gu, Shuzhou Zhang, Ningren Han, Shuchao Bi, Lexi Baugher, Ed Chi, Minmin Chen

    Abstract: Traditional recommendation systems are subject to a strong feedback loop by learning from and reinforcing past user-item interactions, which in turn limits the discovery of novel user interests. To address this, we introduce a hybrid hierarchical framework combining Large Language Models (LLMs) and classic recommendation models for user interest exploration. The framework controls the interfacing…

    Submitted 7 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  6. arXiv:2405.12327  [pdf, other]

    cs.IR cs.LG

    Diversifying by Intent in Recommender Systems

    Authors: Yuyan Wang, Cheenar Banerjee, Samer Chucri, Fabio Soldo, Sriraj Badam, Ed H. Chi, Minmin Chen

    Abstract: It has become increasingly clear that recommender systems overly focusing on short-term engagement can inadvertently hurt long-term user experience. However, it is challenging to optimize long-term user experience directly as the desired signal is sparse, noisy and manifests over a long horizon. In this work, we show the benefits of incorporating higher-level user understanding, specifically user…

    Submitted 20 May, 2024; originally announced May 2024.

  7. arXiv:2404.00245  [pdf, other]

    cs.IR

    Aligning Large Language Models with Recommendation Knowledge

    Authors: Yuwei Cao, Nikhil Mehta, Xinyang Yi, Raghunandan Keshavan, Lukasz Heldt, Lichan Hong, Ed H. Chi, Maheswaran Sathiamoorthy

    Abstract: Large language models (LLMs) have recently been used as backbones for recommender systems. However, their performance often lags behind conventional methods in standard tasks like retrieval. We attribute this to a mismatch between LLMs' knowledge and the knowledge crucial for effective recommendations. While LLMs excel at natural language reasoning, they cannot model complex user-item interactions…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted to the NAACL 2024 Findings

  8. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  9. arXiv:2402.14035  [pdf, other]

    cs.LG cs.AI

    Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model

    Authors: Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

    Abstract: Recent advancements in foundation models have yielded impressive performance across a wide range of tasks. Meanwhile, for specific applications, practitioners have been developing specialized application models. To enjoy the benefits of both kinds of models, one natural path is to transfer the knowledge in foundation models into specialized application models, which are generally more efficient fo…

    Submitted 15 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  10. arXiv:2402.11724  [pdf, other]

    cs.IR

    Large Language Models as Data Augmenters for Cold-Start Item Recommendation

    Authors: Jianling Wang, Haokai Lu, James Caverlee, Ed Chi, Minmin Chen

    Abstract: The reasoning and generalization capabilities of LLMs can help us better understand user preferences and item characteristics, offering exciting prospects to enhance recommendation systems. Though effective while user-item interactions are abundant, conventional recommendation systems struggle to recommend cold-start items without historical interactions. To address this, we propose utilizing LLMs…

    Submitted 18 February, 2024; originally announced February 2024.

  11. arXiv:2402.09668  [pdf, other]

    cs.LG cs.AI cs.CL

    How to Train Data-Efficient LLMs

    Authors: Noveen Sachdeva, Benjamin Coleman, Wang-Cheng Kang, Jianmo Ni, Lichan Hong, Ed H. Chi, James Caverlee, Julian McAuley, Derek Zhiyuan Cheng

    Abstract: The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consumption. We seek to understand the tradeoffs associated with data selection routines based on (i) expensive-to-compute data-quality estimates, and (ii) maximizati…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Under review. 44 pages, 30 figures

  12. arXiv:2402.04644  [pdf, other]

    cs.LG cs.AI

    LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

    Authors: Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

    Abstract: Fine-tuning is becoming widely used for leveraging the power of pre-trained foundation models in new downstream tasks. While there are many successes of fine-tuning on various tasks, recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions (i.e., out-of-distribution; OOD). To improve OOD generalization, some previous studies identify the limitation…

    Submitted 18 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

  13. arXiv:2402.03620  [pdf, other]

    cs.AI cs.CL

    Self-Discover: Large Language Models Self-Compose Reasoning Structures

    Authors: Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng

    Abstract: We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasonin…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 17 pages, 11 figures, 5 tables

  14. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  15. arXiv:2312.00763  [pdf, other]

    cs.HC cs.AI cs.CL cs.LG

    Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses

    Authors: Xiao Ma, Swaroop Mishra, Ariel Liu, Sophie Su, Jilin Chen, Chinmay Kulkarni, Heng-Tze Cheng, Quoc Le, Ed Chi

    Abstract: Large language model (LLM) powered chatbots are primarily text-based today, and impose a large interactional cognitive load, especially for exploratory or sensemaking tasks such as planning a trip or learning about a new city. Because the interaction is textual, users have little scaffolding in the way of structure, informational "scent", or ability to specify high-level preferences or goals. We i…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 19 pages, 11 figures

  16. arXiv:2312.00163  [pdf, other]

    cs.CR cs.NI

    Just add WATER: WebAssembly-based Circumvention Transports

    Authors: Erik Chi, Gaukas Wang, J. Alex Halderman, Eric Wustrow, Jack Wampler

    Abstract: As Internet censors rapidly evolve new blocking techniques, circumvention tools must also adapt and roll out new strategies to remain unblocked. But new strategies can be time consuming for circumventors to develop and deploy, and usually an update to one tool often requires significant additional effort to be ported to others. Moreover, distributing the updated application across different platfo…

    Submitted 17 February, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: FOCI 2024

    Journal ref: FOCI issue 1 (2024) 22-28

  17. arXiv:2311.05884  [pdf, other]

    cs.IR cs.LG

    Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems

    Authors: Huan Gui, Ruoxi Wang, Ke Yin, Long Jin, Maciej Kula, Taibai Xu, Lichan Hong, Ed H. Chi

    Abstract: Learning feature interaction is the critical backbone to building recommender systems. In web-scale applications, learning feature interaction is extremely challenging due to the sparse and large input feature space; meanwhile, manually crafting effective feature interactions is infeasible because of the exponential solution space. We propose to leverage a Transformer-based architecture with atten…

    Submitted 10 November, 2023; originally announced November 2023.

  18. arXiv:2310.06117  [pdf, other]

    cs.LG cs.AI cs.CL

    Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

    Authors: Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, Heng-Tze Cheng, Ed H. Chi, Quoc V Le, Denny Zhou

    Abstract: We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with…

    Submitted 12 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  19. arXiv:2310.03188  [pdf, other]

    cs.AI

    Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication

    Authors: Zhe Zhao, Qingyun Liu, Huan Gui, Bang An, Lichan Hong, Ed H. Chi

    Abstract: Many recent breakthroughs in machine learning have been enabled by the pre-trained foundation models. By scaling up model parameters, training data, and computation resources, foundation models have significantly advanced the state-of-the-art in many applications. However, it is still an open question of how to use these models to perform downstream tasks efficiently. Knowledge distillation (KD) h…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 19 pages, 3 figures

  20. arXiv:2310.01714  [pdf, other]

    cs.LG

    Large Language Models as Analogical Reasoners

    Authors: Michihiro Yasunaga, Xinyun Chen, Yujia Li, Panupong Pasupat, Jure Leskovec, Percy Liang, Ed H. Chi, Denny Zhou

    Abstract: Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which human…

    Submitted 9 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024

  21. arXiv:2309.13733  [pdf, other]

    stat.ML cs.LG stat.CO

    Towards Tuning-Free Minimum-Volume Nonnegative Matrix Factorization

    Authors: Duc Toan Nguyen, Eric C. Chi

    Abstract: Nonnegative Matrix Factorization (NMF) is a versatile and powerful tool for discovering latent structures in data matrices, with many variations proposed in the literature. Recently, Leplat et al. (2019) introduced a minimum-volume NMF for the identifiable recovery of rank-deficient matrices in the presence of noise. The performance of their formulation, however, requires the selection of a tuni…

    Submitted 24 September, 2023; originally announced September 2023.

  22. arXiv:2308.01563  [pdf, other]

    cs.IR

    Density Weighting for Multi-Interest Personalized Recommendation

    Authors: Nikhil Mehta, Anima Singh, Xinyang Yi, Sagar Jain, Lichan Hong, Ed H. Chi

    Abstract: Using multiple user representations (MUR) to model user behavior instead of a single user representation (SUR) has been shown to improve personalization in recommendation systems. However, the performance gains observed with MUR can be sensitive to the skewness in the item and/or user interest distribution. When the data distribution is highly skewed, the gains observed by learning multiple repres…

    Submitted 3 August, 2023; originally announced August 2023.

  23. arXiv:2307.15893  [pdf, other]

    cs.LG

    Online Matching: A Real-time Bandit System for Large-scale Recommendations

    Authors: Xinyang Yi, Shao-Chuan Wang, Ruining He, Hariharan Chandrasekaran, Charles Wu, Lukasz Heldt, Lichan Hong, Minmin Chen, Ed H. Chi

    Abstract: The last decade has witnessed many successes of deep learning-based models for industry-scale recommender systems. These models are typically trained offline in a batch manner. While being effective in capturing users' past interactions with recommendation platforms, batch learning suffers from long model-update latency and is vulnerable to system biases, making it hard to adapt to distribution sh…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: RecSys 2023

  24. arXiv:2306.08121  [pdf, other]

    cs.IR cs.LG

    Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations

    Authors: Anima Singh, Trung Vu, Nikhil Mehta, Raghunandan Keshavan, Maheswaran Sathiamoorthy, Yilin Zheng, Lichan Hong, Lukasz Heldt, Li Wei, Devansh Tandon, Ed H. Chi, Xinyang Yi

    Abstract: Randomly-hashed item ids are used ubiquitously in recommendation models. However, the learned representations from random hashing prevents generalization across similar items, causing problems of learning unseen and long-tail items, especially when item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for ra…

    Submitted 30 May, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

  25. arXiv:2306.01720  [pdf, other]

    cs.IR

    Fresh Content Needs More Attention: Multi-funnel Fresh Content Recommendation

    Authors: Jianling Wang, Haokai Lu, Sai Zhang, Bart Locanthi, Haoting Wang, Dylan Greaves, Benjamin Lipshitz, Sriraj Badam, Ed H. Chi, Cristos Goodrow, Su-Lin Wu, Lexi Baugher, Minmin Chen

    Abstract: Recommendation system serves as a conduit connecting users to an incredibly large, diverse and ever growing collection of contents. In practice, missing information on fresh (and tail) contents needs to be filled in order for them to be exposed and discovered by their audience. We here share our success stories in building a dedicated fresh content recommendation stack on a large commercial platfo…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted by KDD 2023

  26. arXiv:2306.01476  [pdf, other]

    cs.IR cs.LG

    Hierarchical Reinforcement Learning for Modeling User Novelty-Seeking Intent in Recommender Systems

    Authors: Pan Li, Yuyan Wang, Ed H. Chi, Minmin Chen

    Abstract: Recommending novel content, which expands user horizons by introducing them to new interests, has been shown to improve users' long-term experience on recommendation platforms \cite{chen2021values}. Users however are not constantly looking to explore novel content. It is therefore crucial to understand their novelty-seeking intent and adjust the recommendation policy accordingly. Most existing lit…

    Submitted 2 June, 2023; originally announced June 2023.

  27. arXiv:2306.01475  [pdf, other]

    cs.IR cs.LG

    Prompt Tuning Large Language Models on Personalized Aspect Extraction for Recommendations

    Authors: Pan Li, Yuyan Wang, Ed H. Chi, Minmin Chen

    Abstract: Existing aspect extraction methods mostly rely on explicit or ground truth aspect information, or using data mining or machine learning approaches to extract aspects from implicit user feedback such as user reviews. It however remains under-explored how the extracted aspects can help generate more meaningful recommendations to the users. Meanwhile, existing research on aspect-based recommendations…

    Submitted 2 June, 2023; originally announced June 2023.

  28. arXiv:2305.17386  [pdf, other]

    cs.IR cs.LG

    HyperFormer: Learning Expressive Sparse Feature Representations via Hypergraph Transformer

    Authors: Kaize Ding, Albert Jiongqian Liang, Bryan Perrozi, Ting Chen, Ruoxi Wang, Lichan Hong, Ed H. Chi, Huan Liu, Derek Zhiyuan Cheng

    Abstract: Learning expressive representations for high-dimensional yet sparse features has been a longstanding problem in information retrieval. Though recent deep learning methods can partially solve the problem, they often fail to handle the numerous sparse features, particularly those tail feature values with infrequent occurrences in the training data. Worse still, existing methods cannot explicitly lev…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted by SIGIR 2023

  29. arXiv:2305.15498  [pdf, other]

    cs.CL cs.AI cs.IR

    Large Language Models for User Interest Journeys

    Authors: Konstantina Christakopoulou, Alberto Lalama, Cj Adams, Iris Qu, Yifat Amir, Samer Chucri, Pierce Vollucci, Fabio Soldo, Dina Bseiso, Sarah Scodel, Lucas Dixon, Ed H. Chi, Minmin Chen

    Abstract: Large language models (LLMs) have shown impressive capabilities in natural language understanding and generation. Their potential for deeper user understanding and improved personalized user experience on recommendation platforms is, however, largely untapped. This paper aims to address this gap. Recommender systems today capture users' interests through encoding their historical activities on the…

    Submitted 24 May, 2023; originally announced May 2023.

  30. arXiv:2305.13535  [pdf, other]

    cs.CL cs.LG

    Improving Classifier Robustness through Active Generation of Pairwise Counterfactuals

    Authors: Ananth Balashankar, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Jilin Chen, Ed H. Chi, Alex Beutel

    Abstract: Counterfactual Data Augmentation (CDA) is a commonly used technique for improving robustness in natural language classifiers. However, one fundamental challenge is how to discover meaningful counterfactuals and efficiently label them, with minimal human labeling cost. Most existing methods either completely rely on human-annotated labels, an expensive process which limits the scale of counterfactu…

    Submitted 22 May, 2023; originally announced May 2023.

  31. arXiv:2305.12102  [pdf, other]

    cs.LG cs.IR

    Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

    Authors: Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng

    Abstract: Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, introducing hundreds of billions of parameters for extremely h…

    Submitted 14 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: NeurIPS'23 Spotlight

    Journal ref: Proceedings of the 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023) 56234-56255

  32. arXiv:2305.07764  [pdf, other]

    cs.IR

    Long-Term Value of Exploration: Measurements, Findings and Algorithms

    Authors: Yi Su, Xiangyu Wang, Elaine Ya Le, Liang Liu, Yuening Li, Haokai Lu, Benjamin Lipshitz, Sriraj Badam, Lukasz Heldt, Shuchao Bi, Ed Chi, Cristos Goodrow, Su-Lin Wu, Lexi Baugher, Minmin Chen

    Abstract: Effective exploration is believed to positively influence the long-term user experience on recommendation platforms. Determining its exact benefits, however, has been challenging. Regular A/B tests on exploration often measure neutral or even negative engagement metrics while failing to capture its long-term benefits. We here introduce new experiment designs to formally quantify the long-term valu…

    Submitted 25 February, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: 11 pages, WSDM 2024

  33. arXiv:2305.06474  [pdf, other]

    cs.IR cs.LG

    Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction

    Authors: Wang-Cheng Kang, Jianmo Ni, Nikhil Mehta, Maheswaran Sathiamoorthy, Lichan Hong, Ed Chi, Derek Zhiyuan Cheng

    Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities in generalizing to new tasks in a zero-shot or few-shot manner. However, the extent to which LLMs can comprehend user preferences based on their previous behavior remains an emerging and still unclear research question. Traditionally, Collaborative Filtering (CF) has been the most effective method for these tasks, predominantl…

    Submitted 10 May, 2023; originally announced May 2023.

  34. arXiv:2305.05065  [pdf, other]

    cs.IR cs.LG

    Recommender Systems with Generative Retrieval

    Authors: Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy

    Abstract: Modern recommender systems perform large-scale retrieval by first embedding queries and item candidates in the same unified space, followed by approximate nearest neighbor search to select top candidates given a query embedding. In this paper, we propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates. To that end,…

    Submitted 3 November, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: To appear in The 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  35. arXiv:2304.13940  [pdf, other]

    stat.ML cs.LG

    A Majorization-Minimization Gauss-Newton Method for 1-Bit Matrix Completion

    Authors: Xiaoqian Liu, Xu Han, Eric C. Chi, Boaz Nadler

    Abstract: In 1-bit matrix completion, the aim is to estimate an underlying low-rank matrix from a partial set of binary observations. We propose a novel method for 1-bit matrix completion called MMGN. Our method is based on the majorization-minimization (MM) principle, which converts the original optimization problem into a sequence of standard low-rank matrix completion problems. We solve each of these sub…

    Submitted 22 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: 28 pages, 7 figures

  36. arXiv:2302.11188  [pdf, other]

    cs.LG

    What Are Effective Labels for Augmented Data? Improving Calibration and Robustness with AutoLabel

    Authors: Yao Qin, Xuezhi Wang, Balaji Lakshminarayanan, Ed H. Chi, Alex Beutel

    Abstract: A wide breadth of research has devised data augmentation approaches that can improve both accuracy and generalization performance for neural networks. However, augmented data can end up being far from the clean training data and what is the appropriate label is less clear. Despite this, most existing work simply uses one-hot labels for augmented data. In this paper, we show re-using one-hot labels…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted to SaTML-2023

  37. Improving Training Stability for Multitask Ranking Models in Recommender Systems

    Authors: Jiaxi Tang, Yoel Drori, Daryl Chang, Maheswaran Sathiamoorthy, Justin Gilmer, Li Wei, Xinyang Yi, Lichan Hong, Ed H. Chi

    Abstract: Recommender systems play an important role in many content platforms. While most recommendation research is dedicated to designing better models to improve user experience, we found that research on stabilizing the training for such models is severely under-explored. As recommendation models become larger and more sophisticated, they are more susceptible to training instability issues, i.e., loss…

    Submitted 15 June, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted at KDD 2023; 12 pages

  38. arXiv:2302.00093  [pdf, other]

    cs.CL cs.AI

    Large Language Models Can Be Easily Distracted by Irrelevant Context

    Authors: Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, Denny Zhou

    Abstract: Large language models have achieved impressive performance on various natural language processing tasks. However, so far they have been evaluated primarily on benchmarks where all information in the input context is relevant for solving the task. In this work, we investigate the distractibility of large language models, i.e., how the model problem-solving accuracy can be influenced by irrelevant c…

    Submitted 6 June, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

    Comments: Published in ICML 2023

  39. Latent User Intent Modeling for Sequential Recommenders

    Authors: Bo Chang, Alexandros Karatzoglou, Yuyan Wang, Can Xu, Ed H. Chi, Minmin Chen

    Abstract: Sequential recommender models are essential components of modern industrial recommender systems. These models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform. Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online. Intent modeling is thus critical for unde…

    Submitted 27 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: The Web Conference 2023, Industry Track

  40. arXiv:2210.14309  [pdf, other]

    cs.IR

    Empowering Long-tail Item Recommendation through Cross Decoupling Network (CDN)

    Authors: Yin Zhang, Ruoxi Wang, Tiansheng Yao, Xinyang Yi, Lichan Hong, James Caverlee, Ed H. Chi, Derek Zhiyuan Cheng

    Abstract: Industry recommender systems usually suffer from highly-skewed long-tail item distributions where a small fraction of the items receives most of the user feedback. This skew hurts recommender quality especially for the item slices without much user feedback. While there have been many research advances made in academia, deploying these methods in production is very difficult and very few improveme…

    Submitted 3 September, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted by KDD 2023 Applied Data Science (ADS) track

  41. arXiv:2210.11416  [pdf, other]

    cs.LG cs.CL

    Scaling Instruction-Finetuned Language Models

    Authors: Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, et al. (10 additional authors not shown)

    Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects d…

    Submitted 6 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Public checkpoints: https://huggingface.co/docs/transformers/model_doc/flan-t5

  42. arXiv:2210.09261  [pdf, other]

    cs.CL cs.AI

    Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

    Authors: Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

    Abstract: BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models. Language models have already made good progress on this benchmark, with the best model in the BIG-Bench paper outperforming average reported human-rater results on 65% of the BIG-Bench tasks via few-shot prompting. But on what tasks do language…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: GitHub repository: https://github.com/suzgunmirac/BIG-Bench-Hard

  43. arXiv:2210.07755  [pdf, other]

    cs.IR cs.AI cs.LG

    Simpson's Paradox in Recommender Fairness: Reconciling differences between per-user and aggregated evaluations

    Authors: Flavien Prost, Ben Packer, Jilin Chen, Li Wei, Pierre Kremp, Nicholas Blumm, Susan Wang, Tulsee Doshi, Tonia Osadebe, Lukasz Heldt, Ed H. Chi, Alex Beutel

    Abstract: There has been a flurry of research in recent years on notions of fairness in ranking and recommender systems, particularly on how to evaluate if a recommender allocates exposure equally across groups of relevant items (also known as provider fairness). While this research has laid an important foundation, it gave rise to different approaches depending on whether relevant items are compared per-us…

    Submitted 14 October, 2022; originally announced October 2022.

  44. arXiv:2209.15166  [pdf, other]

    cs.IR cs.AI cs.LG

    Reward Shaping for User Satisfaction in a REINFORCE Recommender

    Authors: Konstantina Christakopoulou, Can Xu, Sai Zhang, Sriraj Badam, Trevor Potter, Daniel Li, Hao Wan, Xinyang Yi, Ya Le, Chris Berg, Eric Bencomo Dixon, Ed H. Chi, Minmin Chen

    Abstract: How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction? Three research questions are key: (1) measuring user satisfaction, (2) combatting sparsity of satisfaction signals, and (3) adapting the training of the recommender agent to maximize satisfaction. For measurement, it has been found that surveys explici…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: Accepted in Reinforcement Learning for Real Life (RL4RealLife) Workshop in the 38th International Conference on Machine Learning, 2021

  45. arXiv:2208.11806  [pdf, other]

    stat.ME stat.AP

    Robust Low-rank Tensor Decomposition with the $\operatorname{L_2}$ Criterion

    Authors: Qiang Heng, Eric C. Chi, Yufeng Liu

    Abstract: The growing prevalence of tensor data, or multiway arrays, in science and engineering applications motivates the need for tensor decompositions that are robust against outliers. In this paper, we present a robust Tucker decomposition estimator based on the $\operatorname{L_2}$ criterion, called the Tucker-$\operatorname{L_2E}$. Our numerical experiments demonstrate that Tucker-…

    Submitted 12 April, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

  46. arXiv:2207.12021  [pdf, other]

    cs.CL

    Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent

    Authors: Ethan A. Chi, Ashwin Paranjape, Abigail See, Caleb Chiam, Trenton Chang, Kathleen Kenealy, Swee Kiat Lim, Amelia Hardy, Chetanya Rastogi, Haojun Li, Alexander Iyabor, Yutong He, Hari Sowrirajan, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Jillian Tang, Avanika Narayan, Giovanni Campagna, Christopher D. Manning

    Abstract: We present Chirpy Cardinal, an open-domain social chatbot. Aiming to be both informative and conversational, our bot chats with users in an authentic, emotionally intelligent way. By integrating controlled neural generation with scaffolded, hand-written dialogue, we let both the user and bot take turns driving the conversation, producing an engaging and socially fluent experience. Deployed in the…

    Submitted 16 January, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: SIGDIAL '22

  47. arXiv:2207.00747  [pdf, other]

    cs.CL

    Rationale-Augmented Ensembles in Language Models

    Authors: Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou

    Abstract: Recent research has shown that rationales, or step-by-step chains of thought, can be used to improve performance in multi-step reasoning tasks. We reconsider rationale-augmented prompting for few-shot in-context learning, where (input -> output) prompts are expanded to (input, rationale -> output) prompts. For rationale-augmented prompting we demonstrate how existing approaches, which rely on manu…

    Submitted 2 July, 2022; originally announced July 2022.

  48. arXiv:2206.07682  [pdf, other]

    cs.CL

    Emergent Abilities of Large Language Models

    Authors: Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

    Abstract: Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot…

    Submitted 26 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Transactions on Machine Learning Research (TMLR), 2022

  49. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  50. arXiv:2205.10625  [pdf, other]

    cs.AI cs.CL

    Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

    Authors: Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi

    Abstract: Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to brea…

    Submitted 16 April, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

    Comments: ICLR 2023