Skip to main content

Showing 1–9 of 9 results for author: Scharli, N

  1. arXiv:2304.05128  [pdf, other

    cs.CL cs.AI

    Teaching Large Language Models to Self-Debug

    Authors: Xinyun Chen, Maxwell Lin, Nathanael Schärli, Denny Zhou

    Abstract: Large language models (LLMs) have achieved impressive performance on code generation. However, for complex programming tasks, generating the correct solution in one go becomes challenging, thus some prior works have designed program repair approaches to improve code generation performance. In this work, we propose Self-Debugging, which teaches a large language model to debug its predicted program… ▽ More

    Submitted 5 October, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

  2. arXiv:2302.00093  [pdf, other

    cs.CL cs.AI

    Large Language Models Can Be Easily Distracted by Irrelevant Context

    Authors: Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, Denny Zhou

    Abstract: Large language models have achieved impressive performance on various natural language processing tasks. However, so far they have been evaluated primarily on benchmarks where all information in the input context is relevant for solving the task. In this work, we investigate the distractibility of large language models, i.e., how the model problem-solving accuracy can be influenced by irrelevant c… ▽ More

    Submitted 6 June, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

    Comments: Published in ICML 2023

  3. arXiv:2212.13138  [pdf, other

    cs.CL

    Large Language Models Encode Clinical Knowledge

    Authors: Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, Perry Payne, Martin Seneviratne, Paul Gamble, Chris Kelly, Nathaneal Scharli, Aakanksha Chowdhery, Philip Mansfield, Blaise Aguera y Arcas, Dale Webster, Greg S. Corrado, Yossi Matias, Katherine Chou, Juraj Gottweis, Nenad Tomasev, Yun Liu , et al. (5 additional authors not shown)

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To a… ▽ More

    Submitted 26 December, 2022; originally announced December 2022.

  4. arXiv:2210.09261  [pdf, other

    cs.CL cs.AI

    Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

    Authors: Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

    Abstract: BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models. Language models have already made good progress on this benchmark, with the best model in the BIG-Bench paper outperforming average reported human-rater results on 65% of the BIG-Bench tasks via few-shot prompting. But on what tasks do language… ▽ More

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: GitHub repository: https://github.com/suzgunmirac/BIG-Bench-Hard

  5. arXiv:2209.15003  [pdf, other

    cs.CL cs.AI

    Compositional Semantic Parsing with Large Language Models

    Authors: Andrew Drozdov, Nathanael Schärli, Ekin Akyürek, Nathan Scales, Xinying Song, Xinyun Chen, Olivier Bousquet, Denny Zhou

    Abstract: Humans can reason compositionally when presented with new tasks. Previous research shows that appropriate prompting techniques enable large language models (LLMs) to solve artificial compositional generalization tasks such as SCAN. In this work, we identify additional challenges in more realistic semantic parsing tasks with larger vocabulary and refine these prompting techniques to address them. O… ▽ More

    Submitted 29 September, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Fixed metadata. No other changes

  6. arXiv:2205.10625  [pdf, other

    cs.AI cs.CL

    Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

    Authors: Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi

    Abstract: Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks which requires solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to brea… ▽ More

    Submitted 16 April, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

    Comments: ICLR 2023

  7. arXiv:2012.08266  [pdf, other

    cs.LG cs.CL

    *-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task

    Authors: Dmitry Tsarkov, Tibor Tihon, Nathan Scales, Nikola Momchev, Danila Sinopalnikov, Nathanael Schärli

    Abstract: We present *-CFQ ("star-CFQ"): a suite of large-scale datasets of varying scope based on the CFQ semantic parsing benchmark, designed for principled investigation of the scalability of machine learning systems in a realistic compositional task setting. Using this suite, we conduct a series of experiments investigating the ability of Transformers to benefit from increased training size under condit… ▽ More

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: Accepted, AAAI-21

  8. arXiv:2007.08970  [pdf, other

    cs.CL cs.LG

    Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures

    Authors: Daniel Furrer, Marc van Zee, Nathan Scales, Nathanael Schärli

    Abstract: While mainstream machine learning methods are known to have limited ability to compositionally generalize, new architectures and techniques continue to be proposed to address this limitation. We investigate state-of-the-art techniques and architectures in order to assess their effectiveness in improving compositional generalization in semantic parsing tasks based on the SCAN and CFQ datasets. We s… ▽ More

    Submitted 22 September, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

  9. arXiv:1912.09713  [pdf, other

    cs.LG cs.CL stat.ML

    Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

    Authors: Daniel Keysers, Nathanael Schärli, Nathan Scales, Hylke Buisman, Daniel Furrer, Sergii Kashubin, Nikola Momchev, Danila Sinopalnikov, Lukasz Stafiniak, Tibor Tihon, Dmitry Tsarkov, Xiao Wang, Marc van Zee, Olivier Bousquet

    Abstract: State-of-the-art machine learning methods exhibit limited compositional generalization. At the same time, there is a lack of realistic benchmarks that comprehensively measure this ability, which makes it challenging to find and evaluate improvements. We introduce a novel method to systematically construct such benchmarks by maximizing compound divergence while guaranteeing a small atom divergence… ▽ More

    Submitted 25 June, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

    Comments: Accepted for publication at ICLR 2020