Showing 1–45 of 45 results for author: Gehrmann, S

  1. arXiv:2404.01701

    cs.CL

    On the Role of Summary Content Units in Text Summarization Evaluation

    Authors: Marcel Nawrath, Agnieszka Nowak, Tristan Ratz, Danilo C. Walenta, Juri Opitz, Leonardo F. R. Ribeiro, João Sedoc, Daniel Deutsch, Simon Mille, Yixin Liu, Lining Zhang, Sebastian Gehrmann, Saad Mahamood, Miruna Clinciu, Khyathi Chandu, Yufang Hou

    Abstract: At the heart of the Pyramid evaluation method for text summarization lie human written summary content units (SCUs). These SCUs are concise sentences that decompose a summary into small facts. Such SCUs can be used to judge the quality of a candidate summary, possibly partially automated via natural language inference (NLI) systems. Interestingly, with the aim to fully automate the Pyramid evaluat…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 10 Pages, 3 Figures, 3 Tables, camera ready version accepted at NAACL 2024

  2. arXiv:2402.06619

    cs.CL cs.AI

    Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

    Authors: Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura OMahony, Mike Zhang, Ramith Hettiarachchi, Joseph Wilson, Marina Machado, Luisa Souza Moura, Dominik Krzemiński, Hakimeh Fadaei, Irem Ergün, Ifeoma Okoh, Aisha Alaagib, Oshan Mudannayake, Zaid Alyafeai, Vu Minh Chien, Sebastian Ruder, Surya Guthikonda, et al. (8 additional authors not shown)

    Abstract: Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets.…

    Submitted 9 February, 2024; originally announced February 2024.

  3. arXiv:2306.16793

    cs.CL

    Benchmarking Large Language Model Capabilities for Conditional Generation

    Authors: Joshua Maynez, Priyanka Agrawal, Sebastian Gehrmann

    Abstract: Pre-trained large language models (PLMs) underlie most new developments in natural language processing. They have shifted the field from application-specific model pipelines to a single model that is adapted to a wide range of tasks. Autoregressive PLMs like GPT-3 or PaLM, alongside techniques like few-shot learning, have additionally shifted the output modality to generation instead of classifica…

    Submitted 29 June, 2023; originally announced June 2023.

  4. arXiv:2305.13194

    cs.CL

    SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation

    Authors: Elizabeth Clark, Shruti Rijhwani, Sebastian Gehrmann, Joshua Maynez, Roee Aharoni, Vitaly Nikolaev, Thibault Sellam, Aditya Siddhant, Dipanjan Das, Ankur P. Parikh

    Abstract: Reliable automatic evaluation of summarization systems is challenging due to the multifaceted and subjective nature of the task. This is especially the case for languages other than English, where human evaluations are scarce. In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation. SEAHORSE consists of 96K summaries with human ratings along 6 dimensi…

    Submitted 1 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  5. arXiv:2305.10403

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  6. arXiv:2303.17564

    cs.LG cs.AI cs.CL q-fin.GN

    BloombergGPT: A Large Language Model for Finance

    Authors: Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, Gideon Mann

    Abstract: The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion pa…

    Submitted 21 December, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Updated to include Training Chronicles (Appendix C)

  7. arXiv:2212.10397

    cs.CL

    Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization

    Authors: Lining Zhang, Simon Mille, Yufang Hou, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Saad Mahamood, Sebastian Gehrmann, Miruna Clinciu, Khyathi Chandu, João Sedoc

    Abstract: To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization. Thus, we investigate the recruitment of high-quality Amazon Mechanical Turk workers via a two-step pipeline. We show that we can successfully filter out subpar worke…

    Submitted 13 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  8. arXiv:2211.09070

    cs.CL

    Towards Computationally Verifiable Semantic Grounding for Language Models

    Authors: Chris Alberti, Kuzman Ganchev, Michael Collins, Sebastian Gehrmann, Ciprian Chelba

    Abstract: The paper presents an approach to semantic grounding of language models (LMs) that conceptualizes the LM as a conditional model generating text given a desired semantic message formalized as a set of entity-relationship triples. It embeds the LM in an auto-encoder by feeding its output to a semantic parser whose output is in the same representation domain as the input message. Compared to a baseli…

    Submitted 16 November, 2022; originally announced November 2022.

  9. arXiv:2211.05100

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  10. Intriguing Properties of Compression on Multilingual Models

    Authors: Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer

    Abstract: Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingu…

    Submitted 25 November, 2022; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: Accepted to EMNLP 2022

  11. arXiv:2211.00922

    cs.CL

    Dialect-robust Evaluation of Generated Text

    Authors: Jiao Sun, Thibault Sellam, Elizabeth Clark, Tu Vu, Timothy Dozat, Dan Garrette, Aditya Siddhant, Jacob Eisenstein, Sebastian Gehrmann

    Abstract: Evaluation metrics that are not robust to dialect variation make it impossible to tell how well systems perform for many groups of users, and can even penalize systems for producing text in lower-resource dialects. However, currently, there exists no way to quantify how metrics respond to change in the dialect of a generated utterance. We thus formalize dialect robustness and dialect awareness as…

    Submitted 2 November, 2022; originally announced November 2022.

  12. arXiv:2211.00142

    cs.CL cs.LG

    TaTa: A Multilingual Table-to-Text Dataset for African Languages

    Authors: Sebastian Gehrmann, Sebastian Ruder, Vitaly Nikolaev, Jan A. Botha, Michael Chavinda, Ankur Parikh, Clara Rivera

    Abstract: Existing data-to-text generation datasets are mostly limited to English. To address this lack of data, we create Table-to-Text in African languages (TaTa), the first large multilingual table-to-text dataset with a focus on African languages. We created TaTa by transcribing figures and accompanying text in bilingual reports by the Demographic and Health Surveys Program, followed by professional tra…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: 24 pages, 6 figures

  13. arXiv:2210.09261

    cs.CL cs.AI

    Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

    Authors: Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

    Abstract: BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models. Language models have already made good progress on this benchmark, with the best model in the BIG-Bench paper outperforming average reported human-rater results on 65% of the BIG-Bench tasks via few-shot prompting. But on what tasks do language…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: GitHub repository: https://github.com/suzgunmirac/BIG-Bench-Hard

  14. arXiv:2206.11249

    cs.CL cs.AI cs.LG

    GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

    Authors: Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, et al. (52 additional authors not shown)

    Abstract: Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, an…

    Submitted 24 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

  15. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  16. arXiv:2204.02311

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran…

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  17. arXiv:2202.06935

    cs.CL cs.AI cs.LG

    Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

    Authors: Sebastian Gehrmann, Elizabeth Clark, Thibault Sellam

    Abstract: Evaluation practices in natural language generation (NLG) have many known flaws, but improved evaluation approaches are rarely widely adopted. This issue has become more urgent, since neural NLG models have improved to the point where they can often no longer be distinguished based on the surface-level features that older metrics rely on. This paper surveys the issues with human and automatic mode…

    Submitted 14 February, 2022; originally announced February 2022.

  18. Diagnosing AI Explanation Methods with Folk Concepts of Behavior

    Authors: Alon Jacovi, Jasmijn Bastings, Sebastian Gehrmann, Yoav Goldberg, Katja Filippova

    Abstract: We investigate a formalism for the conditions of a successful explanation of AI. We consider "success" to depend not only on what information the explanation contains, but also on what information the human explainee understands from it. Theory of mind literature discusses the folk concepts that humans use to understand and generalize behavior. We posit that folk concepts of behavior provide us wi…

    Submitted 15 November, 2023; v1 submitted 26 January, 2022; originally announced January 2022.

    Comments: Accepted to JAIR (Vol. 78, 2023)

    Journal ref: Journal of Artificial Intelligence Research 78 (2023) 459-489

  19. arXiv:2112.02721

    cs.CL cs.AI cs.LG

    NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

    Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, et al. (101 additional authors not shown)

    Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data split…

    Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

  20. arXiv:2111.06467

    cs.CL cs.AI cs.LG

    SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets

    Authors: Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, Sebastian Gehrmann

    Abstract: NLP researchers need more, higher-quality text datasets. Human-labeled datasets are expensive to collect, while datasets collected via automatic retrieval from the web such as WikiBio are noisy and can include undesired biases. Moreover, data sourced from the web is often included in datasets used to pretrain models, leading to inadvertent cross-contamination of training and test sets. In this wor…

    Submitted 12 January, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: 10 pages, 2 figures, accepted to NeurIPS 2021 Datasets and Benchmarks Track

  21. arXiv:2111.01582

    cs.CL cs.HC

    LMdiff: A Visual Diff Tool to Compare Language Models

    Authors: Hendrik Strobelt, Benjamin Hoover, Arvind Satyanarayan, Sebastian Gehrmann

    Abstract: While different language models are ubiquitous in NLP, it is hard to contrast their outputs and identify which contexts one can handle better than the other. To address this question, we introduce LMdiff, a tool that visually compares probability distributions of two models that differ, e.g., through finetuning, distillation, or simply training with different parameter sizes. LMdiff allows the gen…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: EMNLP 2021 Demo Paper

  22. arXiv:2110.06341

    cs.CL

    Learning Compact Metrics for MT

    Authors: Amy Pu, Hyung Won Chung, Ankur P. Parikh, Sebastian Gehrmann, Thibault Sellam

    Abstract: Recent developments in machine translation and multilingual text generation have led researchers to adopt trained metrics such as COMET or BLEURT, which treat evaluation as a regression problem and use representations from multilingual pre-trained models such as XLM-RoBERTa or mBERT. Yet studies on related tasks suggest that these models are most efficient when they are large, which is costly and…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted at EMNLP 2021

  23. Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards

    Authors: Angelina McMillan-Major, Salomey Osei, Juan Diego Rodriguez, Pawan Sasanka Ammanamanchi, Sebastian Gehrmann, Yacine Jernite

    Abstract: Developing documentation guidelines and easy-to-use templates for datasets and models is a challenging task, especially given the variety of backgrounds, skills, and incentives of the people involved in the building of natural language processing (NLP) tools. Nevertheless, the adoption of standard documentation practices across the field of NLP promotes more accessible and detailed descriptions of…

    Submitted 16 August, 2021; originally announced August 2021.

    Comments: 15 pages; in Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

  24. arXiv:2106.09069

    cs.CL cs.LG

    Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

    Authors: Simon Mille, Kaustubh D. Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Kale, Emiel van Miltenburg, Sebastian Gehrmann

    Abstract: Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepres…

    Submitted 16 June, 2021; originally announced June 2021.

  25. arXiv:2106.06087

    cs.CL

    Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models

    Authors: Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov

    Abstract: Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement given difficult contexts. To elucidate the mechanisms by which the models accomplish this behavior, this study applies causal mediation analysis to pre-trained neural language models. We investigate the magnitude of models' preferences for grammatical inflections, as well as whether ne…

    Submitted 22 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted to ACL-IJCNLP 2021

    MSC Class: 68T50; ACM Class: I.2.7

  26. arXiv:2102.01672

    cs.CL cs.AI cs.LG

    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

    Authors: Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, et al. (31 additional authors not shown)

    Abstract: We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it…

    Submitted 1 April, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

  27. arXiv:2010.04297

    cs.CL

    Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task

    Authors: Thibault Sellam, Amy Pu, Hyung Won Chung, Sebastian Gehrmann, Qijun Tan, Markus Freitag, Dipanjan Das, Ankur P. Parikh

    Abstract: The quality of machine translation systems has dramatically improved over the last decade, and as a result, evaluation has become an increasingly challenging problem. This paper describes our contribution to the WMT 2020 Metrics Shared Task, the main benchmark for automatic evaluation of translation. We make several submissions based on BLEURT, a previously published metric based on transfer learn…

    Submitted 19 October, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

  28. arXiv:2008.05122

    cs.CL

    The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

    Authors: Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, Sebastian Gehrmann, Ellen Jiang, Mahima Pushkarna, Carey Radebaugh, Emily Reif, Ann Yuan

    Abstract: We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior: Why did my model make this prediction? When does it perform poorly? What happens under a controlled change in the input? LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamline…

    Submitted 12 August, 2020; originally announced August 2020.

  29. arXiv:2005.11248

    cs.LG q-bio.QM stat.ML

    Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics

    Authors: Payel Das, Tom Sercu, Kahini Wadhawan, Inkit Padhi, Sebastian Gehrmann, Flaviu Cipcigan, Vijil Chenthamarakshan, Hendrik Strobelt, Cicero dos Santos, Pin-Yu Chen, Yi Yan Yang, Jeremy Tan, James Hedrick, Jason Crain, Aleksandra Mojsilovic

    Abstract: De novo therapeutic design is challenged by a vast chemical repertoire and multiple constraints, e.g., high broad-spectrum potency and low toxicity. We propose CLaSS (Controlled Latent attribute Space Sampling) - an efficient computational method for attribute-controlled generation of molecules, which leverages guidance from classifiers trained on an informative latent space of molecules modeled u…

    Submitted 25 February, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

    Journal ref: Nature Biomedical Engineering (2021)

  30. arXiv:2004.14373

    cs.CL cs.LG

    ToTTo: A Controlled Table-To-Text Generation Dataset

    Authors: Ankur P. Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das

    Abstract: We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revis…

    Submitted 6 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted to EMNLP 2020

  31. arXiv:2004.12265

    cs.CL

    Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

    Authors: Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Simas Sakenis, Jason Huang, Yaron Singer, Stuart Shieber

    Abstract: Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. It enables us to analyze the mechanisms by which information flows from input to output thr…

    Submitted 22 November, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

    Comments: Expanded version

    MSC Class: 68T50; ACM Class: I.2.7

  32. arXiv:2003.03044

    cs.CL cs.CY cs.LG

    A Corpus for Detecting High-Context Medical Conditions in Intensive Care Patient Notes Focusing on Frequently Readmitted Patients

    Authors: Edward T. Moseley, Joy T. Wu, Jonathan Welt, John Foote, Patrick D. Tyler, David W. Grant, Eric T. Carlson, Sebastian Gehrmann, Franck Dernoncourt, Leo Anthony Celi

    Abstract: A crucial step within secondary analysis of electronic health records (EHRs) is to identify the patient cohort under investigation. While EHRs contain medical billing codes that aim to represent the conditions and treatments patients may have, much of the information is only present in the patient notes. Therefore, it is critical to develop robust algorithms to infer patients' conditions and treat…

    Submitted 6 March, 2020; originally announced March 2020.

    Comments: Accepted at LREC 2020

  33. arXiv:1911.03329

    cs.CL cs.LG cs.NE

    Memory-Augmented Recurrent Neural Networks Can Learn Generalized Dyck Languages

    Authors: Mirac Suzgun, Sebastian Gehrmann, Yonatan Belinkov, Stuart M. Shieber

    Abstract: We introduce three memory-augmented Recurrent Neural Networks (MARNNs) and explore their capabilities on a series of simple language modeling tasks whose solutions require stack-based mechanisms. We provide the first demonstration of neural networks recognizing the generalized Dyck languages, which express the core of what it means to be a language with hierarchical structure. Our memory-augmented…

    Submitted 8 November, 2019; originally announced November 2019.

  34. arXiv:1910.05276

    cs.CL cs.LG

    exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models

    Authors: Benjamin Hoover, Hendrik Strobelt, Sebastian Gehrmann

    Abstract: Large language models can produce powerful contextual representations that lead to improvements across many NLP tasks. Since these models are typically guided by a sequence of learned self attention mechanisms and may comprise undesired inductive biases, it is paramount to be able to explore what the attention has learned. While static analyses of these models lead to targeted insights, interactiv…

    Submitted 11 October, 2019; originally announced October 2019.

  35. arXiv:1908.06938

    cs.CL

    Encoder-Agnostic Adaptation for Conditional Language Generation

    Authors: Zachary M. Ziegler, Luke Melas-Kyriazi, Sebastian Gehrmann, Alexander M. Rush

    Abstract: Large pretrained language models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that adapt a pretrained model for arbitrary downstream tasks. However it is an open-question how to use similar techniques for language generation. Early results in the encoder-agnostic setting have been mostly negative. In this work…

    Submitted 10 September, 2019; v1 submitted 19 August, 2019; originally announced August 2019.

  36. arXiv:1907.10739

    cs.HC cs.AI cs.CL cs.LG

    Visual Interaction with Deep Learning Models through Collaborative Semantic Inference

    Authors: Sebastian Gehrmann, Hendrik Strobelt, Robert Krüger, Hanspeter Pfister, Alexander M. Rush

    Abstract: Automation of tasks can have critical consequences when humans lose agency over decision processes. Deep learning models are particularly susceptible since current black-box approaches lack explainable reasoning. We argue that both the visual interface and model structure of deep learning systems need to take into account interaction design. We propose a framework of collaborative semantic inferen…

    Submitted 24 July, 2019; originally announced July 2019.

    Comments: IEEE VIS 2019 (VAST)

  37. Evaluating an Automated Mediator for Joint Narratives in a Conflict Situation

    Authors: Massimo Zancanaro, Oliviero Stock, Gianluca Schiavo, Alessandro Cappelletti, Sebastian Gehrmann, Daphna Canetti, Ohad Shaked, Shani Fachter, Rachel Yifat, Ravit Mimran, Patrice L. Weiss

    Abstract: Joint narratives are often used in the context of reconciliation interventions for people in social conflict situations, which arise, for example, due to ethnic or religious differences. The interventions aim to encourage a change in attitudes of the participants towards each other. Typically, a human mediator is fundamental for achieving a successful intervention. In this work, we present an auto…

    Submitted 27 June, 2019; originally announced June 2019.

    Journal ref: https://www.tandfonline.com/eprint/5HVSAEZ7NW2AGUSHBNA3/full?target=10.1080/0144929X.2019.1637940

  38. arXiv:1906.04043  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    GLTR: Statistical Detection and Visualization of Generated Text

    Authors: Sebastian Gehrmann, Hendrik Strobelt, Alexander M. Rush

    Abstract: The rapid improvement of language models has raised the specter of abuse of text generation systems. This progress motivates the development of simple methods for detecting generated text that can be used by and explained to non-experts. We develop GLTR, a tool to support humans in detecting whether a text was generated by a model. GLTR applies a suite of baseline statistical methods that can dete…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: ACL 2019 Demo Track

  39. arXiv:1906.03648  [pdf, other]

    cs.CL cs.FL cs.LG

    LSTM Networks Can Perform Dynamic Counting

    Authors: Mirac Suzgun, Sebastian Gehrmann, Yonatan Belinkov, Stuart M. Shieber

    Abstract: In this paper, we systematically assess the ability of standard recurrent networks to perform dynamic counting and to encode hierarchical representations. All the neural models in our experiments are designed to be small-sized networks both to prevent them from memorizing the training sets and to visualize and interpret their behaviour at test time. Our results demonstrate that the Long Short-Term…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: ACL 2019 Workshop on Deep Learning and Formal Languages

    ACM Class: F.4.3; I.2.6; I.2.7

  40. arXiv:1904.07142  [pdf, other]

    cs.CL

    Improving Human Text Comprehension through Semi-Markov CRF-based Neural Section Title Generation

    Authors: Sebastian Gehrmann, Steven Layne, Franck Dernoncourt

    Abstract: Titles of short sections within long documents support readers by guiding their focus towards relevant passages and by providing anchor-points that help to understand the progression of the document. The positive effects of section titles are even more pronounced when measured on readers with less developed reading abilities, for example in communities with limited labeled text resources. We, th…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: NAACL 2019

  41. arXiv:1810.04700  [pdf, other]

    cs.CL cs.AI

    End-to-End Content and Plan Selection for Data-to-Text Generation

    Authors: Sebastian Gehrmann, Falcon Z. Dai, Henry Elder, Alexander M. Rush

    Abstract: Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG. This problem can be challenging when the form of the structured data varies between examples. This paper presents a survey of several extensions to sequence-to-sequence models to account for the latent content selection process, particularly variants of copy attention and c…

    Submitted 10 October, 2018; originally announced October 2018.

    Comments: INLG 2018

  42. arXiv:1808.10792  [pdf, other]

    cs.CL cs.AI cs.LG

    Bottom-Up Abstractive Summarization

    Authors: Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush

    Abstract: Neural network-based methods for abstractive summarization produce outputs that are more fluent than other techniques, but which can be poor at content selection. This work proposes a simple technique for addressing this issue: use a data-efficient content selector to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step…

    Submitted 8 October, 2018; v1 submitted 31 August, 2018; originally announced August 2018.

    Comments: EMNLP 2018

  43. arXiv:1804.09299  [pdf, other]

    cs.CL cs.AI cs.NE

    Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models

    Authors: Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, Alexander M. Rush

    Abstract: Neural Sequence-to-Sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work in a five-stage black-box process that involves encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning meth…

    Submitted 16 October, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

    Comments: VAST - IEEE VIS 2018

  44. arXiv:1703.08705  [pdf]

    cs.CL cs.AI cs.NE stat.ML

    Comparing Rule-Based and Deep Learning Models for Patient Phenotyping

    Authors: Sebastian Gehrmann, Franck Dernoncourt, Yeran Li, Eric T. Carlson, Joy T. Wu, Jonathan Welt, John Foote Jr., Edward T. Moseley, David W. Grant, Patrick D. Tyler, Leo Anthony Celi

    Abstract: Objective: We investigate whether deep learning techniques for natural language processing (NLP) can be used efficiently for patient phenotyping. Patient phenotyping is a classification task for determining whether a patient has a medical condition, and is a crucial part of secondary analysis of healthcare data. We assess the performance of deep learning algorithms and compare them with classical…

    Submitted 25 March, 2017; originally announced March 2017.

  45. arXiv:1606.07461  [pdf, other]

    cs.CL cs.AI cs.NE

    LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

    Authors: Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, Alexander M. Rush

    Abstract: Recurrent neural networks, and in particular long short-term memory (LSTM) networks, are a remarkably effective tool for sequence modeling that learn a dense black-box hidden representation of their sequential input. Researchers interested in better understanding these models have studied the changes in hidden state representations over time and noticed some interpretable patterns but also signifi…

    Submitted 30 October, 2017; v1 submitted 23 June, 2016; originally announced June 2016.

    Comments: InfoVis 2017