Showing 1–29 of 29 results for author: Roller, S

  1. arXiv:2308.08169  [pdf, other]

    cs.CL cs.AI

    Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System

    Authors: Jianguo Zhang, Stephen Roller, Kun Qian, Zhiwei Liu, Rui Meng, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

    Abstract: End-to-end task-oriented dialogue (TOD) systems have achieved promising performance by leveraging sophisticated natural language understanding and natural language generation capabilities of pre-trained models. This work enables the TOD systems with more flexibility through a simple cache. The cache provides the flexibility to dynamically update the TOD systems and handle both existing and unseen…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted by SIGDIAL 2023 as a long paper

  2. arXiv:2307.14117  [pdf, other]

    cs.CL

    Leveraging Implicit Feedback from Deployment Data in Dialogue

    Authors: Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston

    Abstract: We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals like user response length, sentiment and reaction of the future human utterances in the collected dialogue episodes. Our experiments use the publicly released deployme…

    Submitted 31 January, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: EACL 2024

  3. arXiv:2304.09871  [pdf, other]

    cs.LG cs.AI math.OC

    A Theory on Adam Instability in Large-Scale Machine Learning

    Authors: Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

    Abstract: We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent…

    Submitted 25 April, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

  4. arXiv:2301.03728  [pdf, other]

    cs.CL cs.AI cs.LG

    Scaling Laws for Generative Mixed-Modal Language Models

    Authors: Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

    Abstract: Generative language models define distributions over sequences of tokens that can represent essentially any combination of data modalities (e.g., any permutation of image tokens from VQ-VAEs, speech tokens from HuBERT, BPE tokens for language or code, and so on). To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modaliti…

    Submitted 9 January, 2023; originally announced January 2023.

  5. arXiv:2208.03188  [pdf, other]

    cs.CL cs.AI

    BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

    Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

    Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc…

    Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  6. arXiv:2205.01068  [pdf, other]

    cs.CL cs.LG

    OPT: Open Pre-trained Transformer Language Models

    Authors: Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

    Abstract: Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open…

    Submitted 21 June, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

  7. arXiv:2203.13224  [pdf, other]

    cs.CL cs.AI

    Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

    Authors: Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston

    Abstract: Language models (LMs) have recently been shown to generate more factual responses by employing modularity (Zhou et al., 2021) in combination with retrieval (Adolphs et al., 2021). We extend the recent approach of Adolphs et al. (2021) to include internet search as a module. Our SeeKeR (Search engine->Knowledge->Response) method thus applies a single LM to three modular tasks in succession: search,…

    Submitted 29 March, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

  8. arXiv:2201.04723  [pdf, other]

    cs.CL cs.AI

    Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

    Authors: Eric Michael Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, Jason Weston

    Abstract: At the heart of improving conversational AI is the open problem of how to evaluate conversations. Issues with automatic metrics are well known (Liu et al., 2016, arXiv:1603.08023), with human evaluations still considered the gold standard. Unfortunately, how to perform human evaluations is also an open problem: differing data collection methods have varying levels of human agreement and statistica…

    Submitted 12 January, 2022; originally announced January 2022.

  9. arXiv:2110.06905  [pdf, other]

    cs.CL

    Teaching Models new APIs: Domain-Agnostic Simulators for Task Oriented Dialogue

    Authors: Moya Chen, Paul A. Crook, Stephen Roller

    Abstract: We demonstrate that large language models are able to simulate Task Oriented Dialogues in novel domains, provided only with an API implementation and a list of goals. We show these simulations can formulate online, automatic metrics that correlate well with human evaluations. Furthermore, by checking for whether the User's goals are met, we can use simulation to repeatedly generate training data a…

    Submitted 13 October, 2021; originally announced October 2021.

  10. arXiv:2106.04426  [pdf, other]

    cs.LG cs.CL

    Hash Layers For Large Sparse Models

    Authors: Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

    Abstract: We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models. Specifically, we modify the feedforward layer to hash to different sets of weights depending on the current token, over all tokens in the sequence. We show that this procedure either outperforms or is competitive with learning-to-route mixture-of-expert meth…

    Submitted 20 July, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  11. arXiv:2106.04279  [pdf, other]

    cs.LG cs.CL

    Staircase Attention for Recurrent Processing of Sequences

    Authors: Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture. In this work we introduce a novel attention procedure called staircase attention that, unlike self-attention, operates across the sequence (in time) recurrently processing the input by adding another step of…

    Submitted 8 June, 2021; originally announced June 2021.

  12. arXiv:2105.06548  [pdf, other]

    cs.LG cs.AI

    Not All Memories are Created Equal: Learning to Forget by Expiring

    Authors: Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

    Abstract: Attention mechanisms have shown promising results in sequence modeling tasks that require long-term memory. Recent work investigated mechanisms to reduce the computational cost of preserving and storing memories. However, not all content in the past is equally important to remember. We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant info…

    Submitted 13 June, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

  13. arXiv:2103.01869  [pdf, other]

    cs.DC

    Parallel Machine Learning of Partial Differential Equations

    Authors: Amin Totounferoush, Neda Ebrahimi Pour, Sabine Roller, Miriam Mehl

    Abstract: In this work, we present a parallel scheme for machine learning of partial differential equations. The scheme is based on the decomposition of the training data corresponding to spatial subdomains, where an individual neural network is assigned to each data subset. Message Passing Interface (MPI) is used for parallelization and data communication. We use convolutional neural network layers (CNN) t…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: Submitted to PDSEC workshop, IPDPS conference 2021. We will replace with the final version as soon as we have the DOI

  14. arXiv:2010.12757  [pdf, other]

    cs.CL

    Adding Chit-Chat to Enhance Task-Oriented Dialogues

    Authors: Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie

    Abstract: Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations. In this work, we propose to integrate both types of systems by Adding Chit-Chat to ENhance Task-ORiented dialogues (ACCENTOR), with the goal of making virtu…

    Submitted 1 May, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: To appear in NAACL-HLT 2021

  15. arXiv:2006.12442  [pdf, other]

    cs.CL cs.AI

    Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

    Authors: Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson

    Abstract: We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet. We present a biased view, focusing on work done by our own group, while citing related work in each area. In particular, we discuss in detail the properties of cont…

    Submitted 13 July, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

  16. arXiv:2004.13637  [pdf, other]

    cs.CL cs.AI

    Recipes for building an open-domain chatbot

    Authors: Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston

    Abstract: Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a…

    Submitted 30 April, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

  17. arXiv:1911.03860  [pdf, other]

    cs.CL

    Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

    Authors: Margaret Li, Stephen Roller, Ilia Kulikov, Sean Welleck, Y-Lan Boureau, Kyunghyun Cho, Jason Weston

    Abstract: Generative dialogue models currently suffer from a number of problems which standard maximum likelihood training does not address. They tend to produce generations that (i) rely too much on copying from the context, (ii) contain repetitions within utterances, (iii) overuse frequent words, and (iv) at a deeper level, contain logical flaws. In this work we show how all of these problems can be addre…

    Submitted 6 May, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

  18. arXiv:1911.03768  [pdf, other]

    cs.CL cs.AI

    The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents

    Authors: Kurt Shuster, Da Ju, Stephen Roller, Emily Dinan, Y-Lan Boureau, Jason Weston

    Abstract: We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images. By multi-tasking on such a broad large-scale set of data, we hope to both move towards and measure progress in producin…

    Submitted 28 April, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

    Comments: ACL 2020

  19. arXiv:1909.03087  [pdf, other]

    cs.CL

    ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons

    Authors: Margaret Li, Jason Weston, Stephen Roller

    Abstract: While dialogue remains an important end-goal of natural language research, the difficulty of evaluation is an oft-quoted reason why it remains troublesome to make real progress towards its solution. Evaluation difficulties are actually two-fold: not only do automatic metrics not correlate well with human judgments, but also human judgments themselves are in fact difficult to measure. The two most…

    Submitted 6 September, 2019; originally announced September 2019.

  20. arXiv:1908.04319  [pdf, other]

    cs.LG cs.CL stat.ML

    Neural Text Generation with Unlikelihood Training

    Authors: Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston

    Abstract: Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding leads to dull and repetitive outputs. While some post-hoc fixes have been proposed, in particular top-$k$ and nucleus sampling, they do not address the fact that the token-level probabilities predicted by the mode…

    Submitted 26 September, 2019; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: Sean Welleck and Ilia Kulikov contributed equally

  21. arXiv:1902.08654  [pdf, other]

    cs.CL

    What makes a good conversation? How controllable attributes affect human judgments

    Authors: Abigail See, Stephen Roller, Douwe Kiela, Jason Weston

    Abstract: A good conversation requires balance -- between simplicity and detail; staying on topic and changing it; asking questions and answering them. Although dialogue agents are commonly evaluated via human judgments of overall quality, the relationship between quality and these individual factors is less well-studied. In this work, we examine two controllable neural text generation methods, conditional…

    Submitted 10 April, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

    Comments: Accepted to NAACL 2019

  22. arXiv:1902.00913  [pdf, other]

    cs.CL

    Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

    Authors: Matt Le, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, Maximilian Nickel

    Abstract: We consider the task of inferring is-a relationships from large text corpora. For this purpose, we propose a new method combining hyperbolic embeddings and Hearst patterns. This approach allows us to set appropriate constraints for inferring concept hierarchies from distributional contexts while also being able to predict missing is-a relationships and to correct wrong extractions. Moreover -- and…

    Submitted 3 February, 2019; originally announced February 2019.

  23. arXiv:1811.01241  [pdf, other]

    cs.CL

    Wizard of Wikipedia: Knowledge-Powered Conversational agents

    Authors: Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston

    Abstract: In open-domain dialogue intelligent agents should exhibit the use of knowledge, however there are few convincing demonstrations of this to date. The most popular sequence to sequence models typically "generate and hope" generic utterances that can be memorized in the weights of the model when mapping from input utterance(s) to output, rather than employing recalled knowledge as context. Use of kno…

    Submitted 21 February, 2019; v1 submitted 3 November, 2018; originally announced November 2018.

  24. arXiv:1806.03191  [pdf, other]

    cs.CL

    Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora

    Authors: Stephen Roller, Douwe Kiela, Maximilian Nickel

    Abstract: Methods for unsupervised hypernym detection may broadly be categorized according to two paradigms: pattern-based and distributional methods. In this paper, we study the performance of both approaches on several hypernymy tasks and find that simple pattern-based methods consistently outperform distributional methods on common benchmark datasets. Our results show that pattern-based models provide im…

    Submitted 8 June, 2018; originally announced June 2018.

    Comments: Accepted as a short paper to ACL 2018

  25. arXiv:1704.04550  [pdf, ps, other]

    cs.CL

    Distributional Modeling on a Diet: One-shot Word Learning from Text Only

    Authors: Su Wang, Stephen Roller, Katrin Erk

    Abstract: We test whether distributional models can do one-shot learning of definitional properties from text only. Using Bayesian models, we find that first learning overarching structure in the known data, regularities in textual contexts and in properties, helps one-shot learning, and that individual context items can be highly informative. Our experiments show that our model can learn properties from a…

    Submitted 13 October, 2017; v1 submitted 14 April, 2017; originally announced April 2017.

    Comments: The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017)

  26. arXiv:1605.05433  [pdf, other]

    cs.CL cs.AI

    Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment

    Authors: Stephen Roller, Katrin Erk

    Abstract: We consider the task of predicting lexical entailment using distributional vectors. We perform a novel qualitative analysis of one existing model which was previously shown to only measure the prototypicality of word pairs. We find that the model strongly learns to identify hypernyms using Hearst patterns, which are well known to be predictive of lexical relations. We present a novel model which e…

    Submitted 23 September, 2016; v1 submitted 18 May, 2016; originally announced May 2016.

    Comments: EMNLP 2016

  27. arXiv:1603.00968  [pdf, other]

    cs.CL

    MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification

    Authors: Ye Zhang, Stephen Roller, Byron Wallace

    Abstract: We introduce a novel, simple convolution neural network (CNN) architecture - multi-group norm constraint CNN (MGNC-CNN) that capitalizes on multiple sets of word embeddings for sentence classification. MGNC-CNN extracts features from input embedding sets independently and then joins these at the penultimate layer in the network to form a final feature vector. We then adopt a group regularization s…

    Submitted 26 March, 2016; v1 submitted 2 March, 2016; originally announced March 2016.

    Comments: This paper got accepted by NAACL 2016

  28. arXiv:1512.02504  [pdf, other]

    physics.flu-dyn cs.CE

    Transitional flow in intracranial aneurysms - a space and time refinement study below the Kolmogorov scales using Lattice Boltzmann Method

    Authors: Kartik Jain, Sabine Roller, Kent-Andre Mardal

    Abstract: Most Computational Fluid Dynamics (CFD) studies of hemodynamics in intracranial aneurysms are based on the assumption of laminar flow due to a relatively low (below 500) parent artery Reynolds number. A few studies have recently demonstrated the occurrence of transitional flow in aneurysms, but these studies employed special finite element schemes tailored to capture transitional nature of flow. I…

    Submitted 8 December, 2015; originally announced December 2015.

  29. arXiv:1505.06816  [pdf, other]

    cs.CL

    Representing Meaning with a Combination of Logical and Distributional Models

    Authors: I. Beltagy, Stephen Roller, Pengxiang Cheng, Katrin Erk, Raymond J. Mooney

    Abstract: NLP tasks differ in the semantic information they require, and at this time no single semantic representation fulfills all requirements. Logic-based representations characterize sentence structure, but do not capture the graded aspect of meaning. Distributional models give graded similarity ratings for words and phrases, but do not capture sentence structure in the same detail as logic-based app…

    Submitted 8 June, 2016; v1 submitted 26 May, 2015; originally announced May 2015.

    Comments: Special issue of Computational Linguistics on Formal Distributional Semantics, 2016