Showing 1–50 of 53 results for author: Rieser, V

  1. arXiv:2406.16807  [pdf, other]

    cs.LG cs.CL cs.CV

    Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

    Authors: Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, Youwei Liang, Georgina Evans, Sahil Singla, Gang Li, Adrian Weller, Junfeng He, Deepak Ramachandran, Krishnamurthy Dj Dvijotham

    Abstract: Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback which captures nuanced distinctions in image quality and prompt-alignment, compared to traditional co…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.11757  [pdf, other]

    cs.AI cs.CL cs.CY cs.HC

    STAR: SocioTechnical Approach to Red Teaming Language Models

    Authors: Laura Weidinger, John Mellor, Bernat Guillen Pegueroles, Nahema Marchal, Ravin Kumar, Kristian Lum, Canfer Akbulut, Mark Diaz, Stevie Bergman, Mikel Rodriguez, Verena Rieser, William Isaac

    Abstract: This research introduces STAR, a sociotechnical framework that improves on current best practices for red teaming safety of large language models. STAR makes two key contributions: it enhances steerability by generating parameterised instructions for human red teamers, leading to improved coverage of the risk surface. Parameterised instructions also provide more detailed insights into model failur…

    Submitted 10 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures, 5 pages appendix. * denotes equal contribution

  3. arXiv:2404.16244  [pdf, other]

    cs.CY

    The Ethics of Advanced AI Assistants

    Authors: Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz, et al. (32 additional authors not shown)

    Abstract: This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro…

    Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  4. arXiv:2403.10144  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO cs.PL

    NLP Verification: Towards a General Methodology for Certifying Robustness

    Authors: Marco Casadio, Tanvi Dinkar, Ekaterina Komendantskaya, Luca Arnaboldi, Matthew L. Daggitt, Omri Isac, Guy Katz, Verena Rieser, Oliver Lemon

    Abstract: Deep neural networks have exhibited substantial success in the field of Natural Language Processing and ensuring their safety and reliability is crucial: there are safety critical contexts where such models must be robust to variability or attack, and give guarantees over their output. Unlike Computer Vision, NLP lacks a unified verification methodology and, despite recent advancements in literatu…

    Submitted 31 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  5. arXiv:2311.04067  [pdf, other]

    cs.LG cs.AI cs.CV

    Multitask Multimodal Prompted Training for Interactive Embodied Task Completion

    Authors: Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia

    Abstract: Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models, including 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation. To tackle these challenges, we propose an Embodied MultiModal Agent (EMMA): a unified encoder-decoder model that reasons over images and trajectories, and casts action predi…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  6. arXiv:2310.11986  [pdf, other]

    cs.AI cs.CL cs.CY

    Sociotechnical Safety Evaluation of Generative AI Systems

    Authors: Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach, Iason Gabriel, Verena Rieser, William Isaac

    Abstract: Generative AI systems produce a range of risks. To ensure the safety of generative AI systems, these risks must be evaluated. In this paper, we make two main contributions toward establishing such evaluations. First, we propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks. This framework encompasses capability evaluations, which are the main…

    Submitted 31 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: main paper p.1-29, 5 figures, 2 tables

  7. arXiv:2309.14499  [pdf, other]

    cs.RO

    FurNav: Development and Preliminary Study of a Robot Direction Giver

    Authors: Bruce W. Wilson, Yann Schlosser, Rayane Tarkany, Meriam Moujahid, Birthe Nesset, Tanvi Dinkar, Verena Rieser

    Abstract: When giving directions to a lost-looking tourist, would you first reference the street-names, cardinal directions, landmarks, or simply tell them to walk five hundred metres in one direction then turn left? Depending on the circumstances, one could reasonably make use of any of these direction giving styles. However, research on direction giving with a robot does not often look at how these differ…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Author Accepted Manuscript, 4 pages, LBR Track, RO-MAN'23, 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), August 2023, Busan, South Korea

    ACM Class: H.5; I.2

  8. arXiv:2308.15214  [pdf, other]

    cs.CL cs.AI cs.HC cs.RO

    FurChat: An Embodied Conversational Agent using LLMs, Combining Open and Closed-Domain Dialogue with Facial Expressions

    Authors: Neeraj Cherakara, Finny Varghese, Sheena Shabana, Nivan Nelson, Abhiram Karukayil, Rohith Kulothungan, Mohammed Afil Farhan, Birthe Nesset, Meriam Moujahid, Tanvi Dinkar, Verena Rieser, Oliver Lemon

    Abstract: We demonstrate an embodied conversational agent that can function as a receptionist and generate a mixture of open and closed-domain dialogue along with facial expressions, by using a large language model (LLM) to develop an engaging conversation. We deployed the system onto a Furhat robot, which is highly expressive and capable of using both verbal and nonverbal cues during interaction. The syste…

    Submitted 30 August, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: 5 pages, 2 figures, Accepted at SIGDIAL 2023 (24th Meeting of the Special Interest Group on Discourse and Dialogue), for the demo video, see https://youtu.be/fwtUl1kl22s

  9. arXiv:2307.04761  [pdf, other]

    cs.CL cs.AI cs.CY

    Understanding Counterspeech for Online Harm Mitigation

    Authors: Yi-Ling Chung, Gavin Abercrombie, Florence Enock, Jonathan Bright, Verena Rieser

    Abstract: Counterspeech offers direct rebuttals to hateful speech by challenging perpetrators of hate and showing support to targets of abuse. It provides a promising alternative to more contentious measures, such as content moderation and deplatforming, by contributing a greater amount of positive online speech rather than attempting to mitigate harmful content through removal. Advances in the development…

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: 21 pages, 2 figures, 2 tables

  10. arXiv:2305.16519  [pdf, other]

    cs.CL

    The Dangers of trusting Stochastic Parrots: Faithfulness and Trust in Open-domain Conversational Question Answering

    Authors: Sabrina Chiesurin, Dimitris Dimakopoulos, Marco Antonio Sobrevilla Cabezudo, Arash Eshghi, Ioannis Papaioannou, Verena Rieser, Ioannis Konstas

    Abstract: Large language models are known to produce output which sounds fluent and convincing, but is also often wrong, e.g. "unfaithful" with respect to a rationale as retrieved from a knowledge base. In this paper, we show that task-based systems which exhibit certain advanced linguistic dialog behaviors, such as lexical alignment (repeating what the user said), are in fact preferred and trusted more, wh…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 5 pages, ACL Findings 2023

  11. arXiv:2305.09800  [pdf, other]

    cs.CL

    Mirages: On Anthropomorphism in Dialogue Systems

    Authors: Gavin Abercrombie, Amanda Cercas Curry, Tanvi Dinkar, Verena Rieser, Zeerak Talat

    Abstract: Automated dialogue or conversational systems are anthropomorphised by developers and personified by users. While a degree of anthropomorphism may be inevitable due to the choice of medium, conscious and unconscious design choices can guide users to personify such systems to varying degrees. Encouraging users to relate to automated systems as if they were human can lead to high risk scenarios cause…

    Submitted 23 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted for publication at EMNLP. See ACL Anthology for published version

  12. arXiv:2305.06074  [pdf, other]

    cs.CL cs.LG

    iLab at SemEval-2023 Task 11 Le-Wi-Di: Modelling Disagreement or Modelling Perspectives?

    Authors: Nikolas Vitsakis, Amit Parekh, Tanvi Dinkar, Gavin Abercrombie, Ioannis Konstas, Verena Rieser

    Abstract: There are two competing approaches for modelling annotator disagreement: distributional soft-labelling approaches (which aim to capture the level of disagreement) or modelling perspectives of individual annotators or groups thereof. We adapt a multi-task architecture -- which has previously shown success in modelling perspectives -- to evaluate its performance on the SEMEVAL Task 11. We do so by c…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: To appear in the Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). Association for Computational Linguistics, 2023

  13. arXiv:2305.04003  [pdf, other]

    cs.CL cs.AI cs.LG

    ANTONIO: Towards a Systematic Method of Generating NLP Benchmarks for Verification

    Authors: Marco Casadio, Luca Arnaboldi, Matthew L. Daggitt, Omri Isac, Tanvi Dinkar, Daniel Kienitz, Verena Rieser, Ekaterina Komendantskaya

    Abstract: Verification of machine learning models used in Natural Language Processing (NLP) is known to be a hard problem. In particular, many known neural network verification methods that work for computer vision and other numeric datasets do not work for NLP. Here, we study technical reasons that underlie this problem. Based on this analysis, we propose practical methods and heuristics for preparing NLP…

    Submitted 15 August, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: To appear in proceedings of 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (Affiliated with CAV 2023)

  14. arXiv:2305.01633  [pdf, other]

    cs.CL

    Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

    Authors: Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, et al. (17 additional authors not shown)

    Abstract: We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, a…

    Submitted 7 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 5 pages plus appendix, 4 tables, 1 figure. To appear at "Workshop on Insights from Negative Results in NLP" (co-located with EACL2023). Updated author list and acknowledgements

    MSC Class: 68 ACM Class: I.2.7

  15. arXiv:2304.14803  [pdf]

    cs.CL

    SemEval-2023 Task 11: Learning With Disagreements (LeWiDi)

    Authors: Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio

    Abstract: NLP datasets annotated with human judgments are rife with disagreements between the judges. This is especially true for tasks depending on subjective judgments such as sentiment analysis or offensive language detection. Particularly in these latter cases, the NLP community has come to realize that the approach of 'reconciling' these different subjective interpretations is inappropriate. Many NLP r…

    Submitted 28 April, 2023; originally announced April 2023.

  16. arXiv:2304.14623  [pdf, other]

    cs.CV

    Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

    Authors: Lu Yu, Malvina Nikandrou, Jiali Jin, Verena Rieser

    Abstract: Automated image captioning has the potential to be a useful tool for people with vision impairments. Images taken by this user group are often noisy, which leads to incorrect and even unsafe model predictions. In this paper, we propose a quality-agnostic framework to improve the performance and robustness of image captioning models for visually impaired people. We address this problem from three a…

    Submitted 1 May, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: To appear in IJCAI 2023

  17. arXiv:2301.10684  [pdf, other]

    cs.CL

    Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement

    Authors: Gavin Abercrombie, Verena Rieser, Dirk Hovy

    Abstract: We commonly use agreement measures to assess the utility of judgements made by human annotators in Natural Language Processing (NLP) tasks. While inter-annotator agreement is frequently used as an indication of label reliability by measuring consistency between annotators, we argue for the additional use of intra-annotator agreement to measure label stability over time. However, in a systematic re…

    Submitted 25 January, 2023; originally announced January 2023.

  18. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  19. arXiv:2211.04534  [pdf, other]

    cs.CV cs.CL

    Going for GOAL: A Resource for Grounded Football Commentaries

    Authors: Alessandro Suglia, José Lopes, Emanuele Bastianelli, Andrea Vanzo, Shubham Agarwal, Malvina Nikandrou, Lu Yu, Ioannis Konstas, Verena Rieser

    Abstract: Recent video+language datasets cover domains where the interaction is highly structured, such as instructional videos, or where the interaction is scripted, such as TV shows. Both of these properties can lead to spurious cues to be exploited by models rather than learning to ground language. In this paper, we present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or 'soccer')…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: Preprint formatted using the ACM Multimedia template (8 pages + appendix)

  20. arXiv:2210.00572  [pdf, other]

    cs.CL

    Risk-graded Safety for Handling Medical Queries in Conversational AI

    Authors: Gavin Abercrombie, Verena Rieser

    Abstract: Conversational AI systems can engage in unsafe behaviour when handling users' medical queries that can have severe consequences and could even lead to deaths. Systems therefore need to be capable of both recognising the seriousness of medical inputs and producing responses with appropriate levels of risk. We create a corpus of human written English language medical queries and the responses of dif…

    Submitted 2 October, 2022; originally announced October 2022.

    Comments: Accepted for publication at AACL 2022

  21. arXiv:2210.00044  [pdf, other]

    cs.LG

    Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering

    Authors: Malvina Nikandrou, Lu Yu, Alessandro Suglia, Ioannis Konstas, Verena Rieser

    Abstract: Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge. Although continual learning has been widely studied in computer vision, its application to Vision+Language tasks is not that straightforward, as settings can be parameterized in multiple ways according to their input modalities. In this paper, we present a detailed study of how diff…

    Submitted 20 January, 2024; v1 submitted 30 September, 2022; originally announced October 2022.

  22. arXiv:2207.02639  [pdf, other]

    cs.CV cs.MM

    Adversarial Robustness of Visual Dialog

    Authors: Lu Yu, Verena Rieser

    Abstract: Adversarial robustness evaluates the worst-case performance scenario of a machine learning model to ensure its safety and reliability. This study is the first to investigate the robustness of visually grounded dialog models towards textual attacks. These attacks represent a worst-case scenario where the input question contains a synonym which causes the previously correct model to return a wrong a…

    Submitted 6 July, 2022; originally announced July 2022.

  23. arXiv:2206.14575  [pdf, other]

    cs.CL cs.AI

    Why Robust Natural Language Understanding is a Challenge

    Authors: Marco Casadio, Ekaterina Komendantskaya, Verena Rieser, Matthew L. Daggitt, Daniel Kienitz, Luca Arnaboldi, Wen Kokke

    Abstract: With the proliferation of Deep Machine Learning into real-life applications, a particular property of this technology has been brought to attention: robustness. Neural Networks notoriously present low robustness and can be highly sensitive to small input perturbations. Recently, many methods for verifying networks' general properties of robustness have been proposed, but they are mostly applied in…

    Submitted 13 July, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

  24. arXiv:2203.10012  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges

    Authors: Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Kallirroi Georgila, Dilek Hakkani-Tur, Zekang Li, Verena Rieser, Samira Shaikh, David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang

    Abstract: This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog. The workshop explored the current state of the art along with its limitations and suggested promising directions for future work in this important and very rapidly changing area of research.

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: Report from the NSF AED Workshop (http://dialrc.org/AED/)

  25. arXiv:2109.10650  [pdf, other]

    cs.CL

    MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

    Authors: Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas

    Abstract: One of the most challenging aspects of current single-document news summarization is that the summary often contains 'extrinsic hallucinations', i.e., facts that are not present in the source document, which are often derived via world knowledge. This causes summarization systems to act more like open-ended language models tending to hallucinate facts that are erroneous. In this paper, we mitigate…

    Submitted 22 September, 2021; originally announced September 2021.

    Journal ref: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing Findings (EMNLP2021 Findings)

  26. arXiv:2109.09483  [pdf, other]

    cs.CL cs.HC

    ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Abuse Detection in Conversational AI

    Authors: Amanda Cercas Curry, Gavin Abercrombie, Verena Rieser

    Abstract: We present the first English corpus study on abusive language towards three conversational AI systems gathered "in the wild": an open-domain social bot, a rule-based chatbot, and a task-based system. To account for the complexity of the task, we take a more 'nuanced' approach where our ConvAI dataset reflects fine-grained notions of abuse, as well as views from multiple expert annotators. We find…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: To be published in the 2021 Conference on Empirical Methods for Natural Language Processing (EMNLP2021)

  27. arXiv:2107.03451  [pdf, other]

    cs.CL cs.AI

    Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling

    Authors: Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser

    Abstract: Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans. However, these models are often trained on large datasets from the internet, and as a result, may learn undesirable behaviors from this data, such as toxic or otherwise harmful language. Researchers must thus wrestle with the issue of how and whe…

    Submitted 23 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

  28. arXiv:2106.05580  [pdf, other]

    cs.CL

    AGGGEN: Ordering and Aggregating while Generating

    Authors: Xinnuo Xu, Ondřej Dušek, Verena Rieser, Ioannis Konstas

    Abstract: We present AGGGEN (pronounced 'again'), a data-to-text model which re-introduces two explicit sentence planning stages into neural data-to-text systems: input ordering and input aggregation. In contrast to previous work using sentence planning, our model is still end-to-end: AGGGEN performs sentence planning at the same time as generating text by learning latent alignments (via semantic facts) bet…

    Submitted 17 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Correct the first citation in the Zero-shot Few-shot scenarios paragraph in Section 7

    Journal ref: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL2021)

  29. arXiv:2106.02578  [pdf, other]

    cs.AI

    Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants

    Authors: Gavin Abercrombie, Amanda Cercas Curry, Mugdha Pandya, Verena Rieser

    Abstract: Technology companies have produced varied responses to concerns about the effects of the design of their conversational AI systems. Some have claimed that their voice assistants are in fact not gendered or human-like -- despite design features suggesting the contrary. We compare these claims to user perceptions by analysing the pronouns they use when referring to AI assistants. We also examine sys…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: To be presented at the 3rd Workshop on Gender Bias in Natural Language Processing (GeBNLP 2021)

  30. arXiv:2105.13710  [pdf, other]

    cs.CL

    OTTers: One-turn Topic Transitions for Open-Domain Dialogue

    Authors: Karin Sevegnani, David M. Howcroft, Ioannis Konstas, Verena Rieser

    Abstract: Mixed initiative in open-domain dialogue requires a system to pro-actively introduce new topics. The one-turn topic transition task explores how a system connects two topics in a cooperative and coherent manner. The goal of the task is to generate a "bridging" utterance connecting the new topic to the topic of the previous conversation turn. We are especially interested in commonsense explanations…

    Submitted 28 May, 2021; originally announced May 2021.

    Journal ref: ACL2021

  31. arXiv:2011.13205  [pdf, other]

    cs.CL cs.LG

    SLURP: A Spoken Language Understanding Resource Package

    Authors: Emanuele Bastianelli, Andrea Vanzo, Pawel Swietojanski, Verena Rieser

    Abstract: Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we release SLURP, a new SLU package containing the following: (1) A new challenging dataset in English spanning 18 domains, which is substantially bigger an…

    Submitted 26 November, 2020; originally announced November 2020.

    Comments: Published at the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP-2020)

  32. arXiv:2005.07493  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    History for Visual Dialog: Do we really need it?

    Authors: Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser

    Abstract: Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response. In this paper, we show that co-attention models which explicitly encode dialog history outperform models that don't, achieving state-of-the-art performance (72% NDCG on val set)…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: ACL'20

  33. arXiv:1911.03905  [pdf, ps, other]

    cs.CL

    Semantic Noise Matters for Neural Natural Language Generation

    Authors: Ondřej Dušek, David M. Howcroft, Verena Rieser

    Abstract: Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency. W…

    Submitted 10 November, 2019; originally announced November 2019.

    Comments: In Proceedings of INLG 2019, Tokyo, Japan

    ACM Class: I.2.7

  34. arXiv:1910.04731  [pdf, other]

    cs.CL

    Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)

    Authors: Ondřej Dušek, Karin Sevegnani, Ioannis Konstas, Verena Rieser

    Abstract: We present a recurrent neural network based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs. The latter is trained using pairwise hinge loss over scores from two copies of the rating network. We use learning to rank and synthetic d…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: Accepted as a short paper at INLG 2019

    ACM Class: I.2.7

  35. arXiv:1909.04387  [pdf, other]

    cs.HC cs.CL

    A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents

    Authors: Amanda Cercas Curry, Verena Rieser

    Abstract: How should conversational agents respond to verbal abuse through the user? To answer this question, we conduct a large-scale crowd-sourced evaluation of abuse response strategies employed by current state-of-the-art systems. Our results show that some strategies, such as "polite refusal" score highly across the board, while for other strategies demographic factors, such as age, as well as the seve…

    Submitted 10 September, 2019; originally announced September 2019.

  36. arXiv:1909.02965  [pdf, other]

    cs.CL cs.AI

    User Evaluation of a Multi-dimensional Statistical Dialogue System

    Authors: Simon Keizer, Ondřej Dušek, Xingkun Liu, Verena Rieser

    Abstract: We present the first complete spoken dialogue system driven by a multi-dimensional statistical dialogue manager. This framework has been shown to substantially reduce data needs by leveraging domain-independent dimensions, such as social obligations or feedback, which (as we show) can be transferred between domains. In this paper, we conduct a user study and show that the performance of a multi-di…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: SIGdial 2019

  37. arXiv:1903.05566  [pdf, ps, other]

    cs.CL cs.LG

    Benchmarking Natural Language Understanding Services for building Conversational Agents

    Authors: Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser

    Abstract: We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer. In this paper, we present the first wide coverage evaluation and comparison of some of the most popular NLU services, on a l…

    Submitted 26 March, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

    Comments: Accepted by IWSDS2019

  38. Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge

    Authors: Ondřej Dušek, Jekaterina Novikova, Verena Rieser

    Abstract: This paper provides a comprehensive analysis of the first shared task on End-to-End Natural Language Generation (NLG) and identifies avenues for future research based on the results. This shared task aimed to assess whether recent end-to-end NLG systems can generate more complex output by learning from datasets containing higher lexical richness, syntactic complexity and diverse discourse phenomen…

    Submitted 24 July, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: Computer Speech and Language, final accepted manuscript (in press)

    ACM Class: I.2.7

  39. arXiv:1810.11955  [pdf, other

    cs.CL

    Improving Context Modelling in Multimodal Dialogue Generation

    Authors: Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser

    Abstract: In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of…

    Submitted 20 October, 2018; originally announced October 2018.

    Journal ref: Proceedings of the 11th International Conference on Natural Language Generation, pages 129-134, Tilburg, The Netherlands, 2018

  40. arXiv:1810.11954  [pdf, other

    cs.CL cs.AI

    A Knowledge-Grounded Multimodal Search-Based Conversational Agent

    Authors: Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser

    Abstract: Multimodal search-based dialogue is a challenging new task: It extends visually grounded question answering systems into multi-turn conversations with access to an external database. We address this new challenge by learning a neural response generation system from the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded multimodal conversation…

    Submitted 20 October, 2018; originally announced October 2018.

    Journal ref: Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 59-66, Brussels, Belgium, October 2018

  41. arXiv:1810.01170  [pdf, other

    cs.CL

    Findings of the E2E NLG Challenge

    Authors: Ondřej Dušek, Jekaterina Novikova, Verena Rieser

    Abstract: This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems. Recent end-to-end generation systems are promising since they reduce the need for data annotation. However, they are currently limited to small, delexicalised datasets. The E2E NLG shared task aims to assess whether these novel approach…

    Submitted 2 October, 2018; originally announced October 2018.

    Comments: Accepted to INLG 2018

    Journal ref: Proceedings of the 11th International Conference on Natural Language Generation, pages 322-328, Tilburg, The Netherlands, November 2018

  42. arXiv:1809.06873  [pdf, other

    cs.CL

    Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity

    Authors: Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser

    Abstract: We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically cohe…

    Submitted 18 September, 2018; originally announced September 2018.

    Journal ref: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3981-3991, Brussels, Belgium, November 2018

  43. arXiv:1804.00146  [pdf, other

    cs.CL

    Towards Learning Transferable Conversational Skills using Multi-dimensional Dialogue Modelling

    Authors: Simon Keizer, Verena Rieser

    Abstract: Recent statistical approaches have improved the robustness and scalability of spoken dialogue systems. However, despite recent progress in domain adaptation, their reliance on in-domain data still limits their cross-domain scalability. In this paper, we argue that this problem can be addressed by extending current models to reflect and exploit the multi-dimensional nature of human dialogue. We pre…

    Submitted 31 March, 2018; originally announced April 2018.

    Comments: A short version of this paper has been published in Proc. 21st Workshop on the Semantics and Pragmatics of Dialogue (SemDial/SaarDial)

  44. RankME: Reliable Human Ratings for Natural Language Generation

    Authors: Jekaterina Novikova, Ondřej Dušek, Verena Rieser

    Abstract: Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relat…

    Submitted 15 March, 2018; originally announced March 2018.

    Comments: Accepted to NAACL 2018 (The 2018 Conference of the North American Chapter of the Association for Computational Linguistics)

    Journal ref: Proceedings of NAACL-HLT 2018, pages 72-78, New Orleans, Louisiana, June 1-6, 2018

  45. arXiv:1712.07558  [pdf, other

    cs.CL

    An Ensemble Model with Ranking for Social Dialogue

    Authors: Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, Oliver Lemon

    Abstract: Open-domain social dialogue is one of the long-standing goals of Artificial Intelligence. This year, the Amazon Alexa Prize challenge was announced for the first time, where real customers get to rate systems developed by leading universities worldwide. The aim of the challenge is to converse "coherently and engagingly with humans on popular topics for 20 minutes". We describe our Alexa Prize syst…

    Submitted 20 December, 2017; originally announced December 2017.

    Comments: NIPS 2017 Workshop on Conversational AI

  46. A Review of Evaluation Techniques for Social Dialogue Systems

    Authors: Amanda Cercas Curry, Helen Hastie, Verena Rieser

    Abstract: In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are…

    Submitted 13 September, 2017; originally announced September 2017.

    Comments: 2 pages

    MSC Class: 68T50

  47. arXiv:1708.01759  [pdf, other

    cs.CL

    Referenceless Quality Estimation for Natural Language Generation

    Authors: Ondřej Dušek, Jekaterina Novikova, Verena Rieser

    Abstract: Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output. In this paper, we propose a referenceless quality estimation (QE) approach based on recurrent neural networks, which predicts a quality score for a NLG system output by comparing it to the source meaning representation only. Our method out…

    Submitted 5 August, 2017; originally announced August 2017.

    Comments: Accepted as a regular paper to 1st Workshop on Learning to Generate Natural Language (LGNL), Sydney, 10 August 2017

    ACM Class: I.2.7

  48. Why We Need New Evaluation Metrics for NLG

    Authors: Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser

    Abstract: The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgements of system outputs as generated by data-driven, e…

    Submitted 21 July, 2017; originally announced July 2017.

    Comments: accepted to EMNLP 2017

    Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2231-2242, Copenhagen, Denmark, September 7-11, 2017

  49. arXiv:1706.09433  [pdf, ps, other

    cs.CL

    Data-driven Natural Language Generation: Paving the Road to Success

    Authors: Jekaterina Novikova, Ondřej Dušek, Verena Rieser

    Abstract: We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, more r…

    Submitted 28 June, 2017; originally announced June 2017.

    Comments: WiNLP workshop at ACL 2017

  50. arXiv:1706.09254  [pdf, other

    cs.CL

    The E2E Dataset: New Challenges For End-to-End Generation

    Authors: Jekaterina Novikova, Ondřej Dušek, Verena Rieser

    Abstract: This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from t…

    Submitted 6 July, 2017; v1 submitted 28 June, 2017; originally announced June 2017.

    Comments: Accepted as a short paper for SIGDIAL 2017 (final submission including supplementary material)

    ACM Class: I.2.7

    Journal ref: Proceedings of the SIGDIAL 2017 Conference, pages 201-206, Saarbrücken, Germany, 15-17 August 2017