Showing 1–50 of 52 results for author: Dušek, O

  1. arXiv:2406.05885

    cs.CL

    Are Large Language Models Actually Good at Text Style Transfer?

    Authors: Sourabrata Mukherjee, Atul Kr. Ojha, Ondřej Dušek

    Abstract: We analyze the performance of large language models (LLMs) on Text Style Transfer (TST), specifically focusing on sentiment transfer and text detoxification across three languages: English, Hindi, and Bengali. Text Style Transfer involves modifying the linguistic style of a text while preserving its core content. We evaluate the capabilities of pre-trained LLMs using zero-shot and few-shot prompti…

    Submitted 9 June, 2024; originally announced June 2024.

  2. arXiv:2405.20805

    cs.CL

    Multilingual Text Style Transfer: Datasets & Models for Indian Languages

    Authors: Sourabrata Mukherjee, Atul Kr. Ojha, Akanksha Bansal, Deepak Alok, John P. McCrae, Ondřej Dušek

    Abstract: Text style transfer (TST) involves altering the linguistic style of a text while preserving its core content. This paper focuses on sentiment transfer, a vital TST subtask (Mukherjee et al., 2022a), across a spectrum of Indian languages: Hindi, Magahi, Malayalam, Marathi, Punjabi, Odia, Telugu, and Urdu, expanding upon previous work on English-Bangla sentiment transfer (Mukherjee et al., 2023). We…

    Submitted 9 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  3. arXiv:2402.07767

    cs.CL

    Text Detoxification as Style Transfer in English and Hindi

    Authors: Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek

    Abstract: This paper focuses on text detoxification, i.e., automatically converting toxic text into non-toxic text. This task contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved. We present three approaches: knowledge transfer from a similar task, multi-task learning approach, combin…

    Submitted 9 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted and presented at the 20th International Conference on Natural Language Processing (ICON-2023) during December 14-17, 2023

  4. arXiv:2402.03927

    cs.CL cs.AI

    Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

    Authors: Simone Balloccu, Patrícia Schmidtová, Mateusz Lango, Ondřej Dušek

    Abstract: Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but…

    Submitted 22 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted at EACL 2024 - main conference

  5. arXiv:2401.10186

    cs.CL

    Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

    Authors: Zdeněk Kasner, Ondřej Dušek

    Abstract: We analyze the behaviors of open large language models (LLMs) on the task of data-to-text (D2T) generation, i.e., generating coherent and relevant text from structured data. To avoid the issue of LLM training data contamination with standard benchmarks, we design Quintd - a tool for collecting novel structured data records from public APIs. We find that open LLMs (Llama 2, Mistral, and Zephyr) can…

    Submitted 6 June, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024 Main Conference

  6. arXiv:2312.14708

    cs.CL

    Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising

    Authors: Sourabrata Mukherjee, Zdeněk Kasner, Ondřej Dušek

    Abstract: Text sentiment transfer aims to flip the sentiment polarity of a sentence (positive to negative or vice versa) while preserving its sentiment-independent content. Although current models show good results at changing the sentiment, content preservation in transferred sentences is insufficient. In this paper, we present a sentiment transfer model based on polarity-aware denoising, which accurately…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Published in 25th International Conference on Text, Speech and Dialogue (TSD 2022)

  7. arXiv:2311.09390

    cs.CL

    LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems

    Authors: Nalin Kumar, Ondřej Dušek

    Abstract: Linguistic entrainment, or alignment, represents a phenomenon where linguistic patterns employed by conversational participants converge to one another. While entrainment has been shown to produce a more natural user experience, most dialogue systems do not have any provisions for it. In this work, we introduce methods for achieving dialogue entrainment in a GPT-2-based end-to-end task-oriented di…

    Submitted 4 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL Findings 2024

  8. arXiv:2310.16964

    cs.CL

    Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation

    Authors: Mateusz Lango, Ondřej Dušek

    Abstract: Hallucination of text ungrounded in the input is a well-known problem in neural data-to-text generation. Many methods have been proposed to mitigate it, but they typically require altering model architecture or collecting additional data, and thus cannot be easily applied to an existing model. In this paper, we explore a new way to mitigate hallucinations by combining the probabilistic output of a…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

    ACM Class: I.2.7

  9. arXiv:2308.06527

    cs.CL

    With a Little Help from the Authors: Reproducing Human Evaluation of an MT Error Detector

    Authors: Ondřej Plátek, Mateusz Lango, Ondřej Dušek

    Abstract: This work presents our efforts to reproduce the results of the human evaluation experiment presented in the paper of Vamvas and Sennrich (2022), which evaluated an automatic system detecting over- and undertranslations (translations containing more or less information than the original) in machine translation (MT) outputs. Despite the high quality of the documentation and code provided by the auth…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Submitted to https://www.aclweb.org/portal/content/repronlp-shared-task-reproducibility-evaluations-nlp-2023

  10. arXiv:2308.06502

    cs.CL cs.AI

    Three Ways of Using Large Language Models to Evaluate Chat

    Authors: Ondřej Plátek, Vojtěch Hudeček, Patricia Schmidtová, Mateusz Lango, Ondřej Dušek

    Abstract: This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition. We present three different approaches to predicting turn-level qualities of chatbot responses based on large language models (LLMs). We report improvement over the baseline using dynamic few-shot examples from a vector store for the prompts for ChatGPT. We also analyze the performance of the other tw…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Accepted to DSTC11 workshop https://dstc11.dstc.community/

  11. arXiv:2308.00399

    cs.CL cs.LG

    Tackling Hallucinations in Neural Chart Summarization

    Authors: Saad Obaid ul Islam, Iza Škrjanec, Ondřej Dušek, Vera Demberg

    Abstract: Hallucinations in text generation occur when the system produces text that is not grounded in the input. In this work, we tackle the problem of hallucinations in neural chart summarization. Our analysis shows that the target side of chart summarization training datasets often contains additional information, leading to hallucinations. We propose a natural language inference (NLI) based method to p…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: To be presented in INLG 2023

  12. arXiv:2305.01633

    cs.CL

    Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

    Authors: Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, et al. (17 additional authors not shown)

    Abstract: We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, a…

    Submitted 7 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 5 pages plus appendix, 4 tables, 1 figure. To appear at "Workshop on Insights from Negative Results in NLP" (co-located with EACL2023). Updated author list and acknowledgements

    MSC Class: 68 ACM Class: I.2.7

  13. arXiv:2304.06556

    cs.CL

    Are LLMs All You Need for Task-Oriented Dialogue?

    Authors: Vojtěch Hudeček, Ondřej Dušek

    Abstract: Instruction-tuned Large Language Models (LLMs) have recently gained huge popularity thanks to their ability to interact with users through conversation. In this work we aim to evaluate their ability to complete multi-turn tasks and interact with external databases in the context of established task-oriented dialogue benchmarks. We show that for explicit belief state tracking, LLMs underperform compare…

    Submitted 3 August, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted to SIGDial 2023

  14. TabGenie: A Toolkit for Table-to-Text Generation

    Authors: Zdeněk Kasner, Ekaterina Garanina, Ondřej Plátek, Ondřej Dušek

    Abstract: Heterogeneity of data-to-text generation datasets limits the research on data-to-text generation systems. We present TabGenie - a toolkit which enables researchers to explore, preprocess, and analyze a variety of data-to-text generation datasets through the unified framework of table-to-text generation. In TabGenie, all the inputs are represented as tables with associated metadata. The tables can b…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Submitted to ACL 2023 System Demonstration Track

  15. arXiv:2301.07087

    cs.CL cs.SD eess.AS

    MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module

    Authors: Ondřej Plátek, Ondřej Dušek

    Abstract: We present MooseNet, a trainable speech metric that predicts the listeners' Mean Opinion Score (MOS). We propose a novel approach where the Probabilistic Linear Discriminative Analysis (PLDA) generative model is used on top of an embedding obtained from a self-supervised learning (SSL) neural network (NN) model. We show that PLDA works well with a non-finetuned SSL model when trained only on 136 u…

    Submitted 29 June, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

    Comments: Accepted to SSW 12: https://openreview.net/forum?id=V6RZk6RzSu

  16. Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models

    Authors: Zdeněk Kasner, Ioannis Konstas, Ondřej Dušek

    Abstract: Pretrained language models (PLMs) for data-to-text (D2T) generation can use human-readable data labels such as column headings, keys, or relation names to generalize to out-of-domain examples. However, the models are known to produce semantically inaccurate outputs if these labels are ambiguous or incomplete, which is often the case in D2T datasets. In this paper, we expose this issue on th…

    Submitted 16 October, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Long paper at EACL '23. Code and data: https://github.com/kasnerz/rel2text

    ACM Class: I.2.7

  17. arXiv:2209.11128

    cs.CL

    Learning Interpretable Latent Dialogue Actions With Less Supervision

    Authors: Vojtěch Hudeček, Ondřej Dušek

    Abstract: We present a novel architecture for explainable modeling of task-oriented dialogues with discrete latent variables to represent dialogue actions. Our model is based on variational recurrent neural networks (VRNN) and requires no explicit annotation of semantic information. Unlike previous works, our approach models the system and user turns separately and performs database query modeling, which ma…

    Submitted 12 October, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

    Comments: 9 pages, accepted to AACL-IJCNLP 2022. Available online at https://github.com/vojtsek/to-vrnn

  18. arXiv:2209.03632

    cs.CL

    AARGH! End-to-end Retrieval-Generation for Task-Oriented Dialog

    Authors: Tomáš Nekvinda, Ondřej Dušek

    Abstract: We introduce AARGH, an end-to-end task-oriented dialog system combining retrieval and generative approaches in a single model, aiming at improving dialog management and lexical diversity of outputs. The model features a new response selection method based on an action-aware training objective and a simplified single-encoder retrieval architecture which allow us to build an end-to-end retrieval-enh…

    Submitted 25 September, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: SIGDIAL 2022, with updated examples in Table 4

  19. arXiv:2206.11249

    cs.CL cs.AI cs.LG

    GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

    Authors: Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, et al. (52 additional authors not shown)

    Abstract: Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, an…

    Submitted 24 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

  20. arXiv:2206.08425

    cs.CL

    DialogueScript: Using Dialogue Agents to Produce a Script

    Authors: Patrícia Schmidtová, Dávid Javorský, Christián Mikláš, Tomáš Musil, Rudolf Rosa, Ondřej Dušek

    Abstract: We present a novel approach to generating scripts by using agents with different personality types. To manage character interaction in the script, we employ simulated dramatic networks. Automatic and human evaluation on multiple criteria shows that our approach outperforms a vanilla-GPT2-based baseline. We further introduce a new metric to evaluate dialogue consistency based on natural language in…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: Non-archival paper at the 4th Workshop on Narrative Understanding (WNU 2022)

  21. arXiv:2203.16279

    cs.CL

    Neural Pipeline for Zero-Shot Data-to-Text Generation

    Authors: Zdeněk Kasner, Ondřej Dušek

    Abstract: In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise. We examine how to avoid finetuning pretrained language models (PLMs) on D2T generation datasets while still taking advantage of surface realization capabilities of PLMs. Inspired by pipeline approaches, we propose to generate text by transforming single-it…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022 Main Conference

  22. arXiv:2112.02721

    cs.CL cs.AI cs.LG

    NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

    Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, et al. (101 additional authors not shown)

    Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data split…

    Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

  23. arXiv:2109.10650

    cs.CL

    MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

    Authors: Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas

    Abstract: One of the most challenging aspects of current single-document news summarization is that the summary often contains 'extrinsic hallucinations', i.e., facts that are not present in the source document, which are often derived via world knowledge. This causes summarization systems to act more like open-ended language models tending to hallucinate facts that are erroneous. In this paper, we mitigate…

    Submitted 22 September, 2021; originally announced September 2021.

    Journal ref: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing Findings (EMNLP2021 Findings)

  24. arXiv:2108.01182

    cs.CL

    Underreporting of errors in NLG output, and what to do about it

    Authors: Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson, Luou Wen

    Abstract: We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses that are exhibited by 'state-of-the-art' research. Ne…

    Submitted 8 August, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

    Comments: Prefinal version, accepted for publication in the Proceedings of the 14th International Conference on Natural Language Generation (INLG 2021, Aberdeen). Comments welcome

  25. arXiv:2106.05580

    cs.CL

    AGGGEN: Ordering and Aggregating while Generating

    Authors: Xinnuo Xu, Ondřej Dušek, Verena Rieser, Ioannis Konstas

    Abstract: We present AGGGEN (pronounced 'again'), a data-to-text model which re-introduces two explicit sentence planning stages into neural data-to-text systems: input ordering and input aggregation. In contrast to previous work using sentence planning, our model is still end-to-end: AGGGEN performs sentence planning at the same time as generating text by learning latent alignments (via semantic facts) bet…

    Submitted 17 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Correct the first citation in the Zero-shot Few-shot scenarios paragraph in Section 7

    Journal ref: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL2021)

  26. arXiv:2106.05555

    cs.CL

    Shades of BLEU, Flavours of Success: The Case of MultiWOZ

    Authors: Tomáš Nekvinda, Ondřej Dušek

    Abstract: The MultiWOZ dataset (Budzianowski et al., 2018) is frequently used for benchmarking context-to-response abilities of task-oriented dialogue systems. In this work, we identify inconsistencies in data preprocessing and reporting of three corpus-based metrics used on this dataset, i.e., BLEU score and Inform & Success rates. We point out a few problems of the MultiWOZ benchmark such as unsatisfactory…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted to GEM Workshop at ACL 2021; for the source files, see https://github.com/Tomiinek/MultiWOZ_Evaluation

  27. arXiv:2102.08892

    cs.CL cs.HC

    THEaiTRE 1.0: Interactive generation of theatre play scripts

    Authors: Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka

    Abstract: We present the first version of a system for interactive generation of theatre play scripts. The system is based on a vanilla GPT-2 model with several adjustments, targeting specific issues we encountered in practice. We also list other issues we encountered but plan to only solve in a future version of the system. The presented system was used to generate a theatre play script planned for premier…

    Submitted 17 February, 2021; originally announced February 2021.

    Comments: Submitted to Text2Story workshop 2021

    Journal ref: Proc. Text2Story (2021) 71-76

  28. AuGPT: Auxiliary Tasks and Data Augmentation for End-To-End Dialogue with Pre-Trained Language Models

    Authors: Jonáš Kulhánek, Vojtěch Hudeček, Tomáš Nekvinda, Ondřej Dušek

    Abstract: Attention-based pre-trained language models such as GPT-2 brought considerable progress to end-to-end dialogue modelling. However, they also present considerable risks for task-oriented dialogue, such as lack of knowledge grounding or diversity. To address these issues, we introduce modified training objectives for language model finetuning, and we employ massive data augmentation via back-transla…

    Submitted 14 January, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

    Journal ref: Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI (2021), 198-210

  29. arXiv:2102.01672

    cs.CL cs.AI cs.LG

    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

    Authors: Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, et al. (31 additional authors not shown)

    Abstract: We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it…

    Submitted 1 April, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

  30. arXiv:2011.10819

    cs.CL

    Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference

    Authors: Ondřej Dušek, Zdeněk Kasner

    Abstract: A major challenge in evaluating data-to-text (D2T) generation is measuring the semantic accuracy of the generated text, i.e. checking if the output text contains all and only facts supported by the input data. We propose a new metric for evaluating the semantic accuracy of D2T generation based on a neural model pretrained for natural language inference (NLI). We use the NLI model to check textual…

    Submitted 21 November, 2020; originally announced November 2020.

    Comments: Accepted as a short paper for INLG 2020

  31. arXiv:2011.01694

    cs.CL

    Data-to-Text Generation with Iterative Text Editing

    Authors: Zdeněk Kasner, Ondřej Dušek

    Abstract: We present a novel approach to data-to-text generation based on iterative text editing. Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency. To this end, we first transform data items to text using trivial templates, and t…

    Submitted 28 January, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted for INLG 2020

    ACM Class: I.2.7

  32. arXiv:2008.03802

    eess.AS cs.CL cs.LG cs.SD

    SpeedySpeech: Efficient Neural Speech Synthesis

    Authors: Jan Vainer, Ondřej Dušek

    Abstract: While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesis, there has not been a system capable of fast training, fast inference and high-quality audio synthesis at the same time. We propose a student-teacher network capable of high-quality faster-than-real-time spectrogram synthesis, with low requirements on computational resources and fast training time…

    Submitted 9 August, 2020; originally announced August 2020.

    Comments: 5 pages, 3 figures, Interspeech 2020

  33. arXiv:2008.00768

    eess.AS cs.CL cs.LG

    One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech

    Authors: Tomáš Nekvinda, Ondřej Dušek

    Abstract: We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches. Our model is based on Tacotron 2 with a fully convolutional input text encoder whose weights are predicted by a separate parameter generator network.…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted to INTERSPEECH 2020; for the source files, see https://github.com/Tomiinek/Multilingual_Text_to_Speech

  34. arXiv:2006.14668

    cs.CL

    THEaiTRE: Artificial Intelligence to Write a Theatre Play

    Authors: Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák, Martina Kinská, Josef Doležal, Klára Vosecká

    Abstract: We present THEaiTRE, a starting project aimed at automatic generation of theatre play scripts. This paper reviews related work and drafts an approach we intend to follow. We plan to adopt generative neural language models and hierarchical generation approaches, supported by summarization and machine translation methods, and complemented with a human-in-the-loop approach.

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: accepted to AI4Narratives2020

    Journal ref: Proc. AI4Narratives (2020) 9-13

  35. arXiv:1911.03905

    cs.CL

    Semantic Noise Matters for Neural Natural Language Generation

    Authors: Ondřej Dušek, David M. Howcroft, Verena Rieser

    Abstract: Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency. W…

    Submitted 10 November, 2019; originally announced November 2019.

    Comments: In Proceedings of INLG 2019, Tokyo, Japan

    ACM Class: I.2.7

  36. arXiv:1910.05298

    cs.CL

    Neural Generation for Czech: Data and Baselines

    Authors: Ondřej Dušek, Filip Jurčíček

    Abstract: We present the first dataset targeted at end-to-end NLG in Czech in the restaurant domain, along with several strong baseline models using the sequence-to-sequence approach. While non-English NLG is under-explored in general, Czech, as a morphologically rich language, makes the task even harder: Since Czech requires inflecting named entities, delexicalization or copy mechanisms do not work out-of-…

    Submitted 11 October, 2019; originally announced October 2019.

    Comments: Accepted as a long paper at INLG 2019

    ACM Class: I.2.7

  37. arXiv:1910.04731

    cs.CL

    Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)

    Authors: Ondřej Dušek, Karin Sevegnani, Ioannis Konstas, Verena Rieser

    Abstract: We present a recurrent neural network based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs. The latter is trained using pairwise hinge loss over scores from two copies of the rating network. We use learning to rank and synthetic d…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: Accepted as a short paper at INLG 2019

    ACM Class: I.2.7

  38. arXiv:1909.02965

    cs.CL cs.AI

    User Evaluation of a Multi-dimensional Statistical Dialogue System

    Authors: Simon Keizer, Ondřej Dušek, Xingkun Liu, Verena Rieser

    Abstract: We present the first complete spoken dialogue system driven by a multi-dimensional statistical dialogue manager. This framework has been shown to substantially reduce data needs by leveraging domain-independent dimensions, such as social obligations or feedback, which (as we show) can be transferred between domains. In this paper, we conduct a user study and show that the performance of a multi-di…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: SIGdial 2019

  39. Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge

    Authors: Ondřej Dušek, Jekaterina Novikova, Verena Rieser

    Abstract: This paper provides a comprehensive analysis of the first shared task on End-to-End Natural Language Generation (NLG) and identifies avenues for future research based on the results. This shared task aimed to assess whether recent end-to-end NLG systems can generate more complex output by learning from datasets containing higher lexical richness, syntactic complexity and diverse discourse phenomen…

    Submitted 24 July, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: Computer Speech and Language, final accepted manuscript (in press)

    ACM Class: I.2.7

  40. Neural Response Ranking for Social Conversation: A Data-Efficient Approach

    Authors: Igor Shalyminov, Ondřej Dušek, Oliver Lemon

    Abstract: The overall objective of 'social' dialogue systems is to support engaging, entertaining, and lengthy conversations on a wide variety of topics, including social chit-chat. Apart from raw dialogue data, user-provided ratings are the most common signal used to train such systems to produce engaging responses. In this paper we show that social dialogue systems can be trained effectively from raw unan…

    Submitted 2 November, 2018; originally announced November 2018.

    Comments: 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI. Brussels, Belgium, October 31, 2018

    ACM Class: I.2.7

    Journal ref: Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 1-8. ISBN 978-1-948087-75-9

  41. arXiv:1810.11955  [pdf, other]

    cs.CL

    Improving Context Modelling in Multimodal Dialogue Generation

    Authors: Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser

    Abstract: In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of…

    Submitted 20 October, 2018; originally announced October 2018.

    Journal ref: Proceedings of the 11th International Conference on Natural Language Generation, pages 129-134, Tilburg, The Netherlands, 2018

  42. arXiv:1810.11954  [pdf, other]

    cs.CL cs.AI

    A Knowledge-Grounded Multimodal Search-Based Conversational Agent

    Authors: Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser

    Abstract: Multimodal search-based dialogue is a challenging new task: It extends visually grounded question answering systems into multi-turn conversations with access to an external database. We address this new challenge by learning a neural response generation system from the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded multimodal conversation…

    Submitted 20 October, 2018; originally announced October 2018.

    Journal ref: Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 59-66, Brussels, Belgium, October 2018

  43. arXiv:1810.01170  [pdf, other]

    cs.CL

    Findings of the E2E NLG Challenge

    Authors: Ondřej Dušek, Jekaterina Novikova, Verena Rieser

    Abstract: This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems. Recent end-to-end generation systems are promising since they reduce the need for data annotation. However, they are currently limited to small, delexicalised datasets. The E2E NLG shared task aims to assess whether these novel approach…

    Submitted 2 October, 2018; originally announced October 2018.

    Comments: Accepted to INLG 2018

    Journal ref: Proceedings of the 11th International Conference on Natural Language Generation, pages 322-328, Tilburg, The Netherlands, November 2018

  44. arXiv:1809.06873  [pdf, other]

    cs.CL

    Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity

    Authors: Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser

    Abstract: We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically cohe…

    Submitted 18 September, 2018; originally announced September 2018.

    Journal ref: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3981-3991, Brussels, Belgium, November 2018

  45. RankME: Reliable Human Ratings for Natural Language Generation

    Authors: Jekaterina Novikova, Ondřej Dušek, Verena Rieser

    Abstract: Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relat…

    Submitted 15 March, 2018; originally announced March 2018.

    Comments: Accepted to NAACL 2018 (The 2018 Conference of the North American Chapter of the Association for Computational Linguistics)

    Journal ref: Proceedings of NAACL-HLT 2018, pages 72-78, New Orleans, Louisiana, June 1-6, 2018

  46. arXiv:1712.07558  [pdf, other]

    cs.CL

    An Ensemble Model with Ranking for Social Dialogue

    Authors: Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, Oliver Lemon

    Abstract: Open-domain social dialogue is one of the long-standing goals of Artificial Intelligence. This year, the Amazon Alexa Prize challenge was announced for the first time, where real customers get to rate systems developed by leading universities worldwide. The aim of the challenge is to converse "coherently and engagingly with humans on popular topics for 20 minutes". We describe our Alexa Prize syst…

    Submitted 20 December, 2017; originally announced December 2017.

    Comments: NIPS 2017 Workshop on Conversational AI

  47. arXiv:1708.01759  [pdf, other]

    cs.CL

    Referenceless Quality Estimation for Natural Language Generation

    Authors: Ondřej Dušek, Jekaterina Novikova, Verena Rieser

    Abstract: Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output. In this paper, we propose a referenceless quality estimation (QE) approach based on recurrent neural networks, which predicts a quality score for a NLG system output by comparing it to the source meaning representation only. Our method out…

    Submitted 5 August, 2017; originally announced August 2017.

    Comments: Accepted as a regular paper to 1st Workshop on Learning to Generate Natural Language (LGNL), Sydney, 10 August 2017

    ACM Class: I.2.7

  48. Why We Need New Evaluation Metrics for NLG

    Authors: Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser

    Abstract: The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgements of system outputs as generated by data-driven, e…

    Submitted 21 July, 2017; originally announced July 2017.

    Comments: accepted to EMNLP 2017

    Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2231-2242, Copenhagen, Denmark, September 7-11, 2017

  49. arXiv:1706.09433  [pdf, ps, other]

    cs.CL

    Data-driven Natural Language Generation: Paving the Road to Success

    Authors: Jekaterina Novikova, Ondřej Dušek, Verena Rieser

    Abstract: We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, more r…

    Submitted 28 June, 2017; originally announced June 2017.

    Comments: WiNLP workshop at ACL 2017

  50. arXiv:1706.09254  [pdf, other]

    cs.CL

    The E2E Dataset: New Challenges For End-to-End Generation

    Authors: Jekaterina Novikova, Ondřej Dušek, Verena Rieser

    Abstract: This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from t…

    Submitted 6 July, 2017; v1 submitted 28 June, 2017; originally announced June 2017.

    Comments: Accepted as a short paper for SIGDIAL 2017 (final submission including supplementary material)

    ACM Class: I.2.7

    Journal ref: Proceedings of the SIGDIAL 2017 Conference, pages 201-206, Saarbrücken, Germany, 15-17 August 2017