Showing 1–39 of 39 results for author: Kreutzer, J

  1. arXiv:2407.02552  [pdf, other]

    cs.CL cs.AI cs.LG

    RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

    Authors: John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker

    Abstract: Preference optimization techniques have become a standard final stage for training state-of-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to-date has focused on first-class citizen languages like English and Chinese. This captures a small fraction of the languages in the world, but also makes it unclear which aspects of current state-of-the-art r…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.01490  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives

    Authors: Luísa Shimabucoro, Sebastian Ruder, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

    Abstract: The widespread adoption of synthetic data raises new questions about how models generating the data can influence other large language models (LLMs) via distilled data. To start, our work exhaustively characterizes the impact of passive inheritance of model properties by systematically studying the consequences of synthetic data integration. We provide one of the most comprehensive studies to-date…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2406.18682  [pdf, other]

    cs.CL cs.AI cs.LG

    The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

    Authors: Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

    Abstract: A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally, preference training and safety measures often overfit to harms common in Western-centric datasets. Here, we explore the viability of different alignment approaches…

    Submitted 8 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2405.19462  [pdf, other]

    cs.CL

    Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning

    Authors: Everlyn Asiko Chimoto, Jay Gala, Orevaoghene Ahia, Julia Kreutzer, Bruce A. Bassett, Sara Hooker

    Abstract: Neural Machine Translation models are extremely data and compute-hungry. However, not all data points contribute equally to model training and generalization. Data pruning to remove the low-value data points has the benefit of drastically reducing the compute budget without significant drop in model performance. In this paper, we propose a new data pruning technique: Checkpoints Across Time (CAT),…

    Submitted 21 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 Findings

  5. arXiv:2405.15032  [pdf, other]

    cs.CL

    Aya 23: Open Weight Releases to Further Multilingual Progress

    Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

    Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  6. arXiv:2402.14740  [pdf, other]

    cs.LG

    Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

    Authors: Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

    Abstract: AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent literature as the canonical method for the RL part of RLHF. However, it involves both high computational cost and sensitive hyperparameter tuning. We posit that mos…

    Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 27 pages, 7 figures, 2 tables

    ACM Class: I.2.7

  7. arXiv:2402.07827  [pdf, other]

    cs.CL

    Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

    Authors: Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, Sara Hooker

    Abstract: Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced. Aya outperforms mT0 and BLOOM…

    Submitted 12 February, 2024; originally announced February 2024.

  8. arXiv:2402.06619  [pdf, other]

    cs.CL cs.AI

    Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

    Authors: Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura OMahony, Mike Zhang, Ramith Hettiarachchi, Joseph Wilson, Marina Machado, Luisa Souza Moura, Dominik Krzemiński, Hakimeh Fadaei, Irem Ergün, Ifeoma Okoh, Aisha Alaagib, Oshan Mudannayake, Zaid Alyafeai, Vu Minh Chien, Sebastian Ruder, Surya Guthikonda , et al. (8 additional authors not shown)

    Abstract: Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets.…

    Submitted 9 February, 2024; originally announced February 2024.

  9. arXiv:2312.00854  [pdf, other]

    physics.med-ph cs.AI cs.LG math.NA stat.CO

    A Probabilistic Neural Twin for Treatment Planning in Peripheral Pulmonary Artery Stenosis

    Authors: John D. Lee, Jakob Richter, Martin R. Pfaller, Jason M. Szafron, Karthik Menon, Andrea Zanoni, Michael R. Ma, Jeffrey A. Feinstein, Jacqueline Kreutzer, Alison L. Marsden, Daniele E. Schiavazzi

    Abstract: The substantial computational cost of high-fidelity models in numerical hemodynamics has, so far, relegated their use mainly to offline treatment planning. New breakthroughs in data-driven architectures and optimization techniques for fast surrogate modeling provide an exciting opportunity to overcome these limitations, enabling the use of such technology for time-critical decisions. We discuss an…

    Submitted 1 December, 2023; originally announced December 2023.

  10. Intriguing Properties of Compression on Multilingual Models

    Authors: Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer

    Abstract: Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingu…

    Submitted 25 November, 2022; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: Accepted to EMNLP 2022

  11. arXiv:2210.02545  [pdf, other]

    cs.CL cs.SD eess.AS

    JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT

    Authors: Mayumi Ohta, Julia Kreutzer, Stefan Riezler

    Abstract: JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch, seeking simplicity and accessibility. JoeyS2T's workflow is self-contained, starting from data pre-processing, over model training and prediction to evaluation, and is seamlessly integr…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 demo track

  12. arXiv:2205.03983  [pdf, other]

    cs.CL cs.AI cs.LG

    Building Machine Translation Systems for the Next Thousand Languages

    Authors: Ankur Bapna, Isaac Caswell, Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, Macduff Hughes

    Abstract: In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across over one thousand languages. We describe results in three research domains: (i) Building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques; (ii) Developing…

    Submitted 6 July, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: V2: updated with some details from 24-language Google Translate launch in May 2022 V3: spelling corrections, additional acknowledgements

  13. arXiv:2205.02022  [pdf, other]

    cs.CL

    A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation

    Authors: David Ifeoluwa Adelani, Jesujoba Oluwadara Alabi, Angela Fan, Julia Kreutzer, Xiaoyu Shen, Machel Reid, Dana Ruiter, Dietrich Klakow, Peter Nabende, Ernie Chang, Tajuddeen Gwadabe, Freshia Sackey, Bonaventure F. P. Dossou, Chris Chinenye Emezue, Colin Leong, Michael Beukman, Shamsuddeen Hassan Muhammad, Guyo Dub Jarso, Oreen Yousuf, Andre Niyongabo Rubungo, Gilles Hacheme, Eric Peter Wairagala, Muhammad Umair Nasir, Benjamin Ayoade Ajibade, Tunde Oluwaseyi Ajayi , et al. (20 additional authors not shown)

    Abstract: Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models…

    Submitted 22 August, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted to NAACL 2022 (added evaluation data for amh, kin, nya, sna, xho)

  14. arXiv:2112.08570  [pdf, other]

    cs.CL

    Can Multilinguality benefit Non-autoregressive Machine Translation?

    Authors: Sweta Agrawal, Julia Kreutzer, Colin Cherry

    Abstract: Non-autoregressive (NAR) machine translation has recently achieved significant improvements, and now outperforms autoregressive (AR) models on some benchmarks, providing an efficient alternative to AR inference. However, while AR translation is often implemented using multilingual models that benefit from transfer between languages and from improved serving efficiency, multilingual NAR models rema…

    Submitted 15 December, 2021; originally announced December 2021.

  15. arXiv:2110.06997  [pdf, other]

    cs.CL cs.AI

    Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits

    Authors: Julia Kreutzer, David Vilar, Artem Sokolov

    Abstract: Training data for machine translation (MT) is often sourced from a multitude of large corpora that are multi-faceted in nature, e.g. containing contents from multiple domains or different levels of quality or complexity. Naturally, these facets do not occur with equal frequency, nor are they equally important for the test scenario at hand. In this work, we propose to optimize this balance jointly…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: EMNLP Findings 2021

  16. arXiv:2110.03036  [pdf, other]

    cs.CL cs.AI

    The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation

    Authors: Orevaoghene Ahia, Julia Kreutzer, Sara Hooker

    Abstract: A "bigger is better" explosion in the number of parameters in deep neural networks has made it increasingly challenging to make state-of-the-art networks accessible in compute-restricted environments. Compression techniques have taken on renewed importance as a way to bridge the gap. However, evaluation of the trade-offs incurred by popular compression techniques has been centered on high-resource…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Accepted to Findings of EMNLP 2021

  17. arXiv:2109.06262  [pdf, other]

    cs.CL

    Evaluating Multiway Multilingual NMT in the Turkic Languages

    Authors: Jamshidbek Mirzakhalov, Anoop Babu, Aigiz Kunafin, Ahsan Wahab, Behzod Moydinboyev, Sardana Ivanova, Mokhiyakhon Uzokova, Shaxnoza Pulatova, Duygu Ataman, Julia Kreutzer, Francis Tyers, Orhan Firat, John Licato, Sriram Chellappan

    Abstract: Despite the increasing number of large and comprehensive machine translation (MT) systems, evaluation of these methods in various languages has been restrained by the lack of high-quality parallel corpora as well as engagement with the people that speak these languages. In this study, we present an evaluation of state-of-the-art approaches to training and evaluating MT systems in 22 languages from…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: 9 pages, 3 figures, 7 tables. To be presented at WMT 2021

  18. arXiv:2107.11353  [pdf, other]

    cs.CL

    Modelling Latent Translations for Cross-Lingual Transfer

    Authors: Edoardo Maria Ponti, Julia Kreutzer, Ivan Vulić, Siva Reddy

    Abstract: While achieving state-of-the-art results in multiple tasks and languages, translation-based cross-lingual transfer is often overlooked in favour of massively multilingual pre-trained encoders. Arguably, this is due to its main limitations: 1) translation errors percolating to the classification phase and 2) the insufficient expressiveness of the maximum-likelihood translation. To remedy this, we p…

    Submitted 23 July, 2021; originally announced July 2021.

  19. Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation

    Authors: Samuel Kiegeland, Julia Kreutzer

    Abstract: Policy gradient algorithms have found wide adoption in NLP, but have recently become subject to criticism, doubting their suitability for NMT. Choshen et al. (2020) identify multiple weaknesses and suspect that their success is determined by the shape of output distributions rather than the reward. In this paper, we revisit these claims and study them under a wider range of configurations. Our exp…

    Submitted 16 June, 2021; originally announced June 2021.

    Journal ref: North American Chapter of the Association for Computational Linguistics, 2021, 1673-1681

  20. Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

    Authors: Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller , et al. (27 additional authors not shown)

    Abstract: With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have system…

    Submitted 21 February, 2022; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted at TACL; pre-MIT Press publication version

    Journal ref: Transactions of the Association for Computational Linguistics (2022) 10: 50-72

  21. arXiv:2103.11811  [pdf]

    cs.CL cs.AI

    MasakhaNER: Named Entity Recognition for African Languages

    Authors: David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi , et al. (36 additional authors not shown)

    Abstract: We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We…

    Submitted 5 July, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted to TACL 2021, pre-MIT Press publication version

  22. arXiv:2011.05284  [pdf, other]

    cs.CL

    Neural Machine Translation for Extremely Low-Resource African Languages: A Case Study on Bambara

    Authors: Allahsera Auguste Tapo, Bakary Coulibaly, Sébastien Diarra, Christopher Homan, Julia Kreutzer, Sarah Luger, Arthur Nagashima, Marcos Zampieri, Michael Leventhal

    Abstract: Low-resource languages present unique challenges to (neural) machine translation. We discuss the case of Bambara, a Mande language for which training data is scarce and requires significant amounts of pre-processing. More than the linguistic situation of Bambara itself, the socio-cultural context within which Bambara speakers live poses challenges for automated processing of this language. In this…

    Submitted 10 November, 2020; originally announced November 2020.

  23. arXiv:2011.02511  [pdf, ps, other]

    cs.CL cs.LG

    Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks

    Authors: Julia Kreutzer, Stefan Riezler, Carolin Lawrence

    Abstract: Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such interaction logs in an offline reinforcement learning (RL) setting is a promising approach. However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview…

    Submitted 9 June, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: 5th Workshop on Structured Prediction for NLP at ACL 2021 Previously named "Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP" and presented at Challenges of Real-World RL Workshop at NeurIPS 2020

  24. arXiv:2010.12174  [pdf, other]

    cs.CL cs.IR cs.LG

    KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi

    Authors: Rubungo Andre Niyongabo, Hong Qu, Julia Kreutzer, Li Huang

    Abstract: Recent progress in text classification has been focused on high-resource languages such as English and Chinese. For low-resource languages, amongst them most African languages, the lack of well-annotated data and effective preprocessing, is hindering the progress and the transfer of successful methods. In this paper, we introduce two news datasets (KINNEWS and KIRNEWS) for multi-class classificati…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: COLING 2020

    ACM Class: I.2.7

  25. arXiv:2010.02353  [pdf, other]

    cs.CL cs.AI cs.LG

    Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

    Authors: Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Hassan Muhammad, Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer , et al. (23 additional authors not shown)

    Abstract: Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communicat…

    Submitted 6 November, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Findings of EMNLP 2020; updated benchmarks

  26. arXiv:2010.02352  [pdf, other]

    cs.CL

    Inference Strategies for Machine Translation with Conditional Masking

    Authors: Julia Kreutzer, George Foster, Colin Cherry

    Abstract: Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance,…

    Submitted 20 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020, updated Fig 3

  27. arXiv:2004.11222  [pdf, other]

    cs.CL

    Correct Me If You Can: Learning from Error Corrections and Markings

    Authors: Julia Kreutzer, Nathaniel Berger, Stefan Riezler

    Abstract: Sequence-to-sequence learning involves a trade-off between signal strength and annotation cost of training data. For example, machine translation data range from costly expert-generated translations that enable supervised learning, to weak quality-judgment feedback that facilitate reinforcement learning. We present the first user study on annotation cost and machine learnability for the less popul…

    Submitted 23 April, 2020; originally announced April 2020.

    Comments: To appear at EAMT 2020 (Research Track)

  28. arXiv:2004.04418  [pdf, other]

    cs.CL cs.LG

    On Optimal Transformer Depth for Low-Resource Language Translation

    Authors: Elan van Biljon, Arnu Pretorius, Julia Kreutzer

    Abstract: Transformers have shown great promise as an approach to Neural Machine Translation (NMT) for low-resource languages. However, at the same time, transformer models remain difficult to optimize and require careful tuning of hyper-parameters to be useful in this setting. Many NMT toolkits come with a set of default hyper-parameters, which researchers and practitioners often adopt for the sake of conv…

    Submitted 14 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

  29. arXiv:2003.11529  [pdf, other]

    cs.CL

    Masakhane -- Machine Translation For Africa

    Authors: Iroro Orife, Julia Kreutzer, Blessing Sibanda, Daniel Whitenack, Kathleen Siminyu, Laura Martinus, Jamiil Toure Ali, Jade Abbott, Vukosi Marivate, Salomon Kabongo, Musie Meressa, Espoir Murhabazi, Orevaoghene Ahia, Elan van Biljon, Arshath Ramkilowan, Adewale Akinfaderin, Alp Öktem, Wole Akin, Ghollah Kioko, Kevin Degila, Herman Kamper, Bonaventure Dossou, Chris Emezue, Kelechi Ogueji, Abdallah Bashir

    Abstract: Africa has over 2000 languages. Despite this, African languages account for a small portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including: a lack of focus from government and funding, discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers and no benchmarks to compare techniques. To…

    Submitted 13 March, 2020; originally announced March 2020.

    Comments: Accepted for the AfricaNLP Workshop, ICLR 2020

  30. arXiv:1907.12484  [pdf, other]

    cs.CL cs.LG

    Joey NMT: A Minimalist NMT Toolkit for Novices

    Authors: Julia Kreutzer, Jasmijn Bastings, Stefan Riezler

    Abstract: We present Joey NMT, a minimalist neural machine translation toolkit based on PyTorch that is specifically designed for novices. Joey NMT provides many popular NMT features in a small and simple code base, so that novices can easily and quickly learn to use it and adapt it to their needs. Despite its focus on simplicity, Joey NMT supports classic architectures (RNNs, transformers), fast beam searc…

    Submitted 18 June, 2020; v1 submitted 29 July, 2019; originally announced July 2019.

    Journal ref: EMNLP-IJCNLP 2019

  31. arXiv:1907.05190  [pdf, other]

    cs.CL stat.ML

    Self-Regulated Interactive Sequence-to-Sequence Learning

    Authors: Julia Kreutzer, Stefan Riezler

    Abstract: Not all types of supervision signals are created equal: Different types of feedback have different costs and effects on learning. We show how self-regulation strategies that decide when to ask for which kind of feedback from a teacher (or from oneself) can be cast as a learning-to-learn problem leading to improved cost-aware sequence-to-sequence learning. In experiments on interactive neural machi…

    Submitted 31 October, 2019; v1 submitted 11 July, 2019; originally announced July 2019.

    Comments: ACL 2019

  32. arXiv:1810.01480  [pdf, other]

    cs.CL stat.ML

    Learning to Segment Inputs for NMT Favors Character-Level Processing

    Authors: Julia Kreutzer, Artem Sokolov

    Abstract: Most modern neural machine translation (NMT) systems rely on presegmented inputs. Segmentation granularity importantly determines the input and output sequence lengths, hence the modeling depth, and source and target vocabularies, which in turn determine model size, computational costs of softmax normalization, and handling of out-of-vocabulary words. However, the current practice is to use static…

    Submitted 5 November, 2018; v1 submitted 2 October, 2018; originally announced October 2018.

    Comments: Technical report for IWSLT 2018 paper

  33. arXiv:1806.04402  [pdf, other]

    cs.CL

    Explaining and Generalizing Back-Translation through Wake-Sleep

    Authors: Ryan Cotterell, Julia Kreutzer

    Abstract: Back-translation has become a commonly employed heuristic for semi-supervised neural machine translation. The technique is both straightforward to apply and has led to state-of-the-art results. In this work, we offer a principled interpretation of back-translation as approximate inference in a generative model of bitext and show how the standard implementation of back-translation corresponds to a…

    Submitted 12 June, 2018; originally announced June 2018.

  34. arXiv:1805.10627  [pdf, other]

    cs.CL stat.ML

    Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning

    Authors: Julia Kreutzer, Joshua Uyheng, Stefan Riezler

    Abstract: We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT). We investigate the reliability of human bandit feedback, and analyze the influence of reliability on the learnability of a reward estimator, and the effect of the quality of reward estimates on the overall RL task. Our a…

    Submitted 13 December, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  35. arXiv:1805.01553  [pdf, other]

    cs.CL stat.ML

    A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation

    Authors: Tsz Kin Lam, Julia Kreutzer, Stefan Riezler

    Abstract: We present an approach to interactive-predictive neural machine translation that attempts to reduce human effort from three directions: Firstly, instead of requiring humans to select, correct, or delete segments, we employ the idea of learning from human reinforcements in form of judgments on the quality of partial translations. Secondly, human effort is further reduced by using the entropy of wor…

    Submitted 5 June, 2018; v1 submitted 3 May, 2018; originally announced May 2018.

    Comments: Published at EAMT 2018; Updated algorithm

  36. arXiv:1804.05958  [pdf, other]

    cs.CL stat.ML

    Can Neural Machine Translation be Improved with User Feedback?

    Authors: Julia Kreutzer, Shahram Khadivi, Evgeny Matusov, Stefan Riezler

    Abstract: We present the first real-world application of methods for improving neural machine translation (NMT) with human reinforcement, based on explicit and implicit user feedback collected on the eBay e-commerce platform. Previous work has been confined to simulation experiments, whereas in this paper we work with real logged feedback for offline bandit learning of NMT parameters. We conduct a thorough…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: Accepted at NAACL-HLT 2018 (Industry Track)

  37. arXiv:1707.09050  [pdf, other]

    cs.CL stat.ML

    A Shared Task on Bandit Learning for Machine Translation

    Authors: Artem Sokolov, Julia Kreutzer, Kellen Sunderland, Pavel Danchenko, Witold Szymaniak, Hagen Fürstenau, Stefan Riezler

    Abstract: We introduce and describe the results of a novel shared task on bandit learning for machine translation. The task was organized jointly by Amazon and Heidelberg University for the first time at the Second Conference on Machine Translation (WMT 2017). The goal of the task is to encourage research on learning machine translation from weak user feedback instead of human references or post-edits. On e…

    Submitted 27 July, 2017; originally announced July 2017.

    Comments: Conference on Machine Translation (WMT) 2017

  38. arXiv:1704.06497  [pdf, other]

    stat.ML cs.CL cs.LG

    Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

    Authors: Julia Kreutzer, Artem Sokolov, Stefan Riezler

    Abstract: Bandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation to a predicted output structure, without having access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attention-b…

    Submitted 13 December, 2018; v1 submitted 21 April, 2017; originally announced April 2017.

    Comments: ACL 2017

  39. arXiv:1606.00739  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Stochastic Structured Prediction under Bandit Feedback

    Authors: Artem Sokolov, Julia Kreutzer, Christopher Lo, Stefan Riezler

    Abstract: Stochastic structured prediction under bandit feedback follows a learning protocol where on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analy…

    Submitted 2 November, 2016; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain