Showing 1–31 of 31 results for author: Cherry, C

  1. arXiv:2407.10456

    cs.CL

    Don't Throw Away Data: Better Sequence Knowledge Distillation

    Authors: Jun Wang, Eleftheria Briakou, Hamid Dadkhahi, Rishabh Agarwal, Colin Cherry, Trevor Cohn

    Abstract: A critical component in knowledge distillation is the means of coupling the teacher and student. The predominant sequence knowledge distillation method involves supervised learning of the student against teacher-decoded outputs, and is exemplified by the current state of the art, which incorporates minimum Bayes risk (MBR) decoding. In this paper we seek to integrate MBR more tightly in distillati…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2402.17193

    cs.CL cs.LG

    When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

    Authors: Biao Zhang, Zhongtao Liu, Colin Cherry, Orhan Firat

    Abstract: While large language models (LLMs) often adopt finetuning to unlock their capabilities for downstream applications, our understanding on the inductive biases (especially the scaling properties) of different finetuning methods is still limited. To fill this gap, we conduct systematic experiments studying whether and how different scaling factors, including LLM model size, pretraining data size, new…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: ICLR24

  3. arXiv:2401.01419

    cs.CL

    To Diverge or Not to Diverge: A Morphosyntactic Perspective on Machine Translation vs Human Translation

    Authors: Jiaming Luo, Colin Cherry, George Foster

    Abstract: We conduct a large-scale fine-grained comparative analysis of machine translations (MT) against human translations (HT) through the lens of morphosyntactic divergence. Across three language pairs and two types of divergence defined as the structural difference between the source and the target, MT is consistently more conservative than HT, with less morphosyntactic diversity, more convergent patte…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: TACL, pre-MIT Press publication version

  4. arXiv:2310.06707

    cs.CL cs.AI

    Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model

    Authors: Christian Tomani, David Vilar, Markus Freitag, Colin Cherry, Subhajit Naskar, Mara Finkelstein, Xavier Garcia, Daniel Cremers

    Abstract: Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, with better translations getting assigned a higher score by the model. However, research has shown that this assumption does not always hold, and generation quality can be improved by deco…

    Submitted 11 July, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  5. XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

    Authors: Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson, et al. (2 additional authors not shown)

    Abstract: Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot;…

    Submitted 24 May, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  6. arXiv:2305.10403

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  7. arXiv:2305.10266

    cs.CL

    Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability

    Authors: Eleftheria Briakou, Colin Cherry, George Foster

    Abstract: Large, multilingual language models exhibit surprisingly good zero- or few-shot machine translation capabilities, despite having never seen the intentionally-included translation examples provided to typical neural translation systems. We investigate the role of incidental bilingualism -- the unintentional consumption of bilingual signals, including translation examples -- in explaining the transl…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023

  8. arXiv:2302.01398

    cs.CL

    The unreasonable effectiveness of few-shot learning for machine translation

    Authors: Xavier Garcia, Yamini Bansal, Colin Cherry, George Foster, Maxim Krikun, Fangxiaoyu Feng, Melvin Johnson, Orhan Firat

    Abstract: We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general…

    Submitted 2 February, 2023; originally announced February 2023.

  9. arXiv:2211.09102

    cs.CL

    Prompting PaLM for Translation: Assessing Strategies and Performance

    Authors: David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, George Foster

    Abstract: Large language models (LLMs) that have been trained on multilingual but not parallel text exhibit a remarkable ability to translate between languages. We probe this ability in an in-depth study of the pathways language model (PaLM), which has demonstrated the strongest machine translation (MT) performance among similarly-trained LLMs to date. We investigate various strategies for choosing translat…

    Submitted 25 June, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: ACL 2023

  10. arXiv:2203.13339

    cs.CL cs.LG cs.SD eess.AS

    Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation

    Authors: Ye Jia, Yifan Ding, Ankur Bapna, Colin Cherry, Yu Zhang, Alexis Conneau, Nobuyuki Morioka

    Abstract: End-to-end speech-to-speech translation (S2ST) without relying on intermediate text representations is a rapidly emerging frontier of research. Recent works have demonstrated that the performance of such direct S2ST systems is approaching that of conventional cascade S2ST when trained on comparable datasets. However, in practice, the performance of direct S2ST is bounded by the availability of pai…

    Submitted 27 June, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Interspeech 2022

  11. arXiv:2203.10752

    cs.CL

    XTREME-S: Evaluating Cross-lingual Speech Representations

    Authors: Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan Van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

    Abstract: We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as w…

    Submitted 13 April, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Minor fix: language code for Filipino (Tagalog), "tg" -> "tl"

  12. arXiv:2202.01994

    cs.LG cs.CL

    Data Scaling Laws in NMT: The Effect of Noise and Architecture

    Authors: Yamini Bansal, Behrooz Ghorbani, Ankush Garg, Biao Zhang, Maxim Krikun, Colin Cherry, Behnam Neyshabur, Orhan Firat

    Abstract: In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT). First, we establish that the test loss of encoder-decoder transformer models scales as a power law in the number of training samples, with a dependence on the model size. Then, we systematically vary aspects of the training setup to understand…

    Submitted 4 February, 2022; originally announced February 2022.

  13. arXiv:2202.01374

    cs.CL cs.LG

    mSLAM: Massively multilingual joint pre-training for speech and text

    Authors: Ankur Bapna, Colin Cherry, Yu Zhang, Ye Jia, Melvin Johnson, Yong Cheng, Simran Khanuja, Jason Riesa, Alexis Conneau

    Abstract: We present mSLAM, a multilingual Speech and LAnguage Model that learns cross-lingual cross-modal representations of speech and text by pre-training jointly on large amounts of unlabeled speech and text in multiple languages. mSLAM combines w2v-BERT pre-training on speech with SpanBERT pre-training on character-level text, along with Connectionist Temporal Classification (CTC) losses on paired spee…

    Submitted 2 February, 2022; originally announced February 2022.

  14. arXiv:2112.08570

    cs.CL

    Can Multilinguality benefit Non-autoregressive Machine Translation?

    Authors: Sweta Agrawal, Julia Kreutzer, Colin Cherry

    Abstract: Non-autoregressive (NAR) machine translation has recently achieved significant improvements, and now outperforms autoregressive (AR) models on some benchmarks, providing an efficient alternative to AR inference. However, while AR translation is often implemented using multilingual models that benefit from transfer between languages and from improved serving efficiency, multilingual NAR models rema…

    Submitted 15 December, 2021; originally announced December 2021.

  15. arXiv:2109.07740

    cs.LG cs.AI cs.CL

    Scaling Laws for Neural Machine Translation

    Authors: Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry

    Abstract: We present an empirical study of scaling properties of encoder-decoder Transformer models used in neural machine translation (NMT). We show that cross-entropy loss as a function of model size follows a certain scaling law. Specifically (i) We propose a formula which describes the scaling behavior of cross-entropy loss as a bivariate function of encoder and decoder size, and show that it gives accu…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: 31 pages, 23 figures

  16. arXiv:2104.05146

    cs.CL

    Assessing Reference-Free Peer Evaluation for Machine Translation

    Authors: Sweta Agrawal, George Foster, Markus Freitag, Colin Cherry

    Abstract: Reference-free evaluation has the potential to make machine translation evaluation substantially more scalable, allowing us to pivot easily to new languages or domains. It has been recently shown that the probabilities given by a large, multilingual model can achieve state of the art results when used as a reference-free metric. We experiment with various modifications to this model and demonstrat…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  17. arXiv:2010.11132

    cs.CL cs.LG cs.SD eess.AS

    Sentence Boundary Augmentation For Neural Machine Translation Robustness

    Authors: Daniel Li, Te I, Naveen Arivazhagan, Colin Cherry, Dirk Padfield

    Abstract: Neural Machine Translation (NMT) models have demonstrated strong state of the art performance on translation tasks where well-formed training and evaluation data are provided, but they remain sensitive to inputs that include errors of various types. Specifically, in the context of long-form speech translation systems, where the input transcripts come from Automatic Speech Recognition (ASR), the NM…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: 5 pages, 4 figures

  18. arXiv:2010.10245

    cs.CL cs.LG

    Human-Paraphrased References Improve Neural Machine Translation

    Authors: Markus Freitag, George Foster, David Grangier, Colin Cherry

    Abstract: Automatic evaluation comparing candidate translations to human-generated paraphrases of reference translations has recently been proposed by Freitag et al. When used in place of original references, the paraphrased versions produce metric scores that correlate better with human judgment. This effect holds for a variety of different automatic metrics, and tends to favor natural formulations over mo…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted at WMT 2020

  19. arXiv:2010.02352

    cs.CL

    Inference Strategies for Machine Translation with Conditional Masking

    Authors: Julia Kreutzer, George Foster, Colin Cherry

    Abstract: Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance,…

    Submitted 20 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020, updated Fig 3

  20. arXiv:2004.03643

    cs.CL

    Re-translation versus Streaming for Simultaneous Translation

    Authors: Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, George Foster

    Abstract: There has been great progress in improving streaming machine translation, a simultaneous paradigm where the system appends to a growing hypothesis as more source content becomes available. We study a related problem in which revisions to the hypothesis beyond strictly appending words are permitted. This is suitable for applications such as live captioning an audio feed. In this setting, we compare…

    Submitted 29 June, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: IWSLT 2020

  21. arXiv:1912.03393

    cs.CL cs.AI cs.LG

    Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation

    Authors: Naveen Arivazhagan, Colin Cherry, Te I, Wolfgang Macherey, Pallavi Baljekar, George Foster

    Abstract: We investigate the problem of simultaneous machine translation of long-form speech content. We target a continuous speech-to-text scenario, generating translated captions for a live audio feed, such as a lecture or play-by-play commentary. As this scenario allows for revisions to our incremental translations, we adopt a re-translation approach to simultaneous translation, where the source is repea…

    Submitted 7 April, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

    Comments: ICASSP 2020

  22. arXiv:1907.05019

    cs.CL cs.LG

    Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

    Authors: Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen, Yonghui Wu

    Abstract: We introduce our efforts towards building a universal neural machine translation (NMT) system capable of translating between any language pair. We set a milestone towards this goal by building a single massively multilingual NMT model handling 103 languages trained on over 25 billion examples. Our system demonstrates effective transfer learning ability, significantly improving translation quality…

    Submitted 11 July, 2019; originally announced July 2019.

  23. arXiv:1906.05218

    cs.CL

    Monotonic Infinite Lookback Attention for Simultaneous Machine Translation

    Authors: Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, Colin Raffel

    Abstract: Simultaneous machine translation begins to translate each source sentence before the source speaker is finished speaking, with applications to live and streaming scenarios. Simultaneous systems must carefully schedule their reading of the source sentence to balance quality against latency. We present the first simultaneous translation system to learn an adaptive schedule jointly with a neural mach…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: Accepted for publication at ACL 2019

  24. arXiv:1906.00048

    cs.CL

    Thinking Slow about Latency Evaluation for Simultaneous Machine Translation

    Authors: Colin Cherry, George Foster

    Abstract: Simultaneous machine translation attempts to translate a source sentence before it is finished being spoken, with applications to translation of spoken language for live streaming and conversation. Since simultaneous systems trade quality to reduce latency, having an effective and interpretable latency metric is crucial. We introduce a variant of the recently proposed Average Lagging (AL) metric,…

    Submitted 31 May, 2019; originally announced June 2019.

  25. arXiv:1903.00041

    cs.CL

    Reinforcement Learning based Curriculum Optimization for Neural Machine Translation

    Authors: Gaurav Kumar, George Foster, Colin Cherry, Maxim Krikun

    Abstract: We consider the problem of making efficient use of heterogeneous training data in neural machine translation (NMT). Specifically, given a training dataset with a sentence-level feature such as noise, we seek an optimal curriculum, or order for presenting examples to the system during training. Our curriculum framework allows examples to appear an arbitrary number of times, and thus generalizes dat…

    Submitted 28 February, 2019; originally announced March 2019.

    Comments: NAACL 2019 short paper. Reviewer comments not yet addressed

  26. arXiv:1902.08295

    cs.LG stat.ML

    Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

    Authors: Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, et al. (66 additional authors not shown)

    Abstract: Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly w…

    Submitted 21 February, 2019; originally announced February 2019.

  27. arXiv:1901.11528

    cs.HC cs.AI cs.CL cs.LG

    Shaping the Narrative Arc: An Information-Theoretic Approach to Collaborative Dialogue

    Authors: Kory W. Mathewson, Pablo Samuel Castro, Colin Cherry, George Foster, Marc G. Bellemare

    Abstract: We consider the problem of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives. In this task, the goal is to establish universe details, and to collaborate on an interesting story in that universe, through a series of natural dialogue exchanges. Our model can augment any probabilistic conversational agent by allowing i…

    Submitted 31 January, 2019; originally announced January 2019.

    Comments: 20 pages, 9 figures

  28. arXiv:1810.00428

    cs.LG cs.AI cs.CL stat.ML

    Efficient Sequence Labeling with Actor-Critic Training

    Authors: Saeed Najafi, Colin Cherry, Grzegorz Kondrak

    Abstract: Neural approaches to sequence labeling often use a Conditional Random Field (CRF) to model their output dependencies, while Recurrent Neural Networks (RNN) are used for the same purpose in other tasks. We set out to establish RNNs as an attractive alternative to CRFs for sequence labeling. To do so, we address one of the RNN's most prominent shortcomings, the fact that it is not exposed to its own…

    Submitted 30 September, 2018; originally announced October 2018.

  29. arXiv:1808.09943

    cs.CL

    Revisiting Character-Based Neural Machine Translation with Capacity and Compression

    Authors: Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, Wolfgang Macherey

    Abstract: Translating characters instead of words or word-fragments has the potential to simplify the processing pipeline for neural machine translation (NMT), and improve results by eliminating hyper-parameters and manual feature engineering. However, it results in longer sequences in which each symbol contains less information, creating both modeling and computational challenges. In this paper, we show th…

    Submitted 29 August, 2018; originally announced August 2018.

    Comments: To appear at EMNLP 2018

  30. arXiv:1704.07431

    cs.CL

    A Challenge Set Approach to Evaluating Machine Translation

    Authors: Pierre Isabelle, Colin Cherry, George Foster

    Abstract: Neural machine translation represents an exciting leap forward in translation quality. But what longstanding weaknesses does it resolve, and which remain? We address these questions with a challenge set approach to translation evaluation and error analysis. A challenge set consists of a small set of sentences, each hand-designed to probe a system's capacity to bridge a particular structural diverg…

    Submitted 28 August, 2017; v1 submitted 24 April, 2017; originally announced April 2017.

    Comments: EMNLP 2017. 28 pages, including appendix. Machine readable data included in a separate file. This version corrects typos in the challenge set

  31. arXiv:1704.05907

    cs.CL cs.LG cs.NE

    End-to-End Multi-View Networks for Text Classification

    Authors: Hongyu Guo, Colin Cherry, Jiang Su

    Abstract: We propose a multi-view network for text classification. Our method automatically creates various views of its input text, each taking the form of soft attention weights that distribute the classifier's focus among a set of base features. For a bag-of-words representation, each view focuses on a different subset of the text's words. Aggregating many such views results in a more discriminative and…

    Submitted 19 April, 2017; originally announced April 2017.

    Comments: 6 pages