Showing 1–21 of 21 results for author: Üstün, A

  1. arXiv:2407.03211  [pdf, other]

    cs.CL cs.LG

    How Does Quantization Affect Multilingual LLMs?

    Authors: Kelly Marchisio, Saurabh Dash, Hongyu Chen, Dennis Aumiller, Ahmet Üstün, Sara Hooker, Sebastian Ruder

    Abstract: Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantized LLMs on English tasks, none have examined the effect of quantization across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at varying scales. We use automa…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2407.02552  [pdf, other]

    cs.CL cs.AI cs.LG

    RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

    Authors: John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker

    Abstract: Preference optimization techniques have become a standard final stage for training state-of-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to-date has focused on first-class citizen languages like English and Chinese. This captures a small fraction of the languages in the world, but also makes it unclear which aspects of current state-of-the-art r…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2405.15032  [pdf, other]

    cs.CL

    Aya 23: Open Weight Releases to Further Multilingual Progress

    Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

    Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2402.14740  [pdf, other]

    cs.LG

    Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

    Authors: Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

    Abstract: AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent literature as the canonical method for the RL part of RLHF. However, it involves both high computational cost and sensitive hyperparameter tuning. We posit that mos…

    Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 27 pages, 7 figures, 2 tables

    ACM Class: I.2.7

  5. arXiv:2402.07827  [pdf, other]

    cs.CL

    Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

    Authors: Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, Sara Hooker

    Abstract: Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced. Aya outperforms mT0 and BLOOM…

    Submitted 12 February, 2024; originally announced February 2024.

  6. arXiv:2402.06619  [pdf, other]

    cs.CL cs.AI

    Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

    Authors: Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura OMahony, Mike Zhang, Ramith Hettiarachchi, Joseph Wilson, Marina Machado, Luisa Souza Moura, Dominik Krzemiński, Hakimeh Fadaei, Irem Ergün, Ifeoma Okoh, Aisha Alaagib, Oshan Mudannayake, Zaid Alyafeai, Vu Minh Chien, Sebastian Ruder, Surya Guthikonda, et al. (8 additional authors not shown)

    Abstract: Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets.…

    Submitted 9 February, 2024; originally announced February 2024.

  7. arXiv:2309.05444  [pdf, other]

    cs.CL cs.LG

    Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

    Authors: Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker

    Abstract: The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost. However, conventional MoEs pose challenges at scale due to the need to store all experts in memory. In this paper, we push MoE to the limit. We propose extremely parameter-efficient MoE by uniquely combining MoE architectur…

    Submitted 11 September, 2023; originally announced September 2023.

  8. arXiv:2309.04564  [pdf, other]

    cs.CL cs.LG

    When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

    Authors: Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

    Abstract: Large volumes of text data have contributed significantly to the development of large language models (LLMs) in recent years. This data is typically acquired by scraping the internet, leading to pretraining datasets comprised of noisy web text. To date, efforts to prune these datasets down to a higher quality subset have relied on hand-crafted heuristics encoded as rule-based filters. In this work…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 14 pages, 8 figures

  9. arXiv:2305.19268  [pdf, other]

    cs.LG cs.AI

    Intriguing Properties of Quantization at Scale

    Authors: Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker

    Abstract: Emergent properties have been widely adopted as a term to describe behavior not present in smaller models but observed in larger models. Recent work suggests that the trade-off incurred by quantization is also an emergent property, with sharp drops in performance in models over 6B parameters. In this work, we ask "are quantization cliffs in performance solely a factor of scale?" Against a backdrop…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 32 pages, 14 figures

  10. arXiv:2205.12148  [pdf, other]

    cs.CL

    Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer

    Authors: Ahmet Üstün, Arianna Bisazza, Gosse Bouma, Gertjan van Noord, Sebastian Ruder

    Abstract: Massively multilingual models are promising for transfer learning across tasks and languages. However, existing methods are unable to fully leverage training data when it is available in different task-language combinations. To exploit such heterogeneous supervision, we propose Hyper-X, a single hypernetwork that unifies multi-task and multilingual learning with efficient adaptation. This model ge…

    Submitted 25 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP 2022 (Main Conference)

  11. arXiv:2205.11277  [pdf, other]

    cs.CL

    When does Parameter-Efficient Transfer Learning Work for Machine Translation?

    Authors: Ahmet Üstün, Asa Cooper Stickland

    Abstract: Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting large pre-trained models while only tuning a small number of parameters. They have been shown to be competitive with full model fine-tuning for many downstream tasks. However, prior work indicates that PEFTs may not work as well for machine translation (MT), and there is no comprehensive study showing when PEFTs work for…

    Submitted 24 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP 2022 (Main Conference)

  12. arXiv:2110.10472  [pdf, other]

    cs.CL

    Multilingual Unsupervised Neural Machine Translation with Denoising Adapters

    Authors: Ahmet Üstün, Alexandre Bérard, Laurent Besacier, Matthias Gallé

    Abstract: We consider the problem of multilingual unsupervised machine translation, translating to and from languages that only have monolingual data by using auxiliary parallel language pairs. For this problem the standard procedure so far to leverage the monolingual data is back-translation, which is computationally costly and hard to tune. In this paper we propose instead to use denoising adapters, ada…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: Accepted as a long paper to EMNLP 2021

  13. arXiv:2109.12012  [pdf, other]

    cs.CL

    Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

    Authors: Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord

    Abstract: This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German–Lower Sorbian (DE–DSB): a high-resource language to a low-resource one. Our system uses a transformer encoder-decoder architecture in which we make three changes to the standard training procedure. First, our training focuses on two langua…

    Submitted 24 September, 2021; originally announced September 2021.

  14. arXiv:2107.06055  [pdf, other]

    cs.CL

    On the Difficulty of Translating Free-Order Case-Marking Languages

    Authors: Arianna Bisazza, Ahmet Üstün, Stephan Sportel

    Abstract: Identifying factors that make certain languages harder to model than others is essential to reach language equality in future Natural Language Processing technologies. Free-order case-marking languages, such as Russian, Latin or Tamil, have proved more challenging than fixed-order languages for the tasks of syntactic parsing and subject-verb agreement prediction. In this work, we investigate wheth…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: Accepted to TACL, pre-MIT Press publication version

  15. arXiv:2105.07316  [pdf, other]

    cs.CL

    From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding

    Authors: Rob van der Goot, Ibrahim Sharaf, Aizhan Imankulova, Ahmet Üstün, Marija Stepanović, Alan Ramponi, Siti Oryza Khairunnisa, Mamoru Komachi, Barbara Plank

    Abstract: The lack of publicly available evaluation data for low-resource languages limits progress in Spoken Language Understanding (SLU). As key tasks like intent classification and slot filling require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. We introduce xSID, a new benchmark for cross-lingual Slot and Intent…

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: To appear in the proceedings of NAACL 2021

  16. arXiv:2103.01273  [pdf, other]

    cs.CL

    On the Effectiveness of Dataset Embeddings in Mono-lingual, Multi-lingual and Zero-shot Conditions

    Authors: Rob van der Goot, Ahmet Üstün, Barbara Plank

    Abstract: Recent complementary strands of research have shown that leveraging information on the data source through encoding their properties into embeddings can lead to performance increase when training a single model on heterogeneous data sources. However, it remains unclear in which situations these dataset embeddings are most effective, because they are used in a large variety of settings, languages a…

    Submitted 5 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

  17. arXiv:2007.12544  [pdf, other]

    cs.CL

    FiSSA at SemEval-2020 Task 9: Fine-tuned For Feelings

    Authors: Bertelt Braaksma, Richard Scholtens, Stan van Suijlekom, Remy Wang, Ahmet Üstün

    Abstract: In this paper, we present our approach for sentiment classification on Spanish-English code-mixed social media data in the SemEval-2020 Task 9. We investigate performance of various pre-trained Transformer models by using different fine-tuning strategies. We explore both monolingual and multilingual models with the standard fine-tuning method. Additionally, we propose a custom model that we fine-t…

    Submitted 19 October, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020), Barcelona, Spain, December. Association for Computational Linguistics

  18. arXiv:2005.14672  [pdf, other]

    cs.CL

    Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP

    Authors: Rob van der Goot, Ahmet Üstün, Alan Ramponi, Ibrahim Sharaf, Barbara Plank

    Abstract: Transfer learning, particularly approaches that combine multi-task learning with pre-trained contextualized embeddings and fine-tuning, have advanced the field of Natural Language Processing tremendously in recent years. In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings. The benefits of MaChAmp are its flexible configuration option…

    Submitted 11 March, 2021; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: EACL demo version (MaChAmp 0.2) https://machamp-nlp.github.io/

  19. arXiv:2004.14327  [pdf, other]

    cs.CL

    UDapter: Language Adaptation for Truly Universal Dependency Parsing

    Authors: Ahmet Üstün, Arianna Bisazza, Gosse Bouma, Gertjan van Noord

    Abstract: Recent advances in multilingual dependency parsing have brought the idea of a truly universal parser closer to reality. However, cross-language interference and restrained model capacity remain major obstacles. To address this, we propose a novel multilingual task adaptation approach based on contextual parameter generation and adapter modules. This approach enables to learn adapters via language…

    Submitted 6 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: In EMNLP 2020

  20. arXiv:1704.07329  [pdf, other]

    cs.CL

    A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation

    Authors: Murathan Kurfalı, Ahmet Üstün, Burcu Can

    Abstract: In this paper, we introduce a trie-structured Bayesian model for unsupervised morphological segmentation. We adopt prior information from different sources in the model. We use neural word embeddings to discover words that are morphologically derived from each other and thereby that are semantically similar. We use letter successor variety counts obtained from tries that are built by neural word e…

    Submitted 24 April, 2017; originally announced April 2017.

    Comments: 12 pages, accepted and presented at the CICLING 2017 - 18th International Conference on Intelligent Text Processing and Computational Linguistics

  21. arXiv:1703.03200  [pdf, other]

    cs.CL

    Turkish PoS Tagging by Reducing Sparsity with Morpheme Tags in Small Datasets

    Authors: Burcu Can, Ahmet Üstün, Murathan Kurfalı

    Abstract: Sparsity is one of the major problems in natural language processing. The problem becomes even more severe in agglutinating languages that are highly prone to be inflected. We deal with sparsity in Turkish by adopting morphological features for part-of-speech tagging. We learn inflectional and derivational morpheme tags in Turkish by using conditional random fields (CRF) and we employ the morpheme…

    Submitted 10 March, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

    Comments: 13 pages, accepted and presented in 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING)