Showing 1–50 of 81 results for author: Hooker, S

  1. arXiv:2407.05694  [pdf, other]

    cs.AI cs.CL cs.ET cs.LG

    On the Limitations of Compute Thresholds as a Governance Strategy

    Authors: Sara Hooker

    Abstract: At face value, this essay is about understanding a fairly esoteric governance tool called compute thresholds. However, in order to grapple with whether these thresholds will achieve anything, we must first understand how they came to be. This requires engaging with a decades-old debate at the heart of computer science progress, namely, is bigger always better? Hence, this essay may be of interest…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2407.03211  [pdf, other]

    cs.CL cs.LG

    How Does Quantization Affect Multilingual LLMs?

    Authors: Kelly Marchisio, Saurabh Dash, Hongyu Chen, Dennis Aumiller, Ahmet Üstün, Sara Hooker, Sebastian Ruder

    Abstract: Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantized LLMs on English tasks, none have examined the effect of quantization across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at varying scales. We use automa…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.02552  [pdf, other]

    cs.CL cs.AI cs.LG

    RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

    Authors: John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker

    Abstract: Preference optimization techniques have become a standard final stage for training state-of-the-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to date has focused on first-class citizen languages like English and Chinese. This captures a small fraction of the languages in the world, but also makes it unclear which aspects of current state-of-the-art r…

    Submitted 2 July, 2024; originally announced July 2024.

  4. arXiv:2407.01490  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives

    Authors: Luísa Shimabucoro, Sebastian Ruder, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

    Abstract: The widespread adoption of synthetic data raises new questions about how models generating the data can influence other large language models (LLMs) via distilled data. To start, our work exhaustively characterizes the impact of passive inheritance of model properties by systematically studying the consequences of synthetic data integration. We provide one of the most comprehensive studies to-date…

    Submitted 1 July, 2024; originally announced July 2024.

  5. arXiv:2406.18682  [pdf, other]

    cs.CL cs.AI cs.LG

    The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

    Authors: Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

    Abstract: A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally, preference training and safety measures often overfit to harms common in Western-centric datasets. Here, we explore the viability of different alignment approaches…

    Submitted 8 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  6. arXiv:2406.03368  [pdf, other]

    cs.CL cs.AI

    IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

    Authors: David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, Jian Yun Zhuang, Jesujoba O. Alabi, Xuanli He, Millicent Ochieng, Sara Hooker, Andiswa Bukula, En-Shiun Annie Lee, Chiamaka Chukwuneke, Happy Buzaaba, Blessing Sibanda, Godson Kalipe, Jonathan Mukiibi, Salomon Kabongo, Foutse Yuehgoh, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Shamsuddeen Hassan Muhammad, Salomey Osei, Sokhar Samb, Tadesse Kebede Guge, et al. (1 additional author not shown)

    Abstract: Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g. African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoB…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Under review

  7. arXiv:2405.19462  [pdf, other]

    cs.CL

    Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning

    Authors: Everlyn Asiko Chimoto, Jay Gala, Orevaoghene Ahia, Julia Kreutzer, Bruce A. Bassett, Sara Hooker

    Abstract: Neural Machine Translation models are extremely data and compute-hungry. However, not all data points contribute equally to model training and generalization. Data pruning to remove the low-value data points has the benefit of drastically reducing the compute budget without significant drop in model performance. In this paper, we propose a new data pruning technique: Checkpoints Across Time (CAT),…

    Submitted 21 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 Findings

  8. arXiv:2405.15032  [pdf, other]

    cs.CL

    Aya 23: Open Weight Releases to Further Multilingual Progress

    Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

    Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  9. arXiv:2403.03893  [pdf, other]

    cs.CL cs.AI

    From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models

    Authors: Luiza Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis

    Abstract: To date, toxicity mitigation in language models has almost entirely been focused on single-language settings. As language models embrace multilingual capabilities, it's crucial our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient anno…

    Submitted 30 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  10. arXiv:2402.14740  [pdf, other]

    cs.LG

    Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

    Authors: Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

    Abstract: AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent literature as the canonical method for the RL part of RLHF. However, it involves both high computational cost and sensitive hyperparameter tuning. We posit that mos…

    Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 27 pages, 7 figures, 2 tables

    ACM Class: I.2.7

  11. arXiv:2402.07827  [pdf, other]

    cs.CL

    Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

    Authors: Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, Sara Hooker

    Abstract: Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced. Aya outperforms mT0 and BLOOM…

    Submitted 12 February, 2024; originally announced February 2024.

  12. arXiv:2402.06619  [pdf, other]

    cs.CL cs.AI

    Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

    Authors: Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura OMahony, Mike Zhang, Ramith Hettiarachchi, Joseph Wilson, Marina Machado, Luisa Souza Moura, Dominik Krzemiński, Hakimeh Fadaei, Irem Ergün, Ifeoma Okoh, Aisha Alaagib, Oshan Mudannayake, Zaid Alyafeai, Vu Minh Chien, Sebastian Ruder, Surya Guthikonda, et al. (8 additional authors not shown)

    Abstract: Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets.…

    Submitted 9 February, 2024; originally announced February 2024.

  13. arXiv:2312.03886  [pdf, other]

    cs.LG cs.AI cs.CY

    On The Fairness Impacts of Hardware Selection in Machine Learning

    Authors: Sree Harsha Nelaturu, Nishaanth Kanna Ravichandran, Cuong Tran, Sara Hooker, Ferdinando Fioretto

    Abstract: In the machine learning ecosystem, hardware selection is often regarded as a mere utility, overshadowed by the spotlight on algorithms and data. This oversight is particularly problematic in contexts like ML-as-a-service platforms, where users often lack control over the hardware used for model deployment. How does the choice of hardware impact generalization properties? This paper investigates th…

    Submitted 6 December, 2023; originally announced December 2023.

  14. arXiv:2311.18598  [pdf, other]

    cs.LG cs.AI cs.MA

    Generalisable Agents for Neural Network Optimisation

    Authors: Kale-ab Tessera, Callum Rhys Tilbury, Sasha Abramowitz, Ruan de Kock, Omayma Mahjoub, Benjamin Rosman, Sara Hooker, Arnu Pretorius

    Abstract: Optimising deep neural networks is a challenging task due to complex training dynamics, high computational requirements, and long training times. To address this difficulty, we propose the framework of Generalisable Agents for Neural Network Optimisation (GANNO) -- a multi-agent reinforcement learning (MARL) approach that learns to improve neural network optimisation by dynamically and responsivel…

    Submitted 22 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted at the Workshop on Advanced Neural Network Training (WANT) and Optimization for Machine Learning (OPT) at NeurIPS 2023

  15. arXiv:2311.17295  [pdf, other]

    cs.CL cs.AI

    Elo Uncovered: Robustness and Best Practices in Language Model Evaluation

    Authors: Meriem Boubdir, Edward Kim, Beyza Ermis, Sara Hooker, Marzieh Fadaee

    Abstract: In Natural Language Processing (NLP), the Elo rating system, originally designed for ranking players in dynamic games such as chess, is increasingly being used to evaluate Large Language Models (LLMs) through "A vs B" paired comparisons. However, while popular, the system's suitability for assessing entities with constant skill levels, such as LLMs, remains relatively unexplored. We study two fund…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 22 pages, 7 figures, 2 tables. Revised version of the paper accepted at GEM Workshop, EMNLP 2023

  16. arXiv:2311.00471  [pdf, other]

    physics.plasm-ph physics.acc-ph

    Multi-GeV Wakefield Acceleration in a Plasma-Modulated Plasma Accelerator

    Authors: Johannes J. van de Wetering, Simon M. Hooker, Roman Walczak

    Abstract: We investigate the accelerator stage of a Plasma-Modulated Plasma Accelerator (P-MoPA) [Phys. Rev. Lett. 127, 184801 (2021)] using both the paraxial wave equation and particle-in-cell (PIC) simulations. We show that adjusting the laser and plasma parameters of the modulator stage of a P-MoPA allows the temporal profile of pulses within the pulse train to be controlled, which in turn allows the wak…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures

  17. arXiv:2310.16787  [pdf, other]

    cs.CL cs.AI cs.LG

    The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

    Authors: Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker

    Abstract: The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ text datasets. We develop tool…

    Submitted 4 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 30 pages (18 main), 6 figures, 5 tables

  18. arXiv:2310.16111  [pdf, other]

    cs.CL cs.CR cs.LG

    Locally Differentially Private Document Generation Using Zero Shot Prompting

    Authors: Saiteja Utpala, Sara Hooker, Pin Yu Chen

    Abstract: Numerous studies have highlighted the privacy risks associated with pretrained large language models. In contrast, our research offers a unique perspective by demonstrating that pretrained large language models can effectively contribute to privacy preservation. We propose a locally differentially private mechanism called DP-Prompt, which leverages the power of pretrained large language models and…

    Submitted 30 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 (Findings)

  19. arXiv:2310.14424  [pdf, other]

    cs.CL cs.AI

    Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation

    Authors: Meriem Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker

    Abstract: Human evaluation is increasingly critical for assessing large language models, capturing linguistic nuances, and reflecting user preferences more accurately than traditional automated metrics. However, the resource-intensive nature of this type of annotation process poses significant challenges. The key question driving our work: "is it feasible to minimize human-in-the-loop feedback by prioritizi…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: 37 pages, 8 figures

  20. arXiv:2310.07589  [pdf, other]

    cs.AI

    Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models

    Authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

    Abstract: Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models. Furthermore, previous approaches have often neglected the crucial factor of language's evolving nature over time. In this work, we present a comprehensive perspective on toxicity mitigation that takes i…

    Submitted 11 October, 2023; originally announced October 2023.

  21. arXiv:2310.05097  [pdf, other]

    physics.acc-ph physics.plasm-ph

    Resonant excitation of plasma waves in a plasma channel

    Authors: Aimee J. Ross, James Chappell, Johannes J. van de Wetering, James Cowley, Emily Archer, Nicolas Bourgeois, Laura Corner, David R. Emerson, Linus Feder, Xiao J. Gu, Oscar Jakobsson, Harry Jones, Alexander Picksley, Linus Reid, Wei-Ting Wang, Roman Walczak, Simon M. Hooker

    Abstract: We demonstrate resonant excitation of a plasma wave by a train of short laser pulses guided in a pre-formed plasma channel, for parameters relevant to a plasma-modulated plasma accelerator (P-MoPA). We show experimentally that a train of $N \approx 10$ short pulses, of total energy $\sim 1$ J, can be guided through $110$ mm long plasma channels with on-axis densities in the range…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 13 pages, 14 figures (including Supplemental Material)

    Journal ref: Physical Review Research vol. 6, L022001 (2024)

  22. arXiv:2309.07181  [pdf, other]

    cs.SE cs.LG

    The Grand Illusion: The Myth of Software Portability and Implications for ML Progress

    Authors: Fraser Mince, Dzung Dinh, Jonas Kgomo, Neil Thompson, Sara Hooker

    Abstract: Pushing the boundaries of machine learning often requires exploring different hardware and software combinations. However, the freedom to experiment across different tooling stacks can be at odds with the drive for efficiency, which has produced increasingly specialized AI hardware and incentivized consolidation around a narrow set of ML frameworks. Exploratory research can be restricted if softwa…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 28 pages, 13 figures; the associated repo can be found at https://github.com/for-ai/portability

  23. arXiv:2309.05444  [pdf, other]

    cs.CL cs.LG

    Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

    Authors: Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker

    Abstract: The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost. However, conventional MoEs pose challenges at scale due to the need to store all experts in memory. In this paper, we push MoE to the limit. We propose extremely parameter-efficient MoE by uniquely combining MoE architectur…

    Submitted 11 September, 2023; originally announced September 2023.

  24. arXiv:2309.04564  [pdf, other]

    cs.CL cs.LG

    When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

    Authors: Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

    Abstract: Large volumes of text data have contributed significantly to the development of large language models (LLMs) in recent years. This data is typically acquired by scraping the internet, leading to pretraining datasets comprised of noisy web text. To date, efforts to prune these datasets down to a higher quality subset have relied on hand-crafted heuristics encoded as rule-based filters. In this work…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 14 pages, 8 figures

  25. arXiv:2307.13689  [pdf, other]

    physics.acc-ph physics.plasm-ph

    All-optical GeV electron bunch generation in a laser-plasma accelerator via truncated-channel injection

    Authors: A. Picksley, J. Chappell, E. Archer, N. Bourgeois, J. Cowley, D. R. Emerson, L. Feder, X. J. Gu, O. Jakobsson, A. J. Ross, W. Wang, R. Walczak, S. M. Hooker

    Abstract: We describe a simple scheme, truncated-channel injection, to inject electrons directly into the wakefield driven by a drive pulse guided by an all-optical plasma channel. We use this approach to generate dark-current-free 1.2 GeV, 4.5 % relative energy spread electron bunches with 120 TW laser pulses guided in a 110-mm-long hydrodynamic optical-field-ionized (HOFI) plasma channel. Our experiments…

    Submitted 9 January, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

  26. arXiv:2307.03718  [pdf, other]

    cs.CY cs.AI

    Frontier AI Regulation: Managing Emerging Risks to Public Safety

    Authors: Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, Kevin Wolf

    Abstract: Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term "frontier AI" models: highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety. Frontier AI models pose a distinct regulatory challenge: dangerous capabilit…

    Submitted 7 November, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Update July 11th: added missing footnote back in; adjusted author order (mistakenly non-alphabetical among the first 6 authors) and adjusted affiliations (Jess Whittlestone's affiliation was mistagged and Gillian Hadfield had SRI added to her affiliations). Updated September 4th: various typos.

  27. Measurement of the decay of laser-driven linear plasma wakefields

    Authors: J. Jonnerby, A. von Boetticher, J. Holloway, L. Corner, A. Picksley, A. J. Ross, R. J. Shalloo, C. Thornton, N. Bourgeois, R. Walczak, S. M. Hooker

    Abstract: We present the first measurements of the temporal decay rate of one-dimensional, linear Langmuir waves excited by an ultra-short laser pulse. Langmuir waves with relative amplitudes of approximately $6\%$ were driven by $1.7$ J, $50$ fs laser pulses in hydrogen and deuterium plasmas of density $n_{e0} = 8.4 \times 10^{17}$ cm$^{-3}$. The wakefield lifetimes were measured to be…

    Submitted 10 June, 2023; originally announced June 2023.

  28. arXiv:2306.05949  [pdf, other]

    cs.CY cs.AI

    Evaluating the Social Impact of Generative AI Systems in Systems and Society

    Authors: Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Canyu Chen, Hal Daumé III, Jesse Dodge, Isabella Duan, Ellie Evans, Felix Friedrich, Avijit Ghosh, Usman Gohar, Sara Hooker, Yacine Jernite, Ria Kalluri, Alberto Lusoli, Alina Leidinger, Michelle Lin, Xiuzhu Lin, Sasha Luccioni, Jennifer Mickel, Margaret Mitchell, Jessica Newman, et al. (6 additional authors not shown)

    Abstract: Generative AI systems across modalities, ranging from text (including code), image, audio, and video, have broad social impacts, but there is no official standard for means of evaluating those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach in evaluating a base generative AI system for any modality in two overarching categor…

    Submitted 28 June, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: Forthcoming in Hacker, Engel, Hammer, Mittelstadt (eds), Oxford Handbook on the Foundations and Regulation of Generative AI. Oxford University Press

  29. arXiv:2305.19268  [pdf, other]

    cs.LG cs.AI

    Intriguing Properties of Quantization at Scale

    Authors: Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker

    Abstract: Emergent properties have been widely adopted as a term to describe behavior not present in smaller models but observed in larger models. Recent work suggests that the trade-off incurred by quantization is also an emergent property, with sharp drops in performance in models over 6B parameters. In this work, we ask "are quantization cliffs in performance solely a factor of scale?" Against a backdrop…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 32 pages, 14 figures

  30. arXiv:2305.16779  [pdf, other]

    physics.plasm-ph physics.acc-ph physics.flu-dyn physics.optics

    Demonstration of tunability of HOFI waveguides via start-to-end simulations

    Authors: S. M. Mewes, G. J. Boyle, A. Ferran Pousa, R. J. Shalloo, J. Osterhoff, C. Arran, L. Corner, R. Walczak, S. M. Hooker, M. Thévenet

    Abstract: In recent years, hydrodynamic optical-field-ionized (HOFI) channels have emerged as a promising technique to create laser waveguides suitable for guiding tightly-focused laser pulses in a plasma, as needed for laser-plasma accelerators. While experimental advances in HOFI channels continue to be made, the underlying mechanisms and the roles of the main parameters remain largely unexplored. In this…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 8 pages (+5 appendix), 7 figures, submitted to PRResearch

  31. arXiv:2304.12397  [pdf, other]

    cs.CL cs.AI

    On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

    Authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

    Abstract: Perception of toxicity evolves over time and often differs between geographies and cultural backgrounds. Similarly, black-box commercially available APIs for detecting toxicity, such as the Perspective API, are not static, but frequently retrained to address any unattended weaknesses and biases. We evaluate the implications of these changes on the reproducibility of findings that compare the relat…

    Submitted 24 April, 2023; originally announced April 2023.

  32. arXiv:2303.14032  [pdf, other]

    physics.plasm-ph physics.acc-ph

    Stability of the Modulator in a Plasma-Modulated Plasma Accelerator

    Authors: Johannes J. van de Wetering, Simon M. Hooker, Roman Walczak

    Abstract: We explore the regime of operation of the modulator stage of a recently proposed laser-plasma accelerator scheme [Phys. Rev. Lett. 127, 184801 (2021)], dubbed the Plasma-Modulated Plasma Accelerator (P-MoPA). The P-MoPA scheme offers a potential route to high-repetition-rate, GeV-scale plasma accelerators driven by picosecond-duration laser pulses from, for example, kilohertz thin-disk lasers. The…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: 8 pages, 5 figures plus supplementary materials

  33. arXiv:2303.07723  [pdf, other]

    physics.plasm-ph physics.acc-ph

    Modulational instability in large-amplitude linear laser wakefields

    Authors: Alexander von Boetticher, Roman Walczak, Simon Hooker

    Abstract: We investigate the growth of ion density perturbations in large-amplitude linear laser wakefields via two-dimensional particle-in-cell simulations. Growth rates and wave numbers are found to be consistent with a longitudinal strong-field modulational instability (SFMI). We examine the transverse dependence of the instability for a Gaussian wakefield envelope and show that growth rates and wavenumb…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: 6 pages, 4 figures

    Journal ref: Physical Review E vol. 107, L023201 (2023)

  34. arXiv:2303.00586  [pdf, other]

    stat.ML cs.AI cs.CV cs.CY cs.LG

    FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling

    Authors: Wei-Yin Ko, Daniel D'souza, Karina Nguyen, Randall Balestriero, Sara Hooker

    Abstract: Ensembling multiple Deep Neural Networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform a larger single model. In this work, we go beyond top-line metrics and instead explore the impact of ensembling on subgroup performances. Surprisingly, we observe that even with a simple homogeneous ensemble -- all the individual DNNs share the same training set, architecture…

    Submitted 20 December, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  35. Intriguing Properties of Compression on Multilingual Models

    Authors: Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer

    Abstract: Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingu…

    Submitted 25 November, 2022; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: Accepted to EMNLP 2022

  36. arXiv:2210.14986  [pdf, other]

    cs.CL

    The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs

    Authors: Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette

    Abstract: Despite widespread use of LLMs as conversational agents, evaluations of performance fail to capture a crucial aspect of communication: interpreting language in context -- incorporating its pragmatics. Humans interpret language using beliefs and prior knowledge about the world. For example, we intuitively understand the response "I wore gloves" to the question "Did you leave fingerprints?" as meani…

    Submitted 3 December, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted as Spotlight at NeurIPS 2023

  37. arXiv:2209.10015  [pdf, other]

    cs.LG cs.AI

    Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics

    Authors: Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker

    Abstract: Modern machine learning research relies on relatively few carefully curated datasets. Even in these datasets, and typically in 'untidy' or raw data, practitioners are faced with significant issues of data quality and diversity which can be prohibitively labor intensive to address. Existing methods for dealing with these challenges tend to make strong assumptions about the particular issues at play…

    Submitted 20 September, 2022; originally announced September 2022.

  38. arXiv:2209.00099  [pdf, other]

    cs.CL

    Efficient Methods for Natural Language Processing: A Survey

    Authors: Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

    Abstract: Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require few…

    Submitted 24 March, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

    Comments: Accepted at TACL, pre-publication version

  39. arXiv:2207.00200  [pdf, other]

    cs.LG cs.CV

    Studying the impact of magnitude pruning on contrastive learning methods

    Authors: Francesco Corti, Rahim Entezari, Sara Hooker, Davide Bacciu, Olga Saukh

    Abstract: We study the impact of different pruning techniques on the representation learned by deep neural networks trained with contrastive loss functions. Our work finds that at high sparsity levels, contrastive learning results in a higher number of misclassified examples relative to models trained with traditional cross-entropy loss. To understand this pronounced difference, we use metrics such as the n…

    Submitted 1 July, 2022; originally announced July 2022.

  40. arXiv:2206.06479  [pdf, other]

    cs.LG

    Robust Distillation for Worst-class Performance

    Authors: Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon

    Abstract: Knowledge distillation has proven to be an effective technique in improving the performance of a student model using predictions from a teacher model. However, recent work has shown that gains in average efficiency are not uniform across subgroups in the data, and in particular can often come at the cost of accuracy on rare subgroups and classes. To preserve strong performance across classes that may…

    Submitted 13 June, 2022; originally announced June 2022.

  41. arXiv:2203.08366  [pdf, other]

    physics.acc-ph

    Linear colliders based on laser-plasma accelerators

    Authors: C. Benedetti, S. S. Bulanov, E. Esarey, C. G. R. Geddes, A. J. Gonsalves, A. Huebl, R. Lehe, K. Nakamura, C. B. Schroeder, D. Terzani, J. van Tilborg, M. Turner, J. -L. Vay, T. Zhou, F. Albert, J. Bromage, E. M. Campbell, D. H. Froula, J. P. Palastro, J. Zuegel, D. Bruhwiler, N. M. Cook, B. Cros, M. C. Downer, M. Fuchs, et al. (18 additional authors not shown)

    Abstract: White paper to the Proceedings of the U.S. Particle Physics Community Planning Exercise (Snowmass 2021): Linear colliders based on laser-plasma accelerators

    Submitted 4 July, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: Contribution to Snowmass 2021, Accelerator Frontier

  42. arXiv:2201.07895  [pdf]

    physics.acc-ph hep-ex

    European Strategy for Particle Physics -- Accelerator R&D Roadmap

    Authors: C. Adolphsen, D. Angal-Kalinin, T. Arndt, M. Arnold, R. Assmann, B. Auchmann, K. Aulenbacher, A. Ballarino, B. Baudouy, P. Baudrenghien, M. Benedikt, S. Bentvelsen, A. Blondel, A. Bogacz, F. Bossi, L. Bottura, S. Bousson, O. Brüning, R. Brinkmann, M. Bruker, O. Brunner, P. N. Burrows, G. Burt, S. Calatroni, K. Cassou, et al. (111 additional authors not shown)

    Abstract: The 2020 update of the European Strategy for Particle Physics emphasised the importance of an intensified and well-coordinated programme of accelerator R&D, supporting the design and delivery of future particle accelerators in a timely, affordable and sustainable way. This report sets out a roadmap for European accelerator R&D for the next five to ten years, covering five topical areas identified…

    Submitted 30 March, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: 270 pages, 58 figures. Editor: N. Mounet. LDG chair: D. Newbold. Panel chairs: P. Védrine (HFM), S. Bousson (RF), R. Assmann (plasma), D. Schulte (muon), M. Klein (ERL). Panel editors: B. Baudouy (HFM), L. Bottura (HFM), S. Bousson (RF), G. Burt (RF), R. Assmann (plasma), E. Gschwendtner (plasma), R. Ischebeck (plasma), C. Rogers (muon), D. Schulte (muon), M. Klein (ERL)

    Report number: CERN-2022-001

    Journal ref: European Strategy for Particle Physics - Accelerator R&D Roadmap, N. Mounet (ed.), CERN Yellow Reports: Monographs, CERN-2022-001 (CERN, Geneva, 2022)

  43. arXiv:2201.05610  [pdf, other]

    cs.LG cs.CV

    When less is more: Simplifying inputs aids neural network understanding

    Authors: Robin Tibor Schirrmeister, Rosanne Liu, Sara Hooker, Tonio Ball

    Abstract: How do neural network image classifiers respond to simpler and simpler inputs? And what do such responses reveal about the learning process? To answer these questions, we need a clear measure of input simplicity (or inversely, complexity), an optimization objective that correlates with simplification, and a framework to incorporate such objective into training and inference. Lastly we need a varie…

    Submitted 1 February, 2022; v1 submitted 14 January, 2022; originally announced January 2022.

    ACM Class: I.2.6

  44. arXiv:2110.03036  [pdf, other]

    cs.CL cs.AI

    The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation

    Authors: Orevaoghene Ahia, Julia Kreutzer, Sara Hooker

    Abstract: A "bigger is better" explosion in the number of parameters in deep neural networks has made it increasingly challenging to make state-of-the-art networks accessible in compute-restricted environments. Compression techniques have taken on renewed importance as a way to bridge the gap. However, evaluation of the trade-offs incurred by popular compression techniques has been centered on high-resource…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Accepted to Findings of EMNLP 2021

  45. arXiv:2110.00448  [pdf, other]

    physics.plasm-ph physics.acc-ph

    Demonstration of kilohertz operation of Hydrodynamic Optical-Field-Ionized Plasma Channels

    Authors: A. Alejo, J. Cowley, A. Picksley, R. Walczak, S. M. Hooker

    Abstract: We demonstrate experimentally that hydrodynamic optical-field-ionized (HOFI) plasma channels can be generated at kHz-scale pulse repetition rates, in a static gas cell and for an extended period. Using a pump-probe arrangement, we show via transverse interferometry that the properties of two HOFI channels generated 1 ms apart are essentially the same. We demonstrate that HOFI channels can be…

    Submitted 3 March, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: Journal publication can be cited as https://doi.org/10.1103/PhysRevAccelBeams.25.011301 . Raw data can be downloaded from https://doi.org/10.5281/zenodo.6242523 . 8 pages, 4 figures

    Journal ref: Phys. Rev. Accel. Beams 25, 011301 (2022)

  46. GeV-scale accelerators driven by plasma-modulated pulses from kilohertz lasers

    Authors: O. Jakobsson, S. M. Hooker, R. Walczak

    Abstract: We describe a new approach for driving GeV-scale plasma accelerators with long laser pulses. We show that the temporal phase of a long, high-energy driving laser pulse can be modulated periodically by co-propagating it with low-amplitude plasma wave driven by a short, low-energy seed pulse. Compression of the modulated driver by a dispersive optic generates a train of short pulses suitable for res…

    Submitted 27 October, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: 13 pages, 7 figures (including Supplemental Material). Published as a letter by PRL

    Journal ref: Physical Review Letters Vol. 127, No. 18 (2021)

  47. arXiv:2107.13098  [pdf, other]

    cs.CV cs.LG

    A Tale Of Two Long Tails

    Authors: Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker

    Abstract: As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions. However, the majority of work on uncertainty has focused on traditional probabilistic or ranking approaches - where the model assigns low probabilities or scores to uncertain examples. While this captures what examples are…

    Submitted 27 July, 2021; originally announced July 2021.

    Comments: Preliminary results accepted to Workshop on Uncertainty and Robustness in Deep Learning (UDL), ICML, 2021

  48. arXiv:2107.07741  [pdf, other]

    cs.LG

    When does loss-based prioritization fail?

    Authors: Niel Teng Hu, Xinyu Hu, Rosanne Liu, Sara Hooker, Jason Yosinski

    Abstract: Not all examples are created equal, but standard deep neural network training protocols treat each training point uniformly. Each example is propagated forward and backward through the network the same number of times, independent of how much the example contributes to the learning protocol. Recent work has proposed ways to accelerate training by deviating from this uniform treatment. Popular meth…

    Submitted 16 July, 2021; originally announced July 2021.

  49. arXiv:2106.11872  [pdf, other]

    cs.LG cs.NE

    Randomness In Neural Network Training: Characterizing The Impact of Tooling

    Authors: Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker

    Abstract: The quest for determinism in machine learning has disproportionately focused on characterizing the impact of noise introduced by algorithmic design choices. In this work, we address a less well understood and studied question: how does our choice of tooling introduce randomness to deep neural network training. We conduct large scale experiments across different types of hardware, accelerators, sta…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: 21 pages, 10 figures

  50. arXiv:2102.01670  [pdf, other]

    cs.LG cs.CV

    Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization

    Authors: Kale-ab Tessera, Sara Hooker, Benjamin Rosman

    Abstract: Training sparse networks to converge to the same performance as dense neural architectures has proven to be elusive. Recent work suggests that initialization is the key. However, while this direction of research has had some success, focusing on initialization alone appears to be inadequate. In this paper, we take a broader view of training sparse networks and consider the role of regularization,…

    Submitted 15 June, 2021; v1 submitted 2 February, 2021; originally announced February 2021.