Showing 1–4 of 4 results for author: Nystrom, A

  1. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  2. arXiv:2107.06499  [pdf, other]

    cs.CL cs.LG

    Deduplicating Training Data Makes Language Models Better

    Authors: Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini

    Abstract: We find that existing language modeling datasets contain many near-duplicate examples and long repetitive substrings. As a result, over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data. We develop two tools that allow us to deduplicate training datasets -- for example removing from C4 a single 61 word English sentence that is repeat…

    Submitted 24 March, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: Accepted to ACL 2022

  3. arXiv:1910.00927  [pdf, other]

    cs.LG

    Stabilizing Generative Adversarial Networks: A Survey

    Authors: Maciej Wiatrak, Stefano V. Albrecht, Andrew Nystrom

    Abstract: Generative Adversarial Networks (GANs) are a type of generative model which have received much attention due to their ability to model complex real-world data. Despite their recent successes, the process of training GANs remains challenging, suffering from instability problems such as non-convergence, vanishing or exploding gradients, and mode collapse. In recent years, a diverse set of approaches…

    Submitted 24 March, 2020; v1 submitted 29 September, 2019; originally announced October 2019.

  4. arXiv:1803.06418  [pdf, other]

    cs.DS math.NA

    Leveraging Sparsity to Speed Up Polynomial Feature Expansions of CSR Matrices Using $K$-Simplex Numbers

    Authors: Andrew Nystrom, John Hughes

    Abstract: An algorithm is provided for performing polynomial feature expansions that both operates on and produces compressed sparse row (CSR) matrices. Previously, no such algorithm existed, and performing polynomial expansions on CSR matrices required an intermediate densification step. The algorithm performs a $K$-degree expansion by using a bijective function involving $K$-simplex numbers of column indi…

    Submitted 10 September, 2018; v1 submitted 16 March, 2018; originally announced March 2018.

    Comments: 6 pages, 2 figures