William Merrill

New York, New York, United States

About

I’m currently a PhD student at NYU and have previously worked at Google Research and the…

Experience & Education

  • NYU Center for Data Science


Publications

  • Sequential Neural Networks as Automata

    Deep Learning and Formal Languages workshop at ACL 2019

    This work attempts to explain the types of computation that neural networks can perform by relating them to automata. We first define what it means for a real-time network with bounded precision to accept a language. A measure of network memory follows from this definition. We then characterize the classes of languages acceptable by various recurrent networks, attention, and convolutional networks. We find that LSTMs function like counter machines and relate convolutional networks to the subregular hierarchy. Overall, this work attempts to increase our understanding and ability to interpret neural networks through the lens of theory. These theoretical insights help explain neural computation, as well as the relationship between neural networks and natural language grammar.

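To make the counter-machine analogy above concrete, here is a minimal sketch (not taken from the paper) of a one-counter acceptor for the language a^n b^n, the kind of real-time counting behavior the paper argues LSTM memory cells can simulate; the function name and example strings are illustrative only.

```python
def accepts_anbn(string: str) -> bool:
    """Accept a^n b^n using a single counter, read in real time."""
    count = 0
    seen_b = False
    for ch in string:
        if ch == "a":
            if seen_b:           # an 'a' after a 'b' can never be repaired
                return False
            count += 1           # increment the counter on 'a'
        elif ch == "b":
            seen_b = True
            count -= 1           # decrement the counter on 'b'
            if count < 0:        # more b's than a's so far
                return False
        else:
            return False
    return count == 0            # accept iff the counter returns to zero

assert accepts_anbn("aaabbb")
assert not accepts_anbn("aabbb")
```
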
  • Finding Syntactic Representations in Neural Stacks

    Analyzing and Interpreting Neural Networks for NLP workshop at ACL 2019

    Neural network architectures have been augmented with differentiable stacks in order to introduce a bias toward learning hierarchy-sensitive regularities. It has, however, proven difficult to assess the degree to which such a bias is effective, as the operation of the differentiable stack is not always interpretable. In this paper, we attempt to detect the presence of latent representations of hierarchical structure through an exploration of the unsupervised learning of constituency structure. Using a technique due to Shen et al. (2018a,b), we extract syntactic trees from the pushing behavior of stack RNNs trained on language modeling and classification objectives. We find that our models produce parses that reflect natural language syntactic constituencies, demonstrating that stack RNNs do indeed infer linguistically relevant hierarchical structure.

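The tree-extraction step mentioned above can be illustrated with a toy example. The sketch below is a simplification rather than the paper's code: it treats a per-token score, such as a stack RNN's push strength, as a Shen-et-al.-style syntactic distance and recursively splits the sentence at the largest score to induce an unlabeled binary tree. The words and scores are invented for illustration.

```python
def distances_to_tree(words, scores):
    """Recursively split at the largest score to form a binary tree (nested lists)."""
    if len(words) == 1:
        return words[0]
    # split just before the token with the largest "distance" score
    k = max(range(1, len(words)), key=lambda i: scores[i])
    return [distances_to_tree(words[:k], scores[:k]),
            distances_to_tree(words[k:], scores[k:])]

words = ["the", "cat", "sat", "on", "the", "mat"]
push_strengths = [0.1, 0.3, 0.9, 0.4, 0.1, 0.2]   # hypothetical push strengths
print(distances_to_tree(words, push_strengths))
# [['the', 'cat'], ['sat', [['on', 'the'], 'mat']]]
```
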
  • Detecting Syntactic Change Using a Neural Part-of-Speech Tagger

    Computational Approaches to Historical Language Change workshop at ACL 2019

    We train a diachronic long short-term memory (LSTM) part-of-speech tagger on a large corpus of American English from the 19th, 20th, and 21st centuries. We analyze the tagger's ability to implicitly learn temporal structure between years, and the extent to which this knowledge can be transferred to date new sentences. The learned year embeddings show a strong linear correlation between their first principal component and time. We show that temporal information encoded in the model can be used to predict novel sentences' years of composition relatively well. Comparisons to a feedforward baseline suggest that the temporal change learned by the LSTM is syntactic rather than purely lexical. Thus, our results suggest that our tagger is implicitly learning to model syntactic change in American English over the course of the 19th, 20th, and early 21st centuries.

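The correlation analysis described above can be sketched in a few lines. The embeddings below are random placeholders rather than the paper's learned vectors, and the one-embedding-per-decade granularity is an assumption made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

years = np.arange(1810, 2010, 10)        # hypothetical: one embedding per decade
rng = np.random.default_rng(0)
# placeholder year embeddings: a drift along one random direction plus noise
drift = np.linspace(-1.0, 1.0, len(years))[:, None] * rng.normal(size=(1, 32))
year_embeddings = drift + 0.1 * rng.normal(size=(len(years), 32))

# project onto the first principal component and correlate with time
pc1 = PCA(n_components=1).fit_transform(year_embeddings).ravel()
print(np.corrcoef(pc1, years)[0, 1])     # near +/-1 indicates a strong linear trend
```
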
  • Context-Free Transductions with Neural Stacks

    Analyzing and Interpreting Neural Networks for NLP workshop at EMNLP 2018

    Co-lead author.

    This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models. Due to the architectural similarity between stack RNNs and pushdown transducers, we train stack RNN models on a number of tasks, including string reversal, context-free language modeling, and cumulative XOR evaluation. Examining the behavior of our networks, we show that stack-augmented RNNs can discover intuitive stack-based strategies for solving our tasks. However, stack RNNs are more difficult to train than classical architectures such as LSTMs. Rather than employ stack-based strategies, more complex networks often find approximate solutions by using the stack as unstructured memory.

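As a point of reference for the string-reversal task mentioned above, here is the classical stack-based strategy written as plain Python (not the neural model): push every input symbol, then pop to emit the output in reverse, exactly as a pushdown transducer would.

```python
def reverse_with_stack(s: str) -> str:
    """String reversal via the push-then-pop strategy of a pushdown transducer."""
    stack = []
    for ch in s:             # reading phase: push each input symbol
        stack.append(ch)
    out = []
    while stack:             # emitting phase: pop in last-in, first-out order
        out.append(stack.pop())
    return "".join(out)

assert reverse_with_stack("abca") == "acba"
```
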
  • End-to-end Graph-based TAG Parsing with Neural Networks

    NAACL 2018

    We present a graph-based Tree Adjoining Grammar (TAG) parser that uses BiLSTMs, highway connections, and character-level CNNs. Our best end-to-end parser, which jointly performs supertagging, POS tagging, and parsing, outperforms the previously reported best results by more than 2.2 LAS and UAS points. The graph-based parsing architecture allows for global inference and rich feature representations for TAG parsing, alleviating the fundamental trade-off between transition-based and graph-based parsing systems. We also demonstrate that the proposed parser achieves state-of-the-art performance in the downstream tasks of Parsing Evaluation using Textual Entailments (PETE) and Unbounded Dependency Recovery. This provides further support for the claim that TAG is a viable formalism for problems that require rich structural analysis of sentences.

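The graph-based scoring idea above can be reduced to a toy sketch: score every head-dependent pair with a bilinear function of token encodings, then pick each token's best head. This is a deliberately simplified stand-in (random placeholder encodings, greedy selection instead of global inference, no supertagging or POS tagging) and not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, dim = 6, 16
H = rng.normal(size=(n_tokens, dim))   # placeholder token encodings (BiLSTM states in a real parser)
U = rng.normal(size=(dim, dim))        # bilinear arc-scoring weights

scores = H @ U @ H.T                   # scores[d, h]: how good token h is as the head of token d
np.fill_diagonal(scores, -np.inf)      # a token may not head itself
heads = scores.argmax(axis=1)          # greedy head choice; a real parser decodes a tree globally
print(heads)
```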

Courses

  • Advanced NLP

    CPSC 677

  • Algorithms

    CPSC 365

  • Computational Complexity Theory

    CPSC 468

  • Deep Learning Theory and Applications

    CPSC 663

  • Formal Foundations of Linguistic Theories

    LING 224

  • Introduction to Analysis

    MATH 301

  • Introduction to Systems Programming and Computer Organization

    CPSC 323

  • NLP

    CPSC 477

  • Neural Networks and Language

    LING 380

  • Semantics I

    LING 263

  • Syntax I

    LING 253

  • Vector Calculus and Linear Algebra

    MATH 230/231

Projects

  • The Book of Thoth

    January 2016 - Present

    In January 2016, Toby Jaroslaw and I began work on the Book of Thoth, an indie game that has players write spells to get past enemies and puzzles. The game uses natural language processing techniques to interpret words (written in a language based on ancient Egyptian) as spells. There are no preset spells, so players must create their own by combining the words they have unlocked in dynamic ways. I created the spell interpreter system, core game engine components such as hit detection and texture blending, and much of the level design and gameplay. The game is written from scratch in Java.

  • DeepMusic

    A generative neural network architecture for composing music from a single song.

  • Voynich2Vec

    Used word embedding techniques to analyze the Voynich manuscript, an undeciphered medieval text held in Yale's Beinecke Library. We believe we have successfully identified morphological alternations in the manuscript.

    Other creators
    • Eli Baum
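For a sense of the approach, the sketch below trains gensim word vectors on a couple of made-up lines in a Voynich-style transliteration; the real project presumably trained on a full transliteration of the manuscript, and the tokens and parameters here are only illustrative.

```python
from gensim.models import Word2Vec

# hypothetical tokenized lines in an EVA-style transliteration (illustrative only)
lines = [
    ["daiin", "chedy", "qokeedy", "shedy"],
    ["qokaiin", "chedy", "daiin", "okaiin"],
]
model = Word2Vec(lines, vector_size=50, window=3, min_count=1, epochs=50)

# nearest neighbours in embedding space can surface candidate
# morphological alternations (shared stems with different affixes)
print(model.wv.most_similar("daiin", topn=3))
```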

Honors & Awards

  • Grace Hopper Prize for Computer Science Finalist

    Yale University Computer Science Department

    My team designed a deep learning model that picks winning teams in the video game Dota 2, for which we were named a finalist in Yale's Grace Hopper Prize for Computer Science.

    After the prize, we went on to build a live web app demo for the project.

    Live demo: http://draftnet.herokuapp.com/

  • Rising Scientist Award

    Child Mind Institute

    Awarded for my research on the neurolinguistics of texting acronyms done during high school.

  • Keynote Speaker at Packer Science Research Symposium 2018

    Packer Collegiate Institute

    I gave the keynote speech at the research symposium at my former high school. Former speakers had all been tenured professors.

Languages

  • Icelandic

    Limited working proficiency

  • Latin

    Full professional proficiency

  • Old Norse

    Full professional proficiency

  • Old English (ca. 450-1100)

    Full professional proficiency
