Sequence to sequence learning with neural networks

I Sutskever, O Vinyals, QV Le - Advances in neural …, 2014 - proceedings.neurips.cc
Deep Neural Networks (DNNs) are powerful models that have achieved excellent
performance on difficult learning tasks. Although DNNs work well whenever large labeled …

Convolutional sequence to sequence learning

J Gehring, M Auli, D Grangier … - International …, 2017 - proceedings.mlr.press
The prevalent approach to sequence to sequence learning maps an input sequence to a
variable length output sequence via recurrent neural networks. We introduce an architecture …
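
The recurrent baseline that this snippet contrasts with can be made concrete with a minimal encoder-decoder sketch in PyTorch; the module choices and sizes are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class Seq2SeqRNN(nn.Module):
    """Minimal recurrent encoder-decoder: the baseline the snippet refers to."""
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source; keep only the final (h, c) state as a fixed summary.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the (shifted) target sequence conditioned on that summary.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # logits over the target vocabulary

logits = Seq2SeqRNN(1000, 1000)(torch.randint(0, 1000, (2, 7)),
                                torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```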

Semi-supervised sequence learning

AM Dai, QV Le - Advances in neural information processing …, 2015 - proceedings.neurips.cc
We present two approaches to use unlabeled data to improve Sequence Learning with
recurrent networks. The first approach is to predict what comes next in a sequence, which is …
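
The first approach mentioned here, predicting what comes next in a sequence, is next-step prediction on unlabeled text; a minimal sketch of that objective, with the single-layer LSTM and all sizes chosen purely for illustration.

```python
import torch
import torch.nn as nn

# Next-step prediction on unlabeled sequences (illustrative sizes).
vocab, hidden = 5000, 128
embed = nn.Embedding(vocab, hidden)
lstm = nn.LSTM(hidden, hidden, batch_first=True)
head = nn.Linear(hidden, vocab)

tokens = torch.randint(0, vocab, (4, 20))        # a batch of unlabeled sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens <= t

states, _ = lstm(embed(inputs))
loss = nn.functional.cross_entropy(
    head(states).reshape(-1, vocab), targets.reshape(-1))
loss.backward()  # the pretrained weights can then initialise a supervised sequence model
```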

Sequence-to-sequence learning with latent neural grammars

Y Kim - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Sequence-to-sequence learning with neural networks has become the de facto standard for
sequence modeling. This approach typically models the local distribution over the next …

Grid long short-term memory

N Kalchbrenner, I Danihelka, A Graves - arXiv preprint arXiv:1507.01526, 2015 - arxiv.org
This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a
multidimensional grid that can be applied to vectors, sequences or higher dimensional data …

Von Mises-Fisher loss for training sequence to sequence models with continuous outputs

S Kumar, Y Tsvetkov - arXiv preprint arXiv:1812.04616, 2018 - arxiv.org
The Softmax function is used in the final layer of nearly all existing sequence-to-sequence
models for language generation. However, it is usually the slowest layer to compute, which …
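
The layer the snippet refers to is a projection to the full vocabulary followed by a softmax at every output position; a small sketch of that step shows why its cost grows linearly with vocabulary size (the paper's continuous-output alternative is not implemented here).

```python
import torch
import torch.nn as nn

hidden, vocab = 512, 50000  # illustrative sizes
proj = nn.Linear(hidden, vocab)

dec_states = torch.randn(32, 20, hidden)         # a batch of decoder hidden states
probs = torch.softmax(proj(dec_states), dim=-1)  # (32, 20, 50000): the expensive step
print(probs.shape)
```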

Classical structured prediction losses for sequence to sequence learning

S Edunov, M Ott, M Auli, D Grangier … - arXiv preprint arXiv …, 2017 - arxiv.org
There has been much recent work on training neural attention models at the sequence level,
either with reinforcement learning-style methods or by optimizing the beam. In this paper …

Compressive transformers for long-range sequence modelling

JW Rae, A Potapenko, SM Jayakumar … - arXiv preprint arXiv …, 2019 - arxiv.org
We present the Compressive Transformer, an attentive sequence model which compresses
past memories for long-range sequence learning. We find the Compressive Transformer …
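
One way to read "compresses past memories": activations that would fall out of a bounded short-term memory are pooled into a coarser long-range memory instead of being discarded. A hedged sketch, with average pooling as an assumed stand-in for the compression function:

```python
import torch
import torch.nn.functional as F

def update_memories(memory, comp_memory, new_states, mem_size=6, rate=2):
    """Append new activations; pool whatever overflows into a compressed memory."""
    memory = torch.cat([memory, new_states], dim=0)            # (time, dim)
    if memory.size(0) > mem_size:
        evicted, memory = memory[:-mem_size], memory[-mem_size:]
        pooled = F.avg_pool1d(evicted.t().unsqueeze(0), rate)  # compress by factor `rate`
        comp_memory = torch.cat([comp_memory, pooled.squeeze(0).t()], dim=0)
    return memory, comp_memory

d = 16
mem, cmem = torch.zeros(0, d), torch.zeros(0, d)
for _ in range(5):
    mem, cmem = update_memories(mem, cmem, torch.randn(4, d))
print(mem.shape, cmem.shape)  # short-term memory stays bounded; compressed memory grows slowly
```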

Muse: Parallel multi-scale attention for sequence to sequence learning

G Zhao, X Sun, J Xu, Z Zhang, L Luo - arXiv preprint arXiv:1911.09483, 2019 - arxiv.org
In sequence to sequence learning, the self-attention mechanism proves to be highly
effective and achieves significant improvements in many tasks. However, the self-attention …
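
For reference, the mechanism the snippet credits is scaled dot-product self-attention; a single-head, unmasked sketch follows (Muse's parallel multi-scale additions are not shown).

```python
import math
import torch
import torch.nn as nn

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over x of shape (batch, length, dim)."""
    q, k, v = w_q(x), w_k(x), w_v(x)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # pairwise token similarities
    return torch.softmax(scores, dim=-1) @ v                  # weighted mixture of values

d = 64
x = torch.randn(2, 10, d)
out = self_attention(x, nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d))
print(out.shape)  # torch.Size([2, 10, 64])
```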

Recurrent residual learning for sequence classification

Y Wang, F Tian - Proceedings of the 2016 conference on …, 2016 - aclanthology.org
In this paper, we explore the possibility of leveraging Residual Networks (ResNet), a
powerful structure for constructing extremely deep neural networks for image understanding …
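
The residual idea carried over from ResNets can be sketched as an identity shortcut around a recurrent layer; the GRU cell and sizes below are assumptions for illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class ResidualGRULayer(nn.Module):
    """One recurrent layer with a ResNet-style skip: output = input + GRU(input)."""
    def __init__(self, dim):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):  # x: (batch, seq_len, dim)
        h, _ = self.gru(x)
        return x + h       # identity shortcut eases training of deep recurrent stacks

x = torch.randn(8, 30, 64)
stack = nn.Sequential(*[ResidualGRULayer(64) for _ in range(6)])
print(stack(x).shape)  # torch.Size([8, 30, 64])
```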