Sequence to sequence learning with neural networks
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled …
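For orientation, a minimal sketch of the encoder-decoder pattern this paper popularized: one recurrent network reads the source into a fixed-size state, and a second generates the target conditioned on that state. PyTorch is assumed, and all names and sizes (Seq2Seq, SRC_VOCAB, HID, etc.) are illustrative rather than the paper's configuration.

```python
# Minimal encoder-decoder sketch, assuming PyTorch; names/sizes illustrative.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1000, 64, 128

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))       # source -> fixed-size (h, c)
        dec, _ = self.decoder(self.tgt_emb(tgt), state)  # condition decoder on it
        return self.out(dec)                             # per-step target logits

src = torch.randint(0, SRC_VOCAB, (2, 7))   # batch of source token ids
tgt = torch.randint(0, TGT_VOCAB, (2, 5))   # shifted target token ids
logits = Seq2Seq()(src, tgt)                # shape (2, 5, TGT_VOCAB)
```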
Convolutional sequence to sequence learning
The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture …
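As a hedged sketch of the architectural shift the snippet alludes to: recurrence is replaced by stacked 1-D convolutions with gating, so every source position can be processed in parallel. Kernel size and channel count below are assumptions for illustration, not the paper's settings.

```python
# Gated 1-D convolution layer in place of recurrence, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoderLayer(nn.Module):
    def __init__(self, channels, kernel=3):
        super().__init__()
        # Twice the channels feed the gated linear unit (GLU).
        self.conv = nn.Conv1d(channels, 2 * channels, kernel, padding=kernel // 2)

    def forward(self, x):                        # x: (batch, channels, time)
        return x + F.glu(self.conv(x), dim=1)    # residual + gating

x = torch.randn(2, 64, 10)        # embedded source, channels-first
h = ConvEncoderLayer(64)(x)       # same shape; all positions computed in parallel
```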
Semi-supervised sequence learning
We present two approaches that use unlabeled data to improve Sequence Learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is …
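The first approach in this snippet, next-step prediction on unlabeled text, amounts to language-model pretraining; a small sketch under PyTorch, with illustrative vocabulary and layer sizes, might look like this. The pretrained embedding and LSTM weights would then initialize the supervised sequence model.

```python
# Next-step prediction on unlabeled sequences (language-model pretraining),
# assuming PyTorch; sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, emb_dim, hid = 5000, 64, 128
emb = nn.Embedding(vocab, emb_dim)
lstm = nn.LSTM(emb_dim, hid, batch_first=True)
head = nn.Linear(hid, vocab)

tokens = torch.randint(0, vocab, (4, 20))        # a batch of unlabeled text
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict each next token
hidden, _ = lstm(emb(inputs))
loss = F.cross_entropy(head(hidden).reshape(-1, vocab), targets.reshape(-1))
loss.backward()   # trained emb/lstm weights later initialize the supervised model
```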
Sequence-to-sequence learning with latent neural grammars
Y Kim - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
Sequence-to-sequence learning with neural networks has become the de facto standard for sequence modeling. This approach typically models the local distribution over the next …
Grid long short-term memory
This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a multidimensional grid that can be applied to vectors, sequences or higher dimensional data …
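A much simplified sketch of the grid idea: at each grid point, a separate LSTM cell updates a memory along each dimension, with every cell reading the concatenated incoming hidden states. This is a hedged two-dimensional illustration, not the paper's N-dimensional formulation.

```python
# Two-dimensional toy version: one LSTM cell per dimension, each updating
# its own (h, c) from the concatenated incoming hidden states. Assumes PyTorch.
import torch
import torch.nn as nn

dim = 16
cell_time = nn.LSTMCell(2 * dim, dim)    # updates the state along "time"
cell_depth = nn.LSTMCell(2 * dim, dim)   # updates the state along "depth"

h_t = c_t = torch.zeros(1, dim)          # state arriving along the time axis
h_d = c_d = torch.zeros(1, dim)          # state arriving along the depth axis
joint = torch.cat([h_t, h_d], dim=1)     # every cell sees both hidden states
h_t, c_t = cell_time(joint, (h_t, c_t))
h_d, c_d = cell_depth(joint, (h_d, c_d))
```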
Von Mises-Fisher loss for training sequence to sequence models with continuous outputs
S Kumar, Y Tsvetkov - arXiv preprint arXiv:1812.04616, 2018 - arxiv.org
The Softmax function is used in the final layer of nearly all existing sequence-to-sequence models for language generation. However, it is usually the slowest layer to compute, which …
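The paper's proposal is to have the decoder emit a continuous vector per step and train it toward the target word's pretrained, frozen embedding, removing the softmax entirely. In the sketch below, cosine distance stands in for the full von Mises-Fisher likelihood; that substitution, and all sizes, are assumptions for brevity.

```python
# Continuous-output decoding sketch, assuming PyTorch: regress toward the
# target word's embedding instead of computing a softmax. Cosine distance
# substitutes for the von Mises-Fisher likelihood; sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

hid, emb_dim, vocab = 128, 300, 50000
proj = nn.Linear(hid, emb_dim)                   # decoder state -> embedding space
word_vecs = F.normalize(torch.randn(vocab, emb_dim), dim=1)  # pretrained, frozen

state = torch.randn(8, hid)                      # decoder states for 8 positions
target = torch.randint(0, vocab, (8,))           # gold next words
pred = F.normalize(proj(state), dim=1)
loss = (1 - F.cosine_similarity(pred, word_vecs[target])).mean()

# Decoding becomes a nearest-neighbour search instead of an argmax over logits.
next_word = (pred @ word_vecs.T).argmax(dim=1)
```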
Classical structured prediction losses for sequence to sequence learning
There has been much recent work on training neural attention models at the sequence level using either reinforcement learning-style methods or by optimizing the beam. In this paper …
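One classical sequence-level objective in this line of work is expected risk: weight each candidate output (e.g. from a beam) by its model probability and by a task cost such as 1 - BLEU, then minimize the expectation. A toy sketch with dummy scores and costs:

```python
# Expected-risk toy example, assuming PyTorch; scores and costs are dummies.
import torch

scores = torch.randn(5, requires_grad=True)          # model scores, 5 candidates
costs = torch.tensor([0.1, 0.4, 0.0, 0.7, 0.3])      # task cost per candidate
risk = (torch.softmax(scores, dim=0) * costs).sum()  # probability-weighted cost
risk.backward()   # gradients push mass toward low-cost candidates
```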
Compressive transformers for long-range sequence modelling
We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer …
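The compression step described here can be pictured as follows: activations that would fall out of the attention window are pooled into a shorter compressed memory instead of being discarded. Strided average pooling below stands in for the paper's learned compression functions; shapes are illustrative.

```python
# Memory compression sketch, assuming PyTorch; average pooling stands in
# for the paper's learned compression functions.
import torch
import torch.nn.functional as F

window, rate, d_model = 8, 2, 16
old_mem = torch.randn(1, window, d_model)   # activations leaving the window

# Compress along time: (batch, time, dim) -> (batch, time // rate, dim).
compressed = F.avg_pool1d(old_mem.transpose(1, 2), kernel_size=rate).transpose(1, 2)
print(compressed.shape)                     # torch.Size([1, 4, 16])
```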
Muse: Parallel multi-scale attention for sequence to sequence learning
In sequence to sequence learning, the self-attention mechanism proves to be highly effective and achieves significant improvements in many tasks. However, the self-attention …
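A simplified stand-in for the "parallel multi-scale" idea: run self-attention (global scale) and a convolution (fixed local scale) side by side on the same input and combine the branches residually. This is an assumption-laden sketch, not the exact MUSE block.

```python
# Parallel global (self-attention) and local (convolution) branches over the
# same input, combined residually. Assumes PyTorch.
import torch
import torch.nn as nn

d_model, heads = 64, 4
attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)

x = torch.randn(2, 10, d_model)                         # (batch, time, dim)
global_branch, _ = attn(x, x, x)                        # long-range interactions
local_branch = conv(x.transpose(1, 2)).transpose(1, 2)  # short-range patterns
y = x + global_branch + local_branch                    # combine the scales
```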
Recurrent residual learning for sequence classification
In this paper, we explore the possibility of leveraging Residual Networks (ResNet), a powerful structure for constructing extremely deep neural networks for image understanding …
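A brief sketch of the transfer this snippet describes: ResNet-style skip connections placed around stacked recurrent layers, so deeper sequence classifiers stay trainable. PyTorch is assumed; depth and width are illustrative.

```python
# Residual connections around stacked recurrent layers, assuming PyTorch.
import torch
import torch.nn as nn

class ResidualGRUStack(nn.Module):
    def __init__(self, dim, depth=4):
        super().__init__()
        self.layers = nn.ModuleList(nn.GRU(dim, dim, batch_first=True)
                                    for _ in range(depth))

    def forward(self, x):                 # x: (batch, time, dim)
        for gru in self.layers:
            out, _ = gru(x)
            x = x + out                   # skip connection around each layer
        return x

h = ResidualGRUStack(32)(torch.randn(2, 15, 32))
```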