Sequence to sequence learning with neural networks

I Sutskever, O Vinyals, QV Le - Advances in neural …, 2014 - proceedings.neurips.cc
Deep Neural Networks (DNNs) are powerful models that have achieved excellent
performance on difficult learning tasks. Although DNNs work well whenever large labeled …

Convolutional sequence to sequence learning

J Gehring, M Auli, D Grangier … - International …, 2017 - proceedings.mlr.press
The prevalent approach to sequence to sequence learning maps an input sequence to a
variable length output sequence via recurrent neural networks. We introduce an architecture …
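
The recurrent baseline that this snippet contrasts with can be made concrete with a minimal encoder-decoder sketch in PyTorch; the module choices and sizes are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class Seq2SeqRNN(nn.Module):
    """Minimal recurrent encoder-decoder: the baseline the snippet refers to."""
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source; keep only the final (h, c) state as a fixed summary.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the (shifted) target sequence conditioned on that summary.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # logits over the target vocabulary

logits = Seq2SeqRNN(1000, 1000)(torch.randint(0, 1000, (2, 7)),
                                torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```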

Semi-supervised sequence learning

AM Dai, QV Le - Advances in neural information processing …, 2015 - proceedings.neurips.cc
We present two approaches to use unlabeled data to improve Sequence Learning with
recurrent networks. The first approach is to predict what comes next in a sequence, which is …
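
The first approach mentioned here, predicting what comes next in a sequence, is next-step prediction on unlabeled text; a minimal sketch of that objective, with the single-layer LSTM and all sizes chosen purely for illustration.

```python
import torch
import torch.nn as nn

# Next-step prediction on unlabeled sequences (illustrative sizes).
vocab, hidden = 5000, 128
embed = nn.Embedding(vocab, hidden)
lstm = nn.LSTM(hidden, hidden, batch_first=True)
head = nn.Linear(hidden, vocab)

tokens = torch.randint(0, vocab, (4, 20))        # a batch of unlabeled sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens <= t

states, _ = lstm(embed(inputs))
loss = nn.functional.cross_entropy(
    head(states).reshape(-1, vocab), targets.reshape(-1))
loss.backward()  # the pretrained weights can then initialise a supervised sequence model
```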

Sequence-to-sequence learning with latent neural grammars

Y Kim - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Sequence-to-sequence learning with neural networks has become the de facto standard for
sequence modeling. This approach typically models the local distribution over the next …

Grid long short-term memory

N Kalchbrenner, I Danihelka, A Graves - arXiv preprint arXiv:1507.01526, 2015 - arxiv.org
This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a
multidimensional grid that can be applied to vectors, sequences or higher dimensional data …

Von Mises-Fisher loss for training sequence to sequence models with continuous outputs

S Kumar, Y Tsvetkov - arXiv preprint arXiv:1812.04616, 2018 - arxiv.org
The Softmax function is used in the final layer of nearly all existing sequence-to-sequence
models for language generation. However, it is usually the slowest layer to compute, which …
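
The layer the snippet refers to is a projection to the full vocabulary followed by a softmax at every output position; a small sketch of that step shows why its cost grows linearly with vocabulary size (the paper's continuous-output alternative is not implemented here).

```python
import torch
import torch.nn as nn

hidden, vocab = 512, 50000  # illustrative sizes
proj = nn.Linear(hidden, vocab)

dec_states = torch.randn(32, 20, hidden)         # a batch of decoder hidden states
probs = torch.softmax(proj(dec_states), dim=-1)  # (32, 20, 50000): the expensive step
print(probs.shape)
```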

Classical structured prediction losses for sequence to sequence learning

S Edunov, M Ott, M Auli, D Grangier … - arXiv preprint arXiv …, 2017 - arxiv.org
There has been much recent work on training neural attention models at the sequence level,
either with reinforcement learning-style methods or by optimizing the beam. In this paper …

Compressive transformers for long-range sequence modelling

JW Rae, A Potapenko, SM Jayakumar … - arXiv preprint arXiv …, 2019 - arxiv.org
We present the Compressive Transformer, an attentive sequence model which compresses
past memories for long-range sequence learning. We find the Compressive Transformer …
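
One way to read "compresses past memories": activations that would fall out of a bounded short-term memory are pooled into a coarser long-range memory instead of being discarded. A hedged sketch, with average pooling as an assumed stand-in for the compression function:

```python
import torch
import torch.nn.functional as F

def update_memories(memory, comp_memory, new_states, mem_size=6, rate=2):
    """Append new activations; pool whatever overflows into a compressed memory."""
    memory = torch.cat([memory, new_states], dim=0)            # (time, dim)
    if memory.size(0) > mem_size:
        evicted, memory = memory[:-mem_size], memory[-mem_size:]
        pooled = F.avg_pool1d(evicted.t().unsqueeze(0), rate)  # compress by factor `rate`
        comp_memory = torch.cat([comp_memory, pooled.squeeze(0).t()], dim=0)
    return memory, comp_memory

d = 16
mem, cmem = torch.zeros(0, d), torch.zeros(0, d)
for _ in range(5):
    mem, cmem = update_memories(mem, cmem, torch.randn(4, d))
print(mem.shape, cmem.shape)  # short-term memory stays bounded; compressed memory grows slowly
```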

Muse: Parallel multi-scale attention for sequence to sequence learning

G Zhao, X Sun, J Xu, Z Zhang, L Luo - arXiv preprint arXiv:1911.09483, 2019 - arxiv.org
In sequence to sequence learning, the self-attention mechanism proves to be highly
effective and achieves significant improvements in many tasks. However, the self-attention …
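
For reference, the mechanism the snippet credits is scaled dot-product self-attention; a single-head, unmasked sketch follows (Muse's parallel multi-scale additions are not shown).

```python
import math
import torch
import torch.nn as nn

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over x of shape (batch, length, dim)."""
    q, k, v = w_q(x), w_k(x), w_v(x)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # pairwise token similarities
    return torch.softmax(scores, dim=-1) @ v                  # weighted mixture of values

d = 64
x = torch.randn(2, 10, d)
out = self_attention(x, nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d))
print(out.shape)  # torch.Size([2, 10, 64])
```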

Recurrent residual learning for sequence classification

Y Wang, F Tian - Proceedings of the 2016 conference on …, 2016 - aclanthology.org
In this paper, we explore the possibility of leveraging Residual Networks (ResNet), a
powerful structure for constructing extremely deep neural networks for image understanding …
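
The residual idea carried over from ResNets can be sketched as an identity shortcut around a recurrent layer; the GRU cell and sizes below are assumptions for illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class ResidualGRULayer(nn.Module):
    """One recurrent layer with a ResNet-style skip: output = input + GRU(input)."""
    def __init__(self, dim):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):  # x: (batch, seq_len, dim)
        h, _ = self.gru(x)
        return x + h       # identity shortcut eases training of deep recurrent stacks

x = torch.randn(8, 30, 64)
stack = nn.Sequential(*[ResidualGRULayer(64) for _ in range(6)])
print(stack(x).shape)  # torch.Size([8, 30, 64])
```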