XLNet: Generalized autoregressive pretraining for language understanding

Z Yang, Z Dai, Y Yang, J Carbonell… - Advances in neural…, 2019 - proceedings.neurips.cc
With the capability of modeling bidirectional contexts, denoising autoencoding based
pretraining like BERT achieves better performance than pretraining approaches based on�…

MPNet: Masked and permuted pre-training for language understanding

K Song, X Tan, T Qin, J Lu… - Advances in neural…, 2020 - proceedings.neurips.cc
BERT adopts masked language modeling (MLM) for pre-training and is one of the most
successful pre-training models. Since BERT neglects dependency among predicted tokens�…
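
The masked language modeling (MLM) objective referred to in this snippet can be summarized with a small illustrative sketch. The function below is written for this listing (the name mlm_corrupt and its parameters are hypothetical, not code from the cited papers) and follows the commonly described BERT recipe: roughly 15% of tokens are selected for prediction; of those, 80% become [MASK], 10% become a random token, and 10% are left unchanged.

    import random

    def mlm_corrupt(tokens, vocab, mask_token="[MASK]", select_prob=0.15):
        # Illustrative BERT-style MLM corruption (a sketch, not the papers' code).
        # Returns the corrupted sequence plus the targets the model must predict.
        corrupted, targets = list(tokens), [None] * len(tokens)
        for i, tok in enumerate(tokens):
            if random.random() < select_prob:
                targets[i] = tok                          # this position is predicted
                r = random.random()
                if r < 0.8:
                    corrupted[i] = mask_token             # 80%: replace with [MASK]
                elif r < 0.9:
                    corrupted[i] = random.choice(vocab)   # 10%: replace with random token
                # remaining 10%: keep the original token
        return corrupted, targets

As the snippet notes, predictions under this scheme are made independently per position; MPNet's motivation is precisely that BERT neglects the dependency among the predicted tokens.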

Can you tell me how to get past Sesame Street? Sentence-level pretraining beyond language modeling

A Wang, J Hula, P Xia, R Pappagari, RT McCoy… - arXiv preprint arXiv…, 2018 - arxiv.org
Natural language understanding has recently seen a surge of progress with the use of
sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which�…

StructBERT: Incorporating language structures into pre-training for deep language understanding

W Wang, B Bi, M Yan, C Wu, Z Bao, J Xia… - arXiv preprint arXiv…, 2019 - arxiv.org
Recently, the pre-trained language model, BERT (and its robustly optimized version
RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and�…

GLM: General language model pretraining with autoregressive blank infilling

Z Du, Y Qian, X Liu, M Ding, J Qiu, Z Yang… - arXiv preprint arXiv…, 2021 - arxiv.org
There have been various types of pretraining architectures including autoencoding models
(e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5)…

How to fine-tune BERT for text classification?

C Sun, X Qiu, Y Xu, X Huang - …: 18th China National Conference, CCL 2019…, 2019 - Springer
Language model pre-training has proven to be useful in learning universal
language representations. As a state-of-the-art language model pre-training model, BERT…
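
As a rough sketch of the fine-tuning setup this paper studies, the snippet below uses the Hugging Face transformers library (a library choice assumed here for illustration; the bert-base-uncased checkpoint, the 2e-5 learning rate, and the toy inputs are likewise assumptions, not taken from the paper). A classification head on top of pre-trained BERT is trained end-to-end on labeled text.

    import torch
    from torch.optim import AdamW
    from transformers import BertForSequenceClassification, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    optimizer = AdamW(model.parameters(), lr=2e-5)   # small learning rate is typical for fine-tuning

    texts = ["a great movie", "a dull movie"]        # toy examples for illustration
    labels = torch.tensor([1, 0])
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    outputs = model(**batch, labels=labels)          # cross-entropy loss over the classification head
    outputs.loss.backward()
    optimizer.step()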

AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv…, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These�…

How fine can fine-tuning be? Learning efficient language models

E Radiya-Dixit, X Wang - International Conference on…, 2020 - proceedings.mlr.press
State-of-the-art performance on language understanding tasks is now achieved with
increasingly large networks; the current record holder has billions of parameters. Given a�…

Prompt tuning for discriminative pre-trained language models

Y Yao, B Dong, A Zhang, Z Zhang, R Xie, Z Liu… - arXiv preprint arXiv…, 2022 - arxiv.org
Recent works have shown promising results of prompt tuning in stimulating pre-trained
language models (PLMs) for natural language processing (NLP) tasks. However, to the best�…
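
For context on what "prompt tuning" refers to, here is a generic soft-prompt sketch. It is a simplification written for this listing, not the discriminative-PLM method proposed in the paper above; the bert-base-uncased checkpoint and the prompt length of 8 are assumptions. A few trainable prompt embeddings are prepended to the input while the pre-trained model itself stays frozen.

    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    plm = BertModel.from_pretrained("bert-base-uncased")
    for p in plm.parameters():
        p.requires_grad = False                                        # the PLM is frozen

    n_prompt, hidden = 8, plm.config.hidden_size
    prompt = torch.nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)  # only trainable part

    batch = tokenizer(["prompt tuning keeps the encoder frozen"], return_tensors="pt")
    word_emb = plm.get_input_embeddings()(batch["input_ids"])          # (1, seq, hidden)
    inputs = torch.cat([prompt.unsqueeze(0), word_emb], dim=1)         # prepend soft prompts
    mask = torch.cat([torch.ones(1, n_prompt, dtype=torch.long),
                      batch["attention_mask"]], dim=1)
    out = plm(inputs_embeds=inputs, attention_mask=mask)               # a task head would sit on top

Only the prompt parameters receive gradients, which is what makes prompt tuning attractive compared with full fine-tuning.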

Sentence encoders on STILTs: Supplementary training on intermediate labeled-data tasks

J Phang, T Févry, SR Bowman - arXiv preprint arXiv:1811.01088, 2018 - arxiv.org
Pretraining sentence encoders with language modeling and related unsupervised tasks has
recently been shown to be very effective for language understanding tasks. By�…