XLNet: Generalized autoregressive pretraining for language understanding

Z Yang, Z Dai, Y Yang, J Carbonell… - Advances in neural…, 2019 - proceedings.neurips.cc
With the capability of modeling bidirectional contexts, denoising autoencoding based
pretraining like BERT achieves better performance than pretraining approaches based on�…

MPNet: Masked and permuted pre-training for language understanding

K Song, X Tan, T Qin, J Lu… - Advances in neural…, 2020 - proceedings.neurips.cc
BERT adopts masked language modeling (MLM) for pre-training and is one of the most
successful pre-training models. Since BERT neglects dependency among predicted tokens�…
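
The masked language modeling (MLM) objective referred to in this snippet can be summarized with a small illustrative sketch. The function below is written for this listing (the name mlm_corrupt and its parameters are hypothetical, not code from the cited papers) and follows the commonly described BERT recipe: roughly 15% of tokens are selected for prediction; of those, 80% become [MASK], 10% become a random token, and 10% are left unchanged.

    import random

    def mlm_corrupt(tokens, vocab, mask_token="[MASK]", select_prob=0.15):
        # Illustrative BERT-style MLM corruption (a sketch, not the papers' code).
        # Returns the corrupted sequence plus the targets the model must predict.
        corrupted, targets = list(tokens), [None] * len(tokens)
        for i, tok in enumerate(tokens):
            if random.random() < select_prob:
                targets[i] = tok                          # this position is predicted
                r = random.random()
                if r < 0.8:
                    corrupted[i] = mask_token             # 80%: replace with [MASK]
                elif r < 0.9:
                    corrupted[i] = random.choice(vocab)   # 10%: replace with random token
                # remaining 10%: keep the original token
        return corrupted, targets

As the snippet notes, predictions under this scheme are made independently per position; MPNet's motivation is precisely that BERT neglects the dependency among the predicted tokens.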

Can you tell me how to get past Sesame Street? Sentence-level pretraining beyond language modeling

A Wang, J Hula, P Xia, R Pappagari, RT McCoy… - arXiv preprint arXiv…, 2018 - arxiv.org
Natural language understanding has recently seen a surge of progress with the use of
sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which�…

StructBERT: Incorporating language structures into pre-training for deep language understanding

W Wang, B Bi, M Yan, C Wu, Z Bao, J Xia… - arXiv preprint arXiv…, 2019 - arxiv.org
Recently, the pre-trained language model, BERT (and its robustly optimized version
RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and�…

GLM: General language model pretraining with autoregressive blank infilling

Z Du, Y Qian, X Liu, M Ding, J Qiu, Z Yang… - arXiv preprint arXiv…, 2021 - arxiv.org
There have been various types of pretraining architectures including autoencoding models
(e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5)…

How to fine-tune BERT for text classification?

C Sun, X Qiu, Y Xu, X Huang - …: 18th China National Conference, CCL 2019…, 2019 - Springer
Language model pre-training has proven to be useful in learning universal
language representations. As a state-of-the-art language model pre-training model, BERT…
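
As a rough sketch of the fine-tuning setup this paper studies, the snippet below uses the Hugging Face transformers library (a library choice assumed here for illustration; the bert-base-uncased checkpoint, the 2e-5 learning rate, and the toy inputs are likewise assumptions, not taken from the paper). A classification head on top of pre-trained BERT is trained end-to-end on labeled text.

    import torch
    from torch.optim import AdamW
    from transformers import BertForSequenceClassification, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    optimizer = AdamW(model.parameters(), lr=2e-5)   # small learning rate is typical for fine-tuning

    texts = ["a great movie", "a dull movie"]        # toy examples for illustration
    labels = torch.tensor([1, 0])
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    outputs = model(**batch, labels=labels)          # cross-entropy loss over the classification head
    outputs.loss.backward()
    optimizer.step()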

AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv…, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These�…

How fine can fine-tuning be? Learning efficient language models

E Radiya-Dixit, X Wang - International Conference on…, 2020 - proceedings.mlr.press
State-of-the-art performance on language understanding tasks is now achieved with
increasingly large networks; the current record holder has billions of parameters. Given a�…

Prompt tuning for discriminative pre-trained language models

Y Yao, B Dong, A Zhang, Z Zhang, R Xie, Z Liu… - arXiv preprint arXiv…, 2022 - arxiv.org
Recent works have shown promising results of prompt tuning in stimulating pre-trained
language models (PLMs) for natural language processing (NLP) tasks. However, to the best�…
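
For context on what "prompt tuning" refers to, here is a generic soft-prompt sketch. It is a simplification written for this listing, not the discriminative-PLM method proposed in the paper above; the bert-base-uncased checkpoint and the prompt length of 8 are assumptions. A few trainable prompt embeddings are prepended to the input while the pre-trained model itself stays frozen.

    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    plm = BertModel.from_pretrained("bert-base-uncased")
    for p in plm.parameters():
        p.requires_grad = False                                        # the PLM is frozen

    n_prompt, hidden = 8, plm.config.hidden_size
    prompt = torch.nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)  # only trainable part

    batch = tokenizer(["prompt tuning keeps the encoder frozen"], return_tensors="pt")
    word_emb = plm.get_input_embeddings()(batch["input_ids"])          # (1, seq, hidden)
    inputs = torch.cat([prompt.unsqueeze(0), word_emb], dim=1)         # prepend soft prompts
    mask = torch.cat([torch.ones(1, n_prompt, dtype=torch.long),
                      batch["attention_mask"]], dim=1)
    out = plm(inputs_embeds=inputs, attention_mask=mask)               # a task head would sit on top

Only the prompt parameters receive gradients, which is what makes prompt tuning attractive compared with full fine-tuning.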

Sentence encoders on STILTs: Supplementary training on intermediate labeled-data tasks

J Phang, T Févry, SR Bowman - arXiv preprint arXiv:1811.01088, 2018 - arxiv.org
Pretraining sentence encoders with language modeling and related unsupervised tasks has
recently been shown to be very effective for language understanding tasks. By�…