XLNet: Generalized autoregressive pretraining for language understanding
With the capability of modeling bidirectional contexts, denoising autoencoding based
pretraining like BERT achieves better performance than pretraining approaches based on …
MPNet: Masked and permuted pre-training for language understanding
BERT adopts masked language modeling (MLM) for pre-training and is one of the most
successful pre-training models. Since BERT neglects dependency among predicted tokens …
Can you tell me how to get past Sesame Street? Sentence-level pretraining beyond language modeling
Natural language understanding has recently seen a surge of progress with the use of
sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which …
StructBERT: Incorporating language structures into pre-training for deep language understanding
Recently, the pre-trained language model, BERT (and its robustly optimized version
RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and …
GLM: General language model pretraining with autoregressive blank infilling
There have been various types of pretraining architectures including autoencoding models
(e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5) …
How to fine-tune BERT for text classification?
Language model pre-training has proven to be useful in learning universal
language representations. As a state-of-the-art language model pre-training model, BERT …
AMMUS: A survey of transformer-based pretrained models in natural language processing
KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …
How fine can fine-tuning be? Learning efficient language models
E Radiya-Dixit, X Wang - International Conference on …, 2020 - proceedings.mlr.press
State-of-the-art performance on language understanding tasks is now achieved with
increasingly large networks; the current record holder has billions of parameters. Given a …
Prompt tuning for discriminative pre-trained language models
Recent works have shown promising results of prompt tuning in stimulating pre-trained
language models (PLMs) for natural language processing (NLP) tasks. However, to the best …
Sentence encoders on STILTs: Supplementary training on intermediate labeled-data tasks
Pretraining sentence encoders with language modeling and related unsupervised tasks has
recently been shown to be very effective for language understanding tasks. By …