Learning transferable architectures for scalable image recognition
Developing neural network image classification models often requires significant
architecture engineering. In this paper, we study a method to learn the model architectures …
Neural architecture search with reinforcement learning
Neural networks are powerful and flexible models that work well for many difficult learning
tasks in image, speech and natural language understanding. Despite their success, neural …
EfficientDet: Scalable and efficient object detection
Abstract Model efficiency has become increasingly important in computer vision. In this
paper, we systematically study neural network architecture design choices for object …
Self-training with Noisy Student improves ImageNet classification
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet,
which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled …
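The snippet above names self-training but stops before the mechanism. Below is a minimal, hedged sketch of a generic teacher-student pseudo-labeling loop, using scikit-learn logistic regression on toy data purely for illustration; the confidence threshold and the additive input noise are invented for the example, whereas the paper trains large image classifiers with strong noise (data augmentation, dropout, stochastic depth).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Generic self-training sketch (not the paper's exact recipe): fit a teacher on
# labeled data, turn confident predictions on unlabeled data into pseudo-labels,
# then fit a student on the union with some noise added to its inputs.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(40, 5))
y_lab = (X_lab[:, 0] > 0).astype(int)          # toy labels
X_unl = rng.normal(size=(200, 5))              # unlabeled pool

teacher = LogisticRegression().fit(X_lab, y_lab)
probs = teacher.predict_proba(X_unl)
confident = probs.max(axis=1) > 0.9            # illustrative threshold
X_pseudo = X_unl[confident]
y_pseudo = probs[confident].argmax(axis=1)

# "Noisy student": here just additive input noise standing in for the paper's
# augmentation-based noise on a larger model.
X_noisy = X_pseudo + rng.normal(scale=0.1, size=X_pseudo.shape)
student = LogisticRegression().fit(np.vstack([X_lab, X_noisy]),
                                   np.concatenate([y_lab, y_pseudo]))
```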
Transformer-XL: Attentive language models beyond a fixed-length context
Transformer networks have the potential to learn longer-term dependencies, but are limited
by a fixed-length context in the setting of language modeling. As a solution, we propose a …
ELECTRA: Pre-training text encoders as discriminators rather than generators
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by
replacing some tokens with [MASK] and then train a model to reconstruct the original tokens …
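The snippet describes the masked language modeling (MLM) corruption that ELECTRA departs from. For reference, here is a minimal sketch of that corruption step on whitespace tokens; the function name, masking rate, and [MASK] handling are illustrative simplifications (real MLM pipelines also keep or randomize a portion of the selected tokens).

```python
import random

# Toy version of the MLM corruption step the snippet describes: replace a
# fraction of tokens with a [MASK] symbol and keep the originals as targets,
# so a model can be trained to reconstruct them. All names are illustrative.
def mask_tokens(tokens, mask_prob=0.15, mask_symbol="[MASK]", seed=0):
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(mask_symbol)
            targets.append(tok)      # this position is scored against the original token
        else:
            corrupted.append(tok)
            targets.append(None)     # this position is not scored
    return corrupted, targets

corrupted, targets = mask_tokens("the quick brown fox jumps over the lazy dog".split())
print(corrupted)
print(targets)
```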
SpecAugment: A simple data augmentation method for automatic speech recognition
We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank …
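The snippet says the augmentation is applied directly to the filter-bank feature inputs. A rough numpy sketch of the frequency and time masking steps follows; the parameter names and mask widths are illustrative, and the method's time-warping component is omitted.

```python
import numpy as np

# Sketch of two masking operations applied to filter-bank features: zero out a
# random band of frequency channels and a random span of time steps.
def spec_augment(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    rng = rng or np.random.default_rng(0)
    spec = spec.copy()                       # (time, freq) filter-bank features
    n_time, n_freq = spec.shape

    f = rng.integers(0, max_freq_mask + 1)   # width of the frequency mask
    f0 = rng.integers(0, n_freq - f + 1)
    spec[:, f0:f0 + f] = 0.0

    t = rng.integers(0, max_time_mask + 1)   # width of the time mask
    t0 = rng.integers(0, n_time - t + 1)
    spec[t0:t0 + t, :] = 0.0
    return spec

features = np.random.randn(100, 80)          # e.g. 100 frames x 80 mel bins
augmented = spec_augment(features)
```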
Searching for activation functions
The choice of activation functions in deep networks has a significant effect on the training
dynamics and task performance. Currently, the most successful and widely-used activation …
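The excerpt cuts off before the paper's result; the search it describes is best known for surfacing the Swish activation, f(x) = x * sigmoid(beta * x). A one-line numpy version, with beta = 1 as the commonly used setting:

```python
import numpy as np

# Swish activation: x * sigmoid(beta * x), written as x / (1 + exp(-beta * x)).
def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))

print(swish(np.linspace(-4.0, 4.0, 9)))
```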
Unsupervised data augmentation for consistency training
Semi-supervised learning has lately shown much promise in improving deep learning
models when labeled data is scarce. Common among recent approaches is the use of …
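The snippet is cut off, but the title points at consistency training: predictions on an unlabeled example and on an augmented copy of it are pushed to agree. A toy sketch of that objective, where the model, the augmentation, and the KL-based loss are stand-ins chosen only for illustration:

```python
import numpy as np

# Consistency-training objective in the spirit of the title: the predicted class
# distribution on an unlabeled input and on its augmented copy should agree.
def kl_divergence(p, q, eps=1e-8):
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def consistency_loss(model, x_unlabeled, augment):
    p_clean = model(x_unlabeled)           # distribution on the clean input
    p_aug = model(augment(x_unlabeled))    # distribution on the augmented input
    return kl_divergence(p_clean, p_aug)

# Toy usage: a fixed softmax "model" and an additive-noise "augmentation".
softmax = lambda z: np.exp(z) / np.exp(z).sum()
model = lambda x: softmax(x[:3])
augment = lambda x: x + np.random.default_rng(0).normal(scale=0.1, size=x.shape)
print(consistency_loss(model, np.array([0.2, 1.0, -0.5, 0.3]), augment))
```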
Distributed representations of sentences and documents
Many machine learning algorithms require the input to be represented as a fixed length
feature vector. When it comes to texts, one of the most common representations is bag-of-…
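For contrast with the fixed-length representation the snippet mentions, here is a toy bag-of-words vectorizer: each document becomes a count vector over a shared vocabulary, exactly the order-free representation paragraph vectors were proposed to improve on. Names and structure are illustrative.

```python
from collections import Counter

# Toy bag-of-words vectorizer: each document becomes a fixed-length vector of
# word counts over a shared vocabulary, discarding word order entirely.
def bag_of_words(docs):
    vocab = sorted({w for doc in docs for w in doc.split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        vectors.append([counts.get(w, 0) for w in vocab])
    return vocab, vectors

vocab, vecs = bag_of_words(["the cat sat", "the dog sat on the mat"])
print(vocab)   # shared vocabulary
print(vecs)    # one count vector per document
```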