Learning transferable architectures for scalable image recognition

B Zoph, V Vasudevan, J Shlens… - Proceedings of the …, 2018 - openaccess.thecvf.com
Developing neural network image classification models often requires significant
architecture engineering. In this paper, we study a method to learn the model architectures …

Neural architecture search with reinforcement learning

B Zoph, QV Le - arXiv preprint arXiv:1611.01578, 2016 - arxiv.org
Neural networks are powerful and flexible models that work well for many difficult learning
tasks in image, speech and natural language understanding. Despite their success, neural …

EfficientDet: Scalable and efficient object detection

M Tan, R Pang, QV Le - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
Model efficiency has become increasingly important in computer vision. In this
paper, we systematically study neural network architecture design choices for object …

Self-training with noisy student improves imagenet classification

Q Xie, MT Luong, E Hovy… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet,
which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled …
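
The self-training loop this entry refers to can be sketched in a few lines. The function names below are illustrative placeholders, not the paper's API, and the noise injection that gives "noisy student" its name is omitted:

```python
def self_train(teacher_predict, labeled, unlabeled, fit_student):
    """One round of self-training: a trained teacher pseudo-labels the
    unlabeled pool, then a student is fit on labeled + pseudo-labeled
    examples. Student noising (dropout, data augmentation) is omitted;
    all names here are illustrative, not the paper's API."""
    pseudo = [(x, teacher_predict(x)) for x in unlabeled]  # pseudo-labels
    return fit_student(labeled + pseudo)
```

In the paper this loop is iterated, with each trained student becoming the next round's teacher.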

Transformer-xl: Attentive language models beyond a fixed-length context

Z Dai, Z Yang, Y Yang, J Carbonell, QV Le… - arXiv preprint arXiv …, 2019 - arxiv.org
Transformer networks have the potential to learn longer-term dependencies, but are limited
by a fixed-length context in the setting of language modeling. As a solution, we propose a …
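
The segment-level recurrence idea behind that solution can be sketched as follows; attention itself is omitted and the helper name is ours, not the paper's:

```python
def segments_with_memory(tokens, seg_len):
    """Split a token stream into fixed-length segments and pair each one
    with the previous segment as reusable 'memory' -- the segment-level
    recurrence Transformer-XL uses to extend context past a fixed
    length. A sketch of the bookkeeping only, not the model."""
    segs = [tokens[i:i + seg_len] for i in range(0, len(tokens), seg_len)]
    # Each segment attends to its own positions plus the cached previous
    # segment; the first segment has no memory.
    return [(segs[i - 1] if i > 0 else [], segs[i]) for i in range(len(segs))]
```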

ELECTRA: Pre-training text encoders as discriminators rather than generators

K Clark, MT Luong, QV Le, CD Manning - arXiv preprint arXiv:2003.10555, 2020 - arxiv.org
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by
replacing some tokens with [MASK] and then train a model to reconstruct the original tokens …
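
The MLM corruption described here, which ELECTRA argues against, can be sketched as below. The masking rate and names are illustrative, and BERT's actual scheme additionally swaps in random or unchanged tokens for a fraction of the masked positions:

```python
import random

MASK = "[MASK]"

def corrupt_for_mlm(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK], BERT-style;
    returns the corrupted sequence plus a map from masked positions to
    the original tokens the model must reconstruct. Simplified sketch,
    not BERT's full 80/10/10 replacement policy."""
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok  # original token to predict here
        else:
            corrupted.append(tok)
    return corrupted, targets
```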

SpecAugment: A simple data augmentation method for automatic speech recognition

DS Park, W Chan, Y Zhang, CC Chiu, B Zoph… - arXiv preprint arXiv …, 2019 - arxiv.org
We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank …
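
The masking applied to those feature inputs can be sketched as zeroing contiguous frequency and time bands of a spectrogram. The band widths and single-mask policy below are an illustrative simplification, not the paper's full recipe (which also includes time warping):

```python
import numpy as np

def spec_augment(spec, freq_mask=8, time_mask=10, rng=None):
    """Zero one contiguous frequency band and one contiguous time band
    of a (freq, time) filter-bank spectrogram, SpecAugment-style.
    Mask widths are drawn uniformly up to the given maxima."""
    rng = rng if rng is not None else np.random.default_rng(0)
    out = spec.copy()
    n_freq, n_time = out.shape
    f = int(rng.integers(0, freq_mask + 1))            # width in mel bins
    f0 = int(rng.integers(0, max(1, n_freq - f + 1)))  # band start
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, time_mask + 1))            # width in frames
    t0 = int(rng.integers(0, max(1, n_time - t + 1)))  # band start
    out[:, t0:t0 + t] = 0.0
    return out
```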

Searching for activation functions

P Ramachandran, B Zoph, QV Le - arXiv preprint arXiv:1710.05941, 2017 - arxiv.org
The choice of activation functions in deep networks has a significant effect on the training
dynamics and task performance. Currently, the most successful and widely-used activation …
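
The search described here ultimately reported Swish, f(x) = x · sigmoid(βx), as its strongest candidate; a minimal definition:

```python
import math

def swish(x, beta=1.0):
    """Swish activation: f(x) = x * sigmoid(beta * x). With beta = 1 it
    behaves like a smooth, non-monotonic relative of ReLU."""
    return x / (1.0 + math.exp(-beta * x))
```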

Unsupervised data augmentation for consistency training

…, Z Dai, E Hovy, T Luong, Q Le - Advances in neural …, 2020 - proceedings.neurips.cc
Semi-supervised learning lately has shown much promise in improving deep learning
models when labeled data is scarce. Common among recent approaches is the use of …
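
The consistency-training objective shared by these approaches can be sketched as a divergence between the model's predictions on an unlabeled example and on its augmented version. This is a simplified, library-free KL term; UDA's full objective also includes a supervised loss and specific augmentation policies:

```python
import math

def consistency_loss(p_orig, p_aug):
    """KL(p_orig || p_aug) between predicted class distributions for an
    example and its augmented version. Assumes p_aug > 0 wherever
    p_orig > 0; zero when the two predictions agree."""
    return sum(p * math.log(p / q) for p, q in zip(p_orig, p_aug) if p > 0)
```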

Distributed representations of sentences and documents

Q Le, T Mikolov - International conference on machine …, 2014 - proceedings.mlr.press
Many machine learning algorithms require the input to be represented as a fixed-length
feature vector. When it comes to texts, one of the most common representations is bag-of…
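
The fixed-length representation this abstract contrasts with can be sketched as a simple count vector over a vocabulary, with word order and context discarded:

```python
from collections import Counter

def bag_of_words(doc_tokens, vocab):
    """Fixed-length bag-of-words vector: one count per vocabulary word,
    in vocabulary order. Word order and context are discarded -- the
    limitation Paragraph Vector was proposed to address."""
    counts = Counter(doc_tokens)
    return [counts.get(w, 0) for w in vocab]
```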