Jenny Hong, Kyle Hsu, Jing Huang, Thomas F. Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri,
Siddharth Karamcheti, G. Keeling, Fereshte Khani, O. Khattab, Pang Wei Koh, M. Krass, Ranjay
Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, J. Leskovec, Is-
abelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir P.
Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, D. Narayanan, Ben
Newman, Allen Nie, J. C. Niebles, H. Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel
Papadimitriou, Joon Sung Park, C. Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan,
Robert Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher
Ré, D. Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, K. Srinivasan, Alex Tamkin, Rohan
Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu,
Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, M. Zaharia, Michael Zhang,
Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, and Percy Liang. On
the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021.
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large
annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference
on Empirical Methods in Natural Language Processing, pp. 632–642, 2015.
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal,
Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel
Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler,
Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray,
Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever,
and Dario Amodei. Language models are few-shot learners. In Advances in Neural Information Processing Systems, 2020.
Duo Chai, Wei Wu, Qinghong Han, Fei Wu, and Jiwei Li. Description based text classification with
reinforcement learning. In Proceedings of the International Conference on Machine Learning, 2020.
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde, Jared Kaplan, Harri Edwards,
Yura Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke
Zettlemoyer. QuAC: Question answering in context. In Proceedings of the 2018 Conference
on Empirical Methods in Natural Language Processing, pp. 2174–2184, 2018.
Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina
Toutanova. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings
of the 2019 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 2924–2936, 2019a.
Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, and Quoc V. Le.
BAM! born-again multi-task networks for natural language understanding. In Proceedings of the
57th Annual Meeting of the Association for Computational Linguistics, pp. 5931–5937, 2019b.
Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and
Oyvind Tafjord. Think you have solved question answering? Try ARC, the AI2 reasoning challenge. arXiv preprint arXiv:1803.05457, 2018.
Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel
Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537, 2011.