ICL (in-context learning) papers

Theory

Von Oswald, Johannes, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. “Transformers learn in-context by gradient descent.” In International Conference on Machine Learning, pp. 35151-35174. PMLR, 2023. [Paper]

Ahn, Kwangjun, Xiang Cheng, Hadi Daneshmand, and Suvrit Sra. “Transformers learn to implement preconditioned gradient descent for in-context learning.” Advances in Neural Information Processing Systems 36 (2023). [Paper]

Fu, Deqing, Tian-Qi Chen, Robin Jia, and Vatsal Sharan. “Transformers learn higher-order optimization methods for in-context learning: A study with linear models.” arXiv preprint arXiv:2310.17086 (2023). [Paper]
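The core identity behind the von Oswald et al. paper above — that a single linear self-attention layer can reproduce one step of gradient descent on the in-context examples — can be checked numerically. The snippet below is a minimal sketch of that equivalence for a linear regression task with zero initialization; the learning rate, dimensions, and the specific query/key/value assignment are illustrative choices, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lr = 5, 20, 0.1  # input dim, number of in-context examples, step size

# In-context examples (x_i, y_i) drawn from a random linear task, plus a query.
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true
x_query = rng.normal(size=d)

# One GD step on L(w) = 1/2 * sum_i (w @ x_i - y_i)^2 starting from w = 0:
# the gradient at 0 is -sum_i y_i x_i, so w_1 = lr * sum_i y_i x_i.
w1 = lr * sum(y_i * x_i for x_i, y_i in zip(X, y))
pred_gd = w1 @ x_query

# The same prediction via unnormalized linear attention: keys are the x_i,
# the query is x_query, and values carry lr * y_i, so the attention output
# at the query token is sum_i (x_query @ x_i) * (lr * y_i).
pred_attn = sum((x_query @ x_i) * (lr * y_i) for x_i, y_i in zip(X, y))

assert np.allclose(pred_gd, pred_attn)
```

Both expressions reduce to the same bilinear form in the examples and the query, which is exactly why the attention layer can imitate the GD step.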

Functions ICL can learn

Garg, Shivam, Dimitris Tsipras, Percy S. Liang, and Gregory Valiant. “What can transformers learn in-context? A case study of simple function classes.” Advances in Neural Information Processing Systems 35 (2022). [Paper]

Li, Yingcong, Muhammed Emrullah Ildiz, Dimitris Papailiopoulos, and Samet Oymak. “Transformers as algorithms: Generalization and stability in in-context learning.” In International Conference on Machine Learning. PMLR, 2023. [Paper]

Li, Yingcong, Yixiao Huang, Muhammed E. Ildiz, Ankit Singh Rawat, and Samet Oymak. “Mechanics of next token prediction with self-attention.” In International Conference on Artificial Intelligence and Statistics. PMLR, 2024. [Paper]