# Ilya Top 30
Ilya Sutskever's curated reading list of 30 papers and resources spanning the conceptual foundations of deep learning, from architecture breakthroughs to information theory and complexity. The list circulated widely as a recommended curriculum for understanding the intellectual roots of modern AI.
## Canonical list
| # | Title | Year | Wiki page |
|---|---|---|---|
| 1 | The Annotated Transformer (Rush's annotated walkthrough of Attention Is All You Need) | 2018 | Attention Is All You Need |
| 2 | The First Law of Complexodynamics (Aaronson) | 2011 | The First Law Of Complexodynamics |
| 3 | The Unreasonable Effectiveness of Recurrent Neural Networks (Karpathy) | 2015 | The Unreasonable Effectiveness Of Recurrent Neural Networks |
| 4 | Understanding LSTM Networks (Olah) | 2015 | Understanding Lstm Networks |
| 5 | Recurrent Neural Network Regularization (Zaremba et al.) | 2014 | Recurrent Neural Network Regularization |
| 6 | Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (Hinton & van Camp) | 1993 | Keeping Neural Networks Simple By Minimizing Description Length |
| 7 | Pointer Networks (Vinyals et al.) | 2015 | Pointer Networks |
| 8 | ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) | 2012 | Imagenet Classification With Deep Convolutional Neural Networks |
| 9 | Order Matters: Sequence to Sequence for Sets (Vinyals et al.) | 2016 | Order Matters Sequence To Sequence For Sets |
| 10 | GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism (Huang et al.) | 2019 | Gpipe Efficient Training Of Giant Neural Nets |
| 11 | Deep Residual Learning for Image Recognition (ResNet) | 2015 | Deep Residual Learning For Image Recognition |
| 12 | Multi-Scale Context Aggregation by Dilated Convolutions (Yu & Koltun) | 2016 | Multi Scale Context Aggregation By Dilated Convolutions |
| 13 | Neural Message Passing for Quantum Chemistry (Gilmer et al.) | 2017 | Neural Message Passing For Quantum Chemistry |
| 14 | Attention Is All You Need (Vaswani et al.) | 2017 | Attention Is All You Need |
| 15 | Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al.) | 2014 | Neural Machine Translation By Jointly Learning To Align And Translate |
| 16 | Identity Mappings in Deep Residual Networks (He et al.) | 2016 | Identity Mappings In Deep Residual Networks |
| 17 | A Simple Neural Network Module for Relational Reasoning (Relation Networks) | 2017 | A Simple Neural Network Module For Relational Reasoning |
| 18 | Variational Lossy Autoencoder (Chen et al.) | 2017 | Variational Lossy Autoencoder |
| 19 | Relational Recurrent Neural Networks (Santoro et al.) | 2018 | Relational Recurrent Neural Networks |
| 20 | Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton (Aaronson, Carroll & Ouellette) | 2014 | Quantifying The Rise And Fall Of Complexity In Closed Systems |
| 21 | Neural Turing Machines (Graves et al.) | 2014 | Neural Turing Machines |
| 22 | Deep Speech 2 (Amodei et al.) | 2015 | Deep Speech 2 |
| 23 | Scaling Laws for Neural Language Models (Kaplan et al.) | 2020 | Scaling Laws For Neural Language Models |
| 24 | A Tutorial Introduction to the Minimum Description Length Principle (Grünwald) | 2004 | A Tutorial Introduction To The Minimum Description Length Principle |
| 25 | Machine Super Intelligence (Legg) | 2008 | Machine Super Intelligence |
| 26 | Kolmogorov Complexity and Algorithmic Randomness (Shen, Uspensky & Vereshchagin) | 2017 | Kolmogorov Complexity And Algorithmic Randomness |
| 27 | CS231n: Convolutional Neural Networks for Visual Recognition (Stanford course) | 2015 | Cs231N Convolutional Neural Networks For Visual Recognition |
| 28 | Denoising Diffusion Probabilistic Models (Ho et al.) | 2020 | Denoising Diffusion Probabilistic Models |
| 29 | An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT, Dosovitskiy et al.) | 2020 | An Image Is Worth 16X16 Words Transformers For Image Recognition At Scale |
| 30 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al.) | 2022 | Chain Of Thought Prompting Elicits Reasoning |
## Thematic clusters
- Architectures: Transformer (#1/#14), ResNet (#11/#16), AlexNet (#8), ViT (#29), GPipe (#10)
- Sequence modeling: RNNs (#3), LSTMs (#4), RNN regularization (#5), Pointer Networks (#7), Seq2Seq for sets (#9), Deep Speech 2 (#22)
- Attention and relational reasoning: Bahdanau attention (#15), Relation Networks (#17), Relational RNNs (#19); a minimal attention sketch follows this list
- Information theory and compression: MDL for neural nets (#6), Grünwald's MDL tutorial (#24), Kolmogorov complexity (#26)
- Complexity and intelligence: Complexodynamics (#2), Coffee Automaton (#20), Machine Super Intelligence (#25)
- Message passing and context aggregation: Neural Message Passing (#13), Dilated Convolutions (#12)
- Scaling and modern methods: Scaling Laws (#23), Diffusion Models (#28), Chain-of-Thought (#30)
- Memory and computation: Neural Turing Machines (#21)
- Courses: CS231n (#27)
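
As a concrete anchor for the attention cluster above, here is a minimal NumPy sketch of the scaled dot-product attention defined in Attention Is All You Need (#14), Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The shapes, function names, and toy data are illustrative assumptions, not reference code from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in #14.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity logits
    weights = softmax(scores, axis=-1)   # each query's distribution over keys
    return weights @ V                   # (n_queries, d_v) weighted sum of values

# Toy usage: 3 queries attending over 5 key/value pairs.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```

The 1/√d_k scaling keeps the logits' variance roughly independent of the key dimension, which stops the softmax from saturating as d_k grows; Bahdanau-style attention (#15) instead scores each query-key pair with a small feed-forward network.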
## Why this list matters
The list reveals Ilya's emphasis on compression, complexity, and information-theoretic foundations alongside practical architecture breakthroughs. The inclusion of the Kolmogorov complexity, MDL, and complexodynamics entries signals the view that intelligence is deeply connected to compression, a theme that runs through scaling laws and modern LLM capabilities.
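
The compression theme can be made concrete with the two-part code at the heart of MDL (#6, #24): prefer the model minimizing L(model) + L(data | model) in bits. Below is a minimal, self-contained sketch of that idea under loudly stated assumptions: a flat 32 bits per parameter, residuals encoded with a Gaussian Shannon code at a fixed quantization precision, and polynomial models standing in for neural networks. All names and constants are illustrative, not from either paper.

```python
import numpy as np

def residual_bits(residuals, precision=0.01):
    # Shannon code length for residuals under a fitted Gaussian,
    # quantized to `precision`: about -log2 p(r) - log2(precision) bits each.
    sigma = residuals.std() + 1e-12
    nats = 0.5 * np.log(2 * np.pi * sigma**2) + residuals**2 / (2 * sigma**2)
    return nats.sum() / np.log(2) - residuals.size * np.log2(precision)

def two_part_code_length(x, y, degree, bits_per_param=32):
    # MDL two-part code: L(model) + L(data | model), in bits.
    coeffs = np.polyfit(x, y, degree)
    return (degree + 1) * bits_per_param + residual_bits(y - np.polyval(coeffs, x))

# Toy usage: data from a noisy quadratic. The two-part code is shortest
# near the true degree; a degree-10 fit shrinks residuals slightly but
# pays more bits for its extra parameters than it saves.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 3 * x**2 - x + rng.normal(scale=0.1, size=x.size)
for degree in (1, 2, 10):
    print(degree, round(two_part_code_length(x, y, degree)))
```

On this toy data the degree-2 model yields the shortest total code, which is the MDL reading of overfitting: a richer model must compress the data by more than the cost of describing itself.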