Four-level explainers for deeply understanding research papers — from beginner to frontier. Each summary includes interactive quizzes to test your understanding.
One algorithm, zero game-specific knowledge, three superhuman games. AlphaZero defeated Stockfish at chess (28 wins, 0 losses, 72 draws), Elmo at shogi (90 wins, 8 losses), and AlphaGo Zero at Go (60-40), searching roughly 1,000× fewer positions but evaluating each one with deep neural-network understanding. Proved that the learning algorithm is domain-general.
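The "1,000× fewer positions" claim can be sanity-checked against the approximate search throughputs reported in the AlphaZero paper (about 70 million positions/sec for Stockfish versus about 80 thousand for AlphaZero); a back-of-envelope check:

```python
# Approximate search throughputs reported in the AlphaZero paper.
stockfish_positions_per_sec = 70_000_000
alphazero_positions_per_sec = 80_000

# AlphaZero examines roughly three orders of magnitude fewer positions,
# compensating with a much stronger learned evaluation of each one.
ratio = stockfish_positions_per_sec / alphazero_positions_per_sec  # ~875
```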
DeepMind's landmark paper on the system that defeated Lee Sedol 4-1, combining policy networks (move prediction), value networks (position evaluation), and Monte Carlo Tree Search. Proved that neural networks plus search can achieve superhuman performance in domains where brute-force search is infeasible.
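How the networks steer the search can be illustrated with the PUCT selection rule at the heart of AlphaGo-style MCTS: each move's score blends its observed mean value with an exploration bonus scaled by the policy network's prior. A minimal sketch (the `c_puct` constant and the toy visit counts are illustrative, not the paper's values):

```python
import math

def puct_score(parent_visits, child_visits, child_value_sum, prior, c_puct=1.5):
    """Simplified PUCT rule: mean value Q plus an exploration bonus U
    that is large for moves the policy network likes but search has
    barely visited, and shrinks as visits accumulate."""
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# An unvisited move with a strong policy prior outranks a well-explored
# move with a mediocre average value, pulling search toward promising lines.
explored = puct_score(parent_visits=100, child_visits=50,
                      child_value_sum=25.0, prior=0.1)
fresh = puct_score(parent_visits=100, child_visits=0,
                   child_value_sum=0.0, prior=0.6)
```

As visits to the fresh move accumulate, its bonus decays and its score converges toward its actual mean value, which is what lets the search self-correct.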
OpenAI's data-centric breakthrough — a custom-trained captioner re-describes every training image with rich detail, then retrains a standard diffusion model on these synthetic captions. 71.7% human preference over SDXL. Proved that data quality trumps model architecture for image generation.
Meta's unified multimodal model that tokenizes everything — text, images, code — into discrete tokens and trains a single transformer with next-token prediction. Beats GPT-4V on mixed-modal reasoning. The architecture that Transfusion argues against.
A single transformer that uses next-token prediction for text and diffusion denoising for images — matching DALL-E 2/SDXL on image generation and LLaMA-1 on text, at less than 1/3 the compute of discrete tokenization approaches.
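The two objectives above can be sketched as one combined loss: next-token cross-entropy over text positions plus a denoising MSE over image positions, weighted by a balance coefficient. A minimal sketch with toy flat lists standing in for real tensors; the function name and numbers are illustrative, not the paper's implementation:

```python
def transfusion_style_loss(text_logprobs, noise_pred, noise_true, lam=0.5):
    """Illustrative combined objective: language-modeling loss on text
    tokens plus lam-weighted denoising MSE on image-patch noise."""
    lm_loss = -sum(text_logprobs) / len(text_logprobs)
    mse = sum((p - t) ** 2 for p, t in zip(noise_pred, noise_true)) / len(noise_pred)
    return lm_loss + lam * mse

# Toy numbers: log-probs for two text tokens, predicted vs. true noise
# for two image-patch components.
loss = transfusion_style_loss([-0.5, -0.7], [0.1, -0.2], [0.0, 0.0])
```

Because a single transformer produces both outputs, gradients from the two terms flow through shared weights, which is where the claimed compute savings over separate discrete-token pipelines come from.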
A hardened version of MMMU that patches shortcut exploitation — filtering text-answerable questions, expanding to 10 options, and embedding questions in screenshots. Performance dropped 17–27% across all models.
The first comprehensive multimodal benchmark testing college-level expert reasoning — 11,500 questions across 30 subjects with 30 heterogeneous image types. GPT-4V scored 56% vs. human experts at 76–89%.
A benchmark of 900 videos (11s to 1 hour) with 2,700 expert-annotated questions across 6 domains — revealing that all models degrade on longer videos and that subtitles/audio significantly help.
A unified 2×2 framework for making AI agents better after pre-training. Four paradigms — A1, A2, T1, T2 — organize 100+ methods. Key finding: training smarter tools (T2) can match full agent retraining with 70× less data.
AI agents autonomously discover state-of-the-art adversarial attack algorithms by recombining existing methods — achieving 100% attack success on a hardened model.
Level 1 — Beginner: Plain language, analogies, no jargon. Assumes no background.
Level 2 — Intermediate: How the methods work, key technical concepts, comparisons.
Level 3 — Expert: Full math, algorithms, related work connections, critical evaluation.
Level 4 — Frontier: Improvement vectors, latest follow-on work, open gaps scorecard.
Each level ends with a 5-question interactive quiz. Score 4/5 or higher to pass.