Research Paper Summaries

Four-level explainers for deeply understanding research papers — from beginner to frontier. Each summary includes interactive quizzes to test your understanding.

Papers

AI Researchers' Perspectives on Automating AI R&D and Intelligence Explosions

Qualitative interview study of 25 leading AI researchers (Aug–Sep 2025) on automating AI R&D and intelligence-explosion scenarios. 20 of 25 flagged ASARA (AI systems for AI R&D automation) as one of the most severe AI risks; 17 of 25 expect frontier models to be kept internal; and frontier-lab researchers and academics split sharply on how clear the trajectory looks.

qualitative interviews intelligence explosion AI governance arXiv Mar 2026

GIM: Evaluating Models via Tasks that Integrate Multiple Cognitive Domains

820 expert-authored problems (~11 person-hours each) that measure integration density, meaning how well a solver coordinates multiple cognitive operations at once. The problems are calibrated with a two-parameter logistic (2PL) item response theory (IRT) model across 47 model configurations. Key finding: the best human+AI centaur (θ=2.26) beats the best pure LLM (θ=2.16), but the operator's skill at directing the AI is the differentiating variable.

evaluation IRT psychometrics centaur study integration density March 2026
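Under a 2PL IRT model, the probability that a solver with ability θ answers an item correctly is 1 / (1 + exp(−a(θ − b))), where a is the item's discrimination and b its difficulty. The sketch below shows how the two reported θ values translate into success probabilities on one item; the item parameters a = 1.5 and b = 2.0 are made-up illustrative values, not numbers from GIM.

```python
import math

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability that a solver with ability `theta` solves an item
    with discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item parameters, chosen only for illustration.
ITEM_A, ITEM_B = 1.5, 2.0

for label, theta in [("best centaur", 2.26), ("best pure LLM", 2.16)]:
    p = p_correct_2pl(theta, ITEM_A, ITEM_B)
    print(f"{label}: theta={theta:.2f}, P(correct)={p:.3f}")
```

When θ equals the item difficulty b the probability is exactly 0.5; a higher discrimination a makes the small θ gap between centaur and LLM matter more on that item.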

On Evaluation of Embodied Navigation Agents

The 7-page consensus document that reshaped embodied AI: zero figures, zero tables, one equation. It defined SPL (Success weighted by normalized inverse Path Length), the PointGoal/ObjectGoal/AreaGoal task taxonomy, and 7 recommendations that became the foundation for the Habitat platform and every major navigation benchmark since 2018.

embodied AI navigation evaluation SPL metric arXiv Jul 2018
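SPL averages, over N episodes, S_i · l_i / max(p_i, l_i), where S_i is 1 on success and 0 otherwise, l_i is the shortest-path distance from start to goal, and p_i is the length of the path the agent actually took. A minimal sketch of that formula; the `Episode` class and `spl` function are hypothetical names, and it assumes every l_i is positive, as the paper's setup implies.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    success: bool         # S_i: did the agent reach the goal?
    shortest_path: float  # l_i: shortest-path distance from start to goal (assumed > 0)
    agent_path: float     # p_i: length of the path the agent actually took

def spl(episodes: list[Episode]) -> float:
    """SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)."""
    if not episodes:
        raise ValueError("SPL is undefined for zero episodes")
    total = 0.0
    for ep in episodes:
        if ep.success:
            # A detour lowers the credit; a perfect path earns the full 1.0.
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
    return total / len(episodes)

episodes = [
    Episode(True, 10.0, 10.0),   # optimal path: contributes 1.0
    Episode(True, 10.0, 20.0),   # twice the optimal length: contributes 0.5
    Episode(False, 8.0, 30.0),   # failure: contributes 0
]
print(spl(episodes))
```

The max(p_i, l_i) in the denominator caps each episode's credit at 1, so an agent can never score above a perfect shortest-path run.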

Seedance 2.0: Advancing Video Generation for World Complexity

ByteDance's dual-branch diffusion transformer co-generates audio and video, accepts 15 multimodal references (9 images + 3 videos + 3 audio), supports V2V editing and multi-shot narratives, and beat Sora 2, Veo 3.1, and Kling 3.0 in blind evaluation — all at ~$0.14 per clip.

video generation diffusion transformer audio-video RLHF arXiv Apr 2026

Vision Banana: Image Generators are Generalist Vision Learners

One image generator, three specialist-killers. Google DeepMind shows that a single generative model (Nano Banana Pro), with lightweight instruction tuning, beats SAM 3, Depth Anything V3, and Lotus-2 — zero-shot. The thesis: image generation is to vision what next-token prediction is to language.

generative vision segmentation depth estimation instruction tuning arXiv Apr 2026

AlphaFold 2: Highly Accurate Protein Structure Prediction with AlphaFold

DeepMind cracked protein structure prediction, a 50-year grand challenge, by treating it as attention over evolution. AlphaFold 2's Evoformer extracts co-evolutionary signals from multiple sequence alignments (MSAs) and predicts 3D structures at near-experimental accuracy (median GDT 92.4 at CASP14). AlphaFold has since predicted 200M+ protein structures, and the work earned Demis Hassabis and John Jumper a share of the 2024 Nobel Prize in Chemistry.

protein structure attention co-evolution structural biology Nature 2021
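GDT_TS, the CASP score quoted above, averages the fraction of residues whose predicted position lands within 1, 2, 4, and 8 Å of the experimental structure, on a 0–100 scale. A simplified sketch, assuming the two structures are already residue-aligned and superimposed; the official metric also searches over superpositions, so this fixed-superposition version (`gdt_ts` is a hypothetical name) can only match or underestimate it.

```python
import math

Coord = tuple[float, float, float]

def gdt_ts(pred: list[Coord], ref: list[Coord]) -> float:
    """Simplified GDT_TS on a 0-100 scale.

    Assumes `pred` and `ref` are residue-aligned and already superimposed;
    the real metric optimizes the superposition for each cutoff."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("need equal-length, non-empty coordinate lists")
    cutoffs = (1.0, 2.0, 4.0, 8.0)  # distance thresholds in angstroms
    dists = [math.dist(p, r) for p, r in zip(pred, ref)]
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [sum(d <= c for d in dists) / len(ref) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

ref = [(0.0, 0.0, 0.0)] * 4
pred = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(gdt_ts(pred, ref))
```

The four cutoffs reward both near-atomic precision (1 Å) and correct overall topology (8 Å), which is why a median around 92 signals near-experimental quality.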

AlphaZero: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

One algorithm, zero game-specific knowledge, superhuman play in three games. AlphaZero defeated Stockfish (chess: 28 wins, 72 draws, 0 losses), Elmo (shogi: 90 wins, 8 losses, 2 draws), and AlphaGo Zero (Go, 60-40) while searching roughly 1,000× fewer positions per second, relying on a deep neural network to evaluate each one. Showed that the same learning algorithm works across domains.

reinforcement learning self-play MCTS game AI Science 2018
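Inside the tree, AlphaZero-style search chooses which move to explore with a PUCT rule: the mean backed-up value Q plus an exploration bonus U that is weighted by the policy network's prior P. A minimal sketch, assuming the common form U = c_puct · P(s,a) · √N(s) / (1 + N(s,a)); the `Edge` class, `select_action` name, and the constant c_puct = 1.5 are illustrative choices, not values taken from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Edge:
    prior: float             # P(s, a) from the policy network
    visits: int = 0          # N(s, a)
    value_sum: float = 0.0   # W(s, a): sum of values backed up through this edge

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); 0 for unvisited edges (assumed convention).
        return self.value_sum / self.visits if self.visits else 0.0

def select_action(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move maximizing Q + U (raises ValueError if `edges` is empty)."""
    n_parent = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        u = c_puct * e.prior * math.sqrt(n_parent) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda move: score(edges[move]))

edges = {
    "a": Edge(prior=0.6, visits=10, value_sum=2.0),  # well explored, Q = 0.2
    "b": Edge(prior=0.4, visits=1, value_sum=0.5),   # barely explored, Q = 0.5
}
print(select_action(edges))
```

Because U shrinks as 1/(1 + N(s,a)), moves the network rates highly are explored first, and the search then concentrates on moves whose backed-up value holds up. This is how a learned evaluator lets the search look at far fewer positions.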

AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search

DeepMind's landmark paper behind the system that went on to defeat Lee Sedol 4-1 (the paper itself reports a 5-0 win over European champion Fan Hui). AlphaGo combines policy networks (move prediction), value networks (position evaluation), and Monte Carlo Tree Search. Showed that neural networks plus search can achieve superhuman performance in domains where brute-force search is impossible.

reinforcement learning MCTS game AI Nature 2016
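In the Nature paper, each leaf of the search tree is evaluated by mixing the value network's estimate v(s_L) with the outcome z_L of a fast rollout: V(s_L) = (1 − λ)·v(s_L) + λ·z_L, with λ = 0.5 reported to work best. The function below is a minimal sketch of that mixing step only; `leaf_value` and its parameter names are hypothetical.

```python
def leaf_value(value_net_estimate: float, rollout_outcome: float, lam: float = 0.5) -> float:
    """AlphaGo-style leaf evaluation: V(s_L) = (1 - lam) * v(s_L) + lam * z_L.

    value_net_estimate: v(s_L), the value network's prediction for the leaf.
    rollout_outcome: z_L, the game result (+1 win / -1 loss) of a fast rollout.
    lam: mixing weight; 0 uses only the value net, 1 uses only rollouts."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return (1.0 - lam) * value_net_estimate + lam * rollout_outcome

print(leaf_value(0.2, 1.0))  # value net mildly optimistic, rollout won
```

Mixing the two signals hedges their different failure modes: the value network generalizes but can be miscalibrated, while a single rollout is noisy but grounded in actual play to the end of the game.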

DALL-E 3: Improving Image Generation with Better Captions

OpenAI's data-centric breakthrough: a custom-trained captioner re-describes every training image in rich detail, and a standard diffusion model is then trained on these synthetic captions. 71.7% human preference over SDXL. Made the case that training-data quality can matter more than model architecture for image generation.

image generation data-centric AI recaptioning diffusion
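The recaptioning step can be pictured as a data-pipeline pass that swaps most original captions for detailed synthetic ones. A minimal sketch, assuming a hypothetical `captioner` callable that maps an image ID to a descriptive caption; the report also blends some original captions back in, and the 0.95 synthetic ratio used as the default here is our reading of its chosen mix, which should be checked against the paper.

```python
import random
from typing import Callable, Iterable

def build_recaptioned_dataset(
    pairs: Iterable[tuple[str, str]],
    captioner: Callable[[str], str],
    synthetic_ratio: float = 0.95,  # assumed blend ratio; verify against the report
    seed: int = 0,
) -> list[tuple[str, str]]:
    """Return (image_id, caption) pairs where roughly `synthetic_ratio` of the
    captions come from the (hypothetical) captioner and the rest stay original."""
    rng = random.Random(seed)
    out: list[tuple[str, str]] = []
    for image_id, original_caption in pairs:
        if rng.random() < synthetic_ratio:
            out.append((image_id, captioner(image_id)))  # detailed synthetic caption
        else:
            out.append((image_id, original_caption))     # keep the alt-text-style caption
    return out

# Toy captioner standing in for a trained captioning model.
toy_captioner = lambda image_id: f"a detailed description of {image_id}"
data = [("img1", "dog"), ("img2", "cat"), ("img3", "car")]
print(build_recaptioned_dataset(data, toy_captioner))
```

Keeping a small share of short, human-style captions is meant to stop the model from overfitting to long synthetic descriptions, since real user prompts are often terse.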

How this works

Level 1 — Beginner: Plain language, analogies, no jargon. Assumes no background.

Level 2 — Intermediate: How the methods work, key technical concepts, comparisons.

Level 3 — Expert: Full math, algorithms, related work connections, critical evaluation.

Level 4 — Frontier: Improvement vectors, latest follow-on work, open gaps scorecard.

Each level ends with a 5-question interactive quiz. Score 4/5 or higher to pass.
