Imagine 25 chefs at the world’s most cutting-edge restaurants and culinary schools. A researcher asks each one: “Do you think AI cooking robots will eventually get so good they start designing better cooking robots themselves — and if they do, will that change everything overnight?”
The chefs from fancy new restaurants (where AI tools are already speeding up the kitchen) tend to say “yes, this is happening, and faster than most people realize.” The chefs from culinary schools say “slow down — we’ve heard hype like this before, and there are real obstacles you’re ignoring.”
That’s what this paper does, but with AI researchers and AI that builds AI. The authors interviewed 25 top researchers in August and September 2025 and published the patterns they found.
The core idea was first proposed by I.J. Good in 1966:
If AI ever gets smart enough to improve itself, then a slightly smarter AI can design an even smarter AI, which can design an even smarter one, and so on — a runaway feedback loop where AI capability shoots up at a pace humans can’t follow.
The paper calls this kind of AI ASARA — AI Systems for AI R&D Automation. The technical name for “AI smart enough to help build better AI.”
Four big takeaways:
Out of 25 leading researchers, 20 said this was one of the most severe and urgent AI risks — not a sci-fi distraction, but a real concern.
Researchers inside frontier AI companies tend to see a relatively clear path to ASARA. Academic researchers are more skeptical — they’ve seen AI hype cycles fizzle before and their culture rewards skepticism.
17 out of 25 expected frontier AI labs to keep their most powerful AI internal rather than release it publicly, because it is more valuable for accelerating their own research than for selling to customers.
Some want “red lines” — bright-line rules like “no AI may improve itself without human approval.” Others think red lines are too rigid and prefer transparency requirements (mandatory reporting, government monitoring).
The interviewees mostly agreed on the shape of the path to ASARA, even when they disagreed on timing:
Most discussions of “will AI take over” happen between two camps: enthusiastic believers and dismissive skeptics, shouting past each other. This paper is one of the first systematic surveys of what the people actually building this technology privately think. It captures nuance — the same person who thinks ASARA is coming might also think red lines won’t work. A thermometer reading of where expert opinion actually sits in late 2025.
This is a qualitative research study. The goal isn’t to count things or run statistical tests — it’s to surface reasoning patterns and capture why experts hold the views they do.
182 researchers invited; 25 agreed (13.7% response rate, normal for elite-expert interviews). Three recruitment channels were used deliberately to capture different vantage points:
| Channel | Count | Captures |
|---|---|---|
| Literature-based (Google Scholar) | 7 | Published authors on recursive improvement |
| Conference workshops (NeurIPS / ICLR 2024) | 8 | Active researchers at relevant venues |
| Network / snowball | 10 | “Recommend someone who disagrees with you” |
Participant mix: 7 from frontier labs, 4 ex-frontier-lab, 9 academics, 3 industry, 2 nonprofit. The stratification is what lets them compare clusters later.
14 core questions, 40–60 minutes each, organized into three sections:
Critical detail: when participants didn’t know a concept, the interviewer read a scripted definition. This standardizes the prompt so people react to the same idea.
The lead author developed codes inductively (grounded-theory style). For the categorical dot plots in Figures 1–4, the authors did something novel: they fed anonymized transcripts to Claude with structured prompts that forced classification into fixed codes. The full prompt is in Appendix B.
This makes the categorical results reproducible — but creates a methodological dependency on Claude that the paper acknowledges only partially.
Figures recreated from the paper’s text. Position totals are exact (stated in the paper); per-affiliation dot placements are inferred from qualitative descriptions. Original figures at arxiv.org/html/2603.03338v2.
The most analytically interesting frame. 15 of 25 participants spontaneously distinguished:
| Skill | What it is | Difficulty for AI |
|---|---|---|
| Execution | Implementing experiments, training code, ablations | Easier — well-defined sub-goals, fast feedback |
| Ideation | Picking experiments, “research taste,” noticing what matters | Harder — long feedback loops, hard to evaluate, paradigm shifts have a long tail |
A subset reframed the ideation problem more sharply: it’s not generating ideas that’s hard, it’s validating them. Even expert humans with decades of experience struggle to recognize good ideas. And ML models tend to learn the mode of training data, not exceptional cases, which is exactly the wrong inductive bias for spotting paradigm shifts.
Sixteen participants flagged binding constraints. Three came up repeatedly:
Red lines = bright-line rules that trigger a major response when crossed (e.g. IDAIS Beijing: “No AI system should copy or improve itself without explicit human approval”).
Three implementation challenges identified even by supporters:
Trade-off captured by P8: red lines are “the dumbest possible supervisor but the most trustworthy” — crude but transparent vs. sophisticated but discretionary.
| Finding | Count | Meaning |
|---|---|---|
| ASARA as severe / urgent risk | 20/25 | Strong elite consensus on risk salience |
| Expects internal deployment | 17/25 | Most expect frontier labs to withhold ASARA models |
| Three-stage path described | 17/25 | Convergence on trajectory shape |
| Ideation/execution distinction | 15/25 | Spontaneous frame for thinking about capabilities |
| Risk = “meta risk” (amplifies others) | 18/25 | Most common reason for concern |
| Concerned about adaptation lag | 17/25 | Second most common concern |
Since this is a qualitative interview study, L3 goes deep on methodology, the intelligence explosion theoretical literature this paper sits inside, and critical evaluation rather than equations and algorithms.
40% of participants came through network/snowball channels including MATS connections (Krueger is a MATS supervisor; lead author is in the program). Snowball sampling is known to produce homophilous networks — even when you ask for disagreement, you get disagreement within your cluster. Likely shape of non-response bias: declines included skeptics who think the framing is uninteresting, systematically reducing the “skeptical academic” representation in the sample. This actually works against the headline schism finding, which is epistemically reassuring.
Codes were developed inductively — grounded-theory style, letting concepts emerge. The Limitations section flags that inter-rater reliability was not calculated. The standard remedy is a second researcher independently coding a subset, with Cohen’s kappa or Krippendorff’s alpha. Values above ~0.8 are strong agreement; below 0.6 starts being concerning. This absence matters less for descriptive counts (“20 of 25 said X”) and more for interpretive claims like “a schism emerged.”
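To make the inter-rater check concrete, here is a minimal sketch of Cohen's kappa for two coders. The function name `cohens_kappa` and the example labels are hypothetical and are not data from the paper; the sketch only illustrates the statistic the Limitations section says was not calculated.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both coders chose the same code.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal rates, summed over codes.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one identical code for every item
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes for 10 transcript excerpts (illustration only).
coder_1 = ["risk"] * 5 + ["no_risk"] * 5
coder_2 = ["risk"] * 4 + ["no_risk"] * 5 + ["risk"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the coders agree on 8 of 10 items, yet kappa lands at the 0.6 boundary because half of that agreement is expected by chance. That gap between raw agreement and kappa is why the statistic matters for interpretive claims.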
For Figures 1–4, transcripts were fed to Claude with structured prompts forcing classification into fixed codes. Methodological strength: reproducible (anyone can re-run the prompt), removes manual coder bias, auditable via extracted quotes.
Three issues the paper doesn’t engage with:
“An ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind.”
Three buried assumptions, each a fault line in later literature:
Yudkowsky’s Intelligence Explosion Microeconomics (2013) formalizes Good’s argument with the k-factor:
Mapping onto chain-reaction physics:
- k < 1 → subcritical (improvements taper off)
- k = 1 → critical (sustained linear improvement)
- k > 1 → supercritical (each round produces more than it cost → exponential explosion)
With I(t) the cognitive capability and R(t) the resources invested:
Yudkowsky’s key move: the empirical question isn’t “will AI improve?” (yes) but is k persistently > 1? Most natural processes have k < 1 — diminishing returns dominate.
This formalism also describes the “bootstrapping level” P1 mentioned: the capability threshold below which k < 1 (human time gates progress) and above which k > 1 (system sustains its own improvement).
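The three regimes can be illustrated with a toy geometric-series reading of the k-factor: each round of self-improvement yields a gain k times the previous round's gain. This is a simplification for intuition, not Yudkowsky's formal model, and the names `cumulative_gain` and `first_gain` are hypothetical.

```python
def cumulative_gain(k, rounds, first_gain=1.0):
    """Total capability gain after `rounds` rounds, where each round's
    gain is k times the previous round's gain (a geometric series)."""
    total, gain = 0.0, first_gain
    for _ in range(rounds):
        total += gain
        gain *= k
    return total

# Compare a subcritical, a critical, and a supercritical k.
for k in (0.5, 1.0, 1.5):
    trajectory = [round(cumulative_gain(k, n), 2) for n in (1, 5, 10, 20)]
    print(f"k = {k}: {trajectory}")
```

Under these assumptions, k < 1 converges toward the finite limit first_gain / (1 - k), k = 1 grows linearly with the number of rounds, and k > 1 grows geometrically.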
Fast takeoff: optimization power grows faster than recalcitrance. In the software-only regime, optimization power can grow superlinearly while recalcitrance grows sublinearly. In the hardware regime, recalcitrance grows much faster — you have to fab chips, build datacenters, generate power.
This exactly matches the paper’s ideation/execution distinction: ideation = low-recalcitrance but hard to apply optimization power; execution = higher-recalcitrance but easier to apply optimization power.
Bostrom adds two theses absent from Good/Yudkowsky:
The paper’s “meta risk” framing (ASARA amplifies all other risks) is a downstream consequence of instrumental convergence.
Effective compute: E = C × A (physical compute × algorithmic efficiency).
Once AI labor dominates research (R ∝ A), substitution gives:
Regimes:
| Condition | Dynamics |
|---|---|
| β < 0 | Diminishing returns — sub-exponential |
| β = 0 | Constant returns — plain exponential |
| β > 0 | Increasing returns — finite-time singularity |
When β > 0, the ODE produces a finite-time blow-up (in the idealized model). Whether this happens in reality depends on historical algorithmic-progress data once AI dominates research — data we don’t yet have.
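The regime table can be checked against a closed form if we assume the growth law is dA/dt = c · A^(1+β). That form is an assumption chosen to match the three rows above, not an equation quoted from the source; the constant `c`, the starting level `a0`, and both function names are illustrative.

```python
import math

def algorithmic_level(t, beta, a0=1.0, c=1.0):
    """Closed-form solution of dA/dt = c * A**(1 + beta) with A(0) = a0.
    Returns math.inf at or after the blow-up time when beta > 0."""
    if beta == 0:
        return a0 * math.exp(c * t)  # constant returns: plain exponential
    base = a0 ** (-beta) - beta * c * t
    if base <= 0:
        return math.inf  # only reachable when beta > 0
    return base ** (-1.0 / beta)

def blow_up_time(beta, a0=1.0, c=1.0):
    """Finite-time singularity for beta > 0; None when there is none."""
    if beta <= 0:
        return None
    return a0 ** (-beta) / (beta * c)

for beta in (-1.0, 0.0, 0.5):
    levels = [algorithmic_level(t, beta) for t in (0.0, 1.0, 1.9)]
    print(f"beta = {beta}: levels = {levels}, blow-up at {blow_up_time(beta)}")
```

With β = -1 the solution is linear in t, with β = 0 it is exponential, and with β = 0.5 it diverges at t = 2 in these units. The blow-up time scales as 1/β, so small positive β still pushes the singularity far out.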
The key disagreement isn’t whether AI will improve. It’s whether k > 1 (or β > 0) is sustained for AI-on-AI improvement.
| Thinker | Core position | Implicit prior on k |
|---|---|---|
| Good 1966 | Argues from definition | k > 1 by assumption |
| Yudkowsky | Framework for asking the question | Open empirical |
| Bostrom | Decomposes into opt power / recalcitrance | k > 1 likely in software regime |
| Davidson | Empirical model fit to ML history | k > 1 contingent on β estimate |
| Chollet | Principled skepticism | k ≤ 1, diminishing returns rule |
The Field et al. participants are basically a poll across this fault line. Frontier-lab researchers cluster near Bostrom/Davidson; academic researchers cluster near Chollet.
The sharper frame: research progress is bottlenecked on evaluation, and ASARA-class self-improvement is bottlenecked on evaluating research direction.
- Execution = generation + evaluation ≈ generation (medium) + evaluation (easy, fast feedback)
- Ideation = generation + evaluation ≈ generation (easy) + evaluation (hard)
“Research taste” is the human-research term for what ML calls a reward model:
You don’t just need AI that proposes directions (solved); you need RewardModel_AI to match RewardModel_senior_human well enough that following its rankings produces good research. Two failure modes that echo RLHF:
- ~80% of papers → cited < 10 times
- ~15% of papers → cited 10–100 times
- ~5% of papers → cited 100–1000 times
- ~0.1% of papers → cited > 10,000 times, define paradigms
Cumulative impact is dominated by the tail. A reward model 95% accurate on the median idea but mode-seeking on the tail produces 95% reasonable-looking work and 5% missed paradigm shifts — functionally similar to “no paradigm shifts.”
MLE minimizes forward KL:
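Forward KL is mean-seeking (mass-covering): it penalizes the model for assigning low probability to high-data regions, so a limited-capacity model smears probability across modes. In the strict technical sense, mode-seeking behavior belongs to reverse KL, not MLE. The colloquial sense (“learning the typical case, missing the tail”) still describes MLE, because rare tail cases contribute little to the average log-likelihood. Both senses cut against tail-case evaluation accuracy.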
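For reference, the standard identity behind "MLE minimizes forward KL" can be written as follows. The symbols p_data (the data distribution) and p_θ (the model) are notation introduced here, not taken from the source.

```latex
\mathrm{KL}\!\left(p_{\text{data}} \,\|\, p_\theta\right)
  = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\text{data}}(x)\right]
  - \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right]
\quad\Longrightarrow\quad
\arg\min_\theta \mathrm{KL}\!\left(p_{\text{data}} \,\|\, p_\theta\right)
  = \arg\max_\theta \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right]
```

The first expectation does not depend on θ, so minimizing the divergence is the same as maximizing expected log-likelihood. Each x is weighted by how often it occurs in the data, which is why rare tail cases have little pull on the fit.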
Gao, Schulman, Hilton fit functional forms to gold reward vs proxy reward as KL grows. RL version:
Where d = √KL(π ‖ π_init). Gold reward follows an inverted-U: it rises, peaks, then declines. Proxy keeps rising monotonically. The two terms map to:
| Term | Goodhart type | Mechanism | Reduced by |
|---|---|---|---|
| α (linear) | Regressional | Selection on noise in proxy features | Larger RM helps weakly |
| β log(d) | Extremal | Optimized samples drift OOD | Larger RM helps strongly |
| (not captured) | Adversarial | Policy manipulates the proxy | Open research problem |
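A small sketch can show where the inverted-U peaks. It assumes the RL functional form is R(d) = d · (α − β · ln d), which is consistent with the α (linear) and β log(d) terms in the table above. The coefficients below are hypothetical values picked for illustration, not fitted values from Gao, Schulman, and Hilton, and the natural logarithm is also an assumption.

```python
import math

def gold_reward_rl(d, alpha, beta):
    """Gold reward under the assumed RL form R(d) = d * (alpha - beta * ln d)."""
    if d <= 0:
        return 0.0  # d * ln(d) tends to 0 as d -> 0+
    return d * (alpha - beta * math.log(d))

def peak_distance(alpha, beta):
    """Distance d* that solves dR/dd = alpha - beta * ln(d) - beta = 0."""
    return math.exp(alpha / beta - 1.0)

# Hypothetical coefficients (illustration only).
alpha, beta = 1.0, 0.5
d_star = peak_distance(alpha, beta)
print(f"peak at d* = {d_star:.3f}, gold reward there = {gold_reward_rl(d_star, alpha, beta):.3f}")
print(f"gold reward turns negative past d = {math.exp(alpha / beta):.3f}")
```

Under this form the peak gold reward equals β · d*, and gold reward crosses zero at d = exp(α/β). Past the peak, further optimization against the proxy lowers true reward even while the proxy score keeps climbing.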
If you retrain the RM in k stages of distance d/k each:
The new term β · log k is positive but logarithmic in k, not linear. Doubling iterations adds the same as the previous doubling. The α term doesn’t move — regressional Goodhart is unaffected by iteration.
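The β · log k term can be derived in one line under two assumptions that are not stated in the source: the single-stage RL form is R(d) = d(α − β log d), and the gains from the k stages add, with each stage starting from a freshly retrained reward model.

```latex
R_k(d)
  = \sum_{i=1}^{k} \frac{d}{k}\left(\alpha - \beta \log\frac{d}{k}\right)
  = d\left(\alpha - \beta \log d + \beta \log k\right)
  = R_1(d) + \beta\, d \log k
```

The α term cancels the factor of k exactly, which is why iteration leaves it unchanged. The only benefit is the extra β d log k, so going from k to 2k stages always adds the same β d log 2.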
Translation for ASARA: even idealized “AI keeps retraining its own evaluator” produces logarithmic gains, not exponential ones. Section 2’s k > 1 regime needs each round to return more than it cost; here the math instead gives a k_effective that decays like 1/log(rounds), which means sub-exponential dynamics inside “the explosion.”
Research progress rate = min(generation rate, evaluation rate)
If generation rate → ∞ (AI is fast):
Bottleneck shifts to evaluation rate
Evaluation rate is bounded by:
- quality of the reward model
- true distribution of idea quality (long-tailed)
- calibration on tail cases (poor by default)
Asymmetric capability growth pattern:
| Capability | Trajectory | Reason |
|---|---|---|
| Coding ability | Fast, sustained growth | Clear evaluators |
| Math problem solving | Fast, sustained growth | Clear evaluators |
| Benchmark performance | Fast growth, then saturation | Goodhart |
| True novel-research generation | Slow growth, possible plateau | No evaluator |
That asymmetry is already visible in 2025–2026 data. Coding and math benchmarks have shot up; novel-paradigm research from AI has not appeared.
Under this analysis, the recursive loop has a specific shape: fast on execution, gated on evaluation, with each evaluation improvement requiring increasing amounts of ground-truth signal that takes calendar time to accumulate. Not Chollet’s “no acceleration ever” and not Bostrom’s “fast takeoff” — something in between, dominated by long stretches of incremental work punctuated by paradigm shifts that arrive roughly on the historical schedule because that’s how long ground truth takes to accumulate.
No follow-on study has reproduced this paper’s interviews with multiple human coders, multiple LLM classifiers, or a held-out validation set. The methodological dependency on a single LLM (Claude) and a single human coder remains the largest unaddressed gap. It is also the cheapest to close: running the existing 25 transcripts through a Claude-vs-GPT-vs-Gemini comparison could be done in a weekend.
METR’s May 2026 self-reported productivity survey of 349 technical workers found median 1.4–2× value uplift from AI tools. But it’s about current productivity, not ASARA scenarios — it doesn’t ask about intelligence explosion, governance, or deployment. A semi-structured 200+ researcher study with proper stratification is still wide open.
No team is running ASARA-belief surveys on a recurring cadence. Given how quickly the technology is moving — GPT-5 IMO gold August 2025, Claude Mythos May 2026, METR time horizons reaching 16 hours in May 2026 — a snapshot-and-update protocol would be the highest-information-density direction.
METR has built much of this infrastructure:
Open: nobody has formally mapped Field et al.’s qualitative milestone descriptions onto these existing benchmarks.
Davidson, Halperin, Houlden, and Korinek — “When Does Automating AI Research Produce Explosive Growth? Feedback Loops in Innovation Networks” (NBER Working Paper 35155, 2026) — develops a semi-endogenous growth model with an innovation network and derives a clean analytical condition for superexponential (“explosive”) growth. Two reinforcing channels offset diminishing returns:
This is the most direct extension of the theoretical lineage: it turns the qualitative k > 1 / β > 0 disagreement into a formal model with testable conditions. Still open: no measurements of β in the AI-on-AI regime, because we don’t yet have sustained AI doing the research at scale.
Substantial reporting on Chinese AI development exists, but no comparable interview study of PRC researchers.
A Field-et-al-style interview study with 20–25 Chinese AI researchers using the same protocol is still missing.
A development post-dating the paper that directly affects the red-lines findings:
The Global Call for AI Red Lines (September 2025) gathered 200+ prominent signatories including Nobel laureates, calling for binding international agreement on AI red lines by end of 2026. Followed by:
Field et al.’s participants in Aug–Sep 2025 were debating red lines as an abstract governance approach. By mid-2026, red lines had become an active political fight, which makes the paper’s documentation of expert preferences a useful historical baseline.
| Vector | Status | What exists | What’s still open |
|---|---|---|---|
| 1. Methodological triangulation | Area to explore | — | Multi-model classification, multi-coder IRR, validation sets |
| 2. Larger samples | Partial | METR n=349 (productivity only) | ASARA-specific large-N study |
| 3. Longitudinal tracking | Area to explore | — | Recurring panel of same researchers |
| 4. Operationalizing milestones | Substantial | METR time-horizon, RE-Bench, MLE-Bench, MLR-Bench, MLRC-Bench, AIFM | Formal mapping of paper’s qualitative milestones to benchmarks |
| 5. Theoretical grounding | Substantial | Davidson-Halperin-Houlden-Korinek NBER 2026, AIFM | Direct measurement of β in AI-on-AI regime |
| 6. Cross-national coverage | Partial | CSET, Carnegie, ChinaTalk reporting | Parallel interview study with PRC researchers |
One candidate design: a longitudinal panel that re-interviews the original 25 every 9 months and expands to include 25 PRC-based researchers using the same protocol, with multi-model AI-assisted coding and a human-validation subset. This single design hits Vectors 1, 3, and 6 simultaneously and provides exactly the time-series data needed to test whether the “schism” persists or converges as capabilities advance.