Assistance is mainstream.
Autonomy is not.
Nearly 4 in 5 developers use AI assistance, but only about 1 in 3 have used an AI agent, and almost 38% say they don't plan to adopt one. The gap comes down to trust, risk, and verification debt.
Agent Adoption Breakdown
Source: Stack Overflow 2025 Developer Survey (frequency of agent use specifically)
Autonomy amplifies risk: test failures, security mistakes, and policy violations. "Verification debt" shifts effort from writing code to reviewing and debugging it.
The Long-Horizon Step Change (Early 2026)
METR (Model Evaluation & Threat Research) tracks the "50% time horizon": the task duration at which frontier AI agents succeed half the time. That horizon has doubled roughly every 89 days since 2024, creating what some analysts call "a new Moore's Law for AI agents."
| Model | Date | 50% Time Horizon |
|---|---|---|
| GPT-4 / Claude 3 Opus | Early 2024 | ~4 min |
| Claude 3.5 Sonnet | Jun 2024 | ~11 min |
| Claude 3.7 Sonnet | Feb 2025 | ~60 min |
| Claude Opus 4.5 | Nov 2025 | ~293 min (~5 hrs) |
| Claude Opus 4.6 | Feb 2026 | ~719 min (~12 hrs) |
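The doubling law above can be checked directly against the table. A minimal sketch, assuming exponential growth `horizon(t) = H0 · 2^(Δt / doubling_days)`; the dates are approximated to mid-month since the table only gives months, and the helper names are my own, not METR's:

```python
from datetime import date
from math import log2

# (approximate date, 50% time horizon in minutes) from the table above
points = [
    (date(2024, 2, 15), 4),     # GPT-4 / Claude 3 Opus
    (date(2024, 6, 15), 11),    # Claude 3.5 Sonnet
    (date(2025, 2, 15), 60),    # Claude 3.7 Sonnet
    (date(2025, 11, 15), 293),  # Claude Opus 4.5
    (date(2026, 2, 15), 719),   # Claude Opus 4.6
]

def implied_doubling_days(p0, p1):
    """Doubling time (in days) implied by two (date, horizon) points."""
    (d0, h0), (d1, h1) = p0, p1
    return (d1 - d0).days / log2(h1 / h0)

def extrapolate(horizon_min, days_ahead, doubling_days=89):
    """Project the 50% horizon forward under a fixed doubling time."""
    return horizon_min * 2 ** (days_ahead / doubling_days)

for a, b in zip(points, points[1:]):
    print(f"{a[0]} -> {b[0]}: ~{implied_doubling_days(a, b):.0f}-day doubling")
```

Fitting adjacent rows yields doubling times scattered around the ~89-day headline figure, which is the usual caveat with this metric: it is a trend line over noisy point estimates, not a physical constant.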
This enables a qualitative shift: agents that can work for hours, not minutes. OpenAI demonstrated a single Codex session running 25 hours uninterrupted, generating 30K lines of code. Cursor shipped cloud agents running on isolated VMs with up to 8 parallel instances. The architecture shifted from synchronous prompt-response to asynchronous execution loops with verification.
The caveat: Anthropic's own research found a "significant deployment overhang" — models are more capable of autonomy than users currently exercise. Median real-world Claude Code turns remain at ~45 seconds, even as the 99.9th percentile nearly doubled to 45+ minutes.