TUNDRA // NEXUS

Exponential Progress: Claude Opus 4.6 Has 50% Time Horizon Of 14.5 Hours On METR Time Horizons Benchmark

🔗 officechai.com
February 21, 2026
SIGNAL 9/10
#ai

🟢 READ | ⏱ 6 min | 📡 9/10 | 🎯 AI researchers, engineering leaders, anyone tracking capability trajectories

TL;DR

METR's time-horizon benchmark measures the length of tasks, in skilled-human working time, that an AI can complete from scratch at a given success rate. Claude Opus 4.6 reaches a 50% success rate on tasks that take a human expert 14.5 hours. The confidence interval is wide (6–98 hours) because the benchmark is nearly saturated, so the point estimate may understate the model's actual capability. More important than the number: the exponential trend, with a doubling time of 123 days, has held since 2023 with R²=0.93.
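
To make the doubling-time figure concrete, here is a minimal sketch that extrapolates a time horizon under a fixed doubling time. It is illustrative arithmetic only: the function names are hypothetical, the 14.5-hour and 123-day inputs come from the summary above, and the code assumes the trend continues unchanged, which the wide confidence interval does not guarantee.

```python
import math


def horizon_after(h0_hours: float, days: float, doubling_days: float = 123.0) -> float:
    """Project a time horizon forward, assuming pure exponential growth."""
    return h0_hours * 2 ** (days / doubling_days)


def days_to_reach(h0_hours: float, target_hours: float, doubling_days: float = 123.0) -> float:
    """Days needed to grow from h0_hours to target_hours under the same assumption."""
    if h0_hours <= 0 or target_hours <= 0:
        raise ValueError("horizons must be positive")
    return doubling_days * math.log2(target_hours / h0_hours)


if __name__ == "__main__":
    # Inputs taken from the summary: 14.5-hour horizon, 123-day doubling time.
    print(horizon_after(14.5, 123))   # horizon one doubling period later
    print(days_to_reach(14.5, 40.0))  # days until a 40-hour (work-week) horizon
```

Under this assumption one doubling period takes the 14.5-hour horizon to 29 hours. Treat any such projection as a rough guide, since the point estimate itself could sit anywhere in the 6–98 hour interval.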

Signal

  • In mid-2024, frontier models had time horizons in single-digit minutes. Early 2025: 15–30 minutes. Late 2025: several hours. Early 2026: 14.5 hours. That is roughly a 100x to 900x increase (two to three orders of magnitude) in ~18 months
  • Benchmark saturation is itself a signal: METR is actively developing new evaluation methods because the current task suite can no longer distinguish between frontier models
  • Tasks in the benchmark include implementing complex network protocols from scratch, iterative ML debugging, and cybersecurity tasks: genuine work-product generation, not trivia
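
The growth described in the first bullet can be sanity-checked with two small helpers. This is a rough sketch: the helper names are hypothetical, and the inputs in the usage lines (a 5-minute starting horizon standing in for "single-digit minutes", and 548 days for ~18 months) are assumptions chosen for illustration, not METR figures.

```python
import math


def orders_of_magnitude(start: float, end: float) -> float:
    """Base-10 log of the growth factor between two positive values."""
    if start <= 0 or end <= 0:
        raise ValueError("values must be positive")
    return math.log10(end / start)


def implied_doubling_days(start: float, end: float, elapsed_days: float) -> float:
    """Doubling time implied by growing from start to end over elapsed_days."""
    if start <= 0 or end <= start:
        raise ValueError("need 0 < start < end")
    doublings = math.log2(end / start)
    return elapsed_days / doublings


if __name__ == "__main__":
    start_min = 5.0        # assumption: a stand-in for "single-digit minutes"
    end_min = 14.5 * 60    # 14.5 hours expressed in minutes
    elapsed = 548.0        # assumption: ~18 months in days
    print(orders_of_magnitude(start_min, end_min))
    print(implied_doubling_days(start_min, end_min, elapsed))
```

Comparing the implied doubling time with the 123-day trend figure shows how sensitive headline multiples are to the assumed starting point and time window.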

What They're NOT Telling You

The 14.5-hour figure applies to well-specified, self-contained tasks. It excludes organizational context, relationships, and ambiguous goals, which are the conditions that define most real jobs. METR explicitly caveats this. Both the article and the benchmark measure the rate of capability growth, not economic impact, which depends on many more variables. The "singularity" framing in the intro is Musk-quoting clickbait; the METR data itself is measured and credible.

Trust Check

Factuality ✅ (METR data directly cited, R² and CI disclosed) | Author Authority ⚠️ (OfficeChai, secondary source) | Actionability ✅ (recalibrate automation assumptions now)