
Did Anthropic's Claude Fable 5 cross a threshold?

  • Writer: Ram Srinivasan
  • 2 days ago
  • 7 min read

Yesterday Anthropic released Claude Fable 5.


Wharton professor Ethan Mollick handed Fable a 15-page spec and found that “it would work for 9+ hours and deliver terrific results.” Stripe used Fable to run a migration across a 50-million-line codebase in a single day. The internal estimate for doing it by hand: a full team, two months.


Fable runs on the same weights as Mythos 5, the system that triggered government conversations about cyber and bio misuse last spring. In other words, Fable is Mythos with a permissions wrapper. Long-horizon autonomous work is now accessible to anyone with a credit card.


But do we have a way to work WITH it? When a machine takes your request and disappears for twelve hours, what does collaboration even look like? Douglas Adams saw this problem coming nearly fifty years ago.






In The Hitchhiker’s Guide to the Galaxy, a civilization builds Deep Thought, a computer to answer the ultimate question of life, the universe, and everything. They ask, walk away for 7.5 million years, and come back. The answer is 42.


That is the shape of working with Fable 5 today: hand off a project at midnight, wake up to finished work.


And it means something profound just happened to the value of a question. When the machine handles the twelve hours in the middle, everything YOU contribute concentrates at the two ends: the brief you write going in, and the judgment you apply to what comes back. The quality of the answer is set by the quality of the question.


Call it the Deep Thought Rule: the longer a machine can run on its own, the more the outcome depends on the question you asked at the start. Execution used to be the expensive part of knowledge work, and the question was an afterthought. Fable just inverted that. The question is now the highest-leverage thing a human produces.


Four questions surface:


How fast is this capability actually moving?


If one prompt is now one project, what breaks inside the org?


What does 2X the price of Opus actually buy?


And how will we collaborate with machines that work alone for hours?


Let’s take them one at a time.


1/ ANALYST TO EXPERT IN SIX MONTHS?

GDPval-AA is a benchmark for real knowledge work. It takes actual professional deliverables (legal briefs, financial models, research reports) and pits AI models against each other head-to-head, with expert judges picking the winner. The scores use Elo, the rating system used in competitive chess. It doesn’t measure how good you are in the abstract; it measures who beats whom, and how often.


When GDPval-AA launched in January 2026, the leader was GPT-5.2 at 1,442 Elo. Today Fable 5 sits at 1,932.


That is 490 points in six months.


Because Elo is logarithmic, a 490-point gap is substantially larger than it appears. Under the standard Elo formula, it implies an expected win rate of about 94%: today’s Fable beats January’s best model in the world on roughly 19 of every 20 tasks. In chess terms, that’s a grandmaster sitting down across from a club player.
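As a rough check on that claim, here is a minimal sketch of the standard Elo expected-score formula applied to the two ratings quoted above. It assumes GDPval-AA uses the conventional chess scaling (base 10, divisor 400), which the benchmark's own methodology may or may not follow exactly; the function name is illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumption: the conventional chess scaling (base 10, divisor 400).
    The benchmark's actual scaling is not stated in the article.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the article.
fable_5 = 1932
january_leader = 1442

gap = fable_5 - january_leader
win_rate = elo_expected_score(fable_5, january_leader)

print(f"gap = {gap} points, expected win rate = {win_rate:.1%}")
```

Under those assumptions the expected win rate comes out a little above 94%, which is in line with the "19 of every 20" shorthand. Note that the relationship is not linear: the first 200 points of a gap move the win rate far more than the last 200.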


The corroborating numbers stack up fast. Fable scored 91/100 on Every’s Senior Engineer benchmark; Opus 4.8 scored 63/100 and GPT-5.5 scored 62/100. It is the first model past 90% on Hex’s core analytics benchmark, the long multi-stage data work that until last week required a human analyst. Anthropic now writes 80% of its own code with Claude.


In human terms, that progress is like an analyst becoming an expert in six months.


Will this progress continue? Will the upcoming IPOs alter the pace? When will we hit AGI? All good questions, but what matters is that we already have incredible capabilities at hand AND are barely scratching the surface with adoption.


And here’s a question that does matter: Are you positioned to use them effectively?


2/ IS THE NEW UNIT OF WORK A SINGLE PROMPT?

The unit of work you can hand off just got bigger. Much bigger.


Yesterday: “Answer this question.” Today: “Complete this project.”


Every organization is architected around the old unit:


Your processes assume work arrives in human-sized pieces. Tickets, sprints, weekly status meetings, approval chains with five signatures. All of it built for tasks that take a person days. When a project completes overnight, a five-day approval loop is a problem. You call it governance, but is it really the new bottleneck?


Your incentives reward visible activity. Hours logged, tickets closed, lines of code shipped, decks produced. But when execution is delegated, activity is exactly what disappears. Are you actively punishing your best AI operators without knowing it?


Your hiring and promotion ladders are built on execution apprenticeship: juniors do the work, learn by doing, and rise into judgment roles. If the doing gets delegated, where does judgment come from? How do you “judge” judgment? That’s a question about the job req you’re posting this month.


Your operating model prices labor in salaries and headcount. Does it now need a compute line that behaves like a contractor budget?


The scarce skills in the new unit of work are briefing (defining the problem the way you’d brief a senior contractor) and judging (knowing whether the thing that came back is actually good).


Do briefing and judging appear in your company’s competency frameworks?


3/ WHAT DOES 2X THE PRICE OF OPUS BUY?

Let’s talk about the sticker shock, because everyone’s asking the same question.


Fable 5 costs $10 per million input tokens and $50 per million output. That’s 2X the price of Opus 4.8. On subscription plans it’s free until June 22; after that it eats usage credits at the same 2X rate.


So: is it 2X better? That’s the wrong question. The honest answer from the first 24 hours of testing is that Fable isn’t uniformly better at all tasks. It’s categorically better at one class of work and a worse fit for another.


Some have called Fable a “warp drive”: superb for large, well-defined tasks you hand off asynchronously, and a poor fit for quick back-and-forth. And for everyday chat and small tasks, the emerging consensus is: stick with older or smaller models.


Now for the part that flips the math. Fable is dramatically more token-efficient on big jobs. On Cognition’s FrontierCode benchmark it leads frontier models even at medium reasoning effort, and early customers report it finishing projects in far fewer turns and tokens.


However, tokenmaxxing is already a problem; will models like Fable make it worse? Not necessarily. If a job uses about half the tokens, it can land near the old total cost even at 2X the per-token rate, except it’s done in one run instead of forty rounds of supervision. You’re paying for the absence of yourself from the middle of the work.
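To make that arithmetic concrete, here is a small sketch comparing one long Fable run against forty supervised rounds on Opus. The Fable prices are the ones quoted above. The Opus prices are an assumption (half of Fable's, as implied by "2X"), and every token count is hypothetical, chosen only to show where break-even sits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


def job_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of a job given its token usage."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000


FABLE = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)  # quoted in the article
OPUS = Pricing(input_per_mtok=5.0, output_per_mtok=25.0)    # assumed: half of Fable

# Hypothetical workload: 40 supervised rounds on Opus,
# each consuming 200k input tokens and 20k output tokens.
rounds = 40
opus_total = rounds * job_cost(OPUS, 200_000, 20_000)

# Hypothetical: a single Fable run that needs half the total tokens.
fable_total = job_cost(FABLE, 4_000_000, 400_000)

print(f"Opus, {rounds} rounds: ${opus_total:.2f}")
print(f"Fable, one run:  ${fable_total:.2f}")
```

With these made-up numbers both paths cost the same. Under the assumed 2X price ratio, break-even is simply the point where Fable uses half the tokens; anything more efficient than that makes the single run cheaper before counting the human time spent supervising forty rounds.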


Should we price that against a contractor’s day rate, not against Opus?


4/ BOTH HANDS ON THE WHEEL

Which brings us to the last question: how do we collaborate with these models? Is a chat window the best interface? Is anyone building the alternative?


Last week I was at Bloomberg Tech in San Francisco when Mira Murati walked on stage for her first major public appearance in 18 months. She previewed what Thinking Machines Lab calls interaction models.


The idea is to provide AI with substantially more context throughout the interaction. This is done with continuous streams of audio, text, and video, processed in 200-millisecond intervals. The model catches interruptions, mid-thought corrections, even the pauses where you’re still deciding.


Then she dismantled the industry’s favorite safety phrase. “Human in the loop,” she argued, just describes a checkpoint where you sign off and the machine is good to go. The whole point of a loop is to exit it.


Her alternative was a tandem bicycle: both riders pedaling, and going uphill, whoever is stronger is pedaling harder BUT both hands are on the wheel. A system designed for collaboration, not collaboration bolted on as a checkpoint.


Hold the two images side by side:


Deep Thought: ask, vanish, pray.


Tandem bike: continuous, correctable, shared.


In 1973, Xerox PARC had the Alto: graphical interface, mouse, networking, the entire future of personal computing sitting in one lab. BUT Xerox captured almost none of the value. Apple did. Why? Because Apple obsessed over the interface between human and machine while everyone else obsessed over the machine.


Are we at the exact same moment in AI?


THE BOTTOM LINE

Fable 5 is Mythos with a governor: frontier capability, generally available, at 2X the price of Opus. YES, the 2X buys autonomy, BUT it doesn’t buy you universally better answers.


Pay it for projects, NOT chat.


For me, here’s what’s exciting. The difference between answering and finishing is NOT a gradient. It’s a different kind of work. And a different kind of work calls for a different operator.


The unit of work is changing at the pace of frontier AI. Are you?


Until next time,

Ram

Ram Srinivasan


MIT Alum | Author, The Conscious Machine | Global Future of Work and AI Adoption Leader published in Business Insider, Fortune, Harvard Business Review, MIT Executive Viewpoints and more.


A Message From Ram:

My mission is to illuminate the path toward humanity's exponential future. If you're a leader, innovator, or changemaker passionate about leveraging breakthrough technologies to create unprecedented positive impact, you're in the right place. If you know others who share this vision, please share these insights. Together, we can accelerate the trajectory of human progress.


Disclaimer:

Ram Srinivasan currently serves as an Innovation Strategist and Transformation Leader, authoring groundbreaking works including "The Conscious Machine" and the upcoming "The Exponential Human."


All views expressed on "Substrate" and across all digital channels and social media platforms are strictly personal opinions and do not represent the official positions of any organizations or entities I am affiliated with, past or present. The content shared is for informational and inspirational purposes only. These perspectives are my own and should not be construed as professional, legal, financial, technical, or strategic advice. Any decisions made based on this information are solely the responsibility of the reader.


While I strive to ensure accuracy and timeliness in all communications, the rapid pace of technological change means that some information may become outdated. I encourage readers to conduct their own due diligence and seek appropriate professional advice for their specific circumstances.


 
 