Claude Opus 4.8 Released
A Point Release That Knows It's a Point Release
On May 28, Anthropic shipped Opus 4.8. The version number tells you most of what you need to know: this is a .8, not a 5.0. Anthropic says as much, they call it a “modest but tangible improvement.” That kind of honesty is rare in a launch post, so credit where it’s due.
But “incremental” doesn’t mean “uninteresting,” especially if you run agents in production. Here’s what actually changed, and what’s marketing.
What’s actually new?
Same price. $5 per million input tokens, $25 per million output. Unchanged from 4.7. In a market where almost every release ships with a price bump, a capability upgrade at flat pricing is the most concrete win here.
The “honesty” improvement. This is the headline, and the one I’d actually pay attention to. Anthropic claims Opus 4.8 is roughly 4x less likely than 4.7 to let flaws in its own code pass unremarked. Models love to declare victory (“Done! Fixed the bug!”) when the evidence is thin. If 4.8 is genuinely better at flagging its own uncertainty, that’s the difference between an agent you can leave running and one you have to babysit. For anyone wiring Claude into CI or autonomous pipelines, a model that says “I’m not sure this works” is worth more than one that’s confidently wrong.
There’s an irony in marketing honesty as a feature (it quietly admits the previous model was overconfident) but it’s the right kind of admission.
Dynamic workflows in Claude Code. Research preview. Claude can now plan a task, spin up hundreds of parallel subagents in a single session, and verify its own output before reporting back. The flagship example: codebase-scale migrations across hundreds of thousands of lines, from kickoff to merge, using your existing test suite as the bar. If you’ve ever scoped a framework migration across a monorepo, you know why this matters. Whether it holds up outside a demo is the open question, but the shape is right. (Enterprise, Team, and Max plans only.)
Effort control. There’s now a dial next to the model picker (low to max) controlling how hard the model thinks. Default is high. Low effort means faster responses and slower rate-limit burn. Sensible UX: not every prompt needs maximum reasoning, and now you don’t pay (in time or limits) for thinking you didn’t need.
1M token context by default on the API, Bedrock, and Vertex (200k on Microsoft Foundry). 128k max output, adaptive thinking. Fast mode (2.5x throughput) is now a third cheaper than it was on previous models.
A small but real API change: the Messages API now accepts system entries inside the messages array. You can update Claude’s instructions mid-task (permissions, token budgets, environment context) without breaking the prompt cache or faking a user turn. If you build agent harnesses, that kills a genuinely annoying workaround.
The benchmarks, with a grain of salt
Anthropic’s numbers are strong: the only model to complete every case on their Super-Agent benchmark, 84% on Online-Mind2Web for computer use (ahead of both 4.7 and GPT-5.5), and the highest score yet on their Legal Agent Benchmark. Notice the pattern, these are Anthropic’s benchmarks, or run on Anthropic’s harnesses. The footnotes quietly adjust competitor scores and prior Opus scores. That’s standard practice at every lab, and it’s exactly why first-party benchmarks are a starting point, not a verdict. Wait for independent runs.
The Mythos carrot
Buried at the end: Anthropic again dangles “Mythos-class” models — higher intelligence than Opus, currently restricted to a handful of orgs for cybersecurity under Project Glasswing, “coming in the coming weeks.” They’ve been saying versions of this for a while. The stated reason for the delay is that models this capable need stronger cyber safeguards before public release. Take that as you will: either responsible caution or a convenient way to keep a more powerful model permanently just over the horizon. Probably some of both.
Should you care?
If you use Claude casually, the effort control is the change you’ll notice. If you build agents or run Claude Code against real repos, the honesty improvement and dynamic workflows are worth testing this week, at the same price, there’s no reason not to. Just don’t mistake a .8 for a revolution. Anthropic isn’t, and neither should you.

