The article digs into rising concerns about Anthropic’s Claude AI. Users have noticed what feels like a drop in performance, especially on tough engineering and coding tasks. At the same time, Anthropic has started testing a beefier model called Mythos, which has fueled a lot of talk: was Claude intentionally nerfed to save compute, or was it just reconfigured?
People have been posting comparisons, benchmarks, and anecdotes all over X, GitHub, and Reddit. Some swear Claude’s gotten worse at nuanced, multi-step reasoning. Others say it depends on how you test.
Independent analyses and Anthropic’s own statements frame this as a configurability shift rather than a mysterious outage. Some folks argue the drop isn’t a loss of skill, just a change in default behavior: high-effort reasoning now shows up less often unless you specifically ask for it.
What sparked the controversy around Claude and Mythos
Developers and users started raising flags about Claude falling short on complex tasks. Meanwhile, Anthropic rolled out Mythos for testing, which only added fuel to the speculation. Was Claude’s performance really dialed down on purpose? There’s no shortage of side-by-side outputs and benchmarks floating around, each telling a slightly different story.
Some analyses and company comments suggest it’s more about how you configure things than a real downgrade. The idea is that Claude’s abilities didn’t vanish—they just became less obvious unless you tweak the settings.
Signals from users and independent testing
- Public threads show mixed results: Mythos sometimes outshines Claude, but not always. Claude seems to stumble more on tricky, multi-step prompts.
- The way you frame prompts and run benchmarks really affects what you see, making fair comparisons tricky.
How Claude’s default behavior has changed and what that means
Anthropic says it tweaked the default reasoning level in Claude Code, and denies that the change was made to save compute or to push Mythos. Users can flip the /model selector between a faster, low-effort mode and a slower, high-effort one. Anthropic insists this isn’t a sneaky nerf but a real, user-facing option. Boris Cherny, who leads Claude Code, pointed out that the selector is sticky and keeps your choice between sessions, which is meant to show it’s a genuine setting, not a hidden limiter.
What the company says about the selector and defaults
Testing from outside the company suggests that making shallow reasoning the default does lower the depth of answers on multi-step tasks. Still, Anthropic says you can always demand deeper thinking if you want. They argue it’s about user-selected settings, not a system-wide clampdown on capability.
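If you want to demand deeper thinking programmatically rather than relying on whatever the product default happens to be, here is a minimal sketch using the Anthropic Python SDK’s extended-thinking option. The model identifier and token budgets below are placeholders for illustration, not values from the article or recommendations.

```python
# Minimal sketch: explicitly opt in to deeper reasoning via the API
# instead of depending on the default effort level.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",   # placeholder model identifier
    max_tokens=4_000,
    # Extended thinking: give the model an explicit reasoning budget.
    thinking={"type": "enabled", "budget_tokens": 2_000},
    messages=[{
        "role": "user",
        "content": "Walk through this refactor step by step before writing code: ...",
    }],
)

# Thinking and the final answer come back as separate content blocks;
# print only the answer text here.
for block in response.content:
    if block.type == "text":
        print(block.text)
```

The design point is simply that depth becomes something you request per call, so a team can pin it in their tooling rather than inherit shifting defaults.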
Is there a hidden nerf or is habituation at play?
The debate keeps coming back to habituation. As people get used to better AI, they raise their expectations. Even if the model’s skills stay the same, it can feel like a step back. Maybe what looks like regression is really just users getting pickier and prompts getting tougher.
Evidence and alternative explanations
Some independent reviews agree that the default depth was dialed back, but not in some secretive, sweeping way. The picture is nuanced: any real drop is being measured against ever-changing prompts and user expectations. Until someone runs a rigorous, peer-reviewed study, the answer stays up in the air.
Access fragmentation and the pricing shifts around top-tier AI
Zooming out, this whole saga points to a bigger trend: access to the best AI is getting more fragmented. It’s often locked behind higher-priced tiers, APIs, or special programs. Anthropic’s nudging big enterprise customers toward usage-based token pricing. That ties smarter behavior more directly to how much you spend. They’re also prepping an upgrade to the Opus model (v4.7).
What this means for developers and enterprises
As these models keep evolving, teams have to juggle raw power with predictable costs. If defaults shift depending on what you pay, prompt design and evaluation need to keep up. The Opus v4.7 upgrade will probably put more advanced computation behind a paywall, so careful benchmarking is going to matter even more for reliability and ROI.
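To see why benchmarking matters for ROI, here is a back-of-envelope cost sketch under usage-based token pricing. All per-token prices and token counts are illustrative assumptions, not Anthropic’s actual rates; the point is only that a higher-effort mode that emits more billable output tokens costs more even when per-token prices are identical.

```python
# Illustrative cost model: prices and token counts are made-up assumptions.
def run_cost(input_tokens: int, output_tokens: int,
             price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Dollar cost of one request under usage-based token pricing."""
    return (input_tokens * price_in_per_mtok +
            output_tokens * price_out_per_mtok) / 1_000_000

# Same prompt, same rates; the deeper run simply produces more output tokens.
fast = run_cost(2_000, 800, price_in_per_mtok=3.0, price_out_per_mtok=15.0)
deep = run_cost(2_000, 6_000, price_in_per_mtok=3.0, price_out_per_mtok=15.0)
print(f"fast ~ ${fast:.3f} per request, deep ~ ${deep:.3f} per request")
```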
What to watch next: implications and practical takeaways
In a market that cares about both power and accessibility, the Claude/Mythos debate really highlights how governance, testing, and pricing shape what you actually get. The big questions? Will default AI experiences keep drifting as models get better and pricier, and will vendors stay open about what you can actually tweak?
Takeaways for practitioners
- Design prompts and workflows that allow for configurable default behavior. Make sure there are clear paths to higher-effort modes when needed.
- Benchmark against real engineering tasks, including multi-step reasoning and coding sequences; a minimal harness sketch follows this list.
- Keep an eye on cost-performance trade-offs as new models arrive. Try to align expectations with enterprise pricing, but expect some surprises.
- Watch out for access gating. Sometimes, the best AI experience means dealing with higher-cost plans or joining pilot programs.
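As a starting point for the benchmarking takeaway above, here is a minimal, model-agnostic harness sketch. The example task, its pass check, and the ask_default / ask_high_effort callables are hypothetical placeholders you would wire up to however your stack invokes each configuration (CLI, API, or an internal gateway).

```python
# Hypothetical benchmark harness: run the same engineering tasks against two
# configurations and report pass rates, so regressions in default behavior
# show up as numbers rather than anecdotes.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the response passes

def run_suite(tasks: list[Task], ask: Callable[[str], str]) -> float:
    """Return the fraction of tasks whose response passes its check."""
    passed = sum(1 for task in tasks if task.check(ask(task.prompt)))
    return passed / len(tasks)

tasks = [
    Task(
        name="multi_step_refactor",
        prompt="Refactor this function into three pure helpers: ...",
        check=lambda r: "def " in r,   # crude placeholder assertion
    ),
]

# default_rate = run_suite(tasks, ask_default)        # fast, low-effort config
# high_effort_rate = run_suite(tasks, ask_high_effort)  # explicit deep-reasoning config
# print(f"default={default_rate:.0%} high-effort={high_effort_rate:.0%}")
```

Keeping the harness independent of any one vendor’s API makes it easier to rerun the same suite whenever defaults, models, or pricing tiers change.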
If you’re in the field, it’s worth examining disputes with some nuance. Separate real capability from marketing, and focus on solid evaluation and governance to keep AI tools reliable and affordable for engineering work.
Here is the source article for this story: Anthropic’s AI downgrade stings power users