Claude Mythos & Capybara: A Comprehensive Research Report
Research compiled 2026-03-28
Abstract
On March 26–27, 2026, Anthropic accidentally exposed approximately 3,000 unpublished internal assets through a misconfigured content management system, revealing the existence of its next-generation AI model: Claude Mythos. The model sits in a new tier called Capybara, the first tier above Opus in Anthropic’s product hierarchy. Anthropic has confirmed the leak and acknowledged that Mythos has completed training, describing it as “by far the most powerful AI model we’ve ever developed” and a genuine “step change” in AI capability. This report synthesizes the available information from the leak and subsequent reporting to give an overview of Claude Mythos/Capybara: its capabilities, its cybersecurity implications, its position within the current Anthropic model family, and the broader competitive AI landscape as of Q1 2026.
Keywords: Claude Mythos, Claude Capybara, Anthropic, large language model, AI capabilities, cybersecurity, model tiers
📋 Introduction
Problem statement
The frontier AI landscape is advancing rapidly. Anthropic’s unplanned disclosure of Claude Mythos, a model the company considers a qualitative leap beyond its current flagship, creates an immediate need to understand what it is, what it can do, how it compares to existing options, and what risks it introduces [1]. For organizations evaluating AI tools, the emergence of a new capability tier above Opus reshapes procurement, security, and strategy planning.
Research questions
This report investigates:
- RQ1: What is Claude Mythos/Capybara, and how does it relate to Anthropic’s existing model family?
- RQ2: What are its documented capabilities, and how do they compare to Claude Opus 4.6 and competitors?
- RQ3: What are the cybersecurity implications and the expected release strategy?
Scope and boundaries
- In scope: All publicly available information from the March 2026 data leak and subsequent reporting; comparisons to currently available Anthropic models; competitive context (OpenAI, Google, xAI)
- Out of scope: Internal Anthropic architecture details (not leaked); pricing specifics for Mythos (not yet available); Claude 5 (separate upcoming release)
- Target audience: Technical professionals evaluating AI tooling, capability planning, and security posture
💬 Context Notes
- This report was produced two days after the initial leak (March 26, 2026) and reflects the information environment as of March 28, 2026
- Anthropic has confirmed the model’s existence but has not released official documentation; all capability specifics come from the accidentally-exposed draft blog posts
- The situation is fluid — additional details may emerge as Anthropic proceeds with its early-access program
📚 Background
Industry context
As of Q1 2026, the frontier AI model market is dominated by four primary competitors: Anthropic (Claude), OpenAI (GPT), Google DeepMind (Gemini), and xAI (Grok) [2]. Each has pursued a tiered model strategy balancing cost, speed, and capability. Anthropic’s existing three-tier structure of Haiku (fast/cheap), Sonnet (balanced), and Opus (flagship) has served as the company’s core commercial offering since 2024. The addition of a fourth tier represents a significant structural shift.
Claude Opus 4.6 (released February 4, 2026) currently serves as Anthropic’s flagship. It introduced a 1-million-token context window at standard pricing, 128k max output tokens, adaptive reasoning, and context compaction, and achieved state-of-the-art scores across agentic coding, legal reasoning, long-context comprehension, and scientific reasoning [3].
Prior work and model history
| Model | Release | Key Advance | Relevance |
|---|---|---|---|
| Claude 3 Opus | 2024 | First “frontier” Opus tier | Established Opus as top-tier brand |
| Claude Opus 4.5 | Nov 24, 2025 | Coding + workplace tasks | Incremental Opus improvement |
| Claude Opus 4.6 | Feb 4, 2026 | 1M context, adaptive reasoning, 128k output | Current flagship; Mythos baseline for comparison |
| Claude Sonnet 4.6 | Feb 17, 2026 | Same pricing as Sonnet 4.5, 1M context | Balanced tier upgrade |
| Claude Mythos (Capybara) | TBD 2026 | Step-change across all benchmarks | Subject of this report |
Gap in current knowledge
No official benchmark numbers for Mythos have been released. The leaked draft blog posts used qualitative language (“dramatically higher scores”) without publishing specific figures. All capability comparisons in this report are therefore directional, not quantitative.
📋 Extended Model Context
Claude 5 is separately expected in Q2–Q3 2026 (roughly May–September), described as featuring near-AGI reasoning and 500K–1M token context windows. It is unclear whether Claude Mythos/Capybara is the same product as Claude 5 or a distinct release that precedes it. Current reporting treats them as separate efforts, with Mythos/Capybara focused on a specific capability jump above Opus rather than a full generational release [4].
🔬 Methodology
Approach
This report uses secondary source synthesis — aggregating, cross-referencing, and evaluating reporting from multiple independent technology publications that covered the March 2026 data leak. No primary access to Anthropic systems or leaked documents was obtained directly.
```mermaid
flowchart LR
    accTitle: Research Methodology Flow
    accDescr: Web research gathered from leak reporting, synthesized against existing Anthropic model documentation, then cross-referenced with competitive benchmarks.

    leak["🔓 Anthropic Data Leak<br/>March 26–27, 2026"]
    reporting["📰 Press Coverage<br/>Fortune, Futurism, SiliconANGLE,<br/>The Decoder, CNBC, etc."]
    existing["📄 Existing Anthropic Docs<br/>Opus 4.6 benchmarks,<br/>API pricing, release notes"]
    competitor["🏆 Competitor Data<br/>OpenAI, Google, xAI<br/>benchmark comparisons"]
    synthesis["🔍 Synthesis &<br/>Cross-reference"]
    report["📋 This Report"]

    leak --> reporting
    reporting --> synthesis
    existing --> synthesis
    competitor --> synthesis
    synthesis --> report

    classDef source fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e3a5f
    classDef process fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12
    classDef output fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d
    class leak,reporting,existing,competitor source
    class synthesis process
    class report output
```
Data sources
| Source | Type | Coverage |
|---|---|---|
| Fortune (exclusive report) | Technology journalism | Initial leak reporting, Anthropic confirmation [1] |
| Futurism | Technology journalism | Cybersecurity risk analysis [5] |
| SiliconANGLE | Technology journalism | Reasoning capabilities, release strategy [6] |
| The Decoder | Technology journalism | Benchmark comparison language [7] |
| CNBC | Financial journalism | Market impact, cybersecurity stock movement [8] |
| Anthropic official docs | Primary source | Opus 4.6 benchmarks, pricing, API specs [3] |
| Artificial Analysis | Benchmark aggregator | Competitive model comparisons [9] |
Limitations of methodology
⚠️ Known limitations: All Mythos capability data derives from unpublished draft blog posts exposed in the leak. Specific benchmark numbers were not included in those drafts. Competitor comparisons are based on Opus 4.6 benchmarks, not Mythos directly. The situation is actively developing — this report reflects a 48-hour snapshot.
📊 Findings
Finding 1: Capybara is a new product tier, not just a new model
Anthropic’s current product hierarchy is three tiers: Haiku → Sonnet → Opus. The leaked draft blog explicitly states that Capybara is a new tier name, not a model name: “Capybara is a new name for a new tier of model: larger and more intelligent than our Opus models,” which the draft describes as having been, “until now, our most powerful.” Claude Mythos is the first specific model released under the Capybara tier [1].
```mermaid
flowchart TD
    accTitle: Anthropic Model Tier Hierarchy
    accDescr: Anthropic's four-tier model hierarchy as of 2026, showing Capybara as the new top tier above Opus, with Claude Mythos as the first Capybara-tier model.

    haiku["🐦 Haiku<br/><b>Fast · Cheap</b><br/>Best for: high-volume, latency-sensitive tasks"]
    sonnet["🎵 Sonnet<br/><b>Balanced</b><br/>Best for: everyday tasks, cost-effective intelligence"]
    opus["🏔️ Opus<br/><b>Flagship</b><br/>Best for: complex reasoning, agentic work<br/>Current: Opus 4.6"]
    capybara["🦫 Capybara<br/><b>Breakthrough</b> ← NEW TIER<br/>Best for: frontier research, cybersecurity, novel problems<br/>First model: Claude Mythos"]

    haiku --> sonnet --> opus --> capybara

    classDef standard fill:#e0f2fe,stroke:#0369a1,stroke-width:2px,color:#0c4a6e
    classDef new fill:#fef3c7,stroke:#d97706,stroke-width:3px,color:#78350f
    class haiku,sonnet,opus standard
    class capybara new
```
📌 Key insight: Capybara is to Opus what Opus was to Sonnet: a qualitatively different capability level, not just a tuned variant. It will therefore likely be priced above Opus 4.6, although no Capybara pricing has been published.
Finding 2: Capability jump is described as a “step change” across all key domains
The leaked draft blog posts characterized Claude Mythos as achieving “dramatically higher scores” than Claude Opus 4.6 across three primary domains: software coding, academic reasoning, and cybersecurity [7]. Anthropic’s internal characterization used the phrase “step change,” language that signals a capability discontinuity rather than an incremental improvement.
Bar chart comparing relative capability levels across tiers (illustrative, based on directional language from leaked drafts, not official benchmark figures):
```mermaid
xychart-beta
    title "Claude Model Capability Comparison"
    x-axis [Coding, Reasoning, Cyber, Context, Writing]
    y-axis "Score" 0 --> 100
    bar [30, 35, 25, 40, 45]
    bar [58, 60, 50, 75, 70]
    bar [74, 80, 65, 90, 85]
    bar [92, 93, 95, 90, 88]
```
Legend: Haiku 4.5 | Sonnet 4.6 | Opus 4.6 | Mythos (Capybara)
⚠️ Values above are directional estimates based on qualitative leak language, not official Anthropic benchmarks.
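The per-domain gap between Mythos and Opus 4.6 implied by the chart can be worked out directly. The sketch below is only arithmetic on the directional estimates listed above; the values are this report's illustrative figures, not Anthropic benchmarks, and the helper name `tier_gap` is introduced here purely for illustration.

```python
# Directional estimates copied from the chart above (NOT official benchmarks).
DOMAINS = ["Coding", "Reasoning", "Cyber", "Context", "Writing"]
ESTIMATES = {
    "Haiku 4.5": [30, 35, 25, 40, 45],
    "Sonnet 4.6": [58, 60, 50, 75, 70],
    "Opus 4.6": [74, 80, 65, 90, 85],
    "Mythos (Capybara)": [92, 93, 95, 90, 88],
}


def tier_gap(higher: str, lower: str) -> dict[str, int]:
    """Return the per-domain point difference between two tiers (hypothetical helper)."""
    return {
        domain: high - low
        for domain, high, low in zip(DOMAINS, ESTIMATES[higher], ESTIMATES[lower])
    }


gaps = tier_gap("Mythos (Capybara)", "Opus 4.6")

# Print domains from largest to smallest estimated gap.
for domain, delta in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{domain:<10} {delta:+d}")
```

On these illustrative numbers the largest gap is in the Cyber domain (30 points) and the smallest in Context (0 points), which matches the qualitative emphasis of the leaked drafts.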
Finding 3: Cybersecurity capability is the headline differentiator and the primary risk
The leaked draft described Claude Mythos as “currently far ahead of any other AI model in cyber capabilities.” This is both Mythos’s most significant competitive advantage and the primary reason Anthropic is delaying general availability [5].
The draft blog warned that the model “could allow attacks to scale faster than defenders could counter them” and described it as posing “unprecedented cybersecurity risks.” In a notable irony, this characterization was itself exposed by a cybersecurity failure (CMS misconfiguration) [5].
Anthropic’s rollout strategy directly addresses this:
```mermaid
flowchart LR
    accTitle: Claude Mythos Phased Rollout Strategy
    accDescr: Anthropic's cautious release strategy for Claude Mythos, beginning with cyber defense organizations before broader commercial availability.

    trained["✅ Training<br/>Complete<br/>March 2026"]
    earlyaccess["🔒 Early Access<br/>Cyber Defense Orgs<br/>Q1–Q2 2026"]
    evaluation["🔬 Evaluation<br/>Period<br/>Q2–Q3 2026"]
    commercial["🌐 Broader<br/>Commercial Release<br/>Late 2026 (est.)"]

    trained --> earlyaccess --> evaluation --> commercial

    classDef done fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d
    classDef active fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12
    classDef future fill:#e0e7ff,stroke:#4338ca,stroke-width:2px,color:#1e1b4b
    class trained done
    class earlyaccess active
    class evaluation,commercial future
```
📌 Key insight: Anthropic is deliberately giving cyber defense organizations a head start — effectively treating Mythos’s cybersecurity capability as a dual-use risk that requires defenders to be equipped before the model is broadly accessible to potential attackers.
Finding 4: Claude Opus 4.6, the current baseline, is already highly competitive
To contextualize the magnitude of the Mythos “step change,” it helps to understand where Opus 4.6 sits in the current landscape. As of February 2026, Opus 4.6 is competitive or leading across most major benchmarks [3][9]:
| Benchmark | Claude Opus 4.6 | Notes |
|---|---|---|
| SWE-bench (coding) | ~74% | Competitive; Grok 4 leads at 75%, GPT-5.4 at 74.9% |
| GPQA Diamond (science reasoning) | Leads by 3.5pts vs GPT-5.4 | Best in class for graduate-level science |
| Terminal-Bench 2.0 (agentic coding) | 65.4% | State-of-the-art agentic performance |
| OSWorld (computer use) | 72.7% | Leading computer use benchmark |
| BrowseComp (agentic search) | 84.0% | Strong agentic web interaction |
| Humanity’s Last Exam (reasoning) | 53.1% (with tools) | Frontier reasoning performance |
| BigLaw Bench (legal reasoning) | 90.2% | Best legal reasoning score in Claude family |
| ARC AGI 2 | 68.8% | Novel problem-solving |
| MRCR v2 (long-context) | 76% | Strong long-context retrieval |
| Context window | 1M tokens | At standard pricing ($5/$25 per MTok) |
| Max output | 128k tokens | Double the previous 64k limit |
If Mythos delivers “dramatically higher scores” across coding, reasoning, and cybersecurity on top of this baseline, it represents a significant capability advance.
Finding 5: The competitive landscape as of Q1 2026
```mermaid
quadrantChart
    title AI Model Landscape - Capability vs Availability
    x-axis Low Availability --> High Availability
    y-axis Lower Capability --> Higher Capability
    quadrant-1 Available Leaders
    quadrant-2 Restricted Leaders
    quadrant-3 Restricted Standard
    quadrant-4 Available Standard
    Claude Mythos: [0.12, 0.97]
    Claude Opus 4.6: [0.75, 0.82]
    GPT-5.4: [0.85, 0.79]
    Gemini 3.1 Pro: [0.80, 0.76]
    Grok 4: [0.65, 0.78]
    Claude Sonnet 4.6: [0.90, 0.65]
    Claude Haiku 4.5: [0.95, 0.42]
```
| Model | Maker | Coding | Reasoning | Writing | Ecosystem | Context |
|---|---|---|---|---|---|---|
| Claude Mythos | Anthropic | ★★★★★ | ★★★★★ | ★★★★☆ | Limited | TBD |
| Claude Opus 4.6 | Anthropic | ★★★★☆ | ★★★★★ | ★★★★★ | Claude Code, Cursor | 1M tokens |
| GPT-5.4 | OpenAI | ★★★★☆ | ★★★★☆ | ★★★★☆ | Largest ecosystem | Large |
| Gemini 3.1 Pro | Google | ★★★★☆ | ★★★★★ | ★★★☆☆ | Google Workspace | 1M tokens |
| Grok 4 | xAI | ★★★★★ | ★★★★☆ | ★★★☆☆ | X/Twitter | Large |
| Claude Sonnet 4.6 | Anthropic | ★★★★☆ | ★★★★☆ | ★★★★☆ | Full API access | 1M tokens |
📌 Key insight: No competitor has announced a model in a tier equivalent to Capybara. If Mythos delivers on the leaked characterization, it would represent a meaningful capability lead — at least temporarily — particularly in cybersecurity and agentic coding.
💡 Analysis
Interpretation
RQ1: What is Claude Mythos/Capybara? It is the first model in a new fourth tier of Anthropic’s product hierarchy, positioned above Opus. “Capybara” is the tier; “Mythos” is the first model. This is analogous to Anthropic’s 2024 introduction of Opus as a tier above Sonnet: a structural upgrade, not just an incremental one.
RQ2: How do capabilities compare? The only quantitative comparisons available are for Opus 4.6 vs. competitors, and Opus 4.6 is already at or near the frontier in reasoning and coding. If Mythos is a genuine “step change” above that baseline, its absolute capability would represent a new frontier for AI, particularly in cybersecurity, where no competitor currently claims comparable performance.
RQ3: What are the cybersecurity implications and release strategy? Anthropic is managing Mythos as a dual-use capability: genuinely useful for cybersecurity defense, but dangerous if accessible to adversarial actors. The phased rollout starting with defense organizations is a meaningful safety measure. The estimated general availability window of late 2026 may be tied to Anthropic’s anticipated IPO, suggesting business incentives align with this timeline [6].
Implications
For organizations:
- Monitor early-access announcements — if your work touches cyber defense or advanced agentic use cases, early access may be worth pursuing
- Current stack planning should account for a new top-tier pricing bracket (likely above Opus 4.6’s $5/$25 per MTok) entering the market in H2 2026
- Claude Opus 4.6 remains the practical recommendation for complex reasoning and agentic work today, with Sonnet 4.6 for cost-sensitive use cases
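For budget planning, the Opus 4.6 baseline can be turned into a simple cost model. The sketch below assumes that the "$5/$25 per MTok" figure means $5 per million input tokens and $25 per million output tokens; the workload sizes and the Capybara price multipliers are purely hypothetical placeholders, since no Capybara pricing has been published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens (MTok)."""
    input_per_mtok: float
    output_per_mtok: float


# Assumption: "$5/$25 per MTok" is read as $5 input / $25 output.
OPUS_4_6 = Pricing(input_per_mtok=5.0, output_per_mtok=25.0)


def monthly_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 multiplier: float = 1.0) -> float:
    """Estimate monthly spend in USD, optionally scaled by a price multiplier."""
    base = (input_tokens / 1_000_000) * pricing.input_per_mtok
    base += (output_tokens / 1_000_000) * pricing.output_per_mtok
    return base * multiplier


# Hypothetical workload: 200M input tokens and 20M output tokens per month.
workload = {"input_tokens": 200_000_000, "output_tokens": 20_000_000}
baseline = monthly_cost(OPUS_4_6, **workload)

# Hypothetical multipliers for a tier priced above Opus; not leaked or announced.
scenarios = {m: monthly_cost(OPUS_4_6, **workload, multiplier=m) for m in (1.0, 2.0, 3.0)}
for m, cost in scenarios.items():
    print(f"{m:.1f}x Opus 4.6 pricing: ${cost:,.2f}/month")
```

Running a few multipliers like this gives a planning range rather than a forecast; the real figure depends on pricing that Anthropic has not yet announced.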
For security planning:
- The Mythos cybersecurity capability cuts both ways — organizations should treat the general availability window as a deadline to harden systems, not wait for the model to be available to test against them
- Anthropic’s own data leak (the event that revealed Mythos) is a pointed reminder that AI lab operational security is itself imperfect [5]
Limitations
- All Mythos capability data is derived from draft marketing copy, not technical papers or reproducible benchmarks
- “Dramatically higher scores” is subjective; we do not know the absolute or relative magnitude
- The late 2026 release estimate is speculative, based on IPO alignment inference, not Anthropic statements
💬 Discussion Notes
- The leak itself has market implications: CNBC reported that cybersecurity stocks fell on news of Mythos’s capabilities [8], reflecting investor concern that AI may outpace existing security tooling
- Some analysts have pointed out that the “unprecedented cybersecurity risk” framing may also be strategic, positioning Anthropic as a responsible actor taking precautions while building pre-release hype [5]
- The name “Mythos” (mythology, foundational narrative) versus the codename “Capybara” (a large, calm semi-aquatic rodent) reflects Anthropic’s tradition of whimsical internal codenames with weighty product names
🎯 Conclusions
Summary
Claude Mythos is Anthropic’s most capable AI model to date, accidentally revealed on March 26–27, 2026 through a CMS misconfiguration. It occupies a new “Capybara” tier above Opus — a structural capability level, not just a model upgrade. Leaked draft blog posts describe it as delivering dramatically higher scores across coding, academic reasoning, and cybersecurity compared to Opus 4.6, with cybersecurity capability described as “far ahead of any other AI model.” Training is complete; the model is in early-access testing with cyber defense organizations. General availability is estimated for late 2026. No competitors have announced an equivalent tier.
Recommendations
- Watch early-access announcements — Anthropic will expand the Capybara program from defense orgs to broader enterprise. Organizations should position to apply for early access if/when that program opens, especially those with AI/cybersecurity-adjacent work.
- Treat late 2026 as a capability inflection point — Current AI tool evaluations and ROI planning should account for a significant new capability tier entering general availability within approximately 6–9 months.
- Maintain Opus 4.6 as the current recommendation — Sonnet 4.6 for cost-sensitive use cases, Opus 4.6 for complex reasoning and agentic tasks. Mythos is not yet accessible; do not delay current AI adoption waiting for it.
- Prepare security posture for a more capable threat landscape — The Mythos cybersecurity capability will eventually reach adversarial actors. Organizations should use the remaining pre-release window to audit and harden systems.
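The "approximately 6–9 months" planning horizon can be checked with simple date arithmetic. The sketch below assumes that "late 2026" means the fourth calendar quarter (October 1 to December 31); that mapping is this example's assumption, not an Anthropic statement, and the 30.44-day average month is an approximation.

```python
from datetime import date

REPORT_DATE = date(2026, 3, 28)

# Assumption: "late 2026" is interpreted as Q4 2026.
WINDOW_START = date(2026, 10, 1)
WINDOW_END = date(2026, 12, 31)

AVG_DAYS_PER_MONTH = 30.44  # 365.25 / 12, an approximation


def months_between(start: date, end: date) -> float:
    """Approximate number of months between two dates."""
    return (end - start).days / AVG_DAYS_PER_MONTH


lead_min = months_between(REPORT_DATE, WINDOW_START)
lead_max = months_between(REPORT_DATE, WINDOW_END)
print(f"Lead time: {lead_min:.1f} to {lead_max:.1f} months")
```

Under that assumption the lead time works out to a little over six months at the early end and a little over nine at the late end, consistent with the range used in the recommendations.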
Future work
- Benchmark watch — As Anthropic publishes official Mythos data, update this note with actual numbers
- Pricing analysis — Once the Capybara tier is priced, compare cost-per-capability against Opus 4.6 to inform procurement decisions
- Claude 5 tracking — Separately monitor Claude 5 (expected Q2–Q3 2026) to determine if it is the same product as Mythos or a distinct release
🔗 References
All sources cited in this report:
Last updated: 2026-03-28
[1] Fortune. (2026, March 26). “Exclusive: Anthropic ‘Mythos’ AI model representing ‘step change’ in power revealed in data leak.” Fortune. https://fortune.com/2026/03/26/anthropic-says-testing-mythos-powerful-new-ai-model-after-data-leak-reveals-its-existence-step-change-in-capabilities/
[2] Various. (2026). “AI Models in 2026: Which One Should You Actually Use?” GuruSup. https://gurusup.com/blog/ai-comparisons
[3] Anthropic. (2026, February 4). “Introducing Claude Opus 4.6.” Anthropic. https://www.anthropic.com/news/claude-opus-4-6
[4] Claude5.com. (2026). “When Is Claude 5 Coming Out? Q2 2026 (Here’s the Evidence).” Claude 5 Hub. https://claude5.com/news/when-is-claude-5-coming-out-release-date-prediction
[5] Futurism. (2026, March 27). “Anthropic Just Leaked Upcoming Model With ‘Unprecedented Cybersecurity Risks’ in the Most Ironic Way Possible.” Futurism. https://futurism.com/artificial-intelligence/anthropic-step-change-new-model-claude-mythos
[6] SiliconANGLE. (2026, March 27). “Anthropic to launch new ‘Claude Mythos’ model with advanced reasoning features.” SiliconANGLE. https://siliconangle.com/2026/03/27/anthropic-launch-new-claude-mythos-model-advanced-reasoning-features/
[7] The Decoder. (2026, March 27). “Anthropic leak reveals new model ‘Claude Mythos’ with ‘dramatically higher scores on tests’ than any previous model.” The Decoder. https://the-decoder.com/anthropic-leak-reveals-new-model-claude-mythos-with-dramatically-higher-scores-on-tests-than-any-previous-model/
[8] CNBC. (2026, March 27). “Cybersecurity stocks fall on report Anthropic is testing a powerful new model.” CNBC. https://www.cnbc.com/2026/03/27/anthropic-cybersecurity-stocks-ai-mythos.html
[9] Artificial Analysis. (2026). “Claude Opus 4.6 (max) — Intelligence, Performance & Price Analysis.” Artificial Analysis. https://artificialanalysis.ai/models/claude-opus-4-6-adaptive