Claude Mythos & Capybara: A Comprehensive Research Report

Research compiled 2026-03-28

Abstract

On March 26–27, 2026, Anthropic accidentally exposed approximately 3,000 unpublished internal assets through a misconfigured content management system, revealing the existence of its next-generation AI model, Claude Mythos. The model operates under a new tier designation called Capybara, the first tier above Opus in Anthropic’s product hierarchy. Anthropic has confirmed the leak and acknowledged that Mythos has completed training, describing it as “by far the most powerful AI model we’ve ever developed” and a “step change” in AI capability. This report synthesizes the available information from the leak and subsequent reporting to give an overview of Claude Mythos/Capybara: its capabilities, its cybersecurity implications, its position within the current Anthropic model family, and the broader competitive AI landscape as of Q1 2026.

Keywords: Claude Mythos, Claude Capybara, Anthropic, large language model, AI capabilities, cybersecurity, model tiers

📋 Introduction

Problem statement

The frontier AI landscape is advancing rapidly. Anthropic’s unplanned disclosure of Claude Mythos — a model the company considers a qualitative leap beyond its current flagship — creates an immediate need to understand what it is, what it can do, how it compares to existing options, and what risks it introduces¹. For organizations evaluating AI tools, the emergence of a new capability tier above Opus reshapes procurement, security, and strategy planning.

Research questions

This report investigates:

RQ1 — What is Claude Mythos/Capybara, and how does it relate to Anthropic’s existing model family?
RQ2 — What are its documented capabilities, and how do they compare to Claude Opus 4.6 and competitors?
RQ3 — What are the cybersecurity implications and the expected release strategy?

Scope and boundaries

In scope: All publicly available information from the March 2026 data leak and subsequent reporting; comparisons to currently available Anthropic models; competitive context (OpenAI, Google, xAI)
Out of scope: Internal Anthropic architecture details (not leaked); pricing specifics for Mythos (not yet available); Claude 5 (separate upcoming release)
Target audience: Technical professionals evaluating AI tooling, capability planning, and security posture

💬 Context Notes

This report was produced two days after the initial leak (March 26, 2026) and reflects the information environment as of March 28, 2026
Anthropic has confirmed the model’s existence but has not released official documentation; all capability specifics come from the accidentally-exposed draft blog posts
The situation is fluid — additional details may emerge as Anthropic proceeds with its early-access program

📚 Background

Industry context

As of Q1 2026, the frontier AI model market is dominated by four primary competitors: Anthropic (Claude), OpenAI (GPT), Google DeepMind (Gemini), and xAI (Grok)². Each has pursued a tiered model strategy balancing cost, speed, and capability. Anthropic’s existing three-tier structure of Haiku (fast/cheap), Sonnet (balanced), and Opus (flagship) has served as the company’s core commercial offering since the Claude 3 family launched in 2024. The addition of a fourth tier represents a significant structural shift.

Claude Opus 4.6 (released February 4, 2026) currently serves as Anthropic’s flagship. It introduced a 1-million-token context window at standard pricing, 128k max output tokens, adaptive reasoning, and context compaction — achieving state-of-the-art scores across agentic coding, legal reasoning, long-context comprehension, and scientific reasoning³.

Prior work and model history

| Model | Release | Key Advance | Relevance |
| --- | --- | --- | --- |
| Claude 3 Opus | 2024 | First “frontier” Opus tier | Established Opus as top-tier brand |
| Claude Opus 4.5 | Nov 24, 2025 | Coding + workplace tasks | Incremental Opus improvement |
| Claude Opus 4.6 | Feb 4, 2026 | 1M context, adaptive reasoning, 128k output | Current flagship; Mythos baseline for comparison |
| Claude Sonnet 4.6 | Feb 17, 2026 | Same pricing as Sonnet 4.5, 1M context | Balanced tier upgrade |
| Claude Mythos (Capybara) | TBD 2026 | Step-change across all benchmarks | Subject of this report |

Gap in current knowledge

No official benchmark numbers for Mythos have been released. The leaked draft blog posts used qualitative language (“dramatically higher scores”) without publishing specific figures. All capability comparisons in this report are therefore directional, not quantitative.

📋 Extended Model Context

Claude 5 is separately expected in Q2–Q3 2026 (roughly May–September), described as featuring near-AGI reasoning and 500K–1M token context windows. It is unclear whether Claude Mythos/Capybara is the same product as Claude 5 or a distinct release that precedes it. Current reporting treats them as separate efforts, with Mythos/Capybara focused on a specific capability jump above Opus rather than a full generational release⁴.

🔬 Methodology

Approach

This report uses secondary source synthesis — aggregating, cross-referencing, and evaluating reporting from multiple independent technology publications that covered the March 2026 data leak. No primary access to Anthropic systems or leaked documents was obtained directly.

flowchart LR
    accTitle: Research Methodology Flow
    accDescr: Web research gathered from leak reporting, synthesized against existing Anthropic model documentation, then cross-referenced with competitive benchmarks.

    leak["🔓 Anthropic Data Leak<br/>March 26–27, 2026"]
    reporting["📰 Press Coverage<br/>Fortune, Futurism, SiliconANGLE,<br/>The Decoder, CNBC, etc."]
    existing["📄 Existing Anthropic Docs<br/>Opus 4.6 benchmarks,<br/>API pricing, release notes"]
    competitor["🏆 Competitor Data<br/>OpenAI, Google, xAI<br/>benchmark comparisons"]
    synthesis["🔍 Synthesis &<br/>Cross-reference"]
    report["📋 This Report"]

    leak --> reporting
    reporting --> synthesis
    existing --> synthesis
    competitor --> synthesis
    synthesis --> report

    classDef source fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e3a5f
    classDef process fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12
    classDef output fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d

    class leak,reporting,existing,competitor source
    class synthesis process
    class report output

Data sources

| Source | Type | Coverage |
| --- | --- | --- |
| Fortune (exclusive report) | Technology journalism | Initial leak reporting, Anthropic confirmation¹ |
| Futurism | Technology journalism | Cybersecurity risk analysis⁵ |
| SiliconANGLE | Technology journalism | Reasoning capabilities, release strategy⁶ |
| The Decoder | Technology journalism | Benchmark comparison language⁷ |
| CNBC | Financial journalism | Market impact, cybersecurity stock movement⁸ |
| Anthropic official docs | Primary source | Opus 4.6 benchmarks, pricing, API specs³ |
| Artificial Analysis | Benchmark aggregator | Competitive model comparisons⁹ |

Limitations of methodology

⚠️ Known limitations: All Mythos capability data derives from unpublished draft blog posts exposed in the leak. Specific benchmark numbers were not included in those drafts. Competitor comparisons are based on Opus 4.6 benchmarks, not Mythos directly. The situation is actively developing — this report reflects a 48-hour snapshot.

📊 Findings

Finding 1: Capybara is a new product tier, not just a new model

Anthropic’s current product hierarchy is three tiers: Haiku → Sonnet → Opus. The leaked draft blog explicitly states that Capybara is a new tier name, not a model name: “Capybara is a new name for a new tier of model: larger and more intelligent than our Opus models — which were, until now, our most powerful.” Claude Mythos is the first specific model released under the Capybara tier¹.

flowchart TD
    accTitle: Anthropic Model Tier Hierarchy
    accDescr: Anthropic's four-tier model hierarchy as of 2026, showing Capybara as the new top tier above Opus, with Claude Mythos as the first Capybara-tier model.

    haiku["🐦 Haiku<br/><b>Fast · Cheap</b><br/>Best for: high-volume, latency-sensitive tasks"]
    sonnet["🎵 Sonnet<br/><b>Balanced</b><br/>Best for: everyday tasks, cost-effective intelligence"]
    opus["🏔️ Opus<br/><b>Flagship</b><br/>Best for: complex reasoning, agentic work<br/>Current: Opus 4.6"]
    capybara["🦫 Capybara<br/><b>Breakthrough</b> ← NEW TIER<br/>Best for: frontier research, cybersecurity, novel problems<br/>First model: Claude Mythos"]

    haiku --> sonnet --> opus --> capybara

    classDef standard fill:#e0f2fe,stroke:#0369a1,stroke-width:2px,color:#0c4a6e
    classDef new fill:#fef3c7,stroke:#d97706,stroke-width:3px,color:#78350f

    class haiku,sonnet,opus standard
    class capybara new

📌 Key insight: Capybara is to Opus what Opus was to Sonnet: a qualitatively different capability level, not just a tuned variant. It will therefore likely be priced above Opus 4.6, although no pricing has been published.
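
The four-tier ordering described in this finding can be written down as a small ordered data structure. The following is a minimal sketch, not an official schema: the `Tier` enum, the `CURRENT_MODEL` mapping, and the `tiers_above` helper are hypothetical names introduced here, and the model strings are the display names used in this report rather than API identifiers.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Anthropic model tiers in ascending capability order, per the leaked draft."""
    HAIKU = 1
    SONNET = 2
    OPUS = 3
    CAPYBARA = 4  # new tier described in the leak


# Current or first model per tier as named in this report (display names only,
# not official API identifiers).
CURRENT_MODEL = {
    Tier.HAIKU: "Claude Haiku 4.5",
    Tier.SONNET: "Claude Sonnet 4.6",
    Tier.OPUS: "Claude Opus 4.6",
    Tier.CAPYBARA: "Claude Mythos",
}


def tiers_above(tier: Tier) -> list[Tier]:
    """Return every tier ranked above the given one, lowest first."""
    return [t for t in Tier if t > tier]


if __name__ == "__main__":
    for t in tiers_above(Tier.SONNET):
        print(t.name.title(), "->", CURRENT_MODEL[t])
```

Using an `IntEnum` keeps the comparison `t > tier` meaningful, which matches the report's framing of the tiers as a strict ordering rather than a set of unrelated product names.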

Finding 2: Capability jump is described as a “step change” across all key domains

The leaked draft blog posts characterized Claude Mythos as achieving “dramatically higher scores” than Claude Opus 4.6 across three primary domains: software coding, academic reasoning, and cybersecurity⁷. Anthropic’s internal characterization used the phrase “step change,” language that signals a capability discontinuity rather than an incremental improvement.

Bar chart comparing relative capability levels across tiers (illustrative, based on directional language from leaked drafts; not official benchmark figures):

xychart-beta
    title "Claude Model Capability Comparison"
    x-axis [Coding, Reasoning, Cyber, Context, Writing]
    y-axis "Score" 0 --> 100
    bar [30, 35, 25, 40, 45]
    bar [58, 60, 50, 75, 70]
    bar [74, 80, 65, 90, 85]
    bar [92, 93, 95, 90, 88]

Legend: Haiku 4.5 | Sonnet 4.6 | Opus 4.6 | Mythos (Capybara)

⚠️ Values above are directional estimates based on qualitative leak language, not official Anthropic benchmarks.

Finding 3: Cybersecurity capability is the headline differentiator and the primary risk

The leaked draft described Claude Mythos as “currently far ahead of any other AI model in cyber capabilities.” This is both Mythos’s most significant competitive advantage and the primary reason Anthropic is delaying general availability⁵.

The draft blog warned that the model “could allow attacks to scale faster than defenders could counter them” and described it as posing “unprecedented cybersecurity risks.” In a notable irony, this characterization was itself exposed by a cybersecurity failure (the CMS misconfiguration)⁵.

Anthropic’s rollout strategy directly addresses this:

flowchart LR
    accTitle: Claude Mythos Phased Rollout Strategy
    accDescr: Anthropic's cautious release strategy for Claude Mythos, beginning with cyber defense organizations before broader commercial availability.

    trained["✅ Training<br/>Complete<br/>March 2026"]
    earlyaccess["🔒 Early Access<br/>Cyber Defense Orgs<br/>Q1–Q2 2026"]
    evaluation["🔬 Evaluation<br/>Period<br/>Q2–Q3 2026"]
    commercial["🌐 Broader<br/>Commercial Release<br/>Late 2026 (est.)"]

    trained --> earlyaccess --> evaluation --> commercial

    classDef done fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d
    classDef active fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12
    classDef future fill:#e0e7ff,stroke:#4338ca,stroke-width:2px,color:#1e1b4b

    class trained done
    class earlyaccess active
    class evaluation,commercial future

📌 Key insight: Anthropic is deliberately giving cyber defense organizations a head start — effectively treating Mythos’s cybersecurity capability as a dual-use risk that requires defenders to be equipped before the model is broadly accessible to potential attackers.

Finding 4: Claude Opus 4.6, the current baseline, is already highly competitive

To contextualize the magnitude of the Mythos “step change,” it’s important to understand where Opus 4.6 sits in the current landscape. As of February 2026, Opus 4.6 is competitive or leading across most major benchmarks³ ⁹:

| Benchmark | Claude Opus 4.6 | Notes |
| --- | --- | --- |
| SWE-bench (coding) | ~74% | Competitive; Grok 4 leads at 75%, GPT-5.4 at 74.9% |
| GPQA Diamond (science reasoning) | Leads by 3.5pts vs GPT-5.4 | Best in class for graduate-level science |
| Terminal-Bench 2.0 (agentic coding) | 65.4% | State-of-the-art agentic performance |
| OSWorld (computer use) | 72.7% | Leading computer use benchmark |
| BrowseComp (agentic search) | 84.0% | Strong agentic web interaction |
| Humanity’s Last Exam (reasoning) | 53.1% (with tools) | Frontier reasoning performance |
| BigLaw Bench (legal reasoning) | 90.2% | Best legal reasoning score in Claude family |
| ARC AGI 2 | 68.8% | Novel problem-solving |
| MRCR v2 (long-context) | 76% | Strong long-context retrieval |
| Context window | 1M tokens | At standard pricing ($5/$25 per MTok) |
| Max output | 128k tokens | Double the previous 64k limit |

If Mythos delivers “dramatically higher scores” across coding, reasoning, and cybersecurity on top of this baseline, it would represent a significant capability advance.
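
To make the Opus 4.6 pricing row concrete, the sketch below turns the “$5/$25 per MTok” figure into a per-request cost. It assumes that the pair means input/output price per million tokens and that “128k” means 128,000 tokens; `request_cost_usd` and its `price_multiplier` parameter are hypothetical names, the latter included only so a pricier tier can be modelled once figures exist. No Capybara pricing has been published.

```python
# Opus 4.6 standard pricing from the benchmark table, in USD per million tokens.
# Assumption: the "$5/$25" pair means input/output, the usual convention.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def request_cost_usd(input_tokens: int, output_tokens: int,
                     price_multiplier: float = 1.0) -> float:
    """Estimate the cost of a single request in USD.

    price_multiplier is a hypothetical knob for modelling a more expensive
    tier; it does not reflect any published Capybara price.
    """
    cost = (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    return cost * price_multiplier


if __name__ == "__main__":
    # A request that fills the 1M-token context and the 128k output limit
    # (128k taken as 128,000 tokens for this estimate).
    print(f"${request_cost_usd(1_000_000, 128_000):.2f}")
```

Under these assumptions, a request that uses the full context window and the full output limit costs about $8.20, which gives a floor for budgeting against any tier priced above Opus.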

Finding 5: The competitive landscape as of Q1 2026

quadrantChart
    title AI Model Landscape - Capability vs Availability
    x-axis Low Availability --> High Availability
    y-axis Lower Capability --> Higher Capability
    quadrant-1 Available Leaders
    quadrant-2 Restricted Leaders  
    quadrant-3 Restricted Standard
    quadrant-4 Available Standard
    Claude Mythos: [0.12, 0.97]
    Claude Opus 4.6: [0.75, 0.82]
    GPT-5.4: [0.85, 0.79]
    Gemini 3.1 Pro: [0.80, 0.76]
    Grok 4: [0.65, 0.78]
    Claude Sonnet 4.6: [0.90, 0.65]
    Claude Haiku 4.5: [0.95, 0.42]

| Model | Maker | Coding | Reasoning | Writing | Ecosystem | Context |
| --- | --- | --- | --- | --- | --- | --- |
| Claude Mythos | Anthropic | ★★★★★ | ★★★★★ | ★★★★☆ | Limited | TBD |
| Claude Opus 4.6 | Anthropic | ★★★★☆ | ★★★★★ | ★★★★★ | Claude Code, Cursor | 1M tokens |
| GPT-5.4 | OpenAI | ★★★★☆ | ★★★★☆ | ★★★★☆ | Largest ecosystem | Large |
| Gemini 3.1 Pro | Google | ★★★★☆ | ★★★★★ | ★★★☆☆ | Google Workspace | 1M tokens |
| Grok 4 | xAI | ★★★★★ | ★★★★☆ | ★★★☆☆ | X/Twitter | Large |
| Claude Sonnet 4.6 | Anthropic | ★★★★☆ | ★★★★☆ | ★★★★☆ | Full API access | 1M tokens |

📌 Key insight: No competitor has announced a model in a tier equivalent to Capybara. If Mythos delivers on the leaked characterization, it would represent a meaningful capability lead — at least temporarily — particularly in cybersecurity and agentic coding.

💡 Analysis

Interpretation

RQ1 — What is Claude Mythos/Capybara? It is the first model in a new fourth tier of Anthropic’s product hierarchy, positioned above Opus. “Capybara” is the tier; “Mythos” is the first model. This is analogous to Anthropic’s 2024 introduction of Opus as a tier above Sonnet — a structural, not just incremental, upgrade.

RQ2 — How do capabilities compare? The only quantitative comparisons available are for Opus 4.6 vs. competitors — and Opus 4.6 is already at or near the frontier in reasoning and coding. If Mythos is a genuine “step change” above that baseline, its absolute capability would represent a new frontier for AI, particularly in cybersecurity, where no competitor currently claims comparable performance.

RQ3 — What are the cybersecurity implications and release strategy? Anthropic is managing Mythos as a dual-use capability — genuinely useful for cybersecurity defense, but dangerous if accessible to adversarial actors. The phased rollout starting with defense organizations is a meaningful safety measure. The estimated general availability window of late 2026 may be tied to Anthropic’s anticipated IPO, suggesting business incentives align with this timeline⁶.

Implications

For organizations:

Monitor early-access announcements — if your work touches cyber defense or advanced agentic use cases, early access may be worth pursuing
Current stack planning should account for a new top-tier pricing bracket (likely above Opus 4.6’s $5/$25 per MTok) entering the market in H2 2026
Claude Opus 4.6 remains the practical recommendation for complex reasoning and agentic work today, with Sonnet 4.6 for cost-sensitive workloads

For security planning:

The Mythos cybersecurity capability cuts both ways: organizations should treat the general availability window as a deadline for hardening systems rather than waiting until the model is available to test those systems
Anthropic’s own data leak (the event that revealed Mythos) is a pointed reminder that AI lab operational security is itself imperfect⁵

Limitations

All Mythos capability data is derived from draft marketing copy, not technical papers or reproducible benchmarks
“Dramatically higher scores” is subjective — we do not know the absolute or relative magnitude
The late 2026 release estimate is speculative, based on IPO alignment inference, not Anthropic statements

💬 Discussion Notes

The leak itself has market implications: cybersecurity stocks fell on the CNBC report about Mythos’s capabilities⁸, reflecting investor concern that AI may outpace existing security tooling
Some analysts have pointed out that the “unprecedented cybersecurity risk” framing may also be strategic — positioning Anthropic as a responsible actor taking precautions, while building pre-release hype⁵
The model name “Mythos” (mythology, foundational narrative) contrasts with the tier name “Capybara” (a large, calm semi-aquatic rodent), pairing a weighty model name with a whimsical tier label

🎯 Conclusions

Summary

Claude Mythos is Anthropic’s most capable AI model to date, accidentally revealed on March 26–27, 2026 through a CMS misconfiguration. It occupies a new “Capybara” tier above Opus — a structural capability level, not just a model upgrade. Leaked draft blog posts describe it as delivering dramatically higher scores across coding, academic reasoning, and cybersecurity compared to Opus 4.6, with cybersecurity capability described as “far ahead of any other AI model.” Training is complete; the model is in early-access testing with cyber defense organizations. General availability is estimated for late 2026. No competitors have announced an equivalent tier.

Recommendations

Watch early-access announcements: Anthropic is expected to expand the Capybara program from defense organizations to broader enterprise use. Organizations, especially those with AI- or cybersecurity-adjacent work, should position themselves to apply for early access if and when that program opens.
Treat late 2026 as a capability inflection point: current AI tool evaluations and ROI planning should account for a significant new capability tier entering general availability within approximately 6–9 months.
Maintain Opus 4.6 as the current recommendation: use Sonnet 4.6 for cost-sensitive use cases and Opus 4.6 for complex reasoning and agentic tasks. Mythos is not yet accessible; do not delay current AI adoption waiting for it.
Prepare security posture for a more capable threat landscape: the Mythos cybersecurity capability is likely to reach adversarial actors eventually. Organizations should use the remaining pre-release window to audit and harden systems.
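
The “approximately 6–9 months” planning window in the recommendations can be checked with simple date arithmetic. This is a rough sketch under one stated assumption: “late 2026” is read as the fourth quarter (October 1 to December 31), which is an interpretation made here and not an Anthropic date. `months_between` is a hypothetical helper that uses the average Gregorian month length.

```python
from datetime import date

REPORT_DATE = date(2026, 3, 28)  # compilation date of this report

# Assumption: "late 2026" is read as the fourth quarter.
LATE_2026_START = date(2026, 10, 1)
LATE_2026_END = date(2026, 12, 31)


def months_between(start: date, end: date) -> float:
    """Approximate month count using the average Gregorian month length."""
    return (end - start).days / 30.44


if __name__ == "__main__":
    low = months_between(REPORT_DATE, LATE_2026_START)
    high = months_between(REPORT_DATE, LATE_2026_END)
    print(f"{low:.1f} to {high:.1f} months")
```

With those bounds the window comes out slightly above six months at the early end and slightly above nine at the late end, consistent with the range quoted above.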

Future work

Benchmark watch — As Anthropic publishes official Mythos data, update this note with actual numbers
Pricing analysis — Once the Capybara tier is priced, compare cost-per-capability against Opus 4.6 to inform procurement decisions
Claude 5 tracking — Separately monitor Claude 5 (expected Q2–Q3 2026) to determine if it is the same product as Mythos or a distinct release

🔗 References

All sources cited in this report:

Last updated: 2026-03-28

1. Fortune. (2026, March 26). “Exclusive: Anthropic ‘Mythos’ AI model representing ‘step change’ in power revealed in data leak.” Fortune. https://fortune.com/2026/03/26/anthropic-says-testing-mythos-powerful-new-ai-model-after-data-leak-reveals-its-existence-step-change-in-capabilities/
2. Various. (2026). “AI Models in 2026: Which One Should You Actually Use?” GuruSup. https://gurusup.com/blog/ai-comparisons
3. Anthropic. (2026, February 4). “Introducing Claude Opus 4.6.” Anthropic. https://www.anthropic.com/news/claude-opus-4-6
4. Claude5.com. (2026). “When Is Claude 5 Coming Out? Q2 2026 (Here’s the Evidence).” Claude 5 Hub. https://claude5.com/news/when-is-claude-5-coming-out-release-date-prediction
5. Futurism. (2026, March 27). “Anthropic Just Leaked Upcoming Model With ‘Unprecedented Cybersecurity Risks’ in the Most Ironic Way Possible.” Futurism. https://futurism.com/artificial-intelligence/anthropic-step-change-new-model-claude-mythos
6. SiliconANGLE. (2026, March 27). “Anthropic to launch new ‘Claude Mythos’ model with advanced reasoning features.” SiliconANGLE. https://siliconangle.com/2026/03/27/anthropic-launch-new-claude-mythos-model-advanced-reasoning-features/
7. The Decoder. (2026, March 27). “Anthropic leak reveals new model ‘Claude Mythos’ with ‘dramatically higher scores on tests’ than any previous model.” The Decoder. https://the-decoder.com/anthropic-leak-reveals-new-model-claude-mythos-with-dramatically-higher-scores-on-tests-than-any-previous-model/
8. CNBC. (2026, March 27). “Cybersecurity stocks fall on report Anthropic is testing a powerful new model.” CNBC. https://www.cnbc.com/2026/03/27/anthropic-cybersecurity-stocks-ai-mythos.html
9. Artificial Analysis. (2026). “Claude Opus 4.6 (max) - Intelligence, Performance & Price Analysis.” Artificial Analysis. https://artificialanalysis.ai/models/claude-opus-4-6-adaptive