Good morning, AI enthusiasts. DeepSeek just released its biggest model yet, and it's open-source, MIT-licensed, and built to hold an entire codebase in a single prompt.
The catch? It has 1.6 trillion parameters. But a new architecture cuts inference compute to 27% of the previous version's, making the one-million-token window practical to actually run. Is this the open-source moment that changes how builders think about frontier models?
In today's recap:
DeepSeek V4, a million-token open-source model
OpenAI's GPT-5.5, built for real agentic work
Build a daily research digest with Brave Ocelot
Anthropic explains three Claude Code bugs
4 new AI tools, prompts, and more
DEEPSEEK
DeepSeek launches V4 with million-token context
Recaply: DeepSeek just released V4, a pair of open-source Mixture-of-Experts models with one-million-token context windows and an architecture that cuts inference compute to 27% of the previous generation.
Key details:
V4 uses a new attention design that drops KV cache usage to just 10% of DeepSeek-V3.2's, allowing the full one-million-token window to run without the memory overhead that makes long-context models impractical at scale.
DeepSeek-V4-Pro has 1.6T total parameters with 49B activated, while V4-Flash has 284B total parameters with 13B activated, both trained on more than 32T tokens.
V4-Pro-Max, the maximum reasoning effort mode, is what DeepSeek claims is the best open-source model available today, with top-tier coding benchmarks and a narrowing gap with closed frontier models on reasoning and agentic tasks.
Both models are available now on Hugging Face and ModelScope under an MIT license. V4-Flash delivers comparable reasoning to V4-Pro when given a larger thinking budget.
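To make the KV-cache claim concrete, here's a back-of-envelope sizing sketch. The layer count, KV-head count, and head dimension below are illustrative assumptions, not published V4 specs; the point is how fast a one-million-token cache grows, and what a 10% footprint buys you:

```python
# Rough KV cache sizing for a long-context decoder.
# Assumed (NOT published V4 specs): 64 layers, 8 KV heads (GQA),
# head dim 128, fp16 values (2 bytes). Keys + values = 2x.
def kv_cache_bytes(tokens, layers=64, kv_heads=8, head_dim=128,
                   bytes_per_val=2):
    return tokens * layers * kv_heads * head_dim * bytes_per_val * 2

full = kv_cache_bytes(1_000_000)   # conventional full-attention cache
reduced = full * 0.10              # V4's claimed 10% of V3.2's usage
print(f"full: {full / 2**30:.0f} GiB, reduced: {reduced / 2**30:.0f} GiB")
# → full: 244 GiB, reduced: 24 GiB
```

Under these assumed dimensions, the reduction is the difference between a cache that needs a multi-GPU node and one that fits on a single accelerator. The same sparse-activation math applies to the parameters: V4-Pro activates only 49B of its 1.6T weights per token, about 3%.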
Why it matters: DeepSeek has a track record of open models that compete with OpenAI and Anthropic. V4 adds something new: a real one-million-token context window that is cheap enough to actually run. For builders, an MIT-licensed 1.6T-parameter model that can hold an entire codebase in one prompt is a different kind of tool than what existed a year ago.
PRESENTED BY MAXIO
What 2,000 SaaS Companies Reveal About Growth in 2026
Is your growth in line with your peers in B2B SaaS & AI?
Benchmark yourself against actual billings data from Maxio's 2,000+ global customers, alongside firsthand company perspectives, to understand how growth varied by company size, business model, and strategic focus.
Key takeaways from the report:
Average growth across 2,000 companies
Growth by revenue band
AI-led vs. AI-enhanced: which performed better?
OPENAI
OpenAI ships GPT-5.5 for agentic work
Recaply: OpenAI just rolled out GPT-5.5, its smartest model to date, built for agentic coding, computer use, and long-horizon knowledge work, matching GPT-5.4's per-token latency at a much higher intelligence ceiling.
Key details:
GPT-5.5 understands complex goals, uses tools autonomously, checks its own work, and carries multi-step tasks through to completion without needing user direction at each step.
It scores 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro, delivering state-of-the-art intelligence at half the cost of competing frontier coding models on Artificial Analysis's Coding Index.
Dan Shipper, CEO of Every, called it "the first coding model I've used that has serious conceptual clarity," after it successfully rewrote a broken app that GPT-5.4 couldn't diagnose.
GPT-5.5 is rolling out now to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with API access coming very soon as safety requirements are finalized with partners.
Why it matters: There's been real talk about whether frontier model jumps still matter for everyday work. GPT-5.5 answers with efficiency: it scores higher and uses fewer tokens to do it. For engineering teams, the practical change is being able to hand off a messy, multi-part task and trust the model to reason through ambiguous failures on its own, without constant supervision.
GUIDES
Build a daily research digest with Brave Ocelot

Recaply: In this tutorial, you will learn how to use Brave's new Ocelot model, trained specifically for web content summarization, to turn 20+ daily sources into a single clean briefing without manual reading.
Step-by-step:
Download Brave browser at brave.com and open the Leo sidebar with Cmd+Shift+L on Mac or Ctrl+Shift+L on Windows. In Leo settings, select Ocelot, Brave's new web-summarization AI built into the browser, as your active model.
Create a bookmarks folder called "Daily Digest" and add 20+ URLs you want to track each morning: AI newsletters, Hacker News, industry blogs, and key sites for your work.
Each morning, open a source URL from your folder, then open Leo and prompt: "Summarize this page in 3 bullet points: key finding, main announcement, and one thing a builder should act on."
Copy Leo's response into a running note in Notion, Obsidian, or any plain text file. Ocelot is trained on web content, so its summaries are tighter and more accurate than a general-purpose model's.
After all sources are done, paste your full bullet list into Leo and prompt: "What are the 3 most important things I should pay attention to today?" You now have a curated daily briefing.
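The steps above can be sketched as a small script that assembles per-source summaries into one dated note. The `summarize` function here is a stub standing in for whatever model you actually call (Leo/Ocelot in this tutorial), so the assembly logic runs on its own:

```python
# Minimal sketch of the digest workflow. `summarize` is a placeholder:
# swap in a real Ocelot/Leo call (or any summarization API you use).
from datetime import date

def summarize(page_text: str) -> str:
    # Stub: a real implementation would send the prompt from the
    # tutorial ("Summarize this page in 3 bullet points...").
    return f"- key finding: {page_text[:60]}"

def build_digest(sources: dict) -> str:
    """sources maps URL -> raw page text; returns one markdown note."""
    lines = [f"# Daily Digest {date.today().isoformat()}", ""]
    for url, page_text in sources.items():
        lines.append(f"## {url}")
        lines.append(summarize(page_text))
        lines.append("")
    return "\n".join(lines)
```

Append `build_digest(...)`'s output to your running note each morning, then feed the accumulated bullets back to the model for the final "3 most important things" pass.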
Pro tip: For a fully automated version, clone the brave/ocelot repo on GitHub. It includes a Playwright script for automated page collection and a Python inference pipeline, so you can run Ocelot locally across a URL list without opening the browser each time.
ANTHROPIC
Anthropic explains three Claude Code bugs
Recaply: Anthropic just published a detailed postmortem on three changes that degraded Claude Code quality over the past month, with all issues now fixed and usage limits reset for all subscribers.
Key details:
Three separate bugs compounded: on March 4, reasoning effort was quietly downgraded from high to medium; on March 26, a caching bug started clearing Claude's thinking context on every turn instead of just once; on April 16, a verbosity prompt change hurt coding quality.
The caching bug caused sessions to drain usage limits faster than expected, triggering cache misses every turn and making Claude run tasks without memory of its own prior decisions.
Anthropic said the result showed up as forgetfulness, odd tool choices, and repetition. Because the three bugs hit at different times, the combined effect looked like broad, inconsistent degradation that was hard to pin down.
All three fixes landed by April 20 in Claude Code v2.1.116, affecting Sonnet 4.6, Opus 4.6, and Opus 4.7. Anthropic reset usage limits for all subscribers on April 23.
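Anthropic didn't publish the offending code, but the caching bug it describes has a classic shape: a per-turn value leaks into the cache key, so every lookup misses and the cached thinking context is rebuilt from scratch. A toy illustration (hypothetical names, not Anthropic's implementation):

```python
# Toy cache illustrating the failure shape: keying on the turn
# number means a new turn can never hit the entry written on the
# previous turn, so every turn is a miss.
class ContextCache:
    def __init__(self, include_turn_in_key: bool):
        self.include_turn = include_turn_in_key
        self.store = {}
        self.misses = 0

    def get(self, session: str, turn: int) -> str:
        key = (session, turn) if self.include_turn else (session,)
        if key not in self.store:
            self.misses += 1  # miss: rebuild context, burn usage
            self.store[key] = f"thinking context for {session}"
        return self.store[key]

buggy, fixed = ContextCache(True), ContextCache(False)
for turn in range(10):
    buggy.get("session-1", turn)
    fixed.get("session-1", turn)
print(buggy.misses, fixed.misses)  # → 10 1
```

Ten turns, ten misses versus one: that's why affected sessions both drained usage limits faster and lost memory of their own prior decisions.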
Why it matters: Labs rarely say this much about what went wrong. Anthropic named the exact bug dates, the affected model versions, and the API header that misfired. For developers, it confirms that the quality drop they noticed was real, not imagined. That kind of honesty doesn't always come from a frontier lab.
TOOLS
Trending AI Tools
🤖 DeepSeek V4 Pro - DeepSeek's open-source 1.6T-parameter model with one-million-token context
🎤 MiMo-V2.5 Voice - Xiaomi's full-stack voice suite for the agent era
🔍 Brave Ocelot - Brave's open-source model trained specifically to summarize web content
🚀 Gemini Enterprise Agent Platform - Google Cloud's platform to build, scale, govern, and optimize enterprise agents
NEWS
What Matters in AI Right Now?
Meta is cutting about 8,000 employees, or 10% of its workforce, with job cuts beginning May 20. The company is also scrapping plans to fill 6,000 open roles, citing an AI efficiency push following several smaller rounds of reductions since last year.
Microsoft is offering voluntary retirement buyouts to up to 7% of its US workforce, roughly 8,750 employees, in the first such program in the company's 51-year history. Employees qualify if their age plus years of Microsoft service totals 70 or more.
Anthropic launched a public beta of memory for Claude Managed Agents, letting agents retain and share context across sessions via a filesystem-based memory layer. Rakuten reported 97% fewer first-pass errors and 27% lower costs after deploying agents with memory in production.
xAI introduced Grok Voice Think Fast 1.0, claiming the top spot on the Tau Voice Bench for complex, multi-step voice workflows. It's designed to handle background noise, accents, and interruptions better than competing voice models.
Disney built an internal AI Adoption Dashboard, with Business Insider reporting that a single employee made 460,000 Claude chatbot invocations in just nine days. The stat shows how unevenly AI adoption distributes inside large enterprises.
The Trump administration is backing off its standoff with Anthropic, according to Politico, which reported the administration is dialing back its adversarial posture in what sources describe as a move toward a potential truce.
🧡 Enjoyed this issue?
🤝 Recommend our newsletter or leave feedback.
How'd you like today's newsletter?
Cheers, Jason