April 2026 delivered back-to-back releases: two major open-weight models in one week, the next version of MCP, a massive product consolidation from OpenAI, and a revenue milestone from Anthropic that underscores just how fast this industry is monetizing. Here's what happened and why it matters.

Model Releases

Google Gemma 4: Open-Source, Multimodal, and RAG-Ready

Google released Gemma 4 on April 2, the most capable open model family the company has shipped. Four sizes cover everything from on-device inference to datacenter workloads:

E2B / E4B: Optimized for Android devices and developer laptops
26B MoE: Speed-efficiency tradeoff, ranks #6 on the open-source Arena AI leaderboard
31B Dense: Highest quality, ranks #3 — outperforming models 20× its size

Key specs that matter for builders:

256K token context window on the larger models
Native multimodal: text, image, audio, and video inputs out of the box
Structured JSON output and function calling for agentic workflows
140+ language support
Apache 2.0 license — commercial use, no restrictions

Deployment is available via Google AI Studio, vLLM, and Cloudflare Workers AI from day one. The model is built on Gemini 3 research, which makes the performance-to-size ratio genuinely surprising.

For RAG developers specifically, the combination of 256K context, native function calling, and local deployment options makes Gemma 4 one of the most practical open models to date.

Meta Llama 5: The Open-Source Giant Gets Bigger

Meta released Llama 5 on April 8, and the numbers are a significant jump from Llama 4:

600B+ parameters (up from Llama 4 Maverick's 400B)
5 million token context window — the longest in any publicly available model
Recursive Self-Improvement capability: the model can identify its own reasoning errors mid-task and correct them
Benchmark performance matches GPT-5 and Gemini 2.0 on major evaluations

The 5M context window is the headline. At that scale, an entire enterprise codebase, a year of Slack messages, or a full legal case file fits in a single prompt. The implications for RAG architectures are real: some retrieval steps become optional when you can just feed the whole corpus.

That said, running Llama 5 locally requires serious hardware. For most teams, this is an API model or a cloud-hosted inference story, not a laptop deployment.

Industry Moves

MCP 2.0: Long-Running Workflows and a Formal Standard

The MCP Dev Summit on April 2 announced the next major version of the Model Context Protocol, with three headline additions:

Task capability: servers can now handle long-running autonomous workflows instead of single request-response cycles
Triggers: servers can initiate actions proactively, not just respond to model requests
Native streaming + retry semantics: production-grade reliability for agentic pipelines

Alongside this, six companies — Anthropic, OpenAI, Google, AWS, Microsoft, and Salesforce — co-founded the Agentic AI Foundation (AAIF) under the Linux Foundation. MCP is now the official universal protocol for agent tooling, sitting alongside OpenAI's AGENTS.md and Block's Goose project.

MCP SDK downloads currently run at 110 million per month. Standardization under a neutral foundation removes the "will this be deprecated?" risk that slowed enterprise adoption of proprietary tool-use APIs.

OpenAI's Superapp: One App to Replace Three

In late March, OpenAI announced it was consolidating ChatGPT, Codex, and Atlas into a single desktop superapp. CEO Fidji Simo cited product fragmentation as the driver — users were context-switching between tools for conversation, code, and computer use.

The same week, OpenAI also:

Shut down Sora after six months — citing usage data showing video generation wasn't sticking as a standalone product
Confirmed $25B+ in annualized revenue
Acquired Astral, the Python tooling company behind uv and ruff
Positioned GPT-5.4 (released March 5) as the enterprise model for autonomous professional workflows

The Astral acquisition is the sleeper story here. Controlling the fastest Python package manager and linter gives OpenAI a credible path into developer infrastructure — beyond the model API layer.

Anthropic: $30B Run-Rate, 1,000+ Enterprise Customers

Anthropic disclosed that its run-rate revenue has surpassed $30 billion, up from $9 billion at the end of 2025. More telling: over 1,000 business customers are now spending $1 million or more annually.

The company also signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, expected online starting in 2027. At this revenue trajectory, compute capacity is the binding constraint — not model capability or customer demand.

Claude Code, Anthropic's terminal-native coding agent, was cited as a primary growth driver and reportedly triggered a product acceleration response from OpenAI.

What to Watch

Next two weeks:

Google I/O (May 20): Expected Gemini 3 Pro and further agentic platform announcements
MCP 2.0 SDKs (Python and TypeScript): Final spec release expected late April
EU AI Act enforcement: General-purpose AI model obligations take full effect for frontier models
Llama 5 fine-tuning community: First wave of fine-tuned variants likely appearing on Hugging Face within days

AI Digest: April 2026

Model Releases

Google Gemma 4: Open-Source, Multimodal, and RAG-Ready

Meta Llama 5: The Open-Source Giant Gets Bigger

Industry Moves

MCP 2.0: Long-Running Workflows and a Formal Standard

OpenAI's Superapp: One App to Replace Three

Anthropic: $30B Run-Rate, 1,000+ Enterprise Customers

What to Watch

Related Articles

How to Design a Claude Skill

Meta Releases Llama 5: 600B Parameters, 5M Token Context, and Recursive Self-Improvement

OpenAI's Superapp: ChatGPT, Codex, and Atlas Merge Into One