Microsoft's hill-climbing machine is here

Good morning, AI enthusiasts. Microsoft built seven AI models from scratch, trained none of them on its competitors' outputs, and then had the audacity to put one of them in a blind evaluation against Claude Sonnet 4.6 and win.

If that sounds like a company that's done depending on OpenAI and Anthropic to tell it where the frontier is, you're reading it right.

In today's recap:

Microsoft launches seven in-house AI models
AI tops law professors in Stanford blind study
Audit and cap your team's AI tool budget
Trump's AI order: voluntary oversight, no licensing
4 new AI tools, prompts, and more

MICROSOFT

Microsoft launches seven models to rival Anthropic

Recaply: Microsoft just unveiled seven in-house AI models. The flagship, MAI-Thinking-1, beat Claude Sonnet 4.6 in blind human evaluations.

Key details:

The family covers five areas: reasoning, coding, images, voice, and transcription. MAI-Code-1-Flash has 5 billion parameters and beats Claude Haiku 4.5 on every coding benchmark, using 60% fewer tokens. MAI-Transcribe-1.5 ranks first on FLEURS across 43 languages and runs 5x faster than rivals.
Microsoft trained all seven from scratch, with no distillation from competitors' models and only clean data. Co-designing the models with its Maia 200 silicon delivered a 1.4x efficiency gain.
Microsoft also launched Frontier Tuning, which uses reinforcement learning (RL) to adapt models to a company's own workflows. A MAI model tuned for Excel matched GPT-5.4 at up to 10x lower cost, and a McKinsey-tuned version had the highest win rate of any model tested at roughly 10x lower cost.
The models are on OpenRouter, Fireworks, and Baseten. MAI-Code-1-Flash is built into GitHub Copilot and VS Code. Microsoft and Mayo Clinic also announced a co-created frontier health AI model.

Why it matters: There's been plenty of talk about Microsoft as an AI reseller. That story doesn't hold anymore. With seven models trained from scratch and a flagship that beats Anthropic's Sonnet in blind tests, Microsoft is now a real frontier lab. The trillion-dollar compute scaleup isn't one company's bet. It's the new standard for anyone who wants to stay in the game.

PRESENTED BY PITCH

From prompt to polished deck, without losing your brand

Recaply: Most AI presentation tools take your brand colors and drop them on generic layouts. Pitch Agent builds from your actual template, so the output looks like your team made it.

With Pitch Agent, you get:

Full decks generated from a prompt or additional context files, with visuals matched to your template's style
Conversational edits: rewrite slides, split dense content, swap images
Chat to pressure-test your story: "What objections am I missing?" "Does this flow?"
Follow-up content drafted from your deck: emails, talking points, summaries

Create with Pitch Agent.

AI RESEARCH

AI beats law professors in Stanford's blind study

Recaply: Stanford Law researchers just found that AI beat law professors in a blind study. In 75% of nearly 3,000 comparisons, 16 professors preferred AI-generated answers over responses from their peers.

Key details:

The study was led by Professor Julian Nyarko at Stanford's liftlab, with co-authors from Yale, NYU, and the University of Chicago. All 16 professors evaluated answers without knowing whether AI or a colleague wrote them.
AI won 75% of matchups. Professors flagged AI responses as harmful to students only 3.5% of the time. Peer-written answers got flagged 12% of the time.
The team tested AI on contract law questions that require judgment and nuanced reasoning, not just factual recall. They also tested Google's NotebookLM, with varying results.
Even when AI's context limits hurt its answers, professors still frequently preferred them over the human alternatives.

Why it matters: Results like this usually get dismissed on the grounds that AI can't handle judgment calls. But this study chose law specifically to test that claim, and AI won. If AI tutors now outperform the professors who wrote the questions, on their own subject and with their own students, the simple-tasks-only argument gets harder to make. Nyarko's team isn't pushing for wholesale AI adoption, but they argue that blanket skepticism may be equally unwarranted.

GUIDES

Audit and cap your team's AI tool budget

Recaply: In this tutorial, you will learn how to track, categorize, and cap your team's AI tool spending before it overruns, avoiding the kind of budget surprises that have hit Uber, Microsoft, and many other teams running high-token workflows.

Step-by-step:

Pull AI tool invoices from the last 90 days and group by tool (Claude Code, GitHub Copilot, Cursor, ChatGPT Enterprise), team, and use case. Most providers export usage CSVs directly from their billing portal or admin dashboard.
Calculate cost per task type: identify which workflows (code generation, code review, debugging, documentation, agent loops) consume the most tokens, then benchmark them against output quality to find where you get the least value per dollar. Agentic tasks and multi-turn agent loops are usually the biggest surprises.
Set per-team monthly token budgets by converting your targets into dollar caps in each tool's settings. Claude Code, GitHub Copilot Enterprise, and ChatGPT Enterprise all support spending limits or usage caps at the team or org level.
Build a lightweight dashboard in a spreadsheet or your existing BI tool: actual spend vs. cap per team per tool, refreshed weekly from provider API exports or manual CSV pulls. Automate the pull if your team is comfortable with the provider's API.
Review monthly: drop or downgrade tools where actual usage stays below 20% of cap, and shift high-usage teams to cheaper models (MAI-Code-1-Flash, Claude Haiku, Gemini Flash-Lite) for routine tasks that don't need frontier performance.
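The audit steps above can be sketched in a few lines of Python. This is a minimal illustration, not a provider integration: it assumes you have already merged one month of billing exports into a single CSV with hypothetical columns `tool`, `team`, `use_case`, `tokens`, and `cost_usd` (real provider exports use their own column names), and the per-token price and caps are made-up numbers. It totals spend per team and tool, finds the task type that burns the most tokens, converts a token budget into a dollar cap, and applies the 20%-of-cap review rule.

```python
import csv
import io
from collections import defaultdict

# Hypothetical merged export covering one month; map each provider's real
# column names onto these five fields before running the audit.
SAMPLE_CSV = """tool,team,use_case,tokens,cost_usd
Claude Code,platform,agent loops,40000000,600.00
Claude Code,platform,code review,5000000,75.00
GitHub Copilot,web,code generation,0,300.00
Cursor,data,debugging,30000000,450.00
"""


def total_by(rows, field, *keys):
    """Sum a numeric column (cost_usd or tokens) for each combination of key columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += float(row[field])
    return dict(totals)


def token_budget_to_cap(monthly_tokens, usd_per_million_tokens):
    """Convert a monthly token budget into the dollar cap entered in a tool's settings."""
    return monthly_tokens / 1_000_000 * usd_per_million_tokens


def review(spend, caps, low_usage=0.20):
    """Compare actual spend with each cap and label it for the monthly review."""
    report = {}
    for key, cap in caps.items():
        actual = spend.get(key, 0.0)
        utilization = actual / cap if cap else 0.0
        if utilization < low_usage:
            status = "drop or downgrade"  # usage stays below 20% of cap
        elif utilization > 1.0:
            status = "over cap"
        else:
            status = "ok"
        report[key] = (actual, cap, utilization, status)
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
spend_by_team_tool = total_by(rows, "cost_usd", "team", "tool")
tokens_by_use_case = total_by(rows, "tokens", "use_case")
heaviest = max(tokens_by_use_case, key=tokens_by_use_case.get)
print("Most tokens:", heaviest[0])

# Hypothetical caps; the first assumes 50M tokens at $15 per million tokens.
caps = {
    ("platform", "Claude Code"): token_budget_to_cap(50_000_000, 15.0),
    ("web", "GitHub Copilot"): 2000.0,
    ("data", "Cursor"): 1000.0,
}

for (team, tool), (actual, cap, util, status) in review(spend_by_team_tool, caps).items():
    print(f"{team:<9}{tool:<15}${actual:>8.2f} / ${cap:>8.2f} {util:>5.0%}  {status}")
```

Running the same functions over fresh exports each week gives you the spend-vs-cap view from the dashboard step; the review threshold lives in one parameter (`low_usage`) so you can tune it.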

Pro tip: Most enterprise AI tools let you route task types to different models through an API or configuration layer. Set your system to auto-route simple completion tasks (linting, comment generation, test scaffolding) to lower-cost models and reserve frontier models for complex multi-step agent tasks that actually benefit from the extra capability.
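The routing idea in the tip can be sketched as a small lookup. This is a hypothetical illustration, not any tool's configuration format: the model identifiers are placeholders, the task labels and the `steps` field are assumptions, and a real setup would pass the chosen model ID to whatever API or gateway your tools support.

```python
from dataclasses import dataclass

# Placeholder identifiers: substitute the model IDs your provider or gateway exposes.
LOW_COST_MODEL = "low-cost-model"
FRONTIER_MODEL = "frontier-model"

# Simple completion tasks named in the tip; labels are hypothetical.
SIMPLE_TASKS = {"linting", "comment_generation", "test_scaffolding"}


@dataclass
class Task:
    kind: str       # hypothetical task label, e.g. "linting" or "agent_refactor"
    steps: int = 1  # expected number of agent steps or tool calls


def route(task: Task) -> str:
    """Return a model tier: cheap for simple work, frontier for multi-step agent tasks."""
    if task.kind in SIMPLE_TASKS:
        return LOW_COST_MODEL
    if task.steps > 1:
        # Multi-step agent tasks are where the extra capability pays off.
        return FRONTIER_MODEL
    # Unlisted single-step tasks default to the cheaper tier (a design choice).
    return LOW_COST_MODEL


for task in [Task("linting"), Task("test_scaffolding"), Task("agent_refactor", steps=12)]:
    print(f"{task.kind} -> {route(task)}")
```

Keeping the simple-task list explicit makes the cheap path easy to audit. Defaulting unlisted single-step work to the cheaper tier is a judgment call; teams that prefer to fail safe can return the frontier model there instead.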

TOGETHER WITH ADQUICK

Real-World Ads, Simple to Run

With AdQuick, executing out-of-home campaigns is as easy as running digital ads. Plan, deploy, and measure your real-world advertising effortlessly, so your team can scale campaigns and maximize impact without the headaches.

Visit AdQuick.com

POLICY

Trump signs AI executive order on security, not regulation

Recaply: Trump just signed a new AI executive order. It's about cybersecurity and keeping AI innovation going, not regulating it.

Key details:

Federal agencies have 30 days to strengthen their cyber defenses. The Treasury will build a clearinghouse to share data on software vulnerabilities across agencies and private sector partners.
AI companies can share new frontier models with the government up to 30 days before public release. This is optional. The order is explicit: no mandatory licensing, no approval process, no permitting.
Section 4 adds criminal enforcement: anyone who uses AI to break into computer systems faces federal charges, covering both public and private networks.
The EO focuses on cybersecurity and innovation, a deliberate shift from Biden's 2023 order, which covered safety, bias, and environmental risk and which Trump repealed in his first weeks in office.

Why it matters: The 30-day sharing window is the part people will debate. It isn't a government veto; it's closer to an early-warning window, in which developers share access and the government shares what it finds. Whether participation stays truly optional once your competitors are doing it is a fair question. But it's a softer approach than anything Congress has proposed, and it doesn't require approval to ship.

TOOLS

Trending AI Tools

🤖 Microsoft Scout - Microsoft's always-on Autopilot agent that runs across Teams, Outlook, OneDrive, and SharePoint, acting on your behalf without needing a prompt each time
🧠 Hermes Desktop - Nous Research's open-source, self-improving agent app (macOS, Windows, Linux) that learns your projects and creates its own skills from experience
🌐 OpenAI Sites - Codex's new output format that turns any result into an interactive website or app your team can explore and share with a URL
⚙️ Devin Desktop - Cognition's f

NEWS

What Matters in AI Right Now?

Uber is capping employee usage of AI tools including Claude Code after exhausting its entire 2026 AI budget in just four months. Teams must now justify spending before accessing high-cost models.
Anthropic just expanded Project Glasswing to approximately 150 new organizations across 15+ countries, bringing total partners scanning codebases with Claude Mythos for vulnerabilities to more than 200. Existing partners have already found over 10,000 high- or critical-severity flaws.
OpenAI just launched six role-specific Codex plugins for analysts, marketers, sales teams, designers, investors, and bankers, connecting 62 apps and 110 skills. It also launched Sites, which turns any Codex output into an interactive website shareable via URL. More than 5 million people now use Codex weekly, with non-developers growing 3x faster than developers.
OpenAI is leading a $40M Series B in Opal, a webcam maker that is rebranding as Opal Electronics. Its AI audio product, currently being tested by Sam Altman and researchers at OpenAI, xAI, and Anthropic, is expected to launch in three to four months. Opal is now valued at $275M.
Factory AI just introduced Factory Router, an automatic model router for coding tasks that delivers 99% of Opus 4.7's pass rate on Terminal-Bench 2 at 20% lower cost and 96% on Legacy-Bench at 25% lower cost. It's in private research preview via the Factory CLI and Desktop App.
Microsoft is canceling Claude Code licenses for thousands of its developers, who will move to GitHub Copilot CLI instead, a notable shift given Microsoft's simultaneous role as an OpenAI investor and Anthropic customer.
Researchers found that the COVID pandemic left 55,000 cancer cases undiagnosed across seven nations, with delays in screening and care access creating a backlog that health systems are still working through.
America's data center build-out is falling way behind schedule, according to the Wall Street Journal, with permitting delays, construction labor shortages, and equipment backlogs threatening to constrain AI infrastructure at the worst possible time.

🧡 Enjoyed this issue?

🤝 Recommend our newsletter or leave feedback.

How'd you like today's newsletter?

Your feedback helps me create better emails for you!

Cheers, Jason

Connect on LinkedIn and Twitter.
