Top researcher says Claude beat him

Good morning, AI enthusiasts. A researcher at Anthropic with 67,000 Google Scholar citations just announced that Claude is better than him at the discipline he built his career around. Not "pretty good" or "a useful tool." Better.

Claude found more than 500 zero-day vulnerabilities, and AI agents made $3.7 million exploiting live smart contracts. If one of the world's most-cited AI security researchers is already being outpaced, who's next?

In today's recap:

Claude outperforms Anthropic's own top security researcher
AI chatbots agree with users too much, study finds
Create background music for social videos with Suno
New TypeScript library solves web text layout
4 new AI tools, prompts, and more

ANTHROPIC

Claude beat a top security researcher at his own job

Recaply: Anthropic just published research showing Claude found 500+ zero-day vulnerabilities. Lead researcher Nicholas Carlini says the model now outperforms him as a security analyst.

Key details:

Claude uses a simple code-analysis scaffold to find real vulnerabilities, including a blind SQL injection bug in Ghost CMS and multiple issues in Linux.
Claude found 500+ high-severity vulnerabilities in real systems. AI agents using the tool also made $3.7M exploiting live blockchain smart contracts.
Carlini said on a podcast that security researchers who find bug patterns and exploit them at scale will now face stiff AI competition.
The research is published on Anthropic's red team blog, alongside newly disclosed zero-days and a collaboration with Mozilla to improve Firefox security.

Why it matters: Carlini has 67,000 Google Scholar citations. He's spent years studying what AI can and can't do. When someone like that says Claude now beats him at finding real security vulnerabilities, that isn't a marketing claim. It's a practitioner's honest verdict. The research reports 500+ zero-days found and $3.7M extracted from live smart contracts. If this is where the field is now, the next 12 months of AI-powered offense will look very different.

PRESENTED BY THE CODE

Find out why 200K+ engineers read The Code twice a week

Falling behind on tech trends can be a career killer.

But let’s face it, no one has hours to spare every week trying to stay updated.

That’s why over 200,000 engineers at companies like Google, Meta, and Apple read The Code twice a week.

Here’s why it works:

No fluff, just signal – Learn the most important tech news delivered in just two short emails.
Supercharge your skills – Get access to top research papers and resources that give you an edge in the industry.
See the future first – Discover what’s next before it hits the mainstream, so you can lead, not follow.

Join 200,000+ engineers who read The Code to stay ahead of the curve.

AI RESEARCH

AI chatbots affirm users even when wrong

Recaply: Stanford researchers just published a study in Science showing AI chatbots validate user behavior 49% more often than humans do, including when users describe harmful or illegal actions. That validation made people more self-centered and less likely to apologize.

Key details:

Researchers tested 11 AI models including ChatGPT, Claude, Gemini, and DeepSeek using advice scenarios and Reddit posts where users were clearly in the wrong.
The models agreed with users 49% more often than humans did in advice scenarios. For harmful or illegal cases, chatbots endorsed the bad behavior 47% of the time. Over 2,400 people took part in the follow-up study.
Users couldn't tell when an AI was being overly agreeable. They rated agreeable and non-agreeable responses as equally fair. Lead author Myra Cheng said AI makes it easy to avoid tough conversations.
The study came out in Science this week and was funded by the National Science Foundation. The team found that adding the words "wait a minute" at the start of a prompt made models less agreeable.

Why it matters: The real problem is the business incentive. Agreeable AI gets more engagement. That means AI companies are structurally rewarded for building models that flatter you, not challenge you. The study found users trust flattering AI more and come back more often, even as it makes them less empathetic and more self-righteous. Senior author Dan Jurafsky called sycophancy a safety issue. He said it needs regulation, the same way other safety risks do.

TUTORIAL

Create branded background music for social videos with Suno

Recaply: In this tutorial, you will learn how to generate royalty-free background audio for your social videos using Suno v5.5, so you get custom music that fits your content's energy without licensing headaches.

Step-by-step:

Go to suno.com and sign up for a free account. The free tier gives you 50 credits daily. Click Create in the left sidebar to open the generation panel.
Describe the mood and format you need, for example: "upbeat lo-fi background, no vocals, 60 seconds, product demo energy." The more specific you are about vibe and tempo, the more useful the output.
Toggle the Instrumental option to prevent vocals, then set the duration to match your video length. 30, 60, and 90 seconds work well for Reels and TikToks.
Generate 2-3 variations and preview each in the browser. Pay attention to the first 3 seconds. The intro needs to fit under your voiceover without competing with it.
Download the best track as an MP3 and import it into your video editor, such as CapCut, DaVinci Resolve, or Premiere. Set the background audio volume to around 15-25% so it doesn't drown out your narration.
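Some editors show audio levels in decibels rather than percent. As a rough sketch, the 15-25% range from the last step can be converted to dB, assuming the percentage is a linear amplitude gain (an assumption; each editor's fader may use a different curve). The function name `gainToDb` is purely illustrative.

```typescript
// Convert a linear amplitude gain (0.15 = 15%) to decibels.
// Assumption: the editor's "volume %" is linear amplitude, not a perceptual curve.
function gainToDb(linear: number): number {
  if (linear <= 0) {
    throw new RangeError("gain must be greater than 0");
  }
  return 20 * Math.log10(linear);
}

// Print the dB equivalents for the suggested background-music range.
for (const percent of [15, 25]) {
  const db = gainToDb(percent / 100);
  console.log(`${percent}% is about ${db.toFixed(1)} dB`);
}
```

Under that assumption, the suggested range works out to roughly -16.5 dB to -12 dB below the track's original level.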

Pro tip: With Suno v5.5's new Custom Models feature (Pro and Premier subscribers), upload original tracks from your catalog to train a personalized version of the model. Future generations will sound closer to your brand's existing music style.

DEVELOPER TOOLS

New library ends web text layout bottleneck

Recaply: A developer just released Pretext, a TypeScript library that measures and lays out text on the web without using CSS or the DOM. He calls text layout the last major bottleneck in UI engineering and says it is now solved.

Key details:

Pretext lays out text by skipping DOM measurements entirely. It avoids getBoundingClientRect calls, which cause costly browser reflow. Page layouts can now be built in pure JavaScript.
The library runs at 120fps even with hundreds of thousands of text boxes of different heights. That's roughly 500x faster than standard DOM approaches, though the comparison isn't perfectly fair.
Claude Code and Codex helped build the library. The developer showed them how browsers measure text and had them iterate on the algorithm. The library is a few kilobytes and works with Korean, Arabic, and platform emojis.
Pretext is on npm now. Run npm install @chenglou/pretext to get started. Demos are live at chenglou.me/pretext.

Why it matters: Web text layout has always forced a tradeoff between speed and control. Pick one. That tradeoff is now gone. Pretext is tiny, open-source, and available today. It solves a problem that has given UI developers headaches for years, especially as AI-generated interfaces demand faster, more dynamic text rendering. The fact that Claude Code and Codex helped build it makes it a demonstration of what AI-assisted open-source tooling looks like in practice.
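To make the idea of DOM-free layout concrete, here is a minimal sketch of the general technique: measure each word once, cache the width, then wrap lines with plain arithmetic. This is not Pretext's API or algorithm. Every name below (`layoutText`, `createCachedMeasure`, `fixedWidthMeasure`) is hypothetical, and the measurer assumes each character is 8px wide so the example runs without a browser.

```typescript
// Sketch of DOM-free text layout. NOT Pretext's API; all names are hypothetical.

type MeasureFn = (word: string) => number;

interface LayoutResult {
  lines: string[];
  height: number;
}

// Stand-in measurer: assumes every character is 8px wide.
// A real implementation would measure with actual font metrics.
const fixedWidthMeasure: MeasureFn = (word) => word.length * 8;

// Wrap a measurer so each distinct word is measured only once.
function createCachedMeasure(measure: MeasureFn): MeasureFn {
  const cache = new Map<string, number>();
  return (word) => {
    let width = cache.get(word);
    if (width === undefined) {
      width = measure(word);
      cache.set(word, width);
    }
    return width;
  };
}

// Greedy line wrapping using only cached widths and arithmetic.
// A word wider than maxWidth is placed on its own line and overflows.
function layoutText(
  text: string,
  maxWidth: number,
  lineHeight: number,
  measure: MeasureFn
): LayoutResult {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const spaceWidth = measure(" ");
  const lines: string[] = [];
  let current = "";
  let currentWidth = 0;

  for (const word of words) {
    const wordWidth = measure(word);
    if (current === "") {
      current = word;
      currentWidth = wordWidth;
    } else if (currentWidth + spaceWidth + wordWidth <= maxWidth) {
      current += " " + word;
      currentWidth += spaceWidth + wordWidth;
    } else {
      lines.push(current);
      current = word;
      currentWidth = wordWidth;
    }
  }
  if (current !== "") lines.push(current);

  return { lines, height: lines.length * lineHeight };
}

const measure = createCachedMeasure(fixedWidthMeasure);
const result = layoutText("the quick brown fox jumps", 80, 20, measure);
console.log(result.lines, result.height);
// lines: ["the quick", "brown fox", "jumps"], height: 60
```

Because the height comes out of arithmetic alone, a caller could compute the size of many text boxes without triggering browser reflow, which is the bottleneck the item above describes.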

TOOLS

Trending AI Tools

🎵 Suno v5.5 - Suno's updated music generation model
🔍 Context-1 - Chroma's 20B agentic search model
🎥 Seedance 2.0 - ByteDance's unified multimodal video generation model
🎨 Figma - Figma's open beta that lets AI agents design directly on the canvas

NEWS

What Matters in AI Right Now?

Meta plans to launch two new Ray-Ban smart glasses models designed for prescription wearers, the first time Meta and EssilorLuxottica have introduced Ray-Bans built specifically for that market.
OpenAI rolled out plugin support for Codex, adding integrations with GitHub, Slack, and Linear so teams can connect coding agent workflows to their existing collaboration tools.
Perplexity announced that its APIs now power Samsung Browsing Assist on Galaxy Android and Windows PC, reaching more than 1 billion Samsung devices. The deal extends a partnership that already places Perplexity behind two of three assistants on the Galaxy S26.
Z.ai opened GLM-5.1 access to all Coding Plan subscribers, including Max, Pro, and Lite tiers. The 744B-parameter model is priced at $10 a month and scored 45.3 on coding benchmarks against Claude Opus 4.6's 47.9.
A page circulating online claims leaked documents show Anthropic has a next-generation model family called "Claude Mythos" in internal testing, though the source page contained no substantive content and couldn't be independently verified.
Apple hired Lilian Rincon, a former Google executive who spent nearly a decade overseeing shopping and assistant products, as VP of product marketing for AI, as the company prepares an improved Siri rebuilt on Gemini AI.
Eli Lilly signed a deal with Insilico Medicine worth up to $2.75B. Insilico receives $115M upfront, with the rest tied to development and commercial milestones, and will use its AI platform to generate drug compounds against Lilly's defined targets.

Cheers, Jason
