Good morning, AI enthusiasts. Google just open-sourced a multimodal AI model that runs on any laptop with 16GB of RAM, handles audio and images natively, and costs exactly nothing to deploy.
For most builders, that combination didn't exist before today. The question isn't whether to run it locally. It's what you'll build first.
In today's recap:
Google's Gemma 4 12B, free multimodal AI for any laptop
Gopuff predicts your order before you open the app
Run Gemma 4 12B locally with Ollama
Martin Scorsese joins Black Forest Labs, Hollywood reacts
4 new AI tools, prompts, and more
Gemma 4 12B runs free on your laptop
Recaply: Google just released Gemma 4 12B on Hugging Face, a multimodal model with native audio and vision support that runs on any laptop with at least 16GB of RAM, under Apache 2.0 licensing.
Key details:
Gemma 4 12B handles text, audio, and image inputs natively, letting builders run multimodal workflows on local hardware without API calls or cloud costs.
The model fits on 16GB of RAM, a standard consumer spec, and has already recorded 11.8M downloads on Ollama, with a 26B variant also available.
The Apache 2.0 license allows full commercial use without restrictions, unlike earlier Gemma releases that came with tighter terms.
Available now on Hugging Face and via Ollama with
ollama run gemma4:12b, no waitlist or account required.
Why it matters: There's been lots of talk about local AI being too weak for real work. Gemma 4 12B pushes back, and the Apache 2.0 license makes it hard to argue. A free multimodal model that handles audio and vision on your own hardware means builders can ship image and voice features with no API costs. That shifts the math for a lot of small teams.
PRESENTED BY WISPR FLOW
Talk to your AI tools the way you'd talk to a colleague.
You don't send a colleague a three-word brief. You explain the context, the constraints, what you've already tried. But typing all that into ChatGPT takes forever — so you don't.
Wispr Flow lets you speak your prompts instead. Talk through your thinking naturally and get clean, paste-ready text. No filler words. No cleanup. Just detailed prompts that actually get you useful answers on the first try.
Millions of users worldwide. Works system-wide on Mac, Windows, and iPhone.
GOPUFF & SPACEX
Gopuff predicts your order before you open the app
Recaply: Gopuff just launched Go, an AI shopping assistant co-developed with SpaceXAI that builds a personalized cart automatically when a user opens the app, with delivery available in 15 minutes from 400+ centers.
Key details:
Go uses SpaceXAI reasoning and voice models with Gopuff's 13-year order dataset and real-time signals from X to predict what users want before they search, then prepopulates a cart.
Returning customers can check out in a single tap across 100+ product categories; Gopuff's 400+ micro-fulfillment centers handle delivery in as fast as 15 minutes.
A TikTok-style shoppable feed generates personalized product scenes using behavioral data and local inventory; Grok voice integration lets users ask, adjust, and check out hands-free.
Go is available now in the Gopuff app; tap the Go icon in the main navigation to access it.
Why it matters: Gopuff spent 13 years building fast delivery, but the real friction was always the moment before checkout: the thinking, deciding, and remembering. Go skips it by betting a model knows your next order before you do. If the prediction holds, it turns a convenience app into a habit. Users stop shopping. They start approving.
GUIDES
Run Google's Gemma 4 12B locally with Ollama

Recaply: In this tutorial, you will learn how to run Google's Gemma 4 12B on your laptop using Ollama, so you can test its audio and vision features offline without spending on API credits.
Step-by-step:
Go to ollama.com, download and install Ollama for your operating system (Mac, Windows, or Linux). It's free and takes under two minutes to set up.
Open your terminal and run
ollama pull gemma4:12bto download the 12B model. You'll need about 16GB of free disk space and at least 16GB of RAM.Run
ollama run gemma4:12bto start a chat session in your terminal. Type a test message to confirm it responds before moving on to multimodal inputs.To test vision, use Ollama's Python library: import
ollama, pass an image file in your message content alongside a question, and send it. For audio, pass an audio file the same way using the same API structure.Connect Gemma 4 12B to any OpenAI-compatible app by pointing it at
http://localhost:11434/v1in your API settings, usinggemma4:12bas the model name. Most tools with OpenAI support will work with no code changes.
Pro tip: Run ollama serve as a background process so the model stays loaded across sessions. Any app that supports the OpenAI API format can hit it immediately without restarting.
TOGETHER WITH PULLEY
Better cap table management starts here
Cap table management doesn’t have to be frustrating. From issuing grants to 409A valuations or ASC 718 reporting Pulley can make it simple.
Just ask Linear. They knew they needed a partner who could handle the complexity of their equity management. That’s why they migrated to Pulley.
BLACK FOREST LABS
Scorsese backs Black Forest Labs, Hollywood fights back
Recaply: Martin Scorsese just joined Black Forest Labs as an adviser, saying he used the company's FLUX image model to create storyboards and found it creatively freeing, with Hollywood artists firing back online within hours.
Key details:
Scorsese used FLUX to generate visual storyboards in pre-production, saying it helped him communicate ideas faster to production designers, art directors, and cinematographers while saving time and costs.
Concept artist Karla Ortiz and director Sam Deats both criticized the move on X within hours; Deats wrote that Scorsese was throwing "every storyboard artist under the bus."
BFL frames Scorsese's role as helping shape visual intelligence for cinema more broadly, not a one-off campaign; James Cameron joined Stability AI's board in a comparable move last September.
Scorsese's adviser role is active now; Black Forest Labs published a video of him using FLUX in a working storyboarding session alongside the announcement.
Why it matters: There's been a clear line in Hollywood: artists fight AI as theft, studios test it for cost cuts. Scorsese crossing that line doesn't just break the rule. It erases it. The most decorated filmmaker alive calling AI storyboarding creatively freeing gives studios and AI companies real cover. Whether that holds depends on what FLUX was trained on, and Ortiz asked that question right away.
TOOLS
Trending AI Tools
🤖 Gemma 4 12B - Google's free open-source multimodal model with audio and vision support, runs locally on any 16GB laptop under Apache 2.0
🎨 Ideogram 4.0 - Open image generation model ranked top-8 on LM Arena, with commercial-friendly licensing for production use
🖼️ Reve 2.0 - Text-to-image model using layout-based prompting instead of pure text, 4K quality, currently ranked #2 on image arena
🔊 Miso One - Miso Labs' expressive open-source TTS model for natural voice generation with emotional range
NEWS
What Matters in AI Right Now?
University of Toronto published research showing AI-powered worms built with free open-weight models can adapt their attacks, steal computing power from infected devices, and spread across networks with no single viable defense, with the team sharing findings with national security bodies before release.
Ideogram launched Ideogram 4.0, a new open image model that ranks in the top 8 on LM Arena, with commercial-friendly licensing positioning it as a competitive alternative to paid image generation APIs.
Google Labs introduced Dreambeans, an experimental app that generates AI-powered daily stories from a user's personal data, including calendar events, photos, and location history.
xAI partnered with Cloudflare to make Grok LLMs, audio, image, and video models available through Cloudflare AI Gateway, letting developers access xAI models through the same interface they use for other providers.
Lassie raised $35M led by a16z to automate dental billing with AI agents, with co-founders who worked as dental practice administrators to build the product around real workflow pain.
Miso Labs unveiled Miso One, an expressive open-source text-to-speech model designed for natural voice output with emotional range, available for developer use.
Robinhood launched agentic trading and an agentic credit card for its 27M funded customers, letting AI agents execute stock trades and make purchases on a user's behalf with spending caps and manual approval options.
Microsoft revealed internal planning documents for Scout, its new always-on personal agent in Microsoft 365, that explicitly label Phase 1 of its launch plan as "make people addicted," describing a "three phases from addictive app to agentic platform" strategy.
🧡 Enjoyed this issue?
🤝 Recommend our newsletter or leave a feedback.
How'd you like today's newsletter?
Cheers, Jason









