Google DeepMind released Gemma 4 on April 2, 2026 under Apache 2.0. It’s their fourth-generation open model family, and it runs locally with surprisingly little friction. Here are three ways to get it going, depending on what hardware you have in front of you. No account, no […]

Read More →

Most people blame Claude for its strict usage limits, and the blame is justified to an extent. Until Anthropic eases those limits, users are better off optimizing their token usage. Spending tokens wisely is the whole trick, but not everyone knows how, and many end up wasting a lot of tokens and money as […]

Read More →

A format designed for bloggers in 2004 now sits at the center of how AI systems read, write, and think. If you work with LLMs at all, you’ve probably noticed something: Markdown is everywhere. Ask Claude a question, you get Markdown back. Ask GPT-4, same thing. Feed a […]

Read More →

Large Language Models (LLMs) all predict text, but they differ significantly in how they follow instructions, use context, handle tools, and optimize for safety, speed, or cost. If you treat them as interchangeable, you’ll ship brittle prompts. If you treat them as different runtimes with different affordances, you’ll get reliable results. This post explains the major differences across […]

Read More →

If you’ve been anywhere near the AI development world lately, you’ve probably heard about MCP — the Model Context Protocol. And your first reaction was probably: “Isn’t this just… an API?” Fair question. Both let systems talk to each other. Both move data around. But MCP and APIs solve fundamentally different problems, and once you see […]

Read More →