The ecosystem of “AI skills” — modular instruction packs that extend an LLM with task-specific know-how, whether they’re called Skills, plugins, agents, MCP servers, system-prompt templates, or tool bundles — has exploded fast enough that “which one should I use?” has become the dominant question. The answer is rarely obvious from a README, and almost […]

Read More →

A plain-English reference guide covering the jargon that shows up every time a new language model drops, from parameter counts to quantization methods. Contents 01 · Architecture & Model Design — Transformer · Dense Model · Mixture of Experts · Active Parameters · Feed-Forward Network · Layers · Hidden Dimension · Attention Heads 02 · Attention Mechanisms — Multi-Head Attention · Multi-Query Attention · Grouped-Query Attention · KV Cache · Sliding Window Attention · RoPE · RoPE Theta 03 · Sizing, Scale & Counting — Parameters · Embedding Parameters · Non-Embedding […]

Read More →

Why the biggest problem with MCP isn’t the protocol — it’s the context window tax. And how Code Mode solves it. Table of Contents What Is MCP? A Quick Refresher Model Context Protocol (MCP) is an open standard that lets AI agents connect to external tools and services. Think of it as a universal adapter: […]

Read More →

LLMs are stateless. Agents aren’t. Here’s what sits in between. Table of contents Introduction LLMs are stateless by design. Each API call is independent — the model has no mechanism to remember what happened in a previous request. But somehow, the agents built on top of these models maintain context across long conversations, recall user […]

Read More →

Large Language Models (LLMs) all predict text, but they differ a lot in how they follow instructions, use context, handle tools, and optimize for safety, speed, or cost. If you treat them as interchangeable, you’ll ship brittle prompts. If you treat them as different runtimes with different affordances, you’ll get reliable results. This post explains the major differences across […]

Read More →