A plain-English reference guide covering the jargon that shows up every time a new language model drops, from parameter counts to quantization methods.

Contents:
01 · Architecture & Model Design — Transformer · Dense Model · Mixture of Experts · Active Parameters · Feed-Forward Network · Layers · Hidden Dimension · Attention Heads
02 · Attention Mechanisms — Multi-Head Attention · Multi-Query Attention · Grouped-Query Attention · KV Cache · Sliding Window Attention · RoPE · RoPE Theta
03 · Sizing, Scale & Counting — Parameters · Embedding Parameters · Non-Embedding […]

Read More →

A format designed for bloggers in 2004 now sits at the center of how AI systems read, write, and think. If you work with LLMs at all, you’ve probably noticed something: Markdown is everywhere. Ask Claude a question, you get Markdown back. Ask GPT-4, same thing. Feed a […]

Read More →

LLMs are stateless. Agents aren’t. Here’s what sits in between. LLMs are stateless by design. Each API call is independent — the model has no mechanism to remember what happened in a previous request. But somehow, the agents built on top of these models maintain context across long conversations, recall user […]

Read More →