The LLM Vocabulary Sheet
A plain-English reference guide to the jargon that shows up every time a new language model drops, from parameter counts to quantization methods.

Contents

01 · Architecture & Model Design — Transformer · Dense Model · Mixture of Experts · Active Parameters · Feed-Forward Network · Layers · Hidden Dimension · Attention Heads

02 · Attention Mechanisms — Multi-Head Attention · Multi-Query Attention · Grouped-Query Attention · KV Cache · Sliding Window Attention · RoPE · RoPE Theta

03 · Sizing, Scale & Counting — Parameters · Embedding Parameters · Non-Embedding […]