Building a Custom Spec-Driven Development Framework

You give your AI coding agent a task. It gets to work. Thirty tool calls later you have code — and it’s not what you needed. The agent understood the words but missed the intent. It made a dozen small decisions that individually seemed reasonable, and collectively built the wrong thing.

This isn’t a capability problem. It’s a specification problem.

Spec-Driven Development (SDD) is a practice layer on top of your AI agent that enforces a specific discipline: no code before a written spec exists, and no spec before the right questions have been asked. This post walks through how to build a custom SDD framework that fits your codebase, whether you’re in a monorepo, a multirepo setup, or somewhere in between, and that you can actually maintain.

Two purpose-built SDD frameworks are worth understanding before you decide whether to adopt or build: GitHub Spec-Kit and OpenSpec. Both are real, actively maintained, and genuinely good. This post covers what makes them strong, the circumstances where a custom framework outperforms them, and how to build one when you get to that decision point. We’ll also borrow structural patterns from agent workflow systems like Superpowers and agent-skills along the way.

The Core Problem SDD Solves
The Purpose-Built SDD Frameworks: Spec-Kit and OpenSpec
Why Build Custom Instead
What a Custom SDD Framework Actually Is
Building the Framework
Adapting for Different Repository Strategies
The SKILL.md Anatomy
Trade-offs, Failure Modes, and When to Skip It
Putting It Together: Directory Layout
Practical Starting Point
More Use Cases for SDD
Conclusion

The Core Problem SDD Solves

AI agents default to the shortest path. Given “add a user settings page,” they’ll pick a directory, create components, wire up a route, and commit — all without asking whether settings should be server-side or client-side, what the auth model looks like, or how this feature relates to the existing profile page.

The result is code that works in isolation and doesn’t compose with your system. Hours of review and rework follow.

SDD inserts a mandatory gate between “here’s what I want” and “write the code.” That gate surfaces assumptions, documents decisions, and gives both you and the agent a shared definition of done.

Several open-source AI agent workflow frameworks have converged on spec-first practices as a core component — not as their sole purpose, but as an essential guard against the “agent went straight to code” failure mode. Three are worth understanding before building your own:

agent-skills (Addy Osmani) is a comprehensive AI coding agent skill pack with 23 SKILL.md files covering the full development lifecycle: Define → Plan → Build → Verify → Review → Ship. It is not an SDD framework, but it includes a spec-driven-development skill as part of the Define phase, and its structural conventions (skill anatomy, anti-rationalization tables, and verification gates) provide useful patterns for building custom skills of any kind.

mattpocock/skills is a collection of engineering and productivity skills for coding agents. Its primary focus is on solving agent failure modes (misalignment, verbosity, bad feedback loops, architectural drift), not on SDD specifically. Its grill-with-docs skill is particularly relevant here: it tackles the “shared language” problem by building a CONTEXT.md glossary so the agent and developer share a vocabulary. The composable, low-ceremony approach is worth borrowing.

superpowers is a complete software development methodology for AI coding agents, covering brainstorming, spec writing, implementation planning, subagent-driven execution, code review, and branch management. It enforces a hard-gate workflow where the agent cannot proceed to code until each phase is human-approved. Like agent-skills, it is not an SDD framework but an agent workflow system that treats spec-first behavior as one of several enforced disciplines.

The distinction matters: none of these is purpose-built to solve the specification problem alone. They’re comprehensive systems where SDD is one layer. Building a custom SDD framework means extracting that layer (the structural patterns, file conventions, and gating logic) and fitting it to your codebase without adopting the rest of the methodology wholesale.

The Purpose-Built SDD Frameworks: Spec-Kit and OpenSpec

Before building your own, you should genuinely evaluate whether an existing framework covers your needs. Two have emerged as the most adopted purpose-built SDD frameworks for AI-assisted development.

GitHub Spec-Kit

GitHub Spec-Kit (108k stars as of June 2026) is an open-source toolkit from GitHub that implements SDD as a structured, multi-phase workflow: Constitution → Specify → Clarify → Plan → Tasks → Implement. Each phase has a dedicated slash command (/speckit.constitution, /speckit.specify, /speckit.plan, etc.) and produces a versioned artifact. The specify CLI bootstraps a project and writes all agent instructions into the appropriate directories for 30+ supported AI coding agents.

The constitution step is Spec-Kit’s most distinctive feature. Before writing any spec, you establish project-level governing principles: code quality standards, testing requirements, UX consistency rules, compliance constraints. Everything that follows inherits from these principles. Governing context before feature intent is the right order.

The clarify step (/speckit.clarify) is a sequential, coverage-based questioning workflow that runs between spec creation and planning. It’s the place where underspecified requirements get resolved before they become assumptions baked into a technical plan. The structured form (not free-form conversation, but systematic coverage checking) means things don’t slip through.

The extensions and presets system is well-architected for teams that need to standardize across projects. Extensions add new capabilities (Jira integration, V-Model test traceability, post-implementation review gates). Presets override how existing templates and commands behave without changing their structure: enforce a compliance-oriented spec format, localize the workflow to another language, mandate security review gates in plans. Multiple presets stack with priority ordering.

Spec-Kit’s multi-agent support is comprehensive: 30+ integrations with detailed notes, and the same workflow runs consistently whether your team uses Claude Code, Codex, Cursor, or Gemini CLI.

It’s the right choice for greenfield projects, teams standardizing SDD practice across multiple engineers, GitHub-native workflows (native Issues integration via /speckit.taskstoissues), and organizations with compliance requirements that need auditable spec artifacts.

The honest trade-off: Spec-Kit’s phase gates are rigid by design. The workflow assumes you’ll complete each step before moving to the next, and the Python-based CLI adds a setup dependency that lightweight projects may not want. OpenSpec’s own README describes Spec-Kit as “thorough but heavyweight,” which is a fair characterization. If you need the rigor, the overhead is justified. If you need to iterate freely on an existing codebase, Spec-Kit can feel like waterfall in new packaging.

OpenSpec

OpenSpec (52.6k stars as of June 2026) takes an explicitly different philosophy: fluid not rigid, iterative not waterfall, brownfield-first. Where Spec-Kit structures SDD around a project lifecycle, OpenSpec structures it around individual changes. Each proposed modification gets its own folder containing a proposal, specs directory, design document, and task list.

The workflow is built around three commands: /opsx:propose <change> creates the change folder and all artifacts, /opsx:apply implements the tasks, and /opsx:archive moves the completed change to the archive. This per-change artifact structure means specs don’t accumulate in a single growing document — each change is self-contained and traceable.

The delta-based approach is the right model for brownfield codebases. Most production software isn’t being built from scratch — it’s being incrementally modified. Spec-Kit and similar tools default to describing the full system state; OpenSpec describes the change. This keeps specs proportional to the actual work, and it makes the audit trail for any given change immediately obvious: you can look at openspec/changes/archive/2026-05-12-add-rate-limit/ and see exactly what was proposed, designed, and built.

The npm install path (npm install -g @fission-ai/openspec) removes the Python/uv dependency. For JavaScript and TypeScript teams especially, this eliminates a setup friction point that causes Spec-Kit adoption to stall.

The fluid iteration model lets you update any artifact at any time without a formal phase transition. If you realize mid-design that the proposal needs to change, you change it. The workflow doesn’t block you. This is a genuine trade-off against Spec-Kit’s rigor — OpenSpec trusts you to maintain artifact quality; Spec-Kit enforces it structurally.

The community schemas system (analogous to Spec-Kit’s extensions/presets) allows third parties to distribute opinionated workflow bundles that integrate OpenSpec with other tools.

OpenSpec is the right choice for teams working primarily on existing codebases, JavaScript/TypeScript projects, teams that find Spec-Kit’s phase structure too heavy for their velocity, and any situation where per-change traceability matters more than project-level spec coverage.

The honest trade-off: OpenSpec’s flexibility is also its risk. Without the structural enforcement of Spec-Kit, the quality of specs depends more on the team’s discipline. The artifact folders can accumulate messily if /opsx:archive isn’t run consistently. And the greenfield story is less developed: if you’re starting from scratch, Spec-Kit’s constitution-first approach provides more scaffolding.

Side-by-Side Comparison

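| Dimension        | Spec-Kit                           | OpenSpec                                         |
| ---------------- | ---------------------------------- | ------------------------------------------------ |
| Primary target   | Greenfield + brownfield            | Brownfield-first                                 |
| Spec granularity | Project-level                      | Per-change                                       |
| Phase gates      | Rigid (required)                   | Fluid (discretionary)                            |
| Setup            | Python/uv CLI                      | npm package                                      |
| Customization    | Extensions + presets               | Community schemas                                |
| Agent support    | 30+                                | 25+                                              |
| Best for         | Teams, compliance, standardization | Iterative development, JS/TS, existing codebases |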

Why Build Custom Instead

Spec-Kit and OpenSpec cover the majority of use cases well. Building a custom framework makes sense in specific circumstances where the general-purpose tools don’t fit.

Your codebase has invariants and vocabulary that no generic framework can express. Both Spec-Kit and OpenSpec support customizing templates, but their defaults are technology-agnostic. If your system has domain-specific rules — “settlement runs are always triggered by cron, never by user action,” “all external calls go through the gateway layer,” “the core package has zero dependencies on app packages” — these need to be enforced at the spec level, not left to the agent to infer. A custom framework bakes these invariants into every spec template, so an agent working on your codebase can’t propose a design that violates them without the spec explicitly flagging it.

You have unusual repo topology. Both frameworks are built around single-repo workflows. If you’re in a multirepo setup with shared cross-service contracts, a hybrid monorepo with multiple teams owning different packages, or a polyglot environment where a feature touches Python microservices and a TypeScript frontend simultaneously, you need spec templates that model cross-boundary impact. Neither Spec-Kit nor OpenSpec has a native cross-repo spec model.

Your process includes gates that aren’t software decisions. Legal review before a feature ships. Compliance sign-off before any data schema changes. Customer approval before UI changes to a specific product surface. These gates belong in the spec workflow, not appended to it afterward. A custom framework can make them first-class phases with their own verification criteria and approval artifacts.

You’re in a regulated industry. Healthcare, financial services, and government software often require audit trails that go beyond what standard spec frameworks produce: traceability matrices linking requirements to implementation tasks to test cases, evidence of review at each phase, change control records linked to ticket systems. Spec-Kit’s presets can move in this direction, but the compliance requirements in these industries are specific enough that you’ll likely end up writing most of the scaffolding yourself regardless.

Your team has strong existing conventions that would conflict with the framework’s opinions. Adopting Spec-Kit means adopting its directory structure (.specify/, specs/), its CLI, and its template format. If your team already has well-established spec and documentation conventions, the migration cost may exceed the benefit. A custom framework built on your existing structure imposes no migration.

You want to integrate SDD with an existing ticket workflow rather than replace it. Both frameworks assume they own the task-tracking layer. Spec-Kit integrates with GitHub Issues; OpenSpec manages its own artifact folders. If your team runs on Jira, Linear, or an internal ticketing system with specific fields, workflows, and states, a custom framework can generate spec artifacts that feed directly into those systems in the right format rather than requiring a translation step.

The decision heuristic: use Spec-Kit if you want structured enforcement and are willing to adopt its conventions; use OpenSpec if you’re in brownfield JavaScript/TypeScript work and want lightweight iteration; build custom if you have domain-specific invariants, unusual repo topology, non-standard process gates, or compliance requirements that the general-purpose frameworks can’t encode.

What a Custom SDD Framework Actually Is

An SDD framework is a collection of four things:

Context documents tell the agent about your project: what it does, what vocabulary to use, what patterns to follow, what decisions have already been made. These are CLAUDE.md, CONTEXT.md, ADRs, and similar files. They replace the background knowledge a human developer builds over months of working in a codebase.

Skill files (SKILL.md) define structured workflows for specific tasks — speccing, planning, implementing, reviewing. A skill is not a prompt; it’s a process. It has phases, exit criteria, verification steps, and anti-patterns to guard against.

Slash commands are entry points that trigger skills. /spec, /plan, /build, /review map to the development lifecycle. They’re thin wrappers that activate the right skill for the current phase.

Hooks are event-driven behaviors that fire automatically based on context, without requiring the developer to invoke a command. Seeing a new feature request might trigger the spec skill. Touching files in src/api/ might trigger the API design skill. Hooks reduce the discipline tax on the developer.
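The path-based triggering described above can be pictured as a small routing table. The following is a minimal sketch, not how any particular agent implements hooks: the `HookRule` shape, the `skillsForChangedFiles` function, and the skill names are all hypothetical.

```typescript
// Hypothetical hook router: maps changed file paths to the skills they should activate.
// The rule shape and the skill names are illustrative assumptions, not a real agent API.
interface HookRule {
  pathPrefix: string; // directory prefix that triggers the skill
  skill: string;      // name of the skill to activate
}

const hookRules: HookRule[] = [
  { pathPrefix: "src/api/", skill: "api-design" },
  { pathPrefix: "src/components/", skill: "frontend-ui" },
];

// Returns the de-duplicated list of skills triggered by a set of changed files,
// in the order the rules are declared.
function skillsForChangedFiles(changedFiles: string[], rules: HookRule[] = hookRules): string[] {
  const triggered: string[] = [];
  for (const rule of rules) {
    const matches = changedFiles.some((file) => file.startsWith(rule.pathPrefix));
    if (matches && !triggered.includes(rule.skill)) {
      triggered.push(rule.skill);
    }
  }
  return triggered;
}

console.log(skillsForChangedFiles(["src/api/users.ts", "README.md"]));
```

Keeping the rules as data rather than code means the table can live next to the context documents and be reviewed like any other convention.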

The design question is how opinionated to make each layer. A hard-gate agent workflow system like superpowers enforces every phase of the development cycle; a composable skill collection like mattpocock’s lets the developer choose which tools to invoke. When building just the SDD layer, you’re making the same choice for a narrower scope: how much ceremony should surround the spec phase specifically.

Building the Framework

Layer 1: Context Documents

Before you write a single skill, you need to give the agent situational awareness of your codebase. This is the highest-value, lowest-ceremony part of SDD.

The minimum viable context layer is a single CLAUDE.md (or equivalent: AGENTS.md for Codex, GEMINI.md for Gemini CLI, .cursor/rules/ for Cursor) that covers:

What the project is and what it does
The tech stack, with versions
Where things live (directory layout)
How to run, test, and build
Naming conventions and code style
What decisions have been made and why

For a non-trivial codebase, a CONTEXT.md alongside it defines your domain vocabulary — the shared language Matt Pocock describes. If your system has concepts like “materialization cascade” or “settlement run” or “canonical event,” define them here. The agent will use them consistently, and you’ll spend far fewer tokens clarifying jargon session after session.

ADRs (Architecture Decision Records) are the third piece. When you make a significant architectural decision — choosing a state management library, picking a particular caching strategy, deciding on a module boundary — write it down in docs/adr/YYYY-MM-DD-decision-name.md. The agent can consult these before proposing changes that would reverse or conflict with prior decisions.

Here’s a minimal CONTEXT.md for a monorepo serving a SaaS product:

# Project Context

## Domain Glossary

- **Workspace**: A tenant in the system. A user can belong to many workspaces.
- **Member**: A user's relationship to a workspace, with a specific role.
- **Materialization**: The process of resolving a workspace's feature flags to
  a concrete configuration set. This is expensive — never call it in a hot path.
- **Settlement run**: The nightly process that applies billing charges and
  records usage. Always idempotent; designed to be re-run safely.

## Architectural Invariants

- Settlement runs are triggered by cron, never by user action.
- All external API calls go through `src/lib/gateway/`. No direct HTTP calls
  from feature code.
- The `core/` package has zero dependencies on `app/` packages. Violations
  break the CI boundary check.

## Known Decisions (see docs/adr/ for rationale)

- We use Zod for all schema validation (ADR-003).
- React Query is the data-fetching layer; no direct fetch() in components (ADR-007).
- Soft deletes on all user-facing entities (ADR-012).

This file is checked in, versioned, and updated when invariants change. It’s not documentation for humans — it’s context for agents. Writing for a reader who will skim it in a 200k-token window changes the format significantly: be dense, not friendly.
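Because the glossary entries follow a regular shape, tooling can read them as well as agents can. Here is a rough sketch of a parser for the `- **Term**: definition` format used above; `parseGlossary` is a hypothetical helper, and it assumes continuation lines are indented as in the example.

```typescript
// Parses "- **Term**: definition" entries and joins indented continuation lines.
// Assumes the glossary format shown in the CONTEXT.md example; other bullets are ignored.
function parseGlossary(contextMarkdown: string): Map<string, string> {
  const glossary = new Map<string, string>();
  let currentTerm: string | null = null;
  for (const rawLine of contextMarkdown.split("\n")) {
    const entry = rawLine.match(/^- \*\*(.+?)\*\*:\s*(.*)$/);
    if (entry) {
      // Start of a new glossary entry
      currentTerm = entry[1];
      glossary.set(currentTerm, entry[2].trim());
    } else if (currentTerm !== null && /^\s+\S/.test(rawLine)) {
      // Indented line: continuation of the previous definition
      glossary.set(currentTerm, `${glossary.get(currentTerm)} ${rawLine.trim()}`);
    } else {
      // Blank line, heading, or unrelated bullet ends the current entry
      currentTerm = null;
    }
  }
  return glossary;
}

const sample = [
  "- **Workspace**: A tenant in the system.",
  "- **Materialization**: The process of resolving a workspace's feature flags to",
  "  a concrete configuration set.",
].join("\n");

console.log(parseGlossary(sample).get("Materialization"));
```

One possible use is a check that flags capitalized domain terms in a spec that have no glossary entry, which catches vocabulary drift before it reaches code.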

Layer 2: The Spec Skill

The spec skill is the heart of the framework. It defines the gated workflow from vague requirement to reviewable specification.

Every spec skill should cover six areas (drawn from agent-skills’ spec-driven-development skill):

Objective: What are we building, why, and what does success look like? Express success as testable acceptance criteria, not prose intentions.

Tech stack: The specific frameworks, libraries, and versions. For multi-language monorepos, which packages are in scope.

Commands: The exact commands to build, test, lint, and run the affected code. Don’t be vague (“run tests”) — specify the exact invocation, including flags.

Project structure: Which directories are affected, where new code lives, where tests go.

Code style: One concrete code snippet beats three paragraphs of convention descriptions. Show a real example of what acceptable code looks like in this codebase.

Boundaries: A three-tier system — what the agent should always do, what it should ask before doing, and what it should never do. Superpowers calls these “constraints”; agent-skills calls them “boundaries.” The naming doesn’t matter; the explicitness does.

Here’s a minimal SKILL.md for a spec phase:

---
name: spec
description: Write a structured spec before coding. Use when requirements exist
             but no written spec does. Required for any change touching >1 file.
---

# Spec Phase

## Hard Gate

Do NOT write code until the spec is approved. This applies to every feature,
regardless of apparent simplicity.

## Phase 1: Surface Assumptions

Before writing the spec, explicitly list what you're assuming:

    ASSUMPTIONS:
    1. This affects the workspace-level settings, not user-level
    2. The schema change is backward-compatible
    3. Tests will use Vitest (project standard)
    → Correct any of these before I proceed.

## Phase 2: Write the Spec

Use this template:

    # Spec: [Feature Name]

    ## Objective
    [What we're building and why. Acceptance criteria in "Given/When/Then" 
    or plain language — specific and testable.]

    ## Commands
    Build:  pnpm build
    Test:   pnpm test --filter=<package>
    Lint:   pnpm lint

    ## Files in Scope
    [Which packages/directories will change. What's explicitly out of scope.]

    ## Code Style
    [One real code snippet from this codebase showing the expected pattern.]

    ## Testing Strategy
    [What framework. Where tests live. What gets unit-tested vs. integration-tested.]

    ## Boundaries
    - Always: run tests before committing, validate inputs at package boundaries
    - Ask first: schema changes, new dependencies, changes to shared utilities
    - Never: bypass the gateway layer, add direct fetch() calls to components

    ## Success Criteria
    [Concrete, checkable list. Not "the feature works" — specific behaviors 
    that can be verified by running commands or manual steps.]

    ## Open Questions
    [Unresolved questions that need human input before proceeding.]

## Phase 3: Validate

Present the spec section by section. For each section, ask: "Does this match 
your intent?" Do not proceed until the human explicitly approves.

## Verification Gate

Before proceeding to planning:
- [ ] Spec covers all six areas
- [ ] Human has reviewed and approved each section
- [ ] Spec is saved to docs/specs/YYYY-MM-DD-<feature>.md
- [ ] Spec is committed to the repo

The decisions embedded in this template:

The assumption surfacing step is not optional. It’s the spec skill’s highest-value moment — assumptions are where misalignments live, and they’re cheapest to resolve before a line of code is written.

The verification gate is explicit and checkboxed. The agent cannot self-certify; it must present the spec to the human. You can make this lighter (async approval via a comment on an issue) or heavier (synchronous review meeting) depending on the stakes of the change.
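Part of the gate can be checked mechanically before a human ever reads the spec. The sketch below tests only that the template's headings are present; `missingSpecSections` and the `REQUIRED_SECTIONS` list are assumptions for illustration, and a heading check says nothing about whether the content under each heading is any good.

```typescript
// Headings taken from the spec template in this post; adjust to match your own template.
const REQUIRED_SECTIONS = [
  "Objective",
  "Commands",
  "Files in Scope",
  "Code Style",
  "Testing Strategy",
  "Boundaries",
  "Success Criteria",
];

// Returns the required sections that have no matching "## <name>" heading in the spec.
function missingSpecSections(specMarkdown: string, required: string[] = REQUIRED_SECTIONS): string[] {
  const headings = new Set(
    specMarkdown
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.startsWith("## "))
      .map((line) => line.slice(3).trim().toLowerCase()),
  );
  return required.filter((name) => !headings.has(name.toLowerCase()));
}

const draft = ["# Spec: Invite limits", "## Objective", "...", "## Commands", "pnpm test"].join("\n");
console.log(missingSpecSections(draft));
```

A check like this is a precondition for human review, not a substitute for it: it lets the reviewer spend attention on intent rather than on noticing that a section is absent.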

The scope of the spec scales with the task. A two-line change to a utility function doesn’t need a full PRD. The trigger condition in the skill description — “any change touching >1 file” — is one reasonable threshold. Agent-skills uses “more than 30 minutes to implement.” Matt Pocock’s system generates full PRDs from conversation context without re-interviewing. Choose the threshold that fits your team’s velocity.

Layer 3: The Plan Skill

The plan skill takes a validated spec and produces a task list granular enough that an agent with no prior context could execute it correctly.

The failure mode this prevents: an agent writes a high-level plan (“set up the database schema, implement the API, add the UI”), then executes it in one pass, making dozens of micro-decisions along the way that accumulate into the wrong implementation.

A good plan has four properties: each task takes 2–5 minutes, every task lists exact file paths, every code step shows the actual code (not a description), and tests come before implementation.

Superpowers’ writing-plans skill articulates this well: tasks that say “add appropriate error handling” or “write tests for the above” without showing the test code are plan failures. The plan is only as good as its ability to be handed to an agent with zero context and produce the right output.
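Vague steps like these are easy to detect mechanically. As a rough sketch, a plan linter might scan for known hand-wave phrases; the function name `findVagueSteps` and the phrase list beyond the two examples above are assumptions to tune for your team.

```typescript
// Phrases that signal a step was described rather than specified.
// The first two come from the examples above; the others are assumptions.
const VAGUE_PHRASES = [
  "add appropriate error handling",
  "write tests for the above",
  "as needed",
  "handle edge cases",
];

interface PlanFinding {
  line: number;   // 1-based line number in the plan
  phrase: string; // the vague phrase that matched
}

// Scans a plan line by line and reports every vague phrase it contains.
function findVagueSteps(planMarkdown: string, phrases: string[] = VAGUE_PHRASES): PlanFinding[] {
  const findings: PlanFinding[] = [];
  planMarkdown.split("\n").forEach((text, index) => {
    const lower = text.toLowerCase();
    for (const phrase of phrases) {
      if (lower.includes(phrase)) {
        findings.push({ line: index + 1, phrase });
      }
    }
  });
  return findings;
}

const plan = ["Step 1: Write the failing test", "Step 2: Add appropriate error handling"].join("\n");
console.log(findVagueSteps(plan));
```

A phrase list will never catch every underspecified step, but it gives the plan skill a cheap first filter before human review.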

Here’s what a well-structured task looks like:

### Task 4: Add rate-limit check to workspace member invite

**Files:**
- Modify: `packages/core/src/invites/invite-service.ts`
- Modify: `packages/core/src/invites/invite-service.test.ts`

- [ ] **Step 1: Write the failing test**

  ```typescript
  it('throws RateLimitError when invite limit is exceeded', async () => {
    const service = new InviteService({ maxInvitesPerHour: 3 });
    await service.createInvite(workspaceId, email1);
    await service.createInvite(workspaceId, email2);
    await service.createInvite(workspaceId, email3);
    await expect(service.createInvite(workspaceId, email4))
      .rejects.toThrow(RateLimitError);
  });
  ```

- [ ] **Step 2: Run the test, confirm it fails**

  ```bash
  pnpm test --filter=core -- invites/invite-service
  ```

  Expected: FAIL (RateLimitError not implemented)

- [ ] **Step 3: Implement the rate check**

  In `invite-service.ts`, before creating the invite:

  ```typescript
  const recentInvites = await this.store.countInvitesSince(
    workspaceId,
    subHours(new Date(), 1)
  );
  if (recentInvites >= this.config.maxInvitesPerHour) {
    throw new RateLimitError('Invite rate limit exceeded');
  }
  ```

- [ ] **Step 4: Run the test, confirm it passes**

  ```bash
  pnpm test --filter=core -- invites/invite-service
  ```

  Expected: PASS

- [ ] **Step 5: Commit**

  ```bash
  git add packages/core/src/invites/
  git commit -m "feat(invites): enforce rate limit per workspace"
  ```


This level of specificity feels excessive until you've watched an agent write an entirely different implementation of rate limiting because the plan left it discretion.

Layer 4: Hooks and Commands

Slash commands are the ergonomic layer. A command like /spec invokes your spec skill with the current conversation as context. /plan takes an approved spec and generates the implementation plan. /build executes the next unchecked task in the plan.

Commands are simple shell scripts or markdown files depending on your agent. In Claude Code, commands live in .claude/commands/. In Gemini CLI, they go in .gemini/commands/. They’re thin wrappers that invoke skills.

Hooks are more interesting. They fire automatically based on the agent’s current activity, without the developer having to issue a command. Agent-skills implements hooks that detect when you’re working on an API and activate the api-and-interface-design skill; touching UI files triggers frontend-ui-engineering. This converts SDD from a discipline that requires developer effort into one that’s largely automatic.

A simple hook that enforces spec-first behavior:

```bash
#!/bin/bash
# hooks/pre-build.sh: runs before the agent writes any code.
# Checks that an approved spec exists for the current feature branch.

BRANCH=$(git branch --show-current)
SPEC_FILE="docs/specs/${BRANCH}.md"

# Refuse to continue if no spec file matches the branch name
if [ ! -f "$SPEC_FILE" ]; then
  echo "ERROR: No spec found for branch $BRANCH" >&2
  echo "Run /spec to create one before writing code." >&2
  exit 1
fi

# Check for approval marker in spec
if ! grep -q "## Status: Approved" "$SPEC_FILE"; then
  echo "ERROR: Spec exists but has not been approved." >&2
  echo "Review $SPEC_FILE and add '## Status: Approved' when ready." >&2
  exit 1
fi
```

This is one enforcement strategy; there are others. Some teams use PR checks that require a linked spec. Some use commit-msg hooks that reject commits without a spec reference. The right choice depends on how much friction you’re willing to add to the fast path.
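The PR-check variant might look something like the sketch below. It assumes the PR description links the spec by its repository path and that approval is recorded with the same `## Status: Approved` marker; `checkLinkedSpec`, `GateResult`, and the injected `readSpec` callback are hypothetical names, not part of any CI product.

```typescript
type GateResult =
  | { ok: true; specPath: string }
  | { ok: false; reason: string };

// Checks a PR description for a linked spec and confirms the spec is approved.
// `readSpec` is injected so the check can run against a checkout, an API, or a test fixture.
function checkLinkedSpec(prBody: string, readSpec: (path: string) => string | undefined): GateResult {
  const match = prBody.match(/docs\/specs\/[\w.\/-]+\.md/);
  if (!match) {
    return { ok: false, reason: "PR description does not link a spec under docs/specs/" };
  }
  const specPath = match[0];
  const spec = readSpec(specPath);
  if (spec === undefined) {
    return { ok: false, reason: `Linked spec ${specPath} does not exist` };
  }
  if (!spec.includes("## Status: Approved")) {
    return { ok: false, reason: `Linked spec ${specPath} is not approved` };
  }
  return { ok: true, specPath };
}

// Example with an in-memory fixture standing in for the repository.
const specFixture = new Map<string, string>([
  ["docs/specs/invite-limits.md", "# Spec: Invite limits\n## Status: Approved\n"],
]);
console.log(checkLinkedSpec("Implements docs/specs/invite-limits.md", (p) => specFixture.get(p)));
```

Returning a reason string rather than a bare boolean lets the CI job tell the author exactly which step of the gate failed.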

Adapting for Different Repository Strategies

The framework above describes a single-repo baseline. Monorepos, multirepos, and trunk-based development each introduce wrinkles that need explicit handling.

Monorepo

In a monorepo, the main challenge is that a single feature can span multiple packages — a schema change in packages/db, a new API endpoint in packages/api, a new component in packages/web. A spec written only from the web package’s perspective will miss the downstream contract implications.

Adapt the spec template to require explicit cross-package impact assessment:

## Cross-Package Impact

| Package       | Change Type                                   | Owner     |
| ------------- | --------------------------------------------- | --------- |
| packages/db   | Schema migration — adds `invite_limit` column | @platform |
| packages/core | New method on InviteService                   | @backend  |
| packages/api  | New route /workspaces/:id/invite-config       | @backend  |
| packages/web  | New UI on settings page                       | @frontend |

## Contract Changes

If any package exposes a public interface that changes, document it here:
- `InviteService.createInvite()` — unchanged
- `POST /workspaces/:id/invites` — adds 429 response case

The context layer also needs to reflect monorepo boundaries. Your CLAUDE.md should specify which packages can depend on which, what the build system is, and how tests are scoped per package. A misplaced import (app/ depending on core/ correctly, but core/ accidentally importing from app/) is the kind of thing an agent will do without thinking unless boundaries are explicit.

For monorepos built on pnpm workspaces, Nx, or Turborepo, document the filter syntax:

## Commands (monorepo)

Run tests for one package:   pnpm test --filter=@myco/core
Run tests for all packages:  pnpm test
Build one package:           pnpm build --filter=@myco/api
Build everything:            pnpm build

## Package Dependency Rules
- packages/core: no dependencies on app packages
- packages/api: can depend on core, db, not on web
- packages/web: can depend on core, not on api internals
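Rules like these are easier to enforce when they also exist as data. A minimal sketch, assuming short package names and a hypothetical `isImportAllowed` helper; the `allowedDeps` table mirrors the three rules above and treats `db` as a leaf, which is an assumption.

```typescript
// Allowed dependency targets per package, following the example rules above.
// Package names are illustrative; `db` having no dependencies is an assumption.
const allowedDeps: Record<string, string[]> = {
  core: [],
  db: [],
  api: ["core", "db"],
  web: ["core"],
};

// True when package `from` may import from package `to`.
// Imports within the same package are always allowed; unknown packages may import nothing.
function isImportAllowed(from: string, to: string): boolean {
  if (from === to) return true;
  return (allowedDeps[from] ?? []).includes(to);
}

console.log(isImportAllowed("api", "core")); // allowed by the rules
console.log(isImportAllowed("core", "web")); // violates "core has no app dependencies"
```

The same table could back a CI boundary check or be quoted in the spec's Boundaries section, so the agent and the pipeline read from one source.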

The plan skill in a monorepo context should group tasks by package and require that cross-package changes respect the dependency graph — implement the deepest layer first.
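"Deepest layer first" is a topological ordering of the package graph. The sketch below derives that order with a depth-first traversal; the `dependsOn` graph is an assumption based on the example dependency rules, and in practice you would build it from your package manifests.

```typescript
// Package -> packages it depends on. This graph is an illustrative assumption.
const dependsOn: Record<string, string[]> = {
  "packages/api": ["packages/core", "packages/db"],
  "packages/web": ["packages/core"],
  "packages/core": [],
  "packages/db": [],
};

// Depth-first topological sort: a package is emitted only after all of its dependencies.
function implementationOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (pkg: string): void => {
    if (state.get(pkg) === "done") return;
    if (state.get(pkg) === "visiting") {
      throw new Error(`Dependency cycle detected at ${pkg}`);
    }
    state.set(pkg, "visiting");
    for (const dep of graph[pkg] ?? []) {
      visit(dep);
    }
    state.set(pkg, "done");
    order.push(pkg);
  };

  Object.keys(graph).forEach((pkg) => visit(pkg));
  return order;
}

console.log(implementationOrder(dependsOn));
// → ["packages/core", "packages/db", "packages/api", "packages/web"]
```

Grouping plan tasks in this order means each task can rely on the interfaces below it already existing, and a detected cycle is itself a finding worth raising in the spec.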

Multirepo

Multirepo setups have the inverse problem: there’s no single place to store the spec, and each repo has its own context documents. A feature that touches three repos needs a spec that can reference all three.

Two approaches work here. The first is a “platform spec repo”: a dedicated repository that contains only specs, context documents, and ADRs, and that every engineer references. Specs live there; implementations live in their respective repos. This is operationally simple but creates coordination overhead, because keeping the platform spec repo current requires discipline.

The second approach is a “distributed spec model” — each repo maintains its own context documents and skills, and a cross-repo feature spec is written in the repo where the change initiates, with explicit “downstream impact” sections that reference the other repos. The spec gets copied or linked to each downstream repo when that implementation phase begins.

My preference is the distributed model for teams up to ~50 engineers; the platform spec repo becomes worthwhile above that when cross-team coordination is genuinely complex. In both cases, the spec must be written before any repo’s implementation begins.

For context documents in a multirepo setup, each repo’s CLAUDE.md should include a “Cross-Repo Contracts” section:

## Cross-Repo Contracts

This repo is: @myco/notification-service

### Upstream dependencies (we consume these)
- @myco/auth-service: Provides user identity via JWT. 
  Schema: docs/contracts/auth-jwt-payload.md
- @myco/core-events: Event bus. 
  Schema: docs/contracts/events-v2.md

### Downstream dependents (they consume us)
- @myco/web: Reads notification preferences via REST.
  Contract: docs/contracts/notifications-api-v1.md
- @myco/mobile: Same contract as web (versioned together).

### Contract Change Process
Any change to our public API requires updating docs/contracts/ FIRST,
notifying #platform-contracts Slack channel, and waiting for acknowledgment
from downstream teams before deploying.

This transforms the context document from “what is this repo” into “where does this repo fit in the system” — the question an agent most needs answered when touching integration points.
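If the contract section is kept in a predictable shape, the "who needs to know" question becomes computable. This is a rough sketch under that assumption; the `RepoContracts` type and `reposToNotify` function are hypothetical, and the data simply restates the example section above.

```typescript
interface ContractLink {
  repo: string;     // the other repository
  contract: string; // path to the contract document
}

interface RepoContracts {
  repo: string;
  upstream: ContractLink[];   // contracts this repo consumes
  downstream: ContractLink[]; // contracts other repos consume from this one
}

// Restates the Cross-Repo Contracts example as data.
const notificationService: RepoContracts = {
  repo: "@myco/notification-service",
  upstream: [
    { repo: "@myco/auth-service", contract: "docs/contracts/auth-jwt-payload.md" },
    { repo: "@myco/core-events", contract: "docs/contracts/events-v2.md" },
  ],
  downstream: [
    { repo: "@myco/web", contract: "docs/contracts/notifications-api-v1.md" },
    { repo: "@myco/mobile", contract: "docs/contracts/notifications-api-v1.md" },
  ],
};

// Given the contract files a change touches, list the downstream repos that must acknowledge it.
function reposToNotify(contracts: RepoContracts, changedFiles: string[]): string[] {
  const changed = new Set(changedFiles);
  return contracts.downstream
    .filter((link) => changed.has(link.contract))
    .map((link) => link.repo);
}

console.log(reposToNotify(notificationService, ["docs/contracts/notifications-api-v1.md"]));
```

A spec skill could run a lookup like this while drafting the "downstream impact" section, so the list of affected teams comes from the contract data rather than from memory.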

Feature Branches vs. Trunk-Based Development

Trunk-based development (short-lived branches, frequent merges to main) interacts with SDD differently than feature-branch workflows.

In a trunk-based setup, the spec doesn’t live on a long-lived feature branch — it needs to be written, approved, and acted on quickly. Adapt the spec skill to produce lightweight specs for small changes (“any change in a single commit or same-day merge”) and full PRD-style specs only for larger features that will be implemented over several days. Matt Pocock’s to-prd skill takes the right approach here: rather than re-interviewing the developer, it synthesizes a PRD from conversation context and codebase exploration. This keeps the ceremony proportional to the risk.

For trunk-based teams, hooks are especially valuable. A commit-msg hook that requires a spec reference in the commit message (or a --spec-exempt marker with a mandatory justification) enforces SDD without requiring a separate review gate:

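#!/bin/sh
# hooks/commit-msg
# Git passes the path of the commit message file as $1.
# To enable: git config core.hooksPath hooks (and make this file executable)
COMMIT_MSG=$(cat "$1")
SUBJECT=$(head -n 1 "$1")

# Allow merge commits and fixup commits (checked against the subject line only,
# so a body line that happens to start with "Merge" cannot bypass the check)
if echo "$SUBJECT" | grep -qE "^(Merge|fixup!)"; then
  exit 0
fi

# Require spec reference or exemption anywhere in the message
if ! echo "$COMMIT_MSG" | grep -qE "(spec:|--spec-exempt:)"; then
  echo "Commit rejected: include a spec reference or exemption reason." >&2
  echo "  Good: 'feat(invites): add rate limit [spec: docs/specs/invite-limits.md]'" >&2
  echo "  Exempt: 'chore: update deps [--spec-exempt: dependency bump, no behavior change]'" >&2
  exit 1
fi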

Feature-flag workflows add another layer. In Superpowers’ design, implementation plans begin by creating a feature flag that gates the new code, so incomplete features can be merged to main safely. Encoding this in your plan skill (“Step 0 of every feature plan: add feature flag in config/features.ts”) makes it automatic rather than something the developer has to remember.
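If you want to verify the rule rather than trust it, a plan check is easy to sketch. The example below assumes plans are written as numbered lines such as "Step 0: ..." or "1. ..." (a hypothetical format; adjust the pattern to whatever your plan template produces) and tests whether the lowest-numbered step mentions a feature flag in the expected file:

```python
import re
from typing import Optional

# Assumed plan format: "Step 0: do something" or "1. do something"
STEP_LINE = re.compile(r"^\s*(?:Step\s+)?(\d+)[.:)]\s*(.+)$", re.IGNORECASE)


def first_step(plan_text: str) -> Optional[str]:
    """Return the text of the lowest-numbered step, or None if no steps are found."""
    steps = []
    for line in plan_text.splitlines():
        match = STEP_LINE.match(line)
        if match:
            steps.append((int(match.group(1)), match.group(2).strip()))
    if not steps:
        return None
    return min(steps)[1]


def starts_with_feature_flag(plan_text: str, flag_file: str = "config/features.ts") -> bool:
    """True if the first step mentions a feature flag and the flag file."""
    step = first_step(plan_text)
    return step is not None and "feature flag" in step.lower() and flag_file in step
```

This is a string check, not proof that the flag actually gates the code; it only confirms the plan was written in the agreed shape.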

Polyglot and Mixed-Technology Repos

If your codebase mixes languages — a Python backend, a TypeScript frontend, a Go service — the framework needs to accommodate different toolchains without creating a separate skill library per language.

The solution is skill parameterization. A core spec skill template works across languages; a “tech adapter” layer provides language-specific command templates, test framework conventions, and code style examples. Keep one spec skill, but reference the right adapter:

## Tech Adapters

This repo uses two runtimes. The spec MUST specify which runtime each task
belongs to. The following adapters define runtime-specific conventions:

- Python (backend): see docs/sdd/adapters/python.md
- TypeScript (frontend): see docs/sdd/adapters/typescript.md

Cross-runtime tasks (e.g., API contract changes) require sign-off in both 
adapter sections.
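To make the parameterization concrete, here is a rough sketch of how a task could be mapped to its adapter documents. The `Task` type and `adapters_for` function are hypothetical names for illustration; the adapter paths are the ones from the example above.

```python
from dataclasses import dataclass

# Adapter documents, keyed by runtime name (paths from the example above)
ADAPTERS = {
    "python": "docs/sdd/adapters/python.md",
    "typescript": "docs/sdd/adapters/typescript.md",
}


@dataclass
class Task:
    title: str
    runtimes: tuple[str, ...]  # every runtime this task touches


def adapters_for(task: Task) -> list[str]:
    """Return the adapter docs a task must follow; reject tasks with no known runtime."""
    if not task.runtimes:
        raise ValueError(f"task {task.title!r} does not specify a runtime")
    unknown = [r for r in task.runtimes if r not in ADAPTERS]
    if unknown:
        raise ValueError(f"task {task.title!r} names unknown runtimes: {unknown}")
    return [ADAPTERS[r] for r in task.runtimes]
```

A cross-runtime task simply lists both runtimes, so it picks up both adapters, which mirrors the "sign-off in both adapter sections" rule.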

The SKILL.md Anatomy

Every skill file has the same structure regardless of which framework you draw from. Understanding this structure lets you write custom skills that feel native:

Frontmatter: name, description, trigger conditions. The description is especially important in agent systems that use it for auto-discovery: it should describe both what the skill does and when it applies.

Overview: one paragraph, what this skill does and the problem it solves.

When to use / when NOT to use: the explicit trigger conditions and exclusions. Omitting the exclusions leads to the skill being invoked for everything.

The workflow: phase-by-phase process with explicit exit criteria between phases. Don’t write prose instructions — write steps an agent can follow mechanically.

Anti-rationalization table: excuses the agent might generate to skip the skill, with documented rebuttals. This is agent-skills’ most useful structural innovation. A table that says “Skip this for simple changes” → “Simple changes don’t need long specs, but they still need acceptance criteria” actively prevents the most common circumvention.

Red flags: signs that the skill is being misapplied or that something is going wrong.

Verification gate: a checklist that must pass before the skill exits. Never let this be self-assessed; it must produce output that a human can review.

---
name: api-contract-review
description: Review any proposed API change for backward compatibility before 
             implementation. Use when any public API method, endpoint, or event
             schema is being added, changed, or removed.
---

# API Contract Review

## Overview

Public APIs have consumers that don't deploy in lockstep with the API itself.
This skill ensures that any API change is backward-compatible or, if it's 
breaking, is planned as a versioned migration rather than a surprise.

## When to Use
- Adding non-optional fields to, or removing fields from, a response schema
- Changing the type or validation of an existing field
- Adding a new required parameter to an existing endpoint
- Removing an endpoint

## When NOT to Use
- Adding purely optional fields to a response (backward-compatible)
- Internal implementation changes with no public interface change
- New endpoints that don't modify existing contracts

## Process

### Phase 1: Identify consumers

Before evaluating the change, list every known consumer:
- Internal services (check cross-repo contracts docs)
- External integrations (check webhook documentation)
- Versioned SDKs or clients

If the consumer list is incomplete, stop and research before proceeding.

### Phase 2: Assess compatibility

For each field/endpoint being changed:

| Change             | Backward Compatible? | Migration Required?      |
| ------------------ | -------------------- | ------------------------ |
| Add optional field | Yes                  | No                       |
| Add required field | No                   | Yes — version bump       |
| Remove field       | No                   | Yes — deprecation period |
| Change field type  | No                   | Yes — version bump       |
| Rename field       | No                   | Yes — alias period       |

### Phase 3: Plan the migration (if needed)

Breaking changes require a migration plan:
1. Add new version alongside old (no removal yet)
2. Deprecation notice in changelog and docs
3. Consumer acknowledgment (see cross-repo contract process)
4. Removal only after agreed deprecation period

## Anti-Rationalization Table

| Excuse                            | Response                                                                   |
| --------------------------------- | -------------------------------------------------------------------------- |
| "The consumer should just update" | Consumers may not deploy when we do. Breaking changes break them silently. |
| "It's an internal API"            | Internal doesn't mean no consumers. Check first.                           |
| "The field was barely used"       | One consumer relying on it is enough to cause an incident.                 |

## Verification Gate

- [ ] Consumer list is complete
- [ ] Each changed field has a compatibility assessment
- [ ] Breaking changes have a migration plan with timeline
- [ ] Migration plan has been shared with consuming team leads
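Because every skill shares this anatomy, you can lint for it. The sketch below is an assumption about how you might do that, not a feature of any of the frameworks mentioned: it checks a SKILL.md for the frontmatter keys and the section headings used in the example above. The example has no "Red flags" section, so that one is left out of the required list.

```python
import re

# Headings taken from the example skill above; adjust to your own template
REQUIRED_SECTIONS = [
    "Overview",
    "When to Use",
    "When NOT to Use",
    "Process",
    "Anti-Rationalization Table",
    "Verification Gate",
]
REQUIRED_FRONTMATTER = ["name", "description"]


def lint_skill(text: str) -> list[str]:
    """Return a list of structural problems found in a SKILL.md file."""
    problems = []
    for key in REQUIRED_FRONTMATTER:
        # The key must start a line and have a non-empty value on that line
        if not re.search(rf"^{key}:[ \t]*\S", text, re.MULTILINE):
            problems.append(f"missing frontmatter key: {key}")
    # Only level-2 headings count; "### Phase 1" style subheadings are ignored
    headings = {
        m.group(1).strip()
        for m in re.finditer(r"^##\s+(.+)$", text, re.MULTILINE)
    }
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    if not re.search(r"^- \[[ xX]\] ", text, re.MULTILINE):
        problems.append("verification gate has no checklist items")
    return problems
```

An empty list means the file has the expected shape. It says nothing about whether the content is any good, which still needs a human reviewer.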

Trade-offs, Failure Modes, and When to Skip It

SDD is not free. It has real costs and real failure modes. A team that adopts it without understanding both will end up with ritual documents that nobody reads.

The ceremony tax is real. Full SDD with gated phases can add roughly 20–40 minutes of overhead per feature. For a one-person project or a startup in early exploration mode, this is often too much. The right response isn’t to skip specs; it’s to scale them. A two-sentence spec and a three-item task list take five minutes and still prevent most of the “it built the wrong thing” problems.

Specs go stale fast. A spec written on day one of implementation is partially wrong by day five. If you treat specs as living documents — updated when decisions change, committed alongside code — this is manageable. If you treat them as approval artifacts and never look at them again, they become noise. The discipline required is the same as keeping tests up to date, and it fails for the same reasons.

Agents can satisfy the form without the substance. An agent instructed to “write a spec before coding” will write a spec. Whether that spec reflects genuine understanding of the problem or is just plausible-looking boilerplate depends on whether the spec template forces concrete, testable claims. “The user can manage their account settings” is a spec sentence that tells nobody anything. “When a user submits the settings form, all changed fields are persisted within 200ms and a success toast is shown” is a spec sentence.
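One cheap guard is a lint pass over acceptance criteria. The heuristic below is deliberately crude and entirely an assumption of this sketch: it treats a sentence as "testable-looking" if it contains a trigger word (when, if, given, after, whenever) or a number, and flags everything else for a human to look at. It will produce false positives and false negatives; its only job is to surface sentences like the first example above.

```python
import re

# Assumed markers of a concrete criterion: a trigger condition or a measurable number
TRIGGER = re.compile(r"\b(when|if|given|after|whenever)\b", re.IGNORECASE)
NUMBER = re.compile(r"\d")


def looks_testable(sentence: str) -> bool:
    """Crude check: does the sentence name a trigger or a measurable quantity?"""
    return bool(TRIGGER.search(sentence) or NUMBER.search(sentence))


def vague_criteria(criteria: list[str]) -> list[str]:
    """Return the criteria that a reviewer should push back on."""
    return [c for c in criteria if not looks_testable(c)]
```

Applied to the two sentences above, the first is flagged and the second passes.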

SDD doesn’t replace code review. Some developers adopt SDD expecting it to catch everything. It catches requirement misalignment and architectural disagreements early — which is where they’re cheapest to fix. It doesn’t catch implementation bugs, performance regressions, or security vulnerabilities that emerge during execution. Those require different tools.

When to skip entirely:

Single-file changes with unambiguous requirements (fix this typo, update this version number)
Pure refactors with no behavior change, where the test suite is the spec
Exploratory prototypes that are explicitly throwaway — the point is to learn, not to build correctly

When SDD is especially valuable:

Any change that crosses a service, package, or team boundary
Features with ambiguous or contested requirements
Changes in critical paths (auth, billing, data migration)
Any time you’ve had an “it built the wrong thing” incident in the last sprint
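These two lists can be folded into a small triage function. The sketch below is one possible encoding; the `Change` fields, the three level names, and the rule that the "especially valuable" triggers override the skip conditions are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Change:
    files_touched: int
    behavior_change: bool
    throwaway_prototype: bool = False
    crosses_boundary: bool = False       # service, package, or team boundary
    critical_path: bool = False          # auth, billing, data migration
    ambiguous_requirements: bool = False
    recent_incident: bool = False        # "built the wrong thing" last sprint


def sdd_level(change: Change) -> str:
    """Return 'full', 'lightweight', or 'skip' for a proposed change."""
    # Assumption: high-risk signals win over any reason to skip
    if (change.crosses_boundary or change.critical_path
            or change.ambiguous_requirements or change.recent_incident):
        return "full"
    if change.throwaway_prototype:
        return "skip"
    if not change.behavior_change:
        return "skip"  # pure refactor: the test suite is the spec
    if change.files_touched == 1:
        return "skip"  # single-file change with unambiguous requirements
    return "lightweight"
```

Everything that is neither high-risk nor skippable falls through to the scaled-down spec described earlier: a couple of sentences and a short task list.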

Putting It Together: Directory Layout

Here’s a directory structure for a monorepo with a complete SDD framework:

/
├── CLAUDE.md                          # Agent context (root level)
├── CONTEXT.md                         # Domain vocabulary and glossary
├── docs/
│   ├── adr/                           # Architecture Decision Records
│   │   ├── 0001-use-zod-validation.md
│   │   └── 0002-event-sourcing-billing.md
│   ├── sdd/
│   │   ├── adapters/                  # Language/framework-specific conventions
│   │   │   ├── typescript.md
│   │   │   └── python.md
│   │   └── templates/                 # Spec and plan templates
│   │       ├── spec-template.md
│   │       └── plan-template.md
│   ├── specs/                         # Approved specs (committed, versioned)
│   └── contracts/                     # Cross-repo API contracts
│       ├── auth-jwt-payload.md
│       └── events-v2.md
├── .claude/
│   └── commands/                      # Slash commands (Claude Code)
│       ├── spec.md
│       ├── plan.md
│       ├── build.md
│       └── review.md
├── skills/                            # SKILL.md files
│   ├── spec/SKILL.md
│   ├── plan/SKILL.md
│   ├── build/SKILL.md
│   ├── api-contract-review/SKILL.md
│   └── cross-package-review/SKILL.md
├── hooks/                             # Agent lifecycle hooks
│   ├── pre-build.sh
│   └── commit-msg
└── packages/
    ├── core/
    │   └── CLAUDE.md                  # Package-level context
    ├── api/
    │   └── CLAUDE.md
    └── web/
        └── CLAUDE.md

Package-level CLAUDE.md files inherit from the root but add package-specific detail — which tests to run, which patterns to follow, what this package owns and what it doesn’t. An agent working in packages/api reads both the root context and the package-level context.
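How the context files actually get loaded depends on your agent, so treat the following as an illustration of the lookup order rather than a description of any tool's behavior. Given a working directory relative to the repo root and the set of files in the repo, it returns the context files from the root down, most general first:

```python
from pathlib import PurePosixPath


def context_chain(working_dir: str, tracked_files: set[str],
                  filename: str = "CLAUDE.md") -> list[str]:
    """Return context files from the repo root down to working_dir, root first.

    working_dir is relative to the repo root (for example "packages/api");
    tracked_files holds repo-relative paths using forward slashes.
    """
    parts = PurePosixPath(working_dir).parts
    chain = []
    # depth 0 is the repo root, depth len(parts) is working_dir itself
    for depth in range(len(parts) + 1):
        candidate = str(PurePosixPath(*parts[:depth]) / filename)
        if candidate in tracked_files:
            chain.append(candidate)
    return chain


# Usage with the layout above
tracked = {"CLAUDE.md", "packages/api/CLAUDE.md", "packages/web/CLAUDE.md"}
chain = context_chain("packages/api", tracked)
```

Here `chain` contains the root file followed by the packages/api file; packages/web is not on the path, so it is left out.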

Practical Starting Point

If you’re starting from scratch, don’t try to build the full framework at once. Build it incrementally, in order of value:

Start with a single CLAUDE.md at the root of your repo. Write what you wish an onboarding engineer knew on day one. Commit it. Use it for one week and note where the agent still misses context. Update it.

Next, write a minimal spec skill and a slash command that invokes it. Use it for every feature-sized change for two weeks. Note where the spec template asks for information you never need, and where it fails to ask for information you always end up clarifying manually. Revise.

After that, add a plan skill and a docs/specs/ directory. Establish the habit of committing specs alongside code.

Finally, add hooks to automate the parts that require the most discipline to remember. Hooks should enforce what the team has already agreed to, not introduce new requirements.

The framework that’s maintained and actually used is more valuable than the perfect framework that gathers dust.

More Use Cases for SDD

The obvious use case for SDD is new feature development — writing a spec before building something. But the same discipline applies well beyond that.

Legacy codebase modernization

When migrating or refactoring a legacy system incrementally, SDD’s delta model fits naturally. A spec for “migrate billing service from legacy Rails to Go” doesn’t describe the destination system from scratch — it describes what’s changing, what must remain compatible, what the rollback plan looks like, and how traffic will be split during the transition. OpenSpec’s per-change artifact structure is particularly well suited here. The alternative (“the agent has context about both the old and new system and will figure it out”) has a poor track record in production migrations.

Cross-team API negotiation

When a backend team is building an endpoint that a frontend team will consume, both teams need to agree on the contract before either writes a line of code. An SDD spec for the backend feature that includes an explicit “API Contract” section (request/response shapes, error codes, versioning intent) serves as the artifact both teams review and approve. The backend builds to the spec; the frontend mocks against it. Misalignments surface in spec review, not in integration testing the day before launch.

Open source contribution workflows

For maintainers of actively developed open source projects, requiring a spec before a PR reduces the volume of well-intentioned contributions that implement the wrong thing. A lightweight spec template in the contributing guide (“what problem does this solve, what’s the proposed solution, what are the trade-offs, what alternatives did you consider”) gives maintainers enough signal to review intent before reviewing implementation. It also makes the contributor think harder before coding, which improves both the code and the review conversation.

Consulting and contract work

A spec serves as a client approval mechanism before implementation begins. A consultant who delivers a spec and gets explicit sign-off before writing code has documented evidence that the client understood and agreed to the scope. This protects both sides: the client gets to correct misunderstandings before they become expensive rework; the consultant avoids the “this isn’t what we asked for” conversation after delivery. The spec also serves as the basis for change orders when scope inevitably evolves.

Post-incident prevention

When a team has a production incident caused by a feature that behaved unexpectedly, the postmortem often reveals that different engineers held different mental models of what the feature was supposed to do. SDD prevents this by making the mental model explicit before implementation. After an incident, adding a mandatory spec review for the affected system area (specifically requiring that the spec name the failure mode that occurred and document how the new implementation prevents it) turns postmortems into systematic prevention rather than one-off fixes.

Technical debt remediation

Refactoring is often treated as an implementation concern, but the riskiest refactors fail because the spec was never clear to begin with. An agent tasked with “clean up this module” without a spec will make refactoring decisions based on local code patterns, not on the system’s intended behavior. A spec that describes what the module is supposed to do (what it owns, what it must not do, what contracts it upholds) gives the agent a target to refactor toward rather than just cleaning up what it can see.

Junior developer onboarding

New engineers don’t know what questions to ask before they start coding. An SDD spec template embedded in your contributing guide teaches them: here are the things you should know before you write code on this codebase — the commands, the affected packages, the invariants, the success criteria. Filling out the template is the onboarding exercise. Teams that have done this report that junior engineers produce significantly fewer “build the wrong thing” PRs in their first quarter.

Staged rollouts and feature flags

Any feature released incrementally (dark launch, canary rollout, A/B test) should have a spec that covers the full lifecycle, not just the implementation. The spec should describe the flag name and location, the rollout stages and their percentages, the metrics that determine when to advance to the next stage, the rollback criteria, and who has the authority to flip the flag. This transforms feature flags from implementation details into explicit policy decisions that the whole team can reason about.
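The fields listed above map naturally onto a structured record that can be validated before the spec is approved. The sketch below is an assumed shape, not a standard: the type names, field names, and validation rules (percentages strictly increasing and ending at 100, every non-final stage naming an advance metric) are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class RolloutStage:
    percent: int          # share of traffic exposed at this stage
    advance_when: str     # metric condition for moving to the next stage


@dataclass
class RolloutSpec:
    flag_name: str
    flag_location: str
    stages: list[RolloutStage]
    rollback_criteria: str
    flag_owner: str       # who has authority to flip the flag


def validate_rollout(spec: RolloutSpec) -> list[str]:
    """Return a list of gaps in a rollout spec; empty means it looks complete."""
    problems = []
    required = [
        ("flag name", spec.flag_name),
        ("flag location", spec.flag_location),
        ("rollback criteria", spec.rollback_criteria),
        ("flag owner", spec.flag_owner),
    ]
    for label, value in required:
        if not value.strip():
            problems.append(f"missing {label}")
    percents = [stage.percent for stage in spec.stages]
    if not percents:
        problems.append("no rollout stages")
    else:
        if any(not 0 < p <= 100 for p in percents):
            problems.append("stage percentages must be between 1 and 100")
        if percents != sorted(set(percents)):
            problems.append("stage percentages must strictly increase")
        if percents[-1] != 100:
            problems.append("final stage must reach 100%")
    # The final stage has nothing to advance to, so it needs no metric
    for stage in spec.stages[:-1]:
        if not stage.advance_when.strip():
            problems.append(f"stage {stage.percent}% has no advance metric")
    return problems
```

A check like this fits in the spec skill's verification gate: the spec is not approved until the list comes back empty.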

Compliance and regulated workloads

In healthcare (HIPAA), financial services (SOX, PCI-DSS), and government (FedRAMP), specific changes require documented evidence that requirements were reviewed, implementation matched intent, and testing verified the behavior. SDD produces these artifacts naturally: a spec shows the requirement was explicit, a plan shows the implementation was deliberate, a verification gate shows the tests passed. With a custom framework, you can shape these artifacts to match the exact format your compliance team needs, reducing the “translate engineering docs into compliance evidence” burden.

Conclusion

AI agents are good at executing plans. What they can’t do is figure out the right plan from a vague brief. Spec-Kit and OpenSpec both handle this well: Spec-Kit with structured phase enforcement for greenfield and standardization use cases, OpenSpec with a lighter delta model for iterative brownfield development.

Building a custom framework is worth the overhead when your codebase has specific invariants and vocabulary, your process includes non-software gates, your repo topology doesn’t fit single-repo assumptions, or you’re in a regulated industry where the artifact format matters as much as the artifact content. In those cases, the discipline overhead of adapting a general framework often exceeds the cost of building a narrow, purpose-fit one.

The decision to build custom is wrong if you want something working today. Start with Spec-Kit or OpenSpec, get reps with the discipline, and build custom only once you’ve outgrown what they offer.

Either way, the most important step is the same: write the spec before you write the code.

All examples in this post were written for illustrative purposes. Test them against your actual toolchain before treating them as production configuration. Hooks in particular vary significantly by OS, shell, and agent version.

“Without explicit specifications, LLM agents optimize for fluency, not correctness. With them, they optimize for alignment.” - Rushi

Rushi's

Ctrl+AI+Ship