Spec-First, Spec-Anchored, Spec-as-Truth: The Three Levels of Spec-Driven Development
A practitioner’s guide to knowing which one you actually need.
“Spec-driven” is not one practice. It is a ladder—spec-first (write then ship, spec often rots), spec-anchored (spec and code stay checked in CI), and spec-as-truth / spec-as-source (humans edit only the spec; code is generated). This piece maps the three levels (after Birgitta Böckeler on martinfowler.com), their tradeoffs, a cheat sheet, and a short heuristic for choosing. Match the rung to the problem; do not pay for discipline you will not enforce.
Table of contents
- Why this distinction matters
- The shared premise
- Level 1: Spec-First
- Level 2: Spec-Anchored
- Level 3: Spec-as-Truth (Spec-as-Source)
- A side-by-side cheat sheet
- How to choose (a practical heuristic)
- Common pitfalls
- Where the industry is heading
Why this distinction matters
Spec-Driven Development (SDD) is getting renewed attention for a simple reason: AI agents exposed how expensive ambiguous intent is. A junior who misreads a ticket asks a question; an agent may ship 800 lines of plausible code that quietly violates assumptions nobody wrote down.
The industry has pushed intent—the spec—upstream of the code. But “spec-driven” is a spectrum, not one practice. Contract-style specs (OpenAPI, Protobuf, AsyncAPI, JSON Schema) have long been spec-first; that pattern has spread to full systems. On martinfowler.com, Birgitta Böckeler (Thoughtworks) names three levels of rigor that many teams hit by accident, without deciding explicitly:
- Spec-first — write the spec, then build.
- Spec-anchored — keep the spec alive and enforced as the system evolves.
- Spec-as-truth (Böckeler: spec-as-source) — the spec is the artifact; code is regenerated output.
These are not competing philosophies. They are increasing levels of investment, each solving a failure mode of the one before it. If you don’t know which one you’re practicing, you’re almost certainly getting the cost of the higher level and the benefit of the lower one.
The shared premise
Before the differences, the thing all three agree on:
The specification — not the code — is the primary expression of intent.
Code is a projection of intent into an executable form. Docs are a projection into a human-readable form. Tests are a projection into a verifiable form. In a code-first world, all three drift apart because each is maintained separately. SDD says: pick one artifact, make it authoritative, and derive the rest.
What changes across the three levels is how authoritative, how enforced, and how long-lived the spec is.
Level 1: Spec-First
Definition. The spec is written before the code, used to align humans and agents during the build, and then… often quietly forgotten.
Workflow.
write spec → generate/build code → ship → maintain code
↑
spec drifts here
What it looks like in practice. A developer writes a one-page spec (inputs, outputs, constraints, edge cases, errors) in markdown, hands it to a tool or teammate, and ships code from it. The file lands in /specs or a PR; after that, people edit the code, not the spec.
Tools such as GitHub Spec Kit and Amazon Kiro are largely spec-first: Spec Kit often uses a branch per spec, so the spec’s lifetime matches the change request, not the feature.
What it buys you.
- Fewer wasted AI-generation cycles — the agent stops improvising on decisions that should be yours.
- A forcing function for thinking. Most of the value of a spec is extracted during the act of writing it.
- Parallelism. Consumers can build against the spec before the provider has shipped.
What it doesn’t buy you.
- Long-term alignment. Once the code exists, the spec is a historical document. Six months later it describes a system that no longer exists.
- Enforcement. Nothing fails a build when the code diverges from the spec. Enforcement is social and manual.
When it’s the right call.
- Greenfield features with clear requirements.
- One-off scripts or prototypes.
- Teams taking their first step into SDD. Honestly, this is where you should start.
Analogy. Spec-first is like reading the recipe once before cooking. Useful, but nobody’s checking the pot against the recipe at step 12.
Level 2: Spec-Anchored
Definition. The spec is a living artifact that evolves with the system and is mechanically enforced against the code. Drift is detected automatically, not by a human remembering to look.
Workflow.
spec ⇄ code
↕ ↕
CI gates, contract tests, regeneration pipelines
The spec and the code coexist as peers. When either changes, the other is checked against it, and the build fails if they disagree.
What it looks like in practice. A shared, governed spec (not owned by a single team). On every PR, CI runs contract tests (e.g. Pact, Specmatic), schema or API validators, BDD or consumer-driven checks, and where the stack allows, static type checks (e.g. Sorbet on Ruby). Failing checks block merges. The spec is updated before the behavior change, then the implementation is verified against it.
Real examples:
- Netflix federated GraphQL treats the shared schema as the contract across Domain Graph Services.
- Shopify uses Sorbet (Ruby static types) so types at boundaries and on methods are checked in CI on a large monolith.
- A financial-services OpenAPI SDD case study reported a large drop in integration cycle time (one figure cited is ~75%) after anchoring specs in CI; your mileage will vary.
What it buys you.
- Drift becomes impossible to ignore. The build breaks.
- The spec stays trustworthy. Newcomers and AI agents can rely on it as documentation.
- Parallel development gets safer — consumers aren’t building against a mirage.
What it costs you.
- Organizational friction. Provider teams have to give up sole ownership of the contract.
- Tooling investment. Someone has to wire up contract tests, own the central repo, maintain the mocks.
- Discipline. The spec has to be updated first, not last.
When it’s the right call.
- Production systems with multiple teams depending on the same interface.
- Microservice architectures where integration mismatches are expensive.
- Anywhere “works on my machine” has ever cost you a weekend.
This is the sweet spot for many production systems: most of the benefit of spec-as-truth with much less tooling.
Analogy. Spec-anchored is like a building with inspectors. The blueprint and the construction are checked against each other at each milestone. You can still do the work by hand, but you can’t quietly add a wall that isn’t on the plan.
Level 3: Spec-as-Truth (Spec-as-Source)
Definition. The spec is the only thing humans edit. Code is a generated, disposable output. Any behavioral change means changing the spec and regenerating — never editing code directly.
Workflow.
spec (human-edited)
↓
[generation pipeline / coding agent]
↓
code (machine-produced)
Generated files typically carry a header like // GENERATED FROM SPEC — DO NOT EDIT. If someone edits the code directly, the next regeneration wipes their change. That’s not a bug; that’s the whole point. It forces all intent through the spec.
What it looks like in practice.
- Protocol Buffers / gRPC. You edit the
.protofile. You never edit the generated stubs. - Uber has described spec-as-source for a design system across multiple stacks (e.g. iOS, Android, web, server-driven UI). One spec, many generated clients.
- Tessl and similar frameworks push this pattern into general application code with LLM-driven regeneration.
- Embedded / safety-critical systems have done this for decades via model-driven engineering.
The rebuild test. A good litmus test for whether you’re actually at this level: delete src/, hand an agent the spec, and regenerate. Does the result match? If not, you have implicit decisions living in developer memory that aren’t in the spec. The gaps that surface are rarely missing endpoints — they’re things like “why did we use this caching strategy?” or “why does cancellation return 200 on duplicate requests?” Those rationales need to move from people’s heads into the spec, or your system isn’t actually rebuildable.
What it buys you.
- Zero drift. By construction.
- Multi-target generation. One spec, many implementations.
- True source-of-truth ergonomics. When something changes, there is exactly one place to change it.
- A codebase you can rebuild from documentation alone.
What it costs you.
- Spec expressiveness becomes the bottleneck. If the spec language can’t describe it, you can’t build it — or you smuggle it in via escape hatches that break the pure model.
- Debugging crosses an abstraction boundary. The thing running is not the thing you edited.
- With LLM-based generation, non-determinism enters. The same spec may not produce byte-identical code twice, which complicates diffs, reviews, and auditability.
- It’s high-ceremony. Bad fit for exploratory work, research code, or anything where the shape of the solution is still being discovered.
When it’s the right call.
- Multi-target SDKs (you need the same logic in five languages).
- Compliance-heavy or safety-critical domains.
- Stable, well-understood problem spaces with slow-changing requirements.
- API stubs, schemas, wire protocols — the places it already dominates.
Analogy. Spec-as-truth is like a 3D printer and a CAD file. You don’t sand the printed part into shape. You fix the CAD file and print again.
A side-by-side cheat sheet
| Dimension | Spec-First | Spec-Anchored | Spec-as-Truth |
|---|---|---|---|
| Primary artifact humans edit | Code | Both spec and code | Spec only |
| Spec lifetime | Until code ships | Lifetime of the system | Lifetime of the system |
| Drift risk | High | Low (enforced by CI) | None (by construction) |
| Enforcement | Social | Automated gates, contract tests | Regeneration |
| Upfront investment | Low | Medium | High |
| Organizational friction | Low | Medium | High |
| Best for | Features, prototypes, onboarding to SDD | Production systems, multi-team APIs | Multi-target SDKs, protocols, safety-critical code |
| Failure mode if neglected | Drift | Brittle CI, ignored tests | Escape hatches into hand-edited code |
How to choose (a practical heuristic)
Ask three questions, in order.
1. Will more than one team or agent need to agree on this interface? If no, spec-first is probably enough. Write the spec, use it during the build, move on.
2. Will this system outlive the team that built it? If yes, you need spec-anchored. The spec must be mechanically enforced, or it will rot — and code-only documentation rots faster than most teams expect.
3. Do you need the same logic in multiple languages / runtimes, or does compliance require a rebuildable system? If yes, spec-as-truth earns its keep. If no, it’s overkill, and the ceremony will slow you down more than the drift would.
A useful pattern: start at spec-first, upgrade specific contracts to spec-anchored when drift starts hurting, and only pull the spec-as-truth lever for the narrow slice where multi-target or compliance demands it. Most mature teams end up with all three coexisting at different layers of the system — spec-first for internal modules, spec-anchored at service boundaries, spec-as-truth for SDKs and wire protocols.
Common pitfalls
Claiming spec-anchored without CI gates. If the build still passes when the spec and code disagree, you are spec-first with extra steps. Enforcement is the point.
Treating the spec as a design doc. A spec says what the system does. A design doc says why. They are not the same artifact. Specs that conflate the two are too long to maintain and too vague to generate from.
Editing generated code “just this once.” In a spec-as-truth workflow, this is the beginning of the end. Either the spec language was inadequate (fix that) or the discipline broke (fix that). Both are fixable. “Just this once” is not.
Over-specifying too early. Exhaustive specs for exploratory code waste the one commodity exploratory code needs: speed of iteration. Spec-first is a feature here, not a limitation.
Assuming AI agents want the same spec humans want. Humans tolerate ambiguity and infer context. Agents need explicit preconditions, error cases, and examples. A spec optimized for human onboarding may still under-specify for an agent. Write for both audiences, or pick one and commit.
Where the industry is heading
Specs are becoming shared contracts for several agents in a pipeline (plan, implement, review) and for people. That only works if the spec is trustworthy, so teams will often graduate from spec-first to spec-anchored on the boundaries that matter.
Spec-as-truth is still the hard edge for general application code: it demands a spec language and generators good enough to replace hand edits. For APIs, schemas, and protocols, the pattern is old news. For whole apps, LLM-assisted regeneration is making spec-as-truth more plausible, with non-determinism as a real operational concern.
The useful split is not “most advanced wins” but fit: spec-first where exploration speed matters, spec-anchored where drift is expensive, spec-as-truth where one definition must drive many targets or audits—and clarity about which mode you are in.