Spec-Driven Development (SDD): A Technical Deep Dive into the Methodologies Reshaping AI-Assisted Engineering
Vibe coding broke as fast as it shipped. Spec-Driven Development is the industry’s course correction — putting structured specs, not chat prompts, at the center of AI-assisted engineering. This post compares the leading SDD frameworks, their trade-offs, and when (or whether) to adopt one.
Table of Contents
- Introduction: From Vibe Coding to Verified Intent
- The SDD Maturity Spectrum
- The Core SDD Lifecycle
- Framework Deep Dives
- Comparative Analysis
- The Case Against SDD
- The Case For SDD
- Practical Recommendations
- Looking Ahead
Introduction: From Vibe Coding to Verified Intent
In early 2025, Andrej Karpathy coined the term “vibe coding” — the practice of casually prompting AI models to generate code with minimal structure or planning. Within months, the approach became ubiquitous. By mid-2025, 25% of Y Combinator’s Winter 2025 cohort reportedly had codebases that were 95% AI-generated.
The results were impressive — and deeply fragile. Projects that spun up in hours collapsed under maintenance. AI agents hallucinated APIs, mixed library versions, introduced unintended side effects, and confidently declared tasks “complete” when they were anything but. The core problem was not one of AI capability, but of specification: developers struggled to articulate precise intent, and AI had no persistent memory of architectural decisions or constraints.
Spec-Driven Development (SDD) emerged as the industry’s response: a paradigm that places structured, often machine-readable specifications at the center of the development lifecycle, treating the spec, not the code, as the primary artifact. Code becomes the “last-mile” output, generated from and validated against a rigorously defined specification that serves as the single source of truth for both human developers and AI agents.
This post examines the leading SDD frameworks — GitHub Spec-Kit, OpenSpec, BMAD Method, Amazon Kiro, Tessl, and the deliberate choice of using no framework at all — along with the arguments for and against the entire paradigm.
The SDD Maturity Spectrum
Before comparing tools, it helps to understand that SDD is not monolithic. The community has broadly identified three maturity levels:
| Maturity Level | Description | Spec Lifecycle | Who Targets This |
|---|---|---|---|
| Level 1: Spec-First | A spec is written for a specific task and consumed by AI during development. It may be discarded afterward. | Ephemeral — created, used, forgotten | Manual SDD workflows, lightweight tools |
| Level 2: Spec-Anchored | The spec is a living document maintained throughout a feature’s lifecycle. Changes start with the spec; code is regenerated accordingly. | Persistent — updated with each change | Spec-Kit, OpenSpec, Kiro, BMAD |
| Level 3: Spec-as-Source | The spec is the only artifact humans edit. Code is a transient, compiled output — never directly modified by hand. | Permanent — code is disposable | Tessl Framework (aspirational), GitHub’s long-term vision |
Most current tools operate at Level 2 and aspire toward Level 3. The distinction matters because it sets expectations: a Level 1 tool brings discipline to prototyping, while a Level 3 tool proposes a fundamental redefinition of what software maintenance means.
The Core SDD Lifecycle
Despite differences in philosophy and tooling, virtually all SDD implementations follow a four-phase lifecycle:
- Specify — Define the what and why. User journeys, business requirements, success criteria, and acceptance criteria are written in structured natural language (typically Markdown).
- Plan — Define the how. Technical architecture, stack selection, dependency analysis, and design decisions are documented.
- Task — Decompose the plan into discrete, atomic implementation units, ordered by dependency.
- Implement — AI agents execute the tasks, guided and constrained by the spec artifacts. Humans review, validate, and iterate.
The critical distinction from traditional waterfall is that these phases are meant to be iterative within a change scope — you specify, plan, and implement a single feature or bugfix, not the entire system upfront.
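Concretely, each phase typically leaves a Markdown artifact behind. A minimal sketch of what those artifacts might contain for a single feature (the file names, feature, and headings are invented for illustration and not mandated by any particular tool):

```markdown
<!-- spec.md — Specify: the what and why -->
# Feature: Password reset via email
- As a locked-out user, I can request a reset link so I can regain access.
- Acceptance: the link expires after 30 minutes; the old password stops working immediately.

<!-- plan.md — Plan: the how -->
## Architecture
- Reuse the existing mailer service; store reset tokens hashed in an `auth_tokens` table.

<!-- tasks.md — Task: discrete units, ordered by dependency -->
1. [ ] Add `auth_tokens` migration
2. [ ] Implement token issue/verify endpoints
3. [ ] Wire the email template; add an integration test
```

The Implement phase then consumes these three files as the agent's working context, rather than relying on whatever survives in the chat transcript.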
Framework Deep Dives
1. GitHub Spec-Kit
Philosophy: Tool-agnostic, spec-anchored development with a constitution-first approach.
Spec-Kit is GitHub’s open-source CLI for structured AI-assisted development, released in September 2025. It provides templates, prompts, and a bash-based workflow engine that works across multiple coding assistants including GitHub Copilot, Claude Code, Gemini CLI, and Cursor.
Workflow: Constitution → (Specify → Plan → Tasks), repeated per feature
The “constitution” is Spec-Kit’s distinguishing concept — a set of immutable project principles (coding standards, architectural constraints, security policies) that govern all subsequent development. Every specification, plan, and task is generated in the context of this constitution. The inner loop of specify → plan → tasks repeats for each feature.
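What a constitution contains is easiest to show by example. A hedged sketch (the articles below are invented for illustration; a real constitution is project-specific and generated through Spec-Kit's own tooling):

```markdown
# Project Constitution
<!-- Immutable principles; every spec, plan, and task is generated against these -->

## Article I: Simplicity
Prefer the standard library. Every new dependency requires written justification.

## Article II: Test-First
No implementation task may be marked complete without a failing-then-passing test.

## Article III: Security
All user input is validated at the API boundary; secrets never appear in specs or code.
```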
Strengths:
- Agent-agnostic: templates are optimized for different AI assistants, ensuring workflow consistency regardless of tool.
- The constitution concept provides persistent architectural governance that survives across many change cycles.
- Each phase requires explicit human approval before proceeding, preventing runaway automation.
- Generates rich artifacts: OpenAPI specs for HTTP APIs, structured user stories, and acceptance criteria.
Limitations:
- Heavyweight for small changes. A single date-display feature reportedly produced 8 files and 1,300 lines of specification text.
- Python 3.11+ dependency (via `uvx`) adds friction for teams not in the Python ecosystem.
- Still experimental (version 0.0.30+ as of late 2025), with rapidly changing features and documentation that can lag behind.
- Greenfield bias — retrofitting into existing codebases is difficult.
- Once initialized with a specific AI tool, switching is non-trivial.
2. OpenSpec
Philosophy: Lightweight, iterative, brownfield-friendly spec management.
OpenSpec, created by Fission AI, is positioned as the minimalist alternative in the SDD space. It emphasizes fluidity over rigid phase gates, and is specifically designed to handle existing codebase evolution (the “1→n” problem), not just greenfield projects.
Workflow: Propose → Apply → Archive
OpenSpec’s key architectural innovation is its separation of the Source of Truth (what is currently implemented) from proposed changes (deltas). Each feature request or bugfix becomes an independent subfolder under changes/, containing a proposal.md, specs/, design.md, and tasks.md. Only after a change is implemented and accepted does the delta merge back into the main specs/ directory.
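The resulting repository layout might look like this (the change name and spec topic are illustrative; the file set inside each change folder follows the description above):

```text
specs/                     # source of truth: what is implemented today
  auth/spec.md
changes/
  add-password-reset/      # one in-flight change (a delta)
    proposal.md            # why the change is needed
    specs/                 # only the deltas to affected specs
    design.md              # technical approach
    tasks.md               # implementation checklist
```

Because each delta lives in its own folder, abandoned or reworked proposals never contaminate the main `specs/` tree.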
Strengths:
- No rigid phase gates — artifacts can be updated at any point in the workflow.
- Explicit brownfield support through the “explore” workflow for incrementally applying SDD to existing projects.
- Broad tool compatibility: supports 20+ AI assistants via slash commands.
- No Python dependency — Node.js 20.19+ only, with npm/pnpm/yarn/bun support.
- The `AGENTS.md` file (a “README for robots”) lets any AI tool, even one without native OpenSpec support, follow specs by reading this file.
- Delta-based change management (inspired by Git branches) prevents spec pollution during iteration.
Limitations:
- Lighter-weight means fewer guardrails — teams must self-enforce discipline.
- Less opinionated about spec structure, which can lead to inconsistency across teams.
- The single unified specification document can become unwieldy for very large systems.
- Still maturing: the expanded workflow commands (`/opsx:verify`, `/opsx:sync`, etc.) are newer and less battle-tested.
3. BMAD Method
Philosophy: Agentic Agile — simulating a complete human development team with specialized AI personas.
BMAD (Breakthrough Method for Agile AI-Driven Development) is the most comprehensive framework in this space. Rather than offering just a specification workflow, BMAD provides an entire multi-agent system (MAS) that simulates a full agile team: Business Analyst, Product Manager, System Architect, Scrum Master, Developer, UX Designer, QA Engineer, and others, for a total of up to 21 specialized agents with over 50 guided workflows.
Workflow: Analyst → PM (PRD) → Architect → Scrum Master (Stories) → Developer → QA
Each agent is defined by its own Markdown persona file governing behavior, expertise, and interaction style. Artifacts flow downstream: the Analyst produces a product brief, the PM transforms it into a PRD, the Architect creates a technical design, the Scrum Master shards the design into user stories with embedded context, and the Developer implements story by story.
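Since each agent is just a Markdown persona file, the mechanism is easy to picture. A hedged sketch of the shape such a file takes (headings and wording invented for illustration; real BMAD personas are considerably more elaborate):

```markdown
# Agent: Scrum Master

## Role
Shard the approved architecture into small user stories that a Developer agent
can implement without reading the full PRD.

## Behavior
- Embed in each story all the context it needs (architecture excerpts, file paths).
- Never write code; hand stories downstream to the Developer agent.

## Output
One story file per unit of work, ordered by dependency.
```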
Strengths:
- Scale-adaptive intelligence: automatically adjusts planning depth from bug fixes to enterprise systems.
- The most complete end-to-end workflow, from ideation through QA.
- 100% free, open source, with no paywalls or gated content.
- Cross-domain applicability: also works for creative writing, business strategy, and other non-software contexts.
- Active community (19.1k+ GitHub stars) and extensive documentation.
- Works with Claude Code (recommended), Cursor, VS Code, and other assistants.
- Modular architecture: extends with official modules for specialized domains.
Limitations:
- Highest learning curve and initial setup investment of any SDD tool.
- The multi-agent approach adds ceremony that may be excessive for small teams or simple features.
- Agent role-switching within a single AI session can confuse context-limited models.
- The v4.x → v6-alpha transition represents a full rewrite, introducing potential instability.
- One critic noted that it largely follows the classic product development flow (Vision → PRD → Architecture → Stories → Implementation), dressing up traditional waterfall in agent terminology.
- Some practitioners report that the framework’s strength lies more in organizing AI activity than in improving the quality of specifications themselves.
4. Amazon Kiro
Philosophy: IDE-native spec-driven development with event-driven automation.
Kiro is Amazon’s VS Code fork (built on Code OSS), launched at AWS Summit NYC in July 2025. It embeds SDD directly into the IDE experience rather than providing it as an external CLI or framework.
Workflow: Requirements (EARS) → Design → Tasks → Implementation + Agent Hooks
Kiro translates natural language prompts into user stories with acceptance criteria written in EARS (Easy Approach to Requirements Syntax) notation, generates technical design documents with diagrams, and breaks work into sequenced implementation tasks. “Agent Hooks” (event-driven triggers on file changes or commits) automate documentation updates, test generation, and other maintenance tasks in the background.
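EARS constrains requirements to a handful of sentence templates, which makes them easy for both humans and models to parse unambiguously. The requirements below are invented examples showing the most common templates:

```text
Ubiquitous:    THE authentication service SHALL hash passwords with bcrypt.

Event-driven:  WHEN the user submits the login form,
               THE authentication service SHALL respond within 2 seconds.

State-driven:  WHILE an account is locked,
               THE authentication service SHALL reject all login attempts.

Unwanted:      IF three consecutive login attempts fail,
               THEN THE authentication service SHALL lock the account
               for 15 minutes.
```

The fixed keywords (WHEN, WHILE, IF/THEN, SHALL) eliminate most of the ambiguity that plagues free-form acceptance criteria.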
Strengths:
- Lowest friction entry point: the entire SDD workflow is built into the IDE with no external setup.
- Agent Hooks provide continuous automation that keeps specs and code synchronized.
- Two workflow variants (Requirements-First and Design-First) accommodate different development styles.
- Supports both “vibe” and “spec” modes, letting developers choose the appropriate level of rigor per task.
- Backed by AWS infrastructure and enterprise-grade security features.
- Free during public preview with access to Claude Sonnet models at no cost.
- Bugfix Specs provide structured root-cause analysis, not just feature development.
Limitations:
- Vendor lock-in: tied to the Kiro IDE and its supported models (Claude Sonnet 4.5 and “Auto” mode).
- Cannot bring your own model or switch to a different LLM provider.
- Specs generated by Kiro can be verbose, with unnecessary bloat and assumptions that require manual trimming.
- Specs and code can fall out of sync — the tool doesn’t yet fully automate bidirectional spec-code reconciliation.
- As a proprietary tool, the community has less visibility into roadmap decisions.
- The forced structure adds overhead that some developers report as constraining for rapid iteration.
5. Tessl
Philosophy: Spec-as-Source — specifications are the maintained artifact, code is regenerated.
Founded by Guy Podjarny (creator of Snyk), Tessl represents the most radical SDD vision. Its thesis is that specifications should be the only artifact humans maintain, with code being a transient, generated output. Tessl launched its Spec Registry (open beta) and Tessl Framework (closed beta) in September 2025.
Workflow: Spec (Description + Capabilities + API) → Build → Verify
A Tessl spec defines three things for each software component: a natural language description, a list of capabilities with linked tests, and an API showing how to use it. The @generate annotation tells Tessl to produce code from the spec; @describe means the spec documents existing code.
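Put together, a Tessl-style spec might look roughly like this. The exact syntax belongs to a closed beta and may change; treat this as a sketch of the three required parts (description, capabilities with linked tests, API), not as verified Tessl syntax:

```markdown
# Slugify

Converts arbitrary strings into URL-safe slugs.

## Capabilities
- Lowercases input and replaces whitespace runs with single hyphens (test: slugify.test.md)
- Strips characters that are not alphanumeric or hyphens (test: slugify.test.md)

## API
slugify(text: string): string

<!-- @generate: produce the implementation file from this spec.
     @describe would instead mark this spec as documenting existing code. -->
```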
Strengths:
- The Spec Registry provides 10,000+ specs for popular open-source libraries, directly addressing the API hallucination problem.
- “Tiles” (installable methodology and library context packages) allow composable workflow customization.
- Low-abstraction, per-file specs reduce the LLM’s interpretation steps, improving generation reliability.
- Integrates with any MCP-compatible agent (Claude Code, Cursor, etc.) via standard context injection.
- The framework enforces a plan → spec → implement → verify loop that prevents agents from skipping steps.
Limitations:
- The Framework is still in closed/limited beta with restricted access.
- The spec-as-source vision remains aspirational — current tooling still requires significant human oversight.
- Per-file granularity means teams must write and maintain many specification files.
- Non-determinism in code generation (even from the same spec) requires ongoing spec refinement to achieve reproducibility.
- The approach draws parallels to Model-Driven Development (MDD), which historically failed to achieve mainstream adoption for business applications due to overhead and constraints.
6. No Framework (Manual SDD / NONE)
Philosophy: Use AI agents’ built-in planning modes with lightweight personal conventions.
Many experienced developers practice SDD principles without adopting any formal framework. Tools like Cursor (with its plan mode), Claude Code (with CLAUDE.md memory and subagents), and GitHub Copilot already have built-in task planning capabilities. The “no framework” approach treats SDD as a discipline rather than a toolchain.
Typical Workflow: Write requirements in Markdown → Configure rules/CLAUDE.md → Use agent plan mode → Iterate
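In practice the whole “framework” can be a couple of Markdown files. A hedged sketch (file contents invented for illustration; `CLAUDE.md` is Claude Code's project-memory file, read at the start of every session):

```markdown
<!-- CLAUDE.md -->
# Conventions
- TypeScript strict mode; no `any`.
- Business logic lives in src/services/; HTTP handlers in src/routes/.
- Run the test suite before declaring any task complete.

# Current work
Read the feature spec in docs/specs/ before touching related code,
and update it whenever behavior changes.
```

The same content could live in Cursor rules or a Copilot instructions file; the discipline, not the file name, is what carries the SDD benefit.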
Strengths:
- Zero setup overhead and no additional dependencies.
- Maximum flexibility to adapt the workflow to the problem at hand.
- No learning curve beyond the AI tool itself.
- Avoids the “80% of your time reading instead of thinking” trap that critics attribute to formal SDD.
- Follows the Lean Startup philosophy: identify the next riskiest assumption and validate it, rather than specifying everything upfront.
- Better suited for rapid prototyping, small projects, and exploratory work.
Limitations:
- No persistent governance — architectural decisions exist only in conversation history, which has finite context windows.
- Inconsistency across team members: each developer invents their own approach.
- Knowledge doesn’t survive across sessions without deliberate documentation effort.
- Scales poorly: as codebase complexity grows, ad-hoc prompting produces increasingly unreliable results.
- No audit trail for architectural decisions or requirement traceability.
Comparative Analysis
At a Glance
| Dimension | Spec-Kit | OpenSpec | BMAD | Kiro | Tessl | No Framework |
|---|---|---|---|---|---|---|
| Creator | GitHub | Fission AI | BMad Code | Amazon/AWS | Guy Podjarny (ex-Snyk) | N/A |
| License | Open source | Open source | Open source (MIT) | Proprietary | Registry: Free; Framework: Beta | N/A |
| Maturity | Experimental (v0.0.30+) | Stable (v1.0+) | Stable v4.x; v6 alpha | Public preview | Registry: Open beta; Framework: Closed beta | N/A |
| SDD Level | Level 2 (Spec-Anchored) | Level 2 (Spec-Anchored) | Level 2 (Spec-Anchored) | Level 2 (Spec-Anchored) | Level 3 (Spec-as-Source) | Level 1 (Spec-First) |
| Runtime | Python 3.11+ | Node.js 20.19+ | Node.js v20+ | Standalone IDE | CLI + MCP | None |
| Agent Support | Multi-agent (Copilot, Claude Code, Gemini, Cursor) | 20+ tools | Claude Code (recommended), Cursor, VS Code | Kiro IDE only (Claude Sonnet) | Any MCP-compatible agent | Any |
| Greenfield | Strong | Strong | Strong | Strong | Strong | Strong |
| Brownfield | Weak | Strong | Moderate | Moderate | Moderate | Varies |
| Team Scale | Medium–Large | Small–Large | Medium–Enterprise | Solo–Enterprise | Medium–Large | Solo–Small |
| Setup Time | ~30 min | ~5 min | ~15 min | ~5 min (IDE install) | ~10 min | 0 min |
| Spec Verbosity | High | Low–Medium | High | Medium–High | Low (per-file) | User-defined |
| Phase Gates | Rigid (human approval per phase) | Fluid (no forced gates) | Rigid (agent handoffs) | Semi-rigid (per-phase review) | Moderate | None |
Best-Fit Scenarios
| Scenario | Recommended Approach | Why |
|---|---|---|
| Rapid prototype or proof-of-concept | No Framework or Kiro (vibe mode) | Minimize overhead; validate ideas fast |
| Solo developer, greenfield side project | OpenSpec | Lightweight, iterative, low ceremony |
| Small team, existing codebase | OpenSpec | Brownfield-first design with delta-based changes |
| Enterprise greenfield with multiple teams | Spec-Kit or BMAD | Constitutional governance and structured handoffs |
| Enterprise with compliance/audit needs | BMAD or Spec-Kit | Full traceability from requirements to implementation |
| Reducing open-source API hallucinations | Tessl Registry | 10,000+ library specs as agent context |
| Non-technical stakeholders involved | Kiro or BMAD | Visual IDE workflow (Kiro) or PM/Analyst agents (BMAD) |
| Team already using Cursor/Claude Code | OpenSpec or No Framework | Minimal disruption to existing workflows |
| Maximizing agent autonomy | Tessl Framework | Spec-as-source reduces human-in-loop surface area |
The Case Against SDD
A balanced treatment requires honest engagement with SDD’s critics. Several substantive objections have been raised:
“It’s Waterfall in Markdown.” Perhaps the most common criticism. Formal SDD workflows — especially Spec-Kit and BMAD — follow the classic waterfall sequence of requirements → design → implementation. While proponents argue that the iteration scope is narrower (per-feature, not per-system), critics counter that the rigidity of phase gates recreates the overhead that agile methodologies were designed to eliminate. One practitioner described SDD as spending “80% of your time reading instead of thinking.”
Diminishing returns at scale. SDD reportedly works well for greenfield projects but degrades as the codebase grows. Specs miss the point more often, become stale, and slow development. For large existing codebases, some practitioners find the approach nearly unusable without significant adaptation.
Agents don’t reliably follow specs. In documented cases, coding agents marked “verify implementation” tasks as complete without writing unit tests, instead producing manual testing instructions. The spec provided the intent, but the agent’s execution still drifted. SDD reduces but does not eliminate the need for careful review.
Context blindness persists. SDD agents discover context through text search and file navigation, just like coding agents. They lack deep understanding of runtime behavior, implicit business logic, and cross-system dependencies. The spec captures what can be articulated, but much critical knowledge remains tacit.
The “No-Code” parallel. Like no-code platforms that promised to eliminate developers but ultimately required developer expertise to use effectively, SDD requires mastery of both software architecture and prompt engineering. It doesn’t remove developers — it shifts their role from coder to specification author and AI output reviewer.
The Case For SDD
Proponents push back with their own arguments:
The specification problem is real. AI models optimize for functional correctness but not for architectural consistency, API contract compliance, or security constraints. When the only input is a chat prompt, the agent has no way to know about existing patterns, team conventions, or regulatory requirements. Specs fill that gap.
Context engineering is the new core skill. SDD’s artifacts — constitutions, specs, plans, tasks — are fundamentally outputs of context engineering. They provide AI agents with persistent, structured context that survives across sessions, team members, and tool changes.
It democratizes AI productivity. Without SDD, a senior developer who knows how to prompt effectively might see 3x productivity, while a junior developer struggles with the same tools. A standardized specify → plan → implement flow produces more consistent results regardless of individual prompting skill.
It creates an audit trail. Every architectural decision, requirement, and design choice is documented and traceable. For regulated industries (healthcare, finance, government), this traceability is not optional — it’s mandatory.
Research supports planning over rework. Studies cited by AWS indicate that addressing issues during the planning phase is 5–7x less costly than resolving them during implementation. SDD front-loads this planning investment.
Practical Recommendations
For teams considering SDD adoption, the following principles apply regardless of which tool you choose:
Start with a pilot. Find a small, low-risk project with a 3–5 person team that can complete it in a few weeks. The project should be significant enough to demonstrate value but not mission-critical.
Match the tool to the problem. Don’t adopt BMAD for bug fixes or use no framework for enterprise greenfield. The right level of ceremony depends on the complexity and risk of the work.
Specs should be concise. Several practitioners report that AI-generated specs are often bloated with unnecessary assumptions. Manual trimming and refinement of specs consistently improves the quality of downstream generated code. Garbage in, garbage out.
Treat specs as living documents. The single biggest failure mode is writing a spec once and never updating it. If the spec drifts from reality, it becomes worse than no spec at all — it becomes misleading context.
Invest in the human skill shift. The premium skills in an SDD world are requirements engineering, system design, and critical evaluation of AI output — not syntax mastery. Training should reflect this.
Looking Ahead
SDD is still in its early chapters. The ThoughtWorks Technology Radar (Volume 33) placed it in the “Assess” ring, acknowledging the interest it has generated while noting that current workflows remain elaborate and opinionated. As of early 2026, we’re seeing rapid convergence: tools are learning from each other (OpenSpec’s fluidity influencing Spec-Kit’s rigidity, BMAD’s agent model inspiring Tessl’s composable tiles), and the definition of what a “spec” even means continues to evolve.
The thesis is clear, even if the implementation details remain in flux: structured, spec-centric workflows unlock generative AI’s full potential, transforming it from a novelty into a reliable, scalable engineering tool. The developers who master writing specifications — not code — will architect the next generation of software.
Whether that future looks more like Tessl’s radical spec-as-source vision or OpenSpec’s pragmatic delta-based iteration remains to be seen. But the era of unstructured vibe coding as a professional practice is drawing to a close.
Further Reading:
- Birgitta Böckeler, “Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl” — martinfowler.com
- Liu Shangqi, “Spec-driven development: Unpacking one of 2025’s key new AI-assisted engineering practices” — Thoughtworks
- ThoughtWorks Technology Radar, Volume 33 — Spec-driven development (Assess)