v1.0.0
352
skills/software-development/subagent-driven-development/SKILL.md
Normal file
@@ -0,0 +1,352 @@
---
name: subagent-driven-development
description: "Execute plans via delegate_task subagents (2-stage review)."
version: 1.1.0
author: Hermes Agent (adapted from obra/superpowers)
license: MIT
platforms: [linux, macos, windows]
metadata:
  hermes:
    tags: [delegation, subagent, implementation, workflow, parallel]
    related_skills: [writing-plans, requesting-code-review, test-driven-development]
---

# Subagent-Driven Development

## Overview

Execute implementation plans by dispatching fresh subagents per task with systematic two-stage review.

**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration.

## When to Use

Use this skill when:
- You have an implementation plan (from writing-plans skill or user requirements)
- Tasks are mostly independent
- Quality and spec compliance are important
- You want automated review between tasks

**vs. manual execution:**
- Fresh context per task (no confusion from accumulated state)
- Automated review process catches issues early
- Consistent quality checks across all tasks
- Subagents can ask questions before starting work

## The Process

### 1. Read and Parse Plan

Read the plan file. Extract ALL tasks with their full text and context upfront. Create a todo list:

```python
# Read the plan
read_file("docs/plans/feature-plan.md")

# Create todo list with all tasks
todo([
    {"id": "task-1", "content": "Create User model with email field", "status": "pending"},
    {"id": "task-2", "content": "Add password hashing utility", "status": "pending"},
    {"id": "task-3", "content": "Create login endpoint", "status": "pending"},
])
```

**Key:** Read the plan ONCE. Extract everything. Don't make subagents read the plan file — provide the full task text directly in context.
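
The one-time extraction could look roughly like the sketch below. It assumes a hypothetical plan layout in which every task starts with a `### Task N: title` heading; that heading convention and the `extract_tasks` helper are illustrative and not something the writing-plans skill is known to guarantee.

```python
import re

# Assumption: each task in the plan starts with a heading like "### Task 1: Title".
TASK_HEADING = re.compile(r"^### Task (\d+): (.+)$", re.MULTILINE)


def extract_tasks(plan_text: str) -> list[dict]:
    """Split a plan into todo-style items that carry their full task text."""
    matches = list(TASK_HEADING.finditer(plan_text))
    tasks = []
    for i, match in enumerate(matches):
        # A task body runs until the next task heading (or the end of the plan).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(plan_text)
        tasks.append({
            "id": f"task-{match.group(1)}",
            "content": match.group(2).strip(),
            "status": "pending",
            "full_text": plan_text[match.end():end].strip(),
        })
    return tasks


sample_plan = """# Auth feature plan

### Task 1: Create User model with email field
- Create: src/models/user.py

### Task 2: Add password hashing utility
- Use bcrypt
"""

tasks = extract_tasks(sample_plan)
for task in tasks:
    print(task["id"], "|", task["content"])
```

Keeping `full_text` next to each todo item means the controller can paste the task body straight into a subagent's context later without reopening the plan file.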

### 2. Per-Task Workflow

For EACH task in the plan:

#### Step 1: Dispatch Implementer Subagent

Use `delegate_task` with complete context:

```python
delegate_task(
    goal="Implement Task 1: Create User model with email and password_hash fields",
    context="""
    TASK FROM PLAN:
    - Create: src/models/user.py
    - Add User class with email (str) and password_hash (str) fields
    - Use bcrypt for password hashing
    - Include __repr__ for debugging

    FOLLOW TDD:
    1. Write failing test in tests/models/test_user.py
    2. Run: pytest tests/models/test_user.py -v (verify FAIL)
    3. Write minimal implementation
    4. Run: pytest tests/models/test_user.py -v (verify PASS)
    5. Run: pytest tests/ -q (verify no regressions)
    6. Commit: git add -A && git commit -m "feat: add User model with password hashing"

    PROJECT CONTEXT:
    - Python 3.11, Flask app in src/app.py
    - Existing models in src/models/
    - Tests use pytest, run from project root
    - bcrypt already in requirements.txt
    """,
    toolsets=['terminal', 'file']
)
```

#### Step 2: Dispatch Spec Compliance Reviewer

After the implementer completes, verify against the original spec:

```python
delegate_task(
    goal="Review if implementation matches the spec from the plan",
    context="""
    ORIGINAL TASK SPEC:
    - Create src/models/user.py with User class
    - Fields: email (str), password_hash (str)
    - Use bcrypt for password hashing
    - Include __repr__

    CHECK:
    - [ ] All requirements from spec implemented?
    - [ ] File paths match spec?
    - [ ] Function signatures match spec?
    - [ ] Behavior matches expected?
    - [ ] Nothing extra added (no scope creep)?

    OUTPUT: PASS or list of specific spec gaps to fix.
    """,
    toolsets=['file']
)
```

**If spec issues found:** Fix gaps, then re-run spec review. Continue only when spec-compliant.

#### Step 3: Dispatch Code Quality Reviewer

After spec compliance passes:

```python
delegate_task(
    goal="Review code quality for Task 1 implementation",
    context="""
    FILES TO REVIEW:
    - src/models/user.py
    - tests/models/test_user.py

    CHECK:
    - [ ] Follows project conventions and style?
    - [ ] Proper error handling?
    - [ ] Clear variable/function names?
    - [ ] Adequate test coverage?
    - [ ] No obvious bugs or missed edge cases?
    - [ ] No security issues?

    OUTPUT FORMAT:
    - Critical Issues: [must fix before proceeding]
    - Important Issues: [should fix]
    - Minor Issues: [optional]
    - Verdict: APPROVED or REQUEST_CHANGES
    """,
    toolsets=['file']
)
```

**If quality issues found:** Fix issues, re-review. Continue only when approved.

#### Step 4: Mark Complete

```python
todo([{"id": "task-1", "content": "Create User model with email field", "status": "completed"}], merge=True)
```
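
Steps 1 to 4 can be tied together in a small controller loop. The sketch below passes the three dispatch functions in as plain callables so it runs without any agent runtime. `run_task`, `implement`, `spec_review`, and `quality_review` are hypothetical names, and in practice each would wrap a `delegate_task` call like the ones above. The cap of three review rounds per stage is an assumption; this skill itself only says to re-review until approved.

```python
from typing import Callable

Review = Callable[[dict], list[str]]  # returns open issues; an empty list means pass


def run_task(task: dict,
             implement: Callable[[dict, list[str]], None],
             spec_review: Review,
             quality_review: Review,
             max_rounds: int = 3) -> str:
    """Implement one task, then clear spec review before quality review."""
    implement(task, [])  # Step 1: first implementation pass, no feedback yet
    for stage, review in (("spec", spec_review), ("quality", quality_review)):
        for _ in range(max_rounds):
            issues = review(task)
            if not issues:
                break  # this stage passed; move on to the next one
            implement(task, issues)  # fix, then loop around to re-review
        else:
            # The last fix was never re-reviewed, so do not mark the task done.
            return f"escalate: {stage} review still failing"
    task["status"] = "completed"  # Step 4
    return "completed"


# Demo with fake subagents: the spec reviewer finds one gap on its first pass.
log = []
spec_results = iter([["missing __repr__"], []])


def fake_implement(task, feedback):
    log.append(("implement", tuple(feedback)))


def fake_spec(task):
    log.append(("spec",))
    return next(spec_results)


def fake_quality(task):
    log.append(("quality",))
    return []


task = {"id": "task-1", "status": "pending"}
result = run_task(task, fake_implement, fake_spec, fake_quality)
print(result, log)
```

The ordering is the point of the sketch: quality review is never reached while the spec review still reports gaps, and a task is only marked completed after both stages return no issues.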

### 3. Final Review

After ALL tasks are complete, dispatch a final integration reviewer:

```python
delegate_task(
    goal="Review the entire implementation for consistency and integration issues",
    context="""
    All tasks from the plan are complete. Review the full implementation:
    - Do all components work together?
    - Any inconsistencies between tasks?
    - All tests passing?
    - Ready for merge?
    """,
    toolsets=['terminal', 'file']
)
```

### 4. Verify and Commit

```bash
# Run full test suite
pytest tests/ -q

# Review all changes
git diff --stat

# Final commit if needed
git add -A && git commit -m "feat: complete [feature name] implementation"
```

## Task Granularity

**Each task = 2-5 minutes of focused work.**

**Too big:**
- "Implement user authentication system"

**Right size:**
- "Create User model with email and password fields"
- "Add password hashing function"
- "Create login endpoint"
- "Add JWT token generation"
- "Create registration endpoint"

## Red Flags — Never Do These

- Start implementation without a plan
- Skip reviews (spec compliance OR code quality)
- Proceed with unfixed critical/important issues
- Dispatch multiple implementation subagents for tasks that touch the same files
- Make subagent read the plan file (provide full text in context instead)
- Skip scene-setting context (subagent needs to understand where the task fits)
- Ignore subagent questions (answer before letting them proceed)
- Accept "close enough" on spec compliance
- Skip review loops (reviewer found issues → implementer fixes → review again)
- Let implementer self-review replace actual review (both are needed)
- **Start code quality review before spec compliance is PASS** (wrong order)
- Move to next task while either review has open issues
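
The rule about tasks that touch the same files can be checked mechanically before any parallel dispatch. This is a minimal sketch, assuming each task carries a `files` list (a hypothetical field, not something the `todo` tool is known to provide) and that the tasks have no ordering dependencies beyond shared files.

```python
def parallel_batches(tasks: list[dict]) -> list[list[str]]:
    """Greedily group task ids so that no two tasks in a batch share a file."""
    batches: list[tuple[list[str], set[str]]] = []
    for task in tasks:
        files = set(task["files"])
        for ids, used in batches:
            if not (used & files):  # no shared file, safe to run alongside
                ids.append(task["id"])
                used |= files  # in-place update of the batch's file set
                break
        else:
            batches.append(([task["id"]], files))
    return [ids for ids, _ in batches]


tasks = [
    {"id": "task-1", "files": ["src/models/user.py", "tests/models/test_user.py"]},
    {"id": "task-2", "files": ["src/utils/hashing.py"]},
    {"id": "task-3", "files": ["src/models/user.py", "src/routes/login.py"]},
]

batches = parallel_batches(tasks)
print(batches)
```

Tasks inside one batch could be dispatched together; batches run one after another. Here `task-3` lands in its own batch because it shares `src/models/user.py` with `task-1`.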

## Handling Issues

### If Subagent Asks Questions

- Answer clearly and completely
- Provide additional context if needed
- Don't rush them into implementation

### If Reviewer Finds Issues

- Implementer subagent (or a new one) fixes them
- Reviewer reviews again
- Repeat until approved
- Don't skip the re-review

### If Subagent Fails a Task

- Dispatch a new fix subagent with specific instructions about what went wrong
- Don't try to fix manually in the controller session (context pollution)

## Efficiency Notes

**Why fresh subagent per task:**
- Prevents context pollution from accumulated state
- Each subagent gets clean, focused context
- No confusion from prior tasks' code or reasoning

**Why two-stage review:**
- Spec review catches under/over-building early
- Quality review ensures the implementation is well-built
- Catches issues before they compound across tasks

**Cost trade-off:**
- More subagent invocations (implementer + 2 reviewers per task)
- But catches issues early (cheaper than debugging compounded problems later)

## Integration with Other Skills

### With writing-plans

This skill EXECUTES plans created by the writing-plans skill:
1. User requirements → writing-plans → implementation plan
2. Implementation plan → subagent-driven-development → working code

### With test-driven-development

Implementer subagents should follow TDD:
1. Write failing test first
2. Implement minimal code
3. Verify test passes
4. Commit

Include TDD instructions in every implementer context.

### With requesting-code-review

The two-stage review process IS the code review. For final integration review, use the requesting-code-review skill's review dimensions.

### With systematic-debugging

If a subagent encounters bugs during implementation:
1. Follow systematic-debugging process
2. Find root cause before fixing
3. Write regression test
4. Resume implementation

## Example Workflow

```
[Read plan: docs/plans/auth-feature.md]
[Create todo list with 5 tasks]

--- Task 1: Create User model ---
[Dispatch implementer subagent]
Implementer: "Should email be unique?"
You: "Yes, email must be unique"
Implementer: Implemented, 3/3 tests passing, committed.

[Dispatch spec reviewer]
Spec reviewer: ✅ PASS — all requirements met

[Dispatch quality reviewer]
Quality reviewer: ✅ APPROVED — clean code, good tests

[Mark Task 1 complete]

--- Task 2: Password hashing ---
[Dispatch implementer subagent]
Implementer: No questions, implemented, 5/5 tests passing.

[Dispatch spec reviewer]
Spec reviewer: ❌ Missing: password strength validation (spec says "min 8 chars")

[Implementer fixes]
Implementer: Added validation, 7/7 tests passing.

[Dispatch spec reviewer again]
Spec reviewer: ✅ PASS

[Dispatch quality reviewer]
Quality reviewer: Important: Magic number 8, extract to constant
Implementer: Extracted MIN_PASSWORD_LENGTH constant
Quality reviewer: ✅ APPROVED

[Mark Task 2 complete]

... (continue for all tasks)

[After all tasks: dispatch final integration reviewer]
[Run full test suite: all passing]
[Done!]
```

## Remember

```
Fresh subagent per task
Two-stage review every time
Spec compliance FIRST
Code quality SECOND
Never skip reviews
Catch issues early
```

**Quality is not an accident. It's the result of systematic process.**

## Further reading (load when relevant)

When the orchestration involves significant context usage, long review loops, or complex validation checkpoints, load these references for the specific discipline:

- **`references/context-budget-discipline.md`** — Four-tier context degradation model (PEAK / GOOD / DEGRADING / POOR), read-depth rules that scale with context window size, and early warning signs of silent degradation. Load when a run will clearly consume significant context (multi-phase plans, many subagents, large artifacts).
- **`references/gates-taxonomy.md`** — The four canonical gate types (Pre-flight, Revision, Escalation, Abort) with behavior, recovery, and examples. Load when designing or reviewing any workflow that has validation checkpoints — use the vocabulary explicitly so each gate has defined entry, failure behavior, and resumption rules.

Both references adapted from gsd-build/get-shit-done (MIT © 2025 Lex Christopherson).
@@ -0,0 +1,53 @@
# Context Budget Discipline

Practical rules for keeping orchestrator context lean when spawning subagents or reading large artifacts. Use these whenever you're running a multi-step agent loop that will consume significant context — plan execution, subagent orchestration, review pipelines, multi-file refactors.

Adapted from the GSD (Get Shit Done) project's context-budget reference — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)).

## Universal rules

Every workflow that spawns agents or reads significant content must follow these:

1. **Never read agent definition files.** `delegate_task` auto-loads them — you reading them too just doubles the cost.
2. **Never inline large files into subagent prompts.** Tell the agent to read the file from disk with `read_file` instead. The subagent gets full content; your context stays lean.
3. **Read depth scales with context window.** See the table below.
4. **Delegate heavy work to subagents.** The orchestrator routes; it doesn't execute.
5. **Proactively warn** the user when you've consumed significant context ("Context is getting heavy — consider checkpointing progress before we continue").

## Read depth by context window

Check the model's actual context window (not "it's Claude so 200K"). Some Sonnet deployments are 1M, some are 200K. If you don't know, assume the smaller one — err toward leanness.

| Context window | Subagent output reading | Summary files | Verification files | Plans for other phases |
|----------------|-------------------------|---------------|--------------------|-----------------------|
| < 500k (e.g. 200k) | Frontmatter only | Frontmatter only | Frontmatter only | Current phase only |
| >= 500k (1M models) | Full body permitted | Full body permitted | Full body permitted | Current phase only |

"Frontmatter only" means: read enough to see the final status/verdict/conclusion. If the subagent wrote a 3000-line debug log, read the summary section it produced, not the log.

## Four-tier degradation model

Monitor your context usage and shift behavior as you climb the tiers. The point is to notice *before* you hit the wall, not when responses start truncating.

| Tier | Usage | Behavior |
|------|-------|----------|
| **PEAK** | 0 – 30% | Full operations. Read bodies, spawn multiple agents in parallel, inline results freely. |
| **GOOD** | 30 – 50% | Normal operations. Prefer frontmatter reads. Delegate aggressively. |
| **DEGRADING** | 50 – 70% | Economize. Frontmatter-only reads, minimal inlining, **warn the user** about budget. |
| **POOR** | 70%+ | Emergency mode. **Checkpoint progress immediately.** No new reads unless critical. Finish the current task and stop cleanly. |
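
The read-depth table and the tier table both reduce to threshold lookups, which the sketch below spells out. The function names are hypothetical, token counts are assumed to be available from whatever runtime you use, and a value that sits exactly on a boundary (for example 30%) is assigned to the higher tier, a choice the table itself leaves open.

```python
from typing import Optional

# (upper bound in percent, tier name). Assumption: a boundary value such as
# exactly 30% counts as the higher tier.
TIERS = [(30, "PEAK"), (50, "GOOD"), (70, "DEGRADING")]


def context_tier(used_tokens: int, window_tokens: int) -> str:
    """Map context usage to one of the four tiers."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    for upper_pct, name in TIERS:
        # Integer comparison avoids floating-point surprises at the boundaries.
        if used_tokens * 100 < upper_pct * window_tokens:
            return name
    return "POOR"


def read_depth(window_tokens: Optional[int]) -> str:
    """Return 'full' or 'frontmatter' for subagent output and summary files."""
    if window_tokens is None:
        return "frontmatter"  # unknown window: assume the smaller one
    return "full" if window_tokens >= 500_000 else "frontmatter"


print(context_tier(146_000, 200_000), read_depth(200_000))
```

With a 200k window, 146k tokens of usage is 73%, which falls in the POOR tier, and reads stay at frontmatter depth.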

## Early warning signs (before panic thresholds fire)

Quality degrades *gradually* before hard limits hit. Watch for these:

- **Silent partial completion.** Subagent claims done but implementation is incomplete. Self-checks catch file existence, not semantic completeness. Always verify subagent output against the plan's must-haves, not just "did a file appear?"
- **Increasing vagueness.** Agent starts using phrases like "appropriate handling" or "standard patterns" instead of specific code. This is context pressure showing up before budget warnings fire.
- **Skipped protocol steps.** Agent omits steps it would normally follow. If the success criteria list has 8 items and the report covers 5, suspect context pressure, not "the agent decided 5 was enough."

When these signs appear, checkpoint the work and either reset context or hand off to a fresh subagent.

## Fundamental limitation

When you orchestrate, you cannot verify semantic correctness of subagent output — only structural completeness ("did the file appear?", "does the test pass?"). Semantic verification requires either running the code yourself or delegating a review pass to another fresh subagent.

**Mitigation:** in every task you delegate, include explicit "must-have" truths the subagent must confirm in its response (e.g., "confirm your test actually tests X, not just that X was imported"). The subagent re-asserting concrete facts is evidence; vague summaries are not.
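
Part of this mitigation can be made mechanical. The sketch below assumes a hypothetical reply convention, `CONFIRM <id>: <evidence>`, that you would spell out in the delegated context yourself; neither the convention nor the helper names come from `delegate_task`.

```python
import re


def must_have_block(must_haves: dict[str, str]) -> str:
    """Render must-haves for the context string of a delegated task."""
    lines = ["MUST-HAVES (answer each one as 'CONFIRM <id>: <evidence>'):"]
    lines += [f"- {key}: {text}" for key, text in must_haves.items()]
    return "\n".join(lines)


def unconfirmed(response: str, must_haves: dict[str, str]) -> list[str]:
    """Return the ids the subagent did not explicitly re-assert."""
    confirmed = set(re.findall(r"^CONFIRM ([\w-]+): \S", response, re.MULTILINE))
    return [key for key in must_haves if key not in confirmed]


must_haves = {
    "mh-1": "the test exercises X, not just that X was imported",
    "mh-2": "the full test suite passes",
}
response = """Implemented the task.
CONFIRM mh-1: test_x_roundtrip calls X and asserts on its return value
Everything else follows standard patterns."""

missing = unconfirmed(response, must_haves)
print(must_have_block(must_haves))
print("unconfirmed:", missing)
```

A non-empty `missing` list is a reason to send the task back or dispatch a reviewer. An empty list is still only structural evidence: it shows the subagent addressed each item, not that its claims are true.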
@@ -0,0 +1,93 @@
# Gates Taxonomy

Canonical gate types for validation checkpoints across any workflow that spawns subagents, runs review loops, or has human-approval pauses. Every validation checkpoint maps to one of these four types — naming them explicitly makes the workflow legible and prevents "what happens when this check fails?" confusion.

Adapted from the GSD (Get Shit Done) project's gates reference — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)).

## The four gate types

### 1. Pre-flight gate

**Purpose:** Validates preconditions before starting an operation.

**Behavior:** Blocks entry if conditions unmet. No partial work created — bail before anything changes.

**Recovery:** Fix the missing precondition, then retry.

**Examples:**
- Implementation phase checks that the plan file exists before it starts writing code.
- Delegated subagent checks that required env vars are set before making API calls.
- Commit checks that tests passed before pushing.

### 2. Revision gate

**Purpose:** Evaluates output quality and routes to revision if insufficient.

**Behavior:** Loops back to the producer with specific feedback. Bounded by an iteration cap (typically 3).

**Recovery:** Producer addresses feedback; checker re-evaluates. The loop escalates early if issue count does not decrease between consecutive iterations (stall detection). After max iterations, escalates to the user unconditionally — never loop forever.

**Examples:**
- Plan reviewer reads a draft plan, returns specific issues, planner revises, reviewer re-reads (max 3 cycles).
- Code reviewer checks subagent-produced code against must-haves; dispatches fixes back to the implementer if any must-have failed.
- Test coverage checker validates new tests exercise the new paths; if not, sends back to author.
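
The revision gate's cap and stall detection can be sketched as a small loop. This is one possible reading, assuming an "iteration" means one checker pass and that the checker reports its findings as an issue count; `check` and `revise` are hypothetical callables standing in for the reviewer and the producer.

```python
from typing import Callable, Optional


def revision_gate(check: Callable[[], int],
                  revise: Callable[[], None],
                  max_iterations: int = 3) -> str:
    """Return 'pass', or the reason the loop hands off to an escalation gate."""
    previous: Optional[int] = None
    for iteration in range(1, max_iterations + 1):
        issues = check()
        if issues == 0:
            return "pass"
        if previous is not None and issues >= previous:
            return "escalate: stalled"  # issue count did not decrease
        if iteration == max_iterations:
            return "escalate: max iterations"  # never loop forever
        previous = issues
        revise()
    return "escalate: max iterations"  # only reached if max_iterations < 1


def simulate(issue_counts: list[int]) -> str:
    """Drive the gate with a scripted sequence of checker results."""
    counts = iter(issue_counts)
    return revision_gate(lambda: next(counts), lambda: None)


print(simulate([3, 1, 0]), "|", simulate([2, 2]), "|", simulate([3, 2, 1]))
```

Both escalation outcomes are returned as values rather than raised, so the caller can route them to an escalation gate instead of treating them as crashes.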

### 3. Escalation gate

**Purpose:** Surfaces unresolvable issues to the human for a decision.

**Behavior:** Pauses workflow, presents options, waits for human input. Never guesses, never picks a default.

**Recovery:** Human chooses action; workflow resumes on the selected path.

**Examples:**
- Revision loop exhausted after 3 iterations.
- Merge conflict during automated worktree cleanup.
- Ambiguous requirement — two reasonable interpretations and the choice changes the approach.
- Subagent reports "the plan says X but the codebase actually does Y" — human decides which is right.

### 4. Abort gate

**Purpose:** Terminates the operation to prevent damage or waste.

**Behavior:** Stops immediately, preserves state (checkpoint current progress), reports the specific reason.

**Recovery:** Human investigates root cause, fixes, restarts from checkpoint.

**Examples:**
- Context window critically low during execution (POOR tier, >70%) — abort cleanly rather than produce truncated output.
- Critical dependency unavailable mid-run (network down, API key revoked).
- Unrecoverable filesystem state (disk full, permissions lost).
- Safety invariant violated (agent attempted an irreversible destructive action outside approved scope).

## How to use this in a skill

When you write an orchestration skill that has validation checkpoints, **name each checkpoint by its gate type explicitly** and answer three questions:

1. **What condition triggers this gate?** (e.g., "plan file missing", "issue count didn't decrease", "context >70%")
2. **What happens when it fails?** (block / loop back / ask human / abort)
3. **Who resumes, and from where?** (fix precondition + retry, revise + re-check, human decision, restart from checkpoint)

Answering these three up front means your skill never hits "what do we do now?" at runtime.
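
One way to force those three answers is to make them required fields of a small record. The `Gate` and `GateType` names below are illustrative only, and the sketch derives the failure behavior from the gate type as described above.

```python
from dataclasses import dataclass
from enum import Enum


class GateType(Enum):
    PRE_FLIGHT = "pre-flight"
    REVISION = "revision"
    ESCALATION = "escalation"
    ABORT = "abort"


# Question 2: the failure behavior follows from the gate type.
ON_FAIL = {
    GateType.PRE_FLIGHT: "block",
    GateType.REVISION: "loop back",
    GateType.ESCALATION: "ask human",
    GateType.ABORT: "abort",
}


@dataclass(frozen=True)
class Gate:
    name: str
    gate_type: GateType
    trigger: str  # Question 1: what condition triggers this gate?
    resume: str   # Question 3: who resumes, and from where?

    @property
    def on_fail(self) -> str:
        return ON_FAIL[self.gate_type]


gates = [
    Gate("plan-exists", GateType.PRE_FLIGHT, "plan file missing", "fix precondition + retry"),
    Gate("must-haves", GateType.REVISION, "reviewer reports a failed must-have", "revise + re-check"),
    Gate("stalled-review", GateType.ESCALATION, "issue count didn't decrease", "human decision"),
    Gate("context-budget", GateType.ABORT, "context >70%", "restart from checkpoint"),
]

for gate in gates:
    print(f"[{gate.gate_type.value}] {gate.name}: on '{gate.trigger}' -> {gate.on_fail}; resume: {gate.resume}")
```

Because `trigger` and `resume` have no defaults, a checkpoint cannot be declared without answering questions 1 and 3, and question 2 is fixed by the chosen type.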

## Example — a review loop with all four gate types

```
[Pre-flight] plan.md exists and is non-empty? → no: bail, ask user to write a plan first
    ↓ yes
[Execute] subagent implements task
    ↓
[Revision] reviewer checks against must-haves → fail: loop back to subagent (max 3)
    ↓ pass
[Pre-flight] tests pass? → no: bail, report failing tests
    ↓ yes
[Commit]
    ↓
(on revision loop exhaustion)
[Escalation] "3 review cycles failed to converge on issue X — pick: force-merge, rewrite task, abandon"
    ↓ user picks
(on any tier-POOR context pressure during loop)
[Abort] "context at 73%, checkpointing and stopping"
```

The vocabulary is small on purpose. Every gate in every workflow should fit one of these four. If you find yourself inventing a fifth, it's probably a revision gate with extra branching, or an escalation gate in disguise.