Introduction
Prodigy is an AI-powered workflow orchestration tool that enables development teams to automate complex tasks using Claude AI through structured YAML workflows.
What is Prodigy?
Prodigy combines the power of Claude AI with workflow orchestration to:
- Automate repetitive development tasks - Code reviews, refactoring, testing
- Process work in parallel - MapReduce-style parallel execution across git worktrees
- Maintain quality - Built-in validation, error handling, and retry mechanisms
- Track changes - Full git integration with automatic commits and merge workflows
Quick Start
Create a simple workflow in workflow.yml:
- shell: "cargo build"
- shell: "cargo test"
on_failure:
claude: "/fix-failing-tests"
- shell: "cargo clippy"
Run it:
prodigy run workflow.yml
Key Concepts
- Workflows: YAML files defining sequences of commands
- Commands: Shell commands, Claude AI invocations, or control flow
- Variables: Dynamic values captured and interpolated across steps
- MapReduce: Parallel processing across multiple git worktrees
- Validation: Automatic testing and quality checks
Next Steps
- Workflow Basics - Learn workflow fundamentals
- Command Types - Explore available command types
- Examples - See real-world workflows
Workflow Basics
This chapter covers the fundamentals of creating Prodigy workflows. You’ll learn about workflow structure, basic commands, and configuration options.
Overview
Prodigy workflows are YAML files that define a sequence of commands to execute. They can be as simple as a list of shell commands or as complex as parallel MapReduce jobs.
Two Main Workflow Types:
- Standard Workflows: Sequential command execution (covered here)
- MapReduce Workflows: Parallel processing with map/reduce phases (see MapReduce chapter)
Simple Workflows
The simplest workflow is just an array of commands:
# Simple array format - just list your commands
- shell: "echo 'Starting workflow...'"
- claude: "/prodigy-analyze"
- shell: "cargo test"
This executes each command sequentially. No additional configuration needed.
Full Workflow Structure
For more complex workflows, use the full format with explicit configuration:
# Full format with environment and merge configuration
commands:
  - shell: "cargo build"
  - claude: "/prodigy-test"

# Global environment variables (available to all commands)
env:
  NODE_ENV: production
  API_URL: https://api.example.com

# Secret environment variables (masked in logs)
secrets:
  API_KEY: "${env:SECRET_API_KEY}"

# Environment files to load (.env format)
env_files:
  - .env.production

# Environment profiles (switch contexts easily)
profiles:
  development:
    NODE_ENV: development
    DEBUG: "true"

# Custom merge workflow (for worktree integration)
merge:
  commands:
    - shell: "git fetch origin"
    - claude: "/merge-worktree ${merge.source_branch}"
  timeout: 600 # Optional timeout in seconds
Available Fields
Standard workflows support these top-level fields:
Field | Type | Required | Description |
---|---|---|---|
commands | Array | Yes* | List of commands to execute sequentially |
env | Map | No | Global environment variables |
secrets | Map | No | Secret environment variables (masked in logs) |
env_files | Array | No | Paths to .env files to load |
profiles | Map | No | Named environment profiles |
merge | Object | No | Custom merge workflow for worktree integration |
Note: commands is only required in the full format. Simple array format doesn’t use the commands key.
Command Types
Prodigy supports several types of commands in workflows:
Core Commands
shell:
- Execute shell commands
- shell: "cargo build --release"
- shell: "npm install"
claude:
- Invoke Claude Code commands
- claude: "/prodigy-lint"
- claude: "/analyze codebase"
Advanced Commands
- goal_seek: Goal-seeking operations with validation (see Advanced Features)
- foreach: Iterate over lists with nested commands (see Advanced Features)
- validate: Validation steps with configurable thresholds (see Commands)

Deprecated:

- test: Deprecated in favor of shell: with on_failure: handlers
For detailed information on each command type and their fields, see the Command Types chapter.
Command-Level Options
All command types support additional fields for advanced control:
Basic Options
- shell: "cargo test"
id: "run-tests" # Step identifier for output referencing
commit_required: true # Expect git commit after this step
timeout: 300 # Timeout in seconds
Conditional Execution
Run commands based on conditions:
- shell: "deploy.sh"
when: "${branch} == 'main'" # Only run on main branch
Error Handling
Handle failures gracefully:
- shell: "risky-command"
on_failure:
shell: "cleanup.sh" # Run on failure
on_success:
shell: "notify.sh" # Run on success
Output Capture
Capture command output to variables:
- shell: "git rev-parse HEAD"
id: "get-commit"
capture: "commit_hash" # Capture to variable
capture_format: "string" # Format: string|json|lines|number|boolean
For comprehensive coverage of these options, see:
- Advanced Features - Conditional execution, output capture, timeouts
- Error Handling - on_failure and on_success handlers
- Variables - Variable interpolation and capture formats
Environment Configuration
Environment variables can be configured at multiple levels:
Global Environment Variables
env:
NODE_ENV: production
DATABASE_URL: postgres://localhost/mydb
Secret Variables
Secret variables are masked in logs for security:
secrets:
API_KEY: "${env:SECRET_API_KEY}"
DB_PASSWORD: "${env:DATABASE_PASSWORD}"
Environment Files
Load variables from .env files:
env_files:
- .env
- .env.production
Environment Profiles
Switch between different environment contexts:
profiles:
development:
NODE_ENV: development
DEBUG: "true"
API_URL: http://localhost:3000
production:
NODE_ENV: production
DEBUG: "false"
API_URL: https://api.example.com
Activate a profile with: prodigy run --profile development
For more details, see the Environment Variables chapter.
Merge Workflows
Merge workflows execute when merging worktree changes back to the main branch. This feature enables custom validation, testing, and conflict resolution before integrating changes.
When to use merge workflows:
- Run tests before merging
- Validate code quality
- Handle merge conflicts automatically
- Sync with upstream changes
merge:
commands:
- shell: "git fetch origin"
- shell: "git merge origin/main"
- shell: "cargo test"
- claude: "/prodigy-merge-worktree ${merge.source_branch}"
timeout: 600 # Optional: overall timeout for merge workflow
Available merge variables:
- ${merge.worktree} - Worktree name (e.g., “prodigy-session-abc123”)
- ${merge.source_branch} - Source branch (worktree branch)
- ${merge.target_branch} - Target branch (usually main/master)
- ${merge.session_id} - Session ID for correlation
These variables are only available within the merge workflow context.
Complete Example
Here’s a complete workflow combining multiple features:
# Environment configuration
env:
RUST_BACKTRACE: 1
env_files:
- .env
profiles:
ci:
CI: "true"
VERBOSE: "true"
# Workflow commands
commands:
- shell: "cargo fmt --check"
- shell: "cargo clippy -- -D warnings"
- shell: "cargo test --all"
- claude: "/prodigy-lint"
# Custom merge workflow
merge:
commands:
- shell: "cargo test"
- claude: "/prodigy-merge-worktree ${merge.source_branch}"
timeout: 300
Next Steps
Now that you understand basic workflows, explore these topics:
- Command Reference - Detailed guide to all command types and options
- Environment Variables - Advanced environment configuration
- Error Handling - Handle failures gracefully
- MapReduce Workflows - Parallel processing for large-scale tasks
- Conditional Execution - Run commands based on conditions
- Output Capture - Capture and use command outputs
MapReduce Workflows
Complete Structure
name: parallel-processing
mode: mapreduce
# Optional setup phase
setup:
- shell: "generate-work-items.sh"
- shell: "debtmap analyze . --output items.json"
# Map phase: Process items in parallel
map:
# Input source (JSON file or command)
input: "items.json"
# JSONPath expression to extract items
json_path: "$.items[*]"
# Agent template (commands run for each item)
# Modern syntax: Commands directly under agent_template
agent_template:
- claude: "/process '${item}'"
- shell: "test ${item.path}"
on_failure:
claude: "/fix-issue '${item}'"
# DEPRECATED: Nested 'commands' syntax (still supported)
# agent_template:
# commands:
# - claude: "/process '${item}'"
# Maximum parallel agents
max_parallel: 10
# Optional: Filter items
filter: "item.score >= 5"
# Optional: Sort items
sort_by: "item.priority DESC"
# Optional: Limit number of items
max_items: 100
# Optional: Skip items
offset: 10
# Optional: Deduplicate by field
distinct: "item.id"
# Optional: Agent timeout in seconds
agent_timeout_secs: 300
# Reduce phase: Aggregate results
# Modern syntax: Commands directly under reduce
reduce:
- claude: "/summarize ${map.results}"
- shell: "echo 'Processed ${map.successful}/${map.total} items'"
# DEPRECATED: Nested 'commands' syntax (still supported)
# reduce:
# commands:
# - claude: "/summarize ${map.results}"
# Optional: Custom merge workflow (supports two formats)
merge:
# Simple array format
- shell: "git fetch origin"
- claude: "/merge-worktree ${merge.source_branch}"
- shell: "cargo test"
# OR full format with timeout
# merge:
# commands:
# - shell: "git fetch origin"
# - claude: "/merge-worktree ${merge.source_branch}"
# timeout: 600 # Timeout in seconds
# Error handling policy
error_policy:
on_item_failure: dlq # dlq, retry, skip, stop, or custom handler name
continue_on_failure: true
max_failures: 5
failure_threshold: 0.2 # 20% failure rate
error_collection: aggregate # aggregate, immediate, or batched:N
# Circuit breaker configuration
circuit_breaker:
failure_threshold: 5 # Open circuit after N failures
success_threshold: 2 # Close circuit after N successes
timeout: "60s" # Duration before attempting half-open (e.g., "60s", "1m", "5m")
half_open_requests: 3 # Test requests in half-open state
# Retry configuration with backoff
retry_config:
max_attempts: 3
backoff:
type: exponential # fixed, linear, exponential, fibonacci
initial: "1s" # Initial delay (e.g., "1s", "500ms")
multiplier: 2 # For exponential
# Note: max_delay is NOT supported - use max_attempts to limit retries
# Convenience fields (alternative to nested error_policy)
# These top-level fields map to error_policy for simpler syntax
on_item_failure: dlq
continue_on_failure: true
max_failures: 5
Setup Phase (Advanced)
The setup phase supports two formats: simple array OR full configuration object.
# Simple array format
setup:
- shell: "prepare-data.sh"
- shell: "analyze-codebase.sh"
# Full configuration format with timeout and capture
setup:
commands:
- shell: "prepare-data.sh"
- shell: "analyze-codebase.sh"
# Timeout for entire setup phase (seconds)
timeout: 300
# Capture outputs from setup commands
capture_outputs:
# Simple format (legacy - just index)
file_count: 0 # Capture from command at index 0
# Full CaptureConfig format
analysis_result:
command_index: 1
format: json # string, number, json, lines, boolean
Setup Phase Fields:
- commands - Array of commands to execute (or use simple array format at top level)
- timeout - Timeout for entire setup phase in seconds
- capture_outputs - Map of variable names to command outputs (supports Simple(index) or full CaptureConfig)
Global Storage Architecture
MapReduce workflows use a global storage architecture located in ~/.prodigy/ (not .prodigy/ in your project). This enables:
- Cross-worktree event aggregation: Multiple worktrees working on the same job share event logs
- Persistent state management: Job checkpoints survive worktree cleanup
- Centralized monitoring: All job data accessible from a single location
- Efficient storage: Deduplication across worktrees
Storage Locations
~/.prodigy/
├── events/
│ └── {repo_name}/ # Events grouped by repository
│ └── {job_id}/ # Job-specific events
│ └── events-{timestamp}.jsonl # Event log files
├── dlq/
│ └── {repo_name}/ # DLQ grouped by repository
│ └── {job_id}/ # Job-specific failed items
└── state/
└── {repo_name}/ # State grouped by repository
└── mapreduce/ # MapReduce job states
└── jobs/
└── {job_id}/ # Job-specific checkpoints
Event Tracking
All MapReduce execution events are logged to ~/.prodigy/events/{repo_name}/{job_id}/ for debugging and monitoring:
Events Tracked:
- Agent lifecycle events (started, completed, failed)
- Work item processing status
- Checkpoint saves for resumption
- Error details with correlation IDs
- Cross-worktree event aggregation for parallel jobs
Event Log Format: Events are stored in JSONL (JSON Lines) format, with each line representing a single event:
{"timestamp":"2024-01-01T12:00:00Z","event_type":"agent_started","agent_id":"agent-1","item_id":"item-001"}
{"timestamp":"2024-01-01T12:05:00Z","event_type":"agent_completed","agent_id":"agent-1","item_id":"item-001","status":"success"}
Viewing Events:
# View all events for a job
prodigy events <job_id>
# Stream events in real-time
prodigy events <job_id> --follow
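Because events are plain JSONL, the log files can also be inspected with standard command-line tools when you need ad-hoc filtering beyond the built-in commands. A minimal sketch, assuming jq is installed; the repository name and job ID are placeholders:
# Count how many agents completed successfully in a job's event logs
cat ~/.prodigy/events/my-repo/mapreduce-1234567890/events-*.jsonl \
  | jq -r 'select(.event_type == "agent_completed") | .item_id' \
  | wc -l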
Checkpoint and Resume
MapReduce workflows automatically save checkpoints to enable resumption after interruption.
Checkpoint Structure
Checkpoints are stored in ~/.prodigy/state/{repo_name}/mapreduce/jobs/{job_id}/ and contain:
{
"job_id": "mapreduce-1234567890",
"workflow_file": "workflow.yml",
"phase": "map",
"items_processed": 45,
"items_total": 100,
"items_remaining": ["item-046", "item-047", "..."],
"successful_items": 43,
"failed_items": 2,
"started_at": "2024-01-01T12:00:00Z",
"last_checkpoint_at": "2024-01-01T12:30:00Z"
}
Resume Behavior
When resuming a MapReduce job:
- Checkpoint Loading: Prodigy loads the most recent checkpoint from ~/.prodigy/state/
- Work Item Recovery: Items marked as “in progress” are reset to “pending”
- Failed Item Handling: Previously failed items are moved to DLQ (not retried automatically)
- Partial Results: Successfully processed items are preserved
- Phase Continuation: Job resumes from the phase it was interrupted in
Resume Command:
# Resume from checkpoint
prodigy resume-job <job_id>
# Resume with different parallelism
prodigy resume-job <job_id> --max-parallel 20
# Resume and show detailed logs
prodigy resume-job <job_id> -v
Dead Letter Queue (DLQ)
Failed work items are automatically stored in the DLQ for review and retry.
DLQ Storage
Failed items are stored in ~/.prodigy/dlq/{repo_name}/{job_id}/ with this structure:
{
"item_id": "item-047",
"item_data": {
"path": "src/module.rs",
"score": 8,
"priority": "high"
},
"failure_reason": "Command failed: cargo test",
"error_details": "test failed: expected X but got Y",
"failed_at": "2024-01-01T12:15:00Z",
"attempt_count": 3,
"correlation_id": "agent-7-item-047"
}
DLQ Retry
The prodigy dlq retry command allows you to reprocess failed items:
# Retry all failed items for a job
prodigy dlq retry <job_id>
# Retry with custom parallelism (default: 5)
prodigy dlq retry <job_id> --max-parallel 10
# Dry run to see what would be retried
prodigy dlq retry <job_id> --dry-run
# Verbose output for debugging
prodigy dlq retry <job_id> -v
DLQ Retry Features:
- Streams items to avoid memory issues with large queues
- Respects original workflow’s max_parallel setting
- Preserves correlation IDs for tracking
- Updates DLQ state (removes successful, keeps failed)
- Supports interruption and resumption
- Retried items inherit original workflow configuration
DLQ Retry Workflow:
- Load failed items from ~/.prodigy/dlq/{repo_name}/{job_id}/
- Process items using original workflow’s agent template
- Successfully processed items are removed from DLQ
- Still-failing items remain in DLQ with updated attempt count
- New failures during retry are logged and added to DLQ
Viewing DLQ Contents
# List all failed items
prodigy dlq list <job_id>
# Show details for specific item
prodigy dlq show <job_id> <item_id>
# Clear DLQ after manual fixes
prodigy dlq clear <job_id>
Command Types
1. Shell Commands
# Simple shell command
- shell: "cargo test"
# With output capture
- shell: "ls -la | wc -l"
capture: "file_count"
# With failure handling
- shell: "cargo clippy"
on_failure:
claude: "/fix-warnings ${shell.output}"
# With timeout
- shell: "cargo bench"
timeout: 600 # seconds
# With conditional execution
- shell: "cargo build --release"
when: "${tests_passed}"
2. Claude Commands
# Simple Claude command
- claude: "/prodigy-analyze"
# With arguments
- claude: "/prodigy-implement-spec ${spec_file}"
# With commit requirement
- claude: "/prodigy-fix-bugs"
commit_required: true
# With output capture
- claude: "/prodigy-generate-plan"
capture: "implementation_plan"
3. Goal-Seeking Commands
Iteratively refine code until a validation threshold is met.
- goal_seek:
goal: "Achieve 90% test coverage"
claude: "/prodigy-coverage --improve"
validate: "cargo tarpaulin --print-summary | grep 'Coverage' | sed 's/.*Coverage=\\([0-9]*\\).*/score: \\1/'"
threshold: 90
max_attempts: 5
timeout_seconds: 300
fail_on_incomplete: true
commit_required: true
Fields:
- goal: Human-readable description
- claude or shell: Command to execute for refinement
- validate: Command that outputs score: N (0-100)
- threshold: Minimum score to consider complete
- max_attempts: Maximum refinement iterations
- timeout_seconds: Optional timeout per attempt
- fail_on_incomplete: Whether to fail workflow if threshold not met
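The same structure works with a shell command as the refinement step. A minimal sketch, where fix-lint.sh and score-lint.sh are hypothetical project scripts (the validate script is assumed to print "score: N"):
- goal_seek:
    goal: "Zero lint warnings"
    shell: "./fix-lint.sh"        # hypothetical auto-fix script
    validate: "./score-lint.sh"   # hypothetical script that prints "score: N"
    threshold: 100
    max_attempts: 3
    fail_on_incomplete: false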
4. Foreach Commands
Iterate over a list with optional parallelism.
- foreach:
input: "find . -name '*.rs' -type f" # Command
# OR
# input: ["file1.rs", "file2.rs"] # List
parallel: 5 # Number of parallel executions (or true/false)
do:
- claude: "/analyze-file ${item}"
- shell: "cargo check ${item}"
continue_on_error: true
max_items: 50
5. Validation Commands
Validate implementation completeness with automatic retry.
- claude: "/implement-auth-spec"
validate:
shell: "debtmap validate --spec auth.md --output result.json"
# DEPRECATED: 'command' field (use 'shell' instead)
result_file: "result.json"
threshold: 95 # Percentage completion required (default: 100.0)
timeout: 60
expected_schema: "validation-schema.json" # Optional JSON schema
# What to do if incomplete
on_incomplete:
claude: "/complete-implementation ${validation.gaps}"
max_attempts: 3
fail_workflow: true
commit_required: true
prompt: "Implementation incomplete. Continue?" # Optional interactive prompt
ValidationConfig Fields:
- shell or claude - Single validation command (use shell, not deprecated command)
- commands - Array of commands for multi-step validation
- result_file - Path to JSON file with validation results
- threshold - Minimum completion percentage (default: 100.0)
- timeout - Timeout in seconds
- expected_schema - JSON schema for validation output structure
OnIncompleteConfig Fields:
- shell or claude - Single gap-filling command
- commands - Array of commands for multi-step gap filling
- max_attempts - Maximum retry attempts
- fail_workflow - Whether to fail workflow if validation incomplete
- commit_required - Whether to require commit after gap filling
- prompt - Optional interactive prompt for user guidance
- retry_original - Whether to retry the original command (default: false). When true, re-executes the original command instead of gap-filling commands
- strategy - Retry strategy configuration (similar to OnFailureConfig strategy)
Alternative: Array format for multi-step validation
- claude: "/implement-feature"
validate:
# When using array format, ValidationConfig uses default threshold (100.0)
# and creates a commands array
- shell: "run-tests.sh"
- shell: "check-coverage.sh"
- claude: "/validate-implementation --output validation.json"
result_file: "validation.json"
Alternative: Multi-step gap filling
- claude: "/implement-feature"
validate:
shell: "validate.sh"
result_file: "result.json"
on_incomplete:
commands:
- claude: "/analyze-gaps ${validation.gaps}"
- shell: "run-fix-script.sh"
- claude: "/verify-fixes"
max_attempts: 2
Alternative: Retry original command on incomplete
- claude: "/implement-auth"
validate:
shell: "validate-auth.sh"
result_file: "result.json"
threshold: 95
on_incomplete:
retry_original: true # Re-run "/implement-auth" instead of gap-filling
max_attempts: 3
fail_workflow: true
Command Reference
Command Fields
All command types support these common fields:
Field | Type | Description |
---|---|---|
id | string | Unique identifier for referencing outputs |
timeout | number | Command timeout in seconds |
commit_required | boolean | Whether command should create a git commit |
when | string | Conditional execution expression |
capture | string | Variable name to capture output (replaces deprecated capture_output: true/false ) |
capture_format | enum | Format: string (default), number , json , lines , boolean (see examples below) |
capture_streams | object | Configure which streams to capture (see CaptureStreams section below) |
on_success | object | Command to run on success |
on_failure | object | OnFailureConfig with nested command, max_attempts, fail_workflow, strategy |
on_exit_code | map | Maps exit codes to full WorkflowStep objects (e.g., 101: {claude: "/fix"} ) |
validate | object | Validation configuration |
output_file | string | Redirect command output to a file |
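As a sketch of how several of these fields combine on a single step (the specific command, timeout, and handler are illustrative; /fix-failing-tests follows the pattern used elsewhere in this guide):
- shell: "cargo test --all"
  id: "tests"
  timeout: 600
  capture: "test_output"
  capture_format: "string"
  commit_required: false
  on_failure:
    claude: "/fix-failing-tests"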
Note: The following fields exist in internal structs but are NOT exposed in WorkflowStepCommand YAML:
- handler - Internal HandlerStep (not user-facing)
- retry - Internal RetryConfig (not user-facing)
- working_dir - Not available (use a shell cd command instead)
- env - Not available (use shell environment syntax: ENV=value command)
- auto_commit - Not in WorkflowStepCommand
- commit_config - Not in WorkflowStepCommand
- step_validate - Not in WorkflowStepCommand
- skip_validation - Not in WorkflowStepCommand
- validation_timeout - Not in WorkflowStepCommand
- ignore_validation_failure - Not in WorkflowStepCommand
CaptureStreams Configuration
The capture_streams field controls which output streams are captured:
- shell: "cargo test"
capture: "test_results"
capture_streams:
stdout: true # Capture standard output (default: true)
stderr: false # Capture standard error (default: false)
exit_code: true # Capture exit code (default: true)
success: true # Capture success boolean (default: true)
duration: true # Capture execution duration (default: true)
Examples:
# Capture only stdout and stderr
- shell: "build.sh"
capture: "build_output"
capture_streams:
stdout: true
stderr: true
exit_code: false
success: false
duration: false
# Capture only timing information
- shell: "benchmark.sh"
capture: "bench_time"
capture_streams:
stdout: false
stderr: false
exit_code: false
success: false
duration: true
Capture Format Examples
The capture_format field controls how captured output is parsed:
# String format (default) - raw text output
- shell: "git rev-parse HEAD"
capture: "commit_hash"
capture_format: "string"
# Number format - parses numeric output
- shell: "wc -l < file.txt"
capture: "line_count"
capture_format: "number"
# JSON format - parses JSON output
- shell: "cargo metadata --format-version 1"
capture: "project_metadata"
capture_format: "json"
# Lines format - splits output into array of lines
- shell: "git diff --name-only"
capture: "changed_files"
capture_format: "lines"
# Boolean format - true if command succeeds, false otherwise
- shell: "grep -q 'pattern' file.txt"
capture: "pattern_found"
capture_format: "boolean"
Deprecated Fields
These fields are deprecated but still supported for backward compatibility:
- test: - Use shell: with on_failure: instead
- command: in ValidationConfig - Use shell: instead
- Nested commands: in agent_template and reduce - Use direct array format instead
- Legacy variable aliases ($ARG, $ARGUMENT, $FILE, $FILE_PATH) - Use modern ${item.*} syntax
Migration: capture_output to capture
Old syntax (deprecated):
- shell: "ls -la | wc -l"
capture_output: true
New syntax (recommended):
- shell: "ls -la | wc -l"
capture: "file_count"
The modern capture field allows you to specify a variable name, making output references clearer and more maintainable.
Variable Interpolation
Overview
Prodigy provides two complementary variable systems:
- Built-in Variables: Automatically available based on workflow context (workflow state, step info, work items, etc.)
- Custom Captured Variables: User-defined variables created via the capture: field in commands

Both systems use the same ${variable.name} interpolation syntax and can be freely mixed in your workflows.
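For example, a built-in variable and a custom captured variable can appear in the same command. A minimal sketch (the short_sha name is just an illustrative capture name):
- shell: "git rev-parse --short HEAD"
  capture: "short_sha"
- shell: "echo 'Running ${workflow.name} at commit ${short_sha}'"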
Variable Availability by Phase
Variable Category | Setup | Map | Reduce | Merge |
---|---|---|---|---|
Standard Variables | ✓ | ✓ | ✓ | ✓ |
Output Variables | ✓ | ✓ | ✓ | ✓ |
Item Variables (${item.*} ) | ✗ | ✓ | ✗ | ✗ |
Map Aggregation (${map.total} , etc.) | ✗ | ✗ | ✓ | ✗ |
Merge Variables | ✗ | ✗ | ✗ | ✓ |
Custom Captured Variables | ✓ | ✓ | ✓ | ✓ |
Available Variables
Standard Variables
- ${workflow.name} - Workflow name
- ${workflow.id} - Workflow unique identifier
- ${workflow.iteration} - Current iteration number
- ${step.name} - Current step name
- ${step.index} - Current step index
- ${step.files_changed} - Files changed in current step
- ${workflow.files_changed} - All files changed in workflow
Output Variables
Primary Output Variables:
- ${shell.output} - Output (stdout) from last shell command
- ${claude.output} - Output from last Claude command
- ${last.output} - Output from last executed command (any type)
- ${last.exit_code} - Exit code from last command

Note: ${shell.output} is the correct variable name for shell command output. The code uses shell.output, not shell.stdout.

Legacy/Specialized Output Variables:
- ${handler.output} - Output from handler command (used in error handling)
- ${test.output} - Output from test command (used in validation)
- ${goal_seek.output} - Output from goal-seeking command

Best Practice: For most workflows, use custom capture variables (via the capture: field) instead of relying on these automatic output variables. This provides explicit naming and better readability.
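As a sketch of the difference (the /tag-release command is hypothetical and only illustrates the reference):
# Relying on the automatic output variable
- shell: "git rev-parse HEAD"
- claude: "/tag-release ${shell.output}"

# Recommended: explicit capture with a descriptive name
- shell: "git rev-parse HEAD"
  capture: "commit_hash"
- claude: "/tag-release ${commit_hash}"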
MapReduce Variables
Map Phase Variables (available in agent_template: commands):
- ${item} - Current work item in map phase (scope: map phase only)
- ${item.value} - Value of current item (for simple items)
- ${item.path} - Path field of current item
- ${item.name} - Name field of current item
- ${item.*} - Access any item field using wildcard pattern (e.g., ${item.id}, ${item.priority})
- ${item_index} - Index of current item in the list
- ${item_total} - Total number of items being processed
- ${map.key} - Current map key
- ${worker.id} - ID of the current worker agent

Reduce Phase Variables (available in reduce: commands):
- ${map.total} - Total items processed across all map agents
- ${map.successful} - Number of successfully processed items
- ${map.failed} - Number of failed items
- ${map.results} - Aggregated results from all map agents (JSON array)

Note: ${item} and related item variables are only available within the map phase. The aggregation variables (${map.total}, ${map.successful}, ${map.failed}, ${map.results}) are only available in the reduce phase.
Merge Variables
- ${merge.worktree} - Worktree name
- ${merge.source_branch} - Source branch
- ${merge.target_branch} - Target branch
- ${merge.session_id} - Session ID
Validation Variables
- ${validation.completion} - Completion percentage
- ${validation.completion_percentage} - Completion percentage (numeric)
- ${validation.implemented} - List of implemented features
- ${validation.missing} - Missing requirements
- ${validation.gaps} - Gap details
- ${validation.status} - Status (complete/incomplete/failed)
Git Context Variables
- ${step.commits} - Commits in current step (array of commit objects)
- ${workflow.commits} - All workflow commits (array of commit objects)
Note: These are arrays of commit data. Use in foreach loops or access individual commits with array indexing. Each commit object contains fields like hash, message, timestamp, etc.
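For instance, a hypothetical sketch that hands the commit list to a summarization step, assuming the array interpolates as JSON when quoted (the /summarize-commits command is illustrative, not a built-in):
- claude: "/summarize-commits '${workflow.commits}'"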
Legacy Variable Aliases
These legacy aliases are supported for backward compatibility but should be replaced with modern equivalents:
- $ARG / $ARGUMENT - Legacy aliases for ${item.value} (available in WithArguments mode)
- $FILE / $FILE_PATH - Legacy aliases for ${item.path} (available in WithFilePattern mode)

Note: Use the modern ${item.*} syntax in new workflows instead of legacy aliases.
Custom Variable Capture
Custom capture variables allow you to save command output with explicit names for later use. This is the recommended approach for most workflows instead of relying on automatic output variables.
Basic Capture Examples
# Capture to custom variable
- shell: "ls -la | wc -l"
capture: "file_count"
capture_format: number # Default: string
# Use in next command
- shell: "echo 'Found ${file_count} files'"
Capture Formats
The capture_format field determines how output is parsed and stored:
# String format (default) - stores raw output
- shell: "echo 'Hello World'"
capture: "greeting"
capture_format: string
# Access: ${greeting} → "Hello World"
# Number format - parses numeric output
- shell: "echo 42"
capture: "answer"
capture_format: number
# Access: ${answer} → 42 (as number, not string)
# Boolean format - converts to true/false
- shell: "[ -f README.md ] && echo true || echo false"
capture: "has_readme"
capture_format: boolean
# Access: ${has_readme} → true or false
# JSON format - parses JSON output
- shell: "echo '{\"name\": \"project\", \"version\": \"1.0\"}'"
capture: "package_info"
capture_format: json
# Access nested fields: ${package_info.name} → "project"
# Access nested fields: ${package_info.version} → "1.0"
# Lines format - splits into array by newlines
- shell: "ls *.md"
capture: "markdown_files"
capture_format: lines
# Access: ${markdown_files} → array of filenames
Capture Streams
Control which output streams to capture (useful for detailed command analysis):
# Capture specific streams
- shell: "cargo test 2>&1"
capture: "test_results"
capture_streams:
stdout: true # Default: true
stderr: true # Default: false
exit_code: true # Default: true
success: true # Default: true
duration: true # Default: true
# Access captured stream data
- shell: "echo 'Exit code: ${test_results.exit_code}'"
- shell: "echo 'Success: ${test_results.success}'"
- shell: "echo 'Duration: ${test_results.duration}s'"
Default Behavior: By default, stdout, exit_code, success, and duration are captured (all true). Set stderr: true to also capture error output.
Nested JSON Field Access
For JSON-formatted captures, use dot notation to access nested fields:
# Example: API response with nested data
- shell: "curl -s https://api.example.com/user/123"
capture: "user"
capture_format: json
# Access nested fields with dot notation
- shell: "echo 'Name: ${user.profile.name}'"
- shell: "echo 'Email: ${user.contact.email}'"
- shell: "echo 'City: ${user.address.city}'"
Variable Scope and Precedence
Variables follow a parent/child scope hierarchy:
- Local Scope: Variables defined in the current command block
- Parent Scope: Variables from enclosing blocks (foreach, map phase, etc.)
- Built-in Variables: Standard workflow context variables
Precedence: Local variables override parent scope variables, which override built-in variables.
# Parent scope
- shell: "echo 'outer'"
capture: "message"
# Child scope (foreach creates new scope)
- foreach:
items: [1, 2, 3]
commands:
# This creates a local 'message' that shadows parent
- shell: "echo 'inner-${item}'"
capture: "message"
- shell: "echo ${message}" # Uses local 'message'
# After foreach, parent 'message' is still accessible
- shell: "echo ${message}" # Uses parent 'message' → "outer"
Environment Configuration
Prodigy provides flexible environment configuration for workflows, allowing you to manage environment variables, secrets, profiles, and step-specific settings. This chapter explains the user-facing configuration options available in workflow YAML files.
Architecture Overview
Prodigy uses a two-layer architecture for environment management:
- WorkflowConfig: User-facing YAML configuration with env, secrets, profiles, and env_files fields
- EnvironmentConfig: Internal runtime configuration that extends workflow config with additional features

This chapter documents the user-facing WorkflowConfig layer - what you write in your workflow YAML files.

Note on Internal Features: The EnvironmentConfig runtime layer includes a StepEnvironment struct with fields like env, working_dir, clear_env, and temporary. These are internal implementation details not exposed in WorkflowStepCommand YAML syntax. Per-command environment changes must use shell syntax (e.g., ENV=value command).
Global Environment Variables
Define static environment variables that apply to all commands in your workflow:
# Global environment variables (static strings only)
env:
NODE_ENV: production
PORT: "3000"
API_URL: https://api.example.com
DEBUG: "false"
commands:
- shell: "echo $NODE_ENV" # Uses global environment
Important: The env field at the workflow level only supports static string values. Dynamic or conditional environment variables are handled internally by the runtime but are not directly exposed in workflow YAML.
Environment Inheritance: Parent process environment variables are always inherited by default. All global environment variables are merged with the parent environment.
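A small sketch of this inheritance behavior, assuming the workflow is launched from a shell where CI=true is already set (the CI variable is just an example of an inherited value):
env:
  NODE_ENV: production
commands:
  - shell: "echo $CI $NODE_ENV"   # prints "true production" - $CI comes from the parent shell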
MapReduce Environment Variables
In MapReduce workflows, environment variables provide powerful parameterization across all phases (setup, map, reduce, and merge). This enables workflows to be reusable across different projects and configurations.
Defining Environment Variables in MapReduce
Environment variables for MapReduce workflows follow the same global env field structure:
name: mapreduce-workflow
mode: mapreduce
env:
# Plain variables for parameterization
PROJECT_NAME: "prodigy"
PROJECT_CONFIG: "prodigy.yml"
FEATURES_PATH: "specs/"
OUTPUT_DIR: "results"
# Workflow-specific settings
MAX_RETRIES: "3"
TIMEOUT_SECONDS: "300"
setup:
- shell: "echo Starting $PROJECT_NAME workflow"
- shell: "mkdir -p $OUTPUT_DIR"
map:
input: "${FEATURES_PATH}/items.json"
agent_template:
- claude: "/process '${item.name}' --config $PROJECT_CONFIG"
- shell: "test -f $PROJECT_NAME/${item.path}"
reduce:
- shell: "echo Processed ${map.total} items for $PROJECT_NAME"
- shell: "cp results.json $OUTPUT_DIR/"
merge:
commands:
- shell: "echo Merging $PROJECT_NAME changes"
Variable Interpolation in MapReduce
MapReduce workflows support two interpolation syntaxes:
- $VAR - Shell-style variable reference
- ${VAR} - Bracketed reference (recommended for clarity)
Both syntaxes work throughout all workflow phases:
env:
PROJECT_NAME: "myproject"
CONFIG_FILE: "config.yml"
setup:
- shell: "echo $PROJECT_NAME" # Shell-style
- shell: "echo ${PROJECT_NAME}" # Bracketed
- shell: "test -f ${CONFIG_FILE}" # Recommended in paths
map:
agent_template:
- claude: "/analyze '${item}' --project $PROJECT_NAME"
Environment Variables in All MapReduce Phases
Setup Phase
Environment variables are available for initialization:
env:
WORKSPACE_DIR: "/tmp/workspace"
INPUT_SOURCE: "https://api.example.com/data"
setup:
- shell: "mkdir -p $WORKSPACE_DIR"
- shell: "curl $INPUT_SOURCE -o items.json"
- shell: "echo Setup complete for ${WORKSPACE_DIR}"
Map Phase
Variables are available in agent templates:
env:
PROJECT_ROOT: "/path/to/project"
CONFIG_PATH: "config/settings.yml"
map:
agent_template:
- claude: "/analyze ${item.file} --config $CONFIG_PATH"
- shell: "test -f $PROJECT_ROOT/${item.file}"
- shell: "cp ${item.file} $OUTPUT_DIR/"
Reduce Phase
Variables work in aggregation commands:
env:
PROJECT_NAME: "myproject"
REPORT_DIR: "reports"
reduce:
- claude: "/summarize ${map.results} --project $PROJECT_NAME"
- shell: "mkdir -p $REPORT_DIR"
- shell: "cp summary.json $REPORT_DIR/${PROJECT_NAME}-summary.json"
Merge Phase
Variables are available during merge operations:
env:
PROJECT_NAME: "myproject"
NOTIFY_WEBHOOK: "https://hooks.example.com/notify"
merge:
commands:
- shell: "echo Merging $PROJECT_NAME changes"
- claude: "/validate-merge --branch ${merge.source_branch}"
- shell: "curl -X POST $NOTIFY_WEBHOOK -d 'project=$PROJECT_NAME'"
Secrets in MapReduce
Sensitive data can be marked as secrets to enable automatic masking:
env:
PROJECT_NAME: "myproject"
secrets:
API_TOKEN:
provider: env
key: "GITHUB_TOKEN"
WEBHOOK_SECRET:
provider: file
key: "~/.secrets/webhook.key"
setup:
- shell: "curl -H 'Authorization: Bearer $API_TOKEN' https://api.github.com/repos"
map:
agent_template:
- shell: "curl -X POST $WEBHOOK_URL -H 'X-Secret: $WEBHOOK_SECRET'"
Secrets are automatically masked in:
- Command output logs
- Error messages
- Event logs
- Checkpoint files
Profile Support in MapReduce
Profiles enable different configurations for different environments:
env:
PROJECT_NAME: "myproject"
profiles:
development:
description: "Development environment"
API_URL: "http://localhost:3000"
DEBUG: "true"
TIMEOUT_SECONDS: "60"
production:
description: "Production environment"
API_URL: "https://api.example.com"
DEBUG: "false"
TIMEOUT_SECONDS: "300"
map:
agent_template:
- shell: "curl $API_URL/data"
- shell: "timeout ${TIMEOUT_SECONDS}s ./process.sh"
Activate a profile:
prodigy run workflow.yml --profile production
Reusable Workflows with Environment Variables
Environment variables enable the same workflow to work for different projects:
# This workflow works for both Prodigy and Debtmap
name: check-book-docs-drift
mode: mapreduce
env:
# Override these when running the workflow
PROJECT_NAME: "prodigy" # or "debtmap"
PROJECT_CONFIG: "prodigy.yml" # or "debtmap.yml"
FEATURES_PATH: "specs/" # or "features/"
setup:
- shell: "echo Checking $PROJECT_NAME documentation"
- shell: "./${PROJECT_NAME} generate-book-items --output items.json"
map:
input: "items.json"
agent_template:
- claude: "/check-drift '${item}' --config $PROJECT_CONFIG"
- shell: "git diff --exit-code ${item.doc_path}"
reduce:
- claude: "/summarize-drift ${map.results} --project $PROJECT_NAME"
Run for different projects:
# For Prodigy
prodigy run workflow.yml
# For Debtmap
env PROJECT_NAME=debtmap PROJECT_CONFIG=debtmap.yml FEATURES_PATH=features/ \
prodigy run workflow.yml
Best Practices for MapReduce Environment Variables
- Parameterize project-specific values:

  env:
    PROJECT_NAME: "myproject"
    PROJECT_ROOT: "/workspace"
    CONFIG_FILE: "config.yml"

- Use consistent naming:

  - Use UPPER_CASE for environment variables
  - Use descriptive names (PROJECT_NAME not PN)
  - Group related variables with prefixes (AWS_, DB_)

- Document variables:

  env:
    # Project identifier used in logs and reports
    PROJECT_NAME: "prodigy"
    # Path to project configuration file
    PROJECT_CONFIG: "prodigy.yml"
    # Maximum concurrent agents (tune based on resources)
    MAX_PARALLEL: "10"

- Use secrets for sensitive data:

  secrets:
    GITHUB_TOKEN:
      provider: env
      key: "GH_TOKEN"

- Prefer ${VAR} syntax:

  # Good - explicit and safe
  - shell: "test -f ${CONFIG_PATH}"
  # Risky - may fail with special characters
  - shell: "test -f $CONFIG_PATH"
Environment Files
Load environment variables from .env files:
# Environment files to load
env_files:
- .env
- .env.local
- config/.env.production
commands:
- shell: "echo $DATABASE_URL"
Environment File Format:
Environment files use the standard .env format with KEY=value pairs:
# .env file example
DATABASE_URL=postgresql://localhost:5432/mydb
REDIS_HOST=localhost
REDIS_PORT=6379
# Comments are supported
API_KEY=secret-key-here
# Multi-line values use quotes
PRIVATE_KEY="-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBg...
-----END PRIVATE KEY-----"
Loading Order and Precedence:
- Files are loaded in the order specified in env_files
- Later files override earlier files
- Step-level env overrides environment files
- Global env overrides environment files
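A small sketch of that ordering (the file contents shown in comments and the variable values are illustrative):
env_files:
  - .env             # contains DATABASE_URL=postgres://localhost/dev
  - .env.production  # loaded second, so its DATABASE_URL wins over .env
env:
  DATABASE_URL: postgres://global-host/app   # global env wins over both files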
Secrets Management
Store sensitive values securely using secret providers:
secrets:
# Provider-based secrets (recommended)
AWS_SECRET:
provider: aws
key: "my-app/api-key"
VAULT_SECRET:
provider: vault
key: "secret/data/myapp"
version: "v2" # Optional version
# Environment variable reference
API_KEY:
provider: env
key: "SECRET_API_KEY"
# File-based secret
DB_PASSWORD:
provider: file
key: "~/.secrets/db.pass"
# Custom provider (extensible)
CUSTOM_SECRET:
provider:
custom: "my-custom-provider"
key: "secret-id"
commands:
- shell: "echo $API_KEY" # Secrets are available as environment variables
Supported Secret Providers:
- env - Reference another environment variable
- file - Read secret from a file
- vault - HashiCorp Vault integration (requires Vault setup)
- aws - AWS Secrets Manager (requires AWS credentials)
- custom - Custom provider (extensible for your own secret backends)
Security Notes:
- Secrets are masked in logs and output
- Secret values are only resolved at runtime
- Use secrets for API keys, passwords, tokens, and other sensitive data
Environment Profiles
Define named environment configurations for different contexts:
# Define profiles with environment variables
profiles:
development:
description: "Development environment with debug enabled"
NODE_ENV: development
DEBUG: "true"
API_URL: http://localhost:3000
production:
description: "Production environment configuration"
NODE_ENV: production
DEBUG: "false"
API_URL: https://api.example.com
# Global environment still applies
env:
APP_NAME: "my-app"
commands:
- shell: "npm run build"
Profile Structure:
Profiles use a flat structure where environment variables are defined directly at the profile level (not nested under an env: key). The description field is optional and helps document the profile’s purpose.
profiles:
staging:
description: "Staging environment" # Optional
NODE_ENV: staging # Direct key-value pairs
API_URL: https://staging.api.com
DEBUG: "true"
Note: Profile activation is managed internally by the runtime environment manager. The selection mechanism is not currently exposed in WorkflowConfig YAML. Profiles are defined for future use and internal runtime configuration.
Per-Command Environment Overrides
While WorkflowStepCommand does not support a dedicated env field, you can override environment variables for individual commands using shell syntax:
env:
RUST_LOG: info
API_URL: "https://api.example.com"
# Steps go directly in the workflow
- shell: "cargo run" # Uses RUST_LOG=info from global env
# Override environment for this command only using shell syntax
- shell: "RUST_LOG=debug cargo run --verbose"
# Change directory and set environment in shell
- shell: "cd frontend && PATH=./node_modules/.bin:$PATH npm run build"
Shell-based Environment Techniques:
- Single variable override: ENV_VAR=value command
- Multiple variables: VAR1=value1 VAR2=value2 command
- Change directory: cd path && command
- Combine both: cd path && ENV_VAR=value command
Note: A StepEnvironment struct exists in the internal runtime (EnvironmentConfig), but it is not currently exposed in the WorkflowStepCommand YAML syntax. All per-command environment changes must use shell syntax as shown above.
Environment Precedence
Environment variables are resolved in the following order (highest to lowest precedence):
- Active profile - If a profile is activated (internal runtime feature)
- Global env - Defined at workflow level in WorkflowConfig
- Environment files - Loaded from env_files (in order)
- Parent environment - Always inherited from the parent process
Example demonstrating precedence:
# Parent environment: NODE_ENV=local
env_files:
- .env # Contains: NODE_ENV=development
env:
NODE_ENV: production # Overrides .env file
profiles:
test:
NODE_ENV: test # Would override global env if profile activation were exposed
description: "Test environment profile"
# Steps go directly in the workflow
- shell: "echo $NODE_ENV" # Prints: production (from global env)
# Override using shell syntax
- shell: "NODE_ENV=staging echo $NODE_ENV" # Prints: staging
Note: Profile activation is currently managed internally and not exposed in WorkflowConfig YAML.
Best Practices
1. Use Environment Files for Configuration
Store configuration in .env files instead of hardcoding in YAML:
# Good: Load from files
env_files:
- .env
- .env.${ENVIRONMENT}
# Avoid: Hardcoding sensitive values
env:
API_KEY: "hardcoded-key-here" # Don't do this!
2. Use Secrets for Sensitive Data
Always use the secrets field for sensitive information:
# Good: Use secrets provider
secrets:
DATABASE_PASSWORD:
provider: vault
key: "db/password"
# Bad: Sensitive data in plain env
env:
DATABASE_PASSWORD: "my-password" # Don't do this!
3. Leverage Profiles for Environments
Define profiles for different deployment environments:
profiles:
development:
NODE_ENV: development
LOG_LEVEL: debug
API_URL: http://localhost:3000
production:
NODE_ENV: production
LOG_LEVEL: error
API_URL: https://api.example.com
4. Use Shell Syntax for Command-Specific Overrides
Override global settings for specific commands using shell environment syntax:
env:
RUST_LOG: info
# Steps go directly in the workflow
# Most commands use info level
- shell: "cargo run"
# Override for this specific command using shell syntax
- shell: "RUST_LOG=debug cargo run --verbose"
5. Document Your Environment Variables
Add comments to explain environment variables:
env:
# Number of worker threads (adjust based on CPU cores)
WORKER_COUNT: "4"
# API rate limit (requests per minute)
RATE_LIMIT: "1000"
# Feature flags
ENABLE_BETA_FEATURES: "false"
Common Patterns
Multi-Environment Workflows
# Load environment-specific configuration
env_files:
- .env.${ENVIRONMENT}
env:
APP_NAME: "my-app"
commands:
- shell: "npm run deploy"
Secrets with Fallbacks
secrets:
# Try Vault first, fall back to environment variable
API_KEY:
provider: vault
key: "api/key"
env:
# Fallback for local development
API_KEY: "${API_KEY:-default-key}"
Build Matrix with Profiles
profiles:
debug:
CARGO_PROFILE: debug
RUST_BACKTRACE: "1"
release:
CARGO_PROFILE: release
RUST_BACKTRACE: "0"
commands:
- shell: "cargo build --profile ${CARGO_PROFILE}"
Temporary Environment Changes
# Steps go directly in the workflow
# Set PATH for this command only using shell syntax
- shell: "cd frontend && PATH=./node_modules/.bin:$PATH ./node_modules/.bin/webpack"
# PATH is back to normal for subsequent commands
- shell: "echo $PATH"
Configuration
Note: This chapter is currently being auto-generated by the book-docs-drift workflow. Content will be populated based on the latest codebase implementation.
Overview
Prodigy can be configured through multiple configuration files with a clear precedence hierarchy.
Topics to be Documented
- Configuration file locations and search order
- Configuration precedence rules
- Claude-specific settings (model, tokens)
- Worktree settings (parallelism, cleanup)
- Storage paths (events, state, DLQ)
- Global retry defaults
- Environment-specific configurations
This chapter will be automatically updated by running the book documentation workflow.
Advanced Features
This chapter covers advanced workflow features for building sophisticated automation pipelines. These features enable conditional execution, parallel processing, validation, and complex control flow.
Conditional Execution
Control when commands execute based on expressions or previous command results.
Expression-Based Conditions
Use the when field to conditionally execute commands based on variable values:
# Execute only when variable is true
- shell: "cargo build --release"
when: "${tests_passed}"
# Execute based on complex expression
- shell: "deploy.sh"
when: "${environment == 'production' && tests_passed}"
On Success Handlers
Execute follow-up commands when a command succeeds:
- shell: "cargo test"
on_success:
shell: "cargo bench"
On Failure Handlers
Handle failures with automatic remediation:
- shell: "cargo clippy"
on_failure:
claude: "/fix-warnings"
max_attempts: 3
fail_workflow: false
The on_failure configuration supports:
- max_attempts: Maximum retry attempts (default: 1)
- fail_workflow: Whether to fail entire workflow on final failure (default: true)
Nested Conditionals
Chain multiple levels of conditional execution:
- shell: "cargo check"
on_success:
shell: "cargo build --release"
on_success:
shell: "cargo test --release"
on_failure:
claude: "/debug-failures '${shell.output}'"
Output Capture and Variable Management
Capture command output in different formats for use in subsequent steps.
Capture Variable
Capture output to a named variable using the capture_output field:
# Capture as string (backward compatible)
- shell: "git rev-parse HEAD"
capture_output: "commit_hash"
# Reference in later steps
- shell: "echo 'Commit: ${commit_hash}'"
Capture Formats
Control how output is parsed with capture_format:
# String (default) - trimmed output as single string
- shell: "git rev-parse HEAD"
capture_output: "commit_hash"
capture_format: "string"
# Number - parse output as number
- shell: "wc -l < file.txt"
capture_output: "line_count"
capture_format: "number"
# JSON - parse output as JSON object
- shell: "cargo metadata --format-version 1"
capture_output: "metadata"
capture_format: "json"
# Lines - split output into array of lines
- shell: "find . -name '*.rs'"
capture_output: "rust_files"
capture_format: "lines"
# Boolean - parse "true"/"false" as boolean
- shell: "test -f README.md && echo true || echo false"
capture_output: "has_readme"
capture_format: "boolean"
Stream Capture Control
Control which streams to capture using capture_streams:
# Capture only stdout (default)
- shell: "cargo build"
capture_output: "build_log"
capture_streams: "stdout"
# Capture only stderr
- shell: "cargo test"
capture_output: "errors"
capture_streams: "stderr"
# Capture both streams
- shell: "npm install"
capture_output: "full_output"
capture_streams: "both"
Output File Redirection
Write command output directly to a file:
# Redirect output to file
- shell: "cargo test --verbose"
output_file: "test-results.txt"
# File is written to working directory
# Can be combined with capture_output to save and use output
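As the comments note, file redirection and capture can be combined. A sketch under those assumptions (the file name and the follow-up grep are illustrative):
- shell: "cargo test --verbose"
  output_file: "test-results.txt"
  capture_output: "test_log"
- shell: "grep -c 'test result' test-results.txt"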
Step Identification
Assign unique IDs to steps for referencing their outputs:
- shell: "cargo test"
id: "test-step"
capture_output: "test_results"
# Reference step output by ID
- shell: "echo 'Tests: ${test-step.output}'"
Timeout Configuration
Set execution timeouts at the command level:
# Command-level timeout (in seconds)
- shell: "cargo bench"
timeout: 600 # 10 minutes
# Timeout for long-running operations
- claude: "/analyze-codebase"
timeout: 1800 # 30 minutes
Note: Timeouts are only supported at the individual command level, not for MapReduce agents.
Implementation Validation
Validate that implementations meet requirements using the validate field.
Basic Validation
Run validation commands after a step completes:
- claude: "/implement-feature"
validate:
shell: "cargo test"
threshold: 100 # Require 100% completion
Validation with Claude
Use Claude to validate implementation quality:
- shell: "generate-code.sh"
validate:
claude: "/verify-implementation"
threshold: 95
Multi-Step Validation
Run multiple validation commands in sequence:
- claude: "/refactor"
validate:
commands:
- shell: "cargo test"
- shell: "cargo clippy"
- shell: "cargo fmt --check"
threshold: 100
Validation with Result Files
Read validation results from a file instead of stdout:
- claude: "/implement-feature"
validate:
shell: "run-validator.sh"
result_file: "validation-results.json"
threshold: 95
Handling Incomplete Implementations
Automatically remediate when validation fails:
- claude: "/implement-spec"
validate:
shell: "check-completeness.sh"
threshold: 100
on_incomplete:
claude: "/fill-gaps"
max_attempts: 3
fail_workflow: true
The on_incomplete configuration supports:
- claude: Claude command to execute for gap-filling
- shell: Shell command to execute for gap-filling
- commands: Array of commands to execute
- max_attempts: Maximum remediation attempts (default: 1)
- fail_workflow: Whether to fail workflow if remediation fails (default: true)
- commit_required: Whether remediation command should create a commit (default: false)
Parallel Iteration with Foreach
Process multiple items in parallel using the foreach command.
Basic Foreach
Iterate over a list of items:
- foreach:
foreach: ["a", "b", "c"]
do:
- shell: "process ${item}"
Dynamic Item Lists
Generate items from a command:
- foreach:
foreach: "find . -name '*.rs'"
do:
- shell: "rustfmt ${item}"
Parallel Execution
Control parallelism with the parallel field:
- foreach:
foreach: "ls *.txt"
parallel: 5 # Process 5 items concurrently
do:
- shell: "analyze ${item}"
Error Handling
Continue processing remaining items on failure:
- foreach:
foreach: ["test1", "test2", "test3"]
continue_on_error: true
do:
- shell: "run-test ${item}"
Limiting Items
Process only a subset of items:
- foreach:
foreach: "find . -name '*.log'"
max_items: 10 # Process first 10 items only
do:
- shell: "compress ${item}"
Goal-Seeking Operations
Iteratively refine implementations until they meet validation criteria.
Basic Goal Seek
Define a goal and validation command:
- goal_seek:
goal: "All tests pass"
command: "cargo fix"
validate: "cargo test"
threshold: 100
The goal-seeking operation will:
- Run the command
- Run the validation
- Retry if validation threshold not met
- Stop when goal achieved or max attempts reached
Advanced Goal Seek Configuration
Control iteration behavior:
- goal_seek:
goal: "Code passes all quality checks"
command: "auto-fix.sh"
validate: "quality-check.sh"
threshold: 95
max_attempts: 5
timeout: 300
fail_on_incomplete: true
Best Practices
1. Use Meaningful Variable Names
# Good
- shell: "cargo test --format json"
capture_output: "test_results"
capture_format: "json"
# Avoid
- shell: "cargo test --format json"
capture_output: "x"
2. Set Appropriate Timeouts
# Set timeouts for potentially long-running operations
- shell: "npm install"
timeout: 300
- claude: "/analyze-large-codebase"
timeout: 1800
3. Handle Failures Gracefully
# Provide automatic remediation
- shell: "cargo test"
on_failure:
claude: "/fix-failing-tests"
max_attempts: 2
fail_workflow: true
4. Validate Critical Changes
# Ensure implementation meets requirements
- claude: "/implement-feature"
validate:
commands:
- shell: "cargo test"
- shell: "cargo clippy -- -D warnings"
threshold: 100
on_incomplete:
claude: "/fix-issues"
max_attempts: 3
5. Use Step IDs for Complex Workflows
# Make output references explicit
- shell: "git diff --stat"
id: "git-changes"
capture_output: "diff"
- claude: "/review-changes '${git-changes.output}'"
id: "code-review"
Common Patterns
Test-Fix-Verify Loop
- shell: "cargo test"
on_failure:
claude: "/fix-tests"
on_success:
shell: "cargo test --release"
Parallel Processing with Aggregation
- foreach:
foreach: "find src -name '*.rs'"
parallel: 10
do:
- shell: "analyze-file ${item}"
capture_output: "analysis_${item}"
- shell: "aggregate-results.sh"
Gradual Quality Improvement
- goal_seek:
goal: "Code quality score above 90"
command: "auto-improve.sh"
validate: "quality-check.sh"
threshold: 90
max_attempts: 5
on_success:
shell: "git commit -m 'Improved code quality'"
Conditional Deployment
- shell: "cargo test"
capture_output: "test_results"
capture_format: "json"
- shell: "deploy.sh"
when: "${test_results.passed == test_results.total}"
on_success:
shell: "notify-success.sh"
on_failure:
shell: "rollback.sh"
Multi-Stage Validation
- claude: "/implement-feature"
validate:
commands:
- shell: "cargo build"
- shell: "cargo test"
- shell: "cargo clippy"
- shell: "cargo fmt --check"
threshold: 100
on_incomplete:
commands:
- claude: "/fix-build-errors"
- shell: "cargo fmt"
max_attempts: 3
fail_workflow: true
Workflow Composition
Note: This chapter is currently being auto-generated by the book-docs-drift workflow. Content will be populated based on the latest codebase implementation.
Overview
Prodigy supports composing complex workflows from reusable templates and importing shared configurations.
Topics to be Documented
- Importing workflows with path and alias
- Defining reusable templates with parameters
- Extending base workflows
- Using templates with parameter substitution
- Template parameter type validation
This chapter will be automatically updated by running the book documentation workflow.
Retry Configuration
Note: This chapter is currently being auto-generated by the book-docs-drift workflow. Content will be populated based on the latest codebase implementation.
Overview
Prodigy provides sophisticated retry mechanisms with multiple backoff strategies to handle transient failures gracefully.
Topics to be Documented
- Workflow-level retry defaults
- Per-command retry configuration
- Backoff strategies (exponential, fibonacci, linear, fixed)
- Retry budget and time limits
- Conditional retry with error type filtering
- Jitter for distributed systems
This chapter will be automatically updated by running the book documentation workflow.
Error Handling
Prodigy provides comprehensive error handling at both the workflow level (for MapReduce jobs) and the command level (for individual workflow steps). This chapter covers the practical features available for handling failures gracefully.
Command-Level Error Handling
Command-level error handling allows you to specify what happens when a single workflow step fails. Use the on_failure configuration to define recovery, cleanup, or fallback strategies.
Simple Forms
For basic error handling, use the simplest form that meets your needs:
# Ignore errors - don't fail the workflow
- shell: "optional-cleanup.sh"
on_failure: true
# Single recovery command (shell or claude)
- shell: "npm install"
on_failure: "npm cache clean --force"
- shell: "cargo clippy"
on_failure: "/fix-warnings"
# Multiple recovery commands
- shell: "build-project"
on_failure:
- "cleanup-artifacts"
- "/diagnose-build-errors"
- "retry-build"
Advanced Configuration
For more control over error handling behavior:
- shell: "cargo clippy"
on_failure:
claude: "/fix-warnings ${shell.output}"
fail_workflow: false # Continue workflow even if handler fails
max_attempts: 3 # Retry original command up to 3 times (alias: max_retries)
Available Fields:
- shell - Shell command to run on failure
- claude - Claude command to run on failure
- fail_workflow - Whether to fail the entire workflow (default: false)
- max_attempts - Maximum retry attempts for the original command (default: 1, alias: max_retries)
Notes:
- If max_attempts > 1, Prodigy will retry the original command after running the failure handler
- You can specify both shell and claude commands - they will execute in sequence (see the sketch below)
- By default, having a handler means the workflow continues even if the step fails
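As a minimal sketch of a handler that combines both command types and then retries (the command names are illustrative):
- shell: "cargo build"
  on_failure:
    shell: "cargo clean"           # recovery step, runs first
    claude: "/fix-build-errors"    # runs second, in sequence
    max_attempts: 2                # then the original command is retried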
Detailed Handler Configuration
For complex error handling scenarios with multiple commands and fine-grained control:
- shell: "deploy-production"
on_failure:
strategy: recovery # Options: recovery, fallback, cleanup, custom
timeout: 300 # Handler timeout in seconds
handler_failure_fatal: true # Fail workflow if handler fails
fail_workflow: false # Don't fail workflow if step fails
commands:
- shell: "rollback-deployment"
continue_on_error: true
- claude: "/analyze-deployment-failure"
- shell: "notify-team"
Handler Strategies:
- recovery - Try to fix the problem and retry (default)
- fallback - Use an alternative approach
- cleanup - Clean up resources
- custom - Custom handler logic
Handler Command Fields:
- shell or claude - The command to execute
- continue_on_error - Continue to the next handler command even if this one fails
Success Handling
Execute commands when a step succeeds:
- shell: "deploy-staging"
on_success:
shell: "notify-success"
claude: "/update-deployment-docs"
Commit Requirements
Specify whether a workflow step must create a git commit:
- claude: "/implement-feature"
commit_required: true # Fail if no commit is made
This is useful for ensuring that Claude commands that are expected to make code changes actually do so.
Workflow-Level Error Policy (MapReduce)
For MapReduce workflows, you can configure workflow-level error policies that control how the entire job responds to failures. This is separate from command-level error handling and only applies to MapReduce mode.
Basic Configuration
name: process-items
mode: mapreduce
error_policy:
# What to do when a work item fails
on_item_failure: dlq # Options: dlq, retry, skip, stop, custom:<handler_name>
# Continue processing after failures
continue_on_failure: true
# Stop after this many failures
max_failures: 10
# Stop if failure rate exceeds threshold (0.0 to 1.0)
failure_threshold: 0.2 # Stop if 20% of items fail
# How to report errors
error_collection: aggregate # Options: aggregate, immediate, batched
Item Failure Actions:
- dlq - Send failed items to the Dead Letter Queue for later retry (default)
- retry - Retry the item immediately with backoff (if retry_config is set)
- skip - Skip the failed item and continue
- stop - Stop the entire workflow on first failure
- custom:<name> - Use a custom failure handler (not yet implemented)
Error Collection Strategies:
- aggregate - Collect all errors and report at the end (default)
- immediate - Report errors as they occur
- batched: {size: N} - Report errors in batches of N items (see the sketch below)
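For example, batched reporting might look like this (a sketch only; the exact nesting of the batched form under error_collection is an assumption based on the {size: N} notation above):
error_policy:
  on_item_failure: dlq
  error_collection:
    batched:
      size: 10    # assumed nesting; report errors in batches of 10 items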
Circuit Breaker
Prevent cascading failures by opening a circuit after consecutive failures:
error_policy:
circuit_breaker:
failure_threshold: 5 # Open circuit after 5 consecutive failures
success_threshold: 2 # Close circuit after 2 successes
timeout: 30s # Time before attempting half-open state
half_open_requests: 3 # Test requests in half-open state
Note: Use duration format for timeout values (e.g., 30s, 1m, 500ms).
Retry Configuration with Backoff
Configure automatic retry behavior for failed items:
error_policy:
on_item_failure: retry
retry_config:
max_attempts: 3
backoff:
type: exponential
initial: 1s # Initial delay (duration format)
multiplier: 2 # Double delay each retry
Backoff Strategy Options:
# Fixed delay between retries
backoff:
type: fixed
delay: 1s
# Linear increase in delay
backoff:
type: linear
initial: 1s
increment: 500ms
# Exponential backoff (recommended)
backoff:
type: exponential
initial: 1s
multiplier: 2
# Fibonacci sequence delays
backoff:
type: fibonacci
initial: 1s
Important: All duration values use humantime format (e.g., 1s, 100ms, 2m, 30s), not milliseconds.
Error Metrics
Prodigy automatically tracks error metrics for MapReduce jobs:
- Counts: total_items, successful, failed, skipped
- Rates: failure_rate (0.0 to 1.0)
- Patterns: Detects recurring error types with suggested remediation
- Error types: Frequency of each error category
Access metrics during execution or after completion to understand job health.
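As an illustration only (the exact metric schema is not shown in this chapter, and the field grouping below is assumed), the tracked values look roughly like:
{
  "total_items": 50,
  "successful": 45,
  "failed": 3,
  "skipped": 2,
  "failure_rate": 0.06,
  "error_types": { "timeout": 2, "command_failed": 1 }
}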
Dead Letter Queue (DLQ)
The Dead Letter Queue stores failed work items from MapReduce jobs for later retry or analysis. This is only available for MapReduce workflows, not regular workflows.
Sending Items to DLQ
Configure your MapReduce workflow to use DLQ:
mode: mapreduce
error_policy:
on_item_failure: dlq
Failed items are automatically sent to the DLQ with:
- Original work item data
- Failure reason and error message
- Timestamp of failure
- Attempt history
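Conceptually, each DLQ entry bundles that information into a JSON record along these lines (field names are illustrative, not the exact schema):
{
  "item": { "path": "src/parser.rs" },
  "error": "cargo test exited with code 101",
  "failed_at": "2024-05-01T12:34:56Z",
  "attempts": 2
}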
Retrying Failed Items
Use the CLI to retry failed items:
# Retry all failed items for a job
prodigy dlq retry <job_id>
# Retry with custom parallelism (default: 5)
prodigy dlq retry <job_id> --max-parallel 10
# Dry run to see what would be retried
prodigy dlq retry <job_id> --dry-run
DLQ Retry Features:
- Streams items to avoid memory issues with large queues
- Respects original workflow’s max_parallel setting (unless overridden)
- Preserves correlation IDs for tracking
- Updates DLQ state (removes successful, keeps failed)
- Supports interruption and resumption
- Shared across worktrees for centralized failure tracking
DLQ Storage
DLQ data is stored in:
~/.prodigy/dlq/{repo_name}/{job_id}/
This centralized storage allows multiple worktrees to share the same DLQ.
Best Practices
When to Use Command-Level Error Handling
- Recovery: Use on_failure to fix issues and retry (e.g., clearing a cache before reinstalling)
- Cleanup: Use strategy: cleanup to clean up resources after failures
- Fallback: Use strategy: fallback for alternative approaches (see the sketch below)
- Notifications: Use handler commands to notify teams of failures
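A minimal fallback sketch using only the handler fields documented above (the script names are illustrative):
- shell: "download-from-primary.sh"
  on_failure:
    strategy: fallback
    commands:
      - shell: "download-from-mirror.sh"   # alternative approach
      - shell: "verify-checksum.sh"
        continue_on_error: true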
When to Use Workflow-Level Error Policy
- MapReduce jobs: Use error_policy for consistent failure handling across all work items
- Failure thresholds: Use max_failures or failure_threshold to prevent runaway jobs
- Circuit breakers: Use when external dependencies might fail cascading
- DLQ: Use for large batch jobs where you want to retry failures separately
Error Information Available
When a command fails, you can access error information in handler commands:
- shell: "risky-command"
on_failure:
claude: "/analyze-error ${shell.output}"
The ${shell.output} variable contains the command's stdout/stderr output.
Common Patterns
Cleanup and Retry:
- shell: "npm install"
on_failure:
- "npm cache clean --force"
- "rm -rf node_modules"
- "npm install"
Conditional Recovery:
- shell: "cargo test"
on_failure:
claude: "/fix-failing-tests"
max_attempts: 3
fail_workflow: false
Critical Step with Notification:
- shell: "deploy-production"
on_failure:
commands:
- shell: "rollback-deployment"
- shell: "notify-team 'Deployment failed'"
fail_workflow: true # Still fail workflow after cleanup
Automated Documentation with mdBook
This guide shows you how to set up automated, always-up-to-date documentation for any project using Prodigy’s book workflow system. This same system maintains the documentation you’re reading right now.
Overview
The book workflow system:
- Analyzes your codebase to build a feature inventory
- Detects documentation drift by comparing docs to implementation
- Updates documentation automatically using Claude
- Maintains consistency across all chapters
- Runs on any project - just configure and go
The generalized commands work for any codebase: Rust, Python, JavaScript, etc.
Prerequisites
- Install Prodigy: cargo install prodigy
- Install mdBook: cargo install mdbook
- Claude Code CLI with valid API credentials
- Git repository for your project
Quick Start (30 Minutes)
Step 1: Initialize Prodigy Commands
In your project directory:
# Initialize Prodigy and install book documentation commands
prodigy init
This creates .claude/commands/ with the generalized book commands:
- /prodigy-analyze-features-for-book - Analyze the codebase for a feature inventory
- /prodigy-analyze-book-chapter-drift - Detect documentation drift per chapter
- /prodigy-fix-book-drift - Update chapters to fix drift
- /prodigy-fix-book-build-errors - Fix mdBook build errors
Step 2: Initialize mdBook Structure
# Create book directory structure
mdbook init book --title "Your Project Documentation"
# Create workflow and config directories
mkdir -p workflows/data
mkdir -p .myproject # Or .config, whatever you prefer
Step 3: Create Project Configuration
Create .myproject/book-config.json (adjust paths and analysis targets for your project):
{
"project_name": "YourProject",
"project_type": "cli_tool",
"book_dir": "book",
"book_src": "book/src",
"book_build_dir": "book/book",
"analysis_targets": [
{
"area": "cli_commands",
"source_files": ["src/cli/", "src/commands/"],
"feature_categories": ["commands", "arguments", "options"]
},
{
"area": "core_features",
"source_files": ["src/lib.rs", "src/core/"],
"feature_categories": ["api", "public_functions", "exports"]
},
{
"area": "configuration",
"source_files": ["src/config/"],
"feature_categories": ["config_options", "defaults", "validation"]
}
],
"chapter_file": "workflows/data/chapters.json",
"custom_analysis": {
"include_examples": true,
"include_best_practices": true,
"include_troubleshooting": true
}
}
Key Fields to Customize:
- project_name: Your project's name (used in prompts)
- project_type: cli_tool, library, web_service, etc.
- analysis_targets: Areas of code to analyze for documentation
  - area: Logical grouping name
  - source_files: Paths to analyze (relative to project root)
  - feature_categories: Types of features to extract
Step 4: Define Chapter Structure
Create workflows/data/chapters.json:
{
"chapters": [
{
"id": "getting-started",
"title": "Getting Started",
"file": "book/src/getting-started.md",
"topics": ["Installation", "Quick start", "First steps"],
"validation": "Check installation instructions and basic usage"
},
{
"id": "user-guide",
"title": "User Guide",
"file": "book/src/user-guide.md",
"topics": ["Core features", "Common workflows", "Examples"],
"validation": "Verify all main features are documented"
},
{
"id": "configuration",
"title": "Configuration",
"file": "book/src/configuration.md",
"topics": ["Config files", "Options", "Defaults"],
"validation": "Check config options match implementation"
},
{
"id": "troubleshooting",
"title": "Troubleshooting",
"file": "book/src/troubleshooting.md",
"topics": ["Common issues", "Debug mode", "FAQ"],
"validation": "Ensure common issues are covered"
}
]
}
Chapter Definition Fields:
- id: Unique identifier for the chapter
- title: Display title in the book
- file: Path to the markdown file (relative to project root)
- topics: What should be covered in this chapter
- validation: What Claude should check for accuracy
Step 5: Create Book Configuration
Edit book/book.toml:
[book]
title = "Your Project Documentation"
authors = ["Your Team"]
description = "Comprehensive guide to Your Project"
src = "src"
language = "en"
[build]
build-dir = "book"
create-missing = false
[output.html]
default-theme = "rust"
preferred-dark-theme = "navy"
smart-punctuation = true
git-repository-url = "https://github.com/yourorg/yourproject"
git-repository-icon = "fa-github"
edit-url-template = "https://github.com/yourorg/yourproject/edit/main/book/{path}"
[output.html.search]
enable = true
limit-results = 30
use-boolean-and = true
boost-title = 2
[output.html.fold]
enable = true
level = 1
[output.html.playground]
editable = false
copyable = true
line-numbers = true
Step 6: Create Chapter Files
Create placeholder files for each chapter:
# Create initial chapters with basic structure
cat > book/src/getting-started.md <<EOF
# Getting Started
This chapter covers installation and initial setup.
## Installation
TODO: Add installation instructions
## Quick Start
TODO: Add quick start guide
EOF
# Repeat for other chapters...
Update book/src/SUMMARY.md:
# Summary
[Introduction](intro.md)
# User Guide
- [Getting Started](getting-started.md)
- [User Guide](user-guide.md)
- [Configuration](configuration.md)
# Reference
- [Troubleshooting](troubleshooting.md)
Step 7: Create the Workflow
Create workflows/book-docs-drift.yml:
name: book-docs-drift-detection
mode: mapreduce
env:
PROJECT_NAME: "YourProject"
PROJECT_CONFIG: ".myproject/book-config.json"
FEATURES_PATH: ".myproject/book-analysis/features.json"
setup:
- shell: "mkdir -p .myproject/book-analysis"
# Analyze codebase and build feature inventory
- claude: "/prodigy-analyze-features-for-book --project $PROJECT_NAME --config $PROJECT_CONFIG"
map:
input: "workflows/data/chapters.json"
json_path: "$.chapters[*]"
agent_template:
# Analyze each chapter for drift
- claude: "/prodigy-analyze-book-chapter-drift --project $PROJECT_NAME --json '${item}' --features $FEATURES_PATH"
commit_required: true
max_parallel: 3
agent_timeout_secs: 900
reduce:
# Aggregate all drift reports and fix issues
- claude: "/prodigy-fix-book-drift --project $PROJECT_NAME --config $PROJECT_CONFIG"
commit_required: true
# Build the book
- shell: "cd book && mdbook build"
on_failure:
claude: "/prodigy-fix-book-build-errors --project $PROJECT_NAME"
error_policy:
on_item_failure: dlq
continue_on_failure: true
max_failures: 2
merge:
commands:
# Clean up temporary analysis files
- shell: "rm -rf .myproject/book-analysis"
- shell: "git add -A && git commit -m 'chore: remove temporary book analysis files' || true"
# Final build verification
- shell: "cd book && mdbook build"
# Merge back to main branch
- shell: "git fetch origin"
- claude: "/merge-master"
- claude: "/prodigy-merge-worktree ${merge.source_branch}"
Workflow Sections:
- env: Environment variables for project-specific configuration
- setup: Initialize analysis directory and build feature inventory
- map: Process each chapter in parallel to detect drift
- reduce: Aggregate results and update documentation
- merge: Cleanup and merge changes back to main branch
Key Variables:
- PROJECT_NAME: Used in prompts and context
- PROJECT_CONFIG: Path to your book-config.json
- FEATURES_PATH: Where the feature inventory is stored
Step 8: Run the Workflow
# Run the documentation workflow
prodigy run workflows/book-docs-drift.yml
# The workflow will:
# 1. Analyze your codebase for features
# 2. Check each chapter for documentation drift
# 3. Update chapters to match current implementation
# 4. Build the book to verify everything works
# 5. Merge changes back to your main branch
Understanding the Workflow
Phase 1: Setup - Feature Analysis
The setup phase analyzes your codebase and creates a feature inventory:
setup:
- shell: "mkdir -p .myproject/book-analysis"
- claude: "/prodigy-analyze-features-for-book --project $PROJECT_NAME --config $PROJECT_CONFIG"
This generates .myproject/book-analysis/features.json:
{
"cli_commands": [
{
"name": "run",
"description": "Execute a workflow",
"arguments": ["workflow_file"],
"options": ["--resume", "--dry-run"]
}
],
"api_functions": [
{
"name": "execute_workflow",
"signature": "fn execute_workflow(config: Config) -> Result<()>",
"purpose": "Main entry point for workflow execution"
}
]
}
Phase 2: Map - Chapter Drift Detection
Each chapter is processed in parallel to detect drift:
map:
input: "workflows/data/chapters.json"
json_path: "$.chapters[*]"
agent_template:
- claude: "/prodigy-analyze-book-chapter-drift --project $PROJECT_NAME --json '${item}' --features $FEATURES_PATH"
commit_required: true
For each chapter, Claude:
- Reads the current chapter content
- Compares it to the feature inventory
- Identifies missing, outdated, or incorrect information
- Creates a drift report
Phase 3: Reduce - Fix Drift
The reduce phase aggregates all drift reports and updates chapters:
reduce:
- claude: "/prodigy-fix-book-drift --project $PROJECT_NAME --config $PROJECT_CONFIG"
commit_required: true
- shell: "cd book && mdbook build"
on_failure:
claude: "/prodigy-fix-book-build-errors --project $PROJECT_NAME"
Claude:
- Reviews all drift reports
- Updates chapters to fix issues
- Ensures consistency across chapters
- Verifies the book builds successfully
Phase 4: Merge - Integration
The merge phase cleans up and integrates changes:
merge:
commands:
- shell: "rm -rf .myproject/book-analysis"
- shell: "git add -A && git commit -m 'chore: remove temporary book analysis files' || true"
- shell: "cd book && mdbook build"
- shell: "git fetch origin"
- claude: "/merge-master"
- claude: "/prodigy-merge-worktree ${merge.source_branch}"
GitHub Actions Integration
Automated Documentation Deployment
Create .github/workflows/deploy-docs.yml:
name: Deploy Documentation
on:
push:
branches: [main, master]
paths:
- 'book/**'
- 'workflows/book-docs-drift.yml'
workflow_dispatch: # Allow manual triggers
permissions:
contents: write
pages: write
id-token: write
jobs:
build-deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup mdBook
uses: peaceiris/actions-mdbook@v2
with:
mdbook-version: 'latest'
- name: Build book
run: |
cd book
mdbook build
- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@v4
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: ./book/book
cname: docs.yourproject.com # Optional: custom domain
Periodic Documentation Updates
Create .github/workflows/update-docs.yml:
name: Update Documentation
on:
schedule:
# Run weekly on Monday at 9 AM UTC
- cron: '0 9 * * 1'
workflow_dispatch: # Allow manual triggers
jobs:
update-docs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Rust
uses: actions-rs/toolchain@v1
with:
toolchain: stable
- name: Install Prodigy
run: cargo install prodigy
- name: Install mdBook
uses: peaceiris/actions-mdbook@v2
with:
mdbook-version: 'latest'
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
- name: Install Claude Code CLI
run: npm install -g @anthropic-ai/claude-code
- name: Configure Claude API
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
# Configure Claude Code CLI with API key
echo "$ANTHROPIC_API_KEY" | claude-code auth login
- name: Run documentation workflow
run: |
prodigy run workflows/book-docs-drift.yml
- name: Create Pull Request
uses: peter-evans/create-pull-request@v5
with:
token: ${{ secrets.GITHUB_TOKEN }}
commit-message: 'docs: automated documentation update'
title: 'Automated Documentation Update'
body: |
This PR was automatically created by the documentation workflow.
Changes:
- Updated documentation to match current codebase
- Fixed any detected drift between docs and implementation
Please review the changes before merging.
branch: docs/automated-update
delete-branch: true
Required Secrets:
- ANTHROPIC_API_KEY: Your Claude API key (add it in the repository settings)
Enable GitHub Pages
- Go to repository Settings → Pages
- Source: Deploy from a branch
- Branch: gh-pages, folder: / (root)
- Save
Your documentation will be available at: https://yourorg.github.io/yourproject
Customization Examples
For CLI Tools
Focus on commands and usage:
{
"analysis_targets": [
{
"area": "cli_commands",
"source_files": ["src/cli/", "src/commands/"],
"feature_categories": ["commands", "subcommands", "options", "arguments"]
},
{
"area": "configuration",
"source_files": ["src/config/"],
"feature_categories": ["config_file", "environment_vars", "flags"]
}
]
}
Chapter structure:
- Installation
- Quick Start
- Commands Reference
- Configuration
- Examples
- Troubleshooting
For Libraries
Focus on API and usage patterns:
{
"analysis_targets": [
{
"area": "public_api",
"source_files": ["src/lib.rs", "src/api/"],
"feature_categories": ["functions", "types", "traits", "macros"]
},
{
"area": "examples",
"source_files": ["examples/"],
"feature_categories": ["use_cases", "patterns", "integrations"]
}
]
}
Chapter structure:
- Getting Started
- API Reference
- Core Concepts
- Advanced Usage
- Examples
- Migration Guides
For Web Services
Focus on endpoints and integration:
{
"analysis_targets": [
{
"area": "api_endpoints",
"source_files": ["src/routes/", "src/handlers/"],
"feature_categories": ["endpoints", "methods", "parameters", "responses"]
},
{
"area": "authentication",
"source_files": ["src/auth/"],
"feature_categories": ["auth_methods", "tokens", "permissions"]
},
{
"area": "deployment",
"source_files": ["deploy/", "docker/"],
"feature_categories": ["docker", "kubernetes", "configuration"]
}
]
}
Chapter structure:
- Overview
- Authentication
- API Reference
- Integration Guide
- Deployment
- Monitoring
Best Practices
1. Start with Minimal Chapters
Don’t try to document everything at once:
{
"chapters": [
{"id": "intro", "title": "Introduction", ...},
{"id": "quickstart", "title": "Quick Start", ...},
{"id": "reference", "title": "Reference", ...}
]
}
Add more chapters as your project grows.
2. Focus Analysis Targets
Be specific about what to analyze:
{
"area": "cli_commands",
"source_files": ["src/cli/commands/"], // Specific path
"feature_categories": ["commands", "options"] // Specific categories
}
Overly broad targets create unfocused documentation.
3. Provide Chapter Context
Give Claude clear guidance on what each chapter should cover:
{
"id": "advanced",
"title": "Advanced Features",
"topics": ["Custom plugins", "Scripting", "Automation"],
"validation": "Check that plugin API and scripting examples are up-to-date"
}
4. Review Initial Output
The first workflow run will:
- Identify what’s missing
- Add current implementation details
- Create a baseline
Review and refine before committing.
5. Run Regularly
Documentation drift happens constantly:
# Run monthly or after major features
prodigy run workflows/book-docs-drift.yml
# Or set up GitHub Actions for automation
6. Use Validation Topics
Specify what Claude should validate:
{
"validation": "Check that all CLI commands in src/cli/commands/ are documented with current options and examples"
}
This ensures focused, accurate updates.
Troubleshooting
Issue: Feature Analysis Produces Empty Results
Cause: Analysis targets don’t match your code structure
Solution: Check that the source_files paths exist:
ls -la src/cli/ # Verify paths in analysis_targets
Adjust the paths in book-config.json to match your actual structure.
Issue: Chapters Not Updating
Cause: Chapter files don’t exist or paths are wrong
Solution: Verify chapter files exist:
# Check all chapters listed in chapters.json exist
cat workflows/data/chapters.json | jq -r '.chapters[].file' | xargs ls -la
Issue: mdBook Build Fails
Cause: SUMMARY.md doesn’t match chapter files
Solution: Ensure all chapters in SUMMARY.md have corresponding files:
cd book && mdbook build
Fix any missing files or broken links.
Issue: Workflow Takes Too Long
Cause: Too many chapters or overly broad analysis
Solution:
- Reduce max_parallel in the map phase (default: 3)
- Split large chapters into smaller ones
- Narrow analysis_targets to essential code paths
Issue: Documentation Quality Issues
Cause: Insufficient initial content or unclear validation
Solution:
- Create better chapter outlines before running workflow
- Add more specific validation criteria in chapters.json
- Review and manually refine after the first run
Advanced Configuration
Custom Analysis Functions
You can extend the analysis by providing custom analysis functions in your config:
{
"custom_analysis": {
"include_examples": true,
"include_best_practices": true,
"include_troubleshooting": true,
"analyze_dependencies": true,
"extract_code_comments": true,
"include_performance_notes": true
}
}
Multi-Language Projects
For projects with multiple languages:
{
"analysis_targets": [
{
"area": "rust_backend",
"source_files": ["src/"],
"feature_categories": ["api", "services"],
"language": "rust"
},
{
"area": "typescript_frontend",
"source_files": ["web/src/"],
"feature_categories": ["components", "hooks"],
"language": "typescript"
}
]
}
Chapter Dependencies
Some chapters may depend on others:
{
"chapters": [
{
"id": "basics",
"title": "Basic Usage",
"dependencies": []
},
{
"id": "advanced",
"title": "Advanced Usage",
"dependencies": ["basics"],
"validation": "Ensure examples build on concepts from Basic Usage chapter"
}
]
}
Real-World Example: Prodigy’s Own Documentation
This documentation you’re reading is maintained by the same workflow described here. You can examine the configuration:
Configuration: .prodigy/book-config.json
{
"project_name": "Prodigy",
"project_type": "cli_tool",
"analysis_targets": [
{
"area": "workflow_execution",
"source_files": ["src/workflow/", "src/orchestrator/"],
"feature_categories": ["workflow_types", "execution_modes", "lifecycle"]
},
{
"area": "mapreduce",
"source_files": ["src/mapreduce/"],
"feature_categories": ["map_phase", "reduce_phase", "parallelism"]
}
]
}
Chapters: workflows/data/prodigy-chapters.json
{
"chapters": [
{
"id": "workflow-basics",
"title": "Workflow Basics",
"file": "book/src/workflow-basics.md",
"topics": ["Standard workflows", "Basic structure"],
"validation": "Check workflow syntax matches current implementation"
}
]
}
Workflow: workflows/book-docs-drift.yml
Study these files for a complete working example.
Next Steps
- Set up the basics: Follow the Quick Start to get a minimal book running
- Customize for your project: Adjust analysis targets and chapters
- Run the workflow: Generate your first automated update
- Refine iteratively: Review output and improve configuration
- Automate: Set up GitHub Actions for continuous documentation
- Extend: Add more chapters as your project grows
Benefits
This approach provides:
- ✅ Always up-to-date documentation - Runs automatically to detect drift
- ✅ Consistent quality - Same analysis across all chapters
- ✅ Reduced maintenance - Less manual documentation work
- ✅ Accurate examples - Extracted from actual code
- ✅ Version control - All changes tracked in git
- ✅ Easy to customize - Configuration-based, works for any project
The same commands that maintain Prodigy’s documentation can maintain yours.
Examples
Example 1: Simple Build and Test
- shell: "cargo build"
- shell: "cargo test"
on_failure:
claude: "/fix-failing-tests"
- shell: "cargo clippy"
Example 2: Coverage Improvement with Goal Seeking
- goal_seek:
goal: "Achieve 80% test coverage"
claude: "/improve-coverage"
validate: |
coverage=$(cargo tarpaulin | grep 'Coverage' | sed 's/.*: \([0-9.]*\)%.*/\1/')
echo "score: ${coverage%.*}"
threshold: 80
max_attempts: 5
commit_required: true
Example 3: Foreach Iteration
# Test multiple configurations in sequence
- foreach:
- rust-version: "1.70"
profile: debug
- rust-version: "1.71"
profile: release
- rust-version: "stable"
profile: release
commands:
- shell: "rustup install ${foreach.item.rust-version}"
- shell: "cargo +${foreach.item.rust-version} build --profile ${foreach.item.profile}"
- shell: "cargo +${foreach.item.rust-version} test"
# Parallel foreach with error handling
- foreach:
- "web-service"
- "api-gateway"
- "worker-service"
parallel: 3
continue_on_error: true
commands:
- shell: "cd services/${foreach.item} && cargo build"
- shell: "cd services/${foreach.item} && cargo test"
on_failure:
claude: "/fix-service-tests ${foreach.item}"
Example 4: Parallel Code Review
name: parallel-code-review
mode: mapreduce
setup:
- shell: "find src -name '*.rs' > files.txt"
- shell: "jq -R -s -c 'split(\"\n\") | map(select(length > 0) | {path: .})' files.txt > items.json"
map:
input: items.json
json_path: "$[*]" # Process all items
agent_template:
- claude: "/review-file ${item.path}"
id: "review"
capture: "review_result"
capture_format: "json"
- shell: "cargo check ${item.path}"
max_parallel: 5
reduce:
- claude: "/summarize-reviews ${map.results}"
Example 5: Conditional Deployment
- shell: "cargo test --quiet && echo true || echo false"
id: "test"
capture: "test_result"
capture_format: "boolean" # Supported: string, json, lines, number, boolean
- shell: "cargo build --release"
when: "${test_result} == true"
- shell: "docker build -t myapp ."
when: "${test_result} == true"
on_success:
shell: "docker push myapp:latest"
Example 6: Multi-Step Validation
- claude: "/implement-feature auth"
commit_required: true
validate:
commands:
- shell: "cargo test auth"
- shell: "cargo clippy -- -D warnings"
- claude: "/validate-implementation --output validation.json"
result_file: "validation.json"
threshold: 90
on_incomplete:
claude: "/complete-gaps ${validation.gaps}"
commit_required: true
max_attempts: 2
Example 7: Environment-Aware Workflow
# Global environment variables
env:
NODE_ENV: production
API_URL: https://api.production.com
# Environment profiles for different contexts
profiles:
production:
API_URL: https://api.production.com
LOG_LEVEL: error
description: "Production environment"
staging:
API_URL: https://api.staging.com
LOG_LEVEL: warn
description: "Staging environment"
# Secrets (masked in logs)
secrets:
API_KEY:
provider: env
key: SECRET_API_KEY
# Load additional variables from .env files
env_files:
- .env
- .env.production
# Workflow steps (no 'commands' wrapper in simple format)
- shell: "cargo build --release"
# Use environment variables in commands
- shell: "echo 'Deploying to ${NODE_ENV} at ${API_URL}'"
# Override environment for specific command using shell syntax
- shell: "LOG_LEVEL=debug ./deploy.sh"
Note: Profile activation with active_profile is managed internally and is not currently exposed in WorkflowConfig YAML. Use the --profile CLI flag to activate profiles.
Example 8: Complex MapReduce with Error Handling
name: tech-debt-elimination
mode: mapreduce
setup:
- shell: "debtmap analyze . --output debt.json"
map:
input: debt.json
json_path: "$.items[*]"
filter: "item.severity == 'critical'"
sort_by: "item.priority DESC"
max_items: 20
max_parallel: 5
agent_template:
- claude: "/fix-debt-item '${item.description}'"
commit_required: true
- shell: "cargo test"
on_failure:
claude: "/debug-and-fix"
reduce:
- shell: "debtmap analyze . --output debt-after.json"
- claude: "/compare-debt-reports --before debt.json --after debt-after.json"
error_policy:
on_item_failure: dlq
continue_on_failure: true
max_failures: 5
failure_threshold: 0.3
Troubleshooting
Common Issues
1. Variables not interpolating
Symptom: Variables appear as literal ${variable_name} in output instead of their values.
Causes:
- Incorrect syntax (missing ${} wrapper)
- Variable not defined or not in scope
- Typo in the variable name
Solutions:
- Ensure proper ${} syntax: ${workflow.name}, not $workflow.name
- Check the variable is defined before use
- Verify the variable is available in the current context (e.g., ${item.*} is only available in the map phase)
- Use echo to debug:
  - shell: "echo 'Variable value: ${my_var}'"
2. Capture not working
Symptom: Captured variables are empty or contain unexpected data.
Causes:
- Incorrect capture_format for the output type
- Command output not in the expected format
- Missing or incorrect capture_streams configuration
- Match
capture_format
to output type and how it transforms output:string
- Captures raw text output as-isnumber
- Parses output as numeric value (int or float)json
- Parses JSON and allows JSONPath queries on the resultlines
- Splits multi-line output into an arrayboolean
- Evaluates to true/false based on success status
- Test command output manually first
- Capture all streams for debugging:
- shell: "cargo test 2>&1" capture: "test_output" capture_streams: stdout: true # Capture standard output stderr: true # Capture error output exit_code: true # Capture exit code success: true # Capture success boolean duration: true # Capture execution duration
3. Validation failing
Symptom: Goal-seeking or validation commands fail to recognize completion.
Causes:
- Validate command not outputting score: N format
- Threshold too high
- Score calculation incorrect
- Validation command not configured correctly
Solutions:
- Ensure the validate command outputs exactly score: N (where N is 0-100)
- Validation is part of goal_seek commands with these fields:
  - validate - Command that outputs the score
  - threshold - Minimum score to consider success (0-100)
  - max_iterations - Maximum attempts before giving up
  - on_incomplete - Commands to run when the score is below the threshold
- Test the validate command independently
- Lower the threshold temporarily for debugging
- Example correct format:
  - goal_seek:
      validate: |
        result=$(run-checks.sh | grep 'Percentage' | sed 's/.*: \([0-9]*\)%.*/\1/')
        echo "score: $result"
      threshold: 80
      max_iterations: 5
      on_incomplete:
        - claude: "/fix-issues"
4. MapReduce items not found
Symptom: Map phase finds zero items or wrong items.
Causes:
- Incorrect JSONPath expression
- Input file format doesn’t match expectations
- Input file not generated in setup phase
- JSONPath syntax errors
Solutions:
- Test what the JSONPath will select against the actual data, e.g. with jq: jq '.items[]' items.json (note that jq uses its own filter syntax rather than JSONPath)
- Verify the input file exists and contains the expected structure
- Check the setup phase completed successfully
- Use a simpler JSONPath first: $[*] to get all items
- Common JSONPath mistakes:
  - Wrong bracket syntax: Use $.items[*] not $.items[]
  - Missing root $: Always start with $
  - Incorrect filter syntax: $[?(@.score >= 5)] for filtering
  - Nested paths: $.data.items[*].field for deep structures
5. Timeout errors
Symptom: Commands or workflows fail with timeout errors.
Causes:
- Commands take longer than expected
- Default timeout too short
- Infinite loops or hanging processes
Solutions:
- Increase timeout values using duration strings:
  - shell: "slow-command.sh"
    timeout: "600s"  # or "10m" - uses humantime duration format
- For MapReduce, increase the agent timeout (note: this uses seconds as a number):
  map:
    agent_timeout_secs: 600  # Takes a number (seconds), not a duration string
- Note: agent_timeout_secs takes a number (seconds) while most other timeout fields use duration strings like "10m"
- Debug hanging commands by running them manually
- Add logging to identify slow steps
6. Environment variables not set
Symptom: Commands fail because required environment variables are missing.
Causes:
- Environment not inherited from parent process
- Typo in variable name
- Profile not activated
- Secret not loaded
Solutions:
- Ensure inherit: true is set in the workflow config (the default)
- Verify the profile is activated (e.g., with the --profile CLI flag)
- Check secrets are properly configured:
  secrets:
    API_KEY: "${env:SECRET_API_KEY}"
- Debug with:
  - shell: "env | grep VARIABLE_NAME"
7. Merge workflow not running
Symptom: Custom merge commands not executed when merging worktree.
Causes:
- Merge block not properly formatted
- Syntax error in merge commands
- Merge workflow timeout too short
Solutions:
- Both merge formats are valid - choose based on your needs.
  Simplified format (direct list of commands):
  merge:
    - shell: "git fetch origin"
    - claude: "/merge-worktree ${merge.source_branch}"
  Full format (with timeout configuration):
  merge:
    commands:
      - shell: "slow-merge-validation.sh"
    timeout: "600s"  # Duration string format
- Use the full format when you need to set a custom timeout
- Check the logs for merge execution errors
8. Commands failing without error handler
Symptom: Command fails and workflow stops immediately without recovery.
Causes:
- No on_failure handler configured
- Error not being caught by the handler
- Handler itself failing
Solutions:
- Add an on_failure handler to commands that might fail:
  - shell: "risky-command.sh"
    on_failure:
      - shell: "echo 'Command failed, attempting recovery'"
      - claude: "/fix-issue"
- Commands without on_failure will stop the workflow on the first error
- Check that your handler commands don't also fail
- Use shell exit codes to control failure: command || exit 0 to ignore failures
9. Error policy configuration issues
Symptom: Retry, backoff, or circuit breaker not working as expected.
Causes:
- Incorrect Duration format for timeouts
- Wrong BackoffStrategy enum variant
- Invalid retry_config structure
Solutions:
- Use Duration strings for all timeout values:
  error_policy:
    retry_config:
      max_attempts: 3
      initial_delay: "1s"   # Not 1000
      max_delay: "30s"      # Use duration strings
    circuit_breaker:
      timeout: "60s"        # Not 60
      failure_threshold: 5
- Valid BackoffStrategy values:
  - constant - Same delay every time
  - linear - Increases linearly
  - exponential - Doubles each time
  - fibonacci - Fibonacci sequence
- Circuit breaker requires both timeout and failure_threshold
Debug Tips
Use verbosity flags for debugging
Prodigy supports multiple verbosity levels for debugging:
# Default: Clean output, no Claude streaming
prodigy run workflow.yml
# -v: Shows Claude streaming JSON output (useful for debugging Claude interactions)
prodigy run workflow.yml -v
# -vv: Adds debug-level logs
prodigy run workflow.yml -vv
# -vvv: Adds trace-level logs (very detailed)
prodigy run workflow.yml -vvv
# Force Claude streaming regardless of verbosity
PRODIGY_CLAUDE_CONSOLE_OUTPUT=true prodigy run workflow.yml
Enable verbose output in shell commands
- shell: "set -x; your-command"
Inspect variables
- shell: "echo 'Variable value: ${my_var}'"
- shell: "echo 'Item fields: path=${item.path} name=${item.name}'"
Capture all streams for debugging
- shell: "cargo test 2>&1"
capture: "test_output"
capture_streams:
stdout: true
stderr: true
exit_code: true
success: true
duration: true
# Then inspect
- shell: "echo 'Exit code: ${test_output.exit_code}'"
- shell: "echo 'Success: ${test_output.success}'"
- shell: "echo 'Duration: ${test_output.duration}s'"
Test JSONPath expressions
# Manually test the data your JSONPath will select (note: jq uses its own filter syntax, not JSONPath)
jq '.items[]' items.json
# Test with a filter
jq '.items[] | select(.score >= 5)' items.json
Validate workflow syntax
# Workflows are validated automatically when loaded
# Check for syntax errors by attempting to run
prodigy run workflow.yml
# View the validation result file (if workflow validation completed)
cat .prodigy/validation-result.json
Check DLQ for failed items
# List failed items
prodigy dlq list <job_id>
# View failure details
prodigy dlq inspect <job_id>
# Retry failed items (primary recovery operation)
prodigy dlq retry <job_id>
# Retry with custom parallelism
prodigy dlq retry <job_id> --max-parallel 10
# Dry run to see what would be retried
prodigy dlq retry <job_id> --dry-run
Monitor MapReduce progress
# View events
prodigy events <job_id>
# Check checkpoints
prodigy checkpoints list
# View event logs directly
ls ~/.prodigy/events/
# Check session state
cat .prodigy/session_state.json
FAQ
Q: Why are my changes not being committed?
A: Add commit_required: true to your command, or use auto_commit: true for automatic commits when changes are detected. Note: auto_commit can be set at the workflow level (applies to all steps) or per step. When true, Prodigy creates commits automatically when git diff detects changes.
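A short sketch of the per-step options (the Claude command names are illustrative):
# Fail the step if no commit is created
- claude: "/implement-feature"
  commit_required: true
# Commit automatically when git diff detects changes
- claude: "/format-code"
  auto_commit: true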
Q: How do I retry failed MapReduce items?
A: Use the DLQ retry command:
prodigy dlq retry <job_id>
Q: Can I use environment variables in JSONPath expressions?
A: No, JSONPath expressions are evaluated against the input data, not the environment. Use variables in command arguments instead.
Q: How do I skip items in MapReduce?
A: Use the filter field:
map:
filter: "item.score >= 5"
Q: What's the difference between on_failure and on_incomplete?
A: on_failure runs when a command exits with a non-zero code. on_incomplete is used in goal_seek commands and runs when the validation score is below the threshold.
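Side by side, using fields documented in this book (the command and script names are illustrative):
# on_failure reacts to a non-zero exit code
- shell: "cargo test"
  on_failure:
    claude: "/fix-failing-tests"
# on_incomplete reacts to a validation score below the threshold
- goal_seek:
    goal: "All checks pass"
    command: "run-checks.sh"
    validate: "score-checks.sh"   # must output "score: N"
    threshold: 100
    max_attempts: 3
    on_incomplete:
      - claude: "/fix-issues"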
Q: How do I run commands in parallel?
A: Use MapReduce mode with max_parallel:
mode: mapreduce
map:
max_parallel: 5
Q: Can I nest workflows?
A: Not directly, but you can use shell commands to invoke other workflows:
- shell: "prodigy run other-workflow.yml"
Common Error Messages
MapReduceError Types
Prodigy uses structured errors to help diagnose issues:
Job-level errors:
- JobNotFound - Job ID doesn't exist; check the job_id spelling or whether the job was cleaned up
- InvalidJobConfiguration - Workflow YAML has configuration errors
- WorktreeSetupFailed - Failed to create a git worktree; check disk space and git status
Agent-level errors:
- AgentFailed - Individual agent execution failed; check the DLQ for details
- AgentTimeout - Agent exceeded its timeout; increase agent_timeout_secs
- CommandExecutionFailed - Shell or Claude command failed in the agent
Resource errors:
- WorktreeMergeConflict - Git merge conflict when merging agent results
- ResourceExhausted - Out of disk space, memory, or other resources
- StorageError - Failed to read/write to storage; check permissions
Recovery actions:
- Check event logs: prodigy events <job_id>
- Review the DLQ: prodigy dlq list <job_id>
- View detailed state: cat ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/checkpoint.json
Checkpoint and Resume Errors
“Checkpoint not found”
- Cause: No checkpoint file exists for this job
- Solution: Job may have completed or checkpoint was deleted, start fresh
“Failed to resume from checkpoint”
- Cause: Checkpoint file is corrupted or format changed
- Solution: Check checkpoint JSON syntax, may need to start over
“Worktree conflicts during merge”
- Cause: Git merge conflicts when combining agent results
- Solution: Resolve conflicts manually in worktree, then retry merge
Variable and Capture Errors
“Variable not found: ${variable_name}”
- Cause: Variable not defined or out of scope
- Solution: Check variable is defined before use, verify scope (workflow vs item vs capture)
“Failed to parse capture output as {format}”
- Cause: Command output doesn’t match capture_format
- Solution: Check output manually, adjust capture_format or command output
“JSONPath expression failed”
- Cause: Invalid JSONPath syntax or doesn’t match data structure
- Solution: Test with the jq command, simplify the expression, check the input data format
Best Practices for Debugging
- Start simple: Test commands individually before adding to workflow
- Use verbosity flags: Use -v to see Claude interactions, -vv for debug logs, -vvv for trace
- Use echo liberally: Debug variable values with echo statements
- Check logs and state: Review event logs (~/.prodigy/events/) and session state (.prodigy/session_state.json)
- Test incrementally: Add commands one at a time and test after each
- Validate input data: Ensure JSON files and data formats are correct before MapReduce
- Check DLQ regularly: Monitor failed items with prodigy dlq list and retry when appropriate
- Monitor resources: Check disk space, memory, and CPU during execution
- Version control: Commit working workflows before making changes
- Read error messages carefully: MapReduceError types indicate specific failure modes
- Ask for help: Include full error messages, workflow config, and verbosity output when seeking support