Troubleshooting¶
This chapter explains how to diagnose and resolve common issues with Prodigy workflows, including MapReduce failures, checkpoint problems, and variable interpolation errors.
Common Issues¶
Variables not interpolating¶
Symptoms: Literal ${var} appears in output instead of value
Causes:
- Variable name typo or case mismatch
- Variable not in scope
- Incorrect syntax
- Variable not captured
Solutions:
- Check variable name spelling and case sensitivity
- Verify variable is available in current scope (step vs workflow)
- Use the ${var} form rather than $var, particularly in complex expressions
- Verify that the capture_output command succeeded
- Check that the variable was set before use (e.g., in a previous step)
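One quick way to confirm the symptom is to scan captured output for placeholders that were never substituted. The following is a minimal Python sketch, not part of Prodigy: the helper name `find_unresolved` and the sample text are hypothetical, and the pattern only covers the `${name}` form.

```python
import re

# Matches ${name} placeholders left in text; bare $name is not covered.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def find_unresolved(text: str) -> list[str]:
    """Return placeholder names that still appear literally in text, in order."""
    return PLACEHOLDER.findall(text)


if __name__ == "__main__":
    sample = "Deploying ${service} to ${REGION}"  # hypothetical output line
    for name in find_unresolved(sample):
        print(f"unresolved: {name}")
```

Any name this reports is worth checking against the causes listed above, starting with spelling and case.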
MapReduce items not found¶
Symptoms: No items to process, empty JSONPath result, or "items.json not found"
Causes:
- Input file doesn't exist
- Incorrect JSONPath
- Setup phase failed
- Wrong file format
Solutions:
- Verify input file exists with correct path
- Test JSONPath expression with jsonpath-cli or jq
- Check json_path field syntax (default: $[*])
- Ensure setup phase generated the input file successfully
- Validate JSON format with jq or json validator
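To check an input file outside Prodigy, the sketch below parses JSON and applies a very small subset of JSONPath: `$[*]` and dotted paths ending in `[*]`, such as `$.items[*]`. It is an illustration in Python, not Prodigy's JSONPath engine. The helper `select_items` is hypothetical and the sample data is invented.

```python
import json


def select_items(data, json_path: str = "$[*]") -> list:
    """Evaluate '$[*]' or '$.a.b[*]' against parsed JSON and return the matched items."""
    if not (json_path.startswith("$") and json_path.endswith("[*]")):
        raise ValueError(f"unsupported path in this sketch: {json_path}")
    node = data
    # Walk the dotted keys between '$' and the trailing '[*]'.
    keys = [k for k in json_path[1:-3].split(".") if k]
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return []  # path does not exist: the "empty JSONPath result" symptom
        node = node[key]
    return list(node) if isinstance(node, list) else []


if __name__ == "__main__":
    raw = '{"items": [{"id": 1}, {"id": 2}]}'  # stand-in for the input file contents
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    print(len(select_items(data, "$.items[*]")), "items matched")
    print(len(select_items(data)), "items matched by the default path")
```

An empty result from the default path on an object-shaped file usually means the json_path needs to point at the array inside it.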
Timeout errors¶
Symptoms: Commands or phases timing out before completion
Causes:
- Operation too slow
- Insufficient timeout
- Hung processes
- Deadlock
Solutions:
- Increase timeout value for long operations
- Optimize command execution for better performance
- Split work into smaller chunks (use max_items, offset)
- Check for hung processes with ps or top
- Look for deadlocks in concurrent operations
- Use agent_timeout_secs for MapReduce agents
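Before raising a timeout, it can help to measure how long the command takes on its own. The following is a minimal Python sketch using the standard library's `subprocess.run` with a `timeout`. The helper name `run_with_timeout` and the sample command are hypothetical, and the command runs directly rather than through Prodigy.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd: list[str], seconds: float) -> tuple[bool, float]:
    """Run cmd; return (finished_in_time, elapsed_seconds)."""
    start = time.monotonic()
    try:
        subprocess.run(cmd, timeout=seconds, check=False, capture_output=True)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False  # subprocess.run kills the child before raising this
    return finished, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in for a slow workflow command: sleeps for two seconds.
    slow = [sys.executable, "-c", "import time; time.sleep(2)"]
    ok, elapsed = run_with_timeout(slow, seconds=0.5)
    print(f"finished={ok} elapsed={elapsed:.1f}s")
```

If the command finishes comfortably on its own but still times out in a workflow, the cause is more likely a hung process or contention than a timeout that is too short.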
Checkpoint resume not working¶
Symptoms: Resume starts from beginning, fails to load state, or "checkpoint not found"
Causes:
- Checkpoint files missing
- Wrong session/job ID
- Workflow changed
- Concurrent resume
Solutions:
- Verify checkpoint files exist in ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/
- Check session/job ID is correct with prodigy sessions list
- Ensure workflow file hasn't changed significantly
- Check for concurrent resume lock in ~/.prodigy/resume_locks/
- Review checkpoint file contents for corruption
See MapReduce Checkpoint and Resume for detailed information.
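The existence and corruption checks can be scripted. The Python sketch below builds the checkpoint directory from the path pattern given above and tries to parse each file in it. It assumes checkpoint files are JSON with a `.json` extension, which this chapter does not confirm. `inspect_checkpoints` and its `base` argument are hypothetical names.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def inspect_checkpoints(repo: str, job_id: str, base: Optional[Path] = None) -> dict:
    """Map each *.json file in the job's checkpoint directory to 'ok' or an error note."""
    root = base if base is not None else Path.home() / ".prodigy" / "state"
    job_dir = root / repo / "mapreduce" / "jobs" / job_id
    if not job_dir.is_dir():
        raise FileNotFoundError(f"no checkpoint directory: {job_dir}")
    report = {}
    for path in sorted(job_dir.glob("*.json")):  # assumption: checkpoints are JSON files
        try:
            json.loads(path.read_text(encoding="utf-8"))
            report[path.name] = "ok"
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            report[path.name] = f"corrupt: {exc}"
    return report


if __name__ == "__main__":
    # Demo against a throwaway directory so the sketch runs without a real job.
    with tempfile.TemporaryDirectory() as tmp:
        job = Path(tmp) / "demo-repo" / "mapreduce" / "jobs" / "demo-job"
        job.mkdir(parents=True)
        (job / "state.json").write_text('{"completed": 3}', encoding="utf-8")
        (job / "broken.json").write_text('{"completed": ', encoding="utf-8")
        report = inspect_checkpoints("demo-repo", "demo-job", base=Path(tmp))
        for name, status in report.items():
            print(name, "->", status)
```

Against a real job, omit `base` so the default `~/.prodigy/state` root is used, and pass the repository name and job ID.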
DLQ items not retrying or re-failing¶
Symptoms: Retry command fails, items immediately fail again, or no progress
Causes:
- Error is systematic, not transient
- DLQ file corrupted
- Underlying issue not fixed
Solutions:
- Check DLQ file format and contents with prodigy dlq show <job_id>
- Verify the error was transient, not systematic (e.g., a rate limit rather than a bug)
- Fix underlying issue before retry (e.g., API credentials, file permissions)
- Increase max-parallel for retry if parallelism helps
- Check json_log_location in DLQ for detailed error info
See Dead Letter Queue (DLQ) for complete DLQ management details.
Worktree cleanup failures¶
Symptoms: Orphaned worktrees after failures, "permission denied" on cleanup
Causes:
- Locked files
- Running processes
- Permission issues
- Disk full
Solutions:
- Use prodigy worktree clean-orphaned <job_id> for automatic cleanup
- Check for locked files with lsof or similar tools
- Verify no running processes using worktree with ps
- Check disk space with df -h
- Verify file permissions on worktree directory
- Manual cleanup if necessary: rm -rf ~/.prodigy/worktrees/<path>
For more on cleanup failures, see "Cleanup Failure Handling (Spec 136)" in the CLAUDE.md file.
Environment variables not resolved¶
Symptoms: Literal ${VAR} or $VAR appears in commands instead of value
Causes:
- Variable not defined
- Wrong profile
- Scope issue
- Syntax error
Solutions:
- Check variable defined in env, secrets, or profiles section
- Verify correct profile activated with --profile flag
- Use the ${VAR} syntax for workflow variables; plain $VAR may work in shell commands
- Check variable scope (global vs step-level)
- Ensure env_files loaded correctly
See Environment Variables for variable configuration details.
Git context variables empty¶
Symptoms: ${step.files_added} returns empty string or undefined
Causes:
- No commits created
- Git repo not initialized
- Step not completed
- Wrong format
Solutions:
- Ensure commands created commits (use commit_required: true)
- Check git repository is initialized in working directory
- Verify step completed before accessing variables
- Use appropriate format modifier (e.g., :json, :newline)
- Check git status to verify changes exist
See Advanced Git Context for available git variables.
Foreach iteration failures¶
Symptoms: Foreach command fails partway through, items skipped, or parallel execution errors
Causes:
- Command failure with continue_on_error disabled
- Parallel execution resource exhaustion
- Variable interpolation errors in item context
- Max items limit reached unexpectedly
Solutions:
- Enable continue_on_error to process remaining items on failure
- Reduce parallelism: parallel: 5 instead of parallel: true
- Verify ${item}, ${index}, ${total} variable interpolation
- Check max_items setting matches expectations
- Review progress bar output for failure patterns
- Use shell command for debugging: shell: "echo Processing ${item}"
Example foreach with error handling:
foreach:
  input:
    list: ["file1.py", "file2.py", "file3.py"]
  parallel: 5
  max_items: 10
  continue_on_error: true
  commands:
    - shell: "echo Processing ${item} (${index}/${total})"
    - claude: "/refactor ${item}"
Source: src/cook/execution/foreach.rs:44-515
Workflow composition errors¶
Symptoms: "Template not found", "Circular dependency detected", "Required parameter not provided"
Causes:
- Missing or unregistered templates
- Circular extends/imports chains
- Required parameters not provided
- Path resolution issues
Solutions:
- Verify template exists and is registered: prodigy template list
- Register template if needed: prodigy template register <path>
- Check for circular dependencies in extends/imports chains
- Provide required parameters via --param NAME=value or --param-file
- Review template parameter definitions for requirements
- Check template paths are correct (relative to registry or filesystem)
See Common Error Messages for specific composition error details.
Validate command failures¶
Symptoms: "Schema validation failed", "Threshold not met", "Gap-filling failed", validate command error
Causes:
- Validation output doesn't match expected_schema
- Completeness percentage below threshold
- Invalid JSON output from validation command
- Timeout during validation
- Gap-filling commands fail
Solutions:
- Test validation command independently and check JSON output
- Verify expected_schema matches actual validation output structure
- Check completeness threshold is realistic: threshold: 80.0
- Increase validation timeout if needed: timeout: 300
- Ensure validation command writes proper JSON to stdout or result_file
- Review gap-filling commands in on_incomplete section
- Use verbose mode to see validation output: prodigy run workflow.yml -v
Example validate configuration:
# Source: src/cook/workflow/validation.rs:10-49
validate:
  shell: "cargo test --no-fail-fast -- --format json"
  expected_schema:
    type: "object"
    required: ["passed", "total"]
  threshold: 90.0
  on_incomplete:
    commands:
      - claude: "/fix-failing-tests"
Source: src/cook/workflow/validation.rs:10-49
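When testing a validation command on its own, a quick check is whether its JSON output contains every key listed under `required` in expected_schema. The Python sketch below does only that. It is not Prodigy's schema validator: `missing_required` is a hypothetical helper and the sample output is invented.

```python
import json


def missing_required(output: str, required: list[str]) -> list[str]:
    """Return the required keys absent from a JSON object given as text."""
    data = json.loads(output)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise TypeError("validation output must be a JSON object")
    return [key for key in required if key not in data]


if __name__ == "__main__":
    sample = '{"passed": 41}'  # hypothetical validation command output
    print("missing keys:", missing_required(sample, ["passed", "total"]))
```

A `json.JSONDecodeError` here points to the "invalid JSON output" cause, while a non-empty result points to a schema mismatch.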
Write file failures¶
Symptoms: "Permission denied", "Directory not found", "Invalid format", file not created or corrupted
Causes:
- Parent directory doesn't exist and create_dirs not enabled
- Insufficient permissions to write to path
- Invalid JSON/YAML content when using format validation
- Variable interpolation error in path or content
- Invalid file mode permissions
Solutions:
- Enable create_dirs to auto-create parent directories: create_dirs: true
- Check directory permissions: ls -ld $(dirname path/to/file)
- Verify format validation for JSON/YAML: test content with jq or yq
- Test variable interpolation independently: echo "${var}"
- Ensure file mode is valid octal: mode: "0644" not mode: "644"
- Use absolute paths or verify working directory context
- Check disk space: df -h
Example write_file configuration:
# Source: src/config/command.rs:278-298
- write_file:
    path: "output/results-${item.id}.json"
    content: |
      {
        "item_id": "${item.id}",
        "status": "completed"
      }
    format: json
    create_dirs: true
    mode: "0644"
Source: src/config/command.rs:278-313, tests/write_file_integration_test.rs
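Two of the checks listed under Solutions, the mode string and the JSON content, are easy to test before running the workflow. The following is a minimal Python sketch. It assumes a valid mode is a zero-prefixed, three-digit octal string such as "0644", as the guidance above suggests, and `check_mode` and `check_json` are hypothetical helpers rather than Prodigy functions.

```python
import json
import re


def check_mode(mode: str) -> bool:
    """True if mode is a zero-prefixed, three-digit octal string such as '0644'."""
    return re.fullmatch(r"0[0-7]{3}", mode) is not None


def check_json(content: str) -> bool:
    """True if content parses as JSON."""
    try:
        json.loads(content)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(check_mode("0644"), check_mode("644"))
    print(check_json('{"item_id": "42", "status": "completed"}'))
```

Run `check_json` on the content after variables have been substituted, since an interpolated value containing quotes or newlines can break otherwise valid JSON.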
Claude command fails with "command not found"¶
Symptoms: Shell error about claude command not existing
Causes:
- Claude Code not installed
- Not in PATH
- Wrong executable name
Solutions:
- Install Claude Code CLI if not present
- Verify claude is in PATH with which claude
- Check command name matches Claude Code CLI (not "claude-code")
- Use full path if necessary: /path/to/claude
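The PATH check can also be done from a script. This short Python sketch uses the standard library's `shutil.which`, which behaves much like the `which` command. The helper name `locate` is hypothetical.

```python
import shutil
from typing import Optional


def locate(executable: str = "claude") -> Optional[str]:
    """Return the full path of executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = locate()
    print(path if path else "claude not found on PATH")
```

A `None` result under the same user and environment that runs the workflow confirms the PATH cause.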
Debug Tips¶
Use dry-run mode to preview execution¶
Shows: Preview of commands that would be executed without actually running them
Use when: Verifying workflow steps before execution, testing variable interpolation, checking command syntax
Source: src/cli/args.rs:64
Use verbose mode for execution details¶
Shows: Claude streaming output, tool invocations, and execution timeline
Use when: Understanding what Claude is doing, debugging tool calls
Check Claude JSON logs for full interaction¶
Shows: Full Claude interaction including messages, tools, token usage, errors
Use when: Claude command failed, understanding why Claude made certain decisions
For more on Claude JSON logs, see the "Viewing Claude Execution Logs (Spec 126)" section in the project CLAUDE.md file.
Inspect event logs for execution timeline¶
# List events for a job
prodigy events ls --job-id <job_id>
# Follow events in real-time
prodigy events follow --job-id <job_id>
# Show event statistics
prodigy events stats
Shows: Detailed execution timeline, agent starts/completions, durations, real-time event stream
Use when: Understanding workflow execution flow, finding bottlenecks, monitoring active jobs
Source: src/cli/commands/events.rs:22-98
Review DLQ for failed item details¶
Shows: Failed items with full error details, retry history, json_log_location
Use when: MapReduce items failing, understanding failure patterns
Check checkpoint state for resume issues¶
Location: ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/
Shows: Saved execution state, completed items, variables, phase progress
Use when: Resume not working, understanding saved state
Examine worktree git log for commits¶
Shows: All commits created during workflow execution with full details
Use when: Understanding what changed, verifying commits created
Tail Claude JSON log in real-time¶
Shows: Live streaming of Claude JSON log as it's being written
Use when: Watching long-running Claude command, debugging in real-time
Additional Topics¶
For more specific troubleshooting guidance, see:
- FAQ - Frequently asked questions
- Common Error Messages - Specific error messages explained