Troubleshooting Reference
Quick reference guide for diagnosing and resolving Prodigy issues. For detailed troubleshooting guidance, see the Troubleshooting chapter.
Quick Diagnostics
When Something Goes Wrong
```mermaid
flowchart TD
Start[Issue Detected] --> Verbose[Run with -v flag]
Verbose --> Logs{Check Logs}
Logs -->|Claude interaction| ClaudeLogs[prodigy logs --latest]
Logs -->|Execution flow| Events[prodigy events ls]
Logs -->|Failed items| DLQ[prodigy dlq show]
ClaudeLogs --> State{Check State}
Events --> State
DLQ --> State
State -->|Session info| Sessions[~/.prodigy/sessions/]
State -->|Checkpoints| Checkpoints[~/.prodigy/state/]
State -->|Worktrees| Worktrees[prodigy worktree ls]
Sessions --> Resolve[Resolve Issue]
Checkpoints --> Resolve
Worktrees --> Resolve
style Start fill:#ffebee
style Verbose fill:#e1f5ff
style Logs fill:#fff3e0
style State fill:#f3e5f5
style Resolve fill:#e8f5e9
```
Figure: Diagnostic workflow for troubleshooting Prodigy issues.
Quick Diagnostics Checklist
Follow this sequence when troubleshooting:
1. Check verbosity: Run with the -v flag to see detailed output.
2. Inspect logs: Use prodigy logs --latest --summary for Claude interactions.
3. Review events: Use prodigy events ls --job-id <job_id> for the execution timeline.
4. Check DLQ: Use prodigy dlq show <job_id> for failed items (MapReduce only).
5. Verify state: Check ~/.prodigy/state/ for checkpoints and ~/.prodigy/sessions/ for session files.
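The last four checks can be run in one pass with a small shell function. This is a minimal sketch and not part of Prodigy. diagnose_job is a hypothetical helper, and PRODIGY_DIR is a variable local to the script that defaults to ~/.prodigy; it is not a Prodigy setting. The function only calls the commands listed above. It does not use set -e, so a failing step does not stop the steps after it.

```bash
#!/usr/bin/env bash
# Script-local override for the state root (not a Prodigy setting).
PRODIGY_DIR="${PRODIGY_DIR:-$HOME/.prodigy}"

# Hypothetical helper: runs the log, event, DLQ, and state checks for one job.
diagnose_job() {
  local job_id="$1"
  if [ -z "$job_id" ]; then
    echo "usage: diagnose_job <job_id>" >&2
    return 2
  fi

  echo "== Claude interactions (latest log) =="
  prodigy logs --latest --summary

  echo "== Execution timeline =="
  prodigy events ls --job-id "$job_id"

  echo "== Dead letter queue (MapReduce only) =="
  prodigy dlq show "$job_id"

  echo "== Checkpoints and sessions =="
  ls -la "$PRODIGY_DIR/state/" "$PRODIGY_DIR/sessions/"
}
```

After you reproduce the issue with -v, run diagnose_job <job_id> 2>&1 | less to page through everything at once.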
Common Error Patterns
| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| Variables show literally as ${var} | Wrong syntax or undefined variable | Check spelling, use ${var} syntax |
| "Session not found" | Wrong ID or expired | Use prodigy sessions list to find correct ID |
| "Command not found: claude" | Claude not in PATH | Install Claude Code or add to PATH |
| "No items to process" | Wrong JSONPath or missing file | Verify input file exists, test JSONPath |
| Cleanup fails | Locked files or permissions | Use prodigy worktree clean-orphaned <job_id> |
| Resume starts over | No checkpoint or wrong ID | Check ~/.prodigy/state/ for checkpoint files |
| High map phase failures | Resource contention | Reduce max_parallel, increase timeout |
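Two rows in this table can be caught before a run starts: a missing claude binary and a missing input file. The hypothetical preflight function below is a sketch. It checks that claude resolves on PATH and that the input file exists and is non-empty. It does not validate the JSONPath expression itself.

```bash
# Hypothetical preflight helper (not a Prodigy command).
# Returns 0 when both checks pass, 1 otherwise.
preflight() {
  local input_file="$1" rc=0

  # "Command not found: claude": is the binary resolvable on PATH?
  if ! command -v claude >/dev/null 2>&1; then
    echo "claude not found in PATH: install Claude Code or add it to PATH" >&2
    rc=1
  fi

  # "No items to process": does the input file exist and have content?
  if [ -n "$input_file" ] && [ ! -s "$input_file" ]; then
    echo "input file missing or empty: $input_file" >&2
    rc=1
  fi

  return "$rc"
}
```

For example, run preflight path/to/items.json before prodigy run workflow.yml.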
Issue Categories
MapReduce Issues
Agents failing silently:
- Check: prodigy dlq show <job_id>
- Inspect: json_log_location field in DLQ entries
- See: Dead Letter Queue (DLQ)
Checkpoint resume not working:
- Check: ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/
- Verify: Session/job ID with prodigy sessions list
- See: Checkpoint and Resume
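Before resuming, you can confirm that a checkpoint exists by globbing the state directory. This sketch assumes the ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/ layout above. find_job_checkpoints and PRODIGY_DIR are illustrative names, not Prodigy features.

```bash
PRODIGY_DIR="${PRODIGY_DIR:-$HOME/.prodigy}"   # script-local override

# Hypothetical helper: list checkpoint directories for a job across all repos.
find_job_checkpoints() {
  local job_id="$1" dir found=1
  if [ -z "$job_id" ]; then
    echo "usage: find_job_checkpoints <job_id>" >&2
    return 2
  fi
  for dir in "$PRODIGY_DIR"/state/*/mapreduce/jobs/"$job_id"/; do
    [ -d "$dir" ] || continue      # an unmatched glob stays literal; skip it
    echo "$dir"
    ls -la "$dir"
    found=0
  done
  if [ "$found" -ne 0 ]; then
    echo "no checkpoint directory for $job_id under $PRODIGY_DIR/state" >&2
  fi
  return "$found"
}
```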
Concurrent resume blocked:
- Check: ~/.prodigy/resume_locks/{job_id}.lock
- Verify: Whether the process with the PID from the error message is still running
- Clean: Remove the lock file only if that process is dead
- See: CLAUDE.md "Concurrent Resume Protection (Spec 140)"
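The lock check can be scripted so that the lock is removed only after its owner has exited. This is a minimal sketch. clear_stale_resume_lock is hypothetical, and the lock path follows the ~/.prodigy/resume_locks/{job_id}.lock layout above. You pass in the PID copied from the error message; the sketch assumes nothing about what the lock file contains.

```bash
PRODIGY_DIR="${PRODIGY_DIR:-$HOME/.prodigy}"   # script-local override

# Hypothetical helper: delete a resume lock only if its owning process is gone.
clear_stale_resume_lock() {
  local job_id="$1" pid="$2"
  local lock="$PRODIGY_DIR/resume_locks/$job_id.lock"

  if [ -z "$job_id" ] || [ -z "$pid" ]; then
    echo "usage: clear_stale_resume_lock <job_id> <pid>" >&2
    return 2
  fi
  if [ ! -e "$lock" ]; then
    echo "no lock file at $lock"
    return 0
  fi
  # kill -0 sends no signal; it only checks that the PID exists. It also fails
  # for processes owned by another user, so run this as the user who resumed.
  if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is still running; leaving $lock in place" >&2
    return 1
  fi
  rm -f -- "$lock" && echo "removed stale lock $lock"
}
```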
Cleanup failures:
- Use: prodigy worktree clean-orphaned <job_id>
- Check: Locked files with lsof | grep <worktree-path>
- See: CLAUDE.md "Cleanup Failure Handling (Spec 136)"
Session and Resume Issues
Resume fails with "session not found":
- List sessions: prodigy sessions list
- Try job ID: prodigy resume-job <job_id> or prodigy resume <job_id>
- Check state: ~/.prodigy/sessions/{session-id}.json
Session state corrupted:
- Check: Session file in ~/.prodigy/sessions/
- Verify: Checkpoint files in ~/.prodigy/state/
- Last resort: Start new workflow run
Variable and Interpolation Issues
Variables not interpolating:
- Check syntax: ${var} not $var for complex expressions
- Verify scope: Variable defined at workflow/step level
- Check spelling: Variable names are case-sensitive
- See: Environment Variables
Environment variables empty:
- Verify: Variable defined in env section
- Check profile: Use --profile <name> if using profiles
- Test: echo "$VAR" in shell command to verify
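When echo "$VAR" prints an empty line, it helps to know whether the variable is unset or set to an empty string. The bash sketch below uses indirect expansion to tell the two apart. check_var and API_URL are illustrative names. If your shell commands run under plain sh, apply the same ${VAR+x} test to the variable directly, because indirect expansion is bash-only.

```bash
# Hypothetical bash helper: report whether a variable is unset, empty, or set.
check_var() {
  local var_name="$1"
  if [ -z "${!var_name+x}" ]; then      # ${!var_name+x} is empty only when unset
    echo "$var_name is unset"
  elif [ -z "${!var_name}" ]; then
    echo "$var_name is set but empty"
  else
    echo "$var_name=${!var_name}"
  fi
}
```

For example, check_var API_URL prints one of three lines: "API_URL is unset", "API_URL is set but empty", or the value itself.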
Performance Issues
Resource Contention
High max_parallel values can exhaust system resources. Start with 5-10 agents and monitor performance before increasing.
Slow MapReduce execution:
- Reduce: max_parallel to avoid resource exhaustion
- Increase: agent_timeout_secs if agents time out
- Split: Use max_items and offset for chunking
- Check: System resource usage with top or htop
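Chunking with max_items and offset means choosing a sequence of (offset, max_items) pairs that together cover the whole input. The sketch below prints that sequence. It assumes that offset is a zero-based count of items to skip and that max_items caps how many items one run processes. Check both assumptions against your workflow's map configuration. plan_chunks is a hypothetical helper.

```bash
# Hypothetical helper: print offset/max_items pairs that cover `total` items.
plan_chunks() {
  local total="$1" chunk="$2" offset=0 size
  if ! [ "$chunk" -gt 0 ] 2>/dev/null; then
    echo "chunk size must be a positive integer" >&2
    return 2
  fi
  while [ "$offset" -lt "$total" ]; do
    size=$(( total - offset < chunk ? total - offset : chunk ))  # last chunk may be smaller
    echo "offset=$offset max_items=$size"
    offset=$(( offset + chunk ))
  done
}
```

For example, plan_chunks 250 100 prints three pairs: offset=0 max_items=100, offset=100 max_items=100, and offset=200 max_items=50.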
High resource usage:
- Lower parallelism in map phase
- Reduce context size in Claude commands
- Check for memory leaks in custom commands
- Monitor: prodigy events stats for bottlenecks
Performance Tuning
For optimal MapReduce performance:
- 10-1000 work items: Sweet spot for parallelism benefits
- 10 sec - 5 min per item: Ideal task duration
- Start small: Test with max_items: 10 before a full run
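A back-of-the-envelope estimate shows whether a job falls in this range and what a given max_parallel buys you. Assume item durations are roughly uniform and all agents stay busy. The map phase then takes about ceil(items / max_parallel) rounds, each lasting one item duration. The estimate ignores setup, merge, and retry time. estimate_map_secs is a hypothetical helper.

```bash
# Rough map-phase estimate: ceil(items / max_parallel) rounds, each taking
# about secs_per_item. Ignores setup, merge, retries, and uneven durations.
estimate_map_secs() {
  local items="$1" secs_per_item="$2" max_parallel="$3" rounds
  if ! [ "$max_parallel" -gt 0 ] 2>/dev/null; then
    echo "max_parallel must be a positive integer" >&2
    return 2
  fi
  rounds=$(( (items + max_parallel - 1) / max_parallel ))   # integer ceiling
  echo $(( rounds * secs_per_item ))
}
```

For example, 200 items at about 2 minutes each with max_parallel: 10 gives estimate_map_secs 200 120 10, which prints 2400 (40 minutes).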
Timeout errors:
- Increase: timeout field in command configuration
- Split: Large operations into smaller steps
- Check: For hung processes with ps aux | grep '[p]rodigy' (the brackets stop grep from matching its own process)
Worktree Problems
Cleanup Failures
If cleanup fails during MapReduce execution, the agent is still marked as successful and results are preserved. Use prodigy worktree clean-orphaned <job_id> to clean up later.
Orphaned worktrees:
- List: prodigy worktree ls
- Clean: prodigy worktree clean-orphaned <job_id>
- Manual: rm -rf ~/.prodigy/worktrees/{path} (last resort)
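If you do fall back to manual removal, a cheap guard prevents deleting the wrong directory. In this sketch, remove_worktree_dir and PRODIGY_DIR are hypothetical. The function resolves symlinks on both paths, then refuses any path that is not strictly inside the worktrees directory. Prefer prodigy worktree clean-orphaned <job_id> whenever it works.

```bash
PRODIGY_DIR="${PRODIGY_DIR:-$HOME/.prodigy}"   # script-local override

# Hypothetical helper: rm -rf a worktree only if it sits inside the worktrees root.
remove_worktree_dir() {
  local root target
  if [ -z "$1" ]; then
    echo "usage: remove_worktree_dir <path>" >&2
    return 2
  fi
  # Resolve both paths (following symlinks) so ../ and links cannot escape.
  root="$(CDPATH= cd -- "$PRODIGY_DIR/worktrees" 2>/dev/null && pwd -P)" || {
    echo "no worktrees directory under $PRODIGY_DIR" >&2
    return 2
  }
  target="$(CDPATH= cd -- "$1" 2>/dev/null && pwd -P)" || {
    echo "not a directory: $1" >&2
    return 2
  }
  case "$target" in
    "$root"/?*) rm -rf -- "$target" && echo "removed $target" ;;
    *) echo "refusing to remove $target: not inside $root" >&2; return 1 ;;
  esac
}
```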
Merge conflicts:
- Use: Custom merge workflow with conflict resolution
- Review: Git status in worktree before merge
- See: CLAUDE.md "Custom Merge Workflows"
Worktree locked:
- Check: Running processes with lsof +D ~/.prodigy/worktrees/{path} (lists processes holding files open anywhere under the worktree)
- Kill: The process if it is safe to do so, or wait for it to finish
- Clean: Use prodigy worktree clean -f if necessary
Debugging Techniques
Verbosity Levels
```bash
# Default: Clean output
prodigy run workflow.yml # (1)!
# Verbose: Show Claude streaming output
prodigy run workflow.yml -v # (2)!
# Very verbose: Add debug logs
prodigy run workflow.yml -vv # (3)!
# Trace: Maximum detail
prodigy run workflow.yml -vvv # (4)!
```

1. Minimal output for production workflows: shows only progress and results.
2. Adds Claude JSON streaming output for debugging interactions.
3. Adds debug-level logs from Prodigy internals.
4. Maximum verbosity, including trace-level execution details.
Log Inspection
Claude JSON Logs
Every Claude command creates a streaming JSONL log with full conversation history:
```bash
# View latest log with summary
prodigy logs --latest --summary
# Follow log in real-time
prodigy logs --latest --tail
# View specific log file (jq reads the JSONL stream one object at a time)
jq -c '.' ~/.claude/projects/{worktree-path}/{uuid}.jsonl
```
Log location is displayed after each Claude command execution.
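If another tool such as jq needs the path of the newest log file, you can find it directly from the directory layout. This bash sketch assumes one directory level per worktree under ~/.claude/projects, as in the path above. latest_claude_log is a hypothetical helper.

```bash
# Hypothetical helper: print the newest *.jsonl under a Claude projects dir.
latest_claude_log() {
  local projects="${1:-$HOME/.claude/projects}" restore
  local -a logs
  restore="$(shopt -p nullglob || true)"    # remember the caller's setting
  shopt -s nullglob
  logs=("$projects"/*/*.jsonl)
  eval "$restore"
  if [ "${#logs[@]}" -eq 0 ]; then
    echo "no JSONL logs under $projects" >&2
    return 1
  fi
  ls -t -- "${logs[@]}" | head -n 1          # -t sorts newest first
}
```

For example: jq -c '.' "$(latest_claude_log)".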
Event Logs
Track workflow execution with prodigy events ls --job-id <job_id> and identify bottlenecks with prodigy events stats.
Source: src/cli/commands/events.rs:22-98
State Inspection
Git Context
Worktree inspection:
```bash
# Navigate to worktree
cd ~/.prodigy/worktrees/{repo}/{session}/
# View git log
git log --oneline -10
# Check status
git status
# View diff
git diff HEAD~1
```
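When a job leaves many agent worktrees behind, checking each one by hand gets slow. The sketch below loops over ~/.prodigy/worktrees/{repo}/*/ and prints each worktree's latest commit. It also flags any worktree with uncommitted changes. summarize_worktrees and PRODIGY_DIR are illustrative names, and the directory layout is assumed from the path above.

```bash
PRODIGY_DIR="${PRODIGY_DIR:-$HOME/.prodigy}"   # script-local override

# Hypothetical helper: one summary per worktree of a repository, plus a
# warning when the worktree has uncommitted changes.
summarize_worktrees() {
  local repo="$1" wt
  if [ -z "$repo" ]; then
    echo "usage: summarize_worktrees <repo>" >&2
    return 2
  fi
  for wt in "$PRODIGY_DIR/worktrees/$repo"/*/; do
    [ -d "$wt" ] || continue                  # glob matched nothing
    echo "== ${wt%/}"
    git -C "$wt" log --oneline -1             # latest commit in this worktree
    if [ -n "$(git -C "$wt" status --porcelain)" ]; then
      echo "   uncommitted changes present"
    fi
  done
}
```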
Error Messages
For detailed explanations of specific error messages, see:
Common error patterns:
- "session not found": Session ID incorrect or expired
- "command not found: claude": Claude Code not installed or not in PATH
- "no items to process": Input file missing or JSONPath incorrect
- "cleanup failed": Locked files or permission issues
- "resume already in progress": Concurrent resume protection active
- "checkpoint not found": Checkpoint files missing or wrong ID
Best Practices for Debugging
Debugging Workflow
Follow this systematic approach to diagnose issues quickly:
1. Start with verbosity: Always use the -v flag when debugging.
2. Check logs first: Claude JSON logs contain the full interaction details.
3. Review events: The event timeline shows execution flow and bottlenecks.
4. Inspect the DLQ early: Many similar failures in the DLQ usually point to a systematic issue rather than a one-off error.
5. Verify state: Check checkpoint and session files for corruption.
6. Test incrementally: Use --dry-run to preview execution.
7. Monitor resources: Watch CPU, memory, and disk usage during execution.
8. Use specialized tools: prodigy events, prodigy logs, and prodigy dlq.
For comprehensive debugging strategies, see the Troubleshooting Guide.
Related Topics
- Troubleshooting Guide - Detailed troubleshooting for all issues
- FAQ - Frequently asked questions
- MapReduce Checkpoint and Resume - Resume functionality details
- Environment Variables - Variable configuration
- Observability Configuration - Logging and monitoring