Troubleshooting Reference

Quick reference guide for diagnosing and resolving Prodigy issues. For detailed troubleshooting guidance, see the Troubleshooting chapter.

Quick Diagnostics

When Something Goes Wrong

```mermaid
flowchart TD
    Start[Issue Detected] --> Verbose["Run with -v flag"]
    Verbose --> Logs{Check Logs}

    Logs -->|Claude interaction| ClaudeLogs["prodigy logs --latest"]
    Logs -->|Execution flow| Events["prodigy events ls"]
    Logs -->|Failed items| DLQ["prodigy dlq show"]

    ClaudeLogs --> State{Check State}
    Events --> State
    DLQ --> State

    State -->|Session info| Sessions["~/.prodigy/sessions/"]
    State -->|Checkpoints| Checkpoints["~/.prodigy/state/"]
    State -->|Worktrees| Worktrees["prodigy worktree ls"]

    Sessions --> Resolve[Resolve Issue]
    Checkpoints --> Resolve
    Worktrees --> Resolve

    style Start fill:#ffebee
    style Verbose fill:#e1f5ff
    style Logs fill:#fff3e0
    style State fill:#f3e5f5
    style Resolve fill:#e8f5e9
```

Figure: Diagnostic workflow for troubleshooting Prodigy issues.

Quick Diagnostics Checklist

Follow this sequence when troubleshooting:

1. Check verbosity: Run with the `-v` flag to see detailed output.
2. Inspect logs: Use `prodigy logs --latest --summary` for Claude interactions.
3. Review events: Use `prodigy events ls --job-id <job_id>` for the execution timeline.
4. Check the DLQ: Use `prodigy dlq show <job_id>` for failed items (MapReduce only).
5. Verify state: Check `~/.prodigy/state/` for checkpoints and `~/.prodigy/sessions/` for session state.

Common Error Patterns

| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| Variables appear literally as `${var}` | Misspelled or undefined variable | Check spelling and scope; reference it as `${var}` |
| "Session not found" | Wrong ID or expired session | Use `prodigy sessions list` to find the correct ID |
| "Command not found: claude" | Claude Code not on `PATH` | Install Claude Code or add it to `PATH` |
| "No items to process" | Wrong JSONPath or missing input file | Verify the input file exists and test the JSONPath |
| Cleanup fails | Locked files or permissions | Run `prodigy worktree clean-orphaned <job_id>` |
| Resume starts over | No checkpoint or wrong ID | Check `~/.prodigy/state/` for checkpoint files |
| High failure rate in the map phase | Resource contention | Reduce `max_parallel` and increase the timeout |
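Several rows above, such as "Command not found: claude", come down to a tool missing from `PATH`. A minimal pre-flight sketch, assuming a POSIX-compatible shell such as bash or dash; `check_tools` is a hypothetical helper, not part of Prodigy:

```bash
# Hypothetical pre-flight check (not a Prodigy command): report every named
# command that cannot be found on PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing from PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (the tool list is an assumption; adjust it to what your workflows call):
#   check_tools prodigy claude git jq
```

Run it in the same shell environment that launches `prodigy run`, since that is the `PATH` the workflow inherits.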

Issue Categories

MapReduce Issues

Agents failing silently:
- Check: `prodigy dlq show <job_id>`
- Inspect: the `json_log_location` field in DLQ entries
- See: Dead Letter Queue (DLQ)

Checkpoint resume not working:
- Check: `~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/`
- Verify: the session or job ID with `prodigy sessions list`
- See: Checkpoint and Resume
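To confirm that checkpoints exist for a job and see which one was written last, you can list that directory by modification time. A sketch under two assumptions: checkpoint files follow the `map-checkpoint-*.json` naming shown under State Inspection, and `PRODIGY_DIR` is a hypothetical override used only so the helper can point at a copy of the state tree (Prodigy itself does not read it):

```bash
# Hypothetical helper: print the most recently modified map checkpoint for a job.
# PRODIGY_DIR is an illustrative override, not a Prodigy setting.
latest_checkpoint() {
  local repo="$1" job_id="$2"
  local dir="${PRODIGY_DIR:-$HOME/.prodigy}/state/$repo/mapreduce/jobs/$job_id"
  local newest
  if [ ! -d "$dir" ]; then
    echo "no checkpoint directory: $dir" >&2
    return 1
  fi
  # ls -t sorts newest first; the error for a non-matching glob is discarded.
  newest=$(ls -t "$dir"/map-checkpoint-*.json 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no map-checkpoint-*.json files in $dir" >&2
    return 1
  fi
  echo "$newest"
}

# Usage:
#   latest_checkpoint {repo} {job_id}
```

An empty or missing directory here usually explains a resume that starts over.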

Concurrent resume blocked:
- Check: `~/.prodigy/resume_locks/{job_id}.lock`
- Verify: whether the process with the PID from the error message is still running
- Clean: remove the lock file if that process is dead
- See: CLAUDE.md "Concurrent Resume Protection (Spec 140)"
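The verify-then-clean steps above can be combined so the lock is only removed when the reported PID is really gone. A minimal sketch; `clear_stale_resume_lock` and the `PRODIGY_DIR` override are hypothetical, the lock path is the one listed above, and the PID is the one you copy from the "resume already in progress" error:

```bash
# Hypothetical helper: delete a resume lock only when the PID reported in the
# "resume already in progress" error no longer belongs to a live process.
# PRODIGY_DIR is an illustrative override, not a Prodigy setting.
clear_stale_resume_lock() {
  local job_id="$1" pid="$2"
  local lock="${PRODIGY_DIR:-$HOME/.prodigy}/resume_locks/$job_id.lock"
  if [ ! -e "$lock" ]; then
    echo "no lock file at $lock"
    return 0
  fi
  # kill -0 sends no signal; it only tests whether the PID can be signalled.
  # It also fails for processes owned by another user, so run this as the
  # same user that started the resume, on the same machine.
  if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is still running; leaving $lock in place" >&2
    return 1
  fi
  rm -f -- "$lock"
  echo "removed stale lock $lock"
}

# Usage:
#   clear_stale_resume_lock {job_id} <pid-from-error-message>
```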

Cleanup failures:
- Use: `prodigy worktree clean-orphaned <job_id>`
- Check: locked files with `lsof | grep <worktree-path>`
- See: CLAUDE.md "Cleanup Failure Handling (Spec 136)"

Session and Resume Issues

Resume fails with "session not found":
- List sessions: `prodigy sessions list`
- Try the job ID: `prodigy resume-job <job_id>` or `prodigy resume <job_id>`
- Check state: `~/.prodigy/sessions/{session-id}.json`

Session state corrupted:
- Check: the session file in `~/.prodigy/sessions/`
- Verify: checkpoint files in `~/.prodigy/state/`
- Last resort: start a new workflow run
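One form of corruption you can check mechanically is a file that is empty or no longer parses as JSON, for example after an interrupted write. A quick sketch, assuming `jq` is installed (other commands on this page already use it); `find_corrupt_json` is a hypothetical helper name:

```bash
# Hypothetical helper: list session or checkpoint files that are missing, empty,
# or fail to parse as JSON. `jq empty` reads the input, prints nothing, and
# exits non-zero on a parse error. Returns 1 if any file is bad.
find_corrupt_json() {
  local f bad=0
  for f in "$@"; do
    if [ ! -s "$f" ] || ! jq empty "$f" >/dev/null 2>&1; then
      echo "corrupt or unreadable: $f"
      bad=1
    fi
  done
  return "$bad"
}

# Usage:
#   find_corrupt_json ~/.prodigy/sessions/*.json
#   find_corrupt_json ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/*.json
```

A file that passes this check can still hold stale or inconsistent state; it only rules out structural damage.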

Variable and Interpolation Issues

Variables not interpolating:
- Check syntax: use `${var}`, not `$var`, for complex expressions
- Verify scope: the variable is defined at the workflow or step level
- Check spelling: variable names are case-sensitive
- See: Environment Variables

Environment variables empty:
- Verify: the variable is defined in the `env` section
- Check profile: pass `--profile <name>` if you use profiles
- Test: run `echo "$VAR"` in a shell command to see the value
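To test several variables at once rather than echoing each one, a small check like the following can run in an interactive shell or a workflow shell command. A sketch only; `missing_env` and the variable names in the usage line are hypothetical:

```bash
# Hypothetical helper: print the name of each variable that is unset or empty.
# Returns 1 if any are missing. Names are validated before eval, which is used
# for indirect expansion so the function also works in POSIX sh.
missing_env() {
  local name value status=0
  for name in "$@"; do
    case "$name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "invalid variable name: $name" >&2
        status=1
        continue
        ;;
    esac
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Usage (hypothetical variable names):
#   missing_env API_URL PROJECT_NAME || echo "check the env section or --profile"
```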

Performance Issues

Resource Contention

High max_parallel values can exhaust system resources. Start with 5-10 agents and monitor performance before increasing.

Slow MapReduce execution:
- Reduce: `max_parallel` to avoid resource exhaustion
- Increase: `agent_timeout_secs` if agents time out
- Split: use `max_items` and `offset` to process the input in chunks
- Check: system resource usage with `top` or `htop`
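Chunking with `max_items` and `offset` means choosing windows that neither overlap nor leave gaps. A small sketch that computes them; `chunk_plan` is hypothetical, and it assumes `offset` is the number of items skipped from the start of the input, so check that against your Prodigy version:

```bash
# Hypothetical helper: print one "offset max_items" pair per run so that TOTAL
# work items are split into runs of at most CHUNK items.
chunk_plan() {
  local total="$1" chunk="$2" offset=0 size
  [ "$chunk" -gt 0 ] || { echo "chunk size must be positive" >&2; return 1; }
  while [ "$offset" -lt "$total" ]; do
    size=$chunk
    if [ $((total - offset)) -lt "$chunk" ]; then
      size=$((total - offset))   # last run gets the remainder
    fi
    echo "$offset $size"
    offset=$((offset + chunk))
  done
}
```

For 25 items in runs of 10, `chunk_plan 25 10` prints `0 10`, `10 10`, and `20 5`: one run per line.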

High resource usage:
- Lower parallelism in the map phase
- Reduce context size in Claude commands
- Check for memory leaks in custom commands
- Monitor: `prodigy events stats` for bottlenecks

Performance Tuning

For optimal MapReduce performance:

- 10 to 1000 work items: the sweet spot for parallelism benefits
- 10 seconds to 5 minutes per item: ideal task duration
- Start small: test with `max_items: 10` before a full run
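These ranges are easier to weigh with a back-of-the-envelope estimate of map-phase time. The sketch below is a simplification that assumes uniform item durations and fully busy agents; it is not how Prodigy schedules work, and `estimate_map_secs` is a hypothetical helper:

```bash
# Rough best-case estimate of map-phase wall-clock seconds: items are processed
# in "waves" of max_parallel agents, and every item takes about the same time.
# Worktree setup, merging, retries, and the reduce phase are ignored.
estimate_map_secs() {
  local items="$1" secs_per_item="$2" parallel="$3" waves
  waves=$(( (items + parallel - 1) / parallel ))   # ceil(items / parallel)
  echo $(( waves * secs_per_item ))
}
```

For example, `estimate_map_secs 1000 300 10` prints `30000`: 1000 items at 5 minutes each with 10 agents take at least about 8.3 hours, which is a good reason to trial with `max_items: 10` first.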

Timeout errors:
- Increase: the `timeout` field in the command configuration
- Split: large operations into smaller steps
- Check: for hung processes with `ps aux | grep prodigy`

Worktree Problems

Cleanup Failures

If cleanup fails during MapReduce execution, the agent is still marked as successful and results are preserved. Use prodigy worktree clean-orphaned <job_id> to clean up later.

Orphaned worktrees:
- List: `prodigy worktree ls`
- Clean: `prodigy worktree clean-orphaned <job_id>`
- Manual (last resort): `rm -rf ~/.prodigy/worktrees/{path}`
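Because the manual `rm -rf` is destructive, it is worth guarding against a mistyped path. A sketch that refuses to delete anything that is not strictly inside the worktrees directory; `safe_remove_worktree` and the `PRODIGY_DIR` override are hypothetical, and the default root `~/.prodigy/worktrees` comes from the command above:

```bash
# Hypothetical guard for the manual last-resort cleanup: delete a directory only
# if its resolved path lies strictly inside the worktrees root.
# PRODIGY_DIR is an illustrative override, not a Prodigy setting.
safe_remove_worktree() {
  local root="${PRODIGY_DIR:-$HOME/.prodigy}/worktrees"
  local target
  # Resolve real paths (following .. and symlinks) before comparing prefixes.
  target=$(cd "$1" 2>/dev/null && pwd -P) || { echo "not a directory: $1" >&2; return 1; }
  root=$(cd "$root" 2>/dev/null && pwd -P) || { echo "worktrees root not found" >&2; return 1; }
  case "$target" in
    "$root"/?*) rm -rf -- "$target" ;;
    *) echo "refusing to remove $target (not inside $root)" >&2; return 1 ;;
  esac
}
```

If the directory was a Git worktree, running `git worktree prune` in the main repository afterwards removes Git's leftover administrative entry for the deleted path.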

Merge conflicts:
- Use: a custom merge workflow with conflict resolution
- Review: `git status` in the worktree before merging
- See: CLAUDE.md "Custom Merge Workflows"

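Worktree locked:
- Check: running processes with `lsof +D ~/.prodigy/worktrees/{path}` (`+D` searches every file under the directory, not just the directory itself)
- Kill: the process if it is safe to do so, or wait for it to finish
- Clean: use `prodigy worktree clean -f` if necessary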

Debugging Techniques

Verbosity Levels

```bash
# Default: clean output
prodigy run workflow.yml              # (1)!

# Verbose: show Claude streaming output
prodigy run workflow.yml -v           # (2)!

# Very verbose: add debug logs
prodigy run workflow.yml -vv          # (3)!

# Trace: maximum detail
prodigy run workflow.yml -vvv         # (4)!
```

1. Minimal output for production workflows; shows only progress and results.
2. Adds Claude JSON streaming output for debugging interactions.
3. Adds debug-level logs from Prodigy internals.
4. Maximum verbosity, including trace-level execution details.

Log Inspection

Claude JSON Logs

Every Claude command creates a streaming JSONL log with full conversation history:

```bash
# View latest log with summary
prodigy logs --latest --summary

# Follow log in real time
prodigy logs --latest --tail

# View specific log file
jq -c '.' ~/.claude/projects/{worktree-path}/{uuid}.jsonl
```

Log location is displayed after each Claude command execution.
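If you missed the printed location and want the raw file rather than `prodigy logs --latest`, you can pick the newest JSONL file yourself. A sketch assuming logs sit exactly one directory below `~/.claude/projects`, as in the path above; `latest_claude_log` is a hypothetical helper:

```bash
# Hypothetical helper: print the most recently modified Claude JSONL log one
# directory level under a projects directory (default: ~/.claude/projects).
# Prints nothing if no log matches.
latest_claude_log() {
  local projects="${1:-$HOME/.claude/projects}"
  # ls -t sorts newest first; the error for a non-matching glob is discarded.
  ls -t "$projects"/*/*.jsonl 2>/dev/null | head -n 1
}

# Usage:
#   jq -c '.' "$(latest_claude_log)"
```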

Event Logs

Track workflow execution and identify bottlenecks:

```bash
# List events for job
prodigy events ls --job-id <job_id>

# Follow events in real time
prodigy events follow --job-id <job_id>

# Show statistics
prodigy events stats
```

Source: src/cli/commands/events.rs:22-98

State Inspection

Session State

```bash
# List all sessions
prodigy sessions list

# View session details
jq '.' ~/.prodigy/sessions/{session-id}.json
```

Checkpoint State

```bash
# List checkpoints for job
ls ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/

# View checkpoint contents
jq '.' ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/map-checkpoint-*.json
```

DLQ Contents

```bash
# Show failed items
prodigy dlq show <job_id>

# View DLQ file directly
jq '.' ~/.prodigy/dlq/{repo}/{job_id}.json
```

Git Context

Worktree inspection:

```bash
# Navigate to worktree
cd ~/.prodigy/worktrees/{repo}/{session}/

# View git log
git log --oneline -10

# Check status
git status

# View diff
git diff HEAD~1
```
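Before a merge it helps to know, as a yes/no answer, whether a worktree still has uncommitted work, which is the "review `git status` before merging" step from Worktree Problems. A sketch using only standard Git; `worktree_is_dirty` is a hypothetical helper:

```bash
# Hypothetical helper: exit 0 if the worktree has uncommitted changes
# (staged, unstaged, or untracked with default settings), 1 if it is clean,
# 2 if git fails. `git status --porcelain` prints one line per changed path.
worktree_is_dirty() {
  local changes
  changes=$(git -C "$1" status --porcelain) || return 2
  [ -n "$changes" ]
}

# Usage:
#   if worktree_is_dirty ~/.prodigy/worktrees/{repo}/{session}; then
#     echo "uncommitted changes: review before merging"
#   fi
```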

Error Messages

For detailed explanations of specific error messages, see Common Error Messages.

Common error patterns:

- "session not found": session ID incorrect or expired
- "command not found: claude": Claude Code not installed or not in `PATH`
- "no items to process": input file missing or JSONPath incorrect
- "cleanup failed": locked files or permission issues
- "resume already in progress": concurrent resume protection is active
- "checkpoint not found": checkpoint files missing or wrong ID

Best Practices for Debugging

Debugging Workflow

Follow this systematic approach to diagnose issues quickly:

1. Start with verbosity: always use the `-v` flag when debugging.
2. Check logs first: Claude JSON logs contain full interaction details.
3. Review events: the event timeline shows execution flow and bottlenecks.
4. Inspect the DLQ early: many failed items with similar errors usually point to a systematic issue rather than bad individual inputs.
5. Verify state: check checkpoint and session files for corruption.
6. Test incrementally: use `--dry-run` to preview execution.
7. Monitor resources: watch CPU, memory, and disk during execution.
8. Use the specialized tools: `prodigy events`, `prodigy logs`, `prodigy dlq`.

For comprehensive debugging strategies, see the Troubleshooting Guide.

- Troubleshooting Guide: detailed troubleshooting for all issues
- FAQ: frequently asked questions
- MapReduce Checkpoint and Resume: resume functionality details
- Environment Variables: variable configuration
- Observability Configuration: logging and monitoring