Troubleshooting

This chapter explains how to diagnose and resolve common issues with Prodigy workflows, including MapReduce failures, checkpoint problems, and variable interpolation errors.

Common Issues

Variables not interpolating

Symptoms: Literal ${var} appears in output instead of the value

Causes:
- Variable name typo or case mismatch
- Variable not in scope
- Incorrect syntax
- Variable not captured

Solutions:
- Check variable name spelling and case sensitivity
- Verify variable is available in current scope (step vs workflow)
- Ensure proper syntax: ${var} not $var for complex expressions
- Verify capture_output command succeeded
- Check variable was set before use (e.g., in previous step)
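One quick way to confirm this symptom is to scan captured output for placeholders that were never substituted. The following is a minimal sketch using only grep; the function name find_unresolved_vars and the example log path are hypothetical and not part of Prodigy.

```shell
# Hypothetical helper (not a Prodigy command): print each line of a file
# that still contains a literal ${...} placeholder, with its line number.
find_unresolved_vars() {
  # -n prefixes each match with its line number; -E enables extended regex.
  # grep exits with status 1 when no placeholder is found.
  grep -nE '\$\{[^}]+\}' "$1"
}

# Example usage (the path is an assumption):
# find_unresolved_vars ./workflow-output.log
```

A non-empty result points at the exact lines to compare against the variable names defined in the workflow.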

MapReduce items not found

Symptoms: No items to process, empty JSONPath result, or "items.json not found"

Causes:
- Input file doesn't exist
- Incorrect JSONPath
- Setup phase failed
- Wrong file format

Solutions:
- Verify input file exists with correct path
- Test JSONPath expression with jsonpath-cli, or with jq using its equivalent filter syntax (jq does not accept JSONPath)
- Check json_path field syntax (default: $[*])
- Ensure setup phase generated the input file successfully
- Validate JSON format with jq or another JSON validator
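The file and format checks above can be combined into one small function. This is a sketch that assumes jq is installed; count_items is a hypothetical name, and the filter is written in jq syntax rather than JSONPath.

```shell
# Hypothetical helper (not a Prodigy command): validate a JSON input file and
# count the elements selected by a jq filter. Requires jq to be installed.
# Note: jq uses its own filter language, not JSONPath. For a top-level array,
# the jq filter '.[]' selects the same elements as the JSONPath $[*].
count_items() {
  local file="$1"
  local filter="${2:-.[]}"
  if [ ! -f "$file" ]; then
    echo "input file not found: $file" >&2
    return 1
  fi
  # 'jq empty' parses the file and prints nothing; it fails on invalid JSON.
  if ! jq empty "$file" 2>/dev/null; then
    echo "invalid JSON: $file" >&2
    return 1
  fi
  # Collect the selected elements into an array and print its length.
  jq "[$filter] | length" "$file"
}

# Example usage (the file name is an assumption):
# count_items items.json '.[]'
```

A count of 0 on a valid file suggests the expression, not the file, is the problem.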

Timeout errors

Symptoms: Commands or phases timing out before completion

Causes:
- Operation too slow
- Insufficient timeout
- Hung processes
- Deadlock

Solutions:
- Increase timeout value for long operations
- Optimize command execution for better performance
- Split work into smaller chunks (use max_items, offset)
- Check for hung processes with ps or top
- Look for deadlocks in concurrent operations
- Use agent_timeout_secs for MapReduce agents
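To tell a slow command from a hung one, it can help to run the same command outside Prodigy under an explicit time limit. The sketch below assumes the GNU coreutils timeout utility is available; run_with_limit is a hypothetical wrapper, not a Prodigy feature.

```shell
# Hypothetical helper (not a Prodigy command): run a command with a time limit
# to see whether it finishes, and with what exit status, outside Prodigy.
# Assumes the GNU coreutils `timeout` utility is available.
run_with_limit() {
  local limit="$1"
  shift
  local rc=0
  timeout "$limit" "$@" || rc=$?
  # GNU timeout exits with status 124 when the time limit is reached.
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$rc"
}

# Example usage: check whether a test suite finishes within 10 minutes.
# run_with_limit 600 cargo test
```

If the command completes comfortably within the limit on its own, the workflow timeout value is the more likely culprit.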

Checkpoint resume not working

Symptoms: Resume starts from beginning, fails to load state, or "checkpoint not found"

Causes:
- Checkpoint files missing
- Wrong session/job ID
- Workflow changed
- Concurrent resume

Solutions:
- Verify checkpoint files exist in ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/
- Check session/job ID is correct with prodigy sessions list
- Ensure workflow file hasn't changed significantly
- Check for concurrent resume lock in ~/.prodigy/resume_locks/
- Review checkpoint file contents for corruption
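A first pass over the checkpoint directory can be scripted. The sketch below only counts files and flags zero-byte ones, which can indicate an interrupted write; it makes no assumption about the checkpoint file format, and inspect_checkpoint_dir is a hypothetical name.

```shell
# Hypothetical helper (not a Prodigy command): summarize a checkpoint
# directory by reporting how many files it holds and how many are empty.
inspect_checkpoint_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  local total empty
  # Count all regular files, then only those with a size of zero.
  total=$(find "$dir" -type f | wc -l | tr -d ' ')
  empty=$(find "$dir" -type f -size 0 | wc -l | tr -d ' ')
  echo "files=$total empty=$empty"
}

# Example usage; substitute your own {repo} and {job_id} values:
# inspect_checkpoint_dir ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/
```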

See MapReduce Checkpoint and Resume for detailed information.

DLQ items not retrying or re-failing

Symptoms: Retry command fails, items immediately fail again, or no progress

Causes:
- Systematic error, not transient
- DLQ file corrupted
- Underlying issue not fixed

Solutions:
- Check DLQ file format and contents with prodigy dlq show <job_id>
- Verify error was transient, not systematic (e.g., rate limit vs bug)
- Fix underlying issue before retry (e.g., API credentials, file permissions)
- Increase max-parallel for retry if parallelism helps
- Check json_log_location in DLQ for detailed error info

See Dead Letter Queue (DLQ) for complete DLQ management details.

Worktree cleanup failures

Symptoms: Orphaned worktrees after failures, "permission denied" on cleanup

Causes:
- Locked files
- Running processes
- Permission issues
- Disk full

Solutions:
- Use prodigy worktree clean-orphaned <job_id> for automatic cleanup
- Check for locked files with lsof or similar tools
- Verify no running processes using worktree with ps
- Check disk space with df -h
- Verify file permissions on worktree directory
- Manual cleanup if necessary: rm -rf ~/.prodigy/worktrees/<path>
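Before resorting to manual removal, the permission and disk-space checks can be run together. This is a sketch built on standard POSIX tools; worktree_precheck is a hypothetical helper, and it does not detect locked files or running processes.

```shell
# Hypothetical helper (not a Prodigy command): basic checks on a directory
# before attempting manual cleanup.
worktree_precheck() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir"
    return 1
  fi
  # Report whether the current user can write to the directory.
  if [ -w "$dir" ]; then
    echo "writable=yes"
  else
    echo "writable=no"
  fi
  # df -Pk prints POSIX-format output in 1024-byte blocks; the fourth
  # column of the second line is the available space.
  echo "free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
}

# Example usage; <path> stands for the worktree in question:
# worktree_precheck ~/.prodigy/worktrees/<path>
```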

For more on cleanup failures, see "Cleanup Failure Handling (Spec 136)" in the CLAUDE.md file.

Environment variables not resolved

Symptoms: Literal ${VAR} or $VAR appears in commands instead of the value

Causes:
- Variable not defined
- Wrong profile
- Scope issue
- Syntax error

Solutions:
- Check variable is defined in the env, secrets, or profiles section
- Verify correct profile is activated with the --profile flag
- Use proper syntax: ${VAR} for workflow variables; $VAR may work in shell commands
- Check variable scope (global vs step-level)
- Ensure env_files loaded correctly

See Environment Variables for variable configuration details.

Git context variables empty

Symptoms: ${step.files_added} returns empty string or undefined

Causes:
- No commits created
- Git repo not initialized
- Step not completed
- Wrong format

Solutions:
- Ensure commands created commits (use commit_required: true)
- Check git repository is initialized in working directory
- Verify step completed before accessing variables
- Use appropriate format modifier (e.g., :json, :newline)
- Check git status to verify changes exist
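To check independently whether a step's commit added any files, plain git can be queried directly. The sketch below is not how Prodigy computes ${step.files_added}; it only shows what the latest commit in a repository or worktree added, and files_added_in_head is a hypothetical name.

```shell
# Hypothetical helper (plain git, not Prodigy's implementation): list the
# files added by the most recent commit in the given repository.
files_added_in_head() {
  # --diff-filter=A keeps only added files; --root makes this work even
  # when HEAD is the first commit in the repository.
  git -C "$1" diff-tree --root --no-commit-id --name-only --diff-filter=A -r HEAD
}

# Example usage from inside a worktree:
# files_added_in_head .
```

Empty output here means the last commit added no new files, so an empty variable would be expected rather than a bug.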

See Advanced Git Context for available git variables.

Foreach iteration failures

Symptoms: Foreach command fails partway through, items skipped, or parallel execution errors

Causes:
- Command failure with continue_on_error disabled
- Parallel execution resource exhaustion
- Variable interpolation errors in item context
- Max items limit reached unexpectedly

Solutions:
- Enable continue_on_error to process remaining items on failure
- Reduce parallelism: parallel: 5 instead of parallel: true
- Verify ${item}, ${index}, ${total} variable interpolation
- Check max_items setting matches expectations
- Review progress bar output for failure patterns
- Use shell command for debugging: shell: "echo Processing ${item}"

Example foreach with error handling:

foreach:
  input:
    list: ["file1.py", "file2.py", "file3.py"]
  parallel: 5
  max_items: 10
  continue_on_error: true
  commands:
    - shell: "echo 'Processing ${item} (${index}/${total})'"
    - claude: "/refactor ${item}"

Source: src/cook/execution/foreach.rs:44-515

Workflow composition errors

Symptoms: "Template not found", "Circular dependency detected", "Required parameter not provided"

Causes:
- Missing or unregistered templates
- Circular extends/imports chains
- Required parameters not provided
- Path resolution issues

Solutions:
- Verify template exists and is registered: prodigy template list
- Register template if needed: prodigy template register <path>
- Check for circular dependencies in extends/imports chains
- Provide required parameters via --param NAME=value or --param-file
- Review template parameter definitions for requirements
- Check template paths are correct (relative to registry or filesystem)

See Common Error Messages for specific composition error details.

Validate command failures

Symptoms: "Schema validation failed", "Threshold not met", "Gap-filling failed", validate command error

Causes:
- Validation output doesn't match expected_schema
- Completeness percentage below threshold
- Invalid JSON output from validation command
- Timeout during validation
- Gap-filling commands fail

Solutions:
- Test validation command independently and check JSON output
- Verify expected_schema matches actual validation output structure
- Check completeness threshold is realistic: threshold: 80.0
- Increase validation timeout if needed: timeout: 300
- Ensure validation command writes proper JSON to stdout or result_file
- Review gap-filling commands in on_incomplete section
- Use verbose mode to see validation output: prodigy run workflow.yml -v

Example validate configuration:

# Source: src/cook/workflow/validation.rs:10-49
validate:
  shell: "cargo test --no-fail-fast -- --format json"
  expected_schema:
    type: "object"
    required: ["passed", "total"]
  threshold: 90.0
  on_incomplete:
    commands:
      - claude: "/fix-failing-tests"

Source: src/cook/workflow/validation.rs:10-49
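When testing a validation command on its own, it can be useful to confirm that its saved output is a JSON object containing the keys listed under required. The sketch below assumes jq is installed and covers only top-level required keys, not full schema validation; check_required_keys and the example file name are hypothetical.

```shell
# Hypothetical helper (not a Prodigy command): check that a JSON file is a
# valid object containing every listed top-level key. Requires jq.
check_required_keys() {
  local file="$1"
  shift
  # 'jq empty' fails if the file is missing or does not parse as JSON.
  if ! jq empty "$file" 2>/dev/null; then
    echo "missing or invalid JSON: $file"
    return 1
  fi
  local key
  for key in "$@"; do
    # jq -e sets a non-zero exit status when the result is false or null.
    if ! jq -e --arg k "$key" 'type == "object" and has($k)' "$file" >/dev/null 2>&1; then
      echo "missing key: $key"
      return 1
    fi
  done
  echo "ok"
}

# Example usage with the keys from the expected_schema above
# (the output file name is an assumption):
# check_required_keys validation-output.json passed total
```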

Write file failures

Symptoms: "Permission denied", "Directory not found", "Invalid format", file not created or corrupted

Causes:
- Parent directory doesn't exist and create_dirs not enabled
- Insufficient permissions to write to path
- Invalid JSON/YAML content when using format validation
- Variable interpolation error in path or content
- Invalid file mode permissions

Solutions:
- Enable create_dirs to auto-create parent directories: create_dirs: true
- Check directory permissions: ls -ld $(dirname path/to/file)
- Verify format validation for JSON/YAML: test content with jq or yq
- Test variable interpolation independently: echo "${var}"
- Ensure file mode is valid octal: mode: "0644" not mode: "644"
- Use absolute paths or verify working directory context
- Check disk space: df -h

Example write_file configuration:

# Source: src/config/command.rs:278-298
- write_file:
    path: "output/results-${item.id}.json"
    content: |
      {
        "item_id": "${item.id}",
        "status": "completed"
      }
    format: json
    create_dirs: true
    mode: "0644"

Source: src/config/command.rs:278-313, tests/write_file_integration_test.rs
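The mode guidance above ("0644", not "644") can be checked before a workflow runs. The sketch below assumes a valid mode is a leading zero followed by three octal digits; that pattern mirrors the guidance in this section and is not taken from Prodigy's own validation code. The name is_valid_mode is hypothetical.

```shell
# Hypothetical helper (not a Prodigy command): accept a mode string only if
# it is a leading zero followed by three octal digits, such as 0644.
# Assumption: this pattern reflects the guidance in this section, not
# Prodigy's actual validation logic.
is_valid_mode() {
  case "$1" in
    0[0-7][0-7][0-7]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
# is_valid_mode "0644" && echo "mode looks valid"
```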

Claude command fails with "command not found"

Symptoms: Shell error about claude command not existing

Causes:
- Claude Code not installed
- Not in PATH
- Wrong executable name

Solutions:
- Install Claude Code CLI if not present
- Verify claude is in PATH with which claude
- Check command name matches Claude Code CLI (not "claude-code")
- Use full path if necessary: /path/to/claude
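The PATH check can be wrapped so that it reports a clear result either way. This sketch uses the POSIX command -v builtin, which behaves like which for this purpose; check_cli is a hypothetical helper name.

```shell
# Hypothetical helper (not a Prodigy command): report where an executable
# resolves on PATH, or that it cannot be found.
check_cli() {
  local location
  if location=$(command -v "$1" 2>/dev/null); then
    echo "found: $location"
  else
    echo "not found: $1"
    return 1
  fi
}

# Example usage:
# check_cli claude
```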

Debug Tips

Use dry-run mode to preview execution

prodigy run workflow.yml --dry-run

Shows: Preview of commands that would be executed without actually running them

Use when: Verifying workflow steps before execution, testing variable interpolation, checking command syntax

Source: src/cli/args.rs:64

Use verbose mode for execution details

prodigy run workflow.yml -v

Shows: Claude streaming output, tool invocations, and execution timeline

Use when: Understanding what Claude is doing, debugging tool calls

Check Claude JSON logs for full interaction

prodigy logs --latest --summary

Shows: Full Claude interaction including messages, tools, token usage, errors

Use when: Claude command failed, understanding why Claude made certain decisions

For more on Claude JSON logs, see the "Viewing Claude Execution Logs (Spec 126)" section in the project CLAUDE.md file.

Inspect event logs for execution timeline

# List events for a job
prodigy events ls --job-id <job_id>

# Follow events in real-time
prodigy events follow --job-id <job_id>

# Show event statistics
prodigy events stats

Shows: Detailed execution timeline, agent starts/completions, durations, real-time event stream

Use when: Understanding workflow execution flow, finding bottlenecks, monitoring active jobs

Source: src/cli/commands/events.rs:22-98

Review DLQ for failed item details

prodigy dlq show <job_id>

Shows: Failed items with full error details, retry history, json_log_location

Use when: MapReduce items failing, understanding failure patterns

Check checkpoint state for resume issues

Location: ~/.prodigy/state/{repo}/mapreduce/jobs/{job_id}/

Shows: Saved execution state, completed items, variables, phase progress

Use when: Resume not working, understanding saved state

Examine worktree git log for commits

cd ~/.prodigy/worktrees/{repo}/{session}/ && git log

Shows: All commits created during workflow execution with full details

Use when: Understanding what changed, verifying commits created

Tail Claude JSON log in real-time

prodigy logs --latest --tail

Shows: Live streaming of Claude JSON log as it's being written

Use when: Watching long-running Claude command, debugging in real-time

Additional Topics

For more specific troubleshooting guidance, see:
- FAQ: Frequently asked questions
- Common Error Messages: Specific error messages explained