Observability and Logging¶

Prodigy provides comprehensive execution monitoring and debugging through event tracking, Claude execution logs, and configurable verbosity levels.

Overview¶

Observability features: - Event tracking: JSONL event streams for all operations - Claude observability: Detailed Claude execution logs with tool invocations - Verbosity control: Granular output control from clean to trace-level - Log analysis: Tools for inspecting execution history - Performance metrics: Token usage and timing information

graph TD
    Workflow[Workflow Execution] --> Events[Event System]
    Workflow --> Claude[Claude Commands]
    Workflow --> Verbosity[Verbosity Control]

    Events --> JSONL[JSONL Event Files<br/>~/.prodigy/events/]
    Events --> Types[Event Types<br/>AgentStarted, Completed, Failed]

    Claude --> JSONLog[JSON Logs<br/>~/.local/state/claude/logs/]
    Claude --> Tools[Tool Invocations]
    Claude --> Tokens[Token Usage]

    Verbosity --> Clean[Default: Clean Output]
    Verbosity --> Verbose["-v: Show Streaming"]
    Verbosity --> Debug["-vv: Debug Logs"]
    Verbosity --> Trace["-vvv: Trace Details"]

    JSONL --> Analysis[Log Analysis]
    JSONLog --> Analysis
    Analysis --> Debugging[Debugging & Monitoring]

    style Events fill:#e1f5ff
    style Claude fill:#fff3e0
    style Verbosity fill:#f3e5f5
    style Analysis fill:#e8f5e9

Figure: Prodigy's observability architecture showing event tracking, Claude logs, and verbosity control.

Event Tracking¶

All workflow operations are logged to JSONL event files:

~/.prodigy/events/{repo_name}/{job_id}/
└── events-{timestamp}.jsonl

Event Storage Best Practice

Events are stored globally in ~/.prodigy/events/ to enable cross-worktree aggregation. Multiple worktrees working on the same job share the same event log, making it easy to monitor parallel execution.

Event Types¶

AgentStarted - Agent execution begins:

{
  "type": "AgentStarted",
  "job_id": "mapreduce-123",
  "agent_id": "agent-1",
  "item_id": "item-1",
  "timestamp": "2025-01-11T12:00:00Z"
}

AgentCompleted - Agent finishes successfully:

{
  "type": "AgentCompleted",  // (1)!
  "job_id": "mapreduce-123",  // (2)!
  "agent_id": "agent-1",  // (3)!
  "duration": {"secs": 30, "nanos": 0},  // (4)!
  "commits": ["abc123", "def456"],  // (5)!
  "json_log_location": "/path/to/logs/session-xyz.json"  // (6)!
}

Event type indicating successful completion
MapReduce job identifier
Unique agent identifier for this work item
Total execution time for the agent
Git commits created during execution
Path to Claude's detailed JSON log for debugging

AgentFailed - Agent encounters errors:

{
  "type": "AgentFailed",
  "job_id": "mapreduce-123",
  "agent_id": "agent-1",
  "error": "Timeout after 300 seconds",
  "json_log_location": "/path/to/logs/session-xyz.json"
}

WorkItemProcessed - Item completion:

{
  "type": "WorkItemProcessed",
  "job_id": "mapreduce-123",
  "item_id": "item-1",
  "status": "completed",
  "result": {...}
}

CheckpointSaved - State persistence:

{
  "type": "CheckpointSaved",
  "job_id": "mapreduce-123",
  "phase": "map",
  "checkpoint_path": "/path/to/checkpoint.json",
  "timestamp": "2025-01-11T12:05:00Z"
}

ClaudeMessage - Claude interaction messages:

// Source: src/cook/execution/events/event_types.rs:164-169
{
  "type": "ClaudeMessage",
  "agent_id": "agent-1",
  "content": "Analyzing file structure...",
  "message_type": "assistant",
  "json_log_location": "/path/to/logs/session-xyz.json"
}

Event Organization¶

Events are organized by repository and job:

~/.prodigy/events/
└── prodigy/                    # (1)!
    ├── mapreduce-123/          # (2)!
    │   └── events-20250111.jsonl  # (3)!
    └── mapreduce-456/
        └── events-20250111.jsonl

Repository name for multi-repo support
Job ID groups all events for this MapReduce run
JSONL file with one event per line (append-only)

Claude Observability¶

Detailed Claude execution logs capture complete interactions:

JSON Log Location¶

Every Claude command creates a JSON log file:

~/.local/state/claude/logs/session-{session_id}.json

Log Contents¶

Complete conversation history: - User messages and prompts - Claude responses - Tool invocations with parameters - Tool results - Token usage statistics - Error details and stack traces

Accessing JSON Logs¶

Via Verbose Output (-v flag):

prodigy run workflow.yml -v

Output includes log location:

Executing: claude /my-command
Claude JSON log: /Users/user/.local/state/claude/logs/session-abc123.json
✓ Command completed

In MapReduce Events:

{
  "type": "AgentCompleted",
  "agent_id": "agent-1",
  "json_log_location": "/path/to/logs/session-xyz.json"
}

In DLQ Items:

{
  "item_id": "item-1",
  "failure_history": [{
    "error": "Command failed",
    "json_log_location": "/path/to/logs/session-xyz.json"
  }]
}

Analyzing JSON Logs¶

Common Log Analysis Tasks

The examples below show how to extract specific information from Claude JSON logs using jq. These patterns are useful for debugging agent failures, tracking token usage, and understanding Claude's decision-making process.

View complete conversation:

cat ~/.local/state/claude/logs/session-abc123.json | jq '.messages'

Check tool invocations:

cat ~/.local/state/claude/logs/session-abc123.json | \
  jq '.messages[].content[] | select(.type == "tool_use")'

Analyze token usage:

cat ~/.local/state/claude/logs/session-abc123.json | jq '.usage'

Extract errors:

cat ~/.local/state/claude/logs/session-abc123.json | \
  jq '.messages[] | select(.role == "assistant") | .content[] | select(.type == "error")'

Verbosity Control¶

Granular output control with verbosity flags:

Choosing the Right Verbosity Level

Start with default output for production workflows. Use -v when debugging Claude interactions or when you need to see streaming output. Reserve -vv and -vvv for deep troubleshooting of Prodigy internals.

Levels¶

Default (verbosity = 0): - Clean, minimal output - Progress indicators - Results only

Verbose (-v, verbosity = 1): - Claude streaming JSON output - Command execution details - Log file locations

Debug (-vv, verbosity = 2): - Internal debug logs - Execution traces - State transitions

Trace (-vvv, verbosity = 3): - Trace-level internal logging - Full execution details - Performance metrics

Usage¶

# Default: clean output
prodigy run workflow.yml

# Verbose: show Claude streaming
prodigy run workflow.yml -v

# Debug: internal logs
prodigy run workflow.yml -vv

# Trace: maximum detail
prodigy run workflow.yml -vvv

Environment Override¶

Force streaming output regardless of verbosity:

export PRODIGY_CLAUDE_CONSOLE_OUTPUT=true
prodigy run workflow.yml

Debugging MapReduce Failures¶

Using JSON Logs¶

When a MapReduce agent fails, use this debugging workflow:

flowchart TD
    Start[Agent Fails] --> DLQ[Check DLQ Item]
    DLQ --> GetLog{json_log_location<br/>present?}

    GetLog -->|Yes| InspectLog[Inspect Claude JSON Log]
    GetLog -->|No| CheckEvents[Check Event Stream]

    InspectLog --> FindError[Find Failing Tool/Message]
    FindError --> Context[Analyze Context]

    Context --> Messages[Review Message History]
    Context --> Tools[Check Tool Invocations]
    Context --> Tokens[Examine Token Usage]
    Context --> Errors[Extract Error Details]

    Messages --> Root[Identify Root Cause]
    Tools --> Root
    Tokens --> Root
    Errors --> Root

    Root --> Fix[Apply Fix]
    Fix --> Retry[Retry via DLQ]

    CheckEvents --> EventLog[Parse Event JSONL]
    EventLog --> Root

    style Start fill:#ffebee
    style GetLog fill:#fff3e0
    style Root fill:#e8f5e9
    style Fix fill:#e1f5ff

Figure: MapReduce debugging workflow showing how to trace failures using JSON logs and events.

When a MapReduce agent fails:

Check DLQ for json_log_location:

prodigy dlq show <job_id> | jq '.items[].failure_history[].json_log_location'

Inspect the Claude JSON log:

cat /path/from/step1/session-xyz.json | jq

Identify failing tool:

cat /path/from/step1/session-xyz.json | jq '.messages[-3:]'

Understand context:
Review full conversation history
Check tool invocations and results
Examine token usage for context issues
Look for error messages

Performance Metrics¶

Token Usage¶

Track token consumption:

{
  "usage": {
    "input_tokens": 1234,
    "output_tokens": 567,
    "cache_read_tokens": 89,
    "cache_creation_tokens": 0
  }
}

Execution Timing¶

Monitor performance:

{
  "timings": {
    "step1": {"secs": 10, "nanos": 500000000},
    "step2": {"secs": 25, "nanos": 0},
    "total": {"secs": 35, "nanos": 500000000}
  }
}

Event Query Examples¶

Correlation IDs¶

Events include optional correlation IDs for tracing related operations across multiple agents:

// Source: src/storage/types.rs:75
{
  "type": "AgentStarted",
  "job_id": "mapreduce-123",  // (1)!
  "agent_id": "agent-1",  // (2)!
  "correlation_id": "trace-abc-123",  // (3)!
  "timestamp": "2025-01-11T12:00:00Z"
}

Job identifier - groups all agents in this MapReduce run
Agent identifier - unique to this work item
Correlation ID - traces related operations across agents (optional)

Filter events by correlation ID:

# Source: src/cook/execution/events/filter.rs:63
# Find all events related to a specific workflow trace
cat ~/.prodigy/events/prodigy/mapreduce-123/events-*.jsonl | \
  jq -c 'select(.correlation_id == "trace-abc-123")'

Track an agent workflow end-to-end:

# Get correlation ID from initial event
CORRELATION_ID=$(cat events.jsonl | jq -r 'select(.type == "AgentStarted") | .correlation_id' | head -1)

# Find all related events
cat events.jsonl | jq -c "select(.correlation_id == \"$CORRELATION_ID\")"

Find Failed Agents¶

cat ~/.prodigy/events/prodigy/mapreduce-123/events-*.jsonl | \
  jq -c 'select(.type == "AgentFailed")'

Calculate Success Rate¶

# Count completed
completed=$(cat events.jsonl | jq 'select(.type == "AgentCompleted")' | wc -l)

# Count failed
failed=$(cat events.jsonl | jq 'select(.type == "AgentFailed")' | wc -l)

# Calculate rate
echo "Success rate: $(($completed * 100 / ($completed + $failed)))%"

Find Slowest Agents¶

cat events.jsonl | \
  jq -c 'select(.type == "AgentCompleted") | {agent_id, duration: .duration.secs}' | \
  sort -k2 -n -r | \
  head -10

Log Management¶

Log Locations¶

LinuxmacOSWindows

# Prodigy events
~/.prodigy/events/{repo_name}/{job_id}/

# Claude logs
~/.local/state/claude/logs/

# Session state
~/.prodigy/sessions/

# Checkpoints
~/.prodigy/state/{repo_name}/

# Prodigy events
~/.prodigy/events/{repo_name}/{job_id}/

# Claude logs
~/.local/state/claude/logs/

# Session state
~/.prodigy/sessions/

# Checkpoints
~/.prodigy/state/{repo_name}/

# Prodigy events
%USERPROFILE%\.prodigy\events\{repo_name}\{job_id}\

# Claude logs
%USERPROFILE%\.local\state\claude\logs\

# Session state
%USERPROFILE%\.prodigy\sessions\

# Checkpoints
%USERPROFILE%\.prodigy\state\{repo_name}\

Log Storage Considerations

Claude JSON logs can grow large with extensive tool usage. Monitor disk space when running many MapReduce agents. Consider setting up automated cleanup for logs older than 30 days in production environments.

Cleanup¶

# Clean old event logs (older than 30 days)
find ~/.prodigy/events -name "*.jsonl" -mtime +30 -delete

# Clean old Claude logs
find ~/.local/state/claude/logs -name "*.json" -mtime +30 -delete

# Clean completed sessions
prodigy sessions clean --completed

Examples¶

Debug Workflow Failure¶

# Run with verbose output
prodigy run workflow.yml -v

# Check event log for errors
cat ~/.prodigy/events/prodigy/latest/events-*.jsonl | \
  jq -c 'select(.type == "AgentFailed")'

# Inspect Claude log
cat $(jq -r '.json_log_location' dlq-item.json) | jq '.messages[-5:]'

Monitor MapReduce Progress¶

# Run in verbose mode
prodigy run mapreduce.yml -v &

# Watch event stream
tail -f ~/.prodigy/events/prodigy/mapreduce-123/events-*.jsonl | \
  jq -c 'select(.type == "AgentCompleted")'

Analyze Token Usage¶

# Extract token usage from all agents
for log in ~/.local/state/claude/logs/session-*.json; do
  echo "$log:"
  jq '.usage' "$log"
done