Session Management¶
Prodigy provides unified session tracking and lifecycle management for both standard workflows and MapReduce jobs.
Overview¶
Session management features:
- Unified tracking: Single system for workflows and MapReduce
- Lifecycle states: Running, Paused, Completed, Failed, Cancelled
- State persistence: Sessions stored in ~/.prodigy/sessions/
- Resume capabilities: Resume from checkpoints with session or job IDs
- Concurrent protection: Lock-based prevention of simultaneous resumes
Session Types¶
Workflow Sessions¶
Standard workflow execution tracking:
{
"id": "session-abc123",
"session_type": "Workflow",
"status": "Running",
"started_at": "2025-01-11T12:00:00Z",
"workflow_data": {
"workflow_name": "my-workflow",
"current_step": 2,
"total_steps": 5,
"completed_steps": [0, 1],
"variables": {},
"files_changed": 3,
"worktree_name": "session-abc123"
}
}
MapReduce Sessions¶
MapReduce job state management:
// Source: src/unified_session/state.rs:117-125
{
"id": "session-mapreduce-xyz",
"session_type": "MapReduce",
"status": "Running",
"started_at": "2025-01-11T12:00:00Z",
"mapreduce_data": {
"job_id": "mapreduce-xyz",
"phase": "map",
"total_items": 100,
"processed_items": 45,
"failed_items": 2,
"agent_count": 0,
"reduce_results": null
}
}
Session Lifecycle¶
States¶
- Running: Active execution in progress
- Paused: Interrupted, ready to resume from checkpoint
- Completed: Successfully finished
- Failed: Terminated with errors
- Cancelled: User-initiated stop
State Transitions¶
stateDiagram-v2
[*] --> Running: Workflow Start
Running --> Paused: Interrupt (Ctrl+C)
Running --> Completed: Success
Running --> Failed: Error
Running --> Cancelled: User Stop
Paused --> Running: Resume
Paused --> Cancelled: Cleanup
Completed --> [*]
Failed --> [*]
Cancelled --> [*]
note right of Running
Checkpoint created
State persisted
end note
note right of Paused
Resumable state
Work preserved
end note
Figure: Session lifecycle showing state transitions and checkpoint behavior.
Session Storage¶
Sessions are stored globally for easy access:
Session File Structure¶
{
"id": "session-abc123", // (1)!
"session_type": "Workflow", // (2)!
"status": "Paused", // (3)!
"started_at": "2025-01-11T12:00:00Z",
"updated_at": "2025-01-11T12:05:00Z",
"completed_at": null,
"metadata": {
"execution_start_time": "2025-01-11T12:00:00Z",
"workflow_type": "standard",
"total_steps": 5,
"current_step": 2
},
"checkpoints": [ // (4)!
"checkpoint-1-2025-01-11T12:02:00Z"
],
"timings": { // (5)!
"step1": {"secs": 10, "nanos": 0},
"step2": {"secs": 15, "nanos": 0}
},
"error": null
}
- Unique session identifier used for resume commands
- Session type:
WorkfloworMapReduce - Current lifecycle state:
Running,Paused,Completed,Failed,Cancelled - List of checkpoint identifiers for resume operations
- Execution timing for performance analysis
Resume Capabilities¶
Resume Command¶
Resume sessions using session ID or job ID:
# Resume with session ID
prodigy resume session-abc123
# Resume with job ID (MapReduce)
prodigy resume-job mapreduce-xyz
# Unified resume (auto-detects ID type)
prodigy resume mapreduce-xyz
Resume ID Types
You can use either session IDs or job IDs to resume. Prodigy automatically detects the ID type and loads the correct session.
Session-Job Mapping¶
Bidirectional mapping enables resume with either ID:
Each mapping contains:
// Source: src/storage/session_job_mapping.rs:14-26
{
"session_id": "session-mapreduce-xyz",
"job_id": "mapreduce-xyz",
"workflow_name": "my-workflow",
"created_at": "2025-01-11T12:00:00Z"
}
Checkpoint-Based Resume¶
Resume reconstructs state from checkpoints:
- Load session from
~/.prodigy/sessions/ - Verify session is in Paused state
- Load checkpoint from
~/.prodigy/state/ - Reconstruct execution context
- Continue from last completed step
flowchart TD
Start[Resume Command] --> Load[Load Session File]
Load --> Verify{Status = Paused?}
Verify -->|No| Error[Error: Not Resumable]
Verify -->|Yes| Checkpoint[Load Checkpoint]
Checkpoint --> Lock{Acquire Lock}
Lock -->|Failed| Blocked[Error: Already Running]
Lock -->|Success| Context[Reconstruct Context]
Context --> Variables[Restore Variables]
Context --> State[Restore Work State]
Variables --> Continue[Continue Execution]
State --> Continue
Continue --> Complete[Update Session]
Complete --> Release[Release Lock]
style Start fill:#e1f5ff
style Continue fill:#e8f5e9
style Error fill:#ffebee
style Blocked fill:#ffebee
Figure: Resume flow showing checkpoint loading, lock acquisition, and state reconstruction.
State Preservation¶
Variables and Context¶
Preserved across resume: - Workflow variables - Captured command outputs - Environment variables - Map results (in MapReduce)
Work Item State¶
In MapReduce workflows: - Completed items: Preserved with full results - In-progress items: Moved back to pending on resume - Failed items: Tracked in DLQ with retry counts - Pending items: Continue processing from where left off
In-Progress Items on Resume
When a MapReduce workflow is interrupted, any items that were being processed by agents are moved back to the pending queue. This ensures they are reprocessed on resume and prevents partial results.
Concurrent Resume Protection¶
Prodigy prevents multiple resumes of the same session:
RAII-Based Locking¶
Lock Metadata¶
Lock files contain:
{
"pid": 12345,
"hostname": "machine.local",
"acquired_at": "2025-01-11T12:00:00Z",
"session_id": "session-abc123"
}
Stale Lock Detection¶
- Platform-specific process checks (Unix:
kill -0, Windows:tasklist) - Automatic cleanup of locks from crashed processes
- New resume succeeds after stale lock removal
Automatic Stale Lock Cleanup
If a resume process crashes or is killed, the lock file may remain. Prodigy automatically detects when the holding process is no longer running and removes stale locks, allowing new resume attempts to succeed.
Lock Errors¶
If resume blocked by active lock:
Error: Resume already in progress for job mapreduce-xyz
Lock held by: PID 12345 on hostname (acquired 2025-01-11 10:30:00 UTC)
Please wait for the other process to complete.
Concurrent Resume Prevention
Attempting to resume a session that is already being resumed will fail with a lock error. Wait for the other process to complete, or verify the process is still running before manually removing the lock file.
Session Management Commands¶
List Sessions¶
# List all sessions
prodigy sessions list
# Filter by status
prodigy sessions list --status paused
# Show details
prodigy sessions show session-abc123
Clean Sessions¶
# Clean completed sessions
prodigy sessions clean
# Clean specific session
prodigy sessions clean session-abc123
# Force cleanup
prodigy sessions clean -f
Session Metadata¶
Execution Timing¶
Track performance metrics:
{
"timings": {
"step1": {"secs": 10, "nanos": 0},
"step2": {"secs": 15, "nanos": 0},
"total": {"secs": 25, "nanos": 0}
}
}
Progress Tracking¶
Monitor execution progress:
Examples¶
Resume Interrupted Workflow¶
Workflow Resume
Resume MapReduce Job¶
MapReduce Resume
Prevent Concurrent Resume¶
Concurrent Resume Protection