MapReduce Worktree Architecture¶
MapReduce workflows in Prodigy use an isolated git worktree architecture that ensures the main repository remains untouched during workflow execution. This chapter explains the worktree hierarchy, branch naming conventions, merge flows, and debugging strategies.
Overview¶
When you run a MapReduce workflow, Prodigy creates a hierarchical worktree structure:
Main Repository (untouched during execution)
↓
Parent Worktree (session-mapreduce-{id})
├── Setup Phase → Executes here
├── Reduce Phase → Executes here
└── Map Phase → Each agent in child worktree
    ├── Child Worktree (mapreduce-agent-{id})
    ├── Child Worktree (mapreduce-agent-{id})
    └── Child Worktree (mapreduce-agent-{id})
This architecture provides complete isolation, allowing parallel agents to work independently while preserving a clean main repository.
Worktree Hierarchy¶
Parent Worktree¶
Created at the start of MapReduce workflow execution:
Location: ~/.prodigy/worktrees/{project}/session-mapreduce-{timestamp}
Purpose:
- Isolates all workflow execution from main repository
- Hosts setup phase execution
- Hosts reduce phase execution
- Serves as merge target for agent results
Branch: Follows prodigy-{session-id} pattern. The session ID includes a timestamp (e.g., session-mapreduce-20250112_143052), so the full branch name becomes prodigy-session-mapreduce-20250112_143052 (source: src/worktree/builder.rs:176-178)
Worktree Allocation Strategies:
All worktrees in Prodigy have names, paths, and git branches - the distinction is in how they're allocated:
- Directly-Created Worktrees: MapReduce coordinators create session worktrees with explicit, predictable names (e.g., session-mapreduce-20250112_143052). These have deterministic paths and are easy to locate.
- Pool-Allocated Worktrees: When agents request worktrees via WorktreeRequest::Anonymous (source: src/cook/execution/mapreduce/resources/worktree.rs:42), they receive pre-allocated worktrees from a shared pool. These worktrees have pool-assigned names rather than request-specific names. The pool allocation strategy enables efficient resource reuse across multiple agents.
Important: Both allocation strategies produce worktrees with full identity (name, path, branch). The difference is in naming predictability and resource management approach.
Child Worktrees¶
Created for each map agent:
Location: ~/.prodigy/worktrees/{project}/mapreduce-agent-{agent_id}
Purpose:
- Complete isolation per agent
- Independent failure handling
- Parallel execution safety
Branch: Follows prodigy-{worktree-name} pattern (branched from parent worktree)
Resource Management: Agent worktrees can be acquired through two strategies:
- Worktree Pool (preferred): Agents first attempt to acquire pre-allocated worktrees from a WorktreePool. This reduces creation overhead and enables efficient resource reuse.
- Direct Creation (fallback): If the pool is exhausted or unavailable, agents fall back to creating dedicated worktrees via WorktreeManager.
The acquire_session method implements this pool-first strategy, ensuring optimal resource utilization while maintaining isolation guarantees.
Note: The agent_id in the location path encodes the work item information. Agent worktrees are created dynamically as map agents execute.
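The pool-first strategy can be pictured with a small sketch. This is only an illustration of the fallback logic, not Prodigy's acquire_session: the Worktree and Pool types, the try_acquire method, and the "pool-0" name are hypothetical stand-ins for WorktreePool and WorktreeManager.

```rust
/// Minimal stand-in for a worktree handle (hypothetical, not Prodigy's type).
#[derive(Debug, Clone, PartialEq)]
struct Worktree {
    name: String,
    from_pool: bool,
}

/// Hypothetical pool holding pre-allocated worktrees.
struct Pool {
    free: Vec<Worktree>,
}

impl Pool {
    /// Hand out a pre-allocated worktree, or None when the pool is exhausted.
    fn try_acquire(&mut self) -> Option<Worktree> {
        self.free.pop()
    }
}

/// Pool first; fall back to creating a dedicated worktree for the agent.
fn acquire(pool: &mut Pool, agent_id: &str) -> Worktree {
    pool.try_acquire().unwrap_or_else(|| Worktree {
        name: format!("mapreduce-agent-{agent_id}"),
        from_pool: false,
    })
}

fn main() {
    let mut pool = Pool {
        free: vec![Worktree { name: "pool-0".into(), from_pool: true }],
    };
    // The first agent drains the pool; the second falls back to direct creation.
    let first = acquire(&mut pool, "job1_agent_0");
    let second = acquire(&mut pool, "job1_agent_1");
    println!("{first:?}");
    println!("{second:?}");
}
```

Either path hands the agent a worktree with a name of its own, which is why isolation does not depend on which strategy served the request.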
Branch Naming Conventions¶
Prodigy uses consistent branch naming to track worktree relationships:
Parent Worktree Branch¶
Format: prodigy-{session-id}
The branch name follows the universal worktree pattern where all worktrees use prodigy-{name}. For MapReduce workflows, the session ID itself includes the timestamp, so the full branch name looks like:
Example: prodigy-session-mapreduce-20250112_143052
This is prodigy- + the session ID session-mapreduce-20250112_143052
Agent Worktree Branch¶
Format: prodigy-{worktree-name}
All worktrees in Prodigy follow the universal prodigy-{name} branch naming pattern (source: src/worktree/builder.rs:178). The worktree name itself varies based on the allocation strategy:
Pool-Allocated Worktrees: When agents acquire worktrees from the pre-allocated pool, the worktree name is generated by the pool and may not follow a predictable pattern. These are still tracked by the consistent prodigy-{name} branch format.
Directly-Created Worktrees: When agents create dedicated worktrees (fallback when pool is exhausted), the name typically encodes job and agent information.
Example: prodigy-mapreduce-agent-mapreduce-20251109_193734_agent_22
This is prodigy- + the worktree name mapreduce-agent-mapreduce-20251109_193734_agent_22
Directly-Created Worktree Name Components:
- mapreduce-agent-: Indicates this is a MapReduce agent worktree
- {job_id}: The MapReduce job identifier (includes timestamp)
- _agent_{n}: Sequential agent number within the job
Note: The branch naming is always consistent (prodigy-{name}), but worktree naming varies based on allocation strategy.
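As a quick illustration of the naming rules above, the following sketch derives branch names from worktree names. The function names branch_for and agent_worktree_name are hypothetical helpers written for this example; they are not taken from src/worktree/builder.rs.

```rust
/// Branch name for any worktree: the `prodigy-` prefix plus the worktree name.
fn branch_for(worktree_name: &str) -> String {
    format!("prodigy-{worktree_name}")
}

/// Name of a directly-created agent worktree (hypothetical helper that
/// follows the mapreduce-agent-{job_id}_agent_{n} components listed above).
fn agent_worktree_name(job_id: &str, agent_index: u32) -> String {
    format!("mapreduce-agent-{job_id}_agent_{agent_index}")
}

fn main() {
    // Parent worktree: the session ID already carries the timestamp.
    println!("{}", branch_for("session-mapreduce-20250112_143052"));

    // Agent worktree: build the worktree name first, then prefix it.
    let agent = agent_worktree_name("mapreduce-20251109_193734", 22);
    println!("{}", branch_for(&agent));
}
```

The two printed lines match the examples given in this section: prodigy-session-mapreduce-20250112_143052 and prodigy-mapreduce-agent-mapreduce-20251109_193734_agent_22.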
Merge Flow¶
MapReduce workflows involve multiple merge operations to aggregate results:
1. Agent Merge (Child → Parent)¶
When an agent completes successfully:
Process:
1. Agent completes all commands successfully
2. Agent commits changes to its branch
3. Merge coordinator adds agent to merge queue
4. Sequential merge into parent worktree branch
5. Child worktree cleanup
2. Parent Merge (Parent Worktree → Main Repository)¶
After all map agents complete and reduce phase finishes:
Process:
1. All agents merged into parent worktree
2. Reduce phase executes in parent worktree
3. User confirms merge to main repository
4. Sequential merge with conflict detection
5. Parent worktree cleanup
Merge Strategies¶
Fast-Forward When Possible: If no divergence, use fast-forward merge
Three-Way Merge: When branches have diverged, perform three-way merge
Conflict Handling: Stop and report conflicts for manual resolution
Agent Merge Details¶
Merge Queue¶
Agents are added to a merge queue as they complete:
Queue Architecture: Merge queue is managed in-memory by a background worker task using a tokio unbounded mpsc channel (mpsc::unbounded_channel::<MergeRequest>()). Merge requests are processed sequentially via this channel, eliminating MERGE_HEAD race conditions. Queue state is not persisted - merge operations are atomic (source: src/cook/execution/mapreduce/merge_queue.rs:70).
Resume and Recovery: The merge queue state is reconstructed on resume from checkpoint data (source: src/cook/execution/mapreduce/merge_queue.rs:153). When a MapReduce workflow is interrupted and resumed, the queue is rebuilt based on:
- Completed agents: Already merged, skip re-merging
- Failed agents: Tracked in DLQ, can be retried separately
- In-progress agents: Moved back to pending status, will be re-executed
- Pending agents: Continue processing from where left off
Any in-progress merges at the time of interruption are retried from the agent worktree state. This ensures no agent results are lost during resume.
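The resume rules can be read as a simple mapping from an agent's checkpointed state to what happens on resume. The sketch below encodes that mapping; the AgentState and ResumeAction enums are hypothetical names chosen for this example and are not Prodigy's checkpoint types.

```rust
/// Agent status as recorded in a checkpoint (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentState {
    Completed,
    Failed,
    InProgress,
    Pending,
}

/// What the resumed workflow does with an agent in that state (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ResumeAction {
    /// Already merged: do not merge again.
    Skip,
    /// Stays in the DLQ until retried separately.
    LeaveInDlq,
    /// Moved back to pending and executed again.
    Reexecute,
    /// Picked up where processing left off.
    Continue,
}

fn resume_action(state: AgentState) -> ResumeAction {
    match state {
        AgentState::Completed => ResumeAction::Skip,
        AgentState::Failed => ResumeAction::LeaveInDlq,
        AgentState::InProgress => ResumeAction::Reexecute,
        AgentState::Pending => ResumeAction::Continue,
    }
}

fn main() {
    for state in [
        AgentState::Completed,
        AgentState::Failed,
        AgentState::InProgress,
        AgentState::Pending,
    ] {
        println!("{state:?} -> {:?}", resume_action(state));
    }
}
```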
Queue Processing: Queue processes MergeRequest objects containing:
- agent_id: Unique agent identifier
- branch_name: Agent's git branch to merge
- item_id: Work item identifier for correlation
- env: Execution environment context (variables, secrets)
Merge requests are processed FIFO with automatic conflict detection.
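To make the single-consumer design concrete, here is a minimal sketch of a FIFO merge queue. It deliberately swaps in std::sync::mpsc and a thread for the tokio unbounded channel and background task described above, so it runs without extra crates. The MergeRequest field types, the request helper, and run_queue are assumptions made for this example, and the actual git merge is replaced by a placeholder.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Fields as listed above; the concrete types are assumptions.
#[derive(Debug)]
struct MergeRequest {
    agent_id: String,
    branch_name: String,
    item_id: String,
    env: HashMap<String, String>,
}

/// Hypothetical helper that builds a request for one agent.
fn request(agent_id: &str, item_id: &str) -> MergeRequest {
    MergeRequest {
        agent_id: agent_id.to_string(),
        branch_name: format!("prodigy-mapreduce-agent-{agent_id}"),
        item_id: item_id.to_string(),
        env: HashMap::new(),
    }
}

/// Send all requests to one worker and return the branches in the
/// order the worker handled them.
fn run_queue(requests: Vec<MergeRequest>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<MergeRequest>();

    let worker = thread::spawn(move || {
        let mut merged = Vec::new();
        // A single consumer means only one merge is ever in flight.
        for req in rx {
            println!(
                "merging {} (agent {}, item {}, {} env vars)",
                req.branch_name,
                req.agent_id,
                req.item_id,
                req.env.len()
            );
            // Placeholder for the real merge of req.branch_name.
            merged.push(req.branch_name);
        }
        merged
    });

    for req in requests {
        tx.send(req).expect("worker is alive");
    }
    drop(tx); // closing the channel lets the worker loop end

    worker.join().expect("worker did not panic")
}

fn main() {
    let order = run_queue(vec![request("a", "item-1"), request("b", "item-2")]);
    println!("{order:?}");
}
```

Because one consumer drains the channel, requests are handled strictly in arrival order, which is the property that keeps two merges from touching MERGE_HEAD at the same time.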
Sequential Merge Processing¶
Merges are processed sequentially to prevent conflicts:
- Lock merge queue
- Take next agent from pending queue
- Perform merge into parent worktree
- Update queue (move to merged or failed)
- Release lock
Automatic Conflict Resolution¶
If a standard git merge fails with conflicts, the merge queue automatically invokes Claude using the /prodigy-merge-worktree command to resolve conflicts intelligently:
Conflict Resolution Flow:
1. Standard git merge attempted
2. If conflicts detected, invoke Claude with /prodigy-merge-worktree {branch_name}
3. Claude is executed with PRODIGY_AUTOMATION=true environment variable (source: src/cook/execution/mapreduce/merge_queue.rs:98-99)
4. Claude analyzes conflicts and attempts resolution
5. If Claude succeeds, merge completes automatically
6. If Claude fails, agent is marked as failed and added to DLQ
PRODIGY_AUTOMATION Environment Variable: When set to true, this signals to Claude Code that it's operating in automated workflow mode and should use appropriate merge strategies without requiring user interaction. Claude will attempt to resolve conflicts autonomously using standard git merge strategies and code analysis.
Benefits:
- Reduces manual merge conflict resolution overhead
- Handles common conflict patterns automatically
- Preserves full context for debugging via Claude logs
- Falls back gracefully to DLQ for complex conflicts
- Automated execution mode ensures non-interactive conflict resolution
This automatic conflict resolution is especially useful when multiple agents modify overlapping code areas.
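The conflict resolution flow reduces to a two-step decision, sketched below. This is an illustration of the control flow only: the MergeOutcome enum and merge_agent function are hypothetical, and the two closures stand in for the real git merge and the /prodigy-merge-worktree invocation.

```rust
/// Possible results of merging one agent branch (hypothetical enum).
#[derive(Debug, PartialEq)]
enum MergeOutcome {
    /// The standard git merge succeeded.
    Merged,
    /// The merge conflicted and the automated resolution succeeded.
    MergedAfterResolution,
    /// Resolution failed; the agent is marked failed and goes to the DLQ.
    SentToDlq,
}

/// `git_merge` returns true for a clean merge; `claude_resolve` returns
/// true when the automated resolution succeeds. The resolver is only
/// invoked if the standard merge reports conflicts.
fn merge_agent(
    git_merge: impl FnOnce() -> bool,
    claude_resolve: impl FnOnce() -> bool,
) -> MergeOutcome {
    if git_merge() {
        return MergeOutcome::Merged;
    }
    if claude_resolve() {
        MergeOutcome::MergedAfterResolution
    } else {
        MergeOutcome::SentToDlq
    }
}

fn main() {
    println!("{:?}", merge_agent(|| true, || false));
    println!("{:?}", merge_agent(|| false, || true));
    println!("{:?}", merge_agent(|| false, || false));
}
```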
Parent to Master Merge¶
Merge Confirmation¶
After reduce phase completes, Prodigy prompts for merge confirmation:
✓ MapReduce workflow completed successfully
Merge session-mapreduce-20250112_143052 to master? [y/N]
Custom Merge Workflows¶
Configure custom merge validation:
merge:
- shell: "git fetch origin"
- shell: "cargo test"
- shell: "cargo clippy"
- claude: "/prodigy-merge-worktree ${merge.source_branch} ${merge.target_branch}"
Important: Always pass both ${merge.source_branch} and ${merge.target_branch} to the /prodigy-merge-worktree command (source: .claude/commands/prodigy-merge-worktree.md). This ensures the merge targets the branch you were on when you started the workflow, not a hardcoded main/master branch.
Merge Variables¶
Available during merge workflows:
- ${merge.worktree} - Worktree name
- ${merge.source_branch} - Session branch name
- ${merge.target_branch} - Main repository branch (usually master/main)
- ${merge.session_id} - Session ID for correlation
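To show how these variables end up in a merge command, the sketch below performs a plain textual substitution of ${...} placeholders. The interpolate function and the sample values are assumptions for illustration; Prodigy's own interpolation may behave differently (for example around escaping or unknown variables).

```rust
use std::collections::HashMap;

/// Replace each `${name}` placeholder with its value (hypothetical helper).
/// Placeholders without a matching entry are left untouched.
fn interpolate(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (name, value) in vars {
        // "${{{name}}}" renders as the literal text ${name}.
        out = out.replace(&format!("${{{name}}}"), value);
    }
    out
}

fn main() {
    // Sample values only; real values come from the running session.
    let vars = HashMap::from([
        ("merge.source_branch", "prodigy-session-mapreduce-20250112_143052"),
        ("merge.target_branch", "master"),
    ]);
    let command = interpolate(
        "/prodigy-merge-worktree ${merge.source_branch} ${merge.target_branch}",
        &vars,
    );
    println!("{command}");
}
```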
Debugging MapReduce Worktrees¶
Inspecting Worktree State¶
# List all worktrees
git worktree list
# View worktree details
cd ~/.prodigy/worktrees/{project}/session-mapreduce-*
git status
git log
# View agent worktree
cd ~/.prodigy/worktrees/{project}/mapreduce-agent-*
git log --oneline
Finding Agent Worktree Paths¶
Agent worktrees may be directly-created (with predictable names) or pool-allocated (with pool-assigned names). To correlate agent IDs to worktree paths:
Directly-Created Worktrees (deterministic paths):
# Pattern: ~/.prodigy/worktrees/{project}/mapreduce-agent-{job_id}_agent_{n}
cd ~/.prodigy/worktrees/{project}/mapreduce-agent-*
Pool-Allocated Worktrees (pool-assigned paths):
# List all worktrees and correlate by branch name
git worktree list
# Look for branches matching agent pattern
git branch -a | grep prodigy-mapreduce-agent
Note: Both allocation strategies produce fully-identified worktrees with names, paths, and branches. Pool allocation assigns names from the pool's naming scheme, while direct creation uses request-specific naming patterns. Use WorktreeInfo tracking (described below) to correlate agent IDs to actual worktree locations.
WorktreeInfo Tracking: Prodigy captures worktree metadata in WorktreeInfo structs containing:
- name: Worktree identifier
- path: Full filesystem path
- branch: Git branch name
This information is logged in MapReduce events and can be inspected via prodigy events {job_id} to correlate agent IDs to worktree paths.
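When the event log is not at hand, the same three fields can be recovered from git itself. The sketch below parses the output of git worktree list --porcelain into records shaped like the list above. The WorktreeInfo struct here is a local stand-in with assumed field types, parse_worktrees is a hypothetical helper, and the sample paths are invented for the example.

```rust
/// Mirrors the three fields listed above; field types are assumptions.
#[derive(Debug, PartialEq)]
struct WorktreeInfo {
    name: String,
    path: String,
    branch: String,
}

/// Parse the output of `git worktree list --porcelain`.
/// Worktrees with a detached HEAD have no `branch` line and are skipped.
fn parse_worktrees(porcelain: &str) -> Vec<WorktreeInfo> {
    let mut infos = Vec::new();
    let mut path: Option<String> = None;

    for line in porcelain.lines() {
        if let Some(p) = line.strip_prefix("worktree ") {
            path = Some(p.to_string());
        } else if let Some(b) = line.strip_prefix("branch refs/heads/") {
            if let Some(p) = path.take() {
                // Use the last path component as the worktree name.
                let name = p.rsplit('/').next().unwrap_or(&p).to_string();
                infos.push(WorktreeInfo { name, path: p, branch: b.to_string() });
            }
        }
    }
    infos
}

fn main() {
    let sample = "worktree /repo\nHEAD 1111\nbranch refs/heads/master\n\n\
                  worktree /home/u/.prodigy/worktrees/proj/mapreduce-agent-job_agent_0\n\
                  HEAD 2222\nbranch refs/heads/prodigy-mapreduce-agent-job_agent_0\n";
    for info in parse_worktrees(sample) {
        println!("{} -> {} ({})", info.branch, info.path, info.name);
    }
}
```

Matching on the branch column works for both allocation strategies, since every worktree branch follows the prodigy-{name} pattern even when the worktree name itself is pool-assigned.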
Common Debugging Scenarios¶
Agent Failed to Merge:
- Check DLQ for failure details: prodigy dlq show {job_id}
- Inspect failed agent worktree: cd ~/.prodigy/worktrees/{project}/mapreduce-agent-*
- Review agent changes: git diff master
- Check for conflicts: git status
- Review Claude merge logs if conflict resolution was attempted
Parent Worktree Not Merging:
- Check parent worktree: cd ~/.prodigy/worktrees/{project}/session-mapreduce-*
- Verify all agents merged: git log --oneline
- Check for uncommitted changes: git status
- Review merge history: git log --graph --oneline --all
Merge Conflict Resolution¶
If merge conflicts occur:
# Navigate to parent worktree
cd ~/.prodigy/worktrees/{project}/session-mapreduce-*
# View conflicts
git status
# Resolve manually
vim <conflicted-file>
# Complete merge
git add <conflicted-file>
git commit
Verification Commands¶
Verify Main Repository is Clean¶
# Main repository should have no changes from MapReduce execution
git status
# Expected: nothing to commit, working tree clean
Verify Worktree Isolation¶
# Check that parent worktree has changes
cd ~/.prodigy/worktrees/{project}/session-mapreduce-*
git status
git log --oneline
# Main repository should still be clean
cd /path/to/main/repo
git status
Verify Agent Merges¶
# Check for merge events
prodigy events {job_id}
# Verify merged agents in parent worktree
cd ~/.prodigy/worktrees/{project}/session-mapreduce-*
git log --oneline | grep "Merge"
Best Practices¶
Worktree Management¶
- Cleanup: Remove old worktrees after successful merge: prodigy worktree clean
- Monitoring: Check worktree disk usage periodically
- Inspection: Review worktrees before deleting to verify results
Merge Workflows¶
- Test Before Merge: Run tests in merge workflow to catch issues
- Sync Upstream: Fetch and merge origin/main before merging to main
- Conflict Prevention: Keep MapReduce jobs focused to minimize conflicts
Debugging¶
- Preserve Worktrees: Don't delete worktrees until debugging is complete
- Event Logs: Review event logs for merge failures: prodigy events {job_id}
- DLQ Review: Check failed items that might indicate merge issues
Troubleshooting¶
Worktree Creation Fails¶
Issue: Cannot create parent or child worktree
Solution: Check disk space, verify git repository is valid, ensure no existing worktree with same name
Agent Merge Fails¶
Issue: Agent results fail to merge into parent
Solution: Check merge queue, inspect agent worktree for conflicts, review agent changes
Parent Merge Conflicts¶
Issue: Merging parent worktree to main causes conflicts
Solution: Resolve conflicts manually, consider rebasing parent worktree on latest main
Orphaned Worktrees¶
Issue: Worktrees remain after workflow completion
Solution: Use prodigy worktree clean to remove old worktrees, or manually remove with git worktree remove