Automatic Gap Detection¶
Automatic gap detection is a critical component of Prodigy's documentation workflow. It identifies undocumented features and automatically creates chapter/subsection definitions with stub markdown files. This keeps documentation coverage complete and prevents features from shipping without corresponding user guidance.
Source: Implemented in .claude/commands/prodigy-detect-documentation-gaps.md:1-1048 and tested in tests/documentation_gap_detection_test.rs:1-678
Overview¶
Gap detection runs in the setup phase of the book workflow (workflows/book-docs-drift.yml:31-34) and performs several key functions:
- Analyzes features.json (from feature analysis) against existing chapters/subsections
- Classifies gaps by severity (high, medium, low)
- Validates content sufficiency before creating subsections (Step 0)
- Syncs chapters.json with actual file structure (Phase 7.5)
- Creates missing chapter definitions and stub markdown files
- Updates SUMMARY.md with proper hierarchy
- Generates flattened-items.json for the map phase (mandatory)
The gap detection process ensures that:
- Features aren't documented without sufficient codebase material (prevents stub subsections)
- Multi-subsection chapter structures are accurately reflected in chapters.json
- The map phase receives a complete, flat list of all chapters and subsections to process
- Documentation organization matches implementation reality
Command Usage¶
Command: /prodigy-detect-documentation-gaps
Parameters (.claude/commands/prodigy-detect-documentation-gaps.md:5-11):
/prodigy-detect-documentation-gaps \
--project "Prodigy" \
--config ".prodigy/book-config.json" \
--features ".prodigy/book-analysis/features.json" \
--chapters "workflows/data/prodigy-chapters.json" \
--book-dir "book"
Workflow Integration (workflows/book-docs-drift.yml:31-34):
setup:
  # Step 1: Analyze features
  - claude: "/prodigy-analyze-features-for-book --project $PROJECT_NAME --config $PROJECT_CONFIG"
  # Step 2: Detect gaps and generate flattened-items.json
  - claude: "/prodigy-detect-documentation-gaps \
      --project $PROJECT_NAME \
      --config $PROJECT_CONFIG \
      --features $FEATURES_PATH \
      --chapters $CHAPTERS_FILE \
      --book-dir $BOOK_DIR"
Gap Severity Classification¶
Gap detection classifies documentation gaps into three severity levels based on feature importance and documentation completeness (.claude/commands/prodigy-detect-documentation-gaps.md:66-112):
High Severity (Missing Chapter/Subsection)¶
Criteria:
- Feature area exists in features.json
- NO corresponding chapter OR subsection found
- Major user-facing capability with no guidance
Example:
{
"severity": "high",
"type": "missing_chapter",
"feature_category": "agent_merge",
"feature_description": "Custom merge workflows for map agents",
"recommended_chapter_id": "agent-merge-workflows",
"recommended_title": "Agent Merge Workflows"
}
Action: Create new chapter definition with stub markdown file
Medium Severity (Incomplete Chapter/Subsection)¶
Criteria:
- Chapter or multi-subsection structure exists for feature area
- But specific sub-capabilities are missing
- Could be addressed by adding subsection or expanding content
Example:
- "mapreduce" chapter exists but missing "performance_tuning" subsection
Action: Create subsection definition and add to existing multi-subsection chapter
Low Severity (Minor Gap)¶
Criteria:
- Edge cases or advanced features not documented
- Internal APIs exposed to users
- Less common use cases
Action: Log as warning but may not create new content
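The three severity levels can be read as a simple ordered check. The following is a minimal sketch of that reading, not the command's actual implementation: the `Coverage` struct, its fields, and `classify_gap` are hypothetical names introduced only for illustration.

```rust
/// Severity levels as described in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GapSeverity {
    High,
    Medium,
    Low,
}

/// Hypothetical summary of how well one feature area is documented.
#[derive(Debug, Clone, Copy)]
pub struct Coverage {
    /// True when some chapter or subsection matches the feature area.
    pub has_chapter_or_subsection: bool,
    /// Sub-capabilities of the feature with no matching documentation.
    pub missing_capabilities: usize,
    /// Edge cases or less common uses that are not documented.
    pub undocumented_edge_cases: usize,
}

/// Returns `None` when the feature area is fully covered.
pub fn classify_gap(c: Coverage) -> Option<GapSeverity> {
    if !c.has_chapter_or_subsection {
        // No chapter and no subsection at all: missing chapter/subsection.
        Some(GapSeverity::High)
    } else if c.missing_capabilities > 0 {
        // A chapter exists but specific sub-capabilities are missing.
        Some(GapSeverity::Medium)
    } else if c.undocumented_edge_cases > 0 {
        // Only minor gaps remain; logged as a warning.
        Some(GapSeverity::Low)
    } else {
        None
    }
}
```

Checking the conditions in this order means a feature area with no chapter is always reported as high severity, even if it also has undocumented edge cases.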
Content Sufficiency Validation (Step 0)¶
CRITICAL SAFEGUARD: Before creating any subsection, gap detection validates that sufficient material exists in the codebase to support meaningful documentation.
Source: .claude/commands/prodigy-detect-documentation-gaps.md:166-335
Preservation of Single-File Chapters¶
Gap detection ALWAYS preserves well-written single-file chapters (.claude/commands/prodigy-detect-documentation-gaps.md:174-209):
Preservation Rules:
- < 1000 lines AND < 10 H2 sections: PRESERVE as single-file
- ≥ 1000 lines OR ≥ 10 H2 sections: Consider subsections for readability
Why: The original flat documentation structure works well for moderate-sized chapters. Subsections should only be created when they genuinely improve navigation.
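The preservation rule is a pure size check, so it can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and treating a line that starts with `## ` outside a fenced code block as an H2 section is an assumption about how sections are counted.

```rust
/// Counts H2 headings, skipping lines inside fenced code blocks.
/// Assumption: an H2 is any line starting with "## " outside a fence.
pub fn count_h2_sections(markdown: &str) -> usize {
    let mut in_fence = false;
    let mut count = 0;
    for line in markdown.lines() {
        if line.trim_start().starts_with("```") {
            // Entering or leaving a fenced code block.
            in_fence = !in_fence;
        } else if !in_fence && line.starts_with("## ") {
            count += 1;
        }
    }
    count
}

/// Applies the rule above: preserve when < 1000 lines AND < 10 H2 sections.
pub fn should_preserve_single_file(markdown: &str) -> bool {
    markdown.lines().count() < 1000 && count_h2_sections(markdown) < 10
}
```

A chapter that fails this check is only a candidate for subsections; the content sufficiency checks below still decide whether each subsection is created.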
Content Availability Validation¶
Step 0a: Discover Codebase Structure (.claude/commands/prodigy-detect-documentation-gaps.md:211-222)
Before counting content, the command discovers where code and examples are located using language-agnostic patterns:
# Discover test locations (parentheses keep -type d applied to every -name alternative)
TEST_DIRS=$(find . -type d \( -name "*test*" -o -name "*spec*" \) | grep -v node_modules | grep -v .git | head -5)
# Discover example/workflow/config locations
EXAMPLE_DIRS=$(find . -type d \( -name "*example*" -o -name "*workflow*" -o -name "*sample*" -o -name "*config*" \) | grep -v node_modules | grep -v .git | head -5)
# Discover primary source locations (works for Rust, Python, JS, TS, Go, Java)
SOURCE_DIRS=$(find . -type f \( -name "*.rs" -o -name "*.py" -o -name "*.js" -o -name "*.ts" -o -name "*.go" -o -name "*.java" \) | sed 's|/[^/]*$||' | sort -u | grep -v node_modules | grep -v .git | head -10)
Step 0b: Count Potential Content Sources (.claude/commands/prodigy-detect-documentation-gaps.md:224-255)
For each proposed subsection, the command counts language-agnostic content sources:
FEATURE_CATEGORY="<feature-category-name>"
# When searching multiple files, rg -c prints "path:count" per file,
# so the awk calls below sum the last ":"-separated field and print 0 when nothing matched.
# Type definitions (struct, class, interface, enum, type)
TYPE_COUNT=$(rg "(struct|class|interface|type|enum).*${FEATURE_CATEGORY}" --hidden --iglob '!.git' --iglob '!node_modules' -c | awk -F: '{s+=$NF} END {print s+0}')
# Function/method definitions
FUNCTION_COUNT=$(rg "(fn|function|def|func|public|private).*${FEATURE_CATEGORY}" --hidden --iglob '!.git' --iglob '!node_modules' -c | awk -F: '{s+=$NF} END {print s+0}')
# Test mentions in discovered test directories
TEST_COUNT=0
for test_dir in $TEST_DIRS; do
  count=$(rg "${FEATURE_CATEGORY}" "$test_dir" --hidden -c 2>/dev/null | awk -F: '{s+=$NF} END {print s+0}')
  TEST_COUNT=$((TEST_COUNT + count))
done
# Example/config file mentions in discovered example directories
EXAMPLE_COUNT=0
for example_dir in $EXAMPLE_DIRS; do
  count=$(rg "${FEATURE_CATEGORY}" "$example_dir" --hidden -c 2>/dev/null | awk -F: '{s+=$NF} END {print s+0}')
  EXAMPLE_COUNT=$((EXAMPLE_COUNT + count))
done
# Calculate totals
TOTAL_MENTIONS=$((TYPE_COUNT + FUNCTION_COUNT + TEST_COUNT + EXAMPLE_COUNT))
# Estimate documentation lines (rule of thumb)
# Each type = ~30 lines docs, each function = ~10 lines, each example = ~40 lines, each test = ~15 lines
ESTIMATED_LINES=$((TYPE_COUNT * 30 + FUNCTION_COUNT * 10 + EXAMPLE_COUNT * 40 + TEST_COUNT * 15))
Content Sufficiency Thresholds¶
MUST HAVE (to create subsection) - (.claude/commands/prodigy-detect-documentation-gaps.md:259-265):
- TOTAL_MENTIONS >= 5 - Feature mentioned in at least 5 places
- ESTIMATED_LINES >= 50 - Can generate at least 50 lines of documentation
- At least ONE of:
- TYPE_COUNT >= 1 (has configuration type/struct/class)
- EXAMPLE_COUNT >= 1 (has real example/config file)
SHOULD HAVE (for quality subsection) - (.claude/commands/prodigy-detect-documentation-gaps.md:266-269):
- TOTAL_MENTIONS >= 10
- ESTIMATED_LINES >= 100
- TYPE_COUNT >= 1 AND EXAMPLE_COUNT >= 1 (both type definition and example)
Decision Tree¶
If TOTAL_MENTIONS < 5 OR ESTIMATED_LINES < 50:
- ✗ DO NOT create subsection
- Alternative: Add as section within parent chapter's index.md
- Log: "⚠ Skipping subsection '${SUBSECTION_TITLE}': only ${TOTAL_MENTIONS} mentions, ${ESTIMATED_LINES} estimated lines"
- Gap Report: Record as "action": "skipped_subsection_creation", "reason": "insufficient_content"
If TOTAL_MENTIONS >= 5 AND ESTIMATED_LINES >= 50 but ESTIMATED_LINES < 100:
- ~ Create subsection with "MINIMAL" flag
- Add metadata: {"content_warning": "minimal", "estimated_lines": ESTIMATED_LINES}
- Signals to fix phase that limited content is expected
If TOTAL_MENTIONS >= 10 AND ESTIMATED_LINES >= 100:
- ✓ Proceed with full subsection creation
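The thresholds and decision tree above can be combined into one function. This is a sketch under stated assumptions, not the command's real logic: `ContentCounts`, `SubsectionDecision`, and `decide` are hypothetical names. Two choices are also assumptions: the "at least one type or example" requirement from the MUST HAVE list is enforced as part of the skip check, and any case that passes the minimum but does not reach both full-quality thresholds is treated as minimal.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SubsectionDecision {
    /// Do not create the subsection; fold content into the parent index.md.
    Skip,
    /// Create the subsection with a "minimal" content warning.
    CreateMinimal,
    /// Create the subsection normally.
    CreateFull,
}

/// Counts gathered in Step 0b for one proposed subsection.
#[derive(Debug, Clone, Copy, Default)]
pub struct ContentCounts {
    pub type_count: u32,
    pub function_count: u32,
    pub test_count: u32,
    pub example_count: u32,
}

impl ContentCounts {
    pub fn total_mentions(&self) -> u32 {
        self.type_count + self.function_count + self.test_count + self.example_count
    }

    /// Rule of thumb from Step 0b: 30 lines per type, 10 per function,
    /// 40 per example, 15 per test.
    pub fn estimated_lines(&self) -> u32 {
        self.type_count * 30
            + self.function_count * 10
            + self.example_count * 40
            + self.test_count * 15
    }
}

pub fn decide(c: &ContentCounts) -> SubsectionDecision {
    let mentions = c.total_mentions();
    let lines = c.estimated_lines();
    // MUST HAVE: at least one type definition or one real example.
    let has_type_or_example = c.type_count >= 1 || c.example_count >= 1;

    if mentions < 5 || lines < 50 || !has_type_or_example {
        SubsectionDecision::Skip
    } else if mentions >= 10 && lines >= 100 {
        SubsectionDecision::CreateFull
    } else {
        // Assumption: everything between the two thresholds is "minimal".
        SubsectionDecision::CreateMinimal
    }
}
```

For example, one type, two functions, one test, and one example give 5 mentions and 105 estimated lines; under the assumptions above that yields `CreateMinimal`, because the mention count is still below 10.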
Special Case: Meta-Subsections¶
Meta-subsections like "Best Practices", "Troubleshooting", and "Examples" use different validation criteria (.claude/commands/prodigy-detect-documentation-gaps.md:306-334):
Best Practices Subsection:
BEST_PRACTICE_COUNT=$(rg "best.practice|pattern|guideline" --hidden --iglob '!.git' --iglob '!node_modules' -i -c | awk -F: '{s+=$NF} END {print s+0}')
# Requirement: BEST_PRACTICE_COUNT >= 3 OR documented patterns in code
Troubleshooting Subsection:
ERROR_COUNT=$(rg "error|warn|fail" --hidden --iglob '!.git' --iglob '!node_modules' -c | awk -F: '{s+=$NF} END {print s+0}')
ISSUE_COUNT=$(rg "TODO|FIXME|XXX" --hidden --iglob '!.git' --iglob '!node_modules' -c | awk -F: '{s+=$NF} END {print s+0}')
# Requirement: ERROR_COUNT >= 10 OR ISSUE_COUNT >= 5
Examples Subsection:
EXAMPLE_FILE_COUNT=0
for example_dir in $EXAMPLE_DIRS; do
count=$(find "$example_dir" -type f \( -name "*.yml" -o -name "*.yaml" -o -name "*.json" -o -name "*.toml" \) 2>/dev/null | wc -l)
EXAMPLE_FILE_COUNT=$((EXAMPLE_FILE_COUNT + count))
done
# Requirement: EXAMPLE_FILE_COUNT >= 2 real config files
If threshold not met: Add brief section to parent chapter's index.md instead of creating separate subsection.
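The three meta-subsection requirements reduce to a small lookup. The sketch below uses hypothetical names (`MetaSubsection`, `MetaCounts`, `meets_meta_threshold`) and covers only the numeric thresholds; the "OR documented patterns in code" alternative for Best Practices is not modeled because the section does not say how it is detected.

```rust
#[derive(Debug, Clone, Copy)]
pub enum MetaSubsection {
    BestPractices,
    Troubleshooting,
    Examples,
}

/// Counts produced by the shell snippets above.
#[derive(Debug, Clone, Copy, Default)]
pub struct MetaCounts {
    pub best_practice_count: u32,
    pub error_count: u32,
    pub issue_count: u32,
    pub example_file_count: u32,
}

/// True when a separate subsection is justified; otherwise the content
/// belongs in a brief section of the parent chapter's index.md.
pub fn meets_meta_threshold(kind: MetaSubsection, c: &MetaCounts) -> bool {
    match kind {
        MetaSubsection::BestPractices => c.best_practice_count >= 3,
        MetaSubsection::Troubleshooting => c.error_count >= 10 || c.issue_count >= 5,
        MetaSubsection::Examples => c.example_file_count >= 2,
    }
}
```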
Structure Validation (Phase 7.5)¶
MANDATORY: Ensures chapters.json accurately reflects the actual file structure before generating flattened-items.json.
Source: .claude/commands/prodigy-detect-documentation-gaps.md:678-743
Validation Process¶
Step 1: Scan for Multi-Subsection Directories
Find all directories under book/src/ with an index.md file and count .md subsection files:
for dir in $(find "${BOOK_DIR}/src/" -maxdepth 1 -type d); do
if [ -f "${dir}/index.md" ]; then
SUBSECTION_COUNT=$(find "${dir}" -maxdepth 1 -name "*.md" ! -name "index.md" | wc -l)
if [ "$SUBSECTION_COUNT" -gt 0 ]; then
# This is a multi-subsection chapter
CHAPTER_ID=$(basename "$dir")
echo "Found multi-subsection chapter: $CHAPTER_ID"
fi
fi
done
Step 2: Compare Against chapters.json
For each discovered multi-subsection chapter:
1. Look up definition in chapters.json
2. Check if type field is "multi-subsection" or "single-file"
3. If type is "single-file" or missing: MISMATCH - add to mismatches list
4. If type is "multi-subsection": Compare subsection counts
- If counts don't match: MISMATCH
Step 3: Check for Orphaned Single-File Definitions
For each chapter with type: "single-file":
1. Check if expected file (book/src/chapter-id.md) exists
2. Check if directory (book/src/chapter-id/) exists instead
3. If file missing but directory exists: MISMATCH
Step 4: Auto-Migrate Mismatched Chapters
For each mismatched chapter:
1. Scan directory to discover all subsection files
2. For each .md file (excluding index.md):
- Extract subsection ID from filename (remove .md)
- Read file and extract title from first H1/H2 heading
- Extract topics from section headings
- Create subsection definition
3. Update chapter in chapters.json:
- Change type to "multi-subsection"
- Change file to index_file (pointing to index.md)
- Add subsections array with all discovered subsections
- Preserve existing topics and validation fields
4. Write updated chapters.json to disk
5. Record migration in gap report
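Steps 2 through 4 can be sketched over in-memory data. Everything named here (`ChapterDef`, `ChapterType`, `find_mismatches`, `migrate`) is hypothetical and simplified: titles, topics, and file paths are left out, and a directory with no definition at all is skipped, which is an assumption since the steps above only describe chapters that have a definition.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ChapterType {
    SingleFile,
    MultiSubsection,
}

/// Simplified view of one chapters.json entry.
#[derive(Debug, Clone)]
pub struct ChapterDef {
    pub id: String,
    /// `None` models a definition whose type field is missing.
    pub chapter_type: Option<ChapterType>,
    pub subsection_ids: Vec<String>,
}

/// `on_disk` maps a chapter directory name to its subsection file stems
/// (index.md excluded), as discovered in Step 1.
pub fn find_mismatches(
    defs: &[ChapterDef],
    on_disk: &BTreeMap<String, Vec<String>>,
) -> Vec<String> {
    let mut mismatches = Vec::new();
    for (chapter_id, files) in on_disk {
        if files.is_empty() {
            continue; // no subsection files: not a multi-subsection chapter
        }
        // Assumption: directories without any definition are skipped.
        if let Some(def) = defs.iter().find(|d| &d.id == chapter_id) {
            let type_ok = def.chapter_type == Some(ChapterType::MultiSubsection);
            let count_ok = def.subsection_ids.len() == files.len();
            if !type_ok || !count_ok {
                mismatches.push(chapter_id.clone());
            }
        }
    }
    mismatches
}

/// Step 4, reduced to the fields modeled here.
pub fn migrate(def: &mut ChapterDef, files: &[String]) {
    def.chapter_type = Some(ChapterType::MultiSubsection);
    def.subsection_ids = files.to_vec();
}
```

Running `find_mismatches` again after `migrate` should return an empty list for the migrated chapter, which gives a simple check that a migration brought the definition in line with the directory.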
Example Migration¶
Before (chapters.json - incorrect):
{
"id": "mapreduce",
"title": "MapReduce Workflows",
"file": "mapreduce.md",
"type": "single-file",
"topics": ["Map phase", "Reduce phase"]
}
Actual File Structure (reality):
book/src/mapreduce/
├── index.md
├── checkpoint-and-resume.md
├── performance-tuning.md
└── worktree-isolation.md
After Migration (chapters.json - corrected):
{
"id": "mapreduce",
"title": "MapReduce Workflows",
"index_file": "mapreduce/index.md",
"type": "multi-subsection",
"topics": ["Map phase", "Reduce phase"],
"subsections": [
{
"id": "checkpoint-and-resume",
"title": "Checkpoint and Resume",
"file": "mapreduce/checkpoint-and-resume.md"
},
{
"id": "performance-tuning",
"title": "Performance Tuning",
"file": "mapreduce/performance-tuning.md"
},
{
"id": "worktree-isolation",
"title": "Worktree Isolation",
"file": "mapreduce/worktree-isolation.md"
}
]
}
Commit: Structure fixes are committed BEFORE generating flattened-items.json with message: "docs: sync chapters.json with actual file structure"
Flattened Items Generation (Phase 8)¶
CRITICAL: This file MUST be generated regardless of whether gaps are found. The map phase depends on it.
Source: .claude/commands/prodigy-detect-documentation-gaps.md:744-827
Purpose¶
Creates a flat array of all chapters and subsections for parallel processing in the map phase. This enables each map agent to work on a single chapter or subsection independently.
Processing Logic¶
For each chapter in chapters.json:
  If type == "multi-subsection":
    For each subsection in chapter.subsections:
      Create item with parent metadata
      Add to flattened array
  If type == "single-file":
    Create item with type marker
    Add to flattened array
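The processing logic above could be sketched as follows. The types are hypothetical stand-ins for the chapters.json and flattened-items.json shapes; topics, validation, and feature_mapping are omitted for brevity, and file paths are passed through unchanged.

```rust
/// Hypothetical model of one subsection entry.
#[derive(Debug, Clone)]
pub struct Subsection {
    pub id: String,
    pub title: String,
    pub file: String,
}

/// Hypothetical model of one chapter entry.
#[derive(Debug, Clone)]
pub enum Chapter {
    SingleFile { id: String, title: String, file: String },
    MultiSubsection { id: String, title: String, subsections: Vec<Subsection> },
}

/// One entry of the flat list handed to the map phase.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FlatItem {
    pub id: String,
    pub title: String,
    pub file: String,
    /// "single-file" or "subsection".
    pub item_type: &'static str,
    pub parent_chapter_id: Option<String>,
    pub parent_chapter_title: Option<String>,
}

pub fn flatten(chapters: &[Chapter]) -> Vec<FlatItem> {
    let mut items = Vec::new();
    for chapter in chapters {
        match chapter {
            // Single-file chapters become one item with a type marker.
            Chapter::SingleFile { id, title, file } => items.push(FlatItem {
                id: id.clone(),
                title: title.clone(),
                file: file.clone(),
                item_type: "single-file",
                parent_chapter_id: None,
                parent_chapter_title: None,
            }),
            // Each subsection becomes its own item carrying parent metadata.
            Chapter::MultiSubsection { id, title, subsections } => {
                for sub in subsections {
                    items.push(FlatItem {
                        id: sub.id.clone(),
                        title: sub.title.clone(),
                        file: sub.file.clone(),
                        item_type: "subsection",
                        parent_chapter_id: Some(id.clone()),
                        parent_chapter_title: Some(title.clone()),
                    });
                }
            }
        }
    }
    items
}
```

Note that in this sketch a multi-subsection chapter contributes only its subsections, matching the pseudocode; the chapter itself does not appear as a separate item.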
Output Structure¶
File: .prodigy/book-analysis/flattened-items.json
Example:
[
{
"id": "workflow-basics",
"title": "Workflow Basics",
"file": "book/src/workflow-basics.md",
"topics": [
"Setup phase",
"Command types",
"Variable interpolation"
],
"validation": "Check that workflow syntax and variable documentation are complete",
"type": "single-file"
},
{
"id": "checkpoint-and-resume",
"title": "Checkpoint and Resume",
"file": "book/src/mapreduce/checkpoint-and-resume.md",
"parent_chapter_id": "mapreduce",
"parent_chapter_title": "MapReduce Workflows",
"type": "subsection",
"topics": [
"Checkpoint creation",
"Resume behavior",
"State preservation"
],
"validation": "Check that checkpoint mechanism and resume procedures are documented",
"feature_mapping": [
"mapreduce.checkpoint",
"mapreduce.resume"
]
},
{
"id": "performance-tuning",
"title": "Performance Tuning",
"file": "book/src/mapreduce/performance-tuning.md",
"parent_chapter_id": "mapreduce",
"parent_chapter_title": "MapReduce Workflows",
"type": "subsection",
"topics": [
"Parallel execution",
"Resource limits"
],
"feature_mapping": [
"mapreduce.performance",
"mapreduce.resource_limits"
]
}
]
Map Phase Integration¶
The map phase consumes flattened-items.json (workflows/book-docs-drift.yml:36-48):
map:
  input: "${ANALYSIS_DIR}/flattened-items.json"
  json_path: "$[*]" # Each item is a chapter or subsection
  agent_template:
    # Analyze drift for this specific chapter/subsection
    - claude: "/prodigy-analyze-subsection-drift --project $PROJECT_NAME --json '${item}' --features $FEATURES_PATH"
    # Fix drift for this specific chapter/subsection
    - claude: "/prodigy-fix-subsection-drift --project $PROJECT_NAME --json '${item}'"
Why Required: Without flattened-items.json, the map phase cannot parallelize drift analysis and fixing across chapters/subsections.
Topic Normalization¶
Gap detection uses normalization logic to accurately match feature categories against documented topics (.claude/commands/prodigy-detect-documentation-gaps.md:42-50):
Normalization Steps¶
- Convert to lowercase
- Remove punctuation and special characters
- Trim whitespace
- Extract key terms from compound names
Examples¶
"MapReduce Workflows" → ["mapreduce", "workflows"]
"agent_merge" → "agent-merge"
"command-types" → "command-types"
"Validation Operations" → ["validation", "operations"]
Matching Logic¶
For each feature area in features.json, the command checks if any of these match:
1. Chapter ID contains normalized_category
2. normalized_category contains Chapter ID
3. Chapter title contains normalized_category
4. Chapter topics contain normalized_category
5. Section headings in markdown match normalized_category
6. Subsection feature_mapping arrays match
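A minimal sketch of the normalization and of matching criteria 1 through 4 follows. The function names are hypothetical, and the exact rules are an assumption inferred from the examples above: every run of non-alphanumeric characters is treated as a term separator, and terms are joined with hyphens. Criteria 5 and 6 (markdown headings and feature_mapping arrays) are not modeled.

```rust
/// Lowercases the name and splits it into key terms on any
/// non-alphanumeric character (spaces, underscores, hyphens, punctuation).
pub fn key_terms(name: &str) -> Vec<String> {
    name.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|term| !term.is_empty())
        .map(String::from)
        .collect()
}

/// Joins the key terms with hyphens, so "agent_merge" and "agent-merge"
/// normalize to the same string.
pub fn normalize(name: &str) -> String {
    key_terms(name).join("-")
}

/// Criteria 1-4 from the list above, applied to one chapter.
pub fn matches_chapter(
    category: &str,
    chapter_id: &str,
    chapter_title: &str,
    topics: &[&str],
) -> bool {
    let cat = normalize(category);
    if cat.is_empty() {
        return false; // an empty category would match everything
    }
    let id = normalize(chapter_id);
    let title = normalize(chapter_title);

    (!id.is_empty() && (id.contains(&cat) || cat.contains(&id)))
        || title.contains(&cat)
        || topics.iter().any(|topic| normalize(topic).contains(&cat))
}
```

Because criteria 1 and 2 are substring checks in both directions, short chapter IDs can match broadly; that is consistent with the goal of avoiding false-positive gaps, but it is worth keeping in mind when choosing chapter IDs.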
Test Case (tests/documentation_gap_detection_test.rs:275-320):
#[test]
fn test_gap_detection_normalizes_topic_names() -> Result<()> {
// Features with underscores
let features = vec![
MockFeature {
category: "command_types".to_string(),
// ...
},
];
// Chapters with normalized names (hyphens)
let chapters = vec![
MockChapter {
id: "command-types".to_string(), // Hyphen vs underscore
// ...
},
];
let gaps = detect_gaps(&features, &chapters);
// Result: No gaps because normalization matches them
assert_eq!(gaps.len(), 0, "Normalization should match underscore and hyphen variations");
Ok(())
}
Idempotence¶
Gap detection can be run multiple times safely without creating duplicate chapters or subsections (.claude/commands/prodigy-detect-documentation-gaps.md:867-887).
Idempotence Guarantees¶
- Checks for existing chapters before creating
- Uses normalized comparison for matching
- Skips already-created chapters
- Can run repeatedly without creating duplicate definitions or stub files
Test Case¶
Source: tests/documentation_gap_detection_test.rs:236-274
#[test]
fn test_gap_detection_idempotence() -> Result<()> {
let features = vec![MockFeature {
category: "new_feature".to_string(),
description: "A new feature".to_string(),
capabilities: vec!["capability1".to_string()],
}];
// First run with no chapters
let gaps_first = detect_gaps(&features, &vec![]);
assert_eq!(gaps_first.len(), 1, "First run detects 1 gap");
// Simulate creating the chapter
let updated_chapters = vec![MockChapter {
id: "new-feature".to_string(),
title: "New Feature".to_string(),
file: "new-feature.md".to_string(),
topics: vec!["New feature overview".to_string()],
}];
// Second run with the new chapter
let gaps_second = detect_gaps(&features, &updated_chapters);
assert_eq!(gaps_second.len(), 0, "Second run detects no gaps");
Ok(())
}
Gap Report Structure¶
Output: .prodigy/book-analysis/gap-report.json
Example Report¶
{
"analysis_date": "2025-11-09T12:34:56Z",
"features_analyzed": 12,
"documented_topics": 10,
"gaps_found": 2,
"gaps": [
{
"severity": "high",
"type": "missing_chapter",
"feature_category": "agent_merge",
"feature_description": "Custom merge workflows for map agents",
"recommended_chapter_id": "agent-merge-workflows",
"recommended_title": "Agent Merge Workflows",
"recommended_location": "book/src/agent-merge-workflows.md",
"is_subsection": false
},
{
"severity": "high",
"type": "missing_chapter",
"feature_category": "circuit_breaker",
"feature_description": "Circuit breaker for error handling",
"recommended_chapter_id": "circuit-breaker",
"recommended_title": "Circuit Breaker",
"recommended_location": "book/src/circuit-breaker.md",
"is_subsection": false
}
],
"actions_taken": [
{
"action": "created_chapter_definition",
"chapter_id": "agent-merge-workflows",
"file_path": "workflows/data/prodigy-chapters.json"
},
{
"action": "created_stub_file",
"file_path": "book/src/agent-merge-workflows.md",
"type": "chapter"
},
{
"action": "updated_summary",
"file_path": "book/src/SUMMARY.md",
"items_added": [
{"type": "chapter", "id": "agent-merge-workflows"}
]
}
],
"structure_validation": {
"mismatches_found": 1,
"mismatched_chapters": ["mapreduce"],
"migrations_performed": [
{
"chapter_id": "mapreduce",
"action": "migrated_to_multi_subsection",
"subsections_discovered": 3
}
],
"validation_timestamp": "2025-11-09T12:34:56Z"
}
}
Execution Progress¶
When gap detection runs, it displays progress through multiple phases:
🔍 Analyzing documentation coverage...
✓ Loaded 12 feature areas from features.json
✓ Loaded 10 existing chapters
✓ Parsed SUMMARY.md structure
📊 Comparing features against documentation...
✓ Analyzed workflow_basics: documented ✓
✓ Analyzed mapreduce: documented ✓
⚠ Analyzed agent_merge: not documented (gap detected)
✓ Analyzed command_types: documented ✓
⚠ Analyzed circuit_breaker: not documented (gap detected)
🔍 Validating chapter structure (Phase 7.5)...
✓ Scanning for multi-subsection directories
✓ Comparing against chapters.json definitions
⚠ Found mismatch in mapreduce chapter (was single-file, now multi-subsection)
✓ Auto-migrated mapreduce chapter structure
📝 Creating missing chapters...
✓ Generated definition: agent-merge-workflows
✓ Created stub: book/src/agent-merge-workflows.md
✓ Generated definition: circuit-breaker
✓ Created stub: book/src/circuit-breaker.md
✓ Updated SUMMARY.md
💾 Generating flattened items for map phase...
✓ Processed 1 single-file chapter (workflow-basics)
✓ Processed 3 subsections from mapreduce chapter
✓ Processed 10 additional chapters/subsections
✓ Generated .prodigy/book-analysis/flattened-items.json
💾 Committing changes...
✓ Staged 6 files
✓ Committed: docs: auto-discover missing chapters for agent-merge-workflows, circuit-breaker
✓ Committed: docs: sync chapters.json with actual file structure
Final Summary¶
📊 Documentation Gap Analysis Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Features Analyzed: 12
Documented Topics: 10
Gaps Found: 2
🔴 High Severity Gaps (Missing Chapters): 2
• agent_merge - Custom merge workflows for map agents
• circuit_breaker - Circuit breaker for error handling
✅ Actions Taken:
✓ Created 2 chapter definitions in workflows/data/prodigy-chapters.json
✓ Created 2 stub files in book/src/
✓ Updated book/src/SUMMARY.md
✓ Generated flattened-items.json with 14 items
✓ Auto-migrated 1 chapter structure
✓ Committed changes (2 commits)
📝 Next Steps:
The map phase will now process 14 chapters/subsections to populate content.
Review the generated stubs and customize as needed.
Error Handling¶
Source: .claude/commands/prodigy-detect-documentation-gaps.md:889-919
Common Errors¶
Missing features.json:
- Cause: Feature analysis step hasn't run yet
- Solution: Ensure /prodigy-analyze-features-for-book runs before gap detection in setup phase
- Error Message: "Error: features.json not found at {path}. Run feature analysis first."
Missing/Invalid chapters.json:
- Cause: Chapter definitions file doesn't exist or has invalid JSON
- Solution: Create valid chapters.json or fix JSON syntax errors
- Recovery: Gap detection can initialize empty chapters.json if needed
File Write Failures:
- Cause: Permission issues or disk full
- Solution: Check directory permissions and disk space
- Rollback: Gap detection records partial state in gap report for manual cleanup
Invalid JSON Handling:
- Cause: Malformed JSON in input files
- Solution: Validate JSON with jq before running workflow
- Error Recording: Details added to gap report for debugging
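As an illustration of the first case, the sketch below maps a missing features.json to the error message quoted above. `load_features` is a hypothetical helper, and the wording of the fallback message for other I/O errors is invented for this example; only the "not found" message comes from this section.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads features.json, turning a missing file into the documented message.
pub fn load_features(path: &Path) -> Result<String, String> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Err(format!(
            "Error: features.json not found at {}. Run feature analysis first.",
            path.display()
        )),
        // Hypothetical wording for any other read failure (permissions, etc.).
        Err(e) => Err(format!("Error: could not read {}: {}", path.display(), e)),
    }
}
```

Distinguishing the not-found case from other I/O errors keeps the suggested fix (run feature analysis first) from being shown when the real problem is, for example, a permissions issue.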
Testing¶
Gap detection has comprehensive test coverage in tests/documentation_gap_detection_test.rs:1-678:
Test Coverage¶
Core Functionality:
- Identifying missing chapters (tests/documentation_gap_detection_test.rs:1-50)
- Idempotence behavior (tests/documentation_gap_detection_test.rs:236-274)
- Topic normalization logic (tests/documentation_gap_detection_test.rs:275-320)
- Chapter definition generation (tests/documentation_gap_detection_test.rs:321-370)
Edge Cases:
- False positive prevention via normalization
- Handling chapters with multiple topics
- Subsection discovery and validation
- Structure migration for multi-subsection chapters
Quality Assurance:
- Stub file structure validation
- SUMMARY.md update correctness
- Gap report JSON schema validation