Interpreting Results
Understanding Output Formats
Debtmap provides three output formats:

- Terminal (default): human-readable, with colors and tables

```bash
debtmap analyze .
```

- JSON: machine-readable, for CI/CD integration

```bash
debtmap analyze . --format json --output report.json
```

- Markdown: documentation-friendly

```bash
debtmap analyze . --format markdown --output report.md
```
JSON Structure
Debtmap uses a unified JSON format (spec 108) that provides consistent structure for all debt items. The output is generated by converting the internal UnifiedAnalysis to UnifiedOutput format (src/output/unified.rs).
Top-level JSON structure:
```json
{
  "format_version": "1.0",
  "metadata": {
    "debtmap_version": "0.2.0",
    "generated_at": "2025-12-04T12:00:00Z",
    "project_root": "/path/to/project",
    "analysis_type": "full"
  },
  "summary": {
    "total_items": 45,
    "total_debt_score": 2847.3,
    "debt_density": 113.89,
    "total_loc": 25000,
    "by_type": {
      "File": 3,
      "Function": 42
    },
    "by_category": {
      "Architecture": 5,
      "Testing": 23,
      "Performance": 7,
      "CodeQuality": 10
    },
    "score_distribution": {
      "critical": 2,
      "high": 8,
      "medium": 18,
      "low": 17
    }
  },
  "items": [ /* array of UnifiedDebtItemOutput */ ]
}
```
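If you consume this JSON from Rust, the summary deserializes with a few serde structs. This is a minimal consumer-side sketch, assuming the field names shown above; the struct names and the serde/serde_json dependencies are illustrative, not Debtmap's internal types:

```rust
// Hypothetical consumer-side structs mirroring the JSON above; names and
// dependencies (serde, serde_json) are assumptions, not Debtmap's own types.
use serde::Deserialize;
use std::collections::HashMap;

#[derive(Debug, Deserialize)]
struct Report {
    format_version: String,
    summary: Summary,
}

#[derive(Debug, Deserialize)]
struct Summary {
    total_items: usize,
    total_debt_score: f64,
    debt_density: f64,
    total_loc: u64,
    by_category: HashMap<String, usize>,
    score_distribution: HashMap<String, usize>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse a previously generated report (serde ignores unknown fields).
    let report: Report =
        serde_json::from_str(&std::fs::read_to_string("report.json")?)?;
    println!(
        "v{}: {} items, density {:.1}",
        report.format_version, report.summary.total_items, report.summary.debt_density
    );
    Ok(())
}
```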
Individual debt item structure (function-level):
```json
{
  "type": "Function",
  "score": 87.3,
  "category": "Testing",
  "priority": "high",
  "location": {
    "file": "src/main.rs",
    "line": 42,
    "function": "process_data"
  },
  "metrics": {
    "cyclomatic_complexity": 15,
    "cognitive_complexity": 22,
    "length": 68,
    "nesting_depth": 4,
    "coverage": 0.0,
    "uncovered_lines": [42, 43, 44, 45, 46],
    "entropy_score": 0.65
  },
  "debt_type": {
    "ComplexityHotspot": {
      "cyclomatic": 15,
      "cognitive": 22,
      "adjusted_cyclomatic": null
    }
  },
  "function_role": "BusinessLogic",
  "purity_analysis": {
    "is_pure": false,
    "confidence": 0.8,
    "side_effects": ["file_io", "network"]
  },
  "dependencies": {
    "upstream_count": 2,
    "downstream_count": 3,
    "upstream_callers": ["main", "process_request"],
    "downstream_callees": ["validate", "save", "notify"]
  },
  "recommendation": {
    "action": "Add test coverage for complex function",
    "priority": "High",
    "implementation_steps": [
      "Write unit tests covering all 15 branches",
      "Consider extracting validation logic to reduce complexity"
    ]
  },
  "impact": {
    "coverage_improvement": 0.85,
    "complexity_reduction": 12.0,
    "risk_reduction": 3.7
  },
  "adjusted_complexity": {
    "dampened_cyclomatic": 12.5,
    "dampening_factor": 0.83
  },
  "complexity_pattern": "validation",
  "pattern_type": "state_machine",
  "pattern_confidence": 0.72
}
```
Source: JSON structure defined in src/output/unified.rs:18-24 (UnifiedOutput), lines 66-71 (UnifiedDebtItemOutput enum), lines 156-183 (FunctionDebtItemOutput)
Note: The "type" field uses a tagged enum format where the value is either "Function" or "File". All items have consistent top-level fields (score, category, priority, location) regardless of type, simplifying filtering and sorting across mixed results.
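The tagged format maps naturally onto serde's internal tagging. A hedged consumer-side sketch modeling only the shared fields:

```rust
// Illustrative consumer types for the tagged "type" field; only the shared
// top-level fields are modeled, and serde ignores the rest by default.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum DebtItem {
    Function(ItemCommon),
    File(ItemCommon),
}

#[derive(Debug, Deserialize)]
struct ItemCommon {
    score: f64,
    category: String,
    priority: String,
}

// Because both variants share these fields, mixed results sort uniformly.
fn score(item: &DebtItem) -> f64 {
    match item {
        DebtItem::Function(c) | DebtItem::File(c) => c.score,
    }
}
```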
Reading Function Metrics
Key fields in the metrics object:

- `cyclomatic_complexity`: decision points; guides test case count (src/output/unified.rs:195)
- `cognitive_complexity`: understanding difficulty; guides refactoring priority
- `length`: lines of code; signals SRP violations
- `nesting_depth`: indentation depth; signals a need for extraction
- `coverage`: test coverage as a fraction (0.0-1.0; optional if no coverage data)
- `uncovered_lines`: array of line numbers not covered by tests (optional)
- `entropy_score`: pattern-analysis score used for false-positive reduction (0.0-1.0; optional)
Fields at item level:
- `function_role`: function importance ("EntryPoint", "BusinessLogic", "Utility", "TestHelper"); affects the score multiplier (src/priority/mod.rs:231)
- `purity_analysis`: whether the function has side effects (optional); affects the testability assessment
  - `is_pure`: boolean indicating no side effects
  - `confidence`: how certain the purity verdict is (0.0-1.0)
  - `side_effects`: array of detected side-effect types (e.g., "file_io", "network", "mutation")
- `dependencies`: call-graph relationships
  - `upstream_count`: number of functions that call this one (impact radius)
  - `downstream_count`: number of functions this one calls (complexity)
  - `upstream_callers`: names of calling functions (optional; included if the count > 0)
  - `downstream_callees`: names of called functions (optional; included if the count > 0)
Complexity adjustment fields:
- `adjusted_complexity`: entropy-based dampening applied (optional; included when the dampening factor < 1.0)
  - `dampened_cyclomatic`: reduced complexity after pattern analysis
  - `dampening_factor`: multiplier applied (e.g., 0.83 = 17% reduction)
- `complexity_pattern`: detected pattern name (e.g., "validation", "dispatch", "error_handling")
- `pattern_type`: pattern category (e.g., "state_machine", "coordinator"); a high-level classification
- `pattern_confidence`: confidence in the pattern detection (0.0-1.0; shown if ≥ 0.5)
Pattern detection and complexity adjustment:
Debtmap detects common code patterns that appear complex but are actually maintainable (src/complexity/entropy_analysis.rs). When detected with high confidence (≥ 0.5), complexity metrics are dampened:
Validation patterns - Repetitive input checking:
```rust
// Appears complex (cyclomatic = 15) but is repetitive
if field1.is_empty() { return Err(...) }
if field2.is_empty() { return Err(...) }
// ... repeated for many fields
```
Dampening: ~20-40% reduction if similarity is high
State machine patterns - Structured state transitions:
```rust
match state {
    State::Init => { /* transition logic */ }
    State::Running => { /* transition logic */ }
    // ... many similar cases
}
```
Dampening: ~30-50% reduction for consistent transition logic
Error handling patterns - Systematic error checking:
```rust
let x = operation1().map_err(|e| /* wrap error */)?;
let y = operation2().map_err(|e| /* wrap error */)?;
// ... repeated error wrapping
```
Dampening: ~15-30% reduction for consistent error propagation
When to trust adjusted_complexity:
- `pattern_confidence` ≥ 0.7: high confidence; use `dampened_cyclomatic` for priority decisions
- `pattern_confidence` 0.5-0.7: moderate confidence; consider both the original and dampened values
- `pattern_confidence` < 0.5 or missing: use the original `cyclomatic_complexity`
Entropy score interpretation:
- `entropy_score` ≥ 0.7: high entropy, genuinely complex code; prioritize refactoring
- `entropy_score` 0.4-0.7: moderate entropy, some repetition; review manually
- `entropy_score` < 0.4: low entropy, a highly repetitive pattern; likely a false positive if flagged as complex
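These rules can be folded into a small helper when post-processing results. A sketch, where averaging the two values is one possible reading of the moderate-confidence guidance:

```rust
// Illustrative triage helper applying the thresholds above; not a Debtmap API.
fn effective_cyclomatic(
    cyclomatic: u32,
    dampened: Option<f64>,
    pattern_confidence: Option<f64>,
) -> f64 {
    match (dampened, pattern_confidence) {
        // High-confidence pattern: trust the dampened value.
        (Some(d), Some(c)) if c >= 0.7 => d,
        // Moderate confidence: weigh the raw and dampened values equally.
        (Some(d), Some(c)) if c >= 0.5 => (d + cyclomatic as f64) / 2.0,
        // Low confidence or no dampening data: use the raw metric.
        _ => cyclomatic as f64,
    }
}
```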
Prioritizing Work
Debtmap provides multiple prioritization strategies, with unified scoring (0-10 scale) as the recommended default for most workflows:
1. By Unified Score (default - recommended)
```bash
debtmap analyze . --top 10
```
Shows top 10 items by combined complexity, coverage, and dependency factors, weighted and adjusted by function role.
Why use unified scoring:
- Balances complexity (40%), coverage (40%), and dependency impact (20%)
- Adjusts for function importance (entry points prioritized over utilities)
- Normalized 0-10 scale is intuitive and consistent
- Reduces false positives through coverage propagation
- Best for sprint planning and function-level refactoring decisions
Example:
```bash
# Show top 20 critical items
debtmap analyze . --min-priority 7.0 --top 20

# Focus on high-impact functions (score >= 7.0)
debtmap analyze . --format json | jq '.functions[] | select(.unified_score >= 7.0)'
```
2. By Risk Category (legacy compatibility)
```bash
debtmap analyze . --min-priority high
```
Shows only HIGH and CRITICAL priority items using legacy risk scoring.
Note: Legacy risk scoring uses additive formulas and unbounded scales. Prefer unified scoring for new workflows.
3. By Debt Type
```bash
debtmap analyze . --filter Architecture,Testing
```
Focuses on specific categories:
- Architecture: god objects, complexity, dead code
- Testing: coverage gaps, test quality
- Performance: resource leaks, inefficiencies
- CodeQuality: code smells, maintainability
4. By ROI (with coverage)
```bash
debtmap analyze . --lcov coverage.lcov --top 20
```
Prioritizes by return on investment for testing/refactoring. Combines unified scoring with test effort estimates to identify high-value work.
Choosing the right strategy:
- Sprint planning for developers: use unified scoring (`--top N`)
- Architectural review: use tiered prioritization (`--summary`)
- Category-focused work: use debt-type filtering (`--filter`)
- Testing priorities: use ROI analysis with coverage data (`--lcov`)
- Historical comparisons: use legacy risk scoring (for consistency with old reports)
Tiered Prioritization
Note: Tiered prioritization uses traditional debt scoring (additive, higher = worse) and is complementary to the unified scoring system (0-10 scale). Both systems can be used together:
- Unified scoring (0-10 scale): Best for function-level prioritization and sprint planning
- Tiered prioritization (debt tiers): Best for architectural focus and strategic debt planning
Use --summary for tiered view focusing on architectural issues, or default output for function-level unified scores.
Debtmap uses a tier-based system to map debt scores to actionable priority levels. Each tier includes effort estimates and strategic guidance for efficient debt remediation.
Tier Levels
The Tier enum defines four priority levels based on score thresholds:
```rust
pub enum Tier {
    Critical, // Score ≥ 90
    High,     // Score 70-89.9
    Moderate, // Score 50-69.9
    Low,      // Score < 50
}
```
Score-to-Tier Mapping:
- Critical (≥ 90): Immediate action required - blocks progress
- High (70-89.9): Should be addressed this sprint
- Moderate (50-69.9): Plan for next sprint
- Low (< 50): Background maintenance work
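The mapping these thresholds imply, sketched as a function (the actual implementation lives in src/priority/mod.rs; this is a paraphrase, with the enum repeated for self-containedness):

```rust
// Sketch of the score-to-tier mapping implied by the thresholds above.
pub enum Tier {
    Critical,
    High,
    Moderate,
    Low,
}

fn tier_for(score: f64) -> Tier {
    match score {
        s if s >= 90.0 => Tier::Critical,
        s if s >= 70.0 => Tier::High,
        s if s >= 50.0 => Tier::Moderate,
        _ => Tier::Low,
    }
}
```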
Effort Estimates Per Tier
Each tier includes estimated effort based on typical remediation patterns:
| Tier | Estimated Effort | Typical Work |
|---|---|---|
| Critical | 1-2 days | Major refactoring, comprehensive testing, architectural changes |
| High | 2-4 hours | Extract functions, add test coverage, fix resource leaks |
| Moderate | 1-2 hours | Simplify logic, reduce duplication, improve error handling |
| Low | 30 minutes | Address TODOs, minor cleanup, documentation |
Effort calculation considers:
- Complexity metrics (cyclomatic, cognitive)
- Test coverage gaps
- Number of dependencies (upstream/downstream)
- Debt category (Architecture debt takes longer than CodeQuality)
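For planning scripts, the effort table can be approximated as per-tier hour bands. A heuristic sketch reusing the `Tier` enum shown above, not a Debtmap API:

```rust
// Per-tier effort bands from the table above, as (min, max) hours.
fn effort_hours(tier: &Tier) -> (f64, f64) {
    match tier {
        Tier::Critical => (8.0, 16.0), // 1-2 days
        Tier::High => (2.0, 4.0),
        Tier::Moderate => (1.0, 2.0),
        Tier::Low => (0.5, 0.5), // ~30 minutes
    }
}
```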
Tiered Display Grouping
Note: TieredDisplay is an internal structure (src/priority/mod.rs:515) used for terminal output formatting and is not serialized to JSON. JSON output includes individual items with their priority field (critical, high, medium, low) based on score thresholds (src/output/unified.rs:74-81).
The internal TieredDisplay structure groups similar debt items for batch action recommendations in terminal output:
Grouping strategy:
- Groups items by tier (Critical/High/Moderate/Low) and similarity pattern
- Prevents grouping of god objects (always show individually)
- Prevents grouping of Critical items (each needs individual attention)
- Suggests batch actions for similar Low/Moderate items in terminal output
To view tiered grouping in JSON, use priority filtering:
```bash
# Get all Critical items (priority: "critical")
debtmap analyze . --format json | jq '.items[] | select(.priority == "critical")'

# Get High priority items
debtmap analyze . --format json | jq '.items[] | select(.priority == "high")'

# Count items by priority
debtmap analyze . --format json | jq '.summary.score_distribution'
```
The score_distribution in the summary provides counts for each priority tier.
Using Tiered Prioritization
1. Start with Critical tier:
```bash
debtmap analyze . --min-priority critical
```
Focus on items with score ≥ 90. These typically represent:
- Complex functions with 0% coverage
- God objects blocking feature development
- Critical resource leaks or security issues
2. Plan High tier work:
```bash
debtmap analyze . --min-priority high --format json > sprint-plan.json
```
Schedule 2-4 hours per item for this sprint. Look for:
- Functions approaching complexity thresholds
- Moderate coverage gaps on important code paths
- Performance bottlenecks with clear solutions
3. Batch Moderate tier items:
```bash
debtmap analyze . --min-priority moderate
```
Review batch recommendations. Examples:
- “10 validation functions detected - extract common pattern”
- “5 similar test files with duplication - create shared fixtures”
- “8 functions with magic values - create constants module”
4. Schedule Low tier background work: Address during slack time or as warm-up tasks for new contributors.
Strategic Guidance by Tier
Critical Tier Strategy:
- Block new features until addressed
- Pair programming recommended for complex items
- Architectural review before major refactoring
- Comprehensive testing after changes
High Tier Strategy:
- Sprint planning priority
- Impact analysis before changes
- Code review from senior developers
- Integration testing after changes
Moderate Tier Strategy:
- Batch similar items for efficiency
- Extract patterns across multiple files
- Incremental improvement over multiple PRs
- Regression testing for affected areas
Low Tier Strategy:
- Good first issues for new contributors
- Documentation improvements
- Code cleanup during refactoring nearby code
- Technical debt gardening sessions
Categorized Debt Analysis
Note: CategorizedDebt is an internal analysis structure (src/priority/mod.rs:419) used for query operations and markdown formatting. It is not serialized to JSON output. The JSON output uses by_category in the summary section instead (see JSON Structure above).
Debtmap’s internal CategorizedDebt analysis groups debt items by category and identifies cross-category dependencies. This analysis powers the markdown output and internal query methods but is not directly exposed in JSON format.
CategorySummary
Each category gets a summary with metrics for planning:
```rust
pub struct CategorySummary {
    pub category: DebtCategory,
    pub total_score: f64,
    pub item_count: usize,
    pub estimated_effort_hours: f64,
    pub average_severity: f64,
    pub top_items: Vec<DebtItem>, // Up to 5 highest priority
}
```
Effort estimation formulas:
- Architecture debt: `complexity_score / 10 × 2` hours (structural changes take longer)
- Testing debt: `complexity_score / 10 × 1.5` hours (writing tests)
- Performance debt: `complexity_score / 10 × 1.8` hours (profiling + optimization)
- CodeQuality debt: `complexity_score / 10 × 1.2` hours (refactoring)
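In code, these formulas reduce to a single multiplier per category. A sketch (multipliers from the list above; the function shape is illustrative):

```rust
// The per-category effort formulas as code; not Debtmap's exact internals.
fn estimated_effort_hours(category: &str, complexity_score: f64) -> f64 {
    let multiplier = match category {
        "Architecture" => 2.0, // structural changes take longer
        "Performance" => 1.8,  // profiling + optimization
        "Testing" => 1.5,      // writing tests
        _ => 1.2,              // CodeQuality (refactoring)
    };
    (complexity_score / 10.0) * multiplier
}
```

This matches the Architecture example below: 487.5 / 10 × 2 = 97.5 hours.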
Example category summary:
```json
{
  "category": "Architecture",
  "total_score": 487.5,
  "item_count": 15,
  "estimated_effort_hours": 97.5,
  "average_severity": 32.5,
  "top_items": [
    {
      "debt_type": "GodObject",
      "file": "src/services/user_service.rs",
      "score": 95.0,
      "estimated_effort_hours": 16.0
    },
    {
      "debt_type": "ComplexityHotspot",
      "file": "src/payments/processor.rs",
      "score": 87.3,
      "estimated_effort_hours": 14.0
    }
  ]
}
```
Cross-Category Dependencies
CrossCategoryDependency identifies blocking relationships between different debt categories:
```rust
pub struct CrossCategoryDependency {
    pub from_category: DebtCategory,
    pub to_category: DebtCategory,
    pub blocking_items: Vec<(DebtItem, DebtItem)>,
    pub impact_level: ImpactLevel, // Critical, High, Medium, Low
    pub recommendation: String,
}
```
Common dependency patterns:
1. Architecture blocks Testing:
- Pattern: God objects are too complex to test effectively
- Example: `UserService` has 50+ functions, making comprehensive testing impractical
- Impact: Critical - cannot improve test coverage without refactoring
- Recommendation: “Split god object into 4-5 focused modules before adding tests”
2. Async issues require Architecture changes:
- Pattern: Blocking I/O in async contexts requires architectural redesign
- Example: Sync database calls in async handlers
- Impact: High - performance problems require design changes
- Recommendation: “Introduce async database layer before optimizing handlers”
3. Complexity affects Testability:
- Pattern: High cyclomatic complexity makes thorough testing difficult
- Example: Function with 22 branches needs 22+ test cases
- Impact: High - testing effort grows exponentially with complexity
- Recommendation: “Reduce complexity to < 10 before writing comprehensive tests”
4. Performance requires Architecture:
- Pattern: O(n²) nested loops need different data structures
- Example: Linear search in loops should use HashMap
- Impact: Medium - optimization requires structural changes
- Recommendation: “Refactor data structure before micro-optimizations”
Example cross-category dependency:
```json
{
  "from_category": "Architecture",
  "to_category": "Testing",
  "impact_level": "Critical",
  "blocking_items": [
    {
      "blocker": {
        "debt_type": "GodObject",
        "file": "src/services/user_service.rs",
        "functions": 52,
        "score": 95.0
      },
      "blocked": {
        "debt_type": "TestingGap",
        "file": "src/services/user_service.rs",
        "coverage": 15,
        "score": 78.0
      }
    }
  ],
  "recommendation": "Split UserService into focused modules (auth, profile, settings, notifications) before attempting to improve test coverage. Current structure makes comprehensive testing impractical.",
  "estimated_unblock_effort_hours": 16.0
}
```
Using Category-Based Analysis
View category distribution in JSON:
```bash
debtmap analyze . --format json | jq '.summary.by_category'
```
This shows item counts per category (Architecture, Testing, Performance, CodeQuality).
Filter items by category:
```bash
debtmap analyze . --format json | jq '.items[] | select(.category == "Architecture")'
```
Focus on specific category with CLI:
```bash
debtmap analyze . --filter Architecture --top 10
```
Generate markdown for detailed category analysis:
```bash
debtmap analyze . --format markdown --output report.md
```
The markdown format includes full CategorySummary details with effort estimates and cross-category dependency analysis.
Strategic planning workflow:
1. Review category summaries:
   - Identify which category has the highest total score
   - Check estimated effort hours per category
   - Note average severity to gauge urgency
2. Check cross-category dependencies:
   - Find Critical and High impact blockers
   - Prioritize blockers before blocked items
   - Plan architectural changes before optimization
3. Plan the remediation order. An example decision tree:
   - Architecture score > 400? → Address god objects first
   - Testing gap with low complexity? → Quick wins, add tests
   - Performance issues + architecture debt? → Refactor structure first
   - High code quality debt but good architecture? → Incremental cleanup
4. Use category-specific strategies:
   - Architecture: pair programming, design reviews, incremental refactoring
   - Testing: TDD for new code, characterization tests for legacy
   - Performance: profiling first, optimize hot paths, avoid premature optimization
   - CodeQuality: code review focus, linting rules, consistent patterns
Note on output formats: CategorySummary and CrossCategoryDependency details are available in markdown format only. The JSON output provides category counts in summary.by_category and you can filter items by category using the category field on each item.
Debt Density Metric
Debt density normalizes technical debt scores across projects of different sizes, providing a per-1000-lines-of-code metric for fair comparison.
Formula
```text
debt_density = (total_debt_score / total_lines_of_code) × 1000
```
Example calculation:
```text
Project A:
- Total debt score: 1,250
- Total lines of code: 25,000
- Debt density: (1,250 / 25,000) × 1000 = 50

Project B:
- Total debt score: 2,500
- Total lines of code: 50,000
- Debt density: (2,500 / 50,000) × 1000 = 50
```
Projects A and B have equal debt density (50) despite B having twice the absolute debt, because B is also twice as large. They have proportionally similar technical debt.
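The formula translates directly to code; the zero-LOC guard here is an added safety check, not part of the published formula:

```rust
// Direct transcription of the debt-density formula.
fn debt_density(total_debt_score: f64, total_loc: u64) -> f64 {
    if total_loc == 0 {
        return 0.0; // avoid division by zero on an empty project
    }
    (total_debt_score / total_loc as f64) * 1000.0
}

// debt_density(1250.0, 25_000) == debt_density(2500.0, 50_000) == 50.0
```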
Interpretation Guidelines
Use these thresholds to assess codebase health:
| Debt Density | Assessment | Description |
|---|---|---|
| 0-50 | Clean | Well-maintained codebase, minimal debt |
| 51-100 | Moderate | Typical technical debt, manageable |
| 101-150 | High | Significant debt, prioritize remediation |
| 150+ | Critical | Severe debt burden, may impede development |
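As a classifier, the table reads as below; note the table lists both "101-150" and "150+", so this sketch treats exactly 150 as High:

```rust
// The threshold table above as a simple classifier; boundary handling is
// one reasonable reading of the overlapping 150 entry.
fn assess(density: f64) -> &'static str {
    match density {
        d if d <= 50.0 => "Clean",
        d if d <= 100.0 => "Moderate",
        d if d <= 150.0 => "High",
        _ => "Critical",
    }
}
```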
Context matters:
- Early-stage projects: Often have higher density (rapid iteration)
- Mature projects: Should trend toward lower density over time
- Legacy systems: May have high density, track trend over time
- Greenfield rewrites: Aim for density < 50
Using Debt Density
1. Compare projects fairly:
```text
# Small microservice (5,000 LOC, debt = 250)
# Debt density: 50

# Large monolith (100,000 LOC, debt = 5,000)
# Debt density: 50

# Equal health despite size difference
```
2. Track improvement over time:
```text
Sprint 1:  50,000 LOC, debt = 7,500, density = 150 (High)
Sprint 5:  52,000 LOC, debt = 6,500, density = 125 (Improving)
Sprint 10: 54,000 LOC, debt = 4,860, density = 90  (Moderate)
```
3. Set team goals:
```text
Current density: 120
Target density: < 80 (by Q4)
Reduction needed: 40 points

Strategy:
- Fix 2-3 Critical items per sprint
- Prevent new debt (enforce thresholds)
- Refactor before adding features in high-debt modules
```
4. Benchmark across teams/projects:
```json
{
  "team_metrics": [
    {
      "project": "auth-service",
      "debt_density": 45,
      "assessment": "Clean",
      "trend": "stable"
    },
    {
      "project": "billing-service",
      "debt_density": 95,
      "assessment": "Moderate",
      "trend": "improving"
    },
    {
      "project": "legacy-api",
      "debt_density": 165,
      "assessment": "Critical",
      "trend": "worsening"
    }
  ]
}
```
Limitations
Debt density doesn’t account for:
- Code importance: 100 LOC in payment logic ≠ 100 LOC in logging utils
- Complexity distribution: One 1000-line god object vs. 1000 simple functions
- Test coverage: 50% coverage on critical paths vs. low-priority features
- Team familiarity: New codebase vs. well-understood legacy system
Best practices:
- Use density as one metric among many
- Combine with category analysis and tiered prioritization
- Focus on trend (improving/stable/worsening) over absolute number
- Consider debt per module for more granular insights
Debt Density in CI/CD
Track density over time:
```bash
# Generate report with density
debtmap analyze . --format json --output debt-report.json

# Extract density for trending (debt_density lives under "summary")
DENSITY=$(jq '.summary.debt_density' debt-report.json)

# Store in metrics database
echo "debtmap.density:${DENSITY}|g" | nc -u -w0 statsd 8125
```
Set threshold gates:
```yaml
# .github/workflows/debt-check.yml
- name: Check debt density
  run: |
    DENSITY=$(debtmap analyze . --format json | jq '.summary.debt_density')
    if (( $(echo "$DENSITY > 150" | bc -l) )); then
      echo "❌ Debt density too high: $DENSITY (limit: 150)"
      exit 1
    fi
    echo "✅ Debt density acceptable: $DENSITY"
```
Actionable Insights
Each recommendation includes:
ACTION: What to do
- “Add 6 unit tests for full coverage”
- “Refactor into 3 smaller functions”
- “Extract validation to separate function”
IMPACT: Expected improvement
- “Full test coverage, -3.7 risk”
- “Reduce complexity from 22 to 8”
- “Eliminate 120 lines of duplication”
WHY: Rationale
- “Business logic with 0% coverage, manageable complexity”
- “High complexity with low coverage threatens stability”
- “Repeated validation pattern across 5 files”
Example workflow:
1. Run the analysis with coverage: `debtmap analyze . --lcov coverage.lcov`
2. Filter to CRITICAL items: `--min-priority critical`
3. Review the top 5 recommendations
4. Start with the highest-ROI items
5. Rerun the analysis to track progress
Common Patterns to Recognize
Pattern 1: High Complexity, Well Tested
```text
Complexity: 25, Coverage: 95%, Risk: LOW
```
This is actually good! Complex but thoroughly tested code. Learn from this approach.
Pattern 2: Moderate Complexity, No Tests
```text
Complexity: 12, Coverage: 0%, Risk: CRITICAL
```
Highest priority - manageable complexity, should be easy to test.
Pattern 3: Low Complexity, No Tests
```text
Complexity: 3, Coverage: 0%, Risk: LOW
```
Low priority - simple code, less risky without tests.
Pattern 4: Repetitive High Complexity (Dampened)
```text
Cyclomatic: 20, Effective: 7 (65% dampened), Risk: LOW
```
Validation or dispatch pattern - looks complex but is repetitive. Lower priority.
Pattern 5: God Object
```text
File: services.rs, Functions: 50+, Responsibilities: 15+
```
Architectural issue - split before adding features.
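These five patterns can be approximated as triage rules. A hypothetical sketch; the thresholds are illustrative, not Debtmap's actual scoring logic:

```rust
// Illustrative triage rules for the five patterns above; thresholds are
// assumptions chosen to match the examples, not Debtmap internals.
fn triage(cyclomatic: u32, effective: f64, coverage: f64) -> &'static str {
    if coverage >= 0.9 {
        // Pattern 1: complex but thoroughly tested
        "well tested: low risk, learn from it"
    } else if effective < cyclomatic as f64 * 0.6 {
        // Pattern 4: heavily dampened repetitive code
        "repetitive pattern: lower priority"
    } else if coverage == 0.0 && cyclomatic >= 10 {
        // Pattern 2: untested with manageable complexity
        "untested and complex: highest priority"
    } else if coverage == 0.0 && cyclomatic <= 5 {
        // Pattern 3: untested but trivial
        "untested but simple: low priority"
    } else {
        "review manually"
    }
}
```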