Data Flow Scoring

Data flow scoring enhances Debtmap's technical debt analysis by evaluating three key factors: function purity, refactorability, and code pattern recognition. This subsection explains each factor and how it affects debt prioritization.

Overview

Data flow scoring is an optional scoring layer that adjusts debt priorities based on functional programming principles. Functions that are pure, easily refactorable, or follow recognized patterns receive reduced priority scores, reflecting their lower maintenance burden.

Key principle: Pure functions and data transformation pipelines represent less technical debt than impure functions with side effects, because they’re easier to test, reason about, and refactor.

Source: src/priority/unified_scorer.rs:995-1020 (calculate_unified_priority_with_data_flow)

How Data Flow Scoring Works

Data flow scoring applies three weighted factors to the base debt score:

adjusted_score = base_score * combined_adjustment
combined_adjustment = (purity_factor * purity_weight
                     + refactorability_factor * refactorability_weight
                     + pattern_factor * pattern_weight)
                    / total_weight

Each factor ranges from 0.0 to 1.0, where lower values reduce the final priority score.
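
For intuition, a worked example with the default weights from the Configuration section (0.4 / 0.3 / 0.3): a strictly pure data pipeline (purity 0.0, refactorability 1.0, pattern 0.7) gets a combined adjustment of (0.0 × 0.4 + 1.0 × 0.3 + 0.7 × 0.3) / 1.0 = 0.51, roughly halving its base score.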

Source: src/priority/unified_scorer.rs:1058-1075

Purity Spectrum

The purity spectrum classifies functions into five levels based on their side effects and mutation behavior. Pure functions receive the lowest priority multipliers since they represent minimal technical debt.

Classification Levels

| Level | Multiplier | Description |
|---|---|---|
| StrictlyPure | 0.0 | No mutations, no I/O, referentially transparent |
| LocallyPure | 0.3 | Pure interface but uses local mutations internally |
| IOIsolated | 0.6 | I/O operations clearly separated from logic |
| IOMixed | 0.9 | I/O mixed with business logic |
| Impure | 1.0 | Mutable state, side effects throughout |
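
These multipliers map directly onto the spectrum enum. Below is a minimal sketch; the variant list matches the table, but the multiplier() accessor is an illustration rather than the actual API.

enum PuritySpectrum {
    StrictlyPure,
    LocallyPure,
    IOIsolated,
    IOMixed,
    Impure,
}

impl PuritySpectrum {
    // Hypothetical accessor mapping each level to its scoring multiplier.
    fn multiplier(&self) -> f64 {
        match self {
            PuritySpectrum::StrictlyPure => 0.0,
            PuritySpectrum::LocallyPure => 0.3,
            PuritySpectrum::IOIsolated => 0.6,
            PuritySpectrum::IOMixed => 0.9,
            PuritySpectrum::Impure => 1.0,
        }
    }
}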

Source: src/priority/unified_scorer.rs:64-94 (PuritySpectrum enum)

Classification Algorithm

The purity factor is calculated by analyzing three sources of information from the data flow graph:

  1. Purity Analysis Results: High-confidence purity (>80%) indicates strict or local purity
  2. Mutation Analysis: Tracks whether a function has local mutations
  3. I/O Operations: Identifies I/O patterns for non-pure functions
// Classification logic (simplified)
if purity.is_pure && purity.confidence > 0.8 {
    if mutations.has_mutations {
        PuritySpectrum::LocallyPure  // 0.3 multiplier
    } else {
        PuritySpectrum::StrictlyPure // 0.0 multiplier
    }
} else if purity.is_pure {
    PuritySpectrum::LocallyPure      // 0.3 multiplier
} else {
    classify_io_isolation(io_ops)    // 0.6-1.0 multiplier
}

Source: src/priority/unified_scorer.rs:878-918 (calculate_purity_factor)

I/O Isolation Classification

For impure functions, the system evaluates I/O isolation based on concentration:

  • IOIsolated (0.6): At most 2 unique I/O operation types and 3 total operations
  • IOMixed (0.9): More than 2 unique types or more than 3 operations
  • Impure (1.0): No I/O information available
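
A self-contained sketch of these rules, assuming each I/O operation carries a kind discriminator (the IoKind stand-in below is hypothetical):

use std::collections::HashSet;

// Hypothetical stand-ins; the real IoOperation type lives in the data
// flow module.
#[derive(PartialEq, Eq, Hash)]
enum IoKind {
    FileRead,
    FileWrite,
    Network,
}

struct IoOperation {
    kind: IoKind,
}

// Only the variants this function can return.
enum PuritySpectrum {
    IOIsolated,
    IOMixed,
    Impure,
}

fn classify_io_isolation(io_ops: Option<&[IoOperation]>) -> PuritySpectrum {
    match io_ops {
        // No I/O information available: fully impure (1.0).
        None => PuritySpectrum::Impure,
        Some(ops) => {
            let unique_kinds: HashSet<&IoKind> = ops.iter().map(|op| &op.kind).collect();
            // Concentrated I/O: at most 2 unique kinds and at most 3 operations.
            if unique_kinds.len() <= 2 && ops.len() <= 3 {
                PuritySpectrum::IOIsolated // 0.6
            } else {
                PuritySpectrum::IOMixed // 0.9
            }
        }
    }
}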

Source: src/priority/unified_scorer.rs:921-935 (classify_io_isolation)

Purity Level vs Purity Spectrum

Debtmap uses two related but distinct purity classifications:

| Aspect | PurityLevel | PuritySpectrum |
|---|---|---|
| Purpose | Analysis classification | Scoring multiplier |
| Levels | 4 (StrictlyPure, LocallyPure, ReadOnly, Impure) | 5 (adds IOIsolated, IOMixed) |
| Usage | src/analysis/purity_analysis.rs | src/priority/unified_scorer.rs |
| Focus | Categorizing purity type | Assigning debt priority |

PurityLevel (from purity analysis) describes what kind of function this is. PuritySpectrum (for scoring) determines how much this affects debt priority, with finer granularity for I/O patterns.
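
For reference, a sketch of PurityLevel's variants as listed in the table; the comment on ReadOnly is an inference, since this page does not define that level.

// Variant list from the table above; actual definition in
// src/analysis/purity_analysis.rs.
enum PurityLevel {
    StrictlyPure, // no mutations, no I/O
    LocallyPure,  // pure interface, mutations confined to locals
    ReadOnly,     // presumably reads external state without mutating it
    Impure,       // side effects and/or mutable state
}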

Source: src/analysis/purity_analysis.rs:32-43 (PurityLevel), src/priority/unified_scorer.rs:64-94 (PuritySpectrum)

Pattern Factor

The pattern factor distinguishes data flow pipelines from business logic. Pure data transformation chains (map/filter/reduce patterns) receive reduced priority.

Calculation

// Pattern factor ranges from 0.7 to 1.0
// (counts cast to f64 so the ratio is fractional)
let transform_ratio = transform_count as f64 / dependency_count as f64;

if transform_ratio > 0.5 {
    0.7   // Data flow pipeline - lowest priority
} else if transform_ratio > 0.3 {
    0.85  // Mixed - moderate reduction
} else {
    1.0   // Business logic - no reduction
}

Rationale: Functions with high transformation-to-dependency ratios are likely data flow pipelines, which are easier to test and maintain than complex business logic.

Source: src/priority/unified_scorer.rs:949-978 (calculate_pattern_factor)

Data Transformation Detection

The system counts data transformations by examining the data flow graph for:

  • Outgoing function calls with associated data transformations
  • Variable dependencies passed between functions
  • Transformation types (map, filter, reduce, etc.)
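
Under the assumption that the count is derived from the data_transformations map shown in the Data Flow Graph section below, a minimal sketch could scan the outgoing transformation edges; the real implementation also weighs variable dependencies and transformation types.

use std::collections::HashMap;

// Hypothetical stand-ins for the real identifier and edge types.
type FunctionId = String;
struct DataTransformation;

fn count_data_transformations(
    func: &FunctionId,
    transformations: &HashMap<(FunctionId, FunctionId), DataTransformation>,
) -> usize {
    // Count outgoing edges: transformation pairs where this function
    // is the caller.
    transformations
        .keys()
        .filter(|(caller, _)| caller == func)
        .count()
}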

Source: src/priority/unified_scorer.rs:980-993 (count_data_transformations)

Refactorability Factor

The refactorability factor was designed to identify dead stores and unused mutations. However, this analysis produced too many false positives and has been simplified.

Current behavior: Returns a neutral factor of 1.0 (no adjustment).

fn calculate_refactorability_factor(
    _func_id: &FunctionId,
    _data_flow: &DataFlowGraph,
    _config: &DataFlowScoringConfig,
) -> f64 {
    // Dead store analysis has been removed as it produced
    // too many false positives.
    1.0
}

Future plans: More sophisticated dead store analysis may be reintroduced with improved heuristics.

Source: src/priority/unified_scorer.rs:937-947

Data Flow Graph

The DataFlowGraph struct provides the underlying data for all data flow scoring calculations:

pub struct DataFlowGraph {
    call_graph: CallGraph,
    variable_deps: HashMap<FunctionId, HashSet<String>>,
    data_transformations: HashMap<(FunctionId, FunctionId), DataTransformation>,
    io_operations: HashMap<FunctionId, Vec<IoOperation>>,
    purity_analysis: HashMap<FunctionId, PurityInfo>,
    mutation_analysis: HashMap<FunctionId, MutationInfo>,
    // ... (CFG analysis fields omitted for brevity)
}

Key data used for scoring:

  • purity_analysis: Results from purity detection
  • mutation_analysis: Tracks live vs dead mutations
  • io_operations: I/O operation locations and types
  • variable_deps: Variable dependencies for pattern detection
  • data_transformations: Transformation relationships between functions
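
As an illustration of how factor calculations consume the graph, here is a minimal sketch of a purity lookup. PurityInfo's fields are inferred from the classification snippet earlier on this page, and purity_for is a hypothetical accessor.

use std::collections::HashMap;

type FunctionId = String; // stand-in for the real identifier type

// Fields inferred from the classification logic shown earlier.
struct PurityInfo {
    is_pure: bool,
    confidence: f64,
}

// Hypothetical accessor: each factor calculation starts by pulling the
// function's entry out of the relevant map.
fn purity_for<'a>(
    purity_analysis: &'a HashMap<FunctionId, PurityInfo>,
    id: &FunctionId,
) -> Option<&'a PurityInfo> {
    purity_analysis.get(id)
}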

Source: src/data_flow/mod.rs:113-140

Configuration

Configure data flow scoring in your .debtmap.toml:

[data_flow_scoring]
enabled = true              # Enable/disable data flow scoring (default: true)
purity_weight = 0.4         # Weight for purity factor (default: 0.4)
refactorability_weight = 0.3 # Weight for refactorability factor (default: 0.3)
pattern_weight = 0.3        # Weight for pattern factor (default: 0.3)

Weight guidelines:

  • All weights should be between 0.0 and 1.0
  • Weights are normalized internally (don’t need to sum to 1.0)
  • Higher purity_weight emphasizes functional programming style
  • Higher pattern_weight rewards data transformation pipelines
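
For example, weights of 0.8, 0.4, and 0.4 behave identically to 0.5, 0.25, and 0.25, since each weight is divided by the total (here 1.6) before being applied.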

Source: src/config/scoring.rs:678-723 (DataFlowScoringConfig)

Disabling Data Flow Scoring

To disable data flow scoring entirely:

[data_flow_scoring]
enabled = false

When disabled, calculate_unified_priority_with_data_flow returns the base score without any data flow adjustments.

Practical Examples

Example 1: Strictly Pure Function

fn calculate_total(prices: &[f64]) -> f64 {
    prices.iter().sum()
}

Analysis:

  • No mutations: has_mutations = false
  • No I/O operations: io_ops = []
  • High purity confidence: confidence > 0.8

Result: PuritySpectrum::StrictlyPure (multiplier: 0.0)

This function’s debt score is reduced by the purity factor, deprioritizing it for refactoring.

Example 2: I/O Isolated Function

fn save_report(report: &Report) -> std::io::Result<()> {
    let json = serde_json::to_string(report)?;
    std::fs::write("report.json", json)?;
    Ok(())
}

Analysis:

  • I/O operations: [file_write] (1 unique type, 1 operation)
  • Concentrated I/O: unique_types.len() <= 2 && ops.len() <= 3

Result: PuritySpectrum::IOIsolated (multiplier: 0.6)

Example 3: Data Flow Pipeline

fn process_transactions(transactions: Vec<Transaction>) -> Vec<Summary> {
    transactions
        .into_iter()
        .filter(|t| t.amount > 0.0)
        .map(Summary::from)
        .collect()
}

Analysis:

  • High transformation ratio (filter + map chains)
  • transform_ratio > 0.5

Result: Pattern factor = 0.7, reducing debt priority for this data pipeline.

Integration with Unified Scoring

Data flow scoring integrates with the broader unified scoring system. The entry point is:

pub fn calculate_unified_priority_with_data_flow(
    func: &FunctionMetrics,
    call_graph: &CallGraph,
    data_flow: &DataFlowGraph,
    coverage: Option<&LcovData>,
    _organization_issues: Option<f64>,
    debt_aggregator: Option<&DebtAggregator>,
    config: &DataFlowScoringConfig,
) -> UnifiedScore

The function:

  1. Calculates base score using role-aware unified scoring
  2. If data flow scoring is enabled, calculates the three factors
  3. Applies weighted combination to adjust final score
  4. Records factors in UnifiedScore for debugging
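
Steps 2-3, together with the disabled pass-through described above, can be sketched as follows. The struct shapes are simplified stand-ins for the real types, and apply_data_flow_adjustment is a hypothetical helper, not the actual internal API.

struct DataFlowScoringConfig {
    enabled: bool,
    purity_weight: f64,
    refactorability_weight: f64,
    pattern_weight: f64,
}

struct UnifiedScore {
    final_score: f64,
}

// Hypothetical helper: applies the weighted combination to a base score.
// The real function also records the individual factors on UnifiedScore
// for debugging (step 4).
fn apply_data_flow_adjustment(
    mut score: UnifiedScore,
    purity: f64,
    refactorability: f64,
    pattern: f64,
    config: &DataFlowScoringConfig,
) -> UnifiedScore {
    if !config.enabled {
        return score; // disabled: the base score passes through unchanged
    }
    let total_weight =
        config.purity_weight + config.refactorability_weight + config.pattern_weight;
    let adjustment = (purity * config.purity_weight
        + refactorability * config.refactorability_weight
        + pattern * config.pattern_weight)
        / total_weight;
    score.final_score *= adjustment;
    score
}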

Source: src/priority/unified_scorer.rs:995-1080