Data Flow Scoring
Data flow scoring enhances Debtmap's technical debt analysis by evaluating function purity, refactorability, and code patterns. This subsection explains how each of these three factors adjusts debt prioritization.
Overview
Data flow scoring is an optional scoring layer that adjusts debt priorities based on functional programming principles. Functions that are pure, easily refactorable, or follow recognized patterns receive reduced priority scores, reflecting their lower maintenance burden.
Key principle: Pure functions and data transformation pipelines represent less technical debt than impure functions with side effects, because they’re easier to test, reason about, and refactor.
Source: src/priority/unified_scorer.rs:995-1020 (calculate_unified_priority_with_data_flow)
How Data Flow Scoring Works
Data flow scoring applies three weighted factors to the base debt score:
```text
adjusted_score = base_score * combined_adjustment

combined_adjustment = (purity_factor * purity_weight
                       + refactorability_factor * refactorability_weight
                       + pattern_factor * pattern_weight)
                      / total_weight

total_weight = purity_weight + refactorability_weight + pattern_weight
```
Each factor ranges from 0.0 to 1.0, where lower values reduce the final priority score.
Source: src/priority/unified_scorer.rs:1058-1075
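To make the formula concrete, here is a minimal standalone sketch of the combination. The function names and the example values are illustrative, not the internals of unified_scorer.rs:

```rust
/// Weighted average of the three factors, normalized by total weight.
fn combined_adjustment(
    purity_factor: f64,
    refactorability_factor: f64,
    pattern_factor: f64,
    purity_weight: f64,
    refactorability_weight: f64,
    pattern_weight: f64,
) -> f64 {
    let total_weight = purity_weight + refactorability_weight + pattern_weight;
    (purity_factor * purity_weight
        + refactorability_factor * refactorability_weight
        + pattern_factor * pattern_weight)
        / total_weight
}

fn main() {
    // Default weights (0.4 / 0.3 / 0.3); a locally pure function (purity
    // factor 0.3) with neutral refactorability and pattern factors (1.0),
    // applied to an arbitrary base score of 8.5.
    let adjustment = combined_adjustment(0.3, 1.0, 1.0, 0.4, 0.3, 0.3);
    let adjusted = 8.5 * adjustment; // base_score * combined_adjustment
    println!("adjustment = {adjustment:.2}, adjusted score = {adjusted:.2}");
    // adjustment = 0.72, adjusted score = 6.12
}
```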
Purity Spectrum
The purity spectrum classifies functions into five levels based on their side effects and mutation behavior. Pure functions receive the lowest priority multipliers since they represent minimal technical debt.
Classification Levels
| Level | Multiplier | Description |
|---|---|---|
| StrictlyPure | 0.0 | No mutations, no I/O, referentially transparent |
| LocallyPure | 0.3 | Pure interface but uses local mutations internally |
| IOIsolated | 0.6 | I/O operations clearly separated from logic |
| IOMixed | 0.9 | I/O mixed with business logic |
| Impure | 1.0 | Mutable state, side effects throughout |
Source: src/priority/unified_scorer.rs:64-94 (PuritySpectrum enum)
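Conceptually, the spectrum is a multiplier lookup. A sketch of that mapping, with a hypothetical multiplier() accessor (the real enum is defined in src/priority/unified_scorer.rs):

```rust
// Illustrative sketch of the spectrum-to-multiplier mapping above.
enum PuritySpectrum {
    StrictlyPure,
    LocallyPure,
    IOIsolated,
    IOMixed,
    Impure,
}

impl PuritySpectrum {
    fn multiplier(&self) -> f64 {
        match self {
            PuritySpectrum::StrictlyPure => 0.0,
            PuritySpectrum::LocallyPure => 0.3,
            PuritySpectrum::IOIsolated => 0.6,
            PuritySpectrum::IOMixed => 0.9,
            PuritySpectrum::Impure => 1.0,
        }
    }
}
```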
Classification Algorithm
The purity factor is calculated by analyzing three sources of information from the data flow graph:
- Purity Analysis Results: High-confidence purity (>80%) indicates strict or local purity
- Mutation Analysis: Tracks whether a function has local mutations
- I/O Operations: Identifies I/O patterns for non-pure functions
```rust
// Classification logic (simplified)
if purity.is_pure && purity.confidence > 0.8 {
    if mutations.has_mutations {
        PuritySpectrum::LocallyPure // 0.3 multiplier
    } else {
        PuritySpectrum::StrictlyPure // 0.0 multiplier
    }
} else if purity.is_pure {
    PuritySpectrum::LocallyPure // 0.3 multiplier
} else {
    classify_io_isolation(io_ops) // 0.6-1.0 multiplier
}
```
Source: src/priority/unified_scorer.rs:878-918 (calculate_purity_factor)
I/O Isolation Classification
For impure functions, the system evaluates I/O isolation based on concentration:
- IOIsolated (0.6): At most 2 unique I/O operation types and 3 total operations
- IOMixed (0.9): More than 2 unique types or more than 3 operations
- Impure (1.0): No I/O information available
Source: src/priority/unified_scorer.rs:921-935 (classify_io_isolation)
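A self-contained sketch of this heuristic, using hypothetical stand-in types (the field names and IoKind variants are assumptions; the real implementation is classify_io_isolation in unified_scorer.rs):

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for the project's types, just enough to make
// the heuristic concrete.
#[derive(Hash, PartialEq, Eq)]
enum IoKind { FileRead, FileWrite, Network }
struct IoOperation { kind: IoKind }

enum PuritySpectrum { IOIsolated, IOMixed, Impure }

fn classify_io_isolation(io_ops: &[IoOperation]) -> PuritySpectrum {
    if io_ops.is_empty() {
        return PuritySpectrum::Impure; // no I/O information available: 1.0
    }
    let unique_kinds: HashSet<&IoKind> = io_ops.iter().map(|op| &op.kind).collect();
    if unique_kinds.len() <= 2 && io_ops.len() <= 3 {
        PuritySpectrum::IOIsolated // concentrated I/O: 0.6
    } else {
        PuritySpectrum::IOMixed // scattered I/O: 0.9
    }
}
```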
Purity Level vs Purity Spectrum
Debtmap uses two related but distinct purity classifications:
| Aspect | PurityLevel | PuritySpectrum |
|---|---|---|
| Purpose | Analysis classification | Scoring multiplier |
| Levels | 4 (StrictlyPure, LocallyPure, ReadOnly, Impure) | 5 (adds IOIsolated and IOMixed, drops ReadOnly) |
| Usage | src/analysis/purity_analysis.rs | src/priority/unified_scorer.rs |
| Focus | Categorizing purity type | Assigning debt priority |
PurityLevel (from purity analysis) describes what kind of function this is. PuritySpectrum (for scoring) determines how much this affects debt priority, with finer granularity for I/O patterns.
Source: src/analysis/purity_analysis.rs:32-43 (PurityLevel), src/priority/unified_scorer.rs:64-94 (PuritySpectrum)
Pattern Factor
The pattern factor distinguishes data flow pipelines from business logic. Pure data transformation chains (map/filter/reduce patterns) receive reduced priority.
Calculation
```rust
// Pattern factor ranges from 0.7 to 1.0
let transform_ratio = transform_count as f64 / dependency_count as f64;
if transform_ratio > 0.5 {
    0.7 // Data flow pipeline - lowest priority
} else if transform_ratio > 0.3 {
    0.85 // Mixed - moderate reduction
} else {
    1.0 // Business logic - no reduction
}
```
Rationale: Functions with high transformation-to-dependency ratios are likely data flow pipelines, which are easier to test and maintain than complex business logic.
Source: src/priority/unified_scorer.rs:949-978 (calculate_pattern_factor)
Data Transformation Detection
The system counts data transformations by examining the data flow graph for:
- Outgoing function calls with associated data transformations
- Variable dependencies passed between functions
- Transformation types (map, filter, reduce, etc.)
Source: src/priority/unified_scorer.rs:980-993 (count_data_transformations)
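For illustration, a sketch of how such a count might be derived from the transformation map, using hypothetical stub types (the actual count_data_transformations may weigh these sources differently):

```rust
use std::collections::HashMap;

// Hypothetical stubs for the crate's types.
#[derive(PartialEq, Eq, Hash)]
struct FunctionId(String);
struct DataTransformation; // e.g. a map / filter / reduce edge

// Counts outgoing transformation edges for one function, assuming the
// graph keys its transformations by (caller, callee) as shown below.
fn count_outgoing_transformations(
    func: &FunctionId,
    transformations: &HashMap<(FunctionId, FunctionId), DataTransformation>,
) -> usize {
    transformations
        .keys()
        .filter(|(caller, _)| caller == func)
        .count()
}
```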
Refactorability Factor
The refactorability factor was designed to identify dead stores and unused mutations. However, this analysis produced too many false positives and has been simplified.
Current behavior: Returns a neutral factor of 1.0 (no adjustment).
```rust
fn calculate_refactorability_factor(
    _func_id: &FunctionId,
    _data_flow: &DataFlowGraph,
    _config: &DataFlowScoringConfig,
) -> f64 {
    // Dead store analysis has been removed as it produced
    // too many false positives.
    1.0
}
```
Future plans: More sophisticated dead store analysis may be reintroduced with improved heuristics.
Source: src/priority/unified_scorer.rs:937-947
Data Flow Graph
The DataFlowGraph struct provides the underlying data for all data flow scoring calculations:
```rust
pub struct DataFlowGraph {
    call_graph: CallGraph,
    variable_deps: HashMap<FunctionId, HashSet<String>>,
    data_transformations: HashMap<(FunctionId, FunctionId), DataTransformation>,
    io_operations: HashMap<FunctionId, Vec<IoOperation>>,
    purity_analysis: HashMap<FunctionId, PurityInfo>,
    mutation_analysis: HashMap<FunctionId, MutationInfo>,
    // ... (CFG analysis fields omitted for brevity)
}
```
Key data used for scoring:
- `purity_analysis`: Results from purity detection
- `mutation_analysis`: Tracks live vs dead mutations
- `io_operations`: I/O operation locations and types
- `variable_deps`: Variable dependencies for pattern detection
- `data_transformations`: Transformation relationships between functions
Source: src/data_flow/mod.rs:113-140
Configuration
Configure data flow scoring in your .debtmap.toml:
```toml
[data_flow_scoring]
enabled = true               # Enable/disable data flow scoring (default: true)
purity_weight = 0.4          # Weight for purity factor (default: 0.4)
refactorability_weight = 0.3 # Weight for refactorability factor (default: 0.3)
pattern_weight = 0.3         # Weight for pattern factor (default: 0.3)
```
Weight guidelines:
- All weights should be between 0.0 and 1.0
- Weights are normalized internally (they don't need to sum to 1.0)
- Higher `purity_weight` emphasizes functional programming style
- Higher `pattern_weight` rewards data transformation pipelines
Source: src/config/scoring.rs:678-723 (DataFlowScoringConfig)
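These keys correspond to a config struct; a sketch of its likely shape with the documented defaults (the authoritative definition is in src/config/scoring.rs):

```rust
// Hypothetical shape of the config struct, matching the TOML keys and
// defaults documented above.
pub struct DataFlowScoringConfig {
    pub enabled: bool,
    pub purity_weight: f64,
    pub refactorability_weight: f64,
    pub pattern_weight: f64,
}

impl Default for DataFlowScoringConfig {
    fn default() -> Self {
        Self {
            enabled: true,
            purity_weight: 0.4,
            refactorability_weight: 0.3,
            pattern_weight: 0.3,
        }
    }
}
```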
Disabling Data Flow Scoring
To disable data flow scoring entirely:
```toml
[data_flow_scoring]
enabled = false
```
When disabled, calculate_unified_priority_with_data_flow returns the base score without any data flow adjustments.
Practical Examples
Example 1: Strictly Pure Function
```rust
fn calculate_total(prices: &[f64]) -> f64 {
    prices.iter().sum()
}
```
Analysis:
- No mutations: `has_mutations = false`
- No I/O operations: `io_ops = []`
- High purity confidence: `confidence > 0.8`
Result: PuritySpectrum::StrictlyPure (multiplier: 0.0)
This function’s debt score is reduced by the purity factor, deprioritizing it for refactoring.
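For illustration, with the default weights (0.4 / 0.3 / 0.3) and the refactorability and pattern factors assumed neutral at 1.0, the combined adjustment is (0.0 × 0.4 + 1.0 × 0.3 + 1.0 × 0.3) / 1.0 = 0.6, a 40% reduction in debt score.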
Example 2: I/O Isolated Function
```rust
fn save_report(report: &Report) -> std::io::Result<()> {
    let json = serde_json::to_string(report)?;
    std::fs::write("report.json", json)?;
    Ok(())
}
```
Analysis:
- I/O operations: `[file_write]` (1 unique type, 1 operation)
- Concentrated I/O: `unique_types.len() <= 2 && ops.len() <= 3`
Result: PuritySpectrum::IOIsolated (multiplier: 0.6)
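Under the same assumptions as above (default weights, other factors neutral at 1.0), the 0.6 purity factor yields a combined adjustment of (0.6 × 0.4 + 1.0 × 0.3 + 1.0 × 0.3) / 1.0 = 0.84, about a 16% reduction.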
Example 3: Data Flow Pipeline
```rust
fn process_transactions(transactions: Vec<Transaction>) -> Vec<Summary> {
    transactions
        .into_iter()
        .filter(|t| t.amount > 0.0)
        .map(Summary::from)
        .collect()
}
```
Analysis:
- High transformation ratio (filter + map chain): `transform_ratio > 0.5`
Result: Pattern factor = 0.7, reducing debt priority for this data pipeline.
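With default weights and the other factors assumed neutral at 1.0, the pattern factor alone gives (1.0 × 0.4 + 1.0 × 0.3 + 0.7 × 0.3) / 1.0 = 0.91, roughly a 9% reduction; in practice a pipeline like this would typically also score well on purity, lowering the adjustment further.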
Integration with Unified Scoring
Data flow scoring integrates with the broader unified scoring system. The entry point is:
```rust
pub fn calculate_unified_priority_with_data_flow(
    func: &FunctionMetrics,
    call_graph: &CallGraph,
    data_flow: &DataFlowGraph,
    coverage: Option<&LcovData>,
    _organization_issues: Option<f64>,
    debt_aggregator: Option<&DebtAggregator>,
    config: &DataFlowScoringConfig,
) -> UnifiedScore
```
The function:
- Calculates base score using role-aware unified scoring
- If data flow scoring is enabled, calculates the three factors
- Applies weighted combination to adjust final score
- Records the factors in `UnifiedScore` for debugging (sketched below)
Source: src/priority/unified_scorer.rs:995-1080
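A conceptual sketch of that flow, condensed to the pieces documented above (names, the config shape, and the factor plumbing are simplified assumptions, and the factor-recording step is elided):

```rust
// Illustrative config stub matching the documented TOML keys.
struct Config {
    enabled: bool,
    purity_weight: f64,
    refactorability_weight: f64,
    pattern_weight: f64,
}

// Sketch: base score first, then the optional data flow adjustment.
fn priority_with_data_flow(base_score: f64, purity: f64, pattern: f64, config: &Config) -> f64 {
    // When disabled, the base score passes through unchanged.
    if !config.enabled {
        return base_score;
    }
    let refactorability = 1.0; // currently neutral (see Refactorability Factor)
    let total = config.purity_weight + config.refactorability_weight + config.pattern_weight;
    let adjustment = (purity * config.purity_weight
        + refactorability * config.refactorability_weight
        + pattern * config.pattern_weight)
        / total;
    base_score * adjustment
}
```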
Related Documentation
- Rebalanced Scoring - How data flow factors combine with other scoring weights
- Function-Level Scoring - Base scoring for individual functions
- File-Level Scoring - Aggregated scoring at file level
- Scoring Configuration - Configuration reference