Semantic Classification

Debtmap performs semantic analysis to classify functions by their architectural role, enabling more accurate complexity scoring and prioritization.

Overview

Semantic classification identifies the purpose of each function based on AST patterns, helping debtmap:

  • Apply appropriate complexity expectations
  • Adjust scoring based on function role
  • Provide role-specific recommendations

Function Roles

Debtmap classifies functions into seven distinct roles, each with specific detection criteria and scoring behavior.

Pure Logic

Functions that compute a result without side effects. These are the core business logic functions and receive the highest test priority.

Detection Criteria (from src/priority/semantic_classifier/mod.rs:43):

  • Default classification when no other role matches
  • Does not match entry point, debug, constructor, enum converter, accessor, pattern matching, I/O wrapper, or orchestrator patterns

Example:

fn calculate_total(items: &[Item]) -> u32 {
    items.iter().map(|i| i.price).sum()
}

Orchestrator

Functions that coordinate other functions with simple delegation logic.

Detection Criteria (from src/priority/semantic_classifier/classifiers.rs:257-328):

  • Name matches orchestration patterns: orchestrate, coordinate, manage, dispatch, route, delegate, forward
  • Name prefixes: workflow_, pipeline_, process_, orchestrate_, coordinate_, execute_flow_
  • Must have at least 2 meaningful callees (non-stdlib functions)
  • Cyclomatic complexity ≤ 5
  • Delegation ratio ≥ 20% (function calls / total lines)
  • Excludes adapter/wrapper patterns (single delegation)

Example:

fn process_order(order: Order) -> Result<Receipt> {
    let validated = validate_order(&order)?;
    let priced = calculate_prices(&validated)?;
    finalize_order(&priced)
}
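The criteria above can be sketched as a single predicate. This is a minimal illustration, not debtmap's actual code: the `FunctionMetrics` struct and `looks_like_orchestrator` function are hypothetical names, and the sketch assumes that the name check and all metric thresholds must hold together, which the criteria list does not state explicitly.

```rust
/// Hypothetical summary of the metrics the criteria refer to.
struct FunctionMetrics<'a> {
    name: &'a str,
    cyclomatic: u32,
    meaningful_callees: usize, // callees that are not stdlib functions
    call_count: usize,         // function calls in the body
    line_count: usize,         // total lines in the body
}

const NAME_KEYWORDS: [&str; 7] = [
    "orchestrate", "coordinate", "manage", "dispatch", "route", "delegate", "forward",
];
const NAME_PREFIXES: [&str; 6] = [
    "workflow_", "pipeline_", "process_", "orchestrate_", "coordinate_", "execute_flow_",
];

/// Assumption: every criterion must hold for the function to qualify.
fn looks_like_orchestrator(m: &FunctionMetrics) -> bool {
    let name = m.name.to_lowercase();
    let name_matches = NAME_KEYWORDS.iter().any(|k| name.contains(*k))
        || NAME_PREFIXES.iter().any(|p| name.starts_with(*p));

    // Delegation ratio = function calls / total lines (0.0 for an empty body).
    let delegation_ratio = if m.line_count == 0 {
        0.0
    } else {
        m.call_count as f64 / m.line_count as f64
    };

    name_matches
        && m.meaningful_callees >= 2 // also rules out single-delegation adapters
        && m.cyclomatic <= 5
        && delegation_ratio >= 0.2
}
```

Under these assumptions, a `process_order` function with three callees over five lines qualifies, while a `forward_request` adapter with a single callee does not.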

I/O Wrapper

Functions that wrap I/O operations. Includes simple constructors, accessors, and enum converters.

Detection Criteria (from src/priority/semantic_classifier/classifiers.rs:331-343):

  • Name contains I/O keywords: read, write, file, socket, http, request, response, stream, serialize, deserialize, save, load, etc.
  • Short I/O functions (< 20 lines) are always classified as I/O wrappers
  • Longer functions (≤ 50 lines) with strong I/O name patterns (output_, write_, print_, etc.) and low nesting (≤ 3)

Example:

fn read_config(path: &Path) -> Result<Config> {
    let content = fs::read_to_string(path)?;
    // `?` converts the TOML parse error into the function's error type
    Ok(toml::from_str(&content)?)
}
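The two length rules above can be sketched as follows. This is an illustration under stated assumptions rather than the classifier's real code: `looks_like_io_wrapper` is a hypothetical name, and both keyword lists are partial because the lists in the criteria end with "etc.".

```rust
// Partial lists: the documented lists end with "etc.", so these are not exhaustive.
const IO_KEYWORDS: [&str; 12] = [
    "read", "write", "file", "socket", "http", "request",
    "response", "stream", "serialize", "deserialize", "save", "load",
];
const STRONG_IO_PREFIXES: [&str; 3] = ["output_", "write_", "print_"];

/// Hypothetical predicate combining the short-function and longer-function rules.
fn looks_like_io_wrapper(name: &str, length: usize, nesting: u32) -> bool {
    let name = name.to_lowercase();
    let has_io_keyword = IO_KEYWORDS.iter().any(|k| name.contains(*k));
    let has_strong_prefix = STRONG_IO_PREFIXES.iter().any(|p| name.starts_with(*p));

    // Rule 1: short functions (< 20 lines) with any I/O keyword.
    let short_io = has_io_keyword && length < 20;
    // Rule 2: longer functions (<= 50 lines) need a strong prefix and nesting <= 3.
    let longer_io = has_strong_prefix && length <= 50 && nesting <= 3;

    short_io || longer_io
}
```

Note that plain substring matching, as used in this sketch, would also match names such as `thread_count` (which contains `read`); whether debtmap guards against this is not stated in the criteria.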

Entry Point

Main functions and public API endpoints. These have the highest classification precedence.

Detection Criteria (from src/priority/semantic_classifier/pattern_matchers.rs:54-63):

  • Name patterns: main, run, start, init, handle, process, execute, serve, listen
  • Functions at the top of the call graph
  • Highest classification precedence (checked before all other roles)

Example:

fn main() {
    let args = Args::parse();
    run(args).unwrap();
}

Pattern Match

Functions dominated by pattern matching logic, typically with many branches but low cyclomatic complexity.

Detection Criteria (from src/priority/semantic_classifier/classifiers.rs:213-252):

  • Name suggests pattern matching: detect, classify, identify, determine, resolve, match, parse_type, get_type, find_type
  • Low cyclomatic complexity (≤ 2) but higher cognitive complexity
  • Cognitive/cyclomatic ratio > 5.0 (indicates many if/else or match branches)

Example:

fn handle_event(event: Event) -> Action {
    match event {
        Event::Click(pos) => Action::Select(pos),
        Event::Drag(from, to) => Action::Move(from, to),
        Event::Release => Action::Confirm,
    }
}
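The metric part of the criteria (cyclomatic ≤ 2 and a cognitive/cyclomatic ratio above 5.0) can be sketched as a small check. The function name `has_pattern_match_shape` is hypothetical, and the name-based check is deliberately left out of this sketch.

```rust
/// Hypothetical check for the metric thresholds only (name patterns omitted).
fn has_pattern_match_shape(cyclomatic: u32, cognitive: u32) -> bool {
    // Guard against division by zero for degenerate metrics.
    if cyclomatic == 0 {
        return false;
    }
    let ratio = f64::from(cognitive) / f64::from(cyclomatic);
    cyclomatic <= 2 && ratio > 5.0
}
```

With these thresholds, a function with cyclomatic 2 and cognitive 11 (ratio 5.5) passes, while cyclomatic 2 and cognitive 10 (ratio exactly 5.0) does not, because the ratio must be strictly greater than 5.0.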

Debug

Functions used for troubleshooting and diagnostics. These have the lowest test priority.

Detection Criteria (from src/priority/semantic_classifier/classifiers.rs:14-53):

  • Name prefixes: debug_, print_, dump_, trace_
  • Name suffixes: _diagnostics, _debug, _stats
  • Name contains: diagnostics
  • Cognitive complexity ≤ 10 (prevents misclassifying complex functions with debug-like names)
  • Alternatively: Very simple functions (cognitive < 5, length < 20) with output-focused I/O patterns (print, display, show, log, trace, dump)

Example:

fn print_call_graph_diagnostics(graph: &CallGraph) {
    for node in graph.nodes() {
        println!("{}: {} callers, {} callees",
            node.name, node.callers.len(), node.callees.len());
    }
}
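The two detection paths (name match with a complexity cap, or simple output-focused behavior) can be sketched as below. The function names and the `has_output_io` flag are hypothetical; how debtmap actually detects output-focused I/O patterns is not shown in this section.

```rust
/// Name patterns taken from the criteria list.
fn is_debug_name(name: &str) -> bool {
    const PREFIXES: [&str; 4] = ["debug_", "print_", "dump_", "trace_"];
    const SUFFIXES: [&str; 3] = ["_diagnostics", "_debug", "_stats"];
    let name = name.to_lowercase();
    PREFIXES.iter().any(|p| name.starts_with(*p))
        || SUFFIXES.iter().any(|s| name.ends_with(*s))
        || name.contains("diagnostics")
}

/// Hypothetical predicate; `has_output_io` stands in for the
/// output-focused I/O pattern check (print, display, show, log, trace, dump).
fn is_debug_function(name: &str, cognitive: u32, length: usize, has_output_io: bool) -> bool {
    // Path 1: debug-like name, capped at cognitive complexity 10.
    let by_name = is_debug_name(name) && cognitive <= 10;
    // Path 2: very simple function with output-focused I/O.
    let by_behavior = cognitive < 5 && length < 20 && has_output_io;
    by_name || by_behavior
}
```

The complexity cap on the first path is what keeps a large function with a debug-like name from being de-prioritized.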

Unknown

Functions that cannot be classified into any specific role. These receive neutral scoring adjustments.

Detection Criteria (from src/priority/semantic_classifier/mod.rs:32):

  • Reserved for edge cases where classification fails
  • In practice, functions default to PureLogic when no other role matches

Classification Precedence

The classifier applies rules in a specific order to ensure correct classification when multiple patterns match (from src/priority/semantic_classifier/mod.rs:47-113):

  1. Entry Point - Highest precedence, checked first
  2. Debug - Diagnostic functions detected early
  3. Constructor - Simple constructors (AST-based detection, spec 117/122)
  4. Enum Converter - Simple enum-to-value converters (spec 124)
  5. Accessor - Simple getter/accessor methods (spec 125)
  6. Data Flow - Data transformation orchestrators (spec 126, opt-in)
  7. Pattern Match - Pattern matching functions
  8. I/O Wrapper - I/O-focused functions
  9. Orchestrator - Coordination functions
  10. Pure Logic - Default fallback
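The ordering above amounts to a first-match-wins chain. The sketch below illustrates that idea with hypothetical types (`Signals`, `FunctionRole`, `classify`); it assumes each detector has already produced a boolean. Mapping constructors, enum converters, and accessors to the I/O Wrapper role follows the I/O Wrapper description on this page, and mapping Data Flow matches to Orchestrator follows the Data Flow section; both mappings are assumptions about the implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FunctionRole {
    EntryPoint,
    Debug,
    IoWrapper,
    PatternMatch,
    Orchestrator,
    PureLogic,
}

/// Hypothetical pre-computed detector results for one function.
#[derive(Default)]
struct Signals {
    entry_point: bool,
    debug: bool,
    constructor: bool,
    enum_converter: bool,
    accessor: bool,
    data_flow: bool,
    pattern_match: bool,
    io_wrapper: bool,
    orchestrator: bool,
}

/// Returns the role of the first matching rule, in precedence order.
fn classify(s: &Signals) -> FunctionRole {
    let checks = [
        (s.entry_point, FunctionRole::EntryPoint),
        (s.debug, FunctionRole::Debug),
        (s.constructor, FunctionRole::IoWrapper),    // assumed mapping
        (s.enum_converter, FunctionRole::IoWrapper), // assumed mapping
        (s.accessor, FunctionRole::IoWrapper),       // assumed mapping
        (s.data_flow, FunctionRole::Orchestrator),   // assumed mapping
        (s.pattern_match, FunctionRole::PatternMatch),
        (s.io_wrapper, FunctionRole::IoWrapper),
        (s.orchestrator, FunctionRole::Orchestrator),
    ];
    checks
        .iter()
        .find(|(matched, _)| *matched)
        .map(|(_, role)| *role)
        .unwrap_or(FunctionRole::PureLogic) // default fallback
}
```

A function that matches both the entry point and orchestrator rules is therefore reported as an entry point, and a function that matches nothing falls through to Pure Logic.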

AST-Based Detection

Semantic classification uses AST analysis to detect function roles beyond simple name matching.

Constructor Detection (Spec 117/122)

Source: src/priority/semantic_classifier/classifiers.rs:115-183

Detects constructor functions even with non-standard names by analyzing:

  • Return type: Must return Self, Result<Self>, or Option<Self>
  • Body patterns: Contains struct initialization with Self { ... }
  • No loops in function body
  • Complexity thresholds: cyclomatic ≤ 5, nesting ≤ 2, length < 30

Detected Patterns:

  • Standard names: new, default, from_*, with_*, create_*, make_*, build_*
  • Non-standard names with AST analysis: create_default_client() returning Self

Example:

// Detected even without standard naming
pub fn create_default_client() -> Self {
    Self {
        timeout: Duration::from_secs(30),
        retries: 3,
    }
}
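The AST-based checks listed above can be combined as in the following sketch. `ConstructorFacts` and `is_ast_constructor` are hypothetical names, the return type is compared as a plain string for simplicity, and the sketch assumes all conditions must hold together.

```rust
/// Hypothetical facts extracted from a function's signature and body.
struct ConstructorFacts<'a> {
    return_type: &'a str,
    has_self_struct_init: bool, // body contains `Self { ... }`
    has_loops: bool,
    cyclomatic: u32,
    nesting: u32,
    length: usize,
}

/// Assumption: every listed condition is required.
fn is_ast_constructor(f: &ConstructorFacts) -> bool {
    let returns_self = matches!(f.return_type, "Self" | "Result<Self>" | "Option<Self>");
    returns_self
        && f.has_self_struct_init
        && !f.has_loops
        && f.cyclomatic <= 5
        && f.nesting <= 2
        && f.length < 30
}
```

A function like `create_default_client()` passes this check on its return type and body alone, without relying on its name.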

Enum Converter Detection (Spec 124)

Source: src/priority/semantic_classifier/classifiers.rs:185-211

Detects simple enum-to-string converter functions:

  • Name patterns: name, as_str, to_*
  • Body contains exhaustive match statement on self
  • All match arms return string/numeric literals only
  • No function calls in match arms (e.g., no format!())
  • Cognitive complexity ≤ 3

Example:

pub fn name(&self) -> &'static str {
    match self {
        FrameworkType::Django => "Django",
        FrameworkType::Flask => "Flask",
        FrameworkType::PyQt => "PyQt",
    }
}

Accessor Method Detection (Spec 125)

Source: src/priority/semantic_classifier/mod.rs:121-177

Detects simple accessor and getter methods:

Single-word patterns:

  • id, name, value, kind, type, status, code, key, index

Prefix patterns:

  • get_*, is_*, has_*, can_*, should_*, as_*, to_*, into_*

Complexity thresholds:

  • Cyclomatic complexity ≤ 2
  • Cognitive complexity ≤ 1
  • Length < 10 lines
  • Nesting ≤ 1 level
  • If AST available, verifies body is simple accessor pattern

Example:

pub fn id(&self) -> u32 {
    self.id
}
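The name patterns and thresholds above can be sketched as one predicate. `AccessorMetrics` and `is_accessor` are hypothetical names, and the optional AST verification of the body is omitted.

```rust
const SINGLE_WORD: [&str; 9] = [
    "id", "name", "value", "kind", "type", "status", "code", "key", "index",
];
const PREFIXES: [&str; 8] = [
    "get_", "is_", "has_", "can_", "should_", "as_", "to_", "into_",
];

/// Hypothetical metrics for one method.
struct AccessorMetrics {
    cyclomatic: u32,
    cognitive: u32,
    length: usize,
    nesting: u32,
}

/// Name check plus complexity thresholds; AST body verification is not modeled.
fn is_accessor(name: &str, m: &AccessorMetrics) -> bool {
    let name_ok = SINGLE_WORD.iter().any(|w| *w == name)
        || PREFIXES.iter().any(|p| name.starts_with(*p));
    name_ok && m.cyclomatic <= 2 && m.cognitive <= 1 && m.length < 10 && m.nesting <= 1
}
```

A method named `get_total` that contains branching logic fails the thresholds, so its name alone does not make it an accessor.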

Data Flow Classification (Spec 126)

Source: src/priority/semantic_classifier/mod.rs:81-96

Analyzes data flow patterns to identify orchestration functions based on transformation chains.

Detection Criteria:

  • Disabled by default; must be enabled via configuration (opt-in)
  • High confidence (≥ 0.8)
  • High transformation ratio (≥ 0.7)
  • Low business logic ratio (< 0.3)
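These thresholds can be read as a conjunction. The sketch below uses hypothetical names (`DataFlowProfile`, `is_data_flow_orchestrator`) and assumes the three ratios are already computed as values between 0.0 and 1.0; how debtmap derives them is not covered here.

```rust
/// Hypothetical result of data flow analysis for one function.
struct DataFlowProfile {
    confidence: f64,
    transformation_ratio: f64,
    business_logic_ratio: f64,
}

/// Applies the documented thresholds; returns false whenever the feature is disabled.
fn is_data_flow_orchestrator(enabled: bool, p: &DataFlowProfile) -> bool {
    enabled
        && p.confidence >= 0.8
        && p.transformation_ratio >= 0.7
        && p.business_logic_ratio < 0.3
}
```

Because the feature is opt-in, the same profile yields `false` when `enabled` is `false`, and classification then falls through to the later rules.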

Debug Function Detection (Spec 119)

Source: src/priority/semantic_classifier/pattern_matchers.rs:7-20

Detects debug/diagnostic functions using:

Name patterns:

  • Prefixes: debug_, print_, dump_, trace_
  • Suffixes: _diagnostics, _debug, _stats
  • Contains: diagnostics

Behavioral characteristics:

  • Low cognitive complexity (< 5)
  • Short length (< 20 lines)
  • Output-focused I/O patterns: print, display, show, log, trace, dump

Role-Specific Expectations

Different roles have different coverage and complexity expectations:

| Role | Coverage Expectation | Complexity Tolerance |
|---|---|---|
| Pure Logic | High | Low |
| Orchestrator | Medium | Medium |
| I/O Wrapper | Low | Low |
| Entry Point | Low | Medium |
| Pattern Match | Medium | Variable |
| Debug | Low | Low |
| Unknown | Medium | Medium |

Scoring Adjustments

Semantic classification affects scoring through role multipliers. These values adjust the priority score for each function role (from src/config/scoring.rs:307-333):

[scoring.role_multipliers]
pure_logic = 1.2       # Prioritized (highest test priority)
orchestrator = 0.8     # Reduced priority
io_wrapper = 0.7       # Minor reduction
entry_point = 0.9      # Slight reduction
pattern_match = 0.6    # Moderate reduction
debug = 0.3            # Lowest test priority
unknown = 1.0          # No adjustment

Interpreting the multipliers:

  • Multipliers above 1.0 increase a function's priority
  • Multipliers below 1.0 decrease a function's priority
  • pure_logic = 1.2 raises the priority of pure logic functions by 20%
  • debug = 0.3 reduces the priority of debug functions to 30% of the unadjusted value
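As a worked example, the sketch below applies the default multipliers to a base score. It assumes the adjustment is a plain multiplication of the base score by the role multiplier; the function names and the string keys (which mirror the TOML keys) are hypothetical, and the real scoring pipeline may combine further factors.

```rust
/// Default multipliers from the configuration above, keyed by TOML name.
fn role_multiplier(role: &str) -> f64 {
    match role {
        "pure_logic" => 1.2,
        "orchestrator" => 0.8,
        "io_wrapper" => 0.7,
        "entry_point" => 0.9,
        "pattern_match" => 0.6,
        "debug" => 0.3,
        _ => 1.0, // "unknown" and anything unrecognized: no adjustment
    }
}

/// Assumption: the role adjustment is a simple product.
fn adjusted_score(base_score: f64, role: &str) -> f64 {
    base_score * role_multiplier(role)
}
```

Under this assumption, a base score of 10.0 becomes roughly 12.0 for a pure logic function and roughly 3.0 for a debug function, so the same underlying debt ranks about four times higher in business logic than in diagnostics code.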

Configuration

Semantic Classification

[semantic]
enabled = true
role_detection = true
adjust_coverage_expectations = true

Constructor Detection (Spec 117/122)

From src/config/detection.rs:54-98:

[classification.constructors]
# Name patterns for constructor functions
patterns = ["new", "default", "from_", "with_", "create_", "make_", "build_", "of_", "empty", "zero", "any"]

# Complexity thresholds
max_cyclomatic = 2       # Maximum cyclomatic complexity
max_cognitive = 3        # Maximum cognitive complexity
max_length = 15          # Maximum function length
max_nesting = 1          # Maximum nesting depth

# Enable AST-based detection for non-standard constructor names
ast_detection = true     # Analyzes return types and body patterns

Accessor Detection (Spec 125)

From src/config/detection.rs:135-226:

[classification.accessors]
enabled = true

# Single-word accessor names
single_word_patterns = ["id", "name", "value", "kind", "type", "status", "code", "key", "index"]

# Prefix patterns for accessors
prefix_patterns = ["get_", "is_", "has_", "can_", "should_", "as_", "to_", "into_"]

# Complexity thresholds
max_cyclomatic = 2       # Maximum cyclomatic complexity
max_cognitive = 1        # Maximum cognitive complexity (stricter than constructors)
max_length = 10          # Maximum function length
max_nesting = 1          # Maximum nesting depth

Data Flow Classification (Spec 126)

From src/config/detection.rs:228-273:

[classification.data_flow]
enabled = false          # Opt-in feature
min_confidence = 0.8     # Minimum confidence required
min_transformation_ratio = 0.7  # Minimum transformation ratio for orchestrator
max_business_logic_ratio = 0.3  # Maximum business logic for orchestrator

Debug Function Detection

Debug function detection is controlled by name patterns in src/priority/semantic_classifier/pattern_matchers.rs:7-20. The detection thresholds are:

  • Cognitive complexity ≤ 10 for name-matched functions
  • Cognitive complexity < 5 and length < 20 for behavior-matched functions

Troubleshooting

Function Classified Incorrectly

If a function is classified with the wrong role:

  1. Check classification precedence - Entry points take highest precedence
  2. Review complexity thresholds - High complexity can disqualify certain roles
  3. Examine name patterns - Some roles require specific naming conventions
  4. Enable AST detection - Set classification.constructors.ast_detection = true for better constructor detection

Constructor Not Detected

If a simple constructor is classified as PureLogic:

  1. Ensure function name matches patterns or returns Self
  2. Check complexity thresholds: cyclomatic ≤ 2, cognitive ≤ 3, length < 15
  3. Enable AST detection for non-standard names
  4. Verify no loops in function body

Debug Function Not Detected

If a diagnostic function still receives a high priority score:

  1. Ensure name matches debug patterns (debug_*, print_*, *_diagnostics, etc.)
  2. Check cognitive complexity is ≤ 10
  3. Functions with high complexity are intentionally excluded to prevent misclassification

See Also