Statistics and Metrics¶
ripgrep can output detailed statistics about the search operation, providing insights into performance, match counts, and files processed.
Overview¶
Statistics are useful for:
- Understanding search performance
- Debugging slow searches
- Analyzing codebases
- Gathering metrics for reporting
- Optimizing search patterns
Enabling Statistics¶
Enable statistics output with the --stats flag:
This prints statistics after all search results.
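As a minimal sketch, assuming rg is on PATH, the following builds a throwaway two-file corpus and runs a search with --stats. The file names, contents, and the TODO pattern are placeholders chosen for illustration.

```shell
# Build a small throwaway corpus; file names and contents are illustrative.
demo=$(mktemp -d)
printf 'alpha TODO\nbeta\n' > "$demo/a.txt"
printf 'gamma\n' > "$demo/b.txt"

# Matching lines are printed first, followed by the statistics summary.
out=$(rg --stats 'TODO' "$demo")
printf '%s\n' "$out"

rm -rf "$demo"
```

The summary at the end of the output should count one match and two files searched for this corpus.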
Statistics Output Format¶
The statistics output includes several categories of information. Statistics are printed to stdout after all search results.
flowchart TD
    Input[Search Input] --> Search[ripgrep Search]
    Search --> Track["Track Metrics<br/>Internally"]
    subgraph Metrics["Collected Metrics"]
        M1[Matches Count]
        M2[Bytes Searched]
        M3[Files Processed]
        M4[Elapsed Time]
    end
    Track --> Metrics
    Search --> Results["Search Results<br/>to stdout"]
    Metrics --> Stats[Statistics Summary]
    Stats --> Output["Statistics Output<br/>to stdout"]
    style Track fill:#e1f5ff
    style Stats fill:#fff3e0
    style Results fill:#e8f5e9
    style Output fill:#e8f5e9
Figure: Statistics collection flow showing how ripgrep tracks metrics during search and outputs them after results.
3 matches
2 matched lines
1 files contained matches
5 files searched
150 bytes printed
1500 bytes searched
0.025000 seconds spent searching
0.001000 seconds total
Note
Time values are formatted with 6 decimal places of precision.
Key Metrics¶
The statistics data structure tracks the following metrics:
// Source: crates/printer/src/stats.rs:13-21
pub struct Stats {
    elapsed: Duration,        // Time spent searching across all threads
    searches: u64,            // Total number of files examined
    searches_with_match: u64, // Number of files containing at least one match
    bytes_searched: u64,      // Total bytes read and searched
    bytes_printed: u64,       // Total bytes output (results + context lines)
    matched_lines: u64,       // Number of lines containing matches
    matches: u64,             // Total number of pattern matches found
}
Match Statistics¶
- Matches: Total number of pattern matches found
- Matched lines: Number of lines containing matches. With multiline patterns (--multiline or -U), this counts every line that is part of any match, not just the first line of each match.
- Files contained matches: Count of files with at least one match
Comparing --count vs --count-matches
These options provide different levels of match granularity:
| Flag | Output | Use Case |
|---|---|---|
| --count | Lines per file | When you need per-file line counts (e.g., "How many lines have errors in each file?") |
| --count-matches | Matches per file | When you need per-file match counts (e.g., "How many TODO comments in each file?") |
| --stats | Aggregate totals | When you need overall statistics across all files |
Example:
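A minimal sketch of the difference, assuming rg is on PATH. The file name and its contents are made up for illustration: one line holds two TODO markers, so the two flags should report different numbers.

```shell
# Throwaway file: two of its three lines contain TODO, one of them twice.
demo=$(mktemp -d)
printf 'TODO one TODO two\nno marker here\nTODO three\n' > "$demo/notes.txt"

# --count reports lines that contain at least one match.
lines=$(rg --count 'TODO' "$demo/notes.txt")

# --count-matches reports every individual match.
hits=$(rg --count-matches 'TODO' "$demo/notes.txt")

echo "matching lines: $lines, matches: $hits"
rm -rf "$demo"
```

Because a single file is passed explicitly, rg prints the bare count without a file name prefix, which keeps the captured values easy to compare.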
Search Scope¶
- Files searched: Total number of files examined
- Bytes searched: Total bytes read and searched
- Bytes printed: Amount of data output (including context)
Performance Metrics¶
- Seconds spent searching: Actual search time across all threads
- Seconds total: Wall-clock time for the entire operation
Performance Overhead
The --stats flag itself has minimal performance overhead since ripgrep tracks these metrics internally regardless. Enabling --stats only adds the cost of formatting and printing the final summary.
Understanding the Metrics¶
Matches vs. Matched Lines¶
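A line can contain multiple matches. For example, searching for the pattern the in the line "the quick brown fox jumps over the lazy dog" finds 2 matches (at positions 0 and 31) but only 1 matched line: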
graph LR
    Line[Input Line] --> Match1["Match 1 (pos 0)"]
    Line --> Match2["Match 2 (pos 31)"]
    Match1 --> Result["2 matches<br/>1 line"]
    Match2 --> Result
    style Line fill:#e1f5ff
    style Match1 fill:#fff3e0
    style Match2 fill:#fff3e0
    style Result fill:#e8f5e9
Figure: Illustration of how a single line can contain multiple matches. The line count is 1, but the match count is 2.
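The same distinction can be checked from the command line. This is a sketch that assumes rg is on PATH; it feeds the sample sentence through standard input and keeps only the two counters from the summary.

```shell
line='the quick brown fox jumps over the lazy dog'

# Search standard input ("-") and keep only the two counters of interest.
counts=$(printf '%s\n' "$line" | rg --stats 'the' - \
  | grep -E '^[0-9]+ (matches|matched lines)$')

printf '%s\n' "$counts"
```

The anchored grep pattern keeps the result line itself out of the captured text, since that line does not start with a number.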
Thread Time vs. Wall Time¶
With parallel search:
- Searching time: Sum of time across all threads (can exceed wall time)
- Total time: Actual elapsed time
Understanding Thread Time
The "seconds spent searching" metric accumulates CPU time across all threads. With parallel search, this will typically exceed wall-clock time. A ratio close to your thread count indicates good parallelization. For example, if you have 4 threads and the ratio is ~4x, your search is efficiently using all threads.
Example: suppose the summary reports 0.400000 seconds spent searching and 0.100000 seconds total. The 4:1 ratio (0.4s / 0.1s = 4) shows that all 4 threads were fully utilized during the search.
gantt
title Parallel Search Thread Utilization (4 threads)
dateFormat X
axisFormat %L ms
section Thread 1
Searching :t1, 0, 100
section Thread 2
Searching :t2, 0, 100
section Thread 3
Searching :t3, 0, 100
section Thread 4
Searching :t4, 0, 100
section Wall Time
Total elapsed :crit, wall, 0, 100
Figure: Visual representation of parallel search showing 4 threads running simultaneously. Wall-clock time is 100ms, but total CPU time is 400ms (4 threads × 100ms each), giving a 4:1 ratio indicating full parallelization.
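The ratio can also be computed from the text summary. The sketch below works on two hard-coded summary lines that mirror the 0.4s / 0.1s example, and it assumes the labels match the summary format shown in this document. In practice the input would come from rg --stats.

```shell
# Illustrative summary lines; the values mirror the 0.4s / 0.1s example.
sample='0.400000 seconds spent searching
0.100000 seconds total'

# Pick out both timings and divide search time by wall-clock time.
ratio=$(printf '%s\n' "$sample" | awk '
  /seconds spent searching$/ { search = $1 }
  /seconds total$/           { total = $1 }
  END { if (total > 0) printf "%.1f", search / total }
')

echo "search/total ratio: $ratio"
```

A ratio near the number of threads suggests the work was spread evenly; a ratio near 1 suggests the search ran mostly on one thread.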
Use Cases¶
Performance Analysis¶
Compare different search strategies:
# Compare regex vs. fixed string
time rg --stats 'complex.*pattern'
time rg --stats -F 'literal_string'
Codebase Metrics¶
Analyze code patterns:
Search Optimization¶
Identify slow searches:
Combining with Other Options¶
Statistics with File Counts¶
Statistics with Quiet Mode¶
Performance Impact with --quiet
When combining --stats with --quiet, ripgrep will search all files completely to collect accurate statistics, even though --quiet alone would normally exit after the first match. This means --stats disables --quiet's early-exit optimization. If you're just checking for pattern existence in a large codebase, using both flags together will be much slower than --quiet alone, as it must search all files to completion.
Statistics Output Formats¶
Statistics can be output in two formats:
Human-readable output appended after search results:
3 matches
2 matched lines
1 files contained matches
5 files searched
150 bytes printed
1500 bytes searched
0.025000 seconds spent searching
0.001000 seconds total
Best for: Interactive use, quick performance checks
Machine-readable JSON with "type": "summary" message:
{
  "type": "summary",
  "data": {
    "elapsed_total": {
      "human": "0.001000s",
      "secs": 0,
      "nanos": 1000000
    },
    "stats": {
      "elapsed": {
        "secs": 0,
        "nanos": 25000000
      },
      "searches": 5,
      "searches_with_match": 1,
      "bytes_searched": 1500,
      "bytes_printed": 150,
      "matched_lines": 2,
      "matches": 3
    }
  }
}
Best for: Scripts, automation, metric collection systems
JSON Statistics Parsing
The elapsed_total object includes a human field with formatted time for readability, plus precise secs and nanos fields for programmatic use. Use jq to extract specific metrics: rg --json --stats pattern | tail -1 | jq '.data.stats.matches'
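As a hedged sketch of programmatic use, the snippet below runs jq over a hard-coded summary message shaped like the JSON example in this section (the values are the same illustrative ones). It assumes jq is installed, selects the message by its type field rather than by position, and combines secs and nanos into fractional seconds.

```shell
# Illustrative summary message, shaped like the JSON example in this section.
summary='{"type":"summary","data":{"elapsed_total":{"human":"0.001000s","secs":0,"nanos":1000000},"stats":{"elapsed":{"secs":0,"nanos":25000000},"searches":5,"searches_with_match":1,"bytes_searched":1500,"bytes_printed":150,"matched_lines":2,"matches":3}}}'

# Select by message type instead of relying on the summary being the last line.
matches=$(printf '%s\n' "$summary" \
  | jq -r 'select(.type == "summary") | .data.stats.matches')

# Combine secs and nanos into fractional seconds of search time.
search_secs=$(printf '%s\n' "$summary" \
  | jq -r 'select(.type == "summary") | .data.stats.elapsed | .secs + .nanos / 1e9')

echo "matches=$matches search_secs=$search_secs"
```

Filtering on the type field means the same jq expressions can be applied to the full rg --json stream, where every other message type is simply skipped.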
Examples¶
Example 1: Find Most Common Pattern¶
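# Compare prevalence of different patterns in Rust files
echo "Searching for error handling patterns:"
# Anchor the grep so only the "N matches" summary line is kept
rg --stats -t rust 'Result<' | grep -E '^[0-9]+ matches$'
rg --stats -t rust 'Option<' | grep -E '^[0-9]+ matches$'
rg --stats -t rust '\.unwrap\(' | grep -E '^[0-9]+ matches$'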
Example 2: Performance Benchmark¶
# Measure search performance on large codebase
echo "Performance test:"
rg --stats --no-ignore 'pattern' /usr/share/
Example 3: Codebase Analysis¶
Example 4: Debugging Slow Search¶
Interpreting Results¶
flowchart TD
    Start[Review Statistics] --> Check1{"High matches,<br/>few files?"}
    Check1 -->|Yes| Common["Pattern is common<br/>or concentrated"]
    Check1 -->|No| Check2{"Many files,<br/>few matches?"}
    Common --> Action1["Consider:<br/>- More specific pattern<br/>- Add context filters"]
    Check2 -->|Yes| Rare["Pattern is rare<br/>or too specific"]
    Check2 -->|No| Check3{"Long search<br/>time?"}
    Rare --> Action2["Consider:<br/>- File type filters -t<br/>- Broader pattern"]
    Check3 -->|Yes| Slow[Performance issue]
    Check3 -->|No| Good[Statistics look normal]
    Slow --> Action3["Check:<br/>- File count<br/>- Regex complexity<br/>- Binary files<br/>- Ignore patterns"]
    style Start fill:#e1f5ff
    style Common fill:#fff3e0
    style Rare fill:#fff3e0
    style Slow fill:#ffebee
    style Good fill:#e8f5e9
Figure: Decision flow for interpreting statistics and identifying optimization opportunities.
High Match Count, Few Files¶
Suggests:
- Pattern is very common
- Matches concentrated in specific files
- May benefit from more specific pattern
Many Files Searched, Few Matches¶
Suggests:
- Pattern is rare
- Good filtering by file type may help
- Pattern might be too specific
Long Search Time¶
Possible causes:
- Large number of files
- Complex regex pattern
- Binary files being searched
- Inefficient ignore patterns
Performance Tips Based on Statistics¶
Optimization Strategies
Use these guidelines to optimize based on what statistics reveal:
- High files searched count: Use file type filters (-t) or glob patterns
- High bytes searched: Consider excluding large files or binary data
- High search time: Simplify regex patterns or use fixed strings (-F)
- Low parallelism benefit: Check if sorting or other options disabled threading
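To illustrate the first guideline, the sketch below compares the files searched counter with and without a file type filter. It assumes rg is on PATH; the corpus, file names, and the files_searched helper are hypothetical and exist only for this example.

```shell
# Throwaway corpus with one Rust file and two non-Rust files.
demo=$(mktemp -d)
printf 'fn main() {}\n' > "$demo/main.rs"
printf 'notes\n' > "$demo/notes.txt"
printf 'more notes\n' > "$demo/more.md"

# Hypothetical helper: pull the "files searched" counter out of --stats output.
files_searched() { grep -E '^[0-9]+ files searched$' | cut -d' ' -f1; }

all=$(rg --stats 'main' "$demo" | files_searched)
rust_only=$(rg --stats -t rust 'main' "$demo" | files_searched)

echo "files searched: all=$all, rust only=$rust_only"
rm -rf "$demo"
```

On a real codebase the same before-and-after comparison shows how much work a filter removes, independent of how many matches are found.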
Statistics in Scripts¶
Capture statistics for reporting.
For production scripts, prefer JSON output for reliable parsing:
#!/bin/bash
# The summary is the last JSON message emitted by rg
OUTPUT=$(rg --json --stats 'pattern' | tail -n 1)
MATCHES=$(echo "$OUTPUT" | jq '.data.stats.matches')
FILES=$(echo "$OUTPUT" | jq '.data.stats.searches_with_match')
echo "Found $MATCHES matches across $FILES files"
Alternatively, you can parse text output for simpler cases:
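#!/bin/bash
# Statistics are printed to stdout, so no stderr redirection is needed
OUTPUT=$(rg --stats 'pattern')
# Anchor both ends so result lines that mention "matches" are not picked up
MATCHES=$(echo "$OUTPUT" | grep -E '^[0-9]+ matches$' | cut -d' ' -f1)
FILES=$(echo "$OUTPUT" | grep -E '^[0-9]+ files contained matches$' | cut -d' ' -f1)
echo "Found $MATCHES matches across $FILES files"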
Common Questions¶
Why is search time greater than total time?
This is normal for parallel searches. "Seconds spent searching" is cumulative CPU time across all threads, while "seconds total" is wall-clock time. A 4:1 ratio with 4 threads indicates full parallelization. See Thread Time vs. Wall Time for details.
Why are matches and matched lines different?
A single line can contain multiple matches. For example, the line "the quick brown fox jumps over the lazy dog" contains 2 matches for the pattern the, but only counts as 1 matched line. See Matches vs. Matched Lines.
Does --stats slow down my search?
No, ripgrep tracks these metrics internally regardless. The --stats flag only adds the minimal cost of formatting and printing the summary at the end. However, combining --stats with --quiet disables quiet mode's early-exit optimization.
Why does --stats with --quiet search all files?
To provide accurate statistics, ripgrep must search all files completely when --stats is enabled. This overrides --quiet's normal behavior of exiting after the first match. If you only need to check for pattern existence, use --quiet alone for better performance.
Best Practices¶
- Use --stats to verify search coverage
- Compare statistics when optimizing patterns
- Monitor search time for performance regression testing
- Use statistics to understand codebase composition
- Include --stats in documentation examples for transparency
- Combine with --debug for detailed troubleshooting
Limitations¶
- Statistics are written to stdout after search results
- Thread time may exceed wall time with parallel execution (this is normal for parallel searches)
- Byte counts include full file scans, not just matched content
See Also¶
- Performance - Performance tuning and optimization