Backreferences¶
Backreferences let a pattern match the same text that an earlier capture group matched. They require the PCRE2 engine (-P or --pcre2 flag).
PCRE2 Engine Required
Backreferences are ONLY available with the PCRE2 engine and are NOT supported by ripgrep's default regex engine.
The default engine is built on finite automata, which cannot express backreferences. PCRE2 uses a backtracking algorithm that can, but at a performance cost (see Performance Considerations below).
Quick tip: Use --engine auto to let ripgrep automatically select PCRE2 when your pattern contains backreferences.
flowchart TD
Start[Write Regex Pattern] --> HasBackref{"Pattern uses
backreferences?"}
HasBackref -->|Yes| NeedPCRE[PCRE2 Required]
HasBackref -->|No| DefaultOK[Default Engine OK]
NeedPCRE --> AutoEngine{"Using
--engine auto?"}
AutoEngine -->|Yes| AutoSelect[Ripgrep selects PCRE2]
AutoEngine -->|No| ManualFlag{"Using -P
or --pcre2?"}
ManualFlag -->|Yes| PCRE2[PCRE2 Engine]
ManualFlag -->|No| Error["Error: backreferences
not supported"]
AutoSelect --> PCRE2
DefaultOK --> Fast["Fast: O(n) linear time"]
PCRE2 --> Slower["Slower: Potential O(2^n) with backtracking"]
Error --> Fix[Add -P flag]
Fix --> PCRE2
style NeedPCRE fill:#fff3e0
style DefaultOK fill:#e8f5e9
style PCRE2 fill:#e1f5ff
style Error fill:#ffebee
style Fast fill:#e8f5e9
style Slower fill:#fff3e0
Figure: Engine selection flow for backreference patterns - shows automatic detection and performance trade-offs.
Numbered Backreferences¶
# Find repeated words (word followed by same word)
rg -P '(\w+)\s+\1' # (1)!
# Find repeated patterns
rg -P '(\d{3})-\1' # (2)!
- (\w+) captures a word, \1 matches the same word again - finds "the the" or "test test"
- (\d{3}) captures 3 digits, \1 matches the same digits - finds the "123-123" pattern
The \1 refers to the first capture group, \2 to the second, etc.
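As a minimal sketch of a pattern that uses both \1 and \2, the commands below search a small sample file for four-character runs that read the same in both directions. The file name and its contents are hypothetical, and the example assumes an rg build with PCRE2 support.

```shell
# Create a small sample file (hypothetical content, for illustration only).
tmp=$(mktemp -d)
printf '%s\n' 'abba' 'abab' 'noon' 'test' > "$tmp/words.txt"

# Group 1 and group 2 each capture one word character.
# \2 must repeat group 2 and \1 must repeat group 1, in that order.
rg -P '(\w)(\w)\2\1' "$tmp/words.txt"
```

With this sample data, only the lines abba and noon should be printed.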
Named Backreferences¶
Define a named capture with (?P<name>...) and refer back to it with \k<name>. For example, in (?P<word>\w+)\s+\k<word>, the group (?P<word>\w+) captures a word under the name "word" and \k<word> matches that same text again, which is more readable than \1.
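A runnable sketch of a named backreference follows. The sample file and its contents are hypothetical, the \b anchors are an addition not shown elsewhere on this page, and the example assumes an rg build with PCRE2 support.

```shell
# Create a small sample file (hypothetical content, for illustration only).
tmp=$(mktemp -d)
printf '%s\n' 'this is the the end' 'nothing repeated here' > "$tmp/notes.txt"

# (?P<word>\w+) names the group; \k<word> must match the same text again.
# \b keeps the match on whole words; -o prints only the matched text.
rg -P -o '\b(?P<word>\w+)\s+\k<word>\b' "$tmp/notes.txt"
```

Without the \b anchors, the same pattern would also match "is is" inside "this is", because \w+ is free to start in the middle of a word.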
Backreferences in Replacements¶
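Capture group references are also useful with the -r (--replace) flag. Unlike backreferences inside a pattern, $1-style references in a replacement string work with either engine, so -P is only needed when the pattern itself uses a PCRE2-only feature.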
flowchart LR
Input["Input Text:
'john@example.com'"] --> Pattern["Pattern:
(\w+)@(\w+)\.com"]
Pattern --> Capture1["Capture Group 1:
'john'"]
Pattern --> Capture2["Capture Group 2:
'example'"]
Capture1 --> Replace["Replacement:
'User: $1, Domain: $2'"]
Capture2 --> Replace
Replace --> Output["Output:
'User: john, Domain: example'"]
style Input fill:#e8f5e9
style Capture1 fill:#e1f5ff
style Capture2 fill:#e1f5ff
style Output fill:#f3e5f5
Figure: Backreference replacement flow - shows how captured groups are extracted and substituted in replacement expression.
# Swap two words
rg -P '(\w+)\s+(\w+)' -r '$2 $1' # (1)!
# Transform patterns
rg -P '(\w+)@(\w+)\.com' -r 'User: $1, Domain: $2' # (2)!
- Captures two words, swaps their order in replacement - "foo bar" becomes "bar foo"
- Extracts email parts into structured format - "john@example.com" becomes "User: john, Domain: example"
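The two replacements above can be tried on input piped through stdin, as in the sketch below. The sample strings come from the annotations. The commands omit -P on the assumption that the patterns use no PCRE2-only feature, since replacement references are handled by the default engine as well. Note that -r only changes what rg prints and never modifies files.

```shell
# Swap two words: group 2 is printed first, then group 1.
printf '%s\n' 'foo bar' | rg '(\w+)\s+(\w+)' -r '$2 $1'

# Restructure an email address into labelled parts.
printf '%s\n' 'john@example.com' | rg '(\w+)@(\w+)\.com' -r 'User: $1, Domain: $2'
```

Single quotes around the replacement string keep the shell from expanding $1 and $2 itself.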
Replacement Syntax Rules¶
When using backreferences in replacements, keep these syntax constraints in mind:
- Group names must use only [_0-9A-Za-z] characters
- Longest match rule: $1a is read as a reference to a group named 1a, not as group 1 followed by a
- Disambiguation: Use ${1}a to reference group 1 followed by a literal a
- Invalid references are replaced with empty strings
Disambiguation Examples
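A minimal sketch of the difference between the two forms, using a hypothetical one-line sample file:

```shell
# Create a small sample file (hypothetical content, for illustration only).
tmp=$(mktemp -d)
printf '%s\n' 'cat' > "$tmp/animals.txt"

# $1s is read as a reference to a group named "1s". No such group exists,
# so the match is replaced with an empty string.
rg '(\w+)' -r '$1s' "$tmp/animals.txt"

# ${1}s is group 1 followed by a literal "s".
rg '(\w+)' -r '${1}s' "$tmp/animals.txt"
```

The first command should print an empty line and the second should print cats.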
See the Replacements chapter for more details.
Performance Considerations¶
Backreferences have significant performance implications compared to ripgrep's default engine:
Why PCRE2 is Slower¶
Engine Architecture Differences:
- Default engine: Uses finite automata with guaranteed linear time complexity O(n)
- PCRE2 engine: Uses backtracking algorithm with potential for exponential time complexity O(2^n) on pathological patterns
Backtracking Complexity: PCRE2's backtracking can exhibit catastrophic performance on certain patterns, especially:
- Nested quantifiers like (a+)+ or (a*)*
- Complex alternations with overlapping possibilities
- Patterns with extensive backtracking on non-matches
Pathological Pattern Example
graph LR
subgraph Default["Default Engine: O(n) Linear Time"]
direction LR
D1["Input:
'test test'"] --> D2[Finite Automata] --> D3[Single Pass] --> D4["Result in
~n steps"]
end
subgraph PCRE2Good["PCRE2 with Backreferences: O(n)"]
direction LR
P1["Input:
'test test'"] --> P2["Pattern:
(\w+)\s+\1"] --> P3["Capture
'test'"] --> P4[Match space] --> P5["Compare
with \1"] --> P6["Result in
~n steps"]
end
subgraph PCRE2Bad["PCRE2 Pathological: O(2^n)"]
direction LR
B1["Input:
'aaaa...'"] --> B2["Pattern:
(a+)+b"] --> B3["Try:
a,a,a,a..."] --> B4["Backtrack:
aa,a,a..."] --> B5["Backtrack:
a,aa,a..."] --> B6["Backtrack:
aaa,a..."] --> B7["... 2^n
combinations"] --> B8["No match
found"]
end
style D2 fill:#e8f5e9
style D4 fill:#e8f5e9
style P3 fill:#e1f5ff
style P6 fill:#e1f5ff
style B3 fill:#ffebee
style B4 fill:#ffebee
style B5 fill:#ffebee
style B6 fill:#ffebee
style B7 fill:#ffebee
style B8 fill:#ffebee
Figure: Comparison of engine complexity - default engine maintains linear time, PCRE2 can be linear for good patterns but exponential for pathological ones.
Performance Optimizations¶
PCRE2 JIT Compilation: When available, PCRE2's JIT (Just-In-Time) compiler can improve performance by 2-10x compared to interpreted PCRE2. However, JIT-compiled PCRE2 is typically still slower than ripgrep's default engine.
Automatic Engine Selection: Use --engine auto to let ripgrep choose the best engine based on pattern features:
# Source: crates/core/flags/defs.rs:1710-1721
# Automatically selects PCRE2 for backreferences
rg --engine auto '(\w+)\s+\1'
# Automatically uses default engine for simple patterns
rg --engine auto 'simple_pattern'
Best Practices¶
- Use default engine unless you specifically need backreferences or other PCRE2-only features
- Test patterns on representative data before using in production
- Limit search scope when using PCRE2 (specific files/directories rather than entire codebases)
- Avoid nested quantifiers that can cause exponential backtracking
- Profile with --stats to measure actual performance impact
See Performance for detailed comparisons and optimization techniques.
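As a rough sketch of two of the practices above, the command below limits a PCRE2 search to a single directory and asks for summary statistics. The directory layout and file contents are hypothetical, and the example assumes an rg build with PCRE2 support.

```shell
# Create a small sample tree (hypothetical content, for illustration only).
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
printf '%s\n' 'the the quick fox' 'no repeats' > "$tmp/src/a.txt"

# Search only $tmp/src rather than a whole codebase, and append
# summary statistics (match counts, files searched, time spent) to the output.
rg -P --stats '\b(\w+)\s+\1\b' "$tmp/src"
```

Comparing the timing lines from a run like this with the same search scope under the default engine (for a pattern that does not need backreferences) gives a concrete measure of the PCRE2 overhead on your own data.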
Troubleshooting¶
"Backreferences are not supported" Error¶
If you try to use backreferences without the PCRE2 engine, ripgrep will detect this and provide a helpful error message:
# This will fail with default engine
$ rg '(\w+)\s+\1' file.txt
error: regex parse error:
(\w+)\s+\1
^^
backreferences are not supported
Consider enabling PCRE2 with the --pcre2 flag, which can handle backreferences
and look-around.
Solutions:
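The two fixes described on this page are to enable PCRE2 explicitly or to let ripgrep choose the engine. A minimal sketch, using a hypothetical sample file and assuming an rg build that includes PCRE2:

```shell
# Create a small sample file (hypothetical content, for illustration only).
tmp=$(mktemp -d)
printf '%s\n' 'test test' > "$tmp/file.txt"

# Optional check: reports the PCRE2 version if this build of rg includes it.
rg --pcre2-version

# Option 1: enable PCRE2 explicitly (-P is short for --pcre2).
rg -P '(\w+)\s+\1' "$tmp/file.txt"

# Option 2: let ripgrep select PCRE2 when the pattern needs it.
rg --engine auto '(\w+)\s+\1' "$tmp/file.txt"
```

If rg --pcre2-version reports that PCRE2 is unavailable, neither option will work until ripgrep is installed or rebuilt with PCRE2 support.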
Automatic Detection
Ripgrep analyzes error messages and automatically suggests using --pcre2 when it detects backreference syntax in patterns that fail with the default engine.
Named Capture Groups¶
Named capture groups make complex patterns more readable and are essential for sophisticated replacements.
Defining Named Captures¶
Both engines support named capture groups with (?P<name>...) syntax:
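As a small sketch, the same named-group pattern can be run under both engines. The sample file, its contents, and the group names year, month, and day are hypothetical, and the second command assumes an rg build with PCRE2 support.

```shell
# Create a small sample file (hypothetical content, for illustration only).
tmp=$(mktemp -d)
printf '%s\n' 'released 2024-03-15' > "$tmp/log.txt"

# Default engine: named groups are defined with (?P<name>...).
rg -o '(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})' "$tmp/log.txt"

# PCRE2 engine: the same pattern works unchanged with -P.
rg -P -o '(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})' "$tmp/log.txt"
```

Defining a named group does not require PCRE2; only referring back to it inside the pattern does.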
Using Named Captures in Replacements¶
Reference named captures in replacements with $name or ${name}:
# Extract function names
rg '(?P<func>\w+)\(' -r 'Function: $func'
# Restructure matches
rg '(?P<first>\w+), (?P<last>\w+)' -r '$last, $first'
# Use braces for disambiguation
rg '(?P<word>\w+)' -r '${word}s' # Pluralize
Named vs Numbered Captures¶
# Numbered captures ($1, $2, ...)
rg '(\w+)@(\w+)\.com' -r 'User: $1, Domain: $2'
# Named captures (more readable for complex patterns)
rg '(?P<user>\w+)@(?P<domain>\w+)\.com' -r 'User: $user, Domain: $domain'
Named captures are particularly valuable for:
- Complex patterns with many capture groups
- Making replacement expressions self-documenting
- Maintaining readability in large patterns
See the Replacements chapter for more examples.
Related Topics¶
- PCRE2 Engine - Comprehensive guide to PCRE2 features and capabilities
- Regex Basics - Understanding default engine limitations and when to use PCRE2
- Performance - Detailed performance comparisons and optimization strategies
- Replacements - Using backreferences in replacement expressions