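Regular Expressions

By default, ripgrep uses Rust's regex crate (built on regex-automata) as its regex engine. It is based on finite automata rather than backtracking, so for a given pattern its search time grows linearly with the size of the input. This gives excellent performance for most patterns while still supporting a rich set of regex features.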

Basic Metacharacters

# . matches any character (except newline)
rg "error."          # Matches "error1", "errors", "error!"

# * means zero or more of the preceding element
rg "lo*p"            # Matches "lp", "loop", "looop"

# + means one or more of the preceding element
rg "lo+p"            # Matches "loop", "looop" but not "lp"

# ? means zero or one of the preceding element
rg "colou?r"         # Matches "color" or "colour"

# ^ matches start of line
rg "^TODO"           # Matches lines starting with "TODO"

# $ matches end of line
rg "error$"          # Matches lines ending with "error"
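To try these metacharacters without creating any files, you can pipe sample lines into rg on standard input. The sketch below assumes rg is on your PATH and a POSIX shell; the sample words are made up. It uses -x (--line-regexp), which only accepts matches that span the whole line, so each result lists exactly the words a pattern accepts. Because the output is captured rather than shown on a terminal, rg prints bare lines without line numbers.

```shell
# Assumes rg is on PATH; the sample words are made up for illustration.
# printf expands the \n escapes into one word per line.
words='lp\nlop\nloop\nlooop\ncolor\ncolour\n'

# -x requires the whole line to match the pattern.
star=$(printf "$words" | rg -x 'lo*p')     # zero or more "o"
plus=$(printf "$words" | rg -x 'lo+p')     # one or more "o"
opt=$(printf "$words" | rg -x 'colou?r')   # optional "u"

# Anchors: ^ ties the match to the line start, $ to the line end.
starts=$(printf 'TODO: fix parser\nsee TODO above\n' | rg '^TODO')
ends=$(printf 'disk error\nerror: disk\n' | rg 'error$')

printf '%s\n--\n' "$star" "$plus" "$opt" "$starts" "$ends"
```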

Character Classes

# [abc] matches any single character a, b, or c
rg "[aeiou]"         # Matches any vowel

# [a-z] matches any character in the range
rg "[0-9]+"          # Matches one or more digits
rg "[a-zA-Z]+"       # Matches alphabetic words

# [^abc] matches any character except a, b, or c
rg "[^0-9]"          # Matches any non-digit character

Character Class Negation

The ^ character only means negation when it's the first character inside square brackets. [^0-9] means "not a digit", but [0-9^] means "a digit or a caret character".
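The position of the caret is easy to check with -o (--only-matching), which prints each match on its own line instead of the whole line. This is a small sketch assuming rg is on your PATH; the input strings are made up.

```shell
# Assumes rg is on PATH; the input strings are made up for illustration.
# -o prints every match on its own line.
digits=$(printf 'v2^10 = 1024\n' | rg -o '[0-9]+')

# Caret first inside the brackets: a negated class (runs of non-digits).
not_digit=$(printf 'a1^b\n' | rg -o '[^0-9]+')

# Caret anywhere else: just another member of the class.
digit_or_caret=$(printf 'a1^b\n' | rg -o '[0-9^]+')

printf '%s\n--\n' "$digits" "$not_digit" "$digit_or_caret"
```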

Prefer Predefined Classes

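Predefined classes like \d and \w are shorter and easier to read than [0-9] or [a-zA-Z0-9_]. They are not exact equivalents, though: in ripgrep's default Unicode mode they also match non-ASCII characters.

Note on Unicode support: The default regex engine runs in Unicode mode, so \d matches any Unicode decimal digit (\p{Nd}), such as Bengali or Arabic-Indic digits, not just [0-9]. To match ASCII digits only, use [0-9] or (?-u:\d). To match every kind of Unicode number, including numerals such as ½ or Ⅻ, use \p{N}. PCRE2 may handle Unicode differently in some details.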
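You can compare \d, [0-9], and \p{N} on one mixed line to see the difference. This sketch assumes rg is on your PATH with its default Unicode mode; the sample line is made up and contains ASCII 42, the Bengali digits ৪২, and the fraction ½ (Unicode category No, a number but not a decimal digit).

```shell
# Assumes rg is on PATH with default (Unicode) settings; sample line is made up.
line='ascii 42, bengali ৪২, fraction ½'

unicode_d=$(printf '%s\n' "$line" | rg -o '\d+')     # Unicode decimal digits
ascii_d=$(printf '%s\n' "$line" | rg -o '[0-9]+')    # ASCII digits only
any_num=$(printf '%s\n' "$line" | rg -o '\p{N}+')    # any Unicode number

printf '%s\n--\n' "$unicode_d" "$ascii_d" "$any_num"
```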

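Predefined Character Classes

# \d matches any Unicode decimal digit, including [0-9]
rg "\d+"             # Matches numbers like "42", "123"

# \w matches word characters (Unicode letters, marks, digits, and connector punctuation such as _)
rg "\w+"             # Matches words

# \s matches whitespace (space, tab, and other Unicode whitespace; newlines only in multiline mode, -U)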
rg "\s+error"        # Matches "error" with leading whitespace

# \D matches non-digits
# \W matches non-word characters
# \S matches non-whitespace
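The sketch below shows that \w is Unicode-aware while an ASCII-only class is not, and uses -c to count matching lines for \s. It assumes rg is on your PATH with default settings; the identifier naïve_total and the log lines are made up.

```shell
# Assumes rg is on PATH with default (Unicode) settings; inputs are made up.
# \w treats "ï" as a word character, so the identifier stays whole.
unicode_w=$(printf 'naïve_total = 10\n' | rg -o '\w+')

# An ASCII-only class splits the identifier at "ï".
ascii_w=$(printf 'naïve_total = 10\n' | rg -o '[a-zA-Z0-9_]+')

# -c prints the number of matching lines; only the tab-indented line matches.
ws_count=$(printf '\terror: bad\nerror: fine\n' | rg -c '\s+error')

printf '%s\n--\n' "$unicode_w" "$ascii_w" "$ws_count"
```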

Unicode Character Classes

Ripgrep supports Unicode character classes using the \p{} syntax:

# \p{L} matches any Unicode letter
rg "\p{L}+"          # Matches words in any language

# \p{N} matches any Unicode number
rg "\p{N}+"          # Matches numeric characters

# \p{P} matches any Unicode punctuation
rg "\p{P}"           # Matches punctuation marks

# \p{Ll} matches lowercase letters
rg "\p{Ll}+"         # Matches lowercase words

# \p{Lu} matches uppercase letters
rg "\p{Lu}+"         # Matches uppercase words

# \P{} negates the class (matches anything except)
rg "\P{L}+"          # Matches non-letter characters
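Combined with -o, the general categories split mixed text by letter case. A minimal sketch, assuming rg is on your PATH; the sample text is made up.

```shell
# Assumes rg is on PATH; the sample text is made up for illustration.
text='Hello Ñandú, 2024.'

upper=$(printf '%s\n' "$text" | rg -o '\p{Lu}')     # each uppercase letter
lower=$(printf '%s\n' "$text" | rg -o '\p{Ll}+')    # runs of lowercase letters
letters=$(printf '%s\n' "$text" | rg -o '\p{L}+')   # whole words, any case

printf '%s\n--\n' "$upper" "$lower" "$letters"
```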

Common Unicode categories:

Pattern	Description	Example
`\p{L}`	Any letter	Matches "café", "日本", "hello"
`\p{N}`	Any number	Matches "42", "৪২" (Bengali)
`\p{P}`	Any punctuation	Matches ".", "!", "?"
`\p{S}`	Any symbol	Matches "$", "€", "©"
`\p{Ll}`	Lowercase letter	Matches "abc", "ñ"
`\p{Lu}`	Uppercase letter	Matches "ABC", "Ñ"

Unicode scripts allow matching specific writing systems:

# \p{Greek} matches Greek script characters
rg "\p{Greek}+"       # Matches "Ελληνικά", "Ω"

# \p{Han} matches Chinese/Japanese/Korean Han characters
rg "\p{Han}+"         # Matches "日本語", "中文"

# \p{Arabic} matches Arabic script
rg "\p{Arabic}+"      # Matches "العربية"

# \p{Cyrillic} matches Cyrillic script
rg "\p{Cyrillic}+"    # Matches "Русский"
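Script classes let you pull one writing system out of mixed text. This sketch assumes rg is on your PATH; the list of city names is made up for illustration.

```shell
# Assumes rg is on PATH; the city list is made up for illustration.
cities='Tokyo 東京, Athens ΑΘΗΝΑ, Moscow Москва, Cairo القاهرة'

han=$(printf '%s\n' "$cities" | rg -o '\p{Han}+')
greek=$(printf '%s\n' "$cities" | rg -o '\p{Greek}+')
cyrillic=$(printf '%s\n' "$cities" | rg -o '\p{Cyrillic}+')
arabic=$(printf '%s\n' "$cities" | rg -o '\p{Arabic}+')

printf '%s\n' "$han" "$greek" "$cyrillic" "$arabic"
```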

Quantifiers

# {n} matches exactly n times
rg "[0-9]{3}"        # Matches exactly 3 digits like "123"

# {n,} matches n or more times
rg "[a-z]{5,}"       # Matches words with 5 or more letters

# {n,m} matches between n and m times
rg "[0-9]{2,4}"      # Matches 2-4 digits like "42", "123", "1234"
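Bounded quantifiers also match inside longer runs, so a pattern like [0-9]{2,4} finds "1234" inside "12345". Wrapping it in \b word boundaries restricts it to whole numbers. A sketch, assuming rg is on your PATH; the number list is made up.

```shell
# Assumes rg is on PATH; the number list is made up for illustration.
nums='7 42 123 12345'

loose=$(printf '%s\n' "$nums" | rg -o '[0-9]{2,4}')        # also matches inside 12345
bounded=$(printf '%s\n' "$nums" | rg -o '\b[0-9]{2,4}\b')  # whole numbers only
exact=$(printf '%s\n' "$nums" | rg -o '\b[0-9]{3}\b')      # exactly 3 digits
at_least=$(printf '%s\n' "$nums" | rg -o '[0-9]{5,}')      # 5 or more digits

printf '%s\n--\n' "$loose" "$bounded" "$exact" "$at_least"
```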

Greedy vs Non-Greedy Quantifiers

By default, quantifiers (*, +, ?, {n,m}) are greedy - they match as much text as possible. Add ? after the quantifier to make it non-greedy (match as little as possible):

.* matches as much as possible (greedy)
.*? matches as little as possible (non-greedy)

Example: In "<div>text</div>", the pattern <.*> matches the entire string, but <.*?> matches just <div>.
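You can reproduce this example with -o. Since -o prints every non-overlapping match, the lazy pattern reports both tags, not only the first one. A sketch, assuming rg is on your PATH.

```shell
# Assumes rg is on PATH.
html='<div>text</div>'

greedy=$(printf '%s\n' "$html" | rg -o '<.*>')   # one match: the whole string
lazy=$(printf '%s\n' "$html" | rg -o '<.*?>')    # each tag separately

printf '%s\n--\n' "$greedy" "$lazy"
```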

Groups and Alternation

# () creates a capturing group
rg "(error|warning): (.+)"    # Captures error/warning and message

# Named capture groups with (?P<name>...)
rg "(?P<level>error|warning): (?P<msg>.+)"   # Named captures for clarity

# | means "or"
rg "error|warning"            # Matches either "error" or "warning"

# Non-capturing groups with (?:)
rg "(?:http|https)://\S+"     # Matches URLs

Named capture groups are especially useful with replacement operations (see the Replacements chapter) where you can refer to captures by name instead of number.
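As a quick preview, rg's -r (--replace) flag rewrites each match in the output: $msg refers to the named group and $1 to the first numbered group. This sketch assumes rg is on your PATH; the log lines are made up. Single quotes keep the shell from expanding $msg and $1 itself.

```shell
# Assumes rg is on PATH; the log lines are made up for illustration.
log='error: disk full\nwarning: low memory\ninfo: started\n'
pattern='(?P<level>error|warning): (?P<msg>.+)'

by_name=$(printf "$log" | rg -r '$msg' "$pattern")   # named group "msg"
by_number=$(printf "$log" | rg -r '$1' "$pattern")   # first group ("level")

printf '%s\n--\n' "$by_name" "$by_number"
```

The "info" line does not match the alternation, so it is left out of both results.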

Common Pattern Examples

Real-World Patterns

These patterns are useful for searching codebases and logs:

# Email addresses
rg "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"   # (1)

# IP addresses (simplified)
rg "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"               # (2)

# Hexadecimal colors
rg "#[0-9a-fA-F]{6}"                                      # (3)

# Function calls (basic)
rg "\w+\([^)]*\)"                                         # (4)

# URLs
rg "https?://[^\s]+"                                      # (5)

(1) \b ensures word boundaries, [A-Za-z0-9._%+-]+ matches one or more characters allowed in the local part, and [A-Za-z]{2,} requires a top-level domain of at least two letters.
(2) Matches four groups of 1-3 digits separated by dots (it doesn't check that each number is in the valid 0-255 range).
(3) A # followed by exactly 6 hexadecimal digits, for colors like #FF5733.
(4) \w+ matches the function name, and [^)]* matches any characters except a closing parenthesis.
(5) s? makes the "s" in https optional, and [^\s]+ matches any run of non-whitespace characters.
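To check what these patterns extract, you can run a few of them with -o against a single test line. A sketch, assuming rg is on your PATH; the address, IP, and colors are made up. Note that #12345 has only five hex digits, so the color pattern skips it.

```shell
# Assumes rg is on PATH; the sample line is made up for illustration.
sample='mail dev.team@example.com from 10.0.0.1, color #FF5733, not #12345'

email=$(printf '%s\n' "$sample" | rg -o '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
ip=$(printf '%s\n' "$sample" | rg -o '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b')
hex=$(printf '%s\n' "$sample" | rg -o '#[0-9a-fA-F]{6}')

printf '%s\n' "$email" "$ip" "$hex"
```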

Regex Engine Selection

Ripgrep supports multiple regex engines that you can select using the --engine flag:

# Use default engine (Rust regex crate, finite-automata based)
rg pattern              # Implicit default
rg --engine default pattern

# Use PCRE2 engine (same as -P flag)
rg --engine pcre2 pattern
rg -P pattern           # Shorthand

# Auto-select engine based on pattern
rg --engine auto pattern
rg --auto-hybrid-regex pattern  # DEPRECATED: Use --engine auto instead

When to use different engines:

- Default (Rust regex engine): Fast and efficient for most patterns. Use it for general searches.
- PCRE2: Supports advanced features the default engine lacks. Use it when you need backreferences, lookahead/lookbehind, or other Perl-compatible features.
- Auto: Uses the default engine whenever it supports the pattern, and falls back to PCRE2 when the pattern needs PCRE2-only features.

flowchart TD
    Start[Need Regex Pattern] --> Check{"Pattern
Requirements?"}

    Check -->|Basic matching, Character classes, Quantifiers| Default["Use Default Engine
Fast & Efficient"]
    Check -->|Backreferences, Lookahead/behind, Advanced features| PCRE["Use PCRE2 Engine
-P flag"]
    Check -->|Not sure| Auto["Use Auto Mode
--engine auto"]

    Default --> Search[Execute Search]
    PCRE --> Search
    Auto --> Detect{"Auto-detect
Pattern Type"}
    Detect -->|Supported by default engine| UseDefault[Select Default]
    Detect -->|Needs PCRE2-only features| UsePCRE[Select PCRE2]
    UseDefault --> Search
    UsePCRE --> Search

    style Default fill:#e8f5e9
    style PCRE fill:#fff3e0
    style Auto fill:#e1f5ff
    style Search fill:#f3e5f5

Figure: Decision flow for selecting the appropriate regex engine based on pattern requirements.

Auto Engine Selection

Use --engine auto when you're not sure which engine your pattern needs. Ripgrep tries the default engine first and switches to PCRE2 only if the pattern uses features the default engine does not support, such as backreferences or look-around.

Deprecated Flag

The --auto-hybrid-regex flag is deprecated. Use --engine auto instead for future compatibility.
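The fallback behavior can be observed from the shell. This sketch assumes rg is on your PATH and a POSIX shell; the sample text is made up. It checks that the default engine rejects a backreference (rg exits with status 2 on errors) and, only if your build can run -P, that --engine auto accepts the same pattern. Probing with a trivial -P search is just one way to detect PCRE2 support.

```shell
# Assumes rg is on PATH; the sample text is made up for illustration.
text='the the cat\n'

# Default engine: the backreference \1 is rejected when the pattern is
# compiled, so rg reports an error and exits with status 2.
default_status=0
printf "$text" | rg '(\w+)\s+\1' >/dev/null 2>&1 || default_status=$?

# --engine auto falls back to PCRE2 for this pattern. That only works if
# this rg build includes PCRE2, so probe for -P support first.
have_pcre2=no
auto_out=''
if printf 'x\n' | rg -P 'x' >/dev/null 2>&1; then
  have_pcre2=yes
  auto_out=$(printf "$text" | rg -o --engine auto '(\w+)\s+\1')
fi

printf 'default exit status: %s\npcre2: %s\nauto: %s\n' \
  "$default_status" "$have_pcre2" "$auto_out"
```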

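PCRE2 Engine

Use the -P or --pcre2 flag to enable the Perl-compatible regex engine, which supports advanced features not available in the default engine. PCRE2 support is an optional build feature of ripgrep; rg --pcre2-version reports whether your build includes it.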

Features available only with PCRE2:

Backreferences

# Refer to previously captured groups with \1, \2, etc.
rg -P "(\w+)\s+\1"       # Finds repeated words like "the the"

# Named backreferences
rg -P "(?P<word>\w+)\s+\k<word>"  # Same with named groups

Lookahead/Lookbehind

# Positive lookahead (?=...)
rg -P "error(?=:)"       # Matches "error" only if followed by ":"

# Negative lookahead (?!...)
# Single quotes stop interactive bash from treating "!" as history expansion
rg -P 'test(?!ing)'      # Matches "test" except where it is followed by "ing"

# Positive lookbehind (?<=...)
rg -P "(?<=@)\w+"        # Matches the word after "@", e.g. a @mention or an email domain

# Negative lookbehind (?<!...)
rg -P '(?<!un)happy'     # Matches "happy" but not "unhappy"

Advanced Features

# Possessive quantifiers (prevent backtracking)
rg -P "\d++\."           # More efficient matching with possessive +

# Atomic groups (?>...), which also prevent backtracking
rg -P "(?>error|warning):"   # No backtracking in group
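The look-around and backreference examples can be verified on small inputs. This sketch assumes rg is on your PATH and only runs the -P commands if your build supports PCRE2; the sample lines are made up. Patterns containing "!" are single-quoted for the reason noted in the comments above.

```shell
# Assumes rg is on PATH; the sample lines are made up for illustration.
# Every command here needs PCRE2, so probe for -P support first.
have_pcre2=no
if printf 'x\n' | rg -P 'x' >/dev/null 2>&1; then
  have_pcre2=yes
  # Lookahead: only lines where "error" is directly followed by ":".
  ahead=$(printf 'error: disk\nerror code 5\n' | rg -P 'error(?=:)')
  # Lookbehind: -o prints the word after "@" without the "@" itself.
  behind=$(printf 'ping @alice, mail bob@example.com\n' | rg -oP '(?<=@)\w+')
  # Negative lookbehind: the "happy" inside "unhappy" is skipped.
  neg=$(printf 'happy\nunhappy\n' | rg -P '(?<!un)happy')
  # Backreference: a word immediately repeated.
  doubled=$(printf 'it is the the end\n' | rg -oP '\b(\w+)\s+\1\b')
  printf '%s\n--\n' "$ahead" "$behind" "$neg" "$doubled"
fi
```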

Performance Tradeoffs

- PCRE2 is more powerful but typically slower than the default engine.
- Use PCRE2 only when you need its specific features.
- The default engine is optimized for speed and handles most use cases.

Default Engine Limitations

The default Rust regex engine provides excellent performance but does not support some advanced features:

Not supported in default engine:
- Backreferences (\1, \2, etc.)
- Lookahead and lookbehind assertions
- Possessive quantifiers (*+, ++, etc.)
- Atomic groups ((?>...))

If you need these features, use the PCRE2 engine with -P:

# This pattern requires PCRE2 for backreferences
rg -P "(\w+)\s+\1"

# This pattern requires PCRE2 for lookahead
rg -P "error(?=:)"

Troubleshooting Regex Patterns

If your regex pattern isn't working as expected and uses backreferences or lookahead/lookbehind, try adding the -P flag to enable PCRE2.

Multiline Patterns

For multiline pattern matching, ripgrep provides the -U flag. You can also use --multiline-dotall to make . match newlines in multiline mode. See the Advanced Patterns chapter for details.
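A short sketch of how these flags change what matches, assuming rg is on your PATH and a POSIX shell; the two-line input is made up. Without -U, rg rejects a pattern containing a literal \n; with -U the pattern can span lines, but "." still stops at a newline unless you add --multiline-dotall.

```shell
# Assumes rg is on PATH; the two-line input is made up for illustration.
src='start\nend\n'

# Without -U, rg searches line by line and rejects a literal \n (exit 2).
line_mode_status=0
printf "$src" | rg 'start\nend' >/dev/null 2>&1 || line_mode_status=$?

# With -U, the same pattern can match across the line break.
multi=$(printf "$src" | rg -U 'start\nend')

# In multiline mode "." still does not match \n, so this finds nothing (exit 1)...
nodotall_status=0
printf "$src" | rg -U 'start.*end' >/dev/null 2>&1 || nodotall_status=$?

# ...until --multiline-dotall lets "." match newlines too.
dotall=$(printf "$src" | rg -U --multiline-dotall 'start.*end')

printf 'line mode: %s\nno dotall: %s\n%s\n--\n%s\n' \
  "$line_mode_status" "$nodotall_status" "$multi" "$dotall"
```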