Use Regular Expressions in Python — re module, patterns, flags
Search, extract, split, and replace text reliably with Python’s re.

Python’s re module provides fast, pattern-based text processing for searching, extracting, splitting, and replacing strings. This guide shows practical methods, with steps you can copy and adapt, plus the key flags and match APIs you’ll use day to day.
Before you start
Regular expressions (regex) are patterns that describe sets of strings. In Python, patterns live in plain strings, and the backslash has special meaning both in Python string literals and in regex syntax. Use raw strings (prefix r'') for patterns so backslashes aren’t interpreted twice.
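As a small illustration of why raw strings matter, the sketch below compares the same pattern written with and without the r prefix. The sample text and variable names are made up for the example.

```python
import re

# In a normal string literal, '\b' is the single backspace character.
plain = '\bcat\b'
# In a raw string, r'\b' stays as backslash + 'b', which the regex
# engine reads as a word-boundary assertion.
raw = r'\bcat\b'

text = 'a cat sat'
plain_hit = re.search(plain, text)  # looks for backspace + 'cat' + backspace
raw_hit = re.search(raw, text)      # looks for the whole word 'cat'

print(len(plain), len(raw))  # 5 7
print(plain_hit)             # None
print(raw_hit.group(0))      # cat
```

The non-raw version compiles without complaint, which is what makes this kind of mistake easy to miss.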
Method 1 — Compile a pattern once and reuse it
Compiling is best for loops and repeated use because it avoids re-parsing the pattern each call and gives you convenient methods.
Step 1: Import re.
import re
Step 2: Compile the regex with a raw string.
email = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')
Step 3: Call the compiled pattern’s methods (e.g., search, match, findall).
m = email.search('Contact us at support@example.com.')
print(m.group(0)) # 'support@example.com'
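To show the reuse that makes compiling worthwhile, here is a minimal sketch that applies the same compiled pattern to several lines in a loop. The list of lines is hypothetical sample data.

```python
import re

# Same pattern as in Step 2, compiled once outside the loop.
email = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

# Hypothetical sample input for illustration.
lines = [
    'Contact us at support@example.com.',
    'No address on this line.',
    'Sales: sales@example.org, billing@example.org',
]

found = []
for line in lines:
    # findall returns every non-overlapping match in the line.
    found.extend(email.findall(line))

print(found)
# ['support@example.com', 'sales@example.org', 'billing@example.org']
```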
Method 2 — Use module-level functions for quick operations
For one-off checks, call re.search, re.match, or re.fullmatch directly; Python caches recent patterns internally.
Step 1: Choose the right function for position sensitivity.
search finds a match anywhere in the string.
match requires the match at the beginning.
fullmatch requires the whole string to match.
Step 2: Use raw-string patterns to avoid double escaping.
import re
txt = 'The rain in Spain'
print(bool(re.search(r'Spain', txt))) # True
print(bool(re.match(r'The', txt))) # True
print(bool(re.fullmatch(r'\w+\s\w+\s\w+\s\w+', txt))) # True (the pattern must cover all four words)
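The difference between match and fullmatch matters most when validating input. The sketch below uses a hypothetical is_five_digits helper to show how a prefix-only match can accept a string that fullmatch rejects.

```python
import re

def is_five_digits(value):
    """Return True only if the whole string is exactly five digits."""
    return re.fullmatch(r'\d{5}', value) is not None

strict_ok = is_five_digits('12345')        # True
strict_bad = is_five_digits('12345-6789')  # False: extra text after the digits

# match only checks the start, so the longer string is accepted.
prefix_only = re.match(r'\d{5}', '12345-6789') is not None  # True

# match does not scan forward: 'rain' is not at position 0.
not_at_start = re.match(r'rain', 'The rain in Spain')  # None

print(strict_ok, strict_bad, prefix_only, not_at_start)
# True False True None
```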
Method 3 — Extract multiple matches with findall or finditer
findall returns a list; finditer yields match objects lazily (better for large inputs or when you need positions).
Step 1: Use findall when you only need the matched text.
import re
numbers = re.findall(r'\d+', 'Call 65490, ext 12, ref 2025.')
print(numbers) # ['65490', '12', '2025']
Step 2: Use finditer to loop over matches and access spans and groups.
for m in re.finditer(r'\b\w+ly\b', 'Carefully but quickly.'):
    print(m.group(0), m.span())
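One detail worth knowing when choosing between the two: what findall returns depends on how many capturing groups the pattern has. The sketch below illustrates this with a made-up key=value string.

```python
import re

text = 'x=1, y=22, z=333'

# No groups: findall returns the whole matches.
whole = re.findall(r'\w=\d+', text)

# One group: findall returns only that group's text.
values = re.findall(r'\w=(\d+)', text)

# Two or more groups: findall returns a list of tuples.
pairs = re.findall(r'(\w)=(\d+)', text)

# finditer keeps the match objects, so positions remain available.
spans = [m.span() for m in re.finditer(r'\w=\d+', text)]

print(whole)   # ['x=1', 'y=22', 'z=333']
print(values)  # ['1', '22', '333']
print(pairs)   # [('x', '1'), ('y', '22'), ('z', '333')]
print(spans)   # [(0, 3), (5, 9), (11, 16)]
```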
Method 4 — Split strings by regex delimiters
re.split can use complex separators and optionally keep delimiters with capturing groups.
Step 1: Split on a pattern, not just a literal string.
import re
print(re.split(r'\W+', 'Words, words, words.')) # ['Words', 'words', 'words', '']
Step 2: Capture the delimiter to keep it in the result.
print(re.split(r'(\W+)', 'Words, words, words.'))
# ['Words', ', ', 'words', ', ', 'words', '.', '']
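Two common follow-ups to the splits above are limiting the number of splits and dropping the empty string left by a trailing delimiter. A minimal sketch, using the maxsplit parameter of re.split and a list comprehension:

```python
import re

text = 'Words, words, words.'

# maxsplit=1 splits at the first delimiter only; the rest stays intact.
first_only = re.split(r'\W+', text, maxsplit=1)

# Filter out empty strings produced by a delimiter at the end.
no_empties = [part for part in re.split(r'\W+', text) if part]

print(first_only)  # ['Words', 'words, words.']
print(no_empties)  # ['Words', 'words', 'words']
```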
Method 5 — Replace text with sub and count changes with subn
Regex-based replacement can use backreferences or a function for custom logic.
Step 1: Replace matches with a string (optionally limit count).
import re
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE))
# 'Baked Beans & Spam'
Step 2: Replace using a function for computed results.
def to_hex(m):
    return hex(int(m.group(0)))
print(re.sub(r'\d+', to_hex, 'Call 65490, ref 49152.'))
# 'Call 0xffd2, ref 0xc000.'
Step 3: Use subn to also get the number of replacements.
new_text, n = re.subn(r'\s+', ' ', 'Too   many    spaces')
print(new_text, n) # 'Too many spaces', 2
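The steps above mention backreferences and count without showing them. The following sketch rearranges date parts with numbered and named backreferences in the replacement string; the sample sentence and group names are invented for illustration.

```python
import re

text = 'Released 2025-01-31, patched 2025-02-14.'

# Numbered backreferences: \1, \2, \3 refer to the captured groups.
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', text)

# Named groups are referenced as \g<name>; count=1 stops after one replacement.
first_only = re.sub(
    r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})',
    r'\g<d>.\g<m>.\g<y>',
    text,
    count=1,
)

print(swapped)     # Released 31/01/2025, patched 14/02/2025.
print(first_only)  # Released 31.01.2025, patched 2025-02-14.
```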
Method 6 — Work with match objects: group, span, start, end
Match objects hold positions and captured groups so you can extract structured data.
Step 1: Capture groups and access them by number or name.
import re
m = re.search(r'(?P<month>[A-Za-z]+) (?P<day>\d+)', 'Born June 24')
print(m.group(0), m.group('month'), m.group(2)) # 'June 24', 'June', '24'
Step 2: Get index positions to slice the original string.
s = 'The rain in Spain'
m = re.search(r'\bS\w+', s)
print(m.span(), s[m.start():m.end()]) # (12, 17), 'Spain'
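The examples above assume a match is found. Because search returns None when nothing matches, real code usually checks before calling group. Here is a small sketch with a hypothetical parse_date helper that also shows groups() and groupdict().

```python
import re

pattern = re.compile(r'(?P<month>[A-Za-z]+) (?P<day>\d+)')

def parse_date(text):
    """Return the named groups as a dict, or None if there is no match."""
    m = pattern.search(text)
    if m is None:  # calling m.group() on None would raise AttributeError
        return None
    return m.groupdict()

hit = parse_date('Born June 24')
miss = parse_date('no date here')
all_groups = pattern.search('Born June 24').groups()

print(hit)         # {'month': 'June', 'day': '24'}
print(miss)        # None
print(all_groups)  # ('June', '24')
```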
Method 7 — Apply flags for case, multiline, dot matches, and readability
Flags modify how patterns behave. Common ones are re.I (ignore case), re.M (multiline anchors), re.S (dot matches newline), and re.X (verbose pattern layout).
Step 1: Use case-insensitive matching with re.IGNORECASE.
import re
print(bool(re.search(r'foo', 'FOO', flags=re.I))) # True
Step 2: Anchor on each line with re.MULTILINE.
print([m.group(0) for m in re.finditer(r'^bar', 'foo\nbar\nbaz', flags=re.M)])
# ['bar']
Step 3: Let dot match newlines with re.DOTALL.
print(bool(re.search(r'foo.*bar', 'foo\nbar', flags=re.S))) # True
Step 4: Format complex patterns with re.VERBOSE.
phone = re.compile(r'''
^(\(\d{3}\))? # optional area code
\s* # optional spaces
\d{3}[-.] # prefix
\d{4}$ # line number
''', re.X)
print(bool(phone.search('(712) 414-9229'))) # True
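Flags can also be combined. The sketch below shows two equivalent ways to apply ignore-case and multiline together: joining flag constants with | and placing inline flags at the start of the pattern. The sample text is made up.

```python
import re

text = 'Foo\nbar\nFOO again'

# Combine flag constants with the bitwise OR operator.
combined = re.findall(r'^foo', text, flags=re.I | re.M)

# Inline form: (?im) at the very start of the pattern sets the same flags.
inline = re.findall(r'(?im)^foo', text)

print(combined)  # ['Foo', 'FOO']
print(inline)    # ['Foo', 'FOO']
```

The inline form can be handy when the pattern is stored as data, such as in a config file, and the calling code cannot pass flags separately.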
Method 8 — Use groups, named groups, and backreferences
Grouping structures data and enables reuse within the same pattern.
Step 1: Group alternatives and repeat them as a unit.
import re
m = re.search(r'(foo|bar)+', 'barfoo')
print(m.group(0)) # 'barfoo'
Step 2: Name groups for clearer access.
m = re.search(r'(?P<user>[\w.-]+)@(?P<host>[\w.-]+)', 'alice@example.com')
print(m.group('user'), m.group('host'))
Step 3: Reuse a previous group with a backreference.
m = re.search(r'(\w+),\1', 'foo,foo')
print(bool(m)) # True
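Two related features round out the steps above: named backreferences, written (?P=name), and non-capturing groups, written (?:...). A brief sketch with invented sample strings:

```python
import re

# Named backreference: (?P=word) must repeat what group 'word' captured.
doubled = re.search(r'\b(?P<word>\w+) (?P=word)\b', 'It is is fine')

# A repeated capturing group keeps only its last repetition,
# and findall returns the group rather than the whole match.
capturing = re.findall(r'(foo|bar)+', 'barfoo baz foo')

# A non-capturing group repeats the same way without creating a group,
# so findall returns the whole matches.
non_capturing = re.findall(r'(?:foo|bar)+', 'barfoo baz foo')

print(doubled.group('word'))  # is
print(capturing)              # ['foo', 'foo']
print(non_capturing)          # ['barfoo', 'foo']
```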
Method 9 — Process files line by line with a compiled regex
For large files, stream lines and apply a precompiled pattern to keep memory usage low and speed consistent.
Step 1: Compile the pattern once before iterating the file.
import re
call = re.compile(r'f\(\s*([^,]+)\s*,\s*([^,]+)\s*\)')
Step 2: Iterate lines and use search or findall per line.
with open('input.txt', encoding='utf-8') as fh:
    for line in fh:
        for a, b in call.findall(line):
            print(a, b)
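To try the same loop without a file on disk, the sketch below substitutes io.StringIO for open and records line numbers alongside the captured arguments. The file contents are hypothetical. Note that [^,]+ can capture surrounding spaces, so the groups are stripped.

```python
import io
import re

call = re.compile(r'f\(\s*([^,]+)\s*,\s*([^,]+)\s*\)')

# Stand-in for open('input.txt'); the contents are invented for the example.
fh = io.StringIO('x = f(a, b)\nno calls here\ny = f(1,2) + f( u , v )\n')

results = []
for lineno, line in enumerate(fh, start=1):
    # finditer yields one match object per call found on the line.
    for m in call.finditer(line):
        results.append((lineno, m.group(1).strip(), m.group(2).strip()))

print(results)
# [(1, 'a', 'b'), (3, '1', '2'), (3, 'u', 'v')]
```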
Method 10 — Prefer string methods for simple tasks
For simple fixed-string operations, built-in string methods and operators are clearer and typically faster than regex.
Step 1: Use in, str.find, or str.replace when no pattern logic is required.
s = 'swordfish'
print('fish' in s) # True
print(s.replace('fish', 'pie')) # 'swordpie'
Step 2: Switch to regex only when you need pattern features like character classes, quantifiers, or anchors.
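As a rough guide to where that line falls, the sketch below handles a hypothetical filename with string methods for the fixed-text parts and uses a regex only for the part that needs a pattern.

```python
import re

filename = 'report_2025.csv'  # hypothetical example value

# Fixed text: string methods are enough.
is_csv = filename.endswith('.csv')
stem, _, extension = filename.rpartition('.')

# "Any four digits" is a pattern, so a regex is the better fit here.
year_match = re.search(r'\d{4}', filename)
year = year_match.group(0) if year_match else None

print(is_csv)           # True
print(stem, extension)  # report_2025 csv
print(year)             # 2025
```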
Method 11 — Write reliable patterns with raw strings and debug tools
Escaping can be tricky; raw strings and debugging output keep patterns correct and maintainable.
Step 1: Always write regexes as raw strings to avoid accidental escapes.
path = r'C:\Users\Name' # same text as 'C:\\Users\\Name', without doubled backslashes
Step 2: Use re.DEBUG to inspect how a pattern is parsed.
import re
re.search(r'foo.bar', 'fooxbar', flags=re.DEBUG) # prints tokenization
Step 3: Lay out complex patterns with re.VERBOSE and comments for long-term readability.
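Another way to catch pattern mistakes early is to compile the pattern inside a try block: an invalid pattern raises re.error, which carries a message and a position. The check_pattern helper below is a hypothetical name used only for this sketch.

```python
import re

def check_pattern(pattern):
    """Return 'ok' if the pattern compiles, else a short error description."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # exc.msg is the parser's message; exc.pos is the index in the
        # pattern where the problem was detected.
        return f'invalid: {exc.msg} (position {exc.pos})'
    return 'ok'

good = check_pattern(r'foo.bar')
bad = check_pattern(r'(unclosed')

print(good)
print(bad)
```

This is most useful when patterns come from users or configuration rather than from source code.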
Quick reference — useful tokens
Common metacharacters you’ll use frequently:
.: any char except newline; re.S makes it include newline.
\d \w \s: digits, word chars, whitespace (and \D \W \S are the opposites).
^ $: start/end of string; with re.M they also match line starts/ends.
* + ? {m,n}: quantifiers for repetition (append ? for non-greedy).
[]: character class, |: alternation, (): grouping, (?:): non-capturing group.
\A \Z \b \B: anchors for start, end, word boundary, and non-boundary.
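Two entries in the list above are easier to see in action: greedy versus non-greedy quantifiers, and the difference between ^ and \A under re.M. A short sketch with made-up inputs:

```python
import re

tags = '<a><b>'

# Greedy: .+ takes as much as it can, so both tags come back as one match.
greedy = re.findall(r'<.+>', tags)

# Non-greedy: .+? stops at the first closing bracket.
lazy = re.findall(r'<.+?>', tags)

text = 'one\ntwo'

# With re.M, ^ matches at the start of every line.
line_starts = re.findall(r'^\w+', text, flags=re.M)

# \A matches only at the start of the string, even with re.M.
string_start = re.findall(r'\A\w+', text, flags=re.M)

print(greedy)        # ['<a><b>']
print(lazy)          # ['<a>', '<b>']
print(line_starts)   # ['one', 'two']
print(string_start)  # ['one']
```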
Regex gives you precise, repeatable control over text processing in Python. Start with compiled patterns for loops, prefer string methods for simple tasks, and lean on flags, groups, and match objects to keep your code both robust and readable.