Regular Expression

Let’s explore regular expressions with confidence.

Regular expressions (RegEx) are one of the most powerful tools in a developer’s toolkit for working with text. Whether you’re validating user input, parsing log files, or cleaning up data, understanding RegEx can save you hours of manual work.

Regular Expression

A regular expression is a pattern that describes a set of strings. It’s used for matching, searching, replacing, and extracting text based on patterns rather than fixed words.
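
A minimal sketch of those four uses with Python's built-in `re` module. The sample text and variable names are made up for illustration:

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Order 66 shipped, order 7 pending"

# Matching: does the whole string consist only of digits?
is_number = re.fullmatch(r"[0-9]+", "2025") is not None

# Searching: find the first run of digits anywhere in the text.
first = re.search(r"[0-9]+", text)
first_number = first.group() if first else None

# Replacing: mask every run of digits.
masked = re.sub(r"[0-9]+", "#", text)

# Extracting: collect every run of digits.
numbers = re.findall(r"[0-9]+", text)

print(is_number, first_number, masked, numbers)
```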

Common RegEx Syntax

| Pattern | Meaning | Example |
| --- | --- | --- |
| `.` | Any character except newline | `a.b` matches `acb`, `a1b`, etc. |
| `*` | Zero or more repetitions of the preceding element | `lo*` matches `l`, `lo`, `loo`, … |
| `+` | One or more repetitions | `go+gle` matches `gogle`, `google` |
| `?` | Zero or one occurrence | `colou?r` matches `color` or `colour` |
| `[]` | A set of characters | `[abc]` matches `a`, `b`, or `c` |
| `[^]` | Negated set | `[^0-9]` matches any non-digit |
| `^` | Start of string | `^Hello` matches strings that start with "Hello" |
| `$` | End of string | `end$` matches strings that end with "end" |
| `{n}` | Exactly n times | `a{3}` matches `aaa` |
| `{n,m}` | Between n and m times | `a{2,4}` matches `aa`, `aaa`, or `aaaa` |
| `\|` | OR (alternation) | `cat\|dog` matches `cat` or `dog` |
| `()` | Grouping | `(ab)+` matches `ab`, `abab`, … |
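
One way to check the rows of the table above is to run each pattern through `re.fullmatch`. This is only a sketch: the matching and non-matching sample strings are illustrative choices, not part of the original table.

```python
import re

# Each entry: (pattern, string that should match, string that should not).
# The sample strings are illustrative choices, not taken from any dataset.
cases = [
    (r"a.b", "a1b", "ab"),
    (r"lo*", "loo", "la"),
    (r"go+gle", "google", "ggle"),
    (r"colou?r", "color", "colouur"),
    (r"[abc]", "b", "d"),
    (r"[^0-9]", "x", "7"),
    (r"a{2,4}", "aaa", "aaaaa"),
    (r"(ab)+", "abab", "aba"),
]

results = []
for pattern, good, bad in cases:
    # fullmatch requires the entire string to fit the pattern.
    ok = re.fullmatch(pattern, good) is not None
    rejected = re.fullmatch(pattern, bad) is None
    results.append((pattern, ok, rejected))
    print(f"{pattern!r}: matches {good!r}={ok}, rejects {bad!r}={rejected}")
```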

Backslash \

The backslash \ is the escape character in regular expressions. It has two main purposes:

  • Case 1) Placed before certain ordinary letters, it forms a special sequence that matches a specific type of character or position:

| Pattern | Meaning | Matches |
| --- | --- | --- |
| `\d` | Digit | Any digit (0–9) |
| `\w` | Word character | Letters, digits, and underscore (`[a-zA-Z0-9_]` for ASCII text) |
| `\s` | Whitespace | Space, tab, newline, etc. |
| `\b` | Word boundary | The edge of a word (a position, not a character) |
| `\n` | Newline | Line break |
| `\t` | Tab | Tab character |

  • Case 2) Placed before a character that has a special meaning (such as . or *), it removes that meaning so the character is matched literally:

| Pattern | Meaning |
| --- | --- |
| `.` | Any character (except newline) |
| `\.` | A literal dot (.) |
| `\*` | A literal asterisk (*) |
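
A short sketch of the difference between an escaped and an unescaped dot. The sample strings are invented for illustration; `re.escape` is the standard-library helper that escapes a literal string for you.

```python
import re

text = "file.txt filextxt"

# Unescaped dot: "." matches any character, so both words match.
loose = re.findall(r"file.txt", text)

# Escaped dot: only the literal "file.txt" matches.
strict = re.findall(r"file\.txt", text)

# re.escape builds an escaped pattern from a literal string,
# so the "+" below is matched as a plain plus sign.
escaped = re.escape("1+1=2")
found = re.search(escaped, "so 1+1=2 holds") is not None

print(loose)   # ['file.txt', 'filextxt']
print(strict)  # ['file.txt']
print(found)   # True
```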

Why Use r"..." in Python?

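In Python string literals, the backslash also introduces escape sequences such as \n (newline) and \t (tab), so Python processes the string before the regex engine ever sees it. A sequence like \d is not a valid string escape: Python currently keeps the backslash, but it emits a warning (DeprecationWarning since Python 3.6, SyntaxWarning since Python 3.12). Worse, a valid escape such as \b silently becomes a backspace character instead of a word boundary. A raw string (r"...") leaves backslashes untouched, so use one for every pattern: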

pattern = "\d+"   # Works for now, but triggers an invalid-escape warning

pattern = r"\d+"  # Raw string: the backslash reaches the regex engine unchanged
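
The `\b` case is where a missing `r` prefix changes behavior without any warning. A small sketch, with a made-up sample sentence:

```python
import re

# In a normal string, "\b" is the backspace character (one character).
plain = "\bcat\b"
# In a raw string, the backslash is kept, so the regex engine sees \b.
raw = r"\bcat\b"

print(len(plain), len(raw))  # 5 7

text = "a cat sat"
print(re.search(plain, text))        # None: it looks for backspace characters
print(re.search(raw, text).group())  # cat
```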

Examples

| Pattern | Description |
| --- | --- |
| `^\d{4}-\d{2}-\d{2}$` | Date shaped like YYYY-MM-DD (checks the format only, not whether the date exists) |
| `^[a-zA-Z0-9_]{4,12}$` | Username: 4–12 characters, letters/numbers/underscores |
| `^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$` | Basic email address shape |
| `^https?://[^\s]+$` | URL starting with http or https |
| `(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}` | Password: min 8 chars, must include uppercase, lowercase, and digit |
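
As a rough sketch, some of these patterns could be wrapped in a small validation helper. The dictionary keys and the `is_valid` function are hypothetical names made up here, and the test values are invented.

```python
import re

# Patterns copied from the table above, without ^ and $ because
# re.fullmatch already requires the whole string to match.
PATTERNS = {
    "date": r"\d{4}-\d{2}-\d{2}",
    "username": r"[a-zA-Z0-9_]{4,12}",
    "url": r"https?://[^\s]+",
    "password": r"(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}",
}

def is_valid(kind: str, value: str) -> bool:
    """Return True if the whole value fits the named pattern."""
    return re.fullmatch(PATTERNS[kind], value) is not None

print(is_valid("date", "2025-06-01"))     # True
print(is_valid("date", "2025-6-1"))       # False
print(is_valid("date", "2025-99-99"))     # True: shape only, not a calendar check
print(is_valid("username", "jay_01"))     # True
print(is_valid("url", "https://example.com/a?b=1"))  # True
print(is_valid("password", "Passw0rdX"))  # True
print(is_valid("password", "password1"))  # False: no uppercase letter
```

The `2025-99-99` case shows why a regex alone only validates the shape of a value; checking that a date really exists needs a separate step.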
import re

pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

text = "Contact: jaoneol@gmail.com"
match = re.search(pattern, text)
if match:
    print("Found:", match.group()) # jaoneol@gmail.com

text = "Contact: jaon=eol@gmail.com"
match = re.search(pattern, text)
if match:
    print("Found:", match.group()) # eol@gmail.com
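import re

# "번호" is Korean for "number"; the class [번호\s]* consumes any of
# those characters (or whitespace) that directly follow the digits.
pattern = r"(\d+)[번호\s]*"

text = "13번으로 하자"  # Korean for "Let's go with number 13"
match = re.search(pattern, text)
if match:
    print(match.group(0))  # "13번" (the whole match)
    print(match.group(1))  # "13" (the first capturing group)
    print(match.group(2))  # raises IndexError: no such group (the pattern has only one)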
import re

pattern = r"(\d+)-([a-z]+)"

text = "123-abc"
match = re.search(pattern, text)
if match:
    print(match.group(0)) # 123-abc
    print(match.group(1)) # 123
    print(match.group(2)) # abc
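
Groups can also be read all at once or by name. The sketch below reuses the same pattern shape; the group names `number` and `word` are hypothetical labels added for illustration.

```python
import re

# Same pattern shape as above, with hypothetical group names added.
pattern = r"(?P<number>\d+)-(?P<word>[a-z]+)"

match = re.search(pattern, "123-abc")
if match:
    print(match.groups())         # ('123', 'abc')
    print(match.group("number"))  # 123
    print(match.group("word"))    # abc
    print(match.groupdict())      # {'number': '123', 'word': 'abc'}
```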
import re

pattern = r"[abc]"

text = "hello bat"
match = re.search(pattern, text)
if match:
    print(match.group()) # b

Common RegEx Functions in Python

| Function | Description | Example |
| --- | --- | --- |
| `re.search()` | Returns the first match anywhere in the string | `re.search(r"\d+", "Age: 25")` → match `25` |
| `re.match()` | Matches a pattern only at the beginning of the string | `re.match(r"\d+", "123abc")` → match `123`; `re.match(r"\d+", "abc123")` → `None` |
| `re.fullmatch()` | Matches the entire string | `re.fullmatch(r"\d+", "123")` → match `123`; `re.fullmatch(r"\d+", "123abc")` → `None` |
| `re.findall()` | Returns all non-overlapping matches as a list | `re.findall(r"\d+", "a1b22c333")` → `['1', '22', '333']` |
| `re.finditer()` | Returns an iterator of match objects (useful with positions) | `for m in re.finditer(r"\d+", text): print(m.group())` |
| `re.sub()` | Substitutes matches with a replacement string | `re.sub(r"\d+", "#", "a1b22c")` → `'a#b#c'` |
| `re.split()` | Splits the string by the pattern | `re.split(r"[,;]", "a,b;c")` → `['a', 'b', 'c']` |

Example

  1. re.search() Find the first match anywhere in the string.
import re

pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
text = "jaoneol@gmail.com"
match = re.search(pattern, text)
if match:
    print(match.group(0))  # Output: "jaoneol@gmail.com"

pattern = r"\d{4}-\d{2}-\d{2}"
text = "Today's date is 2025-06-01."
match = re.search(pattern, text)
if match:
    print(match.group(0))  # Output: "2025-06-01"

pattern = r"\d{2}-\d{2}-\d{2}"
text = "Today's date is 2025-06-01."
match = re.search(pattern, text)
if match:
    print(match.group(0))  # Output: "25-06-01"
  2. re.match() Match must start at the beginning of the string.
pattern = r"\d{4}-\d{2}-\d{2}"
text = "2025-06-01 is the deadline"
match = re.match(pattern, text)
if match:
    print(match.group(0))  # Output: "2025-06-01"

pattern = r"\d{2}-\d{2}-\d{2}"
text = "Today's date is 2025-06-01."
match = re.match(pattern, text)  # the text starts with "Today", not with digits
print(match)  # Output: None
  3. re.fullmatch() Match must cover the entire string.
pattern = r"\d{4}-\d{2}-\d{2}"
text = "2025-06-01"
match = re.fullmatch(pattern, text)
if match:
    print(match.group(0))  # Output: "2025-06-01"

pattern = r"\d{4}-\d{2}-\d{2}"
text = "2025-06-01 today"
match = re.fullmatch(pattern, text)
print(match)  # Output: None
  4. re.findall() Return all matches as a list of strings (or groups).
pattern = r"\d+"
text = "I have 3 apples and 15 bananas."
matches = re.findall(pattern, text)
print(matches)  # Output: ['3', '15']
  5. re.finditer() Return an iterator of match objects (with position info).
pattern = r"\d+"
text = "Order 12, then 345, and 6789."
matches = re.finditer(pattern, text)
for match in matches:
    print(match.group(0), "at", match.span())
# Output:
# 12 at (6, 8)
# 345 at (15, 18)
# 6789 at (24, 28)
  6. re.sub() Replace all matches with another string.
pattern = r"\d+"
text = "Replace 123 and 456 with X"
result = re.sub(pattern, "X", text)
print(result)  # Output: "Replace X and X with X"
  7. re.split() Split string using matches as delimiters.
pattern = r"[,\s]+"  # split by comma or whitespace
text = "apple, banana  cherry   mango"
parts = re.split(pattern, text)
print(parts)  # Output: ['apple', 'banana', 'cherry', 'mango']
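
To compare `re.search()`, `re.match()`, and `re.fullmatch()` side by side, here is a small sketch that runs the same date pattern against three invented sample strings. It also shows the "(or groups)" behavior of `re.findall()` when the pattern contains capturing groups.

```python
import re

pattern = r"\d{4}-\d{2}-\d{2}"
samples = ["2025-06-01", "2025-06-01 today", "due 2025-06-01"]

summary = {}
for text in samples:
    summary[text] = (
        re.search(pattern, text) is not None,     # anywhere in the string
        re.match(pattern, text) is not None,      # at the start only
        re.fullmatch(pattern, text) is not None,  # the whole string
    )
    print(text, summary[text])

# With capturing groups, findall returns tuples of the groups
# instead of the full matched strings.
pairs = re.findall(r"(\d{4})-(\d{2})", "2025-06 and 2024-12")
print(pairs)  # [('2025', '06'), ('2024', '12')]
```

Only the first sample passes all three checks; the second fails `fullmatch` because of the trailing text, and the third is found by `search` alone.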
This post is licensed under CC BY 4.0 by the author.