Regular expressions (RegEx) are one of the most powerful tools in a developer’s toolkit for working with text. Whether you’re validating user input, parsing log files, or cleaning up data, understanding RegEx can save you hours of manual work.
Regular Expression
A regular expression is a pattern that describes a set of strings. It’s used for matching, searching, replacing, and extracting text based on patterns rather than fixed words.
Common RegEx Syntax
Pattern | Meaning | Example |
---|---|---|
. | Any character except newline | a.b matches acb, a1b, etc. |
* | Zero or more repetitions of the preceding element | lo* matches l, lo, loo, … |
+ | One or more repetitions of the preceding element | go+gle matches gogle, google, gooogle, … |
? | Zero or one occurrence of the preceding element | colou?r matches color or colour |
[] | A set of characters | [abc] matches a, b, or c |
[^] | A negated set | [^0-9] matches any non-digit character |
^ | Start of string | ^Hello matches strings that start with “Hello” |
$ | End of string | end$ matches strings that end with “end” |
{n} | Exactly n times | a{3} matches aaa |
{n,m} | Between n and m times (inclusive) | a{2,4} matches aa, aaa, or aaaa |
| | OR (alternation) | cat|dog matches cat or dog |
() | Grouping | (ab)+ matches ab, abab, … |
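As a quick check of the table above, the sketch below runs a few of these patterns through Python's `re.fullmatch()`. The helper name `matches` is made up for this illustration and is not part of the `re` module.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"a.b", "a1b"))        # True
print(matches(r"lo*", "l"))          # True: * allows zero "o" characters
print(matches(r"go+gle", "ggle"))    # False: + needs at least one "o"
print(matches(r"colou?r", "color"))  # True: the "u" is optional
print(matches(r"a{2,4}", "aaaaa"))   # False: five a's exceed the {2,4} range
print(matches(r"cat|dog", "dog"))    # True
print(matches(r"(ab)+", "abab"))     # True
```

`re.fullmatch()` is used here so that each result reflects the whole string rather than a partial hit somewhere inside it.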
The Backslash \
The backslash \ in regular expressions is an escape character. It has two main purposes:
- Case 1: It turns ordinary characters into special sequences that match specific types of characters:
Pattern | Meaning | Matches |
---|---|---|
\d | Digit | Any digit (0–9) |
\w | Word character | Letters, digits, and underscore ([a-zA-Z0-9_] for ASCII text) |
\s | Whitespace | Space, tab, newline, etc. |
\b | Word boundary | The position at the edge of a word (consumes no characters) |
\n | Newline | Line break |
\t | Tab | Tab character |
- Case 2: It turns special characters into literal ones. To match a metacharacter such as . or * literally, escape it with a backslash.
Pattern | Meaning |
---|---|
. | Any character (except newline) |
\. | A literal dot (.) |
\* | A literal asterisk (*) |
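Both uses of the backslash can be tried side by side. The following is a minimal sketch; the sample text and variable names are invented for illustration. It also uses `re.escape()`, a standard-library function that adds the escaping backslashes for you.

```python
import re

text = "file.txt costs 3*2 dollars"  # made-up sample text

# Case 1: \d and \w are special sequences.
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
print(digits)  # ['3', '2']
print(words)   # ['file', 'txt', 'costs', '3', '2', 'dollars']

# Case 2: an unescaped dot matches any character, an escaped dot only ".".
loose = re.fullmatch(r"file.txt", "fileXtxt") is not None
strict = re.fullmatch(r"file\.txt", "fileXtxt") is not None
print(loose, strict)  # True False

# re.escape() escapes metacharacters such as "*" in arbitrary text.
found = re.search(re.escape("3*2"), text)
print(found.group())  # 3*2
```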
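Why Use r"..." in Python?
In ordinary Python string literals, the backslash is also used for escape sequences such as \n (newline) and \t (tab). Python does not recognize \d, so it treats it as an invalid escape sequence: the backslash is kept, but recent Python versions emit a warning (DeprecationWarning since 3.6, SyntaxWarning since 3.12). Worse, a sequence such as \b is a valid Python escape (backspace), so it is silently converted before the regex engine ever sees it. To avoid both problems, write patterns as raw strings: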
```python
pattern = "\d+"   # Invalid escape sequence: still works, but triggers a warning
pattern = r"\d+"  # Raw string: the backslash is passed to the regex engine as-is
```
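The difference is easiest to see with `\b`, which Python itself interprets as a backspace character in a normal string literal. The sketch below uses an invented sample sentence to compare the two spellings.

```python
import re

# In a normal string literal, "\b" is one character (backspace).
print(len("\b"))   # 1
# In a raw string, r"\b" is two characters: a backslash and "b".
print(len(r"\b"))  # 2

text = "cat concatenate cat"  # made-up sample text

# Without the r prefix, the regex engine looks for literal backspace characters.
plain_hits = re.findall("\bcat\b", text)
# With the r prefix, \b reaches the regex engine and means "word boundary".
raw_hits = re.findall(r"\bcat\b", text)

print(plain_hits)  # []
print(raw_hits)    # ['cat', 'cat']
```

Note that the non-raw version fails silently: there is no warning, just an empty result.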
Examples
Pattern | Description |
---|---|
^\d{4}-\d{2}-\d{2}$ | Date in YYYY-MM-DD format (checks the shape only, not whether the date exists) |
^[a-zA-Z0-9_]{4,12}$ | Username: 4–12 characters, letters/numbers/underscores |
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ | Email address (a simplified check, not full RFC validation) |
^https?://[^\s]+$ | URL starting with http:// or https:// |
^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$ | Password: min 8 chars, must include uppercase, lowercase, and digit |
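One possible way to use such patterns for input validation is sketched below. The names `PATTERNS` and `is_valid` are hypothetical, and the password pattern is anchored with ^ and $ here so that it validates the whole string.

```python
import re

# Patterns based on the table above; the dictionary keys are illustrative names.
PATTERNS = {
    "date": r"^\d{4}-\d{2}-\d{2}$",
    "username": r"^[a-zA-Z0-9_]{4,12}$",
    "password": r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$",
}

def is_valid(kind: str, value: str) -> bool:
    """Hypothetical helper: True when value matches the pattern for kind."""
    return re.fullmatch(PATTERNS[kind], value) is not None

print(is_valid("date", "2025-06-01"))     # True
print(is_valid("date", "2025-13-45"))     # True: only the shape is checked
print(is_valid("username", "ab"))         # False: shorter than 4 characters
print(is_valid("password", "Passw0rd"))   # True
print(is_valid("password", "password1"))  # False: no uppercase letter
```

The helper calls `re.fullmatch()` rather than relying on `$` alone, because `$` also matches just before a trailing newline.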
```python
import re

pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

text = "Contact: jaoneol@gmail.com"
match = re.search(pattern, text)
if match:
    print("Found:", match.group())  # Found: jaoneol@gmail.com

# "=" is not in the allowed character set, so the match starts after it.
text = "Contact: jaon=eol@gmail.com"
match = re.search(pattern, text)
if match:
    print("Found:", match.group())  # Found: eol@gmail.com
```
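```python
import re

# "번" and "호" are Korean suffixes meaning "number"; the character class
# absorbs them (and any whitespace) after the digits.
pattern = r"(\d+)[번호\s]*"
text = "13번으로 하자"  # "Let's go with number 13"

match = re.search(pattern, text)
if match:
    print(match.group(0))  # 13번 (the whole match)
    print(match.group(1))  # 13 (the first capturing group)
    try:
        match.group(2)     # The pattern has only one group
    except IndexError as err:
        print(err)         # no such group
```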
```python
import re

pattern = r"(\d+)-([a-z]+)"
text = "123-abc"

match = re.search(pattern, text)
if match:
    print(match.group(0))  # 123-abc (the whole match)
    print(match.group(1))  # 123 (first group)
    print(match.group(2))  # abc (second group)
```
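When a pattern has several groups, numbering them by position can become hard to read. As an optional variation on the example above, the sketch below gives the groups names with the `(?P<name>...)` syntax; the names `number` and `word` are chosen for illustration only.

```python
import re

# Same pattern as before, but each group now has a name.
pattern = r"(?P<number>\d+)-(?P<word>[a-z]+)"
match = re.search(pattern, "123-abc")

if match:
    print(match.groups())         # ('123', 'abc')
    print(match.group("number"))  # 123
    print(match.group("word"))    # abc
    print(match.groupdict())      # {'number': '123', 'word': 'abc'}
```

Named groups can still be read by position, so `match.group(1)` returns the same value as `match.group("number")`.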
```python
import re

pattern = r"[abc]"
text = "hello bat"

match = re.search(pattern, text)
if match:
    print(match.group())  # b (the first character from the set found in the text)
```
Common RegEx Functions in Python
Function | Description | Example |
---|---|---|
re.search() | Returns the first match anywhere in the string | re.search(r"\d+", "Age: 25") → Match: 25 |
re.match() | Matches a pattern only at the beginning of the string | re.match(r"\d+", "123abc") → Match: 123; re.match(r"\d+", "abc123") → None |
re.fullmatch() | Matches the entire string | re.fullmatch(r"\d+", "123") → Match: 123; re.fullmatch(r"\d+", "123abc") → None |
re.findall() | Returns all non-overlapping matches as a list | re.findall(r"\d+", "a1b22c333") → ['1', '22', '333'] |
re.finditer() | Returns an iterator of match objects (useful with positions) | for m in re.finditer(r"\d+", text): print(m.group()) |
re.sub() | Substitutes matches with a replacement string | re.sub(r"\d+", "#", "a1b22c") → 'a#b#c' |
re.split() | Splits the string by the pattern | re.split(r"[,;]", "a,b;c") → ['a', 'b', 'c'] |
Example
re.search()
Find the first match anywhere in the string.
```python
import re

# Anchored pattern: the whole string must be an email address.
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
text = "jaoneol@gmail.com"
match = re.search(pattern, text)
if match:
    print(match.group(0))  # Output: "jaoneol@gmail.com"

pattern = r"\d{4}-\d{2}-\d{2}"
text = "Today's date is 2025-06-01."
match = re.search(pattern, text)
if match:
    print(match.group(0))  # Output: "2025-06-01"

# With only two leading digits, the search matches inside the year.
pattern = r"\d{2}-\d{2}-\d{2}"
text = "Today's date is 2025-06-01."
match = re.search(pattern, text)
if match:
    print(match.group(0))  # Output: "25-06-01"
```
re.match()
Match must start at the beginning of the string.
```python
import re

pattern = r"\d{4}-\d{2}-\d{2}"
text = "2025-06-01 is the deadline"
match = re.match(pattern, text)
if match:
    print(match.group(0))  # Output: "2025-06-01"

# The string starts with "Today", not with digits, so re.match() returns None.
pattern = r"\d{2}-\d{2}-\d{2}"
text = "Today's date is 2025-06-01."
match = re.match(pattern, text)
print(match)  # Output: None
```
re.fullmatch()
Match must cover the entire string.
```python
import re

pattern = r"\d{4}-\d{2}-\d{2}"
text = "2025-06-01"
match = re.fullmatch(pattern, text)
if match:
    print(match.group(0))  # Output: "2025-06-01"

# Extra text after the date prevents a full match.
pattern = r"\d{4}-\d{2}-\d{2}"
text = "2025-06-01 today"
match = re.fullmatch(pattern, text)
print(match)  # Output: None
```
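To see how re.search(), re.match(), and re.fullmatch() differ, it can help to run all three against the same inputs. The sketch below does that with the inputs from the function table; the helper name `compare` is hypothetical.

```python
import re

def compare(pattern: str, text: str) -> dict:
    """Hypothetical helper: report which of the three functions find a match."""
    return {
        "search": re.search(pattern, text) is not None,
        "match": re.match(pattern, text) is not None,
        "fullmatch": re.fullmatch(pattern, text) is not None,
    }

for text in ["123", "123abc", "abc123"]:
    print(text, compare(r"\d+", text))

# 123    -> search: True, match: True,  fullmatch: True
# 123abc -> search: True, match: True,  fullmatch: False
# abc123 -> search: True, match: False, fullmatch: False
```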
re.findall()
Return all non-overlapping matches as a list of strings (or, if the pattern contains groups, a list of the captured groups).
```python
import re

pattern = r"\d+"
text = "I have 3 apples and 15 bananas."
matches = re.findall(pattern, text)
print(matches)  # Output: ['3', '15']
```
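What re.findall() returns depends on how many capturing groups the pattern has. The following sketch shows the three cases on an invented sample string.

```python
import re

text = "alice=30, bob=25"  # made-up sample text

# No groups: each item is the whole match.
whole = re.findall(r"\w+=\d+", text)
print(whole)  # ['alice=30', 'bob=25']

# One group: each item is just that group.
ages = re.findall(r"\w+=(\d+)", text)
print(ages)   # ['30', '25']

# Two or more groups: each item is a tuple of the groups.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)  # [('alice', '30'), ('bob', '25')]
```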
re.finditer()
Return an iterator of match objects (with position info).
```python
import re

pattern = r"\d+"
text = "Order 12, then 345, and 6789."
matches = re.finditer(pattern, text)
for match in matches:
    # span() returns (start, end) with an exclusive end index
    print(match.group(0), "at", match.span())
# Output:
# 12 at (6, 8)
# 345 at (15, 18)
# 6789 at (24, 28)
```
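The positions reported by match objects are ordinary string indices, so they can be used for slicing or highlighting. As a small sketch of that idea, the code below draws carets under each number in the same sample sentence; the variable names are illustrative.

```python
import re

text = "Order 12, then 345, and 6789."
marker = [" "] * len(text)

for m in re.finditer(r"\d+", text):
    start, end = m.span()
    # Slicing with the span reproduces the matched text.
    assert text[start:end] == m.group()
    for i in range(start, end):
        marker[i] = "^"

underline = "".join(marker)
print(text)
print(underline)  # carets appear under 12, 345, and 6789
```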
re.sub()
Replace all matches with another string.
```python
import re

pattern = r"\d+"
text = "Replace 123 and 456 with X"
result = re.sub(pattern, "X", text)
print(result)  # Output: "Replace X and X with X"
```
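The replacement passed to re.sub() does not have to be a fixed string. As a sketch of two common variations, the code below reuses captured groups with `\1`-style backreferences and then passes a function as the replacement; the sample strings are invented.

```python
import re

# Backreferences: \1, \2, \3 refer to the captured groups.
text = "Deadline: 2025-06-01"
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
print(swapped)  # Deadline: 01/06/2025

# Function replacement: called once per match, must return a string.
def double(m: re.Match) -> str:
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 15 bananas")
print(doubled)  # 6 apples and 30 bananas
```

The replacement string is written as a raw string for the same reason as the pattern: otherwise Python would interpret `\1` itself.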
re.split()
Split string using matches as delimiters.
```python
import re

pattern = r"[,\s]+"  # split on runs of commas and/or whitespace
text = "apple, banana cherry mango"
parts = re.split(pattern, text)
print(parts)  # Output: ['apple', 'banana', 'cherry', 'mango']
```
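Two further behaviors of re.split() are worth knowing: a capturing group keeps the delimiters in the result, and `maxsplit` limits the number of splits. The sketch below shows both on the sample string from the function table.

```python
import re

text = "a,b;c"

# Plain split: delimiters are discarded.
plain = re.split(r"[,;]", text)
print(plain)    # ['a', 'b', 'c']

# Capturing group: delimiters are kept as separate items.
kept = re.split(r"([,;])", text)
print(kept)     # ['a', ',', 'b', ';', 'c']

# maxsplit=1: stop after the first split.
limited = re.split(r"[,;]", text, maxsplit=1)
print(limited)  # ['a', 'b;c']
```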