Regular Expression

Let’s explore regular expressions with confidence.

Posted Dec 7, 2024

By DS2Man

5 min read

Regular Expression

Regular expressions (RegEx) are one of the most powerful tools in a developer’s toolkit for working with text. Whether you’re validating user input, parsing log files, or cleaning up data, understanding RegEx can save you hours of manual work.

Regular Expression

A regular expression is a pattern that describes a set of strings. It’s used for matching, searching, replacing, and extracting text based on patterns rather than fixed words.

Common RegEx Syntax

Pattern	Meaning	Example
`.`	Any character except newline	`a.b` matches `acb`, `a1b`, etc.
`*`	Zero or more repetitions of the previous character	`lo*` matches `l`, `lo`, `loo`, …
`+`	One or more repetitions	`go+gle` matches `gogle`, `google`
`?`	Zero or one occurrence	`colou?r` matches `color` or `colour`
`[]`	A set of characters	`[abc]` matches `a`, `b`, or `c`
`[^]`	Negation set	`[^0-9]` matches any non-digit
`^`	Start of string	`^Hello` matches strings that start with “Hello”
`$`	End of string	`end$` matches strings that end with “end”
`{n}`	Exactly n times	`a{3}` matches `aaa`
`{n,m}`	Between n and m times	`a{2,4}` matches `aa`, `aaa`, or `aaaa`
`\|`	OR condition
`()`	Grouping	`(ab)+` matches `ab`, `abab`, …

backslash `\`

The backslash \ in regular expressions is an escape character. It has two main purposes:

Case1) It turns normal characters into special patterns that match specific types of characters:

Pattern	Meaning	Matches
`\d`	Digit	Matches any number (`0–9`)
`\w`	Word	Matches letters, digits, and underscore (`[a-zA-Z0-9_]`)
`\s`	Whitespace	Matches space, tab, newline, etc.
`\b`	Word boundary	Matches the edge of a word
`\n`	Newline	Line break
`\t`	Tab	Tab character

Case2) If you want to match them literally, you need to escape them using \.

Pattern	Meaning
`.`	Any character (except newline)
`\.`	A literal dot (`.`)
`\*`	A literal asterisk (`*`)

Why Use `r"..."` in Python?

In Python strings, the backslash is also used for escape sequences like \n (newline), \t (tab), etc. Python will treat \d as an invalid escape sequence. Instead, you should use a raw string:

  
pattern = "\d+"  # This may cause issues

pattern = r"\d+" # Correct

Examples

Pattern	Description
`^\d{4}-\d{2}-\d{2}$`	Validates dates in `YYYY-MM-DD` format
`^[a-zA-Z0-9_]{4,12}$`	Username: 4–12 characters, letters/numbers/underscores
`^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$`	Valid email address
`^https?://[^\s]+$`	URL starting with `http` or `https`
`(?=.[A-Z])(?=.[a-z])(?=.*\d).{8,}`	Password: min 8 chars, must include uppercase, lowercase, and digit

  
import re

pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

text = "Contact: jaoneol@gmail.com"
match = re.search(pattern, text)
if match:
    print("Found:", match.group()) # jaoneol@gmail.com

text = "Contact: jaon=eol@gmail.com"
match = re.search(pattern, text)
if match:
    print("Found:", match.group()) # eol@gmail.com

  
import re

pattern = r"(\d+)[번호\s]*" 

text = "13번으로 하자"
match = re.search(pattern, text)
if match:
    print(match.group(0))  # "13번"
    print(match.group(1))  # "13"
    print(match.group(2))  # IndexError: no such group

  
import re

pattern = r"(\d+)-([a-z]+)"

text = "123-abc"
match = re.search(pattern, text)
if match:
    print(match.group(0)) # 123-abc
    print(match.group(1)) # 123
    print(match.group(2)) # abc

  
import re

pattern = r"[abc]"

text = "hello bat"
match = re.search(pattern, text)
if match:
    print(match.group()) # b

Common RegEx Functions in Python

Function	Description	Example
`re.search()`	Returns the first match anywhere in the string	`re.search(r"\d+", "Age: 25")` → Match: `25`
`re.match()`	Matches a pattern only at the beginning of the string	`re.match(r"\d+", "123abc")` → Match: `123` `re.match(r"\d+", "abc123")` → `None`
`re.fullmatch()`	Matches the entire string	`re.fullmatch(r"\d+", "123")` → Match: `123` `re.fullmatch(r"\d+", "123abc")` → `None`
`re.findall()`	Returns all non-overlapping matches as a list	`re.findall(r"\d+", "a1b22c333")` → `['1', '22', '333']`
`re.finditer()`	Returns an iterator of match objects (useful with positions)	`for m in re.finditer(r"\d+", text): print(m.group())`
`re.sub()`	Substitutes matches with a replacement string	`re.sub(r"\d+", "#", "a1b22c")` → `'a#b#c'`
`re.split()`	Splits the string by the pattern	`re.split(r"[,;]", "a,b;c")` → `['a', 'b', 'c']`

Example

re.search() Find the first match anywhere in the string.

  
import re

pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
text = "jaoneol@gmail.com"
match = re.match(pattern, text)
if match:
    print(match.group(0))  # Output: "jaoneol@gmail.com"

pattern = r"\d{4}-\d{2}-\d{2}"
text = "Today's date is 2025-06-01."
match = re.search(pattern, text)
if match:
    print(match.group(0))  # Output: "2025-06-01"

pattern = r"\d{2}-\d{2}-\d{2}"
text = "Today's date is 2025-06-01."
match = re.search(pattern, text)
if match:
    print(match.group(0))  # Output: "25-06-01"

re.match() Match must start at the beginning of the string.

  
pattern = r"\d{4}-\d{2}-\d{2}"
text = "2025-06-01 is the deadline"
match = re.match(pattern, text)
if match:
    print(match.group(0))  # Output: "2025-06-01"

pattern = r"\d{2}-\d{2}-\d{2}"
text = "Today's date is 2025-06-01."
match = re.search(pattern, text)
if match: # None
    print(match.group(0))

re.fullmatch() Match must cover the entire string.

  
pattern = r"\d{4}-\d{2}-\d{2}"
text = "2025-06-01"
match = re.fullmatch(pattern, text)
if match:
    print(match.group(0))  # Output: "2025-06-01"

pattern = r"\d{4}-\d{2}-\d{2}"
text = "2025-06-01 today"
match = re.fullmatch(pattern, text)
print(match)  # Output: None

re.findall() Return all matches as a list of strings (or groups).

  
pattern = r"\d+"
text = "I have 3 apples and 15 bananas."
matches = re.findall(pattern, text)
print(matches)  # Output: ['3', '15']

re.finditer() Return iterable of match objects (with position info).

  
pattern = r"\d+"
text = "Order 12, then 345, and 6789."
matches = re.finditer(pattern, text)
for match in matches:
    print(match.group(0), "at", match.span())
# Output:
# 12 at (6, 8)
# 345 at (14, 17)
# 6789 at (23, 27)

re.sub() Replace all matches with another string.

  
pattern = r"\d+"
text = "Replace 123 and 456 with X"
result = re.sub(pattern, "X", text)
print(result)  # Output: "Replace X and X with X"

re.split() Split string using matches as delimiters.

  
pattern = r"[,\s]+"  # split by comma or whitespace
text = "apple, banana  cherry   mango"
parts = re.split(pattern, text)
print(parts)  # Output: ['apple', 'banana', 'cherry', 'mango']

Grammer, Python

Python Grammer

This post is licensed under CC BY 4.0 by the author.

Regular Expression

Common RegEx Syntax

backslash \

Why Use r"..." in Python?

Examples

Common RegEx Functions in Python

Example

Trending Tags

backslash `\`

Why Use `r"..."` in Python?