Python regular expressions (re module): match, search, sub, findall, replace, examples

This post is an easy-to-understand explanation of re, Python's regular expression module. The official Python documentation is always the most complete reference; this article covers enough to get you started, after which the official documentation can fill in the details. Let's get started!

1. What is a regular expression?

Regular expressions, commonly referred to as regex (or re for short), are a way to define search patterns using a set of symbols. They let you find and manipulate text patterns flexibly and quickly, which makes them a powerful tool that every programmer should have in their arsenal.

Python's built-in re module provides a powerful and efficient way to work with regular expressions. Since it's part of the standard library, you don't need to install anything; just import it as shown below.

import re

2. List of regular expression symbols

A Python regex pattern is usually written as a raw string, in the form r"[regex_pattern]". For example, to find strings that start with the characters "python", you could use the regex pattern r"^python".
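As a minimal sketch of the r"^python" pattern mentioned above (the sample strings are made up for illustration):

```python
import re

# Hypothetical sample strings, chosen only for illustration.
words = ["python3", "pythonic", "cpython", "Python"]

# r"^python" matches only strings that begin with lowercase "python".
starts_with_python = [w for w in words if re.search(r"^python", w)]
print(starts_with_python)  # ['python3', 'pythonic']
```

"cpython" is skipped because "python" is not at the start, and "Python" is skipped because matching is case-sensitive by default.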

Here are the basic regex symbols we use to define regular expression patterns. I've broken them down into a few categories to make them easier to remember.

.: means any character (except newline).

2.1. The beginning and end of a string (boundary anchors)

^: denotes the beginning of a string.
$: means the end of a string.
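Here is a small sketch of how the two anchors behave alone and together. The word list is an assumption made for the example.

```python
import re

# Hypothetical sample strings.
samples = ["cat", "cats", "bobcat"]

# ^cat  : the string must start with "cat"
starts = [s for s in samples if re.search(r"^cat", s)]
# cat$  : the string must end with "cat"
ends = [s for s in samples if re.search(r"cat$", s)]
# ^cat$ : the string must be exactly "cat"
exact = [s for s in samples if re.search(r"^cat$", s)]

print(starts)  # ['cat', 'cats']
print(ends)    # ['cat', 'bobcat']
print(exact)   # ['cat']
```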

2.2. String Counts (Quantifiers)

?: means the preceding character occurs zero or one time.
*: means the preceding character occurs zero or more times.
+: means the preceding character occurs one or more times.
{n}: means the preceding character occurs exactly n times.
{n,}: means the preceding character occurs n or more times.
{n,m}: means the preceding character occurs between n and m times (inclusive).
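The quantifiers can be tried out with a short sketch like the one below. It uses re.fullmatch() so that the whole candidate string has to fit the pattern; the candidate strings are made up for illustration.

```python
import re

# Each pattern is paired with hypothetical candidate strings.
checks = {
    r"colou?r": ["color", "colour", "colouur"],  # ? : zero or one "u"
    r"ab*": ["a", "abbb", "b"],                  # * : zero or more "b"
    r"ab+": ["a", "ab", "abbb"],                 # + : one or more "b"
    r"a{3}": ["aa", "aaa", "aaaa"],              # {n}   : exactly 3
    r"a{2,}": ["a", "aa", "aaaaa"],              # {n,}  : 2 or more
    r"a{2,4}": ["a", "aaa", "aaaaa"],            # {n,m} : 2 to 4
}

results = {}
for pattern, candidates in checks.items():
    # Keep only the candidates that the pattern matches in full.
    results[pattern] = [c for c in candidates if re.fullmatch(pattern, c)]
    print(pattern, "->", results[pattern])
```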

2.3. String format (character classes and sets)

\d: means any digit.
\D: means any character that is not a digit.
\s: means any whitespace character: space, \t, \n, \r, \f, \v.
\S: means any character that is not whitespace.
\w: means any letter, digit, or underscore (_).
\W: means any character that is not a letter, digit, or underscore (_).
.: means any character except \n.
[]: means any single character from the set enclosed in the square brackets.
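To see what each class picks up, here is a small sketch using re.findall() on one sample string (the string itself is an arbitrary choice for the example):

```python
import re

text = "Room 42_b: floor 7"  # hypothetical sample text

digits = re.findall(r"\d", text)        # ['4', '2', '7']
non_digits = re.findall(r"\D+", text)   # ['Room ', '_b: floor ']
spaces = re.findall(r"\s", text)        # [' ', ' ', ' ']
non_spaces = re.findall(r"\S+", text)   # ['Room', '42_b:', 'floor', '7']
words = re.findall(r"\w+", text)        # ['Room', '42_b', 'floor', '7']
non_words = re.findall(r"\W+", text)    # [' ', ': ', ' ']
vowels = re.findall(r"[aeiou]", text)   # ['o', 'o', 'o', 'o']

# "." matches any character except a newline, so "a\nc" is not matched.
dot = re.findall(r"a.c", "abc a-c a\nc")  # ['abc', 'a-c']

print(digits, words, vowels, dot)
```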

2.4. Logical operators

(abc): This is a group operator, i.e. the sequence of characters enclosed in the parentheses is treated as a single unit and must appear as written.
[^abc]: A negated set, which matches any single character other than those listed in the square brackets.
(A|B): OR operator, meaning either A or B will appear.
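A minimal sketch of the three operators in action, with sample strings that are assumptions for the example:

```python
import re

# (ab)+ : the group "ab" is repeated as a unit.
repeated = re.fullmatch(r"(ab)+", "ababab")
print(repeated.group(0))  # 'ababab' (the whole match)
print(repeated.group(1))  # 'ab' (the last repetition of the group)

# [^abc] : any single character that is not a, b, or c.
not_abc = re.findall(r"[^abc]", "aXbYc")
print(not_abc)  # ['X', 'Y']

# (Mr|Ms) : either alternative is accepted.
title = re.search(r"(Mr|Ms) Smith", "Dear Ms Smith")
print(title.group(0))  # 'Ms Smith'
print(title.group(1))  # 'Ms'
```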

3. re module functions basics

The Python re module provides a number of functions for working with regular expressions. The most commonly used functions are listed below.

The usage patterns of the functions are illustrated by the examples in Section 4.

re.match(): Checks whether the regular expression pattern matches at the beginning of a string. It returns a match object, or None if there is no match.

import re
 
pattern = r"Hello"
text = "Hello, World!"
 
match = re.match(pattern, text)
if match:
    print("Match found!")
else:
    print("No match found.")
 
# Output: Match found!

re.search(): Scans the entire string and returns a match object for the first location where the regular expression pattern matches, or None if there is none.

import re
 
pattern = r"World"
text = "Hello, World!"
 
search = re.search(pattern, text)
if search:
    print("Match found!")
else:
    print("No match found.")
 
# Output: Match found!
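The difference between re.match() and re.search() is easiest to see when the pattern is not at the start of the string. The sketch below reuses the same text and also shows re.fullmatch(), which requires the whole string to match:

```python
import re

text = "Hello, World!"

# re.match() only looks at the beginning of the string.
at_start = re.match(r"World", text)
print(at_start)  # None

# re.search() scans forward until it finds a match.
anywhere = re.search(r"World", text)
print(anywhere.span())  # (7, 12)

# re.fullmatch() succeeds only if the pattern covers the entire string.
whole = re.fullmatch(r"Hello", text)
print(whole)  # None
```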

re.findall(): Returns all non-overlapping matches of the regular expression pattern in a string as a list of strings.

import re
 
pattern = r"\d+"
text = "I have 3 apples and 5 oranges."
 
matches = re.findall(pattern, text)
print(matches)
 
# Output: ['3', '5']
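One detail worth knowing is that the shape of the re.findall() result depends on the capturing groups in the pattern. A short sketch using the same sentence:

```python
import re

text = "I have 3 apples and 5 oranges."

# No groups: each item is the whole match.
whole_matches = re.findall(r"\d+ \w+", text)
print(whole_matches)  # ['3 apples', '5 oranges']

# One group: each item is the text captured by that group.
one_group = re.findall(r"(\d+) \w+", text)
print(one_group)  # ['3', '5']

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\d+) (\w+)", text)
print(two_groups)  # [('3', 'apples'), ('5', 'oranges')]
```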

re.finditer(): Returns an iterator that yields a match object for each non-overlapping match of the regular expression pattern in a string.

import re
 
pattern = r"\d+"
text = "I have 3 apples and 5 oranges."
 
matches_iter = re.finditer(pattern, text)
for match in matches_iter:
    print(match.group(), end=" ")
 
# Output: 3 5
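Because re.finditer() yields match objects rather than plain strings, you can also ask where each match was found. A minimal sketch with the same text:

```python
import re

text = "I have 3 apples and 5 oranges."

# Collect the matched text together with its start and end index.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
print(positions)  # [('3', 7, 8), ('5', 20, 21)]
```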

re.sub(): Replaces all occurrences of the regular expression pattern in a string with the specified replacement string.

import re
 
pattern = r"apple"
text = "I have an apple. It is a delicious apple."
 
new_text = re.sub(pattern, "orange", text)
print(new_text)
 
# Output: I have an orange. It is a delicious orange.
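re.sub() has a few more options that are often useful: a count argument to limit the number of replacements, backreferences such as \1 in the replacement string, and a function as the replacement. The sketch below shows each; the date and fruit strings are made up for illustration.

```python
import re

text = "I have an apple. It is a delicious apple."

# count=1 replaces only the first occurrence.
first_only = re.sub(r"apple", "orange", text, count=1)
print(first_only)  # I have an orange. It is a delicious apple.

# \1, \2, \3 in the replacement refer to the captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2024-01-31")
print(reordered)  # 31/01/2024

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 5 oranges")
print(doubled)  # 6 apples and 10 oranges
```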

re.split(): Splits a string at each occurrence of a regular expression pattern and returns the pieces as a list.

import re
 
pattern = r"\s+"
text = "Hello   World!   How are you?"
 
substrings = re.split(pattern, text)
print(substrings)
 
# Output: ['Hello', 'World!', 'How', 'are', 'you?']
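Two variations of re.split() are worth a quick sketch: maxsplit limits how many splits are made, and a capturing group in the pattern keeps the delimiters in the result.

```python
import re

text = "Hello   World!   How are you?"

# maxsplit=1 splits at the first run of whitespace only.
head_and_rest = re.split(r"\s+", text, maxsplit=1)
print(head_and_rest)  # ['Hello', 'World!   How are you?']

# With a capturing group, the matched delimiters are kept in the list.
with_delimiters = re.split(r"([,;])", "a,b;c")
print(with_delimiters)  # ['a', ',', 'b', ';', 'c']
```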

4. re module function examples

Now that the re module functions are familiar, let's look at some practical examples of their use:

Example 1: Find all email addresses in a text

Let's say you have a text that contains several email addresses and you want to extract them all. Here's how to do it using the re.findall() function:

import re
 
text = "Contact us at info@example.com or support@example.org for any inquiries."
email_pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
emails = re.findall(email_pattern, text)
 
print(emails)  # Output: ['info@example.com', 'support@example.org']

Example 2: Replace URLs in text with a replacement string

Let's say you have some text that contains multiple URLs and you want to replace them with a replacement string, such as [URL]. You can use the re.sub() function for this task:

import re
 
text = "Visit our website at https://example.com and check our blog at https://blog.example.com."
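# The final character class keeps trailing punctuation (such as the sentence-ending period) out of the match.
url_pattern = r"https?://[^\s]*[^\s.,;:!?]"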
replaced_text = re.sub(url_pattern, "[URL]", text)
 
print(replaced_text)  # Output: Visit our website at [URL] and check our blog at [URL].

Example 3: Extracting phone numbers from different formats

Let's extract phone numbers from different text formats using the re.finditer() function.

import re
 
text = "Call John at (123) 456-7890 or reach Jane at 987-654-3210 for more information."
phone_pattern = r"\(?\d{3}\)?[-\s]?\d{3}[-\s]?\d{4}"
phone_numbers = [match.group() for match in re.finditer(phone_pattern, text)]
 
print(phone_numbers)  # Output: ['(123) 456-7890', '987-654-3210']

Example 4: Validating a password

In this example, we will check if the password entered by the user meets the following criteria: at least one uppercase letter, at least one lowercase letter, at least one number, and at least eight characters in total. We will use re.search() for this task.

import re
 
def is_valid_password(password):
    if len(password) < 8:
        return False
    if not re.search(r"[A-Z]", password):
        return False
    if not re.search(r"[a-z]", password):
        return False
    if not re.search(r"\d", password):
        return False
    return True
 
password = "P4ssw0rd!"
print(is_valid_password(password))  # Output: True

Example 5: Splitting a string based on multiple symbols

In this example, we'll use the re.split() function to split a string based on multiple delimiters (comma, semicolon, pipe):

import re
 
text = "apple,banana;orange|grape"
delimiter_pattern = r"[,;|]"
fruits = re.split(delimiter_pattern, text)
 
print(fruits)  # Output: ['apple', 'banana', 'orange', 'grape']

5. Going a step further

Beyond the basic regex syntax and functions, there are other topics worth exploring with regex.

Lookahead and lookbehind assertions: How to require that a pattern is (or is not) followed or preceded by another pattern, without including that other pattern in the match.
Named groups and backreferences: How to give groups names and refer back to previously captured text, much like variables inside a regular expression.
Non-capturing groups: A technique that uses groups to structure a pattern, but does not capture their values.
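The following is a brief sketch of these three topics, not a full treatment. All sample strings are assumptions made for the example.

```python
import re

# Lookahead (?=...): digits that are followed by " USD"; " USD" is not part of the match.
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")
print(usd_amounts)  # ['10', '30']

# Lookbehind (?<=...): digits that are preceded by a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", "cost $15, weight 20kg")
print(dollar_amounts)  # ['15']

# Named groups (?P<name>...): captured text can be read by name.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2023-07")
print(date.group("year"))  # '2023'
print(date.groupdict())    # {'year': '2023', 'month': '07'}

# Backreference \1: the same word repeated twice in a row.
repeated_words = re.findall(r"\b(\w+) \1\b", "this is is a test test")
print(repeated_words)  # ['is', 'test']

# Non-capturing group (?:...): groups the alternatives without capturing them,
# so findall() returns only the name captured by the second group.
names = re.findall(r"(?:Mr|Ms)\. (\w+)", "Mr. Kim and Ms. Lee")
print(names)  # ['Kim', 'Lee']
```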

Other useful techniques worth learning include case-insensitive searching (re.IGNORECASE), multiline mode (re.MULTILINE), and more.
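Case-insensitive searching and multiline mode are both switched on with flags. A minimal sketch, using the re.IGNORECASE and re.MULTILINE flags and made-up sample text:

```python
import re

# re.IGNORECASE: letters match regardless of case.
any_case = re.findall(r"python", "Python PYTHON python", flags=re.IGNORECASE)
print(any_case)  # ['Python', 'PYTHON', 'python']

lines = "alpha one\nbeta two\ngamma three"

# Without re.MULTILINE, ^ matches only at the start of the whole string.
first_word_only = re.findall(r"^\w+", lines)
print(first_word_only)  # ['alpha']

# With re.MULTILINE, ^ also matches at the start of every line.
first_word_per_line = re.findall(r"^\w+", lines, flags=re.MULTILINE)
print(first_word_per_line)  # ['alpha', 'beta', 'gamma']
```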

6. Tips for using the re module

Here are some tips and tricks to make working with regular expressions easier:

Use raw strings (r"") for your regular expression patterns. In a normal Python string literal, every backslash in a pattern has to be escaped (for example "\\d"), but with a raw string you can write the pattern as-is (r"\d").
Regex101 is a handy online tool for testing and debugging regular expression patterns. Once you get the hang of it, you'll use it all the time.
A complex regex pattern is hard to read and hard to manage; breaking it into smaller units makes it more readable and maintainable.
Use comments and verbose mode (re.VERBOSE) to make regular expression patterns easier to understand. In verbose mode, whitespace in the pattern is ignored and # starts a comment, so a pattern passed to re.compile() can be spread over several lines and annotated.
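As a sketch of the raw-string and verbose-mode tips, the phone number pattern from Example 3 can be rewritten with re.compile() and re.VERBOSE. The name phone_re is a hypothetical choice for this example.

```python
import re

# A raw string and an escaped normal string describe the same pattern.
print("\\d+" == r"\d+")  # True

# re.VERBOSE ignores whitespace in the pattern and treats "#" as a comment,
# so each part of the pattern can be documented on its own line.
phone_re = re.compile(
    r"""
    \(?\d{3}\)?   # area code, optionally wrapped in parentheses
    [-\s]?        # optional separator (hyphen or whitespace)
    \d{3}         # next three digits
    [-\s]?        # optional separator
    \d{4}         # last four digits
    """,
    re.VERBOSE,
)

text = "Call John at (123) 456-7890 or reach Jane at 987-654-3210 for more information."
phone_numbers = phone_re.findall(text)
print(phone_numbers)  # ['(123) 456-7890', '987-654-3210']
```

Compiling the pattern once also gives you an object you can reuse across calls, which helps when the same pattern is applied to many strings.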

7. Final thoughts

Regular expressions are a powerful tool for text manipulation and pattern matching in Python. By familiarizing yourself with the re module and understanding the basic syntax of regular expressions, you can easily handle a variety of textual tasks.
