RegEx in Python

RegEx in Python is mostly used for pattern matching and text manipulation. A RegEx (regular expression) lets us define a pattern that matches specific sequences of characters within a string. This makes it useful for tasks like data validation, text search and replacement, and data extraction.

Python provides a built-in module called re that enables us to work with regular expressions.

To start using RegEx in Python, we need to import the re module:

import re

Once the module is imported, we can use various functions provided by the re module to work with regular expressions.

Using the re.search() Function of RegEx in Python

One of the most commonly used functions is re.search(), which searches a string for a pattern match:

import re

text = "Hello, World!"
pattern = r"World"

match = re.search(pattern, text)

if match:
    print("Pattern found")
else:
    print("Pattern not found")

Here, we define the pattern as the string “World”, using the r prefix to mark it as a raw string so that backslashes are passed to the re module unchanged. The re.search() function then scans the text string for the first location where the pattern matches. If a match is found, it returns a match object; otherwise it returns None, which is why the result can be tested directly in an if statement.
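
To see why the raw-string prefix matters, here is a small sketch (the variable names are only illustrative). In a normal string literal, Python converts \b to a backspace character before the pattern reaches the re module, whereas in a raw string the backslash is preserved and re interprets \b as a word boundary.

```python
import re

text = "Hello, World!"

# Raw string: the regex engine receives \b and treats it as a word boundary.
raw_match = re.search(r"\bWorld\b", text)

# Normal string: Python turns \b into a backspace character (\x08) first,
# so the pattern looks for a literal backspace that the text does not contain.
plain_match = re.search("\bWorld\b", text)

print(raw_match is not None)  # True
print(plain_match is None)    # True
```

For a plain pattern such as "World" the prefix makes no difference, but using it consistently avoids surprises once backslashes appear in the pattern.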

Metacharacters in Python

Regular expressions support a wide range of special characters and metacharacters that allow us to define complex patterns. Some commonly used metacharacters include:

  • .: Matches any single character except a newline.
  • *: Matches zero or more occurrences of the previous character or group.
  • +: Matches one or more occurrences of the previous character or group.
  • ?: Matches zero or one occurrence of the previous character or group.
  • []: Defines a character class, and matches any single character within the brackets.
  • |: Acts as an OR operator, matches either the expression before or after the pipe symbol.
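  • ^: Matches the beginning of a string (and, with the re.MULTILINE flag, the beginning of each line).
  • $: Matches the end of a string, or the position just before a trailing newline.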
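
As a rough illustration of the metacharacters listed above, the following sketch applies each one to a short sample string. The sample strings and variable names are made up for this example.

```python
import re

# . matches any single character except a newline
dot_matches = re.findall(r"c.t", "cat cot cut")

# * allows zero or more of the preceding character; + requires at least one
star_matches = re.findall(r"ab*", "a ab abb")
plus_matches = re.findall(r"ab+", "a ab abb")

# ? makes the preceding character optional
optional_matches = re.findall(r"colou?r", "color colour")

# [] matches any one character from the set
class_matches = re.findall(r"[aeiou]", "regex")

# | matches either alternative
either_matches = re.findall(r"cat|dog", "cat, dog, bird")

# ^ and $ anchor the pattern to the start and end of the string
starts_with_hello = bool(re.search(r"^Hello", "Hello, World!"))
ends_with_world = bool(re.search(r"World!$", "Hello, World!"))

print(dot_matches)       # ['cat', 'cot', 'cut']
print(star_matches)      # ['a', 'ab', 'abb']
print(plus_matches)      # ['ab', 'abb']
print(optional_matches)  # ['color', 'colour']
print(class_matches)     # ['e', 'e']
print(either_matches)    # ['cat', 'dog']
print(starts_with_hello, ends_with_world)  # True True
```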

Example:

import re

text = "Hello, 123 World!"

pattern = r"\d+"
matches = re.findall(pattern, text)

print(matches)  # Output: ['123']

Here, the pattern \d+ is used to match one or more digits. The re.findall() function returns a list of all matches found in the text string.

Besides re.search() and re.findall(), the re module provides other useful functions like re.match() (matches the pattern only at the beginning of the string), re.sub() (substitutes matches with a replacement string), and more.
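
A minimal sketch of re.match() and re.sub(), reusing the sample text from the example above (the "#" replacement string is an arbitrary choice for illustration):

```python
import re

text = "Hello, 123 World!"

# re.match() only succeeds if the pattern matches at the very start of the string.
at_start = re.match(r"Hello", text)
not_at_start = re.match(r"World", text)  # None: "World" is not at the start

# re.sub() replaces every match of the pattern with the replacement string.
masked = re.sub(r"\d+", "#", text)

print(at_start.group())  # Hello
print(not_at_start)      # None
print(masked)            # Hello, # World!
```

Note that re.search(r"World", text) would succeed here, because re.search() scans the whole string rather than checking only the beginning.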

Match Objects in Python

In Python’s re module, functions like re.search() and re.match() return a match object when the pattern matches, and None when it does not.

A match object therefore always represents a successful match, and it provides methods and attributes for accessing the matched content and additional details. Because the result may be None, it is worth checking it before calling any of these methods.

Let’s explore some of the commonly used methods and attributes of match objects:

  • group(): This method returns the actual string that matches the pattern. By default, calling group() without any arguments returns the entire match. We can also provide a group number or a group name as an argument to retrieve a specific captured group.
import re

text = "Hello, World!"
pattern = r"(\w+), (\w+)"

match = re.search(pattern, text)

print(match.group())       # Output: Hello, World
print(match.group(1))      # Output: Hello
print(match.group(2))      # Output: World
  • start(): This method returns the starting index of the match in the original string.
import re

text = "Hello, World!"
pattern = r"World"

match = re.search(pattern, text)

print(match.start())       # Output: 7
  • end(): This method returns the ending index (exclusive) of the match in the original string.
import re

text = "Hello, World!"
pattern = r"World"

match = re.search(pattern, text)

print(match.end())         # Output: 12
  • span(): This method returns a tuple containing the starting and ending indices of the match.
import re

text = "Hello, World!"
pattern = r"World"

match = re.search(pattern, text)

print(match.span())        # Output: (7, 12)
  • groupdict(): This method returns a dictionary that maps each named group in the pattern to the text it matched. It is useful when we define named groups with the (?P<name>...) syntax.
import re

text = "John Doe, 30 years old"
pattern = r"(?P<name>\w+) \w+, (?P<age>\d+) years old"

match = re.search(pattern, text)

print(match.groupdict())   # Output: {'name': 'John', 'age': '30'}
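
The methods above can be combined in one place. The sketch below uses a hypothetical helper, describe_match(), which is not part of the re module, to collect the details of the first match and to handle the case where re.search() returns None. The group names greeting and target are also made up for this example.

```python
import re

def describe_match(pattern, text):
    """Return details of the first match, or None if there is no match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return {
        "matched": match.group(),     # the entire matched text
        "span": match.span(),         # (start, end) indices, end exclusive
        "groups": match.groupdict(),  # named groups as a dictionary
    }

text = "Hello, World!"
info = describe_match(r"(?P<greeting>\w+), (?P<target>\w+)", text)
print(info)
# {'matched': 'Hello, World', 'span': (0, 12), 'groups': {'greeting': 'Hello', 'target': 'World'}}

# Slicing the original string with the span gives the same text as group().
start, end = info["span"]
print(text[start:end] == info["matched"])  # True

# No digits in the text, so there is no match object to inspect.
print(describe_match(r"\d+", text))  # None
```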

These are just a few examples of the methods and attributes available in match objects. We can refer to the Python documentation for the re module to explore more options and details about working with match objects. Match objects are essential when we need to extract specific information from strings, validate patterns, or perform complex text manipulation using regular expressions in Python.
