RegEx in Python is mostly used for pattern matching and text manipulation. A regular expression (RegEx) lets us define patterns that match specific character sequences within strings. This makes it useful for tasks like data validation, text search and replacement, and data extraction.
Python provides a built-in module called re that enables us to work with regular expressions.
To start using RegEx in Python, we need to import the re module:
import re
Once the module is imported, we can use various functions provided by the re module to work with regular expressions.
Using the re.search() Function of RegEx in Python
One of the most commonly used functions is re.search(), which searches a string for a pattern match:
import re
text = "Hello, World!"
pattern = r"World"
match = re.search(pattern, text)
if match:
    print("Pattern found")
else:
    print("Pattern not found")
Here, we define the pattern as the string “World”, using the r prefix to mark it as a raw string so that Python does not treat backslashes as escape characters. The re.search() function then scans the text string for the pattern. If a match is found, it returns a match object, which evaluates as true in the if statement; otherwise it returns None.
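Because re.search() returns None when nothing matches, the result should be checked before any method is called on it. The following is a minimal sketch of that guard; the helper name find_word is hypothetical and used only for illustration.

```python
import re

def find_word(pattern, text):
    """Return the matched text, or None when the pattern is absent."""
    match = re.search(pattern, text)
    # re.search() returns None on failure, so guard before using the result
    return match.group() if match else None

found = find_word(r"World", "Hello, World!")
missing = find_word(r"Python", "Hello, World!")
print(found)    # World
print(missing)  # None
```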
Metacharacters in Python
Regular expressions support a wide range of special characters and metacharacters that allow us to define complex patterns. Some commonly used metacharacters include:
.: Matches any single character except a newline.
*: Matches zero or more occurrences of the previous character or group.
+: Matches one or more occurrences of the previous character or group.
?: Matches zero or one occurrence of the previous character or group.
[]: Defines a character class, and matches any single character within the brackets.
|: Acts as an OR operator, matches either the expression before or after the pipe symbol.
^: Matches the beginning of a string.
$: Matches the end of a string.
Example:
import re
text = "Hello, 123 World!"
pattern = r"\d+"
matches = re.findall(pattern, text)
print(matches) # Output: ['123']
Here, \d is a special sequence that matches any digit, so the pattern \d+ matches one or more consecutive digits. The re.findall() function returns a list of all non-overlapping matches found in the text string.
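The metacharacters listed earlier can each be tried out in isolation. The sketch below is illustrative only: the patterns and sample strings are made up for this purpose, and each line exercises one metacharacter.

```python
import re

dot = re.findall(r"c.t", "cat cot cut")            # . any single character
star = re.findall(r"ab*", "a ab abbb")             # * zero or more 'b'
plus = re.findall(r"ab+", "a ab abbb")             # + one or more 'b'
optional = re.findall(r"colou?r", "color colour")  # ? zero or one 'u'
char_class = re.findall(r"[aeiou]", "regex")       # [] any one listed character
either = re.findall(r"cat|dog", "cat and dog")     # | either alternative
starts = bool(re.search(r"^Hello", "Hello, World!"))   # ^ start of string
ends = bool(re.search(r"World!$", "Hello, World!"))    # $ end of string

print(dot)         # ['cat', 'cot', 'cut']
print(star)        # ['a', 'ab', 'abbb']
print(plus)        # ['ab', 'abbb']
print(optional)    # ['color', 'colour']
print(char_class)  # ['e', 'e']
print(either)      # ['cat', 'dog']
print(starts, ends)  # True True
```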
Besides re.search() and re.findall(), the re module provides other useful functions like re.match() (matches the pattern only at the beginning of the string), re.sub() (substitutes matches with a replacement string), and more.
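To make the difference concrete, here is a short sketch of re.match() and re.sub() applied to the same sample string used earlier. The replacement character "#" is an arbitrary choice for illustration.

```python
import re

text = "Hello, 123 World!"

# re.match() only succeeds if the pattern matches at the start of the string
at_start = re.match(r"Hello", text)
not_at_start = re.match(r"World", text)
print(at_start is not None)  # True
print(not_at_start)          # None

# re.sub() returns a new string with every match replaced
masked = re.sub(r"\d+", "#", text)
print(masked)  # Hello, # World!
```

Note that re.search(r"World", text) would succeed on the same input, since re.search() scans the whole string rather than only its beginning.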
Match Objects in Python
In Python’s re module, functions like re.search() and re.match() return a match object when the pattern matches, and None when it does not. The match object provides information about the search results.
A match object represents a successful match and contains various methods and attributes to access the matched content and additional details.
Let’s explore some of the commonly used methods and attributes of match objects:
group(): This method returns the actual string that matches the pattern. By default, calling group() without any arguments returns the entire match. We can also provide a group number or a group name as an argument to retrieve a specific captured group.
import re
text = "Hello, World!"
pattern = r"(\w+), (\w+)"
match = re.search(pattern, text)
print(match.group()) # Output: Hello, World!
print(match.group(1)) # Output: Hello
print(match.group(2)) # Output: World
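Retrieving a group by name works the same way as retrieving it by number. A minimal sketch, assuming the groups are named with the (?P<name>...) syntax; the names greeting and target are hypothetical and chosen only for this example.

```python
import re

text = "Hello, World!"
# (?P<name>...) assigns a name to a capturing group
pattern = r"(?P<greeting>\w+), (?P<target>\w+)"
match = re.search(pattern, text)
if match:
    print(match.group("greeting"))  # Hello
    print(match.group("target"))    # World
    # Passing several arguments returns a tuple of the requested groups
    print(match.group(1, 2))        # ('Hello', 'World')
    # groups() returns all captured groups as a tuple
    print(match.groups())           # ('Hello', 'World')
```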
start(): This method returns the starting index of the match in the original string.
import re
text = "Hello, World!"
pattern = r"World"
match = re.search(pattern, text)
print(match.start()) # Output: 7
end(): This method returns the ending index (exclusive) of the match in the original string.
import re
text = "Hello, World!"
pattern = r"World"
match = re.search(pattern, text)
print(match.end()) # Output: 12
span(): This method returns a tuple containing the starting and ending indices of the match.
import re
text = "Hello, World!"
pattern = r"World"
match = re.search(pattern, text)
print(match.span()) # Output: (7, 12)
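Since the ending index is exclusive, the values from span() can be used directly as slice bounds on the original string. A small sketch of that relationship, using the same sample text:

```python
import re

text = "Hello, World!"
match = re.search(r"World", text)
if match:
    start, end = match.span()
    # end is exclusive, so this slice yields exactly the matched text
    matched_slice = text[start:end]
    print(matched_slice)                   # World
    print(matched_slice == match.group())  # True
    print(end - start)                     # 5
```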
groupdict(): This method returns a dictionary containing the named groups of the match. It is useful when we use named groups in our regular expression pattern.
import re
text = "John Doe, 30 years old"
# (?P<name>...) defines a named group; the surname is matched but not captured
pattern = r"(?P<name>\w+) \w+, (?P<age>\d+) years old"
match = re.search(pattern, text)
print(match.groupdict()) # Output: {'name': 'John', 'age': '30'}
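When a named group is optional and does not take part in the match, groupdict() reports it as None, and its default parameter can supply a different placeholder. The sketch below assumes an illustrative pattern in which the age part is optional; the sample names and the "unknown" placeholder are made up for this example.

```python
import re

# The age part is wrapped in an optional non-capturing group (?: ... )?
pattern = r"(?P<name>\w+) \w+(?:, (?P<age>\d+) years old)?"

with_age = re.search(pattern, "John Doe, 30 years old")
without_age = re.search(pattern, "Jane Roe")

print(with_age.groupdict())     # {'name': 'John', 'age': '30'}
print(without_age.groupdict())  # {'name': 'Jane', 'age': None}
# default replaces None for groups that did not participate in the match
print(without_age.groupdict(default="unknown"))  # {'name': 'Jane', 'age': 'unknown'}
```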
These are just a few examples of the methods and attributes available in match objects. We can refer to the Python documentation for the re module to explore more options and details about working with match objects. Match objects are essential when we need to extract specific information from strings, validate patterns, or perform complex text manipulation using regular expressions in Python.