A regular expression is a sequence of characters that defines a search pattern. It is written in a specific syntax and is usually used to find matches in other text, or to check whether that text fits the pattern.
Python has a built-in module just for this: the re module.
Using Regular Expression Functions
Here are four of the functions that the re module offers:
findall: Returns a list of all the matches
search: Returns a Match object if a match was found
split: Returns a list of the string split at every match
sub: Substitutes all the matches with a string
Let's see these all in action.
findall Function
Use the findall() function when you want to find all the matches of the pattern you have described:
import re
example = "I pledge allegiance."
results = re.findall("le", example)
print(results)
['le', 'le']
Python found le twice. If nothing is found, the returned list is empty. You can take the length of this list to get the number of matches.
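As a minimal sketch of the empty-list case and the length check described above, the snippet below runs findall() with one pattern that occurs and one that does not. The pattern "xyz" and the variable names are chosen here for illustration only.

```python
import re

example = "I pledge allegiance."

# One pattern that occurs twice, and one that does not occur at all.
found = re.findall("le", example)
missing = re.findall("xyz", example)

print(len(found))  # 2
print(missing)     # []

# An empty list is falsy, so it can be tested directly.
if not missing:
    print("no matches for 'xyz'")
```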
search Function
The search() function scans the string for the first match. It returns a re.Match object, or None if no match is found.
import re
example = "I pledge allegiance."
results = re.search("le", example)
print(results)
<re.Match object; span=(3, 5), match='le'>
Using this re.Match object, you can get the index where the first match starts, like this:
import re
example = "I pledge allegiance."
results = re.search("le", example)
print(results.start())
3
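Because search() returns None when nothing matches, it is usually safer to check the result before calling methods on it. The sketch below shows one way to do that, along with a few other standard methods of the match object: end(), span(), and group(). The pattern "xyz" is just an illustrative non-matching pattern.

```python
import re

example = "I pledge allegiance."

match = re.search("le", example)
if match is not None:
    print(match.start())  # 3
    print(match.end())    # 5
    print(match.span())   # (3, 5)
    print(match.group())  # le

# No match: search() returns None rather than a Match object.
no_match = re.search("xyz", example)
print(no_match)  # None
```

Calling no_match.start() here would raise an AttributeError, which is why the None check comes first.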
split Function
The split() function returns the string split at every match.
import re
example = "I pledge allegiance."
results = re.split("le", example)
print(results)
['I p', 'dge al', 'giance.']
Pretty straightforward: each match is removed from the string, and the string is split at that point.
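If you only want to split a limited number of times, re.split() also accepts an optional maxsplit argument. The sketch below is a small illustration of that, plus what happens when a match sits at the very start of the string. The variable names are chosen here for the example.

```python
import re

example = "I pledge allegiance."

# Split only at the first match; the rest of the string is left intact.
first_only = re.split("le", example, maxsplit=1)
print(first_only)  # ['I p', 'dge allegiance.']

# A match at the very start of the string leaves an empty string
# as the first item of the result.
leading = re.split("I", example)
print(leading)  # ['', ' pledge allegiance.']
```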
sub Function
The sub() function substitutes every match with a string of your choice:
import re
example = "I pledge allegiance."
results = re.sub("le", "ABC", example)
print(results)
I pABCdge alABCgiance.
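By default sub() replaces every match. As a small sketch, the optional count argument limits how many replacements are made, and the related re.subn() function also reports how many substitutions took place. The variable names below are illustrative.

```python
import re

example = "I pledge allegiance."

# Replace only the first match.
once = re.sub("le", "ABC", example, count=1)
print(once)  # I pABCdge allegiance.

# subn() returns the new string together with the number of replacements.
replaced, how_many = re.subn("le", "ABC", example)
print(replaced)  # I pABCdge alABCgiance.
print(how_many)  # 2
```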
Special Sequences
In addition to string literals, you can use special sequences in your regular expressions to make them more powerful.
Here is a list of the special sequences you can use:
.: Matches any character except a newline
\w: Matches an alphanumeric character or an underscore
\W: Matches any character that is not alphanumeric and not an underscore
\b: Matches a word boundary, the position between a word character and a non-word character (or the start or end of the string)
\s: Matches a single whitespace character
\S: Matches a single non-whitespace character
\t: Matches a tab
\n: Matches a newline
\r: Matches a carriage return
\d: Matches a numeric character
^: Matches the start of a string
$: Matches the end of a string
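Here is a short sketch that tries several of these sequences on one sample string. The sample text and variable names are made up for illustration. The patterns are written as raw strings (r"...") so that the backslashes reach the re module unchanged, and the anchors ^ and $ are written without a backslash.

```python
import re

text = "Call 555 now."

digits = re.findall(r"\d", text)       # every numeric character
spaces = re.findall(r"\s", text)       # every whitespace character
non_word = re.findall(r"\W", text)     # neither alphanumeric nor underscore
any_char = re.findall(r"n.w", text)    # 'n', any one character, then 'w'
boundary = re.findall(r"\bn", text)    # an 'n' that begins a word

print(digits)    # ['5', '5', '5']
print(spaces)    # [' ', ' ']
print(non_word)  # [' ', ' ', '.']
print(any_char)  # ['now']
print(boundary)  # ['n']

# ^ anchors the pattern to the start of the string, $ to the end.
starts_with_call = re.search(r"^Call", text) is not None
starts_with_now = re.search(r"^now", text) is not None
ending = re.search(r"now.$", text)

print(starts_with_call)  # True
print(starts_with_now)   # False
print(ending.group())    # now.
```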