Perhaps you’ve heard of regex but aren’t quite sure how it can be used in SEO or whether it fits into your own strategy.
Regular expressions, or ‘regex’, are like an in-line programming language for text searches that allow you to include complex search strings, partial matches and wildcards, case-insensitive searches, and other advanced instructions.
You can think of them as searching for a pattern, rather than a specific string of text.
Therefore, they can help you to find entire sets of search results that, at first glance, may appear to have little in common with each other.
Regular expressions are a language all their own, and the first time you see one, it can look quite alien.
In this guide, you’ll learn common regex operators, how to use more advanced regex filters for SEO, how to use regex in Google Analytics and Google Search Console, and more.
You’ll find examples of regex at work in different ways in SEO, too.
What Does Regex Look Like?
A regular expression typically includes a combination of text that will match exactly in the search results, along with several operators that act more like wildcards to achieve a pattern match rather than an exact text match.
This can include a single-character wildcard, a match for one or more characters, or a match for zero or more characters, as well as optional characters, nested sub-expressions in parentheses, and ‘or’ functions.
By combining these different operations together, you can build a complex expression that can achieve very far-reaching, yet very specific results.
Common Regex Operators
A few examples of common regex operators include:
. A wildcard match for any single character.
.* A match for zero or more characters.
.+ A match for one or more characters.
\d A match for any single numerical digit 0-9.
? Inserted after a character to make it an optional part of the expression.
| A vertical line or ‘pipe’ character indicates an ‘or’ function.
^ Used to denote the start of a string.
$ Used to denote the end of a string.
( ) Used to nest a sub-expression.
\ Inserted before an operator or special character to ‘escape’ it.
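The following are flags rather than operators. They are set alongside the pattern (the letters shown are the JavaScript-style flags) and change how the whole expression is applied:
g Returns all matches instead of just the first one.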
i Returns case-insensitive results.
m Activates multiline mode.
s Activates ‘dotall’ mode.
u Activates full Unicode support.
y Searches the specific text position (‘sticky’ mode).
As you can see, together these operators and flags start to build up to a complex logical language, giving you the ability to achieve very specific results across large, unordered data sets.
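To make the operators above more concrete, here is a minimal sketch in Python's built-in re module. The URL paths and the example.com domain are invented for illustration, and Python spells the flags differently from the single letters listed above: re.IGNORECASE plays the role of i, and re.findall returns every match, much like g.

```python
import re

# Hypothetical URL paths, invented for illustration.
paths = ["/blog/seo-tips", "/blog/2021/regex-guide", "/shop/item-7", "/Blog/archive"]

# ^ anchors the start, (blog|shop) is an 'or' inside a sub-expression,
# .* matches zero or more characters, and \d matches a single digit.
with_digits = [p for p in paths if re.search(r"^/(blog|shop)/.*\d", p)]

# re.IGNORECASE is Python's counterpart of the 'i' flag.
any_case_blog = [p for p in paths if re.search(r"^/blog/", p, re.IGNORECASE)]

# ? makes the preceding 's' optional, \. escapes the dot so it only
# matches a literal dot, and $ anchors the end of the string.
scheme = re.compile(r"^https?://example\.com$")

# re.findall returns every match, similar to the 'g' flag.
all_digits = re.findall(r"\d", "item-7 of 12")

print(with_digits)
print(any_case_blog)
print(all_digits)
```

Only the paths under /blog/ or /shop/ that contain a digit end up in with_digits, while the case-insensitive search also picks up /Blog/archive.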
How Do You Use Regex For SEO?
Regex can be used to explore the queries different user segments use, which queries are common to specific content areas, which queries drive traffic to specific parts of your site, and more.
Hamlet Batista, for example, has demonstrated how to use regex in Python to analyze server log files.
And Chris Long has shown how to use regex to extract the position, item, and name of the breadcrumbs associated with each URL of your site as part of a scalable keyword research and segmentation process.
Using Regex On Google Analytics
One of the most common uses of regex for SEO is in Google Analytics, where regular expressions can be used to set up filters so that you only see the data you want to see.
In this sense, the expression is used to exclude results, rather than to generate a set of inclusive search results.
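For example, if you want to exclude data from IP addresses on your local area network, you might filter out the 192.168.*.* range, which runs from 192.168.0.0 to 192.168.255.255. Written as an actual regular expression, that range is ^192\.168\.\d+\.\d+$, because an unescaped dot matches any character rather than a literal dot.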
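As a rough sketch of that exclusion, the Python snippet below applies one possible pattern for the 192.168.x.x range to a handful of made-up IP addresses. The pattern and the sample list are assumptions for illustration, not the exact filter Google Analytics would require.

```python
import re

# Escaped dots match literal dots; \d{1,3} matches one to three digits.
# Note this also accepts out-of-range octets such as 999, which is
# usually acceptable for a reporting filter.
lan_pattern = re.compile(r"^192\.168\.\d{1,3}\.\d{1,3}$")

# Hypothetical visitor IP addresses.
ips = ["192.168.0.1", "192.168.255.255", "192.169.0.1", "10.0.0.5", "1192.168.0.1"]

# Keep only the addresses that do NOT match the local-network pattern.
external = [ip for ip in ips if not lan_pattern.match(ip)]

print(external)
```

The ^ and $ anchors matter here: without them, an address such as 1192.168.0.1 would also be filtered out because it merely contains the pattern.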
More Advanced Regex SEO Filters
As a more complex example, let’s imagine you have two brands: regex247 and regex365.
You might want to filter results that match any combination of URLs that contain these brand names, such as regex247.biz or www.regex365.org.
One way to do this is with a fairly simple ‘or’ expression:
This would remove all matching URLs from your Analytics data, including subfolder paths and specific page URLs that appear on those domain names.
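One plausible form of such an ‘or’ expression is regex(247|365), which is equivalent to regex247|regex365. The sketch below tries it in Python against a few invented hostnames; both the pattern and the sample data are assumptions made for illustration.

```python
import re

# 'regex' followed by either 247 or 365; the pipe is the 'or' operator.
brand_pattern = re.compile(r"regex(247|365)")

# Hypothetical hostnames and URLs.
hostnames = [
    "regex247.biz",
    "www.regex365.org",
    "shop.regex247.biz/sale/item",
    "example.com",
    "regex24.net",
]

# An exclude filter keeps only the rows that do not match.
kept = [h for h in hostnames if not brand_pattern.search(h)]

print(kept)
```

Because the pattern is not anchored with ^ or $, it matches the brand name anywhere in the string, which is why subdomains and page paths are caught as well.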
A Word Of Warning
It is worth noting that, much like a robots.txt file, a poorly written regular expression can quite easily filter out most or all of your data by including an unrestricted wildcard match.
The good news is that in many SEO cases, the filter is only applied to your data at the reporting stage, and by editing or deleting your regular expression, you can restore full visibility to your data.
You can also test regular expressions in a number of online testing tools to check that they achieve the intended outcome, which lets you ‘sandbox’ your expressions before you let them loose across your entire data set.
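If you prefer to sandbox an expression locally, a few lines of Python can show how much of a sample it would match. The match_rate helper and the sample paths below are hypothetical, written only to illustrate how an unrestricted wildcard swallows everything.

```python
import re

def match_rate(pattern, samples):
    """Return the fraction of sample strings that the pattern matches."""
    compiled = re.compile(pattern)
    hits = sum(1 for s in samples if compiled.search(s))
    return hits / len(samples)

# Hypothetical page paths standing in for a slice of real data.
samples = ["/blog/regex-guide", "/blog/seo-tips", "/shop/item-7", "/contact"]

# A targeted pattern matches only the blog pages.
targeted = match_rate(r"^/blog/", samples)

# Adding an unrestricted '.*' alternative matches every row.
too_broad = match_rate(r"^/blog/|.*", samples)

print(targeted, too_broad)
```

A match rate close to 1.0 on a varied sample is a useful warning sign that an exclude filter built on that expression would remove nearly all of your data.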
To create regex filters on Google Analytics, first, navigate to the type of Report you want to create (e.g. Behaviour > Site Content > All Pages or Acquisition > All Traffic > Source/Medium).
Below the graph, at the top of the data table, look for the search box and click advanced to display the advanced filter options.
Here you can include or exclude data based on a particular dimension or metric. In the dropdown list after you select your dimension, choose Matching RegExp and then enter your expression into the text box.
‘Or’ And ‘And’ In Google Analytics Regex
To create an ‘or’ expression in Google Analytics, just include the pipe character (the | vertical stroke symbol) between the appropriate segments of your expression.
Google Analytics regular expressions do not support ‘and’ statements within a single regex; however, you can just add another filter to achieve this.
Below your first regex, just click Add a dimension or metric and enter your next regex. In this way, you can stack as many expressions as you want and they will be processed as a single logical ‘and’ statement when filtering your data.
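The stacking behaviour described above can be imitated in Python by requiring every pattern in a list to match. This is only a sketch of the logic, with invented patterns and page paths, and it does not call Google Analytics itself.

```python
import re

# Each compiled pattern stands in for one stacked filter row.
filters = [
    re.compile(r"^/blog/"),    # first filter: page is in the blog
    re.compile(r"regex|seo"),  # second filter: an 'or' within one regex
]

# Hypothetical page paths.
pages = ["/blog/regex-guide", "/blog/company-news", "/shop/seo-audit", "/blog/seo-tips"]

# all() applies the logical 'and' across the stacked filters.
both = [p for p in pages if all(f.search(p) for f in filters)]

print(both)
```

Each individual filter can still use the pipe for ‘or’, so combining the two gives you ‘and’ between filters and ‘or’ within them.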
Using Regex In Google Search Console
In 2021, Google Search Console began supporting the RE2 syntax of regex, allowing webmasters to include and exclude data within the user interface.
You’ll find all metacharacters supported by Google Search Console in this RE2 regex syntax reference on GitHub.
At the time of writing, there is a limit of 4096 characters (which is usually enough…).
In Search Console, for example, you can filter for queries containing a specific brand and the variations users might type, such as Facebook:
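As a sketch, a pattern along the lines of face ?b(oo|o)k|\bfb\b would catch the full name, a spaced version, a common misspelling, and the abbreviation. The pattern and the sample queries are assumptions for illustration; it is tested here with Python's re module, whose syntax overlaps with RE2 for simple expressions like this one.

```python
import re

# ' ?' allows an optional space, (oo|o) tolerates a dropped letter,
# and \bfb\b matches 'fb' only as a whole word.
brand_pattern = re.compile(r"face ?b(oo|o)k|\bfb\b", re.IGNORECASE)

# Hypothetical search queries.
queries = [
    "Facebook login",
    "face book ads",
    "facebok marketplace",
    "fb login",
    "instagram login",
]

branded = [q for q in queries if brand_pattern.search(q)]

print(branded)
```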
Filter out users finding your website through “commercial” intent terms:
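One hypothetical way to express this is a list of commercial modifiers joined with the pipe character, applied as an exclusion. The word list and queries below are invented for illustration, and the Python snippet only imitates the effect of an exclude filter.

```python
import re

# \b word boundaries stop 'buy' from matching inside longer words.
commercial = re.compile(r"\b(buy|price|cheap|discount)\b", re.IGNORECASE)

# Hypothetical search queries.
queries = [
    "buy running shoes",
    "running shoes price",
    "how to clean running shoes",
    "cheap trainers",
    "history of the marathon",
]

# Keep only the queries without a commercial modifier.
non_commercial = [q for q in queries if not commercial.search(q)]

print(non_commercial)
```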
Why Is Regex Important For SEO?
Finally, why does all this matter?
Well, it’s all about taking control of your data and filtering out the parts of it that don’t help you to improve your SEO – whether that’s particular pages or parts of your website, traffic from a specific source or medium, or your own local network data.
You can create quite simple regular expressions to achieve a basic ‘include’ or ‘exclude’ filter, or write longer expressions that work similarly to programming code to achieve complex and very specific results.
And with the right regex for each campaign, you can verify that your SEO efforts are achieving your aims, ambitions, and outcomes – a powerful way to prove positive ROI on your future SEO investments.