Writing a Regular Expression with Regulazy
The first tool we’ll need is Regulazy by Roy Osherove, available at Roy’s blog. Regulazy lets you enter the text you want to filter on and then build a regular expression (regex) by replacing the parts of the text that may change with pattern placeholders.
For example, say you get a lot of spam with “Viagra is sold here for cheap” in the subject. Unfortunately, spammers are pretty smart, so they’ll mess up basic filters by adding numbers or symbols into the phrase. This fools the filter, but doesn’t fool you.
You might receive spam with subject lines like these, all based on the one above:
- V1@gra is sold here for ch3ap
- V!agra 1s sold h3r3 for cheap
- V!@gra sold here for cheap
There are a lot of variations on that simple structure. You could attempt to come up with each variation and type it into your spam filter, but as soon as the spammers change a single character that you don’t have a case for, your filter will need updating.
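To see why listing every variation doesn’t scale, here is a minimal Python sketch of an exact-match filter. The names KNOWN_SPAM_SUBJECTS and is_spam_exact are made up for illustration and are not part of any real spam filter; the last subject line is an invented variation that isn’t in the list.

```python
# Hypothetical exact-match filter: every known variation is typed in by hand.
KNOWN_SPAM_SUBJECTS = {
    "Viagra is sold here for cheap",
    "V1@gra is sold here for ch3ap",
    "V!agra 1s sold h3r3 for cheap",
    "V!@gra sold here for cheap",
}


def is_spam_exact(subject: str) -> bool:
    """Return True only if the subject is one of the listed strings."""
    return subject in KNOWN_SPAM_SUBJECTS


# A subject we already listed is caught...
print(is_spam_exact("V1@gra is sold here for ch3ap"))
# ...but a variation we never typed in is not.
print(is_spam_exact("V1agra is sold here for che@p"))
```

Every new spelling means another entry in the list, which is exactly the maintenance problem described above.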
Instead of dreaming up variations, you can use regular expressions to help build a framework for filtering. Regulazy is the tool that will help you out.
In order to use Regulazy you’ll need to have the Microsoft .NET Framework 2.0 installed – available here.
Writing the Expression
- Unzip Regulazy and fire it up.
- Using our example above, we’ll enter “Viagra is sold here for cheap” in the top text box.
- Click the Regex Edit button and click Yes on the prompt.
- The expression will now be displayed in the bottom window. This is about as basic as it gets. You could copy the regex, feed it into your spam filter, and any time “Viagra is sold here for cheap” pops up, it’ll be blocked. Obviously that’s not too exciting, so now we’ve got to get creative. See Figure 1.
- Using the example variations above, we know we see Viagra spelled a few different ways – “V1@gra”, “V!agra” and “V!@gra”. In this example, we know the expression will always start with V and end with “gra”. We can highlight the “ia” and right click on the selection.
- As you can see from Figure 2, we have a few different options. In the examples above, we know the 2nd and 3rd characters can each be a letter, number or symbol. In this case, we’ll want to click “2 anything”. This means that for our first word “Viagra”, as long as there is a “V” followed by any two characters and then “gra”, it will match the expression.
- Moving on to the next word, “is”, we can see from the examples that sometimes “is” is spelled with a number or isn’t there at all. In this case, we’ll highlight the word “is” in the top window, right click and select “0 or more anything”, meaning there may be no text at that position or any amount of it.
- We can repeat the previous two steps for the rest of the phrase, replacing the “e” characters in the words “here” and “cheap” with “1 letter/number”.
- Once you’re finished, you should have an expression that looks like that in Figure 3.
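If you’d like to check the idea outside of Regulazy, the steps above can be sketched by hand in Python. This is an approximation, not Regulazy’s actual output: it assumes “2 anything” behaves like `.{2}`, “0 or more anything” like `.*`, and “1 letter/number” like `\w` (which also accepts an underscore). It also assumes the space after “is” is absorbed into the wildcard, so the subject with “is” missing still matches. The names LITERAL and FLEXIBLE are made up for this example.

```python
import re

# The basic expression: the phrase exactly as typed, nothing flexible yet.
LITERAL = re.compile(re.escape("Viagra is sold here for cheap"))

# Hand-written approximation of the finished expression.
FLEXIBLE = re.compile(
    r"V.{2}gra "   # "V", any two characters, then "gra" and a space
    r".*"          # zero or more of anything where "is " used to be
    r"sold "
    r"h\wr\w "     # "here" with each "e" replaced by one letter/number
    r"for "
    r"ch\wap"      # "cheap" with the "e" replaced by one letter/number
)

subjects = [
    "Viagra is sold here for cheap",
    "V1@gra is sold here for ch3ap",
    "V!agra 1s sold h3r3 for cheap",
    "V!@gra sold here for cheap",
]

for subject in subjects:
    # Show whether each pattern finds the phrase anywhere in the subject.
    print(subject, bool(LITERAL.search(subject)), bool(FLEXIBLE.search(subject)))
```

The literal pattern only catches the first subject, while the flexible one catches all four. Because `.*` will match anything, the literal words around it (“sold”, “for”) are what keep the expression from flagging unrelated mail.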
Now that we have the expression written, we can do some testing and update our spam filter with the new expression. These topics will be covered in the second part of this series.
This post is part of the series: Free SPAM Filtering using Regular Expressions
This series walks you through creating a regular expression to filter spam, testing the expression, and implementing it in a popular free spam filter, Spamihilator.