Ezine 188 – PowerShell’s Regex

Introduction to PowerShell Regex – Regular Expression

Typical jobs for regex are finding patterns in text and replacing characters or even whole words.  Confusion often arises when numbers mix with text, and that’s when a PowerShell script can solve the problem.  For example, telephone numbers and bank sort codes can be tricky to process because they contain dashes or a specific grouping of digits.  Keep in mind that you would rarely use regex in isolation; my examples are designed to help you master this one technique so that you can incorporate pattern recognition into a bigger script.

Introducing Regex

It all started back in the days when DOS was king.  How we loved typing the wildcard * if we wanted to display all files.  What PowerShell’s regex does is refine ‘all’ so that you can filter a sub-set of data into the output.  It’s defining the subset that makes regex so potent, yet so difficult to control unless you are an expert in its logic and its syntax.

The problem for beginners is that regex has a bewildering array of wacky syntactic symbols.  As a newbie you may find that other people’s examples do not make sense; furthermore, when you experiment, one wrong character is enough to stop the command producing the desired results.  My mission is to give you a grounding in the basic structure of regex; from there it will be over to you to employ regex to solve your particular problem.

Start with -match

Before investigating regex in depth, it helps to gain experience by testing comparison operators such as -match, -like or -contains.  Of these, only -match uses regular expressions; -like uses wildcards, and -contains tests whether a collection holds an item.  Please note that my examples contain variables; while they aren’t strictly necessary, $Variables help me to identify different sections of the expression.

$Name = "Alan Thomas 1949"
$Name -match "Alan"

$Matches  # Command for built-in variable

Expected PowerShell result:
True

Note in passing that after a successful match PowerShell also populates the automatic $Matches variable, which I use for troubleshooting unexpected results.
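
To see how the three operators differ, here is a minimal sketch that reuses the same $Name value.  The $Words array and the result variable names are my own additions for illustration; note that -contains checks a collection for an exact item rather than searching inside a string.

```powershell
$Name = "Alan Thomas 1949"

# -match uses a regular expression and searches anywhere in the string
$MatchResult = $Name -match "Thomas"

# -like uses wildcards (* and ?) and must account for the whole string
$LikeResult = $Name -like "*Thomas*"

# -contains checks whether a collection holds an exact item
$Words = $Name -split " "          # hypothetical helper array: Alan, Thomas, 1949
$ContainsResult = $Words -contains "Thomas"

# $Matches[0] holds the text found by the most recent successful -match
$Name -match "\d{4}" | Out-Null
$Year = $Matches[0]

$MatchResult, $LikeResult, $ContainsResult, $Year
```

All three tests should report True, and $Year should hold the four digits at the end of the string.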

PowerShell Regular Expression Examples Using Regex

[Regex]::IsMatch() can be considered the same as -cmatch (case-sensitive match).  Developers who work with regular expressions often prefer [Regex]::IsMatch(), on the grounds that this method is nearer the underlying .NET class System.Text.RegularExpressions.Regex.  However, Guy favours -match or -cmatch as they are shorter.  Bear in mind that -match ignores case by default, so it only produces the same results as IsMatch() when case does not matter; -cmatch is the true equivalent.

$Name = "Alan Thomas 1949"
[Regex]::IsMatch($Name,"Alan")

The expected result is
True

In the above example [Regex] calls for the method IsMatch().  Then it’s up to us to supply two values, the input string ($Name) followed by a comma, and the pattern to match ("Alan").  While I have chosen to control the input via a $Variable, you could simplify the expression thus:

[Regex]::IsMatch("Alan Thomas 1949","Alan")
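
The difference in case handling between the operators and the method is easy to check for yourself.  The following is a small sketch rather than a definitive test; the variable names are my own, and the lower-case pattern "alan" is chosen deliberately so that only a case-insensitive test succeeds.

```powershell
$Name = "Alan Thomas 1949"

# -match ignores case, so the lower-case pattern still matches
$Insensitive = $Name -match "alan"

# -cmatch respects case, so the same pattern fails
$Sensitive = $Name -cmatch "alan"

# IsMatch() is case sensitive by default, like -cmatch
$IsMatch = [Regex]::IsMatch($Name, "alan")

# Passing RegexOptions.IgnoreCase makes IsMatch() behave like -match
$IgnoreCase = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$IsMatchIgnoreCase = [Regex]::IsMatch($Name, "alan", $IgnoreCase)

$Insensitive, $Sensitive, $IsMatch, $IsMatchIgnoreCase
```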

Basic Regex Punctuation

Quotes, or "speech marks" play a key role with Regex.  In PowerShell’s regular expression constructions it does not matter if you use single or double quotes.  The only difference between the example above and the example below is the type of quotes.  However, double quotes come into their own when the PowerShell expression contains a $variable that needs expanding; single quotes would treat the $variable as a literal.

$Name = 'Alan Thomas 1949'
[Regex]::IsMatch($Name,'Alan')
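
Here is a short sketch of where the choice of quotes does matter.  The $First variable is an assumption introduced purely for this illustration.

```powershell
$First = "Alan"
$Name  = "Alan Thomas 1949"

# Double quotes: PowerShell expands $First, so the pattern becomes ^Alan
$Expanded = [Regex]::IsMatch($Name, "^$First")

# Single quotes: the pattern stays as the literal text ^$First,
# which regex reads as start-anchor, end-anchor, then the word First
$Literal = [Regex]::IsMatch($Name, '^$First')

$Expanded, $Literal
```

The first test succeeds because the variable is expanded before regex ever sees the pattern; the second fails because regex receives the dollar sign itself.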

The style of brackets is always significant in PowerShell.  IsMatch() requires round brackets (parentheses), whereas a character set within the search pattern, such as [a-z], needs a pair of square brackets.  If you see pattern matching code with {curly} brackets, they usually refer to quantifiers.  Once again, follow the correct bracket syntax, or else you will get unexpected results.

Let us suppose we wanted to test the data for either the name Alan or Alun.  Here is a simple pattern matching example where the third letter can be either ‘a’ or ‘u’.  Another method would be to employ the period ‘.’, which matches any single character; that would require "Al.n" and not "Al[.]n", because inside square brackets the period means a literal full stop.  Another use of square brackets is to check for a digit, for example [0-9].  In this case the dash tells PowerShell to expect a contiguous range of digits from zero to nine.

$Name = "Alan Thomas 1949"
[Regex]::IsMatch($Name,"Al[au]n")
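
To compare the square bracket set with the period, try the sketch below.  The list of names is invented for the illustration, with "Alen Smith" included to show what the period lets through.

```powershell
$Names = "Alan Thomas 1949", "Alun Jones", "Alen Smith"

# [au] accepts only 'a' or 'u' as the third letter
$Bracket = $Names | Where-Object { $_ -cmatch "Al[au]n" }

# The period accepts any single character, so Alen matches as well
$Period = $Names | Where-Object { $_ -cmatch "Al.n" }

# [0-9] finds any single digit
$HasDigit = $Names | Where-Object { $_ -match "[0-9]" }

"Bracket : $($Bracket -join ', ')"
"Period  : $($Period -join ', ')"
"Digits  : $($HasDigit -join ', ')"
```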

Use of +

At first I could not see the point of incorporating the + symbol in regex expressions, but then I had a particular problem: some people spell their name Allan.  How could I cope with this double ll?  The answer was to insert a plus into the pattern, thus:

$Name = "Allan Thomas 1949"
[Regex]::IsMatch($Name,"Al+[au]n")

In summary, I now have a pattern that finds Alun, Alan and Allan.  Without the + it’s difficult to find Allan, as it contains 5 letters whereas Alan and Alun have only 4.  The + means one or more of the preceding character (whereas * means zero or more), so "l+" matches l, ll, or a longer run.  To see the power, and point, of this symbol, try removing the plus.
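
The following sketch puts the quantifiers side by side.  The test names, including the deliberately odd "Aan", are my own, and I have added ^ and $ anchors so that each pattern must account for the whole name.

```powershell
$Names = "Alan", "Allan", "Alun", "Aan"

# No quantifier: exactly one 'l' is required
$NoQuantifier = $Names | Where-Object { $_ -cmatch "^Al[au]n$" }

# Plus: one or more 'l', so Allan now matches
$Plus = $Names | Where-Object { $_ -cmatch "^Al+[au]n$" }

# Star: zero or more 'l', so even Aan matches
$Star = $Names | Where-Object { $_ -cmatch "^Al*[au]n$" }

"None : $($NoQuantifier -join ', ')"
"Plus : $($Plus -join ', ')"
"Star : $($Star -join ', ')"
```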

Backslash has several roles in regex.  Firstly, it introduces anchors such as \b (word boundary).  Secondly, it introduces character-class shorthands; for example, \s means whitespace and not the letter ‘s’.  Thirdly, it acts as an escape character that turns a special symbol into a literal, for example the period ‘.’ in the IP example below.

Here is an example which employs \d (decimal digit) to match the basic format of an IP address; however, it does not reject numbers bigger than 255.

$ipaddress = "192.168.10.10"
$ipaddress -match "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

\d tests for a digit (as opposed to a letter), while the curly brackets {1,3} mean 1, 2, or 3 of them.  Each \. matches a literal period.
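
If you do need to reject out-of-range numbers, one possible approach is to spell out the permitted values for each octet.  This is a sketch under my own assumptions, not the only way to validate an address: the $Octet and $Strict names are hypothetical, leading zeros are rejected by choice, and I use [0-9] rather than \d because .NET’s \d also accepts digits from other scripts.

```powershell
# The simple pattern from above: checks the shape only
$Loose = "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero
$Octet = '(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# Three octets each followed by a period, then a final octet,
# anchored so that nothing else may surround the address
$Strict = '^(' + $Octet + '\.){3}' + $Octet + '$'

foreach ($Address in "192.168.10.10", "999.168.10.10", "192.168.10") {
    "{0,-15} loose: {1,-5} strict: {2}" -f $Address, ($Address -match $Loose), ($Address -match $Strict)
}
```

Only the first address should pass both tests; the second passes the loose pattern alone, and the third passes neither.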

Matching the Beginning and End of Strings – Anchors

In many countries Thomas could be a first name or a surname.  If we wanted to search for only a last name of "Thomas" then we would append the $ anchor, thus we need Thomas$.  Naturally, we have to assume that the surname would be at the very end of the data input.

$Name = "Alan Thomas"
[Regex]::IsMatch($Name,"Thomas$")

To learn more about anchors, compare the above example with $Name = "Thomas Smith", or my original example, $Name = "Alan Thomas 1949"

Alternatively, if we wanted to search specifically for a first name of Thomas, then we would employ the ^ caret symbol, which anchors the match to the beginning of the string.  For example, ^Thomas.

Other anchors include \b, which matches on a word boundary, in other words the beginning or end of a word.
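
The three anchors can be compared with a short sketch.  The list of names is my own; "Alan Thomasson" is there to show that \b refuses a match in the middle of a longer word.  I use single quotes so that there is no doubt the $ reaches regex as an anchor.

```powershell
$Names = "Alan Thomas", "Thomas Smith", "Alan Thomas 1949", "Alan Thomasson"

# $ anchors the match to the end of the string
$Surname = $Names | Where-Object { $_ -cmatch 'Thomas$' }

# ^ anchors the match to the beginning of the string
$FirstName = $Names | Where-Object { $_ -cmatch '^Thomas' }

# \b on both sides demands Thomas as a whole word, wherever it appears
$WholeWord = $Names | Where-Object { $_ -cmatch '\bThomas\b' }

"Surname    : $($Surname -join ', ')"
"First name : $($FirstName -join ', ')"
"Whole word : $($WholeWord -join ', ')"
```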

Advanced Search Techniques

$strText = "The man called Mr Grey wore the big red coat."
$Pattern = "the"
$matched = [regex]::Matches($strText, $Pattern)

"Using the Matches method, we get the following:"

$matched |format-table index, length, value -auto

Note: This example will only find one instance of ‘the’, because [regex]::Matches is case sensitive.  To make the search case insensitive, try introducing (?i) before ‘the’, thus: $Pattern = "(?i)the".  As a result you should find two instances of ‘the’.
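
Here is a sketch that runs both versions of the pattern and then walks through the matches one at a time.  The variable names $CaseSensitive and $CaseInsensitive are my own.

```powershell
$strText = "The man called Mr Grey wore the big red coat."

# Case-sensitive search: the capitalised 'The' is skipped
$CaseSensitive = [regex]::Matches($strText, "the")

# (?i) at the start of the pattern switches on case-insensitive matching
$CaseInsensitive = [regex]::Matches($strText, "(?i)the")

"Case sensitive   : $($CaseSensitive.Count) match(es)"
"Case insensitive : $($CaseInsensitive.Count) match(es)"

# Each Match object records what was found and where
foreach ($Match in $CaseInsensitive) {
    "'{0}' found at index {1}" -f $Match.Value, $Match.Index
}
```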

Alternative Match Techniques

The purpose is to check that you have a single block of letters and digits with no spaces or other characters.

$Block = [regex]"^[A-Za-z0-9]*$"
$Block.Match("GuyThomas") |
    Format-Table Value, Success, Length -AutoSize

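The block [A-Za-z0-9] caters for upper case [A-Z] and lower case [a-z] letters and digits [0-9].  The ^ and $ anchors force the whole string to consist of such characters, and because * means zero or more, an empty string also passes.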
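
To see how this pattern treats different inputs, try the sketch below.  The $Strict variant, which swaps * for +, is my own suggestion for cases where an empty string should be refused.

```powershell
$Block  = [regex]"^[A-Za-z0-9]*$"
$Strict = [regex]"^[A-Za-z0-9]+$"   # hypothetical variant: at least one character

$NoSpaces    = $Block.IsMatch("GuyThomas")     # letters only
$WithSpace   = $Block.IsMatch("Guy Thomas")    # the space is not in the set
$EmptyBlock  = $Block.IsMatch("")              # * permits zero characters
$EmptyStrict = $Strict.IsMatch("")             # + requires at least one

$NoSpaces, $WithSpace, $EmptyBlock, $EmptyStrict
```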

Using Regex to Replace Text

$strText = "The man wearing the gray overcoat"
$Pattern = "gray"   # [regex]::Replace is case sensitive, so use the same case as the text
$New = "grey"
$strReplace = [regex]::Replace($strText, $Pattern, "$New")
"We will now replace $Pattern with $New :"

$strReplace

The key command here is replace, as in [Regex]::Replace.  Observe how Replace has three arguments: the input text, the pattern to search for and, finally, the replacement text.

Notice in passing that because we employ the double quotes PowerShell expands the variables $Pattern and $New.
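
PowerShell also has a -replace operator, which is a shorter route to the same result.  The sketch below compares the two; the "Thomas, Alan" string in the last example is my own, added to show how bracketed groups in the pattern can be reused in the replacement.

```powershell
$strText = "The man wearing the gray overcoat"

# [regex]::Replace is case sensitive, so the pattern must use the same case as the text
$Static = [regex]::Replace($strText, "gray", "grey")

# The -replace operator ignores case by default
$Operator = $strText -replace "GRAY", "grey"

# $1 and $2 refer to the first and second bracketed groups in the pattern;
# single quotes stop PowerShell treating them as its own variables
$Swapped = "Thomas, Alan" -replace '(\w+), (\w+)', '$2 $1'

$Static
$Operator
$Swapped
```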

Resources

  • Cheat Sheet
  • At the PowerShell command line try:
    help about_Regular_Expressions

More Examples of the Regex Family

  • Regex.Regex(String, RegexOptions) class constructor.
  • Regex.Split(String, String, RegexOptions) method.
  • Regex.IsMatch(String, String, RegexOptions) method.
  • Regex.Match(String, String, RegexOptions) method.
  • Regex.Matches(String, String, RegexOptions) method.
  • Regex.Replace(String, String, String, RegexOptions) and also:
  • Regex.Replace(String, String, MatchEvaluator, RegexOptions) methods.
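
Here is a brief sketch of three members of the family in action: the constructor, Split and Match.  The sample strings and variable names are assumptions of mine; the sort code "12-34-56" is invented for the illustration.

```powershell
$Options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

# Constructor: build a reusable Regex object with its options baked in
$Regex = New-Object System.Text.RegularExpressions.Regex -ArgumentList "thomas", $Options
$Found = $Regex.IsMatch("Alan Thomas 1949")

# Split: break a sort code into its parts at each dash
$Parts = [Regex]::Split("12-34-56", "-")

# Match: return only the first match, here the first run of digits
$FirstNumber = [Regex]::Match("Alan Thomas 1949", "\d+", $Options)

$Found
$Parts
$FirstNumber.Value
```

Building a Regex object once is handy when the same pattern is tested against many strings, whereas the static methods suit one-off checks.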

Summary of Microsoft PowerShell Regex

Regex is bigger and better than the old DOS * wildcard.  The only problem is that the increased ability to control regular expressions brings greater complexity for the beginner.  As ever, my advice is to start slowly, choose a simple example and then build on success.  The key to mastering regex is to understand the syntax.
