PowerShell - where, .where or Where?
There are a number of different ways to filter data in PowerShell, and the options have expanded since the original release of v1.0. I thought it worth summarising them here, particularly given my experience of trying to explain the different choices in the PowerShell training I have delivered. Typically these revolve around a dataset and the word ‘where’ used in one form or another, however…
1) Filter on the left
The number one rule is: where possible, use the filtering options of the cmdlet you are originally working with to reduce the size of the dataset. Stay as far to the left of the command as possible, i.e. in the first step of a pipeline. That way you avoid generating a larger dataset than necessary and then having to use other ‘where’ tools to filter it down.
For example, Get-WmiObject has a -Filter parameter that reduces the data returned by the query. So if you are looking for NICs with IP enabled, you can ask WMI for only those adapters rather than retrieving all of them and piping the results to Where-Object.
Note: not all Get-* cmdlets have a filter-type parameter, so check the help for the cmdlet you are using to see whether it is possible.
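Get-WmiObject only exists in Windows PowerShell, so the sketch below demonstrates the same principle with the -Filter parameter of Get-ChildItem, which works on any platform. It creates a throwaway folder of hypothetical demo files, compares filtering on the left with filtering on the right, and then deletes the folder. The WMI form of the NIC example appears in the comments for reference.

```powershell
# Create a uniquely named temp folder with a mix of file types (hypothetical demo data).
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ("filter-left-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demoDir -Force | Out-Null
'a.log', 'b.log', 'c.txt', 'd.csv' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $demoDir $_) -Force | Out-Null
}

# Filter on the left: the provider only returns the matching files.
$leftFiltered = Get-ChildItem -Path $demoDir -Filter '*.log'

# Filter on the right: every file is returned, then most are thrown away by Where-Object.
$rightFiltered = Get-ChildItem -Path $demoDir | Where-Object { $_.Extension -eq '.log' }

# The NIC example from the text (Windows PowerShell only) would be:
#   Get-WmiObject -Class Win32_NetworkAdapterConfiguration -Filter "IPEnabled = True"
# rather than:
#   Get-WmiObject -Class Win32_NetworkAdapterConfiguration | Where-Object { $_.IPEnabled }

# Tidy up the demo folder.
Remove-Item -Path $demoDir -Recurse -Force

"Left: $($leftFiltered.Count)  Right: $($rightFiltered.Count)"
```

Both approaches end up with the same two files. Only the left-filtered version avoids creating objects for files it will never use.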
2) Where-Object
Available since PowerShell v1.0, Where-Object is often one of the first cmdlets people learn about. It lets you take a dataset and pass it down the PowerShell pipeline for filtering.
Note: just to confuse things, sometimes Where-Object is shortened to Where, since Where is an alias for Where-Object!
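Here is the classic syntax as a minimal sketch. Real WMI output is not needed to show it, so `$nics` is a hypothetical array of objects standing in for Win32_NetworkAdapterConfiguration results, with only the Description and IPEnabled properties.

```powershell
# Hypothetical sample data standing in for NIC objects returned by WMI.
$nics = @(
    [pscustomobject]@{ Description = 'Ethernet adapter'; IPEnabled = $true }
    [pscustomobject]@{ Description = 'WAN Miniport';     IPEnabled = $false }
    [pscustomobject]@{ Description = 'Wi-Fi adapter';    IPEnabled = $true }
)

# Classic syntax: the filter is a script block in curly braces,
# and $_ refers to the current object in the pipeline.
$enabled = $nics | Where-Object { $_.IPEnabled -eq $true }

# Identical filter using the Where alias.
$enabledViaAlias = $nics | Where { $_.IPEnabled -eq $true }

$enabled.Description
```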
Where-Object is one of the fundamentals of PowerShell. However, in classes, students new to the language have often struggled with the somewhat fiddly syntax of enclosing the filter criteria in curly braces, and particularly with using $_.propertyname to refer to a property of the current object in the pipeline. So step forward…
3) Simplified Syntax
PowerShell v3 introduced a simplified syntax in certain areas to help alleviate some of this pain (there is an excellent reference on Keith Hill’s blog here). From then on it was possible to drop the curly braces and the $_ and write a more SQL-style where query, which for most people seems a more natural way to write these things.
Note: just to confuse things again, it is possible to use the full Where-Object in this style too!
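The sketch below reuses the same hypothetical `$nics` sample data and shows the simplified syntax in both its alias form and its full cmdlet form. The curly braces and `$_` are gone.

```powershell
# Hypothetical sample data standing in for NIC objects returned by WMI.
$nics = @(
    [pscustomobject]@{ Description = 'Ethernet adapter'; IPEnabled = $true }
    [pscustomobject]@{ Description = 'WAN Miniport';     IPEnabled = $false }
    [pscustomobject]@{ Description = 'Wi-Fi adapter';    IPEnabled = $true }
)

# Simplified (v3+) syntax: property name, comparison operator, value.
$enabled = $nics | Where IPEnabled -eq $true

# The full cmdlet name works in this style too.
$enabledFull = $nics | Where-Object IPEnabled -eq $true

$enabled.Description
```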
The question then became: which one do you teach students? Typically I taught the standard Where-Object syntax first and revealed the simplified syntax later. However, we often ended up covering it earlier, because many students would write it the SQL-style way without even knowing it was possible; it seemed to be the natural way they wanted to do it.
4) .where Method
Introduced in PowerShell v4, the .where method lets you filter a collection when you do not need or want to stream data through the pipeline. Continuing our example, it is possible to do this:
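The original screenshot is not available, so this is only a sketch of the idea. It uses the hypothetical `$nics` sample data again, and the WMI form is given in a comment. The method takes a script block, and inside it `$_` is the current element of the collection.

```powershell
# Hypothetical sample data standing in for NIC objects returned by WMI.
$nics = @(
    [pscustomobject]@{ Description = 'Ethernet adapter'; IPEnabled = $true }
    [pscustomobject]@{ Description = 'WAN Miniport';     IPEnabled = $false }
    [pscustomobject]@{ Description = 'Wi-Fi adapter';    IPEnabled = $true }
)

# Against real WMI data this would look something like:
#   (Get-WmiObject -Class Win32_NetworkAdapterConfiguration).where({ $_.IPEnabled -eq $true })

# .where() filters the whole in-memory collection in one go, with no pipeline.
$enabled = $nics.where({ $_.IPEnabled -eq $true })

# The parentheses can be dropped when the script block is the only argument.
$enabledNoParens = $nics.where{ $_.IPEnabled -eq $true }

$enabled.Description
```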
Why would you want to do this, particularly given that it is arguably even fiddlier to write than the original Where-Object syntax, and we now have the clearer simplified syntax as well? The answer is performance. On a small dataset, the difference between pipeline and non-pipeline is only a matter of milliseconds:
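The original timings are not reproduced here. Instead, this sketch shows one way to compare the two approaches with Measure-Command on the small hypothetical `$nics` dataset. Your numbers will depend on your machine and PowerShell version, so no results are claimed.

```powershell
# Hypothetical sample data standing in for NIC objects returned by WMI.
$nics = @(
    [pscustomobject]@{ Description = 'Ethernet adapter'; IPEnabled = $true }
    [pscustomobject]@{ Description = 'WAN Miniport';     IPEnabled = $false }
    [pscustomobject]@{ Description = 'Wi-Fi adapter';    IPEnabled = $true }
)

# Time the pipeline (streaming) version.
$pipeline = Measure-Command { $nics | Where-Object { $_.IPEnabled -eq $true } }

# Time the .where() (non-streaming) version.
$method = Measure-Command { $nics.where({ $_.IPEnabled -eq $true }) }

"Where-Object: {0:N3} ms" -f $pipeline.TotalMilliseconds
".where():     {0:N3} ms" -f $method.TotalMilliseconds
```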
With a large dataset, however, the difference can be significant. In this contrived example, it was 6 seconds vs 2 seconds:
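This is a similarly contrived sketch, not the original example. It filters a range of integers for even numbers both ways. The `$size` value is an arbitrary assumption, kept modest so it runs quickly. Raise it to see the gap widen, and expect your timings to differ from those quoted above.

```powershell
# Contrived large dataset; $size is an arbitrary choice for illustration.
$size = 200000
$numbers = 1..$size

# Streaming: each number is passed down the pipeline to Where-Object.
$pipeline = Measure-Command { $numbers | Where-Object { $_ % 2 -eq 0 } }

# Non-streaming: the whole array is filtered in one method call.
$method = Measure-Command { $numbers.where({ $_ % 2 -eq 0 }) }

# Sanity check that both approaches return the same number of items.
$pipelineCount = @($numbers | Where-Object { $_ % 2 -eq 0 }).Count
$methodCount   = $numbers.where({ $_ % 2 -eq 0 }).Count

"Where-Object: {0:N2} s" -f $pipeline.TotalSeconds
".where():     {0:N2} s" -f $method.TotalSeconds
```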
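Depending on what you are doing, streaming or non-streaming may suit you better. The .where method needs the whole collection in memory first, whereas Where-Object can process objects one at a time as they arrive. So it is worth trying each one in your scenario to determine the best option.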
The .where method also supports some additional options, known as modes (documented here), which are quite nice.
Similar to Select-Object, there are First and Last modes. So:
Also interesting is the Split mode, which effectively splits the collection into two parts: the first containing the items that meet the condition and the second containing those that don’t.
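The sketch below shows the First and Last modes on a simple range of numbers rather than NIC data, purely for brevity. The mode is passed as the second argument. An optional third argument sets how many matches to return.

```powershell
$numbers = 1..10

# First mode: stop at the first match, much like Select-Object -First 1.
$firstEven = $numbers.where({ $_ % 2 -eq 0 }, 'First')           # 2

# Pass a count as the third argument to return more than one match.
$firstThreeEven = $numbers.where({ $_ % 2 -eq 0 }, 'First', 3)   # 2, 4, 6

# Last mode: take matches from the end of the collection (here 8 and 10).
$lastTwoEven = $numbers.where({ $_ % 2 -eq 0 }, 'Last', 2)
```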
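Here is a minimal sketch of Split mode with the hypothetical `$nics` sample data. The method returns two collections, so multiple assignment can unpack the matches and the non-matches into separate variables in one statement.

```powershell
# Hypothetical sample data standing in for NIC objects returned by WMI.
$nics = @(
    [pscustomobject]@{ Description = 'Ethernet adapter'; IPEnabled = $true }
    [pscustomobject]@{ Description = 'WAN Miniport';     IPEnabled = $false }
    [pscustomobject]@{ Description = 'Wi-Fi adapter';    IPEnabled = $true }
)

# Split mode: the first collection holds matches, the second holds the rest.
$enabled, $disabled = $nics.where({ $_.IPEnabled -eq $true }, 'Split')

$enabled.Description    # Ethernet adapter, Wi-Fi adapter
$disabled.Description   # WAN Miniport
```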