Optimising Microsoft Graph PowerShell scripts

We have probably all been there and developed a PowerShell script that took a fair amount of time to complete, haven't we? Of course, one could argue that as long as a script ‘works’ it is good enough, but depending on the use case and environment, a PowerShell script that runs 30 to 60 minutes exceeds the patience of most (IT) people and can also lead to increased costs. But what makes those kinds of scripts so awfully slow, and can't we just tweak them to run faster?

I have read and (tried to) understand hundreds of PowerShell scripts by now, and I often notice the following flaws that lead to poor performance:

Slow algorithms are inextricably linked to the understanding of data structures when it comes to PowerShell scripting. Here are the top two cases I often observe:

Where-Object

Where-Object queries within a loop looking for a particular element in a collection can be very time consuming when we actually expect a single result or match.

This pattern can be avoided by creating a hash table: we check whether the hash table contains the key we're looking for and then access the element directly.

The creation of the hash table can be easily delegated to the Group-Object cmdlet. The specified property will then become the key to the hash table.
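A minimal sketch of that refactoring (the collections and the UserPrincipalName key are illustrative assumptions):

```powershell
# Slow: a linear Where-Object search on every iteration
foreach ($upn in $upnsToCheck) {
    $user = $allUsers | Where-Object { $_.UserPrincipalName -eq $upn }
}

# Fast: build the hash table once via Group-Object, then use O(1) lookups
$usersByUpn = $allUsers | Group-Object -Property UserPrincipalName -AsHashTable -AsString

foreach ($upn in $upnsToCheck) {
    if ($usersByUpn.ContainsKey($upn)) {
        $user = $usersByUpn[$upn]
    }
}
```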

Voilà, you just improved the time complexity of that Where-Object snippet from O(n²) (worst case: a linear search over n elements for each of n lookups) to O(n) (we still iterate over n elements, but each hash table lookup is O(1)) by using a hash table. Don't get scared by the O notation; it's just an estimate of the runtime behaviour of algorithms, and the smaller the expression inside O gets, the faster the code gets.

Array copy

You like to write $myArray += $someObject? Of course you can do so, but an array has a fixed size, so adding an element means the whole array needs to be copied in each iteration, and this is not optimal from a performance perspective.

PowerShell inherits all the different collection types from C#, and we can use them to get more performant collections like lists. Lists are semi-dynamic data structures designed to handle additions and removals of elements, and they perform better than frequent array copy operations.¹

Please also make sure that you no longer use the legacy System.Collections.ArrayList type. Not only does it have that annoying return value (the index of the added element) on each Add, it is also no longer recommended. The System.Collections.Generic namespace provides us a List implementation that we can and should use!²
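A minimal sketch of the difference:

```powershell
# Slow: += allocates a new array and copies all existing elements every time
$myArray = @()
foreach ($i in 1..10000) {
    $myArray += $i
}

# Fast: a generic List grows in place (amortised O(1) per Add)
$myList = [System.Collections.Generic.List[int]]::new()
foreach ($i in 1..10000) {
    $myList.Add($i)
}
```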

A bad design approach would be to first query all users of a tenant and then query the manager for each user individually within a loop. Imagine an Azure Active Directory tenant with 1000 users: this approach would lead to 1 + 1000 API requests (assuming that we have a page size of 1000 and only need one request to fetch the list of all users).
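A hedged sketch of both variants, assuming the Microsoft Graph PowerShell SDK; the $expand query parameter lets us fetch each user's manager within the list request itself:

```powershell
# Bad: 1 + n requests (one list call, then one manager call per user)
$users = Get-MgUser -All
foreach ($user in $users) {
    $manager = Get-MgUserManager -UserId $user.Id
}

# Better: one paged list call that expands the manager relationship
$users = Get-MgUser -All -ExpandProperty Manager
```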

To prevent bad architecture and design of scripts, it's essential to have profound knowledge of the API, and it is even more important that the API documentation provides the information about the different endpoints.

A common flaw of RESTful APIs is under- and/or over-fetching: we either receive too much data (that we might not need) or too little data, so we need to make multiple requests to actually retrieve the data we are interested in. Microsoft Graph API supports query parameters such as $filter, $select and $expand to counter this.

When it comes to filter and select operations, we want to apply them as early as possible within the process (this approach is sometimes referred to as filter left), ideally already when placing the API request.

So instead of:
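(a minimal sketch, assuming Get-MgUser and an illustrative filter on the department attribute)

```powershell
# Fetch ALL users over the wire and filter on the client afterwards
$salesUsers = Get-MgUser -All -Property Id, DisplayName, Department |
    Where-Object { $_.Department -eq 'Sales' }
```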

You should prefer to do:
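(the same query, but filtered and trimmed on the server)

```powershell
# Let the API filter and only return the attributes we actually need
$salesUsers = Get-MgUser -All -Filter "department eq 'Sales'" -Property Id, DisplayName, Department
```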

Of course, this only makes sense if we do not require the full list of users. If we do need all users, it is probably faster to fetch all users initially and then process them, but you can still optimise your query by using the select parameter and fetching only the attributes you need. For bigger amounts of data, Microsoft Graph also provides export operations for certain resources.³

Aaaaaand if you made it this far: here's where I actually want to set the focus of this post! Because after we have selected the right algorithms and data structures for our script and have the right design in place, one of the most time-consuming parts still remains: input and output, also referred to as I/O.

Console output

I/O can be output to the console or to a file. Both are comparatively slow operations, and writing to the console on every loop iteration in particular can add significant runtime.
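A quick way to see the cost for yourself (a sketch using Measure-Command; the absolute numbers will vary):

```powershell
# Loop with per-iteration console output
(Measure-Command {
    foreach ($i in 1..10000) { Write-Host "Processing $i" }
}).TotalMilliseconds

# The same loop without console output
(Measure-Command {
    foreach ($i in 1..10000) { $null = $i }
}).TotalMilliseconds
```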

API Requests

Each Microsoft Graph API request is transmitted as an HTTP request to the API. This involves multiple components, such as your machine (PowerShell, operating system, hardware), the network and Microsoft, to process your request and send you a response. It sounds obvious, but the more requests you place, the slooooooooower your scripts get. I count this towards I/O as well.

So once you have nailed all of the previously described points, we can try to optimise or tune the Microsoft Graph requests themselves in the next chapter.

Assuming we have a script that requires a lot of requests to Microsoft Graph resources we can try to optimise the process to save some runtime.

Before I started to play around with optimisations, I needed a baseline to measure the success of my adjustments. In my dev tenant I have 1280 fake users that I fetched at the beginning of my testing. For the subsequent tests I simply place a request for each user's details and measure only that part, as there is no possibility to improve the initial request stated above.
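As a sketch, the baseline looks like this (one Get-MgUser call per user; $users holds the initially fetched list):

```powershell
$results = [System.Collections.Generic.List[object]]::new()

# Baseline: one sequential request per user
foreach ($user in $users) {
    $results.Add((Get-MgUser -UserId $user.Id))
}
```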

Microsoft Graph offers a feature called batching or batch requests. Multiple requests can be combined into a single request, and the whole batch can be submitted to the API's batch endpoint via POST. We then receive a response containing all the batch results. It is even possible to create batches with dependent actions, but in my case I just used the simple variant of individual requests for users.

A maximum of 20 requests can be placed into a single batch. The response of the batch then needs to be reassembled to get a full list of responses.

Thanks to batching we can reduce the number of requests from 1280 individual requests to 1280 / 20 = 64 requests. Even before taking measurements this approach sounds very promising, as we drastically reduce the number of requests compared to the original approach.

A little downside is that, with the added logic for batching, our code fragment becomes slightly bigger and more difficult to understand, especially the part where we create the chunks with the individual requests and merge the responses together.
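A minimal sketch of such a batching loop, assuming the SDK's generic Invoke-MgGraphRequest cmdlet and the prefetched $users list:

```powershell
$responses = [System.Collections.Generic.List[object]]::new()

# Create chunks of 20 users (the batch size limit)
for ($i = 0; $i -lt $users.Count; $i += 20) {
    $chunk = $users[$i..([Math]::Min($i + 19, $users.Count - 1))]

    # One sub-request per user; the id identifies each response within the batch
    $body = @{
        requests = @(
            for ($j = 0; $j -lt $chunk.Count; $j++) {
                @{ id = "$j"; method = 'GET'; url = "/users/$($chunk[$j].Id)" }
            }
        )
    }

    # Submit the batch and merge the individual responses back together
    $result = Invoke-MgGraphRequest -Method POST -Uri 'https://graph.microsoft.com/v1.0/$batch' -Body $body
    foreach ($response in $result.responses) {
        $responses.Add($response.body)
    }
}
```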

With PowerShell 7 we have a new parameter for the ForEach-Object cmdlet: we can supply the -Parallel parameter to leverage concurrency⁵. Each script block then runs on a separate thread, and by default 5 threads will be spawned.

Writing concurrent code or scripts is actually easy, but writing concurrent code or scripts that work is difficult! Concurrent operations run in no particular order (non-deterministically), and we are responsible for using types and collections that are thread safe. This means we cannot use a regular list and need to switch to a thread-safe collection⁴ like System.Collections.Concurrent.ConcurrentBag, which can handle multiple threads modifying the collection.

Compared to the regular approach, PowerShell does not need to wait until a single request has completed; it can now place the requests in parallel and store the results in our collection.

What's also nice is that we only need a small change: the -Parallel parameter and the thread-safe collection. The downside is that this requires PowerShell 7, but with the #Requires statement we can document this dependency in an understandable way.
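A minimal sketch of the concurrent variant (the -ThrottleLimit is left at its default of 5):

```powershell
#Requires -Version 7.0

# Thread-safe collection so all threads can add results concurrently
$results = [System.Collections.Concurrent.ConcurrentBag[object]]::new()

$users | ForEach-Object -Parallel {
    # $using: brings the outer variable into the thread's scope
    $bag = $using:results
    $bag.Add((Get-MgUser -UserId $_.Id))
}
```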

By combining the previous approaches we can leverage both the batching and the concurrency features to submit and reassemble our batch requests to the API, but now the work can be shared amongst multiple threads and should hopefully be processed faster.

The snippet now has a serial part, where the individual batches are created, and a parallel/concurrent part, where the batches are submitted and reassembled. Compared to the baseline, the snippet has become quite large, and the intention of the code is probably not that self-explanatory anymore. So that is another side effect of optimisation: it can make your code more difficult to read.
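A sketch of the combined variant, reusing the chunking idea from above and submitting each batch on its own thread:

```powershell
#Requires -Version 7.0

# Serial part: build the batch bodies (chunks of 20 sub-requests)
$batches = for ($i = 0; $i -lt $users.Count; $i += 20) {
    $chunk = $users[$i..([Math]::Min($i + 19, $users.Count - 1))]
    @{
        requests = @(
            for ($j = 0; $j -lt $chunk.Count; $j++) {
                @{ id = "$j"; method = 'GET'; url = "/users/$($chunk[$j].Id)" }
            }
        )
    }
}

# Parallel part: submit the batches concurrently and merge the responses
$results = [System.Collections.Concurrent.ConcurrentBag[object]]::new()
$batches | ForEach-Object -Parallel {
    $bag = $using:results
    $result = Invoke-MgGraphRequest -Method POST -Uri 'https://graph.microsoft.com/v1.0/$batch' -Body $_
    foreach ($response in $result.responses) {
        $bag.Add($response.body)
    }
}
```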

Difficulties of Microsoft Graph PowerShell SDK, Batch Requests and Concurrency

I measured all the different code snippets with the built-in Measure-Command cmdlet to get the elapsed time for each script block. And I was actually very impressed by the results! As mentioned in the beginning, I did my tests with a little over 1'200 individual requests.
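For reference, the measurement pattern is simply:

```powershell
# Wrap the snippet to benchmark in a script block
$elapsed = Measure-Command {
    # ... snippet under test ...
}
$elapsed.TotalSeconds
```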

I was already familiar with the batching approach, but combining it with PowerShell 7 concurrency is a real game changer. Bringing the runtime down from more than 3 minutes to just around 5 seconds is very promising. Imagine this at a scale of, say, 10'000 requests.

Also interesting is the performance difference between batching and concurrency, although this could be tweaked further, as I only used the default setup of 5 threads.

An additional factor that may come into play when making a lot of requests is throttling. Because I was using the Microsoft Graph PowerShell SDK, which has built-in support for handling throttling, I actually do not know how many of the requests were throttled. Maybe that is even a good thing, because you will face throttling in production as well. For the comparison, however, it could be that some of the requests were throttled and others not, leading to a skewed result (although I could not find evidence for that during multiple runs).

Of course, the absolute results from my measurements will probably not translate to other machines or environments, as they depend on a lot of factors such as hardware, OS, network and request types. But the calculated speedup should be reachable with similar setups.

I hope you learned a thing or two about PowerShell and Microsoft Graph and that you can now write faster PowerShell scripts, although this does not necessarily mean that these scripts will be written faster, because optimisation requires time and measurements 🪩😉.

²Lists also have overhead, but it is negligible for bigger amounts of data. So do not compare an array copy of two elements with a list. :)

Thanks also to everyone for the positive notes on my recent tweet.
