I hit upon a class called ‘Parallel’ in the ‘System.Threading.Tasks’ namespace and found it worth a mention on my blog.
This post is along the same lines as the Parallel LINQ (PLINQ) article I wrote last month: performing tasks in parallel to make the overall activity more efficient. But there are some differences. Let’s get to know this class first.
So, here’s my requirement: pull the ‘title’ element from a bunch of websites. The normal way would be to go through the list of URLs in a ‘for’ loop (recall that this runs synchronously, one URL after another). Of course, we’re doing it another way; otherwise this article would not exist.
Following is my set up:
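The original listing is not reproduced here, so what follows is only a minimal sketch of what such a set-up might look like. The class name `TitleReader`, the use of `HttpClient` and the regular expression are my assumptions for illustration, not necessarily what the downloadable code does.

```csharp
using System;
using System.Net.Http;
using System.Text.RegularExpressions;

public static class TitleReader
{
    // Assumption: a single shared HttpClient is used for all downloads.
    private static readonly HttpClient Client = new HttpClient();

    // Extracts the text between <title> and </title> with a regular expression,
    // since real-world HTML is often not well-formed XML.
    public static string GetTitle(string html)
    {
        Match match = Regex.Match(
            html ?? string.Empty,
            @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value.Trim() : string.Empty;
    }

    // Downloads the page as a string (blocking), parses the title and prints it.
    public static void ReadTitleFromUrl(string url)
    {
        string html = Client.GetStringAsync(url).GetAwaiter().GetResult();
        Console.WriteLine("{0}: {1}", url, GetTitle(html));
    }
}
```

Calling `TitleReader.ReadTitleFromUrl(url)` for one URL at a time inside a ‘for’ loop gives the synchronous baseline.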
The ReadTitleFromUrl method is the heart of the operation: it downloads the page as a string, parses the title and displays the details. A note on the ‘messy’ GetTitle method: I had to do it this way because the downloaded string was not well-formed XML, so I could not load it into any kind of XML reader. Now comes the ‘magic’ method.
Read the method as: for each URL in the list, run the ReadTitleFromUrl() method, in parallel. One thing to note is that since these tasks run in parallel, there’s no guarantee about the order in which the results come back. This is similar to the behavior of a PLINQ result set.
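As a rough sketch of the idea (not the original listing), `Parallel.ForEach` takes a collection and a delegate and runs the delegate once per item. The class name `ParallelTitles`, the `work` parameter and the `ConcurrentBag` used to collect results are assumptions made here so the example stays self-contained; in the post’s scenario `work` would be the method that reads a title from a URL.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelTitles
{
    // Runs `work` once per URL and collects the results.
    // Parallel.ForEach decides how the calls are spread across threads,
    // so the order in which results are added is not guaranteed.
    public static ConcurrentBag<string> ProcessAll(
        IEnumerable<string> urls, Func<string, string> work)
    {
        // ConcurrentBag is safe to add to from several threads at once.
        var results = new ConcurrentBag<string>();
        Parallel.ForEach(urls, url => results.Add(work(url)));
        return results;
    }
}
```

A thread-safe collection matters here: adding to a plain `List<string>` from several threads at once is not safe.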
On this run, I get the following output on my Core 2 Duo.
You see the order is a little messed up.
As for timing, the Parallel version averaged 3880ms, while the ‘for’ loop took 5957ms. We clearly have a winner.
Digging into this in more detail, I found there’s something called the Task Parallel Library (TPL), dedicated to making applications more productive by having them work in parallel. TPL makes use of all the processors available on your machine. It handles partitioning the activity into smaller tasks, state management of the threads and other inner details. TPL is suited to both I/O-bound operations and CPU-bound partitions of work.
So then how is PLINQ different? In PLINQ you set the degree of parallelism explicitly on the query (WithDegreeOfParallelism), whereas the Parallel class manages this for you by default (you can still cap it through ParallelOptions.MaxDegreeOfParallelism). Also, when you have an activity that is a mix of I/O-bound and CPU-bound operations, Parallel seems to be the better choice.
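For anyone who wants to repeat the comparison, here is one way the measurement could be sketched with `Stopwatch`. The `LoopTimer` class and its method names are hypothetical, and the numbers you get will depend on your machine and network, so no expected output is shown.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class LoopTimer
{
    // Returns the wall-clock time, in milliseconds, taken by `action`.
    public static long TimeMs(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    // Runs the same work sequentially and then in parallel, printing both timings.
    public static void Compare(string[] urls, Action<string> work)
    {
        long sequential = TimeMs(() =>
        {
            foreach (string url in urls)
            {
                work(url);
            }
        });
        long parallel = TimeMs(() => Parallel.ForEach(urls, work));
        Console.WriteLine("for loop: {0}ms, Parallel.ForEach: {1}ms", sequential, parallel);
    }
}
```

Averaging several runs, as I did above, smooths out one-off network delays.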
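To make the threading point concrete, here is a small sketch of where the degree of parallelism is set in each API. `WithDegreeOfParallelism` and `ParallelOptions.MaxDegreeOfParallelism` are the real .NET members; the `DegreeOfParallelism` class, its method names and the squaring workload are placeholders chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class DegreeOfParallelism
{
    // PLINQ: the limit is set on the query itself.
    // Without AsOrdered(), the order of the output is not guaranteed.
    public static int[] SquaresWithPlinq(int[] numbers, int maxThreads)
    {
        return numbers.AsParallel()
                      .WithDegreeOfParallelism(maxThreads)
                      .Select(n => n * n)
                      .ToArray();
    }

    // Parallel: the limit is optional and passed through ParallelOptions.
    // Each iteration writes to its own index, so no locking is needed.
    public static int[] SquaresWithParallel(int[] numbers, int maxThreads)
    {
        var results = new int[numbers.Length];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        Parallel.For(0, numbers.Length, options, i =>
        {
            results[i] = numbers[i] * numbers[i];
        });
        return results;
    }
}
```

If you leave `ParallelOptions` out, the Parallel class picks the number of threads for you.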
The code used in the blog can be downloaded from here.
1. ‘Parallel’ is always faster – false
2. Not all looping is suitable for parallel processing
3. Parallelism adds complexity to your application
4. The larger the activity, the greater the benefit of parallel processing