My next bottleneck on Web Search with .NET: The Mysterious ThreadPool - Wallace B. McClure

Wallace B. McClure

All About Wally McClure - The musings of Wallym on .NET, Sql, ASP.NET, and other crazy shenanigans

My next bottleneck on Web Search with .NET: The Mysterious ThreadPool

When I started mapping out this “Web Search with .NET” project, I figured that using the ThreadPool was the right thing to do. It made a lot of sense in concept: “doing a small amount of work that doesn't depend on anyone else” sounds like exactly what the ThreadPool is for. Well, let's look under the covers. By design, the .NET ThreadPool is limited to 25 ThreadPool threads per process per CPU, and there is nothing you can do to change that (except use your own thread pool, as several people have pointed out, including my buddy Scott Sargent). Now, what happens when you use an object within a ThreadPool thread that itself uses the ThreadPool? You get an error, that's what you get. Yes, boys and girls, using an object within the ThreadPool that itself uses the ThreadPool is a bad idea. In my case, I used the WebClient object and got an error back from it saying that there were not enough ThreadPool threads to complete the request. Well, that was bad. Thanks to Dave Wanta for suggesting Async Sockets for this. He said they use IO Completion Ports, of which .NET makes 1000 available.
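To make the failure mode concrete, here is a minimal sketch of a bounded pool whose tasks depend on that same pool. It is written in Python rather than .NET, since the post shows no code of its own, and it is only an analogy: `POOL_SIZE`, `run_demo`, `outer_task` and `inner_fetch` are hypothetical names, the pool is capped at 2 workers instead of 25, and the starved inner request shows up here as a timeout rather than the exception WebClient raised.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

POOL_SIZE = 2  # assumption: a tiny stand-in for the 25-threads-per-CPU cap


def run_demo(timeout_seconds=0.2):
    """Fill a bounded pool with tasks that each need one more pool thread."""
    pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
    barrier = threading.Barrier(POOL_SIZE)

    def inner_fetch():
        # Stand-in for the work a helper object hands to the shared pool.
        return "fetched"

    def outer_task():
        # Wait until every worker is busy running an outer task.
        barrier.wait()
        # Ask the same, already full, pool to do the inner work.
        inner = pool.submit(inner_fetch)
        try:
            outcome = inner.result(timeout=timeout_seconds)
        except FutureTimeout:
            outcome = "starved"
        # Hold the worker until all outer tasks have recorded an outcome,
        # so no freed worker can rescue a sibling's inner request.
        barrier.wait()
        return outcome

    outer = [pool.submit(outer_task) for _ in range(POOL_SIZE)]
    results = [future.result() for future in outer]
    pool.shutdown(wait=True)
    return results
```

Every worker is occupied by an outer task, so the inner requests sit in the queue with nobody free to run them. Raising the cap only moves the point at which this happens, which is why taking the inner work off the shared pool is the more robust fix.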

I believe that another problem with my design is that I have only one dispatch command feeding individual URLs to ThreadPool threads. I am thinking it would be a good idea for each ThreadPool thread to get its own set of URLs and iterate through them separately, rather than depend on the dispatcher for every URL.
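The batching idea can be sketched roughly as follows. Again this is Python rather than .NET, and everything in it is an assumption made for illustration: `partition`, `crawl_batch`, `crawl` and `fake_fetch` are hypothetical names, the URLs are split round-robin, and the workers are plain dedicated threads rather than pool threads.

```python
import threading


def partition(urls, worker_count):
    """Split urls round-robin into worker_count batches."""
    return [urls[i::worker_count] for i in range(worker_count)]


def crawl_batch(batch, fetch, results, lock):
    """Walk one batch start to finish without going back to a dispatcher."""
    for url in batch:
        body = fetch(url)
        with lock:  # the results dict is shared between workers
            results[url] = body


def crawl(urls, worker_count, fetch):
    """Give each dedicated thread its own batch and wait for all of them."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=crawl_batch, args=(batch, fetch, results, lock))
        for batch in partition(urls, worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_fetch(url):
    """Hypothetical stand-in for a real HTTP download."""
    return "contents of " + url
```

Calling `crawl(urls, 4, fake_fetch)` would hand each of four threads roughly a quarter of the list. The trade-off is that one slow batch can leave the other threads idle at the end, so a real crawler might prefer smaller batches handed out more than once.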

I am looking into moving away from the ThreadPool and using entirely managed threads. I plan on moving to the Async Sockets first to see if that resolves the problem.

I am planning on posting info about these changes over the coming days as I make solid progress. My current goal is to fill up my network connection as much as possible (preferably 100%, though I don't think that is truly possible).

Wally

Comments

TrackBack said:

Do you want control over the number of threads used by the ThreadPool? Check out the ManagedThreadPool class.
# January 8, 2004 12:03 PM

Scott Galloway said:

You may want to take a look at Mike Woodring's Custom Threadpool: http://staff.develop.com/woodring/dotnet/
# January 8, 2004 1:19 PM

Wally said:

Thanks Scott and Scott.

Wally
# January 8, 2004 1:40 PM

Duncan Godwin said:

You might also want to look at the <system.net/connectionManagement> config element, as the maximum number of remote connections you can make to a specific site is 2 by default, which you can override per host. Just thinking your requests might be backing up, waiting for connections to finish.
# January 8, 2004 4:22 PM

TrackBack said:

This post points out a problem you can run into when using the ThreadPool with, for example, an HttpWebRequest or WebClient. It offers a very simple solution based on a managed thread pool.
# April 6, 2005 3:51 AM

Weblog said:

As I was first building my Web Spider, I figured that the easiest thing to build the spider with would...

# October 3, 2006 2:07 PM