My next bottleneck on Web Search with .NET: The Mysterious ThreadPool

When I started mapping out this “Web Search with .NET” project, I figured that using the ThreadPool was the right thing to do. It made a lot of sense in concept: “doing a small amount of work that doesn't depend on anyone else” sounds like exactly what the ThreadPool is for. Well, let's look under the covers. By default, the .NET ThreadPool is limited to 25 worker threads per CPU per process, and in .NET 1.x there is no managed API to raise that limit (ThreadPool.SetMaxThreads only arrived in .NET 2.0). The practical workaround is to use your own ThreadPool, as several people have pointed out, including my buddy Scott Sargent.

Now, what happens when you use an object within a ThreadPool thread that itself uses the ThreadPool? Once the pool runs out of free threads, you get an error, that's what you get. Yes, boys and girls, using an object within the ThreadPool that itself uses the ThreadPool is a bad idea. In my case, I used the WebClient object and got an error back from it saying that there were not enough free threads in the ThreadPool to complete the request. Well, that was bad. Thanks to Dave Wanta for suggesting Async Sockets instead. He pointed out that they use I/O Completion Ports, and .NET allows up to 1000 completion port threads by default.
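The failure mode is easy to reproduce in miniature. The sketch below is a Java analogue, not the .NET API: a fixed-size pool from java.util.concurrent stands in for the .NET ThreadPool, and the inner task stands in for work that an object like WebClient might queue on the same pool. The two latches are my own addition, used only to make the outcome deterministic; they ensure every outer task holds a thread before any inner task is queued. Where WebClient in .NET 1.x raised an error, this version simply times out waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NestedPoolStarvation {
    // Submits `outerTasks` tasks to a fixed pool of `poolSize` threads. Each outer
    // task queues an inner task on the SAME pool and waits for it, the way an
    // object that uses the pool internally would.
    // Returns how many outer tasks saw their inner task finish in time.
    // Assumes outerTasks <= poolSize so the latches below can be released.
    static int run(int poolSize, int outerTasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allRunning = new CountDownLatch(outerTasks);
        CountDownLatch allDecided = new CountDownLatch(outerTasks);
        List<Future<Boolean>> outers = new ArrayList<>();
        for (int i = 0; i < outerTasks; i++) {
            outers.add(pool.submit(() -> {
                // Wait until every outer task is holding a pool thread.
                allRunning.countDown();
                allRunning.await();
                Future<String> inner = pool.submit(() -> "page");
                boolean ok;
                try {
                    inner.get(1, TimeUnit.SECONDS);
                    ok = true;   // a free pool thread ran the inner task
                } catch (TimeoutException e) {
                    ok = false;  // no thread was free: the pool is starved
                }
                // Keep holding this thread until every outer task has its answer,
                // so a thread freed by one task cannot rescue another.
                allDecided.countDown();
                allDecided.await();
                return ok;
            }));
        }
        int completed = 0;
        for (Future<Boolean> f : outers) {
            if (f.get()) completed++;
        }
        pool.shutdownNow();
        return completed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("2 threads, 2 outer tasks: " + run(2, 2) + " completed");
        System.out.println("4 threads, 2 outer tasks: " + run(4, 2) + " completed");
    }
}
```

With two threads and two outer tasks, none of the inner tasks gets a thread, so zero complete; with four threads, both complete. The inner work only gets done when the pool has threads to spare beyond the ones the outer tasks are already holding.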

I believe another problem with my design is that I have only one dispatcher, which feeds individual URLs to ThreadPool threads. I think it would be a good idea for each thread to get its own set of URLs and iterate through them independently, rather than depending on the dispatcher for every URL.
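Here is a minimal sketch of the per-thread partitioning idea, again in Java rather than C#. It assumes the whole URL list is known up front and splits it round-robin across a fixed number of dedicated threads (not pool threads). `crawl` and the `fetch` callback are hypothetical names, and the stub fetcher in `main` does no network I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class PartitionedCrawler {
    // Splits `urls` round-robin into `workers` private lists (workers must be > 0)
    // and starts one dedicated thread per list. Each thread walks only its own
    // list, so no central dispatcher is consulted per URL.
    // `fetch` is a hypothetical stand-in for the real download; it must be
    // thread-safe and must not return null.
    static ConcurrentHashMap<String, Integer> crawl(List<String> urls, int workers,
                                                    Function<String, Integer> fetch)
            throws InterruptedException {
        List<List<String>> partitions = new ArrayList<>();
        for (int w = 0; w < workers; w++) partitions.add(new ArrayList<>());
        for (int i = 0; i < urls.size(); i++) {
            partitions.get(i % workers).add(urls.get(i));
        }

        ConcurrentHashMap<String, Integer> results = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (List<String> mine : partitions) {
            Thread t = new Thread(() -> {
                for (String url : mine) {
                    results.put(url, fetch.apply(url));
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();  // wait for every worker to finish
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = List.of("http://a.example/", "http://b.example/",
                                    "http://c.example/", "http://d.example/",
                                    "http://e.example/");
        // Stub fetcher: no network access, it just reports the URL's length.
        ConcurrentHashMap<String, Integer> results = crawl(urls, 2, String::length);
        System.out.println(results.size() + " URLs processed");  // prints "5 URLs processed"
    }
}
```

Round-robin splitting keeps the partitions within one URL of each other in size. The trade-off is that a thread stuck on slow hosts cannot hand its remaining URLs to an idle thread, which a shared work queue would allow.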

I am looking into moving away from the ThreadPool entirely and using threads that I create and manage myself. First, though, I plan to move to Async Sockets to see whether that alone resolves the problem.

I plan to post info about these changes over the coming days as I make solid progress. My current goal is to saturate my network connection as much as possible (ideally 100%, though I don't think that is truly achievable).

Wally

2 Comments

  • Thanks Scott and Scott.

    Wally

  • You might also want to look at the <system.net><connectionManagement> config element: by default, the maximum number of concurrent connections you can make to a given remote host is 2, and you can override that per host. Just thinking your requests might be backing up, waiting for connections to finish.
