Message Queueing with my Web Search

I haven't worked on my Web Search code recently because of a backlog of customer work. So, I got up early this morning and decided to make my Web Search with .NET code asynchronous by integrating it with Microsoft Message Queuing (MSMQ). My basic algorithm now goes something like this:

  1. Get a set of URLs that have not already been searched.
  2. Get the HTML content of one of the URLs.
  3. Put the URL and its content into Queue1.
  4. Parse the URL's content for new URLs to search, and place the new URLs into Queue2.
  5. Repeat steps 2 through 4 for every URL in the set retrieved in step 1.
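Here is a rough Python sketch of steps 2 through 5 (not the actual .NET code). It assumes two `queue.Queue` objects as stand-ins for the MSMQ queues, a caller-supplied `fetch` function that downloads a page, and a deliberately naive regular expression for finding links. The names `crawl_batch`, `content_queue`, and `url_queue` are hypothetical.

```python
import queue
import re
from typing import Callable, Iterable

# Hypothetical stand-ins for the two MSMQ queues.
content_queue = queue.Queue()  # Queue1: (url, html) tuples
url_queue = queue.Queue()      # Queue2: newly discovered URLs

# Deliberately naive link extractor; a real spider would use an HTML parser.
HREF_RE = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)


def crawl_batch(urls: Iterable[str], fetch: Callable[[str], str]) -> None:
    """Run steps 2-5 for a batch of URLs obtained in step 1."""
    for url in urls:                        # step 5: repeat for each URL in the set
        html = fetch(url)                   # step 2: get the HTML content
        content_queue.put((url, html))      # step 3: hand (url, html) to Queue1
        for link in HREF_RE.findall(html):  # step 4: parse out new URLs...
            url_queue.put(link)             # ...and hand them to Queue2
```

The spider itself never touches the database; it only enqueues work, which is what lets the slow inserts happen elsewhere.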

Queue1 and Queue2 are serviced by a separate WinForms application that handles the MSMQ messages asynchronously using the BeginReceive() method and a ReceiveCompleted event handler. Because the spider and the consumer communicate only through MSMQ, I could easily split the work across multiple machines.
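The BeginReceive()/ReceiveCompleted pattern can be approximated in Python as follows. This is only an analogue under stated assumptions: a `queue.Queue` stands in for an MSMQ queue, a one-thread `concurrent.futures` pool stands in for the .NET async machinery, and the class name `AsyncQueueReader` and the `None` shutdown sentinel are hypothetical. Starting a receive returns immediately; the completion handler processes the message and then starts the next receive.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class AsyncQueueReader:
    """Rough analogue of MessageQueue.BeginReceive() plus a ReceiveCompleted
    handler: start a receive, process the message when it completes, re-arm."""

    def __init__(self, source: queue.Queue, handler: Callable[[Any], None]):
        self._source = source
        self._handler = handler
        self._pool = ThreadPoolExecutor(max_workers=1)  # performs the blocking waits
        self.stopped = threading.Event()

    def begin_receive(self) -> None:
        # Returns immediately; the blocking get() runs on the worker thread.
        future = self._pool.submit(self._source.get)
        future.add_done_callback(self._receive_completed)

    def _receive_completed(self, future: Future) -> None:
        # Plays the role of the ReceiveCompleted event handler.
        message = future.result()
        if message is None:              # hypothetical shutdown sentinel
            self.stopped.set()
            self._pool.shutdown(wait=False)
            return
        self._handler(message)
        self.begin_receive()             # start waiting for the next message
```

Re-arming inside the completion handler keeps exactly one receive outstanding at a time, so messages are handled in order and the caller never blocks waiting on the queue.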

Important Side Information:

I have two tables to hold data. tblSearchUrl holds the URLs that will be searched at some point. tblSearchResults holds the HTML content of the URLs. My current hardware is a Dell Inspiron 8200 with a 1.8 GHz P4, 1 GB of RAM, and a 250 GB Maxtor FireWire hard drive, and I am using the DSL line at my hotel, which is running at about 512 kbytes/sec, or something like that.
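For illustration, a minimal version of the two tables might look like this, shown with Python's built-in `sqlite3` module rather than the database I actually use. The column names (`Url`, `Searched`, `Html`) and the `unsearched_urls` helper for step 1 are assumptions, not my real schema.

```python
import sqlite3
from typing import List

# Hypothetical schema: only the table names come from the post.
SCHEMA = """
CREATE TABLE tblSearchUrl (
    Url      TEXT PRIMARY KEY,           -- a URL to be searched at some point
    Searched INTEGER NOT NULL DEFAULT 0  -- 0 = not yet searched, 1 = searched
);
CREATE TABLE tblSearchResults (
    Url  TEXT PRIMARY KEY REFERENCES tblSearchUrl(Url),
    Html TEXT NOT NULL                   -- raw HTML content of the page
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and create both tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def unsearched_urls(conn: sqlite3.Connection, limit: int) -> List[str]:
    """Step 1 of the algorithm: fetch a batch of URLs not yet searched."""
    rows = conn.execute(
        "SELECT Url FROM tblSearchUrl WHERE Searched = 0 LIMIT ?", (limit,))
    return [url for (url,) in rows]
```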

The large number of inserts into tblSearchUrl had become a major bottleneck that was slowing the whole system down. By making these operations asynchronous, I have moved the bottleneck from the spider application to the insert application, which is just fine by me for now.
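One possible way for the insert application to keep up would be to drain Queue2 in batches and write each batch in a single transaction, skipping URLs already present in tblSearchUrl. This is a sketch of that idea, not what my code currently does; it uses Python's `queue` and `sqlite3` modules, assumes a `Url` primary-key column, and the function name `drain_url_queue` and default batch size are hypothetical.

```python
import queue
import sqlite3
from typing import List


def drain_url_queue(url_queue: queue.Queue, conn: sqlite3.Connection,
                    max_batch: int = 500) -> int:
    """Pull up to max_batch URLs off Queue2 and insert them into tblSearchUrl.

    Returns the number of rows actually inserted (duplicates are skipped).
    """
    batch: List[str] = []
    while len(batch) < max_batch:
        try:
            batch.append(url_queue.get_nowait())
        except queue.Empty:
            break
    with conn:  # one transaction for the whole batch instead of one per URL
        before = conn.total_changes
        # INSERT OR IGNORE silently skips URLs that violate the primary key.
        conn.executemany(
            "INSERT OR IGNORE INTO tblSearchUrl (Url) VALUES (?)",
            [(url,) for url in batch])
        return conn.total_changes - before
```

Batching trades a little latency for far fewer commits, which is usually where per-row inserts spend their time.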

Where do I go from here?

Well, the next thing is to find my next bottleneck. After that comes code cleanup. I have probably hardcoded some things I shouldn't have, and I am sure I have made some rather interesting mistakes that will show up going forward. At some point, I also need to create a web interface. And then I need to ...

Wally
