Archives / 2004 / January
  • Intel 64-bit processing

    As a followup to my post about Intel's 64-bit x86 plans, an article has been posted about an upcoming Intel demo.

    Why am I so interested in this stuff?  Well, before I turned to software, I was very interested in the design and development of chips at the transistor level.  I have a BS and MS in Electrical Engineering from Georgia Tech, where I specialized in micro-electronics, VLSI design, and digital signal processing. 

    I was talking with Codeboy this afternoon about this bit (no pun intended) of news.  The thing that a lot of these companies miss when they attempt to drive a new CPU into the marketplace is that they really need high-volume acceptance.  To get that, they have to go after the whitebox-style crowd, and they have to have applications that run on those systems (Windows, Office, and development tools).  I have watched PowerPC, Alpha, MIPS, and now IA64 make that same mistake.  Oh well, what's a few billion here and there...


  • More on 64 bits with Intel and AMD

    I just saw an article that leads me to believe that Intel is going to come out with a 64-bit extension for the 32-bit Intel x86 architecture.

    Intel President and Chief Operating Officer Paul Otellini on Wednesday said the world's largest chipmaker would likely give its 32-bit microprocessors an upgrade to 64 bits once supporting software becomes available.

    "You can be fairly confident that when there is software from an application and operating system standpoint that we'll be there," Otellini said, responding to a question about 64-bit technology, in an interview with a Wall Street analyst that was broadcast over the Web.

    Sounds like Intel will come out with something that is binary compatible with the AMD 64-bit extensions.  Hmmm, IA64 looks more and more like IBM Microchannel every day.


  • Scott Mitchell Articles on MSDN

    I had a chance to sit down and read Scott Mitchell's articles on MSDN regarding data structures in .NET.  First off, I like their content.  While I have been programming professionally for 14.5 years, I have a BS and MS in Electrical Engineering, not in Computer Science.  As a result, I sometimes miss certain basic items, and it was good to read the info in the articles.  Secondly, I like the fact that he spent some time focusing on algorithms and how long operations take.  The time an algorithm spends solving a problem is an area that is very near and dear to my heart, as I see a lot of programmers implementing algorithms that are sub-optimal.

    Article 1.

    Article 2.


  • Why Microsoft .NET Presentation

    Just thought I would share with everyone a Microsoft PowerPoint presentation that I did a few months ago regarding what Microsoft .NET is to me and why a company should be interested in it.  While it doesn't preach the Web Services Everywhere Manifesto that I hear from many people, it does seem to hit the major issues that organizations have.


    • This is not a Developer Oriented Talk.  This is a talk geared towards technical managers and folks making the technology direction and purchasing decisions.
    • This is not a marketing talk in that I am not attempting to solicit business from you.  This is merely a presentation that I did that was very well received by a group of technology managers from different companies in Oak Ridge, TN, at an organization called Tech2020.

    The part of the talk that got the most interest and feedback was one bullet point stating that you no longer need a separate VB/GUI team and a Web/ASP team, because ASP.NET with Visual Studio produces a design and development environment similar to a VB/Delphi-style GUI environment.  I had several people come up to me afterwards and say that they were pleasantly surprised that I talked more about how .NET could affect their developer organizations and business, as opposed to talking about the bits-n-bytes of a “cool“ technology.

    Presentations on our site.

    More presentations will be posted.  These are much more technical in nature.



  • Server Side Cursors

    One of the things that I always found interesting when looking at someone else's code in Classic ADO 2.x was the number of developers that misused the cursor and locking options of a Classic ADO RecordSet when running against Sql Server.  There are some good situations where there is a need for a scrollable, updatable, server-side cursor, but I would say that about 50% of the time that I see one, it is not necessary.  Well, with .NET Whidbey, ADO.NET will have a scrollable, updatable, server-side cursor in the framework.  The advantage of the .NET version will be that it is not directly associated with the SqlDataReader or the SqlDataAdapter, so it will be harder to misuse.  This is unlike the situation with Classic ADO 2.x, where creating a scrollable, updatable, server-side cursor was one of several options within the RecordSet object. 

    Warning: Ideally, you would only want to use scrollable updatable server-side cursors when doing programming directly within the database (such as Yukon).  So, don't try this at home without a trained expert standing by........

    Here is an example of some code that I wrote to use the SqlResultSet, which is the name of the object that provides the scrollable updatable server-side cursors.  Note that the SqlResultSet is created by a call to the SqlCommand object.  I also thought it interesting that you can't get the number of records back, merely whether or not there are records.  The .HasRows property is good enough for me.

    SqlConnection sqlCn = new SqlConnection(strCn);
    SqlCommand sqlCm = new SqlCommand();
    SqlResultSet sqlRs;

    sqlCn.Open();
    sqlCm.Connection = sqlCn;
    sqlCm.CommandText = strSql;
    sqlCm.CommandType = CommandType.Text;
    sqlRs = sqlCm.ExecuteResultSet(ResultSetOptions.Scrollable | ResultSetOptions.Updatable);
    if ( sqlRs.HasRows )
    {
        //do something.......
    }

    Additional info on the SqlResultSet object.



  • Managed Threads (MT) vs. ThreadPool Threads (TP)

    As I was first building my Web Spider, I figured that the easiest thing to build the spider with would be the TP.  So, based on my previous ramblings, I was disappointed by the fact that the WebClient also uses the TP to retrieve its results, even when used in a synchronous fashion.  This effectively cut my possible performance in half.  Add to this the fact that the TP in .NET only supports 25 threads per CPU at any one moment, and I was doubly frustrated.  The result was that I could only fire up 12.5 threads per CPU on my development system.  I just knew that if I could switch to managed threads, I would be able to pull in 25 threads per CPU (based on the WebClient in System.Net).  While I am also constrained by the bandwidth at my office, I knew that the addition of more threads would allow me to “smooth” out the waves when the TP version wasn't able to access the network due to other work that was going on.  I just knew that I could outsmart the TP scheduling mechanism, which will only allocate a specific number of threads based on system resources.

    Given the above, I worked this morning on implementing MTs.  I got through my bugs, set the system to run with 20 MTs, hit the start button, and watched with excitement as....................the performance of the app went in the toilet compared to using the TP.  How in the world could this happen?  Well, as I scaled back the number of threads, I watched performance increase.  It appears that having too many threads attempting to run across my limited bandwidth was causing too many problems.  It looks like, given the amount of bandwidth at my office, that 8 MTs is the appropriate number of threads.  Maybe I am not smarter than the TP manager in .NET..............

    Just remember folks, throwing more threads at a solution does not make that solution run better. 
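
    For anyone curious, the managed-thread version boils down to something like the sketch below.  This is a minimal sketch, not my actual code: SpiderWorker and DoWork are made-up names standing in for my real per-thread spider logic, and the thread count is the knob I ended up tuning down to 8.

```csharp
using System.Threading;

public class SpiderWorker
{
    // Stand-in for the real work: grab URLs, fetch them, store results.
    public void DoWork()
    {
        // ...retrieve and parse URLs until told to stop...
    }
}

public class SpiderHost
{
    private const int ThreadCount = 8;  // the sweet spot for my office bandwidth

    public void Start()
    {
        for (int i = 0; i < ThreadCount; i++)
        {
            SpiderWorker worker = new SpiderWorker();
            Thread t = new Thread(new ThreadStart(worker.DoWork));
            t.IsBackground = true;  // don't keep the process alive on shutdown
            t.Start();
        }
    }
}
```

    The mechanics aren't the interesting part; the lesson was that raising ThreadCount past what the bandwidth could feed made things worse, not better.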


  • Full-Text Search with Yukon

    While I was sitting here fiddling with things this evening, I decided to do a little test to see just how well full-text search works in Yukon.  Man, I was blown away.  Granted, I don't have millions of rows in my table to search through, but I do have a system with about 70,000 rows set up for full-text search, and I am adding them at the rate of about 40 URLs per minute (hey, I am bandwidth constrained at my office).  I decided to do a full-text lookup for 'President Bush' on the Yukon database while simultaneously running the spider, having already set up a full-text index in Yukon.  In just a few seconds, I got about 200 rows back.  Doing this same test on my other system, which is running Sql Server 2000 and is taking hours to build the full-text index, resulted in a query that took several orders of magnitude longer to search a table with about 700,000 rows in it.  Now, I realize that this is not a fair comparison for several reasons.  Once the full-text index is built on my Sql Server 2000 system, I am going to run a comparison.  No, I am not going to post the complete results.

    Here is the sql statement I ran:

    select * from tblSearchResults where contains(SearchText, '"President Bush"')

    Full-text search with Yukon.


  • AMD financial uptick

    Given the fact that AMD has made a pretty good financial uptick, I can't help but think of Intel's IA64 technology as the IBM Microchannel of the 2000s.  I view Itanium as a dragster and the AMD64 family as a four-door sports car.  Which is better for going to the grocery store?  And this is from someone that used to toe the Intel party line when I worked in the IT department at Coca-Cola.  Of course, Coca-Cola invested pretty heavily in IBM Microchannel systems........


  • Size Limits of Sql Server Indexed columns

    FYI, there is a limit on the size of a column that can be used for an index with Sql Server (and, I assume, other databases).  With Sql Server, a column over 900 bytes in size cannot be indexed.  I would assume that the total size of an index key cannot be over 900 bytes, but I am not sure of that.  I tried to index my UrlAddress field, which is defined as varchar(4096), and I got a nice message box saying that this was not possible.
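
    To make the limit concrete, here is a sketch in T-SQL.  The checksum workaround below is an illustration of a commonly suggested approach, not something from my running code, and the table and column names are just my spider's schema:

```sql
-- Rejected: the key could exceed Sql Server's 900-byte index limit.
CREATE INDEX IX_SearchUrl_UrlAddress ON tblSearchUrl (UrlAddress)

-- One workaround: index a fixed-size checksum of the wide column instead.
ALTER TABLE tblSearchUrl ADD UrlChecksum AS CHECKSUM(UrlAddress)
CREATE INDEX IX_SearchUrl_UrlChecksum ON tblSearchUrl (UrlChecksum)

-- Lookups filter on the checksum first, then verify the full value.
SELECT * FROM tblSearchUrl
WHERE UrlChecksum = CHECKSUM('http://example.com/')
  AND UrlAddress = 'http://example.com/'
```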


  • .NET Stored Procedure in Yukon for Web Search in .NET

    Here is how to write a .NET Stored Procedure in Yukon using c#:

    • Reference the appropriate assemblies.  The two biggies are Microsoft.VisualStudio.DataTools.SqlAttributes and sqlaccess.
    • Reference the appropriate namespaces within your class file.  The ones that I referenced are System.Data.Sql and System.Data.SqlServer.
    • Note that the method you want to call as a stored proc should have the attribute SqlProcedure and should be public.
    • All methods that are touched must be static (C#) / Shared (at least I think it is Shared in VB.NET).
    • Perform the CREATE ASSEMBLY command:
      CREATE ASSEMBLY dbWebSearch FROM 'path to\dbWebSearch.dll'  -- Note that my dll also needed WITH PERMISSION_SET = UNSAFE
    • Perform the CREATE PROCEDURE command:
      CREATE PROCEDURE sp_Add_URL_DOTNET( @Url as nvarchar(4000) ) AS EXTERNAL NAME dbWebSearch:[dbWebSearch.cSqlServer]::sp_Add_URL_DOTNET
    • Here is a section of my code:

    using System;
    using System.Data;
    using System.Data.Sql;
    using System.Data.SqlServer;
    using System.Diagnostics;

    namespace dbWebSearch
    {
        public class cSqlServer
        {
            public cSqlServer() {}

            [SqlProcedure]
            public static void sp_Add_URL_DOTNET(string pstrUrl)
            {
                try
                {
                    SqlCommand sqlCm = SqlContext.GetCommand();
                    string strSql;
                    string strDomainName = CalculateDomainName(pstrUrl);  //method omitted for brevity
                    string strSearchCode = CalculateSearchCode(pstrUrl);  //method omitted for brevity
                    strSql = "select count(*) from tblSearchUrl where UrlAddress='" + SqlEscape(pstrUrl) +
                        "' and DomainName='" + SqlEscape(strDomainName) + "' and SearchCode=" + strSearchCode;
                    sqlCm.CommandText = strSql;
                    if ( Convert.ToInt32(sqlCm.ExecuteScalar()) == 0 )
                    {
                        strSql = "insert into tblSearchUrl (UrlAddress, UrlStatus, DomainName, SearchCode) values ('" +
                            SqlEscape(pstrUrl) + "', 'NEW_URL', '" + SqlEscape(strDomainName) + "', " + strSearchCode + ")";
                        sqlCm.CommandText = strSql;
                        sqlCm.ExecuteNonQuery();
                    }
                }
                catch (System.Exception sysExc)  //Yes, it is bad form to catch a System.Exception, but it works for this and is fairly simple.
                {
                    EventLog.WriteEntry("dbWebSearch", "Error Message: " + sysExc.Message, EventLogEntryType.Information);
                }
            }
        }
    }

    Now, before you say that I shouldn't say anything about this, just in case there is any confusion on this, please check this link.


  • First thoughts on Yukon

    Seems that MS Sql Server Yukon, even at this early stage, makes better use of memory than Sql Server 2000.  I have been working with my Web Search routines, and the database engine just seems to use less memory than Sql Server 2K.  Granted, I am running on two separate machines, but both systems have 1 gig of RAM.  Who would have thunk it, a new piece of software, from anyone, that uses less memory than the previous version of that software.


  • Why searching the Web is slow.

    The Web is nothing more than a really large graph.  The problem is that when you visit a node, you don't want to walk over to another node that you have already visited (within some constraint).  As I watch Sql Profiler, I see my stored procedure for adding nodes to my Search Url table being called literally hundreds of times per minute, yet I don't see URLs added at anywhere near the same rate.  Then **doink** it hit me.  Within my sproc, I check whether the URL already exists in my Search Url table and only add it if it does not.  The end result is that my routine expends significant processing power searching the Search Url table for the URL before it actually adds the entry.  Why do things this way?  Well, I don't want to add entries unchecked and then merely filter them before I input them into the Search Results table.  If I did things that way, I would most likely end up with infinite recursion, which would be bad.
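
    For reference, the check-then-insert pattern inside the sproc looks roughly like the T-SQL below.  This is a sketch with assumed parameter names and types, not my exact sproc:

```sql
CREATE PROCEDURE sp_Add_URL ( @Url varchar(4096), @DomainName varchar(256), @SearchCode int ) AS
-- Only add the URL if it is not already in the Search Url table.
IF NOT EXISTS ( SELECT 1 FROM tblSearchUrl
                WHERE UrlAddress = @Url AND DomainName = @DomainName AND SearchCode = @SearchCode )
BEGIN
    INSERT INTO tblSearchUrl ( UrlAddress, UrlStatus, DomainName, SearchCode )
    VALUES ( @Url, 'NEW_URL', @DomainName, @SearchCode )
END
```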


  • The Ballad of Clayton Homes - (Warning: Not Technical)

    I picked up my copy of FastCompany from the mail drop today as I have been out of town for two weeks at our customer in Washington, DC.  I came across this article in it.  It is a very interesting read.  My wife and I were invited to a private Christmas Party at Jim & Kay Clayton's house back before Christmas.  There were about 30 people there and we were treated to Jim's guitar playing about 9:00 pm that evening.


  • Plans going forward with my "Web Search with .NET"

    I got the following question on Sunday regarding my posts about the Spider and associated Web Search:

    I'm curious: is there a possibility you will ever release the source of your pet project? =)

    I've been reading your posts for the last days, and i'm truly impressed by what your doing and it would be *great* to learn from your skills :)

    1/11/2004 5:48 AM | David Cumps
    First off, I think David gives me too much credit. I don't feel that I am a great programmer as much as I know what to do and, more importantly, what not to do.  After all, I am not a musician like my buddy Rob Birdwell, or a mathematician like David Penton, or a complete genius like Paul Wilson, or anyone at Microsoft, or ..... (you get the picture).  I'm just a guy trying to get by.
    To answer this question, I really need to figure out where I am going with this pet project.  Here is where I think I am going.  Please note that there is nothing I am going to say is set in stone.
    1. Build a web interface to my application.  Note that I am almost done with a rudimentary search interface using ASP.NET and the Datagrid.  This interface is not great, but it is a start.
    2. Web Service interface.  I am thinking about implementing a Web Service interface to this application so that a Winforms app could connect over the Internet.  While I am not interested in making this a public search engine, I am more interested in using this as a demo of our capabilities at our company.
    3. Convert the Spider from the ThreadPool to Managed Threads.  This will remove a bottleneck.
    4. Get rid of “magic numbers.”  I have several situations in my code where I am using magic numbers that I just plugged in because they worked but are essentially hardcoded values with no rhyme-or-reason behind them.
    5. Remove certain assumptions, such as the uniprocessor assumptions.  There was some logic within the system that assumed I was running on a single processor system with regards to the TP.  I believe that I have successfully completed this, but I don't have a system to test this on.
    6. Convert to a Windows Service.  Right now, this runs as a Winforms app, but I want to change this to a Windows Service.
    7. General code cleanup so that I am not too embarrassed by the code that I put out with my name on it.
    8. Convert the code to take advantage of Yukon and the .NET 2.0 framework.  I have had some ideas for this code for a while, but the push I needed to get started was a talk I listened to regarding Yukon and the .NET 2.0 framework.  There are a couple of items within this application that literally SCREAM for Yukon and the .NET 2.0 framework (.NET sprocs, server-side cursors, and a new implementation of TOP come to mind).
    9. Figure out a license scheme to properly implement.  I would love suggestions on this.  While I am a programmer, I am also a capitalist.  I don't want others to unfairly profit from my work.  At the same time, I don't want to stop others from learning.  Something must be done that is fair to everyone, including myself. 

    Anyway, these are my current thoughts, with nothing set in stone.


  • Modified Web Search Algorithm

    With new file space in tow, I updated my search algorithm.  Instead of having a single-threaded Url dispatch, each thread is now responsible for grabbing its own set of Urls from the database to search.  With that change, I appear to have gotten a bump in performance, in that I no longer have downtime waiting to get URLs.  I am going to let it run for the next few hours and see how things go.  So far, I like what I see.
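
    A sketch of what the per-thread grab might look like in T-SQL (the batch size and status values are illustrative; my actual code differs):

```sql
-- Each spider thread claims its own batch of URLs in a single statement,
-- rather than waiting on a central dispatcher.
UPDATE tblSearchUrl
SET UrlStatus = 'SEARCHING'
WHERE UrlId IN ( SELECT TOP 50 UrlId FROM tblSearchUrl WHERE UrlStatus = 'NEW_URL' )
```

    In practice, each thread would also need to tag the rows it claims (say, with a thread id column) so it can read back only its own batch.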


  • Sql Server Bottleneck Resolved

    I fixed my Sql Server problem with my Web Spider today.  I had run out of hard disk space.  I bought a 250 GByte FireWire hard drive today, hooked it up to my laptop, copied my Sql Server files, reattached my database, and bang, it ran just fine.  The amazing thing is that I figured I'd have some type of speed problem.  While FireWire is “supposed” to run at 400 Mbits/sec or some speed like that, I still figured that somehow it would be somewhat slow.  Man, was I wrong.  Everything seems to be running just fine, with no letdown in speed.


  • A Sql Server bottleneck in my Web Search Project?

    Ok, not necessarily a bottleneck, but something I wanted to mention.  Indexes are a great thing when used properly.  Overusing indexes causes stability trouble, but it also causes file-space trouble.  Since I am running my Web Search on my laptop at this time, disk space is at a premium, even on a system with 60 gigs of drive space.  Indexes take up a fair amount of space, so be careful about using too many of them.


  • Socket Connections use the ThreadPool, don't they?

    I tried a suggestion to use the Socket class to get around the ThreadPool problem I mentioned yesterday.  It seems the Socket class also uses the ThreadPool.  Does anyone have a way to retrieve http content from a web site (no user interface is allowed, and I may switch to a Windows Service soon) without getting the ThreadPool involved?


  • My next bottleneck on Web Search with .NET: The Mysterious ThreadPool

    When I started mapping this “Web Search with .NET“ project out, I figured that using the threadpool was the right thing to do.  It made a lot of sense in concept.  “Doing a small amount of work that doesn't depend on anyone else” sounds like the right thing to use the ThreadPool for.  Well, let's look underneath the covers.  The ThreadPool is limited by design in .NET to 25 threadpool threads per process per CPU and there is nothing that you can do to change it (except use your own ThreadPool, as several have pointed out including my buddy Scott Sargent).  Now, what happens when you use an object within a ThreadPool thread that itself uses the ThreadPool?  You get an error, that's what you get.  Yes boys and girls, using an object within the ThreadPool, that itself uses the ThreadPool is a bad idea.  In my case, I used the WebClient object and got an error back from it that said that there were not enough ThreadPool threads to complete the request.  Well, that was bad.  Thanks to Dave Wanta for suggesting to use Async Sockets for this.  He said it uses the IO Completion Ports, which have 1000 available in .NET. 

    I believe that another problem with my design is that I have only one dispatch command that feeds individual Urls to threadpool threads.  I am thinking it would be a good idea for each threadpool thread to get its own set of Urls and to iterate through them separately, without depending on the dispatcher for the Urls.

    I am looking into moving away from the ThreadPool and using entirely managed threads.  I plan on moving to Async Sockets first to see if that helps resolve the problem.

    I am planning on posting info about these changes over the coming days as I make solid progress.  My current goal is to fill up my network connection as much as possible (preferably 100%, though I don't think that is truly possible).


  • Howto Web Search with .NET

    Here was my initial thinking regarding database tables in my Web Search with .NET application.  This database setup is my initial design; it is by no means my final design.  I chose this setup initially because I wanted to perform full-text lookups on as few rows as possible and wanted to perform as few operations against the Search Results table as possible.  There are two tables in this application:

    1. Search Urls.  The Search Urls table is a list of possible URLs to search through.  This table contains a bigint primary key, the Url, ServerName, HashCode for the URL, date the row was entered, and the date the row was last updated.
    2. Search Results.  The Search Results table contains a bigint primary key, the Url, contents of URLs that have been retrieved, server name, HashCode for the URL, date the row was entered, and the date the row was last updated.
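
    Sketched as DDL, the two tables might look like the following.  The column names and types here are my interpretation of the description above, not the actual schema:

```sql
CREATE TABLE tblSearchUrl (
    UrlId        bigint IDENTITY(1,1) PRIMARY KEY,
    UrlAddress   varchar(4096),
    ServerName   varchar(256),
    UrlHashCode  int,
    DateEntered  datetime DEFAULT GETDATE(),
    DateUpdated  datetime
)

CREATE TABLE tblSearchResults (
    ResultId     bigint IDENTITY(1,1) PRIMARY KEY,
    UrlAddress   varchar(4096),
    SearchText   text,          -- retrieved page contents; full-text indexed
    ServerName   varchar(256),
    UrlHashCode  int,
    DateEntered  datetime DEFAULT GETDATE(),
    DateUpdated  datetime
)
```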

    Here is how things go:

    1. Get a value from the Search Urls table.
    2. Flip the value of the UrlStatus to “SEARCHING” for that Url.
    3. Check to see if that Url has already been searched.  If so, go get another Url and start over.
    4. Retrieve the contents of that Url.  Currently, this is done using the WebClient of .NET.  This was a bad choice on my part, which will become apparent in my ThreadPool post, which is up next.
    5. Parse the Url for links and put them into the Search Urls table.  Insertion is done by a sproc which checks to see if that url already exists.
    6. Put the contents of that Url into the Search Results table.
    7. Repeat until entire Web is searched (or somebody hits the “StopSearching“ button).  ;-)
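
    The steps above can be sketched as a loop in C#.  Every helper here (GetNextUrl, MarkUrlStatus, AlreadySearched, ParseLinks, AddSearchUrl, AddSearchResult) is a placeholder name for my actual methods, not a real API:

```csharp
using System.Net;
using System.Text;

while (keepSearching)                              // step 7: until somebody hits Stop
{
    string url = GetNextUrl();                     // step 1
    MarkUrlStatus(url, "SEARCHING");               // step 2
    if (AlreadySearched(url))                      // step 3
        continue;

    WebClient webClient = new WebClient();         // step 4 (the choice I later regretted)
    byte[] data = webClient.DownloadData(url);
    string content = Encoding.ASCII.GetString(data);

    foreach (string link in ParseLinks(content))   // step 5
        AddSearchUrl(link);                        // the sproc rejects duplicates

    AddSearchResult(url, content);                 // step 6
}
```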

    On my laptop, this runs pretty well.  It fills up my cable modem at home and the DSL line at the hotel here in Washington DC, fairly quickly.  Given the limits I have and the environment I have to work with, I think performance is pretty good.  For example, this morning, when the world was fairly quiet, I was pulling back about 250 URLs per minute for insertion into the Search Results table and about 2400 Search Urls per minute into the Search Urls table.  This is with about 250,000 entries in the Search Results table and about 3.8 million in the Search Url table.  The time to add a single URL to the Search Urls table is showing up at about 10-20 msec per insert.


  • What is Proper Database Indexing?

    Much like “Beauty is in the eye of the beholder,“ “Proper database indexing is in the eye of the workload.” 

    To understand how to properly index a database table, you must first know what the workload of the table is.  For example, indexing every column of a table is probably a bad idea if all you are doing is inserting rows into it.  Indexing more than is necessary will result in an application that is slower than optimal and may cause the database server to freeze under load.  The other extreme is to not use any indexes or primary keys at all.  While this typically allows inserts to be performed as quickly as possible, it will result in other operations (select, delete, and update) being verrrrryyyyy slowwwwww.  Lack of proper indexing may result in the database performing the dreaded table scan, which you want to avoid as much as possible.  So, somewhere in between the two extremes is the proper set of indexes.
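
    As a concrete illustration using my spider's table (column names assumed for the example): if the workload is dominated by a particular lookup, the index worth paying for is the one that matches that lookup:

```sql
-- If the workload is dominated by this lookup...
SELECT UrlStatus FROM tblSearchUrl
WHERE DomainName = 'example.com' AND SearchCode = 12345

-- ...then this composite index matches the workload; indexing every
-- column "just in case" would only slow the inserts down.
CREATE INDEX IX_SearchUrl_Domain_Code ON tblSearchUrl (DomainName, SearchCode)
```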

    My ultimate suggestion is to use the proper set of indexes for your application by knowing what your application is doing with regards to the database and using the MS Sql Profiler and Index Tuning Wizard mentioned in a previous post.


  • Database Indexing - SQL Server Tools

    Proper Database Indexing is important with any application.  It is even more important with the applications that I am involved with due to the large amount of data, number of transactions, and number of users of the application that I am typically involved with.  As I have been working on my Web Search Engine with .NET, I have learned about the importance of proper database indexing once again.  First, let's look at the tools with Sql Server.  Sql Server comes with two really good tools for performance tuning:

    Sql Profiler.  Sql Profiler gives you information about the operations that are being sent to your database server.  It provides you with info about the command itself, the number of reads and writes performed, and, most importantly to me, the duration of the operation.  As I first worked on the Spider part of my Web Search Engine, I saw from using Sql Profiler that my database would quickly become a bottleneck.  Operations that initially took 0-10 msec were taking 2000 msec and more once the database tables had any significant amount of data in them.  I began by looking at the commands that took the longest and wrote some indexes that were optimized for those operations.  After I applied each index, the commands took less and less time.  That's great, but what if you don't want to write your own indexes?  Well, MS has a tool for you.

    Index Tuning Wizard.  Given the data that Sql Profiler generates, the Index Tuning Wizard can generate a set of indexes for a given workload.  Trust me, it works great.  I used it to verify that my indexes that I created were good, which the indexes were.

    Using Sql Profiler and the Index Tuning Wizard, I have been able to create operations that run against millions of records in 0-10 msec, as opposed to 2,000, 4,000, or 40,000 msec against those same millions of records.

  • Pet Project - Web Search with .NET

    I know that everyone is SO upset that I have been away, as “Wally Who?” echoes through your mind.  Besides Christmas, I have been working on a pet project of mine. 

    I have always been involved in the development of applications that deal with large amounts of data, large numbers of users, and a large number of transactions.  Well, I decided to take this to the ultimate level and get a copy of the world's largest database of information, the Web.  Add to this some stuff about the forthcoming Yukon version of Sql Server from Microsoft. 

    I decided to get a jump on Yukon by building a web search engine using the .NET Framework Version 1.1 and Sql Server 2k.  You will see a number of posts about items over the coming days about my .NET Search Engine.  Right now, I only have the spider running, but there will be a user interface for it (one day).