Jason Salas' WebLog

On-air and online: making people laugh, making people think, pissing people off

Sponsors

ASP.NET sites that kick ass

Pals with blogs

Podcasts I listen to

Well, it's not Google, but it works...

I've been struggling with an architecture problem since a friend made a suggestion that led to my diving into amping up my site's internal search capabilities.  The server that my site is on doesn't have SQL Server 2000 Full-Text Indexing installed, so I'd either have to (a) migrate to a new database server, or (b) build something from scratch.

I opted with the latter, choosing to make all 10,000+ of my company's news articles searchable.  This is something that's actually been bugging me for a long time, and let me tell you...for large sites, there's no harder thing to do than create great search capability.  I essentially wrapped a WebForm around a T-SQL LIKE statement (I'd much rather have used the FTI-dependent FREETEXTTABLE function, had I been privvy to it), and returned results.

It's not as slow as I was thinking, and while it does take about 4-5 seconds at most for large recordsets to come down the wire and the results aren't quite as accurate as I would hope with the aforementioned FREETEXTTABLE, it performs quite nicely.  I tried the same thing a couple of years ago with Access, which nearly crashed my entire server.  :(

A couple of points on why it runs smoothly for you architecture-heads out there on what I'm using to make it do what it does:

  • uses a stored procedure for speed & security as opposed to straight T-SQL
  • uses a SqlDataAdapter binding to a single DataTable
  • stores the derived recordset as a Session variable prior to databinding, reducing the need for recurring database roundtrips
  • pages the resultset in a Repeater control rather than the normative DataGrid by specifying a PagedDataSource object as the Repeater's DataSource property

Comments

Sreejathh S. Warrier said:

Jason,
When I read that you are using SPs instead of T-SQL for speed, I was reminded of Frans Bourma's post on the topic.
I suspect you would already have seen this, but just in case you havent.
http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx
Cheers,
Sreejath
# June 7, 2004 7:46 AM

David Crowell said:

I implemented my own search functionality for a site here at work also.

I took a slightly different approach. I have a base page class that overrides the Render method, and builds a keyword list from the HTML. I have two tables to stored indexed data, SearchedPages and SearchedWords. I also use the database and caching to store the last time a page was indexed, and don't index it again for 24 hours.

It's actually very fast, the results leave a little to be desired though. I'm probably not smart enough to write the next Google, so I'll look into using full text searching next time.

As far as stored procedures... That's a topic for a religious war (again). Use whatever you are comfortable with :)
# June 7, 2004 8:29 AM

Scott Galloway said:

Check out Lucene.NET (http://sourceforge.net/projects/lucenedotnet), this is a high performance file-based search engine...coincidentally, it's also used in .TEXT 0.96...
# June 7, 2004 8:29 AM

TrackBack said:

Take Outs for 7 June 2004
# June 7, 2004 8:31 AM

Phil Scott said:

I was stuck with the same problem with our site. I took the easy way out and just stuck a branded google search with a site:www.blahblahblah.com around it.

Honestly, it works great. I still feal a bit guilty about "cheating," but I've actually had a couple of people inquire on how I did it because it works so well. I've had no compliants either.
# June 7, 2004 10:00 AM

David Crowell said:

I would have loved to use a branded google search. The client would not go along with it though. We could have saved some serious time (and we were overbudget as it was).

At least I got to learn something along the way.
# June 7, 2004 10:39 AM

TrackBack said:

# June 8, 2004 11:45 AM

TrackBack said:

# June 8, 2004 11:47 AM
Leave a Comment

(required) 

(required) 

(optional)

(required)