RSS Feeds at weblogs.asp.net should be static

Phil Winstanley blogs about the very slow pace of the SQL Server(s) behind the blogs here at weblogs.asp.net. I agree, the blogs are very slow: posting a comment takes ages, and reading the main feed often results in a timeout.

I think part of the problem is that the RSS feeds here are produced dynamically. This means that if 10,000 users pull the main feed's RSS data every 10 minutes, the engine will produce 10,000 copies of the RSS feed every 10 minutes, consulting the database and running other logic each time. For less popular feeds, that's not so bad. For a highly popular feed like the main one at weblogs.asp.net, however, it's not that efficient.

Now, imagine that the engine produced a new RSS feed file every T minutes. The 10,000 requests for this file then only get the full file when it has changed. If the file hasn't changed since the client last fetched it, the webserver returns an HTTP 304 (Not Modified) response. If the file has changed, no database activity has to be performed: just send the bits. Scott told me that due to some clustering issues this is not yet possible here, but perhaps that can be (and has to be) changed.
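
The 304 check described above can be sketched in a few lines. This is only an illustration in Python, not how the blog engine here works: the function name `respond` and the idea of passing the client's `If-Modified-Since` header in as a string are my own assumptions, and a real webserver (IIS included) does this for static files on its own.

```python
import os
from email.utils import formatdate, parsedate_to_datetime


def respond(feed_path, if_modified_since=None):
    """Return (status, headers, body) for a request for the static feed file.

    Hypothetical helper: feed_path is the pre-generated RSS file and
    if_modified_since is the raw request header value, if the client sent one.
    """
    # HTTP dates have one-second resolution, so drop the fractional part.
    mtime = int(os.path.getmtime(feed_path))
    headers = {"Last-Modified": formatdate(mtime, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparsable header: fall through to a full response
        if since is not None and mtime <= since:
            # The client already has the current version: no body, no database.
            return 304, headers, b""

    # The file is newer than the client's copy (or the client has none):
    # just send the bits from disk.
    with open(feed_path, "rb") as f:
        return 200, headers, f.read()
```

A client that sends back the `Last-Modified` value it received earlier gets a 304 with an empty body until the file is regenerated. Neither branch touches the database.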

Using a fixed time window of T minutes works as long as the content can always be produced in less than T minutes (otherwise you have to set T higher). This is the mechanism high-traffic sites like Slashdot have used for years: every T minutes the complete content is generated. If something changes within that window (a comment, for example), you can decide to generate the content dynamically into a cache or wait for the next time window. For the main feed(s), for example, you could generate a new file every 30 minutes or whenever a new blog entry is posted.
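
A minimal sketch of such a regeneration job, again in Python and purely illustrative. The names `publish_feed`, `run_forever` and `render_feed` are hypothetical; `render_feed` stands for whatever queries the database and returns the feed as bytes. The file is left alone when the content is identical, so its modification time stays put and clients keep getting 304s.

```python
import os
import tempfile
import time


def publish_feed(feed_path, render_feed):
    """Regenerate the feed file; return True if the file on disk changed."""
    new_bytes = render_feed()  # the only step that needs the database

    try:
        with open(feed_path, "rb") as f:
            if f.read() == new_bytes:
                return False  # unchanged: keep the old file and its timestamp
    except FileNotFoundError:
        pass  # first run: nothing published yet

    # Write to a temporary file in the same directory, then swap it in with a
    # single rename so readers do not see a half-written feed.
    directory = os.path.dirname(os.path.abspath(feed_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(new_bytes)
    os.replace(tmp_path, feed_path)
    return True


def run_forever(feed_path, render_feed, interval_minutes=30):
    """Republish every T minutes; T must exceed the time one render takes."""
    while True:
        publish_feed(feed_path, render_feed)
        time.sleep(interval_minutes * 60)
```

Calling `publish_feed` directly when a new entry is posted covers the "or whenever a new blog entry is posted" case without waiting for the next window.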

That would greatly reduce the load on the database server, which seems to be the bottleneck at the moment.

6 Comments

  • No need for a file, Frans,

    Just what is called OutputCache :-)

    Not like this is not proposed standard procedure for ASP.NET applications.

  • With caching there is always the possibility of stale data; however, it's usually worth the risk.

  • Ah the output cache, hadn't thought of that (is that shareable between webservers btw? files are easier shared with a simple service)

    Stale data is unavoidable, as rss is a pull mechanism: when you check your rss reader, the actual state of all the blogs subscribed might be different, you only know the actual state when you refresh constantly ;)

  • The feeds are served from the cache. Caching feeds to disk might be a future version, but it is not high on the list just yet.

  • scott... not high on the list? you should reconsider that statement.

    and why couldn't you use a static file on a network share for the web servers... IIS caches static files in memory, so network traffic is basically minimal for the static case

    for instance, weblogs.asp.net/rss virtual directory points to like \\domainhost\rss which contains a single file, rss.xml. all the webservers get a domain aspnet account that has read/write access to that file, and when a stale condition is detected, have one of the webservers rewrite it...

    and I've encountered that timeout issue myself... it seems kinda unprofessional... but then again, my website is currently offline... :-(

  • I'm not sure I see a huge benefit to caching from disk vs the current memory cache. The current implementation does not go to the database for each request. It is stored in the ASP.NET Cache object until the data changes. Then it's recalculated and re-cached on the next request.
