ASP.NET 1.1 database cache dependency (without triggers)

Saturday, May 7, 2005

Overview

Here's a way to implement a database cache dependency in an ASP.NET 1.1 application. At a high level, it uses a timer on a background thread (in an HTTP Module) that checksums database tables. No database triggers are required.

Background

Way back in December I posted about a CMS driven ASP.NET site I'd worked on that's using a SQL cache dependency. I based my code on some posts by Milan (ASP.NETResources.com) and Peter Bromberg, Ph.D. (EggHeadCafe.com), which were partly derived from Rob Howard's technique here .

Why and How?

Why use a database dependant cache?

Many websites make one or many database requests for relatively static data every page hit - if you're componentizing your pages with user or server controls, they may each make separate trips to the database. You can scale a site out at the webserver level, but if they're all going to the same database, you've got a bottleneck that you can't horizontally scale out of. It's more efficient to distribute some of that data to your webserver, and only hit the database when needed. Setting a database dependency on cached data can reduces you database load and network traffic dramatically - the database load is no longer proportional to the site load.

Why not use ASP.NET OutputCache to cache your pages or page fragments?

The ASP.NET Output Cache (both Page and Fragment) is a great, simple way to reduce your webserver processing and database server load. You can set a cache timeout, optionally specify parameter dependencies, and ASP.NET will return the same HTML without processing or hitting the database as long as the cache is valid. The problem is that your dependency is set for a period of time, so your cached page can be showing stale data, which is often unacceptable. If you crank down on the cache time to keep the page relatively current, you decrease the benefit the cache was giving you.

Output caching doesn't always work so well if your page takes a while to generate, because the first page hit after the cache has expired leaves your user waiting for the page to load. The site is pretty quick most of the time, but every now and then a page takes forever to load. That kind of unpredictable user experience is never appreciated.

Is this method efficient?

It depends. This can work well for a relatively static, high volume content site. As an example, I'm using it on a website that delivers about 100K page impressions per day to about 10K users. The site is driven by a CMS database that is updated about 50 times a day by a group of about 100 content editors. The site is database driven, but delivers roughly 20K virtual pages. The CMS database hierarchy defines the site structure - navigation, breadcrumbs, etc. The navigation is dependant on both the site structure and content, since deactivating all the news articles on a page must cause the page to disappear from the site structure. This means that building the pages on the fly for each page hit is impractical, since each page hit would require rebuilding the entire site structure. This is time sensitive financial information, so setting an arbitrary cache timeout is not an option. Additionally, the database is shared with other applications and is overloaded.

This technique wouldn't make sense on large tables. In my case, we're checksumming 3 tables totalling 65K rows and the query returns in less than half a second on average. Of course, testing response time of one query gives a very incomplete picture - the database may be able to cache results when you bang away at one query, but when it's under normal load the cache availablilty may be quite different. We stress tested the heck out of this, and I'd recommend you do to. I would not use this for large tables - for instance, computing the aggregate checksum on a 1.3 million row table took over 10 seconds. If you're dealing with large tables you'll probably need to use the trigger based approach, but if you're dealing with that much information you probably won't have the memory on your webservers to cache a meaningful portion of the data anyways.

In our case, moving the data needed for the most commonly used content over to the webservers made a lot of sense. The webservers keep an XML cache of the site structure and content for the top four levels of the site hierarchy, and only hit the database when content changes or users navigate to deep content.

Why not use the Trigger based approach?

Rob Howard's trigger based approach is a workable solution. This is pretty much how database dependencies are implemented in ASP.NET 2.0 on pre-Yukon databases (Yukon uses SQL Notification Services to push data dependency update notifications to ASP.NET). The thing is, you still have to hit the database to check the Change Notification table. All the triggers do is update a table, which your application still has to ping to check if there have been changes. I've got some reservations on triggers, and I'm not alone here - see Milan's arguments here and here. It works, but since the SQL checksum functions are so efficient, why bother with triggers and notification tables when you can just checksum the table?

What the heck is an HTTP Module? Why use it?

HTTPModules are cool. They're the ASP.NET equivalent of ISAPI filters, but they're really easy to write and configure. They start up with your ASP.NET application starts, and receive all the application and request events. A lot of the cool plumbing of ASP.NET uses HTTPModules. It's the ASP.NET way to componentize background operations. You can also put this kind of code in Global.asax.cs, but it's mixed in with any other application specific code; using a module keeps the code simple and reusable. Also, HTTPModules are configured via web.config setting, so you can disable or repoint a module without recompiling your application.

Why use the Application object rather than the Cache?

I lost this argument to my friend and co-worker Michael Ravan:
The cache is faster, and the application is essentially deprecated, I said.
The performance differance is minimal, he shot back.
But if we're just storing the data to speed up the site, so that means it should be in the cache, I bellowed.
If the cache is expensive to create and it is unacceptable for a site user to wait for the cache to build, it's not a cache - it's an application resource, he trumpeted.

Trumpet beats bellow. Who knew?

This would be different if cache remove callbacks fired when the cached object becomes invalid, but they don't - they fire when an invalidated object is accessed. So instead of going to the store when the last loaf of bread is used up, it waits until I ask for toast and there's no bread. This means that expensive cached objects can cause slow response and can lead to cache thrash.

I looked at using a cached object with cache remove callbacks, but they have the same problem - you don't get the callback until it's too late. To quote Steve Smith:

One potential use for this feature would be to refresh cached data in the background so that users never need to wait for the data to be populated, but the data is kept relatively fresh. Unfortunately in practice, this doesn't work very well with the current version of the caching API, because the callback doesn't fire or complete execution prior to the cached item being removed from the cache. Thus, a user will frequently make a request that will try to access the cached value, find that it is null, and be forced to wait for it to repopulate. In a future version of ASP.NET, I would like to see an additional callback, which might be called CachedItemExpiredButNotRemovedCallback, which if defined must complete execution before the cached item is removed.

How can you communicate with it since it's in an HTTP Module?

Since the HTTP Module runs on a background thread, you can't talk to it directly from a web page (or if you can, I wouldn't advise it). Probably the simplest threadsafe method of communication between threads in an ASP.NET application is with application variables, so that's what I went with.

Getting information out of the module is a bit easier - the module reports on what it's doing via the ASP.NET trace. Trace.Write statements in the Application_BeginRequest event worked well for me.

I changed the table but the checksum didn't change. Why?

I'll let Milan field this one for me:
You're right, checksums do not reflect changes to certain fields:

"BINARY_CHECKSUM ignores columns of noncomparable data types in its computation. Noncomparable data types are text, ntext, image, and cursor, as well as sql_variant with any of the above types as its base type." - Books Online

However, most tables contain more than an ntext field. Most of them come with some sort of "name", "description", "date last updated" fields, etc, and those SQL Server does track changes to.

I'm getting locked file warnings in Visual Studio when I rebuild and the website is running. What gives?

This was a bit tricky to figure out. If the application is running on your development box and you switch between debug and release build, the assembly with this module is locked by the ASP.NET worker process and can't be overwritten. Sneakily enough, Visual Studio buries this information in the routine build messages so it's easy to miss, so you end up with an old module DLL and invalid dependencies which may or may not work when deployed. When you're doing your build, it's best to close Visual Studio and delete the contents of the obj and bin folders, then start VS back up and do your build. Big ups to Ravan for figuring that out.

This is putting heavy load on my dev database - there's no way I'm putting this into production!

This was a fun one - we had eight developers working on the application, and dev database server was dog slow. There was concern that the site was slow in development with no web traffic, so there was no way it could handle thousands of users in production. I had a hard time explaining that the database load was proportional to the number of webservers pinging the database, not the web traffic, so the load was actually much worse in development where each developer was running a webserver. We changed the ping frequency to every 5 minutes in development and things calmed down. Truth be told, this did point out an error I'd made in the SQL that was locking the tables, which lead to using "WITH (NOLOCK)" on each table.

What are you doing with the data?

My sample code shows the guts of the dependency and calls out to NavigationXML.UpdateNavXML(). That's where the application specific code comes in - in my case, we build an XML document and shove it in the Application object. The base page pulls out the chunk of XML that's relevant depending on where the user is in the site and pops it in the HttpContext, and user controls either run XSL transforms on the page specific XML or bind to it. Contextualizing the XML was important in our case, since XSL against the entire 3MB XML site digest was slow and CPU intensive; doing a one time XSL transform on page load to contextualize it slimmed it way down and made the invidual user control XSL very fast. If you want to stay out of XML land, you could probably cache a datatable or something, but we needed the hierarchy information and XML made that easy.

This part is up to you, though - this will tell you when your data has changed, and you need to figure out what you want to do with that information.

Isn't this new fangled crazy talk?

Not so! People were doing this kind of caching back in 2000 on classic ASP. It's a proven technique, this just modernizes it for ASP.NET.

So does it really work?

I delayed posting on it since the application was deployed to production in December in case there were problems. We haven't had any. Database load has decreased (compared to the previous ASP version, which didn't use any caching), stress testing shows we can support many more users with higher response time, the data is fresh, and the application is very responsive.

And now the code:

using System;
using System.Web;
using System.Threading;
using System.Data;
using System.Data.SqlClient;
using System.Configuration;
//using CompanyName.SiteName.Website.Navigation;
//using CompanyName.SiteName.Website.Instrumentation;

//Place the following (uncommented) in your web.config to enable this module.
//<httpModules>
//    <add name="BackgroundService" type="CompanyName.SiteName.Website.BackgroundService,DLLName" />
//</httpModules>

namespace CompanyName.SiteName.Website
{
    public class BackgroundService : IHttpModule
    {
        static Timer timer;
        int interval = 5000;
        static Int64 sum = 0;
        static DateTime processingStarted = DateTime.MinValue;

        public string ModuleName
        {
            get { return "BackgroundService"; }
        }

        public void Init(HttpApplication application)
        {
            application.BeginRequest += (new EventHandler(this.Application_BeginRequest));
            application.EndRequest += (new EventHandler(this.Application_EndRequest));

            if(ConfigurationSettings.AppSettings["ContentUpdate.Interval"] != null)
            {
                try
                {
                    //Content update configuration is listed in seconds
                    interval = (Convert.ToInt32(ConfigurationSettings.AppSettings["ContentUpdate.Interval"])) * 1000;
                }
                catch
                {
                    interval = 5000;
                }
            }

            if (timer == null)
                timer = new Timer(new TimerCallback(ScheduledWorkCallback), application.Context,0, interval);
        }

        private void Application_BeginRequest(Object source, EventArgs e)
        {
            try
            {
                HttpApplication application = (HttpApplication)source;

                //An admin page sets Application["StopTimer"]="true" to disable the timer
                if(Convert.ToString(application.Context.Application["StopTimer"])=="true")
                {
                    timer = null;
                }
                else
                {
                    if(processingStarted==DateTime.MinValue)
                    {
                        if (application.Context.Application["CacheTimestamp"]==null)
                        {
                            application.Context.Trace.Warn("Nav Data","Not updating, not available.");
                        }
                        else
                        {
                            DateTime lastUpdated = (DateTime)application.Context.Application["CacheTimestamp"];
                            application.Context.Trace.Write("Nav Data","Up to date. Last updated: " + lastUpdated.ToLongTimeString());
                        }
                    }
                    else
                    {
                        application.Context.Trace.Write("Nav Data","Updating, started at " + processingStarted.ToLongTimeString());
                    }
                    if (timer == null)
                    {
                        timer = new Timer(new TimerCallback(ScheduledWorkCallback), application.Context, interval, interval);
                        application.Context.Trace.Warn("Nav Data","Content update timer restarted, interval: " + interval.ToString());
                    }
                }
            }
            catch {}
        }

        /*
        //Not using this, leaving stub code in place in case an EndRequest event is needed.
        private void Application_EndRequest(Object source, EventArgs e)
        {
            HttpApplication application = (HttpApplication)source;
            HttpContext context = application.Context;
        }
        */

        public void Dispose()
        {
            timer = null;
        }

        private void ScheduledWorkCallback (object sender)
        {
            HttpContext context = (HttpContext) sender;
            Poll(context);
        }

        #region DB Poll
        private void Poll (HttpContext context)
        {
            SqlConnection connection = new SqlConnection(ConfigurationSettings.AppSettings["ConnectionString.CMS"]);

            try
            {
                //The production application uses a stored procedure. This shows inline SQL for clarity.
                SqlCommand command =
                    new SqlCommand(
                    "DECLARE @sum bigint;" +
                    "SELECT @sum = CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM distribution WITH (NOLOCK);" +
                    "SELECT @sum = @sum + CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM content WITH (NOLOCK);" +
                    "SELECT @sum = @sum + CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM zone WITH (NOLOCK);" +
                    "SELECT @sum;", connection);

                command.CommandTimeout =5;

                connection.Open();
                Int64 current = (Int64)command.ExecuteScalar();

                if(mustUpdate(context,current))
                {
                    if(!(timer==null))
                        timer.Change(10 * interval,interval);

                    //UpdateNavXml calls another more database intensive stored procedure that pulls
                    //data from the above tables and builds an XML document which is then stored in the
                    //application object.
                    NavigationXML.UpdateNavXml(context,false);
                    sum = current;

                    //This method is in the website instrumentation code
                    //Log.Trace("Nav XML updated",context);
                }
            }
            finally
            {
                processingStarted = DateTime.MinValue;

                if(!(timer==null))
                    timer.Change(interval,interval);

                connection.Close();
            }
        }

        private bool mustUpdate(HttpContext context,Int64 current)
        {
            //Check if the Nav XML exists and the checksum is current
            return (NavigationXML.NavXmlExists(context)==false)||(current!=sum);
        }

        #region "Commented - Cache code"
//        private void GetXMLFromCache(string zone, System.Web.HttpContext context)
//        {
//            string navXmlKey = "WebsiteName.APP.CACHE." + zone + ".NavXML";
//
//            System.Xml.XPath.XPathDocument navXml = context.Cache[navXmlKey] as System.Xml.XPath.XPathDocument;
//
//            if (navXml==null)
//            {
//                // get new xml Cache
//                navXml = NavigationXML.CreateXMLFromZone(zone);
//                context.Cache.Insert(navXmlKey,navXml,null,DateTime.Now.AddSeconds(30),System.TimeSpan.Zero);
//                context.Application["CacheTimestamp"] = System.DateTime.Now;
//            }
//            return;
//        }
        #endregion
        #endregion
    }
}

Wow, very meaty post!

One question, I've seen implementations where a trigger writes to a file. The ASP.NET application Cache has a FileCacheDependency watching that particular file, and thus the cache is immediately invalidated when a record changes. Doesn't that solve the problem of restocking the toast when it's out?

Especially if you have a CacheInvalidationCallback registered so you can restock the toast.

Just curious as I honestly haven't gotten to the point of implementing either of these approaches (just haven't needed to yet).

Haacked (aka Phil Haack) - Saturday, May 7, 2005 1:12:00 AM

Ahh, I see! Thanks for the clarification. Man, you've taken a long break from blogging and have come back strong pouring all this useful info into the blogosphere. Good stuff!

Haacked (aka Phil Haack) - Saturday, May 7, 2005 9:48:00 AM

How well does CHECKSUM do for tables with 100k - 1Mil rows? What about for really long tables? Any gottchas to watch out for in using CHECKSUM? This solution seems almost too easy. Great find and implementation! Wonder why the asp.net team didn't use this technique?

thanks

Paul

Paul Welter - Sunday, May 8, 2005 8:45:00 AM

Hi,
I read this article. I like to implment database cache dependency concept in my .net appliation for caching.

The example you have given is for SQL server database. I like to implement this using Oracle database. I am not sure if CHECKSUM concept will work oracle database.

Please advise.

Regards,
Lalitha

Lalitha - Friday, July 20, 2007 3:08:32 AM

Too bad we didn’t spend enough time on ASP way back in college. I was looking for an online tutorial for people who are beginners on this but couldn’t find any. I think I badly have to be oriented on this because most projects require people with knowledge on ASP.

portraits art - Thursday, November 29, 2007 8:46:17 PM