ClubStarterKit – Caching for performance

Friday, December 11, 2009

First of all, if you haven’t heard, I recently released ClubStarterKit v3 Preview. If you haven’t had a chance to look at it, I highly encourage you to take a look at the whole new codebase.

My goal when building any application, whether it’s for a client or the open source ClubStarterKit project, is to make the app as fast as possible. CSK has several layers of caching that I think could be applied to many other projects. In fact, some of the caching ideas came from an internal web framework we use at eagleenvision.net.

There are two “caches” that CSK uses. One is the HTTP Cache that IIS and other web servers use in the background. The other is the client’s web cache. I will give a brief overview of each in this post.

Web Cache

The web cache is the simpler of the two to understand. When the web application responds to a request, it sets a response header telling the browser not to request the same file again until a certain date. The trick is to set that date as far in the future as possible so the cached copy effectively never expires. (In CSK I think the cache is something like 5 years…). But what happens when you change your website? That’s where the application ID comes into play.

At every application startup, a string-based token is placed in storage (we use the HTTP Application state) to serve as the application ID. In CSK the token is the DateTime of its insertion into the cache, so IDs from different startups don’t collide. This token is appended to every CSS, image, and JavaScript file request. As a result, within a particular “application instance” the client doesn’t have to wait for the same CSS, image, and JavaScript files to download on every page load. If, for some reason, the application ID is not passed with the request, then the response isn’t cached.

When a part of your application changes, you have the ability to reset the application ID yourself. In CSK you just navigate to /sitecontent/reset and the application ID should be reset.
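To make the idea concrete, here is a minimal sketch in Python rather than CSK’s actual code. The names `AppInstance`, `static_url`, and `cache_headers`, and the `appid` query parameter, are all hypothetical. The sketch also assumes that a request carrying a stale token is treated the same as one with no token at all.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from urllib.parse import parse_qs, urlencode, urlparse

FAR_FUTURE = timedelta(days=5 * 365)  # roughly the five years mentioned above


class AppInstance:
    """Hypothetical stand-in for application state holding the application ID."""

    def __init__(self, now=None):
        self.reset(now)

    def reset(self, now=None):
        # The token is a timestamp, regenerated at startup or on an explicit reset.
        now = now or datetime.now(timezone.utc)
        self.app_id = now.strftime("%Y%m%d%H%M%S%f")


def static_url(path, app):
    """Append the current application ID to a static file URL."""
    return f"{path}?{urlencode({'appid': app.app_id})}"


def cache_headers(url, app, now=None):
    """Return far-future headers only when the URL carries the current ID."""
    now = now or datetime.now(timezone.utc)
    token = parse_qs(urlparse(url).query).get("appid", [None])[0]
    if token != app.app_id:
        # Assumption: a missing or stale token means the response is not cached.
        return {"Cache-Control": "no-store"}
    return {
        "Cache-Control": f"public, max-age={int(FAR_FUTURE.total_seconds())}",
        "Expires": format_datetime(now + FAR_FUTURE, usegmt=True),
    }
```

Because pages emit URLs built from the current ID, calling `reset()` changes every static URL at once, so browsers fetch fresh copies even though the old ones never expired.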

The obvious advantage to this strategy is the reduction in unnecessary bandwidth. And users don’t have to wait for something to download that they already have on their computer. So there are some real benefits to using client-side caching of static files.

HTTP Cache

Just like the web cache, the HTTP cache reduces unnecessary bandwidth. The HTTP cache, however, is centered on the database. Hitting the database repeatedly for the same query can be costly. To counteract this we use HTTP caches. These store data on the application server for a certain amount of time, until it expires or is explicitly expired. A value can be expired in three ways: by the application server, when the specified TimeSpan is reached, or when the web application removes the item from the cache. In CSK, an item is expired from the cache when something is added, such as a new article or a forum post. Once the item has been removed, the next request goes to the database for the query result and stores it in the cache again.
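The load-once, expire-on-time-or-write behavior can be sketched in a few lines of Python. This is an illustration under assumptions, not the ASP.NET cache API: `TimedCache`, `get_or_load`, and `remove` are hypothetical names, and the clock is injectable only to make the behavior easy to check.

```python
import time


class TimedCache:
    """Tiny in-memory cache with a per-item time-to-live (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._items = {}  # key -> (value, expiry time)
        self._clock = clock

    def get_or_load(self, key, loader, ttl_seconds):
        entry = self._items.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # still fresh: no database trip
        value = loader()  # missing or expired: run the query again
        self._items[key] = (value, self._clock() + ttl_seconds)
        return value

    def remove(self, key):
        # Called by the application when the underlying data changes,
        # for example after a new article or forum post is saved.
        self._items.pop(key, None)
```

A write path would call `remove("articles")` right after saving, so the next reader reloads the list and repopulates the cache.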

In CSK we also refresh the caches whenever the application ID changes. This ensures that there isn’t a leak in the caching mechanism and lets a website owner refresh the data easily.
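One way to tie cached data to the application ID is to make the ID part of every cache key. The Python sketch below assumes that approach; `AppScopedCache` is a hypothetical name and this is not necessarily how CSK implements it.

```python
class AppScopedCache:
    """Cache whose entries are only visible under the current application ID."""

    def __init__(self, app_id):
        self._app_id = app_id
        self._items = {}

    def get_or_load(self, key, loader):
        scoped_key = (self._app_id, key)
        if scoped_key not in self._items:
            self._items[scoped_key] = loader()
        return self._items[scoped_key]

    def reset(self, new_app_id):
        # Drop entries stored under other IDs so stale data does not pile up.
        self._app_id = new_app_id
        self._items = {k: v for k, v in self._items.items() if k[0] == new_app_id}
```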

The rationale for this feature is that hitting the database is a lot more costly when you’re dealing with load. Memory cache is really cheap by comparison. So it just makes sense to “hold” the query results until they are expired, either by the application’s usage or by a refresh of the application ID. In the worst case, ASP.NET removes the item from the cache because it is running low on memory. In that case the query is simply run again.

In the CSK there are a few abstractions that I will detail in later posts that are particularly useful when dealing with data. These are the CollectionDataCache, used for storing a collection from the DB, PagedDataCache, used for storing a paged list from the DB, and the SingleItemDataCache, used for storing single items from the DB. All these abstractions are sortable, constrainable, and easy to use. They all use the HTTP cache as the backing store. The abstractions also take care of the data access using the UnitOfWork and Repository patterns we employ in CSK.

There is also an HttpSession cache, which is particularly useful when storing user data. It operates off the same cache interface as the HttpCacheBase.

The Changes Ahead

Like I said, I took a lot of code from the internal framework I built a few years ago. The code I put into the CSK was really useful in that framework. But as applied in CSK, there is a sense of “code smell”. I would really like to do a few things differently, namely abstracting the caches even further for CollectionDataCache, PagedDataCache and SingleItemDataCache. For example, you might want to store a paged list in the session state. Currently you can’t do that without writing your own cache, which isn’t bad… it’s just not particularly fun to do. I would like to add what I am calling “cache strategies” to the infrastructure. So expect changes down the line for further abstraction.
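To give a rough feel for what a “cache strategy” could look like, here is a speculative Python sketch. It is not a design CSK has committed to, and `CacheStrategy`, `DictStrategy`, and `PagedCache` are hypothetical names. The idea is that the data cache only knows how to load a page, while the strategy decides where the result is kept.

```python
from typing import Any, Callable, Dict, List, Optional, Protocol


class CacheStrategy(Protocol):
    """Where cached values live: application cache, session, and so on."""

    def get(self, key: str) -> Optional[Any]: ...
    def set(self, key: str, value: Any) -> None: ...


class DictStrategy:
    """Keeps values in a dict, which could be app-wide or a user's session."""

    def __init__(self, store: Optional[Dict[str, Any]] = None):
        self._store = {} if store is None else store

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value


class PagedCache:
    """Caches pages of results without caring which store backs it."""

    def __init__(self, strategy: CacheStrategy, load_page: Callable[[int], List[Any]]):
        self._strategy = strategy
        self._load_page = load_page

    def page(self, number: int) -> List[Any]:
        key = f"page:{number}"
        cached = self._strategy.get(key)
        if cached is None:
            cached = self._load_page(number)  # e.g. a database query
            self._strategy.set(key, cached)
        return cached
```

Passing a per-user session dictionary to `DictStrategy` would put the paged list in session state without touching `PagedCache` itself.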

So you’ve seen that caches are particularly useful when dealing with data, whether it’s file data or database data. As always, questions, comments, and other feedback are GREATLY appreciated. Just send me an email, comment on this blog, or post on the ClubStarterKit forums.

Aren't you in danger of feature creep here? Have lots of people asked for caching/increased performance? I can't imagine any club has so many hundreds of members *concurrently* accessing the site such that performance becomes a problem. I would have thought that the CSK project is aimed at small clubs with less than a few thousand members and therefore performance will never be an issue?

In my experience, adding caching to a project just introduces lots of confusion and support issues and also makes the code more complex for developers to customise.

Nick Gilbert - Monday, June 21, 2010 4:25:32 PM

@Nick

I disagree with you here. I've had small sites where caching has saved a lot of bandwidth. Look, caching is cheap. Very cheap. Why not take advantage of it? Hitting the database is expensive and unnecessary if the same values will be returned every time. It doesn't matter if you have hundreds of users, thousands of users, or 10 users; caching saves time when a page loads for the second time by preventing another unnecessary trip to the database.

zowens - Wednesday, June 23, 2010 2:39:06 PM
