Output Caching to Disk in Whidbey

Oddur Magnusson asked an interesting question about caching out of process with ASP.NET.

With ASP.NET V1 and V1.1, cached items are always stored in-memory. With ASP.NET Whidbey -- starting in the spring Beta -- we'll automatically save output cached entries to disk. This will enable cache items to survive worker process restarts. It also enables you to cache items for far longer durations without requiring tons of memory. Applications using aggressive output caching today should see a significant reduction in memory working set as a result.
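
For reference, a page opts into output caching with the standard directive below (this is existing ASP.NET syntax, not new Whidbey syntax); per the description above, the disk persistence in Whidbey applies automatically to output-cached pages, so nothing extra is shown here:

    <%@ OutputCache Duration="604800" VaryByParam="none" %>
    <%-- Duration is in seconds; 604800 = one week. With the disk-backed
         store, an entry this long-lived no longer has to be held in the
         worker process's memory. --%>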

The disk-based output caching support also works well with another new Whidbey feature -- database-driven cache invalidation. You could use the two together to cache a page for weeks, but automatically re-generate it when the database rows it uses are updated on the backend. Just like a cached page in memory, you can set up the cache item to automatically vary by query string or other browser values.
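
A rough sketch of how the two could combine, using the SqlDependency attribute syntax as it later shipped in ASP.NET 2.0 (the exact attribute name may differ in the beta builds; "Northwind" and "Products" are placeholder names that would also need to be registered in configuration):

    <%@ OutputCache Duration="604800"
                    VaryByParam="category;page"
                    SqlDependency="Northwind:Products" %>
    <%-- The page stays cached for up to a week, varies by the "category"
         and "page" query string values, and is evicted as soon as the
         Products table in the registered Northwind database changes. --%>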

 

14 Comments

  • Very cool. This should be fantastic for caching dynamically generated images, where you don't necessarily want to suck up server memory but would like to avoid regenerating the images every time. I was looking at implementing a dynamic image cache for a project of mine - perhaps I needn't bother.

  • Oddur is the bomb! Always inspiring with his out-of-the-box ideas and thoughts!

  • Can the new cache be generated on a schedule, rather than waiting for a user to request the page and having to sit through the regeneration?

  • We don't automatically support scheduled re-generation (where no user is required to hit the page) -- although we have a new server execute feature in Whidbey that would allow you to write some custom code to cause this behavior.
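
    One simple approach -- just a sketch, using plain HTTP warm-up rather than the server execute feature itself, and with placeholder URLs -- is a small utility run from a scheduled task that requests the pages so the cache is repopulated before real users arrive:

        // Hypothetical warm-up utility: run it from a scheduled task so the
        // output cache is repopulated without a real user taking the hit.
        using System;
        using System.Net;

        class CacheWarmer
        {
            static void Main()
            {
                // Placeholder URLs for pages with long output-cache durations.
                string[] urls =
                {
                    "http://localhost/myapp/Products.aspx",
                    "http://localhost/myapp/Reports.aspx"
                };

                using (WebClient client = new WebClient())
                {
                    foreach (string url in urls)
                    {
                        // Requesting the page causes ASP.NET to regenerate and
                        // re-cache it if the cached entry has expired.
                        client.DownloadString(url);
                        Console.WriteLine("Warmed: " + url);
                    }
                }
            }
        }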

  • I assume that with this feature there will be a way to control how much disk space gets used by the cache. The current caching flushes items out of memory when it needs to. Will the disk-based caching have similar features?

  • Will there be options for caches like an LRU-based cache, lateral caching between nodes in a cluster, or hybrid caches (memory offloads to disk at certain usage levels)? And what about objects that are not [Serializable] -- will you still be able to cache them in memory?

  • I am especially interested in whether .NET 2.0 will offer a centralized caching service similar to the out-of-process session service. I think this would be extremely useful in web farm scenarios. It might not be as fast as pulling from memory or even from a local file, but it will still be WAY faster than re-rendering the content from a database.



    Scott, is this something you guys are considering?

  • The disk-based cache supports a number of configuration settings to control overall size and cleanup semantics. It will also have "scavenging" features to automatically delete items as the disk fills up (similar to the in-memory implementation today).



    We aren't doing work in the Whidbey timeframe to enable centralized cache servers (where multiple machines go against separate cache servers for web farm scenarios). Instead, we are recommending that you keep separate cache instances local on each machine.



    There are a couple of reasons (both lame and less-lame) we aren't doing the centralized cache store. The lame reason, of course, is that maintaining cache coherency across a distributed set of cache stores in a reliable way is a lot of work, and if done poorly it can end up affecting the overall reliability of the entire cluster.



    The less-lame reason is that a number of attempts have been made in the past (some by Microsoft -- if you remember the IMDB project -- and a lot elsewhere), and in the end it was found that for a lot of workload scenarios the distributed caching ended up slowing things down rather than speeding them up.



    On the surface this sounds counter-intuitive (surely a cache server trip must be faster than going back to a database?), but when you factor in the network round-trips and data-buffer copies, it often ends up being a wash when compared to a decently tuned database.



    [Important caveat: there are of course situations where you have a high-latency remote database, or a very slow data store resource -- in which case a distributed cache located closer to your middle-tier servers could give a (potentially very big) performance win.]



    One of the reasons for a centralized distributed cache (the desire to make sure two machines don't cache separate versions of a page/object) also goes away in the Whidbey timeframe with the new CacheDependency feature, which enables you to invalidate items immediately when underlying data changes -- ensuring that machines across a cluster keep the same cached content instead of getting out of sync with purely timer-driven approaches.
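
    For illustration, here is a sketch of the programmatic form of this, using the SqlCacheDependency API as it later shipped in ASP.NET 2.0 (the "Northwind" and "Products" names are placeholders, and the database entry has to be registered in configuration):

        // Sketch: evict a cached result the moment the underlying table
        // changes, so every server in the farm re-fetches the same fresh
        // data instead of waiting out a timer.
        using System.Web;
        using System.Web.Caching;

        public static class ProductCache
        {
            public static object GetProducts(HttpContext context)
            {
                object products = context.Cache["Products"];
                if (products == null)
                {
                    products = LoadProductsFromDatabase();   // placeholder helper

                    SqlCacheDependency dependency =
                        new SqlCacheDependency("Northwind", "Products");

                    // The entry is removed as soon as the Products table changes.
                    context.Cache.Insert("Products", products, dependency);
                }
                return products;
            }

            private static object LoadProductsFromDatabase()
            {
                // Placeholder for the real data-access code.
                return new object();
            }
        }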



    So to answer Eric and Ken's question above -- it is something we've looked at, but didn't implement in Whidbey. It is, though, something we'll continue to watch and consider for future versions.

  • Scott,



    In our scenario we currently cache search result sets to disk for individual searches so that we don't need to go back to the database every time the user goes to the next or previous page of that search. Our searches are very expensive and this is a huge perf win for us. It would be very cool to move away from our homegrown solution to something that is more generic and supported, but in our situation it would have to support a web farm setup or we wouldn't be able to use it. I understand that in some situations caching items from the database to a remote server ends up being about the same or worse, but I think there are a lot of good scenarios for using a centralized cache as well. What would be really cool is if the caching system in ASP.NET would give you the option of caching to memory, to local disk, or to a centralized cache server for each cache entry.

  • Gotta love the cache

  • Eric -- I definitely hear you on the scenario. We'll keep it on the list to review for the next version after Whidbey and see if we can come up with built-in support for it.



    Thanks,



    Scott

  • Hi Scott



    quote:

    One of the reasons for a centralized distributed cache (the desire to make sure two machines don't cache separate versions of a page/object) also goes away in the Whidbey timeframe with the new CacheDependency feature, which enables you to invalidate items immediately when underlying data changes -- ensuring that machines across a cluster keep the same cached content instead of getting out of sync with purely timer-driven approaches.





    Is it also possible with a non-Yukon database?

    Thanks

    // Ryan

  • Hi Ryan,



    Yep -- that feature is also available for non-Yukon databases. It works out of the box for SQL 7 and SQL 2000. The feature could also be used with other data stores.
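
    For SQL 7 and SQL 2000 the invalidation is polling-based. As a rough sketch of the setup as it later shipped in ASP.NET 2.0 (tool, switch, and element names may differ in the beta builds; the server, database, and table names below are placeholders), you enable the database and table for change notifications and then register the database entry in web.config:

        REM Enable the Northwind database and the Products table for
        REM SQL cache dependency notifications (run from the command line).
        aspnet_regsql.exe -S localhost -E -d Northwind -ed
        aspnet_regsql.exe -S localhost -E -d Northwind -et -t Products

        <!-- web.config: register the "Northwind" database entry and poll it
             every 10 seconds for changes. -->
        <configuration>
          <connectionStrings>
            <add name="NorthwindConnection"
                 connectionString="Data Source=localhost;Integrated Security=True;Initial Catalog=Northwind" />
          </connectionStrings>
          <system.web>
            <caching>
              <sqlCacheDependency enabled="true" pollTime="10000">
                <databases>
                  <add name="Northwind" connectionStringName="NorthwindConnection" />
                </databases>
              </sqlCacheDependency>
            </caching>
          </system.web>
        </configuration>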



    Hope this helps,



    Scott

  • Hi Scott,



    I have tried out the page output caching in the CTP2 release and it has improved page throughput by 5-6 times on my PC acting as both test client and server. I am sure the improvements will be even better in a deployment scenario.



    A little digging in the code using Reflector indicates that the caching to disk is based on the DiskOutputCache class, and two methods, PersistEntry and RetrieveEntry, seem to do most of the work.



    Hence, a provider pattern should be possible, where the OutputCacheModule instantiates a provider implementing an interface (e.g. IPersistentCacheProvider) instead of DiskOutputCache directly.



    Is it on the cards to go this way and give developers an option of writing their own providers?



    Also, not sure if this is the right place, but some code that worked in pages without output caching enabled throws exceptions when caching is enabled.



    For example, my home page derives from sitetemplate.master, which in turn has a user control for the header. In the master's Load event, the user control referred to as me.UserControl1 comes back as Nothing on some invocations!



    Wonder if this is a known CTP issue?



    The CTP2 release is really good and miles ahead of CTP1 and the PDC release, and I am enjoying porting code to it.



    Thanks

    Sumeet

Comments have been disabled for this content.