Object-oriented data abstraction and caching strategies

I'm not sure if anyone reads blogs on the weekends, but what the heck. :)

I've been thinking a bit about the way I get data in a lot of apps I write, and it's typically something like the article I wrote on uberasp.net. I have some kind of class that has the typical CRUD stuff and properties that match columns, with lots of caching. This works really well, particularly in the forums, where pages get rendered in 0.05 seconds or less. (Actually, the data access is generally in yet another class, so that article isn't exactly the way I generally do things.)

This class design is great for straightforward data selection, but it's not quite as straightforward when you do joins. For example, in a forum topic I might get all of the posts as well as user data to display (their post count, signature, or whatever). Sure, each Post object could have a User object as a property, but then I start to worry about the caching, and all of the performance it brings. I may cache a collection of Post objects for as long as the topic isn't changed, but what happens when an individual user changes their sig or increases their post count? The user data in the cached topic is then stale.
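
To make the worry concrete, here is a minimal C# sketch of a cached topic going stale. The `User`, `Post`, and `StaleTopicDemo` names, and the dictionaries standing in for the database and the cache, are assumptions for illustration only, not the forum's actual classes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; not the forum's real classes.
public class User
{
    public int Id;
    public string Signature;
    public User(int id, string signature) { Id = id; Signature = signature; }
}

public class Post
{
    public string Text;
    public User Author;   // user data embedded in the cached post
    public Post(string text, User author) { Text = text; Author = author; }
}

public static class StaleTopicDemo
{
    // Stand-ins for the user table and the topic cache.
    static readonly Dictionary<int, User> userTable = new Dictionary<int, User>();
    static readonly Dictionary<int, List<Post>> topicCache = new Dictionary<int, List<Post>>();

    // Returns a fresh copy of the user row, as a data access class would.
    static User LoadUser(int id)
    {
        User row = userTable[id];
        return new User(row.Id, row.Signature);
    }

    public static string Run()
    {
        userTable[1] = new User(1, "old sig");

        // The topic is loaded once and cached with the user data embedded.
        topicCache[42] = new List<Post> { new Post("Hello", LoadUser(1)) };

        // The user edits their signature. The topic itself has not changed,
        // so nothing tells the topic cache to refresh.
        userTable[1].Signature = "new sig";

        // The cached post still carries the copy made at load time.
        return topicCache[42][0].Author.Signature;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "old sig"
    }
}
```

The topic cache is only invalidated by changes to the topic, so a change that belongs to a different entity never reaches it.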

I probably shouldn't be thinking about this on a Saturday night, having a beer on the deck waiting for friends, but what the heck, I'm still a geek, even with the beer. What would you do?

3 Comments

  • You ran into one of the many traps with caching. In the O/R mapper world, the group that thinks caching is for speeding things up gets smaller and smaller, and for a reason: it doesn't make things faster, it's only nice for uniquing (one object per entity in the complete app). It actually makes things slower, as every bulk query has to be scanned completely for cache hits. Say you have 50 customer entities in the cache and you ask for all customers (or any group of customers): you can't simply return the 50 customers in the cache. You have to pull them from the db, match them one by one with the ones in the cache, update the cache, and only then return the results.



    If you're looking for efficient caching for fast internet pages, you should use another trick (well, you have a couple of tricks to choose from, actually). One of the best is 'render the complete site every X seconds/minutes'. This is how Slashdot works, for example. What they do is as simple as it is effective:

    They render the complete site, or parts of the site (which are embedded in pages), every minute. If the hardware they have can render the complete site every 20 seconds, they can serve the site to a very large audience without experiencing slowdowns. Visitors also wait a minute or less before they see their content show up. In reality they render parts of the site in real time and other things once per minute (front page) or once per 20 seconds or so (comments).



    If the traffic gets too high, you increase the interval to say 2 minutes. It's that simple :). The fun thing is: even if there is a slight delay, users won't notice.



    Another trick is storing cached HTML in the database and using views. You speak about a forum. The heaviest actions are reading the threads per forum plus statistics (who posted the last posting, when, etc.) and reading a page of messages in a forum thread. These actions happen far more often than post writes. For my forum, each posting has a text field for the posting text and a text field for the HTML, which is the text parsed by the UBB parser to XML and converted with an XSL to HTML. I then use a couple of views which pull the data for viewing (for example a thread list or a page of postings). They do a simple select, and the results can be bound to a repeater control without postprocessing. The ASP.NET cache can then take care of page changes, for example by invalidating every minute.

  • Jeff, I hope you get more feedback on this...I've had the same thoughts over the past couple of weeks. Sticking with the forums example, one of the things I've played with is using delayed creation:



    private int userId;

    // By FxCop standards, this probably ought to be a method.
    public User User {
        get { return UserUtility.GetUser(userId); }
    }



    The advantage I see here is that you don't have to worry about whether you have the most up-to-date user in your Post class; that becomes the job of your UserUtility class.



    UserUtility.GetUser would be implemented as a cache lookup that hits the database on a miss. UserUtility.Update() would call User.Update() to write to the database and then update the cached object, so subsequent calls to UserUtility.GetUser() return the updated user.



    It'd be good if Paul Wilson could jump in on this one...

  • Frans: The problems that you describe aren't really the problems that I have. Actually, I don't have any problems at all the way I work things now, I'm just thinking in terms of making things more intuitive for the programmer and efficient at the same time. To make the blanket assertion that caching makes things slower is incorrect if I can demonstrate that my app is slower without it.



    Karl: Actually, that's kind of the way I do things in the forum now (see the class library docs at http://www.popforums.com if you're interested). If you look at a topic, the posts are all cached as an ArrayList of Post objects. The UI fills in the user details on each post individually by looking up People objects based on Post.PeopleID. This would be ridiculously inefficient if the People data wasn't cached, but since it is, it's insanely fast.



    Truth be told, the only reason I think about this at all is in the case of spreading out the load across more than one server. To do that in my app you'd have to turn caching off, which means the above-mentioned topic display would be dog slow. The flip side of that is that I wrote the forum for myself first, and there's little to no chance I'd ever need to run the app across more than one server, and I doubt any of the thousands of people that download it every month would need to either.
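
The lookup-by-ID approach that Karl and Jeff describe in the comments above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the forum's actual code: `UserData` is a hypothetical stand-in for the data access layer, the `Update(User)` signature is a guess, and a plain in-process dictionary takes the place of the ASP.NET cache.

```csharp
using System;
using System.Collections.Generic;

public class User
{
    public int Id;
    public string Signature;
    public User(int id, string signature) { Id = id; Signature = signature; }
}

// Hypothetical stand-in for the data access layer.
public static class UserData
{
    static readonly Dictionary<int, string> signatures = new Dictionary<int, string>();
    public static int Reads;  // counts simulated database reads

    public static User Load(int id)
    {
        Reads++;
        return new User(id, signatures[id]);
    }

    public static void Save(User user) { signatures[user.Id] = user.Signature; }
}

public static class UserUtility
{
    static readonly Dictionary<int, User> cache = new Dictionary<int, User>();
    static readonly object sync = new object();

    // Cache lookup first; go to the database only on a miss.
    public static User GetUser(int id)
    {
        lock (sync)
        {
            User user;
            if (!cache.TryGetValue(id, out user))
            {
                user = UserData.Load(id);
                cache[id] = user;
            }
            return user;
        }
    }

    // Write to the database, then replace the cached copy.
    public static void Update(User user)
    {
        lock (sync)
        {
            UserData.Save(user);
            cache[user.Id] = user;
        }
    }
}

public class Post
{
    private readonly int userId;
    public string Text;
    public Post(string text, int userId) { Text = text; this.userId = userId; }

    // Resolved on every access, so a cached Post never holds its own copy of the user.
    public User User { get { return UserUtility.GetUser(userId); } }
}

public static class Demo
{
    public static void Main()
    {
        UserUtility.Update(new User(1, "old sig"));
        Post cachedPost = new Post("Hello", 1);  // imagine this sitting in a topic cache

        UserUtility.Update(new User(1, "new sig"));
        Console.WriteLine(cachedPost.User.Signature); // prints "new sig"
        Console.WriteLine(UserData.Reads);            // prints 0: every lookup was a cache hit
    }
}
```

The cached posts hold only a user ID, so a signature change needs no topic invalidation. The dictionary lives in a single process, though, so this sketch has the same limitation Jeff mentions when the load is spread across more than one server.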
