Repositories gone wild

Thursday, January 29, 2015


One of the very beneficial side effects of the rise of MVC in the ASP.NET world is that people started to think a lot more about separating concerns. By association, it brought more awareness of unit testing, and that was good too. This was also when ORMs started to become more popular. The .NET world was getting more sophisticated across all skill levels, and that was a good thing.

But something I do remember vividly is that a lot of tutorials used a generic repository pattern. In other words, there was some kind of general-purpose data object wrapping the data context, and code upstream used it to query the data from any number of places. This was certainly better than finding data access code in the code-behind of an ASP.NET Web Forms page. Those generic repositories still have some value, for example when paired with UI elements (namely the grids made by various component makers), though that flexibility comes with a price.
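
For readers who haven't seen one, here is a minimal sketch of the kind of generic repository those tutorials showed, written in Python for brevity rather than C#. The names (`GenericRepository`, `query`, `Customer`) are hypothetical, and an in-memory list stands in for the ORM's data context. The thing to notice is that the caller, not the repository, decides how data is filtered and sorted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, Iterable, List, Optional, TypeVar

T = TypeVar("T")


class GenericRepository(Generic[T]):
    """Hypothetical generic repository: one class reused for every entity type.

    Callers compose their own filters and sorts, so query logic ends up
    scattered across upstream code instead of living behind the repository.
    """

    def __init__(self, rows: Iterable[T]) -> None:
        # An in-memory list stands in for an ORM-backed table or data context.
        self._rows: List[T] = list(rows)

    def query(
        self,
        where: Callable[[T], bool] = lambda _: True,
        order_by: Optional[Callable[[T], Any]] = None,
    ) -> List[T]:
        results = [row for row in self._rows if where(row)]
        if order_by is not None:
            results.sort(key=order_by)
        return results

    def add(self, item: T) -> None:
        self._rows.append(item)


@dataclass
class Customer:
    customer_id: int
    name: str
    city: str


repo = GenericRepository([
    Customer(1, "Ada", "Cleveland"),
    Customer(2, "Grace", "Seattle"),
    Customer(3, "Alan", "Cleveland"),
])

# Upstream code (a controller, a grid, a service) decides what to query and how.
clevelanders = repo.query(where=lambda c: c.city == "Cleveland", order_by=lambda c: c.name)
print([c.name for c in clevelanders])
```

Nothing in that interface says what the application actually needs from customer data; every caller is free to invent a new query shape.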

So what price is that? Well, there are quite a few negatives that I've found. In no particular order:

The aforementioned UI elements don't know anything about what's indexed or how the underlying data is arranged. Separation of concerns, right? Sure, but application design has a bigger context, and ignoring it leads to trouble. You can sort or filter results, but the grid doesn't know how to do that efficiently through an ORM.
ORMs are already leaky abstractions, meaning you need to know something about the underlying implementation to make them work the way you want. The leaks come in many forms: you need to understand transaction scope, change tracking, when changes are persisted, and so on.
The resulting interface isn't really an enforceable contract; it's just a thin wrapper around the ORM.
Testing ends up being mostly about matching the query syntax used against the generic repository. If you're using repositories with specific methods for specific actions, you don't need to worry about that query syntax.
This marries you to the data persistence mechanism. I know what you're thinking: no one ever changes the database. I'll get to that in a moment.

There are a lot of advantages to repositories whose contracts are domain specific. For example, you might have a repo for customer data, with methods like "GetCustomer" or "UpdateCustomerAddress." You still have to think about how you work with context and transactions, but I think that's a problem solved by dependency injection. A lot of people will debate whether you should use data transfer objects (DTOs), entities, or something else, and that's fine, but my preference is to not rely on entity change tracking to decide what gets persisted. In other words, I prefer a method that takes a couple of parameters like "customerID" and "address" over reading an entity, changing it, then saving it. That process requires knowledge of the underlying persistence layer or data access framework.
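
A minimal sketch of that style, again in Python rather than C#, using hypothetical snake_case versions of the method names above (`get_customer`, `update_customer_address`). The `Protocol` plays the role of a C# interface, the in-memory class stands in for a real data access implementation, and the service receives its repository through its constructor, which is the hook a dependency injection container would use.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    address: str


class CustomerRepository(Protocol):
    """Hypothetical domain-specific contract: each method names one thing the app needs."""

    def get_customer(self, customer_id: int) -> Optional[Customer]: ...

    def update_customer_address(self, customer_id: int, address: str) -> None: ...


class InMemoryCustomerRepository:
    """Hypothetical implementation; a SQL or ORM version would expose the same methods."""

    def __init__(self) -> None:
        self._customers: Dict[int, Customer] = {}

    def add(self, customer: Customer) -> None:
        self._customers[customer.customer_id] = customer

    def get_customer(self, customer_id: int) -> Optional[Customer]:
        return self._customers.get(customer_id)

    def update_customer_address(self, customer_id: int, address: str) -> None:
        # The caller passes an ID and the new value; it never loads, mutates,
        # and re-saves a tracked entity, so no change tracking is involved.
        current = self._customers[customer_id]
        self._customers[customer_id] = Customer(current.customer_id, current.name, address)


class CustomerService:
    # The repository is injected through the constructor, so the service
    # depends only on the contract, not on any particular data store.
    def __init__(self, customers: CustomerRepository) -> None:
        self._customers = customers

    def move(self, customer_id: int, new_address: str) -> None:
        if self._customers.get_customer(customer_id) is None:
            raise KeyError(customer_id)
        self._customers.update_customer_address(customer_id, new_address)


repo = InMemoryCustomerRepository()
repo.add(Customer(1, "Ada", "12 Main St"))
CustomerService(repo).move(1, "34 Oak Ave")
print(repo.get_customer(1))
```

Because the contract is this narrow, a test of `CustomerService` only needs a fake with two methods, not a reproduction of whatever query syntax the service happens to use.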

I know not everyone will agree with me, but I don't care for unit testing data access code either. Part of it is that it makes the tests slow, but mostly it's because I have no desire to test ORMs (which are presumably well tested), and if I'm using straight SQL that's at all complex, I'm already doing a lot of testing just to get it right. Do I end up with bugs in the data access code this way? Sometimes, sure. It's a trade-off.

But then there's that whole thing about not coupling your app to the persistence mechanism. The usual response is that no one ever changes the kind of database they're using, and six or seven years ago, I would have agreed. But a funny thing happened: we started using new caching tools, load balancing across servers got cheap (yay cloud!), and all of these document databases and table storage mechanisms started to get popular. As it turns out, there are now good reasons to switch up your data store.

I have two recent examples. While none of it is ready for production use, I started experimenting with shared caching on my POP Forums project, first with Azure Cache, then Redis. For years I used HttpRuntime.Cache, which is super unless you want to run multiple instances. Trying to make that pluggable with generic repositories would have been hard, but it was really easy with repositories that defined a bunch of domain-specific functions. Similarly, I had another project that stored images in SQL, but (at the time) it was cheaper to store those images in blob storage, which could also be accessed directly via HTTP and put behind a CDN. The project used generic repositories (with Entity Framework, in case you were wondering), so we had to break upstream code to move the images into this cheaper storage. With more specific repos, the change would have been a lot faster, and therefore less expensive and less risky.
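
Here is a rough sketch of why a domain-specific contract makes the caching change easy. It is written in Python with hypothetical names (`TopicRepository`, `CacheBackend`, `CachingTopicRepository`) and is not POP Forums' actual code. Because upstream code only calls `get_topic` and `update_title`, a caching decorator can wrap the database implementation, and the cache itself is just another pluggable dependency: moving from an in-process cache to a shared one like Redis would mean writing one new `CacheBackend`, not touching callers.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Topic:
    topic_id: int
    title: str


class TopicRepository(Protocol):
    # Hypothetical domain-specific contract used by upstream code.
    def get_topic(self, topic_id: int) -> Optional[Topic]: ...
    def update_title(self, topic_id: int, title: str) -> None: ...


class CacheBackend(Protocol):
    """Hypothetical cache contract: in-process, Redis, etc. all fit behind it."""
    def get(self, key: str) -> Optional[Topic]: ...
    def set(self, key: str, value: Topic) -> None: ...
    def remove(self, key: str) -> None: ...


class InProcessCache:
    # Per-process cache: fine for one instance, stale across several.
    # A shared cache would implement the same three methods.
    def __init__(self) -> None:
        self._items: Dict[str, Topic] = {}

    def get(self, key: str) -> Optional[Topic]:
        return self._items.get(key)

    def set(self, key: str, value: Topic) -> None:
        self._items[key] = value

    def remove(self, key: str) -> None:
        self._items.pop(key, None)


class DatabaseTopicRepository:
    """Stand-in for the SQL implementation; counts reads to show cache hits."""
    def __init__(self) -> None:
        self._rows: Dict[int, Topic] = {1: Topic(1, "Hello")}
        self.reads = 0

    def get_topic(self, topic_id: int) -> Optional[Topic]:
        self.reads += 1
        return self._rows.get(topic_id)

    def update_title(self, topic_id: int, title: str) -> None:
        self._rows[topic_id] = Topic(topic_id, title)


class CachingTopicRepository:
    """Decorator: same contract as TopicRepository, with caching layered on top."""
    def __init__(self, inner: TopicRepository, cache: CacheBackend) -> None:
        self._inner = inner
        self._cache = cache

    def get_topic(self, topic_id: int) -> Optional[Topic]:
        key = f"topic:{topic_id}"
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        topic = self._inner.get_topic(topic_id)
        if topic is not None:
            self._cache.set(key, topic)
        return topic

    def update_title(self, topic_id: int, title: str) -> None:
        self._inner.update_title(topic_id, title)
        # Invalidate so the next read goes back to the store.
        self._cache.remove(f"topic:{topic_id}")


db = DatabaseTopicRepository()
topics = CachingTopicRepository(db, InProcessCache())
topics.get_topic(1)
topics.get_topic(1)
print(db.reads)  # 1: the second read came from the cache
topics.update_title(1, "Hello again")
print(topics.get_topic(1).title, db.reads)
```

The same seam would cover the image example: an image repository with hypothetical methods like `save_image` and `get_image_url` could move from SQL to blob storage without upstream code noticing.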

What am I getting at? I'm not at all a fan of generic repositories. They work OK in low-volume apps that don't need to change much, but they're harder to deal with in high-volume apps that are always changing.
