Solving the Data Access problem: to O/R map or not to O/R map
On the www.asp.net forums (the architecture section), a person asked in the 'Your favorite O/R mapper' thread why someone would use a 3rd party component for data access, why that component would be an O/R mapper, and if so, which one. I've tried to answer these questions in that thread, but because I think the answers can benefit more people than just the readers of that long forum thread, I've reworked the text into the article you'll find below. Keep in mind I've tried to keep things simple to understand, so perhaps I've left out a detail here and there; however, I don't think these details matter much for the overall conclusions and descriptions. As I've addressed a couple of questions which I think are related to each other, I've rewritten the forum response as a Q & A.
Q: Why would I go out and buy a 3rd party component / library?
A: With every task you have to perform during a software development project, you have to make a calculation: if I perform this task myself, how much time will that take, and given my hourly fee, how much money is involved, minus the value of the knowledge and insight I gain from doing it myself. The number resulting from that calculation is compared to what a 3rd party component costs, plus the time it takes to get used to the component, plus the time to figure out which component is good, plus some risk margin (because a 3rd party component can still turn out to be a bad choice after a month or so).
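That calculation can be sketched in a few lines. All numbers below (hourly fee, component price, learning time, risk margin) are hypothetical placeholders for illustration, not figures from this article:

```python
# Rough build-vs-buy comparison, as described above.
# Every number here is a made-up estimate; plug in your own.

def build_cost(hours_to_build, hourly_fee, knowledge_value=0):
    """Cost of doing it yourself, minus the value of what you learn."""
    return hours_to_build * hourly_fee - knowledge_value

def buy_cost(component_price, hours_to_learn, hours_to_evaluate,
             hourly_fee, risk_margin=0.2):
    """Cost of a 3rd party component: price + learning + evaluation + risk."""
    base = component_price + (hours_to_learn + hours_to_evaluate) * hourly_fee
    # risk_margin: the component may still turn out to be a bad choice
    return base * (1 + risk_margin)

# Example: 80 hours to build at $75/hour vs. a $500 component
build = build_cost(hours_to_build=80, hourly_fee=75, knowledge_value=1000)  # 5000
buy = buy_cost(component_price=500, hours_to_learn=16,
               hours_to_evaluate=8, hourly_fee=75)  # about 2760
print("build it yourself" if build < buy else "buy the component")
```

The point is not the exact formula but that both sides of the comparison get a number at all; without one, "buy vs. build" is a gut call.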
This sounds awkward, but it's common sense. It's not always more efficient to go out and buy a component to do things for you, just as it's not always more efficient to do things yourself. However, without making this simple calculation, it's hard to tell which situation you're in. Software projects are hard to manage, and without tight cost control, or better: cost insight, it's hard to run a project efficiently and profitably.
So even if it's tempting to go out and buy a component or use an open source component, is it really more efficient to do so? Often it is, don't get me wrong, but don't forget the costs of using a 3rd party component, especially when it's a freebie without any documentation and just a raw example program without a lot of comments.
Q: Why should I implement an O/R mapper in my projects?
A: O/R mapping is in theory very simple: you have a table field and you have an entity field, you define a connection between them, and you use that connection in your logic to provide functionality like loading a class' data, saving it, etc. However, using solely the terms 'O/R mapping' and 'O/R mapper' only makes things more complicated. The problem description is:
"I have to make a connection between my business logic and my persistent storage, how do I do that?".
The answer: "use an O/R mapper" is not helpful, as it
would require knowledge about what an O/R mapper is.
If you don't know what it is, how can you judge if an O/R
mapper is helpful and if that answer holds some truth? You
can't.
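Stripped of the terminology, the mechanism itself is small: a field-to-column mapping consumed by generic load/save logic. A minimal sketch, where the table name, column names and the `save_sql` helper are all made up for illustration:

```python
# Minimal illustration of the core O/R mapping idea: a defined connection
# between entity fields and table fields, used by generic persistence logic.
# All names here ("Customers", "CustomerId", etc.) are hypothetical.

class Customer:
    _table = "Customers"
    # entity field -> table field
    _mapping = {"customer_id": "CustomerId", "name": "CompanyName"}

    def __init__(self, customer_id=None, name=None):
        self.customer_id = customer_id
        self.name = name

def save_sql(entity):
    """Build an INSERT statement from the mapping: the 'save' half of the connection."""
    cols = ", ".join(entity._mapping.values())
    vals = ", ".join(repr(getattr(entity, f)) for f in entity._mapping)
    return f"INSERT INTO {entity._table} ({cols}) VALUES ({vals})"

print(save_sql(Customer(1, "Acme")))
# INSERT INTO Customers (CustomerId, CompanyName) VALUES (1, 'Acme')
```

A real mapper adds a great deal on top of this (identity maps, change tracking, query APIs), but the connection itself is no more than this table of field pairs.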
The right answer is a question: "how do I see my data?". It's the cornerstone of the answer leading to the solution of the dreaded Data Access problem. There are a couple of different views on 'data', which result in different ways of solving the Data Access problem. You have:
1) the table approach
2) the entity (Chen/Yourdon) approach
3) the domain model (Fowler/Evans) approach
(These are the top 3. There are others, but most of them fall into one of these 3 categories.) 1) and 2) look the same, but aren't. Let's discuss these 3 views in more detail.
1) Table approach
The table approach is the plain 'I use tables and query them' approach. No theory used, just a set of tables, not based on any abstract model; they're created right there in DDL. The developer uses tables and expects to work with tables in memory as well, so a plain DataSet/DataTable approach with stored procedures or VS.NET-generated SQL statements is appealing. Typically, developers using this approach use terms like 'rows' and 'customer record'. It might sound odd, but this is a very widely used approach on .NET. The reasons are that Microsoft preaches it through VS.NET designers and examples, and that in the pre-.NET period, ADO with recordset objects was the way to go.
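In code, the table approach means the in-memory shape mirrors the table: logic addresses rows and column names directly, as with a filled DataTable. A sketch, using a plain list of dicts as a stand-in for a DataTable (column names are made up):

```python
# Table-approach sketch: data stays in generic row/column form, like a
# DataTable filled by a SELECT. "CustomerId" etc. are hypothetical columns.

customer_rows = [
    {"CustomerId": 1, "CompanyName": "Acme", "City": "Berlin"},
    {"CustomerId": 2, "CompanyName": "Foo Inc", "City": "Paris"},
]

def customers_in_city(rows, city):
    """Logic works on 'the customer record' by column name, not on an object."""
    return [r for r in rows if r["City"] == city]

berlin = customers_in_city(customer_rows, "Berlin")
print(berlin[0]["CompanyName"])  # Acme
```

Note there is no Customer class anywhere; the table is the model, which is exactly why developers in this camp talk about 'rows' and 'records'.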
2) Entity (Chen / Yourdon) approach
The entity approach is different. The relational model is built from an abstract model and is based on theory. This means people speak of entities (or if you want to go really deep into theory, relations) and attributes. An approach with solely DataTables / DataSets is often not appealing, as the relational model speaks of a Customer entity and not of a Customer record. Developers using this approach want to use these types of elements in their code as well. As they use a relational model as the basis of their thinking, the entities by definition contain no behavior/rules, or only low-level behavior/rules, like check constraints / unique constraints and other constraints defined like 'shippingdate >= orderdate' or 'id >= 0'.
Also important is the way these developers want to utilize the relational model. They understand that the data in the database is just data, and an entity is just a relation based on attributes, which can also be constructed dynamically, with a select statement. This is important for lists of combined attributes from different entities and for reporting functionality. The entity approach uses a combination of O/R mapping for the entity data and generic data functionality like DataSets / DataTables for the dynamic data retrieval requests. The entity approach is also widely used; you see it more in larger applications, as these applications often require a system architect and a data analyst. It's proven technology which has existed since the late 1970s.
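A sketch of what that looks like in code: the entity class carries data plus only low-level rules (comparable to check constraints), while a dynamically constructed list of attributes stays plain tabular data. Class and field names here are made up for illustration:

```python
import datetime

# Entity-approach sketch: the entity holds data and at most low-level rules.
# "OrderEntity" and its fields are hypothetical names.

class OrderEntity:
    def __init__(self, order_id, order_date, shipping_date):
        # low-level rule, like the check constraint 'shippingdate >= orderdate'
        if shipping_date < order_date:
            raise ValueError("shippingdate must be >= orderdate")
        self.order_id = order_id
        self.order_date = order_date
        self.shipping_date = shipping_date

# A dynamic list combining attributes of different entities (e.g. for a report)
# is just rows, conceptually the result of a SELECT with a JOIN, not entities:
report_rows = [
    ("Acme", datetime.date(2004, 5, 1)),
    ("Foo Inc", datetime.date(2004, 5, 2)),
]

order = OrderEntity(1, datetime.date(2004, 5, 1), datetime.date(2004, 5, 3))
print(order.shipping_date >= order.order_date)  # True
```

Higher-level business rules (discounts, status changes) would live elsewhere in this approach, typically in manager-style objects, as discussed below under the domain model comparison.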
3) Domain model (Fowler / Evans) approach
The domain model is the most used approach for solving the Data Access problem in the Java world, but, interestingly enough, rather rare in the Microsoft world. This is not that surprising, as in the Microsoft world it was simply unknown: Microsoft never talked about it, and the techniques mostly used by developers' tools didn't support it very well, so running into it was not that common, except perhaps when you talked about Data Access with Java developers. Another reason it is not that widely used is that it requires an OO approach, which wasn't often possible with COM objects and/or VB5/6.
The domain model focuses on domains, like the Customer domain or the Order domain. It starts with classes, like a Customer class, which contains the data for a customer but also all behavior for the customer, so all business rules for the customer are stored there. (This is somewhat simplified; there are a couple of variants of course, but for the sake of the argument, let's keep it at this description.) Through inheritance you can create a compact model of classes and store the behavior you have to define in the class it belongs in, using polymorphism to override/modify behavior through the hierarchy. The classes / class hierarchy are then stored in a persistent storage, typically a database.
This is a fundamental difference with 2): with the domain model, the relational model follows the classes; the classes don't follow the relational model. Typically, behavior in 2) (and also in 1)) is stored in functionality objects like CustomerManager, which embeds the customer functionality and which is applied to behaviorless entity objects. In 3) you have the behavior in the class, no manager classes. 3) requires an O/R mapper to work with the data in the persistent storage, or better: the O/R mapper is required to (re-)instantiate entity objects from their persistent state in the persistent storage. Because the system's focus on data is through objects, working with data like in 2) and 1) is not available; it's working with objects.
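A minimal sketch of the domain-model style, with behavior living in the class hierarchy and specialized through polymorphism; all class names and the discount rule are invented for illustration:

```python
# Domain-model sketch: data and behavior together in one class hierarchy,
# no manager classes. "Customer", "GoldCustomer" and the discount rule
# are hypothetical examples.

class Customer:
    def __init__(self, name):
        self.name = name
        self.orders = []  # kept trivial for the sketch

    def discount(self):
        # default behavior lives in the base class
        return 0.0

class GoldCustomer(Customer):
    def discount(self):
        # behavior overridden through polymorphism, not via a CustomerManager
        return 0.1

customers = [Customer("Acme"), GoldCustomer("Foo Inc")]
print([c.discount() for c in customers])  # [0.0, 0.1]
```

An O/R mapper's job in this style is purely to re-instantiate such objects from their persisted state; the calling code never sees rows or records.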
What's the best approach?
Hard to say. 25 years of 2) in millions of software systems around the world can't be wrong; however, millions of software systems in Java using approach 3) can't be wrong either. I think it largely depends on what you think is more logical: how you want to work with data. I'm in camp 2), and our product LLBLGen Pro is a tool which tries to help with 2) by offering both O/R mapping and flexible relational model data access power. It's therefore not a pure O/R mapper, as it doesn't fit that much in 3); it offers more functionality to help with 2) than with 3). Also, Paul Wilson's WilsonORMapper is more of a category 2) than a category 3) application. More pure O/R mappers, like EntityBroker, DataObjects.NET, NHibernate and others, focus on 3) (most of the time).
Don't think lightly about this; the differences are fundamental and will influence to a great degree how your system's structure is designed. So it's important to pick the approach which fits your way of thinking. To test how you think about data, ask yourself: "A customer gets the Gold status when the customer has bought at least $25,000 worth of goods in one month. Where is that logic placed? In which class/classes?". Inside the Customer object, which reads the customer's order data to test the rule? Or in a CustomerManager which executes rules and consumes customer and order objects?
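The two possible placements of that Gold rule can be sketched side by side. The threshold comes from the question above; the class names and the representation of order data (a list of monthly totals) are made up for illustration:

```python
GOLD_THRESHOLD = 25_000  # $25,000 worth of goods in one month, per the rule

# Placement A, domain-model style (camp 3): the rule lives inside Customer,
# which reads its own order data to test it.
class Customer:
    def __init__(self, monthly_order_totals):
        self.monthly_order_totals = monthly_order_totals  # e.g. [12000, 26000]

    def is_gold(self):
        return any(t >= GOLD_THRESHOLD for t in self.monthly_order_totals)

# Placement B, entity + manager style (camp 2): the entity is behaviorless
# data; a CustomerManager executes the rule and consumes the entity.
class CustomerEntity:
    def __init__(self, monthly_order_totals):
        self.monthly_order_totals = monthly_order_totals

class CustomerManager:
    @staticmethod
    def is_gold(customer):
        return any(t >= GOLD_THRESHOLD for t in customer.monthly_order_totals)

print(Customer([12_000, 26_000]).is_gold())                       # True
print(CustomerManager.is_gold(CustomerEntity([12_000, 26_000])))  # True
```

Both compute the same answer; the question the article poses is only which placement feels natural to you, and that is what should drive the tool choice.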
Also, don't let your decision be influenced by "but this example proves x is better than y!": at the end of the day, data is data and not information. Information is data placed into context, and it requires interpretation to give it any value/meaning. How you do that is not important, as long as you meet requirements such as maintainability, scalability and efficiency in development, deployment and perhaps (but not necessarily) performance.
So if your way of writing software is clearly in the Fowler/Evans camp, 3), don't use DataSets and don't use a Data Access solution targeting 2), because it will be a struggle: the way of thinking doesn't fit the tool used. You'd be driving in a nail with a screwdriver; you should either swap the nail for a screw or use a hammer instead of the screwdriver. So if you're in camp 3), use a pure O/R mapper; it will fit like a glove. If your way of thinking is clearly in the 2) camp, a pure O/R mapper can give you headaches when you want to write a lot of reports, use a lot of lists combined from attributes of multiple entities, or need functionality which allows you to perform scalar queries; you need an approach which allows you to think from the relational model, so an application tailored to starting with the relational model.
Update: Paul Wilson explained that his mapper is more of a category 2) than category 3) application. I've changed that in the article.