Reply to "What ORMs have taught me: just learn SQL"
This is a reply to "What ORMs have taught me: just learn SQL"
by Geoff Wozniak.
I've spent the last 12 years of my life full time
writing ORMs and entity modeling systems, so I think I know a thing or two about this topic. I'll
briefly address some of the things mentioned in the article.
Reading the article, I got the feeling Geoff
didn't truly understand the material: what ORMs are meant
for and what they're not meant for. It's not the
first time I've seen an article like this and I'm convinced
it won't be the last. That's fine; you'll find a lot of these
kinds of articles on many frameworks/paradigms/languages etc.
in our field. I'd like to add that I don't know Geoff and
therefore have to base my conclusions on the article alone.
Re: intro
The reference to the Neward article made me chuckle: sorry
to say it but bringing that up always gives me the notion
one has little knowledge of what an ORM does and what it
doesn't do. An ORM is just a tool to translate between two
projections of the same abstract entity model (class and
table, which result in instances: object and table row); it
doesn't magically make your crappy DB look like one designed
by Celko himself, nor does it magically make your 12-level-deep,
10K-object-wide graph persist to tables in a
millisecond as if there were just 1 table. Neither will SQL
for that matter, but Geoff (and Neward before him) silently
ignores that.
An ORM consists of two parts: a
low level system which translates between class instances
and table rows to transport the entity instances (== the
data) back and forth, and a series of sub-systems on top of
that to provide entity services (validation, graph
persistence, unit of work, lazy / eager loading etc.).
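To make the "low level system" concrete, here is a minimal sketch of such a translation layer, written in Python on top of the standard `sqlite3` module. The `Customer` class, the `customer` table and the `materialize` / `persist` helpers are hypothetical names made up for this illustration; a real ORM does far more (type conversion, change tracking, relationships), so treat this as a rough picture of the idea only.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # Class-side projection of the (hypothetical) Customer entity.
    id: int
    name: str
    country: str


def materialize(cls, cursor):
    """Turn every row of an executed cursor into an instance of cls,
    matching result columns to dataclass fields by name."""
    columns = [d[0] for d in cursor.description]
    wanted = {f.name for f in fields(cls)}
    return [
        cls(**{c: v for c, v in zip(columns, row) if c in wanted})
        for row in cursor.fetchall()
    ]


def persist(conn, obj, table):
    """Write one instance back to its table row, keyed on its id."""
    data = {f.name: getattr(obj, f.name) for f in fields(obj) if f.name != "id"}
    assignments = ", ".join(f"{name} = ?" for name in data)
    conn.execute(
        f"UPDATE {table} SET {assignments} WHERE id = ?",
        (*data.values(), obj.id),
    )


conn = sqlite3.connect(":memory:")
# Table-side projection of the same entity.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.execute("INSERT INTO customer (name, country) VALUES ('Alice', 'NL'), ('Bob', 'US')")

# Rows -> objects.
customers = materialize(
    Customer, conn.execute("SELECT id, name, country FROM customer ORDER BY id")
)

# Objects -> rows.
customers[0].country = "BE"
persist(conn, customers[0], "customer")
reloaded = materialize(
    Customer, conn.execute("SELECT id, name, country FROM customer WHERE id = 1")
)
```

Everything else an ORM offers (validation, unit of work, lazy / eager loading) sits on top of a layer like this one.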
It is not some sort of 'magic connector'
which eats object graphs and takes care of transforming
those to tabular data of some sort that you don't want
to know anything about. It also isn't a 'magic connector'
which reads your insanely crappy relational model into a
dense object graph as if you read the objects from memory.
Re: Attribute Creep
He mentions attribute creep (more and more attributes
(==columns) per relation (==table)) and FKs in the same
section, however I don't think one is related to the other.
Having wide tables is a problem but it's a problem
regardless of what you're using as a query system. Writing
projections on top of an entity model is easy, if your ORM
allows you to, but even if it doesn't, the wide tables are a
problem of the way the database is set up: they'll be a
problem in SQL as well as an ORM.
What struck me
as odd was that he has wide tables and also a problem with a
lot of joins, which sounds like he either has a highly
normalized model, which should have resulted in narrow
tables, or uses deep inheritance hierarchies. Nevertheless,
if a projection requires 14 joins, it requires 14 joins: the
data itself isn't obtainable in any other way otherwise it
would be doable through the ORM as well (as any major ORM
allows you to write a custom projection with joins etc. to
obtain the data, materialized in instances of the class type
you provide). It's hard to ignore the possibility that the author
overlooked easy-to-use features (which Hibernate
provides) to overcome the problems he ran into. At the
same time, it's a bit odd that a highly normalized model is
blamed on the ORM but supposedly won't be a problem when using SQL
(which has to work with the same normalized tables).
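As an illustration of a custom projection with joins, materialized into a class you provide, here is a small sketch in Python using the standard `sqlite3` module. The tables, the `OrderSummary` class and the data are all made up for the example, and the query is plain SQL rather than any particular ORM's query API; the point is only that the joins are dictated by the normalized model, not by the tool that issues the query.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class OrderSummary:
    # Hypothetical projection type: not an entity, just the shape we want back.
    order_id: int
    customer_name: str
    total: float


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customer(id));
CREATE TABLE order_line (
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    quantity INTEGER,
    unit_price REAL
);
INSERT INTO customer (id, name) VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders (id, customer_id) VALUES (1, 1), (2, 2);
INSERT INTO order_line (order_id, quantity, unit_price)
VALUES (1, 2, 10.0), (1, 1, 5.0), (2, 3, 2.5);
""")

# The projection needs two joins because the data lives in three tables.
rows = conn.execute("""
    SELECT o.id, c.name, SUM(l.quantity * l.unit_price)
    FROM orders o
    JOIN customer c ON c.id = o.customer_id
    JOIN order_line l ON l.order_id = o.id
    GROUP BY o.id, c.name
    ORDER BY o.id
""").fetchall()

# Materialize each result row into the projection class.
summaries = [OrderSummary(*row) for row in rows]
```

Whether these joins are written by hand or generated from an ORM query, the database has to perform the same work.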
He
says:
Attribute creep and excessive use of foreign keys shows me is that in order to use ORMs effectively, you still need to know SQL. My contention with ORMs is that, if you need to know SQL, just use SQL since it prevents the need to know how non-SQL gets translated to SQL.
I agree that you still need to know SQL, as you need to formulate the queries in your code in such a way that they lead to more efficient SQL; an ORM can do a bit of optimization, but that is almost impossible to do well without statistics/data (which are not available at that stage). But 'just use SQL' doesn't follow from that: it's like recommending that people learn to write Java bytecode because the syntax of Clojure is too hard to grasp. A better conclusion would be to learn the query system better so you can predict the SQL which will be produced.
Re: Data Retrieval
Query performance is always a concern, and anything between code and the actual execution of the DML in the DB is overhead. Hand-optimized SQL might be a good option in some areas, but in the majority of cases queries generated by ORMs are fine, even Hibernate's ;). Most ORMs have a query language / system which is derived from SQL to begin with (the mentioned Hibernate does: HQL) and it is roughly predictable what SQL it will produce.
Sure, if you create deep inheritance hierarchies over your
tables, you might run into a lot of joins, but that's known
up front: inheritance isn't free, one knows what it will do
at runtime. "Know the tool you're working with". If Geoff
was surprised to see a lot of joins because a 14-entity deep
inheritance hierarchy was pulled from the DB, he should have
known better.
He says:
From what I've seen, unless you have a really simple data model (that is, you never do joins), you will be bending over backwards to figure out how to get an ORM to generate SQL that runs efficiently. Most of the time, it's more obfuscated than actual SQL.
I find this hard to believe with the query systems I've seen
and written myself, with one exception: Linq. Linq is a bit
different because it has constructs (like GroupBy) which
behave differently in Linq/code than they do in the DB. These require
a translation of intent from the query to SQL and thus can /
will lead to a SQL query which might not be what one would
expect when reading the Linq query.
The usage of
Window functions and other DB specific features (like query
hints) might be something not doable in an ORM query
language. There are several solutions to that though, one
being creating DB functions which are mapped to code methods
so you can execute the constructs inside your query using
those methods which will result in using the functions in
the SQL query, another being DB Views. They both require
actions inside the RDBMS which is less ideal, but if it
helps in edge cases, why not? They're equal to adding an
index to some fields to speed up data retrieval, or creating
a denormalized table because the data is read-only anyway
and it saves the system using it a lot of joins.
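The DB View route can be sketched as follows, in Python with the standard `sqlite3` module. The `sale` table, the `ranked_sale` view and the `RankedSale` class are hypothetical, and the sketch assumes an SQLite build with window function support (3.25 or later). The window function lives inside the view, so the code that queries it only needs plain column selection and filtering, which any ORM query language can express.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RankedSale:
    # Hypothetical read-only class mapped onto the view, as if it were a table.
    region: str
    seller: str
    amount: int
    rank_in_region: int


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (region TEXT, seller TEXT, amount INTEGER);
INSERT INTO sale VALUES
    ('EU', 'Ann', 300), ('EU', 'Bas', 500),
    ('US', 'Cy', 200), ('US', 'Di', 100);

-- The DB-specific construct (a window function) is hidden in the view.
CREATE VIEW ranked_sale AS
SELECT region, seller, amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
FROM sale;
""")

# From the application's side this is an ordinary filtered fetch.
rows = conn.execute("""
    SELECT region, seller, amount, rank_in_region
    FROM ranked_sale
    WHERE rank_in_region = 1
    ORDER BY region
""").fetchall()
top_sellers = [RankedSale(*row) for row in rows]
```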
Re: Dual schema dangers
Here I saw the struggle Geoff had with the concept of ORMs.
This isn't uncommon, e.g. Neward (in my opinion) expresses
the same struggle in his cited essay. There are two sides
with a gap between them: Classes and Table definitions.
Starting with classes and trying to create table definitions
from them is the same as starting with the table definitions
and trying to create classes from them: both are the
projection result of an abstract entity model, and to get one
from the other requires reverse engineering the side you
start with to the abstract entity model it was the
projection of, and then projecting that to the side you want
to create. Whether you start from classes or table definitions
doesn't matter.
I do understand the pain point
when you start with either side and have to bridge the gap
to the other side: without the abstract entity model as the
single source of truth, updating one side whenever the
other side changes is always a problem.
Geoff
tries to blame this on the ORM but that's not really fair:
the ORM is meant to work with both sides (class and tables)
at run time, not at design time; it requires a
system meant for modeling an abstract entity model to manage
both sides, as both sides are the result of that model, not
the source of it. (I wrote one, see 'Links to my work' at
the top left. I didn't want to pollute this article with
references to my work)
Re: Identities
If I understand Geoff's description correctly, the main cause of the problem is creating new entity instances which get their PK set by a sequence in the DB. In memory, these entities have no real ID and referring to them is a bit of a pain, true. But that's related to working with objects in general: any object created is identified either by some sort of ID you give it or by its memory location ("the instance itself"). I don't get the struggle with the cache and partial commits: if you want to refer to objects in memory, you do exactly what you would do if they weren't persisted to a DB. That they get IDs in the DB in the case of sequenced PKs is not a problem: the objects get updated after the DB transaction completes. Even Hibernate is capable of doing that.
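A small sketch of that idea, in Python with the standard `sqlite3` module: the `Order` and `OrderLine` classes and both tables are hypothetical. New objects refer to each other by instance while they have no ID yet, and the DB-generated keys are written back into the objects when they are saved. In this sketch the write-back happens right after each INSERT, inside the transaction; an ORM may do it at a different moment.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity is the instance itself until a PK exists
class Order:
    description: str
    id: Optional[int] = None


@dataclass(eq=False)
class OrderLine:
    order: Order  # in memory, the reference is the object, not a key value
    product: str
    id: Optional[int] = None


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, description TEXT);
CREATE TABLE order_line (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product TEXT
);
""")

# Build the graph in memory; nothing has an ID yet.
order = Order("office supplies")
line = OrderLine(order, "keyboard")

with conn:  # one DB transaction, committed on success
    cur = conn.execute(
        "INSERT INTO orders (description) VALUES (?)", (order.description,)
    )
    order.id = cur.lastrowid  # write the generated PK back into the object
    cur = conn.execute(
        "INSERT INTO order_line (order_id, product) VALUES (?, ?)",
        (line.order.id, line.product),  # FK value comes from the referenced object
    )
    line.id = cur.lastrowid

stored_fk = conn.execute("SELECT order_id FROM order_line").fetchone()[0]
```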
Re: Transactions
This section is a typical description of what happens when
you confuse a DB transaction with a business transaction. A
business transaction can span more than one DB transaction,
might involve several subsystems / services, might even use
message queues, might even be parked for a period of time
before commit. A DB transaction is more explicit and
low-level: you start the transaction, you do work, you
commit (or rollback) the transaction and that's it.
Geoff's reference to scope is good: it
illustrates that there's a difference between the two and
that you therefore shouldn't use a DB transaction when you need a
business transaction. It's too bad he misses this
himself, though. Often developers try to implement a business
transaction at the level of an ORM by using its unit of
work, but it's too low level for that: a business
transaction might span several systems and an ORM isn't the
right system to control such a transaction; it's meant to
control one DB transaction, that's it.
That
doesn't mean the ORM shouldn't provide the tools to
help a developer write proper business transaction
code with the systems controlling the business transaction.
After all, the second part of an ORM is 'entity services',
one of which is 'Unit of Work'. Most ORMs follow the
Ambler paper
and combine a Unit of Work with their central Session or
Context object. The problem with that is that you can't
get a Unit of Work without the central Session or
Context object. When you want a Unit of
Work to pass around, collecting work for (a part of) the
business transaction, you don't want to deal with a Session
/ Context object which also controls the DB connection /
transaction; it might be that at that level / scope it's not
even allowed / possible to do DB oriented work.
It's therefore essential to have an ORM which
offers a separate Unit of Work object, which solves this
problem. In addition, the developer has to be aware
that a business transaction is more than just a DB
transaction and should design the code accordingly.
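What a separate Unit of Work object could look like is sketched below in Python with the standard `sqlite3` module. The `UnitOfWork` class, its methods and the tables are hypothetical and heavily simplified (no ordering by dependencies, no entity tracking). The object only collects work and knows nothing about connections, so it can be passed through code that isn't allowed to touch the DB, and it is handed a connection only when the DB transaction has to run.

```python
import sqlite3


class UnitOfWork:
    """Collects pending changes; holds no connection and no transaction."""

    def __init__(self):
        self._statements = []

    def insert(self, table, **values):
        columns = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        self._statements.append(
            (f"INSERT INTO {table} ({columns}) VALUES ({marks})", tuple(values.values()))
        )

    def delete(self, table, row_id):
        self._statements.append((f"DELETE FROM {table} WHERE id = ?", (row_id,)))

    def commit(self, conn):
        """Run all collected work in a single DB transaction."""
        with conn:  # commits on success, rolls back if a statement raises
            for sql, params in self._statements:
                conn.execute(sql, params)
        self._statements.clear()


def approve_order(uow, order_id):
    # A step of the business transaction: it records work, it does no DB access.
    uow.insert("audit", message="order approved")
    uow.delete("pending_order", order_id)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit (id INTEGER PRIMARY KEY, message TEXT);
CREATE TABLE pending_order (id INTEGER PRIMARY KEY, note TEXT);
INSERT INTO pending_order (id, note) VALUES (1, 'waiting for approval');
""")

uow = UnitOfWork()
approve_order(uow, 1)

# Nothing has reached the DB yet.
pending_before = conn.execute("SELECT COUNT(*) FROM pending_order").fetchone()[0]

uow.commit(conn)  # the single, short DB transaction
pending_after = conn.execute("SELECT COUNT(*) FROM pending_order").fetchone()[0]
audit_messages = [row[0] for row in conn.execute("SELECT message FROM audit")]
```

The business transaction can take as long as it needs to fill the unit of work; the DB transaction only exists for the duration of `commit`.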
Re: Where do I see myself going
A highly normalized relational model (4+ normal form) which
is used to retrieve denormalized sets is not likely to
perform well (as the chance of a high number of joins in
most queries is significant), no matter what query system
you're using. I get the feeling part of what Geoff ran into
is caused by reporting requirements (which often require
denormalized sets of (aggregated) data), part by
inheritance hierarchies (not mentioned, but judging by the
number of unexpected joins I think this is the case)
and part by poorly designed relational models.
None of those are solved magically if you use
SQL instead of HQL or whatever query language you're using
in an ORM. Not only is 'SQL' a query language and
not a query system, it also doesn't make the core
problems go away. Well, perhaps the inheritance one as you
can't have inheritance in SQL, but then again, you're not
forced to use inheritance in your entity model either.
He says:
By moving away from thinking of the objects in my application as something to be stored in a database (the raison d'être for ORMs) and instead thinking of the database as a (large and complex) data type, I've found working with a database from an application to be much simpler.
Here Geoff clearly illustrates a misconception about ORMs: they're not there to persist object graphs into some magic box in the corner, they're a system to move entity instances (==data) across the gap between two projections of the same abstract entity model. It's no surprise it turns out to be much simpler if you see your DB as part of your application, because it is part of your application. If we ignore the difference in level of abstraction, talking to a DB through a REST service is the same as talking to a DB through an ORM which provides you with data: both times you go through an API to work with the entity instances on the other side. The REST service isn't a bucket you throw data in, and neither is the ORM.
Re: conclusion
SQL is a query language, not a query system. It's therefore not an alternative to the functionality provided by an ORM. ORMs make some things, namely the things they're built for, very easy. They make other things, namely the things they're not built for, hard. But the same can be said about any tool, including SQL (if we see a language as a tool): SQL is set oriented, and therefore imperative logic is hard to do, so one shouldn't do imperative logic in SQL. Blaming SQL for being crap in dealing with imperative logic doesn't make it so, it merely shows the person doing the blaming doesn't understand what SQL is meant to do and what it isn't meant to do.
In closing I'd like to note that what's ignored in the article is the optimized SQL ORMs generate with respect to e.g. updates and graph fetches (eager loading). Let alone the fact that to execute the SQL query and consume the results, one has to write a system which is the core of any ORM: the low-level query execution system and object materializer.
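To give an idea of what a graph fetch involves when done by hand, here is a sketch in Python with the standard `sqlite3` module. The tables and data are made up, and the two-query strategy shown is just one common way to eager load a parent/child graph; it isn't claimed to be what any specific ORM emits. It fetches all customers with one query and all their orders with a second, instead of running one extra query per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    description TEXT
);
INSERT INTO customer (id, name) VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO orders (customer_id, description)
VALUES (1, 'books'), (1, 'pens'), (2, 'paper');
""")

# Query 1: the parent entities.
customers = {
    cid: {"id": cid, "name": name, "orders": []}
    for cid, name in conn.execute("SELECT id, name FROM customer ORDER BY id")
}

# Query 2: all children of those parents in one go, then merged in memory.
marks = ", ".join("?" for _ in customers)
child_rows = conn.execute(
    f"SELECT customer_id, description FROM orders "
    f"WHERE customer_id IN ({marks}) ORDER BY id",
    list(customers),
)
for customer_id, description in child_rows:
    customers[customer_id]["orders"].append(description)
```

The filtering, the merging and the materialization around the two queries are exactly the kind of code one ends up writing and maintaining when going the 'just SQL' route.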
It always pains me to read an article like Geoff's about a long struggle with ORMs, as it's often based on a set of misconceptions about what ORMs do and what they don't do. This is partly to blame on some ORM developers (let's not name names) themselves, who try to sell the image that an ORM is a magic object graph persister and will turn your RDBMS into an object store. It's also partly to blame on the complexity of the systems themselves: you don't simply learn how to use all of the ORM features and quirks overnight.
And sadly, it's also partly to blame on the users, the developers using the ORMs, themselves. Suggesting a query language as the answer (and with that the tools that come with it) isn't going to solve anything: the root problem, working with relational data in an OO system, i.e. bridging the gap between class and table definition, still has to be solved, and using SQL and low-level systems to execute it will only move that problem onto your own plate, where you run the risk of re-inventing the wheel, albeit poorly.