Code-first O/R mapping is actually rather silly.

Saturday, December 14, 2013

Code-first. It's a way of defining mappings for O/R mappers by hand-writing entity classes and then hand-writing mapping files (either by using shortcuts like conventions or by a fluent API which allows you to set up the mappings rather quickly) to a database which might not exist yet. I find using that kind of system rather odd. The thing is that O/R mapping is about an abstract entity definition which is realized in both a class definition and a table/view definition, in such a way that a mapping can be defined between the two definitions (class and table), so instances of the abstract entity definition (the data!) can flow between instances of the two definitions: from a table row to an entity class instance and back. The work needed to perform that flow of entity instances is done by an O/R mapper.

Starting with code in the form of entity classes is as odd as starting with a table: both require reverse engineering to the abstract entity definition to create the element 'on the other side'. Reverse engineering the class to the abstract entity definition to create a table and the mappings is equal to reverse engineering a table to a class and creating the mappings. The core issue is that if you start with a class or a table, you start with the end result of a projection of an abstract entity definition: the class didn't fall out of the sky, it was created after one determined the domain had such a type: e.g. 'Customer', with a given set of fields: Id, CompanyName, an address etc.

What if that abstract entity definition which was used to create the class or table was in a model which contained all of the domain types for the domain of the software being built? That would avoid a reverse engineering step to get 'the other side' (class if you start from a table, table if you start from a class), and at the same time it would give a couple of benefits: you can create overviews of the model and, more importantly, changes in the domain can be applied directly to the model, which then ripple through to classes and tables in the right form, without you needing to worry about how that should be done correctly, and without reverse engineering steps which might ignore information which was present in the actual model.
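To make the idea a bit more concrete, here is a minimal sketch in Python of one abstract entity definition being projected to both sides. Everything in it is an assumption made for illustration: the names `FieldDef`, `EntityDef`, `to_class` and `to_ddl`, the two abstract field types, and the type mappings are hypothetical and not taken from any particular O/R mapper or modeling tool.

```python
from dataclasses import dataclass, make_dataclass

# Hypothetical abstract model: it knows nothing about classes or tables.
@dataclass(frozen=True)
class FieldDef:
    name: str
    kind: str                     # abstract type, here only "int" or "string"
    is_identifying: bool = False  # part of the entity's identity

@dataclass(frozen=True)
class EntityDef:
    name: str
    fields: tuple

# Assumed projection rules, one set per 'side'.
CLASS_TYPES = {"int": int, "string": str}
SQL_TYPES = {"int": "INT", "string": "NVARCHAR(100)"}

def to_class(entity):
    """Project the abstract entity definition to an entity class."""
    return make_dataclass(
        entity.name, [(f.name, CLASS_TYPES[f.kind]) for f in entity.fields]
    )

def to_ddl(entity):
    """Project the same abstract entity definition to a table definition."""
    columns = [f"{f.name} {SQL_TYPES[f.kind]} NOT NULL" for f in entity.fields]
    key = ", ".join(f.name for f in entity.fields if f.is_identifying)
    columns.append(f"PRIMARY KEY ({key})")
    return f"CREATE TABLE {entity.name} (" + ", ".join(columns) + ")"

customer_def = EntityDef(
    "Customer",
    (FieldDef("Id", "int", True), FieldDef("CompanyName", "string")),
)

Customer = to_class(customer_def)        # the class side
customer_ddl = to_ddl(customer_def)      # the table side
print(customer_ddl)
```

Adding a field to `customer_def` changes both the generated class and the generated table definition, which is the 'ripple through' effect described above; neither side has to be reverse engineered from the other.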

I know the whole idea of 'code first' comes from the fact that developers want to write code and think in code and want to persist objects to the database, but that's not what happens: you don't persist objects to a database, you persist their contents, which are the entity instances. It might very well be that an entity class instance (so an object) contains more data than the entity instance, so storing 'the object' then doesn't cover it. Serializing an object is a good metaphor here: with serialization, it isn't the object that is serialized but a subset of its data, to a form which might not match its source. When deserializing the data into e.g. JavaScript objects, are we then still talking about the original .NET object? No, of course not, it's about the data inside the object which lives on elsewhere.
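The difference between the object and the entity instance it contains can be sketched as follows. This is an illustration under assumptions, not how any specific O/R mapper works: the `Customer` class, its `MAPPED_FIELDS` tuple and the `entity_instance` method are hypothetical names.

```python
import json

class Customer:
    # Assumed mapping information: only these fields belong to the entity.
    MAPPED_FIELDS = ("id", "company_name")

    def __init__(self, id, company_name):
        self.id = id
        self.company_name = company_name
        # Object-only state: useful in memory, not part of the entity instance.
        self.is_dirty = False
        self.validation_errors = []

    def entity_instance(self):
        """Return the data that would flow to a table row (or to JSON)."""
        return {name: getattr(self, name) for name in self.MAPPED_FIELDS}

customer = Customer(1, "Acme")
row = customer.entity_instance()   # what gets persisted
as_json = json.dumps(row)          # what gets serialized: the same data
print(as_json)
```

Whether the target is a table row or a JSON document, what leaves the object is the same subset of its data; `is_dirty` and `validation_errors` stay behind.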

Isn't it then rather odd that when serializing 'objects' to JSON, the overall consensus is that the data is serialized, but when the same object is serialized to a table row, it's actually persisted as a whole? If you are still convinced O/R mapping is about persisting objects, what happens with 'your object', persisted to a table row, if that row is read by a different application, which targets the same database, and which doesn't use an O/R mapper? That application, written in an entirely different language even, can read and consume the entity instance stored in the table row perfectly fine, without even knowing you considered it a persisted .NET object. Because, surprise, the contents of the table row aren't a persisted object, they're a persisted entity instance: an instance of an abstract entity definition, not an instance of a class definition.

But let's say you simply want to work with code only, you don't want to look at models other than when they wear swim gear. Code, being text, has some side effects which might not make it the best medium to define domain models in. Code is easy to read and change, but to get an overview of what's going on, a text editor isn't sufficient anymore: tooling is needed to get a proper overview of what the model looks like, what the associations are, and which inheritance relationships are present. One can't do that by simply looking at the code; the code has to be interpreted, or better: reverse engineered to a model, to be able to provide the information you're looking for. With a small model of 10 entities in the form of classes this might work, but if you have to work with a more real-life model with 50 or 100 or even more entities, it's not going to be easy at all.

In case you have a hard time grasping how little one can determine from code alone, try to determine what a database looks like when all you have is 50 tables in DDL SQL statements, complete with their unique constraints and foreign key constraints. It's no surprise that database developers realized years ago that without proper tooling, working with larger relational schemas would be a big nightmare. Strange that developers using code-first don't have that problem at all. At least they don't admit it, otherwise they wouldn't be using code-first at all.

Though let's ignore that for a second. Let's say you as a code-first-I-persist-objects kind of developer have the overview of the model in your code by simply looking at a couple of classes. Are you then problem free? Not really. You see, code-first hides the other side of the picture, or better: a construct in code-first might have devastating results for the other side. E.g. some people have the urge to create a common base class / entity for all their entities in which they define fields like 'ModifiedBy', 'CreatedOn' etc. so all entities will have these fields, and they only have to define them once.

However, there's a problem: inheritance in memory with objects is cheap, but it's expensive in a relational database, as with every query joins will be added for all super/subtypes if the inheritance is in the form of Target per entity (every type in the hierarchy is mapped to its own table). Is this visible in the code? Is this visible elsewhere? Likely not. Code-first systems often provide a way to use shortcuts: to define repetitive constructs once through conventions. This might hide the effects on the relational schema, as it's unclear what the tables might look like and how they'll result in queries at runtime, simply because the way code is constructed for in-memory use is used as the preferred structure for the relational model as well. Starting from tables of course also hides the other side of the picture, however choices in the DB don't have these kinds of effects in code, most of the time.
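A rough sketch of that cost, in Python: given a class hierarchy where every type has its own table, fetching one entity type needs a join per supertype. The function `select_for`, the `parents` dictionary and the entity names are hypothetical, and the SQL is a simplification; a real O/R mapper emits explicit column lists and, for polymorphic fetches, joins to subtypes as well.

```python
def select_for(entity, parents, key="Id"):
    """Build a simplified SELECT for an entity whose supertypes each
    live in their own table (one join per level of inheritance)."""
    chain = [entity]
    while chain[-1] in parents:          # walk up to the root of the hierarchy
        chain.append(parents[chain[-1]])
    sql = f"SELECT * FROM {chain[0]}"
    for child, parent in zip(chain, chain[1:]):
        sql += f" INNER JOIN {parent} ON {child}.{key} = {parent}.{key}"
    return sql

# Hypothetical hierarchy: a common base entity holding ModifiedBy, CreatedOn etc.
parents = {
    "Customer": "AuditedEntity",
    "Order": "AuditedEntity",
    "GoldCustomer": "Customer",
}

customer_sql = select_for("Customer", parents)
gold_sql = select_for("GoldCustomer", parents)
print(customer_sql)
print(gold_sql)
```

In the class definitions the shared base class costs one line per entity; in this sketch it costs an extra join in every query on every entity, and nothing in the code makes that visible.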

A better choice is to define the abstract entity model first and use model first with this model to produce both sides, so changes determined in the domain will be reflected as changes in the abstract model and through that as changes to the classes and tables. As it's a model, it can be used as such, so it's easier to make changes, to get overviews from various perspectives and to make sure the changes made to the model are reflected correctly in the classes and tables. It's almost 2014; doing arcane work like in the early days of relational databases and DDL SQL scripts is not necessary anymore. Unless you fancy typing everything out, like a human code generator. You know, the kind of people who are replaceable by a tool.

23 Comments

I love you <3

I did push the POCO madness once upon a time with NPersist but I know better now :)

Roger Alsing - Saturday, December 14, 2013 11:31:07 AM

@Roger: hehe :) It's still a shame NPersist didn't take off like NHibernate did, it had some nice ideas.

FransBouma - Saturday, December 14, 2013 11:43:55 AM

It depends....

For any approach chosen a developer or architect should know what he/she is doing.

"Let's say you as a code-first-I-persist-objects kind of developer have the overview of the model in your code by simply looking at a couple of classes. Are you then problem free?"

Are you problem free using model-first? I don't think so.

A developer should know the OO model AND the relational model so that they are both defined well and performing well.

Even with NoSQL there can be a difference between the model in the DB and the model in code.

BTW, for model-first you need a special tool and these are only coming to mainstream usage in recent years. DB guys had those years before, even if they were quite expensive.

Petar Repac - Saturday, December 14, 2013 12:47:24 PM

A separate application written in another language doesn't share the model with my C# EF app. It just shares the DB.

Adding a model in the middle complicates programs unnecessarily. If EF defines an intermediate data structure to map classes to tables, it's an implementation detail.

I'd rather use Find All References and the other IntelliSense tools I already learned for managing non-DB code to understand a program than learn another set of tools. I'd rather just learn about DateTime and DateTimeOffset and the SQL Server time types than also add a model's types.

I agree with you that POCOs don't buy you much. I wouldn't mind having to extend an EF class. I declare all my foreign key properties and association classes.

Bruno Martinez - Saturday, December 14, 2013 5:23:57 PM

Could you illustrate your point a little more clearly?

Richard - Sunday, December 15, 2013 12:30:04 AM

I perfectly understand that point of view and from certain points it's really interesting but I really can not agree...

Code-first was just a mandatory building block for getting Entity Framework accepted by most developers who want to write clean code. See it as a mandatory abstraction layer, removing dependencies. Your domain lives in a certain application context and in the end it will be coded somewhere. The SQL database is after all a small technical detail. It can be SQL Server today for certain elements of the context and flat files for other parts, and all these dependencies introduced by Entity Framework need to be cut.
You can think that it's much easier to remove POCOs and manage everything by design in a model-oriented thing, but it won't be.

Rui - Sunday, December 15, 2013 2:42:40 PM

I think you vastly over complicate the issue. I don't think you totally appreciate the motivation for code first either. I use it not because I'm concerned about "staying in code," I use it because it's way easier to start unit testing and think through app logic without worrying about the persistence layer.

Jeff - Sunday, December 15, 2013 4:54:38 PM

The domain model grows organically through BDD or TDD, this can't happen with a model first mentality. That is the equivalent to big up front design. There's a reason we don't sit down and design the database for the whole system first any more... because we've realized we never got it right... so, we either ended up with the wrong database and had to wedge the solution into it, or we had to modify the database anyway.

The other problem with a model first approach is that you usually use a tool. The tool persists its metadata in XML, or a binary format that makes sharing with a team very difficult. It's also generally very difficult to merge changes from feature branches and/or accept patches.

No, I'd much rather use a tool that visualizes my domain model that is stored in plain text code files. It has worked very well for us, and I don't want to go back to designer based, metadata persisted modeling tools.

BOb

PilotBob - Sunday, December 15, 2013 7:20:27 PM

@PilotBob: model first also allows you to grow your db and classes along the way, there's no necessity to do everything up front, on the contrary: because both sides are from the same origin, making changes along the way is easier: you can change the root model and let the changes ripple through to classes and tables.

> The other problem with a model first approach is that you usually use a tool. The tool persists its metadata in XML, or a binary format that makes sharing with a team very difficult. It's also generally very difficult to merge changes from feature branches and/or accept patches.

XML is pretty easy to merge, though. But that aside, code first doesn't free you from any migration issues: changes made in code which have an effect on the DB have to result in migration scripts for the DB as well, otherwise one can't maintain a production database.

FransBouma - Sunday, December 15, 2013 7:31:01 PM

i agree with Jeff - i think you're overly simplifying the value in the code-first workflow. i don't think of the objects as "stored in the database". even if i did or wanted to, the tooling would never allow me to ignore the database entirely. i think in code and objects because that is the biggest value for my customer out of the gate and the most efficient way to vet out the macro requirements of the system that customer is asking for. it's easier in my experience to think in the form of objects and then force my thought process to think in terms of the database on the occasions when that is necessary. and any time i've ever been presented w/ a visual "model" of something, it's only a matter of minutes before my brain craves the detail only code can provide.

i think this is a pretty inaccurate summation of the minds of developers.

Kelly Brownsberger - Monday, December 16, 2013 10:45:09 AM

@code first practitioners: it seems you are very good at understanding large(r) piles of code and forming an idea of them in your mind that you can reason over, as that's what's necessary if you want to get an understanding and overview of, say, 30, 40 or more entity classes and their associations, over which fields these associations are defined etc. etc.

Sorry, but I don't buy that for a second. I can understand code first feels more 'at home' as code is what you write all day, however it falls down on the idea that code is the starting point of what you create. It's not, it's the end result. The end result of a process, which will produce a different result when the input of that process is changed, e.g. due to a changed understanding of the domain or changes in the domain itself, or changed requirements or other changes.

The thing with code then is that YOU, the human, will make the changes in the code based on a raw understanding of what the code represents at that moment. Not based on what's in the editor. This will only go well if the human is consistent in applying the rules of projecting a model (which then lives in the memory of the human) to code, and if applying those rules always goes without errors.

Two things humans aren't very good at. A third is understanding code when reading it, as that requires interpreting it and then building a representation of that interpretation in your mind. The sad thing is that that representation is inside someone's head and can differ from person to person.

FransBouma - Monday, December 16, 2013 11:18:35 AM

Step 1. Use an ORM
Step 2. realize you do not have a relational model in your application
Step 3. Use a database that correctly handles non-relational models and ditch the ORM.

Chris Marisic - Monday, December 16, 2013 8:20:38 PM

@Chris

Nice attempt to be funny, I'll give you that :) Though, modern ORM systems can take care of step 3 for you without you lifting a finger.

FransBouma - Monday, December 16, 2013 8:47:22 PM

This argument appears to be predicated on the belief that the most efficient way to visualize and manipulate an entity model is with a design canvas, and the most efficient way to understand an entity model is with a picture of boxes and lines, and the most efficient way to store an entity model is with a proprietary file format. There are various ancillary and supporting arguments, but that's the central one here.

This viewpoint is to be expected, given how the author pays his bills, but I respectfully disagree.

Look, no matter how the entity model is stored, whether it's in C# code, SQL DDL, or some proprietary file format, there's going to be friction between different representations of it for different layers/systems. And none of these descriptions is going to be 100% accurate; they'll all be approximations of the reality of the business domain, even the fancy wireframe.

And given the choice between depending on a per-seat-licensed tool that stores a model in a proprietary file format and simply using such a well-understood, universally open format such as C#, I feel like the choice is pretty clear.

But tooling! You say. But individual interpretation! But getting an overall view! How can we understand this complex system without a big poster to look at!

And I don't buy THAT for a second. Any developer who, after a couple of months of working on a system, doesn't have the entire domain model in her head to mentally traverse at will is not worth her salary. Maybe the visual model might have helped her in the early days, but she doesn't need it and will never refer to it again after that.

And in the meantime she is going to curse whoever made the decision that she has to go to a visual tool with a design canvas and property windows just to add a boolean to a business class, every time it happens.

Ron - Monday, December 16, 2013 9:39:45 PM

@Ron

ah, the good old "he's biased, his posts are just to get more sales going!" argument... how professional of you, 'Ron without a surname, so I'm actually rather anonymous'. Perhaps I *know* a few things about the topic because I spent every workday FULL TIME on this subject for the past 12 years. Have you? I bet not.

Please show me where I said the alternative is a visual model where everything is drawn up with boxes and lines... If that's the only way you can envision a 'model', then indeed I can understand why the alternative for you is code. But that's a rather limited view of the world. A model isn't a visual poster which is hard to grasp. A model is formulated in a form so projections following rules (yes, you do these now by hand from a model in your head using rules you haven't formulated explicitly so they differ per day and per person) can be made to forms which are useable, like projections to code, to tables and other spaces, oh, like your visual picture, if you fancy visual boxes on a canvas.

If I may, as I'm biased to bits according to 'Ron-without-a-surname' anyway, the work I spent the last 12 years of my life on, doesn't even come with a general canvas, it comes with viewers on a model, and projections in any form you'd like. I didn't add a canvas as that's not a good way to define a model. It's 'a' way to project 'a' model to so you could get 'a' view of it in 'a' context, but not 'the' way, as a model isn't a picture. The picture is just a projection as a class created from the model is a projection.

And 'Ron-without-a-surname', you really think humans can keep large code bases in their heads over time, and that otherwise they're not worth their salaries? Interesting. A lot of bugs in code are directly related to the inability of humans to get a _correct_ model in their heads of what the code _really_ does and represents in _all_ cases. After all, why would one need tests, if a human could simply read the code like a book and know whether it's correct and does what it should do in all cases?

And added to that, a human who makes changes to that model does so consistently, the same way every time, every day, now and for as long as the code is maintained (which is often years), with the same decision making process as all the other humans working on the same code base. Right?

I don't think so, 'Ron'.

FransBouma - Monday, December 16, 2013 10:28:22 PM

Frans, please give a concrete example of an entity model capturing a requirement better than code classes. I do believe that code first definitions aren't standard classes. You do need to keep in mind that properties will become columns. At the same time, normal rules, such as keeping coupling low can be ignored.

Bruno Martinez - Tuesday, December 17, 2013 2:14:41 AM

@Bruno

A 1:1 B

Is that: a) pk - pk or b) pk - fk/uc ? It is significant to realize that.

A m:n B

is the relationship objectified or not? (Objectified means the intermediate entity is visible, and has non-pk fields, something which EF doesn't support btw)

But that aside: classes in code first are often not in an anemic domain model, meaning they have more properties than there are columns, or don't have properties at all (the private member variables, or at least some of them, are mapped). They might use a base class which isn't mapped, though provides logic for the entity class. I.o.w.: it quickly becomes less clear what exactly is influencing 'the other side': one has to look at the code _and_ the mapping files and combine them with the knowledge of how the ORM will forward map them to the DB.

That is, if one cares about what the DB looks like. But then I say: why use an ORM with a relational database at all?

FransBouma - Tuesday, December 17, 2013 8:02:09 AM

@FransBouma,

You have a horse in this race, and of course that will bias your opinions in this discussion, and it would be dishonest for you to pretend otherwise. I'm not saying that this invalidates your opinions, but it also needs to be taken into account when weighing them.

The core question in this conversation is: what's the best format to describe one's model in? From which all other model projections should flow? Reasonable people can clearly disagree about this. My opinion, for what it's worth (and no, I haven't been in the ORM space for the last 12 years, but I've been using ORMs for longer) is that the best, most flexible, most future-proof, most understandable, most easily edited and manipulated, and most portable format for doing this is plain old objects.

Everything we have talked about--for example, visual modeling or creation of DDL--there are tools for doing that from code. It's trivial to edit and manipulate, it's trivial to merge/branch/diff/collaborate on. It's a lingua franca that everyone who works in a problem space has an intimate, basic understanding of. I don't have to teach someone I hire in, for example, the C# space, how to interpret a code-based data model or how to use the tools that manipulate it. They understand it automatically. The same cannot be said for some third party modeling tool. It's just a great, low ceremony, convention-based approach that starts minimal and can be expanded out to great complexity and subtlety if necessary.

You can of course achieve success with other approaches, and there's a right tool for every job, but my opinion is that the default starting place should always be with code.

Ron - Tuesday, December 17, 2013 5:47:40 PM

A reply I wrote on InfoQ's post about this article:

"I argued in my article that the model you should start with is the model that's actually also the source of the model used to write the classes. However instead of keeping that in the memory of the developers, one should actively model that abstract model and use that model to create the classes and tables as those are actually derivatives (projections) from that abstract model.

Example: Customer, Order, OrderLine, Product. I can define these abstract entities with their fields, their identifying fields, their relationships and not write a single line of code nor table definition. Then I can use that model to create the classes using rules defined for that. Examples of these rules are the ones defined by Halpin and Nijssen in their NIAM rules for translating an abstract entity model in NIAM to table definitions.

I can also define rules like that (e.g. use the same or slightly altered ones) to translate the model to classes. The advantage is that I now have a proper, verifiable model which is the theoretical base for both sides. One of the main advantages of this is that I can make changes in one place, which are again verifiable, and let these changes ripple through to both sides, following the rules defined for these projections.

That's not where it stops though. I can create projections of that model to other models (which thus are actually defined by rules again, so I can get the changes made to the core model applied to my sub models without any effort on my part) and create code from these models as well. As everything is related to each other and originates from the core abstract entity model, I have a single place where I have to model the domain, in such a way that it isn't polluted with code constructs, language limitations or other code related aspects: it's a pure model of the domain.

Starting from code doesn't have that. To be able to do anything with the code, the ORM has to reverse engineer the code to that abstract entity model first. However, there's a difference: you can't reach the source from which the code model actually originates, so you're doomed to make changes in the result of a projection, not the source of it. It's like changing your C# code by altering the compiled form through altering IL, instead of changing the C#."

FransBouma - Thursday, December 19, 2013 9:55:39 AM

Every time I used some kind of designer that writes/generates code for me in the background, I had problems with it. Especially with ORMs: I don't even consider using an ORM through a designer, not any more. For EF, I haven't used the designer since version 4.1. Conventions and POCO, everything else sucks. Also, it's not worth fighting domain model vs relational model, you can't bend an ORM that much without having major issues somewhere else. For really complex domains, maybe it's just better to use event sourcing or a document store. It's a little bit harder to set things up upfront, and there isn't much guidance or many patterns out there, but later on it's much easier to work with.

Hrvoje - Thursday, December 19, 2013 11:05:33 PM

Code first is an approach for those who take no dependency on where a given piece of data is parked. If you want to code in such a way that your data store can be anywhere, on any underlying database technology, you will like the code first approach.

Most developers do not use an ORM with this distinction in mind; they only think of it as a mapper for the data tables in their database.

You are missing the big picture of a good ORM, and you are thinking too small.

Tony H - Monday, December 23, 2013 6:54:26 PM

That the / an abstract model of the business domain (with personal variations) is ALREADY in the minds of the members of the business community who are future users of the software being developed is too many times lost on the code-first / TDD developers.
Matching the eventually resulting code to the already-existing business model via "software that works" when it eventually passes the (developer-written) tests IS a high-friction procedure FOR THE USER COMMUNITY, no matter how frictionless it might be for developers.
It is not code that makes money for that business, it is the ideas behind the code that runs the business.
Ideas, when they have to be materialized outside of human brains, live more comfortably in models that can be easily verbalized - because human (not computer) language is what (most) humans use to understand each other.
This is where (I believe) Frans comes from, with the references to NIAM.
Oh well, all this never caught on well with coders...

OTOH, it is right to acknowledge that all/most of the model-first tools have been too clunky and too inflexible in the end for the individual developer and for development teams. Model-driven development is not yet productive enough.

Best to all!

Gabriel Tanase - Friday, December 27, 2013 4:07:46 PM

Oh those lovely days! When men were really men and built their own OS and everybody could read hexadecimal...
Projects of 25 people full of juniors. A single domain defined by two seniors... I prefer a tool doing the plumbing and having my people code the bells & whistles (I mean, the business logic). Give me tools that just work fine. We DO have problems updating the domain, but we have the ORM taking care of the DB and just take care of the front-end and logic part (that we must code anyway).
A REALLY happy LLBLGen customer.

Raist - Sunday, January 5, 2014 11:28:50 PM
