Entity: why do some people who write IT books re-invent definitions?

Saturday, May 31, 2003

Database - SQL Server Software Engineering

Paul Gielens blogged about a possible misunderstanding about the term 'Entity'. Reading his text, it appears as if the general term 'Entity' has changed recently. The reason: Eric Evans created a different definition. Let me be blunt here: if a concept in a general body of knowledge has been known for years by a given term, do not reuse that term for an extended definition, because it will cause misinterpretations between people who think they are talking about the same thing. The term here is 'Entity', and it has been defined for a long time, first by Peter Chen if I'm correctly informed, in his article 'The Entity-Relationship Model', ACM Transactions on Database Systems vol. 1 no. 1 (March 1976), and his book 'The Entity-Relationship Approach to Logical Database Design' - Wellesley, Mass.: Q.E.D. Information Sciences, 1977.

Chen's work is about a model for designing databases, the Entity-Relationship model, in short the E/R model. As you can see, this model is rather old, more than 25 years, and was later replaced by the NIAM modelling methodology of prof. G.M. Nijssen and prof. T.A. Halpin (Conceptual Schema and Relational Database Design, 1989), which was later renamed to ORM and extended by prof. T.A. Halpin. (read more about ORM here)

The cornerstone of Chen's work, and also of the work of Halpin and Nijssen, is the 'Object type' or 'Entity'. Edward Yourdon describes the usage of Entity Relationship diagrams in his work Modern Structured Analysis (Prentice Hall, 1989). While, as prof. Halpin describes in his overview of ORM, E/R models/diagrams lack information about context and constraints, the concept of Entity has not changed. Yourdon describes the concept of Entity as follows:

An entity in an E/R model has three properties:

Each representation of an entity can be uniquely identified.
Each representation of an entity plays an important role in the system it lives in (it has to have a reason to be there).
Each representation of an entity can be described by one or more attributes (data elements, like name, age, quantity).
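
As a minimal sketch of these three properties, assuming a hypothetical Customer entity (the class and field names are illustrative only and do not come from Yourdon's text), an entity can be pictured as a record with an identifying attribute and some descriptive attributes, and nothing else:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical entity: an identifier plus descriptive attributes, no logic."""
    customer_id: int  # uniquely identifies each representation of the entity
    name: str         # descriptive attribute
    age: int          # descriptive attribute


# Representations keyed by their identifier, much as rows are keyed in a table.
customers = {
    c.customer_id: c
    for c in [Customer(1, "Alice", 34), Customer(2, "Bob", 34)]
}
```

Two customers may share every descriptive attribute value (here, the same age) and still be distinct, because the identifier alone tells them apart.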

There is no mention of logic in this definition, nor, for example, in the context of ORM/NIAM. The latter is only logical, since an ORM or NIAM model can be transformed into an E/R model, as Visio for Visual Studio .NET Enterprise Architects shows (for the people who are not familiar with NIAM or ORM).

Using this generic definition has great benefits. In a team that has to implement a given set of functionality, team members can rely on the fact that when the database-oriented team members talk about entities in their ORM/NIAM models, they mean the same thing as a middle-tier programmer on the team who talks about an entity. The definition is clear, well known and usable by a large group of developers and software architects, even project leads.

Paul writes: In the OOP world entities are considered an abstract continuity through a lifecycle and even multiple forms. Objects not primarily defined by their attributes, but a thread of identity that runs through time and often across distinct representations. Entities have special model and design considerations. They can radically change their form and content, while the thread of continuity must be maintained. For entities to establish their relation with other object they encapsulate operations. On the other hand object with no conceptual identity, objects describing some characteristics of a thing are value objects. Within the context of value objects, instantiation isn’t a big deal (except for distributed systems, fine grained objects drown performance).
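
To make the quoted distinction concrete, here is a rough sketch of how I read it, not code from Paul's or Evans' text. Person, Address and move_to are hypothetical names chosen for illustration: the value object is compared by its attributes, while the entity is compared by a thread of identity and carries its own operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Value object: no identity of its own, compared by its attributes."""
    street: str
    city: str


class Person:
    """Entity in the quoted sense: compared by identity, encapsulates operations."""

    def __init__(self, person_id: int, name: str, address: Address) -> None:
        self.person_id = person_id
        self.name = name
        self.address = address

    def __eq__(self, other: object) -> bool:
        # Only the identifier decides equality, not the current content.
        return isinstance(other, Person) and self.person_id == other.person_id

    def __hash__(self) -> int:
        return hash(self.person_id)

    def move_to(self, new_address: Address) -> None:
        """An encapsulated operation: the content changes, the identity does not."""
        self.address = new_address


before = Person(7, "Ann", Address("1 Main St", "Utrecht"))
after = Person(7, "Ann", Address("1 Main St", "Utrecht"))
after.move_to(Address("9 High St", "Delft"))
```

Here before and after compare as equal although their addresses differ, which is exactly the point where this reading departs from an entity that consists of an identifier and attributes only.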

Now, the concept of 'Entity' described by Paul here (referring to Eric Evans) is not the same as the entity definition known for over 25 years. This leads to serious misinterpretations among software developers and software architects, who run the risk of talking about different things without realising it. I really do not see why Fowler re-defined 'Entity' in such a way that his definition is not the same as the old definition used by Chen and others, and is therefore not applicable to the older E/R model concept, which is still used by Visio 2003 to generate a database DDL script from an ORM model.

It's also unnecessary, since the concept of 'entity' as it is defined by Chen and others is perfectly usable in an OO world: you add logic to the entity, but the logic isn't part of the entity, because the entity is saved in the persistent storage and the logic isn't. To see this, consider a database which is used by multiple applications: an internal accounting system, a web application and, for example, a web service exposed on an extranet. The entities are always the same in all the applications, because they all use the definitions of the persistent storage. The reason for this is that the database is designed from an E/R model or ORM model, and therefore is the physical representation of an abstract model which defines the entities and the relations between these entities.
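
A small sketch of what I mean, under the assumption of a hypothetical Order entity and two hypothetical applications (none of these names refer to a real system): the entity holds only what is stored, and each application adds its own logic around it.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical entity as stored: an identifier plus attributes only."""
    order_id: int
    quantity: int
    unit_price_cents: int


def accounting_total_cents(order: Order) -> int:
    """Logic owned by a hypothetical accounting application."""
    return order.quantity * order.unit_price_cents


def web_summary(order: Order) -> str:
    """Logic owned by a hypothetical web application, using the same entity."""
    return f"Order {order.order_id}: {order.quantity} item(s)"


# The same entity definition serves both applications unchanged.
order = Order(order_id=42, quantity=3, unit_price_cents=1999)
```

Both functions can change, or be thrown away with their application, without touching the definition of Order.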

Software architects designing the applications that work with that database model can design the software using the same entities, knowing that the entities they use are correct and available in the persistent storage. At any given time, a developer can grab the abstract database model and see how the entities relate to each other, what their context is, which constraints are applied, etc. It's this power that makes the concept of the entity, applicable both in DDL and, for example, in an OO model, a valuable tool for designing and writing good, solid, working software. A definition which is only valid in the OO world can't offer this, simply because it isn't applicable to the persistent storage and has no abstract model, like an ORM model, behind it.

It would be nice if writers of IT books didn't re-invent definitions that have been known by millions of developers and software architects worldwide for many years. There is already enough confusion in our industry; we do not need more.

(updated: I interpreted Paul's text as a reference to Fowler, which is a reference to Evans.)

I don't see why people have to be focussed on a difference. Of course they are different: one is defined in the persistent storage as an entity and the other one isn't. Value objects are views, collections of attributes which do not have a semantic meaning; entities do. The definition is over 25 years old, but in the definition you quoted, an entity embeds logic: they encapsulate operations. That is wrong: the logic is embedded in code outside the entity, in the code that uses the entity.

Frans Bouma - Saturday, May 31, 2003 6:15:00 AM

I have never seen the term "entity" used in object-oriented programming and design; I don't think it is a very popular term. The normal concept of entity isn't useful in OO programming/design, as it is replaced by the very similar notion of Class. The big difference is that while representations of an entity *may* have identity (the definition isn't very clear on this), the representations of classes (objects) *must* have identity. The closest thing to an entity in the OO world is either a class (and it can be argued that they are the same thing, depending on how you read Chen) or what UML calls a data class.

If you regard Chen's entity in its original content, namely as an innovation over the normal relational model which is quite agnostic on how to express relationships, I do believe it makes sense to say that an entity is something of which the representations have and preserve identity, although Chen didn't use this in his definition.

Reinier Post - Wednesday, July 14, 2004 2:19:00 PM
