Entity: why do some people who write IT books re-invent definitions?
Paul Gielens
blogged about a possible misunderstanding about the term 'Entity'.
Reading his text it appears as if the general term
'Entity' has changed recently. The reason:
Eric Evans created a
different definition. Let me be blunt here: if a concept
has been known for years by a given term, do not reuse that
term for an extended definition, because that causes
misinterpretations between people who think they are talking
about the same thing. The term here is 'Entity', and it has
been defined for a long time, first by Peter Chen, if I'm
correctly informed, in his article 'The Entity-Relationship
Model', ACM Transactions on Database Systems vol. 1 no. 1
(March 1976), and his book 'The Entity-Relationship Approach
to Logical Database Design' - Wellesley, Mass.: Q.E.D.
Information Sciences, 1977.
Chen's work is about a model for designing databases, the
Entity-Relationship model, or E/R model for short. As you
can see, this model is rather old, more than 25 years, and
was later replaced by the NIAM modelling methodology of
prof. G.M. Nijssen and prof. T.A. Halpin (Conceptual Schema
and Relational Database Design, 1989), which was later
renamed to ORM and extended by prof. T.A. Halpin. (read more
about ORM here)
The cornerstone of Chen's work, and also of the work of Halpin and Nijssen, is the 'Object type' or 'Entity'. Edward Yourdon describes the usage of Entity Relationship diagrams in his work Modern Structured Analysis (Prentice Hall, 1989). While, as prof. Halpin describes in his overview of ORM, E/R models/diagrams lack information about context and constraints, the concept of Entity has not changed. Yourdon describes the concept of Entity as follows:
An entity in an E/R model has three properties:
- Each representation of an entity can be uniquely identified.
- Each representation of an entity plays an important role in the system it lives in (it has to have a reason to be there).
- Each representation of an entity can be described by one or more attributes (data elements, like name, age, quantity).
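To make these properties concrete, here is a minimal Python sketch. The `Customer` class and its fields are hypothetical, chosen only for illustration, and treating the identifier as the sole basis for equality is my reading of the first property, not something Yourdon prescribes. Note that the class carries data only, no logic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Customer:
    """Hypothetical entity: an identifier plus descriptive attributes, no logic."""
    customer_id: int                  # unique identification
    name: str = field(compare=False)  # descriptive attribute, ignored for equality
    age: int = field(compare=False)   # descriptive attribute, ignored for equality


a = Customer(customer_id=1, name="Ann", age=30)
b = Customer(customer_id=1, name="Ann Smith", age=31)  # same id, changed attributes
c = Customer(customer_id=2, name="Ann", age=30)        # different id, same attributes

print(a == b)  # True: same identifier, so the same entity instance
print(a == c)  # False: equal attributes do not make two instances the same
```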
There is no mention of logic in this definition, nor, for
example, in the context of ORM/NIAM. The latter makes sense,
since an ORM or NIAM model can be transformed to an E/R
model, as Visio for Visual Studio .NET Enterprise Architects
shows (for the people who are not familiar with NIAM or ORM).
Using this generic definition has great
benefits. In a team, which has to implement a given set of
functionality, team members can rely on the fact that when
the database-oriented team members are talking about
entities in their ORM/NIAM models, they talk about the same
thing as when a middle-tier programmer of the team is
talking about an entity. The definition is clear, well-known
and usable by a large group of developers and software
architects, even project leads.
Paul writes: In the OOP world entities are considered an abstract continuity through a lifecycle and even multiple forms. Objects not primarily defined by their attributes, but a thread of identity that runs through time and often across distinct representations. Entities have special model and design considerations. They can radically change their form and content, while the thread of continuity must be maintained. For entities to establish their relation with other object they encapsulate operations. On the other hand object with no conceptual identity, objects describing some characteristics of a thing are value objects. Within the context of value objects, instantiation isn’t a big deal (except for distributed systems, fine grained objects drown performance).
Now, the concept of 'Entity' described by Paul here
(referring to Eric Evans) is not the same as the entity
definition known for over 25 years. This leads to serious
misinterpretations among software developers and software
architects, who run the risk of talking about different
things without realising it. I really do not see why Evans
re-defined 'Entity' in such a way that his definition is
not the same as the old definition used by Chen and others
and therefore is not applicable to the older E/R model
concept, which is still used by Visio 2003 to generate a
database DDL script from an ORM model.
It's also unnecessary, since the concept of 'entity' as it
is defined by Chen and others is perfectly usable in an OO
world: you add logic to the entity, but the logic isn't
part of the entity, since the logic isn't saved in the
persistent storage, whereas the entity is. To see this,
consider a database which is used by multiple applications:
an internal accounting system, a web application and, for
example, a web service exposed on an extranet. The entities
are always the same in all the applications, because they
use the definitions of the persistent storage. The reason
for this is that the database is designed from an E/R model
or ORM model, and therefore is the physical representation
of an abstract model which defines the entities and the
relations between these entities.
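A small Python sketch of that idea, under my own assumptions: the `Order` entity and the two functions are hypothetical names, standing in for the accounting system and the web application. The entity only mirrors what is stored; each application adds its own logic next to it.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical entity mirroring a row in the shared database."""
    order_id: int
    quantity: int
    unit_price: float


# Logic of a hypothetical accounting application; it is not stored with the entity.
def order_total(order: Order) -> float:
    return order.quantity * order.unit_price


# Logic of a hypothetical web application working on the very same entity.
def order_summary(order: Order) -> str:
    return f"Order {order.order_id}: {order.quantity} item(s)"


order = Order(order_id=7, quantity=3, unit_price=2.5)
print(order_total(order))    # 7.5
print(order_summary(order))  # Order 7: 3 item(s)
```

Both applications can change or drop their functions without touching the entity definition itself.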
Software architects designing the applications working with
that database model can design the software using the same
entities, and know that the entities they use are correct
and available in the persistent storage. At any given time,
a developer can grab the abstract database model and see how
the entities relate to each other, what their context is,
which constraints are applied, etc. It's this power that
makes a concept of the entity that is applicable both in
DDL and, for example, in an OO model a valuable tool to
design and write good, solid, working software. A definition
which is only valid in the OO world can't offer that, simply
because it's not applicable to the persistent storage and
there is no abstract model defined for it like an ORM model.
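As a rough illustration of one definition serving both the OO side and the DDL side, the sketch below derives a CREATE TABLE statement from a plain entity class. Everything here is an assumption for the sake of the example: the `Product` entity, the `create_table_sql` helper and the tiny type mapping are hypothetical, and this is in no way how Visio generates its DDL.

```python
import sqlite3
from dataclasses import dataclass, fields

# Assumed mapping from Python types to SQLite column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Product:
    """Hypothetical entity: identifier plus attributes, no logic."""
    product_id: int
    name: str
    price: float


def create_table_sql(entity, key):
    """Build a CREATE TABLE statement from the entity's attributes."""
    columns = []
    for f in fields(entity):
        column = f"{f.name} {SQL_TYPES[f.type]}"
        if f.name == key:
            column += " PRIMARY KEY"  # the identifying attribute
        columns.append(column)
    return f"CREATE TABLE {entity.__name__} ({', '.join(columns)})"


ddl = create_table_sql(Product, key="product_id")
print(ddl)
# CREATE TABLE Product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)

# The statement is valid SQL for an in-memory SQLite database.
sqlite3.connect(":memory:").execute(ddl)
```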
It would be nice if writers of IT books didn't re-invent
definitions that have been known by millions of developers
and software architects worldwide for many years. There is
already enough confusion in our industry; we do not need more.
(updated:
I interpreted Paul's text as a reference to Fowler, which is
a reference to Evans.)