Take three PhD specialists in programming languages, three PhD specialists in relational database, and three PhD specialists in network security, and seat them around a table. Now take three coconut-cream pies, three chocolate-cream pies, and three banana-cream pies and randomly place one pie in front of each philosopher. Finally, give them three hours to come up with a position paper on the concept of "Identity." Calculate how many philosophers at the end of three hours have three kinds of cream-pie on their faces, how many have two kinds, and how many have just one kind.
Ok, I thought it was funny, but then, I've been involved in a lot of design and standards committees. The point is, no single concept of "Identity" works for all three. In fact, no concept of Identity works for any pair of the three, and it's only barely possible to get a concept that works for any one of the three. Let's illustrate this just for programming languages and relational databases.
Consider the following data, stated in English:
Venice is in Italy.
Rome is in Italy.
Nogales is in Mexico.
Nogales is in the USA.
Hey, wait a minute. One city in two countries? Sure. Happens all the time. Trieste straddles Slovenia and Italy. How about Bratislava? It's in three countries: Slovakia, Hungary, and Austria. Sault Ste. Marie is in both Canada and the USA. And those are just the physically connected examples. The USA has at least one Rome – in New York – and at least one Venice – in California.
Ok, the point is that our Mondial database is going to have to deal with the fact that the Country-City relationship is really many-to-many, not one-to-many, and that means it's going to have to deal with Identity in a deeper way than we sketched out prior to this.
How would our three database specialists represent the data above? Experienced practitioners will normalize the data, that is, remove redundancies, resulting in three tables: one for the countries, one for the cities, and one, called a join table, middle table, or intersection table, for the relationship. The primary key for the Country table will be just the name of the country. The primary key for the City table will be just the name of the city. The middle table will contain pairs of foreign keys. One member of each pair will be a city name, and the other member of each pair will be a country name. The primary key of the middle table will be the pair of foreign keys. In pictures:
| Table Labels: | | CountryTable | | CityTable | | CountryCityTable |
| Column Labels: | | Country | | City | | Country | City |
| Data Rows: | | Italy | | Venice | | Italy | Venice |
| | | Mexico | | Rome | | Italy | Rome |
| | | USA | | Nogales | | Mexico | Nogales |
| | | | | | | USA | Nogales |
So we see that the only way to refer to a country or a city in the middle table is by a copy of its primary key. Therefore, in this normalized database,
Two rows are identical if and only if their primary keys have the same value.
Many database practitioners use underlining on the labels of columns containing primary keys; we'll use bold. Since every column contains a component of a primary key, all column labels are bold.
There is also an obvious non-normalized view form of the same data. Non-normalized basically means that this form contains mathematically redundant information, and this can cause ambiguity problems in the identity department. Typically, such a form would be temporary, not the permanent form, for constructing reports, building queries, analyzing by-products, and what not. It would look something like this:
| Table Labels: | | CountryFromCityTable | | CityFromCountryTable |
| Column Labels: | | City | Country | | Country | City |
| Data Rows: | | Venice | Italy | | Italy | Venice |
| | | Rome | Italy | | Italy | Rome |
| | | Nogales | Mexico | | Mexico | Nogales |
| | | Nogales | USA | | USA | Nogales |
What's the natural way to express this scenario in Visual Basic? Consider our running "Mondial" example again, but stripped way, way down:
Public Class Country
Public Name As String
Public MyCities As List(Of City)
End Class
Public Class City
Public Name As String
Public MyCountries As List(Of Country)
End Class
...
Sub InitData()
Dim Italy = New Country {Name := "Italy"}
Dim USA = New Country {Name := "USA"}
Dim Mexico = New Country {Name := "Mexico"}
Dim Venice = New City {Name := "Venice"}
Dim Rome = New City {Name := "Rome"}
Dim Nogales = New City {Name := "Nogales"}
Italy.MyCities = New List(Of City)
Mexico.MyCities = New List(Of City)
USA.MyCities = New List(Of City)
Mexico.MyCities.Add(Nogales)
Italy.MyCities.Add(Venice)
Italy.MyCities.Add(Rome)
USA.MyCities.Add(Nogales)
Venice.MyCountries = New List(Of Country)
Rome.MyCountries = New List(Of Country)
Nogales.MyCountries = New List(Of Country)
Venice.MyCountries.Add(Italy)
Rome.MyCountries.Add(Italy)
Nogales.MyCountries.Add(Mexico)
Nogales.MyCountries.Add(USA)
Countries.Add(Italy)
Countries.Add(USA)
Countries.Add(Mexico)
Cities.Add(Venice)
Cities.Add(Rome)
Cities.Add(Nogales)
End Sub
That all just represents a graph of objects in memory, which we can sketch as follows:
_.------------------------.
,-'' `v
+-------+-/-+---+ +---+--------+
| Italy | o | o |<-------------------= o | Venice |
+-------+---+--\+ +---+--------+
^ '-.
`--. `--.
`-------. `-----------.
`-------. `---.
`------. `.
`-. `v
+-------+---+ +\--+------+
| USA | o-=--------. | o | Rome |
+-------+---+ `--. +---+------+
^. `-----.
`----------. `---.
`-----. `.
`--. `--.
,------------. `--. `-.
v `-. `--. v
+--------+---+ `-. +---+-\-+---------+
| Mexico | o | `-= o | o | Nogales |
+--------+-\-+ +---+---+---------+
`-. ^
`----. /
`----------------------'
The point is that with objects in memory, the natural representation of the data is through references, and we may conclude that
Two objects are identical if and only if their references are identical.
So, indeed, since the USA object has a reference to Nogales that is identical to the reference to Nogales contained in the Mexico object, these two Nogales's are (drum roll) identical. Likewise, Venice has a reference to Italy, and Rome has a reference to the same, identical, Italy object.
Now, we might duplicate exactly this situation in a relational database, but we would have to introduce additional data for references, along the following lines:
| CountryTable | | CityTable | | CoutryCityTable |
| Country | CnyPK | | City | CtyPK | | CnyFK | CtyFK |
| Italy | 37 | | Venice | 26 | | 37 | 26 |
| Mexico | 162 | | Rome | 8 | | 37 | 8 |
| USA | 49 | | Nogales | 33 | | 49 | 33 |
| | | | | | | 162 | 33 |
In fact, this is the kind of automatic solution that object-oriented databases would typically compute on your behalf to persist an object graph for you. It just substitutes unique numbers for object references, a process called swizzling. But it still MUST USE a middle table, because there is no good way to represent a variable-length list of references in a column of a table. Most solutions for nested tables, in fact, just un-nest the nested information into additional tables, just as above. So, even though this appears to be a simulation of the object-oriented graph, it doesn't take much to see that it's a totally uninteresting transformation of the normalized relational database form: it just added numerical keys, and it still uses value-equality as the definition for identity.
The point I'm trying to make, here, is that object-oriented programs and relational databases have totally incompatible notions of identity, and that the incompatibilty opens up all kinds of downstream complexities. The various schemes for resolving the incompatibilities are mapping solutions. The world of mapping is very, very complicated, because mapping solutions typically try to resolve not just the mis-match in the notions of identity, but to provide all kinds of operational services, incorporating, for instance, updatable non-normalized views, and support for badly designed legacy databases, which are far too valuable to throw away but far too expensive to normalize.
Sorry, I don't have a bolt of lightning that, with blinding obviousness and clarity, puts this entire topic into order. I think it's best just to stock up on towels and view it as a never-ending pie fight we just have to deal with. That way, we'll interpret it as funny instead of troubling.