The Pie-Fighting Philosophers Problem

Take three PhD specialists in programming languages, three PhD specialists in relational database, and three PhD specialists in network security, and seat them around a table. Now take three coconut-cream pies, three chocolate-cream pies, and three banana-cream pies and randomly place one pie in front of each philosopher. Finally, give them three hours to come up with a position paper on the concept of "Identity." Calculate how many philosophers at the end of three hours have three kinds of cream-pie on their faces, how many have two kinds, and how many have just one kind.

 

Ok, I thought it was funny, but then, I've been involved in a lot of design and standards committees. The point is, no single concept of "Identity" works for all three. In fact, no concept of Identity works for any pair of the three, and it's only barely possible to get a concept that works for any one of the three. Let's illustrate this just for programming languages and relational databases.

 

Consider the following data, stated in English:

 

Venice is in Italy.

Rome is in Italy.

Nogales is in Mexico.

Nogales is in the USA.

 

Hey, wait a minute. One city in two countries? Sure. Happens all the time. Trieste straddles Slovenia and Italy. How about Bratislava? It's in three countries: Slovakia, Hungary, and Austria. Sault Ste. Marie is in both Canada and the USA. And those are just the physically connected examples. The USA has at least one Rome – in New York – and at least one Venice – in California.

 

Ok, the point is that our Mondial database is going to have to deal with the fact that the Country-City relationship is really many-to-many, not one-to-many, and that means it's going to have to deal with Identity in a deeper way than we sketched out prior to this.

 

How would our three database specialists represent the data above? Experienced practitioners will normalize the data, that is, remove redundancies, resulting in three tables: one for the countries, one for the cities, and one, called a join table, middle table, or intersection table, for the relationship. The primary key for the Country table will be just the name of the country. The primary key for the City table will be just the name of the city. The middle table will contain pairs of foreign keys. One member of each pair will be a city name, and the other member of each pair will be a country name. The primary key of the middle table will be the pair of foreign keys. In pictures:

 

Table Labels:

 

CountryTable

 

CityTable

 

CountryCityTable

Column Labels:

 

Country

 

City

 

Country

City

Data Rows:

 

Italy

 

Venice

 

Italy

Venice

 

 

Mexico

 

Rome

 

Italy

Rome

 

 

USA

 

Nogales

 

Mexico

Nogales

 

 

 

 

 

 

USA

Nogales

                               

So we see that the only way to refer to a country or a city in the middle table is by a copy of its primary key. Therefore, in this normalized database,

 

Two rows are identical if and only if their primary keys have the same value.

 

Many database practitioners use underlining on the labels of columns containing primary keys; we'll use bold. Since every column contains a component of a primary key, all column labels are bold.

 

There is also an obvious non-normalized view form of the same data. Non-normalized basically means that this form contains mathematically redundant information, and this can cause ambiguity problems in the identity department. Typically, such a form would be temporary, not the permanent form, for constructing reports, building queries, analyzing by-products, and what not. It would look something like this:

 

Table Labels:

 

CountryFromCityTable

 

CityFromCountryTable

Column Labels:

 

City

Country

 

Country

City

Data Rows:

 

Venice

Italy

 

Italy

Venice

 

 

Rome

Italy

 

Italy

Rome

 

 

Nogales

Mexico

 

Mexico

Nogales

 

 

Nogales

USA

 

USA

Nogales

 

What's the natural way to express this scenario in Visual Basic? Consider our running "Mondial" example again, but stripped way, way down:

 

Public Class Country

    Public Name As String

    Public MyCities As List(Of City)

End Class

 

Public Class City

    Public Name As String

    Public MyCountries As List(Of Country)

End Class

...

    Sub InitData()

        Dim Italy = New Country {Name := "Italy"}

        Dim USA = New Country {Name := "USA"}

        Dim Mexico = New Country {Name := "Mexico"}

 

        Dim Venice = New City {Name := "Venice"}

        Dim Rome = New City {Name := "Rome"}

        Dim Nogales = New City {Name := "Nogales"}

 

        Italy.MyCities = New List(Of City)

        Mexico.MyCities = New List(Of City)

        USA.MyCities = New List(Of City)

 

        Mexico.MyCities.Add(Nogales)

        Italy.MyCities.Add(Venice)

        Italy.MyCities.Add(Rome)

        USA.MyCities.Add(Nogales)

 

        Venice.MyCountries = New List(Of Country)

        Rome.MyCountries = New List(Of Country)

        Nogales.MyCountries = New List(Of Country)

 

        Venice.MyCountries.Add(Italy)

        Rome.MyCountries.Add(Italy)

        Nogales.MyCountries.Add(Mexico)

        Nogales.MyCountries.Add(USA)

 

        Countries.Add(Italy)

        Countries.Add(USA)

        Countries.Add(Mexico)

 

        Cities.Add(Venice)

        Cities.Add(Rome)

        Cities.Add(Nogales)

    End Sub

 

That all just represents a graph of objects in memory, which we can sketch as follows:

 

                   _.------------------------.

               ,-''                           `v

    +-------+-/-+---+                    +---+--------+

    | Italy | o | o |<-------------------= o | Venice |

    +-------+---+--\+                    +---+--------+

         ^           '-.

          `--.          `--.

              `-------.     `-----------.

                       `-------.         `---.

                                `------.      `.

                                        `-.     `v

    +-------+---+                         +\--+------+

    |  USA  | o-=--------.                | o | Rome |

    +-------+---+         `--.            +---+------+

          ^.                  `-----.

            `----------.             `---.

                        `-----.           `.

                               `--.         `--.

              ,------------.      `--.         `-.

             v               `-.       `--.        v

    +--------+---+              `-.  +---+-\-+---------+

    | Mexico | o |                 `-= o | o | Nogales |

    +--------+-\-+                   +---+---+---------+

                `-.                               ^

                   `----.                        /

                         `----------------------'

 

The point is that with objects in memory, the natural representation of the data is through references, and we may conclude that

 

Two objects are identical if and only if their references are identical.

 

So, indeed, since the USA object has a reference to Nogales that is identical to the reference to Nogales contained in the Mexico object, these two Nogales's are (drum roll) identical. Likewise, Venice has a reference to Italy, and Rome has a reference to the same, identical, Italy object.

 

Now, we might duplicate exactly this situation in a relational database, but we would have to introduce additional data for references, along the following lines:

 

CountryTable

 

CityTable

 

CoutryCityTable

Country

CnyPK

 

City

CtyPK

 

CnyFK

CtyFK

Italy

37

 

Venice

26

 

37

26

Mexico

162

 

Rome

8

 

37

8

USA

49

 

Nogales

33

 

49

33

 

 

 

 

 

 

162

33

 

In fact, this is the kind of automatic solution that object-oriented databases would typically compute on your behalf to persist an object graph for you. It just substitutes unique numbers for object references, a process called swizzling. But it still MUST USE a middle table, because there is no good way to represent a variable-length list of references in a column of a table. Most solutions for nested tables, in fact, just un-nest the nested information into additional tables, just as above. So, even though this appears to be a simulation of the object-oriented graph, it doesn't take much to see that it's a totally uninteresting transformation of the normalized relational database form: it just added numerical keys, and it still uses value-equality as the definition for identity.

 

The point I'm trying to make, here, is that object-oriented programs and relational databases have totally incompatible notions of identity, and that the incompatibilty opens up all kinds of downstream complexities. The various schemes for resolving the incompatibilities are mapping solutions. The world of mapping is very, very complicated, because mapping solutions typically try to resolve not just the mis-match in the notions of identity, but to provide all kinds of operational services, incorporating, for instance, updatable non-normalized views, and support for badly designed legacy databases, which are far too valuable to throw away but far too expensive to normalize.

 

Sorry, I don't have a bolt of lightning that, with blinding obviousness and clarity, puts this entire topic into order. I think it's best just to stock up on towels and view it as a never-ending pie fight we just have to deal with. That way, we'll interpret it as funny instead of troubling. 

 

 

Published Sunday, March 19, 2006 9:58 AM by brianbec

Comments

# re: The Pie-Fighting Philosophers Problem@ Monday, March 20, 2006 11:00 AM

We have three people with extensive educations in three domains and a problem that illustrates some of the fundament impedance between these domains.

Even if they each have the potential to solve the whole problem, they are most probably going to produce a state of the art description of the state of the art of the problem rather than an actual solution to the actual problem.

The truth is not determined by committee
(they can occasionly find it just like anyone else)

AndrewSeven

# re: The Pie-Fighting Philosophers Problem@ Monday, March 20, 2006 7:11 PM

There are 4 ways to model the example above:

1) Join table based on values
2) Join table based on references
3) Nested table based on values
4) Nested table based on references

The impedance mismatch in the above example is two-fold:

a) Translating between values vs. references
b) Translating between nested vs. join tables

The collections in the object oriented program above are conceptually nested tables of references.

benja

Leave a Comment

(required) 
(required) 
(optional)
(required)