Musings on relations - or: WinFS is not enough

Have you had a look at WinFS? No? You should. It´s cool. Or maybe I should say: it could be even cooler if it didn´t stop too early.

The basic idea behind WinFS is: set your data free! Unlock the gems hidden in large databases! Microsoft´s fundamental insight is that the future is about relationships between data, and that you can only set up relationships between separately addressable and accessible units of data.

Within a single relational database you can set up relations between rows in tables. The relationship between smaller informational units (columns, fields) is implicit in their arrangement within a single table. But what about relationships between data in different databases? Although that´s possible, it´s not really what you want to do in your day-to-day business. Databases thus draw a boundary around data. From a security point of view this is of course beneficial; but from a data reuse point of view, it is counterproductive.

Wouldn´t it be nice, though, not to be concerned about the database, the bucket an address is stored in? Wouldn´t it be nice to be able to relate an existing address to a new piece of information you want to store, which does not necessarily come to rest in the same database as the address? Think about linking your latest digital pictures to contacts you already have in Outlook. Think about adding tasks in Outlook to an order workflow process in your ERP program.

All those scenarios are either not possible today or very hard to achieve. That´s why WinFS is needed. With WinFS there is no more Outlook .pst file hiding all those precious contacts, tasks, emails and appointments behind a wall. Instead these items are set free to float around as separately addressable and accessible data units in the file system space. And you can do the same with your own data.

Once there are comparatively small data items floating around rather than being penned up in an ever increasing number of (incompatible) databases, you can start to relate those data items to each other. That´s so cool! It lets you reuse information in different contexts/programs instead of reentering (or importing/exporting) it over and over again.

However, although WinFS breaks up the database barriers around the information nuggets, it remains in the world of relational databases. No, not just because SQL Server 2005 is the foundation on which WinFS is built, but rather because relationships are still an afterthought and a second-class citizen in data modelling.

From data to associations

Filesystems, XML, RDBMS, ODBMS and also WinFS are all data centric. Data is the main concern. Storing data is the most important task of an RDBMS. Databases are about recording data, making it persistent. Well, that sounds reasonable, doesn´t it?

The following picture depicts the current thinking: Data is arranged in fields stuffed into a row. Rows can point at each other. The data items (fields) are related implicitly and explicitly on two levels: implicitly by putting them next to each other in a row and storing all rows of the same kind in a table, explicitly by foreign keys.

The relational calculus is good at describing sets. But it´s bad at describing relations between data in different sets. Explicit identities (primary keys) need to be introduced and normalization is needed to avoid update inconsistencies due to duplication of data.

To put it somewhat bluntly: The problem with the relational calculus and RDBMS etc. is the focus on data. It seems to be so important to store the data that connecting the data moves to the background.

That might be close to how we store filled in paper forms. But it´s so unlike how the mind works.

There is no data stored in your brain. If you look at the fridge in your kitchen, there is no tiny fridge created in your brain so you can take the memory of your fridge with you, when you leave your kitchen.

Instead the fridge is left where it is, right there in your kitchen. However, what is stored in your brain are associations of all kinds. In fact, your brain can only store "immaterial" associations. (Let´s neglect for the moment, that those immaterial associations need to manifest themselves somehow, e.g. electrical signals, chemical substances, or cell growth.)

The fridge causes the brain to set up internally an unknown number of associations. Thus, the brain works just with relations/associations and not with data or "the real things". The brain has its own representations for the data. There is no data in the brain; rather the data itself stays outside the brain.

So "the real thing", the fridge, is not in the brain, but instead some kind of, hm, "token" or handle. Or maybe there is not even a "token" for a whole fridge in the brain, but a large number of handles for parts of a fridge? Or what seems to be even more likely: the brain knows nothing about fridges and fridge parts, but just about very, very simple visual structures like points, edges, colors. So the mental representation of a fridge is a set of relations between such basic structures/concepts. Then the brain does not need "tokens" for real world entities, but just for basic structures/concepts to relate them to each other.

Ok, why am I telling you all this? What does this have to do with WinFS? Well, it´s about a completely different way to deal with data (or things). To map what the brain does to the software world means removing the data from the "system", leaving only associations:

Within the "system" there are just associations and associations between associations. The data is outside the "system". Compared to our traditional thinking this kind of "system" is homogeneous. There are only associations. That´s it. There is no distinction between associations and data or between different kinds of associations (implicit vs explicit). Associations or relations are first-class citizens in this kind of "system".

And since there are no different kinds of data and no more "data buckets" like tables or columns, any association can be associated with any other association.

When you define an RDBMS schema you explicitly set up which kind of data (rows) can be connected to which other kind of data. You try to foresee what could possibly make sense in terms of associating data. Well, that´s what the Outlook team did in the past. They said: Well, we think users want to associate a contact with an appointment or an email with a task. So we stuff everything in a nice little database.

But then, users thought differently. All of a sudden, they wanted to associate an Outlook contact with an invoice - without success, because the Outlook developers had thought they could foresee the future usage of certain data.

This dawned on Microsoft, and now they have come up with WinFS. Great! Or not?

No, not so great, although still technologically cool. Because WinFS still requires you to think in pretty large bins of data (e.g. a contact, an appointment). Although you can set up relations between those smaller bins, WinFS still is about data first - and only then come associations between data. It´s a heterogeneous system.

Your brain, on the other hand, is homogeneous: the brain knows only about associations. Because that´s the only way to deal with an unpredictable world where you cannot foresee how "things" might look and behave and how you might want to associate fine-grained basic concepts like points or coarse-grained concepts like fridges with each other. The brain knows about causality/time, points, edges, space, that´s probably pretty much it. Those concepts/structures are its roots. All else is just associations between those roots and other associations. Billions, trillions of them. And it works :-)

So why stop where WinFS stops? Why not take WinFS to the max? Why not radically change our view of the database world? How about association bases or connection bases instead of data bases?

A world of associations

The gain from a new view on how to deal with data would be an explosion of possible associations. When you look at your fridge, you immediately can see it in different contexts: there is the context of "kitchen" where the fridge is one of many appliances, then there is the fridge as a manufactured product pointing to a history of industrial production, then there is the context of "food" which the fridge keeps, then there is the context of "information" because you put Post-it notes on the fridge´s door, and so on...

The fridge is at the origin of a multi-dimensional space of contexts. Many different contexts intersect in a fridge. That´s so natural to all of us... so why not treat data the same?

Switching to a new view on dealing with data is thus a switch from one context to multiple contexts. In an associative system any data unit (external to the system) can exist in any number of contexts, depending only on the associations between it and other data units or other associations.

So if associations are the real value of data, because they put them "in perspective" aka into different contexts, then how to get more out of an associative system? Well, by forming as many associations as possible (or as makes sense for a certain observer).

Since the number of possible associations is determined by the number of data units, it´s best to maximize their number first. And that´s exactly where WinFS falls short.

Although WinFS promotes disassembling databases into their rows (objects, e.g. contacts, tasks), the resulting data units not only stay within the system, but are also still fairly coarse grained. A whole contact can be associated with a whole appointment.

But why stop there? Why not disassemble the data further in order to be able to generate even more associations? Who´s able to foresee that associating a whole task with a whole invoice is all that users ever need?

Maybe I want to navigate (by traversing the maze of associations) from a single date in an appointment to contacts with this date as a birthdate? Why not reuse names from contacts in the context of appointments? And I mean just names.
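To make this kind of navigation a bit more concrete, here is a minimal sketch in Python. It is not taken from WinFS or from my prototype; the names `associate`, `contexts` and `used_in` are made up for illustration, and plain strings stand in for the handles an associative system would hand out. The only idea it shows is that once a single date is a unit of its own, every association that mentions it can be found from the date itself.

```python
from collections import defaultdict

# Hypothetical reverse index: handle -> set of associations that mention it.
used_in = defaultdict(set)


def associate(left, right):
    """Record an association and index it under both of its ends."""
    pair = (left, right)
    used_in[left].add(pair)
    used_in[right].add(pair)
    return pair


def contexts(handle):
    """Return everything that is directly associated with the given handle."""
    return {a if b == handle else b for (a, b) in used_in[handle]}


# Strings stand in for opaque handles; the values are invented examples.
date = "1970-05-17"
associate("contact: Alice (birthdate)", date)
associate("appointment: Review (date)", date)

# Starting from the date alone, both contexts are reachable.
print(contexts(date))
```

The point of the sketch is only the direction of travel: you start at a small unit (the date) and walk outwards to whatever happens to be connected to it, without knowing in advance which "tables" those things live in.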

What this would mean is blowing up those WinFS data units (objects) into very small pieces, data atoms. Each atom being some data unit which cannot be split into smaller pieces.

Single letters come to mind as candidates for a data atom. (The bit values 1 and 0 would be the true data atoms, but even though it would be possible to build a "system" on them, since letters are just associations between 1s and 0s, I find this low level a bit unwieldy.) Pictures might be larger data atoms because their individual bytes might indeed make no sense in other associations - but who knows.

In the end, an associative base system should be data atom agnostic. It might know that data atoms are streams of bytes and might offer to store them as is. But then... why should it know about data atoms? They are of no use within (!) the system. So an associative system should provide just one operation concerning data atoms: create a handle for a data atom, if you ask it to.

The associative system then looks like this:

Whatever is outside the system, the system does not care about. However, in order to set up associations with the outside data atoms, the system has to have some kind of internal representation. That´s why the system needs to be able to generate - ex nihilo so to speak - handles for external data atoms (or terminal values). What those handles mean, which terminal values they stand for, whether it´s a single letter or a multi-megabyte picture, the associative system does not know.
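As a rough illustration of how small such a system could be, here is a sketch in Python. It is an assumption-laden toy, not the prototype mentioned later in this post: the class name `AssociativeBase` and the methods `create_handle`, `associate` and `parts` are invented for this example. Handles are plain integers, and the mapping from real data atoms to handles is kept outside the class on purpose.

```python
from itertools import count


class AssociativeBase:
    """Toy store that knows only handles and associations between handles."""

    def __init__(self):
        self._next = count(1)   # source of fresh handle numbers
        self._pair_of = {}      # association handle -> (left, right)
        self._handle_of = {}    # (left, right) -> association handle

    def create_handle(self):
        """Create a handle for an external data atom; the atom stays outside."""
        return next(self._next)

    def associate(self, left, right):
        """Return the unique handle for the association (left, right)."""
        key = (left, right)
        if key not in self._handle_of:
            handle = next(self._next)
            self._handle_of[key] = handle
            self._pair_of[handle] = key
        return self._handle_of[key]

    def parts(self, handle):
        """Return (left, right) for an association, or None for an atom handle."""
        return self._pair_of.get(handle)


ab = AssociativeBase()
# The mapping from atoms to handles lives outside the system.
atoms = {"R": ab.create_handle(), "a": ab.create_handle()}
ra = ab.associate(atoms["R"], atoms["a"])
# An association can itself be associated with other associations.
nested = ab.associate(ra, ra)
```

Note that `associate` hands back the same handle when asked for the same pair twice, so each association exists only once, and that the class never sees an "R" or an "a", only the numbers that stand for them.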

Conclusion

Now, think about the implications for a while...

Such an associative base, an AB instead of a DB if you want, would not store data, but rather would generate data from data atoms as needed.

Take a text like the Bible: If you defined the 256 single-byte characters (extended ASCII) as the atoms, then there would be no Bible text data, but just some 800,000 associations between those 256 terminal values and other associations. (I know this figure, because I´ve implemented such a system in C# and loaded the 4.5 MB King James Bible into the AB.)

Still, though, I can losslessly generate the complete Bible text upon request from those associations. It´s just a matter of recursive descent in a binary tree. But what´s more important is that no combination of letters would need to be stored twice in such an AB. Each association could be unique. No more duplication of data.
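For readers who want to see the recursive descent spelled out, here is a small Python sketch. It is not my C# implementation; the pairing strategy (fold neighbouring handles into pairs, level by level) and the names `build` and `regenerate` are assumptions chosen to keep the example short. Atom handles are negative integers, association handles positive ones, and the text must not be empty.

```python
def build(text, atoms, pairs):
    """Fold a non-empty text into a binary tree of unique pair associations.

    atoms maps a character to its (negative) handle,
    pairs maps a (left, right) pair of handles to its (positive) handle.
    """
    level = [atoms.setdefault(ch, -(len(atoms) + 1)) for ch in text]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            key = (level[i], level[i + 1])
            # setdefault reuses an existing association instead of storing it twice
            nxt.append(pairs.setdefault(key, len(pairs) + 1))
        if len(level) % 2:
            nxt.append(level[-1])  # an odd element is carried up unchanged
        level = nxt
    return level[0]


def regenerate(handle, atoms, pairs):
    """Rebuild the text below a handle by recursive descent."""
    chars = {h: ch for ch, h in atoms.items()}
    parts = {h: key for key, h in pairs.items()}

    def walk(h):
        if h in chars:          # terminal value reached
            return chars[h]
        left, right = parts[h]
        return walk(left) + walk(right)

    return walk(handle)


atoms, pairs = {}, {}
text = "to be or not to be"
root = build(text, atoms, pairs)
print(regenerate(root, atoms, pairs) == text)
```

With this naive pairing, a repeated stretch of text is only shared when it happens to fall on the same pair boundaries, so treat the sketch as a demonstration of lossless regeneration and unique associations, not as a statement about how much duplication a real AB removes.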

This not only saves some disk space; it also means that when looking for the pattern "Enoch" I immediately get all contexts in which Enoch appears in the Old Testament. Starting to look for patterns from the handles for their terminal values immediately leads to all associations which connect to those patterns.

But this is only a simple example and you might say, hey, this is what full text database searches are for. And you´re right! However, a full text database stores the data twice: once as the data, and once as all the major words in the index. Also, a full text database usually limits you to searching for words. If you want to look for arbitrary patterns, e.g. "o b" in the text of Hamlet, then you´re lost. A full text search engine would not return "to be". For an AB engine, though, this would make no difference. And that´s important, for example, in searching for gene sequences in the field of bioinformatics.
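The limitation of word indexes is easy to demonstrate. The following Python sketch is not an AB; it just builds a tiny word index of the kind a full text engine might keep (the names `word_index` and `find_pattern` are made up) and compares it with a plain scan for an arbitrary character pattern.

```python
import re
from collections import defaultdict

text = "To be, or not to be, that is the question".lower()

# A word-based index: each word points to the positions where it starts.
word_index = defaultdict(list)
for match in re.finditer(r"\w+", text):
    word_index[match.group()].append(match.start())


def find_pattern(pattern, haystack):
    """Return the start positions of an arbitrary character pattern."""
    return [m.start() for m in re.finditer(re.escape(pattern), haystack)]


print(word_index.get("be"))        # the word "be" is in the index
print(word_index.get("o b"))       # "o b" is not a word, so the index has no entry
print(find_pattern("o b", text))   # a scan over the characters still finds it
```

The word index answers questions about words only; anything that crosses a word boundary has to fall back to scanning the stored text. An AB built on character atoms would not make that distinction in the first place.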

I can understand, though, if you find it difficult to switch your thinking from data centric to associations only. It took me 2-3 weeks and I´m still working on it. But the potential of this switch seems to be huge! Each day I learn something new. It almost feels as if I´m in love :-) I´m almost blocked from doing other work, because my mind reels with the possibilities and implications. That´s the reason, why I needed to write this blog entry. I needed to get this out of my head to move on.

Just yesterday I talked to a developer of an ODBMS about all this. Fortunately I was able to describe all this to him on the phone - and he immediately grasped the idea. He even corrected me when I thought about maybe defining whole data fields (e.g. a name, a birthdate, a zip code) as data atoms to gain performance from having a "regular" database engine index them. He said, no, that´s not necessary, because all those values (consisting of characters) can be indexed using associations within (!) the AB. And he´s right! I felt so relieved: Such an index would be just another context in which terminal values appear.

The beauty of an association-only system is very striking, I think. So while WinFS is a cool idea compared to today´s situation, WinFS is but a small step towards really setting data free to be associated in a million ways like in our brains.

Published Tuesday, December 20, 2005 6:03 PM by ralfw

Comments

# re: Musings on relations - or: WinFS is not enough

Tuesday, December 20, 2005 3:51 PM by lexp

I think that performance of everyday queries in Associative Database would be much lower than in RDBMS.

# re: Musings on relations - or: WinFS is not enough

Tuesday, December 20, 2005 4:57 PM by Jason Foster

Aren't you describing Topic Maps?

# re: Musings on relations - or: WinFS is not enough

Tuesday, December 20, 2005 5:14 PM by Ralf

@lexp: The question of query performance sure is an important one. But as first results show: searching for patterns is very fast if not faster compared to usual fulltext databases (especially if you´re looking for non-word patterns).

Plus, as I explained, when you think about structured data you can keep an index (as tree of associations) within (!) your AB next to the associations connecting your data atom handles. This would be the same as an ordinary index in an RDBMS, I´d say, so there should not be much performance loss.

But even if there is some performance loss, I´d argue that in many cases it is compensated by much higher flexibility of the whole AB.

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 1:17 AM by omen

Dude, better read up on your relational foundations first. Indexes are not a relational concept. They are implementation level concepts. You can have relational databases without any conventional indexes at all and still perform (the TransRelational technology, the Nucleus database implementation approach, etc.).

The design of a relational database is just a means of specifying the constraints you want on your data. If you want an "everything can be associated with everything else" design, and throw most if not all integrity out the window, then it's your decision, but being relational doesn't stop you from doing so.

Open up your eyes and go beyond what MS, Oracle and the rest of the mainstream market peddles and claims to be relational.

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 2:05 AM by Ralf

@omen: Well, dude, did I say indexes were part of the relational calculus? No. Of course no index is necessary in a relational database. It just speeds up things.

However, I disagree with you opposing "everything can be associated with everything else" and integrity. Integrity is dependent on a certain schema. So if my "schema" defines a network of associations, then this does not violate integrity. Enforcing integrity just will look different from RDBMS.

TransRelational technology: There is no such thing really existing beyond some hints and prototypes, it seems. Read http://www.dbms2.com/2005/10/10/17/ for example. In addition it´s still rooted in the relational database world.

As for the Nucleus database approach: Please provide a link to more information.

In any case: If the AB I described does not show I´m looking beyond what Microsoft and Oracle do, I don´t know what would.

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 2:19 AM by Ralf

@Jason: Topic Maps bear some resemblance to the associative system I described. There are topics which somewhat look like data atoms and there are associations. (For an introduction see http://www.ontopia.net/topicmaps/materials/tao.html.)

However, topic maps are not what I describe for two reasons:

-Topic maps have a concrete purpose in mind. So maybe you could say, Topic Maps are an application or manifestation of the system I described. The AB I described is very, very basic or fundamental. It´s a way to view the (data) world.

-Topic Maps consist of topics and associations whereas the system I described consists of only associations.

But I guess the difference will become clearer once I continue my description of associative systems.

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 4:33 AM by Guy Murphy

Associative Model of Data.

And I think it's splitting hairs to construct a framework under which TopicMaps doesn't fit. If it looks like a duck, and walks like a duck....

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 7:12 AM by Geert Baeyaert

Ralf,

any chance you could expand on how you implemented your prototype AB. (Maybe even make the source available for download). I've followed your latest posts about this subject with great interest, and would like to see how you solved some issues in code.

You can also email me on my gmail address.(firstname.lastname)

Geert Baeyaert

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 9:38 AM by Baxter Basics

I'd love to hear what Fabian Pascal etc. has to say on this pure unadulterated BS. It wouldn't be pretty. It's a rather poor show to make comments regarding relational calculus when you clearly haven't got the slightest clue about it.

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 9:47 AM by Ralf

@Baxter: Well, I´d love to hear what Fabian Pascal has to say.

And I like to see emotions flying high as a result of my posting. Where "BS" is thrown at me and "open your eyes" there must be some nerve touched. Because if not, why bother and post such comments at all?

I just can say: I don´t have all the answers. I just know that RDBMS - as much as I like them; I use them every day - have their limits. Decades-old alternatives like ODBMS or OLAP products are proof of that.

So I guess trying to go further is not a bad thing. We´ll see where thinking beyond RDBMS leads. Whatever I write here is just an invitation to join on an intellectually interesting path.

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 10:43 AM by omen

re: "However, I disagree with you opposing "everything can be associated with everything else" and integrity. Integrity is dependent on a certain schema. So if my "schema" defines a network of associations, then this does not violate integrity. Enforcing integrity just will look different from RDBMS."

All there is to relational database design is taking advantage of relational facilities to enforce (declare) the integrity you want.

There are already short hands for certain common integrity concerns (domain, entity, referential, etc.). Putting together certain attributes into a single relation is also an integrity decision. The final "general purpose" integrity that a RDBMS should support is database-wide constraints. Your approach seems to take an "anything-connected-to-anything" design with all integrity concerns left to database-level constraints.

I would predict that after doing this design process quite a number of times you would recognize certain repeating integrity concerns and before you know it, you would be repeating all the relational facilities but "hand-made" and error-prone.

You can design an "associative" schema using a fully relational database with full domain support, if you want. Why you would want that, and only that, everytime is beyond me, however. What you want is already a subset of being relational.

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 10:57 AM by Ralf

@omen: Putting attributes into relations helps enforce integrity - but limits how you can connect information. I cannot link just a name/zip pair to something else, if it´s just part of a relation and not a relation itself (in an RDBMS).

You´re right that an AB must also enforce integrity constraints. But it gives you more freedom. Informational units do not get penned up in cages (tables/rows). Which does not mean such concepts (e.g. a set (table) of like-structured sets (rows)) don´t make sense. They are useful - but they are limited.

With a relational database you can traverse all contacts in a table. With an AB you can do the same (if you decided to model tables and rows) - but you can also traverse a name across different tables, e.g. contact, appointment, invoice.

So I´d argue: an associative system is more general than a traditional relational database.

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 10:59 AM by omen

as for Nucleus, see http://www.google.com.ph/search?q=nucleus+database+sand+technology&btnG=Search&hl=en

as for TransRelational, it is indeed sad that it is still in the prototype stage, the approach is sound.

Also, if you havent already, try playing around with Alphora's Dataphor. This is a full commercial product (and .NET based at that).

"We´ll see where thinking beyond RDBMS leads"...I would suggest learning what a real RDBMS is in the first place. You'll find out it has more to it and that those who purport to be "beyond RDBMS" are just focusing on the poorly-implemented portions of the relational approach.

# re: Musings on relations - or: WinFS is not enough

Wednesday, December 21, 2005 11:24 AM by Ralf

@omen: Thanks for the link to information on nucleus. As far as I can see, though, nucleus is about data mining. From my point of view that´s not as fundamental as switching from a data centric world to an association centric world.

As for Dataphor here´s a link: http://www.alphora.com/tiern.asp?ID=HIGHLEVEL

Looks interesting - but remains in the world of the relational calculus and thus is data centric. Nonetheless it might be a revolution compared to some next version of your same old RDBMS. But it does not unlock the data from their little cages. That´s what an associative system is trying to do.

We need to see, if implementations of ABs can hold this promise. But I think, we need to try.

# re: Musings on relations - or: WinFS is not enough

Thursday, December 22, 2005 10:17 PM by Don X

LOL - talk about try to model the WORLD in a diagram!

# re: Musings on relations - or: WinFS is not enough

Friday, December 23, 2005 2:16 AM by omen

re: "nucleus is about data mining" It is being marketed as such, but its technology and physical implementation is general purpose to database technology.

Same goes for another product, Netfrastructure. It is marketed as a portal development tool, but it's powered by a proprietary relational database management system that is much more advanced and cleaner than most mainstream products.

re: "but remains in the world of the relational calculus and thus is data centric. Nonetheless it might be a revolution compared to some next version of your same old RDBMS."

Data-centric as opposed to? Associations are just data, and are as amenable to relational approaches as "regular" data. And part of the relational model is having proper domains (aka data types) with corresponding operators. So its not just data, but also behavior/operations.

re: "But it does not unlock the data from their little cages"

The cages you refer to are decisions made by the database designer, not something automatically imposed by the relational model. You can design a relational database in 6NF, for example (Date's 6NF).

Another commercial product you can investigate is FirstSQL.

Probably my main point is, not building on the solid foundations already laid out by relational theory and chasing instead ad-hoc approaches in the hopes of alleviating supposed, but untrue, limitations of relational theory is in the end unproductive. All it will do is add fuel to the buzzword, fashion driven aspect of the IT industry. But I must admit there is a great temptation to pursue "novel" ideas, just for the sake of being novel. Especially when the novel thing is "my novel thing".

As for Pile, what argument and solution does it offer at its core, that is substantially different from this: http://www.lazysoft.com/?

# re: Musings on relations - or: WinFS is not enough

Friday, December 23, 2005 5:00 AM by Ralf

@Don X: you´re right :-) These associations are the most basic way of modelling "the world".

@omen: if vendors fail to position their solutions in a way so anyone can see how general they are, how fundamental their approach is... well, i can´t help it. nucleus for me is no general solution, but a very specific one. it´s up to the manufacturer to go out and help solving problems like gene sequence analysis or make database design fundamentally easier. obviously, though, the manufacturer does not aim at that. maybe you want to do that for him?

re lazysoft: thx for the link. i´ll have to read their online book or some whitepaper first.

re FirstSQL: thx for the hint. here´s a link: http://www.firstsql.com/
however FirstSQL, although allowing access to data via objects, is still data centric. that´s all nice and well, but the idea of associations goes far beyond that.

re "solid foundations already laid out": well, this is such a general argument, you can almost always apply it. it´s a parent talking to a child "get rid of your fantasies and start learning what there is today." it reminds me of charles h. duell´s famous 1899 words: "Everything that can be invented has been invented." (http://www.worldofquotes.com/author/Charles-H.-Duell/1/)

You´re saying: no need to think about associative bases, because what´s needed for data storage has been invented already, the relational theory is enough. It just needs to be fully understood and applied properly.

If you think so, that´s perfectly fine for me. And I´m open to any improvements on the implementation of the relational calculus. As I said: WinFS (or SQL Server for that matter) are not enough. But neither is FirstSQL. And I don´t think nucleus solves the world´s data management problems.

Nor does an associative base, mind you. But it´s an alternative and radical approach that deserves some eyeballs, I´d say. If I and some others like to toy around with this idea, why not? You like to toy around or work productively with something else. That´s fine, too.

# re: Musings on relations - or: WinFS is not enough

Friday, December 23, 2005 9:51 AM by omen

re: "if vendors fail to position their solutions in a way so anyone can see how general they are, how fundamental their approach is... well, i can´t help it"

Well, that's the problem with relying mainly (or solely) on vendors and products...and on "marketing", for your background.

"I hope very much that computing science at large will become more mature, as I am annoyed by two phenomena that both strike me as symptoms of immaturity.

The one is the widespread sensitivity to fads and fashions, and the wholesale adoption of buzzwords and even buzz notes. Write a paper promising salvation, make it a "structured" something or a "virtual" something, or "abstract", "distributed" or "higher-order" or "applicative" and you can almost be certain of having started a new cult.

The other one is the sensitivity to the market place, the unchallenged assumption that industrial products, just because they are there, become by their mere existence a topic worthy of scientific attention, no matter how grave the mistakes they embody."

-Edsger W. Dijkstra

# re: Musings on relations - or: WinFS is not enough

Friday, December 23, 2005 10:55 AM by Ralf

@omen: I agree with you (or Dijkstra) that "computing science" (or software development) needs to become more mature. Since 1979, when Dijkstra made his statement, much has happened - but much has still to happen. And if you follow my blog you´ll see some areas for which I see great need for more maturity.

But it´s one thing to diagnose immaturity, and another thing to say, whatever carries the label "abstract" or "distributed" (or you might add "service oriented") is just hype or cult - and therefore another immature fad.

That would be to throw out the baby with the bathwater. Because it´s a sign of arrogance or senility not to take something seriously just because of some buzzword.

The scientific method is or at least should be buzzword agnostic. Whoever claims something must be judged impartially on the grounds of his statements and in comparison to an established body of knowledge.

# re: Musings on relations - or: WinFS is not enough

Saturday, December 24, 2005 4:48 PM by omen

re: "The scientific method is or at least should be buzzword agnostic. Whoever claims something must be judged impartially on the grounds of his statements and in comparison to an established body of knowledge."

Now that is common ground we can finally hang our discussion upon. There seems to be a notion that it's better to let everyone's and his uncle's newfangled theories get equal time, opportunity and recognition, and let everyone else make heads or tails of the situation.

While I agree that this is the best course when dealing with nascent fields of study (as there is no prior work to build upon and study), in established areas of science and similar endeavors, there are characteristics which are (or should be) expected of any proposed replacement theory.

Probably the first is that the proponent of the replacement theory should demonstrate a clear understanding of what he/she is trying to replace. If the proponent cannot even explain properly (devoid of buzzwords) the theory sought to be replaced, then that is prima facie evidence against the replacement (or at least its proponent).

The second is that the replacement should be as precise as or more precise than the original (and/or as formal or more formal), and should in turn be based on a foundation that is also as precise/formal or more so. This is what makes scientific theories independent of buzzwords. You can distill them into formal, abstract notation, no matter the buzzwords used to explain or promote them. Absent this, no amount of buzzwords or books can make a scientific theory respectable. With this, buzzwords only serve to distract.

Finally, the replacement theory (and its proponent) should be able to demonstrate why it is more accurate. Thus, it should explain in what areas or applications the original theory is indeed deficient and how it is better at those, as well as how it handles, equally well or better, those areas that are already adequately covered by the original theory.

These 3 characteristics are what differentiate respectable enhancements/reinterpretations of the relational model such as Datalog (some links: http://www.findarticles.com/p/articles/mi_m2483/is_n3_v18/ai_20418257/pg_2 http://goanna.cs.rmit.edu.au/~zahirt/Teaching/subj-datalog.html http://lambda-the-ultimate.org/node/view/25) and "Temporal Data and the Relational Model" (http://web.onetel.com/~hughdarwen/TheThirdManifesto/TemporalData.Warwick.pdf) from so-called "replacements" such as http://www.lazysoft.com, OODBMS, Prevayler, and Pile.

# re: Musings on relations - or: WinFS is not enough

Saturday, December 24, 2005 4:59 PM by omen

on prior works in addition to lazysoft.com:

What argument does Pile's logical foundation offer that is not already used by such approaches as the Semantic Web (http://infomesh.net/2001/swintro/
http://www.w3.org/RDF/
http://en.wikipedia.org/wiki/Semantic_web)
and Topic Maps (http://www.ontopia.net/topicmaps/materials/tao.html)?

# re: Musings on relations - or: WinFS is not enough

Saturday, December 24, 2005 6:10 PM by Ralf

@omen: I agree with you in general about how a "new theory" should behave towards established ones. Also thx for the links.

However, I can´t shed the feeling, we´re suffering from a couple of misunderstandings here.

1. Pile is immature and still in a phase of being defined. No formalism on par with the relational calculus has been worked out yet.
But does that mean, I cannot talk about Pile?

2. Pile is not really positioned against the relational data model. It does not per se want to replace it. RDBMS are here to stay for many decades. But does that mean, I cannot think aloud about alternatives?

3. Pile does not claim to have invented associations or relations (nor did the Semantic Web invent ontologies). What Pile claims is only a radically new low level view of how data could be managed "by getting rid of data".

4. Pile does not resist to learn from other fields. So whoever has something to contribute, please speak up.

I encourage you to read all my postings on Pile and criticize specific statements using pointers to specific existing alternatives. Discussing the pros and cons on such a general level as "What does Pile offer above the Semantic Web?" is pretty tiring. (My short answer would be: Pile is much more low level than the Semantic Web or Topic Maps; but that does not mean, there are no common concepts.)

Also, I encourage you to find out what bugs you so much about Pile in the first place. Is it Pile´s (or my) presumed arrogance or ignorance? Do you want to teach me (or the Pile team) something you already know? Are you dissatisfied with the field of data management (or software development) in general and see in Pile another sign of the immaturity of the industry? I think it could be worthwhile to also step up to this meta-level of the communication, since you´re not the only one reacting to Pile the way you do.

So what is it, that elicits such reactions? Pile by itself is modest. There´s some claim, I in my ignorance find this claim cool and blog about it. So... what´s so bad about that? Why not let us live on in blissful ignorance?

My hunch is: Even though Pile is not as formalized as the relational calculus, there is something about it, that sounds attractive. There seems to be a hidden promise of less problems than exist today in certain areas of computing (or data management). But on the other hand Pile is so radically different as to provoke rejection.

But why? Pile is harmless. Nobody loses her job because of entertaining thoughts on a world of pure associations. Why not see it as an intellectual adventure?

(As you can see, I still think Pile is very different from (I don´t say "better than") the relational calculus, simply because in Pile there is only one concept: association. That´s it. The relational calculus has at least 2: attribute and relation ;-)

# re: Musings on relations - or: WinFS is not enough

Sunday, December 25, 2005 11:09 AM by omen

re: "But does that mean, I cannot talk about Pile?" and "I cannot think aloud about alternatives?" Of course you can, but you will be subjected to scrutiny versus the characteristics I have mentioned (which is a good thing). It would have been better if you too had subjected Pile to such scrutiny beforehand.

re: "Pile is immature and still in a phase of being defined" So does that mean I can invent any newfangled approach, call it something catchy, claim it's better than what has come before, and deflect all criticisms by saying "hey, the idea isn't really formed yet"? There is a certain gestation stage that ideas go through before they are ready for public scrutiny.

re: "Pile is not really positioned against the relational data model. It does not per se want to replace it." So what problem does it intend to address that isn't already addressed by the relational data model, or isn't addressed as well?

re: "What Pile claims is only a radically new low level view of how data could be managed "by getting rid of data"." Pile's respectability is not helped by such vague statements. This is what lumps it together with www.lazysoft.com, Prevayler, the Semantic Web, etc. and sets it apart from the relational model and such things as Datalog, the Temporal enhancements, etc. This is also what separates New Age medicine from the real deal. While New Age medicine might stumble upon something useful once in a while, or benefit from centuries of anecdotal experience, only through proper scientific treatment can specific New Age practices gain respectability.

re: "Pile does not resist to learn from other fields. So whoever has something to contribute, please speak up." This is not enough. Better would be "Pile has studied and amalgamated currently accepted, publicly available, commercial and non-commercial, mainstream and not-so-mainstream approaches, found them wanting in areas such as 1, 2, 3, etc. and proposes another superset approach that is better because of reasons A, B, C, etc." While this is a tall order, it is necessary "homework" and is expected of any proponent worth his/her passion. Thus Einstein's theories vs. those of Newton, Quantum vs. Classical mechanics, the Relational data model vs. the Network and Hierarchic ones, etc.

re: "..there is something about it, that sounds attractive. There seems to be a hidden promise of less problems than exist today in certain areas of computing (or data management)." A promise is not enough, for everyone and his uncle can promise a better approach. Probably another word for this is "sexiness". While sexiness of an idea influences the quantity and intensity of its adherents, any connection to its quality and long term practicality is a coincidence.

To quote http://www.tdan.com/sms_issue26.htm , "Engle’s position derives from his declared optimism regarding the ability of nonrelational approaches to achieve fuller capturing of user meaning. But while I may agree with Engle on the desirability of such an objective, optimism is not substitute for scientific knowledge. I am unaware of any formal approach that currently does a better job than the relational model and the logic underlying it, and Engle does not offer any."

"It is incumbent on those who are optimistic about those approaches either to demonstrate that we are wrong about the fundamental flaws we document, or show why those flaws are no good reason to drop the optimism."

re: "But why? Pile is harmless. Nobody loses her job because of entertaining thoughts on a world of pure associations. Why not see it as an intellectual adventure?" If Pile is about entertainment, or the arts, or makes trivial claims, then no problem. But since its topic is not a matter of entertainment (at least not primarily), being harmless is not enough. Higher standards should apply. These standards are what make a field mature.

# re: Musings on relations - or: WinFS is not enough

Sunday, December 25, 2005 11:40 AM by Ralf

@omen: Thx for your elaborate reply. I guess we´ve exchanged positions. Unfortunately though, you have not substantiated your criticism by picking any concrete claims of Pile and contrasting them with any concrete existing statements.

To give you an example of what I mean: You could have said "The notion of a relation being itself a subject of relations is an old hat. Read XYZ and you´ll find it has been proposed before." or "The notion of not storing data items (e.g. names, zip codes) anymore but just relations between much, much smaller informational units has already been proposed by XYZ." or "Associations between relations in the relational calculus are not second-class citizens even though keys have to be introduced into the calculus, because of XYZ and you can read this here (link to ABC)."

Please respect that I write about Pile as I do. It reflects my best understanding of Pile and the relational calculus. If you find my knowledge or Pile´s claim lacking in anything, I´d be happy to hear from you the exact (!) locations in my text (or theirs) where you´re dissatisfied - including hints, as exact as possible, to more substantial claims.

I´ve to admit I find it tiresome to hear "read up on XYZ". Even though I admit I don´t know everything about the relational calculus or set theory or algorithmic complexity and what not and am very (!) willing to learn - I´m reluctant to follow such sweeping criticism.

To point out errors or deficits in my writing is perfectly fine with me. But then, please, be as accurate as you ask me to be.

That said, I´m looking forward to specific criticism from you or anyone else.

And by the way: I prefer to know my critics by name. So please divulge your identity - otherwise I have to assume you have a commercial interest in discrediting Pile. So, who is "omen"?
