Content item identity in Orchard and DecentCMS

Thursday, February 26, 2015

Identity is a funny thing. It’s one of those concepts that we use all the time, but that are tremendously difficult to pin down precisely. To keep things “simple”, in philosophy and in physics, it’s about equivalence relations. In computer science, I’d say it’s more like a bijection between two categories of objects. The catch is that you can only approximate real identity this way: by definition, an object is only identical to itself. What we call identity in code really is a proxy for identity, a substitute for it that usually takes fewer bits to represent than the object itself. The difficulty when building an identity algorithm is twofold: you need to be able to deterministically extract an id from an object, and the id needs to be different when the objects are different. In other words, identical objects yield identical ids, and distinct objects yield distinct ids. Of course, mutable objects make this more complicated. The constraints are in fact very close to those of a good hashing algorithm.
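To make the hashing analogy concrete, here is a minimal Python sketch of an intrinsic identity: the id is derived by hashing a canonical JSON serialization of the object. `intrinsic_id` is a hypothetical helper, not something Orchard or DecentCMS provides. It meets both constraints, and it also shows the mutable-object complication: editing the object changes its id.

```python
import hashlib
import json


def intrinsic_id(obj: dict) -> str:
    """Derive an id from the object's own content (hypothetical helper).

    Serializing to canonical JSON (sorted keys, fixed separators) makes the
    bytes, and therefore the hash, identical for equal objects.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    # SHA-256 is deterministic, and distinct inputs collide only with
    # negligible probability: the same trade-off an id scheme needs.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"title": "Hello", "body": "World"}
b = {"body": "World", "title": "Hello"}  # same content, different key order
c = {"title": "Hello", "body": "World!"}

assert intrinsic_id(a) == intrinsic_id(b)  # same objects -> same ids
assert intrinsic_id(a) != intrinsic_id(c)  # distinct objects -> distinct ids

# The mutable-object complication: edit the object and its "identity" moves.
a["body"] = "Edited"
assert intrinsic_id(a) != intrinsic_id(b)
```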

Databases typically solve the problem by assigning identity to objects rather than inferring it. Uniqueness of ids guarantees that distinct objects have distinct ids because, well, all ids are distinct. That works well until you need to go beyond the confines of that particular database and exchange information with other databases, with conflict resolution, such as when importing and exporting content between Orchard instances. In those cases, instead of local integer ids, you need something truly globally unique, such as guids, or a concept of identity that’s intrinsic to the object.
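A toy Python illustration of why local integer ids stop working at a database’s boundary. `LocalDb` is a hypothetical stand-in for any store with auto-incremented keys, not Orchard’s persistence layer, and the merge is deliberately naive, with no conflict resolution.

```python
import itertools
import uuid


class LocalDb:
    """Toy database that assigns auto-incremented integer ids (hypothetical)."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.items: dict = {}

    def add(self, content: str) -> int:
        item_id = next(self._next_id)
        self.items[item_id] = content
        return item_id


site_a, site_b = LocalDb(), LocalDb()
site_a.add("About us")
site_b.add("Contact")

# Each site independently handed out id 1 to an unrelated item, so a naive
# merge of site_b into site_a silently overwrites "About us".
merged_local = {**site_a.items, **site_b.items}
assert merged_local == {1: "Contact"}

# Random (version 4) UUIDs need no coordination and collide with negligible
# probability, so the same merge keeps both items.
guid_a = {uuid.uuid4(): "About us"}
guid_b = {uuid.uuid4(): "Contact"}
merged_guid = {**guid_a, **guid_b}
assert sorted(merged_guid.values()) == ["About us", "Contact"]
```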

In Orchard, content items do have a database integer id that is used internally, but import/export uses a different concept of id. Like almost everything in Orchard, the identity algorithm leverages the idea of composition. The idea and the identity string format are similar to identity in the X.500 directory services standard. Each part in a content item can contribute a nugget of identity, and the item’s identity is composed from the list of those nuggets. In cases where no part provides a clear identity, a special part that uses a guid as the identity can be added.
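A minimal Python sketch of the composition idea. It is not Orchard’s C# API: modeling parts as functions, the key names (`alias`, `Identifier`), and the exact `/key=value` rendering are assumptions meant only to echo the X.500-like format.

```python
from typing import Callable, List, Optional, Tuple

# A "part" is modeled as a function that may contribute one (key, value)
# nugget of identity for an item, or None if it has nothing to offer.
IdentityPart = Callable[[dict], Optional[Tuple[str, str]]]


def alias_part(item: dict) -> Optional[Tuple[str, str]]:
    alias = item.get("alias")
    return ("alias", alias) if alias else None


def identifier_part(item: dict) -> Optional[Tuple[str, str]]:
    # Stands in for the special guid-based identity part.
    identifier = item.get("identifier")
    return ("Identifier", identifier) if identifier else None


def content_identity(item: dict, parts: List[IdentityPart]) -> str:
    """Compose the item's identity from the nuggets its parts contribute."""
    nuggets = [nugget for nugget in (part(item) for part in parts) if nugget]
    # Render as X.500-like "/key=value" segments; "" means no identity at all.
    return "".join(f"/{key}={value}" for key, value in nuggets)


parts = [alias_part, identifier_part]

print(content_identity({"alias": "about-us"}, parts))  # /alias=about-us
print(content_identity({"identifier": "01b4c778fe90"}, parts))  # /Identifier=01b4c778fe90
print(repr(content_identity({"title": "Untitled"}, parts)))  # ''
```

The last item, a custom type with no identity-contributing part, simply gets an empty identity, and nothing fails at that point.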

This mostly works fine, but it is a little complex. More importantly, there is no guarantee that all objects will have an identity: if no part contributes identity, you’re out of luck. This happens quite frequently, because users who don’t know about all this also don’t know that custom types without the identity part won’t export properly. Worse, no error message tells them what went wrong: export seems to work, and import then fails in funny ways.

That’s why in DecentCMS, I’ve opted for a much simpler system that sits somewhere between local database integer ids and Orchard’s composite ids: every content item must have a unique id that is a human-readable string. For any content that has a publicly accessible URL, that id is simply that URL, relative to the site’s root. For other types of items, such as widgets, the id is determined directly or indirectly by the user, and the system enforces uniqueness. Global uniqueness is not, strictly speaking, guaranteed, but it is reasonably certain in practice. Locally, it is guaranteed.
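A minimal Python sketch of that scheme, assuming ids are plain strings held in a single in-memory store. `ContentRegistry`, `id_from_url`, the slash-stripping normalization, and the `widget:footer` id are hypothetical, not DecentCMS APIs.

```python
class ContentRegistry:
    """Hypothetical store that enforces local uniqueness of string ids."""

    def __init__(self) -> None:
        self._items: dict = {}

    def add(self, item_id: str, item: dict) -> None:
        if item_id in self._items:
            # Fail loudly rather than let two items share an identity.
            raise ValueError(f"Duplicate content item id: {item_id!r}")
        self._items[item_id] = item

    def get(self, item_id: str) -> dict:
        return self._items[item_id]


def id_from_url(url_path: str) -> str:
    """Use the item's public URL, relative to the site root, as its id.

    Stripping surrounding slashes is an assumed normalization.
    """
    return url_path.strip("/")


registry = ContentRegistry()
registry.add(id_from_url("/blog/content-item-identity/"), {"title": "Identity"})
registry.add("widget:footer", {"html": "..."})  # user-chosen id for a widget

try:
    registry.add("widget:footer", {"html": "another footer"})
except ValueError as err:
    print(err)  # Duplicate content item id: 'widget:footer'
```

Unlike the composite approach, every item here has an id by construction, and a clash is reported at the moment it happens.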

Identity-providing services implement the rules for going between item and identity, so extensible, collaborative identity is still achieved, but instead of mapping identity to parts, it maps to types of content storage. For example, there is a storage provider for widgets that knows how to find items with an id that starts with “widget:”, and another that knows where to find items with ids that look like relative URLs (the default). Once the system knows where to find the content item, the rest of the pipeline is indifferent to the type of id that was used, and all content items are equivalent.
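The dispatch can be sketched in Python as follows. The provider classes, the `can_handle`/`load` method names, and the relative-URL heuristic are assumptions for illustration, not DecentCMS’s actual service contract; only the “widget:” prefix and the relative-URL default come from the post.

```python
from typing import Dict, List, Protocol


class StorageProvider(Protocol):
    def can_handle(self, item_id: str) -> bool: ...
    def load(self, item_id: str) -> dict: ...


class WidgetStorage:
    """Finds items whose id starts with "widget:"."""

    PREFIX = "widget:"

    def __init__(self, widgets: Dict[str, dict]) -> None:
        self._widgets = widgets

    def can_handle(self, item_id: str) -> bool:
        return item_id.startswith(self.PREFIX)

    def load(self, item_id: str) -> dict:
        return self._widgets[item_id[len(self.PREFIX):]]


class UrlStorage:
    """Default: finds items whose id looks like a relative URL."""

    def __init__(self, pages: Dict[str, dict]) -> None:
        self._pages = pages

    def can_handle(self, item_id: str) -> bool:
        # Crude, assumed heuristic: no "scheme:" prefix means a relative URL.
        return ":" not in item_id

    def load(self, item_id: str) -> dict:
        return self._pages[item_id]


def load_item(item_id: str, providers: List[StorageProvider]) -> dict:
    """Ask each provider in turn; the first one that recognizes the id wins."""
    for provider in providers:
        if provider.can_handle(item_id):
            return provider.load(item_id)
    raise LookupError(f"No storage provider handles {item_id!r}")


providers: List[StorageProvider] = [
    WidgetStorage({"footer": {"type": "html-widget"}}),
    UrlStorage({"blog/identity": {"type": "blog-post"}}),
]

# From here on, the pipeline only sees a content item, whatever the id shape.
print(load_item("widget:footer", providers)["type"])  # html-widget
print(load_item("blog/identity", providers)["type"])  # blog-post
```

Adding a new kind of storage then means adding a provider, without touching the rest of the pipeline.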

In summary, in DecentCMS, there is only one type of identity, a string, and it is guaranteed that all content items have a unique identity. It’s simple, and it works.

2 Comments

With this type of identity, when implementing import/export between two sites, isn't there a high risk of identity collisions? Why not just use guids, or even better, sequential guids?

Richard Chamorro - Thursday, February 26, 2015 5:11:53 PM

Those are great questions.

If the id is reasonably derived (for example from the slug), the chances of collision are vanishingly small. And if two ids are the same, it is much more likely that the items actually are the same, which is what you want for export/import: you want to be able to import the same item multiple times without producing clones. The risk of collision is blown out of proportion: it is a theoretical possibility, but negligible in practice.
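A minimal Python sketch of why derived ids suit export/import, assuming ids are slugs computed from titles. `slug_id` and `import_items` are hypothetical helpers with a deliberately simple slug rule: importing upserts each item by its derived id, so re-importing updates items in place instead of cloning them, and an item that already exists on the target site under the same slug is treated as the same item.

```python
import re


def slug_id(title: str) -> str:
    """Hypothetical slug rule: lowercase, runs of other characters become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def import_items(site: dict, exported: list) -> None:
    for item in exported:
        # Upsert: an item whose id already exists is updated, not cloned.
        site[slug_id(item["title"])] = item


site = {slug_id("About us"): {"title": "About us", "body": "old"}}
export = [
    {"title": "About us", "body": "new"},
    {"title": "Contact", "body": "..."},
]

import_items(site, export)
import_items(site, export)  # importing the same export a second time

print(sorted(site))  # ['about-us', 'contact']
print(site["about-us"]["body"])  # new
```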

So why not guids? Well, first because guids, sequential or not, are an abomination. They are extremely human-unfriendly: humans cannot read, write, or remember them. Would you prefer this blog post's URL to be https://weblogs.asp.net/01B4C778-FE90-4E56-A4BE-951719BEFD88, or even https://168.62.43.5/01B4C778-FE90-4E56-A4BE-951719BEFD88, instead of https://weblogs.asp.net/bleroy/content-item-identity-in-orchard-and-decentcms? There's a reason why we moved away from that kind of URL and adopted friendly URLs (which, by the way, is a relatively recent development; for some reason, it took us a while to understand that).

For the same reasons, I think we should move away from numerical ids wherever these ids are going to be used by humans.

bleroy - Thursday, February 26, 2015 7:19:06 PM
