XML - Just another TLA?
XML. Possibly the most highly vaunted TLA of the 21st century thus far. Religious wars continue on value-vs-attribute - even I'll join in on that argument given half a chance. Systems everywhere are burning CSV logs on pyres in the race for XML as an output format. Relational databases are being contorted to support the storage and querying of this data. At this rate, the angle bracket will outstrip the full stop as the most commonly used punctuation mark by the end of 2006. (OK, so I made the last one up, but it wouldn't surprise me.)
There are two key aspects to XML:
- It can represent arbitrary data.
- It can be interpreted by almost any system.
I'm sure that other people would argue that its self-describing nature, via schemas, is equally or more important. But this is my blog, and I don't think that's half as important. Even if it is self-describing, you still need to do something with it to get the data you want out - you still have to have your applications understand what's in it, and that still means coding.
The key point is that the angle brackets are irrelevant. In fact, the whole structure of XML is irrelevant - XML, in itself, is meaningless. Any format that allows the encoding of arbitrarily complex data in a uniform manner would suffice. The fact that it has nice bells and whistles like 7-bit encoding helps, of course, but it's not the key.
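To make that concrete, here is a minimal sketch in Python. The record, the field names and the choice of JSON as the "other format" are all my own assumptions for illustration. It just shows the same data arriving in two different encodings and ending up as the same structure once it's been read.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, encoded twice. Neither document comes from a real system.
XML_TEXT = "<person><name>Ada</name><city>London</city></person>"
JSON_TEXT = '{"name": "Ada", "city": "London"}'


def from_xml(text):
    """Read a flat XML record into a dict of tag -> text."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def from_json(text):
    """Read the same record from JSON."""
    return json.loads(text)


# Once parsed, the application sees identical data either way.
same = from_xml(XML_TEXT) == from_json(JSON_TEXT)
```

The angle brackets have gone by the time the application does anything useful, which is rather the point.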
What XML gives us is a means to transmit hierarchical data between locations and query it:
- It saves us from having to write object models.
- It saves us from having to create database schemas to represent everything.
- It saves us from having to write custom ways of interpreting data from external systems.
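As a rough illustration of the list above, the sketch below reads a small hierarchical document and queries it directly, with no object model, schema or custom parser in sight. The order document and its element names are invented for the example, and it uses Python's standard `xml.etree.ElementTree` module rather than anything specific to the systems I've been talking about.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; the structure is an assumption for illustration.
ORDER_XML = """
<order id="1001">
  <customer>Acme Ltd</customer>
  <items>
    <item sku="A-1" qty="2"><price>9.50</price></item>
    <item sku="B-7" qty="1"><price>20.00</price></item>
  </items>
</order>
""".strip()


def order_total(xml_text):
    """Sum qty * price over every item, using a path query on the raw tree."""
    root = ET.fromstring(xml_text)
    total = 0.0
    for item in root.findall("./items/item"):
        total += int(item.get("qty")) * float(item.findtext("price"))
    return total


def skus(xml_text):
    """List the SKU attribute of every item, wherever it sits in the tree."""
    root = ET.fromstring(xml_text)
    return [item.get("sku") for item in root.iter("item")]
```

Note that the code still has to know what an `item` is and that `qty` is a number. The format saves the plumbing, not the understanding.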
But if you look at that list above, you'll see that there are times when the inverse of each of those points may be really important. You may want an object model as it allows methods to be associated with data. You may want a database schema to allow efficient querying across large data-sets. You may want a more efficient means of communication with another system.
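Here is a small sketch of the first of those inverses, again in Python and again with invented names (`Item`, `line_total`, the sample document). It loads the XML into an object model purely so that behaviour can live alongside the data.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are assumptions.
ITEMS_XML = (
    "<items>"
    '<item sku="A-1" qty="2"><price>9.50</price></item>'
    '<item sku="B-7" qty="1"><price>20.00</price></item>'
    "</items>"
)


@dataclass
class Item:
    sku: str
    qty: int
    price: float

    def line_total(self):
        """Behaviour attached to the data, which raw XML cannot carry."""
        return self.qty * self.price


def items_from_xml(xml_text):
    """Map each <item> element onto an Item object."""
    root = ET.fromstring(xml_text)
    return [
        Item(e.get("sku"), int(e.get("qty")), float(e.findtext("price")))
        for e in root.iter("item")
    ]


items = items_from_xml(ITEMS_XML)
totals = [item.line_total() for item in items]
```

The cost is the mapping code in `items_from_xml`, which is exactly the overhead the previous list said XML saves us from.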
One of the problems with the XML - objects - DataSets - whatever-else relationship is that it's quite difficult to move a system built around one approach over to another. SOA will help if the service interfaces are carefully defined and shield the consumer from the internals of the system. Newer technologies such as SQL Server 2005 and other databases that have decent native support for XML blur the lines, as does XML-serialization of objects. The price to pay if the wrong option is chosen - the cost of change - is still very real, though.
The point of all of this? XML has earned its place in the world: it both saves us time when we don't need the benefits that additional overhead (databases, object models) may bring to what we're doing, and it allows for great machine-readable external communication. This is what it really stands for. But anyone who says they've created an entire enterprise infrastructure based entirely and unrelentingly around XML would give me cause for concern (and I've seen systems like this). It's a technology to be used judiciously, just like any other. There is no one technology that's good for everything - in the majority of cases the very benefits that each option offers us are also its drawbacks in another form.