February 11, 2006

XML: When to use it and when not to use it

XML is overused.

I'm not sure why. I suspect it is because XML is standardized and because of the prevalent attitude that anything with a W3C standard is good.

However, XML is often chosen to represent data or (worse) behavior when it is a poor fit. The clearest examples of this mistake are programming languages expressed in XML: Ant, NAnt, XSLT, and VoiceXML, to name a few.

XML is verbose and puts awkward constraints on syntax. The strained syntax of VoiceXML is an illustration. Writing VoiceXML -- something I've spent much time doing -- is tedious and error-prone. If you want a simple if/then/else construct, you need to type <if> ... <elseif/> ... <else/> ... </if>. The result looks more like hieroglyphics than readable code.

The VoiceXML creators could have delivered all the same features by making it an extension of an existing language. Developers wouldn’t have to learn a new syntax and wouldn’t give themselves carpal tunnel syndrome from typing all those <, >, and / characters.
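To make the contrast concrete, here is a minimal sketch. The VoiceXML-style fragment is illustrative only (the `total` variable and the prompt wording are made up), and Python merely stands in for "an existing language"; it is not something the VoiceXML authors proposed.

```python
# Illustrative VoiceXML-style conditional; `total` and the prompts are hypothetical.
VXML_CONDITIONAL = """\
<if cond="total &gt; 1000">
  <prompt>Large order.</prompt>
<elseif cond="total &gt; 100"/>
  <prompt>Medium order.</prompt>
<else/>
  <prompt>Small order.</prompt>
</if>
"""


def describe_order(total):
    """The same branching written in an ordinary language."""
    if total > 1000:
        return "Large order."
    elif total > 100:
        return "Medium order."
    else:
        return "Small order."


def markup_overhead(text):
    """Count the characters spent on angle brackets and slashes."""
    return sum(text.count(ch) for ch in "<>/")
```

Calling `markup_overhead(VXML_CONDITIONAL)` gives a rough measure of how much of the fragment is pure punctuation rather than logic.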

Another particularly odious example is passing complex data structures as method parameters. I've now run into several organizations whose corporate standard architecture requires you to create an XML document whenever you pass data to another logical layer.

I'm not talking about using Web Services or passing data across process boundaries -- I'm talking about passing, say, three strings in an XML document -- doSomething( Document document ) -- rather than declaring a method doSomething( string foo, string bar, string baz ). Onyx's CRM platform uses this antipattern as its integration framework.

This is such a bad idea I don't know where to start. For one, when you encode things in XML, you lose type information. You need to write logic that encodes your parameters in XML and then more logic to decode the parameters inside the target method. The extra statements obscure your code.

All this for no benefit that I can perceive.
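A rough sketch of the two styles, in Python for brevity. Every name here (`encode_args`, `do_something_xml`, `do_something`, `foo`, `bar`, `count`) is hypothetical; this is not the Onyx API, only an illustration of the extra encode/decode steps and of where the type gets lost. One parameter is an integer rather than a string so the loss is visible.

```python
import xml.etree.ElementTree as ET


def encode_args(foo, bar, count):
    """Caller side: pack the arguments into an XML document."""
    root = ET.Element("args")
    ET.SubElement(root, "foo").text = foo
    ET.SubElement(root, "bar").text = bar
    ET.SubElement(root, "count").text = str(count)  # the int becomes text here
    return root


def do_something_xml(document):
    """XML-parameter style: decode before any real work can happen."""
    foo = document.findtext("foo")
    bar = document.findtext("bar")
    count = int(document.findtext("count"))  # type must be restored by hand
    return f"{foo}-{bar}" * count


def do_something(foo: str, bar: str, count: int) -> str:
    """Plain-parameter style: the signature carries the names and types."""
    return f"{foo}-{bar}" * count
```

Both `do_something_xml(encode_args("a", "b", 2))` and `do_something("a", "b", 2)` compute the same thing, but only the second tells the reader, and any tooling, what it expects.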

XML is chosen for two main reasons, one good and one dubious.

The first reason -- and a good one -- is that there are numerous stable XML parsers. Free, readily available tool support is a good reason to select a technology, but it doesn't override all other concerns.

The dubious reason people select XML is that it is standard and allegedly non-proprietary. But XML is so weakly typed and amorphous that the real definition of any given format lies in its schema, and that schema can be proprietary. Microsoft Visio saves diagrams to XML, but that specific format is proprietary. It is solely owned and controlled by Microsoft.

To sum things up, here are some general rules on when and when not to use XML:

  • If you're representing logical statements -- a programming language -- never use XML.
  • If you need only simple, non-hierarchical data representation, don't use XML.
  • If you need complex data represented externally to your program (to a human or another unrelated system), consider XML. If you do select XML, thoroughly unit test the logic that persists and retrieves it.
  • If you need complex data represented only internally to your system, bias your decision away from XML. In Java and .NET, for example, you can binary serialize your object graph. Performance is better and you don’t have to write as much code.
  • Finally, if you're looking at some other application I haven't mentioned above, carefully consider the pros and cons of using XML. In general, the positives of XML are given too much weight and the negatives are overlooked.
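For the internal-only case, a sketch of binary serialization. Python's `pickle` module stands in here for Java or .NET binary serialization, and the `order` data is hypothetical, with nested containers playing the role of an object graph.

```python
import pickle
from datetime import date

# Hypothetical in-memory data; nested containers stand in for an object graph.
order = {
    "order_id": 7,
    "placed_on": date(2006, 2, 11),
    "items": [
        {"sku": "A-1", "quantity": 2, "unit_price": 9.5},
        {"sku": "B-4", "quantity": 1, "unit_price": 120.0},
    ],
}


def save(graph):
    """Serialize the whole graph in one call; no per-field mapping code."""
    return pickle.dumps(graph)


def load(blob):
    """Restore the graph with its original types. Only unpickle data you wrote."""
    return pickle.loads(blob)


restored = load(save(order))
```

The integers, floats, and the date come back as the same types they went in as, with no hand-written encoding or decoding in between.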

Posted by gsmith at February 11, 2006 08:34 AM

Comments

I went to this presentation where the guy was showing how XAML was the next big wave of XML to bleach the application landscape. He was demoing the Avalon SDK.

GUIs in XML.....Ugh.
It's the wrong direction to go.

Posted by: Avishek Sen Gupta at February 11, 2006 01:57 PM

I wonder whether rolling a YAML->XML converter might help in some situations where an XML schema readily makes documents unintelligible. There is a mismatch between the data models of the two formats, but some intelligent schema-specific mapping might help?

Posted by: James Webster at February 12, 2006 02:46 PM