donderdag 23 augustus 2012

XML anti-pattern



At my current project we decided to store information in XML, and not in a relational database. Well, actuall we decided to store the information in XML-column in SQL Server. This was a bad idea. Let me tell you why.

As an example let's say we have an Order and it has properties like this:
   CreatedOnDate,
   CreatedByUserId,
   Status,
   StatusDate,
   CreationChannel,
   OrderType
   etc.

Most of us would create a table Orders with columns CreatedOnDate, CreatedByUserId, Status, StatusDate, CreationChannel,   OrderType. We created a table Orders, with an Id column Id and an XML column (datatype XML). And in the XML column we put strings like this:

<order>
   <CreatedOnDate>2012-01-01T00:00:00<CreatedOnDate>
   <CreatedByUserId>12345678-ABCD-1234-ABCD-12345678ABCD</CreatedByUserId>
   <Status>...<Status>
   ...
</order>

What's so bad about it?

1. Creating an Xml Schema
To be able to search the XML Column, it has to have an XML Schema on it. In our case the developers made this schema. But they don't do that daily, so it costs them a lot of time to figure out how to do it. As apposed to making a relational database schema, which they do daily.
We also could have chosen to let the dedicated Xml-team create the schema. But that will make the project dependend on another team, and will have impact on the lead time.

2. Maintaining an Xml Schema
Let's think for a moment about when the system goes in production. There will come a moment when a not-backwards-compatible change will be made to the schema. The schema of the Xml column can not be changed, because the current data will not fit into the schema. Now what? The solution is to take the schema of the column, alter the xml data in such a way that it will fit into the new schema, and finally put the new schema onto the Xml column. In a relational database it's one statement to update all data in a column, but with Xml this will be a lot more work.

3. Goodbye Entity Framework
You can't query an Xml column with Entity Framework. Sure, you can select a record which contains Xml data, but you can not do an XQuery from EF. So, getting all orders which have the status X must be done through a stored procedure. This is more difficult, errorprone and timeconsuming. The increase in lead time was much higher than anticipated. It's remarkable how much can go wrong in writing stored procedures and calling them from Entity Framework.

4. XQuery/XPath versus SQL
Most developers are pretty good at writing SQL queries. But writing XQueries to search in Xml data is a lot more difficult.

5. Less data integrity
In Xml you have no foreign keys. So there's no way to guarantee on database level that any relational data is correct. I can easily insert an order with statusId 73 in the database, even though we have no status in the status table with id 73. In a relational model this can easily be solved with a foreign key constraint.

6. Serialize / deserialize
We need code to serialize and deserialize Xml to and from objects. More code, more things to figure out, more errors, more unit tests, higher lead time.

I don't think we will ever use XML again to store relational data in SQL Server. XML is good for config-files and structured data like documents, but not suitable for relational data.