donderdag 23 augustus 2012

XML anti-pattern



At my current project we decided to store information in XML, and not in a relational database. Well, actuall we decided to store the information in XML-column in SQL Server. This was a bad idea. Let me tell you why.

As an example let's say we have an Order and it has properties like this:
   CreatedOnDate,
   CreatedByUserId,
   Status,
   StatusDate,
   CreationChannel,
   OrderType
   etc.

Most of us would create a table Orders with columns CreatedOnDate, CreatedByUserId, Status, StatusDate, CreationChannel,   OrderType. We created a table Orders, with an Id column Id and an XML column (datatype XML). And in the XML column we put strings like this:

<order>
   <CreatedOnDate>2012-01-01T00:00:00<CreatedOnDate>
   <CreatedByUserId>12345678-ABCD-1234-ABCD-12345678ABCD</CreatedByUserId>
   <Status>...<Status>
   ...
</order>

What's so bad about it?

1. Creating an Xml Schema
To be able to search the XML Column, it has to have an XML Schema on it. In our case the developers made this schema. But they don't do that daily, so it costs them a lot of time to figure out how to do it. As apposed to making a relational database schema, which they do daily.
We also could have chosen to let the dedicated Xml-team create the schema. But that will make the project dependend on another team, and will have impact on the lead time.

2. Maintaining an Xml Schema
Let's think for a moment about when the system goes in production. There will come a moment when a not-backwards-compatible change will be made to the schema. The schema of the Xml column can not be changed, because the current data will not fit into the schema. Now what? The solution is to take the schema of the column, alter the xml data in such a way that it will fit into the new schema, and finally put the new schema onto the Xml column. In a relational database it's one statement to update all data in a column, but with Xml this will be a lot more work.

3. Goodbye Entity Framework
You can't query an Xml column with Entity Framework. Sure, you can select a record which contains Xml data, but you can not do an XQuery from EF. So, getting all orders which have the status X must be done through a stored procedure. This is more difficult, errorprone and timeconsuming. The increase in lead time was much higher than anticipated. It's remarkable how much can go wrong in writing stored procedures and calling them from Entity Framework.

4. XQuery/XPath versus SQL
Most developers are pretty good at writing SQL queries. But writing XQueries to search in Xml data is a lot more difficult.

5. Less data integrity
In Xml you have no foreign keys. So there's no way to guarantee on database level that any relational data is correct. I can easily insert an order with statusId 73 in the database, even though we have no status in the status table with id 73. In a relational model this can easily be solved with a foreign key constraint.

6. Serialize / deserialize
We need code to serialize and deserialize Xml to and from objects. More code, more things to figure out, more errors, more unit tests, higher lead time.

I don't think we will ever use XML again to store relational data in SQL Server. XML is good for config-files and structured data like documents, but not suitable for relational data.

woensdag 18 juli 2012

XML in SQL Server - Indexes

You MUST add indexes to an XML column in SQL Server. If you don't do that, queries will soon become very slow. Just run these statements:


CREATE PRIMARY XML INDEX primaryidx ON [table](XmlColumnname)
GO


CREATE XML INDEX pathidx ON [table](XmlColumnname)
USING XML INDEX  primaryidx  FOR PATH
GO


CREATE XML INDEX valueidx ON [table](XmlColumnname)
USING XML INDEX  primaryidx  FOR VALUE
GO


CREATE XML INDEX propidx ON [table](XmlColumnname)
USING XML INDEX  primaryidx  FOR PROPERTY
GO

XML in SQL Server - Inner join


There are a few ways to do inner joins from an XML column to another table. This is the easy way which delivers very good performance:
select *
from [table1] t1
inner join [table2] t2
on  t1.[XmlColumn.exist('/myrootelement/subelement[.=sql:column("t2.id")]') = 1

zondag 15 juli 2012

XML in SQL Server - Query datetime


I have XML stored in a SQl database like this:



<myrootelement xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <product>something</product>
  <deliverydate>2024-12-26T21:39:24</deliverydate>
</myrootelement>



Let's say I have 10,000 records like the above, and want to get the count of all records with a deliverydate greater than the current date. How can we do that?


First make sure there's a correct schema attached to the column in which the xml is stored. Then XQuery can be used to select the records. The fastest way to query XML in SQL Server is to use the 'exist' function:



SELECT count(*) FROM [table]
WHERE [column].exist('/ myrootelement[deliverydate > xs:dateTime(''2012-07-15T00:00:00'')]') = 1



The function xs:dateTime() is used to parse a string to a datetime value.  Please note that two single quote signs are written, and not a double quote sign.


Using parameters
What if we don't want to use a hardcoded value, but a parameter? I thought that would be pretty straightforward, so I wrote this query:



declare @rcurrentdate datetime; 
set @currentdate = getdate()

SELECT 
count(*)  FROM [table]
WHERE [column].exist('/ myrootelement[deliverydate > xs:dateTime('' + @currentdate + '')]') = 1



But this resulted in an error:



Msg 9319, Level 16, State 1, Line 16
XQuery [Documenten.XmlStuurinformatie.exist()]: Static simple type validation: Invalid simple type value ' + @stringdate + '.

Changing the number of single quote signs didn't matter:

SELECT  count(*)  FROM [table]
WHERE [column].exist('/ myrootelement[deliverydate > xs:dateTime(' + @currentdate + ')]') = 1

Msg 8172, Level 16, State 1, Line 15
The argument 1 of the XML data type method "exist" must be a string literal.

or



SELECT  count(*)  FROM [table]
WHERE [column].exist('/ myrootelement[deliverydate > xs:dateTime(''' + @currentdate + ''')]') = 1

Msg 8172, Level 16, State 1, Line 15
The argument 1 of the XML data type method "exist" must be a string literal.



The right way
So how can we use a parameter in an XQuery? By telling the query to use a SQL parameter. For example:



declare @rcurrentdate datetime; 
set @currentdate = getdate()


SELECT  count(*)  FROM [table]
WHERE [column].exist('/ myrootelement[deliverydate > xs:dateTime(sql:variable("@currentdate"))]') = 1

dinsdag 3 juli 2012

XML Serialization with C#

There are a lot of ways to serialize and deserialize objects into xml strings. Some are easier than others, and some will give you headaches by unintelligible exceptions. In this blog I'll show you how I serialize and deserialize XML.

Let's first create basic object that we want to serialize:

public class MyObject
{
   public string Number{ get; set; }
   public string Status { get; set; }
   public bool IsSomething { get; set; }
}

When we want to serialize this object we basically have two serializers we can choose: XmlSerializer and DataContractSerializer. I prefer the latter. The best desciption of the differences between the two can be found here: XmlSerializer vs DataContractSerializer

When using DataContractSerializer we have to mark our object with DataContract, and all properties we want to serialize must be marked with DataMember, like this:

[DataContract]
public class MyObject
{
   [DataMember]
   public string Number{ get; set; }

   [DataMember]
   public string Status { get; set; }

   public bool IsSomething { get; set; }
}

As you can see the IsSomething property has no DataMember attribute, and therefore will not be serialized.

You can set the name of the class and properties in XML, like this:

[DataContract(Name = "myobject", Namespace = "")]
public class MyObject
{
   [DataMember(Name = "nr")]
   public long Number{ get; set; }

   [DataMember(Name = "status")]
   public string Status { get; set; }
}


Serialize object into xml string

The class MyObject can be serialized this way:

var myObject = new MyObject {Number = 97, Status = "complete"};

var stream = new MemoryStream();
var wr = XmlWriter.Create(stream);
var serializer = new DataContractSerializer(typeof (MyObject));
serializer.WriteObject(wr, myObject);
wr.Close();

Encoding utf8Encoder = new UTF8Encoding(false);
var result = utf8Encoder.GetString(stream.ToArray());

After running this code, result will look like this:

<?xml version="1.0" encoding="utf-8"?><myobject xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/ConsoleApplication1"><nr>97</ nr ><status>complete</status></myobject >

The code is pretty straightforward, except for one thing:

Encoding utf8Encoder = new UTF8Encoding(false);
var result = utf8Encoder.GetString(stream.ToArray());

Why don't I simply do this:

var result = Encoding.UTF8.GetString(stream.ToArray());

The code above will put a Byte Order Mark at the beginning of the xml string. It can't be seen in the Visual Studio debugger, but if you copy-paste the result into a text editor it will look like this:

?<?xml version="1.0" encod[...]

The ? is the Byte Order Mark. If you try to deserialize an xml string with a Byte Order Mark you will get this exception:

Data at the root level is invalid. Line 1, position 1


That why we use this code:

bool addByteOrderMark = false;
var utf8Encoder = new UTF8Encoding(addByteOrderMark);
var result = utf8Encoder.GetString(stream.ToArray());


Deserialize xml string into object

Let's begin with the xml string we serialized:



var xmlstring = "<?xml version=\"1.0\" encoding=\"utf-8\"?><myobject xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://schemas.datacontract.org/2004/07/ConsoleApplication1\"><nr>97</ nr ><status>complete</status></myobject >";


Now we can easily deserialize this into an object:

MyObject myObject;
using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(xmlstring)))
{
   var serializer = new DataContractSerializer(typeof (MyObject));
    myObject = (MyObject) serializer.ReadObject(ms);
}


Deserialize xml string into XDocument

Last thing I want to do is serialize an xml string into an XDocument, which we can query with Linq.

Let's start again with the xml string we serialized:

var xmlstring = "<?xml version=\"1.0\" encoding=\"utf-8\"?><myobject xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://schemas.datacontract.org/2004/07/ConsoleApplication1\"><nr>97</ nr ><status>complete</status></myobject >";

It's also pretty straightforward to deserialize this into an XDocument:


XDocument xDocument;

using (var ms = new MemoryStream(new UTF8Encoding(false).GetBytes(xmlString)))
{
   using (var reader = new XmlTextReader(ms))
   {
      xDocument = XDocument.Load(reader);
   }
}


And there you have your XDocument.

vrijdag 22 juni 2012

Execute a Stored Procedure with Entity Framework Code First

I ran into a problem when executing stored procedures with output parameters using Entity Framework Code First which lead to this exception:


The data reader has more than one field. Multiple fields are not valid for EDM primitive types.


Problem
This is my stored procedure:


CREATE PROCEDURE sp_test (@name varchar(30) OUTPUT)
AS
   SELECT @name = 'John Doe'
GO

As you can see I want to return one value as varchar/string.


This is how you call a stored procedure with EF Code First:




var sqlParameter = new SqlParameter{
       ParameterName = "UniekKenmerkOUT",
       Value = -1,
       DbType = DbType.String,
       Size = 30,
       Direction = ParameterDirection.Output};





var result = Context.Database
      .SqlQuery<String>("exec  sp_test @name = @name OUT",
                        sqlParameter);



When I execute the code I get this exception:


The data reader has more than one field. Multiple fields are not valid for EDM primitive types.




Solution
Entity Framework expects a return value. The stored procedure is only returning an output parameter, and that is not enough. We can fix it this way:



CREATE PROCEDURE sp_test (@name varchar(30) OUTPUT)
AS
   SELECT @name = 'John Doe'
   SELECT @name
GO



The first SELECT statement in the stored procedure is not returning a value. It's actually assigning a value to the output parameter. That's why we need a second select to really return the value. I think it's preferable to write it this way:



CREATE PROCEDURE sp_test (@name varchar(30) OUTPUT)
AS
   SET @name = 'John Doe'
   SELECT @name
GO



Without output parameters
As you can see the output parameter in the example above is not necessary, but it returns the exact same value as the select statement. So, we can rewrite the stored procedure like this:



CREATE PROCEDURE sp_test
AS
   SELECT 'John Doe'
GO



And then we can execute this stored procedure with Entity Framework Code First:



var result = Context.Database.SqlQuery<String>("exec  sp_test);

woensdag 16 mei 2012

SQL Server XML Column

If you want to search for text in an XML column you can't use the Like keyword, but you can use Contains:



SELECT [columns]
FROM TabvleWithXmlColumn
where [XmlColumnname].exist('/root/element[contains(., "[search text]")]') = 1

Be aware this is very slow when you have a lot of records.

woensdag 28 maart 2012

Sharding with SQL Azure Federations is expensive


Since Microsoft introduced it's now pricing model in February 2012 SQL Azure has become a lot cheaper. However, when you partition your databases, the advantages of the new pricing model are lost.


Let me give you an example:
1 database of 10 GB will cost $45.95 per month.
10 partitioned databases of 1 GB each will cost 10 x $9.99 = $99.90


In other words, it's more cost effective to have one big database then to divide the database into several smaller databases. Which is a pity, because with several smaller databases you'll get better performance and parallelism (each database has its own cpu cores, memory, etc.).


Let's look at a few more examples, and see that the price difference can get pretty big:


1 database of 50 GB will cost $125.87 per month
10 databases of 5 GB will cost $259.74 per month


1 database of 50 GB will cost $125.87 per month
50 databases of 1 GB will cost $499.50 per month


1 database of 150 GB will cost $225.77 per month
150 databases of 1 GB will cost $1498.50 per month


Bottom line is that the more efficient your software is (with regards to performance of queries), the less need there will be to split a database into partitions. Ergo, efficient software will keep costs down. One quick win is to not use Entity Framework in places where you want efficient queries, but write them by hand. If for example you want to update a record, Entity Framework always does a select query first and then the update query. This can be rewritten into one query if you do it manually.

maandag 26 maart 2012

Domain Driven Development - Business Logic in Domain Objects (Part 2)

In Part 1 we saw how we can move Business Logic into Domain Objects. It was an easy example with a parent and a list of child objects. Let's look at a bit more complicated example.

Let's take for example this Business Rule:
When a Proposition is closed, no more Tasks can be added to the Phases of the Proposition.

If we build a simple system these three objects can look like this in code:

public class Proposition
{
   public int Id { getset; }
   public bool IsClosed {  get  set ; }
   public List<Phase> Phases {  get  set ; }
}

public class Phase
{
   public int Id { getset; }
   public List<Task> Tasks {  get  set ; }
   public  Proposition Proposition {  get  set ; }

   public void AddTask(Task task)
   {
      if(Proposition.IsClosed)
      {
         throw new Exception("Proposition is closed.");
      }
      Tasks.Add(task);
   }
}

public class Task
{
    public  string Name {  get   set ; }
}

And we can interact with Phase and Task like this:

var phase = PhaseRepository.GetById(1);
phase.AddTask(new Task());


What is the problem with this code?
The problem with the code above is that a child object (Phase) is depending on a parent object (Proposition). Why is that a problem? Because if we add another property to Proposition that has to be added to the Business Rule, we now have to edit two objects:

public class Proposition
{
   ...
    public int Value { getset; } 
} 



public class Phase
{
   public void AddTask(Task task)
   {
      if(Proposition.IsClosed)
      {
         throw new Exception("Proposition is closed.");
      }


      if(Proposition.Value > 100)
      {
         throw new Exception("Propositionvalue is more than 100.");
      }


      Tasks.Add(task);
   }
}


What is the solution?
One solution is to move this Business Rule to the Proposition object, like this:


public class Proposition
{
   public int Id { getset; }
   public bool IsClosed {  get  set ; }
   public List<Phase> Phases {  get  set ; }


   public void AddTask(Task task,  int phaseId)
   {
      if(IsClosed)
      {
         throw new Exception("Proposition is closed.");
      }


      if(Value > 100)
      {
         throw new Exception("Propositionvalue is more than 100.");
      }

       Phases.Single(it => it.Id ==  phaseId ).AddTask(task);
   }
}



public class Phase
{
   public int Id { getset; }
   private List<Task> _tasks;
   public  ReadOnlyCollection<Task> ReadOnlytasks
   {
      get
      {
         return new ReadOnlyCollection<Task>(_tasks);
      }
   }

   internal void AddTask(Task task)
   {
      Tasks.Add(task);
   } 

}

Note that AddTask in the Phase class is made internal, so it's only accessible to Proposition, and not in for example an MVC controller, or a business logic layer.

zondag 25 maart 2012

Readonly collections with Entity Framework

If you want to use readonly collections inside Domain Objects (like I did here) and are using Entity Framework Code First you currently have a problem, because collections of type ReadOnlyCollection are not supported.

Luckily I've found a way to work around this.

Let's start with an object which has a  ReadOnlyCollection :

public class Phase
{
   public int Id { getset; }
   public bool IsClosed { getset; }

   private List<Task> _tasks;
   public  ReadOnlyCollection<Task> ReadOnlyTasks
   {
      get
      {
         return new ReadOnlyCollection<Task>(_tasks);
      }
   }
} 

The problem with the code above is that Entity Framework can't map to private fields and it can't fill a ReadOnlyCollection.
What we can do is create a custom List which behaves like a readonly collection. For starters this means that the Add method is disabled:

public class ReadOnlyList<T> : List<T>
{
   /// <summary>
   ///  Not supported, because it's not allowed to add
   ///  items to a readonly list
   /// </summary>
   public new void Add(T item)
   {
      throw new NotSupportedException();
   }
}

Now we can use this  ReadOnlyList in our Domain Object:

public class Phase
{
   public int Id { getset; }
   public bool IsClosed { getset; }

   public ReadOnlyList<Task> ReadOnlyTasks
   {
      get;
      private set;
   }
}  

One problem solved: Entity Framework is able to fill the ReadOnlyList we created.
But now the Add method isn't supported anymore, how can we fill this list ourself? We have to create another add method, but this time we make it internal:

public class ReadOnlyList<T> : List<T>
{
   /// <summary>
   ///  Not supported, because it's not allowed to add
   ///  items to a readonly list
   /// </summary>
   public new void Add(T item)
   {
      throw new NotSupportedException();
   }

   internal  void AddItem(T item)
   {
      base.Add(item);
   }
}

We can use the AddItem method like this:

public class Phase
{
   public int Id { getset; }
   public bool IsClosed { getset; }

   public ReadOnlyList<Task> ReadOnlyTasks
   {
      get;
      private set;
   }

   public void AddTask(Task task)
   {
      ReadOnlyTasks.AddItem(task);
   }
}  

So how do we use the Phase class after we wrote this code? For example an MVC controller can look like this:

public ActionResult Index() 
{
   var phase = PhaseRepository.GetById(1);
   phase.AddTask(new Task());
} 

What would happen if we call the Add method on the ReadOnlyTasks list, like this:

public ActionResult Index() 
{
   var phase = PhaseRepository.GetById(1);
   phase.ReadOnlyTasks.Add(new Task());
} 

The code above will throw an  NotSupportedException, just as we wrote. So it is indeed a readonly list. But it's not very nice that we can call the Add method, the code compiles without any problem, but at runtime it will fail. It would be much nicer if we can do something to the Add method so we can't call it. Unfortunately we can't make it private or internal, because the public Add method of the List we inherited will become available. But there's another solution: mark the method with the System.Obsolete parameter, like this:

public class ReadOnlyList<T> : List<T>
{
   /// <summary>
   ///  Not supported, because it's not allowed to add
   ///  items to a readonly list
   /// </summary>
   [Obsolete("Not supported, because it is not allowed to add items to a readonly list", true)]
   public new void Add(T item)
   {
      throw new NotSupportedException();
   }
   ...
}

Now, if we try to call the Add method from an MVC Controller, we see this:





This helps us a lot by not making the mistake of calling Add on the readonly list. And if still continue, and type something like this:

phase.Tasks.Add(new Task());

We will get an error at compile time, so our code will never run. We now can only add items to the readonly list by calling the AddTask method on the Phase class, exactly as we wanted.


Btw. this solution is not completely air tight, because we can still cast our Readonly list to a List and the Add method is back available again. But at least we have a pretty clean solution for Readonly lists with Entity Framework.