Comparing XML in SQL Server 2005

Comparing XML in SQL Server 2005

Xml is not text so a literal compare of two xml documents may lead to a false negative, that is it may indicate that two documents are not equal when in fact they are. For example these two xml documents are the same:

 <item x="1" y="2"/>
 
 
 <item y="2" x="1"/>
 

because the order of attributes in xml is not signicant. SQL Server has not "built in" way to compare xml documents and in fact makes a point about the fact that it cannot compare xml documents. This is not a deficiency of SQL Server, it is just due to the fact that xml may not be treated as ordinary text.

You can try to compare to xml documents by converting them to a string or a varbinary type, but this will lead to false negatives. For example:

 DECLARE @x1 xml
 DECLARE @x2 xml
 
 set @x1 = '<item x="1" y="2"/>'
 set @x2 = '<item y="2" x="1"/>'
 if CAST(@x1 AS VARBINARY(MAX)) = CAST(@x2 AS VARBINARY(MAX))
 PRINT 'equal'
 ELSE
 PRINT 'not equal'
 

This prints out 'not equal'

Things get more interesting if an xml schema is involved. For example:

 CREATE XML SCHEMA COLLECTION Test AS
 '
 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 targetNamespace = "urn:test">
 <xs:element name="item">
 <xs:complexType>
 <xs:attribute name="x" type="xs:int"/>
 <xs:attribute name="y" type="xs:int"/>
 </xs:complexType>
 </xs:element>
 </xs:schema>'
 
 DECLARE @x1 xml(Test)
 DECLARE @x2 xml
 
 set @x1 = '<item xmlns="urn:test" x="00" y="2"/>'
 set @x2 = '<item xmlns="urn:test" x="00" y="2"/>'
 if CAST(@x1 AS VARBINARY(MAX)) = CAST(@x2 AS VARBINARY(MAX))
 PRINT 'equal'
 ELSE
 PRINT 'not equal'
 
 

this prints out 'not equal' even though both @x1 and @x2 appear to be literally the same. This is due to the fact that SQL Server does not store literal form of the xml when it is strongly typed, it stores each of the simple types in the xml document in it's XML Schema canonical form. The conanical form of '00' for xs:int is '0'. So in this case the comparison is not equal becuase SQL Server is comparing '0' in the strongly typed @x1 to '00' in the weakly typed @x2.

In order to compare two xml documents the must both be converted to a canonical form (http://www.w3.org/TR/2001/REC-xml-c14n-20010315). You can then do a byte by byte comparison of the two canonical forms to determine if they are the same according to the XML 1.1 recomendation, but will not take into account the canonical represenation of data types from the XML Schema recomendation.

The XmlDsigC14NTransform class in the .NET Framwork can convert xml to canonical xml so that it can be compared. You can create a CLR user defined function that uses this class to do the comparison. For example:

 public partial class UserDefinedFunctions
 {
  [Microsoft.SqlServer.Server.SqlFunction]
  public static SqlBoolean CompareXML(SqlXml x1, SqlXml x2)
  {
   if (x1.IsNull || x2.IsNull)
   {
    return SqlBoolean.Null;
   }
   XmlDocument xdoc1 = new XmlDocument();
   xdoc1.Load(x1.CreateReader());
   XmlDocument xdoc2 = new XmlDocument();
   xdoc2.Load(x2.CreateReader());
   XmlDsigC14NTransform xcanonical1 = new XmlDsigC14NTransform();
   xcanonical1.LoadInput(xdoc1);
   XmlDsigC14NTransform xcanonical2 = new XmlDsigC14NTransform();
   xcanonical2.LoadInput(xdoc2);
   // this returns the conanical form
   Stream s1 = (Stream)xcanonical1.GetOutput(typeof(Stream));
   Stream s2 = (Stream)xcanonical2.GetOutput(typeof(Stream));
   // do byte by byte compare
   if (s1.Length != s2.Length)
   {
    return new SqlBoolean(false);
   }
   for (Int32 index = 0; index < s1.Length; index++)
   {
    if (s1.ReadByte() != s2.ReadByte())
    {
     return new SqlBoolean(false);
    }
   }
   return new SqlBoolean(true);
  }
 };
 
 

Then in T-SQL you can do the comparison:

 DECLARE @x1 xml
 DECLARE @x2 xml
 
 set @x1 = '<item x="1" y="2"/>'
 set @x2 = '<item y="2" x="1"/>'
 DECLARE @b bit
 set @b = dbo.CompareXml(@x1, @x2)
 IF @b = 1
 PRINT 'equal'
 ELSE
 PRINT 'not equal'
 

and this will print out 'equal' as expected.

Note that the assembly the contains the CompareXML function will have to be loaded with PERMISSION_SET = UNSAFE because XmlDsigC14NTransform.LoadInput may not be used from partially trusted assemblies.

If the following comparison is done:

 
 DECLARE @x1 xml(Test)
 DECLARE @x2 xml
 
 set @x1 = '<item xmlns="urn:test" x="00" y="2"/>'
 set @x2 = '<item xmlns="urn:test" x="00" y="2"/>'
 if dbo.CompareXml(@x1,@x2) = 1
 PRINT 'equal'
 ELSE
 PRINT 'not equal'
 

It will return 'not equal' because SQL Server does not store literal xml for a strongly typed xml variable. It stores strongly typed xml datatype in their XML Schema canonical form and this is '0' for '00' if the datatype is xs:int. Again this is expected because of the way that SQL Server treats strongly typed xml.

Dan

Posted Sep 01 2006, 07:37 AM by dan-sullivan
Filed under: ,

Comments

Matthew Adams wrote re: Comparing XML in SQL Server 2005
on 09-01-2006 6:20 AM
The second example will always print out 'not equal' because of this:

PRINT 'not equal'
ELSE
PRINT 'not equal'

Of course, it continues to print out 'not equal' even if you correct the first part to 'equal'!
Matthew Adams wrote re: Comparing XML in SQL Server 2005
on 09-01-2006 2:33 PM
You've sneakily changed it now! (Thanks)
Jayaram Krishnaswamy wrote re: Comparing XML in SQL Server 2005
on 07-21-2007 12:44 PM
You can use an XML task in SSIS to do the comparison
Jacob Sebastian wrote re: Comparing XML in SQL Server 2005
on 09-14-2008 8:58 AM

Nice work! This is one of the links I give people when they ask me about comparing two XML values. I just wrote a TSQL function that compares two XML values. www.sqlserverandxml.com/.../xquery-lab-36-writing-tsql-function-to.html

Add a Comment

(required)  
(optional)
(required)  
Remember Me?