Comparing XML in SQL Server 2005
Xml is not text so a literal compare of two xml documents may lead to a false negative, that is it may indicate that two documents are not equal when in fact they are. For example these two xml documents are the same:
<item x="1" y="2"/>
<item y="2" x="1"/>
because the order of attributes in xml is not signicant. SQL Server has not "built in" way to compare xml documents and in fact makes a point about the fact that it cannot compare xml documents. This is not a deficiency of SQL Server, it is just due to the fact that xml may not be treated as ordinary text.
You can try to compare to xml documents by converting them to a string or a varbinary type, but this will lead to false negatives. For example:
DECLARE @x1 xml
DECLARE @x2 xml
set @x1 = '<item x="1" y="2"/>'
set @x2 = '<item y="2" x="1"/>'
if CAST(@x1 AS VARBINARY(MAX)) = CAST(@x2 AS VARBINARY(MAX))
PRINT 'equal'
ELSE
PRINT 'not equal'
This prints out 'not equal'
Things get more interesting if an xml schema is involved. For example:
CREATE XML SCHEMA COLLECTION Test AS
'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace = "urn:test">
<xs:element name="item">
<xs:complexType>
<xs:attribute name="x" type="xs:int"/>
<xs:attribute name="y" type="xs:int"/>
</xs:complexType>
</xs:element>
</xs:schema>'
DECLARE @x1 xml(Test)
DECLARE @x2 xml
set @x1 = '<item xmlns="urn:test" x="00" y="2"/>'
set @x2 = '<item xmlns="urn:test" x="00" y="2"/>'
if CAST(@x1 AS VARBINARY(MAX)) = CAST(@x2 AS VARBINARY(MAX))
PRINT 'equal'
ELSE
PRINT 'not equal'
this prints out 'not equal' even though both @x1 and @x2 appear to be literally the same. This is due to the fact that SQL Server does not store literal form of the xml when it is strongly typed, it stores each of the simple types in the xml document in it's XML Schema canonical form. The conanical form of '00' for xs:int is '0'. So in this case the comparison is not equal becuase SQL Server is comparing '0' in the strongly typed @x1 to '00' in the weakly typed @x2.
In order to compare two xml documents the must both be converted to a canonical form (http://www.w3.org/TR/2001/REC-xml-c14n-20010315). You can then do a byte by byte comparison of the two canonical forms to determine if they are the same according to the XML 1.1 recomendation, but will not take into account the canonical represenation of data types from the XML Schema recomendation.
The XmlDsigC14NTransform class in the .NET Framwork can convert xml to canonical xml so that it can be compared. You can create a CLR user defined function that uses this class to do the comparison. For example:
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBoolean CompareXML(SqlXml x1, SqlXml x2)
{
if (x1.IsNull || x2.IsNull)
{
return SqlBoolean.Null;
}
XmlDocument xdoc1 = new XmlDocument();
xdoc1.Load(x1.CreateReader());
XmlDocument xdoc2 = new XmlDocument();
xdoc2.Load(x2.CreateReader());
XmlDsigC14NTransform xcanonical1 = new XmlDsigC14NTransform();
xcanonical1.LoadInput(xdoc1);
XmlDsigC14NTransform xcanonical2 = new XmlDsigC14NTransform();
xcanonical2.LoadInput(xdoc2);
// this returns the conanical form
Stream s1 = (Stream)xcanonical1.GetOutput(typeof(Stream));
Stream s2 = (Stream)xcanonical2.GetOutput(typeof(Stream));
// do byte by byte compare
if (s1.Length != s2.Length)
{
return new SqlBoolean(false);
}
for (Int32 index = 0; index < s1.Length; index++)
{
if (s1.ReadByte() != s2.ReadByte())
{
return new SqlBoolean(false);
}
}
return new SqlBoolean(true);
}
};
Then in T-SQL you can do the comparison:
DECLARE @x1 xml
DECLARE @x2 xml
set @x1 = '<item x="1" y="2"/>'
set @x2 = '<item y="2" x="1"/>'
DECLARE @b bit
set @b = dbo.CompareXml(@x1, @x2)
IF @b = 1
PRINT 'equal'
ELSE
PRINT 'not equal'
and this will print out 'equal' as expected.
Note that the assembly the contains the CompareXML function will have to be loaded with PERMISSION_SET = UNSAFE because XmlDsigC14NTransform.LoadInput may not be used from partially trusted assemblies.
If the following comparison is done:
DECLARE @x1 xml(Test)
DECLARE @x2 xml
set @x1 = '<item xmlns="urn:test" x="00" y="2"/>'
set @x2 = '<item xmlns="urn:test" x="00" y="2"/>'
if dbo.CompareXml(@x1,@x2) = 1
PRINT 'equal'
ELSE
PRINT 'not equal'
It will return 'not equal' because SQL Server does not store literal xml for a strongly typed xml variable. It stores strongly typed xml datatype in their XML Schema canonical form and this is '0' for '00' if the datatype is xs:int. Again this is expected because of the way that SQL Server treats strongly typed xml.
Dan
Posted
Sep 01 2006, 07:37 AM
by
dan-sullivan