Solving the XSD versioning problem

The single biggest problem in the WS space today is data and service versioning. I've been thinking about this problem for years now, and I finally came to an answer that is simple, straightforward and plays by all the rules. The inspiration came from Henry Thompson's presentation at the XSD workshop that happened last year. I also reread some pieces of the XSD spec, where he dropped some hints about this model. My current, and I hope final, solution to this problem grew from there.

Most OO folks assume that an XML instance is associated with a single XSD. Most XML people do not. Rather, an XML instance may be valid according to a whole range of XSDs. In the case of versioning, those XSDs are related in a simple way. They all share the same target namespace. Each new version may add new top-level constructs (elements, types, groups, etc.) and extend existing constructs with optional content. The first sort of change is okay because it doesn't change the transitive closure of any existing constructs, so it won't break clients (with a little caveat that a wildcard matching the target namespace has to be assumed to match things that may be added in a future version). The second sort of change is okay because the extensions are optional. Instances created based on an earlier schema version don't have those elements, but that's okay. Any other changes to existing definitions require a new schema with a new target namespace.
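To make the two allowed kinds of change concrete, here is a minimal sketch in Python. It is not an XSD processor: the `Child` class, the dictionary-based schema model, and `is_compatible_revision` are all hypothetical names invented for illustration, and the sketch assumes that optional extensions are appended after the existing content of a construct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    """One child element in a construct's content (hypothetical model)."""
    name: str
    required: bool


# Hypothetical, simplified schema: construct name -> ordered child elements.
Schema = dict[str, list[Child]]


def is_compatible_revision(old: Schema, new: Schema) -> bool:
    """True if `new` only adds constructs or appends optional content."""
    for name, old_children in old.items():
        if name not in new:
            return False  # an existing construct was removed
        new_children = new[name]
        if new_children[:len(old_children)] != old_children:
            return False  # existing content was changed or reordered
        if any(c.required for c in new_children[len(old_children):]):
            return False  # an extension is required, not optional
    return True


v1 = {"Customer": [Child("Name", True)]}

# Same namespace is fine: one optional extension, one new top-level construct.
v2 = {
    "Customer": [Child("Name", True), Child("Email", False)],
    "Order": [Child("Id", True)],
}

# Needs a new target namespace: an existing definition was rewritten.
v3 = {"Customer": [Child("FirstName", True), Child("LastName", True)]}

print(is_compatible_revision(v1, v2))
print(is_compatible_revision(v1, v3))
```

A check like this could run when a new schema version is published, to decide whether it may keep the existing target namespace.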

The next problem is what to do about instances of a new version of the schema that are sent to a consumer built with an old version of the schema. In this case there may be extra elements that the consumer didn't know about when it was created. This case arises all the time with Web services, where a new service has to return data to the old client. The solution to this problem is for the client to ignore the extra data. Both the .NET XML-to-object mappers (XmlSerializer and DataContractSerializer) do this. JAXB 2.0 (which is used by JAX-WS) has an option to do this. In other words, if your code doesn't do this already, it will be able to soon.
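The "ignore the extra data" behavior is easy to picture with a small hand-written mapper. The sketch below uses Python's standard `xml.etree.ElementTree` rather than any of the serializers named above; the `Customer` class, its fields, and the sample element names are assumptions made up for the example, and namespaces are left out to keep it short.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """What a client built against the older schema knows about."""
    name: str
    email: Optional[str] = None


def read_customer(xml_text: str) -> Customer:
    root = ET.fromstring(xml_text)
    # Look up only the elements this version knows about.
    # Anything else in the instance is never touched, so it cannot break us.
    name = root.findtext("Name")
    if name is None:
        raise ValueError("missing required <Name>")
    return Customer(name=name, email=root.findtext("Email"))


# An instance produced by a newer service; <Loyalty> is unknown to this client.
newer = (
    "<Customer>"
    "<Name>John Doe</Name>"
    "<Email>john@example.com</Email>"
    "<Loyalty>gold</Loyalty>"
    "</Customer>"
)

customer = read_customer(newer)
print(customer)
```

The mapper asks for what it needs by name instead of walking every child and rejecting surprises, which is what makes it tolerant of newer instances.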

Ah, but what about schema validation? If an application built with the older version of the schema attempts to validate data sent based on the newer version of the schema, there may be extra elements in the instance that will cause validation to fail. The first attempt to fix this involved adding a wildcard to accept this extra content and then to introduce either hierarchical or inline delimiters to work around the determinism constraint of schema. The goal was to stop the schema validation error from occurring. This solution leads to schemas and marshalers that are too clever by half and create real issues for interoperability between toolkits. Luckily, there is another solution.

The XSD spec does not define validity as a single boolean value. Nor does it say how a system has to react to validation issues. When you validate a document, the processor tells you whether validation was attempted for a given element. If it was attempted, it tells you what schema component was used, and whether the element was valid, invalid, or unknown (because it isn't in the schema). You can decide what to do with that information.
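As a rough illustration of "you can decide what to do with that information", the sketch below models per-element outcomes and two possible reactions to them. The `Outcome` record and both policy functions are hypothetical; a real validator exposes this information through its own API. The three validity values follow the spec's valid / invalid / notKnown.

```python
from dataclasses import dataclass
from enum import Enum


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_KNOWN = "notKnown"  # e.g. no declaration found in the schema


@dataclass(frozen=True)
class Outcome:
    """What the processor reports for one element (hypothetical shape)."""
    element: str
    attempted: bool
    validity: Validity


def strict_ok(outcomes: list[Outcome]) -> bool:
    """Accept only if every element was found valid."""
    return all(o.validity is Validity.VALID for o in outcomes)


def lenient_ok(outcomes: list[Outcome]) -> bool:
    """Accept unless some element was actually found invalid."""
    return not any(o.validity is Validity.INVALID for o in outcomes)


# Outcomes for an instance that carries one element the schema lacks.
outcomes = [
    Outcome("Name", attempted=True, validity=Validity.VALID),
    Outcome("Loyalty", attempted=False, validity=Validity.NOT_KNOWN),
]

print(strict_ok(outcomes), lenient_ok(outcomes))
```

Both policies consume the same report; only the reaction differs.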

Most of the time, today, people build validation logic that treats anything other than valid elements as an error and throws an exception. But you could be more flexible. For instance, you could build a validator that, upon encountering an unknown element in a sequence, simply ignores it and all others in the sequence until the parent element's closing tag. “Ignore” could mean either don't throw an exception or actually filter that data out of the element stream. The important thing is that during validation, if you detect extra stuff, you let it slide. Of course, for the elements you do know about, you can validate their content up until you hit extra unexpected data. I have a .NET 2.0 implementation of this working now; hopefully I'll have time to clean it up and post it soon.
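Here is a minimal sketch of that forgiving behavior, written in Python with the standard `xml.etree.ElementTree`. It is not the .NET implementation mentioned above and it is not a real XSD validator: the `validate_children` function and the `(name, required, check)` tuples are hypothetical stand-ins for a sequence declaration, and the sketch assumes each child occurs at most once.

```python
import xml.etree.ElementTree as ET


def validate_children(parent, expected):
    """Validate known children in order; stop at the first unknown one.

    `expected` is an ordered list of (name, required, check) tuples.
    Returns (errors, ignored), where `ignored` lists the tags that were
    skipped from the first unknown element to the parent's closing tag.
    """
    known = {name for name, _, _ in expected}
    errors, ignored = [], []
    children = list(parent)
    pos = 0  # next position in the expected sequence
    for i, child in enumerate(children):
        if child.tag not in known:
            # Unknown element: let it and everything after it slide.
            ignored = [c.tag for c in children[i:]]
            break
        # Skip over expected entries that are absent from the instance.
        while pos < len(expected) and expected[pos][0] != child.tag:
            if expected[pos][1]:
                errors.append(f"missing required <{expected[pos][0]}>")
            pos += 1
        if pos == len(expected):
            errors.append(f"<{child.tag}> is out of order")
            break
        if not expected[pos][2](child.text or ""):
            errors.append(f"invalid content in <{child.tag}>")
        pos += 1
    # Required entries that never showed up before we stopped.
    errors += [f"missing required <{n}>" for n, req, _ in expected[pos:] if req]
    return errors, ignored


expected = [
    ("Name", True, lambda text: bool(text.strip())),
    ("Email", False, lambda text: "@" in text),
]
doc = ET.fromstring(
    "<Customer><Name>John Doe</Name><Loyalty>gold</Loyalty><Tier>2</Tier></Customer>"
)
errors, ignored = validate_children(doc, expected)
print(errors, ignored)
```

Known elements are still checked strictly, so an empty `<Name>` is reported as an error even when unknown elements follow it. Returning the ignored tags instead of dropping them silently leaves the caller free to either discard that data or keep it.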

So, my model for versioning comes down to these points:

  • A service can evolve its contract in a controlled way without breaking clients
  • Clients must assume that the contract they get is a snapshot in time and the service is free to evolve its contract in a controlled way
  • An application producing an XML instance should make sure it matches the schema that application is using
  • An application consuming an XML instance should assume it matches the schema that application is using plus additional elements
  • If an application consuming an XML instance wants to schema validate it, it should be forgiving in how it deals with unknown elements in the stream and should not simply throw exceptions

Really, this is simply another variation of Postel's law: be careful in what you produce and flexible with what you receive, which is the basis for all the successful distributed systems I know of.

I'm very happy with this model. I think it is pretty intuitive and it works with today's tools. It also doesn't attempt to twist instances around the XSD UPA requirement, nor does it attempt to change the semantics of XSD or XSD validation. It only argues for a different reaction to issues that arise during validation, which is totally reasonable.


Posted Apr 14 2006, 02:55 PM by tim-ewald

Comments

William Vambenepe wrote re: Solving the XSD versioning problem
on 04-14-2006 1:37 PM
Or just don't validate. Have a bunch of XPath statements that retrieve the info you need from the message. If they return something consistent with what you expect then you're in business, validation or not. If they don't, then you have a problem and can fail or take corrective action. I agree with you Tim that inviting failure with a yes/no validation is silly. But I would go further and in most cases not validate at all. The caveat is that you need to write your XPath statements with enough specificity that they match the right nodes at the right place, don't start them all with "//" (see http://h20325.www2.hp.com/blogs/vambenepe/archive/2006/01/24/671.html).
Raimond Brookman wrote Re: Solving the XSD versioning problem
on 04-15-2006 5:12 AM
I see how this model can solve a bunch of the problems, but how to deal with removal of allowed elements?

Let's say we have a V1:
<Customer>
<Name>John Doe</Name>
</Customer>

In V2 it is discovered that it should actually be
<Customer>
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Customer>

I don't see how this case would fit with the proposed model.
Jason Haley wrote Interesting Finds
on 04-15-2006 5:48 AM
Erik Johnson wrote re: Solving the XSD versioning problem
on 04-15-2006 10:48 AM
One schema lots of people (well, me at least) seem keen to extend is XSD itself. I know the semi-normative schema the W3C published for the XSD spec is itself not a valid schema document. But is there any chance your bits would validate a schema document instance that's been, um, enhanced?

If so, that's very cool and please tell Foliage to send the invoice to the W3C.
Chris Kirby's Inner Monolog wrote The schema and service versioning dilemma
on 04-17-2006 12:30 PM
<<Solving the XSD versioning problem I would agree, as with most xml and web service developers,...
Dan Kearns wrote re: Solving the XSD versioning problem
on 04-18-2006 10:40 AM
It will be a fine day when the tooling and programming environments get good support for this sort of thing. DataContract is certainly a good start.

One interesting side effect of this approach is that you need to reconsider when to include wildcards in your schemas. In particular, it frees you up to use wildcards purely to denote additive extensibility points, since versioning is now a clearly separate concept. Goodbye, ugly russian-doll wildcard schemas!

Another side effect is that when a schema is thought of as a temporal artifact, the urge to waste the usefulness of XSD by writing goofy schemas for generic name-value pairs and property sets also starts to go away, and that in turn makes transformation tooling a much much easier problem because the XML is much less abstract, more human-readable, etc.
Christopher Steen wrote Link Listing - April 18, 2006
on 04-18-2006 7:32 PM
"Atlas" Toolkit - Using an Image in a
CollapsiblePanelExtender [Via: Robert
McLaws ]
A few thoughts...
XML Nation wrote Making everything optional
on 04-20-2006 12:51 PM
From 9 till 2 wrote Putting in the Potent
on 04-22-2006 2:49 AM
XML Nation wrote Initial code for version-aware schema validation
on 04-25-2006 12:22 PM
David Orchard wrote re: Solving the XSD versioning problem
on 04-27-2006 12:22 PM
Good stuff Tim. One thing I'd point out is that there are two types of ignoring: discarding or retaining.

http://www.pacificspirit.com/blog/2006/03/17/how_much_do_i_ignore_thee_discard_or_retain
XML Nation wrote Versioning and semantic changes
on 05-05-2006 12:27 PM
Dare Obasanjo aka Carnage4Life wrote Tim Ewald on Versioning XML Web Services with XSD
on 05-14-2006 4:53 PM
Grammar Police wrote re: Solving the XSD versioning problem
on 08-30-2006 9:02 AM
it's = it is
its = belonging to it
Ken Brubaker wrote Tim Ewald's solution for XML Schema versioning
on 11-28-2006 7:55 AM
Tim Ewald addresses the XML Schema versioning issue head on.
