Java SE: XML Processing Fundamentals

XML is used in many systems for integration within an organization or across organizations. This course will teach you how to read and write documents using the various APIs available in Java, and will also cover document querying and validation.
Course info
Level
Intermediate
Updated
Oct 25, 2017
Duration
4h 14m
Table of contents
Course Overview
Overview and Getting Started with DOM Processing
Reading XML Documents with Streams and the DOM
Creating XML Documents
Validating XML with Schemas
Event-driven Processing and Filtering with StAX
Querying Documents with XPath
XML Object Binding with JAXB
Description

XML is a standard for storing data and integrating systems within and between organizations. In this course, Java SE: XML Processing Fundamentals, you'll learn the basics of reading and writing XML documents using the various APIs available. First, you'll discover the trade-offs among them, understanding which options offer better memory efficiency versus processing control. Next, you'll learn that systems working together need to ensure the data being passed is valid. Finally, you'll explore how to create XML schemas to document the allowable elements and use them to perform validations while reading documents. By the end of this course, you'll have a solid understanding of how to query into documents to efficiently find the subsets of data often needed when working with XML.

About the author

Mike is a solution architect for US Foods, having worked in technology for over 15 years holding roles focused on technical leadership, solution architecture, and enterprise architecture.

More from the author
Test-Driven Development Practices in Java
Intermediate
2h 54m
Apr 7, 2014
Section Introduction Transcripts

Course Overview
Hello, my name is Mike Nolan, and welcome to my course Java SE: XML Processing Fundamentals. I am a solution architect with US Foods, and have been working with Java technologies for more than 15 years. XML is a specification that defines rules for structuring documents and exchanging data in a descriptive manner. It is used in many systems for integration within an organization or across organizations. It provides a means for defining validity rules for these structures and for checking that documents conform to them at runtime. With Java being an enterprise-class programming language used for implementing systems, knowing its vast ecosystem of XML processing capabilities is important. There are many means for reading and writing documents, with each having a particular set of trade-offs. By the end of this course, you will know how to use each of the APIs for reading and writing documents. Some allow you to process at a very low level, pulling specific elements of a document, while others can bind a complete document in a very concise manner. You'll also understand the XML Schema specification and how to leverage it in Java to perform document validations. Additionally, we will cover the basics of XPath, a query language used for filtering XML documents to help simplify your coding. I hope you'll join me on this journey to learn XML processing in the Java SE: XML Processing Fundamentals course at Pluralsight.

Reading XML Documents with Streams and the DOM
In module one, we were introduced to the DOM approach for parsing XML documents. While an effective approach, there are drawbacks to using it. Java actually offers many different models for processing XML documents, with each having different pros and cons, typically related to processing efficiency, memory consumption, and level of programming control. In this module, we will compare the different models and demonstrate them. We will go through how DOM processes the document behind the scenes and discuss the considerations you'd need to understand when processing via this model. Stream processing provides a completely different approach to reading documents. It is typically more efficient, but comes with constraints that are important to understand. Within streams, there are two models you can work from. Push Stream Processing is the first one we'll discuss, which is covered under the Simple API for XML, or SAX. Second, we will go through Pull Stream Processing, which is done leveraging the Streaming API for XML, or StAX. We will compare the differences of the two models for stream processing and go through demos of each. We will then wrap up with a discussion of efficiency, walking through some alternative techniques that can help you out.
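
To make the push and pull distinction concrete, here is a minimal sketch rather than the course's own demo code. The class name StreamModels and both method names are made up for illustration. Each method counts how many times an element name appears: with SAX the parser calls back into a handler, while with the StAX cursor API the code asks for the next event itself.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not taken from the course material.
public class StreamModels {

    // Push model (SAX): the parser drives and invokes our callback per element.
    public static int countWithSax(String xml, String elementName) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // The default factory is not namespace-aware, so compare qName.
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }

    // Pull model (StAX cursor API): our loop asks the reader for each event.
    public static int countWithStax(String xml, String elementName) {
        int count = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int eventType = reader.next();
                if (eventType == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    count++;
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<orders><item/><item/><note/></orders>";
        System.out.println(countWithSax(xml, "item"));
        System.out.println(countWithStax(xml, "item"));
    }
}
```

Neither method builds a tree of the document, which is where the memory savings over DOM come from. The cost is that the code only sees one event at a time.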

Creating XML Documents
Most systems integrating with another need to send data at some point rather than just receiving it. So understanding how to create XML documents is important. In this module, we are going to cover a few different ways of doing so. First, we are going to cover the DOM approach to creating documents. The API is pretty similar to reading documents, but there are some subtle differences. Then we'll see how to do the same with StAX using the cursor API. Similar to reading documents, this is going to be the more efficient approach. But there are some trade-offs, which we will discuss. Lastly, we'll cover XML transformations, which provide a mechanism for being able to transform one XML document into another, without doing a whole lot of Java coding.
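
As a rough sketch of the first two approaches, the example below writes the same small document twice: once by building a DOM tree and serializing it with an identity Transformer, and once with the StAX cursor writer. The class name, method names, and the order/item document are assumptions chosen for illustration, not the course's sample data.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration class; not taken from the course material.
public class CreateDocuments {

    // DOM: build the whole tree in memory, then serialize it in one step.
    public static String writeWithDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element order = doc.createElement("order");
            order.setAttribute("id", "42");
            Element item = doc.createElement("item");
            item.setTextContent("Coffee");
            order.appendChild(item);
            doc.appendChild(order);

            // A Transformer with no stylesheet copies the source to the result.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM write failed", e);
        }
    }

    // StAX cursor API: emit the document piece by piece, in document order.
    public static String writeWithStax() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance()
                .createXMLStreamWriter(out);
            writer.writeStartElement("order");
            writer.writeAttribute("id", "42");
            writer.writeStartElement("item");
            writer.writeCharacters("Coffee");
            writer.writeEndElement(); // closes item
            writer.writeEndElement(); // closes order
            writer.writeEndDocument();
            writer.flush();
            writer.close();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("StAX write failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeWithDom());
        System.out.println(writeWithStax());
    }
}
```

The trade-off shows up in the shape of the code. The DOM version can add or rearrange nodes until the moment it serializes, while the StAX version must write everything in order and cannot go back.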

Validating XML with Schemas
Two systems integrating together need to ensure the data being passed is in a valid format to allow for smooth processing. This may include validations of overall document structure, element data types, and data constraints for field values. XML schemas allow you to support these validations when reading and writing XML. In this module, we will cover what they are, how to write them, and how to apply validations in your programs. First, we'll go through why validations are powerful. This includes a little history of XML versus other types of data structures previously used to integrate, some of the challenges, and how XML and schemas help resolve them. Next, we will go through validating the overall structure of an XML document and the basic data types. At this point, integrating validations into our program will be covered using DOM. Then namespaces will be explored and the different ways we can structure our schemas for reusability. After that, we will dive into adding custom data validations. We often need to enforce minimum and maximum lengths and data formatting. These are capabilities we'll cover here. Last, we'll cover validations as they relate to StAX. These aren't a native capability to the StAX API, so we'll talk about how to incorporate the standalone XML validation classes and the tradeoffs in using them.
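
The following sketch shows one way the standalone validation classes in javax.xml.validation can check a document against a schema. The schema, the order, sku, and quantity element names, and the length limits are all invented for illustration. The schema combines a structural rule (a fixed sequence of children), a built-in data type, and a custom length restriction.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical illustration class; the schema below is an assumed example.
public class SchemaValidation {

    private static final String ORDER_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='order'>"
      + "    <xs:complexType>"
      + "      <xs:sequence>"
      + "        <xs:element name='sku'>"
      + "          <xs:simpleType>"
      + "            <xs:restriction base='xs:string'>"
      + "              <xs:minLength value='3'/>"
      + "              <xs:maxLength value='8'/>"
      + "            </xs:restriction>"
      + "          </xs:simpleType>"
      + "        </xs:element>"
      + "        <xs:element name='quantity' type='xs:positiveInteger'/>"
      + "      </xs:sequence>"
      + "    </xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    // Compile the schema; a broken schema is a programming error here.
    private static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(ORDER_XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema could not be compiled", e);
        }
    }

    // Returns true when the document conforms to ORDER_XSD.
    public static boolean isValid(String xml) {
        Validator validator = loadSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no custom ErrorHandler set, validation errors are thrown.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><sku>ABC-1</sku><quantity>2</quantity></order>"));
        System.out.println(isValid("<order><sku>AB</sku><quantity>2</quantity></order>"));
    }
}
```

Because the Validator works on any Source, the same approach can sit in front of a StAX-based reader at the cost of an extra pass over the input.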

Event-driven Processing and Filtering with StAX
The StAX Cursor API offers the most efficient means for processing an XML document. But it does so in a manner where it is hard to track information regarding previous nodes processed. Information about previous nodes can be useful for complex document processing, and when needed, you have the option to leverage the StAX Event API. In this model, you query into an event reader to receive an object encapsulating the current node. While it's not as efficient as the cursor API, this path allows you the flexibility to track back through previous nodes when needed to understand your current processing context. In this module, we will start with an overview of the Event API. This will explore the event object model and compare this version of the API to the Cursor API. Then we will cover a concept within StAX known as filtering. In our processing to this point, the consumer of the reader has received all nodes directly. The filter changes this by allowing you to control which nodes and events are returned by the reader, allowing you to move logic out of your processing code when there are sections of the document which should be ignored from processing. Lastly, we'll cover data coalescing. All processing to this point of text within a node has assumed that a single query into the API will return a complete set of text between the start and end elements. This is not always the case, and we'll cover the settings to enable and disable coalescing.
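
The three ideas in this module can be sketched together in one small example, under some assumptions. The class name, the people/person/name document, and the choice of what to filter are all hypothetical. The reader is created with coalescing enabled, wrapped in a filter that drops comments and whitespace-only text, and then consumed through the Event API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration class; not taken from the course material.
public class EventApiSketch {

    // Collects the text of every <name> element in the document.
    public static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent text and CDATA into one event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

            // Keep only element boundaries and text that is not pure whitespace.
            EventFilter filter = event ->
                event.isStartElement()
                    || event.isEndElement()
                    || (event.isCharacters() && !event.asCharacters().isWhiteSpace());

            XMLEventReader reader = factory.createFilteredReader(
                factory.createXMLEventReader(new StringReader(xml)), filter);

            boolean inName = false;
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    inName = event.asStartElement().getName()
                        .getLocalPart().equals("name");
                    text.setLength(0);
                } else if (event.isCharacters() && inName) {
                    // Appending keeps the result correct even if a parser
                    // still delivers the text in more than one chunk.
                    text.append(event.asCharacters().getData());
                } else if (event.isEndElement() && inName) {
                    names.add(text.toString());
                    inName = false;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX event processing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<people><!-- ignored --><person><name>Ada <![CDATA[& Co]]>"
            + "</name></person>\n<person><name>Linus</name></person></people>";
        System.out.println(readNames(xml));
    }
}
```

The processing loop never checks for comments or blank text because the filter has already removed them, which is the point of moving that logic out of the consumer.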

Querying Documents with XPath
To this point, our DOM processing has involved loading the document and iterating through the root, or finding a list of nodes based on the tag name as a starting point, and subsequently iterating from each of them. This will work when you need to traverse a majority of the document. But what if you want to find a very specific set of records without having to loop through unnecessary parts of the document? Alternatively, maybe you need to summarize sets of values in a document or count the occurrences of a specific node. XPath is a query language we touched on briefly when discussing XSL. In this module, we will discuss the use of this in conjunction with DOM to perform powerful queries to narrow to very specific points in an XML document or summarize sets of data. First, we will start with the basic concepts of XPath queries, covering how to construct them, the core Java classes needed for their execution, and how the node's position in the document influences all of this. We're going to start with basic queries for finding nodes and reading fields. We'll do this without namespaces, as they require some explanation that muddies the basic understanding of queries. This will be covered over a series of demos. Once we have the basics down, and have seen filtering in action, we will add namespaces into the mix, showing how to map them into queries. XPath supports the ability to run functions on documents. This will be the third thing we discuss. Last, we'll talk through custom functions and variables when using XPath.
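
Here is a minimal sketch of the kind of query described above, run against a DOM document without namespaces. The class name, method names, and the orders document layout (an order element with id and status attributes and a total child) are assumptions made for illustration. One method uses a predicate to narrow to specific nodes, and the other uses the built-in sum function to summarize values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not taken from the course material.
public class XPathSketch {

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Finds the id attribute of every order whose status is "open".
    public static List<String> openOrderIds(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList ids = (NodeList) xpath.evaluate(
                "/orders/order[@status='open']/@id",
                parse(xml), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                result.add(ids.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    // Uses the XPath sum() function to total every order in one query.
    public static double totalOfAllOrders(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                "sum(/orders/order/total)", parse(xml), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders>"
            + "<order id='1' status='open'><total>10.5</total></order>"
            + "<order id='2' status='closed'><total>4.5</total></order>"
            + "<order id='3' status='open'><total>5</total></order>"
            + "</orders>";
        System.out.println(openOrderIds(xml));
        System.out.println(totalOfAllOrders(xml));
    }
}
```

Both queries start with a slash, so they are evaluated from the document root. A relative expression would instead be evaluated from whichever node is passed as the context, which is how a node's position influences the result.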

XML Object Binding with JAXB
As seen in previous modules, there are many different APIs and mechanisms within them for reading and writing XML documents. All so far have focused on manually querying into the document, or writing it, in a manner which is very low level. We have looked for tags by evaluating their names and namespaces and fetching their attributes and values to build out our domain objects or transform our domain objects into documents. This is powerful, but also quite verbose. The Java Architecture for XML Binding, or JAXB, offers a different approach. Instead of working with low-level APIs, you create a set of annotated classes that define how a document maps to fields in an object, allowing a generic class to consume the document and automatically translate the nodes and values into objects. This can greatly simplify your code at implementation time. It is a much less verbose approach. In this module, you will be introduced to this approach and API. First, we'll discuss what XML binding is. While covering this, we'll go through trade-offs between manually mapping the document and using binding. Then we will cover the generation of schema mappings, which are at the heart of binding. With the schema mapping in place, we'll see how you go about reading the document. After that, we'll see how you can read and edit a document and write it back out in a very simple manner. Last, we'll see how to do partial document binding. In this part of the module, we'll be combining a few things we've learned to this point: JAXB, DOM, and XPath.