PowerShell offers support for XML data directly though its [xml] datatype and this article is going to look at that.
First of all to make use of the builtin [xml] datatype just prefix a variable name with [xml] when you assign something to it.
PS C:\demos> [xml]$order = "<order customer='joe'>
>> <line price='3.00' qty='4'>nut</line>
>> <line price='1.00' qty='3'>hammer</line>
>> </order>"
>>
At first glance this appears to be an ordinary assignment of a string to a variable. But it is doing more than that, it is checking the string to make sure that it is well-formed XML, i.e. text that conforms to the XML specification. For example the following will produce an error
PS C:\demos> [xml]$xdata = "<order ><line>stuff</line></Order>"
Cannot convert value "<order ><line>stuff</line><Order>" to type "System.Xml.Xm
lDocument". Error: "Unexpected end of file has occurred. The following elements
are not closed: Order, order. Line 1, position 34."
At line:1 char:12
+ [xml]$xdata <<<< = "<order ><line>stuff</line><Order>"
because of a mis-match between the opening <order> tag and the closing <\Order> tag. So when you use an [xml] variable in PowerShell you can be sure that what it contains is XML.
So $order is an [xml] variable, what can we do with it? One of the nice things about PowerShell is that if you are not sure what you can do with a varable you can always ask by piping the variable to the get-member function.
PS C:\demos> $order | get-member
TypeName: System.Xml.XmlDocument
Name MemberType Definition
---- ---------- ----------
ToString CodeMethod static System.String XmlNo...
add_NodeChanged Method System.Void add_NodeChange...
add_NodeChanging Method System.Void add_NodeChangi...
add_NodeInserted Method System.Void add_NodeInsert...
add_NodeInserting Method System.Void add_NodeInsert...
add_NodeRemoved Method System.Void add_NodeRemove...
add_NodeRemoving Method System.Void add_NodeRemovi...
AppendChild Method System.Xml.XmlNode AppendC...
Clone Method System.Xml.XmlNode Clone()
CloneNode Method System.Xml.XmlNode CloneNo...
CreateAttribute Method System.Xml.XmlAttribute Cr...
CreateCDataSection Method System.Xml.XmlCDataSection...
CreateComment Method System.Xml.XmlComment Crea...
CreateDocumentFragment Method System.Xml.XmlDocumentFrag...
CreateDocumentType Method System.Xml.XmlDocumentType...
CreateElement Method System.Xml.XmlElement Crea...
CreateEntityReference Method System.Xml.XmlEntityRefere...
CreateNavigator Method System.Xml.XPath.XPathNavi...
CreateNode Method System.Xml.XmlNode CreateN...
CreateProcessingInstruction Method System.Xml.XmlProcessingIn...
CreateSignificantWhitespace Method System.Xml.XmlSignificantW...
CreateTextNode Method System.Xml.XmlText CreateT...
CreateWhitespace Method System.Xml.XmlWhitespace C...
CreateXmlDeclaration Method System.Xml.XmlDeclaration ...
Equals Method System.Boolean Equals(Obje...
GetElementById Method System.Xml.XmlElement GetE...
GetElementsByTagName Method System.Xml.XmlNodeList Get...
GetEnumerator Method System.Collections.IEnumer...
GetHashCode Method System.Int32 GetHashCode()
GetNamespaceOfPrefix Method System.String GetNamespace...
GetPrefixOfNamespace Method System.String GetPrefixOfN...
GetType Method System.Type GetType()
get_Attributes Method System.Xml.XmlAttributeCol...
get_BaseURI Method System.String get_BaseURI()
get_ChildNodes Method System.Xml.XmlNodeList get...
get_DocumentElement Method System.Xml.XmlElement get_...
get_DocumentType Method System.Xml.XmlDocumentType...
get_FirstChild Method System.Xml.XmlNode get_Fir...
get_HasChildNodes Method System.Boolean get_HasChil...
get_Implementation Method System.Xml.XmlImplementati...
get_InnerText Method System.String get_InnerText()
get_InnerXml Method System.String get_InnerXml()
get_IsReadOnly Method System.Boolean get_IsReadO...
get_Item Method System.Xml.XmlElement get_...
get_LastChild Method System.Xml.XmlNode get_Las...
get_LocalName Method System.String get_LocalName()
get_Name Method System.String get_Name()
get_NamespaceURI Method System.String get_Namespac...
get_NameTable Method System.Xml.XmlNameTable ge...
get_NextSibling Method System.Xml.XmlNode get_Nex...
get_NodeType Method System.Xml.XmlNodeType get...
get_OuterXml Method System.String get_OuterXml()
get_OwnerDocument Method System.Xml.XmlDocument get...
get_ParentNode Method System.Xml.XmlNode get_Par...
get_Prefix Method System.String get_Prefix()
get_PreserveWhitespace Method System.Boolean get_Preserv...
get_PreviousSibling Method System.Xml.XmlNode get_Pre...
get_SchemaInfo Method System.Xml.Schema.IXmlSche...
get_Schemas Method System.Xml.Schema.XmlSchem...
get_Value Method System.String get_Value()
ImportNode Method System.Xml.XmlNode ImportN...
InsertAfter Method System.Xml.XmlNode InsertA...
InsertBefore Method System.Xml.XmlNode InsertB...
Load Method System.Void Load(Stream in...
LoadXml Method System.Void LoadXml(String...
Normalize Method System.Void Normalize()
PrependChild Method System.Xml.XmlNode Prepend...
ReadNode Method System.Xml.XmlNode ReadNod...
RemoveAll Method System.Void RemoveAll()
RemoveChild Method System.Xml.XmlNode RemoveC...
remove_NodeChanged Method System.Void remove_NodeCha...
remove_NodeChanging Method System.Void remove_NodeCha...
remove_NodeInserted Method System.Void remove_NodeIns...
remove_NodeInserting Method System.Void remove_NodeIns...
remove_NodeRemoved Method System.Void remove_NodeRem...
remove_NodeRemoving Method System.Void remove_NodeRem...
ReplaceChild Method System.Xml.XmlNode Replace...
Save Method System.Void Save(String fi...
SelectNodes Method System.Xml.XmlNodeList Sel...
SelectSingleNode Method System.Xml.XmlNode SelectS...
set_InnerText Method System.Void set_InnerText(...
set_InnerXml Method System.Void set_InnerXml(S...
set_Prefix Method System.Void set_Prefix(Str...
set_PreserveWhitespace Method System.Void set_PreserveWh...
set_Schemas Method System.Void set_Schemas(Xm...
set_Value Method System.Void set_Value(Stri...
set_XmlResolver Method System.Void set_XmlResolve...
Supports Method System.Boolean Supports(St...
Validate Method System.Void Validate(Valid...
WriteContentTo Method System.Void WriteContentTo...
WriteTo Method System.Void WriteTo(XmlWri...
Item ParameterizedProperty System.Xml.XmlElement Item...
order Property System.Xml.XmlElement orde...
PS C:\demos>
Notice that an [xml] PowerShell variable is an .NET XmlDocument. So everything you know about an XmlDocument can be applied to a PowerShell [xml] variable. In a later article we will look at how you can leverage this fact, but for now we will stick to the XML support that is directly part of PowerShell.
Look closely at the information returned by get-member… notice there is a property named “order” of type XmlElement. If you lookup XmlDocument in MSDN you will not find an order property for it, this is something that PowerShell has added.
The order property is in fact the root, or document, element for the $order document. PowerShell has a simple path language for XML that makes use of the additional properties that PowerShell adds to XML elements. This language is not XPath but still very useful. The simplest thing you can do with an XML variable is ask what it contains:
PS C:\demos> $order
order
-----
order
PS C:\demos>
It produces a table that shows the children of the $order variable. It shows that the $order variable has a single child named “order” whose value is “order”. Since we saw that order is a property of the $order variable we can append it with a “.” to that variable to see what that contains:.
PS C:\demos> $order.order
customer line
-------- ----
joe {line, line}
PS C:\demos>
When PowerShell displays the children of an element in at table, or list, it show both the elements and attributes of that element.
If you prefer to see the children of an [xml] variable in list form you can pipe the results to the format-list function:
PS C:\demos> $order.order | format-list
customer : joe
line : {line, line}
Here we see that line has a number of lines as children.
We can extend this dotted property syntax further and find the value for customer:
PS C:\demos> $order.order.customer joe PS C:\demos>
We can tell that this is just a plain old value because the telltale “——-” used to separate the names of properties from their values is not there. Likewise if we had used a list form the name : is not there.
Now let’s use the dotted property syntax to see what the value of line is.
PS C:\demos> $order.order.line
price qty #text
----- --- -----
3.00 4 nut
4.00 2 hammer
Here we can see that PowerShell has output a two lines, one for each line element in the order element. price and qty are attributes of the line and the special name #text indicates that the line element contains some text.
Since there are a number of line elements we can refer to them individually using an array syntax.
PS C:\demos> $order.order.line[0]
price qty #text
----- --- -----
3.00 4 nut
PS C:\demos> $order.order.line[1]
price qty #text
----- --- -----
4.00 2 hammer
We can even ask how many line elements there are.
PS C:\demos> $order.order.line.Length
2
Under the covers $order.order.line is a .NET array and we can do with it anything we might do with a .NET array. If we want the price of the second line element…
PS C:\demos> $order.order.line[1].price
4.00
The dotted property syntax is actually a simple path language and is very similar to that used in data binding in ADO.NET, Windows Workflow, and Windows Presentation Framework. Now that we see the basics of this syntax lets see if we can do some calculations. Let’s calculate the value of the order.
PS C:\demos> $value = 0
PS C:\demos> $order.order.line | %{$value += [double]$_.qty * $_.price}
PS C:\demos> $value
20
Note that we had to cast the qty attribute to a double to force a numeric computation.
First we create a $value variable and initialize it to 0. Then we use the dotted property syntax to select the line elements in the $order variable. $order.order.line is actually a .NET array. When a .NET array is passed into a segment of a pipeline, as indicated by the | symbol, each item in the array is passed into that segment one at a time for processing.
The %{} syntax in the second segment of the pipeline says to execute what is between the braces once for each item that flows into the pipe. The $_ refers to the individual item that is flowing through that segment, in this case a line element. So as the line elements flow through the pipe the script between the braces calculates the extended price, i.e. the price times the qty, for the line element and accumulates that product into the $value element. Last we get the value of the $value element.
Next let’s do this same calculation again, but be more “pipeline” oriented.
PS C:\demos> $order.order.line | %{[double]$_.qty * $_.price} | &{begin{$v = 0}process{$v += $_}end{$v}}
20
This is making use of an in-line function, the part inside of &{}. It starts the same way the previous solution did, by passing the line elements into the pipeline. The second segment of the pipeline calculates the extended price for the line element and passes it onto the last segment.
The last segement is the in-line function and consists of three parts. The first part is inside of begin{} and is executed just once no matter how many elements pass through the pipeline and it initializes the $v variable to 0.
The second part of the function, the part inside of process{}, is executed once on each item that passes through the pipeline. It just adds each of the extended prices it receives to the $v variable.
The last part of the function, the part inside of the end{}, is also executed just once, but only after the process part is finished processing all of the items passing through the pipeline. All it does is refer to the $v so it is returned.
We can carry this even further. If we know we will be just summing up some numbers that pass though the pipeline we can change our in-line function into a named function and reuse it over and over.
PS C:\demos> function sum {
>> begin{$v = 0}
>> process{$v += $_}
>> end{$v}
>> }
>>
PS C:\demos> $order.order.line | %{[double]$_.qty * $_.price} | sum
20
So far our example has used literal XML, that is we have literally typed in the XML document we want to process. In real life you will probably have some XML files to process. There are a number of ways to read a file with PowerShell but the easiest it to use for our purposes is the ${} syntax. The file c:\demos\order.xml contains the same XML document we used to initialize $order in the previous examples.
PS C:\demos> [xml]$orders2 = ${c:\demos\order.xml}
PS C:\demos> $orders2
order
-----
order
PS C:\demos>
Note that the full path was specified.
Another source of XML data is SQL Server. It uses a FOR XML clause in a SELECT statement to return the results of a query in the form of XML. For example the following query is made against a table that contains information about last names there were filled out on United States census forms.
SELECT name, rank FROM LastNames FOR XML AUTO, ROOT('Names')
The LastNames table contains the names from the census forms along with their rank, ranked by how many forms contained that name. Now we have to somehow use PowerShell to execute a query against SQL Server. Besides its builtin functionality PowerShell can use any functionality provided by a .NET class. So we will use ADO.NET to execute the query. The tricky part, as we shall see in shortly, is how to stick the results into an [xml] PowerShell variable.
First we need a connection to SQL Server.
PS C:\demos> $conn = new-object System.Data.SqlClient.SqlConnection
PS C:\demos> $conn.ConnectionString = "server=.;integrated security=true;database=scratch"
PS C:\demos> $conn.Open()
To make the connection we use the PowerShell new-object function to create an instance of SqlConnection then fill out the connection string. Now we need a command.
PS C:\demos> $cmd = new-object System.Data.SqlClient.SqlCommand
PS C:\demos> $cmd.CommandText="SELECT name, rank FROM LastNames FOR XML AUTO,ROOT('Names')"
PS C:\demos> $cmd.Connection=$conn
Similar to making the connection we make the command by creating an instance of a SqlCommand and fill out the CommandText and Connection.
When ever a SqlCommand uses FOR XML it must be executed using ExecuteXmlReader, which returns its result in the form of an XmlReader.
PS C:\demos> $xrdr = $cmd.ExecuteXmlReader()
If were were writing a program in C# or VB.NET then at this point we would use the XmlReader to Load an XmlDocument. Wait! Remember what we found out when we used get-member to find out what we could do with an [xml] PowerShell variable? We found out that is was an XmlDocument. So we can just use the Load method on an [xml] varible load it with the results of our query. These is one problem though…
PS C:\demos> [xml]$names.Load($xrdr)
You cannot call a method on a null-valued expression.
At line:1 char:18
+ [xml]$names.Load( <<<< $xrdr)
A freshly created [xml] variable has not yet been initialized so we cannot use it to load anything. There are a couple of ways to intialize it. One is to just assign an XmlDocument to it.
PS C:\demos> [xml]$names = New-Object System.Xml.XmlDocument
PS C:\demos> [xml]$names.Load($xrdr)
PS C:\demos> $names
Names
-----
Names
Here we have use new-object to create an XmlDocument and then assign it to an [xml] variable. After we have done that the Load method can be used to read the results of our FOR XML query into the $order3 variable. The we have PowerShell to output the $name variable in tablular form. Here we can see it contains the expected ‘Names” root element.
Another way to initialize the [xml] varible is to just put some XML into it, then write over it when the Load method is used.
PS C:\demos> [xml]$names = '<a/>'
PS C:\demos> $names.Load($xrdr)
PS C:\demos> $names
Names
-----
Names
Now that we have our Names XML document let’s take a quick look inside of it.
PS C:\demos> $names.Names
LastNames
---------
{SMITH, JOHNSON, WILLIAMS, JONES...}
PS C:\demos> $names.Names.LastNames.Length
88799
PS C:\demos>
Here we can see that there are 88799 names the document and the first few are SMITH,JOHNSON and so on.
One last thing… there is a bug in the RC2 PowerShell that hopefully will be fixed by the time the RTM version is released. Elements whose name is “item” will cause problems when the table or list formatter processes them.
PS C:\demos> [xml]$stuff = '<stuff><item/><item/></stuf
PS C:\demos> $stuff
stuff
-----
stuff
PS C:\demos> $stuff.stuff
format-default : The member "Item" is already present.
You will still be able to process “item” element otherwise, it is just an issue when format-table or format-list is used.
We have seen the PowerShell has some pretty straightforward support for XML in its simple dotted path expressions. You can work with literal XML, XML from files and XML as it is returned from SQL Server in FOR XML queries. The PowerShell builtin path language that is very similar to that used in data binding which you may have already used. I didn’t cover it in this article, but PowerShell also allows you to make use of all the XML capabilities built into .NET… but that leaves me something to work on for the next article.
Dan
dan@pluralsight.com
Posted
Oct 28 2006, 02:16 PM
by
dan-sullivan