When XML is transformed into something else, in most cases XSLT comes into play. One of the challenges of XSLT is to select just the nodes you are interested in. This task is done by XPath, “a query language for selecting nodes from an XML document.”
However, it can be tedious to create an XPath expression, run the transformation, and check whether you got the expected result. After hours of debugging you find out: it’s the wrong XPath expression!
To make it easier: Test your XPath expressions in the internal xmllint shell!
Using Easy XPath Expressions
Generally, xmllint is known as a popular tool to validate your XML structure. Its internal shell is mostly unknown. With this shell you can run some spiffy XPath tests and check whether an expression returns exactly what you want. Let’s consider the following DocBook 4 document:
<book lang="en">
  <title>Dancing with Penguins</title>
  <bookinfo>
    <author>
      <firstname>Tux</firstname>
      <surname>Penguin</surname>
    </author>
  </bookinfo>
  <chapter id="know.penguins">
    <title>Getting to Know Penguins</title>
    <abstract>
      <para>Penguins are cute.</para>
    </abstract>
    <sect1>
      <title>The Head</title>
      <para>...</para>
    </sect1>
    <!-- A small comment -->
    <sect1 id="penguin.coat">
      <title>The Coat</title>
      <para>...</para>
    </sect1>
  </chapter>
</book>
The content is less important than the structure. To examine some XPath features of xmllint, we load the document into its shell using the --shell option:
xmllint --shell penguin-dance.xml
You first see the prompt:
/ >
The prompt shows you the path to your current node. After loading, you just see the root node, which is indicated as /, much like Linux path notation.
Use help to list all available commands. For this little post, we focus on the xpath command. It evaluates an XPath expression in the current context and prints the result. Let’s try an absolute XPath:
/ > xpath /book
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT book
    ATTRIBUTE lang
      TEXT
        content=en
Well, that was to be expected. The interesting part is, you can change the context. For example, we could change it to the first chapter:
/ > cd book/chapter
chapter >
Surprised we didn’t use an absolute XPath? Well, our context was already the root node, which contains the book node. In this case, it doesn’t matter whether we use a relative or an absolute XPath. Both lead to the same node. However, this is not always the case.
Let’s see what we have inside chapter:
chapter > xpath *
1  ELEMENT title
2  ELEMENT abstract
3  ELEMENT sect1
4  ELEMENT sect1
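If you want to double-check this listing outside the shell, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes a shortened copy of the penguin document embedded as a string (the names DOC and child_names are my own), and it relies on ElementTree's default parser dropping comments, which fits the fact that * selects only element children.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the penguin document (an assumption for this sketch).
DOC = """<book lang="en">
  <title>Dancing with Penguins</title>
  <chapter id="know.penguins">
    <title>Getting to Know Penguins</title>
    <abstract><para>Penguins are cute.</para></abstract>
    <sect1><title>The Head</title><para>...</para></sect1>
    <!-- A small comment -->
    <sect1 id="penguin.coat"><title>The Coat</title><para>...</para></sect1>
  </chapter>
</book>"""

root = ET.fromstring(DOC)        # root is the <book> element itself
chapter = root.find("chapter")   # relative path, similar to "cd book/chapter"

# "*" selects all child elements of the context node; the comment is not listed.
child_names = [child.tag for child in chapter.findall("*")]
print(child_names)  # ['title', 'abstract', 'sect1', 'sect1']
```

Note that ElementTree starts at the root element rather than at the document node, so the path is chapter, not book/chapter.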
Yes, that’s right. OK, we want all sections in this chapter that don’t have an id attribute. This can be achieved by using an XPath predicate and the XPath function not:
chapter > xpath sect1[not(@id)]
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT sect1
We need the title, so we just append /title after the previous expression:
chapter > xpath sect1[not(@id)]/title
1  ELEMENT title
We also want the content, so we wrap the expression in the XPath function string:
chapter > xpath string(sect1[not(@id)]/title)
Object is a string : The Head
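The same lookup can be sketched in Python, for example as a quick regression check. ElementTree only supports a subset of XPath and has neither not() nor string(), so this sketch filters the sections in plain Python instead. The CHAPTER string is a trimmed copy of the chapter above, and the variable names are my own.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the chapter from the penguin document (an assumption).
CHAPTER = """<chapter id="know.penguins">
  <title>Getting to Know Penguins</title>
  <sect1><title>The Head</title><para>...</para></sect1>
  <sect1 id="penguin.coat"><title>The Coat</title><para>...</para></sect1>
</chapter>"""

chapter = ET.fromstring(CHAPTER)

# Equivalent of sect1[not(@id)]: keep child sections without an id attribute.
sections_without_id = [s for s in chapter.findall("sect1") if "id" not in s.attrib]

# Equivalent of string(.../title): the string value of the first matching
# node, i.e. all of its text content joined together.
title = sections_without_id[0].find("title")
title_text = "".join(title.itertext())
print(title_text)  # The Head
```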
We could use a lot more expressions to get the previous or following nodes, the parent nodes, or the child nodes. This is enough for now, so let’s make it a bit more difficult.
Using Namespaces in XPath Expressions
When dealing with XML, it is not uncommon for documents to contain one or more XML namespaces. To work with such structures, it is not enough to reuse the previous expressions; they will not work. Before you can work with namespaces, you have to define them first. As an example, we load the KIWI RELAX NG schema:
xmllint --shell http://gitorious.org/kiwi/kiwi/blobs/raw/master/modules/KIWISchema.rng
As the KIWI schema can (and probably will) change, your results might be a little different than mine. But the principle is the same.
Now we want to know what the root element is. As we do not know the root element’s name (yet), we use a wildcard:
/ > xpath *
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT grammar
    namespace db href=http://docbook.org/ns/docbook
    namespace a href=http://relaxng.org/ns/compatibility/anno...
    namespace rng href=http://relaxng.org/ns/structure/1.0
    namespace xsi href=http://www.w3.org/2001/XMLSchema-instanc...
    default namespace href=http://relaxng.org/ns/structure/1.0
    ATTRIBUTE datatypeLibrary
      TEXT
        content=http://www.w3.org/2001/XMLSchema-datatyp...
As you can see, the grammar element of the KIWI schema carries five namespace declarations. A RELAX NG schema uses the namespace http://relaxng.org/ns/structure/1.0, which in our case is bound to the rng prefix and is also the default namespace. For convenience, we bind it to the shorter prefix r with the setns command:
/ > setns r=http://relaxng.org/ns/structure/1.0
The prefix is unimportant; what matters is the namespace. We could use the one that is defined in the schema, it wouldn’t matter. But “r” is shorter than “rng”. 🙂 After you have defined the XML namespace, you can enter all XPath expressions. However, you have to put the prefix in front of your element names, even though the schema itself uses the default namespace without a prefix. For example, we can count all elements defined in the KIWI schema. RELAX NG uses an element named element (surprise!) for this. To get the number of all defined elements, apply the XPath function count to a // expression:
/ > xpath count(//r:element)
Object is a number : 80
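To illustrate why the prefix matters, here is a small Python sketch with xml.etree.ElementTree. It does not load the real KIWI schema; SCHEMA is a hypothetical miniature grammar I made up for this example, so the count differs from the 80 above. The NS dictionary plays the same role as the setns command.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature RELAX NG grammar, NOT the real KIWI schema.
SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="k.image"/></start>
  <define name="k.image">
    <element name="image"><ref name="k.description"/></element>
  </define>
  <define name="k.description">
    <element name="description"><text/></element>
  </define>
</grammar>"""

# Prefix-to-namespace mapping, comparable to "setns r=...".
NS = {"r": "http://relaxng.org/ns/structure/1.0"}

root = ET.fromstring(SCHEMA)

# Comparable to count(//r:element): the prefixed name finds the elements.
prefixed = root.findall(".//r:element", NS)

# Without a prefix, the name refers to "no namespace" and matches nothing.
unprefixed = root.findall(".//element")

print(len(prefixed), len(unprefixed))  # 2 0
```

As in the xmllint shell, the prefix itself is arbitrary; only the namespace URI it is mapped to has to match.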
Generally, every RELAX NG schema contains a start element. What does it contain?
/ > xpath /r:grammar/r:start/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT ref
    ATTRIBUTE name
      TEXT
        content=k.image
Aha, there is a ref element. This element has a name attribute. We can also select the attribute directly with an absolute path. Let’s try it:
/ > xpath /r:grammar/r:start/r:ref/@name
Object is a Node Set :
Set contains 1 nodes:
1  ATTRIBUTE name
    TEXT
      content=k.image
In RELAX NG, every ref element has to point to a define element. Let’s see what we get when we ask for all of them, using the // expression again:
/ > xpath //r:define
Object is a Node Set :
Set contains 310 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image.name.attribute
...
310  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.users
Ohh, that’s a bit too much. We want to know just the one from /r:grammar/r:start/r:ref/@name. The good news is: you can combine both with a predicate:
/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name]
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name]/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT element
    ATTRIBUTE name
      TEXT
        content=image
An image element! This element appears as the root element in every KIWI configuration (which you have probably guessed already). And what’s the definition?
/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name]/r:element/*
Object is a Node Set :
Set contains 3 nodes:
1  ELEMENT a:documentation
2  ELEMENT db:para
3  ELEMENT interleave
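The lookup chain used here (read the name from the ref inside start, then find the define with the same name) can also be sketched in Python. As before, this is only an illustration under assumptions: SCHEMA is a hypothetical miniature grammar, not the KIWI schema, and since ElementTree predicates only compare against literal values, the comparison is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature RELAX NG grammar, NOT the real KIWI schema.
SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="k.image"/></start>
  <define name="k.description">
    <element name="description"><text/></element>
  </define>
  <define name="k.image">
    <element name="image">
      <interleave><ref name="k.description"/></interleave>
    </element>
  </define>
</grammar>"""

NS = {"r": "http://relaxng.org/ns/structure/1.0"}
root = ET.fromstring(SCHEMA)  # root is the grammar element

# Comparable to /r:grammar/r:start/r:ref/@name
start_name = root.find("r:start/r:ref", NS).get("name")

# Comparable to //r:define[@name=/r:grammar/r:start/r:ref/@name]
define = next(d for d in root.findall(".//r:define", NS)
              if d.get("name") == start_name)

# Comparable to .../r:element and .../r:element/*
root_element_name = define.find("r:element", NS).get("name")
child_names = [c.tag.split("}", 1)[1] for c in define.findall("r:element/*", NS)]

print(start_name, root_element_name, child_names)
# k.image image ['interleave']
```

ElementTree stores tags as {namespace}localname, which is why the sketch strips everything up to the closing brace before printing the child names.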
The first two elements (a:documentation, db:para) are just for documentation. The interesting part is interleave. I leave it up to you to investigate XPath and the KIWI schema further. 🙂
This was just an overview of the xmllint shell. It is very helpful to test some XPath expressions before you integrate them in XSLT or in programs.
Happy XPath-ing! 🙂