Playing With XPath Expressions in The xmllint Shell

November 23rd, 2009 by Thomas Schraitle

When XML is transformed into something else, in most cases XSLT comes into play. One of the challenges of XSLT is to select just the nodes you are interested in. This task is done by XPath, “a query language for selecting nodes from an XML document.”

However, it can be tedious to create an XPath expression, run the transformation, and check whether you got the expected result. After hours of debugging you find out: It’s the wrong XPath expression!

To make it easier: Test your XPath expressions in the internal xmllint shell!

Using Easy XPath Expressions

xmllint is widely known as a tool for validating XML documents. Its internal shell is much less well known. With this shell you can run some spiffy XPath tests and check whether an expression returns exactly what you want. Let’s consider the following DocBook 4 document:

<book lang="en">
  <title>Dancing with Penguins</title>
  <bookinfo>
    <author>
      <firstname>Tux</firstname>
      <surname>Penguin</surname>
    </author>
  </bookinfo>
  <chapter id="know.penguins">
    <title>Getting to Know Penguins</title>
    <abstract>
      <para>Penguins are cute.</para>
    </abstract>
    <sect1>
      <title>The Head</title>
      <para>...</para>
    </sect1>
    <!-- A small comment -->
    <sect1 id="penguin.coat">
      <title>The Coat</title>
      <para>...</para>
    </sect1>
  </chapter>
</book>

The content is less important than the structure. To examine some XPath features of xmllint, we load the document into its shell using the --shell option:

xmllint --shell penguin-dance.xml

You first see the prompt:

/ >

The prompt shows you the path to your current node. Right after loading, you are at the root node, which is indicated as /. This is quite similar to Linux path notation.

Use help to list all available commands. For this little post, we focus on the xpath command. It evaluates an XPath expression in the context and prints the result. Let’s try an absolute XPath:

/ > xpath /book
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT book
    ATTRIBUTE lang
      TEXT
        content=en

Well, that was to be expected. The interesting part is, you can change the context. For example, we could change it to the first chapter:

/ > cd book/chapter
chapter >

Surprised we didn’t use an absolute XPath? Well, our context was already the root node, which contains the book node. In this case, it doesn’t matter whether we use a relative or an absolute XPath. Both lead to the same node. However, this is not always the case.
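To see when relative and absolute expressions diverge, here is a minimal sketch. It assumes Python’s standard xml.etree.ElementTree module rather than the xmllint shell, and a trimmed inline copy of the sample document. The same relative expression, title, yields a different node depending on the context it is evaluated in:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample DocBook document (an assumption for this sketch).
DOC = """<book lang="en">
  <title>Dancing with Penguins</title>
  <chapter id="know.penguins">
    <title>Getting to Know Penguins</title>
  </chapter>
</book>"""

book = ET.fromstring(DOC)        # context: the book element
chapter = book.find("chapter")   # context: the chapter element

# The relative expression "title" is resolved against each context node.
title_from_book = book.findtext("title")
title_from_chapter = chapter.findtext("title")

print(title_from_book)
print(title_from_chapter)
```

In the xmllint shell the equivalent experiment would be to run xpath title once before and once after the cd command.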

Let’s see what we have inside chapter:

chapter > xpath *
  1  ELEMENT title
  2  ELEMENT abstract
  3  ELEMENT sect1
  4  ELEMENT sect1

Yes, that’s right. OK, now we want all sections in this chapter that don’t have an id attribute. This can be achieved with an XPath predicate and the XPath function not:

chapter > xpath sect1[not(@id)]
  Object is a Node Set :
  Set contains 1 nodes:
  1  ELEMENT sect1

We need the title, so we just append /title after the previous expression:

chapter > xpath sect1[not(@id)]/title
  1  ELEMENT title

And since we want the content, we wrap the expression in the XPath string function:

chapter > xpath string(sect1[not(@id)]/title)
  Object is a string : The Head
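The same selection can be cross-checked outside the shell. The following is a minimal sketch, assuming Python’s standard xml.etree.ElementTree module and a trimmed inline copy of the sample document. ElementTree implements only a subset of XPath, so not(@id) and string() are emulated here with a Python filter and findtext() rather than written as XPath:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample DocBook document (an assumption for this sketch).
DOC = """<book lang="en">
  <title>Dancing with Penguins</title>
  <chapter id="know.penguins">
    <title>Getting to Know Penguins</title>
    <sect1>
      <title>The Head</title>
      <para>...</para>
    </sect1>
    <!-- A small comment -->
    <sect1 id="penguin.coat">
      <title>The Coat</title>
      <para>...</para>
    </sect1>
  </chapter>
</book>"""

book = ET.fromstring(DOC)
chapter = book.find("chapter")   # comparable to "cd book/chapter" in the shell

# ElementTree has no not() function, so filter on the missing attribute in Python.
sections_without_id = [s for s in chapter.findall("sect1") if "id" not in s.attrib]

# findtext() returns the text of the first matching child element.
titles = [s.findtext("title") for s in sections_without_id]

print(titles)
```

If the list matches what the xmllint shell printed, the expression is a good candidate for your stylesheet.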

We could use a lot more expressions to get the previous or following nodes, the parent nodes, or the child nodes. For now this is enough, so let’s make it a bit more difficult.

Using Namespaces in XPath Expressions

When dealing with XML, it is not uncommon for documents to contain one or more XML namespaces. To work with such structures, it is not enough to reuse the previous expressions. They will not work. Before you can use a namespace in an expression, you have to define it.

Let’s consider KIWI. Its configuration is an XML file based on a RELAX NG schema. The elements of a RELAX NG schema are bound to a namespace. Load the KIWI schema with the following command:

xmllint --shell http://gitorious.org/kiwi/kiwi/blobs/raw/master/modules/KIWISchema.rng

As the KIWI schema can (and probably will) change, your results might be a little different from mine. But the principle is the same.

Now we want to find out what the root element is. As we do not know its name (yet), we use a wildcard:

/ > xpath *
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT grammar
    namespace db href=http://docbook.org/ns/docbook
    namespace a href=http://relaxng.org/ns/compatibility/anno...
    namespace rng href=http://relaxng.org/ns/structure/1.0
    namespace xsi href=http://www.w3.org/2001/XMLSchema-instanc...
    default namespace href=http://relaxng.org/ns/structure/1.0
    ATTRIBUTE datatypeLibrary
      TEXT
        content=http://www.w3.org/2001/XMLSchema-datatyp...

As you can see, the grammar element of the KIWI schema carries five namespace declarations. A RELAX NG schema uses the namespace http://relaxng.org/ns/structure/1.0, which in our case is both the default namespace and bound to the rng prefix. For convenience, we register it with the setns command under the shorter prefix r:

/ > setns r=http://relaxng.org/ns/structure/1.0

The prefix itself is unimportant; what matters is the namespace. We could use the prefix defined in the schema and it would make no difference, but “r” is shorter than “rng”. 🙂 After you have defined the XML namespace, you can enter any XPath expression. However, you have to put the prefix in front of your element names. For example, we can count all elements defined in the KIWI schema. RELAX NG uses an element named element (surprise!) for this. To get the number of all defined elements, apply the XPath function count to an expression starting with //:

/ > xpath count(//r:element)
Object is a number : 80
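The point that only the namespace URI matters, not the prefix, can also be sketched outside the shell. This example assumes Python’s standard xml.etree.ElementTree module and a hypothetical miniature RELAX NG grammar written for illustration; it is not the KIWI schema, so the counts differ from the 80 shown above:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature RELAX NG grammar for illustration, not the KIWI schema.
SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="k.image"/>
  </start>
  <define name="k.image">
    <element name="image">
      <ref name="k.description"/>
    </element>
  </define>
  <define name="k.description">
    <element name="description">
      <text/>
    </element>
  </define>
</grammar>"""

# The prefix "r" is arbitrary; only the namespace URI has to match the document.
NS = {"r": "http://relaxng.org/ns/structure/1.0"}

grammar = ET.fromstring(SCHEMA)

# Without a prefix nothing matches, because every element lives in the namespace.
unprefixed = grammar.findall(".//element")

# With the prefix mapped to the RELAX NG namespace the elements are found.
prefixed = grammar.findall(".//r:element", NS)

print(len(unprefixed), len(prefixed))
```

The namespaces dictionary plays the same role here as the setns command does in the xmllint shell.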

Generally, every RELAX NG schema contains a start element. What does it contain?

/ > xpath /r:grammar/r:start/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT ref
    ATTRIBUTE name
      TEXT
        content=k.image

Aha, there is a ref element. This element has a name attribute. We can also select that attribute directly by extending the path. Let’s try it:

/ > xpath /r:grammar/r:start/r:ref/@name
Object is a Node Set :
Set contains 1 nodes:
1  ATTRIBUTE name
    TEXT
      content=k.image

In RELAX NG, every ref element has to point to a define element. Let’s see what we get when we ask for all of them, using the // expression again:

/ > xpath //r:define
Object is a Node Set :
Set contains 310 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image.name.attribute
...
310  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.users

Ohh, that’s a bit too much. We want just the one whose name matches /r:grammar/r:start/r:ref/@name. The good news is: you can combine both with a predicate:

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image
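The lookup that this predicate performs can be spelled out in two steps. The following is a minimal sketch, again assuming Python’s standard xml.etree.ElementTree module and a hypothetical miniature grammar instead of the KIWI schema. ElementTree cannot compare an attribute against another path inside a predicate, so the name is fetched first and then inserted as a literal value:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature RELAX NG grammar for illustration, not the KIWI schema.
SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="k.image"/>
  </start>
  <define name="k.image">
    <element name="image">
      <ref name="k.description"/>
    </element>
  </define>
  <define name="k.description">
    <element name="description">
      <text/>
    </element>
  </define>
</grammar>"""

NS = {"r": "http://relaxng.org/ns/structure/1.0"}
grammar = ET.fromstring(SCHEMA)

# Step 1: the value of /r:grammar/r:start/r:ref/@name
start_ref = grammar.find("r:start/r:ref", NS).get("name")

# Step 2: all define elements whose name attribute equals that value
matches = grammar.findall(f".//r:define[@name='{start_ref}']", NS)

# The element declared inside the matching define is the document's root element.
root_name = matches[0].find("r:element", NS).get("name")

print(start_ref, len(matches), root_name)
```

In full XPath 1.0, as evaluated by xmllint, both steps fit into the single predicate shown above.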

What’s inside?

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT element
    ATTRIBUTE name
      TEXT
        content=image

An image element! This element appears as the root element in every KIWI configuration (which you probably guessed already). And what’s the definition?

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/r:element/*
Object is a Node Set :
Set contains 3 nodes:
1  ELEMENT a:documentation
2  ELEMENT db:para
3  ELEMENT interleave

The first two elements (a:documentation, db:para) are just for documentation. The interesting part is interleave. I leave it up to you to investigate XPath and the KIWI schema further. 🙂

This was just an overview of the xmllint shell. It is very helpful to test some XPath expressions before you integrate them in XSLT or in programs.

Happy XPath-ing! 🙂


Tags:KIWI · namespace · RELAX NG · XML · xpath
Category: Documentation

Posted: 2009-11-23 - 09:22
Author: Thomas Schraitle

2 Responses to “Playing With XPath Expressions in The xmllint Shell”

anon_anon

November 28, 2009 at 22:16 |

You may want to look at vtd-xml, another XPath engine that offers a lot of cool features

http://vtd-xml.sf.net

Thomas Schraitle

November 30, 2009 at 07:06 |

Thanks, I’ll have a look.
