openSUSE Lizards

Authors
Adam Jurkiewicz
Adrian Schröter (7)
Agustin Chavarria (1)
Akhil Laddha
Alex Barrios
Alex Minton
Alexander Naumov (1)
Alexander Orlovskyy (3)
Alexey Eromenko
Alin M Elena (4)
Andrea Florio (15)
Andreas Jaeger (45)
Andreas Stieger (2)
Andreas van dem Helge
Andrej Semen
Andrew Wafaa (26)
Arvin Schnell (6)
Bernhard Wiedemann
Bharath Acharya
Bonnie Kurniawan
Brian G. Merrell
Bruno Friedmann (2)
Carl Fletcher
Casual Programmer
Chang ChiaChin
Christoph Thiel
Christopher Hobbs (15)
Ciaran Farrell (2)
Claes Backstrom
Coly Li
Cristian Rodríguez
Daniel Bornkessel
David Bailey
David C. Rankin
Dean Hilkewich
Dinar Valeev (5)
Dirk Müller (1)
Dmitry Serpokryl (7)
Duncan Mac-Vicar
Enrique Herrera Noya
Eugene Pivnev
FabioMux (1)
Federico Lucifredi (1)
Frank Lee
Gabriele Mohr
Gerrit Beine
Helman Rene Taleno Martinez
Helmut Schaa
Henne (8)
Herbert Graeber
Holgi (2)
Hubert Mantel (1)
Ioan Vancea
J. Daniel Schmidt (1)
Jaime Andrés Vélez Osorio
James Tremblay (7)
Jan Blunck (4)
Jan Loeser (1)
Jan Madsen (1)
Jan Nieuwenhuizen
Jan-Christoph Bornschlegel (3)
Jan-Simon Möller (19)
Javier Llorente (2)
Jigish Gohil (26)
Jiri Srain (1)
Jiří Suchomel (1)
Johan Kotze (5)
John Terpstra
Joop Boonen
José Oramas
Josef Reidinger (8)
Juergen Weigert (1)
Julio Vannini (7)
Justin Haygood
Kálmán Kéménczy
Kayo Hamid
Kevin Yeaux (11)
Klaas Freitag (25)
Klara Cihlarova
Klaus Kämpf
Klaus Singvogel
kl_eisbaer (10)
Lars Marowsky-Bree
Li Bin
Ludwig Nussel (6)
M. Edward (Ed) Borasky
M. Edwin Zakaria
M. Hill
Manuel Trujillo
Marcos David
Marcus Hüwe (8)
Marcus Meissner (1)
Marcus Moeller (1)
Marcus Schaefer (3)
Martin Lasarsch (8)
Martin Mohring (8)
Martin Schmiderer
Martin Schmidkunz
Masim "Vavai" Sugianto (20)
Matt Sealey
Mauro Parra-Miranda
Michael Andres (1)
Michael Löffler (4)
Michael Skiba
Michal Marek (3)
Michal Vyskocil (10)
Michal Zugec
Miguel Angel Barajas Hernandez (1)
Mingxi Wu
mrdocs
Nikanth Karthikesan (2)
Oprea Lucian
Oswin Zulu
Peter Nixon
Peter Pöml (4)
Petr Mladek (37)
Petr Uzel (3)
Philipp Thomas
Pragnesh Radadiya
Raul Libório
Ravi Kumar
Ray Chen
Ray Wang (1)
Raymond Wooninck
Rémy Marquis (1)
Renato de Pontes Pereira
Ricardo Chung
Ricardo Varas Santana (6)
Richard Bos (6)
Robert Lihm
Robert Schweikert (2)
Roland Haidl
Roman Drahtmueller
Rossana Motta (1)
Rupert Horstkötter (10)
Sascha Manns (45)
Savin Alex V.
Sebastian Schöbinger (4)
Stanislav Visnovsky (7)
Stefan Haas (1)
Stefan Hundhammer (5)
Stefan Schubert (4)
Steffen Winterfeldt (4)
Stephan Kulow (10)
Suman Manjunath
Suresh Jayaraman (1)
Susanne Oberhauser (2)
Syamsul Qamar Ngabito
Thomas Göttlicher (5)
Thomas Jones
Thomas Schraitle (16)
Thruth Wang
Tuukka (11)
Ulrich Hecht
Vincenzo Barranco
Wilken Gottwalt
Will Stephenson (2)
Xin Wei Hu
Yuri Tsarev





 

Playing With XPath Expressions in The xmllint Shell

Monday, November 23rd, 2009 by Thomas Schraitle

When XML is transformed into something else, XSLT usually comes into play. One of the challenges of XSLT is selecting exactly the nodes you are interested in. This task is done by XPath, “a query language for selecting nodes from an XML document.”

However, it can be tedious to write an XPath expression, run the transformation, and check whether you got the expected result. After hours of debugging you find out: it’s the wrong XPath expression!

To make it easier: Test your XPath expressions in the internal xmllint shell!

Using Easy XPath Expressions

xmllint is best known as a tool for validating XML documents. Far less known is its internal shell. In this shell you can run quick XPath tests and check whether an expression returns exactly what you want. Let’s consider the following DocBook 4 document:

<book lang="en">
  <title>Dancing with Penguins</title>
  <bookinfo>
    <author>
      <firstname>Tux</firstname>
      <surname>Penguin</surname>
    </author>
  </bookinfo>
  <chapter id="know.penguins">
    <title>Getting to Know Penguins</title>
    <abstract>
      <para>Penguins are cute.</para>
    </abstract>
    <sect1>
      <title>The Head</title>
      <para>...</para>
    </sect1>
    <!-- A small comment -->
    <sect1 id="penguin.coat">
      <title>The Coat</title>
      <para>...</para>
    </sect1>
  </chapter>
</book>

The content is less important than the structure. To examine some XPath features of xmllint, we save the document as penguin-dance.xml and load it into the shell with the --shell option:

xmllint --shell penguin-dance.xml

You first see the prompt:

/ >

The prompt shows you the path to your current node. Right after loading, you are at the root node, which is indicated as /. This is quite similar to Linux path notation.

Use help to list all available commands. In this little post, we focus on the xpath command. It evaluates an XPath expression in the current context and prints the result. Let’s try an absolute XPath:

/ > xpath /book
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT book
    ATTRIBUTE lang
      TEXT
        content=en
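If you later want to run such queries from a program, one option outside of xmllint is Python’s standard-library xml.etree.ElementTree. Be aware that it implements only a limited subset of XPath, so the following is a minimal sketch rather than an equivalent of the shell: it parses the sample document from a string and checks what /book selected above, the root element and its lang attribute.

```python
import xml.etree.ElementTree as ET

# Copy of the sample DocBook 4 document from this post.
DOC = """<book lang="en">
  <title>Dancing with Penguins</title>
  <bookinfo>
    <author>
      <firstname>Tux</firstname>
      <surname>Penguin</surname>
    </author>
  </bookinfo>
  <chapter id="know.penguins">
    <title>Getting to Know Penguins</title>
    <abstract>
      <para>Penguins are cute.</para>
    </abstract>
    <sect1>
      <title>The Head</title>
      <para>...</para>
    </sect1>
    <!-- A small comment -->
    <sect1 id="penguin.coat">
      <title>The Coat</title>
      <para>...</para>
    </sect1>
  </chapter>
</book>"""

# fromstring() returns the root element, which plays the role of /book.
root = ET.fromstring(DOC)
print(root.tag, root.get("lang"))  # book en
```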

Well, that was to be expected. The interesting part is that you can change the context. For example, we could change it to the first chapter:

/ > cd book/chapter
chapter >

Surprised we didn’t use an absolute XPath? Our context was already the root node, which contains the book element. In this case, it does not matter whether you use a relative or an absolute XPath, as both lead to the same node. However, this is not always the case.

Let’s see what we have inside chapter:

chapter > xpath *
1  ELEMENT title
2  ELEMENT abstract
3  ELEMENT sect1
4  ELEMENT sect1

Yes, that’s right. Now we want all sections in this chapter that don’t have an id attribute. This can be achieved with an XPath predicate and the XPath function not:

chapter > xpath sect1[not(@id)]
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT sect1
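ElementTree has no not() function, so a Python sketch has to emulate sect1[not(@id)] in plain Python. The snippet below parses only a compacted copy of the chapter element, which makes the chapter the context node, roughly like cd book/chapter did in the shell, and then evaluates stand-ins for * and sect1[not(@id)]:

```python
import xml.etree.ElementTree as ET

# Compacted copy of the sample <chapter>; parsing it on its own makes the
# chapter the context node, roughly like "cd book/chapter" in the shell.
CHAPTER = """<chapter id="know.penguins">
  <title>Getting to Know Penguins</title>
  <abstract><para>Penguins are cute.</para></abstract>
  <sect1><title>The Head</title><para>...</para></sect1>
  <!-- A small comment -->
  <sect1 id="penguin.coat"><title>The Coat</title><para>...</para></sect1>
</chapter>"""

chapter = ET.fromstring(CHAPTER)

# xpath * : element children only (the default parser drops comments)
children = [child.tag for child in chapter]

# xpath sect1[not(@id)] : ElementTree has no not(), so filter in Python
no_id = [s for s in chapter.findall("sect1") if s.get("id") is None]

print(children)    # ['title', 'abstract', 'sect1', 'sect1']
print(len(no_id))  # 1
```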

We need the title, so we just append /title to the previous expression:

chapter > xpath sect1[not(@id)]/title
1  ELEMENT title

To get its content, we wrap the expression in the XPath function string:

chapter > xpath string(sect1[not(@id)]/title)
Object is a string : The Head
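In XPath 1.0, string() returns the string-value of the first node in the set, that is, the concatenation of all its descendant text. A Python sketch of string(sect1[not(@id)]/title) with ElementTree could join itertext() of the matching title; the not(@id) test is again emulated in Python, and the chapter is a trimmed copy of the sample:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample chapter (title and abstract omitted).
CHAPTER = """<chapter id="know.penguins">
  <sect1><title>The Head</title><para>...</para></sect1>
  <sect1 id="penguin.coat"><title>The Coat</title><para>...</para></sect1>
</chapter>"""

chapter = ET.fromstring(CHAPTER)

# sect1[not(@id)]/title : title of the first sect1 without an id attribute
title = next(s.find("title") for s in chapter.findall("sect1")
             if s.get("id") is None)

# string(...) : concatenate all descendant text of that node
head = "".join(title.itertext())
print(head)  # The Head
```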

We could use many more expressions to get the previous or following nodes, the parent nodes, or the child nodes. For now, that is enough for this section; let’s make it a bit more difficult.

Using Namespaces in XPath Expressions

When dealing with XML, it is not uncommon for documents to contain one or more XML namespaces. To work with such documents, reusing the previous expressions is not enough: they will not match, because in XPath 1.0 an element name without a prefix only matches elements that are in no namespace. Before you can work with namespaces, you have to declare them first.

Let’s consider KIWI. Its configuration is an XML file based on a RELAX NG schema, and the elements of a RELAX NG schema are themselves bound to a namespace. Load the KIWI schema with the following command:

xmllint --shell http://gitorious.org/kiwi/kiwi/blobs/raw/master/modules/KIWISchema.rng

As the KIWI schema can (and probably will) change, your results might differ a little from mine. But the principle is the same.

Now we want to find out what the root element looks like. As we do not know its name (yet), we use a wildcard:

/ > xpath *
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT grammar
    namespace db href=http://docbook.org/ns/docbook
    namespace a href=http://relaxng.org/ns/compatibility/anno...
    namespace rng href=http://relaxng.org/ns/structure/1.0
    namespace xsi href=http://www.w3.org/2001/XMLSchema-instanc...
    default namespace href=http://relaxng.org/ns/structure/1.0
    ATTRIBUTE datatypeLibrary
      TEXT
        content=http://www.w3.org/2001/XMLSchema-datatyp...

As you can see, the grammar element of the KIWI schema declares five namespaces. A RELAX NG schema uses the namespace http://relaxng.org/ns/structure/1.0, which in our case is bound to the rng prefix and is also the default namespace. For convenience, we bind it to the shorter prefix r with the setns command:

/ > setns r=http://relaxng.org/ns/structure/1.0
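For comparison, the ElementTree counterpart of setns is a plain dictionary that maps a prefix of your choice to the namespace URI and is passed to find() or findall(). Since the real KIWI schema is large, this sketch uses a tiny hypothetical grammar with a similar shape; it is not the KIWI schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature RELAX NG grammar (NOT the real KIWI schema).
GRAMMAR = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="k.image"/></start>
  <define name="k.image">
    <element name="image"><empty/></element>
  </define>
</grammar>"""

# Counterpart of "setns r=http://relaxng.org/ns/structure/1.0":
# the prefix is our own choice, only the URI has to match.
NS = {"r": "http://relaxng.org/ns/structure/1.0"}

root = ET.fromstring(GRAMMAR)

# Without a prefix, "start" means an element in no namespace: no match.
unprefixed = root.find("start")
# With the prefix, the namespaced start element is found.
prefixed = root.find("r:start", NS)

print(unprefixed)    # None
print(prefixed.tag)  # {http://relaxng.org/ns/structure/1.0}start
```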

The prefix itself is unimportant; what matters is the namespace URI. We could just as well use the rng prefix defined in the schema. But “r” is shorter than “rng”. :)

After you have defined the XML namespace, you can use the prefix in any XPath expression. However, you have to put the prefix in front of every element name. For example, we can count all elements defined in the KIWI schema. RELAX NG uses an element named element (surprise!) for this. To get the number of all defined elements, apply the XPath function count to a // expression:

/ > xpath count(//r:element)
Object is a number : 80
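ElementTree has no count() function, but taking len() of a findall() result with a .// path gives the same kind of number. On the hypothetical miniature grammar below, which is not the KIWI schema, the count is 2:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature RELAX NG grammar (NOT the real KIWI schema).
GRAMMAR = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="k.image"/></start>
  <define name="k.image">
    <element name="image"><ref name="k.users"/></element>
  </define>
  <define name="k.users">
    <element name="users"><empty/></element>
  </define>
</grammar>"""

NS = {"r": "http://relaxng.org/ns/structure/1.0"}
root = ET.fromstring(GRAMMAR)

# count(//r:element): no count() in ElementTree, so count the matches.
# ".//" searches all descendants of the root (grammar) element.
n_elements = len(root.findall(".//r:element", NS))
print(n_elements)  # 2
```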

Generally, every RELAX NG schema contains a start element. What does it contain?

/ > xpath /r:grammar/r:start/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT ref
    ATTRIBUTE name
      TEXT
        content=k.image

Aha, there is a ref element with a name attribute. We can select this attribute directly by extending the absolute path:

/ > xpath /r:grammar/r:start/r:ref/@name
Object is a Node Set :
Set contains 1 nodes:
1  ATTRIBUTE name
    TEXT
      content=k.image

In RELAX NG, every ref element has to point to a define element. Let’s see what we get when we select all define elements, using the // expression again:

/ > xpath //r:define
Object is a Node Set :
Set contains 310 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image.name.attribute
...
310  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.users

Oh, that’s a bit too much. We only want the one that /r:grammar/r:start/r:ref/@name refers to. The good news is: you can combine both with a predicate:

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image
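ElementTree predicates can only compare an attribute with a literal value, so they cannot contain another path like the one above. A Python sketch therefore splits the query into two steps: read the name attribute of r:start/r:ref, then splice it into an [@name='...'] predicate. The grammar is a hypothetical miniature, not the KIWI schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature RELAX NG grammar (NOT the real KIWI schema).
GRAMMAR = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="k.image"/></start>
  <define name="k.image">
    <element name="image"><ref name="k.users"/></element>
  </define>
  <define name="k.users">
    <element name="users"><empty/></element>
  </define>
</grammar>"""

NS = {"r": "http://relaxng.org/ns/structure/1.0"}
root = ET.fromstring(GRAMMAR)

# Step 1: /r:grammar/r:start/r:ref/@name (attributes are read with .get())
start_name = root.find("r:start/r:ref", NS).get("name")

# Step 2: //r:define[@name=...] with the value from step 1 as a literal.
# Naive splicing: this breaks if the name contains a single quote.
define = root.find(f".//r:define[@name='{start_name}']", NS)

# .../r:element/@name : the element declared by that define
declared = define.find("r:element", NS).get("name")
print(start_name, declared)  # k.image image
```

If names may contain quotes, comparing in Python instead of splicing into the path string is the safer choice.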

What’s inside?

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT element
    ATTRIBUTE name
      TEXT
        content=image

An image element! This element is the root element of every KIWI configuration (as you probably guessed already). And what’s its definition?

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/r:element/*
Object is a Node Set :
Set contains 3 nodes:
1  ELEMENT a:documentation
2  ELEMENT db:para
3  ELEMENT interleave

The first two elements (a:documentation, db:para) are just for documentation. The interesting part is interleave. I leave it up to you to investigate XPath and the KIWI schema further. :-)

This was just an overview of the xmllint shell. It is very helpful for testing XPath expressions before you integrate them into XSLT stylesheets or programs.

Happy XPath-ing! :-)


2 Comments

Comment by anon_anon
2009-11-28 22:16:23

You may want to look at vtd-xml, another XPath engine that offers a lot of cool features

http://vtd-xml.sf.net

Comment by Thomas Schraitle
2009-11-30 07:06:29

Thanks, I’ll have a look.