Home Home > 2008 > 06 > 09 > Query your XML with xpathgrep.py
Sign up | Login

Deprecation notice: openSUSE Lizards user blog platform is deprecated, and will remain read only for the time being. Learn more...

Query your XML with xpathgrep.py

June 9th, 2008 by

Maybe you know this problem: You have a couple of XML files and you need a specific information. Probably everybody would think of grep or similar tools first. But maybe your query is a bit more complicated than just a simple piece of text. What do do?

Recently I’ve found a very useful command line utility, which is probably not very known. It’s named xpathgrep.py and you can get it from the lxml repository (you need lxml too). Let’s assume we have the following DocBook file:

File db.xml
<?xml version="1.0"?>
<book>
  <title>My Cooking Book</title>
  <chapter>
    <title>Ingredients</title>
    <para>...</para>
  </chapter>
  <chapter id="howtocook">
    <title>How to cook</title>
    <para>...</para>
  </chapter>
</book>

Now, if I want to get all the titles I have to use a XPath (which is a path description language for XML, similar to Unix/Linux paths, but more powerful). To get all title elements all I have to do is to write //title, regardless of the level:

$ xpathgrep.py //title db.xml

and I get this:

<title>My Cooking Book</title>

<title>Ingredients</title>

<title>How to cook</title>

Nice, isn’t it? Probably you say: “But, hey, I can get this with grep too!” Yes, but if you want just all chapter titles, you have a problem with grep. With XPath and xpathgrep.py I only modify my XPath expression a bit:

$ xpathgrep.py //chapter/title db.xml

Now this reduces the above output just to the wanted chapter titles. And I can extent my query just for all chapters that doesn’t have an id attribute:

$ xpathgrep.py '//chapter[not(@id)]/title' db.xml

(You need the apostroph because of the shell.) The tool outputs this:

<title>Ingredients</title>

That’s nice, isn’t it? There are a lot of more to discover. A few hours ago I send a small patch to the lxml-devel mailinglist to support namespaces. Hopefully, it will be accepted. 🙂

Both comments and pings are currently closed.

Comments are closed.