openSUSE Lizards

Query your XML with xpathgrep.py

Monday, June 9th, 2008 by Thomas Schraitle

Maybe you know this problem: you have a couple of XML files and you need a specific piece of information from them. Probably everybody would think of grep or similar tools first. But maybe your query is a bit more complicated than just a simple piece of text. What to do?

Recently I found a very useful command-line utility that is probably not very well known. It’s called xpathgrep.py and you can get it from the lxml repository (you need lxml, too). Let’s assume we have the following DocBook file:

File db.xml
<?xml version="1.0"?>
<book>
  <title>My Cooking Book</title>
  <chapter>
    <title>Ingredients</title>
    <para>...</para>
  </chapter>
  <chapter id="howtocook">
    <title>How to cook</title>
    <para>...</para>
  </chapter>
</book>

Now, if I want to get all the titles, I have to use an XPath expression (XPath is a path description language for XML, similar to Unix/Linux paths, but more powerful). To get all title elements, regardless of their level, all I have to write is //title:

$ xpathgrep.py //title db.xml

and I get this:

<title>My Cooking Book</title>

<title>Ingredients</title>

<title>How to cook</title>
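
If you would rather run the same query from a script, here is a minimal sketch that uses only Python's standard library (xml.etree.ElementTree) instead of xpathgrep.py or lxml. That choice is an assumption made for illustration; it is not how xpathgrep.py itself is implemented. ElementTree understands only a subset of XPath and expects a leading dot, so //title is written as .//title. The DB_XML string just repeats db.xml from above.

```python
import xml.etree.ElementTree as ET

# Same content as db.xml above (XML declaration omitted), inlined so the
# sketch is self-contained.
DB_XML = """<book>
  <title>My Cooking Book</title>
  <chapter>
    <title>Ingredients</title>
    <para>...</para>
  </chapter>
  <chapter id="howtocook">
    <title>How to cook</title>
    <para>...</para>
  </chapter>
</book>"""

root = ET.fromstring(DB_XML)

# ".//title" selects title elements at any depth below the root,
# which is the ElementTree spelling of the XPath //title.
all_titles = [t.text for t in root.findall(".//title")]

for text in all_titles:
    print(text)
```

Unlike xpathgrep.py, which prints the matching elements, this sketch prints only their text content.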

Nice, isn’t it? You will probably say: “But, hey, I can get this with grep too!” Yes, but if you want only the chapter titles, grep gets you into trouble. With XPath and xpathgrep.py I only need to modify my XPath expression a bit:

$ xpathgrep.py //chapter/title db.xml

This reduces the output above to just the chapter titles. And I can extend my query to select only the chapters that don’t have an id attribute:

$ xpathgrep.py '//chapter[not(@id)]/title' db.xml

(You need the single quotes because the shell would otherwise interpret the brackets and parentheses.) The tool outputs this:

<title>Ingredients</title>
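
The two narrower queries can be sketched the same way in Python's standard library; again, this is an illustration under that assumption, not the tool's own code. ElementTree's XPath subset has no not() function, so the "no id attribute" condition is written as a plain Python filter here. With lxml, the xpath() method accepts the full expression directly.

```python
import xml.etree.ElementTree as ET

# Same content as db.xml above (XML declaration omitted).
DB_XML = """<book>
  <title>My Cooking Book</title>
  <chapter>
    <title>Ingredients</title>
    <para>...</para>
  </chapter>
  <chapter id="howtocook">
    <title>How to cook</title>
    <para>...</para>
  </chapter>
</book>"""

root = ET.fromstring(DB_XML)

# //chapter/title: only titles that are direct children of a chapter.
chapter_titles = [t.text for t in root.findall(".//chapter/title")]

# //chapter[not(@id)]/title: ElementTree has no not(), so keep the chapters
# without an id attribute in Python and take the text of their first title.
titles_without_id = [
    chapter.findtext("title")
    for chapter in root.iter("chapter")
    if "id" not in chapter.attrib
]

print(chapter_titles)
print(titles_without_id)
```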

That’s nice, isn’t it? There is a lot more to discover. A few hours ago I sent a small patch to the lxml-devel mailing list to add namespace support. Hopefully, it will be accepted. :)
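
Why namespace support matters: in a namespaced document, such as DocBook 5 (whose elements live in the http://docbook.org/ns/docbook namespace), a bare //title matches nothing, because an unprefixed name in XPath 1.0 only selects elements that are in no namespace. The sketch below shows the idea with Python's standard library rather than with xpathgrep.py. The sample document and the db prefix are made up for illustration, and how the patched tool takes its prefix mappings is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced variant of the cooking book, for illustration only.
DB5_XML = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>My Cooking Book</title>
  <chapter>
    <title>Ingredients</title>
  </chapter>
</book>"""

root = ET.fromstring(DB5_XML)

# Prefix-to-namespace mapping; "db" is an arbitrary prefix chosen here.
NS = {"db": "http://docbook.org/ns/docbook"}

# Without a prefix, the query looks for titles in no namespace: no matches.
unqualified = root.findall(".//title")

# With the prefix bound to the DocBook namespace, the titles are found.
qualified = [t.text for t in root.findall(".//db:title", NS)]

print(len(unqualified))
print(qualified)
```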

