XML – openSUSE Lizards

DocBook Authoring and Publishing Suite (DAPS) 2.0 Released

Thomas Schraitle — Tue, 23 Jun 2015 12:46:08 +0000

After more than two years of development, 15 pre-releases, and more than 2000 commits, we proudly present release 2.0 of the DocBook Authoring and Publishing Suite, in short DAPS 2.0.

DAPS lets you publish your DocBook 4 or DocBook 5 XML sources in various output formats such as HTML, PDF, EPUB, man pages, or ASCII with a single command. It is well suited for large documentation projects because it provides profiling support and packaging tools. DAPS supports authors with a link checker, validator, spell checker, and editor macros. DAPS runs exclusively on Linux.

Download & Installation

For download and installation instructions refer to https://github.com/openSUSE/daps/blob/master/INSTALL.adoc
Highlights of the DAPS 2.0 release include:

fully supports DocBook 5 (production ready)
daps_autobuild for automatically building and releasing books from different sources
support for EPUB 3 and Amazon .mobi format
default HTML output is XHTML, also supports HTML5
now supports XSLT processor saxon6 (in addition to xsltproc)
improved “scriptability”
properly handles CSS, JavaScript and images for HTML and EPUB builds (via a “static/” directory in the respective stylesheet folder)
added support for JPG images
supports all DocBook profiling attributes
improved performance by only loading makefiles that are needed for the given subcommand
added a comprehensive test suite to ensure better code quality when releasing
tested on Debian Wheezy, Fedora 20/21, openSUSE 13.x, SLE 12, and Ubuntu 14.10

Please note that this DAPS release does not support webhelp. It is planned to re-add webhelp support with DAPS 2.1.

For a complete Changelog refer to https://github.com/openSUSE/daps/blob/master/ChangeLog

Support

If you have questions regarding DAPS, please use the discussion forum at https://sourceforge.net/p/daps/discussion/General/. We will do our best to help.

Bug Reports

To report bugs or request enhancements, use the issue tracker at https://github.com/openSUSE/daps/issues.

The DAPS Project

DAPS is developed by the SUSE Linux documentation team and used to generate the product documentation for all SUSE Linux products. However, it is not tailored exclusively to SUSE documentation; it supports any documentation written in DocBook.
DAPS has been tested on Debian Wheezy, Fedora 20/21, openSUSE 13.x, SLE 12, and Ubuntu 14.10.

The DAPS project moved from SourceForge to GitHub and is now available at https://opensuse.github.io/daps/.

Editing KIWI configurations with Emacs

Togan Muftuoglu — Fri, 31 Aug 2012 16:17:00 +0000

I recently decided to do all my work in Emacs, and even though I am learning slowly, I thought I would share what I discovered about editing KIWI config files. KIWI ships a schema file for its elements and their attributes, but unfortunately Emacs is unaware of the schema's location by default. So first create a schema location file and save it.

I saved it as $HOME/.emacs.d/data/myschemas.xml. Now add this to your Emacs init file to load nxml-mode automatically for KIWI files in addition to XML files:
(setq auto-mode-alist (cons '("\\.\\(xml\\|kiwi\\|xsl\\|rng\\|xhtml\\)\\'" . nxml-mode) auto-mode-alist))
Then add this code so that nxml-mode can locate the KIWI schema file when you edit a KIWI config file:
(eval-after-load 'rng-loc '(add-to-list 'rng-schema-locating-files (concat user-emacs-directory "data/myschemas.xml")))
Now have fun with Emacs, KIWI, and your openSUSE.

Zippl again – now in the package

Klaas Freitag — Tue, 12 Jul 2011 20:16:00 +0000

lightweight presentations

Some might remember my hackweek project Zippl. I blogged about it more than a year ago. Zippl is a lightweight presentation tool, a bit like Prezi, a hip tool for that purpose, where all ‘slides’ sit on one large canvas and during the presentation a kind of camera moves over the canvas.

I liked the idea and did Zippl as I wanted to play with Qt’s QGraphicsView. It takes a simple xml file as input which describes the presentation and animates it as shown in the video in my older blog.

At first I thought it didn’t make sense to continue that project. But recently, somebody asked whether I had built in the “back to the previous spot” feature that I promised almost a year ago, as he wanted to do a presentation with Zippl. I couldn’t believe it, and so I spent an evening on the weekend polishing Zippl a bit. And because it’s easy with OBS, I quickly built an RPM package for various openSUSE versions.

Now that I have worked on it a bit again, I found it could also make sense on tablet devices, for example to run cool Hello New User animations or small presentations for Aunt Tilly to get some sponsorship for the new bike. Could be fun.

If you want to check it out, please install it from my home repository.

KIWI RELAX NG Schema Explained

Thomas Schraitle — Sun, 06 Dec 2009 19:20:29 +0000

KIWI, invented by Marcus Schäfer, is a magnificent tool to build your own SUSE Linux distribution. It is also the backend of SUSE Studio.

Those who have already used KIWI manually know the details: KIWI’s configuration file is XML and based on a RELAX NG schema. This article gives developers a little background on the history, a short overview of some design decisions around KIWI’s RELAX NG schema, and an explanation of how to customize it to your needs.

Invent it—The History

When KIWI was young, it used a W3C XML schema to validate its configuration file. Well, for several reasons (which are unimportant for this post) I don’t really like this schema language. Some time ago I discovered RELAX NG, a schema language which was used to develop DocBook 5.
RELAX NG has less than 30 elements, can be written in XML or in a compact syntax and is surprisingly simple. To become more familiar with this schema language, I thought it would be a good idea to rewrite KIWI’s W3C XML Schema into RELAX NG.

Although there are tools that can convert it, it is much more fun to do it manually. The old schema was also partly documented, so I thought this could be integrated as well. After some testing I sent the first draft of the RELAX NG schema to Marcus on November 11, 2007. I never thought he would integrate it into his production code. Well, you know the result. Thanks Marcus!

Design it—What We Need

Starting from Marcus’ former W3C schema, the RELAX NG schema was built on the following design decisions:

The Compact syntax (also known as RNC) was used.
Elements were defined as named patterns for easier customization.
Single attributes were also defined as named patterns.
Groups of attributes were collected as named patterns.
A convention (naming schema) made the RELAX NG schema much more consistent. This naming schema is borrowed from DocBook 5.
Datatypes were used, if possible.
Annotations were integrated to document KIWI’s elements, attributes, and attribute values.

I will just focus on some principles. This makes it easier to find your way through the schema, if needed. To explain the complete schema would be too boring.

Investigate It—The Technical Details

Ok, let’s consider the image element, KIWI’s root element. KIWI’s RNC schema says:

k.image =
    ## The root element of the configuration file
    [
      db:para [
        "Each KIWI configuration file consists of a root element\x{a}" ~
        "        image."
      ]
    ]
    element image {
      k.image.attlist
      & k.description
      & k.preferences+
      & k.profiles?
      & k.instsource?
      & k.users*
      & k.drivers*
      & k.repository+
      & k.pxedeploy?
      & k.split?
      & k.packages*
      & k.vmwareconfig?
      & k.xenconfig?
    }

What does that mean?? Let’s go through it step by step:

k.image =
This is a definition of a named pattern. I used the convention k.ELEMENTNAME for each element in the schema.
## The root element of the configuration file
Although it looks like a comment, it is actually an annotation. Annotations are used to document the corresponding object, which is always a good idea. In this case it is even better: any XML editor which supports annotations can read it and display it as a tool tip or the like on request. Usually annotations are short.
[ db:para [ "Each KIWI configuration file ..." ] ]

RELAX NG allows you to insert elements from foreign namespaces. The db:para element is from DocBook 5. I used it to insert more descriptions or examples when the object needs a more elaborate explanation. This element can also be used to create a kind of “API documentation”.
element image { ... }
We want an image element and this line defines it. The KIWI elements do not belong to a namespace at the moment.

k.image.attlist
This refers to all attributes of the image element. I used the convention k.ELEMENTNAME.attlist to group all attributes for the element ELEMENTNAME. A single attribute is named k.ELEMENTNAME.ATTRIBUTENAME.attribute. In its full beauty, the k.image.attlist pattern looks like this:
```
k.image.attlist = k.image.name.attribute
		& k.image.displayname.attribute?
		& k.image.inherit.attribute?
		& k.image.kiwirevision.attribute?
		& k.image.id?
		& k.image.schemaversion.attribute
		& ( k.image.noNamespaceSchemaLocation.attribute?
		  | k.image.schemaLocation.attribute? )?
```
As you can see, the image element contains several attributes, some of them are optional (flagged with the “?” character.) Attributes in XML have no order. The compact syntax expresses this with the interleave pattern, available as “&” character.
k.description & k.preferences+ & ...
The content model (relationships and structure) of the image element. The schema allows an unordered model, which is expressed with the interleave pattern (&).

To summarize: each element in the KIWI schema contains a short annotation, more verbose documentation in DocBook 5, and the corresponding content model. Attributes have a similar structure.

Customize it—Modify The Schema

Why this effort, you might ask? I made the KIWI RELAX NG schema extensible and added lots of named patterns so it is very easy to customize.

Maybe you are implementing a new feature and need a new element or attribute. However, you still need the original, unchanged schema. How can you do this? One solution to this problem is to customize the KIWI schema: include the original schema and override the named patterns with your changes. Some of these named patterns were introduced in the list above, and it is straightforward to derive the name of a certain element or attribute from the naming convention.

It is pretty easy to add or remove elements, attributes, or attribute values. For example, the following lines add an optional remote attribute (defined in k.user.remote.attribute) to k.user.attlist, which belongs to the user element:

include "KIWISchema.rnc"

k.user.remote.attribute =
  ## Is user a remote user?
  attribute remote { xsd:boolean }

k.user.attlist &= k.user.remote.attribute?

So what happens here?

First, the original KIWI RELAX NG schema is incorporated with include. As we want to add a new attribute, we define a new pattern and name it k.user.remote.attribute. There we insert the annotation and define the attribute remote.
Finally, we just extend the existing attribute collection k.user.attlist with our new attribute. This is done with the &= notation, which combines the new pattern with the existing definition by interleave. If you used = instead, you would try to replace the k.user.attlist named pattern as a whole rather than extend it. The result would be a user element with the remote attribute only, which is not what we intended.

I know, the example is a bit artificial. Normally you don’t need to touch the KIWI schema. However, if you need to, this article has demonstrated how you can extend the schema with just a few lines of code.

Save the above lines in a file and move it into the directory where the KIWI RNC schema is stored. Use the customization file in your code instead of the original schema to “activate” it. Your configuration file validates with the new, optional remote attribute in the user element.

Enjoy!

Playing With XPath Expressions in The xmllint Shell

Thomas Schraitle — Mon, 23 Nov 2009 09:22:06 +0000

When XML is transformed into something else, in most cases XSLT comes into play. One of the challenges of XSLT is to select just the nodes you are interested in. This task is done by XPath, “a query language for selecting nodes from a XML document.”

However, it can be tedious to create an XPath expression, run the transformation, and check if you got the expected result. After hours of debugging you find out: It’s the wrong XPath expression!

To make it easier: Test your XPath expressions in the internal xmllint shell!

Using Easy XPath Expressions

Generally, xmllint is known as a popular tool to validate your XML structure. Less well known is its internal shell. With this shell you can run some spiffy XPath tests and check whether an expression returns exactly what you want. Let’s consider the following DocBook 4 document:


  Dancing with Penguins
  
    
      Tux
     Penguin
    
  
  
    Getting to Know Penguins
    
      Penguins are cute.
    
    
      The Head
      ...
    
    
    
      The Coat
      ...

The content is less important than the structure. To examine some XPath features of xmllint, we load the document into its shell using the --shell option:

xmllint --shell penguin-dance.xml

You first see the prompt:

/ >

The prompt shows you the path to your current node. After loading you just see the root node, which is indicated as /. This is pretty similar to Linux path notation.

Use help to list all available commands. For this little post, we focus on the xpath command. It evaluates an XPath expression in the context and prints the result. Let’s try an absolute XPath:

/ > xpath /book
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT book
    ATTRIBUTE lang
      TEXT
        content=en

Well, that was to be expected. The interesting part is, you can change the context. For example, we could change it to the first chapter:

/ > cd book/chapter
  chapter >

Surprised we didn’t use an absolute XPath? Well, our context was already the root node, containing the book node. In this case, it doesn’t matter whether we use a relative or an absolute XPath. Both lead to the same node. However, this is not always the case.

Let’s see what we have inside chapter:

chapter > xpath *
  1  ELEMENT title
  2  ELEMENT abstract
  3  ELEMENT sect1
  4  ELEMENT sect1

Yes, that’s right. Ok, we want all sections in this chapter that don’t have an id attribute. This can be achieved by using an XPath predicate and the XPath function not:

chapter > xpath sect1[not(@id)]
  Object is a Node Set :
  Set contains 1 nodes:
  1  ELEMENT sect1

We need the title, so we just append /title after the previous expression:

chapter > xpath sect1[not(@id)]/title
  1  ELEMENT title

and we want the content so we wrap it into the string XPath function:

chapter > xpath string(sect1[not(@id)]/title)
  Object is a string : The Head
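For comparison, here is a rough sketch of the same steps in Python with the standard library's xml.etree.ElementTree. The markup in PENGUIN_XML is my assumed reconstruction of the sample document (DocBook 4 element names, and a made-up id value "coat"). ElementTree has no not() function, so the filter is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample document; the id value "coat"
# is made up for illustration.
PENGUIN_XML = """<book lang="en">
  <title>Dancing with Penguins</title>
  <bookinfo>
    <author>
      <firstname>Tux</firstname>
      <surname>Penguin</surname>
    </author>
  </bookinfo>
  <chapter>
    <title>Getting to Know Penguins</title>
    <abstract>
      <para>Penguins are cute.</para>
    </abstract>
    <sect1>
      <title>The Head</title>
      <para>...</para>
    </sect1>
    <sect1 id="coat">
      <title>The Coat</title>
      <para>...</para>
    </sect1>
  </chapter>
</book>"""

book = ET.fromstring(PENGUIN_XML)   # the root element, like /book
chapter = book.find("chapter")      # like "cd book/chapter"

# xpath *  -> names of all child elements of the chapter
child_names = [child.tag for child in chapter]

# xpath sect1[not(@id)]/title  -> keep sections without an id attribute
titles = [s.find("title") for s in chapter.findall("sect1")
          if s.get("id") is None]

# xpath string(sect1[not(@id)]/title)  -> text of the first match
first_title = titles[0].text if titles else ""

print(child_names)
print(first_title)
```

This mirrors the shell session step by step: change the context, list the children, filter by a missing attribute, and finally take the string value.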

We could use a lot more expressions to get the previous or following nodes, the parent nodes, or the child nodes. For now this is enough, so let’s make it a bit more difficult.

Using Namespaces in XPath Expressions

When dealing with XML it is not uncommon that documents contain one or more XML namespaces. To work with such structures, it is not enough to reuse the previous expressions. They will not work. Before you can work with a namespace, you have to declare it first.

Let’s consider KIWI. The configuration is an XML file, based on a RELAX NG schema. RELAX NG schemas are bound to a namespace. Load the KIWI schema with the following command:

xmllint --shell http://gitorious.org/kiwi/kiwi/blobs/raw/master/modules/KIWISchema.rng

As the KIWI schema can (and probably will) change, your results might be a little different from mine. But the principle is the same.

Now we want to take a look at the root element. As we do not know the root element’s name (yet), we use a wildcard:

/ > xpath *
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT grammar
namespace db href=http://docbook.org/ns/docbook
namespace a href=http://relaxng.org/ns/compatibility/anno...
namespace rng href=http://relaxng.org/ns/structure/1.0
namespace xsi href=http://www.w3.org/2001/XMLSchema-instanc...
default namespace href=http://relaxng.org/ns/structure/1.0
ATTRIBUTE datatypeLibrary
TEXT
content=http://www.w3.org/2001/XMLSchema-datatyp...

As you can see, the KIWI schema defines 5 namespaces in the grammar element. A RELAX NG schema uses the namespace http://relaxng.org/ns/structure/1.0, which is bound to the rng prefix in our case. For convenience, we bind it to the shorter prefix r with the setns command:

/ > setns r=http://relaxng.org/ns/structure/1.0

The prefix is unimportant; what matters is the namespace. We could use the prefix which is defined in the schema, it wouldn’t matter. But “r” is shorter than “rng”. After you have defined the XML namespace, you can enter all XPath expressions. However, you have to insert the prefix in front of your element names. For example, we can count all defined elements in the KIWI schema. RELAX NG uses an element named element (surprise!) for this. To get the number of all defined elements, apply the XPath function count to the // expression:

/ > xpath count(//r:element)
Object is a number : 80
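The same idea carries over to code. Here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree. The tiny grammar in MINI_RNG is made up for illustration and merely stands in for the real KIWI schema, so the numbers differ from the ones above. The namespace mapping plays the role of setns.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RELAX NG grammar standing in for the real KIWI schema.
MINI_RNG = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="k.image"/>
  </start>
  <define name="k.image">
    <element name="image">
      <ref name="k.description"/>
    </element>
  </define>
  <define name="k.description">
    <element name="description">
      <text/>
    </element>
  </define>
</grammar>"""

# Same role as "setns r=..." in the xmllint shell: the prefix is arbitrary,
# only the namespace URI matters.
NS = {"r": "http://relaxng.org/ns/structure/1.0"}

grammar = ET.fromstring(MINI_RNG)

# Without a prefix nothing matches, because all elements are namespaced.
unprefixed = grammar.findall(".//element")

# Rough equivalent of count(//r:element)
prefixed = grammar.findall(".//r:element", NS)

print(len(unprefixed), len(prefixed))
```

With this miniature grammar the unprefixed search finds nothing, while the prefixed one finds both element definitions.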

Generally, every RELAX NG schema contains a start element. What does it contain?

/ > xpath /r:grammar/r:start/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT ref
    ATTRIBUTE name
      TEXT
        content=k.image

Aha, there is a ref element. This element contains an attribute name. We could also use an absolute path. Let’s try it:

/ > xpath /r:grammar/r:start/r:ref/@name
Object is a Node Set :
Set contains 1 nodes:
1  ATTRIBUTE name
    TEXT
      content=k.image

In RELAX NG, every ref element has to point to a define element. Let’s see what we get, when we want it all, using the // expression again:

/ > xpath //r:define
Object is a Node Set :
Set contains 310 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image.name.attribute
...
310  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.users

Ohh, that’s a bit too much. We want to know just the one from /r:grammar/r:start/r:ref/@name. The good news is: you can combine both with a predicate:

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image
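If you want to do this lookup in a program rather than in the shell, it can be sketched in Python with the standard library. ElementTree only supports a subset of XPath and cannot compare against another path inside a predicate, so this sketch does it in two steps. The grammar in MINI_RNG is again a made-up miniature, not the real KIWI schema.

```python
import xml.etree.ElementTree as ET

# Made-up miniature grammar; the real KIWI schema is much larger.
MINI_RNG = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="k.image"/>
  </start>
  <define name="k.users">
    <element name="users"><text/></element>
  </define>
  <define name="k.image">
    <element name="image"><ref name="k.users"/></element>
  </define>
</grammar>"""

NS = {"r": "http://relaxng.org/ns/structure/1.0"}
grammar = ET.fromstring(MINI_RNG)

# Step 1: /r:grammar/r:start/r:ref/@name
start_name = grammar.find("r:start/r:ref", NS).get("name")

# Step 2: //r:define[@name=...], with the value from step 1 filled in
define = grammar.find(f".//r:define[@name='{start_name}']", NS)

# The element defined by that pattern is the document's root element
root_element_name = define.find("r:element", NS).get("name")

print(start_name, root_element_name)
```

The two-step approach is a limitation of ElementTree; a full XPath 1.0 engine can evaluate the combined expression directly, as xmllint does above.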

What’s inside?

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT element
    ATTRIBUTE name
      TEXT
        content=image

An image element! This element appears as root element in every KIWI configuration (which you guessed already.) And what’s the definition?

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/r:element/*
Object is a Node Set :
Set contains 3 nodes:
1  ELEMENT a:documentation
2  ELEMENT db:para
3  ELEMENT interleave

The first two elements (a:documentation, db:para) are just for documentation. The interesting part is interleave. I leave it up to you to investigate XPath and the KIWI schema further.

This was just an overview of the xmllint shell. It is very helpful to test some XPath expressions before you integrate them in XSLT or in programs.

Happy XPath-ing!

DocBook-XML, Take Two!

Thomas Schraitle — Mon, 21 Sep 2009 01:00:59 +0000

(Disclaimer: This post is mainly intended for a German audience and describes my German-language book about DocBook XML.)

After about 1½ years and countless hours, hundreds of worn-out proof pages and pens, a print shop bankruptcy survived, a lot of brainpower spent, an uncooperative PC, and more than 1000 revisions in the SVN repository, the second edition of my book “DocBook-XML — Medienneutrales und plattformunabhängiges Publizieren” has now been published by Millin-Verlag, five years after the first edition.

What Is New?

A lot! Here is a short overview of the changes:

The book describes both the new DocBook 5 specification and the older DocBook 4.x version. I have adapted the complete book for both versions.
The installation and configuration of DocBook on Linux and Windows is described step by step.
There are many new chapters on the following topics:
- A chapter on the differences between versions 4 and 5.
- RELAX NG describes the schema language in which DocBook 5 is written.
- Modular Documents shows how documents are split up in DocBook.
- Conditional Elements (Profiling) demonstrates the capabilities of profiling, a method for hiding DocBook elements.
- Creating the EPUB E-Book Format describes how to create an e-book from your document that you can display on any e-book reader, provided it understands formats such as EPUB or Mobipocket.
- Tips and Tricks offers various pointers and tips on DocBook.
- Migrating to DocBook explains how to get from an older format to DocBook.
Many chapters have been revised. For example, the old chapter Using DocBook was split into four separate chapters: Structural Elements, Block Elements, Inline Elements, and Cross-References and Links.
How to customize DocBook 4/5 is of course shown as well.
There is a handy overview of all DocBook elements, the versions in which they appear, where they are used, and so on. Very useful as a reference for daily work!
The index has been greatly extended and now covers 18 pages.

Interested? A sample excerpt is available as a PDF.

Technical Details

The book was produced almost exclusively with software from controlled open source cultivation. The only exceptions were the XML editor and the FO formatter.

For the first edition, the DocBook 4 source was still transformed to LaTeX, and PDF or PostScript was generated from that. For the second edition, the existing source was first converted to DocBook 5 with XSLT. An XSLT transformation then produced an XSL-FO file, which an FO formatter read and converted to PDF.

The body text is set in Charis from SIL International, headings in TheSans, and listings in TheSansMono Condensed; the latter two are by Lucas de Groot. The graphics were either drawn directly in SVG with Inkscape or created in OpenOffice.org Draw, exported to PDF, and referenced accordingly in the DocBook code. The cover image was created with Scribus and exported to TIFF for the print shop.

A few statistics: the book contains 23 chapters, 3 appendices, 230 examples, 40 figures, 64 tables, 65 step-by-step procedures, more than 900 small and large listings, 252 links, 27 quotations, and more than 970 cross-references.

Where Is It Available?

You can order the book in three versions:

As a printed book of 662 pages (ISBN-13: 978-3-938626-14-6), or
as an online package consisting of EPUB, PDF, and HTML for a single price! So nothing stands in the way of using it on e-book readers, laptops, or browsers.
Currently being planned is the option to download individual chapters separately, so that you can pick only the chapters you are interested in.

All programs discussed in the book are already available as easy-to-install packages for openSUSE. Additional ones can be installed from my build service repository.

Many thanks to everyone who had a part in it!

Why Ant Sucks (Somehow)

Thomas Schraitle — Mon, 16 Feb 2009 12:44:17 +0000

What is Ant?

According to Ant’s webpage:

“Ant is a Java-based build tool. In theory, it is kind of like Make, without Make’s wrinkles and with the full portability of pure Java code.”

Sounds nice, doesn’t it? But XML design problems make Ant nearly unusable, which is why this post has become kind of a rant…

The Current Situation

It started some time ago when I tried to use Ant for building a book. So I looked into the documentation and saw that Ant uses XML for its build files. “Great!”, I thought, “I start my XML editor and I’m ready to go!”

A good XML user should use a Schema whenever possible. So I searched for some DTD, RNG, or W3C Schema which my XML editor can use. However, I found this notice from the FAQ (cited from this page):

An incomplete DTD can be created by the antstructure task – but this one has a few problems:

It doesn’t know about required attributes. Only manual tweaking of this file can help here.
It is not complete – if you add new tasks via taskdef it won’t know about it. See this page by Michel Casabianca for a solution to this problem. Note that the DTD you can download at this page is based on Ant 0.3.1.
It may even be an invalid DTD. As Ant allows tasks writers to define arbitrary elements, name collisions will happen quite frequently – if your version of Ant contains the optional test and junit tasks, there are two XML elements named test (the task and the nested child element of junit) with different attribute lists. This problem cannot be solved; DTDs don’t give a syntax rich enough to support this.

(Hint: The DTD is not accessible through this link anymore—it can be found in the Ant-Wiki now.)

So with an XML editor on one hand, and with no schema on the other, Ant left me in the dark.

The Problem

So why use XML then? Why not use another syntax? The reason to use XML is not only that its syntax is easy to parse, but also that an XML file can be validated. I don’t know the decisions that drove the Ant team to XML. However, it seems to me that Ant uses XML only halfheartedly. This wouldn’t be bad per se. The bad thing is that it circumvents XML good practices!

So what’s exactly my problem with Ant? I have these:

An XML Structure Without Schema is Useless
Ok, to be fair, not every XML structure needs a Schema. Very simple ones probably don’t need one. However, if your structure becomes a bit more complicated, it is very important to have a Schema. It is even more important when your structure is used by other people and becomes widespread. How can people know if something is wrong when there is no Schema? Impossible! They have to run Ant every time or look into the documentation.
Redefining the XML Structure is Breaking XML
The taskdef is probably one of the worst elements in Ant. I have never seen this concept in any other XML structure before. Even when you have a Schema, it circumvents the structure by introducing new elements just “accidentally”. This makes the whole concept of validation useless.
No Official Schema
Why use such “complicated” XML when no Schema is available? It does not seem very user-friendly to offer XML without an underlying foundation.
Why XML?
Although you can call me a “fan” of XML, I don’t think it is useful in every case. There are certainly reasons why the Ant team chose XML over another syntax. However, then it should provide a schema too. I’m not fully convinced of the advantages of XML for a build tool (given the lack of a good schema.)

A Possible Solution

To improve the situation for XML users, I think the following steps would help:

Provide a RELAX NG Schema (RNG)
In my humble opinion, this is the absolute minimum to work efficiently with Ant’s XML structure. The RNG should contain the core elements of Ant’s structure. Whether the RNG is liberal enough to allow further elements from other namespaces or not is a matter of design.
If your XML editor does not support RNG, there should be a DTD or W3C Schema available. However, these schema languages do not always have the same expressive power.
Use The Expressive Power of RELAX NG
Ant’s XML structure sometimes contains conditions like: “Use only one attribute from this list.” This condition can be expressed with RNG’s choice element. The advantage of this approach: your XML editor helps you create valid XML while you write.
Get Rid of taskdef (And The Like)
This was not only totally unexpected, it is a nightmare for every XML editor (as described above). No Schema should be fooled with this “broken by design” method. There are other methods to introduce further elements, be it in a separate namespace or, for example, with an element usetask with an attribute name. This might not be as short as taskdef, but it is much safer.
Document The RELAX NG Schema
By documenting I mean not only a decent HTML page, but also some helpful tips inside the RELAX NG schema. Modern XML editors which support annotations can show a small tool tip to provide users with helpful information. This is very useful as the user does not have to search for the information; instead, the information comes automatically through the XML editor.
Another advantage: You can extract the annotation to create a separate HTML page, for example.
Provide a Conversion Tool
You can only take these “radical” steps when there is a good conversion tool from major version n to n+1 available. This could be an XSLT stylesheet that converts the taskdef into something more useful.
Offer a “Compact Syntax”
RELAX NG schemas can be written in two forms: as XML and as a compact syntax. This is very, very useful, as it allows users to take part when they don’t like XML. It is also possible to convert between the two without problems. If you still need the validation structure, convert it into XML, validate it and you are done.

Schema design is not as easy as it looks. There are lots of problems that you only see after some time. That’s probably the reason, why the World Wide Web Consortium needs some time to declare a specification a W3C Recommendation.

XML can be good. But it can also be broken when it is badly designed.

Query your XML with xpathgrep.py

Thomas Schraitle — Mon, 09 Jun 2008 13:32:16 +0000

Maybe you know this problem: You have a couple of XML files and you need a specific piece of information. Probably everybody would think of grep or similar tools first. But maybe your query is a bit more complicated than just a simple piece of text. What to do?

Recently I found a very useful command line utility which is probably not very well known. It’s named xpathgrep.py and you can get it from the lxml repository (you need lxml too). Let’s assume we have the following DocBook file:

File db.xml


  My Cooking Book
  
    Ingredients
    ...
  
  
    How to cook
    ...

Now, if I want to get all the titles I have to use an XPath expression (XPath is a path description language for XML, similar to Unix/Linux paths, but more powerful). To get all title elements, regardless of the level, all I have to do is write //title:

$ xpathgrep.py //title db.xml

and I get this:

My Cooking Book Ingredients How to cook

Nice, isn’t it? Probably you say: “But, hey, I can get this with grep too!” Yes, but if you want just all chapter titles, you have a problem with grep. With XPath and xpathgrep.py I only modify my XPath expression a bit:

$ xpathgrep.py //chapter/title db.xml

This reduces the above output to just the wanted chapter titles. And I can extend my query to select only the chapters that don’t have an id attribute:

$ xpathgrep.py '//chapter[not(@id)]/title' db.xml

(You need the single quotes because of the shell.) The tool outputs this:

Ingredients
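To give an idea of what happens behind such queries, here is a minimal sketch in Python. It is not how xpathgrep.py is implemented (that tool builds on lxml); it uses only the standard library's xml.etree.ElementTree, which supports a subset of XPath. The markup in DB_XML is my assumption about what db.xml looks like, and the id value "cooking" is made up.

```python
import xml.etree.ElementTree as ET

# Assumed markup for db.xml; the id value "cooking" is made up.
DB_XML = """<book>
  <title>My Cooking Book</title>
  <chapter>
    <title>Ingredients</title>
    <para>...</para>
  </chapter>
  <chapter id="cooking">
    <title>How to cook</title>
    <para>...</para>
  </chapter>
</book>"""

root = ET.fromstring(DB_XML)

# //title : every title element, at any depth
all_titles = [t.text for t in root.iter("title")]

# //chapter/title : only titles that are direct children of a chapter
chapter_titles = [t.text for t in root.findall(".//chapter/title")]

# //chapter[not(@id)]/title : ElementTree has no not(), so filter in Python
no_id_titles = [c.findtext("title") for c in root.iter("chapter")
                if c.get("id") is None]

print(all_titles)
print(chapter_titles)
print(no_id_titles)
```

With lxml instead of ElementTree, the third query could be passed as a full XPath expression, which is what makes a tool like xpathgrep.py so convenient.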

That’s nice, isn’t it? There is a lot more to discover. A few hours ago I sent a small patch to the lxml-devel mailing list to support namespaces. Hopefully, it will be accepted.