Those who have used KIWI manually already know the details: KIWI’s configuration file is XML and based on a RELAX NG schema. This article gives developers a little background on the history, a short overview of some design decisions around KIWI’s RELAX NG schema, and shows how to customize it to your needs.
When KIWI was young, it used a W3C XML schema to validate its configuration file. For several reasons (which are unimportant for this post) I don’t really like this schema language. Some time ago I discovered RELAX NG, a schema language which was used to develop DocBook 5.
RELAX NG has fewer than 30 elements, can be written in XML or in a compact syntax, and is surprisingly simple. To become more familiar with this schema language, I thought it would be a good idea to rewrite KIWI’s W3C XML Schema in RELAX NG.
Although there are tools that can convert it, it is much more fun to do it manually. The old schema was also partly documented, so I thought this could be integrated as well. After some testing, I sent the first draft of the RELAX NG schema to Marcus on November 11, 2007. I never thought he would integrate it into his production code. Well, you know the result. Thanks, Marcus!
Based on Marcus’ former W3C schema, I wrote the RELAX NG schema with the following design decisions in mind:
I will just focus on some principles. This makes it easier to find your way through the schema, if needed. To explain the complete schema would be too boring.
Ok, let’s consider the image element, KIWI’s root element. KIWI’s RNC schema says:
k.image =
  ## The root element of the configuration file
  [
    db:para [
      "Each KIWI configuration file consists of a root element\x{a}" ~
      " image."
    ]
  ]
  element image {
    k.image.attlist &
    k.description &
    k.preferences+ &
    k.profiles? &
    k.instsource? &
    k.users* &
    k.drivers* &
    k.repository+ &
    k.pxedeploy? &
    k.split? &
    k.packages* &
    k.vmwareconfig? &
    k.xenconfig?
  }
What does that mean? Let’s go through it step by step:
RELAX NG allows you to insert elements from foreign namespaces. The db:para element is from DocBook 5. I used it to insert further descriptions or examples when the object needs a more elaborate explanation. This element can also be used to create a kind of “API documentation”.
k.image.attlist =
  k.image.name.attribute &
  k.image.displayname.attribute? &
  k.image.inherit.attribute? &
  k.image.kiwirevision.attribute? &
  k.image.id? &
  k.image.schemaversion.attribute &
  (
    k.image.noNamespaceSchemaLocation.attribute? |
    k.image.schemaLocation.attribute?
  )?
As you can see, the image element contains several attributes, some of which are optional (flagged with the “?” character). Attributes in XML have no order. The compact syntax expresses this with the interleave pattern, written as the “&” character.
To summarize: each element in the KIWI schema contains a short annotation, more verbose documentation in DocBook 5, and the corresponding content model. Attributes have a similar structure.
Why all this effort, you might ask? I made the KIWI RELAX NG schema extensible and added lots of named patterns, so it is very easy to customize.
Maybe you program a new functionality and need a new element or attribute. However, you still need the original, unchanged schema. How can you do this? One solution to this problem is to customize the KIWI schema: include the original schema and overwrite the named patterns with your changes. Some of these named patterns are introduced in the above list and it is straightforward to derive the name of a certain element or attribute according to the naming convention.
It is pretty easy to add or remove elements, attributes, or attribute values. For example, the following lines add an optional remote attribute (defined in k.user.remote.attribute) to k.user.attlist, which belongs to the user element:
include "KIWISchema.rnc"

k.user.remote.attribute =
  ## Is user a remote user?
  attribute remote { xsd:boolean }

k.user.attlist &= k.user.remote.attribute?
So what happens here?
First, the original KIWI RELAX NG schema is incorporated with include. As we want to add a new attribute, we define a new pattern and name it k.user.remote.attribute. There we insert the annotation and define the attribute remote.
Finally, we just extend the existing attribute collection k.user.attlist with our new attribute. This is done with the &= notation. If you used = instead, you would overwrite the k.user.attlist named pattern rather than extend it. The result would be a user element whose only attribute is remote, which is not what we intended.
I know, the example is a bit artificial. Normally you don’t need to touch the KIWI schema. However, if you need to, this article has demonstrated how you can extend the schema with just a few lines of code.
Save the above lines in a file and move it into the directory where the KIWI RNC schema is stored. Use the customization file in your code instead of the original schema to “activate” it. Your configuration file validates with the new, optional remote attribute in the user element.
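How you run the validation depends on your tooling, which the article leaves open. As a rough sketch only, and assuming the jing validator is installed (it reads the compact syntax when called with -c), the check could be wrapped in a small Python helper. The file names KIWISchema-custom.rnc and config.xml are placeholders, not names used by KIWI:

```python
import subprocess


def jing_command(schema, config):
    """Build the jing command line for an RNC schema and one XML file."""
    # -c tells jing that the schema is written in the compact syntax.
    return ["jing", "-c", schema, config]


def validate(schema, config):
    """Run jing and report success; assumes jing is on the PATH."""
    return subprocess.run(jing_command(schema, config)).returncode == 0


# Placeholder file names: the customization layer and a KIWI config file.
print(jing_command("KIWISchema-custom.rnc", "config.xml"))
```

Calling validate() with the customization file instead of the original schema is then the only change needed to pick up the new attribute.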
Enjoy!
However, it can be tedious to create an XPath expression, run the transformation, and check whether you got the expected result. After hours of debugging you find out: it was the wrong XPath expression!
To make it easier: Test your XPath expressions in the internal xmllint shell!
xmllint is best known as a tool for validating XML. Much less known is its internal shell. With this shell you can run some spiffy XPath tests and check whether an expression returns exactly what you want. Let’s consider the following DocBook 4 document:
<book lang="en">
  <title>Dancing with Penguins</title>
  <bookinfo>
    <author>
      <firstname>Tux</firstname>
      <surname>Penguin</surname>
    </author>
  </bookinfo>
  <chapter id="know.penguins">
    <title>Getting to Know Penguins</title>
    <abstract>
      <para>Penguins are cute.</para>
    </abstract>
    <sect1>
      <title>The Head</title>
      <para>...</para>
    </sect1>
    <!-- A small comment -->
    <sect1 id="penguin.coat">
      <title>The Coat</title>
      <para>...</para>
    </sect1>
  </chapter>
</book>
The content is less important than the structure. To examine some XPath features of xmllint, we load the document into its shell using the --shell option:
xmllint --shell penguin-dance.xml
You first see the prompt:
/ >
The prompt shows you the path to your current node. After loading you just see the root node, which is indicated as /. This is much like the path notation on Linux.
Use help to list all available commands. For this little post, we focus on the xpath command. It evaluates an XPath expression in the context and prints the result. Let’s try an absolute XPath:
/ > xpath /book
Object is a Node Set :
Set contains 1 nodes:
1 ELEMENT book
ATTRIBUTE lang
TEXT
content=en
Well, that was to be expected. The interesting part is, you can change the context. For example, we could change it to the first chapter:
/ > cd book/chapter
chapter >
Surprised we didn’t use an absolute XPath? Well, our context was already the root node, which contains the book node. In this case, it doesn’t matter whether we use a relative or an absolute XPath. Both lead to the same node. However, this is not always the case.
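To see why the context matters, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree. It is only an illustration: the document is a trimmed copy of the sample above, and unlike the xmllint shell, ElementTree starts at the book element rather than at the document root.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from above.
DOC = """<book lang="en">
  <title>Dancing with Penguins</title>
  <chapter id="know.penguins">
    <title>Getting to Know Penguins</title>
  </chapter>
</book>"""

book = ET.fromstring(DOC)        # context node: the book element
chapter = book.find("chapter")   # roughly what "cd chapter" does in the shell

# The same relative expression, evaluated in two different contexts.
print(book.find("title").text)
print(chapter.find("title").text)
```

The relative expression title selects the book title in the first context and the chapter title in the second.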
Let’s see what we have inside chapter:
chapter > xpath *
1 ELEMENT title
2 ELEMENT abstract
3 ELEMENT sect1
4 ELEMENT sect1
Yes, that’s right. Ok, now we want all sections in this chapter that don’t have an id attribute. This can be achieved with an XPath predicate and the XPath function not:
chapter > xpath sect1[not(@id)]
Object is a Node Set :
Set contains 1 nodes:
1 ELEMENT sect1
We need the title, so we just append /title after the previous expression:
chapter > xpath sect1[not(@id)]/title
1 ELEMENT title
and we want the content so we wrap it into the string XPath function:
chapter > xpath string(sect1[not(@id)]/title)
Object is a string : The Head
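If you later need the same query in a program, the details depend on the XPath support of your library. As a sketch, Python’s standard xml.etree.ElementTree only implements a subset of XPath and has no not() function, so the predicate turns into an ordinary Python condition. The document is again a trimmed copy of the sample:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the chapter from the sample document.
DOC = """<chapter id="know.penguins">
  <title>Getting to Know Penguins</title>
  <sect1>
    <title>The Head</title>
  </sect1>
  <!-- A small comment -->
  <sect1 id="penguin.coat">
    <title>The Coat</title>
  </sect1>
</chapter>"""

chapter = ET.fromstring(DOC)

# sect1[not(@id)]/title: the not(@id) predicate becomes a Python test
# on the attribute dictionary of each sect1 element.
titles = [
    sect.findtext("title")
    for sect in chapter.findall("sect1")
    if "id" not in sect.attrib
]
print(titles)
```

A library with full XPath 1.0 support could take the expression from the shell unchanged; this version just stays within the standard library.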
We could use a lot more expressions to get the previous or following nodes, the parent nodes, or the child nodes. For now, this is enough, so let’s make it a bit more difficult.
When dealing with XML, it is not uncommon that documents contain one or more XML namespaces. To work with such structures, it is not enough to reuse the previous expressions. They will not work. Before you can work with namespaces, you have to define them first.
Let’s consider KIWI. The configuration is an XML file, based on a RELAX NG schema. The RELAX NG schema is bound to a namespace. Load the KIWI schema with the following command:
xmllint --shell http://gitorious.org/kiwi/kiwi/blobs/raw/master/modules/KIWISchema.rng
As the KIWI schema can (and probably will) change, your results might be a little different than mine. But the principle is the same.
Now we want to know what the root element is. As we do not know the root element’s name (yet), we use a wildcard:
/ > xpath *
Object is a Node Set :
Set contains 1 nodes:
1 ELEMENT grammar
namespace db href=http://docbook.org/ns/docbook
namespace a href=http://relaxng.org/ns/compatibility/anno...
namespace rng href=http://relaxng.org/ns/structure/1.0
namespace xsi href=http://www.w3.org/2001/XMLSchema-instanc...
default namespace href=http://relaxng.org/ns/structure/1.0
ATTRIBUTE datatypeLibrary
TEXT
content=http://www.w3.org/2001/XMLSchema-datatyp...
As you can see, the KIWI schema declares 5 namespaces in the grammar element. A RELAX NG schema uses the namespace http://relaxng.org/ns/structure/1.0, which is bound to the rng prefix in our case. For convenience, we bind it to the prefix r with the setns command:
/ > setns r=http://relaxng.org/ns/structure/1.0
The prefix is unimportant; what matters is the namespace. We could use the prefix which is defined in the schema, it wouldn’t matter. But “r” is shorter than “rng”. After you have defined the XML namespace, you can enter any XPath expression. However, you have to put the prefix in front of your element names. For example, we can count all elements defined in the KIWI schema. RELAX NG uses an element named element (surprise!) for this. To get the number of all defined elements, apply the XPath function count to the // expression:
/ > xpath count(//r:element)
Object is a number : 80
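The same rule applies in programs: no prefix mapping, no match. The following Python sketch uses the standard xml.etree.ElementTree, where a dictionary plays the role of setns. The grammar is a made-up miniature for illustration, not the real KIWI schema, so the count differs from the one above:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature grammar, NOT the real KIWI schema.
RNG = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="k.image"/></start>
  <define name="k.image">
    <element name="image">
      <ref name="k.description"/>
    </element>
  </define>
  <define name="k.description">
    <element name="description"><text/></element>
  </define>
</grammar>"""

# Same role as "setns r=..." in the shell: bind a prefix to the namespace.
NS = {"r": "http://relaxng.org/ns/structure/1.0"}

grammar = ET.fromstring(RNG)
with_prefix = grammar.findall(".//r:element", NS)   # like //r:element
without_prefix = grammar.findall(".//element")      # namespace ignored

print(len(with_prefix), len(without_prefix))
```

Without the prefix, the expression looks for an element named element in no namespace and finds nothing.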
Generally, every RELAX NG schema contains a start element. What does it contain?
/ > xpath /r:grammar/r:start/*
Object is a Node Set :
Set contains 1 nodes:
1 ELEMENT ref
ATTRIBUTE name
TEXT
content=k.image
Aha, there is a ref element. This element contains an attribute name. We could also use an absolute path. Let’s try it:
/ > xpath /r:grammar/r:start/r:ref/@name
Object is a Node Set :
Set contains 1 nodes:
1 ATTRIBUTE name
TEXT
content=k.image
In RELAX NG, every ref element has to point to a define element. Let’s see what we get, when we want it all, using the // expression again:
/ > xpath //r:define
Object is a Node Set :
Set contains 310 nodes:
1 ELEMENT define
ATTRIBUTE name
TEXT
content=k.image.name.attribute
...
310 ELEMENT define
ATTRIBUTE name
TEXT
content=k.users
Ohh, that’s a bit too much. We want to know just the one from /r:grammar/r:start/r:ref/@name. The good news is: you can combine both with a predicate:
/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]
Object is a Node Set :
Set contains 1 nodes:
1 ELEMENT define
ATTRIBUTE name
TEXT
content=k.image
What’s inside?
/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/*
Object is a Node Set :
Set contains 1 nodes:
1 ELEMENT element
ATTRIBUTE name
TEXT
content=image
An image element! This element appears as root element in every KIWI configuration (which you guessed already.) And what’s the definition?
/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/r:element/*
Object is a Node Set :
Set contains 3 nodes:
1 ELEMENT a:documentation
2 ELEMENT db:para
3 ELEMENT interleave
The first two elements (a:documentation, db:para) are just for documentation. The interesting part is interleave. I leave it up to you to investigate XPath and the KIWI schema further.
This was just an overview of the xmllint shell. It is very helpful to test some XPath expressions before you integrate them in XSLT or in programs.
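Once an expression behaves as expected in the shell, moving it into a program is mostly mechanical. As a sketch under assumptions, here is the lookup of the root element in Python with the standard xml.etree.ElementTree. The grammar is a hypothetical miniature rather than the real KIWI schema, and because ElementTree predicates only compare against literal values, the name is looked up first and then inserted into the second expression:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature grammar, NOT the real KIWI schema.
RNG = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="k.image"/></start>
  <define name="k.description">
    <element name="description"><text/></element>
  </define>
  <define name="k.image">
    <element name="image">
      <ref name="k.description"/>
    </element>
  </define>
</grammar>"""

NS = {"r": "http://relaxng.org/ns/structure/1.0"}
grammar = ET.fromstring(RNG)

# Step 1, like /r:grammar/r:start/r:ref/@name (grammar is the context).
start_name = grammar.find("r:start/r:ref", NS).get("name")

# Step 2, like //r:define[@name=...], with the literal name filled in.
define = grammar.find(f".//r:define[@name='{start_name}']", NS)

# Step 3: the element pattern inside that definition.
root_element = define.find("r:element", NS).get("name")

print(start_name, root_element)
```

A library with full XPath 1.0 support could evaluate the combined expression from the shell in one step instead of two.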
Happy XPath-ing!
After about 1½ years and countless hours, hundreds of worn-out proof pages and pens, a print shop bankruptcy that the project survived, a lot of brain power, an unwilling PC, and more than 1000 revisions in the SVN repository, the 2nd edition of my book “DocBook-XML — Medienneutrales und plattformunabhängiges Publizieren” has now been published by Millin-Verlag, 5 years after the first edition.
Quite a lot! Here is a short overview of what is new:
Interested? A sample chapter is available as PDF.
The book was produced almost exclusively with programs from controlled open source cultivation. The only exceptions were the XML editor and the FO formatter.
For the first edition, the DocBook 4 source was still transformed to LaTeX, and PDF or PostScript was generated from that. For the second edition, the existing source was first converted to DocBook 5 by means of XSLT. An XSLT transformation then created an XSL-FO file, which was read by an FO formatter and converted to PDF.
The body text is set in Charis from SIL International, the headings in TheSans, and the listings in TheSansMono Condensed; the latter two were designed by Lucas de Groot. The graphics were either drawn directly in SVG with Inkscape or created in OpenOffice.org Draw, exported to PDF, and referenced accordingly in the DocBook code. The cover image was created with Scribus and exported to TIFF for the print shop.
A few statistics: the book contains 23 chapters, 3 appendices, 230 examples, 40 figures, 64 tables, 65 step-by-step procedures, more than 900 small and large listings, 252 links, 27 quotations, and more than 970 cross-references.
You can order the book in three versions:
All programs discussed in the book are already available as easy-to-install packages for openSUSE. Further packages can be installed from my build server repository.
Many thanks to everyone who had a part in it!
According to Ant’s webpage:
“Ant is a Java-based build tool. In theory, it is kind of like Make, without Make’s wrinkles and with the full portability of pure Java code.”
Sounds nice, doesn’t it? But XML design problems make Ant nearly unusable, which is why this post becomes kind of a rant…
It started some time ago when I tried to use Ant for building a book. So I looked into the documentation and saw that Ant uses XML for its build files. “Great!”, I thought, “I start my XML editor and I’m ready to go!”
A good XML user should use a Schema whenever possible. So I searched for some DTD, RNG, or W3C Schema which my XML editor can use. However, I found this notice from the FAQ (cited from this page):
So with an XML editor on one hand, and with no schema on the other, Ant left me in the dark.
So why use XML then? Why not use another syntax? The reason to use XML is not only that its syntax is easy to parse, but also that you can validate your XML file. I don’t know the decisions that drove the Ant team to XML. However, it seems to me that Ant uses XML only halfheartedly. This wouldn’t be bad per se. The bad thing is that it circumvents good XML practices!
So what’s exactly my problem with Ant? I have these:
To improve the situation for XML users, I think the following should be interesting:
Schema design is not as easy as it looks. There are lots of problems that you only see after some time. That’s probably the reason why the World Wide Web Consortium needs some time to declare a specification a W3C Recommendation.
XML can be good. But it can also be broken when it is badly designed.