RELAX NG – openSUSE Lizards

KIWI RELAX NG Schema Explained

Thomas Schraitle — Sun, 06 Dec 2009 19:20:29 +0000

KIWI, invented by Marcus Schäfer, is a magnificent tool to build your own SUSE Linux distribution. It is also the backend of SUSE Studio.

Those who have already used KIWI manually know the details: KIWI’s configuration file is XML and based on a RELAX NG schema. This article gives developers a little background on the history, a short overview of some design decisions around KIWI’s RELAX NG schema, and an explanation of how to customize it to your needs.

Invent it—The History

When KIWI was young, it used a W3C XML Schema to validate its configuration file. Well, for several reasons (which are unimportant for this post) I don’t really like this schema language. Some time ago I discovered RELAX NG, the schema language that was used to develop DocBook 5.
RELAX NG has fewer than 30 elements, can be written in XML or in a compact syntax, and is surprisingly simple. To become more familiar with this schema language, I thought it would be a good idea to rewrite KIWI’s W3C XML Schema in RELAX NG.

Although there are tools that can do the conversion, it is much more fun to do it manually. The old schema was also partly documented, so I thought this documentation could be integrated as well. After some testing, I sent the first draft of the RELAX NG schema to Marcus on November 11, 2007. I never thought he would integrate it into his production code. Well, you know the result. Thanks, Marcus!

Design it—What We Need

Based on Marcus’ former W3C schema, the RELAX NG schema was written with the following design decisions in mind:

The Compact syntax (also known as RNC) was used.
Elements were defined as named patterns for easier customization.
Single attributes were also defined as named patterns.
Groups of attributes were collected as named patterns.
A naming convention made the RELAX NG schema much more consistent. This naming scheme is borrowed from DocBook 5.
Datatypes were used where possible.
Annotations were integrated to document KIWI’s elements, attributes, and attribute values.

I will just focus on some principles. This makes it easier to find your way through the schema, if needed. To explain the complete schema would be too boring.

Investigate It—The Technical Details

Ok, let’s consider the image element, KIWI’s root element. KIWI’s RNC schema says:

k.image =
    ## The root element of the configuration file
    [
      db:para [
        "Each KIWI configuration file consists of a root element\x{a}" ~
        "        image."
      ]
    ]
    element image {
      k.image.attlist
      & k.description
      & k.preferences+
      & k.profiles?
      & k.instsource?
      & k.users*
      & k.drivers*
      & k.repository+
      & k.pxedeploy?
      & k.split?
      & k.packages*
      & k.vmwareconfig?
      & k.xenconfig?
    }

What does that mean? Let’s go through it step by step:

k.image =
This is a definition of a named pattern. I used the convention k.ELEMENTNAME for each element in the schema.
## The root element of the configuration file
Although it looks like a comment, it is actually an annotation. Annotations are used to document the corresponding object, which is always a good idea. In this case it is even better: any XML editor that supports annotations can read it and display it as a tool tip or the like on request. Usually annotations are short.
[ db:para [ "Each KIWI configuration file ..." ] ]

RELAX NG allows you to insert elements from foreign namespaces. The db:para element is from DocBook 5. I used it to insert further descriptions or examples when the object needs a more elaborate explanation. This element can also be used to create a kind of “API documentation”.
element image { ... }
We want an image element and this line defines it. The KIWI elements do not belong to a namespace at the moment.

k.image.attlist
This refers to all attributes of the image element. I used the convention k.ELEMENTNAME.attlist to group all attributes for the element ELEMENTNAME. A single attribute is named k.ELEMENTNAME.ATTRIBUTENAME.attribute. In its full beauty, the k.image.attlist pattern looks like this:
```
k.image.attlist = k.image.name.attribute
		& k.image.displayname.attribute?
		& k.image.inherit.attribute?
		& k.image.kiwirevision.attribute?
		& k.image.id?
		& k.image.schemaversion.attribute
		& ( k.image.noNamespaceSchemaLocation.attribute?
		  | k.image.schemaLocation.attribute? )?
```
As you can see, the image element contains several attributes, some of which are optional (flagged with the “?” character). Attributes in XML have no order. The compact syntax expresses this with the interleave pattern, written as the “&” character.
k.description & k.preferences+ & ...
The content model (relationships and structure) of the image element. The schema allows an unordered model, which is expressed with the interleave pattern (&).

To summarize: each element in the KIWI schema contains a short annotation, more verbose documentation in DocBook 5, and the corresponding content model. Attributes have a similar structure.
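The naming convention described above is regular enough to be written down as a small helper. The following Python sketch is only an illustration: the function name pattern_names is hypothetical and not part of KIWI, and it merely builds the pattern names that the convention predicts for a given element and its attributes.

```python
def pattern_names(element, attributes):
    """Return the named patterns the convention predicts for one element.

    Hypothetical helper for illustration; it does not read the KIWI schema.
    """
    return {
        # k.ELEMENTNAME
        "element": f"k.{element}",
        # k.ELEMENTNAME.attlist
        "attlist": f"k.{element}.attlist",
        # k.ELEMENTNAME.ATTRIBUTENAME.attribute
        "attributes": [f"k.{element}.{attr}.attribute" for attr in attributes],
    }


names = pattern_names("image", ["name", "displayname", "schemaversion"])
print(names["element"])
print(names["attlist"])
for attribute_pattern in names["attributes"]:
    print(attribute_pattern)
```

Knowing the name of an element or attribute is therefore enough to guess which pattern to look for in the schema.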

Customize it—Modify The Schema

Why this effort, you might ask? I made the KIWI RELAX NG schema extensible and added lots of named patterns so that it is very easy to customize.

Maybe you are developing new functionality and need a new element or attribute. However, you still need the original, unchanged schema. How can you do this? One solution is to customize the KIWI schema: include the original schema and overwrite the named patterns with your changes. Some of these named patterns were introduced in the list above, and it is straightforward to derive the name of a certain element or attribute from the naming convention.

It is pretty easy to add or remove elements, attributes, or attribute values. For example, the following lines add an optional remote attribute (defined in k.user.remote.attribute) to k.user.attlist, which belongs to the user element:

include "KIWISchema.rnc"

k.user.remote.attribute =
  ## Is user a remote user?
  attribute remote { xsd:boolean }

k.user.attlist &= k.user.remote.attribute?

So what happens here?

First, the original KIWI RELAX NG schema is incorporated with include. As we want to add a new attribute, we define a new pattern and name it k.user.remote.attribute. There we insert the annotation and define the attribute remote.
Finally, we extend the existing attribute collection k.user.attlist with our new attribute. This is done with the &= notation. If you used = instead, you would overwrite the k.user.attlist named pattern. The result would be a user element whose only attribute is remote, which is not what we intended.

I know, the example is a bit artificial. Normally you don’t need to touch the KIWI schema. However, if you need to, this article has demonstrated how you can extend the schema with just a few lines of code.

Save the above lines in a file and move it into the directory where the KIWI RNC schema is stored. Use the customization file in your code instead of the original schema to “activate” it. Your configuration file validates with the new, optional remote attribute in the user element.
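If you need several such customizations, the three steps (include, define, extend with &=) can be generated mechanically. The following Python sketch is only an illustration under stated assumptions: the function name rnc_add_optional_attribute is hypothetical, and it only produces the text of a customization file. It does not validate anything.

```python
def rnc_add_optional_attribute(schema, element, attribute, datatype, doc):
    """Build an RNC customization that adds one optional attribute.

    Hypothetical helper for illustration; it only assembles text.
    """
    pattern = f"k.{element}.{attribute}.attribute"
    lines = [
        f'include "{schema}"',                          # step 1: include the original schema
        "",
        f"{pattern} =",                                 # step 2: define the new attribute pattern
        f"  ## {doc}",
        f"  attribute {attribute} {{ {datatype} }}",
        "",
        f"k.{element}.attlist &= {pattern}?",           # step 3: extend the attlist with &=
        "",
    ]
    return "\n".join(lines)


customization = rnc_add_optional_attribute(
    "KIWISchema.rnc", "user", "remote", "xsd:boolean", "Is user a remote user?"
)
print(customization)
```

Called with the values from the example above, the sketch prints the same customization lines as shown earlier.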

Enjoy!

Playing With XPath Expressions in The xmllint Shell

Thomas Schraitle — Mon, 23 Nov 2009 09:22:06 +0000

When XML is transformed into something else, in most cases XSLT comes into play. One of the challenges of XSLT is to select just the nodes you are interested in. This task is done by XPath, “a query language for selecting nodes from a XML document.”

However, it can be tedious to create an XPath expression, run the transformation, and check whether you got the expected result. After hours of debugging you find out: It’s the wrong XPath expression!

To make it easier: Test your XPath expressions in the internal xmllint shell!

Using Easy XPath Expressions

Generally, xmllint is known as a popular tool to validate your XML structure. Less well known is its internal shell. With this shell you can run some spiffy XPath tests and check whether they return exactly what you want. Let’s consider the following DocBook 4 document:


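<book lang="en">
  <title>Dancing with Penguins</title>
  <bookinfo>
    <author>
      <firstname>Tux</firstname>
      <surname>Penguin</surname>
    </author>
  </bookinfo>
  <chapter>
    <title>Getting to Know Penguins</title>
    <abstract>
      <para>Penguins are cute.</para>
    </abstract>
    <sect1>
      <title>The Head</title>
      ...
    </sect1>
    <sect1 id="...">
      <title>The Coat</title>
      ...
    </sect1>
  </chapter>
</book>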

The content is less important than the structure. To examine some XPath features of xmllint, we load the document into its shell using the --shell option:

xmllint --shell penguin-dance.xml

You first see the prompt:

/ >

The prompt shows you the path to your current node. After loading you just see the root node, which is indicated as /. This is pretty similar to Linux path notation.

Use help to list all available commands. For this little post, we focus on the xpath command. It evaluates an XPath expression in the context and prints the result. Let’s try an absolute XPath:

/ > xpath /book
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT book
    ATTRIBUTE lang
      TEXT
        content=en

Well, that was to be expected. The interesting part is, you can change the context. For example, we could change it to the first chapter:

/ > cd book/chapter
  chapter >

Surprised we didn’t use an absolute XPath? Well, our context was already the root node, which contains the book node. In this case, it doesn’t matter whether we use a relative or an absolute XPath. Both lead to the same node. However, this is not always the case.

Let’s see what we have inside chapter:

chapter > xpath *
  1  ELEMENT title
  2  ELEMENT abstract
  3  ELEMENT sect1
  4  ELEMENT sect1

Yes, that’s right. Ok, we want all sections in this chapter that don’t have an id attribute. This can be achieved by using an XPath predicate and the XPath function not:

chapter > xpath sect1[not(@id)]
  Object is a Node Set :
  Set contains 1 nodes:
  1  ELEMENT sect1

We need the title, so we just append /title after the previous expression:

chapter > xpath sect1[not(@id)]/title
  1  ELEMENT title

and we want the content so we wrap it into the string XPath function:

chapter > xpath string(sect1[not(@id)]/title)
  Object is a string : The Head

We could use a lot more expressions to get the previous or following nodes, the parent nodes, or the child nodes. For now, this is enough; let’s make it a bit more difficult.
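Once an expression works in the shell, you will probably want to use it in a program. The following Python sketch repeats the steps above with the standard library module xml.etree.ElementTree. It rests on several assumptions: the sample document is a shortened reconstruction of the one above, the id value "coat" is made up, and ElementTree supports only a subset of XPath (it has no not() or string() function), so the filter is written in plain Python instead.

```python
import xml.etree.ElementTree as ET

# Shortened sample document; the id value "coat" is an assumption.
DOC = """<book lang="en">
  <title>Dancing with Penguins</title>
  <chapter>
    <title>Getting to Know Penguins</title>
    <abstract><para>Penguins are cute.</para></abstract>
    <sect1>
      <title>The Head</title>
    </sect1>
    <sect1 id="coat">
      <title>The Coat</title>
    </sect1>
  </chapter>
</book>"""

book = ET.fromstring(DOC)

# Corresponds to "cd book/chapter": make the chapter our context node.
chapter = book.find("chapter")

# Corresponds to "xpath *": all child elements of the context node.
child_tags = [child.tag for child in chapter]

# Corresponds to "sect1[not(@id)]": ElementTree has no not(),
# so the attribute test is done in Python.
sections_without_id = [s for s in chapter.findall("sect1") if "id" not in s.attrib]

# Corresponds to "string(sect1[not(@id)]/title)".
title = sections_without_id[0].findtext("title")

print(child_tags)
print(title)
```

A library with full XPath 1.0 support could evaluate the shell expressions unchanged; the standard library was chosen here only to keep the sketch free of dependencies.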

Using Namespaces in XPath Expressions

When dealing with XML, it is not uncommon that documents contain one or more XML namespaces. To work with such structures, it is not enough to reuse the previous expressions. They will not work. Before you can work with namespaces, you have to define them first.

Let’s consider KIWI. The configuration is an XML file, based on a RELAX NG schema. RELAX NG schemas in XML syntax are bound to a namespace. Load the KIWI schema with the following command:

xmllint --shell http://gitorious.org/kiwi/kiwi/blobs/raw/master/modules/KIWISchema.rng

As the KIWI schema can (and probably will) change, your results might be a little different than mine. But the principle is the same.

Now we want to find the root element. As we do not know (yet) the root element’s name, we use a wildcard:

/ > xpath *
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT grammar
namespace db href=http://docbook.org/ns/docbook
namespace a href=http://relaxng.org/ns/compatibility/anno...
namespace rng href=http://relaxng.org/ns/structure/1.0
namespace xsi href=http://www.w3.org/2001/XMLSchema-instanc...
default namespace href=http://relaxng.org/ns/structure/1.0
ATTRIBUTE datatypeLibrary
TEXT
content=http://www.w3.org/2001/XMLSchema-datatyp...

As you can see, the KIWI schema declares 5 namespaces in the grammar element. A RELAX NG schema uses the namespace http://relaxng.org/ns/structure/1.0, which is bound to the rng prefix in our case. For convenience, we bind it to the shorter prefix r with the setns command:

/ > setns r=http://relaxng.org/ns/structure/1.0

The prefix is unimportant; what matters is the namespace. We could use the prefix defined in the schema, it wouldn’t matter. But “r” is shorter than “rng”. After you have defined the XML namespace, you can enter any XPath expression. However, you have to put the prefix in front of your element names. For example, we can count all elements defined in the KIWI schema. RELAX NG uses an element named element (surprise!) for this. To get the number of all defined elements, apply the XPath function count to the // expression:

/ > xpath count(//r:element)
Object is a number : 80
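That the prefix is arbitrary and only the namespace counts can also be checked outside the shell. The following Python sketch uses the standard library module xml.etree.ElementTree on a tiny, made-up grammar (it is not the KIWI schema, so the count differs). The prefix-to-namespace dictionary plays the role of setns.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Tiny made-up grammar for illustration; NOT the KIWI schema.
GRAMMAR = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="k.image"/></start>
  <define name="k.image">
    <element name="image"><ref name="k.description"/></element>
  </define>
  <define name="k.description">
    <element name="description"><text/></element>
  </define>
</grammar>"""

root = ET.fromstring(GRAMMAR)

# Two different prefixes bound to the same namespace give the same result.
count_r = len(root.findall(".//r:element", {"r": RNG_NS}))
count_rng = len(root.findall(".//rng:element", {"rng": RNG_NS}))

# Without a namespace the expression matches nothing.
count_plain = len(root.findall(".//element"))

print(count_r, count_rng, count_plain)
```

The last query shows why the expressions from the first part stop working on namespaced documents: an unprefixed name only matches elements that are in no namespace.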

Generally, every RELAX NG schema contains a start element. What does it contain?

/ > xpath /r:grammar/r:start/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT ref
    ATTRIBUTE name
      TEXT
        content=k.image

Aha, there is a ref element. This element contains an attribute name. We could also use an absolute path. Let’s try it:

/ > xpath /r:grammar/r:start/r:ref/@name
Object is a Node Set :
Set contains 1 nodes:
1  ATTRIBUTE name
    TEXT
      content=k.image

In RELAX NG, every ref element has to point to a define element. Let’s see what we get, when we want it all, using the // expression again:

/ > xpath //r:define
Object is a Node Set :
Set contains 310 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image.name.attribute
...
310  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.users

Ohh, that’s a bit too much. We want to know just the one from /r:grammar/r:start/r:ref/@name. The good news is: you can combine both with a predicate:

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT define
    ATTRIBUTE name
      TEXT
        content=k.image

What’s inside?

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/*
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT element
    ATTRIBUTE name
      TEXT
        content=image

An image element! This element appears as root element in every KIWI configuration (which you guessed already.) And what’s the definition?

/ > xpath //r:define[@name=/r:grammar/r:start/r:ref/@name ]/r:element/*
Object is a Node Set :
Set contains 3 nodes:
1  ELEMENT a:documentation
2  ELEMENT db:para
3  ELEMENT interleave

The first two elements (a:documentation, db:para) are just for documentation. The interesting part is interleave. I leave it up to you to investigate XPath and the KIWI schema further.
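The chain from start over ref to the matching define can be followed in a program as well. The following Python sketch again uses xml.etree.ElementTree and a small, made-up grammar that only mimics the shape of the KIWI schema. As ElementTree predicates cannot contain a path like /r:grammar/r:start/r:ref/@name, the name is read first and then inserted into the second expression.

```python
import xml.etree.ElementTree as ET

NS = {"r": "http://relaxng.org/ns/structure/1.0"}

# Small made-up grammar that mimics the shape of the KIWI schema.
GRAMMAR = """<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:db="http://docbook.org/ns/docbook">
  <start><ref name="k.image"/></start>
  <define name="k.image">
    <element name="image">
      <db:para>Each KIWI configuration file consists of a root element image.</db:para>
      <interleave>
        <ref name="k.description"/>
      </interleave>
    </element>
  </define>
  <define name="k.description">
    <element name="description"><text/></element>
  </define>
</grammar>"""

root = ET.fromstring(GRAMMAR)

# Step 1: which pattern does start refer to?
start_name = root.find("r:start/r:ref", NS).get("name")

# Step 2: find the define element with exactly this name.
define = root.find(f"r:define[@name='{start_name}']", NS)

# Step 3: look inside the element that this pattern defines.
element = define.find("r:element", NS)
# Tags look like "{namespace}local"; keep only the local part.
children = [child.tag.split("}")[1] for child in element]

print(start_name)
print(element.get("name"))
print(children)
```

Splitting the query into two steps is a limitation of the standard library module, not of XPath itself; in the xmllint shell the single combined expression works as shown above.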

This was just an overview of the xmllint shell. It is very helpful to test some XPath expressions before you integrate them in XSLT or in programs.

Happy XPath-ing!

DocBook-XML, Take Two!

Thomas Schraitle — Mon, 21 Sep 2009 01:00:59 +0000

(Disclaimer: This post is mainly intended for a German audience and describes my book about DocBook XML. For this reason, as an exception, the original post was written in German only.)

After about 1½ years and countless hours, hundreds of worn-out proof pages and pens, the bankruptcy of a printing company, a great deal of brainpower, an uncooperative PC, and more than 1000 revisions in the SVN repository, the second edition of my book “DocBook-XML: Medienneutrales und plattformunabhängiges Publizieren” has now been published by Millin-Verlag, five years after the first edition.

What’s New?

A lot! Here is a short overview of the changes:

The book describes both the new DocBook 5 specification and the older DocBook 4.x version. I have adapted the complete book for both versions.
Installing and configuring DocBook on Linux and Windows is described step by step.
There are many new chapters on the following topics:
- A chapter on the differences between versions 4 and 5.
- RELAX NG describes the schema language in which DocBook 5 is written.
- Modular Documents shows how documents are split up in DocBook.
- Conditional Elements (Profiling) demonstrates the capabilities of profiling, a method for hiding DocBook elements.
- Creating the EPUB E-Book Format describes how to create an e-book from your document that you can display on any e-book reader, provided it understands formats such as EPUB or Mobipocket.
- Tips and Tricks offers various pointers and tips on DocBook.
- Migrating to DocBook explains how to get from an older format to DocBook.
Many chapters have been revised. For example, the old chapter Using DocBook was split into four separate chapters: Structural Elements, Block Elements, Inline Elements, and Cross-References and Links.
How to customize DocBook 4/5 is of course covered as well.
There is a handy overview of all DocBook elements, the versions in which they appear, where they are used, and so on. Very useful as a reference for daily work!
The index has been greatly expanded and now spans 18 pages.

Interested? A sample is available as a PDF.

Technical Details

The book was produced almost exclusively with organically grown open source programs. The only exceptions were the XML editor and the FO formatter.

For the first edition, the DocBook 4 source code was still transformed to LaTeX, from which PDF or PostScript was generated. For the second edition, the existing source code was first converted to DocBook 5 using XSLT. An XSLT transformation then produced an XSL-FO file, which an FO formatter read and converted to PDF.

The body text is set in Charis from SIL International, headings in TheSans, and listings in TheSansMono Condensed; the latter two are by Lucas de Groot. The graphics were either drawn directly in SVG with Inkscape or created in OpenOffice.org Draw, exported to PDF, and referenced accordingly in the DocBook code. The cover image was created with Scribus and exported to TIFF for the printer.

A few statistics: the book contains 23 chapters, 3 appendices, 230 examples, 40 figures, 64 tables, 65 step-by-step procedures, more than 900 small and large listings, 252 links, 27 quotations, and more than 970 cross-references.

Where to Get It

You can order the book in three versions:

As a 662-page printed book (ISBN-13: 978-3-938626-14-6), or
as an online package consisting of EPUB, PDF, and HTML for a single price! Nothing stands in the way of using it on e-book readers, laptops, or browsers.
Currently being planned is the option to download individual chapters separately, so that you can pick just the chapters you are interested in.

All programs discussed in the book are already available as easy-to-install packages for openSUSE. Additional ones can be installed from my build server repository.

Many thanks to everyone who played a part in it!

Why Ant Sucks (Somehow)

Thomas Schraitle — Mon, 16 Feb 2009 12:44:17 +0000

What is Ant?

According to Ant’s webpage:

“Ant is a Java-based build tool. In theory, it is kind of like Make, without Make’s wrinkles and with the full portability of pure Java code.”

Sounds nice, doesn’t it? But XML design problems make Ant nearly unusable, which is why this post becomes kind of a rant…

The Current Situation

It started some time ago when I tried to use Ant for building a book. So I looked into the documentation and saw that Ant uses XML for its build files. “Great!”, I thought, “I start my XML editor and I’m ready to go!”

A good XML user should use a Schema whenever possible. So I searched for some DTD, RNG, or W3C Schema which my XML editor can use. However, I found this notice from the FAQ (cited from this page):

An incomplete DTD can be created by the <antstructure> task – but this one has a few problems:

It doesn’t know about required attributes. Only manual tweaking of this file can help here.
It is not complete – if you add new tasks via <taskdef> it won’t know about it. See this page by Michel Casabianca for a solution to this problem. Note that the DTD you can download at this page is based on Ant 0.3.1.
It may even be an invalid DTD. As Ant allows tasks writers to define arbitrary elements, name collisions will happen quite frequently – if your version of Ant contains the optional <test> and <junit> tasks, there are two XML elements named test (the task and the nested child element of <junit>) with different attribute lists. This problem cannot be solved; DTDs don’t give a syntax rich enough to support this.

(Hint: The DTD is not accessible through this link anymore—it can be found in the Ant-Wiki now.)

So with an XML editor on one hand, and with no schema on the other, Ant left me in the dark.

The Problem

So why use XML then? Why not use another syntax? The reason to use XML is not only that its syntax is easy to parse, but also that an XML file can be validated. I don’t know the decisions that drove the Ant team to XML. However, it seems to me that Ant uses XML only halfheartedly. This wouldn’t be bad per se. The bad thing is that it circumvents XML good practices!

So what’s exactly my problem with Ant? I have these:

An XML Structure Without a Schema is Useless
Ok, to be fair, not every XML structure needs a Schema. Very simple ones probably do not need one. However, if your structure becomes a bit more complicated, it is very important to have a Schema. It is even more important when your structure is used by other people and becomes widespread. How can people know that something is wrong when there is no Schema? Impossible! They have to run Ant every time or look into the documentation.
Redefining the XML Structure is Breaking XML
The taskdef is probably one of the worst elements in Ant. I have never seen this concept in any other XML structure before. Even when you have a Schema, taskdef circumvents the structure by introducing new elements just “accidentally”. This makes the whole concept of validation useless.
No Official Schema
Why use such “complicated” XML when no Schema is available? It does not seem very user-friendly to offer XML without an underlying foundation.
Why XML?
Although you can call me a “fan” of XML, I don’t think it is useful in every case. There are certainly reasons why the Ant team chose XML over another syntax. However, it should then provide a schema too. I’m not fully convinced of the advantages of XML for a build tool (given the lack of a good schema).

A Possible Solution

To improve the situation for XML users, I think the following steps would be worthwhile:

Provide a RELAX NG Schema (RNG)
In my humble opinion, this is the absolute minimum to work efficiently with Ant’s XML structure. The RNG should contain the core elements of Ant’s structure. Whether the RNG is liberal enough to allow further elements from other namespaces or not is a matter of design.
If your XML editor does not support RNG, there should be a DTD or W3C Schema available. However, the semantic power is not always the same with these schema languages.
Use The Expressive Power of RELAX NG
Ant’s XML structure sometimes contains conditions like: “Use only one attribute from this list.” This condition can be expressed with RNG’s choice element. The advantage of this approach: your XML editor helps you create valid XML while you write.
Get Rid of taskdef (And The Like)
This was not only totally unexpected, it is a nightmare for every XML editor (as described above). No Schema should be fooled with this “broken by design” method. There are other methods to introduce further elements, be it in a separate namespace or, for example, with an element usetask with an attribute name. This might not be as short as taskdef, but it is much safer.
Document The RELAX NG Schema
By documenting I mean not only a decent HTML page, but also some helpful tips inside the RELAX NG schema itself. Modern XML editors that support annotations can show a small tool tip to provide users with helpful information. This is very useful as the user does not have to search for the information; it comes automatically through the XML editor.
Another advantage: You can extract the annotations to create a separate HTML page, for example.
Provide a Conversion Tool
You can only take these “radical” steps when there is a good conversion tool from major version n to n+1 available. This could be an XSLT stylesheet that converts the taskdef into something more useful.
Offer a “Compact Syntax”
RELAX NG schemas can be written in two forms: as XML and as a compact syntax. This is very, very useful, as it allows users to take part even when they don’t like XML. It is also possible to convert between the two without problems. If you still need the validation structure, convert it into XML, validate it, and you are done.

Schema design is not as easy as it looks. There are lots of problems that you only see after some time. That’s probably the reason why the World Wide Web Consortium needs some time to declare a specification a W3C Recommendation.

XML can be good. But it can also be broken when it is badly designed.