Chapter 1. Unit Test: chapter.003.xml

In principle, the stylesheets will run with any conformant XSLT 3.0 processor. For many users, that means Saxon. Although earlier versions may work, Saxon 10.1 or later is recommended.

In principle, the instructions for using the stylesheets are straightforward: using your XSLT 3.0 processor of choice, transform your DocBook source documents with the docbook.xsl stylesheet in the xslt directory of the distribution.

In practice, for most users, running the stylesheets will require getting a Java environment configured appropriately. For many, one of the most significant challenges is getting all of the dependencies sorted out. Modern software development, for better or worse, often consists of one library relying on another which relies on another, etc.

The DocBook xslTNG stylesheets attempt to simplify this process, especially for the “out of the box” experience, by providing two convenience methods for running the stylesheets: a JAR file with a Main class, and a Python script that attempts, among other things, to make sure all of the dependencies are available.

If you’re an experienced Java user, you may prefer to run the stylesheets directly with Java.

Irrespective of which method you choose, running the stylesheets is simply a matter of processing your input document myfile.xml with xslt/docbook.xsl. For example:

$ saxon myfile.xml -xsl:xslt/docbook.xsl -o:myfile.html

The exact path to docbook.xsl will vary, of course, but it’s in the xslt directory of the distribution.

ⓘ

Note

The resulting HTML document contains references to CSS stylesheets and possibly JavaScript libraries. The output won’t look as nice in your browser if those resources aren’t available. They’re in the resources directory of the distribution. A quick and easy way to see the results is simply to send the output to the samples directory from the distribution. The resources have already been copied into that directory. In the longer run, you’ll want to make sure that they get copied into the output directory for each of your projects.

Alternatively, you can copy them to a web location of your choosing and point to them there. You can even point to them in the DocBook CDN, but beware that those are not immutable. The “current” version will change with every release and versioned releases will not persist indefinitely.

Change the resource-base-uri parameter to adjust the paths used in the output document.

Many aspects of the transformation can be controlled simply by setting parameters. It’s also possible to change the transformation by writing your own customization layer.

1. Using the Jar

The ZIP distribution includes a JAR file that you can run directly. That JAR file is $ROOT/libs/docbook-xslTNG-version.jar where “$ROOT” is whatever directory you chose to unzip the distribution into and version is the stylesheet version.

Assuming you unzipped the version XXX distribution into /home/ndw/xsltng, you can run the JAR like this:

java -jar /home/ndw/xsltng/libs/docbook-xslTNG-XXX.jar

Let’s try it out. Open a shell window and change to the samples directory (/home/ndw/xsltng/samples, assuming you unzipped the distribution as described above). Now run the java command:

$ java -jar ../libs/docbook-xslTNG-XXX.jar article.xml
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">
…more HTML here...
<nav class="bottom"></nav></body></html>

That big splash of HTML was your first DocBook document formatted by the stylesheets! Slightly more usefully, you can save that HTML in a file:

$ java -jar ../libs/docbook-xslTNG-XXX.jar article.xml \
        -o:article.html

If you now open article.html in your favorite web browser, you’ll see the transformed sample document which should look like …

The JAR file, run this way, accepts the same command line options as Saxon, with a few caveats:

No -x, -y, or -r options: The executable in the JAR file automatically configures Saxon to use a catalog-based resolver and points the resolver at a catalog that includes the files in the distribution.
No -init option: The DocBook xslTNG extension functions will be registered automatically.
Multiple -catalog options: You can repeat the -catalog option. All of the catalogs you specify will be searched before the default catalog.
Default stylesheet: If you do not specify a stylesheet with the -xsl option, the xslt/docbook.xsl stylesheet will be used automatically.
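The catalog search order described above can be modelled in a few lines. This is a minimal Python sketch, not the JAR’s actual resolver: catalogs are represented as plain dictionaries rather than XML catalog files, the URIs and paths are hypothetical, and it assumes that your own catalogs are consulted in the order you give them.

```python
from typing import Mapping, Optional, Sequence


def resolve(uri: str,
            user_catalogs: Sequence[Mapping[str, str]],
            default_catalog: Mapping[str, str]) -> Optional[str]:
    """Return the local path for uri, or None if no catalog maps it.

    Assumption: user catalogs are searched in command-line order, and the
    default catalog (the one covering the distribution) is searched last.
    """
    for catalog in [*user_catalogs, default_catalog]:
        if uri in catalog:
            return catalog[uri]
    return None  # unmapped: the original URI would be used as-is


# Hypothetical URIs and paths, for illustration only.
default = {"http://example.com/style.xsl": "/home/ndw/xsltng/xslt/style.xsl"}
mine = {"http://example.com/style.xsl": "/home/ndw/custom/style.xsl"}

print(resolve("http://example.com/style.xsl", [mine], default))
# /home/ndw/custom/style.xsl
```

Because the default catalog always comes last, a -catalog entry can override a distribution file without editing the distribution.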

2. Using the Python script

The ZIP distribution includes a Python script in the bin directory. This helper script is a convenience wrapper around Saxon. It sets up the Java classpath and automatically configures a catalog resolver and the DocBook extension functions.

☝

Important

The script requires the click and pygments packages, which you must install with pip before running the script. For example:

python3 -m pip install pygments==2.6.1 click

This script behaves much like the JAR file described in …. In particular, it accepts the same command line options as Saxon, with the same caveats.

The significant feature of the Python script is that it will attempt to sort out the dependencies for you. It assumes that you’ve used Maven to install the package and its dependencies, so you’ll have to have installed Maven. How you do that varies by platform, but your package manager probably has it.

The following command will ensure that you’ve downloaded all of the necessary dependencies. You only have to do this once.

$ mvn org.apache.maven.plugins:maven-dependency-plugin:2.4:get \
      -Dartifact=org.docbook:docbook-xslTNG:XXX

That might take a while.

The script will work through the dependencies that you have installed, and the things that they depend on, and construct a Java class path that includes them all.

The script stores its configuration in .docbook-xsltng.json in your home directory.

Options passed to the script are processed as follows: the initial options, all introduced by two hyphens, are interpreted by this script; all the remaining options are passed directly to Saxon.

The script options are:

--help: Prints a usage message.
--config:filename: Use filename as the configuration file. The default configuration file is .docbook-xsltng.json in your home directory.
--resources:dir: This option will copy the resources directory (the CSS and JavaScript files) from the distribution into the directory where your output files are going, dir. If dir is not specified, the script attempts to work out the directory from the -o option provided to Saxon. If no directory is specified and it can’t work out what the directory is, it does nothing.
--java:javaexec: Use javaexec as the Java executable. The default Java executable is the first one on your PATH.
--home:dir: Use dir as the DocBook xslTNG home directory. This should be the location where you unzipped the distribution. (You probably shouldn’t change this.)
--verbose: Enables verbose mode; it prints more messages about what it finds.
--debug: Enables debug mode. Instead of running the transformation, it will print out the command that would have been run.
--: Immediately stop interpreting options. Everything that follows this option will be passed to Saxon, even if it begins with two hyphens.
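The rule for dividing the command line between the script and Saxon can be sketched as follows. This is a hypothetical Python illustration of the behavior described above, not code taken from the script itself; it only splits the arguments and does not interpret the individual script options.

```python
from typing import List, Tuple


def split_options(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv into (script options, Saxon options).

    Leading arguments that start with two hyphens belong to the script.
    The first argument that does not, and everything after it, goes to Saxon.
    A bare "--" ends script option processing and is itself discarded.
    """
    script_opts: List[str] = []
    for i, arg in enumerate(argv):
        if arg == "--":
            return script_opts, argv[i + 1:]
        if not arg.startswith("--"):
            return script_opts, argv[i:]
        script_opts.append(arg)
    return script_opts, []


print(split_options(["--verbose", "--resources:out", "article.xml", "-o:out/article.html"]))
# (['--verbose', '--resources:out'], ['article.xml', '-o:out/article.html'])
```

Note that the first Saxon argument (here the source document) is what ends script option processing; use -- if a Saxon argument must itself begin with two hyphens.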

3. Run with Java

Assuming you’ve organized your class path so that all of the dependencies are available (you may find that using a tool like Gradle or Maven simplifies this process), simply run the Saxon class.

For Saxon HE, the class is net.sf.saxon.Transform. For Saxon PE and EE, the class is com.saxonica.Transform.
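For example, the following Python sketch assembles a command line for Saxon HE, PE, or EE from a list of JAR files. The JAR file names are hypothetical placeholders; use whatever your dependency tool actually resolves. It also passes the -init option described in the section on extension functions so that the DocBook extension functions are registered. Catalog configuration is not shown.

```python
import os
from typing import List, Sequence


def saxon_command(classpath: Sequence[str], source: str, output: str,
                  stylesheet: str = "xslt/docbook.xsl",
                  edition: str = "HE") -> List[str]:
    """Build a java command line that runs the Saxon Transform class."""
    # Saxon HE uses net.sf.saxon.Transform; PE and EE use com.saxonica.Transform.
    main_class = ("net.sf.saxon.Transform" if edition == "HE"
                  else "com.saxonica.Transform")
    return [
        "java",
        "-cp", os.pathsep.join(classpath),  # ":" on Unix, ";" on Windows
        main_class,
        "-init:org.docbook.xsltng.extensions.Register",  # register extension functions
        f"-s:{source}",
        f"-xsl:{stylesheet}",
        f"-o:{output}",
    ]


# Hypothetical JAR names; substitute the files on your own class path.
cmd = saxon_command(["libs/docbook-xslTNG-XXX.jar", "libs/saxon-he.jar"],
                    "myfile.xml", "myfile.html")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```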

4. Run with Docker

This is experimental.

The docker directory contains an experimental Dockerfile. Using Docker allows you to isolate the environment necessary to run the DocBook xslTNG Stylesheets from your local environment.

Using Docker is a three-step process. Step 0: you have to have installed Docker!

Build the docker image. In the docker directory, run the docker build command:
```
$ docker build -t docbook-xsltng .
```
The “-t” option provides a tag for the image; you can make this anything you want. There’s a VERSION build argument if you want to build a particular release. For example,
```
$ docker build --build-arg VERSION=0.9.14 -t docbook-xsltng .
```
will build a Docker image for the 0.9.14 release of the stylesheets irrespective of the version in the Dockerfile.
Run the image, for example:
```
$ docker run docbook-xsltng samples/article.xml
```
If you chose a different tag when you built the image, use that tag in place of docbook-xsltng in the run command. Everything after the image tag is passed as options to the docbook Python script.

ⓘ

Note

The script runs inside the container. It can’t, for example, see your local filesystem. The example above works because the distribution is unpacked inside the container, so the article.xml document it formats isn’t the one on your local filesystem.

You can use the Docker facilities for mounting directories to change what documents the script can access. For example:

$ docker run -v /tmp:/output -v /path/to/samples:/input \
       docbook-xsltng /input/article.xml chunk=index.html \
       chunk-output-base-uri=/output/

Assuming that the “samples” directory in the distribution is located at /path/to/samples, this will chunk the article.xml sample document that the script sees in /input (which is where you mounted samples) and it will write the output to /output (which is where you mounted /tmp).

When the script finishes, the chunked output will be in /tmp.

☞

Tip

If you choose to use Docker, you don’t have to rebuild the image every time a new stylesheet release occurs. You can simply mount the new xslt directory into the container like any other directory.

5. Extension functions

The stylesheets are distributed with several extension functions:

ext:cwd: Returns the “current working directory” where the processor is running.
ext:image-properties: Returns basic properties of an image, width and height.
ext:image-metadata: Returns much more comprehensive image properties and understands far more image types than ext:image-properties. Requires the … libraries.
ext:pygmentize: Runs the external … processor on a verbatim listing to add syntax highlighting.
ext:pygmentize-available: Returns true if the external … processor is available on the current system.
ext:xinclude: Performs XInclude processing. This extension supports the basic XPointer schemes, RFC 5147 fragment identifiers, and …, a scheme that supports searching in text documents.
ext:validate-with-relax-ng: Performs RELAX NG validation.

At the time of this writing, all of these extension functions require Saxon 10.1 or later. Make sure that the docbook-xslTNG-version.jar file is on your class path and use the Saxon -init option to load them:

-init:org.docbook.xsltng.extensions.Register

5.1. Extension function debugging

When an extension function fails, or produces results other than what you expect, it can sometimes be difficult to work out what happened. You can enable debugging messages by setting the system property org.docbook.xsltng.verbose.

Setting the property to the value “true” enables all of the debugging messages. For a more selective approach, set it to a comma-separated list of keyword values.

The following keywords are recognized:

registration: Enables messages related to function registration.
image-properties: Enables messages related to image properties.
image-errors: Enables messages related to image properties, but only when the function was unable to find the properties or encountered some sort of error condition.
pygmentize-show-command: Enables a message that will show the pygmentize command as it was run.
pygmentize-show-results: Enables a message that will show the output of the pygmentize command, before it is processed by the function.
pygmentize-errors: Enables messages related to errors encountered attempting to highlight listings with pygmentize.
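The property is an ordinary Java system property, so it is set with the java -D option placed before -jar or the main class name, for example java -Dorg.docbook.xsltng.verbose=registration,image-errors -jar docbook-xslTNG-XXX.jar article.xml. The Python sketch below models how such a value is interpreted. It goes beyond the text above in two assumptions: surrounding whitespace is ignored, and unrecognized keywords are silently dropped.

```python
from typing import FrozenSet, Optional

KEYWORDS: FrozenSet[str] = frozenset({
    "registration", "image-properties", "image-errors",
    "pygmentize-show-command", "pygmentize-show-results", "pygmentize-errors",
})


def debug_keywords(value: Optional[str]) -> FrozenSet[str]:
    """Return the debugging keywords enabled by the verbose property value."""
    if value is None:  # property not set: no debugging messages
        return frozenset()
    value = value.strip()
    if value == "true":  # "true" enables every message
        return KEYWORDS
    # Otherwise a comma-separated keyword list (assumption: unknown ones are ignored).
    return frozenset(k.strip() for k in value.split(",")) & KEYWORDS


print(sorted(debug_keywords("registration, image-errors")))
# ['image-errors', 'registration']
```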

6. “Chunked” output

Transforming myfile.xml with docbook.xsl usually produces a single HTML document. For large documents, books like this one for example, it’s sometimes desirable to break the input document into a collection of web pages. You can achieve this with the DocBook xslTNG Stylesheets by setting two parameters:

chunk

This parameter should be set to the name that you want to use for the first, or top, page of the result. The name index.html is a common choice.

chunk-output-base-uri

This parameter should be set to the absolute path that you want to use as the base URI for the result documents, for example /home/ndw/output/guide/.

ⓘ

Note

The trailing slash is important; this is a URI. If you specify only /home/ndw/output/guide, the base URI will be taken to be /home/ndw/output/, and the documents won’t have the URIs you expect.
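Python’s standard urllib.parse.urljoin applies the usual relative-reference resolution rules, so it shows the effect of the missing slash directly. This only illustrates URI resolution in general; it is not how the stylesheets compute chunk names.

```python
from urllib.parse import urljoin

# Without a trailing slash, "guide" is treated as a file name and is replaced.
without_slash = urljoin("file:///home/ndw/output/guide", "index.html")

# With a trailing slash, "guide/" is a directory and the chunk lands inside it.
with_slash = urljoin("file:///home/ndw/output/guide/", "index.html")

print(without_slash)  # file:///home/ndw/output/index.html
print(with_slash)     # file:///home/ndw/output/guide/index.html
```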

This output URI has nothing to do with where your documents are ultimately published and the documents themselves won’t contain any references to it. It simply establishes the root of the output. If you’re running your XSLT processor from the command line, it’s likely that the documents will be written to that location. If you’re running an XProc pipeline, it simply controls the URIs that appear on the secondary result port.

Many aspects of chunking can be easily customized. A few of the most relevant parameters and templates are:

chunk-include and chunk-exclude: Taken together, these two parameters determine what elements in your source document will be considered “chunks” in the output.
persistent-toc: If this parameter is true, then a JavaScript “fly-out” table of contents will be available on every page.
chunk-nav: This parameter, discussed more thoroughly in the section on keyboard navigation and speaker notes, enables keyboard navigation between chunks.
t:top-nav and t:bottom-nav: These templates control how the top-of-page and bottom-of-page navigation aids are constructed.

6.1. Keyboard navigation and speaker notes

If the chunk-nav parameter is true, a reference to an additional JavaScript library will be included in the resulting pages. This library supports keyboard navigation between the pages. The navigation keys are described in the parameter reference page.

There is an additional customization layer (xslt/speaker-notes.xsl) provided for adding speaker notes to the pages. This is provided both as an example of a customization layer and because the author finds it convenient.

If you use the speaker notes customization layer, any top-level element in a chunk with the role “speaker-notes” will be suppressed from the default presentation. If you press “S” on the page, you’ll get a “speaker notes” view of the page.

This can be combined with another extension, the use of browser local storage, to create a simple presentation system. Add this meta tag to the info element of your document:

<meta xmlns="http://www.w3.org/1999/xhtml"
      name="localStorage.key" content="keyname"/>

That will cause the pages to keep track of their location using the “keyname” property in local storage. This is important because it enables the following trick:

Configure keyboard navigation, speaker notes, and local storage in your document.
Arrange for your document to be served up from a web server. You can do this by running one locally or by putting the documents on a web server somewhere else.
Open up the main page of your document in a browser.
Open up a second browser window pointing to the same page. Navigate back and forth between the pages. You should see that the two windows stay in sync.
Now press “S” in one of the windows and navigate around. You should see that the two windows stay in sync and that your speaker notes are consistently presented in one of the windows.

I often use this trick when I’m giving presentations. I can project the slides in one browser window and keep the other browser window on my laptop. This allows me to see my notes while easily projecting the “real” content.

7. Effectivity attributes and profiling

When documenting computer hardware and software systems, it’s very common to have different documentation sets that overlap significantly. Documentation for two different models of network router, for example, might differ only in a few specific details. Or a user guide aimed at experts might have a lot in common with a guide for new users.

7.1. Effectivity

There are many ways to address this problem, but one of the simplest is to identify the “effectivity” of different parts of a document. Effectivity in this context means identifying the parts of a document that are effective for different audiences.

When a document is formatted, the stylesheets can selectively include or omit elements based on their configured effectivity. This “profiled” version of the document is the one that’s explicitly targeted to the audience specified.

DocBook supports a wide variety of common attributes for this purpose:

Table 1.1. Common DocBook effectivity attributes

Attribute	Nominal effectivity axis
arch	The architecture, Intel or AMD
audience	The audience, operations or development
condition	Any condition (semantically neutral)
conformance	The conformance level
os	The operating system, Windows or Linux
outputformat	The output format, print or online
revision	The revision, 3.4 or 4.0.
security	The security, secret or top-secret
userlevel	The user level, novice or expert
vendor	The vendor, Acme or Yoyodyne
wordsize	The word size, 32 or 64 bit

In addition, the stylesheets support profiling on several common attributes that are not explicitly for effectivity: xml:lang, revisionflag, and role.

ⓘ

Note

DocBook places no constraints on the values used for effectivity and the stylesheets don’t either. You’re free to use “cat” and “dog” as effectivity values in the wordsize attribute, if you wish. The further you deviate from the nominal meaning, the more important it is to document your system!

Consider ….

<para>This is an utterly contrived example of
some common text. Options are introduced with the
<phrase os="windows">/</phrase>
<phrase os="mac;linux">-</phrase> character.</para>

Example 1.1. A contrived effectivity example

If this document is formatted with the profile-os parameter set to “windows”, it will produce:

This is an utterly contrived example of some common text. Options are introduced with the / character.

If “mac” or “linux” is specified, it will produce:

This is an utterly contrived example of some common text. Options are introduced with the - character.

☝

Important

If the document is formatted without any profiling, all of the versions will be included:

This is an utterly contrived example of some common text. Options are introduced with the / - character.

That is unlikely to work well.

7.2. Profiling

The profiling parameters are applied to every document: profile-arch, profile-audience, profile-condition, profile-conformance, profile-lang, profile-os, profile-outputformat, profile-revision, profile-revisionflag, profile-role, profile-security, profile-userlevel, profile-vendor, and profile-wordsize. Each of these values is treated as a string and broken into tokens at the profile-separator.

For every element in the source document:

If it specifies a value for an effectivity attribute, the value is split into tokens at the profile-separator.
If the corresponding profile parameter is not empty, then the element is discarded unless at least one of the tokens in the profile parameter list is also in the effectivity list.

Consequently, elements that don’t specify effectivity are always included, and profile parameters that are empty don’t exclude any elements.
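The rule can be sketched in Python as follows. This illustrates the logic only; it is not the stylesheets’ XSLT implementation. It assumes a semicolon as the profile-separator (as in the contrived example above) and represents an element’s effectivity attributes and the profile parameters as dictionaries keyed by attribute name.

```python
from typing import Dict, List


def tokens(value: str, separator: str = ";") -> List[str]:
    """Split an attribute or parameter value at the profile separator."""
    return [token for token in value.split(separator) if token]


def is_included(effectivity: Dict[str, str], profile: Dict[str, str],
                separator: str = ";") -> bool:
    """Apply the static profiling rule to one element.

    effectivity: the element's effectivity attributes, e.g. {"os": "mac;linux"}
    profile: profile parameter values keyed by attribute, e.g. {"os": "windows"}
    """
    for attr, value in effectivity.items():
        wanted = set(tokens(profile.get(attr, ""), separator))
        if not wanted:
            continue  # an empty profile parameter excludes nothing
        if not wanted & set(tokens(value, separator)):
            return False  # no token in common: discard the element
    return True  # elements without effectivity are always included


# The contrived example above, profiled for Windows.
print(is_included({"os": "windows"}, {"os": "windows"}))    # True
print(is_included({"os": "mac;linux"}, {"os": "windows"}))  # False
```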

7.3. Dynamic profiling

Dynamic profiling is a feature that allows you to profile the output of the stylesheets according to the runtime values of stylesheet parameters. You can, for example, produce different output depending on whether or not chunking is enabled or JavaScript is being used for annotations.

To enable dynamic profiling, set the dynamic-profiles parameter to “true”.

In the interest of performance, security, and legibility, dynamic profiles don’t support arbitrary expressions. You can use a variable name by itself, $flag, which tests if that variable is true, or you can use a simple comparison, $var=value, which tests if (the string value of) $var has the value value. (If $var is a list, it’s an existential test.) You also can’t use boolean operators or any other fancy expressions.

If you really need to have a dynamic profile based on some arbitrary condition, you can do it by making a customization layer that stores that computation in a variable and then testing that variable in your dynamic profile.

An element with dynamic profiling will be published if none of its profile expressions evaluate to false. This is slightly different from the ordinary profiling semantic, which publishes the element if any of its values match.
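The difference between the two semantics can be sketched in Python. This is an illustration, not the stylesheets’ implementation: the parameter names and values are hypothetical, an element’s expressions are assumed to arrive as a list, and Python’s bool() stands in for the stylesheets’ notion of a true value, which may differ in detail.

```python
from typing import Dict, List, Sequence, Union

Param = Union[bool, str, Sequence[str], None]


def as_strings(value: Param) -> List[str]:
    """String value(s) of a parameter; a list yields one string per item."""
    if value is None:
        return []
    if isinstance(value, bool):
        return ["true" if value else "false"]
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


def evaluate(expr: str, params: Dict[str, Param]) -> bool:
    """Evaluate "$flag" or "$var=value"; anything fancier is rejected."""
    expr = expr.strip()
    if not expr.startswith("$"):
        raise ValueError(f"unsupported dynamic profile expression: {expr!r}")
    name, eq, expected = expr[1:].partition("=")
    value = params.get(name)
    if not eq:
        return bool(value)  # "$flag": is the variable true?
    return expected in as_strings(value)  # "$var=value": existential for lists


def publish_dynamic(exprs: Sequence[str], params: Dict[str, Param]) -> bool:
    """Publish unless some expression is false (ordinary profiling uses "any")."""
    return all(evaluate(e, params) for e in exprs)


# Hypothetical parameter values.
params: Dict[str, Param] = {"chunk": "index.html", "persistent-toc": False,
                            "formats": ["html", "epub"]}
print(publish_dynamic(["$chunk", "$formats=epub"], params))  # True
```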

8. Syntax highlighting

Program listings and other verbatim environments can be “syntax highlighted”, that is, the significant tokens in the listing can be colored differently (keywords in red, quoted strings in blue, that sort of thing).

The default syntax highlighter is …, an external Python program. This has the advantage that the highlighted listing is available to the stylesheets. The stylesheets can then render line numbers, callouts, and other features.

But running an external program for every verbatim environment requires that the program be installed and, if there are many verbatim environments, may slow down the formatting process.

An alternative is to use a JavaScript highlighter in the browser such as highlight.js or Prism. This approach has no impact on formatting and doesn’t require an external process. However, it means the xslTNG Stylesheets have no control over the process. Most of the verbatim options only apply when Pygments is used.

The choice of syntax highlighter is determined by the verbatim-syntax-highlighter parameter.

9. Persistent ToC

The persistent Table of Contents (ToC) provides a full ToC for an entire document accessible from each chunked page.

The ToC is accessed by clicking on the “book” icon in the upper right corner of the page as shown in ….

The icon and other aspects of the style can be changed by providing persistent-toc-css.

Once opened, a long ToC is scrolled to the location of the current page in the document, as shown in ….

The persistent ToC popup is transient by default, meaning that it will disappear if you use it to navigate to a different page. If you open the popup by “shift-clicking” on it, the ToC will persist until you dismiss it. This can also be accomplished by selecting the check box in the ToC. The presence of the search bar is controlled by the persistent-toc-search parameter.

The ToC can be stored in a separate file or stored in each chunk. This is controlled by the persistent-toc-filename parameter.

If chunking is enabled and the persistent-toc-filename parameter is non-empty, it’s used as a filename and a single copy of the ToC will be saved in that file.

The benefit of this approach is that the HTML chunks are smaller. If the persistent ToC is written into every chunk, the size of each HTML chunk increases in proportion to the size of the ToC. For a large document with lots of small pages, this can be a significant percentage of the overall size.

The disadvantage of this approach is that opening the ToC requires another document to be loaded into the browser. For a large ToC, this can introduce visible latency, although browser caching tends to reduce that after the document has been loaded once. More significantly, this will not work unless the documents are served with http (and in some environments, perhaps https). The browser will (quite reasonably) not allow JavaScript to load documents from the filesystem.

If the persistent-toc-filename parameter is the empty sequence, a copy of the ToC is stored in each chunk.

ⓘ

Note

When stored in each chunk, the Table of Contents is secreted away in a script element so that it will be ignored by screen readers and other user agents that don’t support JavaScript or CSS.

The benefit of this approach is that it requires no additional document to be loaded and will work even if the documents are loaded with file URIs.

The disadvantage of this approach is that it increases the size of each chunk. Whether that matters depends on the size of the ToC, the relative size of the chunks, bandwidth, and other constraints.

If chunking is not being used, there will only be one HTML result and the ToC will always be stored in that chunk.
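The placement rules can be summarized as a small Python function. It is only a restatement of the rules above, not stylesheet code; None stands in for the empty sequence, treating an empty string the same way is an assumption, and the file name in the example is hypothetical.

```python
from typing import Optional


def toc_location(chunking: bool, persistent_toc_filename: Optional[str]) -> str:
    """Where the persistent ToC ends up, following the rules described above."""
    if not chunking:
        return "the single HTML result"
    if persistent_toc_filename:  # non-empty: one shared copy in its own file
        return f"a separate file, {persistent_toc_filename}"
    return "every chunk"  # empty sequence: a copy embedded in each chunk


print(toc_location(True, "toc.html"))  # a separate file, toc.html
```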

10. Print output (dead tree editions)

Formatters, the tools that turn markup of any sort into aesthetically pleasing (or even passably acceptable) printed pages, are fiendishly difficult to write.

In the XML space, there have been a number of standards and vendor-specific solutions to this problem. The current standards are XSL FO and CSS.

At present, the DocBook xslTNG Stylesheets are focused on CSS for print output. There’s a customization layer that produces “paged-media-ready” HTML that can be processed with a CSS formatter such as Antenna House or Prince.

To get print output, format your documents with the print.xsl stylesheet instead of the docbook.xsl stylesheet. The additional cleanup provided by print.xsl ensures that footnotes, annotations, and other elements will appear in the right place, and with reasonable presentation, in the printed version.

The resulting HTML document can be formatted directly with a CSS paged-media formatter.

11. EPUB output

The DocBook xslTNG Stylesheets will produce output designed for EPUB 3 if you use the epub.xsl stylesheet instead of docbook.xsl. This is new in version 1.11.0 and may be incomplete. The output validates with EPUBCheck version 3.2.

Producing an EPUB file is a slightly complicated process. You must produce (X)HTML that conforms to strict requirements, you must produce a media type document containing a specific text string, you must produce a manifest that identifies all of the content including all the images, stylesheets, fonts, etc., and you must finally create a ZIP archive (with some special considerations as well).

The stylesheets can only do part of this process. In some future release where we use, for example, an XProc 3.0 pipeline, it may be practical to do more.

When you run the EPUB stylesheet, the principal result document is the media type document. This has two useful side effects: first, it establishes the output base URI from which all the relative documents can be created, and second, if you fail to process some element in the input, you’re likely to get extra text characters in the principal result document. That will cause tools to reject the EPUB and draw your attention to the error.

The stylesheets also produce the META-INF files and the OPS directory containing the document parts and the manifest.
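The final step, building the ZIP archive, is left to you. The EPUB Open Container Format requires the mimetype file to be the first entry in the archive and to be stored without compression. A minimal Python sketch using the standard zipfile module follows. It assumes that the stylesheets wrote mimetype, META-INF, and OPS into a single output directory, and that the .epub file is created outside that directory; it does not check the manifest or add anything the stylesheets didn’t produce.

```python
import zipfile
from pathlib import Path


def package_epub(root: str, epub_path: str) -> None:
    """Zip the EPUB output directory at root into epub_path."""
    root_path = Path(root)
    mimetype = root_path / "mimetype"
    with zipfile.ZipFile(epub_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # OCF rule: "mimetype" must be the first entry and must not be compressed.
        zf.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else (META-INF, OPS, ...) may be deflated, in any order.
        for path in sorted(root_path.rglob("*")):
            if path.is_file() and path != mimetype:
                zf.write(path, path.relative_to(root_path).as_posix())
```

For example, package_epub("out", "book.epub") would archive an out directory produced by epub.xsl (the directory and file names here are hypothetical).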

There are two parameters specific to EPUB:

pub-id: This is the unique identifier for your EPUB. If you don’t specify one, a random one will be generated for you.
manifest-extra: This is a URI. If it’s specified, it must identify an XML document, which will be added to the EPUB manifest. This is how you can add links to media and other resources that the stylesheets don’t know about.

11.1. Adding metadata

You can add elements to the info element of the root element of your document to add metadata to your EPUB files. Elements in the Dublin Core namespace will be copied through. You can also add the elements meta and link in the special namespace http://docbook.org/ns/docbook/epub.

11.2. EPUB in action

The Getting Started project has been updated to show how to create EPUB from a book. The project has support for dealing with external media, fonts, and constructing the final ZIP file.