Chapter 3. Customizing the stylesheets

In many circumstances, the stylesheets can be used “out of the box” without any customization. But sometimes you may need to change the formatting of certain elements. One common reason is to change the formatting of title pages or navigational features. In other cases, it may be to support local extensions to DocBook or simply to change the markup to support a particular use case.

Three approaches are possible, with increasing degrees of effort: changing stylesheet parameters, creating your own customization layer, or making broader changes to the stylesheet’s implementation.

Broader implementation changes are the subject of Chapter 5, Implementation details. In this chapter, we’ll look at the easier options.

3.1 Changing stylesheet parameters

The DocBook xslTNG Stylesheets define a lot of parameters. They are all described in the Parameter reference. If the change you want to make has already been parameterized, you may be able to achieve your goal simply by setting a parameter at runtime.

For example, if you want to change the formatting of dates and times in date elements, you can simply change the date and time formatting parameters. Similarly, if you want to change the numeration style of ordered lists, you can simply change the ordered list item numeration parameter.

These changes can be accomplished by simply passing the new values to the processor, on the command line or in a configuration file, for example. You do not have to write any XSLT to make these changes.

Parameter values apply to the entire document processed by the stylesheets. In some cases, you may wish to change the presentation of just one or a small number of elements. This can often be accomplished with a db processing instruction in the source document itself. These customizations can also be accomplished without writing any XSLT.

If you want to make a change that isn’t supported by a parameter, or an ad hoc exception that doesn’t have a supporting processing instruction, you will have to write a customization layer. (You are invited to submit an issue with your use case if you think it would be of general interest.)

You may also find it convenient to write a customization layer if you want to change several parameters and you find it inconvenient to pass them all to the processor on every invocation.

3.2 Creating a customization layer

A customization layer is simply an XSLT stylesheet that you write which extends the DocBook stylesheets. The simplest* customization layer is:

   |<?xml version="1.0" encoding="utf-8"?>
   |<xsl:stylesheet
   |    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   |    xmlns:db="http://docbook.org/ns/docbook"
   |    xmlns:xs="http://www.w3.org/2001/XMLSchema"
   |    xmlns="http://www.w3.org/1999/xhtml"
   |    exclude-result-prefixes="db xs"
   |    version="3.0">
   | 
   |<!-- This href has to point to your local copy
   |     of the stylesheets. -->
   |<xsl:import href="docbook/xslt/docbook.xsl"/>
   | 
   |</xsl:stylesheet>

This customization doesn’t do anything. But you can, for example, redefine parameters if you wish:

   |<?xml version="1.0" encoding="utf-8"?>
   |<xsl:stylesheet
   |    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   |    xmlns:db="http://docbook.org/ns/docbook"
   |    xmlns:xs="http://www.w3.org/2001/XMLSchema"
   |    xmlns="http://www.w3.org/1999/xhtml"
   |    exclude-result-prefixes="db xs"
   |    version="3.0">
   | 
   |<xsl:import href="docbook/xslt/docbook.xsl"/>
   | 
   |<xsl:param name="orderedlist-item-numeration"
   |           select="'1'"/>
   | 
   |<xsl:param name="date-dateTime-format"
   |           select="'[D01] [MNn,*-3] [Y0001]
   |                   at [H01]:[m01]'"/>
   | 
   |</xsl:stylesheet>

This will have the effect of running the DocBook stylesheets with those two parameters set as specified.

If you want to change the HTML output for an element, you can write a template for that element in your customization layer. Consider this DocBook document:

   |<?xml version="1.0" encoding="utf-8"?>
   |<article xmlns="http://docbook.org/ns/docbook"
   |         version="5.1">
   |<info>
   |<title>Sample Document</title>
   |<date>2020-07-05</date>
   |</info>
   | 
   |<para>This is a sample <productname>DocBook</productname>
   |document.</para>
   | 
   |</article>

Suppose that you decided you wanted to have the productname element link automatically to the vendor webpage.

Important

The DocBook xslTNG Stylesheets process all DocBook elements in the m:docbook mode. This is different from previous XSLT stylesheets for DocBook which simply used the default mode.

You must either specify a default mode in your customization layer or remember to specify the mode on match templates and template applications. If you forget the mode, you’ll get unexpected results!

One way to do that would be to redefine the template that processes the productname element:

   |<?xml version="1.0" encoding="utf-8"?>
   |<xsl:stylesheet
   |    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   |    xmlns:db="http://docbook.org/ns/docbook"
   |    xmlns:m="http://docbook.org/ns/docbook/modes"
   |    xmlns:xs="http://www.w3.org/2001/XMLSchema"
   |    xmlns="http://www.w3.org/1999/xhtml"
   |    exclude-result-prefixes="db m xs"
   |    version="3.0">
   | 
   |<xsl:import href="docbook/xslt/docbook.xsl"/>
   | 
   |<xsl:param name="orderedlist-item-numeration"
   |           select="'1'"/>
   | 
   |<xsl:param name="date-dateTime-format"
   |           select="'[D01] [MNn,*-3] [Y0001]
   |                   at [H01]:[m01]'"/>
   | 
   |<xsl:template match="db:productname"
   |              mode="m:docbook">
   |  <xsl:variable name="name"
   |                select="normalize-space(.)"/>
   |
   |  <xsl:variable name="url" as="xs:string?">
   |    <xsl:choose>
   |      <xsl:when test="$name='DocBook'">
   |        <xsl:sequence select="'https://docbook.org/'"/>
   |      </xsl:when>
   |      <xsl:when test="$name='DocBook xslTNG Stylesheets'">
   |        <xsl:sequence select="'https://xsltng.docbook.org/'"/>
   |      </xsl:when>
   |      <xsl:when test="$name='Wikipedia'">
   |        <xsl:sequence select="'https://wikipedia.org/'"/>
   |      </xsl:when>
   |      <xsl:otherwise>
   |        <!-- Unrecognized -->
   |      </xsl:otherwise>
   |    </xsl:choose>
   |  </xsl:variable>
   |
   |  <xsl:choose>
   |    <xsl:when test="empty($url)">
   |      <xsl:next-match/>
   |    </xsl:when>
   |    <xsl:otherwise>
   |      <a href="{$url}" title="Home page">
   |        <xsl:next-match/>
   |      </a>
   |    </xsl:otherwise>
   |  </xsl:choose>
   |</xsl:template>
   | 
   |</xsl:stylesheet>

A few notes on this listing:

The xmlns:m declaration is there because all of the DocBook elements are processed in the “m:docbook” mode, so the customization layer needs a prefix for the modes namespace.

In exclude-result-prefixes, remember to exclude all the namespaces you declare so that they don’t wind up scattered about in your HTML.

The template specifies mode="m:docbook". I repeat, all of the DocBook elements are processed in the “m:docbook” mode. I expect failure to declare this mode is going to be a common error.

Yes, this whole listing is rather cramped. I’m trying to make it all narrow enough to fit in the display without making horizontal scrolling necessary.

Calling xsl:next-match invokes the underlying processing. The effect of this template is to wrap an HTML “a” around the default processing for productname.

It’s worth pointing out that if the productname element has an xlink:href attribute, that will generate an HTML “a” as well. A more robust stylesheet would check for that, but I’m trying to keep the example simple.
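
As a rough sketch of both points, a variant of the customization layer could declare a default mode on the stylesheet element (an XSLT 3.0 feature) and leave elements that already carry an xlink:href attribute to the default processing. This is only an illustration under those assumptions, not the stylesheets’ own code; it handles a single product name to stay short.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook"
    xmlns:m="http://docbook.org/ns/docbook/modes"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.w3.org/1999/xhtml"
    default-mode="m:docbook"
    exclude-result-prefixes="db m xlink"
    version="3.0">

<!-- This href has to point to your local copy of the stylesheets. -->
<xsl:import href="docbook/xslt/docbook.xsl"/>

<!-- No mode attribute here: default-mode on xsl:stylesheet places
     this template in the m:docbook mode. The first predicate skips
     elements that already have an xlink:href, so they fall through
     to the default processing untouched. -->
<xsl:template match="db:productname[not(@xlink:href)]
                     [normalize-space(.) = 'DocBook']">
  <a href="https://docbook.org/" title="Home page">
    <xsl:next-match/>
  </a>
</xsl:template>

</xsl:stylesheet>
```

Note that default-mode only affects the module in which it is declared, so templates in any other module you include or import still need an explicit mode.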

3.3 Managing CSS stylesheets

The HTML that the DocBook xslTNG Stylesheets produce is intended to be clean, robust markup for styling with CSS. Exactly how you control which stylesheet links are produced has changed several times. The current scheme is this:

  1. If syntax highlighting is enabled, a link to the $verbatim-syntax-highlight-css stylesheet is included.

  2. If $persistent-toc is true, a link to the $persistent-toc-css stylesheet is included.

  3. If $use-docbook-css is true, links to the standard DocBook stylesheets are included. Those stylesheets are docbook.css (for all media), docbook-screen.css (for screen media), and docbook-page-setup.css and docbook-paged.css (for print media).

  4. The DocBook element that is the context element when the HTML head is being generated is processed in the m:html-head-links mode. By default, that template does nothing, but you can change that in a customization layer.

  5. If any CSS stylesheets are defined in $user-css-links, they are included.

  6. The DocBook element that is the context element when the HTML head is being generated is processed in the m:html-head-last mode. By default, that template does nothing, but you can change that in a customization layer.
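
To illustrate step 4, a customization layer might add its own stylesheet link with a template in the m:html-head-links mode. This is a minimal sketch: the match pattern “*” and the file name css/house-style.css are assumptions made for the example, not something the stylesheets prescribe.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="http://docbook.org/ns/docbook/modes"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="m"
    version="3.0">

<!-- This href has to point to your local copy of the stylesheets. -->
<xsl:import href="docbook/xslt/docbook.xsl"/>

<!-- Emit one extra link in the HTML head.
     css/house-style.css is a hypothetical file name. -->
<xsl:template match="*" mode="m:html-head-links">
  <link rel="stylesheet" href="css/house-style.css"/>
</xsl:template>

</xsl:stylesheet>
```

Because this link is produced in step 4, it follows the standard DocBook stylesheets from step 3 in the head, so rules in it can override theirs under the usual CSS cascade.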

3.4 Managing media

References to external media through imagedata, videodata, audiodata, and even textdata can be tricky to manage. On the one hand, it’s most convenient if the URIs in the source documents point to the actual media files. This allows extensions, like the image properties extension function, to access the files. At the same time, the references generated in the HTML have to point to the locations where they will be published. It is often, but not always, the case that the authoring structures and the publishing structures are the same.

The stylesheets are regularly tested against five possible arrangements: three where the media are stored in locations relative to the XML files and two where the media are stored in a separate hierarchy. These are unimaginatively named “mo-1”, “mo-2”, “mo-3”, “mo-4”, and “mo-5”. You can find them in the src/test/resources/xml hierarchy in the repository.

mo-1

All of the XML files are in a single directory, and the media are in the same hierarchy. Media references in the source use relative URIs to refer to the underlying media: preface.xml refers to the “this is a test” audio clip as media/this-is-a-test.mp3.

mo-2

The XML files are in different directories (this changes the base URI of the media elements). The media are in the same hierarchy. Media references in the source use relative URIs to refer to the underlying media: front/preface.xml refers to the “this is a test” audio clip as ../media/spinning-top.mp4.

mo-3

The XML files are in different directories, but the structure is deeper. This scenario represents the case where there might be multiple books, each with their own media, but also a shared media folder “above” the book hierarchies. The media are in the same hierarchy, but some are “above” the book. Media references in the source use relative URIs to refer to the underlying media: book/front/preface.xml refers to the “this is a test” audio clip as ../../media/spinning-top.mp4.

mo-4

The XML files are still in different directories, but the significant change here is that the media are in their own hierarchy. Media references in the source use URIs relative to the root of that hierarchy: book/front/preface.xml refers to the “this is a test” audio clip as spinning-top.mp4.

mo-5

The XML files are in different directories and the media are in their own hierarchy. What’s different here is that the media hierarchy is further subdivided by media type. Media references in the source use URIs relative to the root of the media hierarchy without the media type: book/front/preface.xml still refers to the “this is a test” audio clip as spinning-top.mp4, but this time it is found in media/mp4/spinning-top.mp4 rather than directly in media.

For each arrangement, we look at five possible output structures:

  1. A single HTML document with the media in the same relative locations as the sources.

  2. A single HTML document with the media in a single media subdirectory.

  3. “Chunked” HTML output with the media in the same relative locations as the sources.

  4. “Chunked” HTML output with the media in custom locations. (This is especially tricky for the “mo-5” case because there are two kinds of customization involved.)

  5. “Chunked” HTML output with the media in a single media subdirectory.

The list below gives a brief summary of the parameters used to achieve the desired results for each combination of input and output arrangements.

Note

Remember that in each case, the questions are: can the stylesheets find the media files to query them and are the correct HTML references produced? Actually copying the media files from where they are in the source system to where they need to be in the HTML is “not our problem.”

mo-1, mo-2, and mo-3 / scenario 1

No parameters are needed; this combination works correctly with the defaults.

mo-1, mo-2, and mo-3 / scenario 2
  |mediaobject-output-base-uri = "media/"
  |mediaobject-output-paths = "false"

The output base URI is relative to the “root” of the HTML result. Setting the output paths to “false” removes intermediate hierarchy from the image references.

mo-1, mo-2, and mo-3 / scenario 3
  |chunk = "index.html"
  |chunk-output-base-uri = "/path/to/output/location/"

These parameters aren’t related to media objects; they just tell the stylesheets how and where to “chunk” the output.

mo-1, mo-2, and mo-3 / scenario 4
  |chunk = "index.html"
  |chunk-output-base-uri = "/path/to/output/location/"

This combination is really the same as the previous except that it uses a custom stylesheet with a template in the m:mediaobject-output-adjust mode to add an extra level of hierarchy to the output URIs. This is just an example of arbitrary, custom processing.

mo-1, mo-2, and mo-3 / scenario 5
  |chunk = "index.html"
  |chunk-output-base-uri = "/path/to/output/location/"
  |mediaobject-output-base-uri = "media/"
  |mediaobject-output-paths = "false"

The output base URI is relative to the “root” of the HTML result. Setting the output paths to “false” removes intermediate hierarchy from the image references.

mo-4 / scenario 1
  |mediaobject-input-base-uri = "../media/"

The input base URI will be made absolute relative to the base URI of the input document, so it’s often convenient to specify it as a relative URI. It’s equally possible to specify it as an absolute URI.

mo-4 / scenario 2
  |mediaobject-input-base-uri = "../media/"
  |mediaobject-output-base-uri = "media/"
  |mediaobject-output-paths = "true"

This example has two images with the same name in different directories, so it’s necessary to preserve the output paths.

mo-4 / scenario 3
  |chunk = "index.html"
  |chunk-output-base-uri = "/path/to/output/location/"
  |mediaobject-input-base-uri = "../media/"

This is the combination of chunking and a single media directory.

mo-4 / scenario 4
  |chunk = "index.html"
  |chunk-output-base-uri = "/path/to/output/location/"
  |mediaobject-input-base-uri = "../media/"

This combination is really the same as the previous except that it uses a custom stylesheet with a template in the m:mediaobject-output-adjust mode to add an extra level of hierarchy to the output URIs. This is just an example of arbitrary, custom processing.

mo-4 / scenario 5
  |chunk = "index.html"
  |chunk-output-base-uri = "/path/to/output/location/"
  |mediaobject-input-base-uri = "../media/"
  |mediaobject-output-base-uri = "media/"
  |mediaobject-output-paths = "true"

This is effectively scenario 2 with chunking.

mo-5 / scenarios 1-5

The “mo-5” scenarios are all the same as the “mo-4” scenarios with the addition of one more parameter:

  |mediaobject-grouped-by-type = "true"

In each case, this adds the extra “media object type” level to the URI path.

If you download the source repository, you can see these combinations in action with the build targets named “mo_number_test_scenario”, where number identifies the arrangement and scenario identifies the output structure. For example, run:

  |./gradlew mo_3_test_2

to see the results of processing “mo-3” in scenario 2. The output will be in the build/actual directory. The build target all_mo_tests will run them all.
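
If you would rather not pass these values on every invocation, the same settings can be fixed in a customization layer, as described in Section 3.2. The sketch below shows the “mo-4 / scenario 2” combination; it assumes the parameters accept the string values exactly as they are written in the list above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

<!-- This href has to point to your local copy of the stylesheets. -->
<xsl:import href="docbook/xslt/docbook.xsl"/>

<!-- mo-4 / scenario 2: media live in their own hierarchy and are
     published under a single media/ directory, keeping their paths. -->
<xsl:param name="mediaobject-input-base-uri" select="'../media/'"/>
<xsl:param name="mediaobject-output-base-uri" select="'media/'"/>
<xsl:param name="mediaobject-output-paths" select="'true'"/>

</xsl:stylesheet>
```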

3.5 Controlling numeration

Numeration refers to the process(es) by which sets, books, divisions, components, sections, and formal objects are numbered. There are three separate aspects to numeration: what is numbered, where numbering begins, and whether a number inherits from its ancestors.

Consider this book:

   |<book>
   |  <title>Book title</title>
   |  <part>
   |    <title>Part title</title>
   |    <chapter>
   |      <title>Chapter title</title>
   |      <para></para>
   |    </chapter>
   |  </part>
   |  <part>
   |    <title>Another part title</title>
   |    <chapter>
   |      <title>Another chapter title</title>
   |      <para></para>
   |    </chapter>
   |    <chapter>
   |      <title>Yet another chapter title</title>
   |      <para></para>
   |    </chapter>
   |  </part>
   |</book>

Let’s suppose that parts are numbered “I” and “II”. (The number format is controlled by the localization, see Chapter 4, Localization.) If chapter numbering begins at the book level, those chapters will be numbered “1”, “2”, and “3”. If chapter numbering begins at the division level (the part), those chapters will be numbered “1”, “1”, and “2”. If division numbers are inherited, those numbers will be “I.1”, “II.1”, “II.2”.

In the 1.x versions of these stylesheets, all of the aspects of numeration were controlled by three now obsolete parameters: $component-numbers-inherit, $division-numbers-inherit, and $section-numbers-inherit. In the 2.x stylesheets, the various aspects can be controlled independently and the result is much more consistent, if a bit more complicated.

The default numeration parameters are designed to cover the most common use cases and are specified with strings so that they’re easy to control with parameters. Any numeration scheme can be implemented with a customization layer, but hopefully that will be necessary only rarely and in uncommon cases.

To simplify the problem, we divide the DocBook elements into six categories:

sets

The set is the only member of this category.

books

The book is the only member of this category.

divisions

The division elements are part and reference.

components

The component elements are acknowledgements, appendix, article, bibliography, chapter, colophon, dedication, glossary, index, partintro, preface, refentry, and setindex.

sections

The section elements are section, sect1, sect2, sect3, sect4, sect5, simplesect. The refentry section elements are not included because they are not typically numbered.

formal objects

The formal objects are figure, table, example, equation, formalgroup, procedure.

There’s a bit of complexity here. A formalgroup that contains figures counts as a figure, a formalgroup that contains tables counts as a table, etc. An equation or procedure only counts as a formal object if it has a title.

Six parameters control where numbering starts (or restarts): $sets-number-from, $books-number-from, $divisions-number-from, $components-number-from, $sections-number-from, and $formal-objects-number-from. In each case, the value of the parameter must be the name of one of the categories. Sets and books can only number from sets, divisions can number from sets or books, components can number from sets, books, or divisions, etc. It is also possible to specify the value root to indicate that elements in the relevant category are numbered sequentially through the whole document.

To ensure consistency, “numbering from” resets when the specified category or one of its ancestors is encountered. In other words, if you’re formatting a set of books and numbering components from divisions, the numbering resets when a new division, book, or set begins.

Six parameters control how numbers are inherited: $sets-inherit-from, $books-inherit-from, $divisions-inherit-from, $components-inherit-from, $sections-inherit-from, and $formal-objects-inherit-from. Like the “number from” parameters, each parameter takes the value of the categories above it. In this case, however, you can specify more than one category.

For example, the default value for formal objects is to inherit from “component section”. That means that the first figure in chapter 2 will be labeled “2.1” and the first figure in the first section in chapter 2 will be labeled “2.1.1”, etc. This most closely reproduces the numbering from the 1.x stylesheets.
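
As a sketch of how these parameters combine, the customization layer below aims for the “I.1”, “II.1”, “II.2” chapter labels described for the sample book earlier in this section: components restart in each division and inherit the division number. The value 'division' is an assumption; it follows the singular spelling used in the “component section” default quoted above, so check the parameter reference for the exact tokens.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

<!-- This href has to point to your local copy of the stylesheets. -->
<xsl:import href="docbook/xslt/docbook.xsl"/>

<!-- Restart chapter (component) numbering inside each part. -->
<xsl:param name="components-number-from" select="'division'"/>

<!-- Prefix each chapter number with the number of its part. -->
<xsl:param name="components-inherit-from" select="'division'"/>

</xsl:stylesheet>
```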

3.6 Creating something completely different

Your input documents go through several pre-processing steps before they are rendered into HTML. If you want to produce completely different outputs, the place to start is with the root template in the m:docbook mode.

Consider, for example, the task of creating a JSON version of the Table of Contents. In principle, you could write your own stylesheet to do this, but leveraging the DocBook xslTNG Stylesheets means you can make use of functions like f:generate-id() to create links.

To produce completely different results, override the root template in the m:docbook mode:

  |<xsl:template match="/" mode="m:docbook">
  |  <xsl:document>
  |    <!-- your processing here -->
  |  </xsl:document>
  |</xsl:template>

This template must return a document node.

Note that you can mix-and-match your processing with default processing by processing DocBook elements in the m:docbook mode.

Here is a simple example of a stylesheet that produces a JSON version of the Table of Contents for a DocBook document:

   |<?xml version="1.0" encoding="utf-8"?>
   |<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   |                xmlns:db="http://docbook.org/ns/docbook"
   |                xmlns:f="http://docbook.org/ns/docbook/functions"
   |                xmlns:m="http://docbook.org/ns/docbook/modes"
   |                xmlns:t="http://docbook.org/ns/docbook/templates"
   |                xmlns:xs="http://www.w3.org/2001/XMLSchema"
   |                xmlns="http://www.w3.org/1999/xhtml"
   |                exclude-result-prefixes="db f m t xs"
   |                version="3.0">
   | 
   |  <!-- This href has to point to your local copy
   |       of the stylesheets. -->
   |  <xsl:import href="docbook/xslt/docbook.xsl"/>
   | 
   |  <xsl:output method="text"/>
   | 
   |  <!-- Suppress xslTNG's default HTML output; note that this template
   |       must return a document node.  -->
   |  <xsl:template match="/" mode="m:docbook">
   |    <xsl:document>
   |      <xsl:apply-templates select="." mode="TOC"/>
   |    </xsl:document>
   |  </xsl:template>
   | 
   |  <!-- The templates below generate a simple JSON ToC. -->
   | 
   |  <xsl:template match="/" mode="TOC">
   |    {"toc": [
   |    <xsl:apply-templates mode="TOC"/>
   |    ]}
   |  </xsl:template>
   | 
   |  <xsl:template match="db:part|db:article|db:section|db:chapter" mode="TOC"
   |                expand-text="yes">
   |    <xsl:if test="preceding-sibling::db:part
   |                  | preceding-sibling::db:article
   |                  | preceding-sibling::db:section
   |                  | preceding-sibling::db:chapter">,&#10;</xsl:if>
   |    {{
   |    "ref": "{f:generate-id(.)}",
   |    "title": "{normalize-space(db:info/db:title)}",
   |    "subtitle": "{normalize-space(db:info/db:subtitle)}",
   |    "items": [
   |    <xsl:apply-templates select="db:part|db:article|db:section|db:chapter" mode="TOC"/>
   |    ]
   |    }}
   |  </xsl:template>
   | 
   |  <xsl:template match="*" mode="TOC">
   |    <xsl:apply-templates select="*" mode="TOC"/>
   |  </xsl:template>
   |</xsl:stylesheet>

Note

This example is meant as a starting point; it’s not robust as it only handles a few of the possible elements that might appear in a Table of Contents.

When processing documents this way, be aware that you are transforming the pre-processed, normalized versions of your input documents. For example, whether or not you put info wrappers around the titles of your sections, in the pre-processed input, titles always appear inside info wrappers. This normalization greatly simplifies processing in many places.


* OK, technically, this stylesheet has a couple of namespace references that aren’t strictly necessary so it could be a teeny bit simpler, but you’ll need those declarations (and more!) if you want to do anything useful.