Strange section title structure

Frank Steimke

Chapter 1. The problem

I recently received a DocBook file that had obviously been created in another format (MS Word?) and then automatically converted to DocBook. It contained some very strange constructs, such as the following:

1. Titles with anchor Elements

All section elements had the same structure:

  • None of them had an xml:id attribute

  • Instead, every title had an anchor child element, and that anchor carried the xml:id attribute.

So, instead of

  |<section xml:id='the-id'>
  |  <title>The title</title>
  |</section>

all sections looked like

  |<section>
  |  <title><anchor xml:id='the-id'/>The title</title>
  |</section>

This is absolutely not how I would author my DocBook documents. But it is valid, so the xslTNG stylesheets should transform it into valid HTML. Release 2.7.1, however, copies the anchor element, as part of the title, into the table of contents, which leads to duplicate ID errors. In the ToC you will find something like this:

  |<li>
  |  <a href="#R_ch1_s1">
  |    <span class="label">1</span>
  |    <span class="sep">. </span>
  |    <span id="anchor_pmy_rsc_vhc" class="anchor"/>
  |Titles with anchor Elements
  |  </a>
  |</li>

Chapter 2. The solution

I wrote a Schematron schema that detects the section/title/anchor structure, and a Schematron QuickFix to convert it into something sensible. The paper "Taking Schematron QuickFix To The Next Level" by Octavian Nadolu, from Markup UK 2019, was a great help.
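
A schema of this kind might look roughly like the following sketch. It is an illustration only, not the schema actually used, and it has not been run against a particular SQF implementation. It assumes that the document is in the DocBook 5 namespace (bound here to the prefix db), that the xslt2 query binding is available, and that the SQF processor supports sqf:add and sqf:delete. The fix id move-id and the message texts are made up for this example.

  |<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
  |            xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
  |            queryBinding="xslt2">
  |
  |  <!-- Assumption: the source uses the DocBook 5 namespace -->
  |  <sch:ns prefix="db" uri="http://docbook.org/ns/docbook"/>
  |
  |  <sch:pattern>
  |    <!-- Only sections without an xml:id of their own are checked -->
  |    <sch:rule context="db:section[not(@xml:id)]">
  |
  |      <!-- Report a title whose anchor child carries the ID -->
  |      <sch:report test="db:title/db:anchor/@xml:id" sqf:fix="move-id">
  |        The xml:id should be on the section, not on an anchor in its title.
  |      </sch:report>
  |
  |      <!-- Hypothetical quick fix: move the ID, then drop the anchor -->
  |      <sqf:fix id="move-id">
  |        <sqf:description>
  |          <sqf:title>Move the xml:id to the section and delete the anchor</sqf:title>
  |        </sqf:description>
  |        <!-- Add xml:id to the section (the rule context) -->
  |        <sqf:add node-type="attribute" target="xml:id"
  |                 select="string(db:title/db:anchor[1]/@xml:id)"/>
  |        <!-- Remove the first anchor, whose ID was just copied -->
  |        <sqf:delete match="db:title/db:anchor[1]"/>
  |      </sqf:fix>
  |    </sch:rule>
  |  </sch:pattern>
  |</sch:schema>

Because the rule context is restricted to sections without an xml:id, sections that already have the expected structure are left alone. If a title contains more than one anchor, this sketch only handles the first one; any others would need a separate decision.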

I also reported the issue to Norm, together with a PR.