Strange section title structure
Chapter 1. The problem
I recently received a DocBook file that had obviously been created in another format (MS Word?) and then automatically converted to DocBook. It contained some very strange constructs, such as the following:
1. Titles with anchor Elements
All section elements had the same structure: none of them had an xml:id attribute. Instead, every title had an anchor child element, and this anchor carried the xml:id attribute.
So, instead of
    <section xml:id='the-id'>
      <title>The title</title>
    </section>
all sections looked like
    <section>
      <title><anchor xml:id='the-id'/>The title</title>
    </section>
This is absolutely not how I would author my DocBook documents. But it is valid, so the xslTNG stylesheets should transform it into valid HTML. Release 2.7.1, however, copies the anchor element as part of the title into the table of contents, which leads to duplicate ID errors. In the ToC you will find something like this:
    <li>
      <a href="#R_ch1_s1">
        <span class="label">1</span>
        <span class="sep">. </span>
        <span id="anchor_pmy_rsc_vhc" class="anchor"/>
        Titles with anchor Elements
      </a>
    </li>
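A quick way to confirm this kind of problem in generated output is to scan the HTML for id values that occur more than once. The following is a minimal Python sketch using only the standard library. The names IdCollector and duplicate_ids, and the heading markup in the sample page, are assumptions made for illustration; they are not taken from xslTNG or its actual output.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts every id attribute value seen in start tags."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <span .../> also end up here, because the
        # default handle_startendtag() delegates to handle_starttag().
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def duplicate_ids(html):
    """Return the sorted list of id values that occur more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(value for value, count in collector.ids.items() if count > 1)


# Hypothetical page: the ToC entry from above plus an assumed section heading
# that carries the same anchor span.
sample_page = """
<ul>
  <li><a href="#R_ch1_s1"><span class="label">1</span><span class="sep">. </span>
    <span id="anchor_pmy_rsc_vhc" class="anchor"/>Titles with anchor Elements</a></li>
</ul>
<h2 id="R_ch1_s1"><span id="anchor_pmy_rsc_vhc" class="anchor"></span>Titles with anchor Elements</h2>
"""

print(duplicate_ids(sample_page))
```

Any id reported by such a check appears at least twice in the page, which is exactly what an HTML validator complains about.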
Chapter 2. The solution
I wrote a Schematron schema that detects the section/title/anchor structure, and a Schematron QuickFix that converts it into something sensible. The paper "Taking Schematron QuickFix To The Next Level" by Octavian Nadolu, presented at MarkupUK 2019, was a great help.
I also reported this issue to Norm, together with a PR.
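For readers without a Schematron QuickFix toolchain, the repair itself can be sketched in a few lines of Python. This is not the Schematron schema described above, only an illustration of the same idea: find each section without an xml:id whose title contains an anchor with an xml:id, move the ID up to the section, and drop the anchor. The function name move_anchor_ids is hypothetical, and the sketch assumes the document uses the DocBook 5 namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: DocBook 5 documents, which use this namespace.
DB = "http://docbook.org/ns/docbook"
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def move_anchor_ids(root):
    """Move xml:id from section/title/anchor to the section.

    Returns the number of sections that were repaired.
    """
    fixed = 0
    # Materialize the list first so the tree can be modified safely.
    for section in list(root.iter(f"{{{DB}}}section")):
        title = section.find(f"{{{DB}}}title")
        if title is None or section.get(XML_ID) is not None:
            continue
        anchor = title.find(f"{{{DB}}}anchor")
        if anchor is None or anchor.get(XML_ID) is None:
            continue

        section.set(XML_ID, anchor.get(XML_ID))

        # The title text after the anchor is stored as the anchor's tail;
        # reattach it before removing the anchor so it is not lost.
        children = list(title)
        index = children.index(anchor)
        tail = anchor.tail or ""
        if index == 0:
            title.text = (title.text or "") + tail
        else:
            previous = children[index - 1]
            previous.tail = (previous.tail or "") + tail
        title.remove(anchor)
        fixed += 1
    return fixed


doc = ET.fromstring(
    '<article xmlns="http://docbook.org/ns/docbook">'
    "<section><title><anchor xml:id='the-id'/>The title</title></section>"
    "</article>"
)
ET.register_namespace("", DB)  # serialize DocBook as the default namespace
print(move_anchor_ids(doc), "section(s) repaired")
print(ET.tostring(doc, encoding="unicode"))
```

Sections that already have an xml:id are left untouched, so running the function a second time changes nothing.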