6.3.3 Publication Process Tasks Reference
6.3.3.1 General
All publication tasks read and write their files in the book instance that is currently being processed. All input and output paths specified on the individual publication tasks are therefore relative paths within the book instance, and are automatically prefixed with the directory of the current publication output: /publications/<publication output name>/.
6.3.3.2 applyDocumentTypeStyling
6.3.3.2.1 Syntax
<applyDocumentTypeStyling/>
6.3.3.2.2 Description
Applies document type specific stylesheets.
6.3.3.2.2.1 Location of the document type specific stylesheets
The stylesheets are searched in the following locations:
First a document-type-specific, publication-type-specific stylesheet is searched here:
<wikidata dir>/books/publicationtypes/<publication-type-name>/document-styling/<document-type-name>.xsl
If not found, then a document-type-specific stylesheet is searched here:
<wikidata dir>/books/publicationtypes/document-styling/<document-type-name>.xsl
Finally, if not found, a generic stylesheet is used:
webapp/daisy/books/publicationtypes/common/book-document-to-html.xsl
6.3.3.2.2.2 Output of the document type specific stylesheets
In contrast with the Daisy Wiki, the document type specific stylesheets should always produce Daisy-HTML output. This is because later on a lot of processing still needs the logical HTML structure (such as header shifting, formatting cross references, ...). If you want to do output-medium specific things, you can leave custom attributes and elements in the output and later on interpret them in the stylesheet that will translate the HTML to its final format.
The output of the stylesheets should follow the following structure:
<html>
<body>
<h0 id="dsy<document id>"
daisyDocument="<document id>"
daisyBranch="<branch>"
daisyLanguage="<language>">document name</h0>
... the rest of the content ...
</body>
</html>
Since headers in a document start at h1, but the name of the document usually corresponds to the section title, the name of the document should be left in a <h0> tag. This will then later be corrected by the shiftHeaders task.
The id attribute on the <h0> element is required for proper resolving of links and cross references pointing to this document/book section.
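To make the structure above concrete, here is a sketch of what a document type stylesheet might emit for one document. All values are invented for illustration (document ID 123, the branch and language values, the document name and content), and the print-break attribute is a made-up example of a custom attribute left in place for a later output-specific stylesheet. How the branch and language are actually encoded is not specified in this section.

```xml
<html>
  <body>
    <!-- h0 carries the document name; the id is "dsy" followed by the document id -->
    <h0 id="dsy123"
        daisyDocument="123"
        daisyBranch="main"
        daisyLanguage="default">Installing the server</h0>
    <p>Introductory paragraph of the document.</p>
    <!-- headers inside the document itself start at h1 -->
    <h1>Requirements</h1>
    <!-- hypothetical custom attribute, to be interpreted by the stylesheet
         that translates the HTML to its final format -->
    <p print-break="before">A paragraph that should start on a new page in print.</p>
  </body>
</html>
```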
The applyDocumentTypeStyling task writes its result files in the book instance in the following directory:
publications/<publication-name>/documents
6.3.3.3 addSectionTypes
6.3.3.3.1 Syntax
<addSectionTypes/>
6.3.3.3.2 Description
In the book definition, types can be assigned to sections (e.g. a type of 'appendix' might be assigned to a section). This task will add an attribute called daisySectionType to the <h0> tag of each document for which a section type is specified.
This task should be run after the applyDocumentTypeStyling task and will make its changes to the files generated by that task (thus no new files are written).
6.3.3.4 shiftHeaders
6.3.3.4.1 Syntax
<shiftHeaders/>
6.3.3.4.2 Description
This task shifts the headers in the documents depending on their hierarchical nesting in the book definition. All headers are shifted by at least 1, so that the h0 headers become h1 (see the applyDocumentTypeStyling task).
This task does not write new files but replaces the existing files created by the applyDocumentTypeStyling task.
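As an illustration, the fragment below shows the headers of one document before and after this task, assuming the document is a top-level section in the book definition, so that the shift is exactly 1. The document ID, attribute values and titles are hypothetical.

```xml
<!-- Before shiftHeaders: output of applyDocumentTypeStyling -->
<body>
  <h0 id="dsy123" daisyDocument="123" daisyBranch="main" daisyLanguage="default">Installing the server</h0>
  <h1>Requirements</h1>
  <h2>Java version</h2>
</body>

<!-- After shiftHeaders: every header moved down by one level -->
<body>
  <h1 id="dsy123" daisyDocument="123" daisyBranch="main" daisyLanguage="default">Installing the server</h1>
  <h2>Requirements</h2>
  <h3>Java version</h3>
</body>
```

If the same document were nested one level deeper in the book definition, the headers would presumably become h2, h3 and h4 instead.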
6.3.3.5 assembleBook
6.3.3.5.1 Syntax
<assembleBook output="filename"/>
6.3.3.5.2 Description
Assembles a single XML document containing all the content of the book, by combining all documents specified in the book definition. It also inserts headers for sections in the book definition that only specify a title. If a document contains included documents, these are merged in at the include position.
The <html> and <body> tags of the individual documents are removed in the process: the resulting assembled XML has just one <html> element containing one <body> element.
The output is written to the specified output path (in the book instance).
6.3.3.6 addNumbering
6.3.3.6.1 Syntax
<addNumbering input="filename" output="filename"/>
6.3.3.6.2 Description
Assigns numbers to headers (sections), figures and tables.
The numbering is done based on numbering patterns specified in the publication properties. The numbering is different for sections, figures or tables of different types.
For example, the following properties define the numbering patterns for sections of the type "default":
| Property name | Property value |
|---|---|
| numbering.default.h1 | 1 |
| numbering.default.h2 | h1.1 |
| numbering.default.h3 | h1.h2.1 |
The property values are numbering patterns, which define the formatting of the number, following a certain syntax, explained below.
The properties for figures and tables are called "figure.<figuretype>.numberpattern" and "table.<tabletype>.numberpattern", respectively. The numbering of figures and tables happens per chapter (per h1-level). Figures and tables are only numbered if they have a caption (defined by the daisy-caption attribute).
For sections, the following additional properties can be defined:
- numbering.<section-type>.increase-number: defines whether the section number must be increased when a section of this type is encountered. It can be useful to disable this for "anonymous" sections that do not require numbering, so that the numbering of the following sections simply continues as if the anonymous sections were not there.
- numbering.<section-type>.reset-number: defines whether the section numbering must be restarted when a section of this type is encountered after a section of another type, inside the same section level.
- numbering.<section-type>.hx.start-number: defines the initial number for sections of this type on level x (default: 1)
On the elements to which a number is assigned, the following attributes are added:
- daisyNumber: the number formatted according to the number pattern
- daisyPartialNumber: only the number of the element itself formatted according to the style indicated in the numbering pattern (1, i, I, a or A), without any of the other parts of the numbering pattern
- daisyRawNumber: the unformatted number of the element.
6.3.3.6.2.1 Syntax of the numbering patterns
Each numbering pattern should contain exactly one of the following characters: 1, i, I, a or A. The number of the section (or figure/table) will be inserted at the location of that character. The character indicates the type of numbering (e.g. A for numbering with letters).
The numbers of ancestor sections can be referred to using 'h1', 'h2', ... up to 'h9'. The number of the highest ancestor which has a number can be referred to using 'hr' ("root header").
It is possible to retrieve text from the resource bundle of the current publication type by putting a resource bundle key between $ signs, for example $mykey$.
Any other characters used in the numbering pattern will be output as-is.
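As a worked example, the fragment below sketches the attributes addNumbering might add when the three "default" patterns from the table above are in effect. It assumes that 'h1' and 'h2' in a pattern stand for the ancestor's own number, which is the reading the pattern h1.h2.1 suggests. The last header uses a hypothetical property numbering.appendix.h1 with the value A, and assumes the appendix numbering restarts from 1. The titles and section positions are invented.

```xml
<!-- Patterns: numbering.default.h1 = 1, numbering.default.h2 = h1.1,
     numbering.default.h3 = h1.h2.1 -->
<!-- fourth chapter -->
<h1 daisyNumber="4" daisyPartialNumber="4" daisyRawNumber="4">Publishing books</h1>
<!-- second section of chapter 4 -->
<h2 daisyNumber="4.2" daisyPartialNumber="2" daisyRawNumber="2">Publication process</h2>
<!-- third subsection of section 4.2 -->
<h3 daisyNumber="4.2.3" daisyPartialNumber="3" daisyRawNumber="3">Numbering</h3>

<!-- Hypothetical pattern: numbering.appendix.h1 = A; this is the second appendix,
     so the raw number 2 is formatted as the letter B -->
<h1 daisySectionType="appendix" daisyNumber="B" daisyPartialNumber="B" daisyRawNumber="2">Glossary</h1>
```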
6.3.3.7 verifyIdsAndLinks
6.3.3.7.1 Syntax
<verifyIdsAndLinks input="filename" output="filename"/>
6.3.3.7.2 Description
This task does two things:
- It does some linking related checks: it will warn about duplicate IDs, and about Daisy-links and cross-references pointing to documents or IDs not present in the book. It will also warn about images whose source starts with "file:", which is usually a mistake. All these warnings are written to the link log.
- It will assign IDs to the following elements that do not have an ID yet:
  - headers
  - images and tables which have a caption
6.3.3.8 addIndex
6.3.3.8.1 Syntax
<addIndex input="..." output="..."/>
6.3.3.8.2 Description
This task generates the index based on the index entries in the document (which are marked with <span class="indexentry">...</span>).
It collects all index entries, sorts them, creates a hierarchy in them (by splitting index entries on any colon that appears in them), and writes out the original document with the index inserted just before the closing </body> tag. The index has the following structure:
<h1 id="index">Index</h1>
<index>
<indexGroup name="A">
<indexEntry name="A...">
<id>...</id>
<id>...</id>
[... more id-s ...]
[... nested index entries ...]
</indexEntry>
[... more index entries ...]
</indexGroup>
[... more index groups ...]
</index>
The <indexGroup> elements combine index entries based on their first letter. Any entries before the letter A are grouped in an <indexGroup> without a name attribute.
The <id> elements inside the indexEntries list all the IDs of the indexentry-spans that define this index entry.
Note that this task will also assign IDs to the indexentry spans.
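The following sketch illustrates the colon-based hierarchy with four index entries. The span IDs are made up (the real ones are assigned by the task), and two details are assumptions rather than documented behaviour: that nested entries are named after the part following the colon, and that group names are upper-case letters.

```xml
<!-- Index entries as they might appear in the assembled book -->
<p>See <span class="indexentry" id="ie1">publication:task</span> for details.</p>
<p>See <span class="indexentry" id="ie2">publication:type</span> for details.</p>
<p>More on <span class="indexentry" id="ie3">publication:task</span> here.</p>
<p>A <span class="indexentry" id="ie4">chunk</span> is one output page.</p>

<!-- Index that might be generated from those entries -->
<h1 id="index">Index</h1>
<index>
  <indexGroup name="C">
    <indexEntry name="chunk">
      <id>ie4</id>
    </indexEntry>
  </indexGroup>
  <indexGroup name="P">
    <indexEntry name="publication">
      <!-- "publication:task" occurs twice, so this entry lists two IDs -->
      <indexEntry name="task">
        <id>ie1</id>
        <id>ie3</id>
      </indexEntry>
      <indexEntry name="type">
        <id>ie2</id>
      </indexEntry>
    </indexEntry>
  </indexGroup>
</index>
```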
6.3.3.9 addTocAndLists
6.3.3.9.1 Syntax
<addTocAndLists input="filename" output="filename"/>
6.3.3.9.2 Description
This task creates the Table Of Contents and the lists of figures and tables.
6.3.3.9.2.1 Table Of Contents (TOC)
The TOC is created based on the HTML header elements (h1, h2, etc.). Only headers up to a certain level are included, which is configurable using the publication property called "toc.depth", whose value should be an integer number (1, 2, etc.).
The TOC is inserted at the beginning of the document, after the <body> opening tag, and has an XML structure like this:
<toc>
<tocEntry targetId="..." daisyNumber="..." daisyPartialNumber="..." daisyRawNumber="...">
<caption>...</caption>
[... nested tocEntry elements ...]
</tocEntry>
[... more tocEntry elements ...]
</toc>
The targetId attribute is the ID of the corresponding header. The daisyNumber, daisyPartialNumber and daisyRawNumber attributes are only present if the corresponding header had a number assigned by the addNumbering task. See the description of that task for the meaning of these attributes.
The caption element contains the content of the header tag, including any mixed content. However, footnotes or index entries which might occur in the heading are not copied into the caption element.
6.3.3.9.2.2 Lists of figures and lists of tables
Lists of figures and lists of tables are created per type of figure or table. The types for which the lists should be created have to be specified in two properties:
- list-of-figures.include-types
- list-of-tables.include-types
These properties should contain a comma separated list of types. For figures and tables that do not have a specific type assigned, the type is assumed to be "default". For example to have a list of all default figures, and a list of all figures with type "screenshot", one would set the list-of-figures.include-types property to "default,screenshot". Note that the order in which the types are specified is the order in which the lists will be inserted in the output.
The lists are inserted in the output after the TOC, and have an XML structure like this:
<list-of-figures type="...">
<list-item targetId="..." daisyNumber="..." daisyPartialNumber="..." daisyRawNumber="...">the caption</list-item>
[... more list-item elements ...]
</list-of-figures>
For tables the root element is "list-of-tables".
6.3.3.10 applyPipeline
6.3.3.10.1 Syntax
<applyPipeline input="..." output="..." pipe="..."/>
6.3.3.10.2 Description
This task calls a Cocoon pipeline in the publication type sitemap.
The pipeline is supplied with the following parameters (flow context attributes):
- bookXmlInputStream: an inputstream for the file specified in the input attribute
- bookInstanceName
- bookInstance (the BookInstance object)
- locale: java.util.Locale object for the locale in which to publish the book
- localeAsString: the locale as a string
- pubProps: java.util.Map containing the publication properties
- bookMetadata: java.util.Map containing the book metadata
- publicationTypeName
- publicationOutputName
The pipe attribute specifies the pipeline to be called (thus the path to be matched by a matcher in the sitemap). The output of the pipeline execution is saved to the file specified in the output attribute.
For practical usage examples, see the default publication types included with Daisy.
6.3.3.11 copyResource
6.3.3.11.1 Syntax
<copyResource from="..." to="..."/>
6.3.3.11.2 Description
Copies a file or directory (recursively) from the publication type to the book instance. As with all other tasks, the "to" path will automatically be prepended with the directory of the current publication output (/publications/<publication output name>/).
6.3.3.12 splitInChunks
6.3.3.12.1 Syntax
<splitInChunks input="..." output="..." firstChunkName="..."/>
6.3.3.12.2 Description
Groups the input into chunks. New chunks are started on each <hX>, in which X is configurable using the publication property "chunker.chunklevel".
The output will have the following format:
<chunks>
<chunk name="...">
<html>
<body>
[content of the chunk]
</body>
</html>
</chunk>
[... more chunk elements ...]
</chunks>
By default the name of each chunk will be the ID of the header where the new chunk started, except for the first chunk for which the chunk name can optionally be defined using the firstChunkName attribute on the splitInChunks task element.
The original <html> and <body> elements are discarded, the new <chunks> element will be the root of the output. New <html> and <body> elements are inserted into each chunk, so that the content of each chunk forms a stand-alone HTML document.
6.3.3.13 writeChunks
6.3.3.13.1 Syntax
<writeChunks input="..." outputPrefix="..." chunkFileExtension="..."
applyPipeline="..." pipelineOutputPrefix="..." chunkAfterPipelineFileExtension="..."/>
6.3.3.13.2 Description
Writes the content of individual chunks, as created by the splitInChunks task, to separate XML files.
The attributes applyPipeline, pipelineOutputPrefix and chunkAfterPipelineFileExtension are optional. If present, the pipeline specified in the applyPipeline attribute will be applied to each of the chunks, and the result will be written to a file with the same name as the original chunk, but with the extension specified in the attribute chunkAfterPipelineFileExtension.
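A possible way to combine splitInChunks and writeChunks for chunked HTML output is sketched below. All file names, directory prefixes and the pipe name ChunkToHtml are hypothetical, and writing the extensions with a leading dot is an assumption; the default publication types included with Daisy show the values actually used.

```xml
<!-- Split the assembled book into chunks; the first chunk is named "index" -->
<splitInChunks input="book-with-toc.xml" output="chunks.xml" firstChunkName="index"/>

<!-- Write each chunk to its own XML file, then run a (hypothetical) pipeline
     on each chunk and store the result with an .html extension -->
<writeChunks input="chunks.xml" outputPrefix="chunks/" chunkFileExtension=".xml"
             applyPipeline="ChunkToHtml" pipelineOutputPrefix="output/"
             chunkAfterPipelineFileExtension=".html"/>
```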
6.3.3.14 makePDF
6.3.3.14.1 Syntax
<makePDF input="..." output="..."/>
6.3.3.14.2 Description
Transforms an XSL-FO file to PDF. The current implementation uses the (commercial) Ibex PDF serializer. [todo: note on serialized execution]
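The tasks described so far are chained by passing the output file of one task as the input of the next. The fragment below is a sketch of such a sequence ending in a PDF. The file names, the output/ directory and the pipe name XslFoPipe are hypothetical, and the element that encloses the tasks in a publication process definition is omitted because it is not covered in this section.

```xml
<!-- These three tasks work on the per-document files and take no paths -->
<applyDocumentTypeStyling/>
<addSectionTypes/>
<shiftHeaders/>

<!-- From here on, each task reads the file written by the previous one -->
<assembleBook output="book.xml"/>
<addNumbering input="book.xml" output="book-numbered.xml"/>
<verifyIdsAndLinks input="book-numbered.xml" output="book-checked.xml"/>
<addIndex input="book-checked.xml" output="book-indexed.xml"/>
<addTocAndLists input="book-indexed.xml" output="book-with-toc.xml"/>

<!-- Hypothetical pipeline in the publication type sitemap producing XSL-FO -->
<applyPipeline input="book-with-toc.xml" output="book.fo" pipe="XslFoPipe"/>
<makePDF input="book.fo" output="output/book.pdf"/>
```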
6.3.3.15 getDocumentPart
6.3.3.15.1 Syntax
<getDocumentPart propertyName="..." propertyOrigin="..." partName="..." saveAs="..." setProperty="..."/>
6.3.3.15.2 Description
This task retrieves the content of a part of a Daisy document. The Daisy document is specified using a "daisy:" link in a publication property or book metadata attribute.
- propertyName: specifies the name of the property
- propertyOrigin: either 'publication' for a publication property or 'metadata' for a book metadata attribute
- partName: the name of the part from which to get the data. For example, "ImageData" for images.
- saveAs: where the data should be saved.
- setProperty (optional): specifies the name of a publication property which will be set to true if the part data was actually retrieved
This task can be useful when you let the user specify e.g. a logo to put in the header or footer by specifying a daisy link in a publication/metadata property.
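For instance, a logo could be fetched as sketched below. The property names logo and logo-available and the target path are hypothetical: the sketch assumes the user has set a publication property named logo to a "daisy:" link pointing to an image document.

```xml
<!-- Read the "daisy:" link from the (hypothetical) publication property "logo",
     save the image data, and record whether that succeeded in "logo-available" -->
<getDocumentPart propertyName="logo" propertyOrigin="publication"
                 partName="ImageData" saveAs="output/logo.png"
                 setProperty="logo-available"/>
```

A later stylesheet could then check the logo-available property before emitting a reference to the saved file.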
6.3.3.16 copyBookInstanceResources
Previously (Daisy 1.4) this was called copyBookInstanceImages. The old name currently still works, for backwards compatibility.
6.3.3.16.1 Syntax
<copyBookInstanceResources input="..." output="..." to="..."/>
6.3.3.16.2 Description
Copies all resources which are linked to using the "bookinstance:" scheme to the directory specified in the to attribute, unless the resource would already be in the output directory (thus when the resource link starts with "bookinstance:output/"). The links are adjusted to the new path and the resulting XML is written to the file specified in the output attribute.
The following resource links are taken into account:
| HTML element | Corresponding attribute |
|---|---|
| img | src |
| a | href |
| object | data |
| embed | src |
This task is ideally suited to copy e.g. the images to the output directory when publishing as HTML (for PDF, this is not needed since the images are embedded inside the PDF file).
6.3.3.17 zip
6.3.3.17.1 Syntax
<zip/>
Creates a zip file containing all files in the output directory. The zip file itself is also written to the output directory; its name is the name of the book instance, followed by a dash and the name of the publication.
6.3.3.18 custom
6.3.3.18.1 Syntax
<custom class="..." [... any other attributes ...]/>
6.3.3.18.2 Description
This task provides a hook for implementing your own tasks. A publication process task should implement the following interface:
org.outerj.daisy.books.publisher.impl.publicationprocess.PublicationProcessTask
The implementation class can have three possible constructors (availability checked in the order listed here):
- A constructor taking an XMLBeans XmlObject object as argument. The XmlObject will represent the <custom> XML element. This constructor is useful if you want to access nested XML content of the <custom> element (for advanced configuration needs).
- A constructor taking a java.util.Map as argument. The Map will contain all attributes of the <custom> element.
- A default constructor (no arguments)
6.3.3.19 renderSVG
6.3.3.19.1 Syntax
This task currently has no native tag, so it should be used through the custom task capability.
<custom class="org.outerj.daisy.books.publisher.impl.publicationprocess.SvgRenderTask"
input="..." output="..."/>
The following table lists additional optional attributes.
| Attribute | Description |
|---|---|
| outputPrefix | Where the generated images should be stored in the book instance, relative to the output of the current publication process. Default: from-svg/ |
| format | jpg or png. Default: jpg. The Ibex XSL-FO renderer doesn't seem to handle PNGs. |
| dpi | Dots per inch. Default: 96. For good quality, set this to e.g. 250. |
| quality | For JPEGs, a value between 0 and 1. Default: 1 |
| backgroundColor | Color for transparent areas in images, specified in a form like #FFFFFF. Default: leave transparent |
| enableScripts | Whether scripts in the SVG should be executed. Default: false. (Note: the Rhino version included with Cocoon 2.1 doesn't work well together with Batik. This can be resolved by upgrading to Rhino 1.6-RC2, though this also requires recompiling Cocoon; ask on the mailing list if you need help.) |
| maxPrintWidth | Maximum value for the generated print-width attribute, in inches. The other dimension scales proportionally. Default: 6.45 |
| maxPrintHeight | Maximum value for the generated print-height attribute, in inches. The other dimension scales proportionally. Default: 8.6 |
6.3.3.19.2 Description
Parses the file specified in the input attribute, and processes every renderSVG tag it encounters:
<rs:renderSVG xmlns:rs="http://outerx.org/daisy/1.0#bookSvgRenderTask"
bookStorePath="..."/>
The bookStorePath attribute points to some resource in the book instance, which will be interpreted as an SVG file and rendered. The renderSVG tag is removed, and replaced with an <img> tag, with:
- a src attribute pointing to the generated image (the produced image file name is the same as the original file name, but in a different directory as defined by the outputPrefix attribute)
- height and width attributes specifying the size in pixels (for HTML)
- print-height and print-width attributes specifying the size in a form suited for XSL-FO (e.g. 3in), depending on the specified dpi.
To make use of this task, you will typically download the content of some document part in the book instance using <requiredParts/>, and have a doctype XSL which generates the renderSVG tag. This will eventually be illustrated in a tutorial on the community Wiki.
The renderSVG task requires Batik, which is (at the time of this writing) not included by default in the Daisy Wiki.
6.3.3.20 callPipeline
6.3.3.20.1 Syntax
This task currently has no native tag, so it should be used through the custom task capability.
<custom class="org.outerj.daisy.books.publisher.impl.publicationprocess.CallPipelineTask"
input="..." output="..." outputPrefix="..."/>
The outputPrefix attribute is optional and defaults to after-call-pipeline/.
6.3.3.20.2 Description
Parses the file specified in the input attribute, and processes every callPipeline tag it encounters. This is different from the applyPipeline task, which applies a Cocoon pipeline to the file specified in the input attribute itself. This task is ideally suited to do some processing on part content downloaded using <requiredParts/>. The callPipeline tag is typically produced in the document-type specific XSLT for the document type containing the part.
Syntax for the callPipeline tag:
<cp:callPipeline xmlns:cp="http://outerx.org/daisy/1.0#bookCallPipelineTask"
bookStorePath="..."
pipe="..."
outputPrefix="..."
outputExtension="...">
... any nested content ...
</cp:callPipeline>
When such a tag is encountered, the Cocoon pipeline specified in the pipe attribute will be applied on the document specified in the bookStorePath attribute. The pipeline is a pipeline in the sitemap of the current publication type.
The outputPrefix attribute is optional, and can specify an outputPrefix other than the one globally configured. The outputExtension attribute is optional too, and specifies an extension for the result file. The base file name is the same as the current filename (the one specified in the bookStorePath attribute).
The cp:callPipeline tag itself is removed from the output, however its nested content is passed through. For all elements nested inside <cp:callPipeline>, the attributes will be searched for the string {callPipelineOutput}, which will be replaced with the path of the produced file. For example, if you want to transform some XML file to SVG and then render it with the renderSVG task, you can do something like:
<cp:callPipeline bookStorePath="something"
outputPrefix="something/"
pipe="MyPipe">
<rs:renderSVG bookStorePath="{callPipelineOutput}"/>
</cp:callPipeline>
If you put the above fragment in an XSL stylesheet, don't forget to escape the braces by doubling them: "{{callPipelineOutput}}".