
Full Text Indexer

Full text indexing in Daisy happens automatically whenever document variants are updated, so you do not need to update the index yourself. Technically, the full text indexer holds a durable subscription to the JMS events generated by the repository, and it is these events that trigger the index updates.
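The practical benefit of a durable subscription is that events published while the indexer is not connected (for example during a restart) are retained and delivered once it reconnects, so no update is lost. The Java sketch below models only that behaviour with standard collections. It is not the JMS API, and the class, method and subscription names (DurableTopicSketch, "fulltext-indexer") are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Toy model of a durable topic subscription (hypothetical, not JMS itself):
 * events published while a subscriber is offline are kept and replayed
 * when it reconnects.
 */
public class DurableTopicSketch {
    // Events waiting for each durable subscription, keyed by subscription name.
    private final Map<String, Deque<String>> pending = new HashMap<>();
    // Listeners that are currently connected, keyed by subscription name.
    private final Map<String, Consumer<String>> online = new HashMap<>();

    /** Registers a durable subscription; from now on events are retained for it. */
    public void subscribe(String name) {
        pending.putIfAbsent(name, new ArrayDeque<>());
    }

    /** Publishes an event to every durable subscription. */
    public void publish(String event) {
        for (Map.Entry<String, Deque<String>> e : pending.entrySet()) {
            Consumer<String> listener = online.get(e.getKey());
            if (listener != null) {
                listener.accept(event);      // subscriber online: deliver now
            } else {
                e.getValue().addLast(event); // subscriber offline: keep for later
            }
        }
    }

    /** (Re)connects a listener, first replaying every event it missed. */
    public void connect(String name, Consumer<String> listener) {
        subscribe(name);
        Deque<String> backlog = pending.get(name);
        while (!backlog.isEmpty()) {
            listener.accept(backlog.removeFirst());
        }
        online.put(name, listener);
    }

    /** Disconnects the listener; the subscription itself stays durable. */
    public void disconnect(String name) {
        online.remove(name);
    }

    public static void main(String[] args) {
        DurableTopicSketch topic = new DurableTopicSketch();
        List<String> indexed = new ArrayList<>();
        topic.connect("fulltext-indexer", indexed::add);
        topic.publish("variant updated: doc 1");
        topic.disconnect("fulltext-indexer");    // e.g. the indexer is restarted
        topic.publish("variant updated: doc 2"); // retained, not lost
        topic.connect("fulltext-indexer", indexed::add);
        System.out.println(indexed);
    }
}
```

A non-durable subscription would simply drop the second event, which is why a durable one suits an index that must stay in sync with the repository.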

Technology

Daisy uses Jakarta Lucene as its full-text indexer.

Included content

Only document variants that have a live version are included in the full text index, so retired document variants and document variants with only draft versions are not included. It is the content of the live version that is indexed, so full text searches always operate on the live content.
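A minimal sketch of this inclusion rule, assuming a simplified data model: the hypothetical records below stand in for Daisy's document variants, and a null liveVersionId means the variant only has draft versions. The example also shows that when the newest version is a draft, the older live version is the one that gets indexed.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical model of the rule "index only the live version of
 * non-retired variants". Names are illustrative, not Daisy's API.
 */
public class LiveVersionFilter {
    record Version(long id, String content) {}

    /** liveVersionId is null when the variant only has draft versions. */
    record Variant(String docId, boolean retired, Long liveVersionId, List<Version> versions) {}

    /** Returns the content to index, or empty if the variant is excluded. */
    static Optional<String> contentToIndex(Variant v) {
        if (v.retired() || v.liveVersionId() == null) {
            return Optional.empty(); // retired or draft-only: not indexed
        }
        // Index the live version, even if newer draft versions exist.
        return v.versions().stream()
                .filter(ver -> ver.id() == v.liveVersionId())
                .map(Version::content)
                .findFirst();
    }

    public static void main(String[] args) {
        Variant v = new Variant("1", false, 1L,
                List.of(new Version(1, "published text"), new Version(2, "draft edits")));
        System.out.println(contentToIndex(v).orElse("(not indexed)"));
    }
}
```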

For each document variant, the indexed content consists of the document name, the values of string fields, and text extracted from the parts. Text is extracted from a part's data only if its mime type is one of the following:

Mime type                       Comment
------------------------------  ---------------------------
text/plain
text/xml                        e.g. the "Daisy HTML" parts
application/xhtml+xml           XHTML documents
application/pdf                 PDF files
application/vnd.sun.xml.writer  OpenOffice Writer files
application/msword              Microsoft Word files
application/vnd.ms-word         Microsoft Word files
application/mspowerpoint        Microsoft PowerPoint files
application/vnd.ms-powerpoint   Microsoft PowerPoint files
application/msexcel             Microsoft Excel files
application/vnd.ms-excel        Microsoft Excel files
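The selection of indexed content described above can be sketched as follows. This is an illustrative Java model, not Daisy's indexer code: the class and record names are hypothetical, and Part.text is assumed to already hold the text a real extractor would produce. Non-string field values and parts whose mime type is not in the table are skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of which pieces of a variant end up in the index. */
public class IndexedContent {
    // The mime types from the table above.
    static final Set<String> EXTRACTABLE = Set.of(
            "text/plain", "text/xml", "application/xhtml+xml",
            "application/pdf", "application/vnd.sun.xml.writer",
            "application/msword", "application/vnd.ms-word",
            "application/mspowerpoint", "application/vnd.ms-powerpoint",
            "application/msexcel", "application/vnd.ms-excel");

    /** text stands in for the output of a real text extractor (assumption). */
    record Part(String mimeType, String text) {}

    static List<String> collect(String name, Map<String, Object> fields, List<Part> parts) {
        List<String> content = new ArrayList<>();
        content.add(name); // the document name is always included
        for (Object value : fields.values()) {
            if (value instanceof String s) { // only string fields are indexed
                content.add(s);
            }
        }
        for (Part p : parts) {
            if (EXTRACTABLE.contains(p.mimeType())) { // other mime types are skipped
                content.add(p.text());
            }
        }
        return content;
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("Category", "manual");
        fields.put("PageCount", 12L); // non-string field: skipped
        List<Part> parts = List.of(
                new Part("text/xml", "Installation steps"),
                new Part("image/png", "(binary)")); // not in the table: skipped
        System.out.println(collect("Install guide", fields, parts));
    }
}
```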

Support for other formats can be added by implementing a simple interface. Ask on the Daisy Mailing List if you need more information about this.
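The source only describes the extension point as "a simple interface", so the following is a hypothetical illustration of the pattern rather than Daisy's actual API: TextExtractor, ExtractorRegistry and the text/csv extractor are invented names. Each extractor turns a part's raw bytes into plain text, and a part whose mime type has no registered extractor is skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical pluggable text extraction keyed by mime type. */
public class ExtractorRegistry {
    /** One implementation per supported format (illustrative interface). */
    interface TextExtractor {
        String extract(byte[] data);
    }

    private final Map<String, TextExtractor> byMimeType = new HashMap<>();

    void register(String mimeType, TextExtractor extractor) {
        byMimeType.put(mimeType, extractor);
    }

    /** An empty result means the part is not indexed. */
    Optional<String> extract(String mimeType, byte[] data) {
        TextExtractor ex = byMimeType.get(mimeType);
        return ex == null ? Optional.empty() : Optional.of(ex.extract(data));
    }

    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("text/plain", data -> new String(data, StandardCharsets.UTF_8));
        // Adding a new format, e.g. CSV: index the cell values as separate words.
        registry.register("text/csv",
                data -> new String(data, StandardCharsets.UTF_8).replace(',', ' '));

        byte[] csv = "name,product\nDaisy,CMS".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.extract("text/csv", csv).orElse("(skipped)"));
    }
}
```

Keying extractors by mime type keeps each format independent, so supporting a new format means adding one implementation without touching the others.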

Index management

Optimisation and rebuilding of the index can be triggered through the JMX management interface, which is accessible through your web browser and runs on port 9264 by default. Rebuilding the index is useful, for example, when a new version of Daisy adds support for new data formats. When rebuilding, you can either select the documents to re-index with a query or simply re-index all documents. For example, to re-index all PDF files, enter the query "select id where HasPartWithMimeType('application/pdf')". Only the where-clause determines which documents are selected; what you put in the select-part does not matter.
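To make the role of the where-clause concrete, the sketch below models HasPartWithMimeType('application/pdf') as an in-memory predicate over hypothetical documents. It assumes exact mime-type matching, which is a simplification, and it does not show how Daisy's query engine actually evaluates the condition; all class and method names are illustrative.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical model of selecting documents for re-indexing. */
public class ReindexSelection {
    record Doc(String id, List<String> partMimeTypes) {}

    /** In-memory stand-in for HasPartWithMimeType(...), exact match assumed. */
    static Predicate<Doc> hasPartWithMimeType(String mimeType) {
        return d -> d.partMimeTypes().contains(mimeType);
    }

    /** Only the where-condition decides which document ids are returned. */
    static List<String> select(List<Doc> docs, Predicate<Doc> where) {
        return docs.stream().filter(where).map(Doc::id).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> repo = List.of(
                new Doc("1", List.of("text/xml")),
                new Doc("2", List.of("text/xml", "application/pdf")),
                new Doc("3", List.of("application/pdf")));
        System.out.println(select(repo, hasPartWithMimeType("application/pdf")));
    }
}
```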
