Full Text Indexer
Full text indexing in Daisy happens automatically when document variants are updated, so you do not need to update the index yourself. Technically, the full text indexer has a durable subscription to the JMS events generated by the repository, and it is these events that trigger index updates.
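The effect of a durable subscription can be illustrated with a small in-memory model. This is only a sketch of the idea, not Daisy's implementation: Daisy is written in Java and uses a real JMS provider, and the class names, event types (`variant_updated`, `variant_removed`) and event fields used here are hypothetical.

```python
from collections import deque


class DurableSubscription:
    """Toy stand-in for a durable JMS subscription (hypothetical)."""

    def __init__(self):
        self._pending = deque()

    def publish(self, event):
        # Events are retained even when the subscriber is not running.
        self._pending.append(event)

    def drain(self, handler):
        # Deliver queued events in order. An event is removed only after
        # the handler returns without raising, so a failed update is
        # retried on the next drain.
        while self._pending:
            handler(self._pending[0])
            self._pending.popleft()


class FullTextIndex:
    """Toy index keyed by (document id, variant); hypothetical."""

    def __init__(self):
        self.entries = {}

    def handle(self, event):
        key = (event["document_id"], event["variant"])
        if event["type"] == "variant_removed":
            self.entries.pop(key, None)
        else:
            # Any other event type is treated as an update.
            self.entries[key] = event["live_text"]
```

Because events stay queued until they are handled, updates published while the indexer is stopped are applied once it resumes, which is why no manual index maintenance is needed.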
Technology
Daisy uses Jakarta Lucene as its full-text indexer.
Included content
Only document variants that have a live version are included in the full text index. Retired document variants and document variants that have only draft versions are therefore not included. The content of the live version is what gets indexed, so full text searches always run against the live content.
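The inclusion rule can be written as a small predicate. The sketch below is in Python for brevity and is not Daisy code: the `DocumentVariant` type and its fields are hypothetical, and it assumes retirement is tracked as a simple flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentVariant:
    # Hypothetical model of a document variant, for illustration only.
    name: str
    retired: bool
    live_version_text: Optional[str]      # None when no version is live
    draft_version_text: Optional[str] = None


def text_to_index(variant: DocumentVariant) -> Optional[str]:
    """Return the text to index, or None if the variant is excluded."""
    if variant.retired or variant.live_version_text is None:
        return None
    # Draft content is never indexed, even when it is newer.
    return variant.live_version_text
```

A variant with both a live version and a newer draft is indexed with its live text only, so a search never matches unpublished changes.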
For each document variant, the included content consists of the document name, the values of string fields, and text extracted from the parts. Text is extracted from a part only if its mime type is one of the following:
| Mime type | Comment |
|---|---|
| text/plain | |
| text/xml | e.g. the "Daisy HTML" parts |
| application/xhtml+xml | XHTML documents |
| application/pdf | PDF files |
| application/vnd.sun.xml.writer | OpenOffice Writer files |
| application/msword | Microsoft Word files |
| application/mspowerpoint | Microsoft Powerpoint files |
| application/msexcel | Microsoft Excel files |
Support for other formats can be added by implementing a simple interface. Ask on the Daisy Mailing List if you need more information about this.
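The general shape of such an extension point is a lookup from mime type to an extractor object. The sketch below is written in Python and only illustrates the pattern: `PartTextExtractor`, `EXTRACTORS` and `extract_part_text` are hypothetical names, not Daisy's actual Java interface, and only the text and XML formats are covered.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Optional


class PartTextExtractor(ABC):
    """Hypothetical extractor interface: raw part bytes in, plain text out."""

    @abstractmethod
    def extract(self, data: bytes) -> str:
        ...


class PlainTextExtractor(PartTextExtractor):
    def extract(self, data: bytes) -> str:
        return data.decode("utf-8", errors="replace")


class XmlTextExtractor(PartTextExtractor):
    def extract(self, data: bytes) -> str:
        # Collect all text nodes and normalise the whitespace between them.
        root = ET.fromstring(data)
        return " ".join(" ".join(root.itertext()).split())


# Registry keyed by mime type. Supporting a new format means
# registering one more extractor here.
EXTRACTORS = {
    "text/plain": PlainTextExtractor(),
    "text/xml": XmlTextExtractor(),
    "application/xhtml+xml": XmlTextExtractor(),
}


def extract_part_text(mime_type: str, data: bytes) -> Optional[str]:
    """Return extracted text, or None when the mime type is unsupported."""
    extractor = EXTRACTORS.get(mime_type)
    return extractor.extract(data) if extractor else None
```

Parts with an unregistered mime type yield `None` and contribute nothing to the index, which matches the behaviour described above for formats outside the table.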
Index management
Both optimisation and rebuilding of the index can be triggered through the JMX management interface, which is accessible through your web browser and runs on port 9264 by default. Rebuilding the index is useful, for example, when a new version of Daisy adds support for new data formats. When rebuilding the index, you can select the documents to be re-indexed using a query, or simply re-index all documents. For example, to re-index all PDF files you could enter the query "select id where HasPartWithMimeType('application/pdf')" (what you put in the select part does not matter).


