
Retired

This document has been retired.


Aggregated Daisy Documentation

Installation

Downloading Daisy

Packaged versions of Daisy can be found in the distribution area. This includes everything required to run Daisy, except for:

  • a Java Virtual Machine (JVM): version 1.4.2, version 5 (= 1.5) or higher required
  • a MySQL database: version 4.0.20 or higher, or version 4.1.7 or higher required. Version 5.x is not yet supported (will be in Daisy 1.4)

If you don't have these already, their installation is covered further on.

Consider subscribing to the Daisy mailing list to ask questions and talk with fellow Daisy users and developers.

There is also information available about the source code.

Installation Overview

Daisy is a multi-tier application, consisting of a repository server and a publication layer. Next to those, a JMS server (OpenJMS) and a database server (MySQL) are required. All together, this means four processes, which can run on the same server or on different servers.

The Daisy binary distribution packs most of the needed software together; the only additional things you'll need are a Java Virtual Machine for your platform and MySQL. All libraries and applications shipped with Daisy are the original, unmodified distributions that will be configured as part of the installation. We've only grouped them in one download for your convenience.

If you follow the instructions in this document, you can have Daisy up and running in less than an hour.

The diagram below gives an overview of the setup. All port numbers shown are configurable.

Daisy Deployment

Platform Requirements

We have tested the Daisy installation on Windows 2000/XP, GNU/Linux and Mac OS X. Other Unix systems such as Solaris should also work, though we don't test them ourselves.

Memory Requirements

By default, the Daisy Wiki and the Daisy Repository Server are each started with a maximum heap size of 128 MB, while OpenJMS uses the JVM default (64 MB). To this you need to add some overhead for the JVMs themselves, plus some memory for MySQL, the OS and its (filesystem) caches. This doesn't mean all this memory will be used; that depends on usage intensity.

Required knowledge

These installation instructions assume you're comfortable with installing software, editing configuration (XML) files, running applications from the command line, setting environment variables, and that sort of stuff.

Can I use Oracle, PostgreSQL, MS-SQL, ... instead of MySQL? Websphere, Weblogic, Tomcat, ... instead of Jetty? What is this Merlin thing?

Daisy contains the necessary abstractions to support different database engines, though we currently only support MySQL. Users are welcome to contribute and maintain different databases (ask on the mailing list how to get started).

The Daisy Wiki webapp should be able to run in any servlet container (at least one that can run unpacked webapps, and as far as there aren't any Cocoon-specific issues), but we ship Jetty by default. For example, using Tomcat instead of Jetty is very simple and is described on this page.

The Daisy Repository Server runs on top of a component runtime platform called Merlin. Generally you won't be aware of this, but if you see "merlin" popping up in process names, that means it is basically the same as the Daisy Repository Server.

Installing a Java Virtual Machine

Daisy requires either the Java SDK 1.4.2 or the Java JDK 5.0.

You can download the Java JDK 5.0 from here on the Sun site (take the JDK, not the JRE). Install it now if you don't have it already.

After installation, make sure the JAVA_HOME environment variable is defined and points to the correct location (i.e., the directory where Java is installed). To verify this, open a command prompt or shell and enter:

For Windows:
%JAVA_HOME%\bin\java -version

For Linux:
$JAVA_HOME/bin/java -version

This should print out something like:

java version "1.4.2_xx"

or

java version "1.5.0"

Installing JAI (Java Advanced Imaging) -- optional, Java 1.4 only

If you want images (especially PNG) to appear in PDFs, it is highly advisable to install JAI, which you can download from the JAI homepage. Just take the "JDK Install" option; this will make JAI support globally available.

Installing MySQL

Daisy requires one of the following MySQL versions:

  • version 4.0.20 or higher (we're not making this number up: there have been various bug fixes or improvements between versions 4.0.0 and 4.0.20 on which Daisy depends)
  • version 4.1.7 or a newer version from the 4.1.x series
  • version 5.x is not yet supported (will be in Daisy 1.4)

MySQL 4.1 has the big advantage that it supports UTF-8, so if you can choose, pick 4.1.

MySQL can be downloaded from mysql.com. Install it now, and start it (often done automatically by the install).

Windows users can take the "Windows Essentials" package. During installation and the configuration wizard, you can leave most things at their defaults. In particular, be sure to leave "Database Usage" set to "Multifunctional Database", and leave TCP/IP Networking enabled (on port 3306). When it asks for the default character set, select "Best Support For Multilingualism" (this will use UTF-8). When it asks for Windows options, check the option "Include Bin Directory In Windows Path".

Linux users: install the "MySQL server" and "MySQL client" packages. Installing the MySQL server RPM will automatically initialise and start the MySQL server. To enable UTF-8: after installation, shut down MySQL (/etc/init.d/mysql stop), then edit or create the file /var/lib/mysql/my.cnf and add the following:
[mysqld]
default-character-set=utf8
[mysql]
default-character-set=utf8
(note: don't do this if you already have data in your mysql database! See the MySQL documentation)
Afterwards, start the server again by executing /etc/init.d/mysql start

Creating MySQL databases and users

MySQL is used by both the Daisy Repository Server and OpenJMS. Therefore, we are now going to create two databases and two users.

Open a command prompt, and start the MySQL client as root user:

mysql -uroot -pYourRootPassword

On some systems, the root user has no password, in which case you can drop the -p parameter.

Now create the necessary databases, users and access rights by entering (or copy-pasting) the commands below in the mysql client. The value after IDENTIFIED BY is the password for the user, which you can change if you wish. The @localhost entries are necessary because otherwise the default access rights for anonymous users @localhost will take precedence. If you run MySQL on the same machine as OpenJMS and the Daisy Repository Server, you only need the @localhost entries.

CREATE DATABASE daisyrepository;
GRANT ALL ON daisyrepository.* TO daisy@"%" IDENTIFIED BY "daisy";
GRANT ALL ON daisyrepository.* TO daisy@localhost IDENTIFIED BY "daisy";
CREATE DATABASE openjms;
GRANT ALL ON openjms.* TO openjms@"%" IDENTIFIED BY "openjms";
GRANT ALL ON openjms.* TO openjms@localhost IDENTIFIED BY "openjms";

Extract the Daisy download

Extract the Daisy download. On Linux/Unix you can extract the .tar.gz file as follows:

tar xvzf daisy-<version>.tar.gz

On non-Linux unixes, use the GNU tar version if you experience problems extracting.

On Windows, use the .zip download, which you can extract using a tool like WinZip.

Make sure that wherever you extract the download, none of the parent directories contains spaces in their names. For example, on Windows, do not extract Daisy inside your c:\Program Files directory.

After extraction, you will get a directory called daisy-<version>. This directory is what we will call from now on the DAISY_HOME directory. You may set a global environment variable pointing to that location, or you can do it each time in the command prompt when needed.

OpenJMS configuration

Open a command prompt or shell and set the following environment variables:

  • DAISY_HOME, pointing to the directory where Daisy is installed
  • OPENJMS_HOME, pointing to <DAISY_HOME>/openjms

Then go to the directory <DAISY_HOME>/install, and execute:

daisy-openjms-config

(Unix users might have to do ./daisy-openjms-config, depending on their settings)

The application will ask for a few database parameters and then update the configuration files accordingly. If you have MySQL running on localhost and created the databases and users with the default names and passwords shown earlier, you can simply press enter on each question.

Then go to the directory <OPENJMS_HOME>/bin, and execute the following to create the OpenJMS database tables (the complete statement should be on one line):

For Linux:
dbtool.sh -create -config $OPENJMS_HOME/config/openjms.xml
   -Dlog4j.configuration=$DAISY_HOME/misc/openjms-log4j.properties

For Windows:
dbtool.bat -create -config %OPENJMS_HOME%\config\openjms.xml
   -Dlog4j.configuration=%DAISY_HOME%\misc\openjms-log4j.properties

The last line printed should be:

Successfully created tables

Now you can start OpenJMS: go to the directory <OPENJMS_HOME>/bin, and execute:

For Linux:
startup.sh

For Windows:
startup.bat

Daisy Repository Server

Initialising and configuring the Daisy Repository

Open a command prompt or shell and set an environment variable DAISY_HOME, pointing to the directory where Daisy is installed.

Then go to the directory <DAISY_HOME>/install, and execute:

daisy-repository-init

Follow the instructions on screen. The installation will (1) initialise the database tables for the repository server and (2) create a Daisy data directory containing customized configuration files.

Starting the Daisy Repository Server

Still in the same command prompt (or in a new one, but make sure DAISY_HOME is set), go to the directory <DAISY_HOME>/repository-server/bin, and execute:

daisy-repository-server <location-of-daisy-data-dir>

Here, replace <location-of-daisy-data-dir> with the location of the Daisy data directory created in the previous step.

If you're on Windows and installed Java in a directory containing spaces in the name (for example, c:\Program Files), do the following: open the file <DAISY_HOME>/repository-server/merlin/bin/merlin.bat in a text editor. The third line counting from the end starts with %JAVA_HOME%; change this to "%JAVA_HOME%" (thus, put it between quotes).

The startup can take a few seconds. Nothing special is printed to the screen to indicate that the server has completely started, and the prompt will not return.

OpenJMS and MySQL still need to be running, otherwise the repository server will be unable to start.
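Before starting the repository server, it can help to confirm that MySQL and OpenJMS are actually accepting connections. The following is a minimal Python sketch, not part of Daisy. It assumes MySQL listens on port 3306 (the port mentioned for the Windows installer) and OpenJMS on TCP port 3030 (the port used in the TcpConfiguration element elsewhere in this documentation); the function names are made up for illustration, and the ports should be adjusted if your setup differs.

```python
import socket

# Assumed ports, taken from this documentation; adjust them to your setup.
ASSUMED_SERVICES = {"MySQL": 3306, "OpenJMS": 3030}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def missing_services(host="localhost", services=ASSUMED_SERVICES):
    """Return the names of the services that do not accept connections."""
    return [name for name, port in services.items()
            if not is_listening(host, port)]
```

Calling missing_services() returns an empty list when both ports accept connections. A successful connection only shows that something is listening on the port, not that it is the expected service.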

Daisy Wiki

Initialising the Daisy Wiki

Before you can run the Daisy Wiki, the repository needs to be initialised with some document types, a "guest" and "registrar" user, a default ACL configuration, etc.

Open a command prompt or shell and set an environment variable DAISY_HOME, pointing to the directory where Daisy is installed.

Go to the directory <DAISY_HOME>/install, and execute:

daisy-wiki-init

The program starts by asking for a login and password; enter the user created during the execution of daisy-repository-init (the default was testuser/testuser). It will also ask for the URL where the repository is listening; you can simply press enter here.

If everything goes according to plan, the program will now print out some informational messages and end with "Finished.".

Creating a Daisy Wiki Site

The Daisy Wiki has the concept of multiple sites, which are multiple views on top of the same repository. You need at least one site to do something useful with the Daisy Wiki, so we are now going to create one.

Open a command prompt or shell and set an environment variable DAISY_HOME, pointing to the directory where Daisy is installed.

Go to the directory <DAISY_HOME>/install, and execute:

daisy-wiki-add-site

The application starts by asking for the same parameters as daisy-wiki-init.

Then it will ask for a name for the site. This should be a name without spaces. If you lack inspiration, enter something like "test" or "main".

Then it will ask for the sites directory location, for which the presented default should be OK, so just press enter.

Starting the Daisy Wiki

Open a command prompt or shell and set an environment variable DAISY_HOME, pointing to the directory where Daisy is installed.

Go to the directory <DAISY_HOME>/daisywiki/bin, and execute:

daisy-wiki

This will start Jetty (a servlet container) with the webapp found in <DAISY_HOME>/daisywiki/webapp.

OpenJMS, MySQL, and the Daisy Repository Server should still be running of course.

Finished!

Now you can point your web browser to:

http://yourhost:8888/daisy/

(Note the final slash on the end)

To be able to create or edit documents, you will have to change the login. You can use the user you created for yourself while running daisy-repository-init (the default was testuser/testuser).

1.2 to 1.3 Milestone 1 upgrade

Changes (compared to Daisy 1.2)

Features

  • Document Variants: it is possible to have multiple variants of one document, which can be branch-variants or language-variants. These variants all share the same document ID, but are then distinguished by their branch and language.
  • Document Tasks: these are tasks that run across a number of documents and are executed in the background. This was added to enable the creation of branches (see variants) across a number of documents, but is useful for many other things as well. Tasks can be written in JavaScript (administrators only) or selected from predefined actions.
  • daisy-js: a small shell script which sets the Java classpath to contain all Daisy client jars and then launches a JavaScript interpreter (Rhino). This provides a low entry-bar for people who want to experiment with the Daisy API.
  • Navigation tree: when there is an error in a query or a recursive import of navigation trees, a special error node is now inserted in the tree, instead of showing "Error generating the navigation tree".
  • Made the installation easier: automated more steps (no more editing of files by hand), clarified the texts displayed by the installation programs, ask fewer questions, added log4j config for openjms dbtool so that its error messages become visible, added check for MySQL database version, provide zip download for Windows users, and more.
  • Passwords are now stored as SHA digests in the database (previously they were stored as clear text)

Bug Fixes and improvements

Daisy 1.3-M1 contains numerous bug fixes and smaller improvements. We didn't really keep track of each and every one; here are some off the top of my head:

  • Daisy now works on Java 1.5 (Java 5)
  • Fixed a potential encoding issue that could occur when there's a proxy (such as Apache) in between the browser and the Daisy Wiki, or when using certain servlet containers.
  • Use TCP as default transport protocol for OpenJMS (instead of RMI), which is more firewall friendly since it uses a fixed port.
  • Fixed issues with unhandable redirects in the navigation tree.
  • The id attribute of link nodes was not supported by the graphical navigation tree editor, causing it to be dropped.
  • Show little triangles before expandable nodes in the navigation tree (when the navigation tree is not rendered fully expanded). For this, an attribute named hasChildren has been added to document nodes in the output of the navigation tree.
  • Fixed an error in the determination of available documenttypes (DSY-118)
  • Added document URLs in comment notification mails
  • The order of out-of-line links was not maintained when using the remote API.
  • Use appropriate HTTP status codes in the Daisy Wiki (e.g. 404 when a document does not exist)
  • IE improvements: items in the page menu (on the right) are now spaced correctly, the sample queries on the Query Search page now work.
  • The navigation tree editor didn't escape special characters when generating the navigation tree XML (such as < and &).
  • Some fixes in the HTML cleanup:
    • sometimes whitespace could be introduced where it shouldn't be
    • once a <pre> tag was encountered, <br/> elements in any elements following that would be removed
    • remove <br/>s at the end of <li>s, which Mozilla/Firefox tends to insert
  • Fixed problem with encoding of HTMLArea language files, causing the failure of the wysiwyg editor to load for some languages in Internet Explorer.
  • Enabled the switching of the block-style within lists.
  • Daisy now ships with Cocoon 2.1.7 (instead of an SVN snapshot).
  • Fixed a bug that could cause a lot of "&#0;" to appear at the end of a document after editing using Internet Explorer.
  • ...

Miscellaneous

  • In this Milestone release only the English and Dutch language bundles are up to date.
  • Due to the introduction of the variants feature, some incompatible API changes occurred (both in the Java and HTTP API), though nothing major. In general the specification of the branch and language is optional and defaults to the built-in "main" branch and "default" language. There are also a few new JMS events related to branches: DocumentVariantCreated, DocumentVariantUpdated, DocumentVariantDeleted, (Branch|Language)(Created|Updated|Deleted).

Upgrading

Before starting

Shut down Daisy (the Repository Server, the Daisy Wiki, and the OpenJMS server).

Make backups! More specifically:

  • make a copy of the daisy data directory
  • do a dump of the database:
    mysqldump daisyrepository -uuser -ppassword > daisyrepo.sql

Update database schema

There have been changes to the database schema (among other things, to support the document variants and the document task manager).

To do the appropriate changes to the MySQL database, an upgrade script is available.

Daisy 1.3 has a new feature called language variants that allows you to have multiple language versions of the same document. By default, after migrating the repository, all documents will belong to a language called "default". If all the documents in your current repository are in the same language, and you want them to be classified under a more specific language than "default", see the instructions at the top of the file misc/daisy-1_2-to-1_3_M1.sql.

To execute the upgrade script, perform the following steps in a command prompt or shell:

cd <DAISY_HOME>/misc
mysql -Ddaisyrepository -udaisy -p<password>
[then on the mysql prompt]
\. daisy-1_2-to-1_3_M1.sql 

Update OpenJMS configuration

Starting with Daisy 1.3-M1, the installation configures OpenJMS to use TCP instead of RMI as transport protocol, which has the advantage that it is more firewall-friendly since it uses a fixed port number.

Therefore, perform the following steps:

  1. Copy <OLD_DAISY_HOME>/openjms/config/openjms.xml to <NEW_DAISY_HOME>/openjms/config/openjms.xml
  2. Open the file <NEW_DAISY_HOME>/openjms/config/openjms.xml in a text editor
  3. Inside the <Connectors> element, add the following:
        <Connector scheme="tcp">
          <ConnectionFactories>
            <QueueConnectionFactory name="TCPQueueConnectionFactory"/>
            <TopicConnectionFactory name="TCPTopicConnectionFactory"/>
          </ConnectionFactories>
        </Connector>
  4. Add the following element somewhere as a child of the root element (for example, before the </Configuration> closing tag):
    <TcpConfiguration port="3030" jndiPort="3035"/>
  5. Open the file <DAISY_DATA>/conf/myconfig.xml in a text editor. Look for the following line:
    <target path="/daisy/jmsclient/jmsclient">

    and replace the content of the <configuration> element following it with the following, but be sure to keep the old password on the credentials element:

    <configuration>
      <jmsConnection>
        <credentials username="admin" password="openjms"/>
        <initialContext>
          <property name="java.naming.provider.url" value="tcp://localhost:3035/"/>
          <property name="java.naming.factory.initial"
                    value="org.exolab.jms.jndi.InitialContextFactory"/>
        </initialContext>
        <topicConnectionFactoryName>TCPTopicConnectionFactory</topicConnectionFactoryName>
        <queueConnectionFactoryName>TCPQueueConnectionFactory</queueConnectionFactoryName>
      </jmsConnection>
    </configuration>

Copy over the old configuration

  • Copy <OLD_DAISY_HOME>/daisywiki/webapp/daisy/sites/* to <NEW_DAISY_HOME>/daisywiki/webapp/daisy/sites
  • In <NEW_DAISY_HOME>/daisywiki/webapp/WEB-INF/cocoon.xconf, adjust the following passwords with the values you can find in <OLD_DAISY_HOME>/daisywiki/webapp/WEB-INF/cocoon.xconf:
    • Adjust the password for openjms user (if not left to default): <credentials password="openjms" username="admin"/>
    • Adjust the password for the "internal" user: <cacheUser login="internal" password="defaultpwd"/>
    • Adjust the password for the "registrar" user: <registrarUser login="registrar" password="defaultpwd"/>
  • If you created any document type-specific stylesheets, you can copy them over also
  • If you developed a custom skin, then you'll have to update it to work with the new version. There's no easy or quick way to do this.

Edit <DAISY_HOME>/openjms/bin/setenv.(sh|bat), uncomment the line defining the CLASSPATH and put the MySQL driver in the CLASSPATH, which can be found at (substitute DAISY_HOME by its actual location):

DAISY_HOME/lib/mysql/jars/mysql-connector-java-3.0.15-ga-bin.jar

Update siteconf.xml files

This step is only needed if you did not use the "default" language when upgrading your database (thus if you adjusted the language in the daisy-1_2-to-1_3_M1.sql file).

Edit all your siteconf.xml files, thus those found in <NEW_DAISY_HOME>/daisywiki/webapp/daisy/sites/<sitename>/siteconf.xml

In each file, add the following element somewhere as child of the root element:

<language>your_language</language>

for example:

<language>en</language>

Erase Full Text Indexer files

Because of the introduction of the document variants feature, the format of the full text index files has changed. Therefore, delete all the files in the directory <DAISY_DATA>/indexstore.

Start the servers

Start OpenJMS, the Daisy Repository Server, and the Daisy Wiki.

If necessary, first update the DAISY_HOME and OPENJMS_HOME variables to point to the location of the new Daisy version.

Trigger full text index updating

Recreating the full text index can be done via the JMX console. Open a web browser and go to http://localhost:9264.

If this prompts for a password and you don't know it, look in the file <DAISY_DATA>/conf/myconfig.xml for a line like:

<httpAdaptor authenticationMethod="none" host="localhost"
             password="daisyjmx" port="9264" username="daisyjmx"/>

and use the username and password mentioned there.

By default, the JMX console only allows connection from localhost. Change the host attribute on the above configuration element if needed. You need to restart the Daisy Repository Server for this change to take effect.

In the JMX web interface, follow the link that says "Daisy:name=FullTextIndexUpdater". Then press the Invoke button on the line that starts with "reIndexAllDocuments". Press this button only once; depending on the number of documents you have, this can take a little while.

Note about static resources and caching

Since static resources are cached for about 5 hours by your browser, you might need to do a full reload (shift + reload button on the browser's toolbar). Also do a shift + reload when opening the editor for the first time.

If these instructions are unclear to you, or if you find an error in them, please let us know on the Daisy mailing list or by leaving a comment.

Source Code

Sources can be obtained through SVN. Instructions for setting up a development environment with Daisy (which is slightly different from using the packaged version) are included in the README.txt files in the source tree. For anonymous, read-only access to Daisy SVN, use the following command:

svn co http://svn.cocoondev.org/repos/daisy/trunk/daisy

This will give the latest development code (the "trunk"). To get the source code of a specific release, use a command like this:

svn co http://svn.cocoondev.org/repos/daisy/tags/RELEASE_1_3_1 daisy

See also the existing tags.

No authentication is required for anonymous access. If you're behind a (transparent) proxy, you might want to verify whether your proxy supports the extended HTTP WebDAV methods.

Documents

Introduction

The purpose of the Daisy Repository Server is managing documents. This document will describe the structure (or features) of such documents.

The diagram below gives an overview of the document structure, which is explained further in the remainder of this document.

Document Structure

Documents & Document Variants

A document in itself has very few properties; the real meat is in the document variants. A document never exists without at least one document variant. On the other hand, using variants is optional, in which case you could consider a document and a document variant to be the same.

The details of working with variants are described in another document. For now, it suffices to know that in a practical working environment like the Daisy Wiki, the branch and language which identify the particular variant of the document are usually a given, and you'll only work with document IDs, so it is as if the existence of variants is transparent.

Many times when we speak about a document in Daisy, we implicitly mean "a certain variant of a document" (a "document variant").

Refer to the diagram above to see if a certain aspect applies to a document, a document variant, or a version of a document variant.

A document is always retrieved from the repository in combination with a document variant; a document in itself, without a variant, cannot be retrieved. This is, among other things, because the access rights to a document are based on information that is part of the document variant (it can thus be that a user has access to one document variant but not to another). Another way to look at this is that there are only document variants, and that certain properties of them are shared across the variants.

Document Types, The Repository Schema

The main "data" of a document is contained in its so-called parts and fields. Parts can contain arbitrary binary data, and fields contain 'simple' information of a certain type (string, date, decimal, ...). Which parts and fields a document can have is determined by the document's document type. A document type is actually a combination of zero or more part types and zero or more field types, which further describe these aspects. Part and field types are defined as independent entities, meaning that the same part and field types can be reused across different document types. The diagram below shows the structure and relation of all these entities.

Repository Schema Structure

Common aspects of document, part and field types

Let us first look at the things document, part and field types have in common. Their primary, unchangeable identifier is a numeric ID, though they also have a unique name (which can be changed after creation), which you will likely prefer to use.

Next to the name, they can optionally be assigned a localized label and a description. Localized means that a different label and description can be given for different locales. A locale can be a language, language-country, or language-country-variant specification. For example, a label entered for "fr-BE" would mean it is in French, and specifically for Belgium. The labels and descriptions are retrieved using a fallback system. For example, if the user's locale is "fr-BE", the system will first check if a label is available for "fr-BE"; if none is found it will check for "fr", and finally for the empty locale "". Thus if you want to provide labels and descriptions but are not interested in localisation, you can simply enter them for the empty locale.
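The fallback lookup described above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not Daisy's implementation; the function name and the dictionary layout (locale string mapped to label text) are assumptions made for the example.

```python
def lookup_label(labels, locale):
    """Return the label for the most specific matching locale, or None.

    `labels` maps locale strings such as "fr-BE", "fr" or "" to label text
    (a hypothetical layout chosen for this sketch).
    """
    parts = locale.split("-") if locale else []
    # Try "fr-BE" first, then "fr", dropping one component per step.
    while parts:
        candidate = "-".join(parts)
        if candidate in labels:
            return labels[candidate]
        parts.pop()
    # Finally fall back to the empty locale.
    return labels.get("")
```

With labels entered only for "fr" and the empty locale, a user with locale "fr-BE" gets the "fr" label, and a user with locale "nl" gets the label of the empty locale.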

Document, part and field types cannot be deleted as long as they are still in use in the repository. Once a document has been created that uses one of these types, the type can thus not be deleted anymore (unless the documents using it are deleted). However, it is possible to mark a type as deprecated to indicate it should not be used anymore. This deprecation flag is purely informational: the system simply stores it.

Document types

A document type combines a number of part types and field types, and indicates for each of these if it is required or not.

Part types

Before going into the details of part types, it might make sense to justify their existence. In many document repository systems, each document has simply one 'content chunk'. For example, a resource addressable over WebDAV is one atomic piece of data. Daisy allows a document to consist of multiple parts. This makes these parts separately addressable and retrievable. For example, suppose we have a document type consisting of a part "Abstract" and a part "Main Content". It is then simple to retrieve the abstracts of all documents conforming to this document type. As another example, for an "Image" document type we could have parts "ImageData" containing a rendered form of the image, "ImageSource" containing the original source (e.g. a Photoshop or CorelDraw file), and "Thumbnail" containing a small rendition of the image.

A part instance consists of some binary data (or, if you wish, data which is treated as binary; it could be plain text of course) and the mime-type of the data. A part type can restrict which types of data (thus which mime-types) are stored in the part, but this is not required. This restriction is done by specifying a list of allowed mime-types.

The Daisy HTML flag and Link Extraction

A part type has a flag indicating whether the part contains "Daisy HTML". Daisy HTML is basically HTML formatted as well-formed XML (with element and attribute names lowercased). It is not the same as XHTML, because the elements should not be in any namespace. If the "Daisy HTML" flag is set to true, the mime-type should be limited to text/xml. To the repository server, the Daisy HTML flag indicates that link extraction should be performed on the part. The Daisy Wiki front end application will show a wysiwyg editor for Daisy HTML parts.

If the "Daisy HTML" flag is on, the repository server will perform link extraction on the data. More specifically, it will extract links from the href attribute of the <a> element, the src attribute of the <img> element, and the character content of <p class="include">. The format of the links is:

daisy:<document id>
or
daisy:<document id>@<branch id or name>:<language id or name>:<version id>

Links that don't conform to this form will be ignored. The <version id> can take the special value "LAST". A link without a version specification denotes a link to the live version of the document. The branch, language and version parts are all optional. For example, daisy:15@:nl is a link to the Dutch version of document 15.
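To make the link format and the extraction rules more concrete, here is a small Python sketch. It is not the repository server's code: the function names, the returned dictionary layout and the regular expression are made up for illustration, and it assumes document ids are numeric (as in the example daisy:15). It collects link candidates from the three places named above and ignores anything that does not match the daisy: form.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: document ids are numeric, as in the example "daisy:15".
LINK_RE = re.compile(r"^daisy:(\d+)(?:@([^:]*)(?::([^:]*))?(?::([^:]*))?)?$")


def parse_daisy_link(link):
    """Split a daisy: link into its parts, or return None if it does not conform."""
    match = LINK_RE.match(link.strip())
    if match is None:
        return None
    doc_id, branch, language, version = match.groups()
    return {
        "document_id": int(doc_id),
        "branch": branch or None,      # None: branch not specified
        "language": language or None,  # None: language not specified
        "version": version or None,    # None: link to the live version
    }


def extract_links(daisy_html):
    """Collect conforming links from a well-formed Daisy HTML string."""
    root = ET.fromstring(daisy_html)
    candidates = []
    for element in root.iter():
        if element.tag == "a" and "href" in element.attrib:
            candidates.append(element.attrib["href"])
        elif element.tag == "img" and "src" in element.attrib:
            candidates.append(element.attrib["src"])
        elif element.tag == "p" and element.get("class") == "include":
            candidates.append(element.text or "")
    parsed = (parse_daisy_link(candidate) for candidate in candidates)
    return [link for link in parsed if link is not None]
```

For the example from the text, parse_daisy_link("daisy:15@:nl") yields document id 15 with language "nl" and no branch or version, while an ordinary http link yields None and is therefore skipped by extract_links.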

Using the Daisy Query Language, it is possible to retrieve the content of parts which have the Daisy HTML flag set to true. The content of the part will simply be directly embedded in the XML result returned from the query operation.

Field types

Value Type

The most important thing a field type tells about a field is its Value Type. A Value Type identifies the kind of data that can be stored in a field. The available value types are listed in the table below, together with their matching Java class.

Value Type Name    Corresponding Java class
string             java.lang.String
date               java.util.Date
datetime           java.util.Date
long               java.lang.Long
double             java.lang.Double
decimal            java.math.BigDecimal
boolean            java.lang.Boolean

Multi-Value

A field type can specify that it concerns a multi-value field, thus that fields of that type can have multiple values. All of the values of the field should be of the same value type.

A multi-value field can contain the same value more than once, and the order of its values is maintained. Thus the values of a multi-value field form an ordered list.

Selection Lists

It is possible to define a selection list for a field type. This is a list of possible values that an end user can choose from when completing the field.

ACL allowed flag

In the access control system, it is possible to define access rules for documents by using an expression to select the documents to which the access rules apply. In these expressions, it is also possible to check the value of fields, but only of fields whose field types' ACL allowed flag is set to true. The ACL allowed flag also enables the front-end to indicate that changing the value of that particular field can influence the access control checks.

Size hint

A field type can have a size hint, which is simply an integer number. This information is used by the front end to display an input field of an appropriate width. The repository server doesn't associate any further meaning with it: it doesn't cause any validation to happen, nor does it specify the unit of the width (most likely "number of characters").

Document and document type association, how changes to document types are handled

Upon creation of a document, a document type must be supplied. When saving a document, the repository will check that the document conforms to its document type. Thus it will check that all required fields and parts are present, and that there are no parts and fields in the document that are not allowed by the document type.
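The conformance check described above can be sketched as follows. This is a minimal Python illustration, not Daisy's actual implementation: the function name and the representation of fields or parts as plain sets of type names are assumptions. The same function would be applied once to the fields and once to the parts.

```python
def conformance_errors(present, allowed, required):
    """Return a list of problems for one kind of item (fields or parts).

    present, allowed, required: sets of type names (hypothetical representation).
    """
    errors = []
    # Everything the document type marks as required must be present.
    for name in sorted(required - present):
        errors.append("missing required: " + name)
    # Nothing may be present that the document type does not allow.
    for name in sorted(present - allowed):
        errors.append("not allowed by document type: " + name)
    return errors
```

An empty result list means the document conforms to its document type for that kind of item.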

The document type of a document can be changed at any time. This is useful if you start out with a generic document type but later want to switch to a more specialized document type.

The definition of a document type can be changed at any time. Part and field types can be added to or removed from it, or can be made required. A logical question is what happens to existing documents in the repository that use that document type. The answer is basically "nothing". If for example a required field is added to a document type, then the next time a document of that type is edited, it will fail to save unless a value for the field is specified. The newly saved version of the document will then conform to the new state of the document type. Older versions of the document will remain unchanged, however. When saving a document, it is also possible to supply an option that skips the document type conformance check.

So basically the document type system doesn't give any guarantees about the structure of the documents in the repository, but rather hints at how the documents should be structured and interpreted.

Documents

A document consists of versioned and non-versioned data. Versioned data means that each time the document is saved (and some of the versioned aspects of the document changed), a new version will be stored, so that the older state of the data can still be viewed afterwards. In other words, it provides a history of who made what changes at what time.

When a document is saved for the first time, it is assigned a unique, numeric ID. The ID is just a sequence counter, so the first created document gets ID 1, then 2, and so on. The ID of a document never changes. The user who creates the document is the owner of the document. The date and time of document creation is also stored.

When creating a document, its document type must be specified. The document type can afterwards be changed.

Daisy has no directories like a filesystem. Everything is just in one big bag. When saving a document, you only have to choose a name for it (which acts in fact as the title of the document), and this name is not even required to be unique (see below). Documents are retrieved by searching, or browsing through navigation trees.

Versioned Content

The versioned content of a document consists of the following:

  • the document name
  • the parts
  • the fields
  • the links

So if any changes are made to any of these, and the document is stored, a new version is created.

Version ID

Each version has an ID, which is simply a numeric sequence number: the first version has number 1, the next number 2, and so on.

Document Name

The name of a document is required (it cannot be empty). The name is not required to be unique. Thus there can be multiple documents with the same name. The ID of the document is its unique identification.

The name is usually also rendered as the title of the document.

Parts

Each part is associated with a part type and has a mime type and some data. There cannot be two parts of the same part type in one document.

Each part can optionally have a file name, this file name can be used as default file name when the content of the part is saved (downloaded) in a file.

Fields

Each field is associated with a field type and specifies the field value. There cannot be two fields of the same field type in one document.

Links

A document can contain two kinds of links: links can occur as content of a part (for example, an <a> element in HTML), and a document can have a number of so-called out-of-line links. These are links stored separately from the content. Each link consists of a title and a target (some URL). These links are usually rendered at the bottom of a page as a bulleted list.

Out-of-line links are useful in case you want to link to related documents and either don't want or can't (e.g. in case of non-HTML content) link to them from the content of a part.

Version State

Each version can have a state indicating whether it is a draft version (i.e. you started editing the document but are not finished yet, in other words the changes should not yet be published), or a publishable version. The most recent version having the state 'publish' becomes the live version. The live version is the version that is shown by default to the user. It is also the version whose data is indexed in the full-text index, and whose properties are used by default when querying.
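The rule for determining the live version can be expressed in a few lines. This is a minimal Python sketch under the assumption that versions are given as (version ID, state) pairs; the function name is hypothetical and the code is not taken from Daisy itself.

```python
from typing import List, Optional, Tuple

def live_version_id(versions: List[Tuple[int, str]]) -> Optional[int]:
    """Return the ID of the most recent version in state 'publish', if any.

    versions: (version_id, state) pairs, where state is 'draft' or 'publish'.
    """
    published = [version_id for version_id, state in versions if state == "publish"]
    # Version IDs are sequence numbers, so the highest ID is the most recent.
    return max(published) if published else None
```

A document whose versions are all drafts has no live version, which the sketch represents as None.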

Non-versioned properties

Collections and collection membership

Collections are sets of documents. A document can belong to one or more collections, thus collections can overlap. A collection is simply a way to combine some documents in order to do something with them or treat them in some special way.

Collections themselves can be created or deleted only by Administrators (in the Daisy Wiki, this is done in the administration interface). Deleting a collection does not delete the documents in it. You can limit who can put documents in a collection by ACL rules.

Custom fields

Custom fields are arbitrary name-value pairs assigned to a document. The name and value are both strings. In contrast with the earlier-mentioned fields that are part of the document type, these fields are non-versioned. This makes it possible to attach tags to documents without causing a new version to be created, and without formally defining a field type.

Private

A document marked as private can only be read (and written) by its owner.

While the global access control system of Daisy makes it easy to centrally handle access control for sets of documents, sometimes it could be useful to simply say "I want nobody else to see this (for now)". This can be done by enabling the private flag. The document will then not be accessible for others, and also won't turn up in search results done by others. The private flag can be set on or off at any time, by the owner or by an Administrator.

There is however one big exception: Administrators can always access all documents, and thus will be able to read your "private" documents. The content is not encrypted.

Retired

If a document variant is no longer needed, because its content is outdated, replaced by others, or whatever, you can mark the document variant as retired. This makes the document variant virtually deleted. It won't show up in search results anymore.

The retired flag can be set on or off at any time, retiring is not a one-time operation.

Lock

A lock can be taken on a document variant to make sure nobody else edits the document variant while you're working on it.

Daisy automatically performs so-called optimistic locking, this means that if person A starts editing the document, and then person B starts editing the document, and then person A saves the document, and then person B tries to save the document, this last operation will fail because the document has changed since the time person B loaded it. This mechanism is always enabled, it is not needed to take an explicit lock.
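The scenario with person A and person B can be sketched with a toy in-memory repository. This is only an illustration of the general optimistic locking technique: the update counter, the class names and the exception name are assumptions, and the source does not say how Daisy detects that a document has changed.

```python
class ConcurrentUpdateError(Exception):
    """Raised when a document changed since it was loaded (hypothetical name)."""

class InMemoryRepository:
    """Toy repository that keeps an update counter per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (update_count, content)

    def create(self, doc_id, content):
        self._docs[doc_id] = (1, content)

    def load(self, doc_id):
        """Return (update_count, content) as seen at load time."""
        return self._docs[doc_id]

    def save(self, doc_id, loaded_count, new_content):
        current_count, _ = self._docs[doc_id]
        if current_count != loaded_count:
            # Someone else saved in the meantime, so this save must fail.
            raise ConcurrentUpdateError("document %s changed since load" % doc_id)
        self._docs[doc_id] = (current_count + 1, new_content)
```

If A and B both load the document and A saves first, B's later save raises ConcurrentUpdateError because the counter B loaded is stale.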

A lock can additionally be taken to make others aware that you are editing the document. A lock can be of two types: an exclusive lock or a warn lock. An exclusive lock is pretty much what its name implies: it is a lock exclusively for the user who requested it, and it prevents anyone else from saving the document until you release the lock. A warn lock isn't really a lock: it is just an informational mechanism to let others know that someone else has also started to edit the document, but it doesn't enforce anything. Anyone else can still save the document at any time or replace the lock with their own.

A lock can optionally have a certain duration, if the duration is expired, the lock is automatically removed.

For example, the Daisy Wiki application by default uses exclusive locks with a duration of 15 minutes, and automatically extends them when the user keeps editing.

A lock can be removed either by the person who created it, or by an Administrator.
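The lock rules above (two lock types, optional expiry) can be summarised in a short sketch. This is a hedged Python illustration: the Lock class, its field names and the use of seconds as the time unit are assumptions made for the example, not Daisy's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Lock:
    owner: str
    lock_type: str             # "exclusive" or "warn"
    acquired_at: float         # time the lock was taken, in seconds
    duration: Optional[float]  # None means the lock never expires

def is_expired(lock: Lock, now: float) -> bool:
    """A lock with a duration is removed once that duration has passed."""
    return lock.duration is not None and now >= lock.acquired_at + lock.duration

def may_save(lock: Optional[Lock], user: str, now: float) -> bool:
    """Decide whether `user` may save given the current lock, if any."""
    if lock is None or is_expired(lock, now):
        return True
    if lock.lock_type == "warn":
        return True  # a warn lock is informational only
    return lock.owner == user  # an exclusive lock only admits its owner
```

For a 15-minute exclusive lock taken by one user at time 0, may_save returns False for other users until 900 seconds have passed.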

Owner

The owner of a document is a person who is always able to access (read/write) the document, regardless of what the ACL specifies. The owner is initially the creator of the document, but can be changed afterwards.

Last Modified and Last Modifier

Each time a document is saved, the user id of the person who saved it is stored as the last modifier, and the date and time of the save operation as the "last modified" time. Each document variant also has its own Last Modified and Last Modifier information. For document variants, this will often coincide with the Created/Creator fields of the last version, but not necessarily so: if only non-versioned properties are changed, no new version will be created.

Variants

Introduction

The variants feature of Daisy allows multiple alternatives of a document to be stored in one logical document, identified by one unique ID.

Daisy supports variants along two axes:

  • branches
  • languages

For example, if there would not be a variants feature, and you had the same content in different languages, for each of these languages you would need to create a different document, thus with a different ID.

Language variants are quite obvious, but you may wonder what branches are. The purpose of branches is to have multiple parallel editable versions of the same content. As an example, take the Daisy documentation. Between major Daisy releases there might be quite some changes to the documentation. However, while creating the documentation of e.g. Daisy 1.3, we still want the ability to update the documentation of Daisy 1.2. Sure, this could be solved by duplicating all documentation documents for each new release, but then the identity of these documents would be lost since they get new IDs assigned, and the relationship between the documents in different releases would be lost.

Defining variants

By default, Daisy predefines one branch and one language variant: the branch main and the language default.

You can yourself define other ones, in the Daisy Wiki you can do this via the administration screens.

The definition of a branch or language consists of a numeric ID (assigned by the repository server), a name and optionally a description. Internally, the ID is used, but towards the user mostly the name is shown.

The built-in main branch and default language each have ID 1.

Once a branch and/or language is defined, you can create new document variants using them.

Defining the branches and languages is something that can only be done by users who have the Administrator role, but adding variants to documents (which is almost the same as creating documents) can of course be done by any user, as far as the ACL allows the user to do so.

Deleting a branch or language definition is only possible when there are no more document variants for that branch or language. You can easily delete all document variants for a certain branch or language using the Document Task Manager, similarly to what is described further on for creating a variant across a set of documents.

Creating a variant on a document

When adding a new variant to a document, this can be done in two ways:

  1. from scratch
  2. based on the content of (a certain version of) an existing variant

When you opt for the second option (which is mostly done when creating branch variants), the (branch, language, version) triple from which the content is taken will be stored as part of the new variant, so that later on you can see from where this variant "branched" (in the Daisy Wiki, this information is shown on the version list page).

In the Daisy Wiki, there is an "Add Variant" action that lets you add a new variant to a document.

Searching for non-existing variants

When translating a site, it can be useful to search which documents are not yet translated in a certain language. Similarly, it can be useful to see which documents exist on one branch but not on another. For this purpose, the query language provides a function called DoesNotHaveVariant(branch, language).

For example, to search on the Daisy site for all documents that have been added in the documentation of version 1.3 compared to 1.2, you can use the following query:

select id, name
  where
   InCollection('daisydocs')
   and branch = 'daisydocs-1_3' and language = 'en'
   and DoesNotHaveVariant('daisydocs-1_2', 'en')
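The effect of DoesNotHaveVariant can be modelled over a set of (document ID, branch, language) triples. This is a minimal Python sketch of the semantics only; the function name lacks_variant and the sample data are invented for illustration and do not reflect how the repository server evaluates the query.

```python
def lacks_variant(variants, document_id, branch, language):
    """True if the document has no variant for the given branch and language.

    variants: set of (document_id, branch, language) triples (hypothetical model).
    """
    return (document_id, branch, language) not in variants

# Invented sample data: document 1 exists on both branches, document 2 only on 1.3.
variants = {
    (1, "daisydocs-1_2", "en"),
    (1, "daisydocs-1_3", "en"),
    (2, "daisydocs-1_3", "en"),
}

# Documents on the 1.3 branch that have no counterpart on the 1.2 branch.
new_in_1_3 = sorted(
    doc_id
    for (doc_id, branch, language) in variants
    if branch == "daisydocs-1_3" and language == "en"
    and lacks_variant(variants, doc_id, "daisydocs-1_2", "en")
)
```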

Queries embedded in documents

When using queries embedded in documents together with variants, you will usually want to limit the query results to variants with the same branch and language as the one containing the query. You could specify these explicitly, as in:

select id, name where <conditions> and branch='my_branch' and language='my_lang'

However, this means that you will need to adjust these queries when adding new variants to the document. Especially if you are adding a certain branch to a set of documents, this is not something you want to do. Therefore, it should be possible to refer to the branch and language of the containing document. This feature is not yet available in Daisy 1.3, but will be added in a future release.

Creating a variant across a set of documents

When using branches, you will often want to add a variant for that branch to a set of documents (in other words: create a branch across a set of documents). To avoid the need to do this one-by-one for each document, Daisy has a "Document Task Manager" which allows the execution of a certain task on a set of documents. That task could for example be "adding a new variant".

The Document Task Manager is also covered by a separate document, here we will just focus on how to use it to create a new variant.

Before using the Document Task Manager, be sure you have defined the new branch (or language) using the administration screens.

In the Daisy Wiki, the Document Task Manager is accessed via the drop-down User-menu (in the main navigation bar). Select the option to create a new task. You are then first presented with a screen where you need to specify the documents (document variants actually) with which you want to do something. As you can see, it is possible to add documents using queries. For example, for the Daisy site, when we want to create a branch starting from the Daisy 1.2 documentation, we would use a query like:

select id, name where InCollection('daisydocs') and branch = 'daisydocs-1_2' and language = 'en'

Once you have selected the documents, press Next to go to the next page, where the action to be performed on the documents is specified. For Type of task choose Simple Actions. Then press the Add button to add a new action. Change the type of the action to Create Variant (if necessary), and specify the branch and language you want to create. Finally press Start to start the task. You can then follow the progress of this operation, and check whether it finished successfully for all documents.

Document Comments

This document is about document comments: comments that can be added to Daisy documents. More precisely, they are actually added to document variants, thus each document variant has its own comments.

Comment features

The current Daisy comments system is rather simple (text-only comments, no editing after creation, no threading) but nonetheless very useful.

Comment visibility

Each comment has a certain visibility:

  • public comments: everyone who can read the document ('read live' permission) can see them,
  • editors-only comments: only users who have write access to the document can see them,
  • private comments: only the creator of the comment can see them.
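The three visibility levels can be captured in a small decision function. This is a hedged Python sketch: the function name and the boolean parameters are assumptions, and it deliberately does not model whether the creator of a private comment still needs read access to the document, since the text does not say.

```python
def can_see_comment(visibility, viewer, creator, can_read, can_write):
    """Decide whether `viewer` may see a comment (illustrative only).

    can_read / can_write: the viewer's permissions on the commented document.
    """
    if visibility == "public":
        return can_read
    if visibility == "editors-only":
        return can_write
    if visibility == "private":
        return viewer == creator
    raise ValueError("unknown visibility: %r" % (visibility,))
```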

Creation of comments

Everyone who has read access to a document can add comments to it. Editors-only comments can however only be created by users with write access to the document.

Deletion of comments

Comments can be removed from a document by users who have write access to the document (this includes users acting in the Administrator role). Private comments can be deleted by their creator, regardless of whether that user has write access to the document the comment belongs to.

When a document is deleted, all its associated comments are removed too, including private ones that the deleter of the document may not be aware of.

Daisy Wiki specific notes

Guest user cannot create comments

The guest user, though it is for the repository server an ordinary user like any other, is not allowed to create comments via the Daisy Wiki. This means that to create comments, users should first log in.

'My Comments' page

Users can get a list of all the private comments they added to documents via a "My Comments" page (accessible via the drop-down menu behind the user name).

Query Language

Introduction

The Daisy Query Language can be used to search for documents (more precisely, document variants). Queries can be used in various places:

  • explicitly via the "Query Search" page
  • embedded inside documents
  • embedded inside navigation trees

The implementation of various Daisy features is also based on queries, such as the recent changes page or the referrers page. And of course it is possible to execute queries from your own applications, using the HTTP interface or Java API.

The query language is a somewhat SQL-like language that lets you search on various document properties (including the fields), full text on the part content, or a combination of those. The sort order of the results can also be defined. The resulting document list is filtered to only include documents to which the user has at least read access.

An example query, searching all documents in a collection called "mycollection":

select id, name where InCollection('mycollection') order by name

Internally, non-fulltext queries are translated to SQL and executed on the database and fulltext queries are executed by Jakarta Lucene.

Although the query language is somewhat SQL-like, it hides the complexity of the actual SQL queries that the repository server performs on the relational database, which can quickly grow quite complex.

Note: whenever this document talks about "searching documents", this is equivalent to "searching document variants". The result of a query is a set of document variants, i.e. each member of the result set is identified by a triple (document ID, branch, language).

Query Language

General structure of a query

select
  ...
where
  ...
order by
  ...
limit x
option
  ...

The select and where parts are required, the rest is optional. Whitespace is of no importance.
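To make the order of the query parts concrete, here is a small helper that assembles a query string. This is a minimal Python sketch: build_query and its parameters are hypothetical, it performs no escaping or validation of the where expression, and it is not part of any Daisy API.

```python
def build_query(select, where, order_by=None, limit=None, options=None):
    """Assemble a query string: select, where, order by, limit, option."""
    if not select:
        raise ValueError("the select part is required")
    if not where:
        raise ValueError("the where part is required")
    parts = ["select " + ", ".join(select), "where " + where]
    if order_by:
        parts.append("order by " + ", ".join(order_by))
    if limit is not None:
        parts.append("limit %d" % limit)
    if options:
        # Options are written as name = 'value' pairs separated by commas.
        parts.append("option " + ", ".join(
            "%s = '%s'" % (name, value) for name, value in options.items()))
    return " ".join(parts)
```

Because whitespace is of no importance, the parts are simply joined with single spaces.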

The select part

The select part should list one or more identifiers, separated by commas. Available identifiers are listed further on.

The where part

The where part should contain a conditional expression, thus an expression which tests the value of identifiers using operators, or uses some built-in functions.

Besides the operators listed in the table below, the operations AND and OR are supported, and parentheses can be used for grouping.

Operators & datatypes

Operator         string  long  double  decimal  date  datetime  boolean
=                X       X     X       X        X     X         X
!=               X       X     X       X        X     X         X
<                X       X     X       X        X     X
>                X       X     X       X        X     X
<=               X       X     X       X        X     X
>=               X       X     X       X        X     X
[NOT] LIKE       X
[NOT] BETWEEN    X       X     X       X        X     X
[NOT] IN         X       X     X       X        X     X
IS [NOT] NULL    X       X     X       X        X     X         X

Wildcards for LIKE are _ and %, escape using \_ and \%.
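The wildcard rules for LIKE can be illustrated by translating a pattern into a regular expression. This is a Python sketch of the matching semantics only, assuming % matches any sequence of characters and _ matches exactly one character as in SQL. The function name is hypothetical, and case sensitivity (which the text does not specify) is left at the regular expression default.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression.

    % matches any run of characters, _ matches one character,
    and \\% and \\_ match a literal % and _ respectively.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "_%":
            out.append(re.escape(pattern[i + 1]))  # escaped wildcard: literal
            i += 2
            continue
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)
```

Use fullmatch on the result so that the whole value has to match the pattern, for example like_to_regex('p%').fullmatch('page').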

All keywords such as AND, LIKE, BETWEEN, ... can be written in either uppercase or lowercase (but not mixed case).

If these operators are used on multi-value fields, they return true if at least one of the values of the multi-value field satisfies the condition. See further on for a set of conditions specifically for multi-value fields.

Identifiers

The table below lists the available identifiers.

Some notes:

  • identifier names are case sensitive
  • non-searchable identifiers are identifiers which can only be used in the select clause of the query, not in the where clause
  • the datatype symbolic means it should be a string, but the string is internally translated into another code. For example, when searching on ownerLogin, the given string is internally translated to a user id, which is then used when performing the database search. This means that certain operators will not work on it (such as like, less than, greater than, ...)
  • version dependent means that the searched or retrieved data is version dependent data. By default this will search in, or retrieve data from, the live version of the document, but by specifying the query option search_last_version (see further on) the last version can also be searched.
  • the names in italic, i.e. partTypeName, fieldTypeName and customFieldName must be replaced by an actual name.

name                       searchable  datatype  version dependent  remarks
id                         yes         long      no
name                       yes         string    yes
branch                     yes         symbolic  no
branchId                   yes         long      no
language                   yes         symbolic  no
languageId                 yes         long      no
documentType               yes         symbolic  no
versionId                  yes         long      yes                ID of the live version, or if the query option search_last_version is specified, of the last version
creationTime               yes         datetime  no
ownerId                    yes         long      no
ownerLogin                 yes         symbolic  no
ownerName                  no          string    no
summary                    no          string    no                 always of last published version
retired                    yes         boolean   no
private                    yes         boolean   no
lastModified               yes         datetime  no
lastModifierId             yes         long      no
lastModifierLogin          yes         symbolic  no
lastModifierName           no          string    no
variantLastModified        yes         datetime  no
variantLastModifierId      yes         long      no
variantLastModifierLogin   yes         symbolic  no
variantLastModifierName    yes         string    no
%partTypeName.mimeType     yes         string    yes
%partTypeName.size         yes         long      yes
%partTypeName.content      no          xml       yes                only works for part types for which the flag 'daisy html' is set to true, and additionally the actual part must have the mime type 'text/xml'
versionCreationTime        yes         datetime  yes
versionCreatorId           yes         long      yes
versionCreatorLogin        yes         symbolic  yes
versionCreatorName         yes         string    yes
versionState               yes         symbolic  yes                'draft' or 'publish'
totalSizeOfParts           yes         long      yes                sum of the size of all parts in document
versionStateLastModified   yes         datetime  yes
lockType                   yes         symbolic  no                 'pessimistic' or 'warn'
lockTimeAcquired           yes         datetime  no
lockDuration               yes         long      no                 (in milliseconds)
lockOwnerId                yes         long      no
lockOwnerLogin             yes         symbolic  no
lockOwnerName              no          string    no
collections                yes         symbolic  no                 The collections (the names of the collections) the document belongs to. Behaves the same as a multi-value field with respect to applicable search conditions.
collections.valueCount     yes         symbolic  no                 The number of collections a document belongs to.
$fieldTypeName             yes         (varies)  yes                datatype depends on field type
$fieldTypeName.valueCount  yes         long      yes                Useful for multi-value fields. Searching for a value count of 0 does not work, use the "is null" condition instead.
#customFieldName           yes         string    no

Literals

String literals

Strings (text) should be put between single quotes, the single quote is escaped by doubling it, for example:

'''t is mooi weer vandaag'
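When building queries programmatically, the quoting rule above amounts to doubling every single quote inside the value. A minimal Python sketch follows; the function name is hypothetical.

```python
def quote_string(value):
    """Return `value` as a query string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"
```

Applied to the Dutch sentence 't is mooi weer vandaag, this produces the literal shown above.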

Numeric literals

These consist of digits (0-9); the decimal separator is a dot (.).

Numeric literals can be put between single quotes like strings, but it is not required to do so.

Date & datetime literals

Date format: 'YYYY-MM-DD'

Datetime format: 'YYYY-MM-DD HH:MM:SS'
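Date and datetime literals in these formats can be produced from Python date objects as sketched below. The function names are hypothetical, and the sketch assumes a 24-hour clock and does not address time zones, which the text above does not specify.

```python
from datetime import date, datetime

def date_literal(value: date) -> str:
    """Format a date as a quoted 'YYYY-MM-DD' literal."""
    return "'" + value.strftime("%Y-%m-%d") + "'"

def datetime_literal(value: datetime) -> str:
    """Format a datetime as a quoted 'YYYY-MM-DD HH:MM:SS' literal."""
    return "'" + value.strftime("%Y-%m-%d %H:%M:%S") + "'"
```

Such a literal could then be used in a condition like creationTime > '2005-03-09 14:05:00'.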

Special conditions for multi-value fields

$fieldName has all (value1, value2, value3, ...)

Tests that the multi-value field has all the specified values (and possibly more).

$fieldName has exactly (value1, value2, value3, ...)

Tests that the multi-value field has all the specified values, and none more. The order is not important.

$fieldName has some (value1, value2, value3, ...)
or
$fieldName has any (value1, value2, value3, ...)

has some and has any are synonyms. They test that the multi-value field has at least one of the specified values.

$fieldName has none (value1, value2, value3, ...)

Tests that the multi-value field has none of the specified values.

In addition to these conditions, you can use is null and is not null to check if a document has a certain (multi-value) field. The special sub-identifier $fieldName.valueCount can be used to check the number of values a multi-value field has.
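The meaning of the four multi-value conditions can be expressed with set operations. This is a Python sketch of the semantics only: the function names are hypothetical, and comparing as sets means duplicate values in a field are ignored, which is an assumption (the text only states that order is not important).

```python
def has_all(field_values, wanted):
    """True if every wanted value occurs in the field (extra values allowed)."""
    return set(wanted) <= set(field_values)

def has_exactly(field_values, wanted):
    """True if the field holds the wanted values and no others, in any order."""
    return set(field_values) == set(wanted)

def has_some(field_values, wanted):
    """True if at least one wanted value occurs in the field ('has any')."""
    return bool(set(field_values) & set(wanted))

def has_none(field_values, wanted):
    """True if no wanted value occurs in the field."""
    return not has_some(field_values, wanted)
```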

Other special conditions

InCollection

InCollection('collectionname' [, collectionname, collectionname])

Searches documents contained in at least one of the specified collections. To search documents that occur in multiple collections (thus in the intersection of those collections), use the function InCollection multiple times with AND in between: InCollection('collection1') and InCollection('collection2'). This also works for OR but in that case it is more efficient to give the collections as arguments to one InCollection call.

Instead of the InCollection condition, you can use the collections identifier in combination with the multi-value field search conditions such as has some, has all or has none for more powerful search possibilities. The InCollection condition predates the existence of multi-value fields, but remains supported.

LinksTo, LinksFrom, LinksToVariant, LinksFromVariant

LinksTo(documentId, inLastVersion, inLiveVersion)
LinksFrom(documentId, inLastVersion, inLiveVersion)
LinksToVariant(documentId, branch, language, inLastVersion, inLiveVersion)
LinksFromVariant(documentId, branch, language, inLastVersion, inLiveVersion)

Searches documents which link to or from the specified document (or document variant). The other two parameters, inLastVersion and inLiveVersion, are interpreted as booleans: 0 is false, any other (numeric) value is true.

If inLastVersion is true, only documents whose last version links to the specified document are included.

If inLiveVersion is true, only documents whose live version links to the specified document are included.

If both parameters are true or both are false, all documents are returned for which either the last or the live version links to the specified document.
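The interaction of the inLastVersion and inLiveVersion parameters can be written out as a small truth function. This Python sketch only restates the rules above; the function and parameter names are hypothetical.

```python
def link_matches(last_version_links, live_version_links,
                 in_last_version, in_live_version):
    """Decide whether a document is included in a LinksTo-style result.

    last_version_links / live_version_links: whether that version of the
    candidate document links to the specified document.
    in_last_version / in_live_version: the numeric query parameters
    (0 is false, any other value is true).
    """
    want_last = bool(in_last_version)
    want_live = bool(in_live_version)
    if want_last == want_live:
        # Both true or both false: either version may contain the link.
        return last_version_links or live_version_links
    if want_last:
        return last_version_links
    return live_version_links
```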

IsLinked, IsNotLinked

IsLinked()
IsNotLinked()

IsLinked() evaluates to true for any document which is linked from other documents. IsNotLinked() evaluates to true for any document that is not linked from any other document (thus not reachable by following links in documents or the navigation tree).

HasPart

HasPart('partTypeName')

Searches documents which have a part of the specified part type. This search is version-dependent.

HasPartWithMimeType

HasPartWithMimeType('some mimetype')

Searches documents having a part with the given mime type. This search is version-dependent. This uses a 'like' condition, thus the % wildcard can be used in the parameter. For example, to search all images: HasPartWithMimeType('image/%')

DoesNotHaveVariant

DoesNotHaveVariant(branch, language)

Searches documents that do not have the specified variant. See also the page on variants for more information.

Full text queries

For full text queries, the where part takes a special form. There are two possibilities: either only a full text search is performed, or the fulltext query is further restricted using 'normal' conditions. The two possible forms are:

... where FullText('word')
or
... where FullText('word') AND <other conditions>
for example:
... where FullText('word') AND $myfield = 'abc' AND InCollection('mycollection')

Note that the combining operator between the FullText condition and other conditions is always AND, thus the result of the full text query is further refined. The further conditions can of course be of any complexity, and can thus again contain OR.

If no order by clause is included when doing a full text query, the results are ordered according to the score assigned by the fulltext search engine.

The parameter of the FullText(...) function is a query which is passed on to the full text engine, in our case Lucene. See the Lucene query syntax documentation.

The FullText() function can have 3 additional parameters which indicate if the search should be performed on the document name, document content or field content. By default, all three are searched. These parameters should be numeric: 0 indicates false, and any other value true.

For example:

FullText('word', 1, 0, 0)

Searches for 'word', but only in the document name.

Additionally, you can specify a branch and language as parameters to the FullText function, to specify that only documents of that branch/language should be searched. Thus the full syntax of the FullText function is:

FullText(lucene query, searchInName, searchInContent, searchInFields, branch, language)

Specifying the branch and language as part of the FullText function is more efficient than using:

FullText(lucene query) and branch = 'my_branch' and language = 'my_language'

The order by part

The order by part is optional.

The order by part contains a comma-separated list of identifiers, each optionally followed by ASC or DESC to indicate ascending (the default) or descending order. The identifiers listed here have no connection with those in the select part, i.e. they do not have to be a subset of those.

"null" values are put at the end (when using ASC order).

The limit part

This can be used to limit the number of results returned from a query. This part is optional.
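The combined effect of the order by and limit parts on a result set can be sketched as follows. This Python illustration covers only ascending order on a single identifier with null values placed last, as described above. The function name and the representation of rows as dictionaries are assumptions.

```python
def order_and_limit(rows, key, limit=None):
    """Sort rows ascending on rows[i][key] with None last, then apply the limit."""
    ordered = sorted(
        rows,
        # The first tuple element sorts None values after all others; the
        # second is only compared between rows that both have a value.
        key=lambda row: (row[key] is None, row[key] if row[key] is not None else 0),
    )
    return ordered if limit is None else ordered[:limit]
```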

The option part

The option part lets you specify options that influence the execution of the query. The options are defined as:

option_name = 'option_value' (, option_name = 'option_value')*

Supported options:

name                  value        default
include_retired       true/false   false
search_last_version   true/false   false
style_hint            (anything)   (empty)

include_retired is used to indicate that retired documents should be included in the result (by default they are not).

search_last_version is used to indicate that the last version of metadata should be searched and retrieved, instead of the live version. When using this, documents that do not have a live version will also be included in the query result (otherwise they are not included). Full text searches are always performed on the live data, regardless of whether this option is specified.

style_hint is used to supply a hint to the publishing layer for how the result of the query should be styled. The repository server does nothing more than add the value of this option as an attribute on the generated XML query results (<searchResult styleHint="my hint" ...). It is then up to the publishing layer to pick this up and do something useful with it. For how this is handled in the Daisy Wiki, see the page on Query Styling.

Example queries

List of all documents

select id, name where true

Search on document name

select id, name where name like 'p%' order by creationTime desc limit 10

Show the 10 largest documents

select id, name, totalSizeOfParts where true order by totalSizeOfParts desc limit 10

Show documents of which the last version has not yet been published

select id, name, versionState, versionCreationTime
  where versionState = 'draft' option search_last_version = 'true'

Overview of all locks

select id, name, lockType, lockOwnerName, lockTimeAcquired, lockDuration
  where lockType is not null

All documents having a part containing an image

select id, name where HasPartWithMimeType('image/%')

Full Text Indexer

Full text indexing in Daisy happens automatically when document variants are updated, so you do not need to worry about updating the index yourself. Technically, the full text indexer has a durable subscription on the JMS events generated by the repository, and these events trigger the index updates.

Technology

Daisy uses Jakarta Lucene as full-text indexer.

Included content

Only document variants which have a live version are included in the full text index. Thus retired document variants or document variants having only draft versions are not included. It is the content of the live version which is indexed, thus full text search operations always search on the live content.

For each document variant, the included content consists of the document name, the value of string fields, and text extracted from the parts. For the parts, text extraction will be performed on the data if the mime type is one of the following:

Mime type                                                 Comment
text/plain
text/xml                                                  e.g. the "Daisy HTML" parts
application/xhtml+xml                                     XHTML documents
application/pdf                                           PDF files
application/vnd.sun.xml.writer                            OpenOffice Writer files
application/msword, application/vnd.ms-word               Microsoft Word files
application/mspowerpoint, application/vnd.ms-powerpoint   Microsoft Powerpoint files
application/msexcel, application/vnd.ms-excel             Microsoft Excel files

Support for other formats can be added by implementing a simple interface. Ask on the Daisy Mailing List if you need more information about this.
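If a client needs to know in advance whether a part is likely to end up in the full text index, the list of mime types above can be captured in a simple lookup. The following is a minimal sketch, not Daisy code: the class and method names are hypothetical, and it assumes an exact, case-sensitive match on the mime type, which may differ from how the repository server itself compares them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative lookup of the mime types for which text extraction is
// documented. Not part of the Daisy API.
class IndexableMimeTypesSketch {

    private static final Set<String> EXTRACTABLE = Collections.unmodifiableSet(
        new HashSet<String>(Arrays.asList(
            "text/plain",
            "text/xml",
            "application/xhtml+xml",
            "application/pdf",
            "application/vnd.sun.xml.writer",
            "application/msword",
            "application/vnd.ms-word",
            "application/mspowerpoint",
            "application/vnd.ms-powerpoint",
            "application/msexcel",
            "application/vnd.ms-excel")));

    // Returns true if the given mime type is in the documented list.
    // Assumption: exact, case-sensitive comparison.
    static boolean isTextExtracted(String mimeType) {
        return mimeType != null && EXTRACTABLE.contains(mimeType);
    }

    public static void main(String[] args) {
        System.out.println(isTextExtracted("application/pdf")); // true
        System.out.println(isTextExtracted("image/gif"));       // false
    }
}
```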

Index management

It is possible to trigger optimisation of the index, and rebuilding of the index, through the JMX management interface (accessible through your web browser, runs by default on port 9264). Rebuilding the index can be useful for example when a new version of Daisy has support for new data formats. When rebuilding the index you can select the documents to be re-indexed using a query, or simply re-index all documents. For example, to re-index all PDF files you could enter the query "select id where HasPartWithMimeType('application/pdf')" (what you put in the select-part does not matter).

User Management

All operations done on the Daisy Repository Server are done as a certain user acting in one or more roles. For this purpose, the Repository Server has a user management module to define the users and the roles. The authentication of the users is done by a separate component, which makes it possible to plug in custom authentication techniques.

User Management

Users and roles are uniquely and permanently identified by a numeric ID, but they also have respectively a unique login and unique name.

A user has one or more roles. After logging in, it is possible either to have just one role active and let the user manually switch between his/her roles, or to have all roles of a user active at the same time (which is the behaviour traditionally associated with user groups). If a user has a default role, this role will be active after login. If no default role is specified for the user, all of the user's roles will become active after login, with the exception of the Administrator role (if the user has this role). This is because the Administrator role allows its holder to do everything, which would defeat the purpose of having other roles. If the user only has the Administrator role, then obviously that one will become active after login.
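The rules for which roles become active after login can be summarised in a few lines of code. This is a sketch of the behaviour as described in the paragraph above, not the repository server's implementation. Roles are represented by their names for readability (in Daisy they are identified by numeric IDs), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of the "active roles after login" rules.
class ActiveRolesSketch {

    static final String ADMINISTRATOR = "Administrator";

    // roles: all roles of the user; defaultRole: null if none is set.
    static List<String> activeRolesAfterLogin(List<String> roles, String defaultRole) {
        // A default role, when present, is the only active role.
        if (defaultRole != null) {
            return Collections.singletonList(defaultRole);
        }
        // Otherwise all roles become active, except Administrator,
        // unless Administrator is the only role the user has.
        List<String> active = new ArrayList<String>(roles);
        if (active.size() > 1) {
            active.remove(ADMINISTRATOR);
        }
        return active;
    }

    public static void main(String[] args) {
        // prints [editor, guest]
        System.out.println(activeRolesAfterLogin(
                Arrays.asList("Administrator", "editor", "guest"), null));
        // prints [Administrator]
        System.out.println(activeRolesAfterLogin(
                Arrays.asList("Administrator"), null));
    }
}
```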

Users have a boolean flag called updateable by user: this indicates whether a user can update his/her own record. If true, a user can change his/her first name, last name, email and password. Role membership can of course not be changed, and neither can the login. It is useful to turn this off for "shared users", for example the guest user in the Daisy Wiki application.

User and role

The Confirmed and ConfirmKey fields are used to support the well-known email-based verification mechanism in case of self-registration. If the Confirmed flag is false a user will not be able to log in.

The Administrator role

The repository server has one predefined role: Administrator (ID: 1). People having the role of Administrator as active role have a whole bunch of special privileges:

  • they can access all documents in the repository and perform any operation on them. Thus the access control system doesn't apply to them.
  • they can change the repository schema, manage users, manage collections, and manage the access control configuration.

Predefined users and roles

$system

$system is a bootstrap user internally needed in the repository. The user $system cannot log in, so its password is irrelevant. This user should not (and cannot) be deleted, nor should it be renamed. Simply don't worry about it.

internal

The user "internal" is a user created during the initialisation of the Daisy repository. The user is used by various components that run inside the repository server to talk to the repository. By default, we also use this user in the repository client component that runs inside Cocoon, which needs a user to update its caches.

The internal user has (and should have) the Administrator role.

During installation, this user is assigned a long, randomly generated password (you can see it in the myconfig.xml or cocoon.xconf).

guest user and guest role (Daisy Wiki)

The Daisy Wiki predefines a user called guest and a role called guest. This user has the password "guest". This is the user that becomes automatically active when surfing to the Daisy Wiki application, without needing to log in. After initialisation of the Daisy Wiki, the ACL is configured to disallow any write operations for users having the guest role.

registrar (Daisy Wiki)

The registrar user is the user that will:

  • create and update user accounts during the self-registration
  • reset passwords and do email-based lookup of logins in case of forgotten passwords or logins

During installation, this user is assigned a long, randomly generated password (you can see it in the cocoon.xconf).

Authentication Schemes

Daisy provides its own password authentication, but it is also possible to delegate the authentication to an external system. At the time of this writing, Daisy ships with support for authentication using LDAP and NTLM. It is possible to configure multiple authentication schemes and to have different users authenticated against different authentication schemes.

The authentication schemes are configured in the myconfig.xml file (which is located in <daisy-data-dir>/conf). Just search on "ldap" or "ntlm" and you'll see the appropriate sections. After making changes there, you will need to restart the repository server. To let users use the newly defined authentication scheme(s), you need to edit their settings via the user editor on the administration pages.

Daisy does not do automatic synchronisation of user information (such as updating the e-mail address based on what is stored in LDAP), but it is possible to auto-create users on first login. This means that when a user logs in to Daisy for the first time, and does not yet exist in Daisy, an authentication scheme is given the possibility to create the user (if the user exists in the external system). To enable this feature, search in the myconfig.xml file for "authenticationSchemeForUserCreation".

To debug authentication problems, look at the log files in <daisy-data-dir>/logs/daisy-request-errors-<date>.log. Problems in the configuration of the authentication schemes are not propagated over the HTTP interface of the repository, and thus are not visible in the Daisy Wiki.

Implementing new authentication schemes

Implementing a new authentication scheme is done by:

  • making an implementation of the following interface:
    org.outerj.daisy.authentication.AuthenticationScheme
  • making a corresponding factory class
  • declaring the new component in the block.xml file

For an example, you can look at the sources of the LDAP or NTLM authentication. The NTLM authentication in particular is split off from the main repository server sources and can be found in the directory services/ntlm-auth.

To create a new authentication scheme, you do not need to recompile Daisy from source. You will need the following jars in the classpath to compile your new authentication scheme classes:

avalon-framework-api-<version>.jar
daisy-repository-api-<version>.jar
daisy-repository-server-spi-<version>.jar

Access Control

Introduction

This document explains Daisy's features for access control: the authorisation of document operations such as read and write.

While we usually talk about documents, technically the access control happens on the document variant level: a user is granted or denied access to a certain document variant.

In many systems, access control is configured by having access control lists (ACLs) attached to documents. These ACLs contain access control rules which specify, for certain users or roles (groups), which operations they can or cannot perform.

For Daisy, it was considered to be too laborious to manage ACLs for each individual document. Therefore, there is one global ACL, where you can select sets of documents based on an expression and then define the access control rules that apply to these documents.

Structure of the ACL

The structure of the ACL is illustrated by the diagram below.

ACL structure

In ACL terminology, an object is the protected resource, and a subject is an entity wanting to perform an operation on the object. The objects in our case are documents, selected using an expression. The subjects are users, which can be living organisms, usually humans, or programs acting on their behalf.

As will become clear when reading about the evaluation of the ACL below, the order of the entries in the ACL is important.

Object specification

The expression used to select documents in the object specification uses the same syntax as in the where clause of an expression in the Daisy Query Language. However, the number of identifiers that are available is severely limited. More specifically, you can test on the following things:

  • the document type
  • collection membership (using the InCollection function)
  • document ID (to have rules specific to one document)
  • fields for which the ACL-allowed flag of the field type is set to true
  • the branch and language

Some examples of expressions:

InCollection('mycollection')

documentType = 'Navigation' and InCollection('mycollection')

$myfield = 'x' or $myotherfield = 'y'

For the evaluation of these expressions, the data of the fields in the last version is used, not the data from the live version.

Access Control Entry

See diagram.

If the subject type is everyone, the subject value should be set to -1.

If you give 'read live' rights to someone, they are able to:

  • read 'live data' of documents, this means: all non-versioned data, and the data from the live version.
    Access to retired documents is denied.
    Getting the list of versions of the document is not allowed.
    In query results, documents without a live version will not appear (if the option search_last_version is specified, documents only appear if the last version is the live version)
  • add comments to documents

If you give 'read' rights to someone, they have full read access to the document (thus they can view all versions and the list of versions).

If you give 'write' rights to someone, they are able to:

    • create documents
    • update (save) documents
    • take a lock on the documents

The 'delete' right gives users the possibility to delete documents or document variants.

If you give 'publish' rights to someone, they are able to change the publish/draft state of versions of documents.

Staging and Live ACL

In Daisy, there are two ACLs: a staging ACL and a live ACL. Only the staging ACL is directly editable, so you first edit the staging ACL and then put it live (= copy the staging ACL over the live ACL). It is possible to first test the staging ACL: you can give a document id, a role id and a user id and get the result of ACL evaluation in return, including an explanation of which ACL rules made the final decision. In the Daisy Wiki front end, all these operations are available from the administration screens. It is recommended that after editing the ACL, you first test it before putting it live, so that you are sure there are no syntax errors in the document selection expressions.

Evaluation of the ACL: how it is determined whether someone gets access to a document

The determination of the authorisation of the various operations for a certain document happens as follows:

    1. If the user is acting in the role of Administrator, the user has read and write rights. The ACL is not checked.
    2. If the user is owner of the document, the user has read and write rights (the ACL is not checked). Publish rights are determined by the ACL.
    3. If the document is marked as private and the user is not the owner of the document, all rights are denied. The ACL is not checked.
    4. The ACL result is initialised to deny all access (read live, read, write, publish and delete), and the ACL is evaluated from top to bottom:
      • If an object expression evaluates to true for the document, the access control entries belonging to that object specification are checked
      • If the subject type and subject value of an access control entry matches, the permissions defined in that entry override any previous result
      • The evaluation of the ACL does not stop at the first matching object or subject, but goes further till the bottom.
    5. At the end of the ACL evaluation some further checks are performed:
      • if the user does not have 'read live' rights, any other rights are denied too
      • if the user does not have read rights, the write and publish rights are denied too
      • if the user does not have write rights, the delete right is denied too.
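The evaluation steps above can be sketched in code. This is an illustration of the described procedure, not Daisy's actual implementation, and it rests on several assumptions: all class and method names are hypothetical; an entry carries a set of granted and a set of denied permissions, and permissions in neither set are left unchanged; the boolean matches stands in for evaluating the object expression; the user acts in a single role; an Administrator is given all rights (based on the description of the Administrator role); and for an owner the delete right is taken from the ACL, like the publish right.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of the evaluation steps above; not Daisy's implementation.
class AclEvaluationSketch {

    enum Permission { READ_LIVE, READ, WRITE, PUBLISH, DELETE }
    enum SubjectType { USER, ROLE, EVERYONE }

    // One access control entry. Permissions in neither set are left as they
    // were by this entry (an assumption of this sketch).
    static class Entry {
        final SubjectType subjectType;
        final long subjectValue; // user ID, role ID, or -1 for everyone
        final Set<Permission> grant;
        final Set<Permission> deny;

        Entry(SubjectType subjectType, long subjectValue,
              Set<Permission> grant, Set<Permission> deny) {
            this.subjectType = subjectType;
            this.subjectValue = subjectValue;
            this.grant = grant;
            this.deny = deny;
        }

        boolean appliesTo(long userId, long roleId) {
            switch (subjectType) {
                case USER: return subjectValue == userId;
                case ROLE: return subjectValue == roleId;
                default:   return true; // EVERYONE
            }
        }
    }

    // An object specification. 'matches' stands in for the outcome of
    // evaluating its expression against the document.
    static class ObjectSpec {
        final boolean matches;
        final List<Entry> entries;

        ObjectSpec(boolean matches, List<Entry> entries) {
            this.matches = matches;
            this.entries = entries;
        }
    }

    static Set<Permission> evaluate(List<ObjectSpec> acl, long userId, long roleId,
                                    boolean administrator, boolean owner,
                                    boolean privateDocument) {
        // Step 1: Administrators bypass the ACL (assumed: all rights).
        if (administrator) {
            return EnumSet.allOf(Permission.class);
        }
        // Step 3: private documents are closed to everyone but the owner.
        if (privateDocument && !owner) {
            return EnumSet.noneOf(Permission.class);
        }
        // Step 4: start from "deny all" and walk the whole ACL top to bottom.
        Set<Permission> result = EnumSet.noneOf(Permission.class);
        for (ObjectSpec spec : acl) {
            if (!spec.matches) {
                continue;
            }
            for (Entry entry : spec.entries) {
                if (entry.appliesTo(userId, roleId)) {
                    result.addAll(entry.grant);   // later entries override earlier ones
                    result.removeAll(entry.deny);
                }
            }
        }
        // Step 2: owners always get read and write; publish comes from the ACL.
        if (owner) {
            result.addAll(EnumSet.of(Permission.READ_LIVE, Permission.READ, Permission.WRITE));
        }
        // Step 5: rights that depend on a missing right are dropped.
        if (!result.contains(Permission.READ_LIVE)) {
            result.clear();
        }
        if (!result.contains(Permission.READ)) {
            result.remove(Permission.WRITE);
            result.remove(Permission.PUBLISH);
        }
        if (!result.contains(Permission.WRITE)) {
            result.remove(Permission.DELETE);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Permission> none = EnumSet.noneOf(Permission.class);
        List<ObjectSpec> acl = Arrays.asList(
            new ObjectSpec(true, Arrays.asList(
                new Entry(SubjectType.EVERYONE, -1,
                          EnumSet.of(Permission.READ_LIVE, Permission.READ), none),
                new Entry(SubjectType.ROLE, 5,
                          EnumSet.of(Permission.WRITE, Permission.DELETE), none))),
            new ObjectSpec(true, Arrays.asList(
                new Entry(SubjectType.USER, 7, none, EnumSet.of(Permission.READ)))));

        // User 3 in role 5: read, write and delete, but no publish.
        System.out.println(evaluate(acl, 3, 5, false, false, false));
        // User 7 in role 5: a later entry denies read, so only read live remains.
        System.out.println(evaluate(acl, 7, 5, false, false, false));
    }
}
```

The second call in main shows why the order of the entries matters: the entry for user 7 comes last and removes the read right, after which the final checks also remove write and delete.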

Further notes:

    • when saving a document, the ACL is always checked on the document currently stored, not on the newly edited document (unless it is a new document). This is because the ACL evaluation result can depend on the value of fields, and the user might have edited those fields to try to gain access to the document.
    • A user cannot change a document in such a way that the user itself has no write rights anymore to the document, e.g. by changing collection membership or field values.
    • The ACL is only concerned with authorisation of rights on documents. Other permissions, such as who can manage users, change the ACL or create document types, are simply managed via the Administrator role: users acting in the Administrator role can do all of those, others can't.

Other security aspects

This document only discussed authorisation of operations on documents for legitimate users. Other aspects of security include:

    • authentication: see User Management
    • audit logging: since Daisy generates JMS events for all (write) operations happening on the repository, you could get a full audit log by logging all these events. The content of each event is an XML description of the change (usually an XML dump of the entity before and after modification)
    • physical protection of the data: if someone can access the filesystem on which the parts are stored, or the relational database, they can see and/or modify anything
    • integrity: has anyone altered the data before delivery to the user? Here the use of HTTPS can help.

Email Notifier

General

Daisy can send out emails when changes are made to documents. To make use of this the SMTP host must be correctly configured, which is usually done as part of the installation, but can be changed afterwards (see below). In the Daisy Wiki, individual users can subscribe to get notifications by selecting the "User Settings" link, making sure their email address is filled in, and checking the checkbox next to "Receive email notifications of document-related changes.".

Users will only receive notifications for documents to which they have at least read (not 'read live') access rights. It is possible to receive notifications for individual documents, for all documents belonging to a certain collection, or for all documents. The mails report document creation, document updates and version state changes.

While we usually talk about documents, the actual notifications happen on the document variant level.

As you can see on the User Settings page, it is also possible to subscribe to other events: user, schema, collection and ACL related changes. However, for these events proper formatting of the mails is not yet implemented, they simply contain an XML dump of the event.

Configuration

Configuration of the email options happens in the <DAISY_DATA>/conf/myconfig.xml file. There you can configure:

  • the SMTP server
  • the from address for the emails
  • the URLs for documents, so that the URL of the changed document can be included in the emails

After making any changes to the myconfig.xml file, the repository server needs to be restarted.

Implementation notes

The email notifier is an extension component running inside the repository server. It is independent of the Daisy Wiki. The email notifier provides a Java API for managing the subscriptions, as well as additions to the HTTP+XML interface (logical, because that's how the implementation of the Java API talks to the repository).

Document Task Manager

The purpose of the Document Task Manager (DTM) is to perform a certain task across a set of documents. The DTM is an optional component running inside the Daisy Repository Server. Some of its features are:

Tasks are executed in the background, inside the repository server. Thus the user (a person or another application) starting the task does not have to wait until it is completed, but can do something else and check later if the task ended successfully.

The execution progress of the task is maintained persistently in the database. For each document on which the task needs to be executed, you can consult whether it has been performed successfully, whether it failed (and why), or whether it still has to be executed. Since this information is tracked persistently in the database, it is not lost in case the server would be interrupted.

Tasks can be interrupted. Since the task is performed on one document after another, it is easily possible to interrupt between two documents.

Tasks can be written in Javascript or be composed from built-in actions. Executing custom Javascript-based tasks is only allowed for Administrators, since there is a certain risk associated with it. For example, it is possible to write a task containing an endless loop which would only be interruptible by shutting down the repository server, or a task could call System.exit() to shut down the server.

The execution details of a task, which are stored in the database, are cleaned up automatically after two weeks (by default), and can of course also be deleted manually.

The DTM is accessible via the HTTP API and the Java API.

The Daisy Wiki contains a frontend for starting new tasks and consulting the execution details of existing tasks.

Ideas for the future:

  • scheduled execution of document tasks
  • adding more built-in actions (the ones currently available are mainly to support working with document variants)

Scripting the repository using Javascript

Introduction

Rhino, a Java-based Javascript implementation, makes it easy to use the Java API of the repository server to automate all kinds of operations. In other words: easy scripting of the repository server. It brings all the benefits of Daisy's high-level repository API without requiring Java knowledge or the setup of a development environment.

How does it work?

  1. Write a Javascript, save it in a ".js" file.
  2. Open a command prompt or shell, set the DAISY_HOME environment variable to point to your Daisy installation
  3. Go to the directory <DAISY_HOME>/bin
  4. Execute "daisy-js <name-of-scriptfile>"

Connecting to the repository server from Javascript

The basic code you need to connect to the repository server from Javascript is the following:

importPackage(Packages.org.outerj.daisy.repository);
importClass(Packages.org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager);

var repositoryManager = new RemoteRepositoryManager("http://localhost:9263",
                                                    new Credentials("testuser", "testuser"));
var repository = repositoryManager.getRepository(new Credentials("testuser", "testuser"));

Some explanation:

The importPackage and importClass statements are used to make the Daisy Java API available in the Javascript environment.

Then a RepositoryManager is constructed. This is an object from which Repository objects can be retrieved. A Repository object represents a connection to the Daisy Repository Server for a certain user. Typically, you only construct one RepositoryManager, and then retrieve different Repository objects from it if you want to perform actions under different users.

The first argument of the RepositoryManager constructor is the address of the HTTP interface of the repository server (9263 is the default port). The second argument is a username and password for a user that is used inside the implementation to fill up caches. Currently, this user has to be a user which has the Administrator role. (Inside the implementation, some often needed info like the repository schema and the collections is cached)

Then from the RepositoryManager a Repository for a specific user is retrieved. Here we use the same credentials as for the cache user of the RepositoryManager, but this doesn't have to be the case.

Repository Java API documentation

Reference documentation of the Daisy API is included in the binary distribution in the apidocs directory (open the file index.html in a web browser). See also Java API.

Examples

Creating a document (uploading an image)

This example uploads an image called "myimage.gif" from the current directory into the repository.

importPackage(Packages.org.outerj.daisy.repository);
importClass(Packages.org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager);

var repositoryManager = new RemoteRepositoryManager("http://localhost:9263",
                                                    new Credentials("testuser", "testuser"));
var repository = repositoryManager.getRepository(new Credentials("testuser", "testuser"));

var document = repository.createDocument("My test image", "Image");
var imageFile = new java.io.File("myimage.gif");
document.setPart("ImageData", "image/gif", new FilePartDataSource(imageFile));
document.save();

print("Document created, ID = " + document.getId());

See the API documentation for the purpose of the arguments of the methods. For example, the text "Image" supplied as the second argument of the createDocument method is the name of the document type to use for the document. Likewise, the first argument of setPart, "ImageData", is the name of the part.

It would be an interesting exercise to extend this example to upload a whole directory of images :-)

Performing a query

importPackage(Packages.org.outerj.daisy.repository);
importClass(Packages.org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager);
importPackage(Packages.java.util);

var repositoryManager = new RemoteRepositoryManager("http://localhost:9263",
                                                    new Credentials("testuser", "testuser"));
var repository = repositoryManager.getRepository(new Credentials("testuser", "testuser"));
var queryManager = repository.getQueryManager();

var searchresults = queryManager.performQuery("select id, name where true", Locale.getDefault());
var rows = searchresults.getSearchResult().getRows().getRowArray();
for (var i = 0; i < rows.length; i++) {
  print(rows[i].getValueArray(0) + " : " + rows[i].getValueArray(1));
}

print("Total number: " + rows.length);

Your example here

If you've got a cool example to contribute, just write to the Daisy mailing list.

Java API

Introduction

Daisy is written in Java and thus its native interface is a Java API. This Java API is packaged separately, and consists of two jars:

daisy-repository-api-<version>.jar
daisy-repository-xmlschema-bindings-<version>.jar

The second jar, the xmlschema-bindings, contains Java classes generated from XML Schemas, which form part of the API. To write client code that talks to Daisy, at compile time you need only the above two jars in the classpath (at runtime, you need a concrete implementation, see further on).

There are two implementations of this API available:

  • a local implementation, this is the actual implementation in the repository server, which does the real work
  • a remote implementation, this is an implementation that talks via the HTTP+XML protocol to the repository server

This is illustrated in the diagram below.

Daisy Client API

To be workable, the remote implementation caches certain information: the repository schema (document, field and part types), the collections, and the users (needed to be able to quickly map user IDs to user names). To be aware of changes done by other clients, the remote implementation can listen to the JMS events broadcast by the server to update these caches. This is optional: for example, a short-running client application that performs a specific task probably doesn't care much about this, especially since the cached information is not the kind of information that changes frequently. Even when JMS-based cache invalidation is disabled, the caches of a certain remote implementation instance are of course kept up-to-date for changes done through that specific instance.

Examples of applications making use of the remote API implementation are the Daisy Wiki, and the installation utilities daisy-wiki-init and daisy-wiki-add-site. Especially the source of those last two can serve as useful but simple examples of how to write client applications. The shell scripts to launch them show the required classpath libraries.

Quick introduction to the Java API

The Daisy Java API is quite high-level, and thus easy to use. The starting point for any work is the RepositoryManager interface, which has the following method:

Repository getRepository(Credentials credentials) throws RepositoryException;

The Credentials parameter is simply an object containing the user name and password. By calling the getRepository method, you get an instance of Repository, through which you can access all the repository functionality. The obtained Repository instance is specific for the user specified when calling getRepository. The Repository object does not need to be released after use. It is a quite lightweight object, mainly containing the authentication information.

Let's have a look at some of the methods of the Repository interface.

Document createDocument(String name, String documentTypeName);

Creates a new document with the given name, using the named document type. The document is not immediately created in the repository; to do this you need to call the save() method on the Document. But first you need to set all required fields and parts, otherwise the save will fail (it is possible to circumvent this, see the full javadocs).

Document getDocument(long documentId, boolean updateable) throws RepositoryException;

Retrieves an existing document, specified by its ID. If the flag 'updateable' is false, the repository will return a read-only Document object, which allows it to return a shared cached copy. In the remote implementation, this doesn't matter since it doesn't perform any caching, but in the local implementation this can make a very large difference.

RepositorySchema getRepositorySchema();

Returns an instance of RepositorySchema, through which you can inspect and modify the repository schema (these are the document, part and field types).

AccessManager getAccessManager();

Returns an instance of AccessManager, through which you can inspect and modify the ACL, and get the ACL evaluation result for a certain document-user-role combination.

QueryManager getQueryManager();

Returns an instance of QueryManager, through which you can perform queries on the repository using the Daisy Query Language.

CollectionManager getCollectionManager();

Returns an instance of CollectionManager, through which you can create, modify and delete document collections.

UserManager getUserManager();

Returns an instance of UserManager, through which you can create, modify and delete users.

The above was just to give a broad idea of the functionality available through the API. For more details, consult the complete JavaDoc of the API.

Writing a Java client application

Let's now look at a practical example.

Here's a list of jars you need in the CLASSPATH to use the remote repository API implementation (this list was last updated for Daisy 1.3-M1):

DAISY_HOME/lib/daisy/jars/daisy-repository-api-1.3.jar
DAISY_HOME/lib/daisy/jars/daisy-repository-client-impl-1.3.jar
DAISY_HOME/lib/daisy/jars/daisy-repository-spi-1.3.jar
DAISY_HOME/lib/daisy/jars/daisy-util-1.3.jar
DAISY_HOME/lib/avalon-framework/jars/avalon-framework-api-4.1.5.jar
DAISY_HOME/lib/daisy/jars/daisy-repository-common-impl-1.3.jar
DAISY_HOME/lib/commons-httpclient/jars/commons-httpclient-2.0-rc2.jar
DAISY_HOME/lib/xmlbeans/jars/xbean-20040211.jar
DAISY_HOME/lib/daisy/jars/daisy-repository-xmlschema-bindings-1.3.jar
DAISY_HOME/lib/concurrent/jars/concurrent-1.3.2.jar
DAISY_HOME/lib/commons-logging/jars/commons-logging-1.0.3.jar
DAISY_HOME/lib/commons-collections/jars/commons-collections-3.1.jar
DAISY_HOME/lib/daisy/jars/daisy-jmsclient-api-1.3.jar
DAISY_HOME/lib/jms/jars/jms-1.0.2a.jar

So depending on your own habits, you could set up a project in your IDE with these jars in the classpath, or make an Ant project, or whatever.

Below a simple and harmless example is shown: performing a query on the repository.

package mypackage;

import org.outerj.daisy.repository.RepositoryManager;
import org.outerj.daisy.repository.Credentials;
import org.outerj.daisy.repository.Repository;
import org.outerj.daisy.repository.query.QueryManager;
import org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager;
import org.outerx.daisy.x10.SearchResultDocument;

import java.util.Locale;

public class Search {
    public static void main(String[] args) throws Exception {
        // The credentials given here are used to fill the client-side caches.
        RepositoryManager repositoryManager = new RemoteRepositoryManager(
            "http://localhost:9263", new Credentials("testuser", "testuser"));
        // The credentials given here identify the user performing the operations.
        Repository repository =
            repositoryManager.getRepository(new Credentials("testuser", "testuser"));
        QueryManager queryManager = repository.getQueryManager();

        // Run a Daisy Query Language query that selects the ID and name of every document.
        SearchResultDocument searchresults =
            queryManager.performQuery("select id, name where true", Locale.getDefault());
        SearchResultDocument.SearchResult.Rows.Row[] rows =
            searchresults.getSearchResult().getRows().getRowArray();

        // Each row holds one value per selected field, in the order of the select clause.
        for (int i = 0; i < rows.length; i++) {
            String id = rows[i].getValueArray(0);
            String name = rows[i].getValueArray(1);
            System.out.println(id + " : " + name);
        }

        System.out.println("Total number: " + rows.length);
    }
}

The credentials supplied in the constructor of the RemoteRepositoryManager specify a user to be used for filling the caches in the repository client. This user currently needs to have the Administrator role.

Java client application with Cache Invalidation

For long-running client applications you may want to have the caches of the client invalidated when changes happen by other users. For a code sample of how to create a JMS client and pass it on to the RemoteRepositoryManager, see JMS Cache Invalidation Sample.

For this example to run, you'll need the JMS client implementation jars in the CLASSPATH, in addition to the earlier listed jars:

DAISY_HOME/lib/daisy/jars/daisy-jmsclient-impl-1.3.jar
DAISY_HOME/lib/exolabcore/jars/exolabcore-0.3.7.jar
DAISY_HOME/lib/openjms/jars/openjms-client-0.7.6.jar

More

It might be interesting to also have a look at the notes on scripting using Javascript, since there essentially the same API is used from a different language.

HTTP API

Introduction

Daisy contains an HTTP+XML interface, which lets you talk to the repository server by exchanging XML messages over the HTTP protocol. This interface offers full access to all functionality of the repository.

HTTP is a protocol that allows a limited number of methods (Daisy uses GET, POST and DELETE) to be performed on an unlimited number of resources, which are identified by URIs. The GET method is used to retrieve a representation of the addressed resource, POST to trigger a process that modifies the addressed resource, and DELETE to delete a resource.

With HTTP, all calls are independent of each other: there is no session with the server.

The Daisy HTTP interface listens on port 9263 by default. It is easy to try out: if Daisy is running on your localhost, enter the URL below in the location bar of your browser and press Enter. The browser will then send a GET request to the server. The example given here is a request to execute a query (written in the Daisy Query Language). This request doesn't require an XML payload; all parameters are specified as part of the URL. Note that spaces in the URL must be encoded, here with a plus symbol.

http://localhost:9263/repository/query?q=select+id,name+where+true&locale=en

The browser will ask for a user name and password. Enter your Daisy repository user name and password (e.g., the one you otherwise use to log in on the Daisy Wiki), or use the user name "guest" and password "guest" (this only works if you installed the Daisy Wiki). The browser will show the XML response received from the server (in some browsers, you might need to do "view source" to see it).
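The query URL above can also be assembled from code. The following is a minimal sketch using only the standard Java library; the class and method names (DaisyQueryUrl, buildQueryUrl) are made up for this illustration and are not part of the Daisy API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DaisyQueryUrl {
    /**
     * Builds the URL of a query request.
     * baseUrl is the address of the repository server without a trailing slash.
     */
    public static String buildQueryUrl(String baseUrl, String query, String locale) {
        try {
            // URLEncoder turns spaces into '+' and percent-encodes other reserved characters.
            return baseUrl + "/repository/query"
                    + "?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&locale=" + URLEncoder.encode(locale, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            buildQueryUrl("http://localhost:9263", "select id, name where true", "en"));
    }
}
```

URLEncoder also encodes the comma as %2C, so the printed URL differs slightly from the hand-written one above while carrying the same query string after decoding.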

Not all operations can be performed as easily as the above example: some require POST or DELETE as method, some require an XML document in the body of the request, and some even require a multipart-formatted request body (the document create and update operations, which need to upload the binary part data next to the XML message). If you have a programming language with a decent HTTP client library, none of this should be a problem.

Authentication

All requests require authentication. Authentication is done using BASIC authentication.

If you want to log in with a role other than the default role of a user, append "@<roleid>" to the login (without the quotes). Note that it must be the ID of the role, not its name. For example, if your default role is not Administrator (ID: 1), but you would like to perform the request as Administrator, and your login is "jules", you would use "jules@1". When the login itself contains an @-symbol, it must be escaped by doubling it (i.e. each @ should be replaced with @@). Multiple active roles can be specified using a comma-separated list, e.g. "jules@1,105".
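The login rules above can be sketched in a small helper that builds the value of the HTTP Authorization header. This is only an illustration: the class and method names are hypothetical, and encoding the credentials as UTF-8 before Base64 is an assumption, since the character encoding is not stated here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DaisyBasicAuth {
    /** Doubles each '@' in the login and appends "@" plus a comma-separated list of role IDs, if any. */
    public static String loginWithRoles(String login, long... roleIds) {
        StringBuilder result = new StringBuilder(login.replace("@", "@@"));
        for (int i = 0; i < roleIds.length; i++) {
            // The first role ID is introduced by '@', later ones are separated by ','.
            result.append(i == 0 ? '@' : ',').append(roleIds[i]);
        }
        return result.toString();
    }

    /** Returns the value of the "Authorization" header for BASIC authentication. */
    public static String authorizationHeader(String login, String password, long... roleIds) {
        String userPass = loginWithRoles(login, roleIds) + ":" + password;
        // Assumption: credentials are encoded as UTF-8 before Base64 encoding.
        return "Basic " + Base64.getEncoder()
                .encodeToString(userPass.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(loginWithRoles("jules", 1));       // jules@1
        System.out.println(loginWithRoles("jules", 1, 105));  // jules@1,105
        System.out.println(authorizationHeader("jules", "secret", 1));
    }
}
```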

Robustness

The current implementation doesn't do many checks on the XML posted as part of an HTTP request. This means that, for example, missing elements or attributes might simply cause uninformative (but harmless) "NullPointerExceptions" to occur.

The reason for this is that we use the HTTP API mostly via the repository Java client, which generates valid messages for us.

Since the XML posted to a resource is usually the same as the XML retrieved via GET on the same resource, it is easy to get examples of correct XML messages. XML Schemas are also available (see further on), though being schema-valid doesn't necessarily imply the message is correct.

Error handling

If a request was handled correctly, the server will answer with HTTP status code 200 (OK). If the status code has another value, it means something went wrong.

For errors generated explicitly, or when a Java exception occurs, an XML message is created describing the exception, and is returned with a status code 202 (Accepted). The XML message consists of an <error> root element, with as child either a <description> element or a <cause> element. The <description> element contains a simple string describing the error. The <cause> element is used in case a Java exception was handled, and contains further elements describing the exception (including the stack trace), and can include <cause> elements recursively describing the "causing" exceptions of that exception. To see an example of this, simply do a request for a non-existing document, e.g.:

http://localhost:9263/repository/document/99999999

(assuming there is no document with ID 99999999)
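A client can tell the two kinds of error message apart by looking at the first child element of <error>. The sketch below does this with the standard Java DOM parser. The sample messages in it are hand-written for illustration, real messages will contain more detail and may use an XML namespace (which is why the code compares local names only), and the class name DaisyErrorParser is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DaisyErrorParser {
    /** Summarizes an error message based on the first <description> or <cause> child of <error>. */
    public static String summarize(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName() to work
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            if (!"error".equals(root.getLocalName())) {
                return "not an error message";
            }
            for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                if ("description".equals(child.getLocalName())) {
                    return "description: " + child.getTextContent().trim();
                }
                if ("cause".equals(child.getLocalName())) {
                    return "cause: Java exception details present";
                }
            }
            return "error without description or cause";
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse error XML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample, not an actual server response.
        System.out.println(summarize("<error><description>Something failed</description></error>"));
    }
}
```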

When executing a method (GET, POST, DELETE, ...) on a resource that doesn't support that method you will get status code 405 (Method Not Allowed).

Incorrect or missing authentication information will give status code 401 (Unauthorized).

Missing request parameters, or invalid ones (e.g. giving a string where a number was expected), will give status code 400 (Bad Request).

Doing a request for a non-existing resource will give status code 404 (Not Found).
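The status codes listed in this section can be summarized in a small lookup that a client might use when logging responses. This is a sketch only: the class name and the message texts are invented for this illustration.

```java
public class DaisyStatus {
    /** Maps an HTTP status code to the meaning described in this section. */
    public static String describe(int statusCode) {
        switch (statusCode) {
            case 200: return "OK: request handled correctly";
            case 202: return "Error: response body is an <error> XML message";
            case 400: return "Bad Request: missing or invalid request parameter";
            case 401: return "Unauthorized: incorrect or missing authentication";
            case 404: return "Not Found: no such resource";
            case 405: return "Method Not Allowed: resource does not support this method";
            default:  return "Unexpected status " + statusCode;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(202));
        System.out.println(describe(405));
    }
}
```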

Intro to the reference

The rest of this document describes the available URLs, the operations that can be performed upon them, and the format of the XML messages. The descriptions can be dense: the current goal of this document is just to give a broad overview, and more details might be added later. You can always ask for more information on the Daisy Mailing List.

You can also investigate how things are supposed to work by monitoring the HTTP traffic between the Daisy Wiki and the Daisy Repository Server.

Sometimes XML Schema files are referenced; these can be found in the Daisy source distribution.

Core Repository Interface

Documents

On many document-related resources, request parameters called branch and language can be added (this will be mentioned in each case). The value of these parameters can be either a name or ID of a branch or language. If not specified, the branch "main" and the language "default" are assumed.
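As a rough illustration, the branch and language parameters can be appended to a resource URL as follows. The helper name withVariant and the document ID 1 used in the example are hypothetical, and whether a given resource accepts these parameters is stated per resource in the reference.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DaisyVariantParams {
    /**
     * Appends the branch and language request parameters to a resource URL.
     * Each value may be a name or an ID; pass null to rely on the server-side default.
     */
    public static String withVariant(String resourceUrl, String branch, String language) {
        StringBuilder url = new StringBuilder(resourceUrl);
        // Start the query string with '?' unless the URL already has one.
        char separator = resourceUrl.indexOf('?') < 0 ? '?' : '&';
        if (branch != null) {
            url.append(separator).append("branch=").append(encode(branch));
            separator = '&';
        }
        if (language != null) {
            url.append(separator).append("language=").append(encode(language));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            withVariant("http://localhost:9263/repository/document/1", "main", "default"));
    }
}
```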

/repository/document

This resource represents the set of all documents. GET is not supported on this resource (you can retrieve a list of all documents using a query).

POST on this resource is used to create a new document, which also implies the creation of a document variant, since a document cannot exist without a document variant. The payload should be a multipart request having one multipartrequest-part (we use this long name to distinguish it from Daisy's document parts) containing the XML description of the new document, and other multipartrequest-parts containing the content of the document parts (if any). The multipartrequest-part containing the XML should be called "xml", and should conform to the document.xsd schema. The part elements in the XML should have dataRef attributes whose value is the name of the multipartrequest-part containing the data for that part.

The server will return the XML description of the newly created document as result. This XML will, among other things, have the id attribute completed with the ID of the new document.

Example scenario: creating a new document

This example illustrates how to create a new document in the repository