4 Repository server
4.1 General
4.1.1 Documents
4.1.1.1 Introduction
The purpose of the Daisy Repository Server is to manage documents. This document describes the structure (or features) of such documents.
The diagram below gives an overview of the document structure, which is explained further in the remainder of this document.

4.1.1.2 Documents & Document Variants
A document in itself has very few properties; the real meat is in the document variants. A document never exists without at least one document variant. On the other hand, making explicit use of variants is optional, in which case you could consider a document and a document variant to be the same (thus, each document then has exactly one variant).
The details of working with variants are described in another section. For now, it suffices to know that in a practical working environment like the Daisy Wiki, the branch and language which identify the particular variant of the document are usually a given (Daisy Wiki: configured per site), and you'll only work with document IDs, so it is as if the existence of variants is transparent.
Many times when we speak about a document in Daisy, we implicitly mean "a certain variant of a document" (a "document variant").
Refer to the diagram above to see whether a certain aspect applies to a document, a document variant, or a version of a document variant.
A document is always retrieved from the repository in combination with a document variant; a document in itself, without a variant, cannot be retrieved. This is, among other things, because the access rights to a document are based on information that is part of the document variant (so a user may have access to one document variant but not to another). Another way to look at this is that there are only document variants, and that certain properties of them are shared across the variants.
4.1.1.3 Document Types, The Repository Schema
The main "data" of a document is contained in its so-called parts and fields. Parts can contain arbitrary binary data, and fields contain 'simple' information of a certain type (string, date, decimal, ...). Which parts and fields a document can have is determined by the document's document type. A document type is actually a combination of zero or more part types and zero or more field types, which further describe these aspects. Part and field types are defined as independent entities, meaning that the same part and field types can be reused across different document types. The diagram below shows the structure and relation of all these entities.

4.1.1.3.1 Common aspects of document, part and field types
Let us first look at the things document, part and field types have in common. Their primary, unchangeable identifier is a numeric ID, though they also have a unique name (which can be changed after creation), which you will likely prefer to use.
Next to the name, they can optionally be assigned a localized label and a description. Localized means that a different label and description can be given for different locales. A locale can be a language, language-country, or language-country-variant specification. For example, a label entered for "fr-BE" is in French, and specifically for Belgium. The labels and descriptions are retrieved using a fallback system. For example, if the user's locale is "fr-BE", the system will first check whether a label is available for "fr-BE"; if none is found it will check "fr", and finally the empty locale "". Thus if you want to provide labels and descriptions but are not interested in localisation, you can simply enter them for the empty locale.
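The fallback lookup can be sketched as follows. This is only an illustration under assumed names: `LabelFallback` and `resolve` are hypothetical and not part of the Daisy API, and the labels are assumed to be held in a plain map keyed by locale string.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper, not part of the Daisy API: resolves a label by locale fallback. */
final class LabelFallback {
    private LabelFallback() {}

    /**
     * Tries the full locale first ("fr-BE"), then each shorter prefix ("fr"),
     * and finally the empty locale "".
     */
    static Optional<String> resolve(Map<String, String> labels, String locale) {
        String candidate = locale;
        while (true) {
            String label = labels.get(candidate);
            if (label != null) {
                return Optional.of(label);
            }
            if (candidate.isEmpty()) {
                return Optional.empty(); // nothing found, not even for ""
            }
            int dash = candidate.lastIndexOf('-');
            // Drop the last component; with no dash left, fall back to "".
            candidate = dash < 0 ? "" : candidate.substring(0, dash);
        }
    }
}
```

With labels defined only for "fr" and "", a lookup for "fr-BE" would return the "fr" label, and a lookup for "nl" would return the label of the empty locale.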
Document, part and field types cannot be deleted as long as they are still in use in the repository. Once a document has been created that uses one of these types, the type can thus not be deleted anymore (unless the documents using them are deleted). However, it is possible to mark a type as deprecated to indicate it should not be used anymore. This deprecation flag is purely informational, the system simply stores it.
4.1.1.3.2 Document types
A document type combines a number of part types and field types, and indicates for each of these if it is required or not.
4.1.1.3.3 Part types
Before going into the details of part types, it might make sense to justify their existence. In many document repository systems, each document has simply one 'content chunk'. For example, a resource addressable over WebDAV is one atomic piece of data. Daisy allows a document to consist of multiple parts. This makes these parts separately addressable and retrievable. For example, suppose we have a document type consisting of a part "Abstract" and a part "Main Content". It is then simple to retrieve the abstracts of all documents conforming to this document type. As another example, for an "Image" document type we could have parts "ImageData" containing a rendered form of the image, "ImageSource" containing the original source (e.g. a Photoshop or CorelDraw file), and "Thumbnail" containing a small rendition of the image.
A part instance consists of some binary data (or, if you wish, data which is treated as binary; it could of course be plain text) and the mime-type of the data. A part type can restrict which types of data (thus which mime-types) are stored in the part, but this is not required. The restriction is expressed as a list of allowed mime types.
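As a minimal sketch of this restriction, assuming that an empty list of allowed mime types means "no restriction" (the class and method names are hypothetical, not the Daisy API):

```java
import java.util.Set;

/** Hypothetical model of a part type's mime-type restriction; not the Daisy API. */
final class PartTypeMimeCheck {
    private PartTypeMimeCheck() {}

    /** Assumption: an empty set of allowed mime types means any mime type is accepted. */
    static boolean isAllowed(Set<String> allowedMimeTypes, String mimeType) {
        return allowedMimeTypes.isEmpty() || allowedMimeTypes.contains(mimeType);
    }
}
```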
4.1.1.3.3.1 The Daisy HTML flag
A part type has a flag indicating whether the part contains "Daisy HTML". Daisy HTML is basically HTML formatted as well-formed XML (with element and attribute names lowercased). It is not the same as XHTML, because the elements are not in the XHTML namespace. If the "Daisy HTML" flag is set to true, the mime-type should be limited to text/xml. For the repository server, the Daisy HTML flag on the part type has little meaning. Currently it serves only to enable the creation of document summaries (which might even be replaced with a more flexible mechanism in the future). The Daisy Wiki front end application will show a WYSIWYG editor for Daisy HTML parts, and display the content of such parts inline.
4.1.1.3.3.2 Link extraction
For each part type a link extractor can be defined to extract links from the content contained in the part. The most common link extractor is the "daisy-html" one, which will extract links from the href attribute of the <a> element, the src attribute of the <img> element, and the character content of <p class="include">. The format of the links is:
daisy:<document id> or daisy:<document id>@<branch id or name>:<language id or name>:<version id>
Links that don't conform to this form will be ignored. The <version id> can take the special value "LAST". A link without a version specification denotes a link to the live version of the document. The branch, language and version parts are all optional. For example, daisy:15@:nl is a link to the Dutch language variant of document 15.
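To make the format concrete, here is a small parsing sketch. It is not Daisy's own link extractor: the `DaisyLink` class is hypothetical, and it assumes that the document ID is numeric and that branch, language and version contain no colon.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the link format above; not Daisy's own link extractor. */
final class DaisyLink {
    // daisy:<document id>[@<branch>[:<language>[:<version>]]]
    // Assumption: the document ID is numeric and the other components contain no ':'.
    private static final Pattern FORMAT =
            Pattern.compile("daisy:(\\d{1,18})(?:@([^:]*)(?::([^:]*)(?::([^:]*))?)?)?");

    final long documentId;
    final String branch;   // null when not specified
    final String language; // null when not specified
    final String version;  // null when not specified, which denotes the live version

    private DaisyLink(long documentId, String branch, String language, String version) {
        this.documentId = documentId;
        this.branch = branch;
        this.language = language;
        this.version = version;
    }

    /** Returns an empty Optional for links that do not conform, so callers can ignore them. */
    static Optional<DaisyLink> parse(String link) {
        Matcher m = FORMAT.matcher(link);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new DaisyLink(Long.parseLong(m.group(1)),
                emptyToNull(m.group(2)), emptyToNull(m.group(3)), emptyToNull(m.group(4))));
    }

    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }
}
```

Parsing `daisy:15@:nl` with this sketch yields document ID 15, no branch, language `nl`, and no version.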
4.1.1.3.4 Field types
4.1.1.3.4.1 Value Type
The most important thing a field type tells about a field is its Value Type. A Value Type identifies the kind of data that can be stored in a field. The available value types are listed in the table below, together with their matching Java class.
| Value Type Name | Corresponding Java class |
|---|---|
| string | java.lang.String |
| date | java.util.Date |
| datetime | java.util.Date |
| long | java.lang.Long |
| double | java.lang.Double |
| decimal | java.math.BigDecimal |
| boolean | java.lang.Boolean |
4.1.1.3.4.2 Multi-Value
A field type can specify that it concerns a multi-value field, thus that fields of that type can have multiple values. All of the values of the field should be of the same value type.
A multi-value field can contain the same value more than once, and the order of its values is maintained. Thus the values of a multi-value field form an ordered list.
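The value types and the "same value type" rule for multi-value fields can be sketched as an enum. The enum and its `acceptsAll` method are illustrative assumptions, not the repository's actual classes; only the mapping to Java classes is taken from the table above.

```java
import java.math.BigDecimal;
import java.util.Date;
import java.util.List;

/** Value types from the table above, paired with their Java classes (illustrative only). */
enum ValueType {
    STRING(String.class),
    DATE(Date.class),
    DATETIME(Date.class),
    LONG(Long.class),
    DOUBLE(Double.class),
    DECIMAL(BigDecimal.class),
    BOOLEAN(Boolean.class);

    final Class<?> javaClass;

    ValueType(Class<?> javaClass) {
        this.javaClass = javaClass;
    }

    /** Hypothetical check: every value of a multi-value field must match this value type. */
    boolean acceptsAll(List<?> values) {
        for (Object value : values) {
            if (!javaClass.isInstance(value)) {
                return false; // also rejects null entries
            }
        }
        return true;
    }
}
```

Because the values are kept in a `List`, duplicates are allowed and order is preserved, which matches the ordered-list behaviour described above.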
4.1.1.3.4.3 Selection Lists
It is possible to define a selection list for a field type. This is a list of possible values that an end user can choose from when completing the field.
4.1.1.3.4.4 ACL allowed flag
In the access control system, it is possible to define access rules for documents by using an expression to select the documents to which the access rules apply. In these expressions, it is also possible to check the value of fields, but only of fields whose field types' ACL allowed flag is set to true. The ACL allowed flag also enables the front-end to indicate that changing the value of that particular field can influence the access control checks.
4.1.1.3.4.5 Size hint
A field can have a size hint, which is simply an integer. This information is used by the front end to display an input field of an appropriate width. The repository server doesn't associate any further meaning with it: it doesn't cause any validation to happen, nor does it specify the unit of the width (most likely "number of characters").
4.1.1.3.5 Document and document type association, how changes to document types are handled
Upon creation of a document, a document type must be supplied. When saving a document, the repository will check that the document conforms to its document type. Thus it will check that all required fields and parts are present, and that there are no parts and fields in the document that are not allowed by the document type.
The document type of a document can be changed at any time. This is useful if you start out with a generic document type but later want to switch to a more specialized document type.
The definition of a document type can be changed at any time. Part and field types can be added to or removed from it, or can be made required. A logical question that pops up is what happens to existing documents in the repository that use the changed document type. The answer is basically "nothing". If, for example, a required field is added to a document type, then the next time a document of that type is edited, it will fail to save unless a value for the field is specified. The newly saved version of the document will then conform to the new state of the document type. Older versions of the document will remain unchanged, however. When saving a document, it is also possible to supply an option that skips the document type conformance check.
So basically the document type system doesn't give any guarantees about the structure of the documents in the repository, but rather hints at how the documents should be structured and interpreted.
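The conformance check performed on save can be sketched roughly as below. Only fields are shown (parts would follow the same pattern), and `ConformanceCheck` and its data structures are assumptions made for illustration, not the repository's real implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical conformance check; names and structure are illustrative, not the Daisy API. */
final class ConformanceCheck {
    private ConformanceCheck() {}

    /**
     * @param fieldTypes     field type name mapped to its "required" flag, as declared by the document type
     * @param documentFields names of the field types present in the document
     * @return a list of problems, empty when the document conforms
     */
    static List<String> check(Map<String, Boolean> fieldTypes, Set<String> documentFields) {
        List<String> problems = new ArrayList<>();
        // All required fields must be present.
        for (Map.Entry<String, Boolean> entry : fieldTypes.entrySet()) {
            if (entry.getValue() && !documentFields.contains(entry.getKey())) {
                problems.add("missing required field: " + entry.getKey());
            }
        }
        // No field may be present that the document type does not allow.
        for (String field : documentFields) {
            if (!fieldTypes.containsKey(field)) {
                problems.add("field not allowed by document type: " + field);
            }
        }
        return problems;
    }
}
```

A caller that wants to skip the check, as the save option mentioned above allows, would simply not invoke it. Since the check only runs when a document is saved, older versions are never re-validated.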
4.1.1.4 Documents
A document consists of versioned and non-versioned data. Versioned data means that each time the document is saved (and some of the versioned aspects of the document changed), a new version will be stored, so that the older state of the data can still be viewed afterwards. In other words, it provides a history of who made what changes at what time.
When a document is saved for the first time, it is assigned a unique, numeric ID. The ID is just a sequence counter, so the first created document gets ID 1, then 2, and so on. The ID of a document never changes. The user who creates the document is the owner of the document. The date and time of document creation is also stored.
When creating a document, its document type must be specified. The document type can afterwards be changed.
Daisy has no directories like a filesystem. Everything is just in one big bag. When saving a document, you only have to choose a name for it (which acts in fact as the title of the document), and this name is not even required to be unique (see below). Documents are retrieved by searching, or browsing through navigation trees.
4.1.1.4.1 Versioned Content
The versioned content of a document consists of the following:
- the document name
- the parts
- the fields
- the links
So if any changes are made to any of these, and the document is stored, a new version is created.
4.1.1.4.1.1 Version ID
Each version has an ID, which is simply a numeric sequence number: the first version has number 1, the next number 2, and so on.
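A minimal sketch of this behaviour, in which the document name alone stands in for all versioned content (the `VersionHistory` class is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of version creation; the document name stands in for all versioned content. */
final class VersionHistory {
    private final List<String> names = new ArrayList<>(); // index i holds version ID i + 1

    /** Stores a new version only when the versioned content changed; returns the latest version ID. */
    int save(String documentName) {
        if (names.isEmpty() || !names.get(names.size() - 1).equals(documentName)) {
            names.add(documentName);
        }
        return names.size();
    }
}
```

Saving the same name twice leaves the history at version 1; saving a changed name then produces version 2.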
4.1.1.4.1.2 Document Name
The name of a document is required (it cannot be empty). The name is not required to be unique. Thus there can be multiple documents with the same name. The ID of the document is its unique identification.
The name is usually also rendered as the title of the document.
4.1.1.4.1.3 Parts
Each part is associated with a part type and has a mime type and some data. There cannot be two parts of the same part type in one document.
Each part can optionally have a file name, this file name can be used as default file name when the content of the part is saved (downloaded) in a file.
4.1.1.4.1.4 Fields
Each field is associated with a field type and specifies the field value. There cannot be two fields of the same field type in one document.
4.1.1.4.1.5 Links
A document can contain two kinds of links: links can occur as content of a part (for example, an <a> element in HTML), and a document can have a number of so-called out-of-line links. These are links stored separately from the content. Each link consists of a title and a target (some URL). These links are usually rendered at the bottom of a page as a bulleted list.
Out-of-line links are useful in case you want to link to related documents and either don't want or can't (e.g. in case of non-HTML content) link to them from the content of a part.
4.1.1.4.1.6 Version state & the live version
Each version can have a state indicating whether it is a draft version (i.e. you started editing the document but are not finished yet, in other words the changes should not yet be published), or a publishable version. The most recent version having the state 'publish' becomes the live version. The live version is the version that is shown by default to the user. It is also the version whose data is indexed in the full-text index, and whose properties are used by default when querying.
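The selection of the live version can be sketched as a backwards scan over the version states. The `LiveVersion` class and its `State` enum are illustrative names, not the repository API.

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical illustration of how the live version follows from the version states. */
final class LiveVersion {
    enum State { DRAFT, PUBLISH }

    private LiveVersion() {}

    /**
     * @param states state of each version; index i holds version ID i + 1
     * @return the ID of the most recent version in state PUBLISH, or empty if there is none
     */
    static OptionalInt find(List<State> states) {
        for (int i = states.size() - 1; i >= 0; i--) {
            if (states.get(i) == State.PUBLISH) {
                return OptionalInt.of(i + 1);
            }
        }
        return OptionalInt.empty();
    }
}
```

For versions in the states publish, publish, draft, the live version is version 2; a document with only draft versions has no live version in this sketch.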
4.1.1.4.2 Non-versioned properties
4.1.1.4.2.1 Collections and collection membership
Collections are sets of documents. A document can belong to one or more collections, thus collections can overlap. A collection is simply a way to combine some documents in order to do something with them or treat them in some special way.
Collections themselves can be created or deleted only by Administrators (in the Daisy Wiki, this is done in the administration interface). Deleting a collection does not delete the documents in it. You can limit who can put documents in a collection by ACL rules.
4.1.1.4.2.2 Custom fields
Custom fields are arbitrary name-value pairs assigned to a document. The name and value are both strings. In contrast with the earlier-mentioned fields that are part of the document type, these fields are non-versioned. This makes it possible to attach tags to documents without causing a new version to be created, and without formally defining a field type.
4.1.1.4.2.3 Private
A document marked as private can only be read (and written) by its owner.
While the global access control system of Daisy makes it easy to centrally handle access control for sets of documents, sometimes it could be useful to simply say "I want nobody else to see this (for now)". This can be done by enabling the private flag. The document will then not be accessible for others, and also won't turn up in search results done by others. The private flag can be set on or off at any time, by the owner or by an Administrator.
There is however one big exception: Administrators can always access all documents, and thus will be able to read your "private" documents. The content is not encrypted.
4.1.1.4.2.4 Retired
If a document variant is no longer needed, because its content is outdated, replaced by others, or whatever, you can mark the document variant as retired. This makes the document variant virtually deleted. It won't show up in search results anymore.
The retired flag can be set on or off at any time, retiring is not a one-time operation.
4.1.1.4.2.5 Lock
A lock can be taken on a document variant to make sure nobody else edits the document variant while you're working on it.
Daisy automatically performs so-called optimistic locking. This means that if person A starts editing the document, then person B starts editing the document, then person A saves the document, and then person B tries to save the document, this last operation will fail because the document has changed since person B loaded it. This mechanism is always enabled; taking an explicit lock is not needed for it.
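The A/B scenario can be illustrated with a small sketch. It uses an update counter to detect concurrent changes, which is one common way to implement optimistic locking; whether Daisy uses a counter, a timestamp or something else internally is not stated here, and `OptimisticStore` is a hypothetical name.

```java
/** Hypothetical illustration of optimistic locking; not how Daisy implements it internally. */
final class OptimisticStore {
    private String content = "";
    private long updateCount = 0; // incremented on every successful save

    /** The value a client remembers when it loads the document for editing. */
    synchronized long currentUpdateCount() {
        return updateCount;
    }

    synchronized String content() {
        return content;
    }

    /**
     * Saves only if nobody else saved since the caller loaded the document.
     *
     * @param loadedUpdateCount the update count the caller saw when loading
     * @return true if the save succeeded, false if the document changed in the meantime
     */
    synchronized boolean save(String newContent, long loadedUpdateCount) {
        if (loadedUpdateCount != updateCount) {
            return false;
        }
        content = newContent;
        updateCount++;
        return true;
    }
}
```

If A and B both load the document at the same update count and A saves first, B's save is rejected because the count B remembered no longer matches.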
In addition, a lock can be taken to make others aware that you are editing the document. A lock can be of two types: an exclusive lock or a warn lock. An exclusive lock is pretty much what its name implies: it is a lock exclusively for the user who requested it, and it prevents anyone else from saving the document until the lock is released. A warn lock isn't really a lock; it is just an informational mechanism to let others know that someone else has also started to edit the document, but it doesn't enforce anything. Anyone else can still save the document at any time or replace the lock with their own.
A lock can optionally have a certain duration; once the duration has expired, the lock is automatically removed.
For example, the Daisy Wiki application by default uses exclusive locks with a duration of 15 minutes, and automatically extends them when the user keeps editing.
A lock can be removed either by the person who created it, or by an Administrator.
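The difference between the two lock types, and the effect of expiry, can be sketched as follows. `DocumentLock` and its fields are hypothetical; times are plain millisecond values supplied by the caller so that the example stays deterministic.

```java
/** Hypothetical model of a lock on a document variant; not the Daisy API. */
final class DocumentLock {
    enum Type { EXCLUSIVE, WARN }

    final Type type;
    final long ownerId;
    final long expiresAtMillis; // use Long.MAX_VALUE for a lock without a duration

    DocumentLock(Type type, long ownerId, long expiresAtMillis) {
        this.type = type;
        this.ownerId = ownerId;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }

    /** Only an unexpired exclusive lock held by another user blocks a save; a warn lock never does. */
    boolean blocksSaveBy(long userId, long nowMillis) {
        return type == Type.EXCLUSIVE && !isExpired(nowMillis) && ownerId != userId;
    }

    /** A lock can be removed by the user who created it or by an Administrator. */
    boolean canBeRemovedBy(long userId, boolean isAdministrator) {
        return isAdministrator || userId == ownerId;
    }
}
```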
4.1.1.4.2.6 Owner
The owner of a document is a person who is always able to access (read/write) the document, regardless of what the ACL specifies. The owner is initially the creator of the document, but can be changed afterwards.
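Taken together with the private flag described earlier, the read-access rules stated in this section can be summarised in a small sketch. The ACL evaluation itself is not modelled: its outcome is passed in as a boolean, and the `AccessSketch` class is a hypothetical illustration rather than the repository's access control code.

```java
/** Hypothetical summary of the read-access rules described in this section. */
final class AccessSketch {
    private AccessSketch() {}

    /**
     * @param aclAllowsRead the outcome of the regular ACL evaluation (not modelled here)
     */
    static boolean canRead(boolean isPrivate, boolean isOwner,
                           boolean isAdministrator, boolean aclAllowsRead) {
        if (isOwner || isAdministrator) {
            return true; // owners and Administrators always have access
        }
        if (isPrivate) {
            return false; // private documents are hidden from everyone else
        }
        return aclAllowsRead;
    }
}
```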
4.1.1.4.2.7 Last Modified and Last Modifier
Each time a document is saved, the user ID of the person who saved it is stored as the last modifier, and the date and time of the save operation as the "last modified" time. Each document variant also has its own Last Modified and Last Modifier information. For document variants, this will often coincide with the Created/Creator fields of the last version, but not necessarily: if only non-versioned properties are changed, no new version will be created.