Documents

Introduction

The purpose of the Daisy Repository Server is managing documents. This document will describe the structure (or features) of such documents.

The diagram below gives an overview of the document structure, which is explained further in the remainder of this document.

Document Structure

Documents & Document Variants

A document in itself has very few properties; the real meat is in the document variants. A document never exists without at least one document variant. On the other hand, making explicit use of variants is optional, in which case you could consider a document and a document variant to be the same (thus, each document then has exactly one variant).

The details of working with variants are described in another section. For now, it suffices to know that in a practical working environment like the Daisy Wiki, the branch and language which identify the particular variant of the document are usually a given (Daisy Wiki: configured per site), and you'll only work with document IDs, so it is as if the existence of variants is transparent.

Many times when we speak about a document in Daisy, we implicitly mean "a certain variant of a document" (a "document variant").

Refer to the diagram above to see if a certain aspect applies to a document, a document variant, or a version of a document variant.

A document is always retrieved from the repository in combination with a document variant; a document in itself, without a variant, cannot be retrieved. This is, among other things, because the access rights to a document are based on information that is part of the document variant (a user can thus have access to one document variant but not to another). Another way to look at this is that there are only document variants, and that certain properties of them are shared across the variants.

Document Types, The Repository Schema

The main "data" of a document is contained in its so-called parts and fields. Parts can contain arbitrary binary data, and fields contain 'simple' information of a certain type (string, date, decimal, ...). Which parts and fields a document can have is determined by the document's document type. A document type is actually a combination of zero or more part types and zero or more field types, which further describe these aspects. Part and field types are defined as independent entities, meaning that the same part and field types can be reused across different document types. The diagram below shows the structure and relation of all these entities.

Repository Schema Structure

Common aspects of document, part and field types

Let us first look at the things document, part and field types have in common. Their primary, unchangeable identifier is a numeric ID, though they also have a unique name (which can be changed after creation), which you will likely prefer to use.

Next to the name, they can optionally be assigned a localized label and description. Localized means that a different label and description can be given for different locales. A locale can be a language, language-country, or language-country-variant specification. For example, a label entered for "fr-BE" is in French, specifically for Belgium. The labels and descriptions are retrieved using a fallback system. For example, if the user's locale is "fr-BE", the system will first check whether a label is available for "fr-BE"; if none is found it will check "fr", and finally the empty locale "". Thus if you want to provide labels and descriptions but are not interested in localisation, you can simply enter them for the empty locale.
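The fallback lookup described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Daisy's actual implementation; the function names and the dictionary used to hold the labels are assumptions made for the example.

```python
from typing import Optional


def locale_fallback_chain(locale: str) -> list:
    """Return the locales to try, most specific first, ending with the empty locale."""
    parts = [p for p in locale.split("-") if p]
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    chain.append("")  # the empty locale is always the last resort
    return chain


def resolve_label(labels: dict, locale: str) -> Optional[str]:
    """Pick the first label found along the fallback chain, or None."""
    for candidate in locale_fallback_chain(locale):
        if candidate in labels:
            return labels[candidate]
    return None


# Hypothetical labels: one for the empty locale, one for French.
labels = {"": "Title", "fr": "Titre"}
print(resolve_label(labels, "fr-BE"))  # no "fr-BE" entry, so the "fr" label is used
```

A user with locale "nl" would get the label stored for the empty locale, since neither "nl" nor anything more specific is present.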

Document, part and field types cannot be deleted as long as they are still in use in the repository. Once a document has been created that uses one of these types, the type can no longer be deleted (unless the documents using it are deleted). However, it is possible to mark a type as deprecated to indicate it should not be used anymore. This deprecation flag is purely informational: the system simply stores it.

Document types

A document type combines a number of part types and field types, and indicates for each of these if it is required or not.

Part types

Before going into the details of part types, it might make sense to justify their existence. In many document repository systems, each document has simply one 'content chunk'. For example, a resource addressable over webdav is one atomic piece of data. Daisy allows a document to consist of multiple parts. This makes these parts separately addressable and retrievable. For example, suppose we have a document type consisting of a part "Abstract" and a part "Main Content". It is then simple to retrieve the abstracts of all documents conforming to this document type. As another example, for an "Image" document type we could have parts "ImageData" containing a rendered form of the image, "ImageSource" containing the original source (e.g. a Photoshop or CorelDraw file), and "Thumbnail" containing a small rendition of the image.

A part instance consists of some binary data (or, if you wish, data which is treated as binary; it could of course be plain text) and the mime-type of the data. A part type can restrict which types of data (thus which mime-types) are stored in the part, but this is not required. The restriction is made by specifying a list of allowed mime-types.

The Daisy HTML flag

A part type has a flag indicating whether the part contains "Daisy HTML". Daisy HTML is basically HTML formatted as well-formed XML (with element and attribute names lowercased). It is not the same as XHTML, because the elements are not in the XHTML namespace. If the "Daisy HTML" flag is set to true, the mime-type should be limited to text/xml. For the repository server, the Daisy-HTML flag on the part type has little meaning. Currently it serves only to enable the creation of document summaries (which might even be replaced with a more flexible mechanism in the future). The Daisy Wiki front end application will show a wysiwyg editor for Daisy HTML parts, and display the content of such parts inline.

Link extraction

For each part type a link extractor can be defined to extract links from the content contained in the part. The most common link extractor is the "daisy-html" one, which will extract links from the href attribute of the <a> element, the src attribute of the <img> element, and the character content of <p class="include">. The format of the links is:

daisy:<document id>
or
daisy:<document id>@<branch id or name>:<language id or name>:<version id>#fragment_id

Links that don't conform to this form will be ignored. The <version id> can take the special value "LAST" (case insensitive). A link without a version specification denotes a link to the live version of the document. The branch, language, version and fragment ID parts are all optional. For example, daisy:15@:nl is a link to the Dutch language variant of document 15 (the branch is left empty).
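To make the link syntax concrete, here is a small Python sketch of a parser for it. It is not the "daisy-html" link extractor itself; the regular expression, the DaisyLink tuple and the function name are assumptions for illustration, and the pattern assumes that document IDs are plain digits and that branch, language and version contain no ':' or '#'.

```python
import re
from typing import NamedTuple, Optional

# Pattern for the two link forms described above (assumed, not taken from Daisy).
DAISY_LINK = re.compile(
    r"^daisy:(?P<doc>\d+)"
    r"(?:@(?P<branch>[^:#]*)(?::(?P<language>[^:#]*))?(?::(?P<version>[^:#]*))?)?"
    r"(?:#(?P<fragment>.*))?$"
)


class DaisyLink(NamedTuple):
    document_id: int
    branch: Optional[str]
    language: Optional[str]
    version: Optional[str]   # None means "the live version"
    fragment: Optional[str]


def parse_daisy_link(link: str) -> Optional[DaisyLink]:
    """Parse a daisy: link; return None for links that do not conform."""
    match = DAISY_LINK.match(link)
    if match is None:
        return None

    def opt(name: str) -> Optional[str]:
        # Empty components (as in "daisy:15@:nl") count as "not specified".
        return match.group(name) or None

    version = opt("version")
    if version is not None and version.upper() == "LAST":
        version = "LAST"  # the special value is case insensitive
    return DaisyLink(int(match.group("doc")), opt("branch"), opt("language"),
                     version, opt("fragment"))


print(parse_daisy_link("daisy:15@:nl"))
```

Returning None for a non-conforming string mirrors the rule that such links are ignored.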

Field types

Value Type

The most important thing a field type tells about a field is its Value Type. A Value Type identifies the kind of data that can be stored in a field. The available value types are listed in the table below, together with their matching Java class.

Value Type Name    Corresponding Java class
string             java.lang.String
date               java.util.Date
datetime           java.util.Date
long               java.lang.Long
double             java.lang.Double
decimal            java.math.BigDecimal
boolean            java.lang.Boolean
link               org.outerj.daisy.repository.VariantKey

The link type is somewhat special: it defines a link to another document variant. Its value is thus a triple (document ID, branch ID, language ID). The branch ID and language ID are optional (value -1 in the VariantKey object) to denote they should default to the same as the containing document.
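The defaulting rule for omitted branch and language IDs can be sketched as follows. The VariantKey class below is a hypothetical Python stand-in for the triple described above, not the API of the real org.outerj.daisy.repository.VariantKey class; only the use of -1 as the "not specified" marker comes from the text.

```python
from dataclasses import dataclass

UNSPECIFIED = -1  # marker value described above for an omitted branch or language


@dataclass(frozen=True)
class VariantKey:
    """Hypothetical stand-in for the (document ID, branch ID, language ID) triple."""
    document_id: int
    branch_id: int = UNSPECIFIED
    language_id: int = UNSPECIFIED


def resolve_link(target: VariantKey, containing: VariantKey) -> VariantKey:
    """Fill in an omitted branch or language from the containing document variant."""
    return VariantKey(
        target.document_id,
        containing.branch_id if target.branch_id == UNSPECIFIED else target.branch_id,
        containing.language_id if target.language_id == UNSPECIFIED else target.language_id,
    )


# A link field in document 7 (branch 1, language 2) that only names document 15.
print(resolve_link(VariantKey(15), VariantKey(7, 1, 2)))
```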

Multi-Value

A field type can specify that it is multi-value, meaning that fields of that type can have multiple values. All of the values of the field must be of the same value type.

A multi-value field can contain the same value more than once, and the order of its values is maintained. Thus the values of a multi-value field form an ordered list.

Selection Lists

It is possible to define a selection list for a field type. This is a list of possible values that an end user can choose from when completing the field. Several types of selection list are available:

  • static selection list: manual enumeration of the selection list items. For each list item, you can specify the value, and optionally a label which will be shown to the user instead of the value. If desired, the label can be specified for different locales.
  • query-based selection list: performs a query, typically selecting the value of some field, and takes the set of distinct values selected by the query as the content of the selection list.
  • query-based selection list for link-type fields: similar to the query-based selection list, but since a link-type field points to some document, and a query returns a set of documents, it is not necessary to select a specific value of which the distinct set is taken. Rather the documents returned from the query are the content of the selection list.

ACL allowed flag

In the access control system, it is possible to define access rules for documents by using an expression to select the documents to which the access rules apply. In these expressions, it is also possible to check the value of fields, but only of fields whose field types' ACL allowed flag is set to true. The ACL allowed flag also enables the front-end to indicate that changing the value of that particular field can influence the access control checks.

Size hint

A field can have a size hint, which is simply an integer. The front end uses this information to display an input field of an appropriate width. The repository server doesn't attach any further meaning to it: it doesn't cause any validation to happen, nor does it specify the unit of the width (most likely "number of characters").

Document and document type association, how changes to document types are handled

Upon creation of a document, a document type must be supplied. When saving a document, the repository will check that the document conforms to its document type. Thus it will check that all required fields and parts are present, and that there are no parts and fields in the document that are not allowed by the document type.
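The conformance check described above amounts to two tests per kind of content: every required type must be present, and nothing outside the document type may be present. A minimal Python sketch, in which the DocumentType class and the dictionary-based representation of parts and fields are assumptions made for the example:

```python
from dataclasses import dataclass, field


@dataclass
class DocumentType:
    """Hypothetical model: maps each allowed part/field type name to 'required?'."""
    name: str
    part_types: dict = field(default_factory=dict)   # e.g. {"Abstract": True}
    field_types: dict = field(default_factory=dict)  # e.g. {"Category": False}


def conformance_errors(doc_type: DocumentType, parts: dict, fields: dict) -> list:
    """Return a list of problems; an empty list means the document conforms."""
    errors = []
    for kind, allowed, present in (
        ("part", doc_type.part_types, set(parts)),
        ("field", doc_type.field_types, set(fields)),
    ):
        for type_name, required in allowed.items():
            if required and type_name not in present:
                errors.append(f"missing required {kind}: {type_name}")
        for type_name in sorted(present - set(allowed)):
            errors.append(f"{kind} not allowed by document type: {type_name}")
    return errors


article = DocumentType("Article",
                       part_types={"Abstract": True, "Main Content": True},
                       field_types={"Category": False})
print(conformance_errors(article, {"Abstract": b"..."}, {"Author": "me"}))
```

In this sketch a save would be refused whenever the returned list is non-empty.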

The document type of a document can be changed at any time. This is useful if you start out with a generic document type but later want to switch to a more specialized document type.

The definition of a document type can be changed at any time. Part and field types can be added to it or removed from it, or can be made required. A logical question is what happens to existing documents in the repository that use the changed document type. The answer is basically "nothing". If, for example, a required field is added to a document type, then the next time a document of that type is edited, it will fail to save unless a value for the field is specified. The newly saved version of the document will then conform to the new state of the document type. Older versions of the document will remain unchanged, however. When saving a document, it is also possible to supply an option that skips the document type conformance check.

So basically the document type system doesn't give any guarantees about the structure of the documents in the repository, but rather hints at how the documents should be structured and interpreted.

Documents

A document consists of versioned and non-versioned data. Versioned data means that each time the document is saved (and some of the versioned aspects of the document changed), a new version will be stored, so that the older state of the data can still be viewed afterwards. In other words, it provides a history of who made what changes at what time.

When a document is saved for the first time, it is assigned a unique, numeric ID. The ID is just a sequence counter, so the first created document gets ID 1, then 2, and so on. The ID of a document never changes. The user who creates the document is the owner of the document. The date and time of document creation is also stored.

When creating a document, its document type must be specified. The document type can afterwards be changed.

Daisy has no directories like a filesystem. Everything is just in one big bag. When saving a document, you only have to choose a name for it (which acts in fact as the title of the document), and this name is not even required to be unique (see below). Documents are retrieved by searching, or browsing through navigation trees.

Versioned Content

The versioned content of a document consists of the following:

  • the document name
  • the parts
  • the fields
  • the links

So if any changes are made to any of these, and the document is stored, a new version is created.

Version ID

Each version has an ID, which is simply a numeric sequence number: the first version has number 1, the next number 2, and so on.
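The interplay between versioned and non-versioned data can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not how the repository stores data: the StoredDocument class, its save method and the dictionary layout are all hypothetical.

```python
from dataclasses import dataclass, field

VERSIONED_KEYS = ("name", "parts", "fields", "links")  # the versioned content


@dataclass
class StoredDocument:
    """Hypothetical in-memory document: a list of versions plus non-versioned data."""
    versions: list = field(default_factory=list)
    non_versioned: dict = field(default_factory=dict)

    def save(self, state: dict) -> int:
        """Store `state`; return the ID of the version holding its versioned data."""
        snapshot = {key: state.get(key) for key in VERSIONED_KEYS}
        if not self.versions or self.versions[-1] != snapshot:
            self.versions.append(snapshot)  # versioned data changed: new version
        # Everything else is non-versioned and simply overwritten.
        self.non_versioned = {k: v for k, v in state.items()
                              if k not in VERSIONED_KEYS}
        return len(self.versions)  # version IDs count up from 1


doc = StoredDocument()
base = {"name": "Intro", "parts": {}, "fields": {}, "links": []}
print(doc.save(base))                         # first save creates version 1
print(doc.save({**base, "retired": True}))    # only non-versioned data changed
print(doc.save({**base, "name": "Preface"}))  # the name changed: a new version
```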

Document Name

The name of a document is required (it cannot be empty). The name is not required to be unique. Thus there can be multiple documents with the same name. The ID of the document is its unique identification.

The name is usually also rendered as the title of the document.

Parts

Each part is associated with a part type and has a mime type and some data. There cannot be two parts of the same part type in one document.

Each part can optionally have a file name, which can be used as the default file name when the content of the part is saved (downloaded) to a file.

Fields

Each field is associated with a field type and specifies the field value. There cannot be two fields of the same field type in one document.

Links

A document can contain two kinds of links: links can occur as content of a part (for example, an <a> element in HTML), and a document can have a number of so-called out-of-line links. These are links stored separately from the content. Each link consists of a title and a target (some URL). These links are usually rendered at the bottom of a page as a bulleted list.

Out-of-line links are useful in case you want to link to related documents and either don't want or can't (e.g. in case of non-HTML content) link to them from the content of a part.

Version state & the live version

Each version can have a state indicating whether it is a draft version (i.e. you started editing the document but are not finished yet, in other words the changes should not yet be published), or a publishable version. The most recent version having the state 'publish' becomes the live version. The live version is the version that is shown by default to the user. It is also the version whose data is indexed in the full-text index, and whose properties are used by default when querying.
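The rule for picking the live version is simple enough to state in code. The following Python sketch assumes the states are represented by the strings "draft" and "publish" and that versions are held in a list ordered by version ID; both are assumptions for the example.

```python
from typing import Optional


def live_version_id(version_states: list) -> Optional[int]:
    """Return the ID of the most recent version in state 'publish', or None.

    version_states[i] is the state of version i + 1.
    """
    for version_id in range(len(version_states), 0, -1):
        if version_states[version_id - 1] == "publish":
            return version_id
    return None


# Version 3 is still a draft, so version 2 stays live.
print(live_version_id(["publish", "publish", "draft"]))
```

A document whose versions are all drafts has no live version in this sketch, which the function signals by returning None.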

Non-versioned properties

Collections and collection membership

Collections are sets of documents. A document can belong to one or more collections, thus collections can overlap. A collection is simply a way to combine some documents in order to do something with them or treat them in some special way.

Collections themselves can be created or deleted only by Administrators (in the Daisy Wiki, this is done in the administration interface). Deleting a collection does not delete the documents in it. You can limit who can put documents in a collection by ACL rules.

Custom fields

Custom fields are arbitrary name-value pairs assigned to a document. The name and value are both strings. In contrast with the earlier-mentioned fields that are part of the document type, these fields are non-versioned. This makes it possible to stick tags to documents without causing a new version to be created, and without formally defining a field type.

Private

A document marked as private can only be read (and written) by its owner.

While the global access control system of Daisy makes it easy to centrally handle access control for sets of documents, sometimes it could be useful to simply say "I want nobody else to see this (for now)". This can be done by enabling the private flag. The document will then not be accessible to others, and also won't turn up in searches performed by others. The private flag can be set on or off at any time, by the owner or by an Administrator.

There is however one big exception: Administrators can always access all documents, and thus will be able to read your "private" documents. The content is not encrypted.

Retired

If a document variant is no longer needed, because its content is outdated, replaced by others, or whatever, you can mark the document variant as retired. This makes the document variant virtually deleted. It won't show up in search results anymore.

The retired flag can be set on or off at any time; retiring is not a one-time operation.

Lock

A lock can be taken on a document variant to make sure nobody else edits the document variant while you're working on it.

Daisy automatically performs so-called optimistic locking. This means that if person A starts editing the document, then person B starts editing it, then person A saves it, and finally person B tries to save it, this last operation will fail because the document has changed since person B loaded it. This mechanism is always enabled; no explicit lock is needed for it.
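The person A / person B scenario above can be reproduced with a toy store. The text does not say how Daisy detects the conflicting change; the update counter used here is one common way to do it and is an assumption, as are the class and exception names.

```python
class ConcurrentUpdateError(Exception):
    """Raised when a document changed after the caller loaded it (hypothetical name)."""


class Repository:
    """Toy single-document store illustrating the optimistic check."""

    def __init__(self, content: str):
        self.content = content
        self.update_count = 0  # bumped on every successful save

    def load(self):
        # The caller keeps the counter value it saw at load time.
        return self.content, self.update_count

    def save(self, content: str, loaded_update_count: int) -> None:
        if loaded_update_count != self.update_count:
            raise ConcurrentUpdateError("document changed since it was loaded")
        self.content = content
        self.update_count += 1


repo = Repository("original text")
_, seen_by_a = repo.load()   # person A starts editing
_, seen_by_b = repo.load()   # person B starts editing
repo.save("A's edit", seen_by_a)
try:
    repo.save("B's edit", seen_by_b)
except ConcurrentUpdateError as error:
    print("B's save failed:", error)
```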

In addition, a lock can be taken to make others aware that you are editing the document. A lock can be of two types: an exclusive lock or a warn lock. An exclusive lock is pretty much what its name implies: it belongs exclusively to the user who requested it, and prevents anyone else from saving the document until the lock is released. A warn lock isn't really a lock: it is just an informational mechanism to let others know that someone has started to edit the document, but it doesn't enforce anything. Anyone else can still save the document at any time or replace the lock with their own.

A lock can optionally have a duration; once the duration has expired, the lock is automatically removed.

For example, the Daisy Wiki application by default uses exclusive locks with a duration of 15 minutes, and automatically extends them when the user keeps editing.

A lock can be removed either by the person who created it, or by an Administrator.
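The difference between the two lock types, and the effect of a duration, can be summarised in a short sketch. The Lock class and the may_save function are hypothetical, and times are plain seconds for simplicity; the sketch only covers the question "may this user save right now?".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lock:
    """Hypothetical lock record; times are plain seconds for simplicity."""
    user: str
    exclusive: bool                    # False means a warn lock
    taken_at: float
    duration: Optional[float] = None   # None means the lock never expires

    def is_active(self, now: float) -> bool:
        return self.duration is None or now < self.taken_at + self.duration


def may_save(lock: Optional[Lock], user: str, now: float) -> bool:
    """Only an active exclusive lock held by someone else blocks a save."""
    if lock is None or not lock.is_active(now):
        return True
    return lock.user == user or not lock.exclusive


# An exclusive 15-minute lock taken by a hypothetical user "alice" at time 0.
lock = Lock("alice", exclusive=True, taken_at=0.0, duration=15 * 60)
print(may_save(lock, "bob", now=60))       # blocked while the lock is active
print(may_save(lock, "bob", now=15 * 60))  # allowed once the lock has expired
```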

Owner

The owner of a document is a person who is always able to access (read/write) the document, regardless of what the ACL specifies. The owner is initially the creator of the document, but can be changed afterwards.

Last Modified and Last Modifier

Each time a document is saved, the user ID of the person who saved it is stored as the last modifier, and the date and time of the save operation as the "last modified" time. Each document variant also has its own Last Modified and Last Modifier information. For document variants, this will often coincide with the Created/Creator fields of the last version, but not necessarily so: if only non-versioned properties are changed, no new version will be created.
