4.2 Repository schema
4.2.1 Overview
The repository schema controls the structure of
The repository schema defines part types, field types and document types. A document type is a combination of zero or more part types and zero or more field types. Part and field types are defined as independent entities, meaning that the same part and field types can be reused across different document types. The diagram below shows the structure and relation of all these entities.

4.2.1.1 Common aspects of document, part and field types
Let us first look at the things document, part and field types have in common. Their primary, unchangeable identifier is a numeric ID, though they also have a unique name (which can be changed after creation), which you will likely prefer to use.
Next to the name, they can optionally be assigned a localized label and description. Localized means that a different label and description can be given for different locales. A locale can be a language, language-country, or language-country-variant specification. For example, a label entered for the locale "fr-BE" is in French, specifically for Belgium. The labels and descriptions are retrieved using a fallback system. For example, if the user's locale is "fr-BE", the system first checks whether a label is available for "fr-BE"; if none is found it checks "fr", and finally the empty locale "". Thus, if you want to provide labels and descriptions but are not interested in localization, you can simply enter them for the empty locale.
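The fallback lookup described above can be sketched in a few lines of Java. This is only an illustration: the class name LabelFallback, the map-based storage and the method name resolve are assumptions made for the example, not part of the Daisy API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Daisy API. */
class LabelFallback {
    /**
     * Looks up a label for a locale string such as "fr-BE", falling back
     * to "fr" and finally to the empty locale "".
     * Returns null when no label exists at any level.
     */
    static String resolve(Map<String, String> labels, String locale) {
        String candidate = locale;
        while (true) {
            String label = labels.get(candidate);
            if (label != null) {
                return label;
            }
            if (candidate.isEmpty()) {
                return null; // even the empty locale has no label
            }
            // Drop the last "-xx" component: "fr-BE" -> "fr" -> "".
            int dash = candidate.lastIndexOf('-');
            candidate = dash >= 0 ? candidate.substring(0, dash) : "";
        }
    }
}
```

With labels stored for "fr" and "", a lookup for "fr-BE" yields the "fr" label, while a lookup for "nl" falls through to the label of the empty locale.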
Document, part and field types cannot be deleted as long as they are still in use in the repository. Once a document has been created that uses one of these types, the type can no longer be deleted (unless the documents using it are deleted). However, it is possible to mark a type as deprecated to indicate that it should not be used anymore. This deprecation flag is purely informational: the system simply stores it.
4.2.1.2 Document types
A document type combines a number of part types and field types. The associations with the part and field types, shown in the diagram as "Part Type Use" and "Field Type Use", are not stand-alone entities but part of the document type.
The associations have a property to indicate whether or not the parts and fields are required to have a value.
The associations also have a property called 'editable'. This property is a hint to the document editing GUI about whether the part or field should be editable. It is only a GUI hint, not an access control restriction. Marking a part or field as not editable can be useful, for example, if its value is assigned by an automated process.
4.2.1.3 Part types
A part type defines a
4.2.1.3.1 Mime-type
A part type can restrict which types of data (that is, which mime-types) may be stored in the part, but this is not required. The restriction is made by specifying a list of allowed mime-types.
4.2.1.3.2 The Daisy HTML flag
A part type has a flag indicating whether the part contains "Daisy HTML". Daisy HTML is basically HTML formatted as well-formed XML (with element and attribute names lowercased). It is not the same as XHTML, because the elements are not in the XHTML namespace. If the "Daisy HTML" flag is set to true, the mime-type should be limited to text/xml. For the repository server, the Daisy HTML flag on the part type has little meaning. Currently it serves only to enable the creation of document summaries (which might even be replaced with a more flexible mechanism in the future). The Daisy Wiki front end application shows a WYSIWYG editor for Daisy HTML parts, and displays the content of such parts inline.
4.2.1.3.3 Link extraction
For each part type a link extractor can be defined to extract links from the content contained in the part. The most common link extractor is the "daisy-html" one, which will extract links from the href attribute of the <a> element, the src attribute of the <img> element, and the character content of <p class="include">. The format of the links is:
daisy:<document id> or daisy:<document id>@<branch id or name>:<language id or name>:<version id>#fragment_id
Links that don't conform to this form are ignored. The <version id> can take the special value "LAST" (case insensitive). A link without a version specification denotes a link to the live version of the document. The branch, language, version and fragment ID parts are all optional. For example, daisy:15@:nl is a link to the Dutch version of document 15.
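To make the link format concrete, here is a minimal parsing sketch in Java. It is not the repository's own link extractor: the class DaisyLink, its fields and the regular expression are assumptions written for this example, and the sketch assumes that document IDs are purely numeric, as in the daisy:15 example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for illustration; not the repository's link extractor. */
class DaisyLink {
    // daisy:<doc>[@[branch][:[language][:[version]]]][#fragment]
    private static final Pattern LINK = Pattern.compile(
        "^daisy:(\\d+)(?:@([^:#]*)(?::([^:#]*))?(?::([^:#]*))?)?(?:#(.*))?$");

    final String documentId;
    final String branch;   // null = not specified
    final String language; // null = not specified
    final String version;  // null = live version
    final String fragment; // null = no fragment ID

    private DaisyLink(Matcher m) {
        documentId = m.group(1);
        branch = emptyToNull(m.group(2));
        language = emptyToNull(m.group(3));
        version = emptyToNull(m.group(4));
        fragment = emptyToNull(m.group(5));
    }

    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    /** Returns the parsed link, or null if the text does not match the form. */
    static DaisyLink parse(String text) {
        Matcher m = LINK.matcher(text);
        return m.matches() ? new DaisyLink(m) : null;
    }

    /** True when the version is the special value "LAST" (case insensitive). */
    boolean isLastVersion() {
        return "LAST".equalsIgnoreCase(version);
    }
}
```

Parsing daisy:15@:nl with this sketch gives document ID 15, no branch, language nl and no version, which corresponds to the live Dutch version in the example above. The sketch does not check that the version is a number or "LAST".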
The repository server also has link extractors for extracting links from
4.2.1.4 Field types
A field type defines a
4.2.1.4.1 Value Type
The most important thing a field type tells about a field is its value type. A value type identifies the kind of data that can be stored in a field. The available value types are listed in the table below, together with their matching Java class.
| Value type name | Corresponding Java class |
|---|---|
| string | java.lang.String |
| date | java.util.Date |
| datetime | java.util.Date |
| long | java.lang.Long |
| double | java.lang.Double |
| decimal | java.math.BigDecimal |
| boolean | java.lang.Boolean |
| link | org.outerj.daisy.repository.VariantKey |
The link type is somewhat special: it defines a link to another document variant. Its value is thus a triple (document ID, branch ID, language ID). The branch ID and language ID are optional (value -1 in the VariantKey object) to denote they should default to the same as the containing document (in other words, the branch and language are relative to the document). The branch and language will usually be unspecified, since this allows copying content between the variants while the links stay relative to the actual variant.
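The "relative" behaviour of unspecified branch and language IDs can be illustrated as follows. The class VariantRef below is a hypothetical stand-in written for this example (the real values are VariantKey objects, whose exact constructor and accessors are not shown here), and the use of long for the document ID is an assumption.

```java
/** Hypothetical stand-in for a link value; the real class is VariantKey. */
class VariantRef {
    /** Marker for "same as the containing document", as described above. */
    static final long UNSPECIFIED = -1;

    final long documentId;
    final long branchId;
    final long languageId;

    VariantRef(long documentId, long branchId, long languageId) {
        this.documentId = documentId;
        this.branchId = branchId;
        this.languageId = languageId;
    }

    /**
     * Fills in an unspecified branch or language with the branch and
     * language of the document variant that contains the link field.
     */
    VariantRef resolveAgainst(long containingBranchId, long containingLanguageId) {
        return new VariantRef(
            documentId,
            branchId == UNSPECIFIED ? containingBranchId : branchId,
            languageId == UNSPECIFIED ? containingLanguageId : languageId);
    }
}
```

A link stored as (15, -1, -1) therefore points to a different variant of document 15 depending on which variant contains it, which is what keeps links relative when content is copied between variants.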
4.2.1.4.2 Multi-value
The multi-value property of a field type indicates whether the fields of that type can have multiple values. All the values of a multi-value field should be of the same value type.
A multi-value field can contain the same value more than once, and the order of its values is maintained. Thus the values of a multi-value field form an ordered list.
In the Java API, a multi-value value is represented as an Object[] array, in which the entries are objects of the type corresponding to the field's value type (e.g. an array of String objects, or an array of Long objects).
4.2.1.4.3 Hierarchical
The hierarchical property of a field type indicates that the value of the fields of that type is a hierarchical path (a path in some hierarchy). A path is often represented as a slash-separated string, e.g. Animals/Four-legged/Dogs.
Hierarchical fields are technically quite similar to multi-value fields, because a hierarchical path is also an ordered list of values. The two properties are nevertheless independent: a field type can be both hierarchical and multi-value at the same time.
In the Java API, a hierarchical value is represented by a HierarchyPath object:
org.outerj.daisy.repository.HierarchyPath
A multi-value hierarchical value is an array (Object[]) of HierarchyPath objects.
4.2.1.4.4 Selection Lists
It is possible to define a selection list for a field type. This is a list of possible values that an end user can choose from when completing the field. Several types of selection list are available:
- static selection list: manual enumeration of the selection list items. For each list item, you can specify the value, and optionally a label which will be shown to the user instead of the value. If desired, the label can be given for different locales. If the static selection list belongs to a hierarchical field type, the static list can be hierarchical (each item can itself contain child items).
- query-based selection list: performs a query, typically selecting the value of some field, and takes the set of distinct values selected by the query as the content of the selection list.
- query-based selection list for link-type fields: similar to the query-based selection list, but since a link-type field points to some document, and a query returns a set of documents, it is not necessary to select a specific value of which the distinct set is taken. Rather, the documents returned from the query are the content of the selection list.
- hierarchical childs-linked query selection list: this selection list works by executing a query for the root values in the selection list, and then creates child items (the hierarchical items) by following specified (multi-value) link fields in the documents returned by the query.
- hierarchical parent-linked query selection list: performs a query to retrieve documents and arranges them in a hierarchy based on a link field pointing to the parent of each document. Documents without the parent link field become the first level in the hierarchy.
The hierarchical selection lists can be used for both hierarchical and non-hierarchical fields. For hierarchical fields, the whole path leading to the selected node is stored; for non-hierarchical fields, only the selected node is stored.
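As a rough illustration of the parent-linked variant, the sketch below arranges query results into a hierarchy and computes the path to a selected node. Everything in it is an assumption made for the example: the class ParentLinkedTree, the representation of the query result as a map from document ID to parent document ID (null when the parent link field is absent), and the method names. It does not show how the repository actually implements this, and it does not handle parents that fall outside the query result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the repository's selection list implementation. */
class ParentLinkedTree {
    /**
     * Groups documents by parent. The returned map goes from parent ID to
     * child IDs; documents without a parent are listed under the null key
     * and form the first level of the hierarchy.
     */
    static Map<Long, List<Long>> build(Map<Long, Long> parentOf) {
        Map<Long, List<Long>> children = new LinkedHashMap<Long, List<Long>>();
        for (Map.Entry<Long, Long> entry : parentOf.entrySet()) {
            Long parent = entry.getValue();
            List<Long> siblings = children.get(parent);
            if (siblings == null) {
                siblings = new ArrayList<Long>();
                children.put(parent, siblings);
            }
            siblings.add(entry.getKey());
        }
        return children;
    }

    /** Path from a first-level item down to the given node, inclusive. */
    static List<Long> pathTo(Map<Long, Long> parentOf, Long node) {
        List<Long> path = new ArrayList<Long>();
        for (Long current = node; current != null; current = parentOf.get(current)) {
            if (path.contains(current)) {
                break; // guard against cyclic parent links
            }
            path.add(0, current);
        }
        return path;
    }
}
```

For a hierarchical field, the list returned by pathTo corresponds to the whole path that would be stored; for a non-hierarchical field only its last element, the selected node, would be stored.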
4.2.1.4.5 ACL allowed flag
In the
4.2.1.4.6 Size hint
A field can have a size hint, which is simply an integer number. This information is used by the front end to display an input field of an appropriate width. The repository server doesn't associate any further meaning with it: it doesn't cause any validation to happen, nor does it specify the unit of the width (most likely "number of characters").
4.2.1.5 Document and document type association, how changes to document types are handled
Upon creation of a document, a document type must be supplied. When saving a document, the repository will check that the document conforms to its document type. Thus it will check that all required fields and parts are present, and that there are no parts and fields in the document that are not allowed by the document type.
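The conformance check described above amounts to two set comparisons, which the following sketch shows for fields (parts would be checked in the same way). The class ConformanceCheck and its data model, a map from field type name to its "required" flag and a set of field names that have a value in the document, are assumptions made for this example and not the repository's internal API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the check; not the repository's implementation. */
class ConformanceCheck {
    /**
     * typeFields maps each field type name used by the document type to
     * its "required" flag. documentFields holds the names of the fields
     * that have a value in the document. Returns the list of problems
     * found, which is empty when the document conforms.
     */
    static List<String> check(Map<String, Boolean> typeFields, Set<String> documentFields) {
        List<String> problems = new ArrayList<String>();
        // Every required field must be present in the document.
        for (Map.Entry<String, Boolean> entry : typeFields.entrySet()) {
            if (entry.getValue() && !documentFields.contains(entry.getKey())) {
                problems.add("missing required field: " + entry.getKey());
            }
        }
        // The document may not contain fields the document type does not allow.
        for (String name : documentFields) {
            if (!typeFields.containsKey(name)) {
                problems.add("field not allowed by document type: " + name);
            }
        }
        return problems;
    }
}
```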
The document type of a document can be changed at any time. This is useful if you start out with a generic document type but later want to switch to a more specialized document type.
The definition of a document type can be changed at any time. Part and field types can be added to or removed from it, or can be made required. A logical question is what happens to existing documents in the repository that use the changed document type. The answer is basically "nothing". If, for example, a required field is added to a document type, then the next time a document of that type is edited, it will fail to save unless a value for the field is specified. The newly saved version of the document will then conform to the new state of the document type. Older versions of the document remain unchanged, however. When saving a document, it is also possible to supply an option that skips the document type conformance check.
So basically the document type system doesn't give any guarantees about the structure of the documents in the repository, but rather hints at how the documents should be structured and interpreted.
See also the FAQ entry How do I change the document type of a set of documents?