<LML /> 2.0 - Archive Markup Module

specification

Created: 2015-05 2017-09-09

Author: Dr. O. Hoffmann (German web-page)

This module belongs to the Literature Markup Language Version 2.0

Archive Structure and Structure of Document Collections

The main idea is to provide semantic information about the structure of an archive or work. Such a description contains only structural information and references to content documents, which may be literature documents or other formats such as raster images or SVG.
This should be simpler than the related files in EPUB, while the semantic information is more detailed and sophisticated.

Literature is often more than a single document that embeds a few graphics. Media like digital books typically consist of more than one document. Therefore there is a need to describe the relations between the contained documents and the structure of the collection. A document collection here means a local group of documents of finite and known number, which can be put into an archive of type tar or zip or into a subfilesystem. The contained documents may reference content outside of this group as well, but such external resources are not considered part of the collection. Therefore within such a group or archive there are only relative path references to other files of the collection, or absolute path references to external content that is not part of the collection.
Typically the content of media like digital books is split into different documents, each containing a fraction of the content with a separate function and semantic meaning. Only this needs specific markup. There is no urgent need to describe fragments of documents by referencing fragment identifiers with the syntax defined in this section. However, another purpose of this approach is to construct a table of contents or a navigation structure for the complete archive with direct access to all major structures. With additional fragment identifiers this is possible for navigation within larger documents of the archive as well, not only for complete documents.
It is expected, however, that there is already sufficient markup to indicate semantics within a single document. This element collection is only intended to indicate the meaning and relation of each document of such a book-like collection or archive and to provide a navigation structure.
LML version 1.0 already has this capability within the meta element. The alternative approach here follows the tradition of providing navigation directly as content information, not as separate meta information.

To ensure automatic access to an archive and automatic detection of such an archive type, the following name structure should be used: *.lmla.?
? represents the typical archive indication like 'zip' or 'tar' or 'tgz' or 'tar.gz'
* represents the individual name part of the archive, such as 'DrOHoffmann-MyBookAboutEverything'
Examples: 'DrOHoffmann-MyBookAboutEverything.lmla.zip', 'MyBook.lmla.tar', 'AboutEverything-Hoffmann.lmla.tgz'
To indicate that a subfilesystem is of such a type, the name of the parent directory should have the following structure: *.lmla
* represents the individual name part of the archive, such as 'DrOHoffmann-MyBookAboutEverything'
Example: 'DrOHoffmann-MyBookAboutEverything.lmla'
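
The naming convention above can be checked mechanically. The following is only a minimal sketch in Python: the function name classify_lmla_name is hypothetical, and the list of archive suffixes is limited to the four mentioned in the text, although the text introduces them only as typical examples.

```python
import re

# Suffixes named in the text; treating this list as complete is an assumption.
ARCHIVE_SUFFIXES = ("zip", "tar", "tgz", "tar.gz")

_ARCHIVE_NAME = re.compile(
    r"^(?P<name>.+)\.lmla\.(?P<kind>"
    + "|".join(re.escape(suffix) for suffix in ARCHIVE_SUFFIXES)
    + r")$"
)
_DIRECTORY_NAME = re.compile(r"^(?P<name>.+)\.lmla$")


def classify_lmla_name(filename):
    """Return ('archive', name, kind), ('directory', name, None) or None."""
    match = _ARCHIVE_NAME.match(filename)
    if match:
        return ("archive", match.group("name"), match.group("kind"))
    match = _DIRECTORY_NAME.match(filename)
    if match:
        return ("directory", match.group("name"), None)
    return None  # the name does not follow the *.lmla or *.lmla.? structure


print(classify_lmla_name("MyBook.lmla.tar"))
print(classify_lmla_name("DrOHoffmann-MyBookAboutEverything.lmla"))
```

A name such as 'MyBook.zip' yields None here, so a viewer could fall back to ordinary archive handling.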

Within an archive it is recommended to name the root document 'archive.xml' (or 'archive.lml' if there is a media type available for LML related to such a file extension). To ensure automatic access to the content of an archive, the core archive document must have this name. Such a root document has a different function than normal literature documents of LML. Instead of marking up the content of a work directly, this type of document mainly provides directly displayed meta information about the collection or archive. Therefore it has a different root element and a different content model than normal documents with literature as root element. Within a normal literature root element one can provide information about other documents as well, but the approach for archives and collections focuses specifically on describing the local archive as an independent solitary digital object. The root element for such a description of the structure of a collection is the element archive.
Elements belonging to the content model of the archive collection belong to the type 'Archive'. Some elements not belonging to the type 'Archive' can appear as well. In particular, the element meta is available as in the normal collection of elements; its content is therefore normal content, not special archive collection content.

The root element archive and subelements have a content model, described here in short notation with the following symbols and meaning:

a
there has to be exactly one expression a
a?
there has to be none or exactly one a
a+
there has to be one or more a
a*
a is optional and repeatable, this means zero or more a, as many as you like
a, b
sequential arrangement of elements, first a then b
a|b
either a or b
a, (b|c)
parentheses indicate a group: first evaluate the result within the parentheses, then go on with the expression around it. Here this means that after a there has to be either b or c.
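
Because this notation is close to regular expressions, a content model can be tested against a list of child element names with very little code. The following Python sketch is an illustration only, not part of LML; the helper names content_model_to_regex and children_match are hypothetical, and the special '#' placeholder used for rendition is not handled.

```python
import re


def content_model_to_regex(model):
    """Translate the short notation into a compiled regular expression."""
    parts = []
    for token in re.findall(r"[A-Za-z]+|[()|?+*,]", model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token == "(":
            parts.append("(?:")  # non-capturing group
        elif token[0].isalpha():
            # every element name is followed by one space in the test string
            parts.append("(?:" + token + " )")
        else:
            parts.append(token)  # ')', '|', '?', '+', '*' keep their meaning
    return re.compile("".join(parts))


def children_match(model, children):
    """Check a list of child element names against a content model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None


anthology = "alternative | (( work | article), (work | article)+)"
print(children_match("a, (b|c)", ["a", "c"]))          # True
print(children_match(anthology, ["work"]))             # False, two are needed
print(children_match(anthology, ["work", "article"]))  # True
```

Appending a space to every name keeps names such as section and subsection from being confused with each other.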

archive

Description

Type: Archive

Root element for the archive collection, describing the structure of a document collection.

Content Model

title?, meta, manifest?, frontmatter?, (alternative | collection | series | book | anthology | encyclopaedia | dictionary | article), backmatter?

Attributes

It is recommended to set the attribute xml:lang on the root element, if the complete document is in one language, respectively to set it properly on subelements, if more than one language is used.

As for the other root element literature, it is recommended to use the attribute version. Because archive is not available in version 1.0, the value is "2.0" for version 2.0.

media

Description

Type: Attribute

The value of the attribute media is the default media type for referenced documents. The attribute can be used with the same meaning for all elements of the Archive type. If an element like item or start directly references a document, the value applies directly; otherwise the default media type is inherited by the children. As long as there is no registered media type for LML, "application/x-lml+xml" is used.
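
The inheritance rule can be illustrated with a short recursive walk. This is only a sketch in Python using the standard xml.etree.ElementTree module; the function name referenced_media and the sample document are hypothetical, and namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def referenced_media(element, inherited=None):
    """Collect (link, media type) pairs, passing media down to children."""
    # an own media attribute overrides the value inherited from the parent
    media = element.get("media", inherited)
    found = []
    if element.get("link") is not None:
        found.append((element.get("link"), media))
    for child in element:
        found.extend(referenced_media(child, media))
    return found


# Hypothetical sample, reduced to what the sketch needs.
SAMPLE = """
<archive version="2.0" media="application/x-lml+xml">
  <meta/>
  <book>
    <start link="c1.xml">One</start>
    <start link="cover.svg" media="image/svg+xml">Cover</start>
  </book>
</archive>
"""

for link, media in referenced_media(ET.fromstring(SAMPLE)):
    print(link, media)
```

If no ancestor sets media, the sketch reports None for the media type and leaves the decision to the caller.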

numbering

Description

Type: Attribute

Indicates the default numbering method for titles within derived navigation list items.
Navigation list items are all subelements of a child element of an archive except meta and manifest.
Value 'no' (default) - no automatic numbering, if list items are not empty.
Value 'nt' numbering, first number then text from the content of the list item, typically useful if the content of the ah provides information about the content of the structure, for example 'Writing Novels'; this results in something like '8.2. Writing Novels'
Value 'tn' numbering, first text from the content of the list item then number, typically useful if the content of the list item provides only structural information like 'Chapter' or 'Section'; this results for example in 'Chapter 4.' or 'Section 7.3.'
If a list item is empty, numbering is always applied and is the only heading information.

The numbering has to take nesting into account automatically, that is, numbers follow the order of subelements and their nesting. For example, if a capitulum has number '4.', a child section can have number '4.3' and a child subsection '4.3.7', and so on.

The numbering method is inherited by the children. The attribute can be used with the same meaning for all elements of the Archive type except manifest and children.
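
How the three values combine a nested position with the heading text can be sketched as follows. This Python function is an illustration only; the name format_heading is hypothetical, and the trailing dot after the number follows the examples '8.2. Writing Novels' and 'Section 7.3.' given above, which is an assumption about the intended output format.

```python
def format_heading(method, position, text=""):
    """Build a navigation title from a nested position such as (4, 3, 7)."""
    number = ".".join(str(n) for n in position) + "."
    text = text.strip()
    if not text:
        return number  # empty list item: the number is the only heading
    if method == "nt":
        return number + " " + text  # first number, then text
    if method == "tn":
        return text + " " + number  # first text, then number
    return text  # 'no' (default): no automatic numbering


print(format_heading("nt", (8, 2), "Writing Novels"))  # 8.2. Writing Novels
print(format_heading("tn", (7, 3), "Section"))         # Section 7.3.
print(format_heading("no", (4, 3)))                    # 4.3.
```

The position tuple is expected to come from counting list items per nesting level while walking the archive document.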

manifest

Description

Type: Archive

Within the manifest one can provide a complete list of the documents belonging to the archive. This can help to test whether all content is available, or whether there is more content in the archive than mentioned in the manifest. This element is optional in archive; if it is not provided, creators of the archive should take care that nothing is missing and that all documents in the archive are accessible to the audience somehow. There is no need to mention the archive document itself, in which the manifest is noted.
manifest is not a table of contents, and the order of the documents referenced in it is arbitrary. Viewers might use it to indicate on demand which documents the archive contains; it can be used as well to check the completeness of the archive.
Comparable to the element no, it is not directly presented within the normal flow.

Content Model

item+
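
The completeness test mentioned in the description can be sketched in a few lines. The following Python code is an assumption-laden illustration: the function names, the sample document and the file list are hypothetical, elements are matched by local name only, and the alternative XLink and XHTML linking approaches are not covered.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def manifest_links(archive_xml):
    """Collect the link values of all item children of manifest elements."""
    links = set()
    for element in ET.fromstring(archive_xml).iter():
        if local_name(element.tag) != "manifest":
            continue
        for item in element:
            if local_name(item.tag) == "item" and item.get("link"):
                links.add(item.get("link"))
    return links


def check_manifest(archive_xml, present_files, root_document="archive.xml"):
    """Return (missing, unlisted) as sorted lists of relative paths."""
    listed = manifest_links(archive_xml)
    # the archive document itself does not need to be listed
    present = set(present_files) - {root_document}
    return sorted(listed - present), sorted(present - listed)


SAMPLE = """
<archive version="2.0">
  <meta/>
  <manifest>
    <item xml:id="c1" link="text/c1.xml"/>
    <item link="img/cover.svg"/>
  </manifest>
  <book><start link="#c1">One</start></book>
</archive>
"""

print(check_manifest(SAMPLE, ["archive.xml", "text/c1.xml", "notes.txt"]))
```

The first list names documents that the manifest promises but the archive lacks; the second names files present in the archive but not mentioned in the manifest.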

Relations

item

Description

Type: Archive

item references a document within the manifest. Each item needs an identifier noted in the attribute xml:id, if this item is referenced by other content within the document.
The attribute link is required. The attribute media is recommended, if a media type is available for the referenced document.

Content Model

Empty (or alternative approach, see below)

link

Description

Type: Attribute

The value of the attribute link is a relative path to a content document of the archive collection.

Alternative approach 1: Attributes for simple links from XLink can be used as well, especially the attribute href.

Alternative approach 2: A content of an item element can be an element a from XHTML as well.

Relations

collection

Description

Type: Archive

Indicates, that the archive is a general collection of different works without explicitly indicated relations.

Content Model

(alternative | book | work | module | part | article)+

series

Description

Type: Archive

Indicates, that the archive contains a series of different works with some order and relation to each other.

Content Model

(alternative | book | work | part | volume | module | article)+

anthology

Description

Type: Archive

Indicates, that the archive contains an anthology of different works with some order and relation to each other. The works have a common theme.

Content Model

alternative | ((work | article), (work | article)+)

An anthology contains more than one work or article, therefore there have to be at least two of them as content.

volume

Description

Type: Archive

A series often has different volumes. The element volume provides such a separation mechanism.

Content Model

ah?, (alternative | book | article | work | part | division | capitulum | section | start | guide | fragment)+

module

Description

Type: Archive

A work sometimes is split in modules with related, but different themes. The element module provides such a separation mechanism.

Content Model

ah?, (alternative | article | work | part | division | capitulum | section | start | guide | fragment )+

part

Description

Type: Archive

Sometimes book like objects have more than one part to separate different issues or related, but different themes. The element part provides such a separation mechanism.

Content Model

ah?, (alternative | article | work | division | capitulum | section | start | guide | fragment )+

work

Description

Type: Archive

Indicates an independent solitary digital work within a collection, series, anthology or book. Typically such collections contain different works from different authors. Therefore one can separate such works clearly from each other.

Content Model

meta?, ah?, (alternative | division | capitulum | section | start | guide | fragment )+

book

Description

Type: Archive

Indicates one complete digital book as content. If this appears directly as a child of the archive, no other content is expected in the archive. In such a case, the subelement meta is still allowed, but it is recommended to use the meta element of the root element itself. The same applies to the subelements ah, frontmatter and backmatter.

Content Model

meta?, ah?, frontmatter?, (alternative | part | module | work | article | division | capitulum | section | start | guide | fragment )+, backmatter?

article

Description

Type: Archive

article is typically an independent solitary digital work with a simpler substructure than a book. If this appears directly as a child of the archive, no other content is expected in the archive. In such a case, the subelement meta is still allowed, but it is recommended to use the meta element of the root element itself. The same applies to the subelements ah, frontmatter and backmatter.

Content Model

meta?, ah?, frontmatter?, (alternative | capitulum | section | start | guide | fragment )+, backmatter?

encyclopaedia

Description

Type: Archive

encyclopaedia is for encyclopaedic content. Typically such content consists of smaller articles without a specific order. This is non-linear content with only minimal further structure. The non-linearity is not essential, however: at least in classical books the content is ordered somehow, for example alphabetically.

Content Model

(alternative | part | article | division | start | guide | fragment )+

dictionary

Description

Type: Archive

dictionary is for dictionary content, for example to translate words from one language to another. Typically such content consists of smaller articles without a specific order. This is non-linear content with only minimal further structure. The non-linearity is not essential, however: at least in classical books the content is ordered somehow, for example alphabetically.

Content Model

(alternative | part | article | division | start | guide | fragment )+

division

Description

Type: Archive

A major structural division of a work, if one level of abstraction above capitulum is required.

Content Model

ah?, (alternative | capitulum | section | start | guide | fragment)+

capitulum

Description

Type: Archive

Indicates a chapter.

Content Model

ah?, (alternative | section | start | guide | fragment)+

section

Description

Type: Archive

Indicates a section.

Content Model

ah?, (alternative | subsection | start | guide | fragment)+

subsection

Description

Type: Archive

Indicates a subsection.

Content Model

ah?, (alternative | subsection | start | guide | fragment)+

fragment

Description

Type: Archive

fragments provide no semantically relevant information. They should only be used if no other fragmentation according to semantic structures like chapters or sections is available. Typically one will not use single sentences or paragraphs as structures on a document level. But if a work has no further substructure and nevertheless has content of several hundred megabytes, one may want to split it for technical reasons into two or more documents, although this splitting has no meaning for the content itself.
To indicate such a technical splitting, fragment can be used. It should not be used, however, to reflect the arbitrary splitting into pages of printed books; for digital works this is no useful splitting at all. If such a fragmentation is done, splitting within available structures like paragraphs or sentences should be avoided; every fragment should contain only complete substructures like paragraphs or sentences. Because there is no larger structure, the elements within such a fragment document are typically on the same abstraction level, for example only paragraphs as children of the element literature.

Content Model

ah?, (alternative | start | guide | fragment)+

ah

Description

Type: Archive (Block)

This represents a heading or title of a substructure like a capitulum, section, subsection or corresponding structures referencing content documents. These elements can be used to generate a table of content or a structure model of the archive.
Such a title can be narrative, but it can be a numbered order as well. For automatic numbering according to the nesting, one can add the attribute n. If only this numbering is required, the element can be empty as well.

Content Model

Interpreted Text only.

guide

Description

Type: Archive

guide provides a list of starting points into a work or a part of it. This is especially useful for non-linear works with their own navigation mechanism. It is useful as well for a simple selection between alternatives. The list items are alternative starting points for reading, and the order of the items expresses no preference. A more advanced model for alternatives is provided with the element alternative.

Content Model

ah?, start+

Relations

start

Description

Type: Archive

start indicates an item to start reading. Non linear works may have only one start and an internal navigation. For linear works it is expected, that every content document is referenced with an element start. It is possible as well to reference fragments of documents to get them in the navigation structure.
The content of the element is a heading, with the same meaning as ah.

Another use case for start arises if there is some content before other substructures, for example capitulum content before the first section. In such a case one may have the first content of the capitulum in one document, but the sections in other documents. Here start is useful to reference the beginning of the capitulum.
For a similar reason one may have some content of the structure after the last substructure, for example within a capitulum after the last section within another element. For this one can use start as well.
Even more exotic is additional content between substructures, for example text on the capitulum level between two sections.
Whether it is useful to have such content on the same level as the present substructures is not discussed here, but if required one can arrange it with this element as well.

Content Model

Interpreted text or inline elements to note a heading for the referenced structure. If empty, the parent element has to provide text or numbering with an element ah.

link

Description

Type: Attribute

The value of the attribute link can either reference the fragment identifier of the item referencing the document with the chapter, if a manifest is provided or it can reference with a relative path a content document of the archive collection with the chapter directly.

Alternative approach 1: Attributes for simple links from XLink can be used as well, especially the attribute href.

Alternative approach 2: A content of a start element can be an element a from XHTML as well.

Both alternative approaches can provide navigation and hyperlinking functionality, even if LML itself is not interpreted, but XLink, respectively XHTML is. With additional styling with CSS, this can be already sufficient to provide a navigation document for the archive content.
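
The two forms of the attribute link on start can be resolved to document paths as sketched below. This Python code is only an illustration under stated assumptions: the function name resolve_start_links and the sample document are hypothetical, a reference to a manifest item is assumed to be written as '#' followed by the xml:id of the item, and the XLink and XHTML alternatives are not handled.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml prefix to this namespace name
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def resolve_start_links(archive_xml):
    """Return the resolved target of every start element, in document order."""
    root = ET.fromstring(archive_xml)
    items = {
        element.get(XML_ID): element.get("link")
        for element in root.iter()
        if local_name(element.tag) == "item" and element.get(XML_ID)
    }
    resolved = []
    for element in root.iter():
        if local_name(element.tag) != "start":
            continue
        link = element.get("link", "")
        if link.startswith("#"):
            link = items.get(link[1:])  # None if no item has this identifier
        resolved.append(link)
    return resolved


SAMPLE = """
<archive version="2.0">
  <meta/>
  <manifest><item xml:id="c1" link="text/c1.xml"/></manifest>
  <book>
    <start link="#c1">One</start>
    <start link="text/c2.xml#s3">Two, third section</start>
  </book>
</archive>
"""

print(resolve_start_links(SAMPLE))
```

A relative path, with or without a fragment identifier of the target document, is passed through unchanged.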

Relations

frontmatter

Description

Type: Archive

frontmatter is related to content or meta information, in classical printed books typically provided before the normal content of the book. But this can be the preface or foreword of a book as well or a dedication and so on.

Content Model

ah?, (alternative | matter)+

backmatter

Description

Type: Archive

backmatter is related to content or meta information, in classical printed books typically provided after the normal content of the book. But this can be the appendix or afterword of a book as well or a registry and so on.

Content Model

ah?, (alternative | matter)+

matter

Description

Type: Archive

matter references content for frontmatter, respectively backmatter.

Content Model

ah, (alternative | section | start | guide | fragment)+

type

Description

Type: Attribute

The value of the attribute type indicates the type of matter. Usually each type appears only once within one work. Due to the long history and the different traditions of the written word, several of the values cover similar or the same purposes, so it is often only a matter of taste what to call it. Therefore there are only attribute values here, no extra elements for frontmatter and backmatter.

Only for frontmatter it is one of the following strings:

front
intended as the front(-cover), title-page of the work (can be both an image or a text document)
incipit
'beginning of a work', initial statement, summary
frontispiece
frontispiece or facsimile - some initial graphical representation, decorative illustration
logo, illustration representing the work, publishers, authors etc
introduction
introduction, purpose and goals of a work
preamble
preamble, introductory and expressionary statement in a document about purpose and underlying philosophy of a work
prolog
prolog(ue), background of a story, content of a work, not part of a work itself
preface
preface, explains motivation, reason to write the work, origins of the idea
foreword
foreword, explains for example different editions or changes in a work, in general comments about a work
dedication
dedication, usually by the author(s)
epigraph
a specific type of quotation, phrase, poem or inscription at the beginning of a work
abstract
abstract

Only for backmatter it is one of the following strings:

back
intended as the back(-cover) of the work (can be both an image or a text document)
postscriptum
postscriptum, some additional statement, often used in letters
afterword
afterword, explains for example different editions or changes in the work, in general comments about the work
epilog
epilog(ue), closing statement about what happened after the time range detailed in the book
bibliography
bibliography, citation of other works, used to realise this work or are assumed to be relevant else for the theme of the work
glossary
glossary, set of definitions of words of importance to a work
appendix
appendix, addendum, supplemental material
registry
registry of items discussed in a work
listoftables
listing of tables in a work
listoffigures
listing of figures in a work
colophon
colophon, metadata about work and author, production notes at the end of a book
extroduction
extro, opposite of introduction, final statement
explicit
'end of a work', final statement, colophon

Possible for both frontmatter and backmatter:

tableofcontents
list/table of contents
index
index of words or notions explained in the book
impressum
imprint, contact information to the persons or organisations responsible for the work
inscription
inscription, a specific type of quotation, phrase, poem
metadata
metadata, colophon about the work
license
license conditions and copyrights
sources
source references of a work or parts of it, for example authors of supplemental content like graphics, maybe including license conditions for such supplemental content as well
figure
figures supplemental for the complete work including tipped-in pages, maps etc
table
tables supplemental for the complete work
syntax
specification for specific syntax used in the work
notation
specification for specific notation used in the work
blurb
blurb, teaser, jacket text for a work
biography
biographic data about authors or in general contributors of the work
acknowledgment
acknowledgments for contribution to the work
errata
errata, corrigenda, list of known corrections for a work
notice
notice, note
annotation
marginalia, notices, comments about a work or parts of it, typically referencing the discussed fragments explicitly, here collected in a separate container, not within the content of the work (in digital works instead of footnotes)
affiliation
a person's present or past affiliation with some organisation, for example an employer or sponsor
[*]
own safe CURIE *
(*)
own string * (only in case of need and no CURIE can be defined, cannot have a defined semantic meaning)
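
The three value groups above can be turned into a simple check of whether a type value is allowed for a matter within frontmatter or backmatter. This Python sketch is an illustration only: the function name type_allowed is hypothetical, the value sets are copied from the lists above, and safe CURIEs and own strings are recognised merely by their surrounding brackets without validating their content.

```python
FRONT_ONLY = {
    "front", "incipit", "frontispiece", "introduction", "preamble", "prolog",
    "preface", "foreword", "dedication", "epigraph", "abstract",
}
BACK_ONLY = {
    "back", "postscriptum", "afterword", "epilog", "bibliography", "glossary",
    "appendix", "registry", "listoftables", "listoffigures", "colophon",
    "extroduction", "explicit",
}
BOTH = {
    "tableofcontents", "index", "impressum", "inscription", "metadata",
    "license", "sources", "figure", "table", "syntax", "notation", "blurb",
    "biography", "acknowledgment", "errata", "notice", "annotation",
    "affiliation",
}


def type_allowed(type_value, parent):
    """Check a type value for a matter inside 'frontmatter' or 'backmatter'."""
    if parent not in ("frontmatter", "backmatter"):
        return False
    # '[...]' is an own safe CURIE, '(...)' an own string; both need content
    if len(type_value) > 2 and type_value[0] + type_value[-1] in ("[]", "()"):
        return True
    if type_value in BOTH:
        return True
    if parent == "frontmatter":
        return type_value in FRONT_ONLY
    return type_value in BACK_ONLY


print(type_allowed("preface", "frontmatter"))  # True
print(type_allowed("preface", "backmatter"))   # False
print(type_allowed("[ex:recipe]", "backmatter"))  # True
```

The prefix ex in the last call is a made-up example of an own CURIE.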

alternative

Description

Type: Archive

Non-linear works, or works with different versions, readings or languages, need a switch between different alternatives or renditions. This element provides such a switch. The reason for switching is given by the attribute alt. The content is intended as a list of alternatives in arbitrary order, from which one item is selected, as opposed to an ordered list that is read completely.
Often only one switch with alternative per parent structure is meaningful, but more than one is not excluded.

Content Model

meta?, ah?, rendition, rendition+

alt

Description

Type: Attribute

The value of the attribute alt is a string for the switching condition.
Available values are:

lang
language, use the attribute xml:lang on the child elements rendition to switch. A viewer may have an option to allow automatic selection according to the preferred language of the user; the default behaviour is manual selection, however. This applies as well if no language alternative fits the preferences.
media
core media selection, use the attribute media on the child elements rendition to switch. A viewer may have an option to allow automatic selection according to the preferred core media of the user or the media interpreted by the viewer; the default behaviour is manual selection, however. This applies as well if no media alternative fits the preferences.
meta
the switching condition is detailed in the elements meta, which in this case are required both for the alternative and for the rendition child elements
heading
the switching condition is detailed in the elements ah, which in this case are required both for the alternative and for the rendition child elements
Attribute not noted, empty value or unknown value
the switch aligns to the concept of guide, this means ah or meta of the rendition or its content has to be explored, to get some information about the alternative.
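
The optional automatic selection for the values lang and media could look like the following Python sketch. It is an assumption, not prescribed behaviour: the function name auto_select is hypothetical, renditions are represented as plain dictionaries of their attributes, and only exact matches are considered. Returning None stands for the fallback to manual selection.

```python
def auto_select(alt, renditions, preferences):
    """Pick a rendition automatically or return None for manual selection."""
    # the attribute on rendition that carries the switching condition
    key = {"lang": "xml:lang", "media": "media"}.get(alt)
    if key is None:
        return None  # meta, heading, missing or unknown alt: ask the user
    for wanted in preferences:  # preferences are ordered, best first
        for rendition in renditions:
            if rendition.get(key) == wanted:
                return rendition
    return None  # no alternative fits the preferences


# Hypothetical renditions of one work in two languages.
renditions = [
    {"xml:lang": "de", "link": "text/de.xml"},
    {"xml:lang": "en", "link": "text/en.xml"},
]
print(auto_select("lang", renditions, ["fr", "en"]))
print(auto_select("lang", renditions, ["fr"]))
```

A real viewer would probably also match language ranges such as 'en' against 'en-GB', which this sketch leaves out.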

rendition

Description

Type: Archive

An alternative rendition of a work or fragment.

Content Model

meta?, ah?, #

# The content is a list of elements of the same type, whatever the parent element of the parent alternative allows for itself, except the already enabled meta and ah. That is, there must be no more than one meta and one ah per rendition.

Archive examples