Sustainability of Digital Formats: Planning for Library of Congress Collections


EPUB (Electronic Publication) Version 3 Preservation. ISO/IEC TS 22424:2020

Format Description Properties

Identification and description

Full name EPUB (Electronic Publication) Version 3 Preservation. ISO/IEC TS 22424:2020, Digital publishing -- EPUB 3 Preservation, Parts 1-2
Description

ISO/IEC TS 22424:2020, for EPUB3 Preservation, was published as two Technical Specifications in January 2020, titled "Principles" and "Metadata Requirements." The purpose of the specifications is to make it easier for producers and OAIS archives to preserve access to EPUB documents. ISO/IEC TS 22424-2:2020 provides a technical basis for meeting the principles listed in the first document by specifying the metadata required for long-term preservation, and a method for packaging this metadata with the original EPUB container using METS (Metadata Encoding & Transmission Standard) and the PREMIS Data Dictionary for Preservation Metadata.

These documents were prepared by ISO/IEC JTC 1/SC 34/JWG 7 and not by a group that is part of Publishing@W3C. The W3C EPUB 3 Community Group, which is part of Publishing@W3C, published the EPUB 3.2 specification in May 2019. See Notes below for more information on the activities at W3C relevant to EPUB and Disclosure for more information on JWG 7. The introduction to Part 1: Principles states, "The specification at hand covers EPUB 3 versions up to EPUB 3.0.1. EPUB 3.1 was the first major revision of EPUB 3.0.1, but there are no implementations of version 3.1 and therefore it is not covered in this document. The most widely used version of the standard is still 3.0.1. EPUB 3.2, was published in May 2019. Unlike 3.1, it is fully backwards compatible with 3.0.1. It will be covered in the next edition of this document."

The two documents do not specify particular digital formats. Instead, they recommend constraints on EPUB 3 files in light of the challenges of digital preservation. They also make recommendations for Submission Information Packages (SIPs, as defined in the OAIS Reference Model) and for the metadata to be included in the METS file. METS is the packaging standard recommended by ISO/IEC TS 22424-1:2020.
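
As a rough illustration of the packaging idea, the sketch below builds a skeletal METS document that points at one EPUB file. It is not taken from ISO/IEC TS 22424-2:2020, which sets its own metadata requirements. The function name, the identifiers, and the USE and TYPE values are hypothetical, and the PREMIS sections are omitted entirely. Only the METS and XLink namespaces and the basic METS element names are standard.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets_wrapper(package_id: str, epub_href: str) -> str:
    """Return a skeletal METS document referencing one EPUB file.

    package_id and epub_href are illustrative values chosen by the caller.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": package_id})

    # fileSec lists the files in the package; here, the untouched EPUB container.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "original"})
    file_el = ET.SubElement(
        file_grp,
        f"{{{METS_NS}}}file",
        {"ID": "file-epub-1", "MIMETYPE": "application/epub+zip"},
    )
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": epub_href},
    )

    # METS requires a structMap; a single div pointing at the file is enough here.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "publication"})
    ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": "file-epub-1"})

    return ET.tostring(mets, encoding="unicode")
```

A real SIP would add descriptive and PREMIS preservation metadata sections as called for by the specification and by the receiving archive's submission agreement.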

ISO/IEC TS 22424-1:2020 (Principles) includes the following requirements (indicated by "SHALL") and recommendations (indicated by "SHOULD") for EPUB files. Notes enclosed in [ ] have been added by the compilers of this resource.

  • Submitted EPUB publications SHALL be conformant with EPUB requirements and conformance SHOULD be validated. [Note: This statement refers to Clause 3: Package Documents from the EPUB Publications 3.0.1 specification.]
    Listed separately are the following requirements and recommendations emphasizing compliance with EPUB specifications:
    • For each rendition of the EPUB content document, there SHALL be a manifest file, which identifies and describes a set of resources that collectively compose a given rendition of a document, and EPUB spine, which provides a default reading order for a given rendition. [Note: Required in the EPUB Publications 3.0.1 specification]
    • The structure of each EPUB ZIP archive SHALL be described using the EPUB container.xml file (which describes the locations of root files of available renditions of the EPUB publication, and the rendition’s package document and navigation document). [Note: Consistent with requirements in the EPUB Open Container Format (OCF) 3.0.1 and the EPUB Publications 3.0.1 specifications both of which govern any EPUB file.]
    • EPUB Package document and navigation document SHALL contain all metadata needed for rendering the publication, including the recommended reading system. [Note: The only obvious addition to the requirements in the EPUB Publications 3.0.1 specification is the need to identify a recommended reading system. This could be done using the <meta> element provided for in the EPUB Publications 3.0.1 specification.]
    • The minimum required descriptive metadata for EPUB publications are title, identifier, and language from the Dublin Core Metadata Element Set. Each rendition of a publication SHOULD also have at least the modified date property from DCMI Metadata Terms. Each rendition SHOULD also have the publication date encoded as DCMI date, if the publication date is required to distinguish between publications. [Note: The recommendations beyond the EPUB requirements relate to identification of intended replacement submissions or different editions.]
  • Requirements and recommendations related to possible external dependencies:
    • Submitted EPUB publications SHALL either contain or at least facilitate access to all the data and metadata required to render the content information successfully. [Note: This is a primary objective of the EPUB family of formats.]
    • Fonts SHALL be embedded into the EPUB publication in full and un-obfuscated, if permitted by the font license.
    • Remotely-hosted resources SHOULD be avoided.
    • Related resources such as audio and video SHOULD be embedded in the EPUB publication.
    • If an EPUB content document in a SIP contains scripting, the EPUB publication SHALL contain a fallback for the content in question. In the EPUB context scripting enables the use of JavaScript applications for e.g. image manipulation or enabling dynamic changes of the content. [Note: TS 22424-1 does not make this statement explicitly, but TS 22424-2 includes it in a summary of requirements from TS 22424-1. 2.4 Scripted Content Documents in the EPUB Content Documents 3.0 specification states that EPUB Content Documents may provide fallbacks for such content.]
    • Descriptive and other metadata SHOULD be embedded in the SIP. [Note: TS 22424:2020 does not require that such metadata be embedded in the EPUB file.]
  • Requirements and recommendations related to identifiers:
    • Identifiers SHALL be used in such a way that the OAIS archive will be able to link all versions of the publication and delete preview versions as appropriate.
    • International standard identifiers, such as ISBNs for books and DOIs for articles, SHALL be used as EPUB unique identifiers whenever possible.
    • It SHOULD be possible to express the identifiers (also) as actionable HTTP URIs. Use of persistent identifiers (Handles, DOIs, URNs, or ARKs) is recommended.
  • Constraints related to technical protection:
    • EPUB publications in SIPs SHOULD NOT be encrypted.
    • DRM protection, if any, SHOULD be removed before the document is submitted.
    • If data is compressed, the user [sic] of the compression method SHALL be specified using the Compression metadata element in the EPUB’s encryption.xml file. [Note: 3.2 ZIP File Requirements in the EPUB Open Container Format (OCF) 3.0.1 specification, requires that files in the ZIP archive be uncompressed or use Deflate compression. The compilers of this resource have not found a Compression element in a schema for the encryption.xml file. Comments welcome.]
  • If there is a foreign resource embedded or linked to [in] a submitted EPUB publication, a fallback chain ending in a core media type resource SHOULD be provided even if the foreign resource is in an archivable format. This requirement is stricter than those in the EPUB 3.x specifications, which require a fallback only in certain situations.
  • If there are multiple renditions of a work in an EPUB publication, requirements in the EPUB Multiple-Rendition Publications 1.0 specification SHALL be followed. Each rendition of an EPUB publication in a SIP SHALL have its own identifier.
  • EPUB Fragment Identifiers SHOULD not be used in EPUB publications sent to a repository system. [Note: Fragment identifiers are designed for use in references from outside an EPUB file to permit opening an EPUB Publication at a particular location. This recommendation presumably also covers fragment identifiers that point from within one EPUB Publication to a location in another EPUB Publication.]
  • Distributable objects (as defined in EPUB Distributable Objects 1.0 Draft Specification 23 July 2015) SHALL NOT be submitted individually. They MAY be embedded within EPUB publications. [Note: the draft specification for Distributable Objects defines a refinement to the EPUB 3.0.1 specification, based on a <collection> element in the EPUB Package Document with the role attribute of "distributable-object". An example of how this construct may be used is for an EPUB Package Document that represents an entire book, but includes markup and metadata that permits individual chapters to be extracted and transformed into separate EPUB Package Documents.]
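
A few of the points in the list above can be checked mechanically. The following is a minimal sketch, not a validator: it looks only for the container.xml file, a manifest and spine in each rendition's package document, the three minimum Dublin Core elements, the dcterms:modified property, and the presence of an encryption.xml file. The function name and the wording of the findings are hypothetical. Full conformance checking is the job of a dedicated validator such as EPUBCheck.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def check_epub_for_sip(epub_file) -> list:
    """Return a list of human-readable findings for one EPUB (path or file object)."""
    findings = []
    with zipfile.ZipFile(epub_file) as zf:
        names = set(zf.namelist())

        # encryption.xml may describe font obfuscation only, so its presence
        # is reported as something to review, not as a failure.
        if "META-INF/encryption.xml" in names:
            findings.append("encryption.xml present: check for encryption or DRM")

        if "META-INF/container.xml" not in names:
            return findings + ["container.xml missing"]
        container = ET.fromstring(zf.read("META-INF/container.xml"))

        # Each rootfile entry names the package document of one rendition.
        for rootfile in container.findall("c:rootfiles/c:rootfile", NS):
            opf_path = rootfile.get("full-path")
            if opf_path not in names:
                findings.append(f"{opf_path}: package document missing")
                continue
            package = ET.fromstring(zf.read(opf_path))
            for tag in ("manifest", "spine"):
                if package.find(f"opf:{tag}", NS) is None:
                    findings.append(f"{opf_path}: <{tag}> missing")
            for term in ("title", "identifier", "language"):
                if package.find(f"opf:metadata/dc:{term}", NS) is None:
                    findings.append(f"{opf_path}: dc:{term} missing")
            modified = [
                m for m in package.findall("opf:metadata/opf:meta", NS)
                if m.get("property") == "dcterms:modified"
            ]
            if not modified:
                findings.append(f"{opf_path}: dcterms:modified missing")
    return findings
```

An empty list means only that none of these particular checks fired. It says nothing about fonts, fallbacks, or identifiers, which need inspection of the content documents and of the submission agreement.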

TS 22424-2:2020 (Metadata requirements) provides requirements and recommendations for metadata associated with a SIP. Most of the recommendations are associated with the use of METS. The following requirements and recommendations relate to metadata in EPUB files. Notes enclosed in [ ] have been added by the compilers of this resource. There is some overlap with the list from TS 22424-1.

  • This document does not require copying of EPUB structural metadata to METS documents.
  • A publication identifier [e.g., ISBN] SHALL NOT be used as [a SIP] package identifier. A SIP can contain multiple EPUB publications; one EPUB publication can be submitted in multiple SIPs and even if a SIP contains just one publication it may be necessary to re-send the SIP with other package identifier. [Note: An example suggests that a publication identifier can be concatenated with a date and time of modification as an identifier for the SIP package using the @ sign as a separator. This is consistent with a recommendation in 4.1.2 Release Identifier in the EPUB Publications 3.0.1 specification.]
  • EPUB publications in SIPs SHOULD contain resources in file formats not suitable for preservation if and only if the same resource is also included in an acceptable file format using a fallback mechanism. [Note: TS 22424:2020 assumes that an archive specifies a list of acceptable formats as part of a general submission policy or specific submission agreement.]
  • The internal structure of an EPUB publication SHALL, according to the EPUB standard, be specified in an EPUB Navigation Document in both human and machine readable format. [Note: The specification for an EPUB Navigation Document is in 2.2 EPUB Navigation Documents in the EPUB Content Documents 3.0.1 specification.]
  • Each EPUB publication in a SIP SHALL contain the complete table of contents in the EPUB navigation document, covering all levels of the document hierarchy (see http://www.idpf.org/accessibility/guidelines/content/nav/toc.php). [Note: The cited document has been superseded. The link provided here is to an Internet Archive capture that appears to have the appropriate statement.]
  • The EPUB manifest file manifest.xml SHALL be compliant with the EPUB Open Container Format requirements.
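
The package-identifier note above describes joining a publication identifier and a modification timestamp with an @ sign. A minimal sketch of that construction follows. The function name is hypothetical, and the timestamp layout (UTC, CCYY-MM-DDThh:mm:ssZ) is assumed from the form used for the dcterms:modified property in EPUB 3 package documents.

```python
from datetime import datetime, timezone


def make_package_identifier(publication_id: str, modified: datetime) -> str:
    """Join a publication identifier and a modification time with '@'."""
    if modified.tzinfo is None:
        # A naive datetime cannot be converted to UTC reliably, so reject it.
        raise ValueError("modified must be timezone-aware")
    stamp = modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{publication_id}@{stamp}"
```

Because the timestamp changes whenever the publication is modified, re-sending a corrected SIP yields a new package identifier while the publication identifier itself stays the same.
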
Production phase An EPUB file is likely to be used primarily as a final-state format, for dissemination to end-users. It may also be used as a middle-state format for transfer between a publisher and an entity that makes eBooks available to end users.
Relationship to other formats
    Other EPUB_family, Electronic Publication (EPUB) File Format Family. These guidelines for preservation apply primarily to EPUB 3.0 and EPUB 3.0.1, but are also relevant to other versions of the EPUB File Format Family.
    Other EPUB_3_0_1, EPUB, Electronic Publication, Version 3.0.1 (2014). ISO/IEC 23736:2020. Recommends restrictions on EPUB 3.0.1 for use in a submission package to a preservation archive.
    Other EPUB_3_0, EPUB, Electronic Publication, Version 3.0 (2011). ISO/IEC TS 30135:2014. Recommends restrictions on EPUB 3.0 for use in a submission package to a preservation archive.

Local use

LC experience or existing holdings See EPUB_family for experience with EPUB documents.
LC preference See EPUB_family.

Sustainability factors

Disclosure

The EPUB3 Preservation specification has been published as a Technical Specification by ISO/IEC. See Deliverables: The different types of ISO publications, which states, "A Technical Specification addresses work still under technical development, or where it is believed that there will be a future, but not immediate, possibility of agreement on an International Standard."

The specification was developed under the auspices of a special joint working group (ISO/IEC JTC 1/SC 34/JWG 7). JWG 7 spans several ISO and IEC committees: JTC 1/SC 34 (Document description and processing languages), ISO TC 46/SC 4 (Technical interoperability), and IEC/TC 100/TA 10 (Multimedia e-publishing and e-book technologies).

    Documentation The technical specifications are available for purchase from ISO, national standards organizations, and commercial suppliers of standards. See Format Specifications below for links to the ISO catalog.
Adoption ISO/IEC TS 22424:2020 does not define a format, but provides guidelines intended to make it easier for producers and OAIS archives to preserve access to EPUB documents. The compilers of this resource have not attempted to assess the degree to which preservation archives have submission policies that follow these recommendations, either in general or specifically for content in the EPUB 3 formats. One institution whose practices are cited in the technical specifications is the National Digital Preservation Services of Finland, which has published specifications, including one for Metadata Requirements and Preparing Content for Digital Preservation, which is intended to apply to digital content in all acceptable formats, including EPUB. Comments welcome.
    Licensing and patents Not applicable directly. See EPUB_family.
Transparency Not applicable directly. See EPUB_family.
Self-documentation ISO/IEC TS 22424-2:2020 gives requirements and recommendations for metadata of various categories needed in an OAIS SIP: descriptive; structural; administrative; preservation; provenance; rights management. See also EPUB_family.
External dependencies

The EPUB 3.0 and 3.0.1 specifications permit audio and video resources to be stored remotely, but encourage the inclusion of such media files in the EPUB Container. See, for example, 5.3 Publication Resource Locations in the EPUB Publications 3.0 specification. EPUB 3.2, not formally covered by ISO/IEC TS 22424:2020, also allows fonts and resources retrieved by scripts to be located remotely. See 3.2 Resource Locations in the EPUB 3.2 specification.

ISO/IEC TS 22424:2020 makes a stronger recommendation, using the SHOULD term, for including all resources in the EPUB container.
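
One way to spot remotely hosted resources before submission is to scan the manifest of a package document for hrefs with a network scheme. The sketch below assumes that remote resources are declared in the manifest. The function name is hypothetical, and references that appear only inside content documents or scripts are not examined.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_remote_manifest_items(opf_xml: str) -> list:
    """Return hrefs of manifest items that point outside the EPUB container."""
    package = ET.fromstring(opf_xml)
    remote = []
    for item in package.findall("opf:manifest/opf:item", OPF_NS):
        href = item.get("href", "")
        # A manifest href with a network scheme refers to a remotely hosted resource.
        if urlparse(href).scheme in ("http", "https"):
            remote.append(href)
    return remote
```

Any href returned is a candidate for embedding in the container before the publication is placed in a SIP.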

Technical protection considerations An EPUB file that complies with the requirements and recommendations in ISO/IEC TS 22424:2020 will have no form of technical protection except obfuscation for any font with license terms that require it. The decryption process required to use the obfuscated fonts is described in the EPUB Open Container Format (OCF) specifications, e.g. at 4.2 Obfuscation Algorithm.
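
For orientation, the font obfuscation algorithm referred to above can be sketched in a few lines. This is a reading of the OCF algorithm, not a reference implementation: the key is the SHA-1 digest of the publication's unique identifier with whitespace removed, and the first 1040 bytes of the font file are XORed with that key, repeating it as needed. The function names are hypothetical. Compression of the font data and key construction for multiple-rendition publications are not handled.

```python
import hashlib


def obfuscation_key(unique_identifier: str) -> bytes:
    """Derive the 20-byte key: SHA-1 of the identifier with whitespace removed."""
    stripped = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(stripped.encode("utf-8")).digest()


def toggle_font_obfuscation(data: bytes, unique_identifier: str) -> bytes:
    """XOR the first 1040 bytes with the key; applying it twice restores the input."""
    key = obfuscation_key(unique_identifier)
    head = bytes(b ^ key[i % len(key)] for i, b in enumerate(data[:1040]))
    # Bytes beyond the first 1040 are left untouched.
    return head + data[1040:]
```

Because XOR is its own inverse, the same function both obfuscates and restores a font, which is why an archive holding the publication's unique identifier can always recover the original font data.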

Quality and functionality factors

Text
Normal rendering See EPUB_family.

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | See related format. | See EPUB_3_0_1

Notes

General

Activities at W3C related to EPUB specifications: W3C has an activity known as Publishing@W3C, which has a number of subgroups. Formed in early, the W3C EPUB 3 Community Group was chartered to carry out "ongoing technical development of EPUB 3 and related extension specifications and ancillary deliverable." This group published the EPUB 3.2 specification as a Final Report in May 2019. A blog post on March 18, 2020, Listen to the People: The Future of EPUB and New Directions for Publishing @ W3C, announced the formation of a new EPUB 3 Working Group to focus on making EPUB 3 a W3C Recommendation by June 2022 and the continuing maintenance of EPUB 3. Also part of Publishing@W3C is a Publishing Business Group, formed to foster ongoing participation by members of the publishing industry and overall publishing ecosystem in the development of the Web for publishing, and serve as a conduit for feedback between the publishing ecosystem and W3C.

History

The creation of the principles in ISO/IEC TS 22424-1:2020 was inspired by the draft common requirements published by the E-ARK Project. See Common Specification for Information Packages (CSIP) v .17 (December 2016). This document has since been updated under the auspices of the Digital Information LifeCycle Interoperability Standards Board. See Common Specification for Information Packages (CSIP).


Format specifications


Useful references

URLs


Last Updated: 05/12/2020