PREMIS (Preservation Metadata, Data Dictionary Maintenance Activity)

Tools for preservation metadata implementation

This document lists tools (e.g. software, scripts, stylesheets) that support the implementation of preservation metadata, particularly as defined in the PREMIS Data Dictionary. The tools listed were not necessarily developed specifically for PREMIS; some support the implementation of preservation metadata more generally, and each entry states its relationship to PREMIS. Each tool does one or more of the following:

  • Tools for extracting technical metadata from objects
  • Tools for converting extracted metadata into the PREMIS XML schema elements
  • Tools for generating a METS object with appropriate slots for PREMIS metadata (i.e., amdSec with digiprovMD, techMD, etc.)
  • Tools for converting JHOVE output to PREMIS elements
  • Tools for recording events and outcomes (e.g. format validation, fixity check, etc.)

Each listing states what the tool does, who developed it, and when and for what purpose it was developed. Please send additions, with the same information as the entries below, to the Network Development and MARC Standards Office: ndmso@loc.gov.

Archivematica (Artefactual Systems, Inc.)

Description: Archivematica is a free and open-source (AGPLv3) digital preservation system designed to maintain standards-based, long-term access to collections of digital objects. It is packaged with the web-based content management system ICA-AtoM, which provides access to the digital objects.

Archivematica uses a micro-services design pattern to provide an integrated suite of software tools that lets users process digital objects from ingest to access in compliance with the ISO-OAIS functional model. Users monitor and control the micro-services through a web-based dashboard. Archivematica uses METS, PREMIS, Dublin Core and other best-practice metadata standards. Its default format policies, kept in the Format Policy Registry (FPR), are based on an analysis of the significant characteristics of file formats. The FPR also offers an editable, flexible framework for format identification, package extraction and format normalization.

Availability: https://www.archivematica.org/

Tool URL: https://www.archivematica.org/

Documentation URL: https://www.archivematica.org/

Licensing: free and open-source (AGPLv3); source code https://github.com/artefactual/archivematica

Last update: March 24, 2014


Archivists' Toolkit (University of California, San Diego, New York University, and the Five Colleges, Inc.)

Description: The Archivists' Toolkit is an open-source archival data management system that provides integrated support for accessioning, description, donor tracking, name and subject authority work, and location management for archival materials. Its features include support for managing archival materials from acquisition through processing, a customizable interface, ingest of legacy data in multiple formats (e.g. EAD and MARCXML), a rapid data entry interface for creating container lists, report generation, export of EAD 2002, MARCXML, METS, MODS, and Dublin Core, and support for desktop or networked, single- or multi-repository installations.

Although not directly supporting PREMIS, the Archivists' Toolkit may be used to generate a METS file with descriptive, structural and rights metadata, which can then be enriched with technical metadata upon import into a repository.

Availability: The source code has not yet been made available generally. Contact info@archiviststoolkit for further information.

Documentation URL: http://www.archiviststoolkit.org/

Last update: July 25, 2007


DAITSS (Florida Center for Library Automation)

Description: DAITSS (Dark Archive in the Sunshine State) software is available under a GPLv3 license.

DAITSS is a digital preservation repository application developed by the Florida Center for Library Automation (FCLA) with some support from the Institute of Museum and Library Services (IMLS). DAITSS is used by the Florida Digital Archive, a long-term preservation repository service provided by FCLA for the libraries of the eleven publicly funded universities in Florida.

Although DAITSS first went into production in 2005, it was recently re-architected and rewritten to improve ease of implementation and maintenance, scalability, and extensibility.

DAITSS provides automated support for the functions of Submission, Ingest, Archival Storage, Access, Withdrawal, and Repository Management. It is architected as a set of RESTful Web Services and micro-services but enforces strict controls to ensure the integrity and authenticity of archived content. It implements active preservation strategies based on format-specific processing including, where necessary, normalization and forward migration. It is particularly well suited for materials in text, document, image, audio and video formats.

DAITSS was written for a multi-user environment and supports consortial as well as institutional preservation repositories.

For more information, see http://daitss.fcla.edu. The site provides access to the source code, an installation manual, and an operations manual. For those who want to experiment with DAITSS without a local installation, a fully configured VM version of DAITSS can be downloaded to run in VMware Player. A Quick Start Guide to the VM demo walks through the most commonly performed operator functions.

Tool URL: http://daitss.fcla.edu/

Documentation URL: http://daitss.fcla.edu/content/documentation

Last update: July 25, 2007


DROID (The National Archives (UK))

Description: DROID (Digital Record Object Identification) is an automatic file format identification tool developed by The National Archives of the UK in conjunction with the PRONOM online registry of technical information. PRONOM, developed initially as an internal resource for National Archives staff and subsequently as a public, web-based resource, holds technical information about the structure of file formats and the software and hardware environments required to support them. DROID uses byte signatures stored in PRONOM to identify and report the specific file format versions of digital files. DROID detects when new signatures are added to the PRONOM database and automatically downloads the updates via the Web, keeping its signature data current. It is designed for batch processing and can be used through a GUI or a command-line interface; the command-line interface supports integration with other systems. DROID is a standalone, platform-independent Java tool and is freely available for download from the PRONOM website.

DROID could be used to extract file format information for use in preservation metadata. For PREMIS, an XSL transformation (not currently provided by the developer of this tool) could convert the DROID output to PREMIS-specific elements (see also the entry below for the Statistics New Zealand Prototype PREMIS Creation Tool).
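A minimal sketch of that conversion step follows. It uses Python's standard xml.etree.ElementTree in place of XSLT, and it assumes the format name, version and PRONOM identifier (PUID) have already been read from DROID's report; the function name is hypothetical. The element names follow the PREMIS 2 schema (namespace info:lc/xmlns/premis-v2).

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Qualify a local tag name with the PREMIS 2 namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def droid_result_to_premis_format(name, version, puid):
    """Build a premis:format element from one identification result.

    Hypothetical helper: the inputs are assumed to come from a DROID report.
    """
    fmt = ET.Element(q("format"))
    designation = ET.SubElement(fmt, q("formatDesignation"))
    ET.SubElement(designation, q("formatName")).text = name
    if version:  # DROID does not always report a version
        ET.SubElement(designation, q("formatVersion")).text = version
    registry = ET.SubElement(fmt, q("formatRegistry"))
    ET.SubElement(registry, q("formatRegistryName")).text = "PRONOM"
    ET.SubElement(registry, q("formatRegistryKey")).text = puid
    return fmt


if __name__ == "__main__":
    # Illustrative values only; check PUIDs against the PRONOM registry.
    elem = droid_result_to_premis_format("Tagged Image File Format", "6", "fmt/353")
    print(ET.tostring(elem, encoding="unicode"))
```

The resulting element would normally sit inside a PREMIS object's objectCharacteristics; a transform for a real DROID report would first need to parse that report's own layout.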

Tool URL: http://www.nationalarchives.gov.uk/aboutapps/pronom/

Documentation URL: http://www.nationalarchives.gov.uk/aboutapps/fileformat/pdf/droid_api_1.rtf

Last update: July 25, 2007


Echodep (University of Illinois at Urbana-Champaign)

Description: ECHO DEPository is a research and development project at the University of Illinois at Urbana-Champaign, carried out in partnership with OCLC and funded by the Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP). The HandS (Hub and Spoke) tool suite is a package of components that provide open-source tools in the context of the Echodep METS profiles.

A JHOVE utilities API runs JHOVE on an item, generating a PREMIS object and, depending on the MIME type, returning format-specific metadata (e.g. MIX for images). The API is extensible, so new "applicators" can easily be written to support other technical metadata. The HaSMETSProfile class generates a METS object with appropriate slots for PREMIS metadata, designed to work with METS files conforming to the registered profile, and it also validates against that profile. Support for recording events and outcomes (e.g. format validation, fixity check) is built into the HaSMETSProfile class for embedding these outcomes; the events themselves are initiated by various routines (workflow, validation, packaging, etc.).
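An event of the kind described above might be recorded as in the sketch below. It is not the HaSMETSProfile API: it is a hypothetical Python helper (function name and parameters assumed) that computes an MD5 digest, compares it with a previously recorded value, and writes the result as a PREMIS 2 event. The identifier type "UUID", the event type "fixity check", the object identifier type "local" and the outcome values "pass" and "fail" are local choices assumed for illustration, not values required by the schema.

```python
import hashlib
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Qualify a local tag name with the PREMIS 2 namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def fixity_event(data, expected_md5, object_id):
    """Check the MD5 of `data` and return a premis:event recording the outcome."""
    actual = hashlib.md5(data).hexdigest()
    outcome = "pass" if actual == expected_md5.lower() else "fail"

    event = ET.Element(q("event"))
    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "UUID"  # assumed scheme
    ET.SubElement(ident, q("eventIdentifierValue")).text = str(uuid.uuid4())
    ET.SubElement(event, q("eventType")).text = "fixity check"  # assumed vocabulary
    ET.SubElement(event, q("eventDateTime")).text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds"))
    ET.SubElement(event, q("eventDetail")).text = f"MD5 computed: {actual}"
    info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(info, q("eventOutcome")).text = outcome
    # Link the event back to the object that was checked.
    link = ET.SubElement(event, q("linkingObjectIdentifier"))
    ET.SubElement(link, q("linkingObjectIdentifierType")).text = "local"
    ET.SubElement(link, q("linkingObjectIdentifierValue")).text = object_id
    return event
```

The same pattern extends to format validation: only the event type, the detail text and the way the outcome is computed change.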

Tool URL: http://sourceforge.net/projects/echodep/

Documentation URL: http://dli.grainger.uiuc.edu/echodep/HnS/JavaDocs/

Last update: July 25, 2007


IngestList (Landesarchiv Baden-Württemberg, Germany)

Description: IngestList supports the secure and trustworthy transfer of digital records to an archive. Before transfer, significant properties of the original objects are detected and recorded. IngestList can later re-detect these properties and compare them with the recorded values. Every step of this process is recorded and controlled in an intermediate XML format. It was developed primarily for the State Archives of Baden-Württemberg, but is also used in other archives.
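The record-then-re-detect cycle can be illustrated with a small Python sketch. It does not reproduce IngestList's own code or its intermediate XML format; the two properties used here (size in bytes and a SHA-256 digest) and the function names are hypothetical stand-ins for the significant properties an archive would actually choose.

```python
import hashlib
from pathlib import Path


def detect_properties(path):
    """Detect a (hypothetical) minimal set of properties for one file."""
    data = Path(path).read_bytes()
    return {"size": len(data), "sha256": hashlib.sha256(data).hexdigest()}


def record(paths):
    """Before transfer: record properties keyed by file name (names assumed unique)."""
    return {Path(p).name: detect_properties(p) for p in paths}


def verify(recorded, paths):
    """After transfer: re-detect properties and list every discrepancy.

    Each problem is a tuple (file name, property, recorded value, new value).
    """
    problems = []
    seen = set()
    for p in paths:
        name = Path(p).name
        seen.add(name)
        expected = recorded.get(name)
        if expected is None:
            problems.append((name, "not in record", None, None))
            continue
        actual = detect_properties(p)
        for key, value in expected.items():
            if actual.get(key) != value:
                problems.append((name, key, value, actual.get(key)))
    # Files that were recorded but did not arrive.
    for name in sorted(set(recorded) - seen):
        problems.append((name, "missing after transfer", None, None))
    return problems
```

Calling record() before transfer and verify() afterwards yields an empty list when every recorded property matches; any other result names the file, the property, and the before and after values.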

IngestList is suitable for all kinds of digital content, but has special support for database archiving (via a JDBC interface). It integrates JHOVE and DROID and, through its GUI, also offers a convenient means of detecting significant properties. Even non-experts can configure IngestList for use at their organisation and in their own language.

IngestList is an open source project. Please feel free to enhance or integrate IngestList, especially with community standards like METS or PREMIS.

Tool URL: http://ingestlist.sf.net

Documentation URL: http://sourceforge.net/projects/ingestlist/files/IngestList/2009-10-21/flash

Last update: April 06, 2011


JHOVE (JSTOR/Harvard Object Validation Environment)

Description: JSTOR and the Harvard University Library collaborated on this project to develop an extensible framework for object validation. Representation information (format type) is important to all digital repositories, since ingest, storage, access, and preservation decisions may depend on the format, so the identification and validation of digital object formats needs to be automated. JHOVE performs format-specific identification, validation, and characterization of digital objects. These actions are performed by modules for the various format types, and the output is controlled by output handlers, using an extensible plug-in architecture. JHOVE is a format-specific digital object validation application programming interface (API) written in Java. It can be downloaded with either a command-line interface or a GUI.

The output of JHOVE can be configured when it is invoked to include whichever format modules and output handlers are desired. Representation information is output in XML, and output handlers format the information according to the specification for each module (depending on format type). For instance, the JPEG 2000 and TIFF modules use the NISO Z39.87 (Technical metadata for digital still images) standard.

Although JHOVE is not specifically an implementation of PREMIS, it could be used to generate format information automatically, and an XSL transformation could then convert the output to PREMIS schema elements (and to format-specific metadata specifications). See also the entry below for the Statistics New Zealand Prototype PREMIS Creation Tool.
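As an illustration only, the sketch below performs such a mapping with Python's xml.etree.ElementTree rather than XSLT. The input is a simplified, namespace-free fragment modelled on JHOVE's XML output; its element names (repInfo, format, version, status, size) are assumptions, and a real transform should be written against the output schema of the JHOVE version in use. The output is a PREMIS 2 objectCharacteristics element.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)

# Simplified JHOVE-style record; element names assumed, namespace removed.
SAMPLE = """<repInfo uri="scan001.tif">
  <format>TIFF</format>
  <version>6.0</version>
  <status>Well-Formed and valid</status>
  <size>1048576</size>
</repInfo>"""


def q(tag):
    """Qualify a local tag name with the PREMIS 2 namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def repinfo_to_premis(xml_text):
    """Return (premis:objectCharacteristics, validation status) for one repInfo."""
    rep = ET.fromstring(xml_text)
    chars = ET.Element(q("objectCharacteristics"))
    # compositionLevel 0: assume no compression or encryption layer.
    ET.SubElement(chars, q("compositionLevel")).text = "0"
    size = rep.findtext("size")
    if size:
        ET.SubElement(chars, q("size")).text = size
    fmt = ET.SubElement(chars, q("format"))
    desig = ET.SubElement(fmt, q("formatDesignation"))
    ET.SubElement(desig, q("formatName")).text = rep.findtext("format", "unknown")
    version = rep.findtext("version")
    if version:
        ET.SubElement(desig, q("formatVersion")).text = version
    # The well-formed/valid verdict is returned separately (see note below).
    return chars, rep.findtext("status")
```

The status string is kept out of objectCharacteristics because in PREMIS a validation result belongs in the outcome of a validation event, not among the object's characteristics.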

Tool URL: http://hul.harvard.edu/jhove/distribution.html

Documentation URL: http://hul.harvard.edu/jhove/documentation.html

Last update: July 25, 2007


JHOVE2

Description: The JHOVE2 project generalizes the concept of format characterization to include identification, validation, feature extraction, and policy-based assessment. The target of this characterization is not a simple digital file, but a (potentially) complex digital object that may be instantiated in multiple files. JHOVE2 produces output in an "intermediate" XML format that can be transformed to MIX, PREMIS and potentially other XML formats. Note: currently in BETA release.

Documentation URL: https://confluence.ucop.edu/display/JHOVE2Info/Home

Last update: August 19, 2010


METS Java Toolkit (Harvard University Library)

Description: This tool uses Java to construct, validate, and process METS objects. It can read a METS document into a Java object, where it can be modified and the resulting METS written out. The toolkit is a Java binding framework in which each schema element or attribute of a METS file (e.g. techMD, @LABEL) is represented in memory by an object whose values can be set before it is added to the content model of its parent. The toolkit supports both local and global validation of METS files.

The METS Java Toolkit is a general METS maker, which could be used to provide a slot for including or referencing PREMIS descriptions. It allows for the inclusion of an MDTYPE attribute with the value "PREMIS". However, it does not fill in the values of the PREMIS elements.
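The "slot" the toolkit provides is easiest to see in the XML itself. The Python sketch below, which does not use the Java toolkit, builds a minimal METS skeleton whose amdSec holds a techMD and a digiprovMD, each containing an mdWrap with MDTYPE="PREMIS" and an empty xmlData element into which PREMIS descriptions would later be placed. The ID values and the function name are hypothetical; a minimal structMap is included because METS requires one.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def m(tag):
    """Qualify a local tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def mets_with_premis_slots():
    """Return a METS root with empty PREMIS-typed mdWrap slots in an amdSec."""
    mets = ET.Element(m("mets"))
    amd = ET.SubElement(mets, m("amdSec"), ID="AMD1")
    for section, sec_id in (("techMD", "TECH1"), ("digiprovMD", "DIGIPROV1")):
        md = ET.SubElement(amd, m(section), ID=sec_id)
        wrap = ET.SubElement(md, m("mdWrap"), MDTYPE="PREMIS")
        ET.SubElement(wrap, m("xmlData"))  # PREMIS XML is placed here later
    # METS requires a structMap; a single empty div is the smallest form.
    smap = ET.SubElement(mets, m("structMap"))
    ET.SubElement(smap, m("div"))
    return mets


if __name__ == "__main__":
    print(ET.tostring(mets_with_premis_slots(), encoding="unicode"))
```

As with the toolkit, the PREMIS element values themselves are left empty; another process must fill the xmlData slots.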

Tool URL: http://hul.harvard.edu/mets/download.html

Documentation URL: http://hul.harvard.edu/mets/doc/

Last update: July 25, 2007


New Zealand metadata extractor (National Library of New Zealand)

Description: The Metadata Extraction Tool was developed by the National Library of New Zealand to extract preservation metadata programmatically from a range of file formats. It automatically extracts preservation-related metadata from digital files and outputs it in XML formats for use in preservation activities. It is now available as open-source software.

The Metadata Extraction Tool is based on a library of adapters. Each adapter knows how to recognise and extract metadata from a different type of file. Adapters can handle dependencies within and between objects of varying levels of complexity, ranging from single, simple objects like TIFF files through to complex web sites or databases.

Extracting preservation metadata is a two-stage process. In the first phase, each incoming file is passed to the adapters in turn until one of them recognises the file type; that adapter extracts data from the file's header fields and generates an internal XML file. In the second phase, an XSL transformation converts the internal XML file into an output XML format, currently the NLNZ preservation metadata data model schema. Output conforming to the PREMIS XML schemas is also possible as the corresponding transformations are developed. See also the entry below for the Statistics New Zealand Prototype PREMIS Creation Tool.
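The adapter-based, two-phase design can be sketched as follows. The classes, the internal XML layout and the two header checks (PNG and PDF) are hypothetical illustrations, not the Metadata Extraction Tool's own adapters; in the real tool the second phase is an XSL transformation, for which a plain Python mapping stands in here.

```python
import struct
import xml.etree.ElementTree as ET


class PngAdapter:
    """Hypothetical adapter for PNG files."""
    name = "PNG"

    def recognises(self, header):
        # 8-byte PNG signature, plus enough bytes to reach the IHDR fields.
        return len(header) >= 24 and header.startswith(b"\x89PNG\r\n\x1a\n")

    def extract(self, header):
        # Width and height are big-endian uint32 values at bytes 16-24 (IHDR).
        width, height = struct.unpack(">II", header[16:24])
        return {"width": width, "height": height}


class PdfAdapter:
    """Hypothetical adapter for PDF files."""
    name = "PDF"

    def recognises(self, header):
        return header.startswith(b"%PDF-")

    def extract(self, header):
        # The version follows "%PDF-" on the first line, e.g. "1.4".
        first_line = header.splitlines()[0]
        return {"version": first_line[5:].strip().decode("ascii", "replace")}


ADAPTERS = [PngAdapter(), PdfAdapter()]


def phase_one(header):
    """Try adapters in turn; the first that recognises the file writes internal XML."""
    for adapter in ADAPTERS:
        if adapter.recognises(header):
            root = ET.Element("file", type=adapter.name)
            for key, value in adapter.extract(header).items():
                ET.SubElement(root, "field", name=key).text = str(value)
            return root
    return None  # no adapter recognised the file


def phase_two(internal):
    """Map internal XML to an output record (stand-in for the XSLT step)."""
    if internal is None:
        return {"format": "unknown"}
    out = {"format": internal.get("type")}
    out.update({f.get("name"): f.text for f in internal.iter("field")})
    return out
```

In practice the header bytes would be read from the start of each incoming file, and supporting a new format means adding one adapter to the list without touching the second phase.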

Tool URL: http://meta-extractor.sourceforge.net/

Documentation URL: http://meta-extractor.sourceforge.net/documentation.htm

Last update: July 25, 2007


PREMIS in METS Toolbox (Florida Center for Library Automation)

Description: The PREMIS in METS Toolbox is a set of open-source tools developed to support the implementation of PREMIS within the METS container format. It supports text, image, audio, video and software formats and provides the following functions:
1. Validates a PREMIS in METS document against the applicable schemas and best practice guidelines.
2. Converts between standalone PREMIS and PREMIS in METS, in both directions.
3. Describes a file in PREMIS metadata using the DAITSS 2 Description Service.
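One direction of the conversion in item 2 can be sketched as below: PREMIS entities are lifted out of the xmlData of every PREMIS-typed mdWrap in a METS document and gathered under a single standalone premis root, in the schema's entity order (objects, events, agents, rights). This Python function is a hypothetical stand-in for the toolbox's own implementation; it assumes PREMIS 2 (namespace info:lc/xmlns/premis-v2) and sets version="2.0" on the new root.

```python
import copy
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)

# Order of PREMIS 2 top-level entities inside <premis>.
ENTITY_ORDER = ["object", "event", "agent", "rights"]


def premis_from_mets(mets_root):
    """Collect PREMIS entities from PREMIS-typed mdWraps into one <premis> root."""
    found = []
    for wrap in mets_root.iter(f"{{{METS_NS}}}mdWrap"):
        # Accept MDTYPE="PREMIS" and any "PREMIS:..." variant.
        if not wrap.get("MDTYPE", "").startswith("PREMIS"):
            continue
        xml_data = wrap.find(f"{{{METS_NS}}}xmlData")
        if xml_data is None:
            continue
        for child in xml_data:
            # xmlData may hold bare entities or a whole <premis> container.
            items = list(child) if child.tag == f"{{{PREMIS_NS}}}premis" else [child]
            for item in items:
                local = item.tag.split("}")[-1]
                if item.tag.startswith(f"{{{PREMIS_NS}}}") and local in ENTITY_ORDER:
                    found.append((ENTITY_ORDER.index(local), copy.deepcopy(item)))
    premis = ET.Element(f"{{{PREMIS_NS}}}premis", version="2.0")
    # sorted() is stable, so document order is kept within each entity type.
    for _, entity in sorted(found, key=lambda pair: pair[0]):
        premis.append(entity)
    return premis
```

The reverse direction amounts to distributing these entities back into techMD, digiprovMD and rightsMD sections, much as in a METS skeleton with empty PREMIS slots.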

Availability: Source code is available at http://sourceforge.net/projects/pimtoolbox/

Tool URL: http://pim.fcla.edu

Documentation URL: http://pim.fcla.edu/resources

Programming language: Schematron + XSLT + Ruby

Operating system/runtime environment: Any OS that supports these; tested on Linux, but any Unix variant should work

Licensing: no restrictions

Version: 0.2.1.2 (beta)

Last update: November 04, 2009


Rosetta (Ex Libris)

Description: Rosetta is a commercial product developed by Ex Libris for the management of digital assets in libraries and academic environments, enabling institutions to create, manage, preserve, and share locally administered digital collections. Rosetta consists of a number of modules, each designed to address different needs, functions, and workflows in the life cycle of a digital object, including ingest and metadata extraction, creation of METS objects, and editing of both descriptive and technical metadata.

Rosetta supports preservation metadata, including PREMIS objects, PREMIS events for tracking the history of changes to an object, and PREMIS rights for authorization and access rights.

Availability: From Ex Libris as a commercial product.

Tool URL: http://www.exlibrisgroup.com/category/RosettaOverview

Documentation URL: http://www.exlibrisgroup.com/category/RosettaOverview

Last update: November 04, 2009


Statistics New Zealand Prototype PREMIS Creation Tool

Description: This tool is a set of programs using XSL and VBScript that takes output from JHOVE, the New Zealand Metadata Extractor, and DROID and produces PREMIS object records. It can run on single or multiple files. To create PREMIS output, an XSL stylesheet is run to merge all of these outputs. The resulting file consists of a stream of multiple PREMIS object records, which a separate script can split into individual files. The PREMIS object schema has been slightly modified to record the source of the value in each element.
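The splitting step might look like the Python sketch below, which assumes the combined output is a single XML document whose root element contains the PREMIS object records as direct children (PREMIS 2 namespace). The original tool used VBScript for this; the function name and the object_0001.xml naming scheme here are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def split_objects(stream_file, out_dir):
    """Write each premis:object child of the stream's root to its own file.

    Returns the list of paths written, in document order.
    """
    root = ET.parse(stream_file).getroot()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for n, obj in enumerate(root.findall(f"{{{PREMIS_NS}}}object"), start=1):
        target = out / f"object_{n:04d}.xml"  # hypothetical naming scheme
        ET.ElementTree(obj).write(target, encoding="utf-8", xml_declaration=True)
        written.append(target)
    return written
```

For very large streams, xml.etree.ElementTree.iterparse would avoid loading the whole file into memory.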

Availability: Requires login and password; see: http://www.loc.gov/standards/premis/pigInfo.jpg

Tool URL: http://pigpen.lib.uchicago.edu:8888/pigpen/40

Documentation: Requires login and password; see: http://www.loc.gov/standards/premis/pigInfo.jpg

Documentation URL: http://pigpen.lib.uchicago.edu:8888/pigpen/40/Creating_premis_object_records.doc

Last update: July 25, 2007