The Library of Congress >> Especially for Librarians and Archivists >> Standards

MARC Standards



MARC PROPOSAL NO. 2022-08

DATE: May 26, 2022
REVISED:

NAME: Recording Persistent Identifiers and File Formats in Field 856 of the MARC 21 Formats

SOURCE: ISSN International Centre, Paris, and the National Library of Finland

SUMMARY: This paper proposes redefining two obsolete subfields in field 856 (Electronic Location and Access) to record persistent identifiers (PIDs) in $g and to track non-functional URIs in $h. It also proposes making subfield $q repeatable and amending the definition of $u to substitute the term "PID" for "URN".

KEYWORDS: Field 856 (All formats); Electronic location and access (All formats); Access to online information resources (All formats); Persistent identifier (PID) (All formats); Archival Resource Key (ARK) (All formats); Handle system (All formats); Uniform Resource Name (URN) (All formats); Uniform Resource Locator (URL) (All formats); Electronic file format type (All formats); Internet Media Type (MIME Type) (All formats); MIME Type; URN; URL

RELATED: 2022-DP02; 2020-DP01; 2022-06; 2018-DP11; 93-4; 97-1; 99-06; 2019-01; DP 49; DP 54; DP 69; Guidelines for the Use of Field 856, Revised August 1999; Guidelines for the Use of Field 856, Revised March 2002

STATUS/COMMENTS:
05/26/22 – Made available to the MARC community for discussion.

06/28/22 – Results of MARC Advisory Committee discussion: Approved, with the following editorial amendments:

10/27/22 – Results of MARC Steering Group review - Agreed with the MAC decision.


Proposal No. 2022-08: Recording Persistent Identifiers and File Formats in Field 856

1. BACKGROUND

1.1. Previous Developments

Discussion paper 2022-DP02, written jointly by the ISSN International Centre and the National Library of Finland, proposed several improvements to existing field 856 (Electronic Location and Access), including:

The MAC's response to the discussion paper was positive. The suggested changes were supported, but a clear majority of MAC members preferred that a new MARC field for Electronic Archive Location and Access be defined in a separate paper rather than adding all the new data elements to field 856. Two separate papers have therefore been prepared: this proposal, covering the redefined subfields in field 856, and a discussion paper covering the creation of field 857.

1.2. Current and Proposed Redefined Subfields in Field 856

Field 856 is structured identically in the MARC Bibliographic, Authority, Holdings, Classification, and Community Information formats, although the definition and scope differ slightly from format to format. The Bibliographic field 856 currently has the following subfields, listed together with the new subfields from Proposal 2022-06 (to be activated in 2022) and the subfields introduced by this proposal ($g and $h):

$a - Host name (R)
$c - Compression information (R)
$d - Path (R)
$f - Electronic name (R)
$g - Persistent identifier (PID) (R)
$h - Non-functioning Uniform Resource Identifier (URI) (R)
$l - Standardized information governing access (R)
$m - Contact for access assistance (R)
$n - Terms governing access (R)
$o - Operating system (NR)
$p - Port (NR)
$q - Electronic format type (NR)
$r - Standardized information governing use and reproduction (R)
$s - File size (R)
$t - Terms governing use and reproduction (R)
$u - Uniform Resource Identifier (R)
$v - Hours access method available (R)
$w - Record control number (R)
$x - Nonpublic note (R)
$y - Link text (R)
$z - Public note (R)
$2 - Access method (NR)
$3 - Materials specified (NR)
$6 - Linkage (NR)
$7 - Access status (NR)
$8 - Field link and sequence number (R)

Recording persistent identifiers (PIDs) in $g, separately from functional URLs in $u, would allow libraries to use MARC to transfer PID-to-URL mappings to PID resolvers.

If link rot or content drift has rendered a URL non-functional, it can be recorded in the reactivated subfield 856 $h, which will allow user interfaces to perform searches in Web archives using the non-functional URL.

This paper also proposes to make 856 $q repeatable and to provide additional guidelines for its use in order to more accurately specify electronic file formats where possible.

2. DISCUSSION

This proposal provides the ability to record persistent identifiers (PIDs) separately from URLs in field 856; having distinct 856 subfields could also benefit automated link-checking processes. Additionally, it supports more accurate specification of file formats and their versions by expanding the 856 $q definition and making the subfield repeatable.

2.1. PIDs and URLs in Field 856

In the current MARC formats, field 856 provides just one option, subfield $u, for both URNs and URLs. In recent years, the concept of persistent identifiers has evolved, so a redefined subfield is being proposed to accommodate PIDs. Currently, subfield $u may be repeated only if both a URN and a URL are provided; it is not possible to specify multiple URLs in the same 856 field. Such a limitation on URLs is necessary only if copies of the described resource have different rights metadata or access methods. Removing the limitation on multiple URLs in a single 856 field will make it possible to specify a PID and all the URLs it resolves to in a single field.

Providing the possibility of recording non-functioning URIs in 856 subfield $h will allow libraries to build automated processes that check active URIs and change their status programmatically when needed. At the ISSN International Centre and the National Library of Finland, link checking is done by software, but there is currently no defined place within the MARC format to store the resulting non-functional URIs.

Additionally, PID – URI mappings in a revised field 856 will enable MARC records to be used as a tool for transferring these mappings to resolvers if there is no alternative method available for this purpose.
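As a rough illustration of how such mappings could be extracted, the Python sketch below assumes each 856 field has already been parsed into an ordered list of (subfield code, value) pairs. That representation and the function name `pid_url_mappings` are hypothetical, not part of MARC 21 or of any particular library. The sketch pairs every PID in $g with the functional URLs in $u of the same field and deliberately ignores $h, since non-functioning URIs should not be sent to a resolver.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one 856 field as ordered (code, value) pairs.
Subfields = List[Tuple[str, str]]

def pid_url_mappings(fields: List[Subfields]) -> Dict[str, List[str]]:
    """Collect PID -> URL mappings from parsed 856 fields."""
    mappings: Dict[str, List[str]] = {}
    for field in fields:
        pids = [value for code, value in field if code == "g"]
        # Only functional URLs in $u; $h values are ignored on purpose.
        urls = [value for code, value in field if code == "u"]
        if not urls:
            continue  # nothing for a resolver to point to
        for pid in pids:
            targets = mappings.setdefault(pid, [])
            for url in urls:
                if url not in targets:  # keep order, drop duplicates
                    targets.append(url)
    return mappings
```

Applied to the two 856 fields of example 4.2, only the first field contributes a mapping, because the second has no $g.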

2.2. Internet Media Type and Other Codes in 856 $q

Currently, the file format of the cataloged resource can be specified in subfield $q. Specifying the file format version in $q may also be essential in order to support access and long-term preservation. Internet Media Type codes indicate file types, but a more granular system such as PRONOM PUIDs (https://www.nationalarchives.gov.uk/aboutapps/pronom/puid.htm) will allow digital archivists to determine accurately the hardware and software needed to render the document.

Since PRONOM PUIDs are machine-understandable but unfamiliar to most library metadata users, it may be necessary to provide both the Internet Media Type code and the PRONOM PUID. File format and version specification is required for determining the need to migrate the resource to a more modern file format or to a subsequent version of the same format. Although such decisions will typically be made in dedicated digital archive repositories, library systems may be responsible for providing technical metadata about the resources submitted to a repository.

856 $q is currently defined as follows:

$q - Electronic format type (NR)
Identification of the electronic format type, which is the data representation of the resource, such as text/HTML, ASCII, Postscript file, executable application, or JPEG image. Electronic format type may be taken from enumerated lists such as registered Internet Media Types (MIME types).

Intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). The electronic format type also determines the file transfer mode, or how data are transferred through a network. (Usually, a text file can be transferred as character data which generally restricts the text to characters in the ASCII (American National Standard Code for Information Interchange (ANSI X3.4)) character set (i.e., the basic Latin alphabet, digits 0-9, a few special characters, and most punctuation marks) and text files with characters outside of the ASCII set, or non-textual data (e.g., computer programs, image data) must be transferred using another binary mode.)

The latter half of the second paragraph ("The electronic format type also determines the file transfer mode…") is out of date and goes into unnecessary detail. We propose that the 856 $q description be changed to encourage the use of codes from controlled vocabularies. More prescriptive language would help ensure that the data are machine-readable and understandable. We also propose that 856 $q be made repeatable to allow the use of multiple code sets.

3. PROPOSED CHANGES

This paper proposes to reactivate two obsolete subfields and redefine two existing subfields in field 856. Changes to existing definitions are underlined in bold.

3.1. Reactivate Subfield $g

Reactivate 856 subfield $g and redefine it as follows:

$g – Persistent identifier (PID) (R)
Persistent identifier assigned to the resource for automated access and other resolution services by a PID resolver. PIDs should be provided as actionable hyperlinks (e.g., HTTP URI format).

If a PID resolves to more than one URL, these URLs may be provided in the same 856 field. 

3.2. Reactivate Subfield $h

Reactivate 856 subfield $h and redefine it as follows:

$h – Non-functioning Uniform Resource Identifier (URI) (R)
Uniform Resource Identifier (URI) which is no longer functional, for example, due to link rot, content drift, etc.

Subfield $h may be repeated if there is more than one non-functioning URI. A note on the status change (including the date) may be added either in subfield 856 $x or 856 $z, depending on the local policy.
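A minimal sketch of the status change described above, assuming a separate link checker has already produced the set of failing URIs (nothing is fetched here) and using a hypothetical list of (code, value) pairs for the 856 field. The function name `retire_dead_uris` and the note wording are illustrative only; whether the note goes in $x or $z is left to the caller, mirroring local policy.

```python
from datetime import date
from typing import Iterable, List, Tuple

# Hypothetical representation: one 856 field as ordered (code, value) pairs.
Subfields = List[Tuple[str, str]]

def retire_dead_uris(field: Subfields, dead: Iterable[str],
                     checked_on: date, note_code: str = "x") -> Subfields:
    """Move non-functioning URIs from $u to $h and append a dated status note.

    `dead` is assumed to come from a separate link checker.
    `note_code` selects $x (nonpublic note) or $z (public note) per local policy.
    """
    if note_code not in ("x", "z"):
        raise ValueError("status note must be recorded in $x or $z")
    dead_set = set(dead)
    retired = [v for c, v in field if c == "u" and v in dead_set]
    if not retired:
        return list(field)  # nothing to change
    kept = [(c, v) for c, v in field if not (c == "u" and v in dead_set)]
    # Putting the new $h subfields first is a presentational choice, not a MARC rule.
    updated = [("h", uri) for uri in retired] + kept
    # Hypothetical note wording; each library would use its own.
    updated.append((note_code, f"URI moved from $u to $h after failing link check on {checked_on.isoformat()}"))
    return updated
```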

3.3. Revise Definition of Subfield $q

Revise the definition of subfield $q as follows:

$q - Electronic format type (R)
Identification of the electronic format type and version, such as HTML, EPUB 3.2, executable application, or JPEG. Electronic format type should be specified with a code taken from the list of registered Internet Media Types (MIME types). If necessary (e.g., in order to specify a file format version to support access or digital preservation), additional information such as PRONOM Unique Identifier (PUID) codes may be used to complement the information provided by the MIME type.

An up-to-date list of Internet Media Type codes is available at https://www.iana.org/assignments/media-types/media-types.xhtml

An up-to-date list of PRONOM PUIDs is available at https://www.nationalarchives.gov.uk/aboutapps/pronom/puid.htm

Subfield $q should be repeated if two or more codes are provided.
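Because $q may now hold codes from more than one scheme, a receiving system may need to tell them apart. The Python sketch below shows one assumed approach: values matching the PRONOM file-format namespaces (`fmt/` and `x-fmt/`, bare or as a PRONOM URL) are treated as PUIDs, and values matching a simplified RFC 6838 `type/subtype` pattern are treated as Internet Media Types. The patterns and the name `classify_q` are approximations for illustration, not validators published by IANA or The National Archives.

```python
import re
from typing import Dict, List

# PRONOM file-format PUIDs (fmt/ and x-fmt/ only), bare or at the end of a PRONOM URL.
PUID_RE = re.compile(r"(?:^|/PRONOM/)((?:x-)?fmt/\d+)$")
# Simplified RFC 6838 type/subtype syntax; parameters such as ";charset=" are not handled.
MIME_RE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}$")

def classify_q(values: List[str]) -> Dict[str, List[str]]:
    """Sort repeated $q values into MIME types, PUIDs and anything else."""
    out: Dict[str, List[str]] = {"mime": [], "puid": [], "other": []}
    for value in values:
        v = value.strip()
        puid = PUID_RE.search(v)
        if puid:  # checked first: "fmt/276" is also syntactically a type/subtype
            out["puid"].append(puid.group(1))
        elif MIME_RE.match(v):
            out["mime"].append(v.lower())  # media type names are case-insensitive
        else:
            out["other"].append(v)
    return out
```

For the two $q values in example 4.3, this yields one media type (`application/pdf`) and one PUID (`fmt/276`).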

3.4. Revise Definition of Subfield $u

Revise the definition of subfield $u to clarify usage, as follows:

$u - Uniform Resource Identifier (URI) (R)
Uniform Resource Identifier (URI), which provides standard syntax for locating an object using existing Internet protocols. Field 856 is structured to allow for the creation of a URL from the concatenation of other separate 856 subfields. Subfield $u may be used instead of those separate subfields or in addition to them.

Subfield $u may be repeated if more than one URI is recorded.

Used for automated access to an electronic item using one of the Internet protocols or by resolution of a PID. [Former information about repeatability has been removed.]

URLs which no longer function to provide access to the described resource may be transferred to 856 $h.
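To illustrate the relationship between $u and the separate subfields, the sketch below prefers $u and otherwise assembles a URL from host name ($a), port ($p), path ($d) and electronic name ($f). It maps the MARC 21 first-indicator values 1 (FTP) and 4 (HTTP) to URL schemes and, for value 7, assumes $2 holds a scheme name. The concatenation order and the helper names `first` and `build_url` are assumptions made for this example, and only the first occurrence of each repeatable subfield is used.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one 856 field as ordered (code, value) pairs.
Subfields = List[Tuple[str, str]]

# First-indicator values with an obvious URL scheme (1 = FTP, 4 = HTTP).
SCHEMES = {"1": "ftp", "4": "http"}

def first(field: Subfields, code: str) -> Optional[str]:
    """Return the first value of a subfield, or None if absent."""
    return next((v for c, v in field if c == code), None)

def build_url(ind1: str, field: Subfields) -> Optional[str]:
    """Return $u if present; otherwise try to assemble a URL from $a, $p, $d and $f."""
    u = first(field, "u")
    if u:
        return u
    scheme = SCHEMES.get(ind1)
    if ind1 == "7":
        scheme = first(field, "2")  # assumed: $2 names the scheme
    host = first(field, "a")
    if not scheme or not host:
        return None
    port = first(field, "p")
    authority = f"{host}:{port}" if port else host
    parts = [p.strip("/") for p in (first(field, "d"), first(field, "f")) if p]
    path = "/".join(p for p in parts if p)
    return f"{scheme}://{authority}/{path}" if path else f"{scheme}://{authority}"
```

The host names used to exercise this sketch are placeholders, not real resources.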

4. EXAMPLES

4.1. A journal with a non-functional URL and two functional URLs


leader 01531cas a2200325 i 4500
007 cr
008 080328c19979999ko f||p| |||||||||a0mul
022 0 # $a 1229-0645 $l 1226-9433 $2 27
210 1 # $a J. Korea soc. ind. appl. math. $b (Online)
222 # # $a Journal of the Korea Society for Industrial and Applied Mathematics $b (Online)
260 # # $a Daejeon $b Korea Society for Industrial and Applied Mathematics
710 2 # $a Han-gug san-eob jeongbo eung-yong suhaghoe
776 0 # $t Journal of the Korea Society for Industrial and Applied Mathematics (Print) $x 1226-9433
856 4 0 $h http://mathnet.kaist.ac.kr $u http://www.koreascience.or.kr/journal/E1TAAE/v2n1.page $u http://www.ksiam.org/
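The examples in this section use a display convention: tag, two indicators (`#` for blank), then subfields introduced by `$`. The following sketch parses one such line into its parts so that the subfields can be processed programmatically. It handles only this display form, not the ISO 2709 or MARCXML serializations, and it would mis-split a value that itself contains `$` followed by a lowercase letter or digit; `parse_display_field` is a hypothetical helper.

```python
import re
from typing import List, NamedTuple, Tuple

class DisplayField(NamedTuple):
    tag: str
    ind1: str
    ind2: str
    subfields: List[Tuple[str, str]]

# A subfield delimiter: "$" followed by a one-character code.
SUBFIELD_RE = re.compile(r"\$([a-z0-9])")

def parse_display_field(line: str) -> DisplayField:
    """Parse a line such as '856 4 0 $h value $u value' into its parts."""
    head, sep, rest = line.partition("$")
    if not sep:
        raise ValueError("no subfields found")
    tokens = head.split()
    if len(tokens) != 3:
        raise ValueError(f"expected tag and two indicators, got {tokens!r}")
    tag, ind1, ind2 = tokens
    # '#' marks a blank indicator in the examples; normalize it to a space.
    ind1, ind2 = (" " if i == "#" else i for i in (ind1, ind2))
    # split() with a capturing group gives ['', code1, value1, code2, value2, ...]
    pieces = SUBFIELD_RE.split("$" + rest)
    subfields = [(pieces[i], pieces[i + 1].strip()) for i in range(1, len(pieces), 2)]
    return DisplayField(tag, ind1, ind2, subfields)
```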

4.2. URN in 856 $g and URL in $u


LDR xxxxxnam a22yyyyyuc 4500
007 cr||||||||||||
008 180212s2017  gw |||||o|||| 00||||eng
024 7 # $a urn:nbn:de:bsz:25-freidok-146567 $2 urn
041 # # $a eng
100 1 # $a Radeva, Zornitsa
245 0 0 $a From reconstruction to reformation: Jacob Thomasius's use of Aristotle in the debate on the origin of the human soul
264 # 1 $a Freiburg $b Universität $c 2018
500 # # $a Recherches de théologie et philosophie médiévales. 84, 2 (2017), 427-463, DOI 10.2143/RTPM.84.2.3269053, issn: 1783-1717
856 4 0 $g http://nbn-resolving.de/urn:nbn:de:bsz:25-freidok-146567 $u http://d-nb.info/1152210440/34
856 4 0 $l(star)Unrestricted online access $l http://purl.org/coar/access_right/c_abf2 $n Open access $r (cc)CC BY-NC-ND 4.0 $r https://creativecommons.org/licenses/by-nc-nd/4.0/ $t Attribution-NonCommercial-NoDerivatives 4.0 International $q application/pdf $u https://freidok.uni-freiburg.de/data/14656  $7 0

4.3. PDF/A version A-3b file format specified using both Internet Media Type code and PRONOM PUID


020 # # $a 9789521241291
024 7 # $a urn:isbn:978-952-12-4129-1 $2 urn
100 1 # $a Byanjankar, Ajay
245 1 0 $a Predicting risk and return in peer-to-peer lending with machine learning
347 # # $a text file $b PDF $c 1.626 MB $2 rdaft
856 4 0 $g https://urn.fi/URN:ISBN:978-952-12-4129-1 $u https://www.doria.fi/handle/10024/182600 $q application/pdf $q https://www.nationalarchives.gov.uk/PRONOM/fmt/276

5. BIBFRAME DISCUSSION

The implications of these proposed changes on BIBFRAME will need to be considered together in order to prevent inadvertent data loss and conversion inconsistencies.

6. SUMMARY OF PROPOSED CHANGES

In field 856 (Electronic Location and Access) (identical in the MARC Bibliographic, Authority, Holdings, Classification, and Community Information Formats):

6.1. Reactivate and redefine the following subfields, as described in sections 3.1 and 3.2:

$g - Persistent identifier (PID) (R)
$h - Non-functioning Uniform Resource Identifier (URI) (R)

6.2. Revise the definitions of the following subfields, as described in sections 3.3 and 3.4:

$q - Electronic format type (R)
$u - Uniform Resource Identifier (URI) (R)
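A system adopting these changes might check the repeatability rules when records are created or loaded. The sketch below assumes the non-repeatable set from the subfield list in section 1.2 ($o, $p, $2, $3, $6, $7), treats $g and $q as repeatable as proposed here, and reports any non-repeatable code that occurs more than once in a field; the function name `repeatability_errors` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Non-repeatable 856 subfields per section 1.2, with $g and $q repeatable as proposed.
# Every other code is assumed repeatable in this sketch.
NON_REPEATABLE = {"o", "p", "2", "3", "6", "7"}

def repeatability_errors(subfields: List[Tuple[str, str]]) -> List[str]:
    """Report non-repeatable subfield codes that occur more than once."""
    counts = Counter(code for code, _ in subfields)
    return [f"${code} is not repeatable but occurs {n} times"
            for code, n in sorted(counts.items())
            if code in NON_REPEATABLE and n > 1]
```

With the 856 field from example 4.3, which repeats $q, no errors are reported.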


( 10/27/2022 )