
MARC Standards



MARC DISCUSSION PAPER NO. 2021-DP06

DATE: December 22, 2020
REVISED:

NAME: Recording Data Provenance in the MARC 21 Formats

SOURCE: MARC/RDA Working Group

SUMMARY: This paper discusses the potential for encoding data provenance in the MARC 21 Formats.

KEYWORDS: Data Provenance (All formats); Metadata Statement (All formats); Metadata Description Set (All formats); RDA Toolkit Restructure and Redesign Project; RDA

RELATED:

STATUS/COMMENTS:
12/22/20 – Made available to the MARC community for discussion.

01/28/21 – Results of MARC Advisory Committee discussion: Though the paper in general was positively received, MAC expressed a preference not to pursue several options for expanding the coverage of data provenance in MARC 21. The option for making distinctions between vocabulary encoding schemes and string encoding schemes will not be taken forward at this time. Likewise, the option for creating a new data provenance format or formats will not be further explored at present. A follow-up proposal or discussion paper from the MARC/RDA Working Group may set out the case for developing field 883 in order to better accommodate data provenance information; equally, it may consider introducing a new data provenance-specific subfield to the formats, such as $_, or adapting the hitherto underused subfield $7.


Discussion Paper No. 2021-DP06: Recording Data Provenance in the MARC Formats

1. BACKGROUND

The new RDA Toolkit glossary defines data provenance as: "Information about the metadata recorded in an element or set of elements. Metadata about metadata, or metametadata." The Toolkit guidance chapter on data provenance explains that: "This information can be used to infer the context and quality of the metadata". Data provenance information, as conceived by RDA, can already be recorded in MARC 21 at the record, field and subfield level using a variety of elements. However, coverage at the field and subfield level is often sparse and uneven. Furthermore, the full range of methods which RDA makes available to record data provenance is generally not supported. This paper sets out the parameters of data provenance in RDA and its current coverage in MARC 21. The paper then goes on to make a case for expanding MARC 21's accommodation of data provenance in order to better support established and emerging applications. Finally, it puts forward several alternatives for how MARC 21 might be adapted as regards its coverage of data provenance. RDA does not make the recording of data provenance information a required aspect of resource description.

2. DISCUSSION

2.1. RDA and Data Provenance

Data provenance information can be recorded in various ways using the new RDA Toolkit. It offers a number of elements, levels of granularity and recording methods for doing so.

2.1.1. RDA and data provenance elements

RDA supports the recording of data provenance information using a range of different elements. Some elements can only be used for the purpose of recording data provenance information, while others can be used for this and other purposes. The following elements can be used exclusively to record data provenance information and may, collectively, be referred to as "meta-elements" :

The following elements can be used exclusively to record data provenance information for Nomen entity-related elements :

The following elements can be used to record data provenance information in addition to other aspects of resource description. In each case the circumstance under which an element may be used for recording data provenance is given in parentheses:

2.1.2. RDA and data provenance granularity

Data provenance information can be recorded using different levels of granularity in RDA. It does this by specifying two subcategories of metadata work: metadata statements and metadata description sets.

A metadata statement is defined in the glossary as:

"A piece of metadata that assigns a value to an RDA element that describes an individual instance of an RDA entity."

In a MARC 21 context, examples of metadata statements include the values recorded in individual subfields.

A metadata description set is defined in the glossary as:

"One or more metadata statements that describe and relate individual instances of one or more RDA entities."

In a MARC 21 context, examples of metadata description sets include the values recorded collectively in records and in the fields belonging to those records which are composed of more than one subfield.

2.1.3. RDA and data provenance recording methods

Besides providing a range of elements and degrees of specificity for recording data provenance information, RDA also offers several methods with which to record it: unstructured description, structured description, identifier and IRI.

2.2. RDA Data Provenance Information in MARC 21

At the present time, MARC 21 can be used to record all of the RDA elements previously mentioned, using varying degrees of granularity and selective recording methods. Some examples are given below. In each case, the RDA element label, granularity and recording method of the data provenance information are provided for context.

Example 1

005 20190129073611.0

Element : date of publication (recording a timespan when metadata are published)
Granularity : metadata description set (any MARC 21 format)
Recording method : structured

Example 2

040 ## $d DLC

Element : author agent (recording an agent who records metadata)
Granularity : metadata description set (any MARC 21 format)
Recording method : identifier

Example 3

LDR / 18 a

Element : source consulted (recording a content standard used for metadata)
Granularity : metadata description set (MARC 21 bibliographic format)
Recording method : identifier

Example 4

667 ## $a Machine-derived non-Latin script reference project.

Element : context of use
Granularity : metadata description set (MARC 21 authority format)
Recording method : unstructured

Example 5

382 0# $a clarinet $n 1 $a piano $n 1 $s 2 $2 lcmpt

Element : source consulted (recording a content standard used for metadata)
Granularity : metadata description set (MARC 21 bibliographic and authority formats)
Recording method : identifier

Example 6

338 ## $a volume $2 rdact

Element : source consulted (recording a content standard used for metadata)
Granularity : metadata statement (MARC 21 bibliographic and holdings formats)
Recording method : identifier
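
As a minimal sketch of what makes the structured value in Example 1 machine-actionable, the 005 string can be read into a timestamp with Python's standard library. The helper name parse_005 is an assumption made for illustration; the pattern yyyymmddhhmmss.f is taken from the layout of the example value.

```python
from datetime import datetime


def parse_005(value: str) -> datetime:
    """Parse a field 005 value laid out as yyyymmddhhmmss.f.

    Hypothetical helper for illustration only.
    """
    return datetime.strptime(value.strip(), "%Y%m%d%H%M%S.%f")


# Value taken from Example 1 above.
stamp = parse_005("20190129073611.0")
print(stamp.isoformat())
```

An unstructured note carrying the same date would need interpretation before it could be compared or sorted in this way.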

2.2.1. Benefits of recording data provenance in MARC 21

Recording data provenance information in a resource description context is beneficial from the perspectives of managing the metadata creation process and supporting end users. It serves library staff engaged in collection-related cataloging activities as well as library patrons whose goal it is to access holdings. For example, a cataloger may choose to reuse one shared record in preference to another based on the organization which created it. Equally, a researcher may choose to order one holding in preference to another based on identifying features such as the source of transcribed information on a manifestation. Apart from these more traditional functions, data provenance information also supports the development of emerging products and services which are based on the selective transformation of cataloging metadata into non-MARC formats. For example, a dataset may be generated from a library’s catalog using individual field or subfield contents rather than whole MARC records. If data provenance information can be included in such a dataset, then this may lead to a better understanding of what it contains.

2.2.2. Recording data provenance and granularity in MARC 21

More value can be derived from data provenance information when it is recorded to a greater rather than lesser degree of granularity. Specificity reduces the need for interpretation on the part of a human user. It also increases the machine actionability of data provenance information. Conversely, recording data provenance information at the record or field level rather than at the subfield level can limit its value. This has a bearing on the utility of data both within and outside the library catalog. For instance, the following 040 field shows a case in which multiple organizations have contributed to the same MARC 21 record in the NACO authority file over time:

040 ## $a DLC $b eng $e rda $c DLC $d DLC $d MH $d OCoLC $d NjP $d IU $d WU $d DLC $d OCoLC $d WU $d DLC $d DLC-OK $d Uk $d SG-SiILA $d CU-HE $d Uk $d DLC $d InU $d OCoLC $d CSt $d OCoLC $d CSt $d UPB $d DFo $d WU $d DFo $d DLC $d MdRoLAC $d CaOONL $d DeU

Under these circumstances, the last organization sequenced in the 040 subfield $d string may be regarded as the latest author agent. Equally, it may be understood that the author agent subscribes to a shared set of policies for creating metadata. However, without subfield level linkage, it may prove difficult to establish what exactly has been contributed by the latest author agent as opposed to other author agents listed in the same 040 string. If shared policy has been applied incorrectly over time or is open to question, then feedback may be harder to deliver and clarification may be harder to seek.
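
The limits of record-level provenance can be illustrated with a short sketch that reads the display form of an 040 field. The function names are assumptions made for illustration, and the parsing assumes that each subfield is written as "$" plus one code character and that "$" never occurs inside a value.

```python
import re


def subfields(field_text: str) -> list[tuple[str, str]]:
    """Split the display form of a field into (code, value) pairs.

    Assumes '$' plus one code character marks each subfield and that
    '$' does not occur inside a value.
    """
    return [(code, value.strip())
            for code, value in re.findall(r"\$(\w)\s*([^$]*)", field_text)]


def modifying_agencies(field_040: str) -> list[str]:
    """Return the 040 $d values in the order in which they were recorded."""
    return [value for code, value in subfields(field_040) if code == "d"]


# Shortened version of the 040 field shown above.
field = "040 ## $a DLC $b eng $e rda $c DLC $d DLC $d MH $d OCoLC $d DeU"
agencies = modifying_agencies(field)
latest = agencies[-1]             # the last $d may be read as the latest author agent
distinct = sorted(set(agencies))  # every agency that modified the record
```

The sketch recovers who modified the record and in what order, but nothing in the field indicates which fields or subfields each agency contributed.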

To take another example, a data set may be composed of field values derived from bibliographic records in a catalog which has been created using different content standards. The MARC 21 records which were mined to produce this data set may carry either of the following values:

LDR / 18 a

OR

040 ## $e rda

Equally, if some records were created using AACR and later enhanced using RDA, then they may contain the following values in combination:

LDR / 18 a

AND

040 ## $e rda

Where such different sources consulted are reflected in the same data set, then the values recorded in selected subfields are likely to show a significant degree of variation: Latinizations, abbreviations and corrections will occur in some, but not in others; general material designations will occur in some, but not in others, etc. Without the contextualization provided by a source consulted at the field level, any discrepancies in cataloging policy may appear random and incomprehensible to the researcher or linked data practitioner. Equally, the detection and resolution of these discrepancies by a program / computer script may prove all the more challenging.
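
A program working from record-level values alone might proceed along the lines of the following sketch. The function name and the sample leader are assumptions made for illustration; the only values tested are the two shown above, LDR/18 "a" and 040 $e.

```python
def content_standards(leader: str, field_040_e: list[str]) -> set[str]:
    """Collect record-level hints about the content standard(s) used.

    leader       -- the 24-character MARC 21 leader
    field_040_e  -- every 040 $e value found in the record (may be empty)
    """
    hints = set()
    if len(leader) > 18 and leader[18] == "a":  # LDR/18 'a' signals AACR 2
        hints.add("aacr2")
    for code in field_040_e:
        hints.add(code.strip().lower())
    return hints


# Hypothetical leader with 'a' in position 18.
leader = "00000nam a2200000 a 4500"
hybrid = content_standards(leader, ["rda"])  # both standards are flagged
```

For a hybrid record the result names both standards, but it cannot say which fields were created under which one; that would require a source consulted at the field or subfield level.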

The MARC 21 formats do provide some coverage of data provenance information at the field and subfield as well as record level. In the case of an author agent, this may be located in subfield $5. The following provides one such example in an authority record:

680 ## $i May be combined with geographic name in the form $a Baroque sculpture-Germany. $5 CaQMCCA

However, the presence of $5 is piecemeal across the formats.

In the case of a source consulted, this may be recorded at the field and subfield level using $2. The following provides two such examples in the bibliographic record:

650 #7 $a JUVENILE FICTION / Historical / General. $2 bisacsh
655 #7 $a Juvenile works. $2 fast

However, $2 is confined to usage in the context of vocabulary encoding schemes.

The current scope of subfield $2 raises an additional issue of granular data provenance as regards the distinction which RDA makes between vocabulary encoding schemes and string encoding schemes. The RDA glossary defines a vocabulary encoding scheme (VES) as follows:

"A named structured list of representations of controlled values for elements. A vocabulary encoding scheme includes an RDA list of terms or their corresponding value vocabularies in the RDA Registry, an ISO code list, a standard terminology, an authority control system, etc. Simple keyword indexes are excluded."

Meanwhile, the RDA glossary defines a string encoding scheme (SES) as follows:

"A set of string values and an associated set of rules that describe a mapping between that set of strings and a value of an element."

The distinctions between VES and SES cannot currently be expressed in MARC 21. Access points may be assigned the same $2 coding (or indicator values) whether or not they have been authorized. This ambiguity may be considered unproblematic in the context of a library system that provides look-up functionality from its bibliographic catalog to an authority file. However, such functionality is not universally the case. Furthermore, such contextualization may also be lacking if bibliographic data has been transformed into non-MARC formats for applications outside the library environment. If, for example, an access point carries the name and title authority source code "naf", then the end user may conclude that the value attached to it belongs to the vocabulary encoding scheme LCNAF. In fact, the access point may only have been constructed according to the string encoding scheme which is used to formulate values in LCNAF.

2.2.3. Recording methods and data provenance in MARC 21

MARC 21 does not currently support the full range of RDA recording methods for each element which can be used to record data provenance. Most of the data provenance related elements found in RDA can be expressed using any recording method: unstructured description, structured description, identifier and IRI. Providing support for this range of recording methods would better support human interpretation and machine actionability. For example, to take one recording method, the language of expression used to generate a MARC 21 based description can currently only be encoded as an identifier. The following provides an example of a language identifier for German recorded in the 040 field:

040 ## $b ger

If such an identifier could be recorded as a human readable value in the form of an unstructured or structured description, then it may more readily assist the recipient of a dataset in understanding the information which it contains.

Conversely, it is currently only possible to encode a recording source as an unstructured description. The following provides an example recorded in the 500 field:

500 ## $a Title from cover.

If such information could be recorded in a structured way using a VES or as linked data using an IRI (both of which RDA now supports), then this would render it more conducive to automated processing and linked data applications.

As regards the wider issue of linked data, many elements which RDA provides in the context of data provenance may be recorded using an IRI. At present, however, only one of these elements can be recorded as such in MARC 21. For illustration, in the following example, the content of subfield $1 represents a source consulted:

100 1# $a Obama, Michelle, $d 1964- $1 http://viaf.org/viaf/81404344

2.2.4. Recording data provenance and efficiency in MARC 21

It may be considered that expanding the coverage of data provenance information in the MARC formats will result in a more labor-intensive process of resource description. However, this could be minimized at the start of the cataloging process by using templates which prepopulate data provenance information in authority, bibliographic and other format records. In cases where records are derived from an external source and enhanced locally, data provenance information could be added using macros which enter standardized strings of metadata. In the current cataloging environment, these processes may already be used to populate data provenance information at the record level and, selectively, at the field level. Standardized LDR values and values contextualized by a source code in subfield $2 are two current examples of this approach.

2.2.5. Recording data provenance and interoperability in MARC 21

It may also be considered that the full range of data provenance elements and recording methods available for recording them are surplus to any single, local need. However, it should be noted that the recording of data provenance information is not a requirement in RDA. Rather, RDA sets out the practical benefits of recording data provenance information and a variety of options for doing so. One cataloging agency may choose to record a specific type or types of data provenance information, while another chooses to record others. The key requirement is that the data which they share remains interoperable. In order to avoid conflicts or duplicated effort in the recording of additional data provenance information, cataloging cooperatives may need to agree upon a set of best practices for doing so.

2.3. Options for Expanded Data Provenance Coverage in MARC 21

There are several ways in which the existing coverage of data provenance information in MARC 21 could be expanded from its present state. At one end of the spectrum, selective changes could be made which serve individual communities. At the other end, more general changes could be made in support of the community as a whole. The following options present three possible means by which data provenance information might be better supported in MARC 21. In each case, the relative benefits and drawbacks are considered. One or a combination of these different approaches may be considered preferable. Alternatively, the community may consider that no expansion to the existing coverage of data provenance in MARC 21 is necessary; if a more granular and flexible approach to data provenance is thought beneficial, then this might be realized outside the formats in a linked data context. Equally, it could be achieved using the MARCXML schema which is based on the MARC 21 standards.

2.3.1. OPTION 1 – Selective changes

Selective changes could be made to the existing MARC 21 formats over time which expand their coverage of data provenance information in ways which serve individual communities as and when these changes are required. At its most recent meetings held in Summer 2020, the MARC Advisory Committee considered several recommendations for changes to MARC 21 which were felt to have a bearing on data provenance. Papers dealing with manifestation statements, special coded dates and the date of assignment for Dewey Decimal numbers all raised questions as to how data provenance information should be recorded in these contexts.

An advantage of addressing data provenance issues on a case-by-case basis would be that each recommendation for change could be considered on its own merits. Changes could be introduced gradually over time, meaning that the burden of implementation would be spread across successive MARC 21 updates rather than concentrated into a single MARC 21 development cycle. Minimizing the changes which are made to MARC 21 in support of data provenance would reduce the need for system configurations which are considered unnecessary for the maintenance or development of local business processes or services. Limiting the scope of any changes which relate to data provenance would also help preserve the backwards compatibility of legacy data.

A disadvantage of the selective, gradual approach described above may be that it ends up perpetuating the uneven coverage of data provenance information which is already present across the MARC 21 formats. Under these circumstances, certain aspects of bibliographic description, authority control, etc. could be contextualized by data provenance at the subfield level, whereas others could not. Likewise, certain recording methods could be used to represent a category of data provenance, whereas others could not. An absence of the ability to express granular data provenance information in a variety of ways would maintain a status quo in which MARC 21 data may be considered ambiguous and inflexible in both traditional and emerging cataloging contexts.

2.3.2. OPTION 2 – Generalized changes

Generalized changes could be made to the existing MARC 21 formats in support of better recording data provenance at the field/subfield level. Approaches could include the deployment of capital letters, punctuation marks, mathematical symbols or other non-standard characters for use as content designation. Alternatively, a variety of standard subfield characters could be used for the same purpose across different fields in order to provide the necessary coverage. An additional solution could involve the use of field 883 (Metadata Provenance) in combination with subfield $8 (Field link and sequence number). Each of these possibilities is considered below:

2.3.2.1. Deployment of Non-Standard Subfield Delimiters

MARC 21’s record structure is partially based upon the Format for Information Exchange (ISO 2709) which, in combination with the American Standard Code for Information Interchange (ASCII), allows for the use of capitalization, punctuation and mathematical symbols as identifiers within data fields. A range of such characters could be defined as subfields across the formats in order to record data provenance information.

An advantage of using non-standard characters in order to record data provenance information is that it would offer field/subfield level coverage. Sufficient subfield codes would be available to represent all the subcategories of data provenance information set out by RDA and their full range of recording methods. Such an approach would also provide a consistent solution across the MARC 21 formats. In addition, adopting this as a method of recording data provenance information would avoid the requirement to implement $8, a subfield which remains unconfigured within many MARC 21 based library systems.

A disadvantage of introducing non-standard characters for subfield coding purposes is that it would go beyond the current scope of what the MARC 21 Formats: Background and Principles states may be used in this context. In setting out the range of delimiters which may be used, section 8.4.2 of the document states the following:

"Subfield codes in the MARC 21 formats consist of two characters--a delimiter [1F(16), 8-bit], followed by a data element identifier. Data element identifiers may be a lowercase alphabetic or a numeric character."

Even if it were decided to broaden the range of characters which can be used for subfield coding purposes in the MARC 21 formats, this may still result in problems for library management systems which are not set up to allow for a greater variety than that which is already possible.
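
The kind of check that a system might apply can be sketched as follows. The function names and the sample field data are assumptions made for illustration; the delimiter value 1F(16) and the rule for data element identifiers are taken from the passage quoted above.

```python
SUBFIELD_DELIMITER = "\x1f"  # delimiter 1F(16), as quoted above


def split_subfields(field_data: str) -> list[tuple[str, str]]:
    """Split raw field data into (identifier, value) pairs.

    Text before the first delimiter (the indicators) is dropped.
    """
    chunks = field_data.split(SUBFIELD_DELIMITER)[1:]
    return [(chunk[0], chunk[1:]) for chunk in chunks if chunk]


def is_standard_identifier(identifier: str) -> bool:
    """True for a single lowercase alphabetic or numeric ASCII character."""
    return (len(identifier) == 1 and identifier.isascii()
            and (identifier.islower() or identifier.isdigit()))


# Hypothetical field data: two blank indicators, then four subfields,
# two of which use non-standard identifiers.
raw = "  \x1faRadzimyn, Poland\x1fbSurfside, Fla.\x1f!DLC\x1fAidentifier"
pairs = split_subfields(raw)
rejected = [code for code, _ in pairs if not is_standard_identifier(code)]
```

A system which validates identifiers in this way would reject the last two subfields unless its configuration were changed.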

2.3.2.2. Deployment of Various Subfields to the Same Purpose Across Different Fields

Although no lower case alphabetic or numeric character remains wholly undefined in the MARC 21 formats, there are some characters which have rarely been used up until this point. For example, subfield $7 has hitherto only been deployed in two fields belonging to the Authority format and only twenty-two fields belonging to the Bibliographic format. In both cases alternative subfield characters are still free for use in fields where $7 has already been defined. The use of different subfields to record equivalent values is generally avoided in the MARC 21 formats, but is not entirely without precedent. For example, subfields $e and $j in the Bibliographic format may both carry a value for a relator term in circumstances where that term describes the relationship between a name and a work.

An advantage of using various coding across the formats for recording data provenance information is that it would offer field/subfield level coverage. Adopting this as a method of recording data provenance information would also avoid the requirement to implement $8. It may be argued that defining only one new subfield at the field level would not support the full range of categories and recording methods associated with data provenance information. However, subfield values for data provenance could be designed in such a way that they make up for this shortfall by including standardized character strings which denote the category and method applicable.

A disadvantage of using various subfields to record data provenance at the field level across the MARC 21 formats is that it would make the machine processing of such information a more complex task. Introducing standardized character strings to represent data provenance information may also require the overhead of developing and maintaining a new MARC 21 code list which supports this approach.

2.3.2.3. Deployment of Additional Subfields in Field 883

Field 883 is already defined to carry data provenance information in all of the MARC 21 formats. It is currently defined as follows:

"Used to provide information about the provenance of metadata in data fields in the record. Field 883 contains a link to the field to which it pertains."

Although some of the subfield codes which field 883 currently contains do not align with RDA's categories of data provenance information, other subfields do. For example, a correspondence exists between subfield $q (Assigning or generating agency) on the one hand and an author agent on the other. Further subfields could be added to field 883 in order to record RDA's other categories of data provenance information on the proviso that it is not necessary to record either subfield $a (Process), $b (Confidence value) or $u (Uniform Resource Identifier) in the same string as these. The concept of a process, a confidence value, and a URI used to identify a process do not correspond to RDA's present element set or its guidance on the recording of data provenance information.

An advantage of expanding field 883 with subfields which support the recording of additional data provenance information is that it would offer the means to provide coverage at the field/subfield level. In addition, the same approach could be used consistently across the MARC 21 formats. Sufficient alphabetic subfield codes are still available to represent all the subcategories of data provenance information set out by RDA and sufficient numeric subfield codes are available to represent its full range of recording methods.

A disadvantage of using field 883 to record data provenance information at field level is that it would require implementation of subfield $8. If usage of the field were limited to a situation in which a process and confidence value must be recorded in subfields $a, $b and $u, then it would be out of scope for application in an RDA context.
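
What implementing subfield $8 would involve for a system can be sketched as below. This is an illustration under stated assumptions rather than a defined algorithm: the in-memory shape of a field, the function names and the sample $8 values (including their link type suffix) are all hypothetical, and only the leading link number of each $8 value is compared.

```python
from collections import defaultdict

# A field is modeled as (tag, [(subfield code, value), ...]); a hypothetical
# in-memory shape, not something defined by MARC 21.
Field = tuple[str, list[tuple[str, str]]]


def link_number(subfield_8: str) -> str:
    """Return the leading digits of a $8 value, e.g. '1' from '1\\p'."""
    digits = ""
    for char in subfield_8:
        if not char.isdigit():
            break
        digits += char
    return digits


def link_numbers(field: Field) -> set[str]:
    """Collect the link numbers carried in a field's $8 subfields."""
    _, subs = field
    return {link_number(v) for c, v in subs if c == "8" and link_number(v)}


def provenance_by_field(record: list[Field]) -> dict[int, list[Field]]:
    """Map the position of each non-883 field to the 883 fields linked to it."""
    provenance = defaultdict(list)
    for field in record:
        if field[0] == "883":
            for number in link_numbers(field):
                provenance[number].append(field)
    linked = {}
    for index, field in enumerate(record):
        if field[0] == "883":
            continue
        hits = [p for number in sorted(link_numbers(field))
                for p in provenance.get(number, [])]
        if hits:
            linked[index] = hits
    return linked


# Hypothetical record: the 370 field is linked to an 883 carrying $q.
record = [
    ("370", [("8", "1\\p"), ("a", "Radzimyn, Poland")]),
    ("373", [("a", "Faculty of Life Science, Manchester University")]),
    ("883", [("8", "1\\p"), ("q", "DLC")]),
]
linked = provenance_by_field(record)
```

Fields without a matching $8, such as the 373 field in the sample, carry no provenance under this approach.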

2.3.3. OPTION 3 – New formats

A set of mirror formats could be created in support of recording data provenance information at the field/subfield level in the existing MARC 21 formats. Each mirror format could be linked to its equivalent, existing format using an identifier recorded in subfield $0 (Authority record control number or standard number) of the appropriate field. Each mirror format's field structure would be analogous to that of an authority record, including:

An advantage of creating new mirror formats in order to record data provenance information is that it would offer widespread field and subfield level coverage in the existing formats. Existing instances of subfield $0 in the current formats could provide links to their mirror counterparts, although additional $0 subfields would need to be defined in fields which do not currently hold them. Depending on local preference, an established heading in the mirror format could correspond to a particular recording method. See also from tracing fields could list the alternative recording methods which are available for a particular category of data provenance information. Adopting this as a method of recording data provenance information would also avoid the requirement to implement subfield $8.

A disadvantage of creating mirror formats in order to record data provenance information is that they would only be suitable in the context of data which is entered in a structured form, identifier or IRI. If data is unstructured, then it is ill suited for purposes of authority control; a separate approach would need to be taken in order to support this recording method for data provenance information. An approach to recording data provenance information at the field/subfield level which requires the creation of additional formats would also carry with it development and maintenance overheads which would not arise to the same extent if a solution were adopted in the existing formats. It would be insufficient to define only one new format to handle aspects of data provenance in the other formats. This is because the existing formats often use the same tags and subfields, but for different purposes. Hence, multiple new formats would be required to align with those which already exist.

2.4. Adapting Subfield $2 to Support the Distinction Between VES and SES

Any one of the options set out above may include a consideration of how MARC 21 makes distinctions between vocabulary encoding schemes (VES) and string encoding schemes (SES) in the future. In order to address this issue, an additional, string encoding scheme specific code list could be created for use in the context of subfield $2. As regards amending the existing formats, it would be necessary to make $2 a repeatable subfield so that vocabulary encoding schemes and string encoding schemes could be recorded discretely.

3. EXAMPLES

The following examples model the changes which are discussed under each of the options set out above. These are intended to be illustrative rather than prescriptive. In each case, the RDA element label, granularity and recording method of the data provenance information are provided for context. Additional notes are supplied where necessary.

3.1. OPTION 1 – Selective changes

Example 1

881 ## $c The opening of heauen gates, or The ready way to euerlasting life, deliuered in a most familier dialogue, betweene reason and religion, touching praedestination, Gods word, and mans free-will, to the vnderstanding of the weakest capacitie, and the confirming of the more strong.$d The second edition. $c By Arthvr Dent, preacher of the word of God, at Southshoobery in Essex. $e Imprinted at London$ffor Iohn Wright,$gand are to bee sold at his shop at Christ-Church gate. $f1611 $z statement found on title page.

Element : recording source
Granularity : metadata description set (MARC 21 bibliographic format)
Recording method : unstructured

Additional notes:
A value for recording source is modeled in 881 $z.

Example 2

046 ## $k 1959 $2 edtf $v Britannica online, April 16, 2020

Element : source consulted
Granularity : metadata description set (MARC 21 bibliographic format)
Recording method : unstructured

Additional notes:
A value for source consulted is modeled in 046 $v.

Example 3

082 00 $a 909.07 $e 20190523

Element : related timespan of work (recording a timespan for validity of metadata)
Granularity : metadata statement (MARC 21 bibliographic / authority / community formats)
Recording method : structured

Additional Notes:
A value for related timespan of work is modeled in 082 $e.

3.2. OPTION 2 – Generalized changes

Making generalized changes to existing MARC 21 formats in support of data provenance:

3.2.1. Deployment of Non-Standard Subfield Delimiters

Example 1

370 ## $a Radzimyn, Poland $b Surfside, Fla. $! DLC $A identifier

Element : author agent (recording an agent who records metadata)
Granularity : metadata description set (MARC 21 authority format)
Recording method : identifier

Additional Notes:
A value for author agent is modeled in 370 $! ; a value for the recording method is modeled in $A.

Example 2

500 1# $a Bland, Robin $& 20190904105359.0 $A structured

Element : date of publication (recording a timespan when metadata are published)
Granularity : metadata statement (MARC 21 authority format)
Recording method : structured

Additional Notes:
A value for date of publication is modeled in 500 $& ; a value for the recording method is modeled in $A.

Example 3

373 ## $a Faculty of Life Science, Manchester University $s 2005 $! Uk $A identifier $+ s

Element : author agent (recording an agent who records metadata)
Granularity : metadata statement (MARC 21 authority format)
Recording method : identifier

Additional Notes:
A value for author agent is modeled in 373 $! ; a value for the recording method is modeled in $A ; a value for the subfield specified is modeled in $+.
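
The following Python sketch illustrates how a processing application might separate the proposed non-standard delimiters from ordinary subfields. It is not part of the paper's proposal: the function names and the mapping table are hypothetical, the delimiter meanings are read off the three examples above, and the code assumes that "$" never occurs inside a subfield value.

```python
import re

# Delimiters taken from the three examples above. They are proposals in this
# paper, not subfield codes defined in MARC 21.
PROVENANCE_CODES = {
    "!": "author agent",
    "&": "date of publication",
    "A": "recording method",
    "+": "subfield specified",
}


def split_subfields(field_text):
    """Split display-form subfields into (code, value) pairs.

    Assumes "$" never occurs inside a subfield value.
    """
    pairs = re.findall(r"\$(\S)\s*([^$]*)", field_text)
    return [(code, value.strip()) for code, value in pairs]


def provenance_of(field_text):
    """Return only the data provenance subfields, keyed by what they record."""
    return {
        PROVENANCE_CODES[code]: value
        for code, value in split_subfields(field_text)
        if code in PROVENANCE_CODES
    }


example_3 = ("$a Faculty of Life Science, Manchester University "
             "$s 2005 $! Uk $A identifier $+ s")
print(provenance_of(example_3))
```

Because the provenance delimiters are drawn from outside the usual lowercase and numeric codes, a single lookup table is enough to tell description from provenance in any field.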

3.2.2. Deployment of Various Subfields to the Same Purpose Across Different Fields

Example 1

245 00 $a Thorium, preparation and properties / $c J. F. Smith ... [et al.]. $7 dpesc/dprst[AACR 2]

Element : source consulted (recording a content standard used for metadata)
Granularity : metadata description set (MARC 21 bibliographic format)
Recording method : structured

Additional Notes:
A code for source consulted, a code for the recording method and an associated value are modeled in 245 $7. The code "dpesc" represents the data provenance element "source consulted"; the code "dprst" represents the recording method "structured" ; "[AACR 2]" represents the value associated with the element and its recording method. Other standardized abbreviations would be required to represent different data provenance elements and recording methods. Punctuation is added to the string in support of machine processing.

Example 2

500 ## $a Title should read : Hierarchy in organizations. $7 dpesc/dprid[rda]

Element : source consulted (recording a content standard used for metadata)
Granularity : metadata statement (MARC 21 bibliographic format)
Recording method : identifier
 
Additional Notes:
A code for source consulted, a code for the recording method and an associated value are modeled in 500 $7. The code "dpesc" represents the data provenance element "source consulted"; the code "dprid" represents the recording method "identifier" ; "[rda]" represents the value associated with the element and its recording method. Other standardized abbreviations would be required to represent different data provenance elements and recording methods. Punctuation is added to the string in support of machine processing.

Example 3

245 10 $a Songs, duets, trios, &c. in Fontainbleau; or, our way in France. $b A [sic] comic opera. As performed at the Theatre-Royal in Covent-Garden. Written by Mr. O'Keeffe. $7 dpesc/dprst/dpsfb[AACR 2]

Element : source consulted (recording a content standard used for metadata)
Granularity : metadata statement (MARC 21 bibliographic format)
Recording method : structured

Additional Notes:
A code for source consulted, a code for the recording method, a code for the subfield specified and an associated value are modeled in 245 $7. The code "dpesc" represents the data provenance element "source consulted"; the code "dprst" represents the recording method "structured" ; the code "dpsfb" represents the subfield specified; "[AACR 2]" represents the value associated with the element and its recording method. Other standardized abbreviations would be required to represent different data provenance elements and recording methods. Punctuation is added to the string in support of machine processing.
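
As an illustration of how the punctuation in $7 could support machine processing, the Python sketch below parses strings of the form shown in the three examples. It is a sketch under stated assumptions, not a defined syntax: the code tables contain only the codes that appear in this section, the reading of "dpsfb" as "dpsf" plus the subfield code "b" is an assumption, and the function name is hypothetical.

```python
import re

# Only the codes that appear in the examples of this section.
ELEMENTS = {"dpesc": "source consulted"}
METHODS = {"dprst": "structured", "dprid": "identifier"}

# Slash-separated codes followed by a value in square brackets.
PATTERN = re.compile(r"^(?P<codes>[a-z]+(?:/[a-z]+)*)\[(?P<value>[^\]]*)\]$")


def parse_provenance_7(text):
    """Parse a $7 string such as 'dpesc/dprst/dpsfb[AACR 2]'."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a provenance string: {text!r}")
    result = {"element": None, "method": None, "subfield": None,
              "value": match.group("value")}
    for code in match.group("codes").split("/"):
        if code in ELEMENTS:
            result["element"] = ELEMENTS[code]
        elif code in METHODS:
            result["method"] = METHODS[code]
        elif code.startswith("dpsf") and len(code) == 5:
            # ASSUMPTION: the final character names the subfield specified.
            result["subfield"] = code[4]
        else:
            raise ValueError(f"unknown provenance code: {code!r}")
    return result


print(parse_provenance_7("dpesc/dprst/dpsfb[AACR 2]"))
print(parse_provenance_7("dpesc/dprid[rda]"))
```

A fixed separator and bracket convention of this kind means one parser could serve every field that carries $7, which is the main attraction of this option.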

3.2.3. Deployment of Additional Subfields in Field 883

Example 1

100 1# $8 3\p $0 (DE-588)1215943776 $0 https://d-nb.info/gnd/1215943776 $0 (DE-101)1215943776 $a Tolkiehn, Niels $d 1985- $e Verfasser $4 aut $2 gnd
883 ## $8 3\p $q DE-101 $e ger $3 identifier

Element : language of expression (recording a language of description)
Granularity : metadata description set (MARC 21 bibliographic format)
Recording method : identifier

Additional Notes:
A value for language of expression is modeled in 883 $e; a value for the recording method is modeled in $3.

Example 2

041 ## $81\p $a eng
883 0# $81\p$a aep-lc $c 1,00000 $d 20190913 $q DE-101$u https://d-nb.info/provenance/plan#aep-lc $f MARC Code List for Languages $3 unstructured

Element : source consulted (recording a content standard used for metadata)
Granularity : metadata statement (MARC 21 bibliographic format)
Recording method : unstructured

Additional Notes:
A value for source consulted is modeled in 883 $f ; a value for the recording method is modeled in $3.

Example 3

700 1# $84\p $a Rapp, Christof $e Akademischer Betreuer $4 dgs
883 1# $84\p $a npi $d 20200824 $q DE-101 $e ger $3 identifier $7 e

Element : language of expression (recording a language of description)
Granularity : metadata statement (MARC 21 bibliographic format)
Recording method : identifier

Additional Notes:
A value for language of expression is modeled in 883 $e ; a value for the recording method is modeled in $3; a value for the subfield specified is modeled in $7.
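
The pairing of each 883 field with the field it describes relies on matching $8 values. The Python sketch below shows one way this matching might be carried out. The record representation (a list of tag and subfield-pair tuples) and all function names are hypothetical; the $8 value is read as a linking number, an optional sequence number after a full stop, and a field link type after the backslash, with "p" taken as the link type used for provenance in the examples.

```python
def link_key(subfield_8):
    """Return (linking number, link type) from a $8 value such as '3\\p'."""
    number, _, link_type = subfield_8.strip().partition("\\")
    # Drop an optional sequence number after the full stop, e.g. "1.2\\p".
    return number.split(".")[0], link_type


def first(subfields, code):
    """Return the first value for a subfield code, or None."""
    return next((value for c, value in subfields if c == code), None)


def provenance_by_field(fields):
    """Pair each linked field with the subfields of its 883 field."""
    statements = {}
    for tag, subfields in fields:
        link = first(subfields, "8")
        if tag == "883" and link is not None:
            number, link_type = link_key(link)
            if link_type == "p":
                statements[number] = subfields
    pairs = []
    for tag, subfields in fields:
        link = first(subfields, "8")
        if tag != "883" and link is not None:
            number, link_type = link_key(link)
            if link_type == "p" and number in statements:
                pairs.append((tag, statements[number]))
    return pairs


# Example 2 above, reduced to the subfields needed for the illustration.
record = [
    ("041", [("8", "1\\p"), ("a", "eng")]),
    ("883", [("8", "1\\p"), ("a", "aep-lc"), ("q", "DE-101"),
             ("f", "MARC Code List for Languages"), ("3", "unstructured")]),
]
for tag, provenance in provenance_by_field(record):
    print(tag, first(provenance, "f"), first(provenance, "3"))
```

Here $f and $3 are the additional 883 subfields modeled in the examples, so an application written against the current definition of field 883 would need to be extended to read them.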

3.3. OPTION 3 – New formats


Example 1

(MARC 21 Authority Format)

100 0# $aGautama Buddha$vEarly works to 1800 $0 0123456789

(Mirror format : script element record)

008 34 / s ; 008 35 / i
001 0123456789
100 ## $a (B $v (B
500 ## $a Latin $v Latin

Element : script (recording a script of description)
Granularity : metadata description set (MARC 21 authority format)
Recording method : identifier (established heading) ; structured (see also from reference)

Additional Notes:
A record control number which links to the data provenance mirror format record is modeled in 100 $0 of the Authority format record. A record control number which links to the Authority format record is modeled in 001 of the mirror format record;  a code for script is modeled in 008/34 of the mirror format record ; a code for the recording method established is modeled in 008/35 of the mirror format record; an established heading value for script is modeled in 100 $a and $v of the mirror format record; a see also from tracing value for script is modeled in 500 $a and $v of the mirror format record. Other 008 codes would be required to represent different data provenance elements and recording methods.

Example 2

(MARC 21 Bibliographic Format)

245 00 $a [Seventeen poems]. $0 0123456789

(Mirror format : source consulted element record)

008 34 / c ; 008 35 / i
001 0123456789
245 ## $a i 
545 ## $a ISBD

Element : source consulted (recording a transcription standard used for metadata)
Granularity : metadata statement (MARC 21 bibliographic format)
Recording method : identifier (established heading) ; structured (see also from reference)   

Additional Notes:
A record control number which links to the data provenance mirror format record is modeled in 245 $0 of the Bibliographic format record. A record control number which links to the Bibliographic format record is modeled in 001 of the mirror format record; a code for source consulted is modeled in 008/34 of the mirror format record ; a code for the recording method established is modeled in 008/35 of the mirror format record ; an established heading value for source consulted is modeled in 245 $a of the mirror format record; a see also from tracing value for source consulted is modeled in 545 $a of the mirror format record. Other 008 codes would be required to represent different data provenance elements and recording methods.
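
The link between a source record and its mirror format record can be sketched as a simple lookup on the record control number. The Python below is illustrative only: the mirror format does not exist, the in-memory representation and function names are hypothetical, the 40-character length of field 008 is an assumption, and the code tables hold only the 008/34 and 008/35 values used in the two examples.

```python
# Codes taken from the two examples above; a real format would define more.
ELEMENT_CODES = {"s": "script", "c": "source consulted"}   # 008/34
METHOD_CODES = {"i": "identifier"}                          # 008/35


def make_008(element_code, method_code, length=40):
    """Build a blank-filled 008 with codes at positions 34 and 35.

    ASSUMPTION: the mirror format 008 has 40 character positions.
    """
    chars = [" "] * length
    chars[34] = element_code
    chars[35] = method_code
    return "".join(chars)


def resolve_provenance(control_number, mirror_records):
    """Follow a $0 value to the mirror record whose 001 matches it."""
    record = mirror_records.get(control_number)
    if record is None:
        return None
    fixed = record["008"]
    return {
        "element": ELEMENT_CODES.get(fixed[34]),
        "method": METHOD_CODES.get(fixed[35]),
        "fields": record["fields"],
    }


# Example 2 above: the mirror record for 245 $0 0123456789, keyed by its 001.
mirror_records = {
    "0123456789": {
        "008": make_008("c", "i"),
        "fields": {"245": {"a": "i"}, "545": {"a": "ISBD"}},
    }
}
print(resolve_provenance("0123456789", mirror_records))
```

The sketch also makes the cost of this option visible: every provenance statement requires a second record and a lookup, where the other options keep the information in the source record itself.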

3.4. Adapting Subfield $2 to Support the Distinction Between VES and SES


Example 1

100 1# $a Quiery, Greg. $2 nafses

Element : source consulted (recording a string encoding scheme for metadata)
Granularity : metadata statement (MARC 21 bibliographic format)
Recording method : unstructured

Additional Notes:
A code for string encoding scheme is modeled in the first iteration of 100 $2 for an unauthorized access point.

Example 2

100 1# $a Zhao, Yingzhu. $2 nafses $2 naf

Element : source consulted (recording a string encoding scheme for metadata)
Granularity : metadata statement (MARC 21 bibliographic format)
Recording method : unstructured

Additional Notes:
A code for string encoding scheme is modeled in the first iteration of 100 $2 for an authorized access point.
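
The distinction drawn in the two examples could be read by an application along the following lines. This Python sketch rests on an assumption that the examples suggest but do not state: that a string encoding scheme code is formed by appending "ses" to the vocabulary encoding scheme code, so that "nafses" and "naf" form a pair. The function names are hypothetical, and the suffix test would misclassify any existing source code that happens to end in "ses".

```python
def classify_sources(subfield_2_values):
    """Sort $2 codes into string and vocabulary encoding schemes."""
    schemes = {"ses": [], "ves": []}
    for code in subfield_2_values:
        # ASSUMPTION: a trailing "ses" marks a string encoding scheme.
        is_ses = code.endswith("ses") and len(code) > 3
        schemes["ses" if is_ses else "ves"].append(code)
    return schemes


def is_authorized(subfield_2_values):
    """Treat an access point as authorized when a VES code is present."""
    return bool(classify_sources(subfield_2_values)["ves"])


print(is_authorized(["nafses"]))          # Example 1
print(is_authorized(["nafses", "naf"]))   # Example 2
```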

4. BIBFRAME DISCUSSION

BIBFRAME allows the recording of provenance-type information. Once the change to MARC is determined, any changes needed as a result would be analyzed further.

5. QUESTIONS FOR DISCUSSION

5.1. Is the case for expanding MARC 21’s accommodation of data provenance to better support established and emerging applications sufficiently articulated? (See 2.2.1. - 2.2.3.)

5.2. Have the overall challenges and mitigating strategies for expanding accommodation been sufficiently articulated? (See 2.2.4. - 2.2.5.)

5.3. Of the three options listed, are there any advantages or disadvantages which have not been addressed? (See 2.3.)

5.4. Of the three options listed at 2.3.1 - 2.3.3, which is considered preferable and why?

5.5. If option two, which of its subcategories is considered preferable and why?

5.6. If option two or three, is a field or subfield level approach preferable and why?

5.7. If option two or three, is an approach which specifies the recording method preferable and why?

5.8. If another option is preferred, then what would this be?

5.9. Is the suggested approach to distinguish between vocabulary encoding schemes and string encoding schemes an acceptable solution? (See 2.4.)

5.10. If another approach to distinguishing between vocabulary and string encoding schemes is preferred, then what would this be?

5.11. Is there anything else which should be taken into account?


( 04/28/2021 )