MARC DISCUSSION PAPER NO. 2018-DP07

DATE: May 25, 2018
REVISED:

NAME: Designating Sources for Names in the MARC 21 Bibliographic Format

SOURCE: PCC Task Group on URIs in MARC

SUMMARY: This paper explores some reasons for extending the use of $2 for source vocabulary to the 1XX and 7XX name entry fields in the Bibliographic format, and the implications of doing so.

KEYWORDS: Subfield $2, in name fields (BD); Subfield $2, in title fields (BD); Source of heading or term (BD)

RELATED:

STATUS/COMMENTS:
05/25/18 – Made available to the MARC community for discussion.

06/24/18 – Results of MARC Advisory Committee discussion: There was general support from MAC for the use case demonstrated in the paper, though there was consensus that title complexities should be explored separately from names. Most felt that $2 should be optional. The discussion paper will return as a proposal, focusing on names, with perhaps the addition of fields 130 and 730 for titles. MAC expressed a desire for the complexities of the authority and RWO URIs to be better explored in the narrower context of names.

Discussion Paper No. 2018-DP07: Designating Sources for Names

1. BACKGROUND

Library cataloging practice acknowledges the use of alternative source vocabularies for a wide range of data elements. MARC provisions for indicating source vocabulary have long existed for subjects, and have also been made available for other fields, referencing standards and vocabularies external to MARC. Data elements covered by these provisions include geographical entities, languages, dates, demographic categories, and genre/form data. MARC allows libraries to specify the source vocabulary by giving either an appropriate second indicator value or a MARC source code in $2.

This information is used to manage controlled vocabulary terms in bibliographic records. For example, specifying the source vocabulary allows authority maintenance applications to validate terms against the appropriate source file, and allows linked data conversion algorithms to output URIs representing the entities given in the descriptions.

Names are largely absent from these provisions; the sole exception is geographic names in MARC field 751. Historically, the operating assumption appears to have been that a library would apply a single authority file, or in any case a single set of heading construction rules, to its entire catalog. However, identity management for persons and organizations has been a major area of effort for libraries in recent years. Alternatives to national library authority files have emerged and increasingly coexist with them in library catalogs. They include VIAF, ISNI, ULAN, and ORCID; there is also a growing number of locally or regionally administered identity management systems such as Opaque Namespace or UNT Names.

2. DISCUSSION

MARC employs two methods to specify the source vocabulary used with a given field. Both are in use for the 6XX subject heading block. One is to define a second indicator value designating the source vocabulary. The other is to give the MARC source code for the relevant vocabulary in $2. (If the latter method is used in 6XX, a second indicator value of 7 is given.) The $2 method has the advantage of being able to accommodate an arbitrary number of source vocabularies. It has been adopted for a wide range of MARC fields, including many of the 01X-09X coded data fields and 3XX descriptive fields. This is the method under discussion in this paper. If implemented in MARC, it would entail defining $2 as follows, at least for 100, 110, 111, 700, 710, and 711, and potentially also (see discussion below) for 130, 240, 730, 758, 800, 810, 811, and 830:

$2 - Source of heading or term (NR)
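The two methods described above can be illustrated with a minimal sketch of how an application might resolve the source vocabulary of a subject field such as 650. The representation of subfields as (code, value) pairs and the function name are assumptions made for this illustration; the indicator labels are paraphrased from the MARC 21 Bibliographic format.

```python
# Second indicator values for subject fields such as 650 (labels paraphrased).
# Value 7 defers to the source code carried in subfield $2.
SUBJECT_IND2 = {
    "0": "Library of Congress Subject Headings",
    "1": "LC subject headings for children's literature",
    "2": "Medical Subject Headings",
    "3": "National Agricultural Library subject authority file",
    "4": "Source not specified",
    "5": "Canadian Subject Headings",
    "6": "Répertoire de vedettes-matière",
}


def subject_source(ind2, subfields):
    """Return a label or source code for the vocabulary of a subject field.

    `subfields` is a list of (code, value) pairs, a hypothetical
    in-memory representation chosen for this sketch.
    """
    if ind2 == "7":
        # Method 2: the source code is given explicitly in $2.
        for code, value in subfields:
            if code == "2":
                return value
        return None  # indicator 7 promises a $2, but none is present
    # Method 1: the indicator value itself designates the vocabulary.
    return SUBJECT_IND2.get(ind2)


print(subject_source("0", [("a", "Cataloging")]))
print(subject_source("7", [("a", "Katalogisierung"), ("2", "gnd")]))
```

The indicator table is closed: supporting a new vocabulary through it would require a format change, whereas the $2 branch accepts any code from the relevant source code list.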

URIs and identifiers

Names from linked data sources such as VIAF or ORCID are often given in conjunction with URIs. An argument could be made that when a URI is given with a name, it is not necessary to give the source in $2 because (1) a linked data URI can be dereferenced to RDF that will identify the source, and (2) the source can be identified by parsing the link path.

However, neither of these assumptions is reliable. Existing applications that handle MARC data are rarely able to incorporate RDF lookup routines into their authority maintenance procedures. For the foreseeable future, explicit identification of the source vocabulary will greatly facilitate maintenance of authorities data.

Although it is often possible to read the source vocabulary from the link address, additional parsing rules are needed to accomplish this task. With the variety of URI conventions in use, such rules would have to be complex. For example, many sources have canonical URIs with a pattern similar to the following: http://isni.org/isni/[identifier]

However, others use a special prefix: http://sws.geonames.org/[identifier]/

Still others, such as the following widely used sources, embed the designation of the vocabulary further down the address path:
http://id.loc.gov/authorities/[vocabulary code]/[identifier]
http://vocab.getty.edu/[vocabulary code]/[identifier]
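To show why such parsing rules become complex, the following is a minimal sketch covering only the four URI patterns above. The function name and the labels it returns are hypothetical and are not MARC source codes; every additional vocabulary would need its own rule.

```python
from urllib.parse import urlparse


def source_from_uri(uri):
    """Guess the source vocabulary of a URI from its host and path.

    Returns an ad hoc label (an assumption of this sketch), or None
    when no rule matches.
    """
    parsed = urlparse(uri)
    host = parsed.netloc.lower()
    segments = [s for s in parsed.path.split("/") if s]

    # Pattern 1: vocabulary named in the host and the first path segment.
    if host == "isni.org" and segments[:1] == ["isni"]:
        return "isni"
    # Pattern 2: vocabulary identified only by a special host prefix.
    if host == "sws.geonames.org":
        return "geonames"
    # Pattern 3: vocabulary code embedded further down the path.
    if host == "id.loc.gov" and len(segments) >= 3 and segments[0] == "authorities":
        return "id.loc.gov/" + segments[1]
    if host == "vocab.getty.edu" and len(segments) >= 2:
        return "getty/" + segments[0]
    return None


print(source_from_uri("http://id.loc.gov/authorities/names/n81070862"))
print(source_from_uri("http://vocab.getty.edu/ulan/500306574"))
```

A URI from any source without a rule, such as https://orcid.org/0000-0001-6568-3357, returns None here, which is the maintenance burden an explicit $2 would avoid.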

Sometimes an identifier is not given, even though the name is intended to be validated against a particular source vocabulary. For example, many libraries employ workflows in which staff enter names manually, without reference to an authority file, for later batch reconciliation. Such workflows may include a subsequent step in which names that find no match in the relevant vocabulary are submitted as new entries. The ability to specify the relevant source vocabulary would be helpful in such workflows.

Finally, it is important to remember that in some vocabularies the terms are not in fact associated with a stable authority or identifier. This is often true of local schemes. Providing for these in $2 will allow these terms to be used (or retained) without the risk of confusion with other vocabularies.

Multiple occurrences of $0/$1

In general, MARC defines $2 to refer to a term rather than to the identifier or identifiers given in $0 (Authority record control number or standard number) or $1 (Real World Object URI). Although it is implied that the $0 or $1 will refer to the same entity as the term, MARC is not specific about how an identifier given in $0 or $1 relates to the textual label for the entity given in the same field. Ambiguities can arise when multiple values of $0 or $1 are given corresponding to different source vocabularies, particularly in implementations where identifiers are used to facilitate headings maintenance. For example, in a case such as the following, which identifier “controls” the heading?

110 2# $a Camerata Academica Salzburg $0 http://id.loc.gov/authorities/names/n81070862 $1 http://d-nb.info/gnd/1212457-6

Defining $2 for name fields would resolve the ambiguity in these cases by allowing the library to specify which source the heading in $a should conform to.
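As a rough sketch of how a maintenance application might use $2 to decide which identifier controls the heading, consider the following. The pairing of the source codes "naf" and "gnd" with the URI prefixes from the example above is an assumption made for illustration, as are the function name and the (code, value) representation of subfields.

```python
# Hypothetical mapping from source codes to URI prefixes. The prefixes come
# from the example field; their pairing with these codes is assumed.
SOURCE_PREFIXES = {
    "naf": "http://id.loc.gov/authorities/names/",
    "gnd": "http://d-nb.info/gnd/",
}


def controlling_identifier(subfields):
    """Return the $0/$1 value that belongs to the source named in $2.

    Returns None when $2 is absent or unknown, or when no identifier
    in the field matches the declared source.
    """
    source = next((v for c, v in subfields if c == "2"), None)
    prefix = SOURCE_PREFIXES.get(source)
    if prefix is None:
        return None
    for code, value in subfields:
        if code in ("0", "1") and value.startswith(prefix):
            return value
    return None


field = [
    ("a", "Camerata Academica Salzburg"),
    ("0", "http://id.loc.gov/authorities/names/n81070862"),
    ("1", "http://d-nb.info/gnd/1212457-6"),
    ("2", "gnd"),
]
print(controlling_identifier(field))
```

Without the $2 subfield the function returns None, mirroring the ambiguity described above: nothing in the field says which of the two identifiers the heading in $a should conform to.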

Titles

In addition to names, the MARC format provides for controlled name-title access points in 100, 110, 111, 700, 710, and 711. Although these are not the main focus of this discussion paper, the same considerations about source vocabularies that apply to names would also apply to names given in conjunction with titles. Introducing $2 in the aforementioned fields would allow source vocabularies to be specified for name-title headings as well.

If sources can be specified for name-title headings, it should arguably be possible also to specify them for title fields. The latter would include 130, 730, and 830. In addition, series titles in 800, 810, 811, and 830 are constructed similarly to 7XX titles. Falling outside traditional models for controlled access points, but also raising the question of how to indicate source vocabulary, is the recently introduced 758 field. It may be worth considering the possibility of defining $2 for each of these fields.

Uniform titles coded in 240 present a complication. Name-title headings with a 240 title are always given in conjunction with a 100, 110, or 111. The presence of $2 in 1XX could be viewed as precluding the necessity of defining $2 for 240. However, it will sometimes be desirable to reconcile or convert the name in 1XX independently of the title, and defining $2 separately for 240 would make this easier.

Source codes

A MARC source code list exists for name and title authorities (https://www.loc.gov/standards/sourcelist/name-title.html), but these codes see limited use because there is currently no dedicated subfield for them in the MARC 1XX and 7XX fields. If $2 were to be defined for 1XX and 7XX, it would be logical to use the existing MARC source code list for names and titles. However, it may be desirable to expand the existing source code list to include widely used name vocabularies that are not currently listed there, such as ISNI, ORCID, and ULAN. Many of the relevant vocabularies are already present in the MARC list of standard identifier source codes (https://www.loc.gov/standards/sourcelist/standard-identifier.html).

Legacy data

We recognize that there is a large body of legacy data that does not explicitly identify the source vocabulary for names. Making $2 mandatory would invalidate a large number of legacy records, and in many scenarios it is unlikely there will be a reliable way to populate $2 retrospectively. It would also make the use of name fields quite restrictive, perhaps unduly so, since there are often good reasons to capture name data in designated MARC fields without reference to a specific rule set. Defining $2 as a non-mandatory subfield would allow long-established uses of the field to continue without interference, while permitting needed context to be given for name data from newer sources. The absence of $2 from a name field could be interpreted to mean, as it has in the past, that the data follows the relevant agency’s policies for records at the specified encoding level for a given rule set.
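The optional, non-repeatable reading of $2 could be handled along the following lines. This is only a sketch: the function name, the (code, value) subfield representation, and the placeholder default are assumptions, and a real system would substitute whatever the agency's policy implies.

```python
def name_source(subfields, agency_default="agency default"):
    """Resolve the source vocabulary of a 1XX/7XX name field.

    A missing $2 is not an error; the field falls back to the
    agency's default. More than one $2 is rejected because the
    subfield is sketched here as non-repeatable (NR).
    """
    sources = [value for code, value in subfields if code == "2"]
    if len(sources) > 1:
        raise ValueError("$2 is not repeatable")
    return sources[0] if sources else agency_default


# A legacy field without $2 remains valid and uses the default.
print(name_source([("a", "Creede, Thomas")]))
# A field from a newer source declares its vocabulary explicitly.
print(name_source([("a", "Ockerbloom, John Mark."), ("2", "orcid")]))
```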

Alternative approach

An alternative means of supporting multiple name vocabularies in MARC would be to reserve the 100, 110, 111, 700, 710, and 711 fields for names taken from traditional name authority files already in use and to use a different field for names from non-traditional sources. If this approach were taken, two possibilities could be considered:

(1) Define a new field, or fields, to carry non-traditional names.
(2) Expand the scope of 720, currently for uncontrolled names, to include names authorized by sources other than traditional authority files.

This approach would have the advantage of minimizing disruption to existing records. However, it has a number of significant drawbacks:

(1) It does not address cases where more than one authority file is in use for 100, 110, 111, 700, 710, or 711. Such cases do exist, for example where libraries maintain a local authority file that supplements a national one, or where different name sources are used for certain categories of material, such as dissertations.
(2) If a new field or fields are defined, or the definition of 720 is expanded significantly, it would likely require substantial changes both to indexing in existing systems and to conversion algorithms in order to allow handling of substantially the same kind of data that existing 1XX and 7XX fields already support.
(3) It could inhibit exchange of data between national communities. A GND heading, for example, would be “traditional” in relation to a record in a German catalogue but not in relation to a record in a predominantly English-language catalogue. Would 1XX/7XX fields have to be recoded when exchanged across national boundaries?
(4) Additional granularity would be needed in 720 or any new field or set of fields if existing distinctions among name types (personal, family, corporate body, geographical) are to be preserved.

3. EXAMPLES

The examples below may include, for purposes of illustration, source vocabularies drawn from the MARC Standard Identifier Source Codes list but not currently listed in the MARC source code list for names and titles. The last example illustrates the use of $2 for a series title field. Note that each of the headings shown below is formulated differently in the LC/NACO Authority File, or is not found in the NAF at all.

100 1# $a Ockerbloom, John Mark. $1 https://orcid.org/0000-0001-6568-3357 $2 orcid

100 0# $a Edgar Jones. $1 http://www.wikidata.org/entity/Q2343352 $2 wikidata

700 1# $a Creede, Thomas $d (1593-1617), $e printer. $2 cerl

710 2# $a スタジオジブリ $2 wndla

110 2# $a Iraq Museum, $e issuing body. $0 http://vocab.getty.edu/ulan/500306574 $2 gettyulan

830 #0 $a Serie nuestro mundo (Grupo Anaya) $2 cantic
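For illustration, fields written in the display form used above can be split into tag, indicators, and subfields, and the source read from $2. This is a minimal sketch, not a MARC parser: the function names are hypothetical, and the approach assumes that "$" never occurs inside a subfield value.

```python
def parse_field(line):
    """Parse a display-form field such as '100 1# $a Name $2 orcid'."""
    tag, indicators, rest = line.split(" ", 2)
    subfields = []
    # Each chunk after a "$" starts with the one-character subfield code.
    for chunk in rest.split("$")[1:]:
        if chunk:
            subfields.append((chunk[0], chunk[1:].strip()))
    return {
        "tag": tag,
        "ind1": indicators[0],
        "ind2": indicators[1],
        "subfields": subfields,
    }


def source_of(field):
    """Return the value of $2, or None if the field has no $2."""
    return next((v for c, v in field["subfields"] if c == "2"), None)


examples = [
    "100 1# $a Ockerbloom, John Mark. $1 https://orcid.org/0000-0001-6568-3357 $2 orcid",
    "710 2# $a スタジオジブリ $2 wndla",
    "830 #0 $a Serie nuestro mundo (Grupo Anaya) $2 cantic",
]
for line in examples:
    field = parse_field(line)
    print(field["tag"], source_of(field))
```

Because the source is read from a single, fixed subfield, the same two functions work for every example regardless of which vocabulary the heading comes from or whether an identifier is present.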

4. BIBFRAME DISCUSSION

Recording the source vocabulary for names could be useful for reconciling a name with the appropriate source vocabulary during conversion to BIBFRAME and supplying the relevant URI if it is not already in the field.

5. QUESTIONS FOR DISCUSSION

5.1. Is the need to indicate source vocabulary for names sufficiently demonstrated?

5.2. Should $2 be mandatory or optional?

5.3. Should $2 also be defined for title fields? If so, which specific fields should be included?

5.4. Should a new field or fields be defined (or the scope of 720 expanded) for names drawn from vocabularies other than traditional authority files?
