DISCUSSION PAPER NO.: 2004-DP04

DATE: May 24, 2004
REVISED:

NAME: Use of ISBNs and LCCNs in MARC 21 Bibliographic Records

SOURCE: The MARC of Quality; Karen Anspach Consulting

SUMMARY: This paper discusses the use of ISBNs and LCCNs in systems and the problems that arise when an ISBN or LCCN is recorded in the bibliographic record for a manifestation other than the one being cataloged. It suggests the possibility of defining a new subfield $y for inappropriate ISBNs/LCCNs in fields 020 and 010 respectively, or of expanding the definition of subfield $z to allow for the recording of these numbers.

KEYWORDS: Field 020 (BD,HD); Field 010 (BD, HD); International Standard Book Number; Library of Congress Control Number

RELATED:

STATUS/COMMENTS:

05/24/04 - Made available to the MARC 21 community for discussion.

06/27/04 - Results of the MARC Advisory Committee discussion - Participants discussed whether a new subfield should be defined in fields 010 and 020 to record "inappropriate" LCCNs and ISBNs. Some participants felt that a new subfield was not necessary; however, they did maintain that more specific instructions on how to code incorrect numbers were needed in the formats. After a straw poll of the participants was conducted, it was decided that Option 1 (Addition of a new MARC 21 subfield to identify LCCNs and ISBNs that do not relate to the manifestation being described) should come back as a proposal for the 2005 midwinter meeting.


Discussion Paper 2004-DP04: Use of ISBNs and LCCNs in MARC 21 Bibliographic Records

1. BACKGROUND

The ISBN and LCCN are the most universally available identifiers for systems that manage and share bibliographic records. Because these numbers are ostensibly intended to uniquely identify specific manifestations of a work, they are used for a wide variety of automated processes to detect matching records in both bibliographic and non-bibliographic databases.

Systems and processes using ISBNs and/or LCCNs as the primary identifier include:

In the absence of a single unique identifier for bibliographic records, ISBNs and LCCNs are relied upon in a wide variety of situations for the matching of bibliographic records. The fact that this matching depends more and more on automated processes that have few crosschecks and/or little human intervention makes the accuracy and use of these numbers critical.

2. DISCUSSION

2.1. Problems with the Current Handling of ISBNs and LCCNs

If ISBNs and LCCNs are to perform the above tasks properly, they must be entered accurately in records. Unfortunately, neither ISBNs nor LCCNs are completely dependable as unique identifiers, and they are becoming increasingly undependable. We are all aware of existing problems resulting from ISBNs and LCCNs inappropriately assigned to different resources by publishers who mistakenly use a number supplied to them for one resource for another resource that is different enough to merit a separate bibliographic record. There may be little we can do to change publisher practice in this regard. Other common problems, however, are caused by inconsistent handling of LCCNs and ISBNs when they are entered in MARC 21 records. This issue is something we can address and resolve with the addition of a new subfield to the fields used for these numbers.

Problems with ISBN and LCCN assignment include:

2.2. ISBN

The AACR2 cataloging rules call for including an ISBN for the item being described. They provide for optionally including other numbers, with appropriate qualification. This provision has been extended by many cataloging agencies to including ISBNs for manifestations other than the one being described. Currently, the MARC 21 Format for Bibliographic Data does not provide a means of distinguishing between ISBNs that are appropriate to the manifestation being described and those that are not but are included in the record for the manifestation being described.

The 020 subfield $z is defined as an 'Invalid ISBN', but the description of this subfield in the format does not make it clear whether or not an ISBN for a manifestation other than the one being described can be considered 'invalid' for the current record when it is valid for another record. Because of the lack of clear instructions, current practice is inconsistent: some catalogers use 020$z (invalid ISBN), while others use 020$a (valid ISBN), for ISBNs for other manifestations that are recorded in a record for the manifestation being described.

However, if an ISBN for another manifestation(s) is entered in the 020$a in a bibliographic record, automated processes may consider that record identical to other records that contain the same number in an 020$a. This can result in improperly matched records and the unfortunate overlay of one bibliographic record with a record for a different resource. In this case, one of the records is lost from the database, and its holdings are incorrectly linked to whatever record is retained. Such improper overlaying, which is unfortunately quite common, affects cataloging, circulation, and public access.
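
The overlay failure described above can be illustrated with a minimal Python sketch. The record layout (a dict with an `id` and a list of `(tag, subfield code, value)` triples), the function names, and the ISBN-keyed `database` are all assumptions made for illustration; they are not part of MARC 21 or of any particular library system.

```python
# Hypothetical, simplified record layout (an assumption for this sketch):
#   {"id": str, "fields": [(tag, subfield_code, value), ...]}

def isbn_match_keys(record):
    """Return every 020$a value; a naive matcher treats each as a match key."""
    return [value for tag, code, value in record["fields"]
            if tag == "020" and code == "a"]

def load_with_overlay(database, record):
    """Naive loader: any shared 020$a causes the incoming record to overlay
    the existing one. Returns the id of the overlaid record, or None."""
    replaced_id = None
    keys = isbn_match_keys(record)
    for isbn in keys:
        existing = database.get(isbn)
        if existing is not None and existing["id"] != record["id"]:
            replaced_id = existing["id"]
    for isbn in keys:
        database[isbn] = record  # the earlier record is no longer reachable
    return replaced_id

hardcover = {"id": "hardcover", "fields": [("020", "a", "0306406152")]}
# The paperback record also carries the hardcover ISBN in 020$a.
paperback = {"id": "paperback", "fields": [("020", "a", "080442957X"),
                                           ("020", "a", "0306406152")]}

database = {}
load_with_overlay(database, hardcover)
print(load_with_overlay(database, paperback))  # prints: hardcover
```

In this sketch the hardcover record is lost as soon as the paperback record is loaded, because both carry the same number in 020$a.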

2.3. Option 1: Addition of a new MARC 21 subfield to identify ISBNs that do not relate to the manifestation being described (020$y)

One solution for this problem is to define a new subfield ($y) for the 020 field to be used for ISBNs for manifestations other than the one being described. This new subfield will generally parallel the subfield $y (incorrect ISSN) used in the 022 field for entering incorrect ISSNs. The definition for 020$a could be clarified to restrict the use of the 020$a subfield to only those ISBNs that are valid and appropriate to the manifestation that is described by the bibliographic record that contains multiple 020 fields. Note that in the case of ISSN, subfield $y is used for those ISSN that are incorrect for a variety of reasons, including inappropriate to the particular serial being cataloged, as well as invalid. (In 022, subfield $z is defined only as “cancelled ISSN”, and does not include invalid ones.)

Library automation systems would then be able to index and search on 020$y data for access and retrieval, but exclude this subfield when performing processes like the matching and deduping described above (i.e., an ISBN recorded in 020$y should be disregarded by automated matching processes).
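
One way a system might separate the two uses is sketched below in Python. The record layout, the two subfield tables, and the function names are hypothetical; note also that 020$y is only the subfield proposed in this paper, not one currently defined in the format.

```python
# Hypothetical record layout (an assumption for this sketch):
#   {"id": str, "fields": [(tag, subfield_code, value), ...]}

# Subfields used when deciding whether two records describe the same resource.
MATCH_SUBFIELDS = {"020": {"a"}}
# Subfields indexed for search and retrieval ($y is the proposed subfield).
SEARCH_SUBFIELDS = {"020": {"a", "y", "z"}}

def collect(record, wanted):
    """Gather values from the tag/subfield combinations listed in `wanted`."""
    return {value for tag, code, value in record["fields"]
            if code in wanted.get(tag, ())}

def match_keys(record):
    return collect(record, MATCH_SUBFIELDS)

def search_keys(record):
    return collect(record, SEARCH_SUBFIELDS)

def is_match(first, second):
    """Two records match only if they share an ISBN recorded in 020$a."""
    return bool(match_keys(first) & match_keys(second))

hardcover = {"id": "hardcover", "fields": [("020", "a", "0306406152")]}
paperback = {"id": "paperback", "fields": [("020", "a", "080442957X"),
                                           ("020", "y", "0306406152")]}
```

With the hardcover ISBN coded in 020$y of the paperback record, `is_match(hardcover, paperback)` is false, yet a search on that ISBN can still retrieve both records through `search_keys`.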

2.3.1. Current definition of 020$a:
Subfield $a contains a valid ISBN for the item. Parenthetical qualifying information, such as the publisher/distributor, binding/format, and volume numbers, is not separately coded.

2.3.2. Proposed definition of 020$a:
Subfield $a contains a valid ISBN (the length, structure, and check digit are correct) that is appropriate to the manifestation being described. Parenthetical qualifying information, such as the publisher/distributor, binding/format, and volume numbers, is not separately subfielded. A single record can contain more than one ISBN appropriate to the manifestation being described, e.g., an ISBN appropriate to the set and ones appropriate to the volumes in the set, which may or may not be represented by separate records.
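
The "length, structure, and check digit" test in this definition can be sketched in Python for the 10-digit ISBN. This is an illustrative check only: the function names are hypothetical, the handling of the parenthetical qualifier is an assumption about how the data is entered, and 13-digit ISBNs are not covered.

```python
import re

def normalize_isbn(raw):
    """Drop a trailing parenthetical qualifier, hyphens, and spaces."""
    without_qualifier = raw.split("(", 1)[0]
    return re.sub(r"[\s-]", "", without_qualifier).upper()

def is_valid_isbn10(raw):
    """True if the length, structure, and check digit are all correct."""
    isbn = normalize_isbn(raw)
    # Length and structure: nine digits followed by a digit or X.
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return False
    # Check digit: the sum of the characters weighted 10 down to 1
    # (X counts as 10) must be divisible by 11.
    total = sum((10 - position) * (10 if char == "X" else int(char))
                for position, char in enumerate(isbn))
    return total % 11 == 0

print(is_valid_isbn10("0-306-40615-2 (pbk.)"))  # prints: True
print(is_valid_isbn10("0306406153"))            # prints: False
```

A check of this kind establishes only that a number is valid. Whether a valid ISBN is appropriate to the manifestation being described remains a judgment for the cataloger.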

2.3.3. Proposed 020$y definition:
Subfield $y contains a valid ISBN (the length, structure, and check digit are correct) that is not appropriate to the manifestation being described. Parenthetical qualifying information, such as the publisher/distributor, binding/format, and volume numbers, is not separately subfielded. Each ISBN appropriate to a different manifestation is contained in a separate (repeatable) subfield $y or a separate 020 field (this needs to be determined). If no ISBN appropriate to the manifestation being described is available, subfield $y may be used alone in a 020 field. ISBN (other manifestation) and the embedded hyphens may be generated.

An ISBN is considered to be inappropriate to a particular manifestation when it appears on a resource but it is known, through research or other means, that that same number is also assigned to a different resource.

Because 020 subfield $y contains ISBNs that might be searched, this subfield should be indexed and displayed. However, as the same ISBN is likely to appear in one record in subfield $a, and in a different record in subfield $y, subfield $y should not be used by record-matching processes.

2.4. Option 2: Expansion of MARC 21 subfield 020$z for inappropriately assigned ISBN

An alternate solution for the matching problem would be to expand the definition for 020$z to explicitly state that subfield $z could also be used for ISBNs not appropriate to the manifestation being described. This solution would also require that the use of the 020$a be restricted to valid (the length, structure, and check digit are correct) ISBNs that are appropriate to the particular manifestation being described. Library automation systems would then know that 020$z should be searchable for access and retrieval, but that 020$z should not be used for record-matching processes.

2.4.1. Current 020$z definition:
Subfield $z contains a canceled or invalid ISBN and any parenthetical qualifying information. Each canceled/invalid ISBN is contained in a separate subfield $z. If no valid ISBN exists, subfield $z may be used alone in the record. ISBN (invalid) and the embedded hyphens may be generated.

2.4.2. Proposed change to the 020$z definition:
Subfield $z contains a canceled or invalid (length, structure, or check digit is incorrect) ISBN or an ISBN for a manifestation other than the one being described and any parenthetical qualifying information. Each canceled/invalid or other ISBN is contained in a separate (repeatable) subfield $z. If no valid/appropriate ISBN exists, subfield $z may be used alone in the 020 field. ISBN (other/invalid) and the embedded hyphens may be generated.

An ISBN is considered to be canceled when a publisher designates it as such.

An ISBN is considered to be invalid when its length or structure is incorrect or its check digit does not agree with the formula for calculating such.

An ISBN is considered to be inappropriate to a particular manifestation when it appears on a resource but it is known, through research or other means, that that same number is also assigned to a different resource.

Because 020 subfield $z contains ISBNs that might be searched, this subfield should be indexed and displayed. However, as the same ISBN is likely to appear in one record in subfield $a, and in a different record in subfield $z, subfield $z should not be used by record-matching processes.

2.5. LCCN

LC assigns LCCNs to manifestations judged to receive separate bibliographic records according to the guidelines developed for making this determination. When publishers secure an LCCN through LC’s Preassigned Control Number program for a particular manifestation, they sometimes print that same LCCN in a subsequent manifestation that requires a separate record. (The Preassigned Control Number Program enables LC to assign LCCNs prior to publication in order to facilitate cataloging and other book processing activities when the publisher prints the control number in the book.) As with ISBNs, the MARC 21 Format for Bibliographic Data does not provide a means of distinguishing between LCCNs that are appropriate to the manifestation being described and those that are not (LCCNs appearing on resources that do not match the records to which the LCCNs were originally assigned). Currently the 010$z is listed as used for invalid LCCN, but the instructions provided do not make it clear whether or not an LCCN not appropriate to the manifestation being described can be considered 'invalid' in a record which does not match the record to which LC originally assigned that LCCN.

Because of the lack of clear instructions, current practice is inconsistent and some catalogers use 010$z while others use 010$a for LCCNs for other manifestations that are recorded in a record for the manifestation being described. However, if an LCCN for another manifestation is entered in the 010$a in a bibliographic record, automated processes will consider that record identical to other records that contain the same number in an 010$a. This can result in improperly matched records and the unfortunate overlay of one bibliographic record with a record for a different resource. In this case, one of the records is lost from the database, and its holdings are incorrectly linked to whatever record is retained. Such improper overlaying, which is unfortunately quite common, affects cataloging, circulation, and public access.

2.6. Option 1: Addition of a new MARC 21 subfield 010$y for inappropriately assigned LCCNs

As with the ISBN, a solution for this problem is to define a new subfield ($y) for the 010 field to be used for LCCNs for manifestations other than the one being described. This new subfield will parallel the subfield $y used in the 022 field for entering incorrect ISSNs. The definition for 010$a could be clarified to restrict the use of the 010$a subfield only to those LCCNs that are valid for one specific record.

Library automation systems would then be able to index and search on 010$y data for access and retrieval, but exclude this subfield when performing processes like the matching and deduping described above (i.e., an LCCN recorded in 010$y should be disregarded by automated matching processes).

2.6.1. Current definition of 010$a:
Subfield $a contains a valid LC control number (see explanation of structure of this number given below).

2.6.2. Proposed definition of 010$a:
Subfield $a contains a valid LC control number (see explanation of structure of this number given below) assigned to the manifestation being described. A single record cannot contain more than one valid/appropriate LCCN and an LCCN cannot be valid/appropriate for more than one different record.

2.6.3. Proposed 010$y definition:
Subfield $y contains a valid LCCN that is not appropriate to the manifestation being described. Each LCCN appropriate to a different manifestation is contained in a separate (repeatable) subfield $y. If no valid LCCN appropriate to the manifestation being described is available, subfield $y may be used alone in an 010 field. LCCN (other manifestation) and the embedded hyphens may be generated.

An LCCN is considered to be inappropriate to a particular manifestation when it appears on a resource or in a bibliographic record that does not match the manifestation to which the LCCN was originally assigned, or if it retrieves multiple non-LC records when there is no LC record for the LCCN.
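
The second criterion above (an LCCN that retrieves multiple non-LC records when there is no LC record for it) lends itself to an automated report. The Python sketch below is one possible reading of that criterion; the record layout, the `from_lc` flag, the function name, and the sample LCCN are all hypothetical.

```python
from collections import defaultdict

# Hypothetical record layout (an assumption for this sketch):
#   {"id": str, "from_lc": bool, "fields": [(tag, subfield_code, value), ...]}

def suspect_lccns(records):
    """Return {lccn: [record ids]} for each LCCN that appears in 010$a of
    two or more records, none of which is an LC record."""
    holders = defaultdict(dict)  # lccn -> {record id: from_lc flag}
    for record in records:
        for tag, code, value in record["fields"]:
            if tag == "010" and code == "a":
                holders[value.strip()][record["id"]] = record["from_lc"]
    return {lccn: sorted(ids)
            for lccn, ids in holders.items()
            if len(ids) > 1 and not any(ids.values())}

records = [
    # "2003012345" is a made-up LCCN used only for illustration.
    {"id": "r1", "from_lc": False, "fields": [("010", "a", "2003012345")]},
    {"id": "r2", "from_lc": False, "fields": [("010", "a", "2003012345")]},
]
print(suspect_lccns(records))  # prints: {'2003012345': ['r1', 'r2']}
```

A report like this only flags candidates for review. Deciding which record, if any, the LCCN properly belongs to still requires a cataloger.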

Because 010 subfield $y contains LCCNs that might be searched, this subfield should be indexed and displayed. However, as the same LCCN is likely to appear in one record in subfield $a, and in a different record in subfield $y, subfield $y should not be used by record-matching processes.

2.7. Option 2: Expansion of MARC 21 subfield 010$z for inappropriately assigned LCCN

An alternate solution for this problem would be to expand the definition for 010$z to explicitly state that this subfield can also be used for LCCNs not appropriate to the manifestation being described, restricting the use of the 010$a to LCCNs that are valid/appropriate for the specific manifestations to which they were originally assigned. Library automation systems would then know that 010$z should be searchable for access and retrieval, but should not be used by record-matching processes.

2.7.1. Current 010$z definition:
Subfield $z contains a canceled or invalid LC control number, including invalid NUCMC numbers.

2.7.2. Proposed change to the 010$z definition:
Subfield $z contains a canceled or invalid (length or structure incorrect) LCCN or an LCCN for a manifestation other than the one to which an LCCN was originally assigned. This subfield also includes invalid NUCMC numbers. Each canceled/invalid or inappropriate LCCN is contained in a separate (repeatable) subfield $z. If no valid/appropriate LCCN exists for a resource, subfield $z may be used alone in the 010 field. LCCN (inappropriate/invalid) and the embedded hyphens may be generated.

An LCCN is considered to be canceled when LC designates it as such.

An LCCN is considered to be invalid when its length or structure is incorrect.

An LCCN is considered to be inappropriate to a particular manifestation when it appears on a resource or in a bibliographic record that does not match the manifestation to which the LCCN was originally assigned, or if it retrieves multiple non-LC records when there is no LC record for the LCCN.

Because 010 subfield $z contains LCCNs that might be searched, this subfield should be indexed and displayed. However, as the same LCCN is likely to appear in one record in subfield $a, and in a different record in subfield $z, subfield $z should not be used by record-matching processes.

Note that whether it is necessary to develop a means of indicating LCCNs assigned to one manifestation appearing in another manifestation is open to question.

2.8. Application.

Note that the options above assume that the cataloger will know which ISBN/LCCN do not relate to the manifestation in hand. This may not always be the case. For example, if the publisher has assigned the same ISBN to the first and second editions or reuses an ISBN, the cataloger will record the ISBN(s) on the piece and will not stop to check their validity for the item in hand.

For the Library of Congress Cataloging-in-Publication program, it may not be possible to determine which copy will be acquired at the stage of preparing the record; which to record in 020$y versus $a may be a problem that LC will have to deal with. In addition, catalogers may not be able to recognize when the 010 is incorrect.

3. CONCLUSIONS

It would be convenient to believe that cataloging is now a simple matter of entering a number into a source of records and retrieving a matching record with little if any human intervention. Unfortunately, this is sometimes not the case for current copy cataloging and machine matching processes.

The fact that publishers do not always assign ISBNs correctly is outside the scope of MARBI, but MARBI can address the problem created by the recording in bibliographic records of ISBNs and LCCNs for manifestations other than the one represented by that record. Providing a new subfield for the handling of these other LCCNs and ISBNs, or expanding the definition of the existing subfield for invalid numbers to cover this situation, will be a significant step forward in improving the accuracy of record-matching processes, whether automated or manual, in any system utilizing MARC 21 records.

The changes proposed will not ‘fix’ ISBNs for other manifestations present in the 020$a of existing records. They will, however, provide catalogers and system vendors with a consistent method for handling this problem for new records, and for any current database cleanup efforts.


Library of Congress Help Desk ( 11/04/2004 )