PROPOSAL NO.: 2001-06

DATE: May 7, 2001
REVISED:

NAME: Accommodating Non-MARC Language Codes in Field 041 of the Bibliographic and Community Information Formats

SOURCE: Library of Congress; OCLC CORC

SUMMARY: This paper proposes changes to accommodate non-MARC language codes in field 041 (Language code). Changing the repeatability of both field 041 and subfields $a-$g is suggested. Defining the second indicator position (Source of code) and subfield $2 (Source of code) is also proposed to identify the source of the language codes used.

KEYWORDS: Field 041 (BD)(CI); Subfield $2, in field 041 (BD)(CI); Language codes (BD)(CI); Non-MARC language codes (BD)(CI); Source of code (BD)(CI)

RELATED: DP 2001-DP02

STATUS/COMMENTS:

05/07/01 - Made available to the MARC 21 community for discussion.

06/16/01 - Results of the MARC Advisory Committee discussion - Approved.
The practice of stacking codes will be considered obsolete rather than eliminated.

08/07/01 - Results of LC/NLC review - Approved.


PROPOSAL NO. 2001-06: Accommodating Non-MARC Language Codes in Field 041

1. BACKGROUND

Field 041 (Language code) contains three-character MARC alphabetic codes for languages associated with an item when field 008/35-37 (Language) is insufficient to convey full information for a multilingual item or an item that involves translation. The source of the language codes is the MARC Code List for Languages. Several non-repeatable subfields in 041 are used to designate other language aspects, such as the language of summaries (subfield $b) or of sung or spoken text (subfield $d). Multiple MARC language codes are accommodated in any of the subfields in field 041 by "stacking" them in the appropriate subfield. For example, a text in original Greek with an English translation would be coded as "enggrc", stacking the code for English (eng) onto the code for Greek (grc).
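
As a minimal sketch of what stacking implies for a system, the following Python function splits a stacked run of MARC codes back into individual codes. It assumes every code is exactly three characters long, as in the MARC Code List for Languages; the name split_stacked is illustrative and does not belong to any MARC toolkit.

```python
def split_stacked(value, width=3):
    """Split a stacked run of fixed-width language codes, e.g. 'enggrc'."""
    if len(value) % width != 0:
        # A stacked run of MARC codes must be a whole number of codes.
        raise ValueError(f"{value!r} is not a multiple of {width} characters")
    return [value[i:i + width] for i in range(0, len(value), width)]


print(split_stacked("enggrc"))  # ['eng', 'grc']
```

The split works only because the code width is fixed; nothing in the stacked string itself marks where one code ends and the next begins.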

Field 041 is currently defined as:

First Indicator - Translation indication

0 - Item not a translation/does not include a translation
1 - Item is or includes a translation

Second Indicator - Undefined

Subfield Codes

$a - Language code of text/sound track or separate title (NR)
$b - Language code of summary or abstract/overprinted title or subtitle (NR)
$d - Language code of sung or spoken text (NR)
$e - Language code of librettos (NR)
$f - Language code of table of contents (NR)
$g - Language code of accompanying material other than librettos (NR)
$h - Language code of original and/or intermediate translations of text (R)
$6 - Linkage (NR)
$8 - Field link and sequence number (R)

Because of the increasing use of diverse metadata standards for description of information resources, it is desirable to provide for other types of standard language codes in field 041. This would facilitate conversion from various metadata standards to the MARC 21 format and would thus enhance libraries' participation in the organization of online web products, e-books, and other media that may use other metadata standards. Allowing field 041 to include non-MARC language codes would also assist in the increasing internationalization of the MARC 21 formats by allowing other language code schemes to be used in MARC 21 records.

2. DISCUSSION

2.1 Repeatability in field 041

The discussion of DP2001-02 resulted in a consensus that it would be useful to accommodate non-MARC language codes such as ISO 639-1 (Codes for the representation of names of languages-- Part 1: alpha-2 code) or ISO 639-2 (Codes for the representation of names of languages-- Part 2: alpha-3 code) with the addition of a country code. Because of the possibility of using different language code schemes in one record and the need for library systems to differentiate between these code schemes for processing purposes, it is proposed that field 041 be made repeatable to allow for different language code schemes to be coded in separate occurrences of the field.

2.2 Definition of the second indicator

To record the language code scheme used in field 041, it is proposed that the second indicator be defined as Source of Code, with values # (blank; MARC language code) and 7 (Source specified in subfield $2). These indicator values would assist in the use of older records containing stacked codes and would also help with sorting by identifying the type of language code used in field 041.

The indicator position could be defined as follows:

Second Indicator - Source of Code

The second indicator position contains the source of the language code used in the field.

# - MARC language code
7 - Source specified in subfield $2

2.3 Definition of subfield $2

To indicate the scheme of the language code used in field 041, it is proposed that a non-repeatable subfield $2 be defined as:

$2 - Source of code

Subfield $2 contains a code that identifies the source of the language code scheme used in the field. The source of the code is the MARC Code Lists for Relators, Sources, Description Conventions, which is maintained by the Library of Congress. If a non-MARC code is used to express the predominant language in an item, field 008/35-37 is coded with three fill characters (|||).

If more than one code scheme is used in a record, repeat the field.

2.4 Unstacking Codes in Field 041

Although field 041 has sufficiently expressed the various languages found in items using the MARC language codes, problems arise when one considers coding it using non-MARC language codes. The practice of "stacking" non-MARC language codes could make it difficult for systems to parse the codes adequately because of their varying lengths (even assuming that all codes in a given field occurrence are from the same list). ISO 639-1 codes are two characters long, and both the ISO 639-1 and ISO 639-2 standards allow ISO country codes to be added to distinguish national linguistic differences. The resulting codes of five or six characters may make stacked non-MARC language codes even more difficult to parse.
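
The parsing difficulty can be illustrated with a short Python sketch that lists every way a stacked string could be cut into codes when only the permitted code lengths are known. The hyphenated form "en-US" and the permitted lengths of 2 and 5 are assumptions made for illustration, and the function name segmentations is hypothetical.

```python
def segmentations(value, lengths):
    """Return every way to cut `value` into pieces of the allowed lengths."""
    if not value:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for n in lengths:
        if len(value) >= n:
            head = value[:n]
            for rest in segmentations(value[n:], lengths):
                results.append([head] + rest)
    return results


# Fixed-width MARC codes: one reading only.
print(segmentations("enggrc", (3,)))

# Assumed ISO 639-1 codes with optional country part (lengths 2 and 5):
# more than one reading is structurally possible.
for reading in segmentations("en-USfr-CA", (2, 5)):
    print(reading)
```

With length alone as a guide, the second string can be read either as two five-character codes or as five two-character pieces, so a system would have to validate each piece against the code list to choose. Placing each code in its own subfield removes the need for this step.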

In January 2001, Discussion Paper DP2001-02 discussed four alternatives for using non-MARC language codes in field 041. The consensus was that unstacking the codes was the best approach, because it would allow systems to handle language codes of variable lengths accurately and with relative ease. It may also improve system processing of the language codes. For example, because many systems have been able to index only the first code in a group of stacked codes, unstacking them may help systems index them more fully. Moreover, systems would not need to parse the codes to interpret them, as many currently do with the stacked codes. It is therefore proposed to eliminate the practice of stacking language codes in field 041.

To indicate multiple languages in an item, it is also proposed that subfields $a - $g be changed to repeatable (subfield $h is already defined as repeatable).
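
A minimal Python sketch of the conversion from stacked to unstacked coding follows. It assumes that the field holds MARC language codes only (three characters each) and that subfield data is available as a "$"-delimited string; the helper names parse_subfields, unstack, and format_subfields are hypothetical.

```python
STACKABLE = set("abdefgh")  # language-code subfields of field 041
WIDTH = 3                   # MARC language codes are three characters


def parse_subfields(data):
    """Turn '$aeng$hfre' into [('a', 'eng'), ('h', 'fre')]."""
    return [(chunk[0], chunk[1:]) for chunk in data.split("$") if chunk]


def unstack(subfields):
    """Repeat each language subfield once per three-character code."""
    out = []
    for code, value in subfields:
        if code in STACKABLE and value and len(value) % WIDTH == 0:
            out.extend((code, value[i:i + WIDTH])
                       for i in range(0, len(value), WIDTH))
        else:
            out.append((code, value))  # leave anything else untouched
    return out


def format_subfields(subfields):
    return "".join(f"${code}{value}" for code, value in subfields)


stacked = "$dengfregerrus$eengfregerrus$hengfregerrus$gengfreger$heng"
print(format_subfields(unstack(parse_subfields(stacked))))
```

Subfields that already hold a single three-character code pass through unchanged, so the same routine could be applied to a file that mixes stacked and unstacked records.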

2.4.1 Disadvantages of Unstacking Language Codes

Readability
It was noted in the January MARC Advisory Committee discussion that unstacked codes in complex records may be more difficult to read, which may discourage catalogers from coding the field in complex situations. (This would, of course, depend in part on the way a system presents the field for input.)

The following examples illustrate field 041 coded with both stacked and unstacked language codes.

Stacked

041 0# $dengfregerrus$eengfregerrus$hengfregerrus$gengfreger$heng

Unstacked

041 0# $deng$dfre$dger$drus$eeng$efre$eger$erus$heng$hfre$hger$hrus$geng$gfre$gger$heng

However, because the same criticism over readability has also been made of stacked codes, and because unstacking the codes would allow systems to make fuller use of the language codes, the readability of unstacked codes does not seem to be a large problem at this time.

Because subfield $h (Language code of original and/or intermediate translations of text) is often repeated to follow its related subfield $a, $d, $e, or $g, unstacking codes may make this meaningful placement ambiguous and not interpretable to systems. For example:

041 0# $dglgpro$eengfreglgpro$hglgpro$gengfre$heng

To achieve the same interpretation as with the stacked codes, the $h subfields would need to be considered "applicable" to all of the preceding subfields of the same type, e.g., all of the preceding subfields $e and $g.
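
One possible reading of this rule can be sketched in Python: each run of consecutive subfields $h is attached to the run of same-coded subfields immediately before it. This is an assumption about how a system might apply the rule, not a defined MARC 21 processing model, and the function name attach_originals is hypothetical.

```python
from itertools import groupby


def attach_originals(subfields):
    """Pair each run of $h with the run of same-code subfields before it."""
    # Collapse consecutive subfields with the same code into one run.
    runs = [(code, [value for _, value in group])
            for code, group in groupby(subfields, key=lambda sf: sf[0])]
    result = []
    for code, values in runs:
        if code == "h" and result:
            # Treat the $h run as applying to the whole preceding run.
            result[-1]["originals"].extend(values)
        else:
            result.append({"code": code, "languages": values, "originals": []})
    return result


# Unstacked form of: $dglgpro$eengfreglgpro$hglgpro$gengfre$heng
unstacked = [("d", "glg"), ("d", "pro"),
             ("e", "eng"), ("e", "fre"), ("e", "glg"), ("e", "pro"),
             ("h", "glg"), ("h", "pro"),
             ("g", "eng"), ("g", "fre"),
             ("h", "eng")]
for entry in attach_originals(unstacked):
    print(entry)
```

Under this reading the first two $h codes attach to the four $e codes and the final $h attaches to the two $g codes, which matches the grouping implied by the stacked form.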

Legacy Records
Unstacking codes would result in a lack of consistency in existing records containing stacked codes in bibliographic databases. Depending on how stacked codes were treated, these records could cause indexing difficulties. Either all 041 fields would need to be changed or vendors would need to handle both.

3. EXAMPLES

008/35-37 |||
041 07 $aen$afr$ait$2 [Code for ISO 639-1]
  [All of the codes come from the ISO 639-1 standard]

008/35-37 eng
041 0# $aeng$afre$aita$bfre$bita
  [The code comes from the MARC Code List for Languages]

008/35-37 eng
041 0# $aeng$afre
041 07 $aen$afr$2 [Code for ISO 639-1]
  [Two language code schemes are used and field 041 is repeated]
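
The pattern in these examples, one occurrence of field 041 per code scheme with the second indicator and subfield $2 set accordingly, can be sketched in Python as follows. The function name build_041 and its input structure are hypothetical, and the $2 value is kept as the placeholder used above because this paper does not give the actual source code.

```python
def build_041(first_indicator, codes_by_scheme):
    """Build one 041 field per code scheme.

    `codes_by_scheme` maps a $2 source code (or None for MARC codes)
    to a list of (subfield code, language code) pairs.
    """
    fields = []
    for source, pairs in codes_by_scheme.items():
        second_indicator = "#" if source is None else "7"
        body = "".join(f"${sf}{lang}" for sf, lang in pairs)
        if source is not None:
            body += f"$2{source}"  # $2 is given once, after the codes
        fields.append(f"041 {first_indicator}{second_indicator} {body}")
    return fields


record_languages = {
    None: [("a", "eng"), ("a", "fre")],
    "[Code for ISO 639-1]": [("a", "en"), ("a", "fr")],  # placeholder $2 value
}
for field in build_041("0", record_languages):
    print(field)
```

Keeping each scheme in its own field occurrence means a system can decide how to interpret every code in the field from the second indicator and subfield $2 alone.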

4. PROPOSED CHANGES

In the MARC 21 Bibliographic and Community Information formats:

