The Library of Congress >> Especially for Librarians and Archivists >> Standards
HOME >> MARC Development >> Discussion Paper List
DATE: May 23, 2024
REVISED:
NAME: Tagging Transliteration Schemes and BCP 47 in Data Provenance Subfields in the MARC 21 Authority and Bibliographic Formats
SOURCE: ALA-LC Romanization Table Review Board, PCC Standing Committee on Standards, ALA Core Committee on Cataloging: Asian and African Materials.
SUMMARY: This paper proposes the addition of category codes for use with MARC data provenance subfields to accommodate transliteration scheme codes and BCP 47 tags in the MARC 21 Authority and Bibliographic Formats.
KEYWORDS: Data provenance (AD, BD); Data provenance category codes (AD, BD); Subfield $7 (Data provenance) (AD, BD); Subfield $e (Data provenance) (AD, BD); Subfield $l (Data provenance) (BD); Subfield $y (Data provenance) (BD); Transcription standard (AD, BD); Transliteration scheme (AD, BD); Best Current Practice 47 (AD, BD); BCP 47 (AD, BD)
RELATED: 2022-05, 2021-DP10, 2021-DP06, 2016-DP26, DP109
STATUS/COMMENTS:
05/23/24 – Made available to the MARC community for discussion.
06/25/24 – Results of MARC Advisory Committee discussion: MAC members' views on the paper were generally divergent regarding the use case presented and the proposed solutions, with many expressing reservations regarding both. Wide-ranging points were raised during discussion, although there was general support for addressing the need for field-level encoding, particularly for authority records where variant access points may be recorded using different transliteration schemes. In terms of how $7 (and other subfields by exception) are defined to carry data provenance in the MARC formats, usage of category codes is not mandated. The community can choose to deploy them or not based on local use cases; their utility rests on being able to support a greater degree of machine actionability than would be available without them. The current code values were implemented with the input and needs of constituencies (i.e., the RSC and the German language cataloging community); it would be inadvisable to deprecate them. There was further discussion around the place of transliteration scheme as a prospective element in RDA. A key dynamic is that linked data editors can currently semi-automate the construction of valid BCP 47 tags and their addition to RDF data, greatly reducing the likelihood of error in tag formation and application. Such tools could likely also be created for MARC-oriented interfaces. The question of backwards conversion of this tagging from BIBFRAME into MARC may be stretching the limits of current commitment for lossless conversion. Sentiments among the broader MAC constituency seemed to coalesce around the creation of two codes: one specific to BCP 47, and one specific to transliteration scheme codes only. The paper will return as a proposal.
MARC Proposal 2022-05 defined a set of data provenance subfields sharing common characteristics. For most MARC fields, data provenance is recorded in $7, with exceptions for fields where $7 had already been defined. MARC Proposal 2022-05 further defined a set of data provenance category codes, currently:
dpeaa Data provenance element author agent
dpecou Data provenance element context of use
dpeloe Data provenance element language of expression
dpenmw Data provenance element note on metadata work
dpermw Data provenance element related manifestation of work
dpertow Data provenance element related timespan of work
dpes Data provenance element script
dpesc Data provenance element source consulted
While these codes allow for the tagging of language and script of a string value in a MARC field or subfield, they currently do not accommodate tagging of a transliteration scheme used to derive a string value from a different script. In addition, the IETF standard BCP (Best Current Practice) 47, which defines a syntax for complex tags which incorporate language, script, transliteration, and other possible attributes of a string value, cannot be accommodated by the current category codes. BCP 47 is currently the only standard permitted in RDF for tagging string values to identify language, script, transliteration, etc.
Data provenance subfields (generally, $7) in MARC encourage the use of a category code along with a value. With regard to tagging language and script in a metadata field or subfield, catalogers can use the category code dpeloe to tag the language of (metadata) expression for a given field or subfield, and the category code dpes to record the script of a given field or subfield.
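As a rough illustration of how software might read such subfields, the following Python sketch splits a single $7 value into its codes and its value. It assumes the parenthesized prefix used in this paper's examples, such as "(dpeloe)ukr", with the category code first and any relationship codes after a forward slash. The function name and the returned structure are hypothetical and are not part of MARC 21.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Provenance(NamedTuple):
    category: Optional[str]         # e.g. "dpeloe"; None when no prefix is present
    relationships: Tuple[str, ...]  # e.g. ("dpsfa",); empty when none are given
    value: str                      # e.g. "ukr"


# "(codes)value": everything inside the parentheses is a slash-separated code list.
_PREFIX = re.compile(r"^\(([^)]*)\)(.*)$")


def parse_dp_subfield(text: str) -> Provenance:
    """Split one $7 value such as '(dpeloe)ukr' into its parts.

    Assumption: the first code inside the parentheses is the category code and
    any further slash-separated codes are relationship codes.
    """
    text = text.strip()
    match = _PREFIX.match(text)
    if match is None:
        # Category codes are not mandated, so a bare value is accepted as-is.
        return Provenance(None, (), text)
    codes = [code for code in match.group(1).split("/") if code]
    category = codes[0] if codes else None
    return Provenance(category, tuple(codes[1:]), match.group(2))
```

For example, parse_dp_subfield("(dpes/dpsfa)Latn") yields the category code "dpes", the relationship code "dpsfa", and the value "Latn".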
Currently, there is no support for recording a transliteration scheme used in a MARC field or subfield. This appears to have been a matter of some contention in MARC Proposal 2022-05, which states that "There is also a long standing, currently unimplemented desire on the part of the German community to record the transliteration standard which has been used during the creation of a Latin script string in the Bibliographic Format. From an RDA data provenance perspective, such a standard could be characterized as a 'source consulted.'"
The authors of this discussion paper recognize that the first example provided in MARC Appendix J-Data Provenance Subfields uses the code "dpesc" (source consulted) to cite a transliteration scheme used to generate a romanization of data in a 245 $a. However, the code "dpesc" seems much broader in scope than just transliteration / transcription schemes, and we feel it would be beneficial to have a dedicated category code for transliteration / transcription scheme. Furthermore, Original RDA defines "source consulted" as "A resource used in determining the name, title, or other identifying attributes of an entity, or in determining the relationship between entities," which does not seem to apply to the ways that transliteration tables are used in cataloging and metadata creation. Likewise, the definition of the relationship element "source consulted" in Official RDA, "a manifestation in which there is evidence for a metadata work," would not seem to apply at all to a transliteration scheme, which is not intended to provide "evidence."
Official RDA similarly states that a source of metadata "may be a manifestation that is being described or another manifestation. A manifestation that is being described may carry textual content that can be transcribed or otherwise used as a source of information for a metadata work about the manifestation. A manifestation may carry content that can be used as a source of information for a metadata work about any RDA entity." Again, describing a transliteration scheme as "textual content that can be transcribed" or as a "source of information" for transcribed data does not seem to reflect how transliteration schemes are used in cataloging practices. In short, conceptions of "source consulted" in RDA at best seem much broader than the concept of transliteration scheme, and using the category code dpesc to record the transliteration scheme used to transcribe a string might make it difficult to parse the data in a machine-actionable way or otherwise use the transliteration scheme code in a meaningful way.
Transcription standard is defined in the glossary of Official RDA as "A work that provides guidance and instructions for transcribing the value of a metadata statement" and is further discussed in the data provenance guidelines. Specific examples of transcription standards described in Official RDA include Basic transcription and Normalized transcription. The American library community typically follows Normalized transcription in cataloging practice, meaning that we normalize the capitalization of text found on a resource. Transliteration schemes are not described in Official RDA as types of transcription standards, but they do serve a similar purpose: they describe a method of transcribing data from resources. It is worth noting, however, that in many instances the MARC standard does not align perfectly with RDA, and there are also many examples of MARC incorporating data elements which are outside the scope of RDA or otherwise not covered by the RDA standard.
Bibliographic and authority records can contain romanized data of various types, some romanized according to the expected standard in our environment (the ALA-LC tables) and some not. In a bibliographic record, ALA-LC romanization will normally be "paired" with the original-script data, and in an authority record it will be used in the authorized access point when that is based on an original-script form, but it is not explicitly identified as such. In cases where catalogers want to add romanizations from other transliteration schemes in addition to ALA-LC data, or where they have only non-preferred romanizations available and do not have the expertise to create ALA-LC romanizations themselves, it would be useful for data processing and helpful for future potential upgrades of the description to be able to indicate at a field or subfield level whether the romanization provided was produced by the application of a particular system, or none. In addition, in an environment like OCLC/WorldCat, different communities of practice may use different transliteration schemes for transliterating data. RDA specifically identifies "Different transliteration" as a type of variant title and a type of variant name (see Original RDA 6.2.3.4, 9.2.3.9, 10.2.3.4, 11.2.3.6, and 16.2.3.7), and also identifies "Different script" as another type of variant title and variant name.
Supporting conversion between different romanization schemes is another possible use case. For example, a European (non-English-speaking) library might collect extensively in the area of Mongolian studies, but the institution's preferred romanization scheme for Mongolian is different from the ALA-LC romanization. If romanized string values in bibliographic records from the European library were explicitly tagged, it would in theory be possible to automate transliteration from that European scheme to another scheme such as ALA-LC. This could aid many libraries which can find cataloging copy in a non-preferred language of cataloging, but which do not have the needed language expertise to do original cataloging and romanization from scratch.
By leveraging the internationalization of WorldCat, script and transliteration codes could be used to identify the various romanization schemes used by participating agencies around the world. This would permit us to remodel bibliographic records of resources in non-roman script languages according to ALA-LC Romanization standards. There are already some tools that support automated conversion, or transliteration, between different scripts. One example is the Aksharamukha Script Converter, which automates the conversion between different tables with varying degrees of success. Similar tools are expected to multiply and to advance through the application of large language models (LLMs) based on generative AI. LLMs are currently being tested to support transliteration of non-roman script languages from one transliteration scheme to another. Such tools, paired with language, script, and transliteration codes in MARC, will aid many libraries that do not have the needed language expertise to perform original cataloging and romanization of resources in some non-roman script languages.
The PCC Standing Committee on Standards Task Group on Evaluation Guidelines for Non-Latin Script References in NARs is currently looking at options for tagging preferred variant access points in non-Latin scripts. The task group has been testing ways to label a preferred non-Latin script variant access point and its language(s) and script. This could be done using existing provenance data category codes "dpeloe" and "dpes." While not specifically part of its charge, the group has also considered what would be needed to identify transliterated access points. One possibility would be to establish a new category code, perhaps "dpets", for "Data provenance element transcription standard." This would correspond to the RDA concept of transcription standard. If a more limited definition were needed to specifically identify transliteration schemes, the code could be defined as "Data provenance element transliteration scheme." By recording data provenance subfields for language of expression, script, and transcription standard (or transliteration scheme), catalogers could identify all three pieces of provenance data that apply to the string or text recorded in a particular MARC field or subfield. Examples of this, using MARC language codes, ISO script codes, and MARC transliteration scheme codes, would be:
100 1# $a Asi︠e︡i︠e︡va, Svitlana Antonivna $7 (dpeloe)ukr $7 (dpes)Latn $7 (dpets)ala-lc
[Authorized access point in Ukrainian transliterated into Latin script using ALA-LC romanization]
100 1# $a Zhang, Bangju $7 (dpeloe)chi $7 (dpes)Latn $7 (dpets)ala-lc
400 1# $w nne $a Chang, Pang-chü $7 (dpeloe)chi $7 (dpes)Latn $7 (dpets)wade
[Authorized access point in Chinese transliterated into Latin script using ALA-LC romanization. Variant access point in Chinese transliterated into Latin script using Wade-Giles]
A further use case is the need to support the BCP 47 standard, which is the only permitted standard for tagging the language, script, and transliteration scheme (along with other attributes) of string values in RDF, and therefore in BIBFRAME. BCP 47 defines a syntax for complex tags incorporating attributes for language, script, transliteration scheme, as well as other possible parameters, but since the category codes split language and script into two categories, the current category codes cannot easily accommodate the use of BCP 47. For the sake of clarity, we should note here that BCP 47 does not express anything about the character encoding of a string, for example Unicode/UTF-8, MARC-8, ASCII, etc.
BCP 47 tags can be simple or complex. The most common and basic BCP 47 tags include just a language code. For example, "en" is a valid BCP 47 tag to designate a string as being in the English language. It would be possible to add additional elements to this tag, but BCP 47 prefers a minimal approach. As an example, while one could use the tag "en-Latn" to indicate that a string value is in English and represented in Latin script, this approach is discouraged as English is not di- or polygraphic—the script value is implied by the language tag.
Indicating transliteration is more complex. Although this could in theory be expressed via a shorter tag, the following example illustrates best practices to tag a value of Korean in Latin script transliterated from Korean in Hangul + Han script according to ALA/LC romanization tables: ko-Latn-t-ko-Kore-m0-alaloc.
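To make the structure of such a tag concrete, the following Python sketch breaks a tag of the shape shown above into its target language and script, its source language and script, and its transliteration mechanism. It is a simplified reading that covers only the tag shapes used in this paper (language, optional script, and an optional -t- extension with fields such as m0); it is not a full BCP 47 parser or validator, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

_SCRIPT = re.compile(r"^[A-Za-z]{4}$")         # script subtags are four letters
_FIELD_KEY = re.compile(r"^[A-Za-z][0-9]$")    # -t- field keys such as "m0"


@dataclass
class TransformedTag:
    language: str
    script: Optional[str] = None
    source_language: Optional[str] = None
    source_script: Optional[str] = None
    fields: Dict[str, List[str]] = field(default_factory=dict)


def parse_transliteration_tag(tag: str) -> TransformedTag:
    """Split a tag such as 'ko-Latn-t-ko-Kore-m0-alaloc' into its parts."""
    subtags = tag.split("-")
    lowered = [subtag.lower() for subtag in subtags]
    try:
        t_index = lowered.index("t", 1)   # the "t" singleton opens the extension
    except ValueError:
        t_index = len(subtags)            # no transformation extension present
    target, extension = subtags[:t_index], subtags[t_index + 1:]

    result = TransformedTag(language=target[0])
    if len(target) > 1 and _SCRIPT.match(target[1]):
        result.script = target[1]

    source: List[str] = []
    key: Optional[str] = None
    for subtag in extension:
        if _FIELD_KEY.match(subtag):
            key = subtag.lower()          # start of a field such as "m0"
            result.fields[key] = []
        elif key is None:
            source.append(subtag)         # still inside the source language tag
        else:
            result.fields[key].append(subtag)

    if source:
        result.source_language = source[0]
        if len(source) > 1 and _SCRIPT.match(source[1]):
            result.source_script = source[1]
    return result
```

Applied to the example above, the sketch reports target language "ko", target script "Latn", source language "ko", source script "Kore", and the field m0 with the value "alaloc".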
Another possible way of accommodating standards like BCP 47, which cut across different category codes, would be to develop a syntax for combining category codes in a single data provenance subfield. This could perhaps be as simple as allowing for delimiting multiple category codes with a mark of punctuation, such as - (dash) (see Question 6.6 below). This could be useful also for data provenance relationship codes, as in some cases a data provenance value may apply to multiple subfields in a given MARC field. In this type of implementation, rather than adding two new data provenance codes, we may be able to add just one for transcription standard / transliteration scheme. It may, however, still be preferable to have a separate code which combines language-script-transliteration scheme, at least for the use case of accommodating BCP 47, as the complex BCP 47 syntax includes extension codes which, when taken in isolation, may fall a bit outside of the narrow definition of language, script, and transliteration scheme / transcription standard. In the example cited in the previous paragraph, the portion of the tag beginning with -t- follows the transliteration extension defined for BCP 47, but in addition to information on the transliteration scheme used, it provides additional information on the source languages and scripts. As such, the transliteration portion of the BCP 47 tag is perhaps a bit too complex to fit neatly into a category code narrowly defined as "transliteration scheme" or "transcription standard."
BCP 47 can permit fine-grained tagging of language variants, going beyond what many controlled lists of language codes commonly used in our community can express. For example, regional variants of languages and various dialects can effectively be tagged using BCP 47. However, in the view of the authors of this paper, these sorts of finer-grained language tags are already accommodated by the data provenance category code dpeloe, and of course the broader category code proposed below (dpelst) could also accommodate them.
Tagging language, script, and transliteration is in general straightforward, but there are more complex situations that may arise in which a single subfield may contain multiple languages, scripts, and transliteration schemes, or when a subfield tag is repeated, making it unclear which of the identically tagged subfields a data provenance relationship code is referencing. While managing these cases may be somewhat out of the scope of the current paper, please see example 5 below for a possible method of tagging language, script, and transliteration under such circumstances.
To accommodate the tagging of transliteration schemes used to generate a string value, as well as to accommodate standards like BCP 47, two options could be considered.
Option 1: Two additional category codes would be added to the Data Provenance category codes to accommodate the tagging of transliteration schemes and standards like BCP 47. Note that BCP 47 tags (including BCP 47 tags with extensions, such as the transliteration extension) cannot be easily broken down into component parts, and likewise it is not easy to assemble complex BCP 47 tags from component parts. As such, we feel strongly that two new codes are needed to support different use cases. The first code, dpets, would support metadata creation workflows which wish only to tag the transliteration scheme used to generate a string value. The second proposed category code, dpelst, seeks to accommodate standards like BCP 47 which may combine codes for language, script, and/or transliteration scheme into a single inseparable tag. The definition for this code also avoids endorsing a single standard like BCP 47, leaving that up to local communities to decide.
dpets Data provenance element transliteration scheme (or transcription standard)
dpelst Data provenance element language, script, and transliteration (which would accommodate standards like BCP 47)
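The difficulty of assembling a BCP 47 tag from separately recorded values can be illustrated with a short Python sketch. The lookup tables below are hypothetical and contain only code pairs that appear side by side in this paper's examples (for instance "chi" and "zh", or "ala-lc" and "alaloc"); a working implementation would need complete, maintained mappings. The function name and its parameters are likewise assumptions made for illustration.

```python
from typing import Optional

# Hypothetical lookup tables limited to code pairs seen in this paper's examples.
MARC_LANGUAGE_TO_BCP47 = {"chi": "zh", "rus": "ru", "amh": "am"}
MARC_SCHEME_TO_BCP47 = {"ala-lc": "alaloc", "wade": "wadegile"}


def build_bcp47_tag(dpeloe: str,
                    dpes: Optional[str] = None,
                    dpets: Optional[str] = None) -> str:
    """Assemble a BCP 47 tag from separately recorded provenance values.

    Raises KeyError when a code is missing from the lookup tables, which is
    the expected outcome for any code not listed above.
    """
    language = MARC_LANGUAGE_TO_BCP47[dpeloe]
    parts = [language]
    if dpes:
        parts.append(dpes)
    if dpets:
        # The source script is not recorded in any of the three separate
        # values, so only the source language can be repeated after "-t-".
        parts += ["t", language, "m0", MARC_SCHEME_TO_BCP47[dpets]]
    return "-".join(parts)
```

Under these assumptions, the values "chi", "Latn", and "ala-lc" produce "zh-Latn-t-zh-m0-alaloc". The source script (for example "Hans") cannot be supplied, because none of the three separately recorded values carries it.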
Option 2: Since BCP 47 is the only standard possible for tagging language, script, and transliteration in RDF, another possibility would be to create a single new category code, dpebcp, for tagging language, script, and transliteration of a string. As a reminder, it is possible to use BCP 47 to tag a string with just a language code, with a language and script code, and/or with additional extension tags such as codes for transliteration schemes, in addition to other possible parameters. As such, it may also be advisable to deprecate or remove the category codes dpeloe and dpes to avoid confusion in applying the codes, since the single new proposed category code would encompass language, script, and transliteration. Although this approach would depart somewhat from RDA's approach, which separates language and script, it would not conflict with RDA's approach. Furthermore, MARC does not always perfectly align with RDA in many other areas, so this approach would not be unprecedented. If it is desirable to avoid close alignment with a specific standard like BCP 47, we could add, instead of dpebcp, the code proposed in Option 1, dpelst.
dpebcp Data provenance element language, script, transliteration of string / BCP 47 code
dpeloe Data provenance element language of expression [DEPRECATE]
dpes Data provenance element script [DEPRECATE]
Example 1:
Option 1:
100 1# $a Navalʹnyĭ, Alekseĭ $7 (dpeloe)rus $7 (dpes)Latn $7 (dpets)ala-lc
[Authorized access point from the LC/NAF tagged with a MARC language code, ISO script code, and MARC transliteration scheme code]
Option 2:
100 1# $a Navalʹnyĭ, Alekseĭ $7 (dpebcp)ru-Latn-t-ru-m0-alaloc
[Authorized access point from the LC/NAF tagged with a BCP 47 tag indicating that it was transliterated to Latin script from Russian using the ALA-LC romanization tables]
Example 2:
Option 1:
245 10 $a Folḳsṭimlekhe geshikhṭn / $c Y.L. Perets ; ḳriṭisher araynfir fun Shmuel Niger. $7 (dpelst)yi-Latn-t-yi-m0-alaloc
[Title and statement of responsibility tagged with a BCP 47 tag indicating that it was transliterated to Latin script from Yiddish using the ALA-LC romanization tables]
Option 2:
245 10 $a Folḳsṭimlekhe geshikhṭn / $c Y.L. Perets ; ḳriṭisher araynfir fun Shmuel Niger. $7 (dpebcp)yi-Latn-t-yi-m0-alaloc
[Title and statement of responsibility tagged with a BCP 47 tag indicating that it was transliterated to Latin script from Yiddish using the ALA-LC romanization tables]
Example 3:
Option 1:
Example 3a:
130 #0 $a Chi jiao yi sheng cong shu $7 (dpeloe)chi $7 (dpes)Latn $7 (dpets)ala-lc
430 #0 $w nne $a Chʻih chiao i sheng tsʻung shu $7 (dpeloe)chi $7 (dpes)Latn $7 (dpets)wade
430 #0 $a 赤脚医生丛书 $7 (dpeloe)chi $7 (dpes)Hans
[Authorized access point in the LC/NAF tagged with a MARC language code, ISO script code, and MARC transliteration scheme code. First variant access point in Wade-Giles tagged with a MARC language code, ISO script code, and MARC transliteration scheme code. Second variant access point in the vernacular tagged with a MARC language code and ISO script code]
Example 3b:
130 #0 $a Chi jiao yi sheng cong shu $7 (dpelst)zh-Latn-t-zh-Hans-m0-alaloc
430 #0 $w nne $a Chʻih chiao i sheng tsʻung shu $7 (dpelst)zh-Latn-t-zh-Hans-m0-wadegile
430 #0 $a 赤脚医生丛书 $7 (dpelst)zh-Hans
[Authorized access point in the LC/NAF tagged with a BCP 47 tag indicating that it was transliterated to Latin script from Chinese in Simplified Chinese script using the ALA-LC romanization tables. First variant access point tagged with a BCP 47 tag indicating that it was transliterated to Latin script from Chinese in Simplified Chinese script using Wade-Giles. Second variant access point tagged with a BCP 47 tag indicating that it is in Chinese in Simplified Chinese script]
Option 2:
130 #0 $a Chi jiao yi sheng cong shu $7 (dpebcp)zh-Latn-t-zh-Hans-m0-alaloc
430 #0 $w nne $a Chʻih chiao i sheng tsʻung shu $7 (dpebcp)zh-Latn-t-zh-Hans-m0-wadegile
430 #0 $a 赤脚医生丛书 $7 (dpebcp)zh-Hans
[Authorized access point in the LC/NAF tagged with a BCP 47 tag indicating that it was transliterated to Latin script from Chinese in Simplified Chinese script using the ALA-LC romanization tables. First variant access point tagged with a BCP 47 tag indicating that it was transliterated to Latin script from Chinese in Simplified Chinese script using Wade-Giles. Second variant access point tagged with a BCP 47 tag indicating that it is in Chinese in Simplified Chinese script]
Example 4:
Option 1:
Example 4a:
490 1# $a ʼĀʻemād Maṣāheft ; $v no. 11 $7 (dpeloe/dpsfa)amh $7 (dpes/dpsfa)Latn $7 (dpets/dpsfa)ala-lc
[Series statement recorded in subfield $a is in Amharic transliterated into Latin according to the ALA-LC romanization tables; MARC language and transliteration codes and ISO script codes used]
Example 4b:
490 1# $a ʼĀʻemād Maṣāheft ; $v no. 11 $7 (dpelst/dpsfa)am-Latn-t-am-Ethi-m0-alaloc
[Title of series is in Amharic transliterated into Latin from Ethiopic script according to the ALA-LC romanization tables using a BCP 47 tag]
Option 2:
490 1# $a ʼĀʻemād Maṣāheft ; $v no. 11 $7 (dpebcp/dpsfa)am-Latn-t-am-Ethi-m0-alaloc
[Title of series is in Amharic transliterated into Latin from Ethiopic script according to the ALA-LC romanization tables using a BCP 47 tag]
Example 5:
Option 1:
490 1# $a Sifriyat ha-Yahadut ha-ḥilonit = $a Library of secular Judaism $7 (dpelst/dpsfa)he-Latn-t-he-Hebr-m0-alaloc $7 (dpenmw)Applies to first subfield $a $7 (dpelst/dpsfa)en $7 (dpenmw)Applies to second $a
[Title of series (in the first subfield $a) is in Hebrew transliterated into Latin according to the ALA-LC romanization tables. Parallel title of series (in the second subfield $a) is in English]
Option 2:
490 1# $a Sifriyat ha-Yahadut ha-ḥilonit = $a Library of secular Judaism $7 (dpebcp/dpsfa)he-Latn-t-he-Hebr-m0-alaloc $7 (dpenmw)Applies to first subfield $a $7 (dpebcp/dpsfa)en $7 (dpenmw)Applies to second $a
[Title of series (in the first subfield $a) is in Hebrew transliterated into Latin according to the ALA-LC romanization tables. Parallel title of series (in the second subfield $a) is in English]
Example 6:
Option 1:
245 10 $a Yahadut le-lo El : $b Yahadut ke-tarbut ṿe-Tanakh ke-sifrut = Judaism without God : Judaism as culture and Bible as literature / $c Yaʻaḳov Malkin. $7 (dpelst/dpsfa/dpsfb/dpsfc)he-Latn-t-he-m0-alaloc $7 (dpenmw)Applies to subfields $a and $c and “Yahadut ke-tarbut ṿe-Tanakh ke-sifrut” in subfield $b $7 (dpelst/dpsfb)en $7 (dpenmw)Applies to “Judaism without God : Judaism as culture and Bible as literature” in subfield $b
[Title proper and other title information and statement of responsibility relating to title proper is in Hebrew transliterated into Latin according to the ALA-LC romanization tables. Parallel title proper and parallel other title information is in English]
Option 2:
245 10 $a Yahadut le-lo El : $b Yahadut ke-tarbut ṿe-Tanakh ke-sifrut = Judaism without God : Judaism as culture and Bible as literature / $c Yaʻaḳov Malkin. $7 (dpebcp/dpsfa/dpsfb/dpsfc)he-Latn-t-he-m0-alaloc $7 (dpenmw)Applies to subfields $a and $c and “Yahadut ke-tarbut ṿe-Tanakh ke-sifrut” in subfield $b $7 (dpebcp/dpsfb)en $7 (dpenmw)Applies to “Judaism without God : Judaism as culture and Bible as literature” in subfield $b
[Title proper and other title information and statement of responsibility relating to title proper is in Hebrew transliterated into Latin according to the ALA-LC romanization tables. Parallel title proper and parallel other title information is in English]
Example 7:
Option 1:
245 00 $6 880-01 $a Nenreibetsu jinkō idō tōkei to idō patān : $b Nihon, Kankoku, Tai ni okeru Rojāsu moderu no tekiyō / $c Ajia Keizai Kenkyūjo Tōkei Chōsabu = Migration rates by age group and migration patterns : application of Roger's migration schedule model to Japan, the Republic of Korea and Thailand / Statistical Research Department, Institute of Developing Economies. $7 (dpelst/dpsfa/dpsfb/dpsfc)ja-Latn-t-ja-m0-alaloc
$7 (dpenmw)Applies to subfields $a, $b, and “Ajia Keizai Kenkyūjo Tōkei Chōsabu” in subfield $c $7 (dpelst/dpsfc)en $7 (dpenmw)Applies to “Migration rates by age group and migration patterns : application of Roger's migration schedule model to Japan, the Republic of Korea and Thailand / Statistical Research Department, Institute of Developing Economies” in subfield $c
880 00 $6 245-01/$1 $a 年齢別人口移動統計と移動パターン : $b 日本、 韓国、 タイにおけるロジャースモデルの適用 / $c アジア経済研究所統計調查部 = Migration rates by age group and migration patterns : application of Roger's migration schedule model to Japan, the Republic of Korea and Thailand / Statistical Research Department, Institute of Developing Economies.
[Title proper, other title information, and statement of responsibility relating to title proper is in Japanese transliterated into Latin from Kanji, Katakana, and Hiragana scripts according to the ALA-LC romanization tables. Parallel title proper, parallel other title information, and parallel statement of responsibility relating to title proper is in English]
Option 2:
245 00 $6 880-01 $a Nenreibetsu jinkō idō tōkei to idō patān : $b Nihon, Kankoku, Tai ni okeru Rojāsu moderu no tekiyō / $c Ajia Keizai Kenkyūjo Tōkei Chōsabu = Migration rates by age group and migration patterns : application of Roger's migration schedule model to Japan, the Republic of Korea and Thailand / Statistical Research Department, Institute of Developing Economies. $7 (dpebcp/dpsfa/dpsfb/dpsfc)ja-Latn-t-ja-m0-alaloc
$7 (dpenmw)Applies to subfields $a, $b, and “Ajia Keizai Kenkyūjo Tōkei Chōsabu” in subfield $c $7 (dpebcp/dpsfc)en $7 (dpenmw)Applies to “Migration rates by age group and migration patterns : application of Roger's migration schedule model to Japan, the Republic of Korea and Thailand / Statistical Research Department, Institute of Developing Economies” in subfield $c
880 00 $6 245-01/$1 $a 年齢別人口移動統計と移動パターン : $b 日本、 韓国、 タイにおけるロジャースモデルの適用 / $c アジア経済研究所統計調查部 = Migration rates by age group and migration patterns : application of Roger's migration schedule model to Japan, the Republic of Korea and Thailand / Statistical Research Department, Institute of Developing Economies.
[Title proper, other title information, and statement of responsibility relating to title proper is in Japanese transliterated into Latin from Kanji, Katakana, and Hiragana scripts according to the ALA-LC romanization tables. Parallel title proper, parallel other title information, and parallel statement of responsibility relating to title proper is in English]
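Where the examples above follow a tagging $7 with a "dpenmw" note describing the text it applies to, the pairing depends entirely on subfield order. The following Python sketch shows one way a program might attach each note to the $7 that immediately precedes it. The function name and the returned structure are hypothetical, and the approach assumes that a note always refers to the nearest preceding $7 that has no note yet.

```python
from typing import List, Optional, Tuple

NOTE_PREFIX = "(dpenmw)"


def pair_with_notes(subfields: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Pair each tagging $7 with the dpenmw note that directly follows it.

    `subfields` holds the $7 values of one field in record order.
    Returns (tagging value, note) pairs; either element may be None.
    """
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    for value in subfields:
        if value.startswith(NOTE_PREFIX):
            note = value[len(NOTE_PREFIX):]
            if pairs and pairs[-1][0] is not None and pairs[-1][1] is None:
                # Attach the note to the $7 immediately before it.
                pairs[-1] = (pairs[-1][0], note)
            else:
                # No eligible $7 precedes this note; keep it unattached.
                pairs.append((None, note))
        else:
            pairs.append((value, None))
    return pairs
```

With the four $7 subfields of Example 5 in their recorded order, the sketch pairs the Hebrew tag with "Applies to first subfield $a" and the English tag with "Applies to second $a". If the subfields are reordered, the pairing changes or is lost.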
BIBFRAME has not treated the $7 yet. This is in part because the $7 is relatively new and sample data are rare, and in part because each $7 code potentially needs specific handling rules, which increases the level of effort and requires more time to map. That said, for something like a BCP 47 code, the rule is relatively simple and straightforward. Since RDF natively supports BCP 47, BIBFRAME should be able to readily make use of BCP 47 data when describing the language, script, and/or transliteration scheme used for literals. "Should" was used because some constructs in the examples of this paper introduce considerable variation, which can impede implementation and foster ambiguity, when it comes to $7 handling.
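As a minimal sketch of why the BCP 47 case is comparatively simple, the Python function below writes a string value and its BCP 47 tag as a language-tagged literal in Turtle syntax. It is not the BIBFRAME conversion itself: the function name is hypothetical, and no BIBFRAME property is assumed.

```python
def turtle_literal(text: str, bcp47_tag: str) -> str:
    """Return `text` as a Turtle language-tagged literal, e.g. "abc"@en."""
    escaped = (
        text.replace("\\", "\\\\")   # backslashes first, so later escapes survive
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
    )
    return f'"{escaped}"@{bcp47_tag}'
```

For example, turtle_literal("Library of secular Judaism", "en") produces "Library of secular Judaism"@en, and a value tagged he-Latn-t-he-Hebr-m0-alaloc is written the same way, with the whole tag placed after the @ sign.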
6.1. Which of the options of proposed changes is preferable, Option 1 or Option 2, and why? If adopting Option 2, would it be necessary to deprecate the existing category codes dpeloe and dpes? For either scenario, would it be preferable to use the standard-agnostic category code dpelst or the category code dpebcp, which references a particular standard BCP 47?
6.2. Do the proposed new category codes adequately accommodate use cases for tagging transliteration schemes used to generate string values?
6.3. Which is preferable as a label for proposed code "dpets": "Data provenance element transliteration scheme" or "Data provenance element transcription standard"? Should the proposed code dpets represent the broader concept of "Transcription standard" as defined in Official RDA? Or should we opt instead for the narrower category definition of "Transliteration scheme" which matches the use cases outlined in this discussion paper? As a third option, should we establish separate category codes for "transcription standard" and "transliteration scheme" to avoid possible ambiguity?
6.4. Are there other use cases for tagging transliteration schemes via the data provenance subfields that should be taken into consideration?
6.5. Examples 5-7 illustrate problems with using provenance relationship codes when the provenance data does not apply to all of the text in a particular subfield. What is the best solution to deal with this? The examples illustrate one possible way using provenance category code "dpenmw" (Data provenance element note on metadata work), but is this an acceptable solution? The order of the subfield $7's is crucial if this method is used. Is there a better way?
6.6. Would it be useful to propose a syntax for combining data provenance category codes and data provenance relationship codes, so that they could be concatenated? Currently, the guidance on the MARC Data Provenance Subfields page states that "If the value is preceded by both a MARC Data Provenance Category code and a MARC Data Provenance Relationship code, then these are separated by a forward slash and enclosed by parentheses." However, this leaves some ambiguity—is the forward slash intended only to separate data provenance category codes from data provenance relationship codes, or can a forward slash be used also to delimit multiple instances of data provenance category codes (or data provenance relationship codes) from each other? For the sake of example, let's reformulate Example 6 using a dash (-) to separate distinct data provenance category codes and data provenance relationship codes, reserving the forward slash as a separator between the data provenance category codes and the data provenance relationship codes:
245 10 $a Yahadut le-lo El : $b Yahadut ke-tarbut ṿe-Tanakh ke-sifrut = Judaism without God : Judaism as culture and Bible as literature / $c Yaʻaḳov Malkin. $7 (dpeloe-dpes-dpets/dpsfa-dpsfb-dpsfc)he-Latn-t-he-m0-alaloc $7 (dpenmw)Applies to subfields $a and $c and “Yahadut ke-tarbut ṿe-Tanakh ke-sifrut” in subfield $b $7 (dpeloe-dpes-dpets/dpsfb)en $7 (dpenmw)Applies to “Judaism without God : Judaism as culture and Bible as literature” in subfield $b
[Title proper and other title information and statement of responsibility relating to title proper is in Hebrew transliterated into Latin according to the ALA-LC romanization tables. Parallel title proper and parallel other title information is in English]
Does the syntax shown in the example above make sense? Does the current format already allow the use of multiple codes separated by a slash? Or would it be better to add instructions on how to combine category codes and/or relationship codes?
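To help evaluate the dash-delimited syntax, the following Python sketch reads a $7 value written in that form. It assumes the convention described in this question: dashes separate codes of the same kind, and a single forward slash separates the category codes from the relationship codes. The names used are hypothetical.

```python
import re
from typing import List, NamedTuple


class CombinedProvenance(NamedTuple):
    categories: List[str]      # e.g. ["dpeloe", "dpes", "dpets"]
    relationships: List[str]   # e.g. ["dpsfa", "dpsfb", "dpsfc"]
    value: str                 # e.g. "he-Latn-t-he-m0-alaloc"


# "(categories/relationships)value"; the "/relationships" part is optional.
_PREFIX = re.compile(r"^\(([^)/]*)(?:/([^)]*))?\)(.*)$")


def parse_combined(text: str) -> CombinedProvenance:
    """Parse a $7 value that uses dash-delimited code lists."""
    match = _PREFIX.match(text.strip())
    if match is None:
        raise ValueError(f"no parenthesized code prefix in {text!r}")
    categories, relationships, value = match.groups()
    return CombinedProvenance(
        categories.split("-") if categories else [],
        relationships.split("-") if relationships else [],
        value,
    )
```

Because BCP 47 values themselves contain hyphens, the sketch separates the parenthesized prefix from the value before splitting on the dash; only the text inside the parentheses is treated as a list of codes.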
The Library of Congress >> Especially for Librarians and Archivists >> Standards (10/31/2024)