Background
In this environment of reduced budgets, the Library of Congress (LC) continues to reexamine its practices to ensure that they offer sufficient benefit for the work, time, and resources devoted to them. One such area is the romanization of data in bibliographic records. This brief document summarizes the romanization process and explains why it merits continued application and support. LC will reassess its romanization practices as systems and underlying technologies change.
Importance of original scripts in bibliographic and authority data
LC collects materials in many languages and scripts. Users who can read those languages and scripts appreciate our providing bibliographic information in those scripts.
Since the 1980s, LC has distributed MARC records that include non-Latin script data in bibliographic records and in references in authority records. Support began with Chinese, Japanese, and Korean and now extends to many of the languages that use the Cyrillic, Greek, Hebrew, and Perso-Arabic scripts, but not yet to the full "repertoire" of scripts covered by the Unicode standard.
The new cataloging code, RDA: Resource Description and Access, instructs catalogers to transcribe data as found (in original scripts) but accommodates access under a romanized form; the Anglo-American Cataloguing Rules do the same. LC expects to implement RDA in early 2013.
Why romanized data and for whom?
MARC formatting conventions and US practice put romanized forms in the “regular” MARC fields, with parallel fields for the original scripts, so that systems able to handle one or both forms are supported. However, most library systems still cannot accept the entire Unicode “repertoire” of characters for all scripts, and some still cannot accept any non-Latin scripts.
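The pairing of a romanized field with its original-script parallel can be illustrated with a minimal sketch. It assumes the usual MARC 21 linkage, in which the regular field and its parallel 880 field point at each other through subfield $6. The record below, the tuple-and-dict layout, and the function name pair_parallel_fields are hypothetical simplifications for illustration, not LC's or any library system's actual data model.

```python
import re

# Hypothetical, simplified record: each field is (tag, {subfield code: value}).
# The 245 field carries the romanized title; the 880 field carries the
# original-script form and links back to the 245 through subfield $6.
record = [
    ("245", {"6": "880-01", "a": "Voĭna i mir"}),
    ("880", {"6": "245-01/(N", "a": "Война и мир"}),
    ("260", {"a": "Moskva"}),
]

# Subfield $6 begins with the linked tag and a two-digit occurrence number.
LINK = re.compile(r"^(\d{3})-(\d{2})")


def pair_parallel_fields(fields):
    """Return (tag, romanized $a, original-script $a or None) per regular field."""
    # Index every 880 field by the (tag, occurrence) it links back to.
    originals = {}
    for tag, subfields in fields:
        if tag == "880":
            match = LINK.match(subfields.get("6", ""))
            if match:
                originals[(match.group(1), match.group(2))] = subfields

    pairs = []
    for tag, subfields in fields:
        if tag == "880":
            continue
        match = LINK.match(subfields.get("6", ""))
        key = (tag, match.group(2)) if match and match.group(1) == "880" else None
        original = originals.get(key)
        pairs.append((tag, subfields.get("a"), original.get("a") if original else None))
    return pairs


for tag, romanized, original in pair_parallel_fields(record):
    print(tag, romanized, original)
```

A system that cannot display non-Latin scripts could use only the second element of each tuple, while a system that can display them could show both forms side by side.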
Romanization is primarily for staff at LC and at other libraries who lack the relevant language expertise and work in:
1. Circulation
2. Acquisitions
3. Serials check-in
4. Shelflisting
5. Shelving
6. Reference
Romanization is also for systems that cannot use non-Latin forms, have support for only some scripts, or require romanized fields for indexing and sorting purposes.
How have we eased the process of providing romanized forms?
LC has implemented automatic transliteration capabilities known as “Transliterator” for the following languages and scripts:
Cyrillic script
1. Bulgarian
2. Belorussian
3. Russian
4. Serbian & Macedonian
5. Ukrainian
Arabic script
1. Arabic
2. Kurdish (not in production)
3. Persian
4. Pushto
5. Urdu
Hebrew script
1. Hebrew
2. Yiddish
CJK scripts
1. Chinese
2. Japanese (not available; generates empty 880 fields)
3. Korean (in development)
Other
1. Classical Greek
2. Chinese Wade-Giles to Pinyin (conversion)
OCLC's Connexion client software includes transliterators for Arabic and Persian, and OCLC members have contributed macros for Cyrillic, Greek, and Hebrew.
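The general idea behind automatic transliteration can be shown with a minimal table-driven sketch. This is not the Transliterator implementation, whose internals this document does not describe. The mapping is an illustrative subset of Cyrillic letters whose romanized forms need no diacritics or ligatures, and the names CYRILLIC_TO_LATIN and romanize are hypothetical. A production tool would need the complete romanization table for each language, including the letters omitted here.

```python
# Illustrative subset of a Cyrillic-to-Latin mapping. This is NOT a complete
# romanization table: letters whose romanized forms require diacritics or
# ligatures are deliberately left out of this sketch.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "kh", "ч": "ch", "ш": "sh", "щ": "shch",
    "ы": "y",
}


def romanize(text, table=CYRILLIC_TO_LATIN):
    """Replace each mapped character with its Latin form, preserving case.

    Characters that are not in the table (digits, spaces, punctuation, and
    unmapped letters) are passed through unchanged.
    """
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in table:
            latin = table[lower]
            if ch != lower:  # the source letter was uppercase
                latin = latin.capitalize()
            out.append(latin)
        else:
            out.append(ch)
    return "".join(out)


print(romanize("Москва"))   # Moskva
print(romanize("Журнал"))   # Zhurnal
```

A per-character table is the simplest possible design. Scripts in which romanization depends on context, such as unwritten vowels in Arabic or Hebrew or word division in Chinese, need rules or dictionaries beyond a lookup of this kind.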