The task of relating the contents of the MARC21 Bibliographic Format to those of the PREMIS Data Dictionary confronts a variety of challenges.
MARC21 was designed primarily as an exchange format for bibliographic data pertaining to a wide variety of materials, a purpose that requires its specification to be understood and applied similarly by users in many libraries. For some data elements (countries, languages, relators, etc.), code lists ensure uniformity of content across several fields. However, most fields accept free text, with intelligible display frequently taking priority over machine-processable uniformity. (The issue arose, for example, on the MARC Forum in a discussion about copyright jurisdiction in the proposal establishing field 542, Information Relating to Copyright Status.) Such consistency as is achieved in the free text fields results from application of a set of cataloging rules employed by the majority of MARC21 users. These rules are distinct from the format and, in an earlier form, antedated and influenced the creation of the MARC format.
In contrast, PREMIS is intended to facilitate the management of preservation metadata primarily within individual digital archives or repositories. Data exchange has been a secondary concern. The specification shows this clearly, allowing each PREMIS implementation a great deal of latitude in what data are recorded and in the vocabularies employed. The phrase “Value should be taken from a controlled vocabulary,” present as a data constraint for many PREMIS semantic units, is indicative of this design philosophy, the implication being that the choice of vocabulary may be a local one.
The result of these differences in purpose is that the two schemes focus, in their concern for particularity, on rather different metadata elements. Even when the same metadata element is represented in both schemes, it is almost never done in a way that would allow the value to be transferred directly or unambiguously from PREMIS to MARC21, or vice versa. MARC21 is at a disadvantage in the areas on which PREMIS concentrates, because much of the MARC specification was written when the preservation of digital resources was in its infancy. The PREMIS Data Dictionary embodies an attempt to codify metadata needs in an area where standards and best practices are only emerging. There are no longstanding traditions to guide (or hamper) the effort.
NOTES ON THE ANALYSIS
The present paper examines these two metadata schemes for correspondences in the data which they accommodate. The findings are expressed in a pair of tables that have the same contents arranged differently. Table MARC21 vs. PREMIS is ordered by MARC tag and subfield code; table PREMIS vs. MARC21 follows the order of PREMIS semantic units specified in version 2.0 of the Data Dictionary.
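The relationship between the two tables can be pictured as a single set of correspondence records sorted in two ways. The following Python sketch is illustrative only and is not part of the analysis itself: the record layout, the function names, and the three sample rows (drawn from fields mentioned in these notes) are assumptions, and the unit number given for agentName should be checked against the Data Dictionary.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Correspondence:
    """One possible overlap between a MARC21 element and a PREMIS semantic unit."""
    marc_tag: str                  # e.g. "260"
    marc_subfield: Optional[str]   # e.g. "f"; None when the whole field is meant
    premis_number: Tuple[int, ...] # e.g. (4, 2) for unit 4.2
    premis_unit: str               # e.g. "rightsExtension"

    def marc_label(self) -> str:
        # Conventional "tag$subfield" notation, or the bare tag for a whole field.
        if self.marc_subfield is None:
            return self.marc_tag
        return f"{self.marc_tag}${self.marc_subfield}"


# Hypothetical sample rows; the real tables contain many more entries.
ROWS = [
    Correspondence("506", None, (4, 2), "rightsExtension"),
    Correspondence("260", "f", (3, 2), "agentName"),  # 3.2 is assumed numbering
    Correspondence("355", None, (4, 2), "rightsExtension"),
]


def by_marc(rows):
    """Order rows as in table MARC21 vs. PREMIS: by tag, then subfield code."""
    return sorted(rows, key=lambda r: (r.marc_tag, r.marc_subfield or ""))


def by_premis(rows):
    """Order rows as in table PREMIS vs. MARC21: by semantic unit number."""
    return sorted(
        rows,
        key=lambda r: (r.premis_number, r.marc_tag, r.marc_subfield or ""),
    )
```

Both functions return the same rows; only the order differs, which is the sense in which the two tables "have the same contents arranged differently."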
Where feasible, the tables work at MARC subfield and PREMIS subunit levels. Departures from this principle occur for two main reasons.
The tables have been built assuming that it is preferable to identify possible correspondences that may be tenuous, or even erroneous, than inadvertently to omit a connection of importance. Two systematic exceptions from the policy of inclusiveness should be mentioned.
Users of these tables who are more knowledgeable about their own and others' PREMIS implementations may be able to eliminate some table entries as universally inappropriate or too rare to be worthy of consideration. For instance, subfield 260$f, manufacturer’s name, is listed as a possible agentName, although the likelihood of that relationship’s being useful seems small.
Certain subfields whose roles in this context are obscure have generally been omitted. Their evaluation will require examination of individual instances.
In general, there are no table entries for semantic units whose names begin with the element “linking,” because their type and value subunits inherit data from similarly named units lacking the “linking” prefix. There are exceptions.
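The omission rule for “linking” units amounts to a simple name filter with an allowance for exceptions. The sketch below is a hypothetical illustration in Python, not the procedure actually used: the function name and the `exceptions` parameter are assumptions, and because these notes do not say which units are the exceptions, the one used in the example is purely illustrative.

```python
def tabulated_units(units, exceptions=frozenset()):
    """Return the semantic units that would receive table entries.

    Units whose names begin with "linking" are dropped unless they are
    explicitly listed in `exceptions`.
    """
    return [
        unit for unit in units
        if not unit.startswith("linking") or unit in exceptions
    ]


# Semantic unit names as they appear in the PREMIS Data Dictionary.
units = [
    "agentIdentifier",
    "agentName",
    "linkingAgentIdentifier",
    "linkingRightsStatementIdentifier",
    "rightsExtension",
]

# Default behavior: every "linking" unit is left out.
without_linking = tabulated_units(units)

# Illustrative exception only; the notes do not identify the real ones.
with_exception = tabulated_units(units, exceptions={"linkingAgentIdentifier"})
```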
Finally, this analysis examined the question of MARC21 data that a repository might find appropriate to include in its PREMIS database, but that appear to fit only in the “Extension” containers. Sometimes it is unclear whether these marginal data would instead be appropriate to “Note” subunits. In the end, only the rightsExtension (4.2) appears in the tables, where it is associated with fields 355 and 506.
The process of deciding what kinds of data are expected in any subfield of interest, particularly in free text note fields, was guided by the examples provided in the full online MARC21 Bibliographic Format document. No attempt was made to search for additional examples in library databases.
The investigator is aware of his tendency to think of metadata comparisons of this sort in terms of conversions from one scheme to another. That is a matter more limited than the general question of data overlap, and one that the character of these two schemes renders impracticable. Nevertheless, that mindset has crept occasionally into the language used to talk about the relationships between certain data elements and semantic units. This fact should not seriously affect interpretation of the findings, but it seems wise to mention it.
Prepared for the Library of Congress
by Charles W. Husbands
14 December 2009
( 12/20/2010 )