DISCUSSION PAPER NO. 110

DATE: May 1, 1998
REVISED:

NAME: Enhancement of Computer file 007 in the USMARC Bibliographic/Holdings Formats

SOURCE: The Research Libraries Group, Inc. (RLG)

SUMMARY: This paper discusses the enhancement and expansion of the Computer File 007 values to accommodate better retrieval and management of digitally reformatted and preserved materials. It suggests slight changes to the existing six bytes to make them more inclusive, and the addition of seven new optional bytes which specifically address the needs of digitally reformatted materials.

KEYWORDS: Field 007 (Computer file) (BD, HD)

RELATED:

STATUS/COMMENTS:

5/1/98 - Forwarded to the USMARC Advisory Group for discussion at the June 1998 MARBI meetings.

6/28/98 - Results of USMARC Advisory Group discussion -
RLG elaborated further on the need for coded information about digitally reformatted material. This information will be applicable to both digitally reformatted and digitally preserved items. RLG would be open to defining a separate 007 for this information, as suggested in the paper. Other specific comments were:

LC and RLG will work together to finalize a proposal for the next meeting. LC will send a message to the USMARC list asking whether people have strong feelings about using the existing computer file 007 or defining a new 007 for preservation computer files. An RLG working group is now looking at field 583 (Action Note) to determine its relevance; this discussion should be included in a revised paper on this issue.


DISCUSSION PAPER NO. 110: Enhancement of Computer File 007

1 BACKGROUND

The Research Libraries Group is proposing new values in the existing computer file 007 field to accommodate digitally reformatted materials. These values are considered essential for communicating important preservation reformatting information; incorporating them in an 007 field can accommodate practice analogous to that currently used to catalog microform masters, as well as changes that may result from evolving cataloging practices. Inclusion of a USMARC 007 field incorporating these values is essential for any record that describes an item digitally reformatted for preservation purposes, whether the item is described on the same record as the original, on a record for the original and other manifestations, or on a separate record. This proposal addresses only the values required, not the cataloging structure in which they are used. The information recorded in the new set of 007 values will allow better retrieval and management of digitally reformatted materials and help guide decisions to digitize materials for preservation purposes.

Although some information about digital reformatting may be carried in the digital file header, one cannot guarantee that the header will be consistently retained with the file. The catalog record serves as a permanent home for this important information.

Using coded values in an 007 field has advantages over using a variable-length field. Coded values in an 007 can be recorded by preservation as well as cataloging staff, since recording them does not require knowledge of cataloging rules. Coded values can also be used more easily for machine sorting.

RLG needs a provision in USMARC for coding digital preservation physical aspects as soon as possible. The lack of such coding is currently an obstacle to adequately describing and providing access to items which have been digitally reformatted and to setting up agreements with organizational and institutional partners who want to exchange such data.

This discussion paper results from a year-long effort by an international RLG working group composed of representatives from the British Library, Columbia University, the European Register of Microform Masters, the Library of Congress, the National Library of Australia, the National Library of Canada, the University of Toronto, and the University of Leeds. Another RLG member advisory group, made up of representatives from Cornell University, Emory University, Getty Information Institute, Harvard University, New York University, Princeton University, University of Cambridge, and Yale University, also reviewed earlier drafts of this paper. An earlier draft was posted on the RLG Web site on January 30, 1998, to solicit comments from the broader RLG community.

2 DISCUSSION

2.1 Needs of the Preservation Community

The motivation for enhancing the computer file 007 comes from the preservation community. For the past ten years, digitization has been researched and developed as a new method of preservation and access. Increasingly, digitization projects are no longer undertaken for research and development purposes but are full-scale conversion projects that add to an institution's "digital library." The digitization of traditional library and archival materials such as books and manuscripts is being joined by the digitization of recorded sound and motion pictures, creating tens of thousands of computer files in the process.

At the same time, the amount of existing electronic material (those items "born" digital) acquired by institutions has grown tremendously. In response to the acquisition of this material, cataloging conventions were drafted in order to adequately describe and increase access to this new material. Unfortunately, this has not yet happened for materials digitally reformatted into computer files.

The preservation community relies upon, and is driven by, both the need and the desire to share information. It is universally recognized that preservation can only happen through collaboration and resource sharing. In recognition of this, institutions communicate their preservation intent and efforts in a communal manner, generally by contributing records to national bibliographic databases. The goal and effect is to avoid duplication of effort and, even more importantly, to increase access to thousands of records in union databases for items that previously had not been cataloged. When the need arose to code microform preservation elements in USMARC, the preservation community identified a core set of elements. In response to the current need to code digital preservation elements in USMARC, the preservation community has identified the core set of elements presented in this discussion paper.

2.2 USMARC Format

There is precedent for extending the length of an existing 007 field. In 1985, the 009 fixed field for archival collections was made obsolete in the Film format. Fourteen bytes from the 009 were added to the end of the existing 8-byte 007 for motion pictures (along with a newly defined byte not from the 009).

The addition of archival film elements to the motion picture 007 also provides precedent for enhancing a 007 with additional elements related to the needs of a particular method of handling materials. Just as MARBI chose not to define a separate 007 field for archival film elements, it follows that physical description elements for digitally reformatted materials should be added to the existing computer file 007.

The archival film elements (bytes 9-22) of the motion picture 007 are optional. USMARC documentation states that the motion picture 007 may be either 8 or 23 bytes long, depending on whether the archival film elements are coded. Likewise, we propose that the seven additional bytes for the computer file 007 be optional. This would allow a longer, 13-byte computer file 007 to be created only for an item that is a digital reproduction intended for long-term preservation.

The intent of this paper is to suggest the definition of additional elements only for digital preservation and access purposes. Discussion of the proposal may lead, however, to the conclusion that some of the proposed elements, such as compression, have wider application. In that case, the order of elements could be revised so that elements with wider application appear starting at byte 06. The intent is for elements that relate only to the physical description of digital preservation and access files to appear at the end of the computer file 007.

This paper retains the original proposals from the RLG working group (except for a few changes agreed upon between LC and RLG) and includes questions that need to be discussed before the submission of a final proposal.

2.3 Current definitions

The character positions in the 007 for computer files are currently defined as follows:

      007/00               Category of material
      007/01               Specific material designation
      007/02               Undefined
      007/03               Color
      007/04               Dimensions
      007/05               Sound

3 ENHANCEMENT OF COMPUTER FILE 007

3.1 Additional character positions

With the proposed changes, the enhanced 007 for computer files would have thirteen character positions defined for it. To incorporate all of the important information, the following character positions would need to be added:

      007/06               Antecedent/Source
      007/07               File formats
      007/08-09            Image Bit Depth
      007/10               Quality Assurance Target(s)
      007/11               Compression
      007/12               Reformatting Aspect
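To make the proposed layout concrete, the following Python sketch splits a computer file 007 into named elements. It is illustrative only: the function name `parse_cf007` and the element names are hypothetical, and it assumes the 007 is already available as a plain string beginning with 007/00, and that only the 6-position (current) and 13-position (enhanced) lengths occur.

```python
# Hypothetical sketch: character positions of the enhanced computer file 007,
# given as (name, start, end) slices in Python's half-open convention.
CF007_POSITIONS = [
    ("category_of_material", 0, 1),
    ("specific_material_designation", 1, 2),
    ("undefined", 2, 3),
    ("color", 3, 4),
    ("dimensions", 4, 5),
    ("sound", 5, 6),
    # Proposed optional preservation bytes (007/06-12).
    ("antecedent_source", 6, 7),
    ("file_formats", 7, 8),
    ("image_bit_depth", 8, 10),  # 007/08-09 is one two-character element
    ("quality_assurance_targets", 10, 11),
    ("compression", 11, 12),
    ("reformatting_aspect", 12, 13),
]


def parse_cf007(value: str) -> dict:
    """Split a computer file 007 into named elements.

    Assumes only two lengths are valid: 6 (current definition) or
    13 (with the proposed optional bytes).
    """
    if len(value) not in (6, 13):
        raise ValueError(f"expected 6 or 13 characters, got {len(value)}")
    if value[0] != "c":
        raise ValueError("007/00 must be 'c' (computer file)")
    return {
        name: value[start:end]
        for name, start, end in CF007_POSITIONS
        if end <= len(value)  # a 6-position 007 yields only 007/00-05
    }


# Remote, multicolored, no sound; 24-bit images scanned from the original;
# targets present; lossless compression; preservation file.
print(parse_cf007("cr cn aa24plp")["image_bit_depth"])  # 24
```

Treating 007/08-09 as a single two-character slice mirrors the way the paper defines it as one element rather than two independent bytes.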

As core elements of preservation information, these new bytes should be classified within USMARC as mandatory if applicable.

Question 1: It would be difficult to enforce that they be mandatory for a specific type of computer file. Would it be preferable to propose that they be highly recommended for preservation computer files?

Question 2: Should consideration be given to establishing a new 007 for preservation computer files instead of using the established one? In that case, the elements that exist in the CF 007 could be repeated and these additional bytes added. This might be a cleaner method, since the original 007 for computer files would not be affected. How have users been served by the precedent mentioned above, i.e., adding the bytes for archival film elements to the 007 for motion pictures?

3.2 Notes on Changes to Existing Bytes

There are minor typographical or grammatical corrections that have been referred to LC. A few more significant changes are requested that will allow the computer file 007 to cover digitally reformatted materials.

The change to the definition of code c in 007/00 (Category of material) is proposed to provide a more explicit statement of the kinds of files considered to be computer files. Image, audio, and video files are the most commonly encountered files in digital reformatting, so mention of them in the definition will make it clearer that they are included.

The addition of code v (Varies) to 007/01 (Specific material designation) will provide a useful code for managing digitally reformatted materials. Code v indicates that "it is likely the specific medium will vary over time as the content of the computer file is refreshed and migrated to preserve and archive the file." The rapid development of new technologies makes it likely that new or significantly changed media will often be available when the time comes to refresh or migrate an existing digitally preserved item. In many cases, a preservation agency may have no concern for the medium used for refreshing, beyond its being the agency's current standard. In addition, the agency may not want to update the catalog record to indicate which specific medium was used for some portion of its digitally reformatted materials. Code v gives such preservation agencies a meaningful code with which to track and retrieve records for these items. Code v is distinct from the codes for "Unknown" or "Other," and is not the same as saying that the concept of specific material designation is not applicable.

Question 3. Varies has been defined to mean that the item will change over time. If the byte is coded "v," is it assumed that one must go back and change the information in the other bytes when the file changes (e.g., dimensions, image bit depth)? Doesn't saying that it varies mean that the physical details should not be recorded? Or would the cataloger keep adding new 007s to cover physical details when the file changes? What is the purpose of "v" if one either does not give physical details or goes back and adds additional 007 fields? Why doesn't "other" suffice? If varies is indeed found to be needed, perhaps a better term would be "dynamic." In addition, if a separate 007 were used for preservation computer files, defining "v" would not confuse what is already there (although it might then be assumed and no longer needed).

3.3 Notes on New Character Positions

007/06 Antecedent/Source

      a              File reproduced from original object
      b              File reproduced from microform
      c              File reproduced from computer file
      d              File reproduced from an intermediate (not microform)
      m              Mixed sources
      n              Not applicable
      u              Unknown

Information about the source of a digital file is important to the creation, use and management of digitally reformatted materials.

As with microfilm, certain assessments can be made based on the source material being reformatted. In preservation replacement searching, determinations of whether the item has been reformatted and whether the reformatted version is a quality reproduction are often the basis for decision-making processes. The proposed values in 007/06 will allow a searcher to determine which records are for reproductions from originals, microforms, computer files, intermediates, or mixed sources.
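As an illustration of the kind of retrieval 007/06 would support, the sketch below groups record identifiers by their coded source. The `(record_id, cf007)` record shape and the function name `group_by_source` are hypothetical; a real system would read the 007 from full USMARC records.

```python
from collections import defaultdict

# Proposed 007/06 (Antecedent/Source) codes and their meanings.
SOURCE_LABELS = {
    "a": "original object",
    "b": "microform",
    "c": "computer file",
    "d": "intermediate (not microform)",
    "m": "mixed sources",
    "n": "not applicable",
    "u": "unknown",
}


def group_by_source(records):
    """Group record IDs by the source coded in 007/06.

    `records` is a hypothetical iterable of (record_id, cf007) pairs.
    Records with a 6-position 007 (no preservation bytes) or an
    unrecognized code are grouped under "not coded".
    """
    groups = defaultdict(list)
    for record_id, cf007 in records:
        code = cf007[6] if len(cf007) == 13 else None
        groups[SOURCE_LABELS.get(code, "not coded")].append(record_id)
    return dict(groups)


records = [
    ("rec1", "cr cn ba01pla"),  # scanned from microform
    ("rec2", "cr cn aa24plp"),  # scanned from the original object
    ("rec3", "cr cn "),         # current 6-position 007 only
]
print(group_by_source(records))
# {'microform': ['rec1'], 'original object': ['rec2'], 'not coded': ['rec3']}
```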

Some of the proposed codes are associated with a particular type of computer file. Code a, for example, is defined for computer files containing images reproduced from the original object, and excludes computer files composed of audio data. The exclusion is made because current practice is never to digitally reformat audio materials from the original audio source. Code b is defined specifically for computer files that contain images, because a microform cannot contain audio data.

Question 4. This byte does not seem to cover audiovisual material or OCR, only scanned items. Should it be made clear that non-scanned items are not included here? For instance, the Prints and Photographs Division at LC considers the photographer's print the original, and if it does not have that, it considers the negative the original. The definition does not say what is meant by "original." What if an item is scanned from a photocopy? And how would a source that is not a two-dimensional scanned image, e.g., video, be coded?

007/07 File Formats

      a              One file format
      m              Multiple file formats
      u              Unknown

Information about file formats is important when cataloging, viewing, and archiving digitally reformatted computer files. The codes proposed in 007/07 will distinguish computer files composed of a single format from those that contain multiple formats. Consideration was given to defining codes for specific formats (e.g., .jpg, .tif). This idea was rejected for two reasons:

1. Technology is rapidly changing and file formats which are common today will likely change in the not-so-distant future. Keeping up with changes would require constant review of formats in use, so that change requests could be made to MARBI to keep the codes up to date. More importantly, the number of file formats would soon exceed the coding possibilities of a single byte.

2. A portion of digitally reformatted items will consist of multiple physical formats (pages, plates, maps, foldouts, etc.). When digitizing these materials, it may be necessary to use different scanning processes to capture adequately the information contained in the original (e.g., bitonal scans for text and color scans for illustrations and maps). Because of the technical details associated with scanning, the different types of scans will result in different file types (e.g., TIFF files for text-based pages and JPEG files for color pages). The original proposal would have allowed only one of these file types to be encoded.

Information on file formats is currently contained in some catalog records. However, the information is contained in variable fields with no standard terminology, making it impossible to effectively search for it. The values proposed here for the 007/07 provide a standard place and uniform format for information about file formats of digitized items.
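The distinction 007/07 draws can be derived mechanically from the files that make up a computer file. The sketch below is a rough approximation under a stated assumption: it infers format from the filename extension only (with a small, hypothetical alias table), whereas a real workflow would identify formats from file content. The function name `file_formats_code` is hypothetical.

```python
from pathlib import PurePath

# Assumption: a few common spellings of the same format are normalized;
# a real workflow would identify formats by content, not by extension.
EXTENSION_ALIASES = {".tiff": ".tif", ".jpeg": ".jpg"}


def file_formats_code(filenames):
    """Suggest a 007/07 code for the files that make up a computer file.

    Returns "a" (one file format), "m" (multiple file formats), or
    "u" (unknown) when there are no files or any file lacks an extension.
    """
    extensions = set()
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if not ext:
            return "u"  # format of at least one file cannot be determined
        extensions.add(EXTENSION_ALIASES.get(ext, ext))
    if not extensions:
        return "u"
    return "a" if len(extensions) == 1 else "m"


print(file_formats_code(["p001.tif", "p002.TIFF"]))    # a
print(file_formats_code(["p001.tif", "plate01.jpg"]))  # m
```

Because only the count of distinct formats matters, new formats need no new codes, which is the rationale the paper gives for rejecting format-specific values.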

Question 5. Could this be called "Number of file formats" since it does not describe the file format itself? What is this information used for?

007/08-09 Image Bit Depth

      01-99          Exact bit depth (expressed in numerics)
      mm             Multiple (more than 1 image type)
      nn             Not applicable
      uu             Unknown

While specialized, this type of information is increasingly important when working with image files. File quality (the richness of the image captured) can be inferred from the bit depth of the file. The bit depth of a file can also be instructive when considering the viewing device to be used with the image. Research and development related to image bit depth is underway and shows promise for carrying information to interact with viewing monitors, though this will only be possible with certain numeric bit depths. Without a place to record this information, this technical capability would be lost.

The idea of having codes for specific numeric bit depths in the enhanced computer file 007 was considered. This was rejected for the same reasons that specific format codes in the 007/07 were dismissed: rapid technological advances could make it difficult to keep up with the variety of bit depths, and the number of possible codes could soon be exhausted.

The proposed definitions for bit depth require that if the exact bit depth is not known, or if there are multiple images with varying bit depths comprising the computer file, either "uu" (unknown) or "mm" (multiple) is used. Only exact bit depth information is useful. The proposed computer file 007/08-09 does not allow for coding such as "1-" to show that something has a bit depth somewhere in the range of 10-19 bits.
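The coding rules above can be sketched as a small function. This is an illustration, not part of the proposal: the name `bit_depth_code` is hypothetical, treating an empty list as "nn" is an assumption (e.g., a file with no images), and when some depths are unknown but the known ones already differ, the sketch chooses "mm" over "uu".

```python
def bit_depth_code(depths):
    """Encode 007/08-09 from the bit depth of each image in a computer file.

    `depths` holds one entry per image; None marks an image whose bit depth
    is not known. An empty list is treated as "not applicable" (an
    assumption, e.g. for a file with no images).
    """
    depths = list(depths)
    if not depths:
        return "nn"
    known = {d for d in depths if d is not None}
    if len(known) > 1:
        return "mm"  # images captured at more than one bit depth
    if None in depths:
        return "uu"  # exact bit depth cannot be determined
    depth = known.pop()
    if not 1 <= depth <= 99:
        raise ValueError(f"bit depth {depth} does not fit in two digits")
    return f"{depth:02d}"  # right justified with a leading zero


print(bit_depth_code([1, 1, 1]))  # 01
print(bit_depth_code([1, 24]))    # mm
```

Note that the function never produces a partial value such as "1-"; consistent with the proposal, only an exact depth or one of the two-letter codes is emitted.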

Question 6. Can this always be limited to two bytes?

007/10 Quality Assurance Target(s)

      a              Absent
      n              Not applicable
      p              Present
      u              Unknown

When reformatting items, it is imperative also to capture quality assurance targets in order to judge the quality of the conversion. "Targets" are standard reference points that can be interpreted by a human or machine and used to measure resolution, color, faithfulness of representation to the original, etc. For imaging (still and video), visual targets are included to judge spatial resolution, accurate color capture, and color management. For audio reformatting, reference and azimuth tones are included to allow for frequency modulation and equipment calibration.

Recording and identifying this type of information in bibliographic files is important when cataloging digitally reformatted items. Unlike microfilming (in which the use of targets is standardized), digital reformatting still exists in an environment of little standardization. In a bibliographic record for an item that has been preservation microfilmed, there is no need to record the use of quality control targets: because the film is standardized preservation microfilm, their use can be inferred. With the rapid rate of technological change, it is unlikely that digitization will become standardized to the extent that microfilm has. Therefore, it is important to be able to record the inclusion of quality control targets in the bibliographic record for digitally reformatted items.

Question 7. Does this indicate that quality assurance has been done? Quality assurance does not have to be done through targets, and monitors can be adjusted according to a known target. In addition this may not cover all forms of practice in the future.

007/11 Compression

      a              Uncompressed
      l              Lossless
      n              Not applicable
      o              Lossy
      u              Unknown

Compression reduces the size of a computer file to facilitate processing, storage, and transmission. It is important for access to computer files over a network, and it also affects the quality of a file. Two types of compression exist for computer files: lossless and lossy. Lossless compression allows a computer file to be compressed and decompressed with absolute fidelity each time. Lossy compression schemes employ techniques that average or discard some of the encoded digital information, so when the file is decompressed, it is not an exact replica of the original file.

Because the goal of preservation is to be able to provide an exact replica of an original item wherever possible, lossy compression is not considered an acceptable technique for preservation "master" files. When judging the fidelity of the digital item to the original and the possibility of reproducing an exact copy, the compression scheme used is a vital tool in the decision-making process. Because only two types of compression exist (lossy and lossless), the scheme is easily encoded. A standardized one-byte code conveying the compression scheme used on a particular computer file will allow preservation searchers to identify the scheme quickly when evaluating the overall qualities of the file.
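The lossless/lossy distinction can be demonstrated directly. In this sketch, zlib (DEFLATE) stands in for a lossless scheme; the "lossy" step is a toy that simply zeroes low-order bits and is not a real codec such as JPEG. It only illustrates that information discarded before encoding cannot be recovered. The helper names are hypothetical.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE), a lossless scheme."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(data: bytes, keep_bits: int = 4) -> bytes:
    """Toy stand-in for a lossy step: zero the low-order bits of each byte.

    Not a real codec; it only shows that discarded information cannot be
    recovered afterwards.
    """
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(b & mask for b in data)


sample = bytes(range(256)) * 16  # placeholder for a digitized file's contents
print(lossless_roundtrip(sample) == sample)      # True: exact replica
print(lossy_quantize(sample) == sample)          # False: information lost
print(len(zlib.compress(sample)) < len(sample))  # True: file is smaller
```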

Question 8. It might be preferable to change the codes so that they are not mnemonic, since they cannot all be (lossless and lossy both begin with "l"). Many of the codes in the computer file 007 are simply assigned in alphabetical order, so these could be changed to a, b, c, etc. Must a distinction be made between not applicable and uncompressed? Can't someone always come up with a way to compress a file? Can not applicable be deleted? What if there is a mixture of compressions? Should a value be added for mixed or combination? Would a searchable text file be coded as uncompressed? Levels of compression can be controlled so that a lossy technique is used but the result is essentially lossless. How would this situation be coded?

007/12 Reformatting Aspect

      a              Access
      n              Not applicable
      p              Preservation
      u              Unknown

One remaining key piece of information about a digitally reformatted computer file is what we call reformatting aspect. It is an overall assessment of the physical quality of the computer file in relation to its intended use. It can be used to judge the level of quality of a file, and an institution's commitment to maintain its availability over time, information crucial to the international preservation community.

Reformatting aspect information is similar to what is conveyed in the microform 007/11 (Generation), where distinctions are made between master, printing, service, and mixed copy microforms. The information recorded there is not strictly something that can be physically described. A master generation microfilm may be physically indistinguishable from a printing master, even under very close inspection. The difference is in how the owning institution handles the film, both in storage and use. The microform 007/11 represents those distinctions in handling and use.

The main difference between the microform 007/11 (Generation) and the proposed computer file 007/12 (Reformatting aspect) is that one of the Generation code definitions (for first generation--master) is tied to ANSI/AIIM standards, while none of the Reformatting aspect code definitions are so tied. There are currently no relevant standards for computer files reformatted for preservation purposes comparable to the ANSI/AIIM standards for master microforms. The code definitions for Reformatting aspect deal with the lack of standards by referring to general physical features and the intended use of a reformatted computer file, distinguishing files intended to provide access to original items from those intended to preserve (and possibly replace) the original item. In spite of this difference, the similarities between the microform 007/11 (Generation) and the proposed computer file 007/12 (Reformatting aspect) are strong, and they establish a precedent for this type of data in a 007 field.

The proposed "Reformatting Aspect" byte will allow preservation replacement searchers to quickly discern whether the owning institution intends to create and maintain a high-quality computer file that could replace a brittle or endangered original object. The international library community needs to preserve as many brittle or endangered materials as possible without unnecessary, costly duplication -- whether the preservation medium is microfilm or digitally reformatted computer files. Sharing preservation information is vital to avoid redundant efforts. Further, this byte would provide a mechanism by which the institution responsible for creating the file may identify all such items under its control. Finally, the inclusion of this byte would also allow for the machine extraction of database records identified as "digital masters" so the information may be exchanged with other institutions and organizations worldwide.
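A minimal sketch of the machine extraction described above, assuming the same hypothetical `(record_id, cf007)` record shape used for illustration elsewhere: it selects records coded "p" in 007/12 and, as a consistency check grounded in the compression discussion, also reports those coded "o" (Lossy) in 007/11. The function name `select_digital_masters` is hypothetical.

```python
def select_digital_masters(records):
    """Pick out records coded "p" (Preservation) in 007/12.

    `records` is a hypothetical iterable of (record_id, cf007) pairs.
    Also returns the subset coded "o" (Lossy) in 007/11, since this paper
    considers lossy compression unacceptable for preservation master files.
    """
    masters, lossy_masters = [], []
    for record_id, cf007 in records:
        if len(cf007) == 13 and cf007[12] == "p":
            masters.append(record_id)
            if cf007[11] == "o":
                lossy_masters.append(record_id)
    return masters, lossy_masters


records = [
    ("rec1", "cr cn aa24plp"),  # preservation, lossless
    ("rec2", "cr cn aa24poa"),  # access copy, lossy
    ("rec3", "cr cn aa08pop"),  # preservation, but lossy
    ("rec4", "cr cn "),         # no preservation bytes
]
print(select_digital_masters(records))  # (['rec1', 'rec3'], ['rec3'])
```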

Question 9. Why is this byte necessary? If the file is for access, why would someone want to code these preservation bytes? If there were a separate 007 for preservation computer files, this byte would be unnecessary, since it would be assumed that the file is for preservation. Wouldn't all items one chooses to code in this extension to the CF 007 be preservation reformatting? Could a file be used for both access and preservation?

3 SUMMARY OF CHANGES REQUESTED

3.1 Existing Computer File 007
In 007/00 (Category of material) in the Bibliographic and Holdings formats, change the definition of code "c" to the following:

      (< > indicates addition; [ ] indicates deletion):
      c - Computer file
      Code c is used for all computer files (e.g., program, data [files], <image files, audio and
      video files>, etc.), which usually consist of digitized machine-readable data, program
      code, etc., intended to be accessed, processed, or executed by a computer.

In 007/01 (Specific material designation), add value "v" as follows:

      v - Varies
      Code v indicates it is likely the specific medium will vary over time as the content of
      the computer file is refreshed and migrated to preserve and archive the file, and that it
      is not important to the cataloging agency to code a specific medium type.

In 007/03 (Color), add the word "bitonal" to code a in the list of codes, recognizing a word widely used in digital reformatting that is equivalent to "one color".

In 007/04 (Dimensions), change the definition of code n as follows (< > indicates addition; [ ] indicates deletion):

      n - Not applicable
      Code n indicates that physical dimensions are not applicable to the computer file.  This
      code is appropriate for remote computer files <and computer files whose specific medium
      varies (coded v in byte 01)>.

3.2 New Computer File 007 character positions

Extend field 007 in the Bibliographic and Holdings formats to thirteen character positions by adding the following:

      007/06               Antecedent/Source
      007/07               File formats  
      007/08-09            Image Bit Depth 
      007/10               Quality Assurance Target(s)
      007/11               Compression   
      007/12               Reformatting Aspect 
See Appendix A for descriptions of the new character positions.

4 ADDITIONAL GENERAL QUESTIONS

4.1 Is there concern about hard-coding this information in a bibliographic record? In many cases the item itself does not correspond to what is described in a bibliographic record, and structural/administrative metadata needs to reside with the file (e.g., the digitized files represent pages of a digitized book, while the bibliographic record is for the whole book). However, a specific use has been expressed, by analogy with preservation microforms. It is thus important that the additional character positions be optional. Since one cannot determine programmatically when the information is applicable, it needs to be "highly recommended," not "mandatory if applicable" as stated in 3.1.

4.2 Should a separate 007 be defined instead of adding these character positions to the existing computer files 007? If so, are all the bytes still necessary?

4.3 Many of the bytes have a code "n" for not applicable. Are these needed? If it is not applicable we wouldn't choose to code the preservation bytes. If you need the information for preservation purposes, then not applicable would not be helpful.


APPENDIX A

007/06 Antecedent/Source

Codes:

                                    
      a     File reproduced from original object
      b     File reproduced from microform
      c     File reproduced from computer file
      d     File reproduced from an intermediate (not microform)
      m     Mixed sources
      n     Not applicable
      u     Unknown

Character Position Definition and Scope

A one-character alphabetic code indicates antecedent or source of a computer file. This character position is only intended for use describing computer files created in the process of digital reformatting.

Guidelines for Applying Content Designators

CODES

a - File reproduced from original object
Code a indicates the image(s) comprising the computer file have been created by scanning the original item. Common examples of original objects include: books; manuscripts; leaves of paper or vellum; photographs; etc. It does not refer to computer files or photographic film. See code c for computer files and code d for photographic film.

b - File reproduced from microform
Code b indicates the image(s) comprising the computer file have been created by scanning from microform (16mm microfilm, 35mm microfilm, 105mm microfiche, etc.).

c - File reproduced from computer file
Code c indicates the computer file has been created or copied from an existing computer file (new copies, derivative copies with lower resolution or smaller file size, etc.).

d - File reproduced from an intermediate (not microform)
Code d indicates the image(s) or sound comprising the computer file have been created by reformatting from an intermediate other than microform. Common examples of non-microform intermediates for visual materials are 35mm film, transparencies, slides, second-generation videotape, etc. A common example of an intermediate for audio materials is second-generation analog tape.

m - Mixed sources
Code m indicates the computer file has been created from mixed sources (portions scanned from original item, portions scanned from microfilm, portions digitized from audio tape, etc.).

n - Not applicable
Code n indicates that antecedent or source is not applicable to this computer file. This code is appropriate for computer files other than those created during a reformatting process.

u - Unknown
Code u indicates the antecedent or source of this reformatted computer file is not known.


007/07 File formats

Codes:

      a     One file format
      m     Multiple file formats
      u     Unknown

Character Position Definition and Scope

A one-character alphabetic code indicates whether the file(s) which comprise(s) the computer file are of the same format or type. This character position is only intended for use describing computer files created in the process of digital reformatting.

Guidelines for Applying Content Designators

CODES

a - One file format
Code a indicates that the file(s) which comprise(s) the computer file are of the same format or type (e.g., all .jpg; all .tif; all .txt; all .mpg; etc.).

m - Multiple file formats
Code m indicates that the files which comprise the computer file are of at least 2 different formats (e.g., .jpg and .tif; .tif and .txt; .wav and .mpg; etc.).

u - Unknown
Code u indicates that the format(s) of the file(s) which comprise(s) the computer file are not known.
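As an illustration only, the choice among codes a, m and u can be derived from the names of the files that make up the computer file, assuming (a simplification not made in this proposal) that the file name extension identifies the format. The helper name `file_formats_code` is hypothetical.

```python
from pathlib import PurePath

def file_formats_code(filenames):
    """Return a 007/07 code for a list of file names (hypothetical helper).

    Assumes the extension identifies the format, which is a simplification:
    ".tif" and ".tiff", for example, would be counted as different formats.
    """
    # Collect lower-cased extensions; a file with no extension yields "".
    extensions = {PurePath(name).suffix.lower() for name in filenames}
    if not filenames or "" in extensions:
        # No files listed, or the format of at least one file cannot be identified.
        return "u"
    return "a" if len(extensions) == 1 else "m"

print(file_formats_code(["p001.tif", "p002.tif"]))  # a
print(file_formats_code(["p001.tif", "p001.txt"]))  # m
print(file_formats_code([]))                         # u
```

Treating a single unidentifiable file as making the whole set "unknown" is a design choice of this sketch, since code a can only be asserted when every format is known to be the same.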


007/08-09 Image Bit Depth

Codes:

                                    
      01-99           Exact bit depth (expressed in numerics)
      mm              Multiple (more than 1 image type)
      nn              Not applicable
      uu              Unknown

Character Position Definition and Scope

A two-character numeric code indicates the exact bit depth of the scanned image(s) that comprise the computer file or a two-character alphabetic code indicates that the exact bit depth cannot be recorded.

This character position is intended only for use in describing computer files created in the process of digital reformatting.

Guidelines for Applying Content Designators

Bit depth is determined by the number of bits used to define each pixel representing the image. Each bit can hold one of two values (in a bitonal image, for example, 0 for black and 1 for white). Each additional bit doubles the number of values a pixel can take, so n bits yield 2^n values (256 at 8 bits). The greater the bit depth, therefore, the greater the dynamic range of color or grayscale information that can be rendered from the item being reformatted.
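The arithmetic behind this guideline can be made concrete: because each bit has two possible states, a pixel described by n bits can take 2^n distinct values. A short Python sketch (the function name is illustrative only):

```python
def distinct_values(bit_depth):
    """Number of distinct values one pixel can hold at a given bit depth."""
    if bit_depth < 1:
        raise ValueError("bit depth must be at least 1")
    # Each bit has two states, and each added bit doubles the combinations.
    return 2 ** bit_depth

for depth in (1, 8, 24):
    print(f"{depth:>2}-bit: {distinct_values(depth):,} values")
# Prints:
#  1-bit: 2 values
#  8-bit: 256 values
# 24-bit: 16,777,216 values
```

Bitonal scans (1-bit) thus distinguish only two values, while 24-bit color (conventionally 8 bits each for red, green and blue) distinguishes over sixteen million.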

CODES

01-99 Exact bit depth
The image bit depth should be recorded if a single numeric value applies to all files (e.g., all files were scanned in 24-bit color). The numeric value of the image bit depth is recorded in two digits, right-justified with a leading zero where necessary (e.g., 01, 08).

  007/08-09           01
  [Image bit depth is 1-bit.]

  007/08-09           08
  [Image bit depth is 8-bit.]

mm - Multiple (more than one image type)
Code mm indicates that the computer file is comprised of images that have been scanned and captured at more than one bit depth. A common example of a mixed computer file would be a volume with text and color images, where the text has been scanned as bitonal (1-bit) images and the color plates have been scanned and captured using 24-bit color.

  007/08-09           mm
  [Images captured at varying bit depths comprise the computer file.]

nn - Not applicable
Code nn indicates that bit depth is not applicable to this computer file.

uu - Unknown
Code uu indicates that the bit depth of the image(s) comprising the computer file is not known.
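The rules for 007/08-09 (two right-justified digits for a single depth, otherwise mm, nn or uu) can be summarized in a small Python sketch. The helper `bit_depth_code` and its `applicable` parameter are hypothetical, introduced only for illustration.

```python
def bit_depth_code(depths, applicable=True):
    """Return the two-character 007/08-09 value (hypothetical helper).

    depths: bit depths of the images in the file, or an empty collection
    if they were not recorded. applicable=False yields "nn" (e.g., audio).
    """
    if not applicable:
        return "nn"          # bit depth does not apply to this file
    distinct = set(depths)
    if not distinct:
        return "uu"          # bit depth not known
    if len(distinct) > 1:
        return "mm"          # images captured at more than one bit depth
    depth = distinct.pop()
    if not 1 <= depth <= 99:
        raise ValueError("bit depth must fit in two digits (01-99)")
    # Right-justify with a leading zero: 1 -> "01", 24 -> "24".
    return f"{depth:02d}"

print(bit_depth_code([1, 1, 1]))              # 01
print(bit_depth_code([8]))                    # 08
print(bit_depth_code([1, 24]))                # mm
print(bit_depth_code([]))                     # uu
print(bit_depth_code([], applicable=False))   # nn
```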


007/10 Quality Assurance Target(s)

Codes:

  a      Absent
  n      Not applicable
  p      Present
  u      Unknown

Character Position Definition and Scope

A one-character alphabetic code indicates whether quality assurance targets were included at the time of reformatting/creation of the computer file. Inclusion of quality assurance targets (at the time of re-recording/transfer/scanning, or from targets already included on some microforms) allows quality assessments to be performed on the audio, video or image file(s).

This character position is intended only for use in describing computer files created in the process of digital reformatting.

Guidelines for Applying Content Designators

CODES

a - Absent
Code a indicates that quality control targets were not included at the time of reformatting and/or are not present in the computer file.

n - Not applicable
Code n indicates that the inclusion of quality control targets is not applicable to this computer file.

p - Present
Code p indicates that one or more quality control targets were included at the time of reformatting and are present in the computer file. Commonly found quality control targets for scanning include the Kodak Q13 or Q14 Color Separation Guide and Gray Scale; Kodak Q60 Color Input Target; AIIM Scanning Test Chart #2; and the RIT Alphanumeric Resolution Test Object. Commonly found quality control targets for re-recording/transfer of audio files include reference and azimuth tones.

u - Unknown
Code u indicates that it is not known if quality control targets are present in the computer file.


007/11 Compression

Codes:

  a      Uncompressed
  l      Lossless
  n      Not applicable
  o      Lossy
  u      Unknown

Character Position Definition and Scope

A one-character alphabetic code indicates whether the computer file has been subjected to compression. Compression reduces the file size for processing, storage, and transmission, but the quality of the image or audio file may be affected by the type and level of the compression used.

This character position is intended only for use in describing computer files created in the process of digital reformatting.

Guidelines for Applying Content Designators

If compressed, record whether the compression type employed is lossless or lossy.

CODES

a - Uncompressed
Code a indicates that the computer file has not been compressed through the use of any compression technique.

l - Lossless
Code l indicates that the computer file has been compressed and the compression type used is "lossless." Lossless compression allows a computer file to be compressed and decompressed with absolute fidelity each time; a scheme is considered lossless only if no information is lost in this process. An example of a lossless compression scheme is TIFF Group 4 compression employed on bitonal image files.
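The defining property of lossless compression, that decompression returns exactly the original bytes, can be checked directly. TIFF Group 4 is not available in the Python standard library, so this sketch substitutes DEFLATE (via the `zlib` module), which is likewise lossless; the synthetic "scan" data is made up for illustration.

```python
import zlib

# Synthetic stand-in for bitonal scan data: long runs of white (0xFF) and black (0x00) bytes.
original = bytes([0xFF] * 500 + [0x00] * 40 + [0xFF] * 460) * 20

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))   # the compressed form is much smaller
print(restored == original)             # True: no information was lost
```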

n - Not applicable
Code n indicates that compression is not applicable to this computer file.

o - Lossy
Code o indicates that the computer file has been compressed and the compression type used is "lossy." Lossy compression schemes employ techniques which average or discard some of the encoded digital information. When the file is decompressed, it will not be an exact replica of the original file. Examples of lossy compression schemes include JPEG, Kodak ImagePac (Photo CD), AC-3 (Dolby Digital) and MPEG. (LZW, by contrast, is a lossless scheme.)
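The following toy example is not any of the schemes named above; it shows only the defining property of lossy techniques. Like the quantization step that schemes such as JPEG apply before encoding, it discards low-order information, which makes the data cheaper to store but impossible to reconstruct exactly. The function name and 8-bit sample values are illustrative assumptions.

```python
def quantize(samples, keep_bits=4):
    """Toy lossy step: keep only the top keep_bits of each 8-bit sample."""
    drop = 8 - keep_bits
    # Zeroing the low-order bits discards information that cannot be recovered.
    return [(s >> drop) << drop for s in samples]

original = [0, 17, 130, 200, 255]
restored = quantize(original)
print(restored)               # [0, 16, 128, 192, 240]
print(restored == original)   # False: not an exact replica of the original
```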

u - Unknown
Code u indicates that it is not known whether compression techniques have been employed.


007/12 Reformatting Aspect

Codes:

  a      Access
  n      Not applicable
  p      Preservation
  u      Unknown

Character Position Definition and Scope

A one-character alphabetic code indicates the reformatting aspect of the computer file. This character position is intended only for use in describing computer files created in the process of digital reformatting.

Guidelines for Applying Content Designators

CODES

a - Access
Code a indicates the computer file will be used for electronic access to the original item. The reformatting process used to create the computer file may have involved lower-quality captures of an original item or derivations of the high-quality, preservation files. Unlike computer files created for preservation purposes, there is not necessarily an institutional commitment to support long-term access to these computer files. Examples of computer files created for access purposes may include images created for a temporary, online exhibition (possibly to mirror an in-house installation); compressed, lower resolution versions of the preservation files that allow for easier transmission and access over the Internet; photos which have been scanned at lower resolutions to create an online browsing tool for a collection; or articles, music or oral histories digitally reformatted as a part of an E-Reserve collection which will be discarded per copyright at the end of an academic semester.

n - Not applicable
Code n indicates reformatting is not applicable to the computer file.

p - Preservation
Code p indicates the computer file was created via reformatting to preserve and possibly replace the original item. The capture and storage techniques associated with preservation files ensure high-quality, long-term computer files.

The implication is that computer files coded p, whether printed out, viewed on screen or played via a listening device, could serve as a replacement should the original be lost, damaged or destroyed. As with a master microform created for preservation purposes, there is an implied institutional commitment to keep this type of file properly archived and accessible over time.

u - Unknown
Code u indicates the reformatting aspect of the computer file is unknown.


Appendix B
Computer file (007/00 = c):
007 Examples with Explanations

EXAMPLES

007      cj#ca#
[Item is a computer program on a 3 1/2 inch diskette (007/00, 01, and 04) which
supports a color (03) video interface but no sound (05).]

007 co#cga [Item is interactive software and data on a 4 3/4 inch optical disc (CD-ROM) (007/00, 01, and 04) intended to be viewed in color (03) with sound (05).]

007 cv#gn#aa08plp [Item is a digitized version of an original, reformatted for preservation purposes (007/00, 06, 12). The computer file consists of grayscale TIFF images only (no sound), which were scanned at a bit depth of 8 bits per pixel, include quality control targets, and are compressed using lossless compression (03, 05, 07, 08-09, 10, 11). Because this file was created for preservation purposes, the medium on which the file is stored will vary as it is refreshed and migrated to new systems to remain accessible (01, 04).]

007 cr#cn#cmmmuoa [Item is an access version derived from a computer file of a digitally reformatted original; it is stored remotely and accessed over a computer network (007/00, 01, 04, 06, 12). The access file consists of both 24-bit color and 1-bit bitonal images (no sound), in more than one file format, that have been compressed using JPEG, a lossy compression scheme (03, 05, 07, 08-09, 11). It is not known whether this access version contains quality control targets as part of the computer file (10).]

007 co#ngadannaoa [Item is an access version of an audio file that was digitally reformatted from a second-generation analog tape and is stored on a CD (007/00, 01, 04, 05, 06, 07, 12). Quality assurance target tones are not present on this MPEG-compressed access copy (10, 11). Because it is not an image or video file, color and bit depth are not applicable (03, 08-09).]
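The position-by-position reading used in these explanations can be sketched in Python. The table of positions follows the character position definitions in this paper; the short label wording (including "Antecedent/source" for 007/06 and a generic label for 007/02) is an assumption, and `decode_007` is a hypothetical helper.

```python
# (start, end, label) for each character position of the computer file 007.
# End offsets are exclusive, so 007/08-09 is the slice [8:10].
POSITIONS = [
    (0, 1, "Category of material"),
    (1, 2, "Specific material designation"),
    (2, 3, "Position 02"),
    (3, 4, "Color"),
    (4, 5, "Dimensions"),
    (5, 6, "Sound"),
    (6, 7, "Antecedent/source"),
    (7, 8, "File formats"),
    (8, 10, "Image bit depth"),
    (10, 11, "Quality assurance targets"),
    (11, 12, "Compression"),
    (12, 13, "Reformatting aspect"),
]

def decode_007(field):
    """Map each position present in a computer file 007 string to its value.

    Hypothetical helper. Accepts the six existing bytes alone or with the
    optional new bytes; positions beyond the end of the string are omitted.
    """
    if len(field) < 6 or not field.startswith("c"):
        raise ValueError("expected a computer file 007 of at least 6 characters starting with 'c'")
    return {label: field[start:end] for start, end, label in POSITIONS if end <= len(field)}

for label, value in decode_007("cv#gn#aa08plp").items():
    print(f"{label:<32}{value}")
```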


Appendix C Computer file (007/00 = c):
007 Examples within Bibliographic Records

Examples are included for different types of computer files and reflect the single-record approach. Coding in the 007 (c) reflects the enhanced field of thirteen bytes.

1. A serial:
Five years of this title have been reformatted onto microfilm (007, nos. 1-3). Those five years' worth of microfilm were then scanned to create the preservation copy of the digital files (007, no. 4). The master file was copied and compressed to create lower-resolution files to be served over the Internet (007, no. 5).

007              hdrafa014bacp                     [microfilm - service copy]
007              hdrbfa014baap                     [microfilm - master neg.]
007              hdrbfa014babp                     [microfilm - printing master]
007              co#go#bmmmpap                     [computer file - preservation]
007              cr#an#cammaoa                     [computer file - access]
008              750725d19161944nyumr1m######s0###a0eng#d
010              $a87644633$zsc793106
022              $a0097-0271
040              $aCStRLIN$cCStRLIN
050       0      $aQ11$b.N82
222              0         $aNew York State Museum bulletin
245       00               $aNew York State Museum bulletin.
260              $aAlbany, N.Y. :$bUniversity of the State of New York, $c1916-1944.
300              $a157 v. :$bill. (some col.), maps, plans ;$c23 cm.
310              $aMonthly
362       0      $aNo. 181 (Jan. 1, 1916)-no. 337 (Dec. 1944).
500              $aTitle from cover.
555              $aSubject index: No. 181-no. 319 in no. 322.
590              $aCopy 1: optical digital image files for archiving. Resolution 600 dpi.
590              $aCopy 2: remotely stored digital image files for viewing. Resolution approximately
                 120 dpi.
533              $aComputer file.$m1916-1920.$bMountain View, CA. :$c Research Libraries
                 Group, $d1998.$e1 optical disc.$n3300 digital images : 600 dpi
533              $aMicrofilm.$m1916-1920.$bMountain View, CA. :$cResearch Libraries
                 Group,$d1997.$e3 microfilm reels : negative ; 35 mm.
650       0      $aScience.
710       20     $aNew York State Museum.
780       00     $tMuseum bulletin (New York State Museum)$x1066-8012 $w(DLC) 87644632
                 $w(OCoLC)1476687
785       00     $tBulletin (New York State Museum : 1945)$w(DLC)87644637
                 $w(OCoLC)9454998
856       1      $uhttp://www.rlg.org/preserv/pri.html
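A cataloging system could check the new bytes of a computer file 007 against the code lists defined in this paper. The following Python sketch is a hypothetical validator, not part of the proposal; code a for 007/06 is assumed from the examples, since its definition falls outside this section.

```python
import re

# Allowed one-character codes for 007/06-12, taken from the code lists in this paper.
ALLOWED = {
    6: set("abcdmnu"),   # antecedent/source (code a assumed from the examples)
    7: set("amu"),       # file formats
    10: set("anpu"),     # quality assurance targets
    11: set("alnou"),    # compression
    12: set("anpu"),     # reformatting aspect
}
# 007/08-09: exact depth 01-99, or mm, nn, uu.
BIT_DEPTH = re.compile(r"0[1-9]|[1-9][0-9]|mm|nn|uu")

def check_new_bytes(field):
    """Return a list of problems found in 007/06-12 (an empty list means none)."""
    if len(field) != 13:
        return [f"expected 13 characters, found {len(field)}"]
    problems = []
    for pos, codes in ALLOWED.items():
        if field[pos] not in codes:
            problems.append(f"007/{pos:02d}: invalid code {field[pos]!r}")
    if not BIT_DEPTH.fullmatch(field[8:10]):
        problems.append(f"007/08-09: invalid value {field[8:10]!r}")
    return problems

# The two computer file 007 fields from the serial record above.
for f in ("co#go#bmmmpap", "cr#an#cammaoa"):
    print(f, check_new_bytes(f) or "OK")
```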


2. Audio Item:
Originally on three 10-in. 78 rpm discs, this music was reformatted onto analog audio tape for preservation purposes. At a later date, the library opted to increase access to the music and make it available via the Internet. The music was then captured from the preservation analog tapes to create the new computer file.

007              sd#dssdnnmslnb                       [78rpm disc sound recording]
007              st#psndobannae                       [preservation master reel-to-reel tape]
007              cr#nnadannpoa                        [digitally reformatted audio - computer file]
008              970213s1989####at#ppn###z##########eng#d
024       1      $a2147182162
028       02     $a838 216-2$bABC Records
033       20     $a1928----$a1938----
040              $aCStRLIN $cCStRLIN
245       00     $aSaucy songs, 1928 to 1938$h[sound recording].
260              $aNew York :$bABC Records,$cp1959.
300              $a3 sound discs :$b analog, 78 rpm ;$c 10 in.
511       0      $aVarious performers.
518              $aRecorded 1928-1938.
505       0      $aA guy what takes his time (Mae West) (2:47) -- You can't blame me for that
                 (Max Miller) (3:06) -- Oh! You have no idea (Sophie Tucker) (2:52) -- Come up
                 and see me sometime (Cliff Edwards) (3:06) -- Is there anything wrong in that?
                 (Helen Kane) (3:05) -- You brought a new kind of love to me (Ethel Waters) (3:22)
                 -- When I'm cleaning windows (George Formby) (2:50) -- I like to do things for
                 you (Frankie Trumbauer) (3:25) -- I found a new way to go to town (2:41) ; Easy
                 rider (Mae West) (2:27) -- Say, young lady (George Olsen & his Music) (3:03) --
                 Pu-leeze! Mister Hemingway (Ann Suter) (2:43) -- Bessie couldn't help it (Slatz
                 Randall) (3:00) -- I'm wild about that thing (Bessie Smith) (2:48) -- Ol' man Mose
                 (Patricia Norman) (2:27) -- Life begins at forty (Sophie Tucker) (3:02) -- It isn't
                 love (Ronald Frankau) (3:14) -- They call me Sister Honky-Tonk (Mae West)
                 (3:03).
533              $aComputer file. $bMountain View, CA. :$c Research Libraries Group, $d1998 $n
                 1 audio file : 65 megabytes.
650       0      $aBawdy songs.
650       0      $aPopular music$y1921-1930.
650       0      $aPopular music$y1931-1940.
700       1      $aWest, Mae.$4prf
700       1      $aMiller, Max,$db. 1895.$4prf
700       1      $aTucker, Sophie,$d1884-1966.$4prf
700       1      $aEdwards, Cliff,$d1895-1971.$4prf
700       1      $aKane, Helen.$4prf
700       1      $aWaters, Ethel,$d1900-1977.$4prf
700       1      $aFormby, George,$d1904-1961.$4prf
700       1      $aTrumbauer, Frank.$4prf
700       1      $aOlsen, George,$d1893-1971.$4prf
700       1      $aSuter, Ann.$4prf
700       1      $aRandall, Slatz.$4prf
700       1      $aSmith, Bessie,$d1898?-1937.$4prf
700       1      $aNorman, Patricia.$4prf
700       1      $aFrankau, Ronald.$4prf
856       1      $uhttp://www.rlg.org/preserv/pri/smith.mpg


Library of Congress
Library of Congress Help Desk (09/03/98)