NAME: Changes to Field 856 (Electronic Location and Access) in the USMARC Formats
SOURCE: Library of Congress
SUMMARY: This paper proposes two changes to Field 856. The first suggests the addition of a first indicator (Access method) value 8 for Other, to be used when a Uniform Resource Locator (URL) is recorded in subfield $u. The second change is a redefinition of subfield $q (File transfer mode) to File format type. This would result in recording the type of file, or MIME type in subfield $q, instead of the current definition that requires recording "ASCII" or "binary" to indicate what mode of transfer is necessary.
KEYWORDS: Field 856 (Bibliographic/Holdings/Classification); Electronic Location and Access; Subfield $q, in field 856 [Bibliographic/Holdings/Classification]; File transfer mode; File format type; Access method
RELATED:
STATUS/COMMENTS:
12/1/95 - Forwarded to USMARC Advisory Group for discussion at the 1996 Midwinter MARBI meetings.
1/21/96 - Results of USMARC Advisory Group discussion - Rejected. Because of OCLC use of the data in 856$2 for display of access method in the INTERCAT catalog, participants felt that saving a few keystrokes was not worth any impact of such a change. As to the proposed change to subfield $q, concern was expressed that the need for file format to be explicit may be a temporary situation, and that in the future files may become more self-defining. It was suggested that it would be better to wait and see if this change is still needed in the future, since no specific need has been demonstrated.
2/15/96 - Results of final LC review - Agreed with the MARBI decision.
PROPOSAL NO. 96-1: Changes to Field 856 (Electronic Location and Access)

1. BACKGROUND

Field 856 (Electronic Location and Access) was initially developed and approved by the USMARC Advisory Group in January 1993. At that time, the Internet Engineering Task Force was finalizing the draft standard for a locator, the Uniform Resource Locator (URL). During discussions of field 856, participants agreed that the field should enable a system to create a "hot link" to allow for the transfer of a file, the connection to another host, or the initiation of an email message through information recorded in the field.

Since field 856 was published in the USMARC Format for Bibliographic Data, its use has increased as records for electronic resources have been created, and users have gained more experience with it. This proposal considers two possible changes to field 856. The first is proposed as a result of comments received over a period of time concerning redundancy in the use of subfield $2 when the access method is recorded as part of the URL in subfield $u. The second is the redefinition of subfield $q so that file format type can be given as part of the information in the field.

2. ACCESS METHOD

The first indicator in field 856 was defined as Access method, and the values were defined to represent the three main TCP/IP protocols used on the Internet. Other access methods were to be recorded with a first indicator value of 7 to indicate that the information is recorded in subfield $2 (Access method). At the time it was clear that other access methods were being developed, and it was impossible to predict how the list might grow. Because the URL had not yet been fully developed and was not yet in wide use, separate subfields were defined for recording all pieces of information that were needed for a system to provide the appropriate link depending upon the access method used.

In January 1994 two proposals were presented to enhance field 856.
One of these, Proposal 94-3 (Addition of Subfield $u (Uniform Resource Locator) to Field 856 in the USMARC Holdings/Bibliographic Formats), defined a subfield $u for a URL. The URL standard stipulates that the URL begins with an access scheme, specified in Uniform Resource Locators (URL) (RFC 1738), a product of the Uniform Resource Identifiers Working Group of the IETF.

Now that the URL has become a de facto international standard for locating resources on the Internet, many records are being created with only a URL in subfield $u of field 856, rather than with the data parsed into separate subfields. When a URL is used in the field and the access method does not have a specific indicator value defined, it is currently necessary to set the first indicator to value 7 and record the access method in subfield $2. (This technique is common in variable fields of the MARC formats, indicating that information that does not have its own value in the indicator can be found in subfield $2.) In the case of URLs, the first portion before the "://", containing the access scheme, is repeated in subfield $2. The information in subfield $2 is then essentially redundant, since it is part of the URL itself. Catalogers using MARC for description of Internet resources have suggested that this redundancy results in additional unnecessary keying.

An alternative would be to define indicator values for the most commonly used access methods. The advantage to this approach would be the ability to use an indicator value for retrieval. The disadvantage is that there are only four (or three if 8 is defined as Other) indicator values available. Values 0, 1, 2, 3, and 7 are now defined; values 4, 5, 6, and 8 would be available. Value 9 is generally used locally; value 8 is usually used as "Other". Those access methods that might need their own values are: news, http, gopher. If all the values are used, then it would not be possible to define additional values when other access methods become available.
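
The redundancy described above can be illustrated with a minimal Python sketch. It is not part of the proposal: the function names `access_scheme` and `is_subfield_2_redundant` are hypothetical, and the sketch only handles URLs that use the "://" separator mentioned in the text (a scheme such as mailto, which uses a bare colon, is not covered). The sample URL is the one from the example in Attachment C.

```python
def access_scheme(url):
    """Return the access scheme of a URL: the part before '://', lowercased.

    Returns None when the string has no '://' separator.
    """
    scheme, sep, _rest = url.partition("://")
    if not sep or not scheme:
        return None
    return scheme.lower()


def is_subfield_2_redundant(subfield_2, subfield_u):
    """True when $2 merely repeats the scheme already present in $u."""
    scheme = access_scheme(subfield_u)
    return scheme is not None and subfield_2.strip().lower() == scheme


# URL taken from the 856 example in Attachment C.
url = "http://www.cc.columbia.edu/imaging/photocd/3009-1031-1443/IMG0089.512.gif"
print(access_scheme(url))                    # http
print(is_subfield_2_redundant("http", url))  # True
```

Under these assumptions, a record with first indicator 7 and $2 equal to the URL scheme carries the same access method twice, which is the keying burden catalogers have reported.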
If a value 8 for Other were defined in field 856 (another common technique in the formats), it would not be necessary to provide additional information in subfield $2. Only the URL would then need to be recorded in subfield $u, and no other subfields would be required.

If such a change were approved, the question arises as to what to do with existing records. Would they need to be changed to set the first indicator to 8 and take out subfield $2? If the situation is a file available by FTP, and there is an appropriate first indicator value (value 1), then subfield $2 would not be filled in, but the URL would contain the initial "ftp". In this case the first indicator would supply the information about access method, even though the information is already in the URL in subfield $u. In other cases, where there is no specific indicator value, the information would not be specified except in $u. In cases where the separate subfields have been used instead of a URL, the appropriate value would be recorded in the first indicator, with subfield $2 used if the first indicator value is 7.

OCLC has established a searchable database of MARC records for Internet resources called Intercat. Participants in the project contribute records and the database is available through the World Wide Web. It is important to note that OCLC is using subfield $2 for display of the access method. After a search, a brief record display includes "Electronic access:" and the data in subfield $2; the display of the full record includes "Mode of access: [data in $2]" and "Location: [data in $u]". Consequently, this proposed change, although requested by catalogers who do not want to key redundant data, would have an impact on the OCLC Intercat catalog. If the first indicator were set to 8, OCLC would need to change the program to extract the first piece of information in the URL in order to display it as the catalog currently does.

3. FILE FORMAT TYPE

When the field for electronic location was initially being discussed, participants agreed that the information should include whatever was needed for interaction with the resource to take place. If the resource described in the record was available by telnet, the information should enable a connection; if the resource was available by email, it should enable the initiation of an email message; if by FTP, it should enable the transfer of a file. One piece of information that was deemed by participants to be required for FTP was whether the file is transferred as ASCII or binary. Thus subfield $q was defined as File transfer mode.

In the past few years, the availability of all types of resources over the Internet has exploded. Now, the World Wide Web, which was only under development when enhancements were made to MARC to accommodate description of Internet resources, has allowed for the integration of multimedia resources. Software that is necessary for display of digitized images or playing of digital audio files is activated depending upon the file format. Often the file extension indicates the type of file and determines whether it is transferred in binary or ASCII mode (ASCII is the default; all other types of files are transferred using binary).

In creating MARC records for Internet resources, catalogers have been confused about where to include information about file format. Field 516 (Type of file) is a note field that generally contains nature and scope information about the file described. In some cases this information has been combined in the field with file format (e.g. "Electronic journal in ASCII format"). In other cases, field 538 (System requirements note) has been used, since requirements for processing the file are dependent upon the type of compression used or file format type.

File format is a data element included in the Dublin Core, a list of core data elements needed for Internet resource discovery and retrieval.
This list was developed by a wide range of participants at the OCLC Metadata Workshop held in March 1995. In the mapping of the elements to MARC, field 538 was used for this element (see Discussion Paper No. 86: Mapping the Dublin Core Metadata Elements to USMARC). However, this mapping is not entirely adequate, since the field can contain information other than file format, and since file format has also been recorded in other MARC fields.

If a subfield were defined in field 856 for file format, then the information could be given at the level of the location, rather than for the intellectual work as a whole. In recent discussions of whether separate records need to be created for different file formats, the majority of respondents have endorsed using one record for the intellectual work and repeating 856 fields for different file formats. Recording such information within field 856 would allow for the file format to be associated with a particular file at a particular host. However, other note fields would still be available for recording file format if this were desirable.

File format type is often referred to as "MIME type". Often the extension to a filename indicates the file format. An Internet Request for Comments (RFC 1521), "MIME (Multipurpose Internet Mail Extensions)", specifies the type of data in mail messages, although this is generally extended to other types of resources that reside on the Internet. It includes content types and subtypes and defines a registration process that uses the Internet Assigned Numbers Authority (IANA) as a central registry for specific values.

If subfield $q were redefined as File format type, the question arises as to how to record the data. Would a standardized list of file format types be maintained, or would the user enter free text? If it were desirable to maintain a list, it should be consistent with others already established. It may be necessary to use those that have been registered by IANA, as specified in RFC 1521.
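
As a rough illustration of the controlled-list option, the following Python sketch looks up a file format type from a filename extension. It is a sketch only: the table holds just a few rows copied from Attachment B, and the names `MIME_TO_EXTENSIONS`, `EXTENSION_TO_MIME`, and `guess_format_type` are hypothetical. The two filenames come from the 856 examples in Attachment C.

```python
# A few rows copied from Attachment B (MIME type -> file extensions).
MIME_TO_EXTENSIONS = {
    "application/postscript": ["ai", "eps", "ps"],
    "application/x-compressed": ["Z"],
    "image/gif": ["gif"],
    "image/jpeg": ["jpeg", "jpg", "jpe", "jif"],
    "text/html": ["html"],
    "text/plain": ["txt"],
    "video/mpeg": ["mpeg", "mpg", "mpe"],
}

# Invert the table so an extension can be looked up directly.
EXTENSION_TO_MIME = {
    ext: mime for mime, exts in MIME_TO_EXTENSIONS.items() for ext in exts
}


def guess_format_type(filename):
    """Return a MIME type for the filename's extension, or None if unknown."""
    _base, dot, ext = filename.rpartition(".")
    if not dot:
        return None
    # Try the exact extension first ("Z" is upper case in Attachment B),
    # then fall back to a lowercase match.
    return EXTENSION_TO_MIME.get(ext) or EXTENSION_TO_MIME.get(ext.lower())


print(guess_format_type("IMG0089.512.gif"))   # image/gif
print(guess_format_type("comobj.lisp.10.Z"))  # application/x-compressed
print(guess_format_type("README"))            # None
```

A lookup of this kind only works if the values recorded in subfield $q come from a shared list; free text would need a different approach.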
Attachment A contains Appendix F from RFC 1521, the summary of seven content-types as defined in the MIME standard. Attachment B contains a list of MIME types with a mapping to file extensions.

Questions to consider:

1. Has anyone used subfield $q extensively so that a redefinition is not desirable? Do users need a separate subfield for the information about binary or ASCII transfer? Only subfields $e and $y are available in field 856 if a new subfield is needed for file format type. Or is it desirable to include binary or ASCII with file format type in this newly defined subfield?

2. Should the data in the subfield be a controlled list or free text? If a controlled list, what should be the authoritative source?

4. PROPOSED CHANGES

The following is presented for consideration:

- In the USMARC Holdings/Bibliographic Formats, define the following value in Field 856, First indicator:
      8 Other

- In the USMARC Holdings/Bibliographic Formats, redefine subfield $q (File transfer mode) as File format type.

See Attachment C for a description of this field if this proposal is approved.

------------------------------------------------------------------
ATTACHMENT A

RFC 1521 MIME September 1993

Appendix F -- Summary of the Seven Content-types

Content-type: text
Subtypes defined by this document: plain
Important Parameters: charset
Encoding notes: quoted-printable generally preferred if an encoding is needed and the character set is mostly an ASCII superset.
Security considerations: Rich text formats such as TeX and Troff often contain mechanisms for executing arbitrary commands or file system operations, and should not be used automatically unless these security problems have been addressed. Even plain text may contain control characters that can be used to exploit the capabilities of "intelligent" terminals and cause security violations. User interfaces designed to run on such terminals should be aware of and try to prevent such problems.
________________________________________________________

Content-type: multipart
Subtypes defined by this document: mixed, alternative, digest, parallel.
Important Parameters: boundary
Encoding notes: No content-transfer-encoding is permitted.
________________________________________________________

Content-type: message
Subtypes defined by this document: rfc822, partial, external-body
Important Parameters: id, number, total, access-type, expiration, size, permission, name, site, directory, mode, server, subject
Encoding notes: No content-transfer-encoding is permitted. Specifically, only "7bit" is permitted for "message/partial" or "message/external-body", and only "7bit", "8bit", or "binary" are permitted for other subtypes of "message".
________________________________________________________

Content-type: application
Subtypes defined by this document: octet-stream, postscript
Important Parameters: type, padding
Deprecated Parameters: name and conversions were defined in RFC 1341.
Encoding notes: base64 preferred for unreadable subtypes.
Security considerations: This type is intended for the transmission of data to be interpreted by locally-installed programs. If used, for example, to transmit executable binary programs or programs in general-purpose interpreted languages, such as LISP programs or shell scripts, severe security problems could result. Authors of mail-reading agents are cautioned against giving their systems the power to execute mail-based application data without carefully considering the security implications. While it is certainly possible to define safe application formats and even safe interpreters for unsafe formats, each interpreter should be evaluated separately for possible security problems.
________________________________________________________

Content-type: image
Subtypes defined by this document: jpeg, gif
Important Parameters: none
Encoding notes: base64 generally preferred
________________________________________________________

Content-type: audio
Subtypes defined by this document: basic
Important Parameters: none
Encoding notes: base64 generally preferred
________________________________________________________

Content-type: video
Subtypes defined by this document: mpeg
Important Parameters: none
Encoding notes: base64 generally preferred

Borenstein & Freed [Page 75]

------------------------------------------------------------------
ATTACHMENT B

Mapping of MIME types to file extensions

MIME type                       File extension
application/activemessage
application/andrew-inset
application/applefile
application/atomicmail
application/dca-rft
application/dec-dx
application/mac-binhex40
application/macwriteii
application/msword
application/news-message-id
application/news-transmission
application/octet-stream        bin
application/oda                 oda
application/pdf                 pdf
application/postscript          ai eps ps
application/remote-printing
application/rtf                 rtf
application/slate
application/x-compressed        Z
application/x-mif               mif
application/wita
application/wordperfect5.1      wp
application/x-csh               csh
application/x-dvi               dvi
application/x-hdf               hdf
application/x-latex             latex
application/x-netcdf            nc cdf
application/x-powerpoint        ppt
application/x-sh                sh
application/x-tcl               tcl
application/x-tex               tex
application/x-texinfo           texinfo texi
application/x-troff             t tr roff
application/x-troff-man         man
application/x-troff-me          me
application/x-troff-ms          ms
application/x-wais-source       src
application/zip                 zip
application/x-bcpio             bcpio
application/x-cpio              cpio
application/x-gtar              gtar
application/x-shar              shar
application/x-sv4cpio           sv4cpio
application/x-sv4crc            sv4crc
application/x-tar               tar
application/x-ustar             ustar
audio/basic                     au snd
audio/x-aiff                    aif aiff aifc
audio/x-wav                     wav
image/gif                       gif
image/ief                       ief
image/jpeg                      jpeg jpg jpe jif
image/tiff                      tiff tif
image/x-cmu-raster              ras
image/x-pcx                     pcx
image/x-portable-anymap         pnm
image/x-portable-bitmap         pbm
image/x-portable-graymap        pgm
image/x-portable-pixmap         ppm
image/x-rgb                     rgb
image/x-xbitmap                 xbm
image/x-xpixmap                 xpm
image/x-xwindowdump             xwd
message/external-body
message/news
message/partial
message/rfc822
multipart/alternative
multipart/appledouble
multipart/digest
multipart/mixed
multipart/parallel
text/html                       html
text/plain                      txt
text/richtext                   rtx
text/tab-separated-values       tsv
text/x-setex                    etx
text/x-sgml                     sgml sgm
video/mpeg                      mpeg mpg mpe
video/quicktime                 qt mov
video/x-msvideo                 avi
video/x-sgi-movie               movie

Additional information on file types (these documents also indicate whether the file is to be transferred as ASCII or binary):

"List of file extensions", by Allison Zhang
URL: http://ac.dal.ca/~dong/contents.html

"Common Internet file formats", compiled by Eric Perlman and Ian Kallen
URL: http://www.matisse.net/files/formats.html

------------------------------------------------------------------
ATTACHMENT C

< > indicates addition; [ ] indicates deletion

856 Electronic Location and Access (R)

Indicators
   First    Access method
      0     Email
      1     FTP
      2     Remote login (Telnet)
      3     Dial-up
      7     Method specified in subfield $2
     <8     Other>
   Second   Undefined
      #     Undefined

Subfield Codes
   $a   Host name (R)
   $b   Access number (NR)
   $c   Compression information (R)
   $d   Path (R)
   $f   Electronic name (R)
   $g   Electronic name--End of range (R)
   $h   Processor of request (NR)
   $i   Instruction (R)
   $j   Bits per second (NR)
   $k   Password (NR)
   $l   Logon/login (NR)
   $m   Contact for access assistance (R)
   $n   Name of location of host in subfield $a (NR)
   $o   Operating system (NR)
   $p   Port (NR)
   $q   File <format type> [transfer mode] (NR)
   $r   Settings (NR)
   $s   File size (R)
   $t   Terminal emulation (R)
   $u   Uniform Resource Locator (R)
   $v   Hours access method available (R)
   $w   Record control number (R)
   $x   Nonpublic note (R)
   $z   Public note (R)
   $2   Access method (NR)
   $3   Materials specified (NR)

FIELD DEFINITION AND SCOPE

This field contains the information required to locate an electronic item. The information identifies the electronic location containing the item or from which it is available. It also contains information to retrieve the item by the access method identified in the first indicator position. The information contained in this field is sufficient to allow for the electronic transfer of a file, subscription to an electronic journal, or logon to an electronic resource. In some cases, only unique data elements are recorded which allow the user to access a locator table on a remote host containing the remaining information needed to access the item.

Field 856 is repeated when the location data elements vary (subfields $a, $b, $d) and when more than one access method may be used. It is also repeated whenever the electronic filename varies (subfield $f), except when a single intellectual item is divided into different parts for online storage or retrieval.

------------------------------------------------------------------
GUIDELINES FOR APPLYING CONTENT DESIGNATORS

INDICATORS

First Indicator - Access method

The first indicator position contains a value that defines how the rest of the data in the field will be used. If the resource is available by more than one access method, the field is repeated with data appropriate to each method. The methods defined are the main TCP/IP (Transmission Control Protocol/Internet Protocol) protocols. The value in the first indicator position determines which subfields are appropriate for use. For example, when first indicator value 1 (FTP) is used, subfields $d (Path), $f (Electronic name), $c (Compression information), and $s (File size) are appropriate, whereas they would not be with first indicator value 2 (Remote login (Telnet)).

0 - Email
Value 0 indicates that access to the electronic resource is through electronic mail (email).
This access includes subscribing to an electronic journal or electronic forum through software intended to be used by an email system.

1 - FTP
Value 1 indicates that the access to the electronic resource is through the File Transfer Protocol (FTP). Additional information in other subfields may enable the user to transfer the resource electronically.

2 - Remote login (Telnet)
Value 2 indicates that access to the electronic resource is through remote login (Telnet). Additional information in subfields of the record may enable the user to connect to the resource electronically.

3 - Dial-up
Value 3 indicates that access to the electronic resource is through a conventional telephone line (dial-up). Additional information in subfields of the record may enable the user to connect to the resource.

7 - Method specified in subfield $2
Value 7 indicates that access to the electronic resource is through a method other than the defined values and for which an identifying code is given in subfield $2 (Source of access).

<8 - Other
Value 8 indicates that access to the electronic resource is not specified by one of the other values or by a code in subfield $2.>

Second Indicator - Undefined

The second indicator position is undefined and contains a blank (#).

SUBFIELD CODES

$a - Host name
Subfield $a contains the fully qualified domain (host name) of the electronic location. It contains a network address which is repeated if there is more than one address for the same host. The convention for a BITNET address is to add .bitnet.

   856 1#$aharvada.harvard.edu$aharvarda.bitnet

------------------------------------------------------------------

$n - Name of location of host in subfield $a
Subfield $n contains the conventional name of the location of the host in subfield $a, including its physical (geographic) location.

   856 2#$apucc.princeton.edu$nPrinceton University, Princeton, N.J.
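
For readers who want to experiment with the examples in this attachment, the following Python sketch splits a field written in the display form used here (tag, space, two indicator characters, then subfields introduced by "$") into its parts. It is an illustration under stated assumptions, not a MARC parser: the function name `parse_field` is hypothetical, and the sketch assumes that "$" never occurs inside subfield data.

```python
def parse_field(line):
    """Split a display-form field such as '856 1#$a...$n...' into its parts.

    Returns (tag, indicator1, indicator2, subfields), where subfields is a
    list of (code, value) pairs in their original order; repeated codes
    are kept as separate entries.
    """
    tag, _space, rest = line.partition(" ")
    indicators, _dollar, data = rest.partition("$")
    if len(indicators) != 2:
        raise ValueError("expected exactly two indicator characters")
    subfields = []
    for chunk in data.split("$"):
        if chunk:
            # First character is the subfield code; the rest is its value.
            subfields.append((chunk[0], chunk[1:].strip()))
    return tag, indicators[0], indicators[1], subfields


tag, ind1, ind2, subfields = parse_field(
    "856 1#$aharvada.harvard.edu$aharvarda.bitnet"
)
print(tag, ind1, ind2)  # 856 1 #
print(subfields)        # [('a', 'harvada.harvard.edu'), ('a', 'harvarda.bitnet')]
```

Keeping the subfields as an ordered list of pairs, rather than a dictionary, preserves repeatable subfields such as $a.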

$o - Operating system
For informational purposes, the operating system used by the host specified in subfield $a is indicated here. Conventions for the path and filenames may be dependent on the operating system of the host. For the operating system of the resource itself (i.e., the item represented by the title recorded in field 245), rather than the operating system of the host making it available, field 753 (Technical Details Access to Computer Files), subfield $c (Operating system) is used.

   856 1#$ars7.loc.gov$d/pub/soviet.archive$fk1famine.bkg
         $nLibrary of Congress, Washington, D.C.$oUNIX

$p - Port
Subfield $p contains the portion of the address that identifies a process or service in the host.

   856 2#$amadlab.sprl.umich.edu$nUniversity of Michigan Weather Underground$p3000

$q - File <format type> [transfer mode]
Subfield $q contains an identification of the file <format type> [transfer mode]. <File formats specify the nature of the data and how it is used, and include what is generally known as a MIME type. The subfield may include both the type (e.g. image) and the subtype (e.g. jpeg). The file format type also> determines how data are transferred through a network. Usually, a text file can be transferred as character data, which generally restricts the text to characters in the ASCII (American National Standard Code for Information Interchange (ANSI X3.4)) character set (i.e., the basic Latin alphabet, digits 0-9, a few special characters, and most punctuation marks). Text files with characters outside of the ASCII set, or non-textual data (e.g., computer programs, image data), must be transferred using another file transfer mode, usually binary mode.

   856 1#$aarchive.cis.ohiostate.edu
         $dpub/comp.sources.Unix/volume 10$fcomobj.lisp.10.Z
         $q[binary]<application/x-compressed>
      [File is UNIX compressed]

  <856 7#$3NYDA.1993.010.00130
         $uhttp://www.cc.columbia.edu/imaging/photocd/3009-1031-1443/IMG0089.512.gif$qimage/gif$2http>

$r - Settings
Subfield $r contains the settings used for transferring data. Included in settings are: 1) Number Data Bits (the number of bits per character); 2) Number Stop Bits (the number of bits to signal the end of a byte); and 3) Parity (the parity checking technique used). The syntax of these elements is:

   <Parity>-<Number Data Bits>-<Number Stop Bits>

If only the parity is given, the other elements of settings and their related hyphens are omitted (i.e., "<Parity>"). If one of the other two elements is given, the hyphen for the missing element is recorded in its proper position (i.e., "<Parity>--<Number Stop Bits>" or "<Parity>-<Number Data Bits>-").
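
The settings syntax described above can be read mechanically. The following Python sketch is one possible reading, offered as an illustration only: the function name `parse_settings` is hypothetical, and the sample values (parity "E", 7 data bits, 1 stop bit) are invented for the example, since this text does not list the permitted codes. Elements are kept as strings so that no assumption is made about their form.

```python
def parse_settings(value):
    """Parse a $r value of the form Parity-DataBits-StopBits.

    Returns (parity, data_bits, stop_bits) as strings; an element that
    was omitted from the value is returned as None.
    """
    parts = value.split("-")
    if len(parts) not in (1, 3) or not parts[0]:
        raise ValueError("unrecognized settings syntax: %r" % value)
    parity = parts[0]
    if len(parts) == 1:
        # Only the parity was recorded; the hyphens are omitted entirely.
        return parity, None, None
    # An empty string between or after hyphens marks a missing element.
    data_bits = parts[1] or None
    stop_bits = parts[2] or None
    return parity, data_bits, stop_bits


print(parse_settings("E-7-1"))  # ('E', '7', '1')
print(parse_settings("E"))      # ('E', None, None)
print(parse_settings("E--1"))   # ('E', None, '1')
print(parse_settings("E-7-"))   # ('E', '7', None)
```

Because the hyphen for a missing element stays in position, splitting on "-" always yields either one part or three, which is what the length check relies on.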