NAME: Changes to FTP File Label Specifications for Electronic Files of USMARC Records
SOURCE: Library of Congress
SUMMARY: This paper proposes changes, originally put forward by the participants in the European CoBRA FLEX Project 10164, to the file label that is used for files of USMARC records transferred via the File Transfer Protocol (FTP). Additional fields are proposed that have been deemed necessary for the exchange of records in a variety of MARC formats.
KEYWORDS: FTP Label; File Transfer
RELATED: DP61 (Jan. 1993); 93-9 (June 1993); DP94 (Jan. 1996)
STATUS/COMMENTS:
5/6/96 - Forwarded to USMARC Advisory Group for discussion at the July 1996 MARBI meetings.
7/6/96 - Accepted with the following change: Proposal 2. The end-of-field marker may be either carriage return (X'0D') or carriage return followed by line feed (X'0D''0A'). Do not use number sign or the current X'1E'.
8/6/96 - Result of final LC review - Agreed with MARBI decision.
PROPOSAL NO. 96-7: Changes to FTP File Label Specifications for Electronic Files
1. INTRODUCTION
The European library community has been investigating the use of
the Internet File Transfer Protocol (FTP) for the electronic
exchange of bibliographic data. The European Commission's
Libraries Programme through CoBRA (Computerized Bibliographic
Record Actions) has funded the FLEX (File Label EXchange) Project
10164 to investigate the need for standards in this area, and "to
suggest a suitable file labelling and naming format".
The participants in the FLEX Project understand that without
standardization in the way files are described within the label
file, it would become increasingly difficult to exchange
bibliographic information internationally. Because the USMARC
specification for electronic file transfer has been widely reviewed
by the USMARC community and is now in use by many exchange partners
of bibliographic records, the FLEX project participants have
proposed that the USMARC specification be used as the base
specification. However, they have proposed some enhancements to
that specification to take into account a European dimension for
exchanging and processing bibliographic data.
In addition, the FLEX Project participants have suggested a file
naming convention for use when certain operating system constraints
apply.
2. PROPOSED CHANGES
See Attachment A for an FTP File label example. See Attachment B
for revised definitions of the fields.
Proposal 1. Change label file character set
It is proposed that the character set of the label file conform to
ISO 646-IRV or ASCII. (There are two differences between ISO 646-IRV
and ASCII: 1) ISO 646-IRV character position X'24' is the universal
currency symbol whereas this character is the "$" symbol in ASCII;
2) ISO 646-IRV character position X'7E' is an overline or tilde
whereas this character is the tilde in ASCII. These differences
should not be problematic.)
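A receiving system could check a label file against this proposal with a test along the following lines. This is a minimal sketch, not part of the specification: the function names are hypothetical, and it assumes the label file has been read as raw bytes.

```python
def is_seven_bit(label_bytes: bytes) -> bool:
    """Return True if every byte lies in the 7-bit range (X'00'-X'7F')
    shared by ISO 646-IRV and ASCII."""
    return all(b < 0x80 for b in label_bytes)


def variant_positions(label_bytes: bytes) -> list[int]:
    """Return the offsets of bytes X'24' and X'7E', the two code
    positions whose graphic differs between ISO 646-IRV and ASCII."""
    return [i for i, b in enumerate(label_bytes) if b in (0x24, 0x7E)]
```

A label that passes `is_seven_bit` and yields an empty list from `variant_positions` reads identically under either character set.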
Proposal 2. Change the end-of-field character symbol from the
current end-of-field marker (X'1E') to the number sign "#" (X'23'),
followed by a carriage return (X'0D') or carriage return/line feed
(X'0D''0A') depending on operating systems used.
There was objection to using the USMARC end-of-field character
(X'1E') in what was felt should be a text file. It is, therefore,
proposed that the same end-of-field characters that are currently
used in the diskette FTP file label specification be used in this
file label specification. These characters can be supplied by any
operating system.
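The end-of-field convention can be illustrated with a short sketch. It is illustrative only: the names are hypothetical, and it assumes the label has already been decoded to text. PROPOSED_EOF follows Proposal 2 as worded here; ACCEPTED_EOF follows the form recorded under STATUS/COMMENTS (carriage return, optionally followed by line feed, with no number sign).

```python
import re

# "#" followed by CR or CR/LF, as worded in Proposal 2.
PROPOSED_EOF = re.compile(r"#\r\n?")
# CR or CR/LF alone, as recorded in the 7/6/96 status note.
ACCEPTED_EOF = re.compile(r"\r\n?")


def split_fields(label_text: str, marker: re.Pattern = PROPOSED_EOF) -> list[str]:
    """Split label text into fields on the chosen end-of-field marker,
    discarding the empty string left after the final marker."""
    return [part for part in marker.split(label_text) if part]
```

Either pattern handles files written with CR alone or with CR/LF, so the receiving system does not need to know which operating system produced the label.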
Proposal 3. Add optional field CID (Country Identifier)
Field ORS (Originating System ID) is, in some cases, insufficient
to identify the originating system. When necessary, the CID
(Country Identifier) field would be used with the ORS field but its
use would remain optional. The country identifier would be the
two-character alpha code defined by ISO 3166 (Codes for the
Representation of Names of Countries).
Proposal 4. Make the FOR Field (Format) mandatory
It is proposed that the existing FOR field (Format) be made
mandatory to identify the structural format standard used for
records in the file. For example, "M" = Z39.2 (or its equivalent
ISO 2709), and "S" = SGML (ISO 8879).
Proposal 5. Add optional field FQF (Format Qualifier)
Field FOR (Format) is insufficient in itself to completely describe
the format of the record file (e.g., for identifying a particular
tag set/specification for Z39.2 records or a particular DTD for
SGML records). The FQF (Format Qualifier) field would be used in
conjunction with the FOR (Format) field but its use would remain
optional. It is proposed that the FQF field follow immediately
after the FOR field in field sequence. The content of the FQF
field would be taken from a list of formats (e.g., similar to the
list of MARC format types in the Z39.50 Registered Record
Syntaxes**) and DTDs. For SGML files the DTD is indicated by the
highest level tag in the document instance (or in the tag DOCTYPE
in the DTD itself).
**(//www.loc.gov/z3950/agency/objects/syntax.html)
Examples: FQF USMARC
          FQF BOOK SYSTEM "iso12083-book.dtd" (DTD specified in ISO 12083)
Proposal 6. Add optional fields CS<0-n> (Character Set<0-n>)
To assist in specifying character sets and character set variations,
it is proposed that two sets of fields be added. The first set is
CS<0-n> (Character Set <0-n>), which specifies the character sets
found in the file. CS0 would specify the initial character set
needed for processing the records in the file. This indicates, at
least, the G0 set needed. It may indicate an 8-bit set in which
case it is more than the G0 set. For USMARC, it can be specified
as either ASCII (the G0 part of the USMARC character set) or as
USMARC.
CS1 indicates an additional set needed in the file; CS2 indicates
another character set used in the file; etc. The content of each
CS<0-n> would equate to a particular international standard
character set identifier (e.g., extended Latin ISO 5426 - 1983), an
ISO registration number (e.g., Registry #37), text (e.g., USMARC),
or a reference to a private character set. If the field content
represents a private character set then the reader should be
pointed to the NOT field (Notes) for further information on
processing requirements or the REP (Reply To) for a person to
contact. An occurrence could specify an additional control set
such as ISO 6630.
The use of the CS0 field is redundant for USMARC records. Once
the USMARC format is defined (in FOR and FQF), the initial
character set is implied. In the USMARC context, the use of the
CS1-n fields is also redundant as character sets are specified in
each record. (The absence of an 066 implies USMARC Roman is used.)
The USMARC 066 field of a record identifies (implicitly or
explicitly) all character sets used in the record. This may be
different for other MARC formats, however. Likewise an SGML DTD
indicates the character sets internally in the CHARSET tag
(although it is not carried in a document instance that does not
have the DTD attached).
Example: CS0 USMARC Roman
Proposal 7. Add optional fields CV<0-n> (Character Variation<0-n>)
It is proposed that an additional field be used to provide
information on variations to the character sets specified in
CS<0-n>, if the sets noted 1) are not used strictly according to
the standard, 2) have options for some positions that need to be
specified, or 3) have additional characters in positions that are
undefined in the standard.
Example: CS0 ISO 646-Basic
CV0 2/3=number sign; 7/14=umlaut
CS1 ISO 5426
CV1 4/9 not used
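The correspondence between each CS<n> field and its CV<n> field in the example above can be sketched as follows. This is an illustration under stated assumptions: the function name and the dictionary layout are hypothetical, and the fields are assumed to have been parsed already into (tag, data) pairs.

```python
import re

# Matches CS0, CS1, ... and CV0, CV1, ... and captures the index n.
NUMBERED_TAG = re.compile(r"(CS|CV)(\d+)")


def pair_character_sets(fields: list[tuple[str, str]]) -> dict[int, dict]:
    """Group CS<n> and CV<n> fields by their index n."""
    sets: dict[int, dict] = {}
    for tag, data in fields:
        match = NUMBERED_TAG.fullmatch(tag)
        if match is None:
            continue  # not a character set or variation field
        entry = sets.setdefault(int(match.group(2)), {"set": None, "variations": []})
        if match.group(1) == "CS":
            entry["set"] = data
        else:
            entry["variations"].append(data)
    return sets


example = [
    ("CS0", "ISO 646-Basic"),
    ("CV0", "2/3=number sign; 7/14=umlaut"),
    ("CS1", "ISO 5426"),
    ("CV1", "4/9 not used"),
]
paired = pair_character_sets(example)
```

Here `paired[1]` is `{"set": "ISO 5426", "variations": ["4/9 not used"]}`. A CV<n> entry whose "set" is still None would indicate a variation with no matching CS<n> field.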
Proposal 8. Add optional field FDI (Final Destination
Identification)
The FDI field is intended to assist those organizations that
exchange bibliographic information with a large recipient community
in identifying the intended customer. The field would be used to
contain the name or identifier of the final-destination database.
An example requiring this method of identification would be a PUT
transfer to a central customer point that needs additional
information to determine the final destination for the records.
It is proposed that this field follow the ISS field (Issue).
ATTACHMENT A
FTP LABEL EXAMPLE
DAT##19951221211236.0#
RBF##1564#
DSN##LOC.BOOKS.DIST.DATA.D951221#
ORS##DLC#
CID##US#
DTS##19951222013000.0#
DTR##1995122119951221#
FOR##M#
FQF##USMARC#
DES##MUMS Books Daily DQ#
CS0##USMARC#
CS1##USMARC Hebrew#
VOL##V21#
ISS##I50#
FDI##Hebraic Resource File--RS10#
REP##[email protected]#
NOT##Test set of Hebrew records#
"#" at end-of-field in above example is not a
space, but is a graphic character ("#")
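A label laid out like the example above could be read with a sketch such as the following. The layout it relies on is an assumption inferred from the example rather than a normative statement: a 3-character tag, 2 filler characters, the data, and a trailing "#" end-of-field character. The function names are hypothetical.

```python
def parse_label_line(line: str) -> tuple[str, str]:
    """Split one label line into (tag, data).

    Assumed layout: 3-character tag, 2 filler characters, data,
    then the graphic "#" that marks the end of the field.
    """
    line = line.rstrip("\r\n")
    if line.endswith("#"):
        line = line[:-1]  # drop the end-of-field "#"
    return line[:3], line[5:]


def parse_label(text: str) -> list[tuple[str, str]]:
    """Parse a whole label. A list (not a dict) is returned so that
    repeatable fields such as DES or NOT keep every occurrence."""
    return [parse_label_line(line) for line in text.splitlines() if line.strip()]
```

For instance, `parse_label_line("FOR##M#")` yields `("FOR", "M")`.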
ATTACHMENT B
PROPOSED CHANGES TO THE FTP FILE LABEL
Below is a summary of the enhanced file label
specification with changes indicated. [] indicates text
to be deleted; <> indicates text to be added.
Tag     Element Name        Description        M/O      F/V  R/NR
DAT     Date Compiled       YYYYMMDDHHMMSS.F   M        F    NR
RBF     Number of Records   Numeric            M        V    NR
DSN     Data Set Name       Alphanumeric       M        V    NR
ORS     Origin. System ID   Alphanumeric       M        V    NR
<CID    Country ID          Alphanumeric       O        F    NR>
DTS     Date Sent           YYYYMMDDHHMMSS.F   O        F    NR
DTR     Dates of Records    YYYYMMDDYYYYMMDD   O        F    NR
FOR     Format              Alphanumeric       [O] <M>  F    NR
<FQF    Format Qualifier    Alphanumeric       O        V    NR>
DES     Description         Alphanumeric       O        V    R
<CS0-n  Character Set 0-n   Alphanumeric       O        V    NR>
<CV0-n  Char. Var. 0-n      Alphanumeric       O        V    NR>
VOL     Volume              Alphanumeric       O        V    R
ISS     Issue               Alphanumeric       O        V    R
<FDI    Final Dest. ID      Alphanumeric       O        V    NR>
REP     Reply to            Alphanumeric       O        V    R
NOT     Note                Alphanumeric       O        V    R
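The M/O and R/NR columns of the summary table lend themselves to a mechanical check. The sketch below is illustrative only: the names are hypothetical, the input is assumed to be the list of tags in the order they occur in a label, and the numbered CS<n> and CV<n> tags are deliberately left unchecked.

```python
from collections import Counter

# Tags marked M in the table, with FOR taken as mandatory per Proposal 4.
MANDATORY = ("DAT", "RBF", "DSN", "ORS", "FOR")
# Unnumbered tags marked NR in the table.
NOT_REPEATABLE = {"DAT", "RBF", "DSN", "ORS", "CID", "DTS", "DTR", "FOR", "FQF", "FDI"}


def check_label(tags: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the tag usage
    agrees with the M/O and R/NR columns."""
    counts = Counter(tags)
    problems = [f"missing mandatory field {t}" for t in MANDATORY if counts[t] == 0]
    problems += [f"field {t} repeated" for t in sorted(NOT_REPEATABLE) if counts[t] > 1]
    return problems
```

Called with the tags `DAT, RBF, DSN, ORS`, the check reports `missing mandatory field FOR`, which is the case Proposal 4 is meant to rule out.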
DAT (Date compiled): Mandatory; Fixed length; Not
repeatable. This is the date the originating system
completed the compilation of the file of records. This
is not the date of the creation of the records contained
in the bibliographic file. The field is recorded
according to Representation for Calendar Date and Ordinal
Date for Information Interchange (ANSI X3.30) and
Representations of Local Time of the Day for Information
Interchange (ANSI X3.43). The date requires 8 numeric
characters in the pattern yyyymmdd (4 for the year, 2 for
the month, and 2 for the day; right justified and zero
filled). The time requires 8 numeric characters in the
pattern hhmmss.f (2 for the hour, 2 for the minute, 2 for
the second, and 2 for a decimal fraction of the second,
including the decimal point). The 24-hour clock is used.
RBF (Number of records in file): Mandatory; Variable length;
Not repeatable. This element gives the number of
logical records contained in the file of USMARC records.
DSN (Data Set Name): Mandatory; Variable length; Not
repeatable. The filename of the file of USMARC records
(which is sent separately) for which this is a file
label.
ORS (Originating system ID): Mandatory; Variable length; Not
repeatable. The name of the system that compiled the
files of records. This could be a symbol (e.g., OCLC or
NUC) or text.
<CID (Country ID): Optional; Fixed length; Not repeatable.
The country identifier of the system that compiled the
files of records. The identifier would be taken from
Codes for Representation of Names of Countries (ISO
3166).>
DTS (Date sent): Optional; Fixed length; Not repeatable.
This is the date of transmission of the file of USMARC
records. The field is recorded according to
_Representation for Calendar Date and Ordinal Date for
Information Interchange_ (ANSI X3.30) and Representations
of Local Time of the Day for Information Interchange
(ANSI X3.43). The date requires 8 numeric characters in
the pattern yyyymmdd (4 for the year, 2 for the month,
and 2 for the day; right justified and zero filled). The
time requires 8 numeric characters in the pattern
hhmmss.f (2 for the hour, 2 for the minute, 2 for the
second, and 2 for a decimal fraction of the second,
including the decimal point). The 24-hour clock is used.
DTR (Dates of records): Optional; Fixed length; Not
repeatable. This includes inclusive dates of last
transaction of the records in the file, i.e. the first
and last date recorded in the 005 fields of the file of
records. The field is recorded according to
_Representation for Calendar Date and Ordinal Date for
Information Interchange_ (ANSI X3.30). The date requires
16 numeric characters in the pattern yyyymmddyyyymmdd (4
for the year, 2 for the month, and 2 for the day for each
date; right justified and zero filled).
FOR (Format): <Mandatory>; Fixed length; Not repeatable.
This element designates the format of the records,
generally M for <Z39.2 or ISO 2709> (MARC) <, S for ISO
8879 (SGML)>.
<FQF (Format qualifier): Optional; Variable length; Not
repeatable. This element provides additional description
of the format of the record file. For example, it may
identify a particular tag set/specification for MARC
records or a particular DTD for SGML records. For MARC
formats, the content of the FQF field may be text or a
code from the list: Z39.50 Registered Record Syntaxes.
For DTDs, the content is the identifier in the DTD
DOCTYPE field.>
DES (Description of records): Optional; Variable length;
Repeatable. This element describes the records. The
data could be coded or describe a product name. (For
example, OCLC uses B for Bibliographic describing a data
type; CDS may use a product name, such as MDS-Books All.)
<CS0-n (Character set <0-n>): Optional; Variable length; Not
repeatable. These fields specify the character sets (control
and/or graphic) needed for processing the record data file.
The field content is text indicating a particular set (e.g.,
ISO 646-IRV, ISO Registry #37, USMARC, or a reference to a
private character set). CS0 indicates at least the G0 set and
CS<1-n> indicate other sets in the file.>
<CV0-n (Character variation <0-n>): Optional; Variable length;
Repeatable. These fields are used in conjunction with the CS
fields and contain a textual description of the variations
from the set specified in the corresponding CS field.
Variations may be because the set noted 1) is not used
strictly according to the standard, 2) has options for some
positions that need to be specified, or 3) has additional
characters in positions that are undefined in the standard.>
VOL (Volume): Optional; Variable length; Repeatable. This
may be used if it is desirable to assign a volume number
when distribution of records is by subscription. Each
file within a subscription year may be given a volume and
issue number.
ISS (Issue): Optional; Variable length; Repeatable. This
may be used if it is desirable to assign a volume and
issue number when distribution of records is by
subscription. Each file within a subscription year may
be given a volume and issue number. It may be combined
with Volume (e.g., V1402).
<FDI (Final destination ID): Optional; Variable length; Not
repeatable. This field would contain the name or
identifier of the final-destination database.>
REP (Reply to): Optional; Variable length; Repeatable. This
field contains an address given as a contact for
problems/questions in transmission. It may include an
Internet or postal address.
NOT (Note): Optional; Variable length; Repeatable. This
field contains textual information or messages about the
file.
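The fixed-length date layouts defined above for DAT, DTS, and DTR can be checked with a short sketch. It is illustrative only: the function names are hypothetical, and it assumes the field data has already been separated from its tag and end-of-field marker.

```python
from datetime import date, datetime


def parse_label_timestamp(value: str) -> datetime:
    """Parse a DAT or DTS value in the pattern yyyymmddhhmmss.f
    (8 characters of date followed by 8 characters of time)."""
    if len(value) != 16:
        raise ValueError(f"expected 16 characters, got {len(value)}")
    return datetime.strptime(value, "%Y%m%d%H%M%S.%f")


def parse_record_dates(value: str) -> tuple[date, date]:
    """Parse a DTR value in the pattern yyyymmddyyyymmdd and return
    the first and last dates as a pair."""
    if len(value) != 16 or not value.isdigit():
        raise ValueError("expected 16 digits")
    first = datetime.strptime(value[:8], "%Y%m%d").date()
    last = datetime.strptime(value[8:], "%Y%m%d").date()
    return first, last
```

With the values from Attachment A, `parse_label_timestamp("19951221211236.0")` gives 21 December 1995 at 21:12:36, and `parse_record_dates("1995122119951221")` gives that same day as both the first and last date.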