DISCUSSION PAPER NO. 94

DATE: December 21, 1995
REVISED:
NAME: Proposed Changes to FTP File Label Specifications for Electronic Files of USMARC Records
SOURCE: Library of Congress
SUMMARY: This paper discusses changes that have been proposed by the participants in the European CoBRA FLEX Project 10164 for the file label that is used for files of USMARC records transferred via the File Transfer Protocol (FTP). Additional fields are proposed that have been deemed necessary for exchange of records in a variety of MARC formats.
KEYWORDS: FTP Label; File Transfer
RELATED: DP61 (Jan. 1993); 93-9 (June 1993)
STATUS/COMMENTS:
12/21/95 - Forwarded to USMARC Advisory Group for discussion at the January 1996 MARBI meetings.
1/22/96 - Introduced at the USMARC Advisory Group. Participants asked to respond over the list by February 29.
DISCUSSION PAPER NO. 94:  Changes to FTP File Label Specifications

1.   INTRODUCTION

The European library community has been investigating the use of
the Internet File Transfer Protocol (FTP) for the electronic
exchange of bibliographic data.  The European Commission's
Libraries Programme through CoBRA (Computerized Bibliographic
Record Actions) has funded the FLEX (File Label EXchange) Project
10164 to investigate the need for standards in this area, and "to
suggest a suitable file labelling and naming format".

The participants in the FLEX Project understand that without
standardization in the way files are described within the label
file, it would become increasingly difficult to exchange
bibliographic information internationally.  Because the USMARC
specification for electronic file transfer has been widely reviewed
by the USMARC community and is now in use by many exchange partners
of bibliographic records, the FLEX project participants have
proposed that the USMARC specification be used as the base
specification.  However, they have proposed some enhancements to
that specification to take into account a European dimension for
exchanging and processing bibliographic data.

In addition, the FLEX Project participants have suggested a file
naming convention for use when certain operating system constraints
apply.


2.   PROPOSED ENHANCEMENTS TO THE CURRENT SPECIFICATION

[In the tables below "M/O" will be the abbreviation used for
"Mandatory/Optional"; "F/V" will be the abbreviation used for
"Fixed length/Variable length"; "R/NR" will be the abbreviation
used for "Repeatable/Not Repeatable".]

Arising from the consultation process and a workshop held on
October 24, 1995, at the National Computing Centre offices in
London, the following enhancements to the base file label
specification have been proposed.

Proposal 1.  Label Character Set

It is proposed that the character set of the label file conform to
ISO 646, and that the specification state this explicitly.  The
International Reference Version of ISO 646 is equivalent to ASCII.
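
As an illustration only, a receiving system could verify that a label
file stays within the 7-bit range before processing it.  The sketch
below assumes the International Reference Version of ISO 646 (the
version that coincides with ASCII); the function name is
hypothetical and not part of any specification.

```python
def is_iso646_irv(label_bytes: bytes) -> bool:
    """Return True if every byte lies in the 7-bit range 0x00-0x7F.

    Assumption: the label uses the International Reference Version of
    ISO 646, so any byte with the high bit set is out of range.
    """
    return all(byte < 0x80 for byte in label_bytes)
```

A label containing, for example, a Latin-1 accented letter would fail
this check and could be rejected or flagged for manual review.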

Proposal 2.  End-of-field Character

It is proposed that the current end-of-field marker (X'1E') be
replaced with one that any operating system can easily supply.  The
"New line" character is the suggested replacement.

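During a transition, a receiving program might need to accept labels
that use either terminator.  The following is a minimal sketch under
that assumption; the function name is hypothetical, and accepting a
carriage return plus line feed pair as a "New line" is an assumption
not stated in the proposal.

```python
import re

# X'1E' is the current field terminator; "\n" is the proposed one.
# Accepting "\r\n" as well is an assumption, not part of the proposal.
_TERMINATORS = re.compile(r"\x1e|\r\n|\n")

def split_label_fields(label_text: str) -> list[str]:
    """Split raw label text into fields, dropping empty fragments."""
    return [field for field in _TERMINATORS.split(label_text) if field]
```

Dropping empty fragments means a trailing terminator, or an X'1E'
immediately followed by a new line, does not produce a spurious
empty field.
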
Proposal 3.  Enhancement to the ORS Field (Originating System ID)

In order to ensure that field ORS content is unique, it is proposed
that current field content be preceded by a country identifier
followed by a space.  The country identifier would be the two-
character alpha code defined by ISO 3166 (Codes for the
Representation of Names of Countries).  It is proposed that
inclusion of the country identifier be optional.
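
The following sketch illustrates how the optional prefix might be
composed and recognized.  The function names are hypothetical.  The
check looks only at the shape of the prefix (two uppercase letters
and a space); it does not consult the ISO 3166 code list, and it
cannot distinguish a country identifier from an existing system ID
that happens to begin the same way.

```python
import re
from typing import Optional, Tuple

_COUNTRY = re.compile(r"[A-Z]{2}")
_PREFIXED = re.compile(r"([A-Z]{2}) (\S.*)")

def build_ors(system_id: str, country: Optional[str] = None) -> str:
    """Return ORS content, preceded by the country code if one is given."""
    if country is None:
        return system_id
    if not _COUNTRY.fullmatch(country):
        raise ValueError(f"not a two-letter alpha code: {country!r}")
    return f"{country} {system_id}"

def split_ors(content: str) -> Tuple[Optional[str], str]:
    """Return (country, system_id); country is None if no prefix is seen."""
    match = _PREFIXED.fullmatch(content)
    if match:
        return match.group(1), match.group(2)
    return None, content
```

Because the identifier is optional, exchange partners would probably
need to agree in advance on whether it is present.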

Proposal 4.  Enhancement to the FOR Field (Format)

It is proposed that the existing field FOR (Format) be made
mandatory to identify the structural format standard used for
records in the file.  For example, "M" = Z39.2 (or its equivalent
ISO 2709), and "S" = SGML.

Proposal 5.  Specification of a New Field FQF (Format Qualifier)

Field FOR (Format) is insufficient in itself to completely describe
the format of the record file, e.g., for identifying a particular
tag set/specification for Z39.2 records or a particular DTD for
SGML records.  Therefore, a new label field FQF (Format Qualifier)
is proposed with the following attributes:

Tag  Element Name        Description    M/O  F/V  R/NR

FQF  Format Qualifier    alphanumeric    O    V    NR

The field would be used in conjunction with field FOR (Format) but
its use would remain optional. It is proposed that field FQF follow
immediately after field FOR in field sequence.  The content of the
FQF field would be taken from a list of formats and DTDs.

Proposal 6.  Specification of a New Field CSI (Character Set
Initial)

To enable the processing of variations in character set, it is
proposed that two new label fields be specified.  The first is
field CSI (Character Set Initial) which would specify the initial
character set / graphic set needed for processing the record data
file.  The field content would equate to a particular international
standard character set, e.g. basic Latin (ISO 646), extended Latin
(ISO 5426 - 1980), Greek (ISO 5428 - 1980), USMARC set, or a coded
reference to a private character set.  The proposed field would be
represented as follows:

Tag  Element Name             Description    M/O  F/V  R/NR

CSI  Character Set Initial    alphanumeric    M    V    NR

If the field content represents a private character set, field NOT
(Note) can contain further information on processing requirements
and/or field REP (Reply to) can contain a person to contact.  It is
proposed that field CSI be mandatory.

Proposal 7.  Specification of a New Field CSE (Character Set
Extension)

It is proposed that an additional field be used to provide
information on character set variations and extensions.  The
proposed field would be represented as follows:

Tag  Element Name             Description    M/O  F/V  R/NR

CSE  Character Set Extension  alphanumeric    O    V    R

This field would be used in conjunction with the field CSI and
would contain one or more ISO 2022 escape sequences or a textual
description.  Although the field is specified as optional, it is
proposed that it be present whenever extensions are used.  It is
further proposed that the CSI and CSE fields follow the DES field
in field sequence.

Proposal 8.  Specification of a New Field CID (Customer ID)

To assist those organizations that exchange bibliographic
information with a large recipient community in identifying the
intended customer, it is proposed that a new field be specified to
identify the customer.  The proposed field would be represented as
follows:

Tag  Element Name        Description    M/O  F/V  R/NR

CID  Customer ID         alphanumeric    O    V    NR

The field would be used to contain the name or identifier of the
end customer or recipient database.  An example requiring this
method of identification would be a PUT transfer to a central
customer point, when additional information is required by the
customer to determine the final destination for the records.  It is
proposed that this field follow field ISS (Issue).


3.   SUMMARY OF THE PROPOSED LABEL SEQUENCE

Below is a summary of the enhanced file label specification with
changes indicated. (Information between [] proposed for deletion;
between <> proposed for addition.)

  Tag     Element Name        Description         M/O  F/V  R/NR
  ---     -----------------   ------------------- ---  ---  ----
  DAT     Date Compiled       yyyymmddhhmmss.f    M    F    NR
  RBF     Number of Records   numeric             M    V    NR
  DSN     Data Set Name       alphanumeric        M    V    NR
  ORS     Orig. System ID     alphanumeric        M    V    NR
  DTS     Date Sent           yyyymmddhhmmss.f    O    F    NR
  DTR     Dates of Records    yyyymmddyyyymmdd    O    F    NR
  FOR     Format              alphanumeric     [O]<M>  F    NR
 <FQF     Format Qualifier    alphanumeric        O    V    NR>
  DES     Description         alphanumeric        O    V    R
 <CSI     Charac. Set Initial alphanumeric        M    V    NR>
 <CSE     Charac. Set Exten.  alphanumeric        O    V    R>
  VOL     Volume              alphanumeric        O    V    R
  ISS     Issue               alphanumeric        O    V    R
 <CID     Customer ID         alphanumeric        O    V    NR>
  REP     Reply to            alphanumeric        O    V    R
  NOT     Note                alphanumeric        O    V    R
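
For illustration, the table above can be expressed as data and used
to check the tags found in a label.  This is a sketch only: the names
are hypothetical, and treating the table order as a required field
sequence is an assumption drawn from the "follow ... in field
sequence" wording of the proposals.

```python
from collections import Counter

# (tag, mandatory, repeatable), in the order of the table above.
PROPOSED_FIELDS = [
    ("DAT", True, False), ("RBF", True, False), ("DSN", True, False),
    ("ORS", True, False), ("DTS", False, False), ("DTR", False, False),
    ("FOR", True, False), ("FQF", False, False), ("DES", False, True),
    ("CSI", True, False), ("CSE", False, True), ("VOL", False, True),
    ("ISS", False, True), ("CID", False, False), ("REP", False, True),
    ("NOT", False, True),
]

def check_label(tags: list[str]) -> list[str]:
    """Return a list of problems found in a sequence of field tags."""
    problems = []
    order = {tag: index for index, (tag, _, _) in enumerate(PROPOSED_FIELDS)}
    counts = Counter(tags)
    for tag, mandatory, repeatable in PROPOSED_FIELDS:
        if mandatory and counts[tag] == 0:
            problems.append(f"missing mandatory field {tag}")
        if not repeatable and counts[tag] > 1:
            problems.append(f"field {tag} is not repeatable")
    for tag in counts:
        if tag not in order:
            problems.append(f"unknown field {tag}")
    # Assumption: fields must appear in table order.
    positions = [order[tag] for tag in tags if tag in order]
    if positions != sorted(positions):
        problems.append("fields are out of sequence")
    return problems
```

An empty result means the tag sequence is consistent with the table;
the check does not examine field content or fixed-length rules.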


4.   FILE NAMES FOR ELECTRONIC FILE EXCHANGE

Because local file naming conventions vary widely, and because
application and operating system software impose their own
constraints, it is proposed that the content of the file name be
left to the exchange partners to agree on.  This has the following
benefits:

     -  Exchange partners that require long file names to
     adequately describe the product being transferred will not be
     constrained by a naming convention designed around the
     limitations of short file names, i.e., eight characters and a
     three-character extension.

     -  It would not be necessary to change existing application
     software, e.g. download and upload routines.

However, where operating system constraints limit file names to
eight characters (with a three-character extension), and the
exchange partners feel that a naming convention would be desirable,
a file extension convention is suggested.  The following file
extensions (uppercase) are suggested to differentiate between the
label file and the data file.

     The label file will take the extension  .TXT
     The record file will take the extension .DAT
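
As a sketch of how a sending system might apply this convention, the
helper below derives both file names from a shared base name.  The
function name is hypothetical, and restricting the base name to
ASCII letters and digits is an assumption made for the example, not
part of the proposal.

```python
from typing import Tuple

def exchange_file_names(base: str) -> Tuple[str, str]:
    """Return (label_file, record_file) names for an 8.3 file system."""
    # Assumption: the base name holds 1-8 ASCII letters or digits.
    if not (1 <= len(base) <= 8 and base.isascii() and base.isalnum()):
        raise ValueError(f"base name does not fit 8.3 limits: {base!r}")
    return f"{base}.TXT", f"{base}.DAT"
```

Sharing one base name keeps the label file and its record file
visibly paired in a directory listing.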


5.   FTP LABEL EXAMPLE

  DAT##19951221211236.0
  RBF##1564
  DSN##LOC.BOOKS.DIST.DATA.D951221
  ORS##US DLC
  DTS##19951222013000.0
  DTR##1995122119951221
  FOR##M
  FQF##USMARC
  DES##MUMS Books Daily DQ
  CSI##USMARC
  CSE##USMARC--Hebrew
  VOL##V21
  ISS##I50
  CID##Middle East Library--RS6
  REP##[email protected]
  NOT##Test set of Hebrew records
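
The example above could be read by a program along the following
lines.  This is a minimal sketch: the function names are
hypothetical, and it assumes that each field consists of a
three-character tag, two filler characters (shown as "##" in the
example), and then the field content.

```python
from datetime import datetime
from typing import Tuple

def parse_field(line: str) -> Tuple[str, str]:
    """Split one label line into (tag, content)."""
    line = line.strip()
    # Assumption: 3-character tag followed by 2 filler characters.
    if len(line) < 5:
        raise ValueError(f"label field too short: {line!r}")
    return line[:3], line[5:]

def parse_timestamp(value: str) -> datetime:
    """Parse a yyyymmddhhmmss.f value, as used in DAT and DTS."""
    return datetime.strptime(value, "%Y%m%d%H%M%S.%f")
```

For instance, applying parse_field to the first line of the example
and passing the content to parse_timestamp would give the date and
time at which the file was compiled.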


6.   QUESTIONS FOR FURTHER DISCUSSION

     1. Are these enhancements to the FTP file label useful to the
     North American community?

     2. Should the current end-of-field character be replaced, or
     should more flexibility be allowed with regard to label field
     termination?

     3. Is there a requirement for variability in the label
     character set?

     4. Is the two-part format specification useful?  It allows the
     U.S. to use only FOR as it has done, but to be clearly
     understood internationally, FQF also needs to be used.

     5. Comments on character set specification fields?

     6. Are there North American uses for the CID field?