Library of Congress

Program for Cooperative Cataloging

Task Group on Journals in Aggregator Databases


January 2000

Task Group Members:
Jeanne Baker (U. Maryland); Matthew Beacom (Yale); Karen Calhoun (Cornell); Eric Celeste (M.I.T.); Ruth Haas (Harvard); Jean Hirons (LC liaison); Oliver Pesch (EBSCO liaison); John Riemer (U. Georgia) -- Chair



The PCC SCA Task Group on Journals in Aggregator Databases has investigated and is making recommendations for a useful, cost-effective and timely means for providing records to identify full-text electronic journal titles and holdings in aggregator databases. The following report contains the working assumptions governing the group's work, recommended best strategies for creating MARC bibliographic record sets, recommendations on the content of two types of vendor-supplied records as well as on the range of ways they need to be maintained, a report of progress on the demonstration project with EBSCO, some information about related projects, and next steps for these bibliographic control endeavors.

The predecessor of the SCA task group was formed in spring 1998 at the CONSER Operations Committee meeting. A number of CONSER representatives have mandates from their reference departments to provide better bibliographic control of full-text journals available in services like Bell & Howell's ProQuest Direct, EBSCO's Academic Search Elite, and other products widely available to library users. The CONSER task force began by surveying CONSER libraries: what measures are being taken, and what would CONSER libraries like to be doing? The results of that survey are available in the July 1998 issue of CONSERline.

Key findings were that CONSER libraries are using numerous methods for providing access to full text titles in aggregations, including the CONSER single-record approach for online versions, lists on a Web page, paper guides, and separate catalog records in the library online catalog. Many of those surveyed noted lack of staff time as the biggest obstacle to providing cataloging and maintenance for these titles.

At the fall 1998 PCC Policy Committee meeting, the CONSER task force was encouraged to survey the broader library community. The survey, which was conducted before ALA Midwinter 1999, indicated that 71% of the 62 responding libraries want records in the OPAC to represent the full-text journals available in aggregators, and 75% are interested in purchasing record sets. Half were willing to pitch in and do some of the work to create the records. Respondents submitted twenty single-spaced pages of comments, indicating a high level of interest in this topic.

Providing access to the full text journals in aggregators made the agenda at the Big Heads (ALCTS Technical Services of Large Research Libraries) discussion group at ALA Midwinter 1999. Aggregators were also the topic of the ALCTS Catalog Management Discussion Group in Philadelphia. Before the speakers at that program could walk out of the room, Oliver Pesch of EBSCO offered his organization's participation in a demonstration record-creation and loading project.

Following that ALA Midwinter meeting, the CONSER task group moved under the aegis of the PCC SCA. A charge was prepared to reflect our SCA group's responsibility for recommending vendor record content, for demonstrating the feasibility of automated generation of record sets, and for communicating preliminary specifications to the appropriate vendors.

Since the early months of 1999, the majority of our time has been devoted to

  • developing a data element list of what should be included in an aggregator analytic record
  • developing a set of working assumptions to guide our efforts and to explain our thinking to others
  • completing a demonstration project with EBSCO
  • preparing recommendations for maintaining the accuracy of data in record sets, once a vendor makes a set available
  • meeting and discussing the issues with representatives from other vendors that produce aggregations
  • raising awareness in the library community, through writing, speaking, and correspondence, of the issues pertaining to journals in aggregator databases

Working Assumptions

The work of our task force has been guided by the following working assumptions.

  1. Aggregator analytic records need to contain sufficient fields such that they could stand alone in an OPAC as separate records, since for many serial titles the aggregator will be the only local source. (In other words, the analytic record must adequately serve in lieu of the print or other record that will be present in the OPAC of another library.)
  2. Records will need to include those fields needed for deduplication against the existing hard-copy version records in an OPAC, for those libraries concerned about avoiding multiple hits for a given title. (In other words, we need to preserve their option to perform an additional processing step upon loading a set of records, toward application of the single-record technique.)
  3. Records need to contain data elements that ensure the possibility of their partial or complete removal from the OPAC in the event of a subscription cancellation.
  4. Data in records will primarily be a subset of that found in a record for a hard-copy version. Creation of these records by deriving data from other bibliographic records, followed by necessary modifications, may be a strategy that institutions/vendors choose to follow.
  5. It will be desirable for interested libraries to obtain a single record in an OPAC, reflecting coverage by multiple aggregators through repeatable fields. It would greatly facilitate local application of the single-record technique if bibliographic utilities would collect all the coverage information onto a single aggregator analytic record. However, some record sets will be available only from a non-bibliographic utility source. Libraries may choose to consolidate 773s onto a single OPAC record, for a fuller list than will exist in a bibliographic utility. A library interested in having separate bibliographic records for every version in an OPAC could theoretically obtain such from a bibliographic utility plus individual vendors; no consolidation steps would be undertaken upon loading record sets.
  6. Loading records will involve prior local customization steps such as decisions on classification number fields, selection of a URL for the 856 field(s), deletion/suppression of irrelevant 773 fields, adding other desired fields like 655s, etc.
  7. For analytic records that libraries must create, holdings data might be too ambitious at this early stage. If holdings information happens to be supplied by a vendor at this time, an ANSI/NISO Z39.71-1999 summary statement could be placed in subfield $3 of the 856. Inclusion of holdings data should be accompanied by a plan to maintain its currency.
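Working assumption 5 leaves the consolidation of 773 fields to the loading library. As a rough illustration only, the sketch below merges analytics that share an ISSN into one record and keeps each distinct host item entry. The record layout (plain dictionaries with hypothetical `issn`, `title` and `host_items` keys) is a simplification, not a MARC structure any vendor supplies; ISSNs are assumed to be already normalized, and records without an ISSN are simply left unmerged.

```python
def consolidate_773s(records):
    """Merge analytics sharing an ISSN, keeping each distinct 773 host item.

    records: list of dicts with hypothetical keys "issn", "title",
    "host_items" (the 773 $t values). Records with no ISSN stay separate.
    """
    by_issn = {}
    result = []
    for rec in records:
        key = rec.get("issn")
        if key is None:
            # Nothing reliable to match on: keep the record as its own entry.
            result.append({**rec, "host_items": list(rec["host_items"])})
            continue
        if key not in by_issn:
            by_issn[key] = {**rec, "host_items": []}
            result.append(by_issn[key])
        for host in rec["host_items"]:
            if host not in by_issn[key]["host_items"]:
                by_issn[key]["host_items"].append(host)
    return result
```

A library keeping separate records for every version would skip this step entirely, as the assumption notes.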

Recommendations for Record Creation

A. The best in terms of quality is human-created analytics.

Advantage: Thoughtful application/update of authorized subject headings, inclusion of absent classification, etc.

Disadvantage: Production of the records doesn't go nearly fast enough when one faces an aggregator database larger than a couple hundred serial titles.

In summer 1999, the SCA and our task group wrote to OCLC encouraging OCLC to move forward energetically and quickly to meet the need for WorldCat Collection Sets for journals in publishers' and scholarly organizations' aggregations. OCLC representatives responded with an update on enhancements to WorldCat Collection Sets. As of late August 1999, sets were being maintained for Project Muse, JSTOR, Academic IDEAL, Elsevier PEAK, ARTFL, and ECO. Sets were in progress for Wilson Select, Kluwer, Springer LINK, Wiley, the American Physical Society, and netLibrary. OCLC users have indicated they would like sets for IAC Infotrac, Lexis Nexis Academic Universe, Bell & Howell ProQuest Direct and ABI Inform, and any full-text OCLC FirstSearch database. OCLC requested our task group's assistance in recruiting libraries to create record sets and/or maintain them.

B. The next best is machine-derived analytics.

Machine-derived analytics are MARC records produced by computer programs that take an existing human-created record, such as the record for the print version, as the basis of an electronic version record.

Advantage: Authorized access points (subject headings, corporate added entries) for a minimum of effort. Records stand alone well in lieu of those for the print versions that the library does not have. An example of this approach is the EBSCO demonstration project, described later in this report.

Disadvantage: Assumes availability and affordability of the necessary cataloging records for other (usually print) versions.

C. Third best is machine-generated analytics from vendors.

Machine-generated analytics are MARC records that are produced by computer programs from data elements provided by the vendor. They contain many default values and do not use existing human-created records as the basis of the electronic version records.

Advantage: Most of the desired data elements are addressed in some fashion, albeit by defaults, even in the absence of cataloging staff at the vendor.

Disadvantage: Does not provide LCSH or authority control on corporate body names. Subject headings assigned may be nonspecific, broad categorizations.

D. Fourth best is local scripting by a single institution to create minimal-level record sets from vendor-supplied title/ISSN listings.

Advantage: This at least gets something into the OPAC, and it requires minimal resources.

Disadvantage: Individual libraries must do the work themselves, and the sets they produce are difficult or impossible to share with other libraries. No subject or corporate body access is included at all. This method is least likely to support deduping and record consolidation, especially if the ISSN is not available.

E. A final option is creation of a single, combined index of serial title coverage within all the aggregator databases to which an institution subscribes. An example can be seen at the Online Journal Search page of the Virginia Commonwealth University libraries.

Advantage: This is better than nothing, in that, if one-stop shopping in the OPAC is not offered, at least there is only one additional place to look.

Disadvantage: Same as D, and in addition nothing appears in the OPAC.

Recommendation: Method A is best for aggregators containing no more than a couple hundred titles. For larger ones, our task group recommends Method B as the ideal; the remaining options are available in situations where machine-derived records are not possible.

Recommendations for Record Maintenance

An adequate maintenance strategy for access to the contents of aggregator databases must address at least the following issues:

(A) Overall Record Set Distribution/Delivery

There appear to be two main strategies.

1) Reissuance of the entire set

Advantage: All subscribers are given the same thing at the same time; each record carries its date of issuance in field 005; after a periodic reload into the OPAC, all records remaining with a prior date are deleted; the method seems well suited to institutions loading separate records; and the workflow may lend itself to a fairly automatic maintenance routine.

Disadvantage: Large amount of data to distribute and process each time; daunting to consolidate data over and over again.
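A minimal sketch of the reload-and-purge routine in option 1, assuming each loaded record exposes its 005 value and a set identifier taken from the set's 035 flag. The dictionary keys and the set identifier are hypothetical.

```python
def purge_stale(opac, set_id, issue_date):
    """Return the OPAC records that survive a full reload of one record set.

    opac: list of dicts with hypothetical keys "set_id" and "005".
    issue_date: "yyyymmdd" date of the newly loaded issuance.
    """
    kept = []
    for rec in opac:
        # Field 005 begins with yyyymmdd, so its first 8 characters sort as dates.
        stale = rec["set_id"] == set_id and rec["005"][:8] < issue_date
        if not stale:
            kept.append(rec)
    return kept
```

Records belonging to other sets, or created locally, are never touched, which is why the set identifier matters as much as the date.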

2) Distribution (after the initial base set) merely of records flagged New, Changed, or Deleted

Advantage: There is a smaller amount of data to distribute each time; for those institutions committed to consolidating data, attention is drawn only to the records needing maintenance/loading with each batch.

Disadvantage: Risk of a botched distribution or load effort. If something goes wrong, the library might have to start all over, just as with a loose-leaf volume whose insert pages are processed incorrectly even one time.
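Option 2 can be sketched the same way. This hypothetical routine keys the OPAC on a vendor record ID and reads the New/Changed/Deleted flag from Leader/05 (n, c, d). It relies entirely on stable vendor IDs, which is exactly the dependency the task group doubts in its recommendation.

```python
def apply_batch(opac, batch):
    """Apply one incremental batch to an OPAC held as {vendor_id: record}.

    Each record is a dict with hypothetical keys "id" and "leader";
    Leader/05 carries the record status: n (new), c (changed), d (deleted).
    """
    for rec in batch:
        status = rec["leader"][5]
        if status == "d":
            opac.pop(rec["id"], None)   # ignore deletes for IDs never loaded
        elif status in ("n", "c"):
            opac[rec["id"]] = rec       # add, or overlay the prior version
        else:
            raise ValueError(f"unexpected Leader/05 value {status!r} for {rec['id']}")
    return opac
```

Nothing in the routine detects a skipped or misapplied batch; the OPAC simply drifts out of step with the vendor's file.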

(B) Added & Dropped Titles

For titles newly added, the records need to be derived or generated in the same fashion as the records in previously-issued sets. For vendors accustomed to deriving the records from existing cataloging records, newly-published titles that are aggregated immediately upon issuance may pose a challenge for timely cataloging. For titles being dropped, a distinction needs to be made between those being retrospectively revoked (bibliographic record needs removal from OPAC) and those whose future volumes will not be forthcoming (summary holdings statement needs closing out). The former could be designated with a Leader/05 "d" (record status "deleted"), and the latter with "c" (for "corrected or revised").
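The distinction between the two kinds of dropped titles can be expressed as a small, hypothetical routine. It assumes the summary holdings statement is kept as an open-ended string (for example `v. 1 (1996)-`) and that the last issue covered is known when coverage is frozen; neither the record layout nor the sample values come from any vendor.

```python
def set_leader05(leader, status):
    """Return a copy of the 24-character Leader with position 05 replaced."""
    if status not in ("n", "c", "d"):
        raise ValueError(f"unsupported record status {status!r}")
    return leader[:5] + status + leader[6:]


def drop_title(record, retrospective, last_issue=None):
    """Mark a dropped title without modifying the input record.

    retrospective=True: access to all volumes is revoked, so the record
    must leave the OPAC (Leader/05 = d).
    retrospective=False: no future volumes will be added, so the holdings
    statement is closed out and the record is flagged as revised (c).
    """
    updated = dict(record)
    if retrospective:
        updated["leader"] = set_leader05(record["leader"], "d")
        return updated
    if last_issue is None:
        raise ValueError("closing the holdings statement needs the last issue covered")
    if updated["holdings"].endswith("-"):
        updated["holdings"] += last_issue
    updated["leader"] = set_leader05(record["leader"], "c")
    return updated
```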

(C) Changes in the Volume Coverage for Each Title

As the range of volume coverage changes, particularly with those titles being prospectively frozen or extended farther back in time, it is important that the analytic records promptly reflect the changes in the summary holdings statement. Catalog users must be able to see differences in volume coverage when they exist among multi-aggregator coverage of a given title. The same Leader/05 coding cited in (B) would apply here.

(D) The Completeness of the Content for the Volumes Covered

When less than all the print version content (all the articles, all the illustrations, etc.) is available in the aggregator database, this should be noted with the summary holdings statement. Whenever the generalization changes, that information should be modified in the aggregator analytic records.

(E) Currency of the URLs in the Aggregator Analytic Records

Maintenance should be done at least as often on these records as for other bibliographic records residing in an OPAC. The maintenance represents much more of an issue when the URL takes a user directly to the title.

(F) Creation of New Records for the Conventional Changes of Title for Serials Included in an Aggregator Database

The processing of serial title changes is just as necessary in this setting as it is for holders of the print versions. Consideration should be given to extending to vendors any existing title-change notification service used in the library world (some changes in title wording that necessitate a new record are admittedly subtle and easily overlooked). Libraries with sufficient authorization to process title changes on the master versions of bibliographic records in a utility ought to be encouraged to give the same treatment to any aggregator analytics there that need it.

(G) Cancellation/Change of a Subscription Necessitating Complete Removal of a Record Set

The flag in bibliographic records designed to identify all the records within a set of aggregator analytics should remain stable throughout the lifetime of the set. If a change in a vendor's name or product is allowed to affect that flag, it may impair a subscriber's ability to remove the entire record set. If such modernization is imperative, the customization options offered to subscribers should include the possibility of perpetuating an additional, old-style flag.
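Removing a set on cancellation then reduces to a prefix match on field 035. In this sketch, `(XxVEN)ASE` and `(XxOLD)ASE` are invented placeholders (not real MARC organization codes) standing for a current flag and a perpetuated old-style flag; records are hypothetical dictionaries holding a list of 035 values.

```python
def remove_set(records, flags):
    """Drop every record whose 035 values begin with any of the set flags.

    Passing both the current flag and a perpetuated old-style flag lets a
    library remove the whole set even after the vendor renames its product.
    """
    def in_set(rec):
        return any(value.startswith(flag)
                   for value in rec.get("035", [])
                   for flag in flags)

    return [rec for rec in records if not in_set(rec)]
```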

Recommendation: The records in any set need to be maintained along all the lines above. Our task group recommends distribution of the entire record set each time the OPAC is to be updated (A1) as the simpler, safer method of maintenance. We believe this method is safer because we have our doubts about the permanence of vendor record ID numbers that could be used as the basis of repetitive overlay.

Proposed Data Elements--Machine Derived Records

(These fields are taken from a cataloging record for another version of the title.)

All Leader and 006/007/008 bytes as appropriate

001 Control number
003 Control number identifier
022 International Standard Serial Number
035 System control number(s)

1XX Main entry
240 Uniform title
245 Title statement (insert $h)
246 Varying form of title
250 Edition statement
260 Publication, etc. (Imprint)
310 Current publication frequency
362 Dates of pub., vol. designation
4XX Series statement
5XX Notes
6XX Subject added entries
700-730 Name/title added entries
773 Host item entry
780/785 Preceding/Succeeding entry
8XX Series added entries
856 Electronic location and access ($3, $u only)

This list grew out of the Core record requirement codes on the CONSER Web site.
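A derivation program might begin by filtering the print-version record down to the tags listed above. This is an assumption-laden illustration: fields are `(tag, data)` pairs, the 510 and 655 exclusions come from the report's comments on record content, and 856 is trimmed to $3 and $u in that order. It does not add the $h, 773, or vendor URL that a real derivation program would supply.

```python
# Tags carried over from the print-version record as listed.
EXACT_TAGS = {"001", "003", "006", "007", "008", "022", "035",
              "240", "245", "246", "250", "260", "310", "362",
              "773", "780", "785", "856"}
# Whole blocks carried over: 1XX, 4XX, 5XX, 6XX, 8XX.
BLOCKS = {"1", "4", "5", "6", "8"}
# Omitted per the comments on record content.
EXCLUDED = {"510", "655"}


def keep_tag(tag):
    if tag in EXCLUDED:
        return False
    return tag in EXACT_TAGS or tag[0] in BLOCKS or "700" <= tag <= "730"


def derive_fields(print_fields):
    """Return the subset of print-record fields used for a derived analytic."""
    derived = []
    for tag, data in print_fields:
        if not keep_tag(tag):
            continue
        if tag == "856":
            # Keep only $3 and $u, and emit them in that order.
            data = [(code, value) for wanted in ("3", "u")
                    for code, value in data if code == wanted]
        derived.append((tag, data))
    return derived
```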

Proposed Data Elements--Machine Generated Records

(For use when cataloging records are unavailable to consult)

All Leader and 006/007/008 bytes as appropriate

< Leader default:
byte 05 n or c or d -- depending on whether the record is new or corrected, or is to be deleted
byte 06 a -- for language material
byte 07 s -- for serial
byte 17 z -- for not applicable
byte 18 u -- for unknown conformance to AACR2 rules
byte 20 4 -- for length of the length-of-field portion of entry map
byte 21 5 -- for length of the starting-character-position portion of entry map
byte 22 0 -- for length of the implementation-defined portion
byte 23 0 -- for undefined entry map character position
(all other bytes would default to "blank")>
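Taken together, the Leader defaults amount to a fixed 24-character string in which only Leader/05 varies. The sketch below builds it. The structural positions the list does not mention (00-04 record length, 10-11 indicator and subfield code counts, 12-16 base address of data) are filled with the standard MARC 21 values or zero placeholders, on the assumption that whatever program writes the MARC file recomputes the lengths.

```python
def default_leader(status="n"):
    """Build the proposed default Leader for a machine-generated analytic.

    status: Leader/05, n (new), c (corrected or revised) or d (deleted).
    """
    if status not in ("n", "c", "d"):
        raise ValueError(f"unsupported record status {status!r}")
    b = [" "] * 24
    b[0:5] = "00000"     # 00-04 record length: placeholder, set by the MARC writer
    b[5] = status        # 05 record status
    b[6] = "a"           # 06 language material
    b[7] = "s"           # 07 serial
    b[10:12] = "22"      # 10-11 indicator count and subfield code count (always 2)
    b[12:17] = "00000"   # 12-16 base address of data: placeholder
    b[17] = "z"          # 17 encoding level: not applicable
    b[18] = "u"          # 18 descriptive cataloging form: unknown
    b[20:24] = "4500"    # 20-23 entry map
    return "".join(b)
```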

< 006 default:
006/00 m - for computer file
006/09 d - for document
(all other bytes would default to "blank")>

< 007 default:
007/00 c - for computer file as a category of material
007/01 r - for "remote" as a specific material designation
007/02 'blank' since that byte is no longer used for anything
007/03 a - for one color (black-and-white counts as one color)
007/04 n - for not applicable in the case of remote resources
007/05 u - for unknown sound content in the resource>
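The 006 and 007 defaults are equally mechanical. A short sketch, assuming the 18-character 006 and the six 007 positions listed here are all that a generated record needs:

```python
def default_006():
    """18-character 006 for a computer file that is a document."""
    b = [" "] * 18
    b[0] = "m"   # 006/00 form of material: computer file
    b[9] = "d"   # 006/09 type of computer file: document
    return "".join(b)


def default_007():
    """007/00-05 defaults for a remote computer file."""
    return "".join([
        "c",     # 00 category of material: computer file
        "r",     # 01 specific material designation: remote
        " ",     # 02 undefined
        "a",     # 03 color: one color
        "n",     # 04 dimensions: not applicable
        "u",     # 05 sound: unknown
    ])
```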

< 008 default:
bytes 00-05 yymmdd -- for date record created
byte 06 c or d or u -- for continuing or dead, according to serial's publication status; status would be u for unknown if the holdings statement does not reach into the most recent 3 years
bytes 07-10 -- year the serial began publication (not first year of full-text availability), copied from field 362
bytes 11-14 -- year the serial ceased publication, or "9999" if open-ended
bytes 15-17 xx_ - for unknown place of publication
byte 18 u -- for unknown frequency
byte 19 u -- for unknown regularity
byte 20 'blank' -- for agency assigning the ISSN
byte 21 p -- for periodical as type of serial
byte 22 'blank' -- for form of original item
byte 23 'blank' -- for form of item
byte 24 'blank' -- for nature of original work
bytes 25-27 'blanks' -- for nature of contents
byte 28 u -- for unknown if a government publication
byte 29 zero -- for not a conference publication
bytes 30-32 'blanks' -- for undefined
byte 33 'blank' -- for original alphabet/script code
byte 34 zero -- for successive entry record
bytes 35-37 language code -- first 3 letters of the English name of the language. Exception: use 'jpn' for Japanese
byte 38 'blank' -- for a (non)modified record
byte 39 d -- for non-LC source of cataloging record>
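Most 008 positions are constants; only the creation date, publication status, dates, and language vary. The following sketch assumes that the report's rule for status u means the last year of full-text coverage is earlier than two years before the record's creation year (an interpretation), and it applies the report's three-letter language shortcut literally. That shortcut is only a heuristic: the MARC Code List for Languages is authoritative and has exceptions besides Japanese (Irish, for instance, is `gle`).

```python
import datetime


def default_008(created, start_year, language, last_covered_year,
                ceased_year=None):
    """Build the proposed 40-character 008 for a machine-generated serial record.

    created: datetime.date the record is created (008/00-05).
    start_year: year the serial began publication, from field 362.
    language: English name of the language, e.g. "English".
    last_covered_year: latest year of full-text coverage in the aggregator.
    ceased_year: year of cessation, if the serial is known to be dead.
    """
    if ceased_year is not None:
        status, end = "d", str(ceased_year)
    elif last_covered_year >= created.year - 2:
        status, end = "c", "9999"      # coverage reaches the most recent 3 years
    else:
        status, end = "u", "9999"      # status unknown: coverage is not recent
    name = language.lower()
    lang = "jpn" if name == "japanese" else name[:3]   # the report's shortcut
    field = (created.strftime("%y%m%d")   # 00-05 date entered
             + status                     # 06 publication status
             + str(start_year)            # 07-10 beginning date
             + end                        # 11-14 ending date
             + "xx "                      # 15-17 place unknown
             + "uu"                       # 18-19 frequency, regularity unknown
             + " "                        # 20 ISSN center
             + "p"                        # 21 periodical
             + " " * 6                    # 22-27 form, nature of work/contents
             + "u0"                       # 28 gov. pub. unknown; 29 not a conference
             + " " * 4                    # 30-32 undefined; 33 alphabet/script
             + "0"                        # 34 successive entry
             + lang                       # 35-37 language
             + " d")                      # 38 not modified; 39 other source
    if len(field) != 40:
        raise ValueError(f"008 must be 40 characters, got {len(field)}")
    return field
```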

001 Control number
<vendor's control number, if any>
003 Control number identifier
<USMARC organization code for the vendor>
022 International Standard Serial Number
035 System control number(s) (USMARC org. code is parenthesized at beginning of the field)
<vendor file ID (portion of this field) would be the key to record removal if the subscription to the aggregator database is canceled>

1XX Main entry
<This field would probably be used much less often when records are not being derived. If the vendor's brief listing of titles gave the body name and generic title, separated by a period, this field could be used.>
245 00 Title statement (insert $h)
<Omit initial articles so that titles index correctly, and set both indicators to zero.>
250 Edition statement
<Might be applicable if the online version corresponds to only one audience or geographic edition>
260 Publication, etc. (Imprint)
<reflects publisher of original version; publication date ($c) can be omitted>
310 Current publication frequency
362 Dates of pub., vol. designation
<reflects facts of original publication, not range of volumes covered in aggregator>
4XX Series statement
<if applicable>
500 General note
<Standard wording: Record generated from vendor title list.>
506 Restrictions on access note
<if applicable: "Access limited to licensed institutions.">
516 Type of computer file or data note
<"Text (electronic journal)">
530 Additional physical form available note
<if applicable: "Online version of print publication.">
538 System details note
<"Mode of access: Internet.">
653 Index term--Uncontrolled
<would be used at vendor's discretion; probably would reflect a broad subject categorization>
720 Added entry--Uncontrolled name
<1st indicator should default to "2" for non-personal name>
773 Host item entry
<Sample: 773 0_ $t Title of aggregation $d Place of publication : Publisher of aggregation, date- $x ISSN of aggregation as a whole>
780/785 Preceding/Succeeding entry
<when known to vendor>
856 Electronic location and access ($3, $u, $z only)
<$3 to represent volumes covered within the aggregation.
$z at end of field to contain user instructions. Examples:
$z Available via ProQuest Direct. Search for this journal by title.
$z Consult "field by field instructions" to qualify a search by publication. [for Lexis-Nexis]
$z Search "Publications by title" in Dow Jones Publication Library.>
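For vendors without cataloging records, several of these variable fields can be produced directly from a title-list row. This sketch covers 245 (with initial articles dropped and both indicators 0), the standard 500 and 538 notes, a brief 773, and 856 with $3, $u and $z. It assumes a deliberately short English-only article list, borrows the `[computer file]` wording from EBSCO's demonstration records for $h, codes the 856 indicators as 4 (HTTP) and 0 (resource itself), and writes fields as display strings rather than binary MARC. The sample title and URL are hypothetical.

```python
# Assumption: a short English-only list; a real program would need more.
INITIAL_ARTICLES = ("the ", "a ", "an ")


def strip_initial_article(title):
    """Drop a leading English article so the title files correctly."""
    lowered = title.lower()
    for article in INITIAL_ARTICLES:
        if lowered.startswith(article):
            rest = title[len(article):]
            return rest[:1].upper() + rest[1:]
    return title


def generated_fields(title, aggregation, coverage, url, instructions):
    """Return display-form variable fields for one vendor title-list row."""
    return [
        f"245 00 $a {strip_initial_article(title)} $h [computer file]",
        "500 __ $a Record generated from vendor title list.",
        "538 __ $a Mode of access: Internet.",
        f"773 0_ $t {aggregation}",
        f"856 40 $3 {coverage} $u {url} $z {instructions}",
    ]
```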

Comments on Record Content

  1. The Bibliographic level (Leader/07) of the aggregator analytics will contain value "s." [This reversal from the Interim Report stems from the need to do searches like "find Title X and format serial" and the fact that at least two ILS systems do not index Leader/07 "b" (serial component part) properly.] The ideal is value "b", for use when proper indexing becomes available in ILS systems.
  2. The Encoding level (Leader/17) of these experimental records could initially be set at "5," for "partial/preliminary."
  3. An examination of the deduping potential of the test set of records from EBSCO showed the robustness of the ISSN as a matching tool. One usually thinks of the virtue of a standard number search as retrieving a minimum quantity of records. The ISSNs present in 022 $a noticeably overcame the differences between print and microformat (as the means the library is using to permanently retain the serial) and the differences between latest and successive entry records (selected as most convenient for covering an older run).
  4. Field 245 needs to include $h to alert user that item is an electronic journal. Field 130 is too problematic to create across-the-board in records.
  5. Field 260 would contain the publisher of the original version. This would identify the serial; it would also be practical in that the analytic record would need to represent potential coverage by multiple aggregators.
  6. Field 362 would represent the facts of publication for the original version.
  7. Fields 5XX/6XX/7XX would be the same as in the print version record. No special note or added entry would be included for the aggregator(s); the 510 fields would be omitted. Field 655 would not be used.
  8. Field 856 would include only subfields $3 and $u, in that order.
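Comment 3 points to the ISSN as the matching key for deduplication. A hedged sketch of that step: normalize the 022 $a value, verify the ISSN check digit (weights 8 down to 2 on the first seven digits, modulus 11, with X standing for 10), and return every OPAC record carrying the same ISSN. The record dictionaries and sample ISSNs are illustrative only.

```python
def normalize_issn(raw):
    """Return an ISSN as NNNN-NNNC if its check digit is valid, else None."""
    s = (raw or "").replace("-", "").strip().upper()
    if len(s) != 8 or not s[:7].isdigit() or not (s[7].isdigit() or s[7] == "X"):
        return None
    # Weights 8..2 on the first seven digits; check = (11 - sum mod 11) mod 11.
    total = sum(int(d) * w for d, w in zip(s[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return f"{s[:4]}-{s[4:]}" if s[7] == expected else None


def find_matches(analytic_issn, opac):
    """Return OPAC records whose 022 $a carries the analytic's ISSN."""
    key = normalize_issn(analytic_issn)
    if key is None:
        return []
    return [rec for rec in opac if normalize_issn(rec.get("issn")) == key]
```

Because both print and microform records for a title normally carry the same ISSN, a single match can surface several holdings records, which is the behavior the EBSCO test set showed.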

Demonstration Project and Examples

Oliver Pesch, our task force's liaison from EBSCO, has been working with us since early 1999. He derived a set of records for us experimentally based on the task group's instructions. The data in these records is a subset of data from the corresponding records for the print journal found in the CONSER database. The vendor's program also adds some fields to the record.

The example (Figure 1) follows closely the model for data elements we have presented in this report. EBSCO began offering the set of some 1100 records to its customers just before ALA Annual 1999. As of fall 1999, 18 institutions had requested the file, and EBSCO has made a number of changes to enhance its program for deriving records, based on comments and requests.

In December 1999, California State University, Northridge successfully loaded the record set into its OPAC (Geac) and reportedly is pleased with the results. The library opted to de-dup via ISSN and it placed EBSCO's URLs on its print version records.

EBSCO's machine-derivation program retains 1XX, 245, 260, and 362 from the print record. Field 245 has $h [computer file] inserted after $a, $n, $p but before all other subfields. The machine-derivation program constructs a 773 (host item entry) field to provide information about the host title (for this set, Academic Search Elite), publication data, and the ISSN of the set. The program also constructs field 856 subfields $3 and $u, in that order, to record the materials specified and the URL.
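The $h placement and the constructed 773 and 856 fields can be sketched as follows. This is not EBSCO's code: it reads "after $a, $n, $p" as "after the leading run of $a, $n and $p subfields", does not move ISBD punctuation, and represents subfields as `(code, value)` pairs. All sample values are hypothetical.

```python
def insert_gmd(subfields, gmd="[computer file]"):
    """Insert $h after the leading $a/$n/$p subfields of a 245.

    subfields: list of (code, value) pairs. ISBD punctuation is not adjusted.
    """
    i = 0
    while i < len(subfields) and subfields[i][0] in ("a", "n", "p"):
        i += 1
    return subfields[:i] + [("h", gmd)] + subfields[i:]


def host_and_link(host_title, host_pub, host_issn, coverage, url):
    """Build a 773 for the aggregation and an 856 with $3 then $u."""
    return [
        ("773", [("t", host_title), ("d", host_pub), ("x", host_issn)]),
        ("856", [("3", coverage), ("u", url)]),
    ]
```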

Figure 1. EBSCO Machine-derived record, Journal of Urban Design

The task force discussed having the program construct a 130 (uniform title) field. Public services staff and users have said they like having the title qualified by (Online) because it helps them pick out the electronic version in an index display in the OPAC. In many cases, creating a uniform title automatically would be relatively straightforward: the title has no initial articles and does not conflict with the title of any other serial.

When the 130 field already has a qualifier, or when a record has no 130 field, writing a program to create or revise a 130 field would not be a trivial exercise. Therefore, in the interest of helping vendors to get record sets ready and available to the library community quickly, our task force put the requirement for a 130 field aside.

At the ALA Annual meeting in New Orleans, members of our task group met with representatives of Bell & Howell and CIS, producers of ProQuest Direct and Lexis Nexis Academic Universe, to discuss the possibility of those organizations' creating sets of records for the full text journals in their aggregator databases. In preparation for those meetings, we defined the possible record content (described previously in this report) for these vendors, who may not have access to bibliographic records from which to derive their analytic records. Figure 2 is an example of what a machine-generated record for an Academic Universe title might look like, if our task group's proposed data elements were used to create the record.

Figure 2. Potential Machine-Generated Analytic for Academic Universe, Using Task Group's Proposed Data Elements

Related Projects

The task force investigated other developments of interest at the University of Tennessee at Knoxville (UTK), the University of Illinois at Chicago, and OCLC.

David Atkins and Bill Britten at UTK were kind enough to provide our task force with detailed information about their projects to bridge the gap between citation databases and journal holdings. One of the aggregations they have treated is Dow Jones (4270 titles); another is ProQuest (1500 titles). UTK staff began by harvesting data from the vendor's Web site (lists of titles with ISSNs and coverage dates). They wrote Perl scripts to massage the vendors' lists of full-text journals, then ran the resulting text file through a utility called MarcMakr. This step created MARC records that they imported into their OPAC. Figure 3 provides an example of the public display of one of these records.

Figure 3. Example of UTK Machine-Generated Aggregator Record

Before creating the MARC records with MarcMakr, UTK staff did preliminary work to define tags for storing the data. UTK puts the ISSN, which they need for their hook to holdings, in subfield $9 of the 022 field. They add $h [electronic fulltext] to the title and store it in field 245. Field 506 stores a note about access restrictions; 856 $u stores the URL and $z stores public notes; field 945 contains service/vendor information, dates of coverage, lag time, more access notes, and a control number for the entire set (this serves as the hook to delete the whole set globally). UTK has been able to achieve astonishing turnaround times for providing title-level access to full text and holdings information in their catalog using this technique.
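UTK's pipeline (harvest a title list, massage it with scripts, convert it with MarcMakr) can be approximated in Python. The sketch reads a hypothetical tab-delimited list of title, ISSN, coverage and URL and writes lines in the `=TAG  indicators$subfields` mnemonic style associated with LC's MARCMaker, using a backslash for each blank indicator. That input syntax, the placeholder Leader, and the 945 subfield codes are assumptions to check against the tool's documentation; this is not a description of UTK's actual scripts.

```python
import csv
import io

LEADER = "00000nas  2200000zu 4500"   # placeholder Leader; an assumption


def vendor_list_to_mnemonic(tsv_text, set_id):
    """Convert a tab-delimited title list into mnemonic MARC text.

    Each row: title, ISSN, coverage, URL. Field choices follow the UTK
    description (ISSN in 022 $9, $h [electronic fulltext] in 245, a set
    control number in 945); the 945 subfield codes are hypothetical.
    """
    out = []
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                      quoting=csv.QUOTE_NONE)
    for row in rows:
        if not row:
            continue                       # skip blank lines
        title, issn, coverage, url = row
        out.append(f"=LDR  {LEADER}")
        out.append(f"=022  \\\\$9{issn}")
        out.append(f"=245  00$a{title}$h[electronic fulltext]")
        out.append(f"=856  40$u{url}")
        out.append(f"=945  \\\\$a{set_id}$b{coverage}")
        out.append("")                     # blank line between records
    return "\n".join(out)
```

Keeping the set control number on every record is what later allows the whole set to be deleted globally.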

The task force also contacted Karen Zuidema at the University of Illinois at Chicago. Staff there are working on H.W. Wilson Select full text for OCLC's WorldCat Collection Sets project. Zuidema and her staff are creating records in OCLC using a workform and constant data records. OCLC will later send the library a tape of the set for loading. Figure 4 provides an example. The constant data form they are using contains prompts for 022 $y, 043, 110 & 130, 245, 260, 530, 650, 710, 773, 776, 785 (succeeding entry) and 856.

Figure 4. Manually-Created Analytic from University of Illinois at Chicago Initiative

Communications with the Library Community

There has been much interest and debate about providing access to journals in aggregator databases at library professional meetings. Numerous programs have been devoted to the topic. In keeping with our charge, members of our group have spoken and written widely about the issues and the work of our task group. Task force members have made presentations at ALA, NASIG, the American Association of Law Libraries, and elsewhere. In addition, members have maintained a lively correspondence with dozens of inquirers via e-mail. Two articles by task force members are forthcoming in Cataloging & Classification Quarterly and the ALCTS Newsletter "From Catalog to Gateway" series.

Next Steps

A number of things need to be undertaken following the work of our task group. At the November 1999 meeting of the PCC Policy Committee, the SCA chair reported on the work of our task group, noting that our initial charge is nearly complete. She asked if the task group should continue to exist and if our recommendations should be built into the PCC tactical plan. The answer to both questions was yes. Our task group was asked to prepare this final report for submission to the SCA and the Policy Committee. After that, a new charge is to be written and membership reconstituted in order to:

  1. Continue to pursue creation of sets of records for major aggregations like ProQuest and Lexis-Nexis Academic Universe.
  2. Make a list of desirable sets of human-created analytics and recruit WorldCat Collection Sets contributors from among the OCLC membership.
  3. Fully test the feasibility of loading and maintenance arrangements for one or two sets of records. To begin, conduct a survey of the libraries that have requested the EBSCO set, to get their views on loading and maintenance needs. Include questions about loading and maintenance using both the single and separate record approaches.
  4. Determine if the same specifications for serials are suitable for full-text monographs in aggregator databases, e.g. netLibrary.
  5. Ascertain more systematically what others are doing to provide access to the contents of aggregators.
  6. Mainstream bibliographic control measures for aggregators into the PCC tactical plan.
  7. Continue raising awareness in the library community of issues pertaining to journals in aggregator databases.
  8. Continue to monitor related developments pertaining to access to journals in aggregator databases.

The PCC tactical plan now contains item 1.4.4: "Encourage vendors, utilities, and PCC members to develop record sets for full text titles in aggregator databases." As this report is being completed, the new task group charge is being prepared. John Riemer will continue as task group chair.

Prepared by J. Riemer and K. Calhoun, January 2000
