Library of Congress

Program for Cooperative Cataloging

Final Report

December 29, 2000,
approved by the SCA, January 2001

Task Group members: Kyle Banerjee, Oregon State University; Matthew Beacom, Yale University; Martin Kurth, Cornell University; Louise Ratliff, University of California, Los Angeles; Gary L. Strawn, Northwestern University, Chair; Diane Vizine-Goetz, OCLC, Inc.; David Williamson, Library of Congress

Summary

Call numbers are the single most important device for the arrangement of library materials. Classification numbers are one of the primary tools for providing subject access to collections, and are one means by which access can be provided to networked resources. Although many areas of library operations have been automated successfully, automation has yet to have the effect on the assignment of classification and call numbers it has had on other aspects of the cataloging process. Because work with classification and call numbers remains largely a manual operation, it continues to constitute a significant part of the cost of cataloging.

Substantial opportunities exist for library software vendors to enhance their products to assist with the generation of classification and call numbers. Some of these opportunities call for comparatively small changes to existing modules, some call for the integration of existing features into a more seamless whole, and some call for extensive new development, perhaps in cooperation with other vendors. The following list identifies the major functional areas we feel need to be addressed if library software is to provide effective assistance in the generation of call numbers. The provision of any of these would be a step forward; the incorporation of all of these into a single tool would be a major advance in library automation.

  • Classification schemes commonly used by libraries should be made available in the MARC format
  • Applications that enable searching and display of machine-readable classification schemes should not be solely stand-alone programs or services, but should also provide an interface that allows an external application to query the classification data to retrieve information in a useful manner, to present the results of the query to the user and otherwise to interpret and act on the results of the query
  • Library systems should be able to generate the following products from the catalog against which call number assignment is performed and present them to the operator: a list of the classification numbers most frequently associated with a subject heading, a list of the subject headings most frequently associated with a classification number, and a list of the call numbers assigned to other editions or versions of the item being cataloged
  • Library systems should be able to determine that a call number is unique in the local catalog
  • Library systems should be able to generate complete call numbers for members of classed-together series, drawing information from the authority record for the series and the bibliographic record for the analytic
  • Library systems should be able to derive complete call numbers for belletristic works from the classification numbers found in authority records for literary authors
  • Library systems should assist in the assignment of internal and final Cutter numbers, drawing on information in a bibliographic record and information (such as subject headings in other records) present in the catalog against which call number assignment is performed
  • Library systems should provide some means for identifying local exceptions to classification practice (based on criteria such as type of material, location, and classification number), and should provide some means for accommodating those exceptions
  • System call number indexes should be sorted in the proper order

The attached report provides additional background, descriptions, justification, and a scheme for implementation.

Background

Call numbers are one of the two principal methods by which libraries provide subject access to materials, and are the primary means for locating items on shelves. A user who identifies a particular item of interest in the catalog uses the call number directly to retrieve the item from the shelf; the call number provides the link between the surrogate in the catalog and the item described. Classification numbers make manifest the organization of knowledge embodied in a classification scheme, thereby bringing together materials on the same and related topics. A user with a general information need can find one or more likely classification numbers in the catalog and use them to browse the shelves for useful items.

The value of classification is not limited to the physical items controlled directly by an individual library, but has been extended to networked electronic resources. Classification numbers can be assigned to surrogates for Web resources (either maintained separately in a database, or embedded in the resources as part of the metadata) to enable more focused searching than is possible with mere keyword retrieval; and classification numbers can be used to organize the results of a search for networked resources, providing a more efficient display for the user.[1]

Nearly every facet of library technical services has been redesigned to take advantage of the benefits afforded by automation. The use of library utilities such as OCLC has reduced the amount of time it takes to make an accurate and complete record available in the local catalog. Acquisitions functions such as ordering, billing and claiming are handled electronically. Tools such as Cataloger's Desktop that allow rapid searching of a wide variety of resources help catalogers create records quickly and efficiently. Automated assistance has even been made available for the creation of authority records, a task which in its current realization would have been beyond the capabilities of systems in use a decade ago. All this automation has not only made possible the generation of records of more consistently high quality and at lower cost, but also increased the sharing of these records. The savings in staff time coupled with increases in quality together demonstrate the advantages to be gained from the use of sophisticated automated routines in library processing operations.

At most institutions, the assignment of classification and call numbers[2] is a process not markedly different from that followed thirty years ago. Because automated assistance has not yet been brought to these activities, today's typical cataloger must follow a sequence of operations changed from work performed thirty years ago only to the extent that the local catalog and call number list are available at the desktop, and not in another room or on another floor. The local library system at present does nothing to speed or ease the work involved other than to make records that were once held on cards available in electronic form.

Because subject analysis remains primarily a manual operation, largely untouched by advances in automation, the cost of producing call numbers has held steady while the cost of other parts of cataloging has declined; working with call numbers forms an increasingly large part of the cost of processing an item. The most important reason for this is that the processes for generating call numbers are complex and have traditionally required multiple tools that were available in paper format only. To complicate matters, many libraries establish local policies for call numbers, making it difficult for vendors to develop general tools that could be used by many libraries.

This picture is about to change. The recent availability of some classification schemes in some kind of machine-readable form, the completion of at least the bulk of retrospective conversion at many institutions (providing a suitable basis for local decisions) and continued advances in the sophistication of automation included in local library systems all point to the possibility that the era of the manually-assigned call number may be near its end. The time seems suitable for libraries to present to library automation vendors a scheme to provide automated assistance for work with classification and call numbers.

At its meeting during the 1999 ALA Annual Conference in New Orleans, the Standing Committee on Automation (SCA) of the Program for Cooperative Cataloging (PCC) approved the formation of a task group to study the possibility of automated approaches to the assignment of classification and call numbers. A charge for the group was prepared by Karen Calhoun, Chair of SCA, and approved by the PCC Board at its November, 1999 meeting. The basic idea motivating this group is that a coordinated plan for providing automated assistance with classification and call numbers would benefit both library system vendors and librarians.

The task group met as a body once, during the 2000 ALA Midwinter Meeting in San Antonio. At that meeting, a general plan for action was developed, and assignments made to members of the group. The remainder of the group's work took place via e-mail exchanges. The present document represents the task group's summary of its activities and conclusions. The task group hopes that this document will help libraries and vendors both understand what may be involved in the provision of automated assistance in call number assignment, and provide a common framework under which projects can be undertaken and evaluated.

Survey of previous efforts

As far as is known there now exists in production no comprehensive tool that draws on information held in a local library system to assist in the assignment of classification and call numbers. However, a number of projects undertaken at various institutions over the past decade have attacked one or more components of the process. The following paragraphs describe several of what we feel to be significant or notable instances of the automation of one part or another of the process of assigning call numbers. This list is not intended to be comprehensive, but simply an indication of the kind of work that has been done to advantage at various institutions.

  • WebDewey. A browser-based version of the Dewey Decimal Classification scheme (schedules, tables, manual, index, built numbers from the index), enhanced with LCSH headings mapped (either editorially or statistically) to DDC numbers, and the LCSH authority records for the linked headings. Currently accessed through the CORC project home page, WebDewey can propose Dewey classification numbers for Web pages as they are harvested for cataloging in CORC; it does this by scanning the text of the Web page for appropriate grammatical constructions, and linking the keywords they contain to the Dewey classification index. WebDewey is also useful in non-Dewey libraries: the LCSH headings associated with the proposed Dewey numbers can be a useful starting point for libraries that use other classification schemes. WebDewey also performs the number-building functions available in the CD-ROM Dewey for Windows product (which WebDewey is expected to replace in the next several years). WebDewey (like Dewey for Windows) does not interact with the local library system. It builds on technology developed at OCLC for Project Scorpion.
  • DESIRE Project [URL http://www.desire.org not working as of July 5, 2005].[3] The DESIRE Project examined the construction of tools to build Internet search services. Among these is a tool for automatically assigning classification numbers to Web documents in engineering.
  • Web version of Library of Congress classification. This tool is not yet in production; it was only very recently made available for review outside the Library of Congress. It uses a hierarchical browse interface to identify Library of Congress classification numbers. As the user drills down through the classification hierarchy, the interface builds the corresponding classification number. At present the tool is a stand-alone application that does not interact with LC's online system (or any local system). This interface is seen as being of special advantage to those not familiar with a given area of the LC schedule, and for copy catalogers verifying a number found in an existing record. Naturally, this tool draws on only those parts of the LC classification that have been converted to machine-readable form.[4]
  • Assisted assignment of Cutter numbers. Several tools have been developed at various institutions to assist catalogers in the assignment of Cutter numbers and the completion of call numbers. Most of these are stand-alone applications that simply look up the Cutter number in a table, or (for the LC classification) use an algorithm to derive an approximate Cutter number. The tool developed at Oregon State University takes this notion one step further. Using this tool, the cataloger stores the classification number in the Windows clipboard and types the first few letters of the main entry into a dialog box. The application calculates the Cutter number and adjusts it to conform to local practice. When the operator presses a function key, the program performs a call number search in the local catalog (combining the stored classification number and the derived Cutter number). The operator completes work on the call number without further assistance from the application. This tool reduces the time and effort needed to shelflist an item, once the operator has formulated the classification number.
  • Northwestern University's Cataloger's toolkit. The cataloger's toolkit used at Northwestern with the NOTIS system until 1998 provided, with little operator intervention, a completely-shelflisted call number. (Northwestern's main library classifies most materials according to the Dewey scheme.) The program examined the completed bibliographic record, took into account local exceptions to standard practice, consulted bibliographic and authority records in the local database at appropriate points, and asked the operator for guidance when the appropriate course of action was not certain. With a few clicks of the mouse, the operator was able to create a complete call number, fitted comfortably into the web of existing call numbers and conforming closely to local practices. (A version that works with the Voyager system is in development.) This tool did not have access to a machine-readable version of any classification schedule.
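
The algorithmic derivation of an approximate Cutter number for the LC classification, mentioned in the list above, might look something like the following sketch in Python. It assumes the basic Library of Congress Cutter table as it is commonly reproduced; the function name approximate_cutter, the rule of rounding a letter down to the nearest table entry, and the default digit are assumptions made for illustration and do not describe any of the tools named in this report. The result is only a starting point that would still have to be adjusted against the local shelflist.

```python
# Each table pairs the start of a letter range with its digit. A letter maps
# to the digit of the last range that begins at or before it (an assumption:
# the printed table leaves in-between letters to the cataloger's judgment).
AFTER_VOWEL = [("b", "2"), ("d", "3"), ("l", "4"), ("n", "5"),
               ("p", "6"), ("r", "7"), ("s", "8"), ("u", "9")]
AFTER_S = [("a", "2"), ("ch", "3"), ("e", "4"), ("h", "5"),
           ("m", "6"), ("t", "7"), ("u", "8"), ("w", "9")]
AFTER_QU = [("a", "3"), ("e", "4"), ("i", "5"), ("o", "6"),
            ("r", "7"), ("t", "8"), ("y", "9")]
AFTER_CONSONANT = [("a", "3"), ("e", "4"), ("i", "5"), ("o", "6"),
                   ("r", "7"), ("u", "8"), ("y", "9")]
EXPANSION = [("a", "3"), ("e", "4"), ("i", "5"), ("m", "6"),
             ("p", "7"), ("t", "8"), ("w", "9")]


def _lookup(table, text, default="2"):
    """Return the digit of the last range that starts at or before text."""
    digit = default  # hypothetical fallback for text that precedes every range
    for start, value in table:
        if text >= start:
            digit = value
    return digit


def approximate_cutter(entry, digits=2):
    """Derive an approximate Cutter (initial letter plus digits) from a heading."""
    letters = "".join(ch for ch in entry.lower() if "a" <= ch <= "z")
    if not letters:
        raise ValueError("entry contains no letters a-z")
    first = letters[0]
    if letters.startswith("qu"):
        table, used = AFTER_QU, 2
    elif first in "aeiou":
        table, used = AFTER_VOWEL, 1
    elif first == "s":
        table, used = AFTER_S, 1
    else:
        table, used = AFTER_CONSONANT, 1
    rest = letters[used:]
    cutter = first.upper()
    if rest and digits >= 1:
        cutter += _lookup(table, rest)
        # Any further digits come from the expansion table, one per letter.
        for ch in rest[1:digits]:
            cutter += _lookup(EXPANSION, ch)
    return cutter


print(approximate_cutter("Smith"))
print(approximate_cutter("Adams"))
```

A tool of the kind developed at Oregon State University would then combine such a value with the classification number and search the local call number index before the operator completes the number.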

These and other pioneering efforts clearly demonstrate that substantial benefits (lower cost, higher productivity and closer adherence to standards) can be achieved if automated assistance for any part of the work needed to assign call numbers is available; and that benefits should continue to accrue as more parts of the process are automated and joined together to form a comprehensive set of tools. By making greater use of technologies they have developed and whose value they have already demonstrated, libraries are better equipped to handle the increasing workloads brought by the addition of items in electronic formats to the list of materials waiting to be processed.

Proposals for action

The automation of call number assignment has as yet been untouched by major system vendors. Those preparing auxiliary products (such as machine-readable classification schemes) have not yet developed packages that can function in the broader world. Such tools as exist do not typically interact with the local system or with other tools. From its examination of the process by which call numbers are assigned in its members' own institutions and the foregoing list of successful automation projects, the task group has identified a number of areas now ready for automation. Much of this work needs to be done by, or in collaboration with, vendors of library systems, as the tasks to be performed must be done, at least in part, in the context of the local database.

The Task Group presents this set of operations ripe for automation as a set of independent descriptions. A local library system could implement any one of the functions described below, or any given number of them, as independent tools, which the cataloger could draw on as needed. Beginning at this scale would allow institutions to reap immediate benefits from what would in many cases be a modest amount of development effort on the part of library system vendors. Applying this approach, system vendors could gain experience in the largely untested area of call number assignment, and modify these individual functions quickly as they receive feedback from system users. Such work would in effect allow libraries to replace many of the manual steps required for the assignment of call numbers with automated steps. New features could be added as systems developers gained confidence in this new area; the eventual outcome might be a tool that smoothly assists the cataloger in most aspects of call-number assignment in a unified operation; this new tool could allow libraries further to adjust workflow to take best advantage of it.

  • Correctly arrange call numbers in the call number index. Portions of the classification numbers used by some schemes should be treated as values, not as simple strings of characters. The portions of classification numbers to which this consideration applies vary from scheme to scheme.
  • Check a call number for duplication. The system searches the call number supplied by the operator against call numbers of the same type in the local database. If the number has been used in a different record, the system notifies the operator of the duplication and provides an easy means for the operator to review the associated holdings and bibliographic records. If the number has not been used in another record, the system simply reports the fact. (Optionally, the system could say nothing at all if the call number is unique.) Such a feature would not in any way prevent the operator from adding a record with a duplicate call number (call numbers might be re-used for any of several reasons), but would merely provide notification that the number had, or had not, already been used.

    This feature could also be made part of a system's bibliographic batch-loading routine. The system could test the call number in an incoming record for duplication; the system could either load all records and report duplicates, or send records with call numbers already present to a holding file for review and manual update. The system might also provide an arbitrary mechanism (such as adding an "x" to the end of the number) to distinguish otherwise-identical call numbers.
  • Complete a call number for a member of a classed-together series. The local copy of the authority record for a series should identify (either explicitly, or by default[5]) the series classification practice followed at the local institution. If the series is classed together, the authority record also contains the basic local classification number (again, identified either explicitly or by default). A local system acting either on its own initiative or upon operator request should be able to determine from the series authority record the local series classification practice; if the series is classed together according to local practice, the system should be able to add the series numbering (subfield $v) from the corresponding bibliographic series heading to the basic call number from the authority record to form the complete call number. The system should also check the completed call number for duplication, and present it to the operator for approval.
  • Use author numbers in authority records. In the absence of a locally-established classification number for a literary author, use the classification number in the author's authority record (if any) as the basis for the call number for a belletristic work.
  • Complete a call number by assigning a Cutter number. The operator provides a partially-completed call number and indicates in some manner the information in the bibliographic record on which the Cutter number should be based. The system determines the "ideal" Cutter number and investigates the relevant portion of the local call number index. The system does not simply determine whether or not the "ideal" number is already present, but examines items with similar numbers to determine the basis on which they have been assigned similar cutters, and adjusts the "ideal" number as necessary to fit comfortably with existing numbers.

    This feature would be of use in many classification schemes. For classification schemes (such as Library of Congress) that use multiple Cutter numbers, the feature could be designed to handle both the terminal Cutter (which is often based on the main entry or title) as well as internal Cutters (which are often subject-based).
  • Make use of machine-readable records for classification data. The structure of records constructed according to the MARC format for classification data parallels in broad outline that of records constructed according to the MARC authority format: there are established classification numbers, reference tracings from unused variant numbers, and reference tracings for related numbers; there are also notes and links to associated subject headings. Records for classification data could be loaded into a local file in a manner that closely parallels the manner in which library systems already handle authority records, and index entries could be generated from them. Such index entries, together with the formatted displays generated by the system from the associated classification records, could replace the print and CD-based versions of some classification schemes now available. To make the classification information even more useful, the index entries generated from classification records could be merged with index entries generated from call numbers in holdings records in the local file. The resulting hybrid index would work in a manner parallel to that provided by indexes that merge information from authority and bibliographic records, and provide products parallel to those generated from the headings index that mingles bibliographic and authority information. (Such products include lists of new classification numbers, unestablished classification numbers, and classification numbers that match a reference to some other number.)

    Records in the MARC classification format can contain elaborate instructions for building numbers; these instructions are in many instances designed so that they may be interpreted and acted on by computer programs. The potential exists for library systems that incorporate machine-readable classification data to offer assistance in the building of the complete classification number, and to validate numbers found in existing records and those newly added to the local database.

    As far as is known, there has been no move over the past decade on the part of any library system vendor to investigate the possibility of using machine-readable classification data, not even to the simple extent of making the records available as a stand-alone file for consultation.
  • Make classification schemes available in machine-readable form. The MARC format for classification data was published in 1991. Machine-readable records for the Library of Congress classification scheme began to appear in 1997. After input of all LC classes is complete, additions and changes will also be in machine-readable form. The developers of other classification schemes have not followed the lead of the Library of Congress, and are not yet working on MARC-format versions of their schemes. (For example, although the Dewey scheme is maintained in a proprietary machine-readable format that is theoretically convertible into the MARC classification format, there has been no movement to make the Dewey scheme available in this manner.) Suppliers of other classification schemes commonly used in American libraries, such as the National Library of Medicine scheme, are even less ready to deliver records in machine-readable form. Although the availability of the Library of Congress classification in MARC format will satisfy the needs of many libraries, some libraries will not be able to make use of some new system features until machine-readable records for additional classification schemes are prepared. Providers of the principal classification schemes used in libraries are urgently requested to make their schemes available in the MARC format.
  • Provide open access to machine-readable classification schemes. Systems and applications that provide access to classification schemes in machine-readable form should not be closed to outside queries, but should instead provide an interface that permits other applications to query the classification information and retrieve results in a manner amenable to further manipulation. The Library of Congress and Dewey schedules are both distributed in "electronic" form on CD-ROM. Both products store their data in proprietary formats, and neither is open to access from applications other than that provided by the included search/retrieval software. (For example, an external program cannot query the data stored on the LC CD-ROM to determine whether a classification number in a bibliographic record is valid.) The Web versions of these products suffer from similar limitations. Although both of these applications are useful, they would be even more useful if their data stores could be searched by other programs and if the information retrieved from them were presented in a standard format (i.e., in the MARC format) instead of a proprietary format or as text.
  • Provide services that aid in the formulation of classification numbers. Local systems should provide searches and other tools that can assist the cataloger in the assignment of classification numbers. Such services include the following:
    1. Generate a list of classification numbers associated with a given subject heading.[6] For headings built according to the Library of Congress scheme, this list would at best include only cases in which the given subject heading appeared as the first subject heading in other bibliographic records. The list should be arranged in decreasing frequency of occurrence.
    2. Generate a list of subject headings associated with a given call number. Again, this list should include records in which the subject heading is the first subject heading in the record, and should be arranged in decreasing frequency of occurrence.
    3. Notify the operator that the collection already holds other editions or versions of the work contained in the item being cataloged, and generate a list of call numbers assigned to those other editions and versions.
  • Be aware of local exceptions to standard classification practice. An institution will often choose to make local exceptions to standard classification practice. For example, an institution that normally classes materials with the Library of Congress scheme may classify materials bound for certain locations, or with certain characteristics (such as a particular type of material) in an exceptional manner.[7] Although library systems written for general use cannot necessarily be expected to be able to apply the local exceptions to classification practice, the systems should provide a means for identifying those exceptions, so the system can notify the operator to divert materials for special handling as required.
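
The first two proposals in the list above (arranging the call number index correctly and checking a call number for duplication) can be illustrated with a short sketch. It is written in Python and assumes LC-style call numbers of the form "QA76.9 .C65"; the names lc_sort_key and find_duplicates, the regular expressions, and the use of an in-memory mapping in place of a local call number index are all assumptions made for illustration, not features of any existing system.

```python
import re

# Class letters, then a whole or decimal number, then everything else.
_LC_PATTERN = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)\s*(.*)$")
# A Cutter is taken here to be one letter followed by digits.
_CUTTER = re.compile(r"([A-Z])(\d+)")


def lc_sort_key(call_number):
    """Build a sort key that treats the class number as a value, not a string."""
    text = call_number.upper()
    match = _LC_PATTERN.match(text)
    if match is None:
        # Unparsed numbers (for example, another scheme) file first, as text.
        return ("", 0.0, (), text)
    letters, number, rest = match.groups()
    # Cutter digits are decimals, so comparing them as strings is correct:
    # "35" files before "4", just as .35 precedes .4.
    cutters = tuple(_CUTTER.findall(rest))
    return (letters, float(number), cutters, rest)


def _normalize(call_number):
    """Ignore case and spacing when comparing two call numbers."""
    return re.sub(r"\s+", "", call_number.upper())


def find_duplicates(candidate, index):
    """Return the ids of records in index (id -> call number) using candidate."""
    wanted = _normalize(candidate)
    return sorted(rid for rid, number in index.items()
                  if _normalize(number) == wanted)


shelf = ["QA76.9 .C65", "QA9 .B3", "QA76 .A2", "PS3545 .I4", "PS3545 .I35"]
print(sorted(shelf))                   # character-by-character order
print(sorted(shelf, key=lc_sort_key))  # shelf order
print(find_duplicates("qa9 .b3", {"b1": "QA9 .B3", "b2": "QA76 .A2"}))
```

As the proposal itself notes, reporting a duplicate need not prevent the operator from using the number; the sketch only returns the matching records so that they can be reviewed.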

    Any of these capabilities added to a library system, or a package containing all of them as a group of separate functions, would be considered a major advance in library automation. In the longer term, to reap the greatest advantage from this development effort and to allow libraries to realize the greatest cost savings, these discrete system functions should eventually be incorporated into a comprehensive function that leads the cataloger in a single complex step to a call number ready to use in the record for an item. This comprehensive function could be built in large part by assembling into a whole the discrete functions listed above. If these discrete functions are built from the beginning with the view that they might eventually become part of such a comprehensive feature, the work of uniting them into a seamless whole will be reduced. An outline of the work performed by such a comprehensive feature might contain the following steps, performed in this order:
    1. Determine, by examining the record for the item,[8] whether the item being classified fits any defined local exceptions to standard classification policy. If so, handle the exception if possible, or present information regarding the exception to the operator.
    2. If the record for the item being classified contains any series headings, check the corresponding authority record for each to find local classification practice. If the item is a member of a series classed together, formulate the complete call number and present it to the operator.
    3. Check the local database for other representations (editions, versions) of the content carried in the item being classified. If any other representations are present, notify the operator; as requested by the operator, base the call number for the current item on one of the existing call numbers.
    4. If the bibliographic record for the item being classified already contains a call number of the proper type, check it for validity; if the number is valid, check it for duplication in the local file. Notify the operator if the number is a duplicate; adjust the Cutter number as necessary and appropriate.
    5. If the record for the item being classified contains a suggested classification number of the proper type, check it for validity. If the number is valid, complete it by assigning a Cutter block and checking the resulting number in the local index.
    6. If none of the above conditions applies, generate a list of likely classification numbers by drawing on information in the record for the item being classified, and present the results to the operator. If the operator selects one of these numbers, complete it by assigning a Cutter block and checking the resulting number in the local index.
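
The six steps above can be sketched as a single function that tries each step in order and stops at the first one that applies. This is a minimal illustration in Python, not a design for any particular system: the record and catalog layouts (plain dictionaries), the key names, the step labels and the assign_cutter callback are all hypothetical, and validity checking is reduced to membership in a set of known classification numbers. Step 6 ranks candidate classification numbers by how often they occur with the record's first subject heading, as in the frequency lists proposed earlier.

```python
from collections import Counter


def propose_call_number(record, catalog, assign_cutter):
    """Return (step, result) for the first of the six steps that applies."""
    # Step 1: local exceptions, keyed here by type of material only.
    note = catalog["exceptions"].get(record.get("material_type"))
    if note:
        return ("local_exception", note)

    # Step 2: member of a classed-together series (base number + numbering).
    for heading, numbering in record.get("series", []):
        base = catalog["series_call_numbers"].get(heading)
        if base:
            return ("series", f"{base} {numbering}")

    # Step 3: other editions or versions already held; the operator decides.
    held = catalog["editions"].get(record.get("work_id"), [])
    if held:
        return ("other_editions", list(held))

    # Step 4: a call number already in the record; validate, then check use.
    parts = (record.get("call_number") or "").split()
    if parts and parts[0] in catalog["valid_classes"]:
        existing = record["call_number"]
        if existing in catalog["used_call_numbers"]:
            return ("existing_duplicate", existing)
        return ("existing", existing)

    # Step 5: a suggested classification number; validate, then add a Cutter.
    suggested = record.get("suggested_class")
    if suggested in catalog["valid_classes"]:
        return ("suggested", assign_cutter(suggested, record))

    # Step 6: rank classification numbers seen with the first subject heading.
    subjects = record.get("subjects", [])
    if not subjects:
        return ("candidates", [])
    seen = catalog["classes_by_first_subject"].get(subjects[0], [])
    return ("candidates", [cls for cls, _ in Counter(seen).most_common()])


SAMPLE_CATALOG = {
    "exceptions": {"microform": "Assign the next sequential microform number"},
    "series_call_numbers": {"Studies in cataloging": "Z693 .S8"},
    "editions": {},
    "valid_classes": {"Z693", "Z694"},
    "used_call_numbers": {"Z693 .A1"},
    "classes_by_first_subject": {"Cataloging": ["Z693", "Z694", "Z693"]},
}

print(propose_call_number({"subjects": ["Cataloging"]}, SAMPLE_CATALOG,
                          lambda cls, rec: cls + " .X1"))
```

Because each step is a separate test against the catalog, a vendor could deliver the steps one at a time as independent functions and unite them later, as the report suggests.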

Closing remarks

The quickly-changing world in which we now all operate places an increasing strain on the providers of information services. We must all continually examine the tasks we perform, to make sure they continue to be necessary; and we must continually seek ways to perform those tasks in the most effective manner. The generation of classification and call numbers is a complicated task that continues to be important in the networked environment. The next few years should see the creation of automated tools that will help those assigning classification and call numbers to bibliographic records and other types of metadata. The development of such tools will require cooperation among librarians, other information brokers and library system vendors, and will constitute a significant advance in library automation. This work will bring automation to at least a part of the task of subject analysis, the last large portion of library technical services to receive such treatment.

Footnotes

  1. BUBL LINK (bubl.ac.uk/link) is one example of electronic resources arranged by Dewey classification numbers. For an overview of the use of classification numbers to arrange Internet resources, see The role of classification schemes in Internet resource description and discovery (http://www.ukoln.ac.uk/metadata/desire/classification/).
  2. The assignment of call numbers involves two principal steps (which may take place simultaneously): subject analysis and shelflisting. Call numbers used in libraries typically consist of two segments: a classification number and a Cutter number. (The term number is used even though these two segments often contain both numerals and letters of the alphabet.) The classification number, constructed by the rules of a classification scheme or drawn from a list of numbers valid in a scheme, provides an abstract representation of the location of the subject matter contained in an item within the organization of knowledge used in the classification scheme. The Cutter number, constructed by the cataloger, distinguishes items with the same classification number and allows for the appropriate subarrangement of materials.

    Subject analysis is the determination of the nature and scope of the item being described, and involves the location of that subject matter in a knowledge organization system. Classification schemes with their corresponding notation are one class of such systems; subject analysis in the context of a classification scheme results in the assignment of a classification number. (Subject heading systems such as LCSH and MeSH are other examples of knowledge organization systems; their use results in the assignment of one or more subject headings.) The construction of the classification number is performed through consultation of a published classification scheme, and also involves queries of the local catalog to find other manifestations of the same work, other materials on similar subjects, and other materials bearing candidate classification numbers; this work often also involves consulting authority records for suggested classification numbers.

    Shelflisting involves the addition of arbitrary symbols of various kinds to the classification number to make a complete call number; this number fits into a set pattern of subarrangement under a given classification number. This task is accomplished in part by consulting the local system's online index of active call numbers. (This task takes its name from the shelflist, a card file arranged by call number, which has been replaced at least approximately in online systems by an index of call numbers.)
  3. http://www.desire.org/ [URL http://www.desire.org not working as of July 5, 2005]
  4. These portions of the Library of Congress classification scheme are also available in the CD-ROM Classification Plus product.
  5. The system could assume that a library follows LC/NACO practice unless local practice is explicitly indicated with the library's symbol in subfield $5. Here and elsewhere, any call number extracted from an authority record must be of the 'type' (Dewey, LC, etc.) supported by the local institution.
  6. The model described here works well for materials represented by standard bibliographic records, including those prepared according to some metadata schemes. If intended to assign classification numbers directly to Web pages (which may or may not contain metadata, or even neatly-coded headers with subject-loaded terms), such a service would need to examine the content of the Web page itself.
  7. For example, microforms might be assigned sequential numbers; sound recordings might be arranged using a locally-developed scheme or the publisher number; videorecordings of feature films may be arranged alphabetically under a general classification number; bibliographies cataloged for the Reference collection may all be placed in LC's 'Z' class.
  8. The term record here refers both to a separate bibliographic record in MARC format or some other format, and to metadata stored as part of an item.

Appendix 1: Charge to the task group