Thursday, Oct. 13, 1994, beginning 9:00 am, Mumford Room
Sarah Thomas welcomed the participants:
The University of Virginia was so well organized. The seminar yesterday at UVa was like the topic itself, interactive and dynamic. She was ready to resign her job, and apply to be a fellow at the Institute.
The Seminar today will be a success, because the participants will be making it so. Thanks to UVa for yesterday. Acknowledge that OCLC has made a contribution to the support of this seminar.
We have gotten a lot of very good questions. She has given the questions to Beth Davis-Brown to prepare for the panel tomorrow. Our speakers will be asked to answer these questions.
This is a very important day for LC. There will be a press announcement this morning about a sizable donation to LC: $10 million for digitizing. We will be part of the press tour this afternoon, with news coverage while David Williamson is making his presentation.
Susan Hockey = 9:00 am Introduced by SET
She will talk about the text encoding initiative and SGML. She organized a conference in New Jersey this spring. SET volunteered to sponsor a meeting [this seminar today].
Director of the Center for Electronic Texts in the Humanities (CETH), Rutgers, since 1991. Background in classics, from Oxford. 25 years in humanities computing. Author of a guide to electronic texts in the humanities, and many articles.
Text Encoding Initiative and SGML
TEI = Text Encoding Initiative.
SGML = Standard Generalized Markup Language.
Humanities computing. What humanities people have been doing. Started in 1949. A long tradition of using computers to create concordances, text retrieval, stylistic analysis, authorship studies, scholarly editions, sound patterns, lexicography, logical studies, hypertexts.
We can do things in electronic medium that we can't do with printed texts. Studies of sound patterns. Lexicography = Creating large data bases. Old English corpus. Electronic OED. Can transform dictionary into other structures which can lead to retrieval in different ways. Links between different sets of information.
Humanities scholarship is the study of source materials, interpretations. Humanities work does not get out of date, unlike scientific studies. A solitary activity. Need to represent the material, very complex material. Need to represent the complexity in the computer.
Each person began doing it differently. Many have built in canonical scheme. A host of encoding schemes which were very different.
Why do we need markup (encoding)? Things embedded in a computer file for computer processing. Text formatters are one kind: Reveal Codes in WordPerfect shows you its markup.
Another kind, for analyzing text. Retrieve things and do more complicated analysis. Defining the area of text to be searched (e.g., words spoken by Ophelia). A bibliographical record is fielded data, one type of markup. Possible things to encode: chapters, speaker, document no., quotations, author. Also interpretation in text. Word types (nouns, verbs, etc.). This is analytic and interpretive coding. Also historical interpretation, literary interpretation, etc.
Overhead has 6 ways of encoding the beginning of the Gettysburg Address. Speaker can think of about 20 different ways.
TEI= Text Encoding Initiative. Two printed volumes = Guidelines for Electronic Text Encoding and Interchange, published in May this year. 1987 = planning meeting at Vassar. Can we create a common encoding scheme? Two editors in Europe and Chicago, Advisory Board (15 organizations). A joint project between America and Europe.
SGML made this attempt more successful than before. An international standard since 1986. Not a markup scheme itself, but a syntax which enables a designer to create an encoding scheme. It is a meta-language.
SGML is descriptive, not prescriptive. Meta-language. Tells what is title, etc.; not like telling computer to bold, underline, etc. Text usable for many different purposes.
Plain ASCII files, no nonstandard characters. Can move from one machine to another. Machine independent, application independent. Gives an archival form of the data that does not get out of date.
A number of fairly good introductory books now on SGML.
SGML components: entities, elements, attributes, put together in document types.
An entity is any named bit of text. (1) Nonstandard characters: &beta; = Greek letter beta. Program will translate it for screen representation. (2) Boilerplate text: &TEI; will print as Text Encoding Initiative.
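The two uses of entities described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any SGML parser is actually implemented: the entity table, the function name expand_entities, and the sample sentence are assumptions made for the example, and references are assumed to be written in the usual &name; form.

```python
import re

# Hypothetical entity table: names and replacement text are illustrative only.
ENTITIES = {
    "beta": "\u03b2",                   # (1) a nonstandard character
    "TEI": "Text Encoding Initiative",  # (2) boilerplate text
}

def expand_entities(text, entities=ENTITIES):
    """Replace each &name; reference with its declared replacement text."""
    def substitute(match):
        name = match.group(1)
        # Leave undeclared references untouched rather than guessing.
        return entities.get(name, match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9.-]*);", substitute, text)

expanded = expand_entities("The &TEI; guidelines mention &beta;.")
```

The stored file stays plain ASCII; only the program that displays or prints it needs to know what each entity stands for.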
Meat of an SGML document is the elements, the components. They can be anything. Delimited by less-than and greater-than signs (angle brackets). Marked with start and end tags.
Can mark start of chapters.
Attributes can do more interesting things. Example: name type = personal name; name normal = normalized form of name (index as smithj). All these can be used to control processing.
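A rough sketch of how such attributes might control processing, here building a name index keyed on the normalized form. XML syntax, parsed with Python's standard xml.etree.ElementTree, is used as a stand-in for SGML; the sample sentence, the exact attribute spellings (type, normal), and the function name index_names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# XML syntax is used as a stand-in for SGML here; the sample text is made up.
SAMPLE = (
    '<p>A letter from <name type="personal" normal="smithj">John Smith</name> '
    'to <name type="personal" normal="doej">Jane Doe</name> about '
    '<name type="place" normal="oxford">Oxford</name>.</p>'
)

def index_names(markup, name_type="personal"):
    """Map each normalized form to the name as it appears in the text."""
    root = ET.fromstring(markup)
    index = {}
    for element in root.iter("name"):
        # The type attribute selects which names to process.
        if element.get("type") == name_type:
            index.setdefault(element.get("normal"), []).append(element.text)
    return index

personal_index = index_names(SAMPLE)
```

The same text can yield a personal-name index or a place-name index simply by asking for a different type value.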
DTD = document type definition. Something a document designer builds, all possible things to be in document, and designs relationship between them.
Example: A play, title, acts, scenes, speeches; Computer can tell you that an act cannot be part of a scene, if you make a mistake. SGML can look very verbose. Some of the markup can be omitted: if a new line starts, the last line must have ended. Can leave out some end tags, if you tell the program that is part of your scheme.
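The kind of checking a DTD makes possible can be sketched as follows. This is a much-simplified stand-in for a real SGML parser: the table of allowed children, the element names, and the event-list input format are all assumptions chosen to mirror the play example, and end-tag omission is not modeled.

```python
# Hypothetical content model for the play example: parent -> allowed children.
ALLOWED_CHILDREN = {
    "play": {"title", "act"},
    "act": {"scene"},
    "scene": {"speech"},
    "title": set(),
    "speech": set(),
}

def check_nesting(events, root="play"):
    """Check a list of ("start" | "end", element_name) events against the model."""
    errors = []
    stack = []  # currently open elements, innermost last
    for kind, name in events:
        if kind == "start":
            if not stack:
                if name != root:
                    errors.append(f"document must start with <{root}>, found <{name}>")
            elif name not in ALLOWED_CHILDREN.get(stack[-1], set()):
                errors.append(f"<{name}> is not allowed inside <{stack[-1]}>")
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            errors.append(f"unexpected </{name}>")
    if stack:
        errors.append("unclosed elements: " + ", ".join(stack))
    return errors

# An act wrongly placed inside a scene is reported as an error.
mistake = [
    ("start", "play"), ("start", "act"), ("start", "scene"),
    ("start", "act"), ("end", "act"),
    ("end", "scene"), ("end", "act"), ("end", "play"),
]
problems = check_nesting(mistake)
```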
TEI makes heavy use of attributes. TEI began with idea of creating SGML for all the humanities.
Divide work into 4 groups.
Documenting electronic text is most important for this group.
Also, physical description, and logical structure of text.
Linguistic analysis and interpretation.
A group decides which parts of SGML to use.
July 1990 is 1st draft. Gave away. Got lots of comments.
Second cycle. Testing and extending. Technical review groups. Published in May 1994.
TEI guidelines = suggest things people need to think about, what to encode and how to do it. Absolute requirements very small indeed. Same overall syntax.
A TEI DTD always has:
header
core tags (common to most types of doc.)
base tag set (such as prose, verse, drama...); also a whole set for transcribing speech; dictionaries (such as OED); how to encode terminological data.
Build up like a pizza: base, additional tag sets, additional sets for the topping. Dates, graphs, mathematical.
What a TEI text looks like: = the header.
Contains about 60 possible elements in 4 sections.
File description gives source from which transcribed, more than traditional cataloging record. Spelling changes, hyphenated words, quotes, different languages in text. (Things which make it useful)
Profile description = classification, subject headings, etc. People in a conversation. Revision history. (Who made the change and when).
File descriptions give what can be used for cataloging.
Publication statement = a lot could be said about this.
Source description = where the document came from (Shakespeare texts)
She shows simple source descriptions.
Overheads: Poe poems; CNN12 = news cast.
Can use WordPerfect spell check to look for spelling errors, can say spelling normalized, what's done in quotation marks.
TEI prose text in overhead. Text, front matter, chapter, body, back matter. A new type of text may show up, you can make new subdivisions.
Minimum TEI document overhead:
1 set of tags for body of text.
Use of TEI SGML
Will not be seen by most people. "UNDERGROUND TUNNELS" of how to use the text. = makes everything work together underneath. Software should hide it as much as possible.
Scope of projects: a DTD for many different directions, can be extended to multimedia, for example.
Don't have really good software for everything yet.
The header needs much input from this community. An SGML structure for the header created by a cataloger, to give all information for cataloging.
Need to educate publication community of the importance of this.
Build a super-set of TEI headers to point directly to text.
Questions later.
Introduced Aurora Ioanid as her new cataloger.
Carl Fleischhauer, LC. In American Memory for a number of years.
During American Memory's five-year pilot which just ended.
We wrestled with dozens of issues.
This element (to discuss today) includes cataloging, finding aid of bibliographical records. Concerned with archival collections, photographs, printed matter, monographs, etc. as parts of larger groupings.
All 4,000 panoramic photos in P&P. 4,000 Mathew Brady Civil War photos.
Collections in three categories. Digital reproductions of individual items, also frameworks which include finding aids. CD-ROM frameworks = hybrid of hypertext and catalog record. Starts with a title page which is more like a table of contents.
Search and retrieval software. Rare Books supplied very complete headers. Photos had very brief records, which give excellent access to photos.
Recently joined Internet. Can mix coded text with bibliographic records.
These mockups were fine for some. Some divisions wanted traditional register, not a bibliographic record. (Guide to collection)
Started designing online registers. Chronology of person's life, menu, list of containers, folders. A nonlinear register can include search terms to be used. Subjects could be added.
Coded markup language is the basic tool.
Susan Hockey and her TEI co-workers convinced American Memory to use TEI as its SGML encoding scheme. Some still stuck in previous markup language. Dan Pitti has samples of registers here on his laptop.
What happens with collections with register co-existing with collections with header, etc. bibliographic record. Many other questions. We look to meetings such as this one.
Complex and diverse materials.
Vary from customary records for monographs.
Linking to reproductions of the item described: using 856 field for link name, file name, or set name. A set contains a number of related files.
File names follow DOS naming conventions. Example: single photograph.
Another example : Each panorama is in a set of pictures, like a comic strip.
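As a rough illustration of what the DOS constraint means for sets of related files: DOS names are limited to eight characters plus a three-character extension, so a set name leaves little room for a sequence number. The sketch below is hypothetical throughout; the naming scheme (set name plus two-digit sequence number), the sample names, and the function names are assumptions, not American Memory's actual convention, and the pattern is a simplified version of the DOS rules.

```python
import re

# Simplified 8.3 pattern: letters, digits, underscore, and hyphen only.
# Real DOS allows a few more punctuation characters; this is an approximation.
DOS_8_3 = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")

def is_dos_name(filename):
    """True if the whole name fits the simplified 8.3 pattern."""
    return DOS_8_3.fullmatch(filename) is not None

def set_member_names(set_name, count, extension="tif"):
    """Hypothetical scheme: the set name plus a two-digit sequence number."""
    names = [f"{set_name}{i:02d}.{extension}" for i in range(1, count + 1)]
    bad = [name for name in names if not is_dos_name(name)]
    if bad:
        raise ValueError(f"not valid 8.3 names: {bad}")
    return names

# A three-picture panorama set under the assumed scheme.
panorama_files = set_member_names("pan6a", 3)
```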
Printed matter and manuscript collections = get more complicated. (has written paper)
Example = a collection of speeches has recording, transcription, image of original 78 recording label. Portrait of speaker, with caption. Some collections have contents notes for the item.
2nd topic = Managing data bases. Copyright, etc.
Archivists control these steps by databases. How can these be transferred into headers, databases.
California books from general collections. Contractor for text scanning, contractor for picture scanning, tried to keep transactions on slips of paper. Elaine Woods for help. She suggested adding ad hoc 980 and 984 fields into record.
Fields and subfields to show, Binding Office, Federal Prisons Industries batch 7. Physical location, need copyright search.
Copyright information tracking = have to have permission of copyright holders. Must be able to track use and fee system. Delivery side of question not yet worked on.
Would rather provoke discussion. Look up table for permissions and fees. Copyright needs name of person, name and time for renewals. Agreement with owners, restrictions. Haven't gone that far yet.
Used PREMARC records, filled in with extra info.: included copyright search date, date, person, memo.
Relationship between bibliographic records and texts. Worked up header, asked would we include all data from cat. record. Answer = no. Redundancy. Had other matters to do too (not enough time) Some files have finding aid, not bibliographic records, some have nothing, or online register.
Our level of effort much lower than what some scholars might use.
Add "a machine-readable transcription" to title field.
Cite existence of facsimile images. Digitization contractors have one typo per page, so want facsimile image also, linked.
The hardware and software that contractors use are oriented toward the IBM character set. Bibliographic records use the ALA character set. Conflict between these 2 character sets. Went with what contractors could do for us. This will require post-processing. Both options cumbersome and bothersome.
Look forward to continuing.
End of Carl's talk.
Seminar on Cataloging Digital Documents
Thursday, Oct. 13, 1994
Lynn Marko 10:40 am
She was also present at New Jersey Conference. University of Michigan. Head of monographic cataloging.
Lynn Marko
Where we are going in the future.
I live in a dual household. Cross fertilization between his career and mine. Was exposed to work of a professor named:
James Utterback: Mastering the dynamics of innovation.
Discovered impact of innovation on established enterprises. Read this book above.
An example is ice harvesting, a source of wealth in New England. History of such. 1880's was the height. Then refrigeration developed in American South. Even though man-made ice much more expensive, suddenly overtook harvested ice. Within a generation, tools were just left to drop to the bottom of the pond. This process continues, ongoing, necessary. Change continues today.
Automotive example: Electric vehicles mandated in California in 1998. GM is preparing to deal with this.
In digital development: patterns of turbulence, patterns of stability. Technological change causes discontinuities.
At the University of Michigan: digital lib. program = partnership = Univ. lib., school of library science (SILS), tech. division of univ. (ITD)
building on strength of each partner. Each partner brings a particular competency.
Library brings organization, access, literacy.
SILS brings research, systems design.
Information Technology Division (ITD) brings technology.
Collectively they see campus issues: open access policies, long-term access, content funding models, copyright, intellectual property, funding.
Umbrella of projects has become so large.
"Campus-wide digital library projects."
papyrology, etc.
2 projects as examples.
University licensing project: 43 material science journals. Elsevier Science Publishers and engineering information. north campus has desk top delivery.
UMI-University of Michigan journal image project.
= core journal, expanding for undergraduate population. access, viewing, accounting. 400 journals covered by Wilson indexes. images linked to Wilson indexes. Viewing on Ann Arbor campus.
One of the main problems, shelf availability, is obliterated. Don't have to worry about where the kids have hidden them.
Access to journal literature has not been covered by cataloging.
Is this a structural change in the cataloging universe? Prepare ourselves for change.
National bibliographic delivery system has been developed. We can be proud of this achievement.
Problems. All our structures were developed for print. Non-print is a rapidly changing marketplace.
Lack of responsive climate.
Business theorists = now is the best time to identify the path to the future. How can cataloging community do this?
Recognize the value of what we do. The old doesn't go away. The old stays, together with the new technology.
Recognize the value of our structures. Generally available. Not expensive. Do not put all our energies into outmoded standard activities.
Recognize the value of MARC as a communication protocol. 350,000 inquiries per month on University of Michigan campus. Hackers have not taken it away or brought it down. Search engines make it kind of universal.
Example = random RLIN record. SGML in the header.
To go from MARC to SGML to save re-keying. Consistency between text and its description.
As other meta-data formats become used. GILS format, lots of conversation with those who developed MARC.
Another key point. recognize emergence of other electronic type of data. A change for catalogers.
At University of Michigan -- Gateway should provide: conceptual structure. etc. client access, user authentication (payment) value-added.
Put up initial home screens in overheads. Home page in 8 languages, most likely users. Shows overheads to seminar.
Judy Arrenheim (1 of the catalogers) made substantial effort. Need to describe latitude and longitude of WWW (world wide web) site.
Team approach. Cataloger brought real contribution = essential knowledge of the structure of information.
Description and access will become important in this environment.
Cataloger needs to be supplied with the best equipment -- high-end desktop equipment, a connection to the Ethernet. Need to make capital investments.
UNIX for dummies = becoming worn = people buying their own copies.
Summary: entering a period of tech. shift. Will have an impact on what we do. Exciting challenges and opportunities. Somewhat chaotic, unpredictable, unstable. Presents opportunities and challenges.
Joan Swanekamp
Guidelines ... Interactive Multimedia.
More copies coming.
Multimedia -new opportunities for teaching.
Text, graphics, full motion video. (see definition in G)
Non-linear navigation necessary.
G published in June 1994. Experts on committee.
Started June 1991. Input from cataloging experts. Ben Tucker in the group. Publication by ALA for American Libraries, not an international standard. Listed committee members.
issues and concepts = G
Had some heated discussions.
Q = definition = is this a computer file or a video = into 3 camps. Changes to chapter 9, 7 or consider this to be a kit. A number of examples and discussions. Debating for weeks. Intellectual property was important. Months doing definition. Diana Oblinger in NC. (reads definition) Not always easy to decide -- a "help" section. [the next 4 pp.]
May have to go back and re-work. Machinery getting more sophisticated. Lists (what is and what isn't) will need to be changed.
Musical works = can buy the parts separately, put them together. (example: can buy music disk separately)
Work included all of the parts - rethink chief source.
New GMD = interactive multimedia.
260c can become complicated. Music = composer as part of t. 260c = people responsible for entire work. Other names in 500. find a date for the complete work, not dates of parts.
Significant difference from AACR2 - separate parts by commas, or multiple 300's. [shows examples of 300's]
Last week = workshop = preferred method = Laurel Jizba and Ann Fox = use whatever seems most reasonable for your item and collection.
Choice of entry: Music presented problems, most others had few problems. Testing G -- Music catalogers chose composer as 1xx. interactive multimedia title is a new work by an assortment of persons. "entire work" Usually under title.
examples:
OCLC 19940707 Law work> You be the judge c1990.
OCLC 19940707 Real world legal ethics c1990.
The magic flute ZCU record c1989 under title.
OCLC 19940714 BCH AND BEFORE c1992.
Music plus computer program + other stuff. see 21.1B2
Main entry not a dead issue also. From audience: same for children's books.
Web document. She thinks this would be interactive multimedia. Answer from Sherry Kelley.
Example from yesterday = journal with video and sound from Institute.
What comes next. OCLC= RLIN = use computer files format = interim decision.
CC:DA = send comments to them.
Will be revised based on comments. Will move towards new rules in AACR2. Maybe a new chapter in AACR2.
Summer in Chicago. 1 day pre-conference.
October 13, 1994, Afternoon Sessions
David Williamson, Library of Congress--Text Capture and Electronic Conversion
The Electronic Cataloging in Publication (E-CIP) Project was developed by David Williamson, Dick Thaxter, and Bob August of the Cataloging Directorate in consultation with the CIP Division. Galleys for CIP titles are sent by the publisher over the Internet via file transfer protocol (FTP) and an application is sent via Internet mail. Using a bibliographic workstation (BWS), an IBM PC with an OS/2 operating system and multitasking capabilities, a cataloger can build a record from the transmitted data. A specially designed visual interface provides a two-part workform in which the cataloger can manipulate data by pointing and clicking. From the manuscript copy in the upper window, the cataloger can highlight text and use menu options to build a MARC bibliographic record in the lower window. The program handles conversion of the data including determining proper name order (first name and surname), abbreviations, punctuation, and capitalization. Since little if any keying is required, the integrity of the data is preserved and record creation time is kept to a minimum. After the record is copied to the OS/2 clipboard, it can be migrated to the MUMS system by pasting it into a record creation screen. Once the bibliographic record has been created, a name authority record may be built by selecting the appropriate option from the menu. The program will scan the information, select appropriate data, reformat it with corrected order and punctuation, and supply references.
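The notes do not say how the E-CIP program determines name order, so the following Python sketch is only a naive illustration of the kind of conversion involved: turning a name highlighted in direct order into the inverted form used in a heading. The function name invert_name, the suffix list, and the sample names are assumptions; real names (compound surnames, prefixes, corporate names) would need far more rules.

```python
# Hypothetical list of suffixes that follow the forename in an inverted heading.
SUFFIXES = {"Jr.", "Sr.", "II", "III"}

def invert_name(direct_order):
    """Naive rule: treat the last word as the surname and move it to the front."""
    parts = direct_order.split()
    if len(parts) < 2:
        return direct_order  # a single name has nothing to invert
    suffix = ""
    if parts[-1] in SUFFIXES and len(parts) > 2:
        suffix = ", " + parts.pop()
    surname = parts.pop()
    return f"{surname}, {' '.join(parts)}{suffix}"

heading = invert_name("Jane Q. Public")
```

Because the text is captured from the publisher's file rather than rekeyed, a rule like this only rearranges and punctuates what is already there.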
Copy cataloging is possible by using the BWS and an Internet connection to search library catalogs from around the world. The cataloger captures text by blocking it, then builds a MARC record following a procedure which is similar to the one used for electronic CIPs. This technique is especially useful when library staff lack the language expertise necessary to process an item.
Another CIP project uses the World Wide Web and Web browsing software to provide publishers with a form for applying for a preassigned Library of Congress card number (LCCN). Instructions on the prompt screen guide publishers to transcribe publishing data in a MARC and ISBD-compatible form. Once the workform is stored into memory, it can be scanned and converted to a MARC record and a form letter is generated for return to the publisher via Internet mail. Both the preliminary MARC record and the form letter are generated based on data provided by the publisher on the form. This saves many keystrokes and, again, is very accurate.
Edward Gaynor, University of Virginia--Cataloging Sample Electronic Texts: a Practical Experience
We need to examine the issues surrounding digital images.
*Is it time for MARC to evolve into a new format such as SGML?
*Is it time for a new type of bibliographic record which is independent of the digital resource? Should the library community take the responsibility for creating it?
*Is there a need for separate bibliographic records or should digital images themselves be considered carriers of information? Is tagging the electronic resource the solution or is it necessary to add descriptive information such as size and format to the image?
*Do we need Chapter 9 to catalog digital images or should they be cataloged as reproductions?
The following cataloging issues were identified by participants after group discussion of sample GIF and JPEG image records.
*Is the digital image a reproduction of a photograph or a reproduction of the architecture?
*How does the patron know which format to access?
*Should the cataloger describe the slide from which the digitized form was made?
*Who will describe the images? The art and architecture departments? Should they be described by art history specifications?
*Should one description be mapped into different views for different types of patrons?
*How much information is needed to tell the patron what it is and where to find it?
*Should the image be described in a separate bibliographic record or as part of a component record for all manifestations of the work? Can institutions afford to create separate records?
*Are additional notes for information such as image manipulation (compression), authorship (creator of architecture, photograph, digital image), and dates of creation needed?
*Is the MARC record the best vehicle for this information?
*Should there be a main entry? If so, is the author the creator of the architecture, photograph, or image?
*Should there be a hypertext link from the bibliographic record to the image?
*Is an image header needed for technical information on scanning, resolution, and compression?
*What type of information is included if the image was taken with a digital camera?
Diane Vizine-Goetz, OCLC--Cataloging Internet Resources
There is a diverse and growing collection of electronic resources that present unique cataloging challenges to librarians. Rapid growth in network traffic and in the number of users has resulted in the development of many locating and access mechanisms such as anonymous ftp, Archie, Gopher, WAIS, and WWW. The impermanence and instability of these resources and the prevalence of multiple versions create problems of preservation, authentication, and attribution. A copy preserved and cataloged today may not exist on the network tomorrow. Mechanisms for tracking and distinguishing one version from another are inadequate. A document that is sent out on the Internet may be captured and changed. The challenge to librarians is meeting users' needs by improving access to electronic resources with existing library practices and services. Time is wasted finding and maintaining documents. Why catalog them?
Cataloging provides a standardized description with controlled access points and indexing vocabularies. The MARC record structure facilitates indexing and retrieval and exchange of bibliographic data. The OCLC Internet Resources Project completed in 1993 led to the January 1994 approval of the 856 MARC field for electronic location and access. USMARC format changes were approved in June 1994 for creation of records for online systems and services. A September 1994 grant by the U.S. Department of Education was awarded to OCLC for cataloging Internet resources.
Recent statistics on electronic journals reveal the following: CICNET archive has 740 titles; SUNY Morrisville Gopher has 250 titles; OLUC has 192 records (137, or seventy-two percent of the total number, were input via the National Serials Data Program; ninety-three have zero holdings (preliminary records); and forty-one titles overlap with CICNET Archive). Documents such as e-mail messages and certain newsletters do not merit cataloging and are not represented in these figures. Among the top electronic journals in OLUC are the Online Journal of Current Clinical Trials (forty-two titles) and the Journal of International Academy of Hospitality Resources (seventeen titles). The most popular electronic journals on CICNET are ones in the categories of politics, computing, computer underground, news, library, education, literature, culture, and miscellaneous. A survey of scholarly electronic texts on OLUC revealed eighty-eight records for Internet-accessible items. There are more than 1600 machine-readable texts in the Rutgers inventory, over 158 in the University of Virginia Library's Electronic Text Center, over 350 in Project Gutenberg, and about ninety in Internet Wiretap.
There may be tens of thousands of these texts in existence, but only a small number have been cataloged. Most of these were selected for cataloging because of the library community's familiarity with them. Consequently, important subject areas such as psychology are not represented. Librarians must find where these resources are located and determine how to provide access to them.
The OCLC Office of Research Activities is investigating a prototype system for creating record descriptions. By mapping TEI headers to MARC, it is possible to capture electronic texts and their descriptions worldwide, then create customized descriptions from this information.
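A minimal sketch of what mapping a TEI header to MARC might look like, under several assumptions: the header is written in XML syntax as a stand-in for SGML and parsed with Python's standard xml.etree.ElementTree; its content is made up; and the crosswalk table is a hypothetical example, not OCLC's prototype. Only the general meanings of the MARC fields used (100 main entry, 245 title, 260 publication) are standard.

```python
import xml.etree.ElementTree as ET

# Made-up header content; not a complete or validated TEI header.
HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Sample poems: a machine-readable transcription</title>
      <author>Doe, Jane</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example Text Center</publisher>
      <date>1994</date>
    </publicationStmt>
  </fileDesc>
</teiHeader>
"""

# Hypothetical crosswalk: header path -> (MARC tag, subfield code).
CROSSWALK = [
    ("fileDesc/titleStmt/author", "100", "a"),
    ("fileDesc/titleStmt/title", "245", "a"),
    ("fileDesc/publicationStmt/publisher", "260", "b"),
    ("fileDesc/publicationStmt/date", "260", "c"),
]

def header_to_marc(header_xml):
    """Return (tag, subfield, value) triples for header elements that are present."""
    root = ET.fromstring(header_xml)
    fields = []
    for path, tag, code in CROSSWALK:
        element = root.find(path)
        if element is not None and element.text:
            fields.append((tag, code, element.text.strip()))
    return fields

marc_fields = header_to_marc(HEADER)
```

A table-driven crosswalk like this keeps the mapping decisions visible to catalogers, who can revise the table without touching the code.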
The Office also supports cataloging standards development. Currently, MARC records for electronic resources are being identified and analyzed to track 856 field usage and the need for deeper level subfielding in the 500 tags. Holdings data and access fields are being monitored to track the validity and currentness of location and access information contained in cataloging records.
It is important to know if organizations are limiting holdings information to resources maintained or archived by the institution and how this affects access. One solution for increased access is to create machine-derived descriptions for recurring classes of items such as FAQs, library catalogs, discussion lists, electronic journals, and newsletters. More evaluation of data in resources is needed to determine the success rate for services handling search requests, the number of links in HTML documents, and the kind of data maintained and its location. We should explore the functionality needed to create electronic resources descriptions (MARC records, TEI headers, URLs). Automatic indexing should be investigated for mapping at a broad level. The concepts of completeness or thresholds must be addressed. Consideration should be given to providing data on the level of quality of a record so that the user can specify what level of completeness he requires.
Erik Jul, OCLC--U.S. Department of Education Grant Project
A grant by the U.S. Department of Education has been awarded to OCLC to build a catalog of Internet resources. At the end of the grant period, October 1, 1994 to March 31, 1996, OCLC will host a symposium to report on the project and describe any plans for the future. The goal is to implement, test, and evaluate cataloging of MARC format records for Internet-accessible resources. One particular area of interest is determining the efficacy of the 856 field for providing location and access. OCLC is soliciting volunteers to identify, select, and catalog Internet-accessible materials. OCLC anticipates that the project will serve as a catalyst for facilitating the discovery of electronic resources and increasing cooperation among the creators of electronic files, librarians, etc. OCLC will serve as a clearinghouse providing cataloging guidelines and support. Progress reports will be posted to Autocat and PACS-L.
Seminar on Cataloging Digital Documents
Friday morning Oct. 14, 1994
Panel Discussion with Edward Gaynor, Susan Hockey, Lynn Marko, Joan Swanekamp, Diane Vizine-Goetz. Sarah Thomas, moderator.
Diane Vizine-Goetz - We need to look at the characteristics of items, study the hard data, and make decisions.
Susan Hockey - There should be a link from the description of the material to the item itself with instructions in the catalog about how to use the link.
Lynn Marko - We need to put on a hat other than that of a librarian's. Who should manage the description? We decided that we're not part of the process and now we're out of the process. We look at the chaotic way things are done and give them order. We have to come to terms with dealing with the ambiguity.
Edward Gaynor - We're not part of the process but should be because librarians are the ones who make decisions about organizing. We need to cooperate with the technical people. The catalog record as we think of it today is not necessary. But something is needed to give access to information.
Joan Swanekamp - Electronic resources are on a campus-wide network. The decision may be made to purchase an electronic version of an item instead of another hard copy. If the material is not linked to others in the catalog, we've done the patrons a real disservice. We need to consider the role of the printed work.
Sarah Thomas - The Library of Congress should be thought of as a database provider. Catalogers need to take a more active role in providing access.
Diane Vizine-Goetz - Machine-derived preliminary records are one solution that OCLC is examining.
Edward Gaynor - Very few academic institutions are involved in repeatedly perfecting cataloging until it is just so. That doesn't mean that the work isn't valuable. They've finally bought into cooperating with others. There are trade-offs.
Sarah Thomas - We have to determine what the standards or thresholds are. What is the best use of resources?
Susan Hockey - It's time-consuming to catalog electronic resources. The equipment used with them is expensive and if they're not properly installed, technical support is needed. CD-ROMs, disks, etc., are all very different.
Sarah Thomas - Cataloging requires an investment of time and effort. We should focus more on training and documentation to support cooperative cataloging. We should focus on testing records. Instead of discussing distribution, we should be discussing sharing models. What do we need to describe the resource? A nontraditional description and access method such as SGML? A tiered model with MARC and AACR related to SGML and TEI?
Joan Swanekamp - At Columbia University we are purchasing a large number of electronic resources, particularly in CD-ROM format. We have site licenses for some so that they can be put on the LAN. It's hard to know what the best model is. It would be preferable to have a different view that would allow linkage to the printed edition rather than recataloging for the electronic version.
Susan Hockey - Originally, groups within the scholarly community encouraged cataloging staff to provide this kind of information. These groups had no means of their own for describing things.
Edward Gaynor - We can't assume that we have to do everything the way we do now. Although there's going to be a certain amount of status quo, we have to consider how we'll expend our resources and find a better way to do things. We have to find a new way for resources to carry with them the descriptive information from the publisher, creator, or cataloger.
Joan Swanekamp - We need a new definition of what edition means in a digital environment. A definition is needed for multiple versions.
Sarah Thomas - We have to keep the discussion alive on Autocat and Emedia. We need annual meetings with catalogers and computer specialists to discuss intellectual access from different perspectives. We should try the tiered model approach to solutions, i.e., discipline-based, audience friendly. We need to form information alliances or information user groups and focus on cooperation more. We need programs in library schools that prepare future librarians to deal with information systems. We are bound by what we think data is and by what constitutes bibliographic description. We need to think electronically and come up with a new descriptive structure. We need to evaluate how well the records we create succeed in getting people to the material. Do we need hypertext links to the actual items? The user doesn't want to leave the bibliographic record to get into a separate system to access the material. We tend to be a rules-based profession that is not flexible.
EG = Edward Gaynor
SH = Susan Hockey
LM = Lynn Marko
JS = Joan Swanekamp
DV = Diane Vizine-Goetz
SET = Sarah Thomas, moderator
SET = Talk at least an hour. Come out with a sense of direction for LC or others.
Flip charts for small groups for draft action plan.
Q = Traditional catalog has provided access to what is at a particular location (library). Does it make sense to catalog things on the internet available everywhere? Should this be in the local catalog?
LM = CIC is a very strong agent for library cooperation in the Midwest (regional cooperation). Linking together the catalogs of the Big Ten, Chicago, Penn State, etc., using Z39.50.
A 1200 mile campus. Local cataloging is beginning to be looked at in a different way. Beginning to link webs of catalogs in a very transparent way. Do not look at local resources as local resources. Need to provide reasonable access to what is needed.
EG = Yes and no. Don't use the term "cataloging" for this stuff, provide some sort of intellectual access to stuff available on the internet. We should catalog the resources as part of the resources, can display sensibly so local users can find it. Catalogs and records we have now will not go away anytime soon. Various formats will always be here. MARC records will always be here. New resources may carry new type of access.
Q = Explain local to determine if it's valuable.
Lynn= Same as a book. Electronic journals not of scholarly interest, not sure we would be cataloging these things. (a little blue). Just as we don't buy English pulp novels. They have staff that peruse what is out there. In the University of Virginia they have bibliographers. Lynn has other/bibliographers to select this.
Susan = A whole lot of new issues. All the scholarly issues same as print media. For electronic, consider how people will be using it. You can read much of it, but if not marked up that is all you can do. 10 different electronic versions of a Shakespeare version. Intellectual content of original is the same; search engine controls intellectual content, because of indexing.
Q = Issues of ownership get in the way. We share data all the time, because of our collections. To determine who is cataloging the data. Many, when you go to find them, are no longer where they were cataloged. In this country, sharing resources, back to 1902 in LC. We should continue to use this. Catalog files you have, that they think are worthy in the network.
SET = Do we need much closer ties with creator and cataloger?
Helen Schmierer= Not all the files will be as volatile as the files we are seeing now. As they get integrated into the scholarly process.
Daniel Pitti: Where should the cat. record reside? Local responsibility for what is held locally. People still have to meander from catalog to catalog. Should have 1 entry into internet to find out where these reside.
SET asked Eric Jul if that is what the OCLC project is.
Eric Jul = Much of what the current internet world lacks today is everything libraries have brought to bear on their collections. Principal... Institutionalized stability (CICnet, etc.) will lessen the problems. There will be an archive of electronic resources.
Libraries will begin to provide access to these materials. Some of these problems can begin to be addressed. A single point of access is a bit much to hope for. An institutional commitment to maintain and provide access to their own students, also commitment... to the wider community. As widely available as any other item in a union catalog.
Fran Miksa = Helen mentioned volatile files. We should not say volatile files are less valuable. Volatile, not-very-well-organized files are very valuable, especially to the scientific community. We have to include these in the cataloging structure.
Q = Potential to link directly from catalog record to the file. Major concern is to get the patron to the materials. Think of what the patron needs.
Diane = We're going to use OCLC project to study volatile files -- We can give you some statistics on how volatile they are, how usable.
Susan = One has to get from the description of the material to the material. We may not have the right software to use the thing. Provide enough info. so patron will know if he will be able to use it.
SET = Is a catalog record useful or necessary to electronic text? Does ... registration of electronic text obviate the need for a cat. record? A group on each side?
Lynn = Caution: If you put on a non-electronic librarian hat, who will manage, describe, provide access to these resources? If we're not part of the process, we will be out of the process. Try to put some order in our universe. By the time you get a record up you have to take it down again. That is part of the new order.
EG = Librarians make organizational decisions. Computer people provide technical support. Something is necessary, but it may not be a MARC record in current format. Patron will want some sort of précis, not look at 500 screens looking for the right thing. Patrons want what they want.
Q from David Levy, computer scientist. Lots of people suggest grand and glorious tools. Today things are so unstable and unclear. What new institutions will grow up? There can be no clear answer at the moment. Partly a political question. Solutions will be imposed by the people with the power & the money. What comes will be somewhere in-between. Lots of things computers cannot do. You librarians do lots of things behind the scenes to bring about order. Finding what you want in the electronic universe will not be easy (quick) without librarians bringing order. Bring together people from different disciplines. Input of scholars, librarians, computer folks.
Q = think of cataloging as monolithic. Tiered kind of access. Finding lists. vertical file. Think of internet as categories of different things that need different kinds of access.
Joan = Problems at home: Putting electronic resources on campus-wide network. When we pay money to acquire things, we try to put them in our catalog, even electronic things. What is the role of this particular work? Evaluate these materials, make some decisions.
Michael Shapiro = MARC record is an access into finding things. As an index, could be distributed on the network. Cat. maintained locally, distributed on a net. Search this internet catalog. Get information on where it is. Use internet to distribute the catalog.
Howard Harris = Integrate these records into local. Will we see a commitment to distribute records, or make them proprietary? 1 point of access, not customized to local use, collocated locally?
SET = We have not begun ...Seminar to expand the horizons for our own staff. We need to be providing access to this info. We're not doing a good job of providing access to our own resources. Marvel is getting to this point. Cat. could take a much more active role in providing access. We're not cataloging electronic texts right now.
Eric Jul= Putting the record in a proprietary system: Sharing of the record is primary purpose of union catalog. Any record should be obtainable (usable) at a local place. Ability to be found and shared is separate from system of creating and maintaining it.
Q = Anka Gray = concerned about local access to things we don't have. If we purchase it, it must provide access. Books, etc. still coming in, catalogers disappearing. We don't have resources in providing access to internet. We need cooperative effort. All goes to one central location. Need a mega-catalog of Internet resources.
Q= it's strange to hear people putting up access vs. cataloging. making files is what we do. Item file is the things themselves. Surrogate files because item file is too big to go through. surrogate file is what we make to focus in on essential things. No way we can access without surrogate files. Have the best kind of surrogate files.
Diane = machine derived preliminary records. Make them available to people to update and change. Who is interested in that activity on an experimental basis.
Q = Item file vs. surrogate files. When do s-files get created, by whom?
Q = Per electronic journals. Will be completely eclipsed by article indexing
SET = When there already is that rich index, that is true.
Q = Do a search in the future, certain things come back - in the process, search engine will create subject file for items which don't have them, sent to human catalogers to look over. How things could change.
Lynn = Knowledge navigation is a major task at the University of Michigan. When search has finished its course, if it comes back to the lib.--has data of interest to more than original searcher.
Q = a price for cataloging internet resources.
EG = It's bothered me - cooperative cataloging - in practice there are very few large libraries that really do this. UVa does a lot of local processing. Not something we should discard. We are finally buying into cooperation, other libraries do good work. There is a trade off. Accepting this cooperation frees up resources to provide access to materials that have no access. Why should we touch LC MARC records for a large percent of the items we buy?
SET = Struggling with the adjustment to using copy cataloging. How we can better use our resources. Very tight standards on some classes of materials, we have ignored new forms of materials. We need to consolidate our resources. Good not just for us, but for the nation.
Susan = Cataloging electronic text is a time consuming operation. Need to load it up, play with it, to see what it is. Need a lot of tech. support to create a good cat. record. They are all different.
SET = Once the cat. has invested that much effort, the public service people need the same sort of involvement.
Q= Bill Anderson = Need to pursue a better method of distributing records.
SET = concentrating more on getting program in order - training. Test of core bibliographic record. Members will share records with each other and their utilities. Models still under discussion.
Q = Not hearing one of the issues = Do people think there has to be a full and complete description - maybe from 1 or 2 sources. Do we really need this? Who would do it.
SET = what do we need for bibliographic record? Is there a shared model? Can TEI and SGML help?
Q= Will be used multiple times, simultaneous users. More effort into things that will be used often, less into those not used much
Joan = Columbia University is buying a lot of electronic resources. Lots of money. Cataloging, because necessary, after spending all this money. We would consider another view - link to original work from which derived. Investing heavily - grant money. Hiring a person to deal with electronic resources. Catalog directly on OCLC. All records will be on OCLC and RLIN. Set with hundreds of works - doing analytics for all works in the set.
Lynn = Electronic texts - enormous amount of money to catalog - you need to know which version of original you are dealing with.
Susan = Creators of these things. These things come from groups in scholarly community. Know very little about bibliographic control. We need to talk them into providing this kind of information. They cannot describe, they don't know how.
SET = Is not the germ of these projects a grant? Work with foundations to emphasize, put into the proposal, they must have description to get grant.
Susan = Humanities projects are often one person project. Very difficult to get them to do documentation.
SET = David's question. Not going to be centralized. We at LC cannot catalog all of these resources, neither can Canadian National Library.
Q = CIP program, would it work for electronic publishers.
John Celli = Absolutely, it would work. University press people using electronic CIP. Those presses least inclined are regular trade presses. Publishers losing opportunity to provide whole new product to sell.
Regina Reynolds = ISSN program. You need a critical mass. Commercial publishers will follow when there are enough university presses using, and libraries asking for this.
John Celli = A lot of hazards, publishers see liabilities. Random House (Del Rey) science fiction. Editors like paper manuscripts so they can work on the bus.
Q = We haven't figured out the models to do these things. (Sally McCallum) This needs to be worked out.
Beth Davis-Brown = Political economy of cataloging - Catalog the same materials over and over - Once for hard copy, once for electronic version, while not providing access at all to many materials. This is not increasing the "world's bibliography" by cataloging materials to let people know they exist.
Joan = Gone ahead and bought the network version, after we cataloged CD-ROM version. Records look very different.
EG = Have to expend resources finding a better way to deal with this.
Lynn = Can I buy your records?
Joan = OCLC can sell a block, like major microform sets.
EG = Can I get someone to select records for me if not on major grants.
SET = a way to get records when items are acquired.
Q = If we stick with physical description, we will get very tired.
Q = Our patrons are crying for this kind of access.
SET = questions are unbelievably rich. Gets synapses sparking.
Action item. What to do with questions we haven't answered yet. Issues we have raised this morning. Groups:
Focus on priority issues. EG is moderator
Magda El-Sherbini wants follow-up in 4-6 months to what this group will be doing in the future. All are in different points along the road. Second person wants also. List serve for group.
Satellite video conference.
A way to summarize. (by Dec.)
Bao-chu Chang wants to know where the experts are. And know where they are along the road.
Survey questions to know answers too. - To come out of LC after the meeting
People have opportunities to visit sites to see what others are doing, the more advanced. Maybe a good video.
White papers would be excellent
Responses to our own ignorance. Ed. in Oct. issue of LIRTS.
Educational component.
SET & EG - ALCTS could sponsor a seminar like this at some meetings. Site visits at other places. Institute. Broad survey of all projects.
A more formalized structure to look at some of these issues.
CETH and MARBI formal steady contact (liaison). TEI in there, also Internet Engineering Task Force. CIP program.
A structure to address issues and guidelines. A committee structure. Like interactive multimedia guidelines?
Need more discussion on when to do it. Get ACRL, collection development issue.
More than ACRL - more than academic libraries. ACRL, ALCTS. Archivists.
Follow traditional cataloging guidelines, or intellectual access guidelines. Who should investigate. Who will do it and how?
A committee to look at user patterns.
What does it mean to have the item in hand is a basic issue. PCC has a committee on this, has not met. Barbara Tillett is chair. ALCTS committee on scholarly communications.
Susan Morris suggests survey to get info. -- Cornell - very hard to get all this on a survey.
White papers on what seems to be the problem. Internet is not all of it.
EG: on to particular things on list. 1) policy and guidelines: on cataloging digital documents. We have not defined the problem. Do we really need to catalog this stuff-other access may be available.
SM says we've abandoned AACR2, EG says we are tied to it, and it is not adequate.
Gophers are lists, function as bibliographies.
Bao-chu - whole issue of cat. internet resources is still unsettled.
Suggested options.
Need to involve vendors of library systems? System vendors.
Mode of electronic text production we see today will not be what happens 10 years from now. Methods of distribution may be radically different. Keep our eyes open, don't rush to judgement. EG says, let's get going.
How many of these things are we talking about - is this critical mass, or a beginning trickle? How many are "published?" Definition of publishing may need to change. Publish also means distribution. Transmission. Copyright - Loading an electronic object into RAM means the act of copying.
EG wants guidelines. We need to start. If we start working we will find out what the problem is. Involve systems librarians.
Do we need to make MARC more compatible with SGML, other systems. Who could do this? Do it as a community, LC should do this as custodian of MARC. Need TEI people also.
What's our role here as librarians? LC should not take all responsibility. LC, utilities, etc. TEI should be involved.
Policy or guidelines -- Mapping should be investigated. TEI may not be the wave of the future. Most TEI headers have been done by libraries, says EG.
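As an illustration only (not something presented at the seminar): the mapping between a TEI header and a MARC record raised here could be sketched roughly as below. The sample header, the crosswalk table, and the function names are assumptions made for the example; only the header element names (titleStmt, publicationStmt) and the MARC tags 100, 245, and 260 are standard. A real mapping would have to settle many questions this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TEI-style header; real headers carry far more detail.
SAMPLE_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Hamlet: an electronic edition</title>
      <author>Shakespeare, William</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example Text Center</publisher>
      <date>1994</date>
    </publicationStmt>
  </fileDesc>
</teiHeader>
"""

# Assumed crosswalk: header path -> (MARC tag, subfield code).
CROSSWALK = [
    ("fileDesc/titleStmt/title", "245", "a"),
    ("fileDesc/titleStmt/author", "100", "a"),
    ("fileDesc/publicationStmt/publisher", "260", "b"),
    ("fileDesc/publicationStmt/date", "260", "c"),
]


def header_to_marc(header_xml):
    """Return {tag: [(subfield, value), ...]} for header elements that are present."""
    root = ET.fromstring(header_xml)
    fields = {}
    for path, tag, code in CROSSWALK:
        element = root.find(path)
        if element is not None and element.text:
            fields.setdefault(tag, []).append((code, element.text.strip()))
    return fields


def format_fields(fields):
    """Render the fields as simple 'TAG $x value' display lines, sorted by tag."""
    lines = []
    for tag in sorted(fields):
        subfields = " ".join(f"${code} {value}" for code, value in fields[tag])
        lines.append(f"{tag} {subfields}")
    return lines


if __name__ == "__main__":
    for line in format_fields(header_to_marc(SAMPLE_HEADER)):
        print(line)
```

Keeping the crosswalk as a plain table rather than code is deliberate: it is the part the communities named above (LC, the utilities, TEI) would need to agree on.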
Networking with producers of texts. -- ALA groups already meet with publishers. Role for LC, utilities, libraries. Back to a project.
SM says utilities have a better access, credibility, instant access.
EG says LC has a responsibility in light of CIP program. Do more than make MARC records, at least investigate this.
Text producers, not necessarily publishers, scholars creating databases. CETH might help with humanists. National Science Foundation. Requirement for getting money - foundations.
4) research on how texts are used. Who would do? When? How?
library schools do a lot of research. - area for who
OCLC project is going to go on. How do we get this information to library schools. ALISE is their organization, approach them. Librarians can do research. Sociologists probably out there researching how people use this info. There may be, should be international effort. IFLA.
Needs for training: Need to know what we're training, and who: people with money, senior management, technicians. Do we need training or education? Collections development, catalogers; Bao says there are other levels out there for administrators, etc.
Exposure requires certain kinds of equipment. Vendors, publishers, scholars, legislators, Call your congressman.
Out of group, back to big discussion.
SET = Plan of action: each group presents its results. 10-15 minutes per group.
Try to have proceedings. Speakers' papers, or condensed versions. Up on Marvel, or electronically. Maybe add photographs.
Will be mark-up of today's proceedings. Volunteers? SGML
Transcribe flip-charts.
No way to make a plan of action today. A small group of volunteers to work together on action plan.
Lynn =
Discussion should be ongoing - electronic, Emedia, annual meetings of people from various communities, presenters from different perspectives.
Tiered model approach - user friendly.
Information alliances -- building bridges between what we know & what we do, user specific information resources, enormous study going on within groups.
Cooperation - multiple cooperative, discipline rather than format based.
Preparation for the future, what to teach. Engineering needs to be interjected.
Teach new librarians internet, user-friendly interfaces.
In this cooperative model, library is not necessarily the driver. Discussion about MARC. Are we MARC-bound? Make our data more efficient. Adapt MARC to new discipline structures - think more electronically.
MARC search engine design - there is a lot of redundancy in MARC. Electronic exile. --into a separate structure.
Resistance to electronic access.
How will the records we create get people to the material?
Training
Non-mediated service model.
EG =
see previous pages and flip charts.
Diane =
Electronic objects, what were they? (see list on flip chart)
FAQ = frequently asked questions
We don't know what users' needs are for any one of these objects.
What belongs in the bibliographic record? Copyright. Should publishers (creators) provide basic information on who they are, what they have done?
What is intellectual work? What is it we catalog? We don't know what record structure we need.
Diagram to show where we can use automation. When electronic object is created, automation should extract some information for the record. Create some kind of record creation from this extracted data. Fuzzy - link record into some type of registry data base.
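As an illustration only (not part of the group's diagram): the idea of automation extracting some information when an electronic object is created, and producing a preliminary record for a human to complete, might look roughly like this. The "Title:/Author:/Date:" labels, the record layout, and the function name are all assumptions made for the example.

```python
import re

# Assumed labels that a creator might place at the top of an electronic text.
LABELS = ("title", "author", "date")


def extract_preliminary_record(text, max_lines=20):
    """Scan the first few lines for 'Label: value' pairs and build a draft record.

    Anything not found is listed under 'needs_review' so a cataloger can supply it.
    """
    record = {"status": "preliminary"}
    for line in text.splitlines()[:max_lines]:
        match = re.match(r"\s*(\w+)\s*:\s*(.+)", line)
        if match and match.group(1).lower() in LABELS:
            # Keep the first occurrence of each label only.
            record.setdefault(match.group(1).lower(), match.group(2).strip())
    record["needs_review"] = [label for label in LABELS if label not in record]
    return record


if __name__ == "__main__":
    sample = "Title: On Concordances\nAuthor: A. Scholar\n\nThe text begins here."
    print(extract_preliminary_record(sample))
```

The "needs_review" list stands in for the hand-off to human catalogers; linking the draft into a registry database is left out of the sketch.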
Joan =
Discussion all over the place.
Identify problems
New models - Cataloging community involved in development of them
funding, where coming from.
What we should be expecting from electronic resource publishers.
How to divide up the work.
Centralized vs. distributed access?
Cataloging rules, what we have are not sufficient.
Where will cataloging data come from?
Trying to define our model. Let's begin with the end user. What can we do to meet their needs. OPACS to search a number of sources. Define core data elements, where would they come from, where stored.
Issues of location in this model. Availability, maintenance, standards, transportable.
Issues of authority control go with version control.
Linkages - which ones are valuable?
What we need to do next - pursue standards, get various communities together that are involved in these materials (list on flip chart).
SET = an incredible wealth of ideas.
We don't have to do it single-handedly. Most inspiring feature of a meeting like this.
Group bound together
Issues on flip charts, notebooks, heads.
This has developed into something larger and more valuable than (what are we doing with 856, 538)
Issue call now: Have those interested in continuing work on these issues, a more coherent plan, declare yourself in some way, email SET or Beth Davis-Brown. Letters. Papers. Come together, electronically or at ALA.
Sponsors, thank University of Virginia. OCLC material support. All the presenters. LC staff.
Group Reports of Action Plan
Lynn Marko
*Ongoing discussion - Emedia, annual meetings
*Tiered models approach to solutions
*Informatics - development of information systems; informatics alliances that build a bridge among librarians, specialists, and users
*Cooperation - multiple, discipline-based rather than format-based
*Training - for librarians and future librarians; training in engineering information systems, Internet and web setup, interface design, etc.
*Reduction of proprietary software on Internet
*MARC search engine design - examine redundancies
Edward Gaynor
*Follow-up - eMail lists, video conference proceedings, more information on expertise, ALCTS roadshow, white paper on some of the issues
*Need to see technology on site
*Survey of existing electronic projects
*Formal structure needed to deal with some of the issues
*Liaison between relevant organizations such as TEI and MARBI; networking - example: ALA and text producers
*Collections development
*Intellectual access - user surveys
*Projects outside of North America
*Cataloging issues - adaptation of MARC record, levels of treatment
*Publishing issues - distribution, fees, vendors
*Mapping MARC to SGML - utilities, Library of Congress, TEI
*Research on use of resources
*Education of librarians - catalogers, collections development staff, etc.; knowledge about equipment and technology; work with vendors, publishers, and scholars; work with legislatures to get more funding
Diane Vizine-Goetz
*Electronic objects - essential elements: databases; electronic journals (discussion lists); gopher ftp sites; FAQs; computer source code; image data; electronic texts; multimedia (hypertext)
*Elements of the record - users' needs; copyright; authentication; attribution; intellectual work; computerized extraction of data; links to deeper levels
Joan Swanekamp
*Problems and new models - involve cataloging community and users
*Sources of funding
*Liaison with publishers - needs and how to convey to them
*Centralized versus distributed access
*Cataloging rules - not sufficient; determine what should be cataloged
*Accessibility - interoperability of systems; layers; location of data; maintenance
*Core elements - vocabulary; links
*Standards - involve vendors, publishers, librarians, etc.