PCC Participants' Meeting Summary
ALA 2007 Annual Conference

DC Convention Center, Room 201
Washington, D.C.
June 24, 2007
4:00-6:00 p.m.

PCC Chair Mechael Charbonneau (Indiana University) opened the meeting by introducing chair-elect Rebecca Mugridge (Pennsylvania State University). The Chair's program review included the introduction of new institutional members: the Frick Art Reference Library (BIBCO); Pennsylvania State University and the University of Pennsylvania (CONSER); new NACO funnel projects in New Jersey and Utah, with independent members in South Carolina, Mexico, and India; and several new SACO-only members in the United States and Canada. This year's statistics mark a milestone: the LC/NACO Authority File now contains seven million records, 39% of them contributed by NACO members. LCSH, the largest subject authority file in the world, now has 308,000 subject authority records, of which 14% are from PCC members.

As part of the program review that went into PCC2010: Planning for the Future, a revised governance document has been posted to the PCC Web site; the tactical objectives and action items are near completion.

Charbonneau introduced several new faces to the meeting: the recent Policy Committee elections have brought in Robert Ellett (Joint Forces Staff College) as a BIBCO representative; David Banush (Cornell University) for CONSER; James Mouw (University of Chicago) for NACO; and David Miller (Curry College) as the first SACO representative. Charbonneau also discussed recent initiatives with members. An Ad Hoc Task Group on Series Authorities is forming under co-chairs Les Hawkins and Carolyn Sturtevant, the CONSER and BIBCO Coordinators. The Standing Committee on Training is launching two task groups, one on medical cataloging and one on map cataloging.

Finally, Mechael Charbonneau, as Chair of the PCC, has been invited to present remarks to the third session of the LC Working Group on the Future of Bibliographic Control, to be held July 9, 2007, at the Library of Congress. Charbonneau invited members to send her comments for inclusion in her remarks by that date. Comments may also be made directly to the LC Working Group by July 15, 2007. Robert Wolven (Columbia University), a past chair of the PCC, is a member of the LC Working Group.

Just as the chair recognized new members and officers, she delivered similar recognition to outgoing officers and individuals who had made significant contributions to the PCC. Certificates of appreciation were given to outgoing standing committee chairs, Policy Committee representatives, and to those who had worked on the CONSER Standard Record and the third edition of the SACO Participants' Manual.

The meeting ended with a reiteration by both Mechael Charbonneau and guest speaker Jay Girotto of their willingness to hear comments from interested parties about the July 9, 2007, meeting of the Library of Congress Working Group.

Rebecca Mugridge introduced the guest speaker for the meeting, Jay Girotto, a member of the LC Working Group on the Future of Bibliographic Control and a Microsoft vice president.

Mr. Girotto began his remarks on the cataloger's contribution to the next generation of information discovery with comments on how he had become interested in information discovery through his college research. He also spoke of his work on the LC Working Group on the Future of Bibliographic Control. He noted a particular difference in perspective: the librarians on the working group approach issues through standards development, while he sees things from the viewpoint of the end user.

There are currently four key themes in the information industry: information discovery; consumer behavior; the transformation of the industry; and the transformation of institutions. Within these themes one finds three main user groups: the broad-based information seeker; the professional researcher; and institutions.

Information discovery is the catalyst for search technology. It can be a disruptive technology, making changes across the industry. Some users rely on social tools, some on professional-level tools, and others on information-world tools. Mr. Girotto's remaining remarks focused on work under way at Microsoft.

Search has become ubiquitous. On a daily basis, about half of all searches are unique; about a quarter of all searches have never been seen before. This search environment presents certain challenges. It is limited in scope. It does not understand user intent when an ambiguous term is searched. It provides links, not information. It lacks long-term memory of a user's search behavior; it has session memory only. On the user's part, there is a disconnect between searching and taking action on the search results.

This takes place in a setting in which, whatever the public might think, only 5% of the world's information is currently available online. Even with digitization projects, electronic availability of Library of Congress assets will only amount to 10% in the foreseeable future.

Monitoring use of Microsoft's Live Search Books product shows that many searches are subject-based. Microsoft is trying to make this more effective by mapping both subject headings and name authorities.

As an example of searching behavior and its problems, Girotto used the term "Bush". Immediately, the user is faced with the question of which "Bush" is wanted: Reggie Bush, a President Bush, or shrubbery? The use of "snippets" helps: text taken from the web site and added to the link that appears in the search result. The search system can also refine results through user-created customized accounts. Fewer than 10% of users who have such accounts, however, invoke them when searching. Microsoft is therefore looking at algorithmic patterns and cookies to build a user profile without requiring the person to log in.

Finally, there is the problem of taking action on search results. Live Search Academic and Citation Export find the information itself, not simply a link, and import it directly into the searcher's work.

The ensuing question-and-answer session brought out further points.

Search engine optimization can be achieved through Web page design. When a crawler visits, it navigates the entire site map and pulls out metadata and links, which affects ranking in search results. Some sites, such as American Memory, are very hard for crawlers to navigate deeply.

In response to a question about "search profiling" and privacy issues, Girotto said that the system reads the collection of cookies that forms during a search session and refines search results as the session continues. Because search memory is lost when the session terminates, individual privacy is preserved. The system, however, uses the experience to build future search results for similar search strings.

Language issues are a challenge for Microsoft. There are no products yet to translate query terms or content. Dealing with multiple languages is easier if the user is logged in to allow development of a profile or preferences.

There are two people with library backgrounds in Girotto's working group: the Microsoft librarian, who has a library science degree, and the member responsible for technical computing, who came from an academic library background.

Microsoft is heavily invested in issues relating to non-text retrieval. Audio and video are easier; images are harder. This entire area is at an early stage, relying on user-based cataloging such as Flickr. One drawback is that tagging and searching still use words rather than using an image to find an image.

On his experience thus far with the LC Working Group on the Future of Bibliographic Control, Girotto said that it is very interesting to see the work done with standards, how they change, and how they serve the needs of user groups. One of the surprises in the Working Group has been about understanding the user base: many submissions are more about institutions and standards and not so much about users' wishes and needs, the central focus for commercial enterprises. For the upcoming third meeting on July 9, he has been thinking about data silos: how can we use typical library data in new ways? How can we use data currently provided to build better services? We spent centuries building this information, and we want to use it.

In response to a question about using LCSH terms for searching, Girotto said that Microsoft is trying to accommodate users who are trying to navigate information on a subject level. They are even interested in reclassifying older material using the subject headings and classification from more recent books on a topic. They use the actual subject terms, not just keywords. This is a difference in approach from Google. Google is more interested in analyzing individual, digitized texts. Microsoft has done research into cross-mapping different thesauri. They bought Medstory to add to LiveSearch. And they are adding new search vocabularies to their system, even across language lines.

Name authority records are as useful as subject headings. Microsoft is experimenting with authority records to enhance searching; they are useful for name disambiguation across myriad documents. Others are researching ways to make name authority records more useful for searching. It is vital to know when you have something unique.

The business model for the work done by Microsoft and Google puzzles many. The companies exist to create products, yet in one sense they are giving this work away. Will libraries end up paying Microsoft for access to all the data they have given Microsoft? Microsoft contracts allow works to be shared for academic purposes. Microsoft views its work as building up Web content that attracts users to its products and keeps them coming back; its profit is made in commercial products available through the web. There is a market for metadata itself, but Microsoft has not been considering entering it. Microsoft prefers to target content that will enhance the user experience; hence its contracts to digitize library collections.

Search results from Microsoft and from Google differ because of differences in user traffic and in the use of links when a given term is searched. Microsoft is examining query logs and unanswered queries to gauge how long people stay on certain sites and which sites they do not use. All of this affects future displays of search results for that term.
