PCC Participants' Meeting Summary
ALA 2007 Annual Conference
DC Convention Center, Room 201
Washington, D.C.
June 24, 2007
4:00-6:00 p.m.
PCC Chair Mechael Charbonneau (Indiana University) opened the meeting with
the introduction of Chair-Elect Rebecca Mugridge (Pennsylvania State
University). The Chair's program review included the introduction of new
institutional members: The Frick Art Reference Library (BIBCO); Pennsylvania
State University and the University of Pennsylvania (CONSER); new NACO
funnel projects in New Jersey and Utah with independent members in South
Carolina, Mexico, and India; and several new SACO-only members in the
United States and Canada. This year's statistics mark a milestone. The
LC/NACO Authority File now contains seven million records. NACO members have
contributed 39% of that total. LCSH, the largest subject authority file in
the world, now has 308,000 subject authority records, of which 14% are from
PCC members.
As part of the program review that went into
PCC2010: Planning for the Future, a
revised governance document has been posted to the PCC Web site; the
tactical objectives and action items are near completion.
Charbonneau introduced several new faces to the meeting: the recent Policy
Committee elections have brought in Robert Ellett (Joint Forces Staff
College) as a BIBCO representative; David
Banush (Cornell University) for CONSER; James Mouw (University of Chicago)
for NACO; and David Miller (Curry College) as the first SACO representative.
Charbonneau also discussed with members recent initiatives. An Ad Hoc Task
Group on Series Authorities is forming under co-chairs Les Hawkins and
Carolyn Sturtevant, the CONSER and BIBCO Coordinators. The Standing
Committee on Training is beginning two task groups on medical cataloging
and on map cataloging.
Finally, Mechael Charbonneau, as Chair of the PCC, has been invited to
present remarks to the third session of the LC Working Group on the Future
of Bibliographic Control. This will be held July 9, 2007, at the Library of
Congress. Charbonneau invited members to send her comments for inclusion in
her remarks by that date. Remarks may also be made directly to the LC
Working Group by July 15, 2007. Robert Wolven (Columbia University), a past
chair of the PCC, is a member of the LC Working Group.
Just as the chair recognized new members and officers, she delivered
similar recognition to outgoing officers and individuals who had made
significant contributions to the PCC. Certificates of appreciation were
given to outgoing standing committee chairs, Policy Committee representatives,
and to those who had worked on the CONSER Standard Record and the
third edition of the SACO Participants' Manual.
The meeting ended with a reiteration by both Mechael Charbonneau and guest
speaker Jay Girotto of their willingness to hear comments from interested
parties about the July 9, 2007, meeting of the Library of Congress Working
Group.
Rebecca Mugridge introduced the guest speaker for the meeting, Jay Girotto,
a member of the LC Working Group on the Future of Bibliographic Control and
a Microsoft vice president.
Mr. Girotto began his remarks on the cataloger's contribution to the next
generation of information discovery with comments on how he had become
interested in information discovery through his college research. He spoke
also of his work on the LC Working Group on the Future of Bibliographic
Control. He noted a particular difference between the librarians on the
working group, who approach issues through standards development, and
himself, who sees things from the viewpoint of the end user.
There are currently four key themes in the information industry: information
discovery; consumer behavior; the transformation of the industry; and
institutions. Within these themes one finds three main user groups: the
broad-based information seeker; the professional researcher; and institutions.
Information discovery is the catalyst of search technology. It can be a
disruptive technology, driving change across the industry. Some users rely on
social tools, some on professional-level tools, and others on
information-world tools. Mr.
Girotto's remaining remarks focused on things Microsoft is working on.
Search has become ubiquitous. On a daily basis, about half of all searches
are unique; about a quarter of all searches have never been seen before.
This search environment offers certain challenges. It is limited in scope.
It does not understand user intent in searching an ambiguous term. It
provides links, not information. It lacks long-term memory of a user's
search behavior; it has session memory only. On the part of the user, there
is a disconnect between searching and taking action on the search
results.
This takes place in a setting in which, whatever the public might think,
only 5% of the world's information is currently available online. Even with
digitization projects, electronic availability of Library of Congress
assets will only amount to 10% in the foreseeable future.
Monitoring use of Microsoft's Live Search Books product shows that many
searches are subject-based. Microsoft is trying to make this more effective
by mapping both subject headings and name authorities.
As an example of searching behavior and its problems, Girotto used the term
"Bush". Immediately, the user is faced with the question of which "Bush" is
wanted: Reggie Bush, a President Bush, or shrubbery? The use of "snippets"
helps: text taken from the Web site and added to the link that appears in
the search result.
The search system can refine a search result through the user's creation of
customized accounts. Fewer than 10% of users who have such accounts, however,
invoke them when searching. Microsoft is looking to algorithmic patterns
and cookies to provide a user profile without requiring the person to log in.
Finally, there is the problem of taking action with the search results.
Live Search Academic and Citation Export find the information--not simply a
link--and import it directly into the work of the searcher.
The ensuing question-and-answer session brought out further points.
Search engine optimization can be achieved through Web page design. When a
crawler comes by, it navigates through the entire site map and pulls out
metadata and links. This affects ranking in search results. Some sites,
such as American Memory, are very hard to crawl deeply.
In response to a question about "search profiling" and privacy issues, Girotto
said that the system reads the collection of cookies that forms during a
search session. It refines search results as the session continues. As
search memory is lost at the termination of a session, individual privacy
is preserved. The system, however, uses the experience to build future
search results to similar search strings.
Language issues are a challenge for Microsoft. There are no products yet to
translate query terms or content. Dealing with multiple languages is easier
if the user is logged in to allow development of a profile or preferences.
Two people in Girotto's working group have library backgrounds: the
Microsoft librarian, who has a library science degree, and the member who
handles technical computing issues, who came from an academic library
background.
Microsoft is heavily invested in issues relating to non-text retrieval.
Audio and video are easier. Images are harder. This entire area is in the
early stages, relying on user-based cataloging such as Flickr. One drawback
is that tagging and searching still use words rather than using an image to
find an image.
On his experience thus far with the LC Working Group for the Future of
Bibliographic Control, Girotto said that it is very interesting seeing the
work done with standards, how they change, and how they serve the needs of
user groups. One of the surprises in the Working Group has been about
understanding the user base. Many submissions are more about institutions
and standards and not so much about users' wishes and needs, the central
focus for commercial enterprises. For the upcoming third meeting on July
9th, he has been thinking about data silos: how can we use typical library
data in new ways? How can we use the data currently provided to build better
services? We spent centuries building this information, and we want to use it.
In response to a question about using LCSH terms for searching, Girotto said
that Microsoft is trying to accommodate users who are trying to navigate
information on a subject level. They are even interested in reclassifying
older material using the subject headings and classification from more
recent books on a topic. They use the actual subject terms, not just
keywords. This is a difference in approach from Google. Google is more
interested in analyzing individual, digitized texts. Microsoft has done
research into cross-mapping different thesauri. They bought Medstory to add
to Live Search. And they are adding new search vocabularies to their system,
even across language lines.
Name authority records are as useful as subject headings. Microsoft is
experimenting with authority records to enhance searching. They are useful
for name disambiguation across myriad documents. Others are doing research
on making name authority records more useful for searching. It is vital to
know when you have something unique.
The business model for the work done by Microsoft and Google puzzles many
observers. The companies exist to create products, yet in one sense they are
giving this work
away. Will libraries end up paying Microsoft for access to all the data
they've given Microsoft? Microsoft contracts allow for works to be shared
for academic purposes. Microsoft looks on its work as building up Web
content that attracts users of its products and keeps users coming back.
Their profit is made in commercial products available through the web. It is
true that there is a market for metadata itself, but Microsoft has not been
thinking about entering this market. Microsoft prefers to target content that
will enhance user experience; hence its contracts to digitize library
collections.
Search results through Microsoft and through Google differ due to differences
in user traffic and use of links when searching a certain term. Microsoft is
looking at query logs and unanswered queries, to gauge how long people stay
in certain sites and which sites they do not use. All of these affect future
displays of search results for that term.