From OPAC to Personalized Discovery (From Aimless Searching to Personalized Knowledge Exploration)
Foster Zhang, Systems Integration, Digital Library Systems and Services
fjzhang@stanford.edu
The Problem
• Today’s library catalog no longer meets the expectations of users accustomed to Internet search engines. -- Eric Celeste, Associate University Librarian, University of Minnesota
• Only 30% of students turn to library websites when searching for scholarly information sources.
• Only 2% of students begin their searches at library sites. -- OCLC survey
Library Use Compared with Search Engine Use (LibQUAL ’05)
CNI Spring 2005 Task Force Meeting
What Users Expect from Library Services
• Start from one place to search everything
• Full text, please!
• Easy move from a citation to the item itself
• Systems that provide plenty of intelligent assistance
• Authenticated single sign-on
• Security and privacy
• No need to come to the library to use the service
Is a Portal the Solution?
Web portals are sites on the World Wide Web that typically provide personalized capabilities to their visitors. -- http://en.wikipedia.org/wiki/Web_portal
User’s problem: personal research interests vs. the world of knowledge
Library’s solution: classification and cataloging for library-owned materials
There is a gap between how librarians think and what users have in mind.
• Subject: Topic
• Subject: Genre
• Format
• Library
• Subject: Region
• Subject: Era
• Language
• Author
9. Availability
10. Library of Congress Classification
Technical Overview
• Endeca co-exists with the SirsiDynix Unicorn ILS and the Web2 online catalog.
• Endeca indexes MARC records exported from Unicorn.
• The index is refreshed nightly with records added or updated during the previous day.
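The nightly refresh can be pictured as a simple date filter. The sketch below is illustrative only: the slide does not describe how Unicorn exports changed records, so the record layout (an `id` and a `modified` timestamp) and the function name are assumptions made for this example.

```python
from datetime import date, datetime, timedelta


def records_changed_on_previous_day(records, today):
    """Return records whose last-modified timestamp falls on the day before `today`.

    `records` is a list of dicts with a hypothetical "modified" datetime key.
    """
    previous_day = today - timedelta(days=1)
    return [r for r in records if r["modified"].date() == previous_day]


# Made-up sample data, not taken from any real catalog.
catalog = [
    {"id": "a1", "modified": datetime(2006, 5, 1, 9, 30)},
    {"id": "a2", "modified": datetime(2006, 5, 2, 16, 5)},
    {"id": "a3", "modified": datetime(2006, 5, 2, 23, 59)},
]

changed = records_changed_on_previous_day(catalog, today=date(2006, 5, 3))
print([r["id"] for r in changed])
```

Only the records selected this way would need to be re-exported and re-indexed in the nightly run.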
Basic Architecture
[Diagram. Labels: Online System, Offline Process, Endeca Data Foundry, Endeca Navigation Engine, Endeca Presentation Server, HTTP, Raw Data Sources, Navigation Engine Indices, Data Foundry Configuration XML, Endeca Studio, Client Browser]
Step 1: Data Transformation (Endeca Forge / Data Foundry)
Activity: Transform data source(s) into RecordXML; set up the data pipeline.
[Diagram. Labels: MARC Records, MARC, MARCXML, RecordXML, Transform, Transform Rules, XSLT scripts]
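To make the transformation step concrete, here is a minimal sketch that flattens one MARCXML record into a list of (field, value) pairs. It is not the deck's pipeline: the slide names XSLT scripts and Endeca's RecordXML, whereas this example uses Python's standard library and a plain list as a stand-in for the target format. The function name and the sample record are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Library of Congress MARCXML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record for illustration.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Digital libraries</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Library catalogs</subfield>
    <subfield code="z">California</subfield>
  </datafield>
</record>"""


def marcxml_to_properties(xml_text):
    """Flatten a single MARCXML <record> into (name, value) pairs."""
    root = ET.fromstring(xml_text)
    props = []
    leader = root.find("marc:leader", MARC_NS)
    if leader is not None:
        props.append(("leader", leader.text))
    for control in root.findall("marc:controlfield", MARC_NS):
        props.append((control.get("tag"), control.text))
    for data in root.findall("marc:datafield", MARC_NS):
        for sub in data.findall("marc:subfield", MARC_NS):
            # Name each property as tag$code, e.g. "650$a".
            props.append((f'{data.get("tag")}${sub.get("code")}', sub.text))
    return props


properties = marcxml_to_properties(SAMPLE)
for name, value in properties:
    print(name, "=", value)
```

A flat tag$code naming like this keeps every subfield addressable, which is what the later facet-building step relies on.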
Step 2: Data Pipeline Editing (Endeca Foundry)
Activity: Create orthogonal facets from MARC fields; clean up data.
MARC input:
Record Header
008
650 $a
651 $a Geographic Subject
654 $a
656 $a
657 $a
6## $d, $y Time Period
6## $v Format
6## $x General, modifies $a
6## $z Geographic Subject
Facets produced by filters and scripts:
[RH CP6 + 008 CP23] Physical Form
[008 CP22] Target Audience
[008 CP24-27 + CP28 + CP33] Content Form
[008 CP35-37] Language
[650,654,656,657 $a + 6## $x] Subject - General
[651 $a + 6## $z] Subject - Geographic
[6## $d + 6## $y] Subject – Time Period
[6## $v] Format
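The field-to-facet mapping above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual Endeca filter scripts: the function name and the input shape (leader string, 008 string, list of 6xx fields) are hypothetical, "RH" is read as the record leader, "CP" as a zero-based character position, and the raw codes are kept as-is rather than decoded into display labels.

```python
from collections import defaultdict

# 6xx tags whose $a feeds the general subject facet, per the slide.
GENERAL_SUBJECT_TAGS = {"650", "654", "656", "657"}


def extract_facets(leader, field_008, subject_fields):
    """Build facet values from a leader, an 008 field, and 6xx subject fields.

    subject_fields: list of (tag, [(subfield_code, value), ...]) tuples.
    """
    facets = defaultdict(list)
    facets["Physical Form"].append(leader[6] + field_008[23])
    facets["Target Audience"].append(field_008[22])
    facets["Content Form"].append(field_008[24:28] + field_008[28] + field_008[33])
    facets["Language"].append(field_008[35:38])
    for tag, subfields in subject_fields:
        if not tag.startswith("6"):
            continue
        for code, value in subfields:
            if code == "x" or (code == "a" and tag in GENERAL_SUBJECT_TAGS):
                facets["Subject - General"].append(value)
            elif code == "z" or (code == "a" and tag == "651"):
                facets["Subject - Geographic"].append(value)
            elif code in ("d", "y"):
                facets["Subject - Time Period"].append(value)
            elif code == "v":
                facets["Format"].append(value)
    return dict(facets)


# Made-up sample record; the 008 is assembled piece by piece so the
# character positions are easy to check (total length 40).
leader = "00000nam a2200000 a 4500"
field_008 = ("060101" + "s" + "2005" + "    " + "cau" + "    "  # positions 0-21
             + "j" + " " + "b   " + " " + "0" + "0" + "1" + " "  # positions 22-32
             + "0" + " " + "eng" + " " + "d")                    # positions 33-39
subjects = [
    ("650", [("a", "Library catalogs"), ("z", "California"), ("y", "21st century")]),
    ("651", [("a", "United States"), ("v", "Periodicals")]),
]

facets = extract_facets(leader, field_008, subjects)
for name, values in facets.items():
    print(name, values)
```

Because each subfield lands in exactly one facet, the resulting facets stay orthogonal, which is the stated goal of this step.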
Step 3: Create Dynamic Dimensions (Endeca Foundry)
Activity: Use the GUI in Dev Studio to designate and configure Dynamic Dimensions. The underlying data automatically drives the dimension category values.
Dynamic dimensions: Publication Date, Language, Content Form, Target Audience, Physical Form, Subject – Other Name, Subject – Person Name, Subject – General, Subject – Time Period
Example values:
Publication Date: from / to
Language: English, Spanish, French, German
Content Form: Not Fiction, Fiction, Novels, Speeches
Target Audience: General, Juvenile, Adolescent
Physical Form: Language Material, Projected Material, Cartographic Material, Manuscript
Subject – Other Name: IBM, American Express, Endeca, Barnes & Noble
Subject – Person Name: Jack Blount, Ric Rodriguez, Mark Calkins, Steve Nielsen, Rob Madsen
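"Data-driven" here means the list of values for a dimension is not maintained by hand; it is whatever occurs in the records. A minimal sketch of that idea, assuming a hypothetical record layout in which each record maps dimension names to lists of values:

```python
from collections import Counter


def dimension_values(records, dimension):
    """Count how many records carry each value of the given dimension."""
    counts = Counter()
    for record in records:
        for value in record.get(dimension, []):
            counts[value] += 1
    return counts


# Made-up records using value names that appear on the slide.
records = [
    {"Language": ["English"], "Target Audience": ["General"]},
    {"Language": ["Spanish"], "Target Audience": ["Juvenile"]},
    {"Language": ["English"], "Target Audience": ["General"]},
]

language_counts = dimension_values(records, "Language")
for value, count in language_counts.most_common():
    print(value, count)
```

A new value appears as soon as one record carries it, with no configuration change, which is the behavior the slide attributes to dynamic dimensions.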
Step 4: Hierarchy Transformation (Data Foundry)
Activity: Create and run a script to transform input hierarchies into DimensionXML. Alternatively, smaller hierarchies can be built with the GUI in the Developer Studio.
[Diagram. Labels: LOC Taxonomy, Transform, Transform Rules, scripts, DimensionXML, Break out Facets]
LOC Taxonomy:
Class A – General Works
Class B – Philosophy
Class C – Auxiliary Sciences of History
Class D – History, General and Old World
Class E – History, America: US
Class F – History, America: All
Class G – Geography, Anthropology…
Class H – Social Sciences
Class J – Political Science
Class K – Law
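As a rough picture of what a hierarchy-to-XML script does, the sketch below serializes a nested list of classes into XML. The element and attribute names (`dimension`, `value`, `name`, `label`) are invented for this example and are not Endeca's DimensionXML schema, which the slide does not show; the child node is likewise a placeholder.

```python
import xml.etree.ElementTree as ET


def hierarchy_to_xml(name, nodes):
    """Serialize a nested hierarchy into an illustrative XML string.

    nodes: list of (label, children) tuples, where children has the same shape.
    """
    root = ET.Element("dimension", {"name": name})

    def add(parent, node):
        label, children = node
        element = ET.SubElement(parent, "value", {"label": label})
        for child in children:
            add(element, child)

    for node in nodes:
        add(root, node)
    return ET.tostring(root, encoding="unicode")


# Top-level labels come from the slide; the subclass is a made-up placeholder.
loc_taxonomy = [
    ("Class A – General Works", []),
    ("Class H – Social Sciences", [("Subclass placeholder", [])]),
    ("Class K – Law", []),
]

dimension_xml = hierarchy_to_xml("LOC Classification", loc_taxonomy)
print(dimension_xml)
```

The recursion handles any depth, so the same function would cover a full classification tree as well as this three-class excerpt.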
Step 5: Create Edited Dimensions (Endeca Foundry)
Activity: Use the GUI in Dev Studio to designate and configure Edited Dimensions. The underlying data automatically matches to values from the hierarchy; dead ends are suppressed and stored for future matches.
Edited dimensions: LOC Classification, Subject – Geography
LOC Classification:
Class A – General Works
Class B – Philosophy
Class C – Auxiliary Sciences of History
Class D – History, General and Old World
Class E – History, America: US
Class F – History, America: All
Class G – Geography, Anthropology…
Class H – Social Sciences
Class J – Political Science
Class K – Law
Class L – Education
…
Subject – Geography:
North America
South America
Europe
Asia
Africa
…
RecordXML book records: 050 H1.xxx.xxx
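The matching and dead-end behavior can be sketched as follows. This is a simplified illustration, not Endeca's matching logic: it assumes a record's 050 call number is matched to a class by its first letter only, and the function and variable names are hypothetical.

```python
# A few class labels from the slide, keyed by class letter.
LOC_CLASSES = {
    "A": "General Works",
    "B": "Philosophy",
    "H": "Social Sciences",
    "K": "Law",
}


def match_to_hierarchy(call_numbers, classes):
    """Match call numbers to hierarchy classes.

    Returns (shown, dead_ends, unmatched):
      shown     - class label -> record count, for classes with at least one record
      dead_ends - class labels with no records (kept, but not displayed)
      unmatched - call numbers that fit no class in the hierarchy
    """
    counts = {code: 0 for code in classes}
    unmatched = []
    for call_number in call_numbers:
        code = call_number.strip()[:1].upper()
        if code in counts:
            counts[code] += 1
        else:
            unmatched.append(call_number)
    shown = {classes[c]: n for c, n in counts.items() if n > 0}
    dead_ends = [classes[c] for c, n in counts.items() if n == 0]
    return shown, dead_ends, unmatched


# Made-up call numbers for illustration.
shown, dead_ends, unmatched = match_to_hierarchy(
    ["H1.A2", "HB171", "K230", "ZA3075"], LOC_CLASSES
)
print(shown)
print(dead_ends)
print(unmatched)
```

Keeping the dead ends in the hierarchy, rather than deleting them, means a class starts to display as soon as the first matching record is loaded.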
Step 6: Load Indices and Create UI
Activity: Use the GUI in Dev Studio to designate and configure all dimension options. The underlying data automatically matches to values from the hierarchy; dead ends are not shown but are stored for potential future matches. Use the API and standard Web coding to modify the reference implementation into the final UI.
[Diagram. Labels: Data Foundry, Navigation Engine Indices, ASP, JSP, .Net, MDEX Engine, Endeca API, DGraph (exe / binary)]
Easy move from a citation to the item itself
Discovery (how to find relevant citations) → Citation → OpenURL resolver → Full-text sources
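The hand-off from citation to resolver is an OpenURL: a link that carries the citation's metadata as query parameters. Below is a minimal sketch that builds one in the key/encoded-value form of OpenURL 1.0 (Z39.88-2004) for a journal article. The resolver address, the function name, and the sample citation are all made up for illustration; a real deployment would use the library's own resolver base URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver address, not a real service.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(citation, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 link for a journal article citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Copy over whichever citation elements are present.
    for key in ("atitle", "jtitle", "issn", "volume", "spage", "date"):
        if citation.get(key):
            params.append((f"rft.{key}", str(citation[key])))
    return f"{base}?{urlencode(params)}"


# Placeholder citation, not a real article.
citation = {
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "issn": "1234-5678",
    "volume": 12,
    "spage": 34,
    "date": "2005",
}

openurl = build_openurl(citation)
print(openurl)
```

The resolver, not the citing page, decides which full-text source the user is entitled to, so the same link works across databases.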
Giving Users Control over Searching and Research
• Expanded possibilities for users
• Guide users back to the library’s resources
Examples of Leading Users Back to the Library
• Users control where they go and what they see
• Scripts are made available for free sharing
• Currently work only in Firefox
• Personalized information services