Stanford InterLib Technologies Hector Garcia-Molina and the Stanford DigLib Team
Stanford Digital Libraries Team • Faculty: Dan Boneh, Hector Garcia-Molina, Terry Winograd • Research Scientist: Andreas Paepcke • Librarians: Vicky Reich, Rebecca Wesley • Partners: InterLib Partners, ACM, Dialog, Hitachi, IBM, Intel, Microsoft, NASA Ames Library, Stanford Libraries, SUL HighWire Press, Xerox
Barriers to Effective DLs • Physical Barriers • Economic Concerns • Information Loss • Information Overload • Service Heterogeneity
Thrusts • Physical Barriers: Mobile Access • Economic Concerns: IP Infrastructure • Information Loss: Archival Repository • Information Overload: Value Filtering • Service Heterogeneity: Interoperability
Digital Library (DL) Interoperability Challenges • Growing number of players, formats, countries, ... • Repositories, Services • Dynamic artifacts • Reliability • Solution: InfoBus, InterServ
InfoBus Example • Q: Find Ti distributed (W) systems • [Diagram labels: Query Translation, Metadata, Contracts, DLite, Gloss, U-Pai; proxies for Dialog, Folio, DigiCash, and F.V.; services F.V., Folio, Dialog, DigiCash]
InfoBus Example • Q: Find Ti distributed (W) systems • Suggested: Folio, Dialog
InfoBus Example • Q: Find Ti distributed (W) systems • Query Translation • Q’: Find Ti distributed AND systems
InfoBus Example • Q: Find Ti distributed (W) systems • Pay per View
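The query-translation step (Q to Q’) can be illustrated with a minimal Python sketch. It assumes the only mismatch between sources is the `(W)` word-proximity operator, which is relaxed to `AND` for a source that lacks it. The function name and the `supports_proximity` flag are hypothetical and are not taken from the InfoBus code.

```python
import re


def translate_query(query: str, supports_proximity: bool) -> str:
    """Rewrite a query for a target source (illustrative only).

    If the target understands the (W) proximity operator, the query is
    passed through unchanged. Otherwise each (W) is relaxed to AND.
    """
    if supports_proximity:
        return query
    # Replace "(W)" and any surrounding whitespace with " AND ".
    return re.sub(r"\s*\(W\)\s*", " AND ", query)


if __name__ == "__main__":
    print(translate_query("Find Ti distributed (W) systems", supports_proximity=False))
```

Because `AND` is weaker than a proximity match, the relaxed query may return extra documents; a fuller translator would presumably filter the results afterwards.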
InterServ • [Diagram: axes “Sophistication” and “Maturity”; labels: Dynamic Artifacts, Services, Perpetual Activity, InfoBus, InfoBus Pro]
Perpetual Activity Service (P.A.S.) • Service: registers with the P.A.S. • P.A.S. checks the service: restart service, use alternate • User Request: registers state & plans with the P.A.S. • P.A.S. checks the request: restore state, try alternatives
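The register/check cycle on this slide can be sketched in a few lines of Python. This is only a guess at the behavior from the slide labels: the class, its method names, and the callback-based health check are all hypothetical, and no real process management is done.

```python
class PerpetualActivityService:
    """Toy registry that checks services and remembers request state."""

    def __init__(self):
        self._services = {}  # name -> (is_alive, restart, alternate)
        self._requests = {}  # request id -> saved state and remaining plans

    def register_service(self, name, is_alive, restart, alternate=None):
        # is_alive and restart are zero-argument callables returning bool.
        self._services[name] = (is_alive, restart, alternate)

    def register_request(self, request_id, state, plans):
        self._requests[request_id] = {"state": dict(state), "plans": list(plans)}

    def check_service(self, name):
        """Return the name of the service to use after a check."""
        is_alive, restart, alternate = self._services[name]
        if is_alive():
            return name
        if restart():  # "restart service"
            return name
        return alternate  # "use alternate" (may be None)

    def recover_request(self, request_id):
        """Restore saved state and hand back the next plan to try."""
        saved = self._requests[request_id]
        next_plan = saved["plans"].pop(0) if saved["plans"] else None
        return dict(saved["state"]), next_plan
```

A caller would register each service once, then invoke `check_service` periodically and `recover_request` after a failure.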
SDLIP • Simple Digital Library Interoperability Protocol • Goal: get InterLib (and DLI2) to interoperate!!
Search Protocol: Initial Goals • Trivial to implement! • Works over CORBA/COM, DASL/HTTP • Use XML • Does not prescribe query format • Does not prescribe result format • Small footprint (Desktop/Laptop/PDA) • Allows for stateful or stateless operation But lets you say what you’re using
Interface Consists of Four Components • Search Interface • Result Access Interface • Delivery Interface • Source Metadata Interface • [Diagram: Information Client and InterLib Wrapper connected through these interfaces]
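To make the four-part split concrete, here is a minimal Python sketch. Only the four interface names come from the slide. Every method name and signature, the `Query` type, the `keywords` query language, and the assumption that the client side implements delivery are hypothetical and should not be read as the SDLIP specification.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Query:
    language: str  # the query format is not prescribed, only declared
    text: str


class SearchInterface(ABC):
    @abstractmethod
    def search(self, query: Query) -> str:
        """Run a query and return an id for the result set."""


class ResultAccessInterface(ABC):
    @abstractmethod
    def get_results(self, result_id: str, start: int = 0, count: int = 10) -> list:
        """Fetch a slice of a previously computed result set."""


class DeliveryInterface(ABC):
    @abstractmethod
    def deliver(self, results: list) -> None:
        """Receive results pushed by a source."""


class SourceMetadataInterface(ABC):
    @abstractmethod
    def describe(self) -> dict:
        """Describe the source, e.g. which query languages it accepts."""


class InMemoryWrapper(SearchInterface, ResultAccessInterface, SourceMetadataInterface):
    """Toy wrapper over a list of titles (stateful: it keeps result sets)."""

    def __init__(self, titles):
        self._titles = list(titles)
        self._result_sets = {}

    def search(self, query):
        if query.language != "keywords":
            raise ValueError(f"unsupported query language: {query.language}")
        words = query.text.lower().split()
        hits = [t for t in self._titles if all(w in t.lower() for w in words)]
        result_id = f"r{len(self._result_sets)}"
        self._result_sets[result_id] = hits
        return result_id

    def get_results(self, result_id, start=0, count=10):
        return self._result_sets[result_id][start:start + count]

    def describe(self):
        return {"query_languages": ["keywords"], "size": len(self._titles)}


class CollectingClient(DeliveryInterface):
    """Toy client that just stores whatever is delivered to it."""

    def __init__(self):
        self.received = []

    def deliver(self, results):
        self.received.extend(results)
```

A client would call `describe()` first to learn which query language to declare, then `search()` and `get_results()`.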
SDLIP Status • Design Meeting June 22, 1999 • Client & Server Toolkits Available • Extensive Documentation • See http://www-diglib.Stanford.EDU/~testbed/doc2/SDLIP/
Current SDLIP Sources • Some Web sources: People Lookup (www.switchboard.com), AltaVista, IMDB (movies) • NCSTRL services: www.ncstrl.org • Dienst-compliant services, e.g., CoRR? • Z39.50 servers, e.g., Library of Congress • Stanford WebBase • CDL, e.g., MELVYL gateway • DASL-compliant servers
Existing Clients • Java: command line, applet • C++ • Palm Pilot • Tcl (Ray Larson) • DASL-compliant clients
Filtering Challenges • Too much information • Not controlled
Current Filtering • textual similarity
Page Rank Filtering • textual similarity • page rank (Google)
Recursive Page Rank • [Diagram: link graph with node values 2, 1, 2, 4, 1, 6; example sum 1+2+1+2 = 6]
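The recursive idea, that a page's rank depends on the ranks of the pages linking to it, can be sketched in Python. This is the commonly published damped iteration, swapped in for illustration rather than reconstructed from the slide's diagram. The damping factor of 0.85 and the handling of pages without outgoing links are assumptions.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Iteratively compute ranks for a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    print(page_rank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

In the small example, `c` is linked from two pages and ends up with the highest rank, while `b`, which nothing links to, ends up with the lowest.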
Value Filtering • access • textual similarity • opinions • page rank • context • geography
Value Filtering Challenges • Collection of Value Information • Scalability • Privacy of Value Information • Understanding Page Rank • Searching Non-Text Objects • Combining Value Information • HCI Aspects
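One simple way to picture "combining value information" is a weighted average of per-document signals. The Python sketch below is an assumption for illustration only: the slides do not say how signals are combined, and the signal names, the 0-to-1 scaling, and the weights are hypothetical.

```python
def combined_score(signals, weights):
    """Weighted average of named signals, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # A signal missing for a document counts as 0.
    weighted = sum(w * signals.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def rank_documents(docs, weights):
    """Return document ids ordered from highest to lowest combined score."""
    return sorted(docs, key=lambda d: combined_score(docs[d], weights), reverse=True)
```

Changing the weights changes the ordering, which is one reason the HCI aspects listed above matter: users need some way to express which kinds of value they care about.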
WebBase Goals • Manage very large collections of Web pages • Enable large-scale Web-related research • Locally provide a significant portion of the Web • Efficient wide-area Web data distribution
Challenges • Huge information space: wide-area distribution; URL space (to remember while crawling); Web content (to store) • Limited resources: disk; time; memory; bandwidth; server administrator tolerance • Continuous evolution: more pages; pages change/disappear; mirror sites installed; keeping data “fresh” • Crawling issues: data ‘fiefdoms’ (firewalls, access permissions, load controls); overhead per site (DNS lookups, processing robots.txt); parallelization; ability to interrupt & restart
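Two of the crawling issues above, processing robots.txt and the ability to interrupt and restart, can be sketched with the Python standard library. This is not the WebBase crawler: the `Frontier` class, the JSON checkpoint format, and the `example-crawler` agent name are hypothetical, and no network access is performed.

```python
import json
from collections import deque
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class Frontier:
    """URL queue that remembers every URL seen and can be checkpointed."""

    def __init__(self, seeds=()):
        self.queue = deque()
        self.seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        # Remember the URL space so no page is queued twice.
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)

    def next_url(self):
        return self.queue.popleft() if self.queue else None

    def to_json(self):
        # Checkpoint: enough state to resume after an interruption.
        return json.dumps({"queue": list(self.queue), "seen": sorted(self.seen)})

    @classmethod
    def from_json(cls, text):
        state = json.loads(text)
        frontier = cls()
        frontier.queue = deque(state["queue"])
        frontier.seen = set(state["seen"])
        return frontier


def parse_robots(text):
    """Build a parser from robots.txt content that was already fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_fetch(robots_by_host, url, agent="example-crawler"):
    """Check a URL against the cached robots.txt rules for its host."""
    parser = robots_by_host.get(urlparse(url).netloc)
    return parser is None or parser.can_fetch(agent, url)
```

Caching one parser per host keeps the per-site overhead to a single robots.txt fetch, and the checkpoint lets a crawl resume without revisiting URLs it has already queued.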
WebBase Architecture • [Diagram components: WWW, Web Crawlers, Repository, Feature Repository, Indexes, Retrieval, WebBase API, Multicast Engine, Clients]
Mobile Access Challenges • Limited Resources • Transitions Between Devices • Exploiting Context Solutions: • Power Browsing • Information Tiles • Information Paging
Power Browsing • Techniques • Show only text headers • Show URLs, anchors, titles • Order URLs by page rank • Summarize text • Summarize set of pages • Low-resolution pictures • Display “relevant” text • ...
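Two of these techniques, showing only text headers and showing URLs, anchors, and titles, can be sketched with Python's standard `html.parser` module. This is an illustrative reduction for a small screen and not the Power Browsing implementation. The `OutlineParser` class and `outline` function are hypothetical names.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect the title, header text, and links of an HTML page."""

    HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headers = []
        self.links = []        # (href, anchor text) pairs
        self._text_tag = None  # title or header tag currently open
        self._text_buf = []
        self._href = None      # href of the anchor currently open
        self._anchor_buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADER_TAGS:
            self._text_tag, self._text_buf = tag, []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._anchor_buf = href, []

    def handle_data(self, data):
        if self._text_tag:
            self._text_buf.append(data)
        if self._href is not None:
            self._anchor_buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._text_tag:
            # Collapse runs of whitespace before storing the text.
            text = " ".join("".join(self._text_buf).split())
            if tag == "title":
                self.title = text
            else:
                self.headers.append(text)
            self._text_tag = None
        elif tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._anchor_buf).split())
            self.links.append((self._href, anchor))
            self._href = None


def outline(html):
    """Return a compact view of a page: title, headers, and links only."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "headers": parser.headers, "links": parser.links}
```

The returned links could then be ordered by page rank, as the slide suggests, before being shown on the device.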
IP Management Challenges • Heterogeneity • Complexity of Interactions • Varied Information Appliances • Mobile Access • Security/Privacy
Fundamental Problem • Safeguards (security, privacy, authentication, payment, non-repudiation, ...) are an afterthought • “Spaghetti” code for safeguards • Experience at Stanford: • InterPay, CommPacts, Copy Detection • Goal was interoperability • Correctness, complexity were problems
Example: Simple Pay Per View • Players: patron, library, bank • Patron to library: view(docId, account, amt) • Library to bank: transfer(amt, account, libAccount) • Goals • Do not want others to see data • Do not want library to see account number • Need receipt from bank • Result: A Mess!!
Declarative Safeguards for DLs • Safeguards built in at system design time • Declare goals, not mechanisms • Players, data, ... • Who can see what, who can do what, ... (Note: access information can also be protected) • [Diagram layers: Secure DLs; Components: IP Mgmt, Wallets, ...; Declarative Infrastructure]
Solution • Extended Interface Definition Language • CORBA- or DCOM-like • Example: class artRecord { authorized(policy) setOwner(encrypted string ownerName, encrypted(bank) int price, picture pic); … }
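The "declare goals, not mechanisms" idea can be illustrated in Python with the pay-per-view players from the earlier example. The sketch assumes a policy table that states who may see each field, with a generic function enforcing it. Masking stands in for real encryption, and the policy format and function name are hypothetical rather than the extended IDL itself.

```python
# Declarative policy: for each message field, the players allowed to read it.
VIEW_POLICY = {
    "docId": {"patron", "library"},
    "amt": {"patron", "library", "bank"},
    "account": {"patron", "bank"},  # the library must not see the account number
}

HIDDEN = "<hidden>"


def visible_to(message, player, policy):
    """Return the message as one player may see it.

    Fields the player may not read are masked. Fields without a policy
    entry are hidden from everyone (deny by default).
    """
    return {
        field: (value if player in policy.get(field, set()) else HIDDEN)
        for field, value in message.items()
    }


if __name__ == "__main__":
    request = {"docId": "doc-17", "account": "12345", "amt": 2}
    print(visible_to(request, "library", VIEW_POLICY))
    print(visible_to(request, "bank", VIEW_POLICY))
```

The application code never mentions masking or encryption. Changing who may see the account number means editing one line of the policy table.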
Information Preservation Challenges • Preserving the Bits • Evolving hardware • Evolving software • Evolving organizations • Preserving the Meaning
Stanford Archival Repository • Object Identifier: Signature, handle • No Deletions (never ever!) • [Diagram labels: set, set, new version?]
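The combination of signature-based identifiers and no deletions can be sketched as an append-only, content-addressed store in Python. This is an illustration under assumptions and not the repository's actual design: the use of SHA-256 as the signature, the `ArchivalStore` class, and treating a handle as a name for a version history are all hypothetical.

```python
import hashlib


class ArchivalStore:
    """Append-only store: objects are identified by a content signature."""

    def __init__(self):
        self._objects = {}   # signature -> content bytes
        self._versions = {}  # handle -> list of signatures, oldest first

    def put(self, handle, content):
        """Store content under a handle and return its signature."""
        signature = hashlib.sha256(content).hexdigest()
        # Identical content always maps to the same signature.
        self._objects.setdefault(signature, content)
        history = self._versions.setdefault(handle, [])
        if not history or history[-1] != signature:
            history.append(signature)  # a change becomes a new version
        return signature

    def get(self, signature):
        return self._objects[signature]

    def latest(self, handle):
        return self._versions[handle][-1]

    def history(self, handle):
        return list(self._versions[handle])

    def delete(self, signature):
        # No deletions: old versions stay retrievable.
        raise PermissionError("the archive does not support deletion")
```

Because the identifier is derived from the content, a corrupted copy can be detected by recomputing the signature and comparing.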
Repository Layers • Intellectual Property • Indexing, Naming • Reliability • Complex Objects • Identity • Object Store
Archiving the Web - Problem • [Diagram: users, Web Server, File System]