
Stanford InterLib Technologies

Hector Garcia-Molina and the Stanford DigLib Team.


Presentation Transcript


  1. Stanford InterLib Technologies Hector Garcia-Molina and the Stanford DigLib Team

  2. Stanford Digital Libraries Team • Faculty: Dan Boneh, Hector Garcia-Molina, Terry Winograd • Research Scientist: Andreas Paepcke • Librarians: Vicky Reich, Rebecca Wesley • Partners: InterLib Partners, ACM, Dialog, Hitachi, IBM, Intel, Microsoft, NASA Ames Library, Stanford Libraries, SUL HighWire Press, Xerox

  3. Barriers to Effective DLs • Physical Barriers • Economic Concerns • Information Loss • Information Overload • Service Heterogeneity

  4. Thrusts • Physical Barriers: Mobile Access • Economic Concerns: IP Infrastructure • Information Loss: Archival Repository • Information Overload: Value Filtering • Service Heterogeneity: Interoperability

  5. DL Interoperability Challenges [Diagram: Digital Libraries] • Growing number of players, formats, countries, ... • Repositories → Services • Dynamic artifacts • Reliability

  6. DL Interoperability Challenges • Growing number of players, formats, countries, ... • Repositories → Services • Dynamic artifacts • Reliability • Solution: InfoBus → InterServ

  7. InfoBus Example • Q: Find Ti distributed (W) systems [Diagram: InfoBus services Query Translation, Metadata, Contracts, DLite, Gloss, U-Pai; proxies for Dialog, Folio, DigiCash, and F.V.; sources Folio, Dialog, DigiCash, F.V.]

  8. InfoBus Example • Q: Find Ti distributed (W) systems • Suggested: Folio, Dialog (same diagram as slide 7)
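The slides do not say how the InfoBus arrives at "Suggested: Folio, Dialog". The diagram includes a Gloss component, so the sketch below assumes a GlOSS-style approach: each source publishes its size and per-term document frequencies, and the number of documents matching all query terms is estimated as if terms occurred independently. The function names are hypothetical and the source statistics are invented purely for illustration.

```python
def estimate_matches(terms: list[str], size: int, doc_freq: dict[str, int]) -> float:
    """Estimate how many of `size` documents contain every term, assuming independence."""
    if size == 0:
        return 0.0
    estimate = float(size)
    for term in terms:
        # Fraction of this source's documents that contain the term.
        estimate *= doc_freq.get(term, 0) / size
    return estimate


def suggest_sources(terms: list[str], summaries: dict[str, dict], top_k: int = 2) -> list[str]:
    """Rank sources by estimated matches; keep the best `top_k` with a nonzero estimate."""
    scores = {name: estimate_matches(terms, s["size"], s["df"]) for name, s in summaries.items()}
    ranked = sorted((name for name, score in scores.items() if score > 0),
                    key=lambda name: scores[name], reverse=True)
    return ranked[:top_k]


# Hypothetical source summaries (invented numbers, for illustration only).
SUMMARIES = {
    "Folio": {"size": 1_000, "df": {"distributed": 100, "systems": 200}},
    "Dialog": {"size": 10_000, "df": {"distributed": 300, "systems": 500}},
    "OtherSource": {"size": 500, "df": {"systems": 50}},
}

print(suggest_sources(["distributed", "systems"], SUMMARIES))  # ['Folio', 'Dialog']
```

With these made-up numbers Folio is estimated at 20 matches, Dialog at about 15, and OtherSource at 0 because it has no document containing "distributed".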

  9. InfoBus Example: Query Translation • Q: Find Ti distributed (W) systems • Q’: Find Ti distributed AND systems (same diagram as slide 7)
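In Dialog syntax, (W) requires the two words to appear next to each other; slide 9 relaxes it to a plain AND. A minimal sketch of that rewrite follows. It assumes a hypothetical capability table recording whether each target supports adjacency (treating Folio as lacking it is an assumption for illustration), and it handles only the bare (W) operator, not field qualifiers or other operators.

```python
import re

# Hypothetical capability table: does the source's query language support (W)?
SUPPORTS_ADJACENCY = {"Dialog": True, "Folio": False}


def translate(query: str, target: str) -> str:
    """Rewrite a Dialog-style query for `target`, relaxing operators it lacks."""
    if SUPPORTS_ADJACENCY.get(target, False):
        return query
    # AND only requires both words to occur somewhere, so the translated
    # query can return extra matches that a proxy may need to filter out.
    return re.sub(r"\s*\(W\)\s*", " AND ", query)


print(translate("Ti distributed (W) systems", "Folio"))   # Ti distributed AND systems
print(translate("Ti distributed (W) systems", "Dialog"))  # Ti distributed (W) systems
```

The design choice is to relax rather than reject: the translated query is less precise than the original, but it still runs against a source with a weaker query language.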

  10. InfoBus Example: Pay per View • Q: Find Ti distributed (W) systems (same diagram as slide 7)
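On slide 10 the result is paid for per view, and the diagram places DigiCash and F.V. behind proxies next to a U-Pai component. The sketch below shows only the proxy idea: one abstract payment interface with interchangeable mechanism-specific implementations, so the pay-per-view code never depends on a particular mechanism. `PaymentProxy`, `LedgerProxy`, and `pay_per_view` are hypothetical names, and `LedgerProxy` merely records transfers in memory instead of contacting a real payment system.

```python
import itertools
from abc import ABC, abstractmethod


class PaymentProxy(ABC):
    """Hypothetical uniform payment interface shared by all mechanisms."""

    @abstractmethod
    def pay(self, payer: str, payee: str, amount_cents: int) -> str:
        """Move `amount_cents` from payer to payee and return a receipt id."""


class LedgerProxy(PaymentProxy):
    """Stand-in for a mechanism-specific proxy; it only records transfers in memory."""

    def __init__(self, mechanism: str) -> None:
        self.mechanism = mechanism
        self.ledger: list[tuple[str, str, int]] = []
        self._receipts = itertools.count(1)

    def pay(self, payer: str, payee: str, amount_cents: int) -> str:
        self.ledger.append((payer, payee, amount_cents))
        return f"{self.mechanism}-{next(self._receipts)}"


def pay_per_view(doc_id: str, price_cents: int, patron: str, source: str,
                 proxy: PaymentProxy) -> dict:
    """Charge one view through whatever proxy the patron uses, then release the document."""
    receipt = proxy.pay(patron, source, price_cents)
    return {"doc": doc_id, "receipt": receipt}


print(pay_per_view("doc42", 150, "alice", "Dialog", LedgerProxy("DigiCash")))
# {'doc': 'doc42', 'receipt': 'DigiCash-1'}
```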

  11. InterServ [Diagram with axes “Sophistication” and “Maturity”; labels: InfoBus, InfoBus Pro, Services, Dynamic Artifacts, Perpetual Activity]

  12. Perpetual Activity Service [Diagram: a Service registers with the P.A.S.; a User Request is kept by the P.A.S. as state & plans]

  13. Perpetual Activity Service [Diagram: the P.A.S. checks registered services; on failure it can restart the service or use an alternate, then restore state and try alternatives for the User Request]
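Slides 12 and 13 describe a service that keeps a user request alive: services register, the P.A.S. holds the request's state and plans, checks the services, and on failure restarts a service or switches to an alternate after restoring state. A minimal sketch of that loop follows. The `Service` and `PerpetualActivityService` classes, the health-check and restart callables, and the representation of a plan as a list of step names are all assumptions for illustration; periodic background checking is simplified to a check before each call.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Service:
    """A registered service (hypothetical): health probe, handler, optional restart hook."""
    name: str
    is_alive: Callable[[], bool]
    handle: Callable[[dict], dict]
    restart: Optional[Callable[[], bool]] = None


@dataclass
class PerpetualActivityService:
    """Carries a request's state through its plan, surviving service failures."""
    services: dict[str, list[Service]] = field(default_factory=dict)

    def register(self, step: str, service: Service) -> None:
        # Services registered for the same step act as alternates, tried in order.
        self.services.setdefault(step, []).append(service)

    def run(self, plan: list[str], state: dict) -> dict:
        for step in plan:
            for service in self.services.get(step, []):
                # Check the service; if it is down, try restarting it first.
                if not service.is_alive() and not (service.restart and service.restart()):
                    continue  # give up on it and use an alternate
                checkpoint = copy.deepcopy(state)
                try:
                    state = service.handle(state)
                    break
                except Exception:
                    state = checkpoint  # restore state, then try alternatives
            else:
                raise RuntimeError(f"no working service for step {step!r}")
        return state


def failing_search(state: dict) -> dict:
    raise ConnectionError("primary search service is down")


pas = PerpetualActivityService()
pas.register("search", Service("primary", lambda: True, failing_search))
pas.register("search", Service("mirror", lambda: True, lambda s: {**s, "hits": ["doc1"]}))
print(pas.run(["search"], {"query": "distributed systems"}))
# {'query': 'distributed systems', 'hits': ['doc1']}
```

Checkpointing before every call is what makes "restore state, try alternatives" safe even when a failing service has already modified the state in place.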

  14. SDLIP • Simple Digital Library Interoperability Protocol • Goal: get InterLib (and DLI2) to interoperate!!

  15. Search Protocol: Initial Goals • Trivial to implement! • Works over CORBA/COM, DASL/HTTP • Use XML • Does not prescribe query format • Does not prescribe result format • Small footprint (Desktop/Laptop/PDA) • Allows for stateful or stateless operation, but lets you say what you’re using

  16. Interface Consists of Four Components • SearchInterface • ResultAccessInterface • SourceMetadataInterface • DeliveryInterface [Diagram labels: Client, InterLibWrapper, Information]
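The four SDLIP components can be pictured with a toy wrapper. This sketch is not the SDLIP API: the class name, method names, and signatures are hypothetical, chosen only to show the division of labour (searching, accessing further results held in server-side state, describing the source, and delivering results to the client, modelled here as a client-supplied callback) together with slide 15's choice between stateful and stateless operation.

```python
import itertools
from typing import Callable, Optional

Doc = dict[str, str]


class InMemoryWrapper:
    """Toy wrapper around a list of documents; names are illustrative only."""

    def __init__(self, docs: list[Doc]):
        self.docs = docs
        self.sessions: dict[str, list[Doc]] = {}  # server-side result state
        self._ids = itertools.count(1)

    # Search role: the query format is up to the source (here, a title substring).
    def search(self, query: str, num_docs: int, stateful: bool = False,
               deliver: Optional[Callable[[list[Doc]], None]] = None):
        hits = [d for d in self.docs if query.lower() in d["title"].lower()]
        session = None
        if stateful:  # the client says which mode it is using
            session = f"sid-{next(self._ids)}"
            self.sessions[session] = hits
        first = hits[:num_docs]
        if deliver is not None:
            deliver(first)  # Delivery role: push results to a client callback
            return session, []
        return session, first

    # Result-access role: page through results kept for a stateful session.
    def get_docs(self, session: str, start: int, count: int) -> list[Doc]:
        return self.sessions[session][start:start + count]

    def end_session(self, session: str) -> None:
        self.sessions.pop(session, None)

    # Source-metadata role: describe what the source holds.
    def get_metadata(self) -> dict:
        return {"doc_count": len(self.docs),
                "properties": sorted({key for d in self.docs for key in d})}


wrapper = InMemoryWrapper([{"title": "Distributed Systems"},
                           {"title": "Systems Biology"},
                           {"title": "Poetry"}])
sid, first = wrapper.search("systems", num_docs=1, stateful=True)
print(sid, first)                    # sid-1 [{'title': 'Distributed Systems'}]
print(wrapper.get_docs(sid, 1, 10))  # [{'title': 'Systems Biology'}]
```

A stateless client simply omits `stateful=True` and receives `None` as the session id, so the server keeps nothing between calls.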

  17. SDLIP Status • Design Meeting June 22, 1999

  18. SDLIP Status • Design Meeting June 22, 1999 • Client & Server Toolkits Available • Extensive Documentation • See http://www-diglib.Stanford.EDU/~testbed/doc2/SDLIP/

  19. Current SDLIP Sources • Some Web sources: People Lookup (www.switchboard.com), AltaVista, IMDB (movies) • NCSTRL services: www.ncstrl.org • Dienst-compliant services, e.g., CoRR? • Z39.50 servers, e.g., Library of Congress • Stanford WebBase • CDL, e.g., MELVYL gateway • DASL-compliant servers

  20. Existing Clients • Java: command line, applet • C++: Palm Pilot • Tcl (Ray Larson) • DASL-compliant clients

  21. Filtering Challenges • Too much information • Not controlled

  22. Current Filtering • textual similarity

  23. Page Rank Filtering • textual similarity • page rank (Google)

  24. Initial Page Rank [Diagram: example pages with ranks 1 and 4]

  25. Recursive Page Rank [Diagram: a page pointed to by pages of rank 2, 1, 2, and 1 gets rank 1+2+1+2 = 6; other pages are labeled 4, 1, and 6]
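Slides 24 and 25 build the intuition that a page's rank comes from the ranks of the pages pointing to it, applied recursively. The standard PageRank formulation refines this in two ways the slides do not show: each page splits its rank evenly among its out-links, and a damping factor (commonly 0.85) mixes in a uniform random jump so the iteration converges. A minimal power-iteration sketch under those assumptions follows; the graph in the usage line is made up.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over {page: [pages it links to]}; ranks sum to 1."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]})
print(max(ranks, key=ranks.get))  # C
```

As in slide 25, C ranks highest because three pages point to it, and A benefits from being the only page C points to.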

  26. Value Filtering • access • textual similarity • opinions • page rank • context • geography

  27. Value Filtering Challenges • Collection of Value Information • Scalability • Privacy of Value Information • Understanding Page Rank • Searching Non-Text Objects • Combining Value Information • HCI Aspects
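"Combining Value Information" is listed as an open challenge. One simple possibility, offered only as an assumption-laden sketch and not as the project's method, is to min-max normalize each value signal (textual similarity, page rank, and so on) across the candidate documents and rank by a weighted sum. The function name, signal names, weights, and scores below are all invented.

```python
def combine(signals: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Rank documents by a weighted sum of min-max-normalized signal scores."""
    docs = sorted({doc for scores in signals.values() for doc in scores})
    total = {doc: 0.0 for doc in docs}
    for name, scores in signals.items():
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        for doc in docs:
            # Missing scores count as the signal's minimum; a constant signal adds 0.
            value = scores.get(doc, lo)
            normalized = (value - lo) / (hi - lo) if hi > lo else 0.0
            total[doc] += weights.get(name, 0.0) * normalized
    return sorted(docs, key=lambda doc: total[doc], reverse=True)


signals = {
    "similarity": {"a": 0.9, "b": 0.5, "c": 0.1},
    "pagerank": {"a": 0.001, "b": 0.02, "c": 0.005},
}
print(combine(signals, {"similarity": 1.0, "pagerank": 1.0}))  # ['b', 'a', 'c']
print(combine(signals, {"similarity": 3.0, "pagerank": 1.0}))  # ['a', 'b', 'c']
```

Normalizing first keeps a signal with large raw values (or tiny ones, like page rank) from dominating just because of its scale; the weights then express how much each kind of value information should count.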

  28. WebBase Goals • Manage very large collections of Web pages • Enable large-scale Web-related research • Locally provide a significant portion of the Web • Efficient wide-area Web data distribution

  29. Challenges • Huge information space: URL space (to remember while crawling), Web content (to store) • Wide-area distribution • Limited resources: disk, time, memory, bandwidth, server administrator tolerance • Continuous evolution: more pages, pages change/disappear, mirror sites installed, keeping data “fresh” • Crawling issues: data ‘fiefdoms’ (firewalls, access permissions, load controls); overhead per site (DNS lookups, processing robots.txt); parallelization; ability to interrupt & restart
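Several of the crawling issues above (remembering the URL space, per-site robots.txt overhead, load controls and administrator tolerance, and the ability to interrupt and restart) surface directly in a crawler's URL frontier. The sketch below is not WebBase's crawler; it is a hypothetical single-threaded frontier that assumes a fixed per-host delay, one robots.txt fetch per host (parsed with Python's standard `urllib.robotparser`), and a JSON checkpoint of the queue and seen set. The `fetch_robots` hook lets the example run without network access.

```python
import json
import time
from collections import deque
from typing import Callable, Optional
from urllib.parse import urlsplit
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots_txt(host: str) -> str:
    """Download http://<host>/robots.txt; treat any failure as "no rules" (a simplification)."""
    try:
        with urlopen(f"http://{host}/robots.txt", timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        return ""


class Frontier:
    def __init__(self, user_agent: str = "ExampleBot", host_delay: float = 1.0,
                 fetch_robots: Callable[[str], str] = fetch_robots_txt):
        self.user_agent = user_agent
        self.host_delay = host_delay              # seconds between hits on one host
        self.fetch_robots = fetch_robots
        self.queue: deque[str] = deque()
        self.seen: set[str] = set()               # URL space remembered while crawling
        self.robots: dict[str, RobotFileParser] = {}  # parsed once per host
        self.last_fetch: dict[str, float] = {}

    def add(self, url: str) -> None:
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)

    def allowed(self, url: str) -> bool:
        host = urlsplit(url).netloc
        if host not in self.robots:
            parser = RobotFileParser()
            parser.parse(self.fetch_robots(host).splitlines())
            parser.modified()  # mark the rules as loaded
            self.robots[host] = parser
        return self.robots[host].can_fetch(self.user_agent, url)

    def next_url(self, now: Optional[float] = None) -> Optional[str]:
        """Return the next allowed URL whose host is outside its politeness delay."""
        now = time.monotonic() if now is None else now
        for _ in range(len(self.queue)):
            url = self.queue.popleft()
            if not self.allowed(url):
                continue  # dropped: robots.txt disallows it
            host = urlsplit(url).netloc
            if now - self.last_fetch.get(host, float("-inf")) >= self.host_delay:
                self.last_fetch[host] = now
                return url
            self.queue.append(url)  # host still cooling down; retry later
        return None

    def checkpoint(self) -> str:
        return json.dumps({"queue": list(self.queue), "seen": sorted(self.seen)})

    @classmethod
    def restore(cls, data: str, **kwargs) -> "Frontier":
        state = json.loads(data)
        frontier = cls(**kwargs)
        frontier.queue.extend(state["queue"])
        frontier.seen.update(state["seen"])
        return frontier


def demo_rules(host: str) -> str:
    return "User-agent: *\nDisallow: /private\n"


frontier = Frontier(host_delay=10.0, fetch_robots=demo_rules)
for url in ["http://a.example/x", "http://a.example/private/y",
            "http://a.example/z", "http://b.example/q"]:
    frontier.add(url)
print(frontier.next_url(now=0.0))  # http://a.example/x
print(frontier.next_url(now=1.0))  # http://b.example/q
```

On the second call the disallowed `/private/y` is dropped and `a.example/z` is pushed back because its host was contacted only one second earlier.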

  30. WebBase Architecture [Diagram components: WWW, Web Crawlers, Repository, Feature Repository, Retrieval Indexes, Multicast Engine, WebBase API, Clients]

  31. Mobile Access Challenges • Limited Resources • Transitions Between Devices • Exploiting Context

  32. Mobile Access Challenges • Limited Resources • Transitions Between Devices • Exploiting Context Solutions: • Power Browsing • Information Tiles • Information Paging

  33. Power Browsing

  34. Power Browsing  • Techniques • Show only text headers • Show URLs, anchors, titles • Order URLs by page rank • Summarize text • Summarize set of pages • Low-resolution pictures • Display “relevant” text • ...

  35. PowerBrowser - Start Screen

  36. PowerBrowser - Hypertext View

  37. PowerBrowser - Text View

  38. PowerBrowser - History

  39. IP Management Challenges • Heterogeneity • Complexity of Interactions • Varied Information Appliances • Mobile Access • Security/Privacy

  40. Fundamental Problem • Safeguards (security, privacy, authentication, payment, non-repudiation, ...) are an afterthought • “Spaghetti” code for safeguards • Experience at Stanford: InterPay, CommPacts, Copy Detection • Goal was interoperability • Correctness and complexity were problems

  41. Example: Simple Pay Per View [Diagram: patron, library, bank; calls view(docId, account, amt) and transfer(amt, account, libAccount)]

  42. Example: Simple Payment (same diagram as slide 41) • Goals • Do not want others to see data • Do not want library to see account number • Need receipt from bank

  43. Example: Simple Payment (same diagram as slide 41) • Goals • Do not want others to see data • Do not want library to see account number • Need receipt from bank • Result: A Mess!!

  44. Declarative Safeguards for DLs • Safeguards built in at system design time • Declare goals, not mechanisms • Players, data, ... • Who can see what, who can do what, ... (Note: access information can also be protected) [Diagram layers: Secure DLs; Components: IP Mgmt, Wallets, ...; Declarative Infrastructure]
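As a rough illustration of "declare goals, not mechanisms", the sketch below writes the simple-payment goals of slide 42 as a table of who may see each data item and checks message flows against it. Everything here (the policy table, the `Message` type, and both flows, including our reading of slide 41's calls as the naive flow) is a hypothetical encoding; it checks only visibility among the three players, not eavesdropping by others, and it does not generate enforcing mechanisms the way a declarative infrastructure would.

```python
from dataclasses import dataclass

# Declared goals: which players may see each data item (hypothetical encoding).
POLICY: dict[str, set[str]] = {
    "docId": {"patron", "library"},
    "amt": {"patron", "library", "bank"},
    "account": {"patron", "bank"},          # library must not see the account number
    "libAccount": {"patron", "library", "bank"},
    "receipt": {"patron", "library", "bank"},
}


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    fields: frozenset[str]


def violations(flow: list[Message], policy: dict[str, set[str]]) -> list[str]:
    """Report every data item that reaches a player not allowed to see it."""
    return [f"{m.sender}->{m.receiver}: {item}"
            for m in flow
            for item in sorted(m.fields)
            if m.receiver not in policy.get(item, set())]


# Naive flow (our reading of slide 41): the patron hands its account to the library.
naive = [
    Message("patron", "library", frozenset({"docId", "account", "amt"})),
    Message("library", "bank", frozenset({"amt", "account", "libAccount"})),
]
# Alternative flow: pay the bank directly, show the library only the receipt.
direct = [
    Message("patron", "bank", frozenset({"account", "amt", "libAccount"})),
    Message("bank", "patron", frozenset({"receipt"})),
    Message("patron", "library", frozenset({"docId", "receipt"})),
]
print(violations(naive, POLICY))   # ['patron->library: account']
print(violations(direct, POLICY))  # []
```

The point of the declarative form is that the goals live in one table; changing a goal changes what the checker accepts without touching any protocol code.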

  45. Solution • Extended Interface Definition Language • CORBA- or DCOM-like • Example:

        class artRecord {
            authorized(policy) setOwner(encrypted string ownerName,
                                        encrypted(bank) int price,
                                        picture pic);
            …
        }
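The extended IDL attaches a policy to an operation (`authorized(policy)`) and marks parameters as encrypted for a named party. A rough Python analogue of just the `authorized(policy)` part is sketched below. The decorator, the `ArtRecord` class, and the "only the current owner may transfer ownership" policy are hypothetical, and the `encrypted(...)` parameter annotations are not modelled at all.

```python
import functools
from typing import Callable, Optional


class PolicyError(PermissionError):
    """Raised when a caller is not authorized for an operation."""


def authorized(policy: Callable[[str, object], bool]):
    """Check `policy(caller, target)` before running the decorated method."""
    def decorate(method):
        @functools.wraps(method)
        def wrapper(self, caller: str, *args, **kwargs):
            if not policy(caller, self):
                raise PolicyError(f"{caller!r} may not call {method.__name__}")
            return method(self, caller, *args, **kwargs)
        return wrapper
    return decorate


def current_owner_only(caller: str, record: "ArtRecord") -> bool:
    # Anyone may claim an unowned record; afterwards only the owner may transfer it.
    return record.owner is None or caller == record.owner


class ArtRecord:
    def __init__(self) -> None:
        self.owner: Optional[str] = None
        self.price: Optional[int] = None
        self.pic: bytes = b""

    @authorized(current_owner_only)
    def set_owner(self, caller: str, owner_name: str, price: int, pic: bytes) -> None:
        self.owner, self.price, self.pic = owner_name, price, pic


record = ArtRecord()
record.set_owner("alice", "alice", 100, b"...")
record.set_owner("alice", "bob", 120, b"...")  # alice transfers ownership to bob
try:
    record.set_owner("mallory", "mallory", 1, b"")
except PolicyError as err:
    print(err)  # 'mallory' may not call set_owner
```

Keeping the policy in the decorator rather than inside `set_owner` mirrors the slide's idea: the method body states what the operation does, and the annotation states who may invoke it.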

  46. Declarative Safeguards for DLs [Diagram layers: Secure DLs; Components: IP Mgmt, Wallets, ...; Declarative Infrastructure]

  47. Information Preservation Challenges • Preserving the Bits • Evolving hardware • Evolving software • Evolving organizations • Preserving the Meaning

  48. Stanford Archival Repository • Object Identifier → Signature • No Deletions (never ever!) [Diagram labels: set, handle, set new version?]
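Slide 48's two rules (object identifiers derived from signatures, and no deletions) combine naturally into a content-addressed, append-only store. A minimal sketch follows, assuming SHA-256 as the signature function and treating a handle as a name whose `set` appends a new version rather than overwriting the old one; the class and method names are hypothetical.

```python
import hashlib


class ArchivalRepository:
    """Append-only object store keyed by content signatures (sketch)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._handles: dict[str, list[str]] = {}  # handle -> version history

    @staticmethod
    def signature(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def store(self, data: bytes) -> str:
        oid = self.signature(data)  # same bytes, same identifier
        self._objects.setdefault(oid, data)
        return oid

    def set(self, handle: str, data: bytes) -> str:
        """Record `data` as the newest version of `handle`; old versions stay."""
        oid = self.store(data)
        self._handles.setdefault(handle, []).append(oid)
        return oid

    def get(self, oid: str) -> bytes:
        data = self._objects[oid]
        if self.signature(data) != oid:  # detect silent corruption
            raise ValueError(f"object {oid} does not match its signature")
        return data

    def versions(self, handle: str) -> list[str]:
        return list(self._handles.get(handle, []))

    def delete(self, oid: str) -> None:
        raise PermissionError("No deletions (never ever!)")


repo = ArchivalRepository()
v1 = repo.set("tech-report-42", b"draft")
v2 = repo.set("tech-report-42", b"final")
print(repo.versions("tech-report-42") == [v1, v2])  # True
print(repo.get(v1))                                  # b'draft'
```

Because the identifier is the signature, re-storing identical bytes is harmless, and any later corruption of a stored object is detectable on read.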

  49. Repository Layers • Intellectual Property • Indexing, Naming • Reliability • Complex Objects • Identity • Object Store

  50. Archiving the Web: Problem [Diagram: users, Web Server, File System]
