Digital Libraries: Re-inventing Scholarly Information Dissemination and Use



  1. Digital Libraries: Re-inventing Scholarly Information Dissemination and Use • Robert Wilensky, Principal Investigator • David Forsyth, Co-principal Investigator • The UC Berkeley Digital Library Team

  2. Central Thrusts • Provide tools to facilitate • changing the publishing model • from centralized, linear, binary, expensive, “filter-then-disseminate” model, • to a much less costly, powerful, fully distributed “disseminate-filter-collaborate” cycle • without sacrificing good organization, peer review • treating non-textual material (photos, video, maps, primary data sets) as first class citizens

  3. Who We Are • Other Investigators • Henry Baird (Xerox PARC) • Bernie Hurley (UCB Library) • Pinar Duygulu (Middle East Technical University) • Students • Byunghoon Kang • Xiaofeng Ren • Sumeet Solanki • Staff • Ginger Ogle • Jeff Anderson-Lee • Howard Foster • Loretta Willis • Joyce Gross • Tom Phelps • PI and Co-PI: • Robert Wilensky (CS & SIMS) • David Forsyth (CS) • Faculty Investigators • Richard Fateman (CS) • Ray Larson (SIMS) • Jitendra Malik (CS) • Philip Stark (Statistics) • Doug Tygar (CS & SIMS) • Nancy Van House (SIMS) • Hal Varian (SIMS) • Marti Hearst (SIMS) • James Landay (CS) • Joe Hellerstein (CS) • Post-docs • Kobus Barnard • Tracy Riggs • Byunghoon Kang • Jon Traupman

  4. Partners • UCB Organizations • Museum of Vertebrate Zoology • Jepson Herbarium • U.C.B. Library • U.C.B. Instructional Technology Program • DLIB InterOp Project Partners • Stanford, UCSB • California Digital Library • SDSC • Not-for-profits • CalFlora • California Academy of Sciences • Fine Arts Museums of S.F. • California Department of Fish and Game • Corporate • Xerox PARC • Hewlett-Packard • IBM Almaden • NEC • Sun Microsystems • Microsoft • Sharp

  5. Some Technology • New Document Models • Multivalent Documents • GIS Viewer • Related Tools: TilePic and GISLite • Collaborative quality filtering as a proxy for academic review • Robust Linking • Personal Libraries • Image Analysis: • Better content-based image analysis • Combining image and text • Self-administrating Documents • Document recognition: • A turbo recognition DID-based approach to document layout analysis. • Collections: • Biologically-related large image and data collections • Rare books scanning effort

  6. Tools for Information Management and Collaboration • Multivalent Documents: A Platform for New Ideas • An “Anytime, Anywhere, Any Type, Every Way User-Improvable Digital Document Platform” • Not format-centric; radically extensible to • support any format • perform standard document functionality • implement your new idea • Extensions work across all formats.

  7. Multivalent Architecture • Extensibility achieved by • behaviors and layers paradigm • behaviors written to conform to an open protocol • document tree (that includes UI) • So each document can be its own custom browser. • Conducive to developing a “digital library”-centric browser • E.g., easy to support distributed annotation.
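
The behaviors-and-document-tree idea on this slide can be illustrated with a small sketch. It is not the Multivalent API: the `Node`, `Behavior`, `Highlight`, `WordCount`, and `traverse` names are hypothetical, and the only thing assumed from the slide is that independent behaviors conform to one shared protocol and operate on a common document tree.

```python
class Node:
    """A node of a toy document tree (hypothetical, not Multivalent's class)."""
    def __init__(self, name, text="", children=None):
        self.name = name
        self.text = text
        self.children = children or []
        self.marks = []  # annotations attached by behaviors


class Behavior:
    """Hypothetical protocol: hooks called before and after a node's children."""
    def before(self, node):
        pass

    def after(self, node):
        pass


class Highlight(Behavior):
    """Marks every node whose text contains a given word."""
    def __init__(self, word):
        self.word = word

    def before(self, node):
        if self.word in node.text:
            node.marks.append("highlight:" + self.word)


class WordCount(Behavior):
    """Counts words across the whole tree."""
    def __init__(self):
        self.count = 0

    def after(self, node):
        self.count += len(node.text.split())


def traverse(node, behaviors):
    """Walk the tree, giving each behavior a chance on the way down and up."""
    for b in behaviors:
        b.before(node)
    for child in node.children:
        traverse(child, behaviors)
    for b in reversed(behaviors):
        b.after(node)
```

Because both behaviors see only the tree and the protocol, neither needs to know which format the tree was built from, which is the sense in which extensions can work across all formats.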

  8. Multivalent Status • Multivalent Browser, DR4, available; beta ASN • An open source Java (1.4) application, at http://http.cs.berkeley.edu/~phelps/Multivalent/ • standard browser features (cache, UI, bookmarks, etc.), robust URL support • Implemented behaviors: • Media Adaptors: • HTML 3.2 + CSS • LaTeX/DVI • ASCII • PDF • “enlivened scanned images” • “multi-page” • Span: hyperlink, highlight, copyeditor annotations, “anchored ink”, style, redaction • Lenses: “show OCR”, magnify, cypher, notes, rulers, etc. • Structural: alt. select-and-paste, Notemarks • Misc: search hit visualization, “managers”

  9. Multivalent Plans • Support project goals by providing • Complete media adaptors for common document formats (esp. HTML+CSS, XML, LaTeX/DVI, PDF) • More standard browser features (e.g., hierarchical bookmarks, preference editing) • Experiment with “history-enriched digital objects” • Mechanisms for manipulating multiple annotations • Support from document collaboration services • Support for (non-textual) data types • temporal and geographic extent, via JMF 2.0 • involving dynamic elements • data set elements

  10. Related Image-oriented Tools • GIS Viewer • 4.0 released to public • Related Tools: • TilePic • GIS Lite

  11. Robustness: The Challenge • How do we put together distributed applications • that rely on independently administered distributed resources • which change chaotically • yet whose performance degrades gracefully as the world changes? • One answer: Provide multiple, largely independent descriptions along uncontrolled network boundaries.

  12. Robust Linking • Robust Locations • Refer to locations within a resource, but can still be used to find the location after the page is edited. • Implemented in Multivalent Browser • Robust Hyperlinks • Refer to whole resources, but can still be used to find the resource after the page is moved, etc. • Available now: http://www.cs.berkeley.edu/~phelps/Robust
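
One way to read "multiple, largely independent descriptions" for a robust location is sketched below. This is an illustration only, not the Multivalent implementation: the record fields (`offset`, `target`, `before`) and both function names are assumptions. The location is stored three ways, and lookup falls back from the most precise description to the most tolerant one.

```python
def make_location(text, start, length, context=10):
    """Record a span by offset, by its own text, and by the text just before it."""
    return {
        "offset": start,
        "target": text[start:start + length],
        "before": text[max(0, start - context):start],
    }


def find_location(text, loc):
    """Return the span's start index in (possibly edited) text, or None."""
    target = loc["target"]
    # 1. The recorded offset still points at the same characters.
    if text[loc["offset"]:loc["offset"] + len(target)] == target:
        return loc["offset"]
    # 2. The span moved, but its leading context moved with it.
    i = text.find(loc["before"] + target)
    if i != -1:
        return i + len(loc["before"])
    # 3. The context was edited too; look for the target text alone.
    i = text.find(target)
    return i if i != -1 else None
```

Each fallback tolerates a larger edit at the cost of precision, so resolution degrades gracefully instead of failing outright.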

  13. Robust Hyperlink Example • Compute “lexical signature” of page http://www.eng.nsf.gov/engnews/2001/Dec01RobotLegos/dec01robotlegos.htm • which turns out to be jjarosz lambirth telesurgery jarosz simulating • Add to URL to make robust URL: • http://www.eng.nsf.gov/engnews/2001/Dec01RobotLegos/dec01robotlegos.htm/?lexical-signature=jjarosz+lambirth+telesurgery+jarosz+simulating • Feed signature to a search engine on URL failure:
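
The slide does not say how the signature is computed, so the following is a minimal sketch under stated assumptions: terms are scored TF-IDF style (frequent in the page, rare elsewhere), and the document-frequency table `doc_freq`, the corpus size `num_docs`, and the cutoff `k` are all hypothetical inputs. The sketch also assumes the URL has no existing query string.

```python
import math
import re
from collections import Counter
from urllib.parse import urlencode


def lexical_signature(text, doc_freq, num_docs, k=5):
    """Return the k terms of `text` with the highest TF-IDF-style score."""
    terms = re.findall(r"[a-z]+", text.lower())
    tf = Counter(terms)

    def score(term):
        # Terms missing from the table are treated as appearing in one
        # document (an assumed smoothing choice).
        df = doc_freq.get(term, 1)
        return tf[term] * math.log(num_docs / df)

    # Highest score first; ties broken alphabetically so output is stable.
    ranked = sorted(tf, key=lambda t: (-score(t), t))
    return ranked[:k]


def robust_url(url, signature):
    """Append the signature as a `lexical-signature` query parameter."""
    return url + "?" + urlencode({"lexical-signature": " ".join(signature)})
```

`urlencode` joins the terms with `+`, matching the form of the robust URL shown on the slide.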

  14. Robust Hyperlinks: Plans • Problem: No one wants to bother signing anything. • Proposal: Build a URL-signature data base; fail over to this upon 404 errors. • using Stanford’s WebBase
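
The proposed fail-over can be sketched as follows. Everything here is hypothetical: `fetch`, `signature_db`, and `search` stand in for an HTTP client, the URL-signature database, and a search engine, and are injected by the caller so that the logic can be read without committing to any real service.

```python
def resolve(url, fetch, signature_db, search):
    """Fetch `url`; on a 404, look up its stored signature and search for it.

    fetch(url) -> (status_code, body)
    signature_db: mapping from URL to a list of signature terms
    search(terms) -> list of candidate URLs, best first
    Returns (resolved_url, body), or (None, None) if nothing was found.
    """
    status, body = fetch(url)
    if status != 404:
        return url, body
    signature = signature_db.get(url)
    if signature is None:
        return None, None  # no signature was ever recorded for this URL
    for candidate in search(signature):
        status, body = fetch(candidate)
        if status == 200:
            return candidate, body
    return None, None
```

The point of the design is that authors never have to sign their own links: the database is consulted only when the plain URL has already failed.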

  15. Collaborative Quality Filtering • Idea: Traditional peer review is majorized by a good collaborative filtering system. • I.e., publishing = dissemination + collaborative quality filtering • Approach: • Good papers are ones good reviewers rate highly, etc. • Good reviewers are the ones that rate papers accurately. • Assumption: Good reviewers’ reviews should agree with the asymptotic average (looking forwards) • Use hubs-and-authorities type algorithm to establish credentials. • Note: • Can rate along multiple dimensions, e.g., importance and correctness • Later on, can add other factors, e.g., prediction of asymptotic citation index, credentials, expertise
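
The mutual definition of good papers and good reviewers can be made concrete with a small fixed-point iteration. This is a sketch in the spirit of a hubs-and-authorities scheme, not the project's algorithm: the update rules, the `rate` function, and the use of the current weighted average as a stand-in for the "asymptotic average" are all assumptions.

```python
def rate(ratings, iterations=20):
    """Jointly estimate paper quality and reviewer weight.

    ratings maps (reviewer, paper) -> score in [0, 1].
    Returns (quality, weight) dictionaries.
    """
    reviewers = {r for r, _ in ratings}
    papers = {p for _, p in ratings}
    weight = {r: 1.0 for r in reviewers}
    quality = {}
    for _ in range(iterations):
        # Paper quality: reviewer-weighted mean of the scores it received.
        for p in papers:
            scored = [(weight[r], s) for (r, q), s in ratings.items() if q == p]
            total = sum(w for w, _ in scored)
            quality[p] = sum(w * s for w, s in scored) / total
        # Reviewer weight: 1 minus mean absolute error against current quality,
        # floored so that every weight stays positive.
        for r in reviewers:
            errors = [abs(s - quality[p]) for (q, p), s in ratings.items() if q == r]
            weight[r] = max(1e-6, 1.0 - sum(errors) / len(errors))
    return quality, weight
```

A reviewer who consistently disagrees with the weighted consensus loses influence over later quality estimates, which in turn pulls the consensus further toward the reviewers who agree with it.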

  16. Collaborative Quality Filtering (cont’d) • Simple algorithm predicts users’ evaluations of reviewer in empirical study. • Parameters for number of items reviewer has reviewed, no. of reviews item receives, rank of review. • Advanced version incorporating notion of areas of expertise being tested. • Maintains reviewer ratings on a per document basis; computes document rating based on similarity of documents. • Initial implementations in collab. with NEC CiteSeer • More details are available.

  17. Personal Libraries • Goal: Make it easy for individuals and groups to build and manage document collections. • Seamlessly incorporating digital-born and legacy documents • Approach: Provide collection manager • Manages collections in distributed repositories • Initial prototype • Supports collection creation, population, editing, access by metadata searching, full-text indexing. • HTML, PDF, ASCII, scanned images, composites (prototype) • Provides affiliated repository service • Scan-to-collection service

  18. Personal Libraries (cont’d) • Future directions: • Incorporate robust linking • Full support for composites • Automatic collection population • OceanStore backend? • Have begun experimental use by CS Division and SIMS

  19. Image Analysis for Access • BlobWorld++ • A new framework for segmentation (normalized cuts) • Shape • New algorithm developed for measuring image shape similarity using “shape contexts” • Now hold world record for handwritten digit recognition • Combining Image Features with Text for Image Data Organization and Search • Kobus Barnard and David Forsyth

  20. Combining Image Features with Text • Idea: Use text and image features together • Text → semantic categories • Image features → visual similarity • Together → learn interesting relationships • Use statistical models to learn structure • Cluster on “blobs” and (disambiguated, hierarchically enhanced) words • Arrange clusters in hierarchy • Result is automatic organization of large collections for • Browsing (using GIS Viewer as interface, and using TilePic) • “auto-illustration” • “auto-captioning”

  21. Combining Image Features with Text • And here are some results for labeling image segments.

  22. Summary • We need to rethink the entire cycle of information use • creation, dissemination and collaboration • We must provide support for • finding and presenting non-textual material (photos, video, maps) • collection creation of primary data sources and informal “publication” • radically new modes of use • robustness in a chaotic world • We will need a lot of help!
