
Technologies of the Interspace Peer-Peer Semantic Indexing

Technologies of the Interspace Peer-Peer Semantic Indexing. Bruce Schatz, CANIS Laboratory, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign (www.canis.uiuc.edu, schatz@uiuc.edu). Graduate School of Informatics, Kyoto University, November 21, 2001.


Presentation Transcript


  1. Technologies of the Interspace Peer-Peer Semantic Indexing Bruce Schatz CANIS Laboratory Graduate School of Library and Information Science University of Illinois at Urbana-Champaign www.canis.uiuc.edu, schatz@uiuc.edu Graduate School of Informatics Kyoto University, November 21, 2001

  2. THE THIRD WAVE OF NET EVOLUTION CONCEPTS OBJECTS PACKETS

  3. SCALABLE SEMANTICS • Automatic indexing • Domain-Independent indexing • Statistical clustering • Compute Context of • concepts within documents • documents within repositories

  4. CROSS-OVERS IN SEMANTIC INDEXING

  5. COMPUTING CONCEPTS ‘92: 4,000 (molecular biology) ‘93: 40,000 (molecular biology) ‘95: 400,000 (electrical engineering) ‘96: 4,000,000 (engineering) ‘98: 40,000,000 (medicine)

  6. SIMULATING A NEW WORLD • Obtain discipline-scale collection • MEDLINE from NLM, 10M bibliographic abstracts • human classification: Medical Subject Headings • Partition discipline into Community Repositories • 4 core terms per abstract for MeSH classification • 32K nodes with core terms (classification tree) • Community is all abstracts classified by core term • 40M abstracts containing 280M concepts • concept spaces took 2 days on NCSA Origin 2000 • Simulating World of Medical Communities • 10K repositories with > 1K abstracts (1K w/ > 10K)
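The partitioning step on this slide can be illustrated with a minimal sketch. It assumes each abstract carries a list of core terms and that a community is the set of abstracts classified under one core term, as the slide states. The function name, the toy abstract ids, and the example terms are hypothetical, not taken from the talk.

```python
from collections import defaultdict

def partition_into_communities(abstracts):
    """Group abstract ids by core term.

    An abstract joins every community whose core term it is classified
    under, so one abstract can appear in several communities.
    """
    communities = defaultdict(set)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            communities[term].add(abstract_id)
    return dict(communities)

# Toy data (hypothetical): abstract id -> core terms assigned to it.
abstracts = {
    "a1": ["Neoplasms", "Drug Therapy"],
    "a2": ["Neoplasms", "Genetics"],
    "a3": ["Drug Therapy"],
}

communities = partition_into_communities(abstracts)

# Total memberships count an abstract once per community it belongs to.
total_memberships = sum(len(ids) for ids in communities.values())
```

Counting memberships this way is consistent with the slide's figures: roughly 10M abstracts with 4 core terms each would give about 40M abstract memberships across all communities.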

  7. COMMUNITY PROCESSING

  8. Existing Technologies • Extracting Concepts (AI) • Canonical noun phrases • Generic statistical parser • Computing Context (IR) • Co-occurrence frequency, in collection • Useful interactively, not strict ordering
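Computing context from co-occurrence frequency can be sketched as follows. This is an illustration under simplifying assumptions, not the system described in the talk: concepts are assumed to be already extracted as noun phrases, and raw co-occurrence counts stand in for whatever weighting the original system applied. All function names and sample phrases are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of concepts shares."""
    pair_counts = Counter()
    for concepts in documents:
        # Deduplicate within a document, then sort so each pair has one key.
        for a, b in combinations(sorted(set(concepts)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def related_concepts(concept, pair_counts, top_n=5):
    """Rank the concepts that co-occur most often with the given concept."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == concept:
            scores[b] += n
        elif b == concept:
            scores[a] += n
    return scores.most_common(top_n)

# Toy collection (hypothetical): one list of extracted concepts per document.
documents = [
    ["gene expression", "protein folding", "dna sequence"],
    ["gene expression", "dna sequence"],
    ["protein folding", "enzyme activity"],
]

pair_counts = cooccurrence_counts(documents)
suggestions = related_concepts("gene expression", pair_counts)
```

The ranked list is meant as a set of suggestions for interactive browsing, in line with the slide's remark that the result is useful interactively rather than as a strict ordering.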

  9. CONCEPT NAVIGATION • Semantic Indexes for Community Repositories • Navigating Abstractions within Repository • concept space • category map • Interactive browsing by Community experts

  10. Category Map

  11. Category Navigation

  12. Concept Navigation

  13. CONCEPT SWITCHING • “Concept” versus “Term” • set of “semantically” equivalent terms • Concept switching • region to region (set to set) match [Figure labels: semantic region, term, concept space, concept space]
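A set-to-set match between regions might look like the sketch below. The slide does not say how regions are compared, so Jaccard overlap is used here purely as an assumed similarity measure. The function names, region names, and terms are hypothetical.

```python
def jaccard(region_a, region_b):
    """Overlap between two term sets: |intersection| / |union|."""
    a, b = set(region_a), set(region_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def switch_concept(source_region, target_regions):
    """Pick the region in the target concept space that best matches the source.

    Returns (region_name, score); region_name is None if nothing overlaps.
    """
    best_name, best_score = None, 0.0
    for name, region in target_regions.items():
        score = jaccard(source_region, region)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Toy regions (hypothetical): each region is a set of equivalent or related terms.
source_region = {"heart attack", "myocardial infarction", "cardiac arrest"}
target_regions = {
    "cardiology": {"myocardial infarction", "cardiac arrest", "angina"},
    "oncology": {"tumor", "carcinoma"},
}

match = switch_concept(source_region, target_regions)
```

Matching whole regions rather than single terms means a switch can succeed even when the exact term the user started from does not appear in the target space.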

  14. Medicine Session

  15. Categories and Concepts

  16. Concept Switching

  17. Document Retrieval

  18. Future Technologies • Concept Switching • Spreading activation, similarity clusters • Path Matching • Aggregating indexes, many repositories • Dynamic Indexing • On-the-fly collections, during session
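Spreading activation, named above as a basis for concept switching, can be sketched over a small weighted concept graph. The decay factor, step count, graph shape, and all names below are assumptions chosen for illustration; the talk does not specify them.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed concepts along weighted links.

    graph maps a concept to {neighbor: link_weight}. At each step, every
    concept on the frontier passes energy * weight * decay to its neighbors.
    """
    activation = dict.fromkeys(seeds, 1.0)
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        # Accumulate what arrived in this step into the running totals.
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation

# Toy concept graph (hypothetical): link weights could come from co-occurrence.
graph = {
    "aspirin": {"pain": 1.0, "fever": 0.5},
    "pain": {"analgesic": 1.0},
}

activation = spread_activation(graph, ["aspirin"])
```

Concepts reached only through intermediate links, such as "analgesic" here, still receive some activation, which is what lets the search move beyond directly co-occurring terms.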

  19. Peer-Peer Computations • Local Interaction • Your PC does small computations • e.g. screensaver for SETI • Global Merging • Partition computation into small parts • Each local forms part of global whole • Large-Scale Distribution • 3M users of SETI@Home • Public Health. www.intel.com/cure
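The local-then-global pattern on this slide can be sketched with concept counting as the computation. The sketch runs every part in one process for simplicity; in a peer-peer setting each part would be assumed to run on a different machine. All function names and sample data are hypothetical.

```python
from collections import Counter

def partition(items, n_parts):
    """Split the work into n_parts roughly equal parts (round-robin)."""
    return [items[i::n_parts] for i in range(n_parts)]

def local_count(chunk):
    """Small local computation: document frequency of each concept in one part."""
    counts = Counter()
    for concepts in chunk:
        counts.update(set(concepts))
    return counts

def global_merge(partials):
    """Combine the local results into the global whole by summing counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Toy collection (hypothetical): one list of concepts per document.
documents = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]

partials = [local_count(chunk) for chunk in partition(documents, 2)]
merged = global_merge(partials)
```

Because counts simply add, the merged result equals what a single machine would compute over the whole collection, which is the property that makes this kind of computation easy to distribute.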

  20. THE NET OF THE 21st CENTURY • Beyond Objects to Concepts • Beyond Search to Analysis • Problem Solving via Cross-Correlating Multimedia Information across the Net • Every community has its own special library • Every community does semantic indexing

  21. Zen of Information Retrieval • Searching without Searching • Navigate concepts into documents • Based on interactive recognition • Indexing without Indexing • Compute context on dynamic collections • Based on distributed extraction • Sharing without Sharing • Record paths during user sessions • Based on community practices
