Technologies of the Interspace: Peer-Peer Semantic Indexing
Bruce Schatz
CANIS Laboratory, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
www.canis.uiuc.edu, schatz@uiuc.edu
Graduate School of Informatics, Kyoto University, November 21, 2001
THE THIRD WAVE OF NET EVOLUTION
[Diagram: three layers labeled CONCEPTS, OBJECTS, PACKETS]
SCALABLE SEMANTICS
• Automatic indexing
• Domain-independent indexing
• Statistical clustering
• Compute context of
  • concepts within documents
  • documents within repositories
COMPUTING CONCEPTS
• ’92: 4,000 (molecular biology)
• ’93: 40,000 (molecular biology)
• ’95: 400,000 (electrical engineering)
• ’96: 4,000,000 (engineering)
• ’98: 40,000,000 (medicine)
SIMULATING A NEW WORLD
• Obtain a discipline-scale collection
  • MEDLINE from NLM: 10M bibliographic abstracts
  • Human classification: Medical Subject Headings (MeSH)
• Partition the discipline into community repositories
  • 4 core terms per abstract for MeSH classification
  • 32K nodes with core terms (classification tree)
  • A community is all abstracts classified under a core term
  • 40M abstracts across all repositories (each of the 10M abstracts is filed under its 4 core terms), containing 280M concepts
  • Concept spaces took 2 days on the NCSA Origin 2000
• Simulating a world of medical communities
  • 10K repositories with more than 1K abstracts (1K with more than 10K)
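The partitioning step can be illustrated with a small Python sketch. It assumes each abstract arrives as a record listing its core MeSH terms; the record layout, the function names `partition_by_core_terms` and `large_repositories`, and the toy data are hypothetical stand-ins for MEDLINE, not the original system. It shows why the repositories together hold more entries than there are abstracts: each abstract is filed under every one of its core terms.

```python
from collections import defaultdict

# Hypothetical toy records: (abstract_id, core MeSH terms assigned to it).
# In the slide's setting each abstract carries about 4 core terms.
ABSTRACTS = [
    ("a1", ["Neoplasms", "Genes", "Mutation", "Humans"]),
    ("a2", ["Neoplasms", "Apoptosis", "Mice", "Humans"]),
    ("a3", ["Neoplasms", "Genes", "Mutation", "Mice"]),
]

def partition_by_core_terms(abstracts):
    """Build one community repository per core term.

    An abstract is placed in the repository of every core term it carries,
    so the repositories together hold more entries than there are abstracts.
    """
    repositories = defaultdict(list)
    for abstract_id, core_terms in abstracts:
        for term in core_terms:
            repositories[term].append(abstract_id)
    return dict(repositories)

def large_repositories(repositories, min_size):
    """Keep only the repositories holding more than min_size abstracts."""
    return {term: ids for term, ids in repositories.items() if len(ids) > min_size}

repos = partition_by_core_terms(ABSTRACTS)
total_entries = sum(len(ids) for ids in repos.values())
print(len(repos), total_entries)  # prints: 6 12
```

With about 4 core terms per abstract, the same multiplication takes 10M abstracts to roughly 40M repository entries, and a size threshold like the one in `large_repositories` is how one would count repositories "with more than 1K abstracts".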
Existing Technologies
• Extracting concepts (AI)
  • Canonical noun phrases
  • Generic statistical parser
• Computing context (IR)
  • Co-occurrence frequency within the collection
  • Useful interactively, not as a strict ordering
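The two existing technologies can be sketched together in Python: extract candidate phrases from each document, then count how often phrases co-occur in the same document. This is a minimal illustration under stated assumptions. The regex-and-stopword chunker in `extract_phrases` is a crude stand-in for a generic statistical noun-phrase parser, and raw document co-occurrence counts replace whatever weighting a production concept space would use; all names and sample sentences are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real noun-phrase parser would rely on
# part-of-speech tags rather than this crude split.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}

def extract_phrases(text):
    """Crude phrase extractor: lowercase the text (a simple canonical form),
    split on punctuation and stopwords, and keep each remaining word run."""
    phrases = set()
    for chunk in re.split(r"[.,;:()]", text.lower()):
        current = []
        for word in chunk.split():
            if word in STOPWORDS:
                if current:
                    phrases.add(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.add(" ".join(current))
    return phrases

def cooccurrence(documents):
    """Count, for each unordered pair of phrases, the documents containing both."""
    counts = Counter()
    for doc in documents:
        for pair in combinations(sorted(extract_phrases(doc)), 2):
            counts[pair] += 1
    return counts

def related(term, counts, k=3):
    """Return up to k phrases ranked by how often they co-occur with term."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [phrase for phrase, _ in scores.most_common(k)]

DOCS = [
    "Mutation of the p53 gene in colon cancer.",
    "The p53 gene and apoptosis in colon cancer.",
    "Apoptosis in tumor cells.",
]
counts = cooccurrence(DOCS)
print(related("p53 gene", counts))
```

Because `related` returns a ranked list of suggestions rather than a definitive answer, ties and small count differences matter little, which fits the slide's point that co-occurrence is useful interactively rather than as a strict ordering.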
CONCEPT NAVIGATION
• Semantic indexes for community repositories
• Navigating abstractions within a repository
  • concept space
  • category map
• Interactive browsing by community experts
CONCEPT SWITCHING
[Diagram: terms grouped into semantic regions in two concept spaces]
• “Concept” versus “term”
  • a concept is a set of “semantically” equivalent terms
• Concept switching
  • region-to-region (set-to-set) match
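Region-to-region matching can be sketched by representing each concept as a set of terms and scoring candidate regions in a second concept space by set overlap. The Jaccard coefficient used here is an assumed similarity measure chosen for illustration, not necessarily the one used in the Interspace, and the concept spaces, labels, and function names are hypothetical.

```python
# Hypothetical concept spaces: a region is the set of terms treated as
# semantically equivalent; the target space maps region labels to term sets.
SOURCE_REGION = {"heart attack", "myocardial infarction", "mi", "cardiac infarction"}
TARGET_SPACE = {
    "infarction": {"myocardial infarction", "cardiac infarction", "infarct"},
    "arrhythmia": {"arrhythmia", "atrial fibrillation"},
}

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def switch_concept(region, target_space):
    """Match a region (a set of terms) to the best-overlapping region in
    another concept space; returns (label, score), or (None, 0.0) if no overlap."""
    best_label, best_score = None, 0.0
    for label, terms in target_space.items():
        score = jaccard(region, terms)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

print(switch_concept(SOURCE_REGION, TARGET_SPACE))  # ('infarction', 0.4)
```

Matching whole sets rather than single strings is what lets a term such as "heart attack", which does not appear in the target space, still switch to the infarction region through its equivalent terms.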
Future Technologies
• Concept switching
  • Spreading activation, similarity clusters
• Path matching
  • Aggregating indexes across many repositories
• Dynamic indexing
  • On-the-fly collections built during a session
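Spreading activation can be sketched over a concept space treated as a weighted graph: activation starts at seed concepts and flows along links, weakening at each hop. This is a generic version of the technique, not the Interspace implementation; the `decay`, `threshold`, and `max_steps` parameters and the sample graph are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical concept graph: concept -> {neighbor: link weight in [0, 1]}.
GRAPH = {
    "p53": {"apoptosis": 0.8, "tumor suppressor": 0.9},
    "apoptosis": {"caspase": 0.7, "p53": 0.8},
    "tumor suppressor": {"rb1": 0.6},
}

def spreading_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Spread activation from seed concepts across weighted links.

    Each step, every node activated in the previous step passes
    energy * weight * decay to its neighbors; contributions below
    threshold are dropped. Returns (concept, activation) pairs,
    highest activation first.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                if passed >= threshold:
                    next_frontier[neighbor] += passed
        if not next_frontier:
            break  # nothing strong enough left to spread
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return sorted(activation.items(), key=lambda item: -item[1])

print(spreading_activation(GRAPH, {"p53": 1.0}))
```

The threshold keeps activation from flooding the whole graph: in the sample, the contributions that would leave "p53" a second time fall below it, so propagation stops before `max_steps` is reached.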
Peer-Peer Computations
• Local interaction
  • Your PC does small computations
  • e.g., a screensaver for SETI
• Global merging
  • Partition the computation into small parts
  • Each local computation forms part of the global whole
• Large-scale distribution
  • 3M users of SETI@Home
  • Public health: www.intel.com/cure
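The partition-and-merge pattern suits concept-space construction because co-occurrence counting is decomposable: counts computed on disjoint slices of a collection add up to the counts for the whole collection. The Python sketch here simulates peers as plain function calls; the round-robin `partition`, the function names, and the sample term lists are hypothetical, and no real peer-to-peer networking is shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, each already reduced to its extracted terms.
DOCS = [
    ["p53", "apoptosis"],
    ["p53", "gene"],
    ["p53", "apoptosis", "gene"],
    ["apoptosis", "caspase"],
]

def partition(items, n_parts):
    """Split a collection into n_parts slices, round-robin."""
    return [items[i::n_parts] for i in range(n_parts)]

def local_counts(documents):
    """Work done on one peer: count term pairs co-occurring in its documents."""
    counts = Counter()
    for terms in documents:
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts

def global_merge(partials):
    """Combine the peers' partial counts into global counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing
    return total

peers = partition(DOCS, 3)
merged = global_merge(local_counts(part) for part in peers)
print(merged == local_counts(DOCS))  # True
```

A real deployment would also have to ship partial counts over the network and tolerate peers that drop out, which the sketch leaves aside.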
THE NET OF THE 21st CENTURY
• Beyond Objects to Concepts
• Beyond Search to Analysis
• Problem solving via cross-correlating multimedia information across the Net
• Every community has its own special library
• Every community does semantic indexing
Zen of Information Retrieval
• Searching without Searching
  • Navigate concepts into documents
  • Based on interactive recognition
• Indexing without Indexing
  • Compute context on dynamic collections
  • Based on distributed extraction
• Sharing without Sharing
  • Record paths during user sessions
  • Based on community practices