Concept Switching in the Interspace: Networking Infrastructure for Community Knowledge. Bruce Schatz CANIS Laboratory Graduate School of Library and Information Science University of Illinois at Urbana-Champaign Graduate School of Informatics, Kyoto University
Concept Switching in the Interspace: Networking Infrastructure for Community Knowledge. Bruce Schatz, CANIS Laboratory, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign; Graduate School of Informatics, Kyoto University. schatz@kuis.kyoto-u.ac.jp, www.canis.uiuc.edu. IEEE Knowledge Media Networking (KMN’02) Keynote Address, CRL, Kyoto, Japan, July 11, 2002
THE THIRD WAVE OF NET EVOLUTION • CONCEPTS • OBJECTS • PACKETS
CONCEPT SPACES • from Objects to Concepts • from Syntax to Semantics • Infrastructure is Interaction with Abstraction • Internet is packet transmission across computers • Interspace is concept navigation across repositories
LEVELS OF INDEXES • FORMAL (manual): Technology, Engineering, Electrical, IEEE • INFORMAL (automatic): communities, groups, individuals
THE DISTRIBUTED WORLD • Community Repositories in the Interspace • Peer to Peer Networking Infrastructure • Every Person performs Every Role: USER (request), LIBRARIAN (reference), INDEXER (classify), PUBLISHER (quality), AUTHOR (generate)
Meta Data: How to Represent the Community Knowledge • Automatic and Interactive Representation • Techniques for Capturing the Fundamental Structure
Meta Maps: How to Locate the Community Knowledge • Automatic and Interactive Location • Techniques for Capturing the Fundamental Landscape
SCALABLE SEMANTICS • Automatic indexing • Domain-Independent indexing • Statistical clustering • Compute Context of concepts within documents and of documents within repositories
COMPUTING CONCEPTS • ’92: 4,000 (molecular biology) • ’93: 40,000 (molecular biology) • ’95: 400,000 (electrical engineering) • ’96: 4,000,000 (engineering) • ’98: 40,000,000 (medicine)
SIMULATING A NEW WORLD • Obtain discipline-scale collection • MEDLINE from NLM, 10M bibliographic abstracts • human classification: Medical Subject Headings • Partition discipline into Community Repositories • 4 core terms per abstract for MeSH classification • 32K nodes with core terms (classification tree) • Community is all abstracts classified by core term • 40M abstracts containing 280M concepts • concept spaces took 2 days on NCSA Origin 2000 • Simulating World of Medical Communities • 10K repositories with > 1K abstracts (1K w/ > 10K)
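The partitioning step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CANIS pipeline: the function names, the toy abstract identifiers, and the in-memory dictionary layout are all hypothetical. Each abstract joins the community of every core term it carries, which seems to be how 10M abstracts with 4 core terms each become roughly 40M community memberships.

```python
from collections import defaultdict


def partition_into_communities(abstracts):
    """Map each core term to the set of abstract ids classified under it."""
    communities = defaultdict(set)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            communities[term].add(abstract_id)
    return dict(communities)


def repositories_larger_than(communities, size):
    """Return the core terms whose community holds more than `size` abstracts."""
    return sorted(t for t, ids in communities.items() if len(ids) > size)


# Toy data (hypothetical): abstract id -> core terms assigned by indexers.
abstracts = {
    "a1": ["Arthritis, Rheumatoid", "Analgesics"],
    "a2": ["Arthritis, Rheumatoid", "Gastrointestinal Hemorrhage"],
    "a3": ["Analgesics", "Gastrointestinal Hemorrhage"],
    "a4": ["Arthritis, Rheumatoid"],
}

communities = partition_into_communities(abstracts)
# An abstract is counted once per core term, so memberships exceed abstracts.
total_memberships = sum(len(ids) for ids in communities.values())
large = repositories_larger_than(communities, 2)
```

With the toy data, four abstracts produce seven memberships, and only the "Arthritis, Rheumatoid" community exceeds two abstracts.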
Semantic Indexing • Extracting Concepts (AI) • Canonical noun phrases • Generic statistical parser • Computing Context (IR) • Co-occurrence frequency, in collection • Useful interactively, not strict ordering
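A concept space of this kind can be approximated with plain co-occurrence counts. The sketch below assumes noun phrases have already been extracted (the parser is out of scope here) and uses raw document-level co-occurrence counts. The slides do not specify the weighting actually used, so the scoring, the function names, and the toy documents are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_space(docs):
    """Count, for each pair of phrases, the documents in which both occur."""
    cooc = defaultdict(Counter)
    for phrases in docs:
        # Use a set so a phrase repeated within one document counts once.
        for a, b in combinations(sorted(set(phrases)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def related_concepts(cooc, concept, k=5):
    """Suggest up to k co-occurring phrases, most frequent first."""
    neighbors = cooc.get(concept, {})
    # Ties are broken alphabetically so the suggestions are stable.
    return sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy documents (hypothetical), each already reduced to its noun phrases.
docs = [
    ["rheumatoid arthritis", "analgesic", "gastrointestinal bleeding"],
    ["rheumatoid arthritis", "analgesic"],
    ["analgesic", "gastrointestinal bleeding"],
]

space = build_concept_space(docs)
suggestions = related_concepts(space, "analgesic")
```

The ranked list is meant as interactive suggestions for a browsing user rather than a strict ordering, in line with the last bullet above.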
System Side Infrastructure Classification Technologies for Multimedia Documents • Phrases (multi-word nouns) • Concepts (generic phrases) • Types (identified concepts) • Clusters (grouped types) • Structures (semantic universals)
INTERSPACE NAVIGATION • Semantic Indexes for Community Repositories • Navigating Abstractions within Repository • concept space & category map • Interactive browsing by Community experts *www.canis.uiuc.edu/interspace-prototype
Navigation in MEDSPACE For a patient with Rheumatoid Arthritis • Find a drug that reduces the pain (analgesic) • but does not cause stomach (gastrointestinal) bleeding Choose Domain
User Side Infrastructure Navigation Technologies for Search Interfaces • Exact Match (noun phrases) • Relationship List (concept suggestions) • Cluster Comparison (groups to groups) • Spreading Activation (group intersections) • Artificial Landscapes (semantic distances)
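Spreading activation, listed above as finding group intersections, can be sketched as energy flowing outward from several seed concepts over a weighted concept graph. This is one common formulation, not necessarily the one used in the Interspace prototype; the decay factor, the step count, the graph, and the function name are all assumptions.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes along weighted edges.

    graph maps node -> {neighbor: weight}. At each step a node passes
    `decay` times its incoming energy to its neighbors, split in
    proportion to edge weight.
    """
    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            neighbors = graph.get(node, {})
            total = sum(neighbors.values())
            if total == 0:
                continue
            for nbr, weight in neighbors.items():
                passed = energy * decay * weight / total
                next_frontier[nbr] = next_frontier.get(nbr, 0.0) + passed
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


# Toy concept graph (hypothetical weights, e.g. co-occurrence counts).
graph = {
    "rheumatoid arthritis": {"analgesic": 2, "inflammation": 2},
    "analgesic": {"rheumatoid arthritis": 2, "gastrointestinal bleeding": 2},
    "gastrointestinal bleeding": {"analgesic": 2},
    "inflammation": {"rheumatoid arthritis": 2},
}

seeds = ["rheumatoid arthritis", "gastrointestinal bleeding"]
result = spread_activation(graph, seeds, decay=0.5, steps=1)
```

After one step, "analgesic" receives energy from both seeds and so outranks "inflammation", which is reachable from only one. Concepts in the intersection of the seed neighborhoods accumulate the most activation.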
SWITCHING In the Interspace… • each Community maintains its own repository • Switching is navigating Across repositories • use your vocabulary to search another specialty
CONCEPT SWITCHING • “Concept” versus “Term” • set of “semantically” equivalent terms • Concept switching • region to region (set to set) match (Figure labels: term, semantic region, Concept Space, Concept Space)
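A region-to-region match can be illustrated with simple set overlap. In this sketch a concept's region is the term plus its related terms in one concept space, and regions are compared across spaces with the Jaccard coefficient. The overlap measure, the function names, and both toy concept spaces are assumptions chosen for illustration; the slides do not state which similarity measure is used.

```python
def region(space, term):
    """A term's semantic region: the term plus its related terms."""
    return {term} | set(space.get(term, ()))


def jaccard(a, b):
    """Set overlap in [0, 1]; defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch_concept(source_space, target_space, term, k=3):
    """Rank target-space concepts by overlap with the source term's region."""
    src = region(source_space, term)
    scored = [(jaccard(src, region(target_space, t)), t) for t in target_space]
    scored = [(s, t) for s, t in scored if s > 0]
    return sorted(scored, key=lambda st: (-st[0], st[1]))[:k]


# Toy concept spaces (hypothetical) for two specialties.
rheumatology = {
    "pain relief": ["analgesic", "nsaid", "inflammation"],
}
pharmacology = {
    "cyclooxygenase inhibitor": ["nsaid", "analgesic", "ulcer"],
    "antacid": ["ulcer", "stomach acid"],
}

matches = switch_concept(rheumatology, pharmacology, "pain relief")
```

Here a searcher's own vocabulary ("pain relief") lands on "cyclooxygenase inhibitor" in the other specialty because the two regions share terms, even though the entry terms themselves never match.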
Future Technologies • Concept Switching • Spreading activation, type tagging • Dynamic Indexing • On-the-fly collections, during session • Path Matching • Aggregating indexes, many repositories
Semantic Analysis of Multimedia • Collections of Objects containing Units • Text: community repository (topic proximity) document abstracts containing noun phrases • Image: aerial photograph (spatial proximity) feature regions containing texture tiles • Units -- media-dependent (statistical parsers) • Indexes -- media-independent (statistical clusters)
Media Interoperability Model • text concept space & category map (geoscience) • 1M phrases in 500K abstracts from Georef and Petroleum Abstracts • image concept & category maps in aerial photos • visual thesaurus maps for 200K regions in 800 images (6M tiles) • geographic map (where) v. semantic map (what) • spatial gazetteer as bridge image<=>text<=>number
Text and Number Interoperability • Text and AVHRR Query: Show me information about the Santa Barbara area with mild temperature and high vegetation density. • Integrated Result: Within the bounding geographic location, 2 documents and 88 AVHRR records related to the integrated query are retrieved.
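The gazetteer-as-bridge idea behind this integrated query can be sketched as follows: a place name resolves to a bounding box, and the same box then filters both text documents and numeric records. Everything specific here is hypothetical: the bounding-box coordinates are illustrative only, the record fields (`temp_c`, `veg_index`) and the thresholds for "mild" and "high" are assumptions, and none of it is the prototype's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Hypothetical gazetteer entry; coordinates are illustrative, not surveyed.
GAZETTEER = {"Santa Barbara": BBox(34.3, -120.0, 34.6, -119.5)}


def integrated_query(place, documents, records, temp_range, min_veg):
    """Filter documents by location, and records by location and thresholds."""
    box = GAZETTEER[place]
    docs = [d for d in documents if box.contains(d["lat"], d["lon"])]
    recs = [
        r for r in records
        if box.contains(r["lat"], r["lon"])
        and temp_range[0] <= r["temp_c"] <= temp_range[1]
        and r["veg_index"] >= min_veg
    ]
    return docs, recs


# Toy data (hypothetical).
documents = [
    {"id": "d1", "lat": 34.42, "lon": -119.70},
    {"id": "d2", "lat": 37.77, "lon": -122.42},
]
records = [
    {"id": "r1", "lat": 34.45, "lon": -119.80, "temp_c": 18.0, "veg_index": 0.6},
    {"id": "r2", "lat": 34.45, "lon": -119.80, "temp_c": 35.0, "veg_index": 0.6},
    {"id": "r3", "lat": 34.45, "lon": -119.80, "temp_c": 18.0, "veg_index": 0.1},
]

docs, recs = integrated_query("Santa Barbara", documents, records, (10.0, 25.0), 0.4)
```

The place name is the only shared key: text and numeric records never reference each other directly, and meet only through the geographic region.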
Image Concept Switching • Image Query: By browsing a texture (tile) catalog, show me information about residential and farm land areas. • Result: A set of related images is retrieved and shown in the Results Frame. The full-size image #368 is displayed with its place names and tile locations.
INFORMATION SPACEFLIGHT • Landscape as category map visualization • Valleys are semantic clusters • Hills are semantic distances • Traversal across multiple levels of abstraction
INFORMATION SPACEFLIGHT Flying through Cyberspace
THE NET OF THE 21st CENTURY • Beyond Objects to Concepts • Beyond Search to Analysis • Problem Solving via Cross-Correlating Multimedia Information across the Net • Every community has its own special library • Every community does semantic indexing • The Interspace is true Cyberspace