
High-Performance Digital Library Classification Systems:

High-Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management. PI: Hsinchun Chen, The University of Arizona. DLI-2 All-Projects Meeting, Cornell, October 18-19, 1999.



  1. High-Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management. PI: Hsinchun Chen, The University of Arizona. DLI-2 All-Projects Meeting, Cornell, October 18-19, 1999

  2. Research Plan Research Goals: • Automatic generation of large-scale classification systems (CL) • Integration of system and human-generated classification systems • High-performance simulation and visualization of Object Oriented Hierarchical Automatic Yellowpage (OOHAY)

  3. Research Plan Testbed: • Geoscience: Georef and Petroleum Abstracts (800K) and Georef thesaurus (26K terms) • Medicine: CancerLit (1M) and UMLS (250K concepts) • The Web: Indexable pages (10M) and Yahoo directory (250K nodes)

  4. Research Plan Partners: • Computing: PA • Collections: Georef, Arizona Health Science Library, Arizona Cancer Center, Arizona Science and Engineering Library • User Evaluation:

  5. The Field Knowledge Management/Knowledge Networking: Definition “The Knowledge Networking (KN) initiative focuses on the integration of knowledge from different sources and domains across space and time... KN research aims to move beyond connectivity to achieve new levels of interactivity, increasing the semantic bandwidth, knowledge bandwidth, activity bandwidth, and cultural bandwidth among people, organizations, and communities.”

  6. The Field Knowledge Management Functionality: (Source: GartnerGroup, 1998) Concept “Yellow Pages” Retrieved Knowledge • Clustering — categorization “table of contents” • Semantic Networks “index” • Dictionaries • Thesauri • Linguistic analysis • Data extraction • Collaborative filters • Communities • Trusted advisor • Expert identification Semantic Value “Recommendation” Collaboration

  7. Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Illinois DLI-1 project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domains. Technologies: Semantic retrieval and analysis technologies. Natural Language Processing: • Text tokenization • Part-of-speech tagging • Noun phrase generation

  8. Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Natural Language Processing: • Text tokenization • Part-of-speech tagging • Noun phrase generation

  9. Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Illinois DLI project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domains. Technologies: Semantic retrieval and analysis technologies. Natural Language Processing. Co-occurrence analysis: • Heuristic term weighting • Weighted co-occurrence analysis

  10. Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Co-occurrence analysis: • Heuristic term weighting • Weighted co-occurrence analysis
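
The slides name heuristic term weighting and weighted co-occurrence analysis without giving formulas. The following is a minimal sketch only, assuming a tf-idf style term weight and an asymmetric co-occurrence weight; both formulas and the function names `term_weights` and `cooccurrence` are illustrative assumptions, not the project's actual method.

```python
import math
from collections import Counter
from itertools import permutations


def term_weights(docs):
    """Per-document weights: tf * log(N / df + 1).

    This tf-idf style heuristic is an assumption made for illustration.
    `docs` is a list of documents, each a list of (noun phrase) terms.
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t] + 1) for t in tf})
    return weights


def cooccurrence(docs):
    """Asymmetric co-occurrence weight from term a to term b.

    weight(a -> b) = sum over docs of min(w_a, w_b) / sum over docs of w_a,
    so a rare term can point strongly at a common one but not vice versa.
    """
    joint = Counter()
    total = Counter()
    for w in term_weights(docs):
        for term, value in w.items():
            total[term] += value
        for a, b in permutations(w, 2):
            joint[(a, b)] += min(w[a], w[b])
    return {(a, b): joint[(a, b)] / total[a] for (a, b) in joint}


docs = [["gene", "cancer"], ["gene", "cancer", "therapy"], ["gene"]]
links = cooccurrence(docs)
```

Because the weight is normalized by the source term's total weight, `links[("cancer", "gene")]` comes out larger than `links[("gene", "cancer")]` in this toy collection: "gene" also appears without "cancer", but not the other way round.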

  11. Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Illinois DLI project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domains. Technologies: Semantic retrieval and analysis technologies. Natural Language Processing. Co-occurrence analysis. Neural Network Analysis: • Document clustering • Category labeling • Optimization and parallelization

  12. Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Neural Network Analysis: • Document clustering • Category labeling • Optimization and parallelization

  13. Techniques Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Illinois DLI project: “Federated Search of Scientific Literature” Research goal: Semantic interoperability across subject domains. Technologies: Semantic retrieval and analysis technologies. Natural Language Processing. Co-occurrence analysis. Neural Network Analysis. Advanced Visualization: • 1D: alphabetic listing of categories • 2D: semantic map listing of categories • 3D: interactive, helicopter fly-through using VRML

  14. Techniques Automatic Generation of CL: Advanced Visualization: • 1D, 2D, 3D

  15. Techniques Automatic Generation of CL: (Continued) • Entity extraction and co-reference based on TREC and MUC • Text segmentation and summarization based on TextTiling and Wavelets • Visualization techniques based on Fisheye, Fractal, and Spotlight

  16. Techniques Integration of CL: • Lexicon-enhanced indexing (e.g., UMLS Specialist Lexicon) • Ontology-enhanced query expansion (e.g., WordNet, UMLS Metathesaurus) • Ontology-enhanced semantic tagging (e.g., UMLS Semantic Nets) • Spreading-activation based term suggestion (e.g., Hopfield net)
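
Spreading-activation term suggestion over a Hopfield-style network can be sketched as follows. This is an illustrative sketch under stated assumptions, not the project's implementation: the function name `suggest_terms`, the sigmoid transfer function, the threshold and slope values, and the toy link weights are all hypothetical.

```python
import math


def suggest_terms(links, seeds, steps=10, threshold=0.5, slope=0.1, epsilon=1e-4):
    """Rank related terms by spreading activation from the seed terms.

    links: dict mapping term -> {neighbor: weight in [0, 1]}, for example
           taken from a thesaurus or a co-occurrence table (assumed input).
    seeds: set of user-supplied terms, clamped at activation 1.0.
    """
    terms = set(links) | {n for nbrs in links.values() for n in nbrs}
    act = {t: (1.0 if t in seeds else 0.0) for t in terms}
    for _ in range(steps):
        new = {}
        for t in terms:
            if t in seeds:
                new[t] = 1.0  # seed terms stay fully active
                continue
            # Net input: weighted sum of activations flowing into t.
            net = sum(act[s] * nbrs.get(t, 0.0) for s, nbrs in links.items())
            # Sigmoid transfer function centred on the threshold.
            new[t] = 1.0 / (1.0 + math.exp(-(net - threshold) / slope))
        change = sum(abs(new[t] - act[t]) for t in terms)
        act = new
        if change < epsilon:  # activations have converged
            break
    ranked = sorted((t for t in terms if t not in seeds),
                    key=lambda t: (-act[t], t))
    return [(t, act[t]) for t in ranked]


links = {
    "cancer": {"tumor": 0.9, "therapy": 0.6},
    "tumor": {"oncology": 0.8},
    "weather": {"rain": 0.9},
}
suggestions = suggest_terms(links, {"cancer"})
```

In this toy network "oncology" is not linked to the seed directly, yet it is activated in the second iteration through "tumor", which is the point of iterating rather than taking one-hop neighbors.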

  17. Techniques High-performance Simulation and Visualization: • Algorithmic optimization and parallelization on NCSA supercomputers (time machine) • Advanced, interactive 2D/3D visualization via Java, VRML, and OpenGL

  18. Research Status From YAHOO! to OOHAY? Object Oriented Hierarchical Automatic Yellowpage

  19. Research Status OOHAY: Visualizing the Web. Arizona DLI-2 project: “From Interspace to OOHAY?” Research goal: automatic and dynamic categorization and visualization of ALL the web pages in the US (and the world, later). Technologies: OOHAY techniques • Multi-threaded spiders for web page collection • High-precision web page noun phrasing and entity identification • Multi-layered, parallel, automatic web page topic directory/hierarchy generation • Dynamic web search result summarization and visualization • Adaptive, 3D web-based visualization
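
A multi-threaded spider for page collection can be sketched roughly as below. This is a hedged illustration, not the OOHAY spider code: `crawl` and its parameters are hypothetical, the caller supplies a `fetch(url)` function returning `(text, links)`, and key-phrase matching is used as a simple stand-in for relevance filtering. The demo uses an in-memory fake web so that nothing touches the network.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(start_urls, fetch, key_phrases, max_pages=50, workers=8):
    """Breadth-first crawl that fetches each level of the frontier in parallel.

    fetch: caller-supplied callable, url -> (page_text, list_of_links).
           This interface is an assumption made for the sketch.
    Returns the URLs whose text contains at least one key phrase.
    """
    seen = set(start_urls)
    frontier = list(start_urls)
    hits = []
    fetched = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and fetched < max_pages:
            batch = frontier[:max_pages - fetched]
            frontier = []
            # pool.map runs fetch concurrently and yields results in input order.
            for url, (text, links) in zip(batch, pool.map(fetch, batch)):
                fetched += 1
                lowered = text.lower()
                if any(p.lower() in lowered for p in key_phrases):
                    hits.append(url)
                for link in links:
                    if link not in seen:  # never queue a page twice
                        seen.add(link)
                        frontier.append(link)
    return hits


# In-memory stand-in for the web: url -> (text, outgoing links).
fake_web = {
    "a": ("digital library home", ["b", "c"]),
    "b": ("weather report", ["a", "d"]),
    "c": ("library classification", []),
    "d": ("Digital Library research", []),
}
results = crawl(["a"], lambda url: fake_web[url], ["digital library"])
```

Fetching level by level keeps the bookkeeping (`seen`, `frontier`) on a single thread, so only the I/O-bound `fetch` calls run concurrently and no locking is needed.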

  20. Research Status OOHAY: Visualizing the Web (figure not preserved; surviving labels: ROCK, MUSIC, …)

  21. Research Status OOHAY: CI Spider, Meta Spider, Med Spider 1. Enter starting URLs and key phrases to be searched. 2. Search results from spiders are displayed dynamically. For project information and free download: http://ai.bpa.arizona.edu

  22. Research Status OOHAY: CI Spider, Meta Spider, Med Spider 3. Noun phrases are extracted from the web pages, and the user can select preferred phrases for further summarization. 4. A SOM is generated based on the phrases selected. Steps 3 and 4 can be repeated iteratively to refine the results.
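
Step 4 can be illustrated with a small self-organizing map (SOM) trained on page vectors built from the selected phrases. This is a minimal sketch under assumptions: the binary phrase vectors, grid size, learning-rate and radius schedules, and the names `to_vectors`, `train_som`, and `best_unit` are all hypothetical choices, not the tools' actual settings.

```python
import math
import random


def to_vectors(pages, phrases):
    """Binary vector per page: 1.0 if the page text contains the phrase."""
    return [[1.0 if p in text.lower() else 0.0 for p in phrases]
            for text in pages]


def best_unit(grid, vector):
    """Grid coordinate whose weight vector is closest to the input."""
    return min(grid, key=lambda node: sum((w - x) ** 2
                                          for w, x in zip(grid[node], vector)))


def train_som(vectors, rows=3, cols=3, epochs=50, seed=0):
    """Train a rows x cols SOM with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs              # decays towards 0
        rate = 0.5 * frac                        # learning rate
        radius = max(rows, cols) / 2.0 * frac + 0.5
        for v in vectors:
            bmu = best_unit(grid, v)
            for node, weights in grid.items():
                dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-dist2 / (2 * radius * radius))
                # Pull the unit towards the input, scaled by map distance.
                for i in range(dim):
                    weights[i] += rate * influence * (v[i] - weights[i])
    return grid


phrases = ["digital library", "neural network"]
pages = ["A digital library testbed", "Neural network clustering",
         "Neural network methods for a digital library"]
vectors = to_vectors(pages, phrases)
som = train_som(vectors, rows=2, cols=2, epochs=20)
placement = [best_unit(som, v) for v in vectors]
```

Re-running `to_vectors` and `train_som` with a revised phrase list corresponds to the refinement loop over steps 3 and 4.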

  23. Research Status Digital library research in the New York Times, cover article, Sep 30, 1999

  24. Research Status DL Special Issues and Activities: • IEEE Computer, May 1996 (Schatz/Chen) • IEEE Computer, February 1999 (Schatz/Chen) • Second Asia DL Workshop, November 8-9, 1999, Taipei, Taiwan • JASIS, 2000, forthcoming (Chen) Berkeley (Wilensky), UCSB (Hill/Smith), Maryland (Greene/Shneiderman), Xerox PARC (Baldonado), IBM (Liu), Texas A&M (Shipman/Furuta), NASA (Kaplan)
