
Knowledge Management Systems: Development and Applications Part III: Case Studies and Future

Hsinchun Chen, Ph.D., McClelland Professor; Director, Artificial Intelligence Lab and Hoffman E-Commerce Lab, The University of Arizona; Founder, Knowledge Computing Corporation.


Presentation Transcript


  1. Knowledge Management Systems: Development and Applications, Part III: Case Studies and Future. Hsinchun Chen, Ph.D., McClelland Professor; Director, Artificial Intelligence Lab and Hoffman E-Commerce Lab, The University of Arizona; Founder, Knowledge Computing Corporation. Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, NCSA, HP, SAP. (The University of Arizona, USA; Dr. Hsinchun Chen)

  2. Knowledge Management Systems: Case Studies

  3. Multi-lingual Knowledge Portal (1M): Meta-searching, post-retrieval analysis, summarization, categorization, AI Lab toolkits

  4. Knowledge portals are online search systems that provide a large amount of information resources and services within a specific domain. They provide: frequently updated, highly domain-specific information; efficient and precise search; advanced analysis functions that help users find the information they need among huge amounts of data; and additional tools, such as personalization and alerting systems, to facilitate searching.

  5. NanoPort: Knowledge Portal for Nanotechnology Researchers. Goal: Provide information services to nanotechnology researchers. The design of the content and functions is based on feedback from Nanoscale Science and Engineering (NSSE) experts. Content: 1,000,000 high-quality nanotechnology-related webpages in the database; meta-search of 4 search engines, 5 online databases, and 3 online journals. Key features: dynamic summarization; folder display; visualization using a self-organizing map (SOM); patent analysis. Funding: US National Science Foundation (NSF) Nano Initiative. Demo: http://nanoport.org/
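The slides do not say how NanoPort combines the lists returned by its different sources. As a minimal sketch of one common approach, the function below merges ranked URL lists with reciprocal-rank scoring and collapses duplicates. The function name, the scoring rule, and the trailing-slash normalization are assumptions made for illustration, not the AI Lab implementation.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list.

    results_by_engine maps an engine name to its ranked list of URLs.
    Each URL earns 1/rank from every engine that returned it (an assumed
    scoring rule), so pages found by several engines rise to the top.
    """
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for rank, url in enumerate(ranked_urls, start=1):
            # Treat "http://x.org/" and "http://x.org" as the same page.
            scores[url.rstrip("/")] += 1.0 / rank
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

A page returned by two engines outranks a page returned by only one, which is one way meta-search can raise recall while keeping well-supported results near the top.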

  6. Folder display; Visualization with SOM; The original page; Input keywords; Summary; Select search engines; Select online databases; Summarize result dynamically; Select online journals; Highlight the summary in the original page with the corresponding color; Click on a summary sentence to jump to its position in the original page

  7. MedTextus: English Medical Intelligence. Goal: Provide information services to researchers in the medical domain. Content: meta-search of 5 large medicine-related online databases and journals. Key features: keyword suggester; folder display; visualization using SOM. Funding: US National Library of Medicine (NLM). Demo: http://ai23.bpa.arizona.edu/medtextus/

  8. Select databases; Input keywords; Keyword suggested by the system; Keyword suggester; Advanced search options; Folder display; Visualization with SOM; Result page

  9. eBizPort: English Business Intelligence. Goal: Provide business, trading, and financial information services to commercial users. Content: 500,000 high-quality webpages in the database; meta-search of 10 authoritative online business magazines. Key features: search by date; keyword suggester; dynamic summarization; folder display; visualization using SOM. Demo: http://ai18.bpa.arizona.edu:8080/ebizport/

  10. Result page; Keyword suggester; Keyword suggested by the system; Limit the date of the result pages; Date of the result page; Folder display and SOM

  11. Chinese Medical Intelligence (CMI). Goal: Provide medical and health information services to both researchers and the public. Content: 350,000 high-quality medical webpages collected from mainland China, Hong Kong, and Taiwan; meta-search of 3 large general Chinese search engines. Key features: built-in Simplified/Traditional Chinese encoding conversion; dynamic summarization for both Simplified and Traditional Chinese; automatic categorization; visualization using SOM. Demo: http://128.196.40.169:8000/gbmed/
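The Simplified/Traditional conversion can be pictured as two steps: decode the page from its original byte encoding, then map Traditional characters to their Simplified forms. The sketch below assumes Big5 input and GB2312 output (the slides do not name the encodings) and uses a five-character mapping table purely for illustration. A real converter needs a table of thousands of entries and must handle characters with more than one possible mapping.

```python
# Illustrative mapping only; a production table is far larger.
TRAD_TO_SIMP = {"醫": "医", "學": "学", "國": "国", "體": "体", "灣": "湾"}
_TABLE = str.maketrans(TRAD_TO_SIMP)


def to_simplified(text):
    """Replace each Traditional character that has an entry in the table."""
    return text.translate(_TABLE)


def big5_to_gb2312(raw_bytes):
    """Decode Big5 bytes, map characters, and re-encode as GB2312.

    The choice of Big5 and GB2312 is an assumption about the portal's
    source and target encodings.
    """
    return to_simplified(raw_bytes.decode("big5")).encode("gb2312")
```

Characters missing from the table pass through unchanged, so the quality of the conversion depends entirely on the coverage of the mapping.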

  12. Results are from both Simplified and Traditional Chinese; Select websites from mainland China, Hong Kong, and Taiwan; Original encoding of the result; Simplified/Traditional Chinese summarization; Select search engines from mainland China, Hong Kong, and Taiwan; Traditional Chinese results have been converted into Simplified Chinese; Chinese folder display; Simplified Chinese summary; Chinese visualization with SOM; Traditional Chinese summary

  13. Chinese Business Intelligence (CBI). Goal: Provide business, trading, and financial information services to Chinese commercial users. Content: 300,000 high-quality webpages collected from mainland China, Hong Kong, and Taiwan. Key features: built-in Simplified/Traditional Chinese encoding conversion; dynamic summarization for both Simplified and Traditional Chinese; folder display; visualization using SOM. Demo: http://ai14.bpa.arizona.edu:8081/nanoport/

  14. The largest business, trading, and financial websites in mainland China, Hong Kong, and Taiwan; Both Simplified and Traditional results are returned; Chinese folder display; Simplified Chinese summary; Chinese summarizer; Traditional Chinese summary; Chinese visualization with SOM

  15. Spanish Business Intelligence Portal. Detailed directory of Spanish business resources on the Web. Keyword suggestion from Scirus and Concept Space. Supports Boolean searching and displays 10, 20, 30, 50, or 100 results per meta-searcher. Meta-searches 7 major sources and also searches its own collection (PIN). Keyword: comercio electronico (electronic commerce). Search, organize, or visualize results.

  16. Search Page: automatic keyword suggestion. Result Page: results organized by meta-searcher. Summarizer: summarizes in 3 or 5 sentences, with a three-sentence summary on the left and the original page on the right. Categorizer: web pages grouped by key phrases extracted by a mutual information algorithm (non-exclusive categorization). Visualizer: web pages visualized by the self-organizing map (SOM) algorithm.
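The slide names a mutual information algorithm for key-phrase extraction but gives no details. As a rough sketch, the code below scores adjacent word pairs by pointwise mutual information (PMI) and then groups pages under every phrase they contain, so one page can sit in several folders. Restricting phrases to two words, and the function names, are assumptions for illustration.

```python
import math
from collections import Counter


def bigram_pmi(docs):
    """Return {(w1, w2): PMI} for adjacent word pairs in tokenized docs."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        # High PMI: the pair occurs together more often than chance predicts.
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


def categorize(docs, phrases):
    """Map each phrase to the indexes of all docs containing it.

    Non-exclusive: a document is listed under every phrase it contains.
    """
    doc_bigrams = [set(zip(t, t[1:])) for t in docs]
    return {p: [i for i, bg in enumerate(doc_bigrams) if p in bg] for p in phrases}
```

In practice the highest-scoring phrases would be used as folder labels, with a frequency cutoff to keep rare word pairs from dominating the ranking.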

  17. Search Page: Spanish Business Taxonomy. Web sites about the topic “Electronic Commerce” in Spanish-speaking countries

  18. Arabic Medical Intelligence Portal. Provides a virtual Arabic keyboard to facilitate input. Search Page; Result Page; Categorizer; Visualizer

  19. Lessons Learned: The content selection and functionality design of a knowledge portal should meet the needs of real users. Using meta-search together with other traditional data collection methods can improve recall without sacrificing the precision of the knowledge portal. The structure of a webpage may introduce noise into the dynamic summary. The AI Lab toolkits support scalable multi-lingual spidering, indexing, searching, summarization, and categorization. New Spanish and Arabic portals completed. New cross-lingual web retrieval engine completed.

  20. Biomedical Informatics (10M): Biomedical content, biomedical ontologies, linguistic phrasing, categorization, text mining

  21. HelpfulMED Search of Medical Websites

  22. HelpfulMED search of Evidence-based Databases: What does the database cover? Search which databases? How many documents? Enter search term

  23. Consulting HelpfulMED Cancer Space (Thesaurus): Enter search term; Select relevant search terms; New terms are posted; Search again, or find relevant webpages

  24. 1 Visual Site Browser Top level map 2 3 Diagnosis, Differential 4 Brain Neoplasms 5 Brain Tumors Browsing HelpfulMED Cancer Map

  25. Genescene Overview. Knowledge Base: integrates gene relations from the literature and outside databases and provides knowledge for learning and evaluation in data mining. Data Mining: processes gene expression data (and existing knowledge) and uses different algorithms to extract regulatory networks. Text Mining: processes Medline abstracts and extracts gene relations automatically from the text. Interface & Visualization: allows searching for keywords and displays a map of the relations extracted from the text and/or from the microarray.

  26. Genescene Overview JIF External Databases Ontologies HUGO Medline GO Publications & Meta Information Knowledge Base Publications XML Parser UMLS Titles & Abstracts GeneScene GeneScene Text Mart Text Mining Relation Parsers Information Retrieval Visualization GeneScene Data Mart Concept Space AZ Noun Phraser POS Tagging Data Mining Adjuster & Tagger Full Parser Lexical lookup Relations in flat files Spring Algorithm FSA Relation Grammar Micro Array Data Co-occurrence relations Bayesian Networks UMLS Relations in flat files Feature Structures Association Rule Mining
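Of the components in the overview, the co-occurrence relations are the simplest to illustrate. The sketch below counts how often two gene symbols appear in the same sentence of an abstract. It assumes a known list of gene symbols and naive sentence splitting on end punctuation; GeneScene's parsers (the AZ Noun Phraser, the FSA relation grammar) are more sophisticated and are not reproduced here.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(abstract, gene_symbols):
    """Count sentence-level co-occurrences of the given gene symbols.

    gene_symbols is an assumed, pre-built lexicon; matching is exact and
    case-sensitive, so "p53" in "p53-dependent" still counts.
    """
    counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        found = sorted(g for g in gene_symbols if g in tokens)
        # Each unordered pair in the sentence counts once.
        counts.update(combinations(found, 2))
    return counts
```

Co-occurrence says only that two genes are mentioned together, not which one acts on the other, which is why the pipeline also includes relation parsers.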

  27. Problem: Gene Pathway. Title: Key roles for E2F1 in signaling p53-dependent apoptosis and in cell division within developing tumors. Abstract: Apoptosis induced by the p53 tumor suppressor can attenuate cancer growth in preclinical animal models. Inactivation of the pRb proteins in mouse brain epithelium by the T121 oncogene induces aberrant proliferation and p53-dependent apoptosis. p53 inactivation causes aggressive tumor growth due to an 85% reduction in apoptosis. Here, we show that E2F1 signals p53-dependent apoptosis since E2F1 deficiency causes an 80% apoptosis reduction. E2F1 acts upstream of p53 since transcriptional activation of p53 target genes is also impaired. Yet, E2F1 deficiency does not accelerate tumor growth. Unlike normal cells, tumor cell proliferation is impaired without E2F1, counterbalancing the effect of apoptosis reduction. These studies may explain the apparent paradox that E2F1 can act as both an oncogene and a tumor suppressor in experimental systems. Expert errs and corrects; Final graph

  28. Prepositions: OF/BY/IN

  29. Example Map (one abstract)

  30. Select interesting relations to visualize

  31. Overview Double click to expand

  32. Expanded node

  33. Finding the truth: p38 acts as a negative feedback for Ras signaling

  34. Lessons Learned: Biomedical information is precise, but terminologies are fluid. SOM performance for medical documents = 80%. Biomedical professionals need search and analysis help. Biomedical linguistic parsing and ontologies are promising for biomedical text mining. There is a need to integrate biomedical data (gene microarray) with text mining (literature). New testbeds completed: p53, AP1, and yeast.

  35. COPLINK Crime Data Mining (10M): Intelligence and security informatics, crime association, crime network analysis and visualization

  36. COPLINK Connect: Consolidating and sharing information promotes problem solving and collaboration. Records Management Systems (RMS); Gang Database; Mugshots Database

  37. COPLINK Connect Functionality. Generic, common XML-based representation of criminal elements. Data migration (batch and incremental) and mapping for all major databases and legacy systems. Database independent: ODBC-compliant data warehouse. Multi-layered Web-based architecture: database server, Web server, browser. Powerful and flexible search tools for various reports, e.g., incidents, warrants, pawns, etc. Graphical browser-based GUI for ease of use, training, and maintenance. H. Chen, J. Schroeder, R. V. Hauck, L. Ridgeway, H. Atabakhsh, H. Gupta, C. Boarman, K. Rasmussen, and A. W. Clements, “COPLINK Connect: Information and Knowledge Management for Law Enforcement,” Decision Support Systems, Special Issue on Digital Government, 2003.

  38. COPLINK Detect: Consolidated information enables targeted problem solving via powerful investigative criminal association analysis

  39. COPLINK Detect Functionality. Simple association rule mining applied to relationships among criminal elements. Generic, common XML-based representation for criminal relationships. Incremental data migration and association analysis on databases. Supports powerful, multi-attribute queries using partial crime information. Graphical browser-based GUI for simple crime relationship analysis and case retrieval. H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J. Schroeder, “COPLINK: Managing Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.
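To make "simple association rule mining" concrete, the sketch below treats each incident report as a set of entities (people, vehicles, and so on), counts how often pairs appear together, and emits rules with their support and confidence. The entity labels, thresholds, and function name are hypothetical; the slides do not describe the COPLINK Detect algorithm at this level of detail.

```python
from collections import Counter
from itertools import combinations


def pair_rules(incidents, min_support=2, min_confidence=0.5):
    """Return rules (lhs, rhs, support, confidence) for entity pairs.

    support    = number of incidents containing both entities
    confidence = support / number of incidents containing lhs
    """
    item_counts, pair_counts = Counter(), Counter()
    for entities in incidents:
        unique = sorted(set(entities))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        # Each pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken by support, then by name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))
```

Because the counts are simple sums, they can be updated as new incident records arrive, which fits the incremental analysis the slide describes.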

  40. COPLINK Detect 2.0/2.5

  41. COPLINK Connect/Detect Status. Systems are stable and have been shown to be useful. Commercialized and supported by KCC. Systems deployed at: TPD, UAPD, PPD, Phoenix, Huntsville (TX), Des Moines (Iowa), Ann Arbor (Michigan), Boston (Massachusetts), Montgomery County (sniper investigation). Systems under deployment: Salt River (AZ), Cambridge (Massachusetts), Redmond (Washington), and many others. COPLINK acclaimed in the LA Times, the New York Times, and Newsweek (sniper investigation).

  42. COPLINK Visual Data Mining Research. COPLINK Criminal Network Analysis: association tree, association network analysis, temporal-spatial visualization. P1000: A picture is worth 1,000 words. Use visual representations and effective HCI to make crime analysis more efficient and effective. Leverage different representations and algorithms: hyperbolic trees, network placement algorithms, structural analysis, geo-spatial mapping, time visualization. H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J. Schroeder, “COPLINK: Managing Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.

  43. A 9/11 Terrorist Network

  44. COPLINK Association Tree and Network (2nd generation)

  45. COPLINK Criminal Structural Analysis (3rd generation). Criminal association identification: using shortest-path algorithms to find the strongest associations between two or more criminals in a network. SNA (Social Network Analysis): using blockmodel analysis to detect subgroups and patterns of interaction between groups; identifying leaders, gatekeepers, and outliers in a criminal network. J. Xu & H. Chen, “Criminal Network Analysis: A Data Mining Perspective,” Decision Support Systems, 2004, forthcoming.
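One way to read "shortest-path algorithms to find the strongest associations" is the sketch below. It assumes each link carries a strength between 0 and 1 and that the strength of a chain is the product of its link strengths. Taking the negative logarithm of each strength turns the strongest chain into a shortest path, which Dijkstra's algorithm finds. Both the weighting scheme and the function name are assumptions; the cited paper should be consulted for the method actually used.

```python
import heapq
import math


def strongest_path(graph, source, target):
    """Find the chain of links with the largest product of strengths.

    graph: {node: {neighbor: strength}}, with each strength in (0, 1].
    Returns (path, strength), or (None, 0.0) if target is unreachable.
    """
    dist = {source: 0.0}  # accumulated -log(strength) from source
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, strength in graph.get(node, {}).items():
            nd = d - math.log(strength)  # non-negative edge cost
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[target])
```

With link strengths A-B 0.9, B-C 0.8, and A-C 0.2, the indirect chain through B (0.72) beats the weak direct link, which is the kind of hidden association an investigator would want surfaced.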

  46. The proposed framework

  47. COPLINK SNA Experiment. Data sets: TPD incident summaries. Time period: Narcotics, 2000-present; Gangs, 1995-present. Size: two testing networks, Narcotics (60 individuals) and Gang (24 individuals).

  48. A narcotics network example
