Knowledge Management Systems: Development and Applications. Part III: Case Studies and Future. Hsinchun Chen, Ph.D., McClelland Professor; Director, Artificial Intelligence Lab and Hoffman E-Commerce Lab, The University of Arizona; Founder, Knowledge Computing Corporation.
Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, NCSA, HP, SAP. University of Arizona, USA; Hsinchun Chen, Ph.D.
Knowledge Management Systems: Case Studies
Multi-lingual Knowledge Portal (1M): Meta-searching, post-retrieval analysis, summarization, categorization, AI Lab toolkits
Knowledge portals are online search systems that provide a large amount of information resources and services within a specific domain. They provide frequently updated, highly domain-specific information; efficient and precise search services; advanced analysis functions that help users find the information they need among huge amounts of data; and additional tools, such as personalization and alerting systems, that facilitate search tasks.
NanoPort: Knowledge Portal for Nanotechnology Researchers. Goal: Provide information services to nanotechnology researchers; the content and functions were designed based on feedback from Nanoscale Science and Engineering (NSSE) experts. Content: 1,000,000 high-quality nanotechnology-related webpages in the database; meta-searches 4 search engines, 5 online databases, and 3 online journals. Key features: dynamic summarization, folder display, visualization using a self-organizing map (SOM), patent analysis. Funding: US National Science Foundation (NSF) Nano Initiative. Demo: http://nanoport.org/
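The SOM visualization places result pages on a 2-D grid so that similar pages land in the same or nearby cells. The following is a minimal sketch of a standard Kohonen SOM over document term vectors, assuming pages are already vectorized; the grid size, learning-rate and neighborhood schedules, and the toy vectors are illustrative assumptions, not NanoPort's actual parameters.

```python
import math
import random

def best_matching_unit(weights, v):
    """Grid cell whose weight vector is closest (squared Euclidean) to v."""
    return min(weights, key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], v)))

def train_som(vectors, rows=3, cols=3, epochs=200, lr0=0.5, seed=0):
    """Train a small Kohonen self-organizing map (illustrative settings)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialized weight vector per grid cell.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood radius
        for v in rng.sample(vectors, len(vectors)):  # shuffled presentation order
            bmu = best_matching_unit(weights, v)
            for cell, w in weights.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights

def map_documents(weights, docs):
    """Assign each named document vector to its best-matching grid cell."""
    return {name: best_matching_unit(weights, vec) for name, vec in docs.items()}

# Hypothetical term vectors for four result pages.
docs = {
    "nanotube-1": [1.0, 0.9, 0.0, 0.0],
    "nanotube-2": [0.9, 1.0, 0.1, 0.0],
    "patent-1": [0.0, 0.1, 1.0, 0.9],
    "patent-2": [0.0, 0.0, 0.9, 1.0],
}
cells = map_documents(train_som(list(docs.values())), docs)
```

The fixed seed makes the map reproducible, which matters if users revisit the same result set and expect the same layout.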
Screenshots (NanoPort): search page (input keywords; select search engines, online databases, and online journals) and result page with dynamic summarization, folder display, and visualization using SOM. In the original page, summary sentences are highlighted in the corresponding color; clicking a summary sentence jumps to its position in the original page.
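Dynamic summarization picks a few representative sentences from a result page and keeps their positions so the interface can highlight them and jump to them. A minimal sketch, assuming a simple word-frequency sentence scorer (a common extractive technique swapped in for illustration, not necessarily the one NanoPort uses) and a naive sentence splitter; the stopword list is a hypothetical placeholder.

```python
import re
from collections import Counter

# Hypothetical, minimal English stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "with", "by"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n=3):
    """Return (index, sentence) pairs for the n highest-scoring sentences,
    in original order, so each one can be highlighted and linked back."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(i):
        toks = content_words(sentences[i])
        # Average page-level frequency of the sentence's content words.
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    top = sorted(range(len(sentences)), key=score, reverse=True)[:n]
    return [(i, sentences[i]) for i in sorted(top)]
```

Returning sentence indices rather than bare strings is what makes the "click to jump to the original position" behavior straightforward. The noise issue noted under Lessons Learned shows up here too: navigation text that survives HTML stripping is scored like any other sentence.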
MedTextus: English Medical Intelligence. Goal: Provide information services to researchers in the medical domain. Content: meta-searches 5 large medicine-related online databases and journals. Key features: keyword suggester, folder display, visualization using SOM. Funding: US National Library of Medicine (NLM). Demo: http://ai23.bpa.aizona.edu/medtextus/
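A keyword suggester can be built from term co-occurrence in a document collection, in the spirit of the Concept Space thesaurus mentioned elsewhere in these slides. This sketch uses a plain asymmetric weight (documents containing both terms divided by documents containing the query term); that formula and the toy corpus are simplifying assumptions, not MedTextus's actual algorithm.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(docs):
    """Count document frequency of each term and of each unordered term pair."""
    df = Counter()
    pair_df = Counter()
    for terms in docs:
        unique = sorted(set(terms))
        df.update(unique)
        pair_df.update(combinations(unique, 2))
    return df, pair_df

def suggest(query, df, pair_df, k=3):
    """Rank related terms by P(term | query), estimated from document counts."""
    if df[query] == 0:
        return []
    scores = {}
    for (a, b), n in pair_df.items():
        if query == a:
            scores[b] = n / df[query]
        elif query == b:
            scores[a] = n / df[query]
    # Highest weight first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

The weight is asymmetric on purpose: a specific term strongly suggests a general one, but not necessarily the reverse.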
Screenshots (MedTextus): search page (select databases, input keywords, advanced search options, keyword suggester with system-suggested keywords) and result page with folder display and visualization using SOM.
eBizPort: English Business Intelligence. Goal: Provide business, trading, and financial information services to commercial users. Content: 500,000 high-quality webpages in the database; meta-searches 10 authoritative online business magazines. Key features: search by date, keyword suggester, dynamic summarization, folder display, visualization using SOM. Demo: http://ai18.bpa.arizona.edu:8080/ebizport/
Screenshots (eBizPort): result page showing the date of each result page and an option to limit results by date; keyword suggester with system-suggested keywords; folder display and SOM.
Chinese Medical Intelligence (CMI). Goal: Provide medical and health information services to both researchers and the public. Content: 350,000 high-quality medical webpages collected from mainland China, Hong Kong, and Taiwan; meta-searches 3 large general Chinese search engines. Key features: built-in Simplified/Traditional Chinese encoding conversion, dynamic summarization for both Simplified and Traditional Chinese, automatic categorization, visualization using SOM. Demo: http://128.196.40.169:8000/gbmed/
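Before summarizing or categorizing, pages that arrive in different encodings have to be normalized to one script. A minimal sketch, assuming Taiwan and Hong Kong pages are Big5-encoded and mainland pages are GB2312-encoded, and using a tiny hypothetical character table; a real converter needs a full dictionary, including phrase-level mappings, which this sketch does not provide.

```python
# Hypothetical, deliberately tiny Traditional -> Simplified character table.
TRAD_TO_SIMP = str.maketrans({
    "醫": "医",  # medicine
    "學": "学",  # study
    "療": "疗",  # treatment
    "藥": "药",  # drug
})

def to_simplified(raw: bytes, encoding: str) -> str:
    """Decode page bytes with their source encoding (e.g. "big5" or "gb2312"),
    then map Traditional characters to Simplified ones character by character."""
    text = raw.decode(encoding)
    return text.translate(TRAD_TO_SIMP)
```

Decoding first and converting second keeps the two problems separate: the codec handles the byte encoding, and the table handles the script difference.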
Screenshots (CMI): select search engines and websites from mainland China, Hong Kong, and Taiwan; results come from both Simplified and Traditional Chinese sources, with the original encoding of each result shown, and Traditional Chinese results have been converted into Simplified Chinese; Simplified and Traditional Chinese summarization; Chinese folder display; Chinese visualization with SOM.
Chinese Business Intelligence (CBI). Goal: Provide business, trading, and financial information services to Chinese commercial users. Content: 300,000 high-quality webpages collected from mainland China, Hong Kong, and Taiwan. Key features: built-in Simplified/Traditional Chinese encoding conversion, dynamic summarization for both Simplified and Traditional Chinese, folder display, visualization using SOM. Demo: http://ai14.bpa.arizona.edu:8081/nanoport/
Screenshots (CBI): results come from the largest business, trading, and financial websites in mainland China, Hong Kong, and Taiwan; both Simplified and Traditional Chinese results are returned; Chinese folder display; Chinese summarizer with Simplified and Traditional Chinese summaries; Chinese visualization with SOM.
Spanish Business Intelligence Portal. Features: a detailed directory of Spanish business resources on the Web; keyword suggestion from Scirus and Concept Space; Boolean searching, with 10, 20, 30, 50, or 100 results displayed per meta-searcher; meta-searches 7 major sources and also searches its own collection (PIN). Example keyword: comercio electronico (electronic commerce). Users can search, organize, or visualize results.
Search page: automatic keyword suggestion. Result page: results organized by meta-searcher. Summarizer: summarizes a page in 3 or 5 sentences, with a three-sentence summary on the left and the original page on the right. Categorizer: web pages grouped by key phrases extracted with a mutual information algorithm (non-exclusive categorization). Visualizer: web pages visualized with the self-organizing map (SOM) algorithm.
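The categorizer groups pages under key phrases, and a page may land in several folders. This sketch scores candidate two-word phrases with pointwise mutual information (PMI), one common mutual-information measure for collocations; the portal's exact formula, phrase length, and thresholds are not given in the slides, so `min_count`, `top_k`, and the toy pages are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def key_phrases(pages, min_count=2, top_k=3):
    """Rank adjacent word pairs by PMI = log(P(xy) / (P(x) * P(y)))."""
    unigrams, bigrams = Counter(), Counter()
    for text in pages.values():
        toks = tokenize(text)
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))  # bigrams never cross page boundaries
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values()) or 1

    def pmi(pair):
        p_xy = bigrams[pair] / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        return math.log(p_xy / (p_x * p_y))

    # Drop rare pairs, whose PMI is unreliably high.
    candidates = [p for p, c in bigrams.items() if c >= min_count]
    ranked = sorted(candidates, key=lambda p: (-pmi(p), p))
    return [" ".join(p) for p in ranked[:top_k]]

def categorize(pages, phrases):
    """Non-exclusive folders: a page joins every folder whose phrase it contains."""
    folders = {phrase: [] for phrase in phrases}
    for name, text in pages.items():
        padded = " " + " ".join(tokenize(text)) + " "
        for phrase in phrases:
            if " " + phrase + " " in padded:  # whole-word phrase match
                folders[phrase].append(name)
    return folders
```

The minimum-count filter matters because PMI rewards pairs seen only once, which would otherwise dominate the folder list.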
Search page with the Spanish business taxonomy: websites about the topic “Electronic Commerce” in Spanish-speaking countries.
Arabic Medical Intelligence Portal: provides a virtual Arabic keyboard to facilitate input. Screenshots: search page, result page, categorizer, visualizer.
Lessons Learned: The content selection and functionality design of a knowledge portal should meet the needs of real users. Using meta-search together with traditional data collection methods can improve recall without sacrificing the precision of the knowledge portal. The structure of a webpage may introduce noise into the dynamic summary. The AI Lab toolkits support scalable multilingual spidering, indexing, searching, summarization, and categorization. New Spanish and Arabic portals have been completed, as has a new cross-lingual web retrieval engine.
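The recall benefit of meta-search comes from merging ranked lists from several sources. This sketch deduplicates by URL and fuses ranks with reciprocal rank fusion (RRF), a standard technique swapped in here for illustration; the slides do not describe the portals' actual merging method, and the source names and URLs are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked URL lists from several sources.

    Each URL scores sum(1 / (k + rank)) over the lists it appears in
    (rank starts at 1). Pages found by several sources rise to the top,
    and pages found by only one source are still kept, which helps recall.
    """
    scores = defaultdict(float)
    for results in result_lists.values():
        seen = set()
        for rank, url in enumerate(results, start=1):
            key = url.rstrip("/").lower()  # crude URL normalization for dedup
            if key in seen:
                continue
            seen.add(key)
            scores[key] += 1.0 / (k + rank)
    return sorted(scores, key=lambda u: (-scores[u], u))
```

RRF needs only ranks, not the sources' raw scores, which suits meta-search: search engines, databases, and journals report incomparable relevance scores or none at all.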
Biomedical Informatics (10M): Biomedical content, biomedical ontologies, linguistic phrasing, categorization, text mining
HelpfulMED search of evidence-based databases. Screenshot callouts: What does each database cover? Which databases to search? How many documents? Enter search term.
Consulting the HelpfulMED Cancer Space (thesaurus): enter a search term, select relevant search terms, and the new terms are posted; then search again or find relevant webpages.
Browsing the HelpfulMED Cancer Map (screenshot sequence): Visual Site Browser, top-level map, Diagnosis, Differential, Brain Neoplasms, Brain Tumors.
Genescene Overview. Knowledge base: integrates gene relations from the literature and outside databases, and provides knowledge for learning and evaluation in data mining. Data mining: processes gene expression data (and existing knowledge) and uses different algorithms to extract regulatory networks. Text mining: processes Medline abstracts and extracts gene relations automatically from the text. Interface and visualization: supports keyword search and displays a map of the relations extracted from the text and/or from the microarray data.
Figure: Genescene architecture. Components shown: external databases and ontologies (HUGO, Medline, GO, UMLS, JIF); a knowledge base of publications and meta-information (XML parser; titles and abstracts); the GeneScene Text Mart and GeneScene Data Mart; text mining with relation parsers (AZ Noun Phraser, POS tagging, adjuster and tagger, full parser, lexical lookup, FSA relation grammar, feature structures, co-occurrence relations, UMLS relations, relations in flat files); data mining of microarray data (Bayesian networks, association rule mining); information retrieval and visualization (Concept Space, spring algorithm).
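The visualization layer lists a spring algorithm for laying out relation maps. A minimal force-directed (spring-embedder) sketch in the Fruchterman-Reingold style, assuming an undirected gene-relation graph; the constants, iteration count, and cooling schedule are illustrative assumptions, not Genescene's settings.

```python
import math
import random

def spring_layout(nodes, edges, iterations=100, size=1.0, seed=0):
    """Place nodes so related genes sit close together and unrelated ones spread out.
    `nodes` is a list of names; `edges` is a list of (u, v) pairs."""
    rng = random.Random(seed)
    pos = {n: [rng.random() * size, rng.random() * size] for n in nodes}
    k = size * math.sqrt(1.0 / len(nodes))  # ideal edge length
    temp = size / 10.0                      # maximum step, cooled every iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: magnitude k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along each edge (the "spring"): magnitude d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net displacement, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temp)
                pos[n][0] += dx / d * step
                pos[n][1] += dy / d * step
        temp *= 0.95
    return {n: tuple(p) for n, p in pos.items()}

# Relations mentioned in these slides, used here as a toy graph.
layout = spring_layout(["E2F1", "p53", "apoptosis", "Ras", "p38"],
                       [("E2F1", "p53"), ("p53", "apoptosis"), ("Ras", "p38")])
```

The pairwise repulsion loop is quadratic in the number of nodes, which is acceptable for the small relation maps shown to a user but not for whole-genome graphs.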
Problem: Gene Pathway. Title: Key roles for E2F1 in signaling p53-dependent apoptosis and in cell division within developing tumors. Abstract: Apoptosis induced by the p53 tumor suppressor can attenuate cancer growth in preclinical animal models. Inactivation of the pRb proteins in mouse brain epithelium by the T121 oncogene induces aberrant proliferation and p53-dependent apoptosis. p53 inactivation causes aggressive tumor growth due to an 85% reduction in apoptosis. Here, we show that E2F1 signals p53-dependent apoptosis since E2F1 deficiency causes an 80% apoptosis reduction. E2F1 acts upstream of p53 since transcriptional activation of p53 target genes is also impaired. Yet, E2F1 deficiency does not accelerate tumor growth. Unlike normal cells, tumor cell proliferation is impaired without E2F1, counterbalancing the effect of apoptosis reduction. These studies may explain the apparent paradox that E2F1 can act as both an oncogene and a tumor suppressor in experimental systems. Figure callouts: expert errs and corrects; final graph.
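The abstract states relations such as "E2F1 acts upstream of p53". A minimal sketch of pattern-based relation extraction within single sentences, assuming a hypothetical gene list and a few hand-written relation patterns; Genescene's actual parsers (noun phrasing, POS tagging, an FSA relation grammar) are far richer than these regular expressions.

```python
import re

# Hypothetical gene lexicon and relation patterns, for illustration only.
GENES = ["E2F1", "p53", "pRb", "Ras", "p38"]
RELATION_PATTERNS = {
    "acts_upstream_of": r"acts upstream of",
    "signals": r"signals",
    "inactivates": r"inactivat(?:es|ion of)",
}

def extract_relations(text):
    """Return (subject, relation, object) triples found within single sentences."""
    gene = "(" + "|".join(re.escape(g) for g in GENES) + ")"
    triples = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for rel, verb in RELATION_PATTERNS.items():
            # Subject gene, then the relation phrase, then the nearest object gene,
            # all without crossing a period.
            pattern = r"\b" + gene + r"\b[^.]*?\b" + verb + r"\b[^.]*?\b" + gene + r"\b"
            for m in re.finditer(pattern, sentence):
                triples.append((m.group(1), rel, m.group(2)))
    return triples
```

Note how crude this is: "E2F1 signals p53-dependent apoptosis" is reduced to the triple (E2F1, signals, p53). Errors of this kind are exactly what the expert-correction step in the figure is meant to catch.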
Overview screenshot: double-click to expand.
Finding the truth: p38 acts as a negative feedback for Ras signaling
Lessons Learned: Biomedical information is precise, but terminology is fluid. SOM performance for medical documents: 80%. Biomedical professionals need help with search and analysis. Biomedical linguistic parsing and ontologies are promising for biomedical text mining. There is a need to integrate biomedical data mining (gene microarray) with text mining (literature). New testbeds completed: p53, AP1, and yeast.
COPLINK Crime Data Mining (10M): Intelligence and security informatics, crime association, crime network analysis and visualization
COPLINK Connect: consolidating and sharing information promotes problem solving and collaboration. Data sources: Records Management Systems (RMS), gang database, mugshots database.
COPLINK Connect Functionality: a generic, common XML-based representation of criminal elements; data migration (batch and incremental) and mapping for all major databases and legacy systems; database independence through an ODBC-compliant data warehouse; a multi-layered web-based architecture (database server, web server, browser); flexible search tools for various reports (e.g., incidents, warrants, pawns); and a graphical browser-based interface for ease of use, training, and maintenance. H. Chen, J. Schroeder, R. V. Hauck, L. Ridgeway, H. Atabakhsh, H. Gupta, C. Boarman, K. Rasmussen, and A. W. Clements, “COPLINK Connect: Information and Knowledge Management for Law Enforcement,” Decision Support Systems, Special Issue on Digital Government, 2003.
COPLINK Detect: consolidated information enables targeted problem solving through investigative criminal association analysis.
COPLINK Detect Functionality: simple association rule mining applied to relationships among criminal elements; a generic, common XML-based representation of criminal relationships; incremental data migration and association analysis on databases; support for multi-attribute queries using partial crime information; and a graphical browser-based interface for simple crime relationship analysis and case retrieval. H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J. Schroeder, “COPLINK: Managing Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.
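Simple association rule mining over criminal elements can be sketched as pairwise rules scored by support and confidence. This assumes each incident is reduced to a set of entity identifiers (the identifiers below are hypothetical), and the thresholds are illustrative; COPLINK Detect's actual weighting scheme is not given in these slides.

```python
from collections import Counter
from itertools import combinations

def association_rules(incidents, min_support=2, min_confidence=0.5):
    """Mine pairwise rules A -> B among entities that co-occur in incidents.

    support    = number of incidents containing both A and B
    confidence = support / number of incidents containing A
    Returns (lhs, rhs, support, confidence) tuples, strongest first.
    """
    single = Counter()
    pair = Counter()
    for entities in incidents:
        items = sorted(set(entities))
        single.update(items)
        pair.update(combinations(items, 2))
    rules = []
    for (a, b), support in pair.items():
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # rules are directional
            confidence = support / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))
```

Directional confidence is useful with partial information: if a vehicle has only ever appeared alongside one person, a query that starts from the vehicle surfaces that person first.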
COPLINK Connect/Detect Status: The systems are stable and have proven useful; they are commercialized and supported by Knowledge Computing Corporation (KCC). Deployed at: TPD, UAPD, PPD, Phoenix, Huntsville (TX), Des Moines (Iowa), Ann Arbor (Michigan), Boston (Massachusetts), Montgomery County (sniper investigation). Under deployment: Salt River (AZ), Cambridge (Massachusetts), Redmond (Washington), and many others. COPLINK has received acclaim in the LA Times, the New York Times, and Newsweek (sniper investigation).
COPLINK Visual Data Mining Research. COPLINK Criminal Network Analysis: association tree, association network analysis, temporal-spatial visualization. P1000: a picture is worth 1000 words. Use visual representations and effective HCI to make crime analysis more efficient and effective. Leverage different representations and algorithms: hyperbolic trees, network placement algorithms, structural analysis, geospatial mapping, time visualization. H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J. Schroeder, “COPLINK: Managing Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.
COPLINK Criminal Structural Analysis (3rd generation). Criminal association identification: using shortest-path algorithms to find the strongest associations between two or more criminals in a network. SNA (social network analysis): using blockmodel analysis to detect subgroups and patterns of interaction between groups; identifying leaders, gatekeepers, and outliers in a criminal network. J. Xu and H. Chen, “Criminal Network Analysis: A Data Mining Perspective,” Decision Support Systems, 2004, forthcoming.
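Finding the strongest association chain between two people can be cast as a shortest-path problem. A minimal sketch, assuming association weights in (0, 1] (for example, normalized co-occurrence in incident reports) and converting each weight w to a cost of -ln(w), so that Dijkstra's shortest path maximizes the product of link strengths; this transformation is an assumption, not necessarily the one used by Xu and Chen. Degree centrality is included as one simple indicator of potential leaders; gatekeeper detection would typically use betweenness centrality, which this sketch omits.

```python
import heapq
import math

def strongest_path(graph, source, target):
    """Dijkstra over costs -ln(weight). `graph` maps person -> {neighbor: weight}.
    Returns (path, overall strength), or (None, 0.0) if no path exists."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d - math.log(w)  # weight in (0, 1] -> non-negative cost
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[target])  # product of link weights

def degree_centrality(graph):
    """Fraction of other network members each person is directly linked to."""
    n = len(graph)
    return {u: len(nbrs) / (n - 1) for u, nbrs in graph.items()} if n > 1 else {}
```

Because costs are non-negative logarithms, Dijkstra's algorithm applies directly. A weak direct link (0.2) loses to a two-hop chain of strong links (0.9 × 0.8 = 0.72), which matches the intuition of the "strongest association".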
COPLINK SNA Experiment. Data sets: TPD incident summaries. Time period: narcotics, 2000 to present; gangs, 1995 to present. Size: two test networks, narcotics (60 individuals) and gang (24 individuals).