Bioinformatics and Machine Learning: Building Probabilistic Models of Gene Expression from Microarray Data. William H. Hsu with Haipeng Guo, Rengakrishnan Subramanian, Ben Perry, and Julie A. Thornton Department of Computing and Information Sciences Kansas State University
Bioinformatics and Machine Learning: Building Probabilistic Models of Gene Expression from Microarray Data
William H. Hsu with Haipeng Guo, Rengakrishnan Subramanian, Ben Perry, and Julie A. Thornton
Department of Computing and Information Sciences, Kansas State University
Laboratory for Knowledge Discovery in Databases
http://www.kddresearch.org/Groups/Bioinformatics
Overview
• Computer Science: What We Do
  • Software: operating systems, programming languages, software engineering, databases
  • Hardware: logic design, organization and architecture
  • Theory of Computation: algorithms, complexity, languages
  • Artificial Intelligence (AI): learning, reasoning, planning, agents
  • Computer Graphics, Geometry, and Vision
  • Computational Science and Engineering (CSE)
• Artificial Intelligence (AI) – Fields of Study
  • Areas: learning, planning, vision, robotics
  • Applications in science, engineering, business, and defense
• Computer Graphics – Some Current Projects and Fun Stuff
  • Computer-Aided Design (CAD) and Engineering (CAE)
  • Information Visualization
  • Computer-Generated Images (CGI) and Animation (CGA)
  • High-Performance Computing: Linux and Beowulf
Information Retrieval (IR) and Text Mining: Commercial Applications
[Figure: ThemeScapes (http://www.cartia.com), SPIRIX software; 6500 news stories from the WWW in 1997]
Genetic Algorithms for Parameter Tuning in Bayesian Network Structure Learning [1]
[Diagram: Genetic Wrapper for Change of Representation and Inductive Bias Control]
• D: Training Data
• Inference Specification
• [1] Genetic Algorithm
• α: Candidate Representation
• [2] Representation Evaluator for Learning Problems
  • Dtrain (Inductive Learning)
  • Dval (Inference)
• f(α): Representation Fitness
• Optimized Representation
Genetic Algorithms for Parameter Tuning in Bayesian Network Structure Learning [2]
[Diagram: [2] Representation Evaluator for Input Specifications]
• α: Candidate Input Specification
• [A] Inductive Learning (Parameter Estimation from Training Data)
  • Dtrain (Model Training)
  • h: Hypothesis
• [B] Validation (Measurement of Inferential Loss)
  • Dval (Model Validation by Inference)
  • Evidence Specification
• f(α): Specification Fitness (Inferential Loss)
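The wrapper loop on the two slides above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that a candidate α is a bit mask over input features, that the learner is a simple majority-vote lookup table standing in for Bayesian network learning, and that f(α) is accuracy on Dval (the slides measure inferential loss instead). The names `make_fitness` and `genetic_wrapper`, the toy data, and all GA settings are hypothetical.

```python
import random
from collections import Counter, defaultdict

def make_fitness(d_train, d_val):
    """Build f(alpha): train on d_train with masked features, score on d_val."""
    def fitness(alpha):
        def project(x):
            return tuple(v for v, keep in zip(x, alpha) if keep)
        # [A] Inductive learning: count labels per projected feature tuple.
        table = defaultdict(Counter)
        for x, y in d_train:
            table[project(x)][y] += 1
        default = Counter(y for _, y in d_train).most_common(1)[0][0]
        # [B] Validation: accuracy on held-out data (assumed stand-in for loss).
        hits = 0
        for x, y in d_val:
            votes = table.get(project(x))
            pred = votes.most_common(1)[0][0] if votes else default
            hits += (pred == y)
        return hits / len(d_val)
    return fitness

def genetic_wrapper(fitness, n_bits, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """[1] Genetic algorithm over bit-mask representations (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_bits))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def pick():
            # Binary tournament selection.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = [best]  # elitism: carry the current best forward
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = tuple(bit ^ (rng.random() < mutation_rate)
                          for bit in child)          # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Toy data (hypothetical): the label equals the first feature.
rows = [((a, b, c), a) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
d_train, d_val = rows[::2], rows[1::2]
fitness = make_fitness(d_train, d_val)
best = genetic_wrapper(fitness, n_bits=3)
print(best, fitness(best))
```

In this toy split, the third feature takes different values in Dtrain and Dval, so a mask that keeps it generalizes poorly while a mask that keeps only the first feature does not. That is the kind of inductive bias control the wrapper is meant to search for.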
Learning Environment
• D: Microarray Data
• [A] Structure Learning
  • G = (V, E): Graph Component of BN (nodes G1, G2, G3, G4, G5)
• [B] Parameter Estimation
  • B = (V, E, Θ): BN with Probabilities (nodes G1, G2, G3, G4, G5)
• Dval (Model Validation by Inference)
• Specification Fitness (Inferential Loss)
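Step [B] can be illustrated with a short sketch that fills in conditional probability tables once a graph is fixed. The slides do not say which estimator is used. This version assumes discretized expression states and relative-frequency estimates with a Laplace pseudocount. The function name `estimate_cpts`, the two-gene graph, and the data are hypothetical.

```python
from collections import Counter
from itertools import product

def estimate_cpts(parents, states, data, pseudocount=1.0):
    """Estimate P(var | parents) for each node from discrete data.

    parents: dict mapping each variable to a list of its parent variables
    states:  dict mapping each variable to its list of possible states
    data:    list of dicts, one per sample, mapping variable -> state
    """
    cpts = {}
    for var, pa in parents.items():
        # Count joint occurrences of (parent configuration, child state).
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        cpt = {}
        # product() over no parents yields one empty configuration: ().
        for pa_vals in product(*(states[p] for p in pa)):
            total = (sum(joint[(pa_vals, s)] for s in states[var])
                     + pseudocount * len(states[var]))
            cpt[pa_vals] = {s: (joint[(pa_vals, s)] + pseudocount) / total
                            for s in states[var]}
        cpts[var] = cpt
    return cpts

# Hypothetical two-gene graph: G1 -> G2, each gene either "low" or "high".
parents = {"G1": [], "G2": ["G1"]}
states = {"G1": ["low", "high"], "G2": ["low", "high"]}
data = [
    {"G1": "high", "G2": "high"},
    {"G1": "high", "G2": "high"},
    {"G1": "low", "G2": "low"},
    {"G1": "low", "G2": "high"},
]
cpts = estimate_cpts(parents, states, data)
print(cpts["G2"][("high",)])
```

The pseudocount keeps every probability nonzero when a parent configuration is rare or unseen, which matters for microarray data sets with few samples per configuration.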
A Gene Network for Yeast [Friedman, Nachman, Linial, Pe’er, 2000]
Components of a Microarray Experiment: Hybridization
• Publication (e.g., PubMed)
• Experiment
• Source (e.g., Taxonomy)
• Gene (e.g., GenBank)
• Sample
• Hybridization
• Array
• Normalization/Discretization
• Data
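The Normalization/Discretization step could look like the sketch below. The slide gives no details, so the choices here are assumptions: expression ratios are log2-transformed, centered on the array median, and mapped to three states using a symmetric threshold. The function name, the state labels, and the threshold value are hypothetical.

```python
import math
import statistics

def normalize_and_discretize(ratios, threshold=1.0):
    """Map expression ratios for one array to 'under', 'baseline', or 'over'."""
    logs = [math.log2(r) for r in ratios]
    center = statistics.median(logs)      # assumed normalization: median-centering
    states = []
    for value in logs:
        shifted = value - center
        if shifted > threshold:
            states.append("over")
        elif shifted < -threshold:
            states.append("under")
        else:
            states.append("baseline")
    return states

# Hypothetical ratios for three genes on one array.
print(normalize_and_discretize([8.0, 2.0, 0.5]))
```

Discrete states like these are what a multinomial Bayesian network needs as input, and the threshold controls how much measured variation is treated as noise.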
Components of a Microarray Experiment: Computational Gene Expression Modeling
• Computational Workflows (e.g., myGrid)
• Experimental Services & Metadata (MAGE-ML XML)
• Gene Expression Model
• Data Preprocessing Specification
• Feature Selection Specification
• Parameter Learning Specification
• Pathway & Network Learning Specification
• Model Analysis Specification
• Discretization Use Case
• Data Mining Use Case
• Validation (e.g., Bootstrap) Use Case
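For the bootstrap validation use case, one common pattern is to relearn the network on resampled data and report how often each edge reappears. The sketch below assumes that pattern; the slide does not describe the procedure. The structure learner is passed in as a callable, and both `bootstrap_edge_confidence` and the stand-in learner are hypothetical.

```python
import random
from collections import Counter

def bootstrap_edge_confidence(data, learn_structure, n_resamples=100, seed=0):
    """Return, for each edge, the fraction of bootstrap resamples in which
    learn_structure(sample) included it."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_resamples):
        # Resample the data set with replacement, keeping its size.
        sample = [rng.choice(data) for _ in data]
        counts.update(set(learn_structure(sample)))
    return {edge: c / n_resamples for edge, c in counts.items()}

# Stand-in learner (hypothetical): always proposes the edge G1 -> G2.
def constant_learner(sample):
    return [("G1", "G2")]

confidence = bootstrap_edge_confidence(list(range(10)), constant_learner)
print(confidence)
```

Edges that appear in most resamples are more likely to reflect structure in the data than artifacts of one particular sample.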
DESCRIBER: An Experimental Intelligent Filter
• Example Queries:
  • What experiments have found cell cycle-regulated metabolic pathways in Saccharomyces?
  • What codes and microarray data were used, and why?
[Diagram labels]
• Users of Scientific Document Repository
• Personalized Interface
• New Queries
• Historical Use Case & Query Data
• DESCRIBER Learning and Inference Components
• Domain-Specific Collaborative Filtering
• Decision Support Models
• Interface(s) to Distributed Repository
• Domain-Specific Repositories
• Experimental Data
• Source Codes and Specifications
• Data Models
• Ontologies
• Models
DESCRIBER Overview
• Module 1: Intelligent Collaborative Filtering Front-End
• Module 2: Learning & Validation of Bayesian Network Models for Use Cases
• Module 3: Graphical Models of Use Cases
• Module 4: Learning & Validation of Bayesian Network Models for MAGE Data & Codes
• Module 5: MAGE Data Model
[Other diagram labels]
• User
• Personalized Interface
• New Queries
• Historical Use Case & Query Data
• Data
• Estimation of Constraint Parameters
• Relational Models of MAGE Data
• Constrained Models of Use Cases
DESCRIBER Collaborative Filtering Module
• Module 1: Intelligent Collaborative Filtering Front-End
[Diagram labels]
• Personalized Interface
• New Query from User
• Response to User
• Relational Models of (Domain-Specific) Data
• Constrained Models of Use Cases
• Integrated Reasoning Component: XML Validator and Constraint Checker
• Relational Probabilistic Model
• Constraint Selector
• Constraints on Repository Content