
Overview of Biomedical Informatics

Overview of Biomedical Informatics. Vipin Kumar University of Minnesota kumar@cs.umn.edu www.cs.umn.edu/~kumar Team Members: Michael Steinbach, Rohit Gupta, Gowtham Atluri, Gang Fang, Gaurav Pandey, Sanjoy Dey, Vanja Paunic



Presentation Transcript


  1. Overview of Biomedical Informatics
     Vipin Kumar, University of Minnesota
     kumar@cs.umn.edu, www.cs.umn.edu/~kumar
     Team Members: Michael Steinbach, Rohit Gupta, Gowtham Atluri, Gang Fang, Gaurav Pandey, Sanjoy Dey, Vanja Paunic
     Collaborators: Brian Van Ness, Bill Oetting, Gary L. Nelsestuen, Christine Wendt, Piet C. de Groen, Michael Wilson
     Research supported by NSF, IBM, BICB-UMR, Pfizer
     Nov 12th, 2009, Understanding Biotechnology – The Science of the ‘Omics’

  2. Biomedical Informatics
     • Recent technological advances are helping to generate large amounts of biomedical data
       • Data from high-throughput experimental techniques
         • Gene expression data
         • Biological networks
         • Proteomics and metabolomics data
         • Single Nucleotide Polymorphism (SNP) data
       • Electronic Medical Records
         • The IBM-Mayo Clinic partnership has created a database of 5 million patients
     • Great potential benefits from the analysis of these large-scale data sets:
       • Automated analysis of patient histories for customized treatment
       • Discovery of biomarkers for complex diseases and other phenotypes
       • Cheminformatics and drug discovery

  3. Large-scale Data is Everywhere!
     • There has been enormous data growth in both commercial and scientific databases due to advances in data generation and collection technologies.
     • New mantra: gather whatever data you can, whenever and wherever possible.
     • Expectations: gathered data will have value either for the purpose collected or for a purpose not envisioned.
     • [Figure labels: Homeland Security, Business Data, Geo-spatial Data, Computational Simulations, Sensor Networks, Scientific Data]

  4. Data Mining
     • Automated techniques for analyzing large data sets.
     • Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
     • [Figure labels: Clustering, Predictive Modeling, Anomaly Detection, Association Rules]

  5. Predictive Modeling: Classification
     • Find a model for the class attribute as a function of the values of the other attributes.
     • [Figure: model for predicting credit worthiness; label: Class]
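To make "a model for the class attribute as a function of the other attributes" concrete, here is a minimal sketch of a one-rule classifier on a credit-worthiness toy problem. The records, the attribute names, and the choice of the one-rule heuristic are all assumptions made for illustration; the slide does not say which classifier its figure uses.

```python
from collections import Counter

# Hypothetical training records: (employed, income, class label).
# The data are invented for illustration and do not come from the slides.
ATTRIBUTES = ("employed", "income")
records = [
    ("yes", "high", "good"),
    ("yes", "medium", "good"),
    ("yes", "low", "bad"),
    ("no", "high", "good"),
    ("no", "medium", "good"),
    ("no", "low", "bad"),
    ("no", "low", "bad"),
    ("no", "medium", "bad"),
]


def fit_one_rule(rows):
    """Choose the single attribute whose value -> majority-class table
    misclassifies the fewest training rows.

    Returns (attribute index, value -> class mapping, training errors).
    """
    best = None
    for idx in range(len(ATTRIBUTES)):
        counts_by_value = {}
        for row in rows:
            counts_by_value.setdefault(row[idx], Counter())[row[-1]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts_by_value.items()}
        errors = sum(rule[row[idx]] != row[-1] for row in rows)
        if best is None or errors < best[2]:
            best = (idx, rule, errors)
    return best


def predict(model, row):
    """Predict the class of a record holding only the attribute values."""
    idx, rule, _ = model
    return rule[row[idx]]


model = fit_one_rule(records)
print("attribute used:", ATTRIBUTES[model[0]])
print("prediction for (no, high):", predict(model, ("no", "high")))
```

On this toy data the income attribute separates the classes better than employment status, so the learned model predicts the class from income alone. Real classifiers combine many attributes, but the input and output have the same shape.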

  6. Discovering biomarkers
     • Gene Expression Data
       • Given: n labeled subjects, each with expression levels of p genes
       • Objective: build a predictive model to identify cancer subtypes
       • [Figure: classical study of cancer subtypes, Golub et al. (1999), identification of diagnostic genes; axis label: Genes]
     • SNP Data
       • Given: n labeled subjects, each with genotypes of p SNPs
       • Objective: build a model using genotypes to predict labels
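One simple way to look for candidate diagnostic genes in an n-by-p expression matrix is to score each gene by how well it separates the two classes and then rank the genes. The sketch below uses a signal-to-noise style score (difference of class means divided by the sum of class standard deviations). The expression values, gene names, and class labels are invented, and the score is an illustrative choice rather than the procedure used in the slides.

```python
from statistics import mean, stdev

# Hypothetical expression matrix: one row per subject, one column per gene.
gene_names = ["g1", "g2", "g3"]
expression = [
    [5.1, 2.0, 7.9],
    [4.9, 2.2, 8.1],
    [5.0, 1.9, 8.0],
    [5.2, 6.1, 7.8],
    [4.8, 5.9, 8.2],
    [5.0, 6.0, 8.0],
]
labels = ["A", "A", "A", "B", "B", "B"]  # hypothetical subtype labels


def signal_to_noise(values_a, values_b):
    """Difference of class means scaled by the sum of class standard deviations.

    Assumes each class has at least two subjects and nonzero spread.
    """
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def rank_genes(matrix, subject_labels, names, class_a, class_b):
    """Return (gene, score) pairs sorted by decreasing absolute score."""
    scores = []
    for j, name in enumerate(names):
        a = [row[j] for row, y in zip(matrix, subject_labels) if y == class_a]
        b = [row[j] for row, y in zip(matrix, subject_labels) if y == class_b]
        scores.append((name, signal_to_noise(a, b)))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


ranked = rank_genes(expression, labels, gene_names, "A", "B")
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

In this toy matrix only g2 differs between the classes, so it ranks first. With thousands of genes and few subjects, such rankings need multiple-testing correction and validation on held-out subjects before any gene is treated as a biomarker.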

  7. Predicting short-term vs. long-term survivors among myeloma subjects
     • 3404 SNPs (selected according to potential relevance to myeloma)
     • Cases: 70 patients who survived less than 1 year
     • Controls: 73 patients who survived longer than 3 years
     • [Figure: SNP genotypes for cases and controls]
     Brian Van Ness et al., Genomic Variation in Myeloma: Design, content and initial application of the Bank On A Cure SNP Panel to detect associations with progression free survival, BMC Medicine, Volume 6, pp 26, 2008.

  8. Clustering
     • Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
     • Applications:
       • Finding groups of similar genes or proteins based upon their expression profiles
       • Clustering of patients based on phenotypic and genotypic factors for efficient disease diagnosis
       • Market Segmentation
       • Document Clustering
     • [Figure courtesy of Michael Eisen; Michael Eisen et al., 1999]
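As a minimal sketch of grouping genes by expression profile, the code below runs a small k-means loop over four invented profiles. The gene names, the values, the choice of k-means, and the fixed starting centroids are all assumptions for illustration; the slide does not name a clustering algorithm.

```python
import math

# Hypothetical expression profiles: gene -> expression level in four conditions.
profiles = {
    "geneA": [1.0, 1.2, 0.9, 1.1],
    "geneB": [1.1, 1.0, 1.0, 1.2],
    "geneC": [5.0, 5.2, 4.9, 5.1],
    "geneD": [4.8, 5.1, 5.0, 5.3],
}


def kmeans(points, init_centroids, iterations=20):
    """Plain k-means with caller-supplied starting centroids (deterministic)."""
    centroids = [list(c) for c in init_centroids]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


names = list(profiles)
points = [profiles[n] for n in names]
assignment, centroids = kmeans(points, [profiles["geneA"], profiles["geneC"]])
clusters = {}
for name, cluster_id in zip(names, assignment):
    clusters.setdefault(cluster_id, []).append(name)
print(clusters)
```

Here the low-expression genes and the high-expression genes end up in separate groups. On real data the result depends on the distance measure, the number of clusters, and the initialization, so those choices deserve as much attention as the algorithm itself.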

  9. Association Pattern Discovery
     • Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on occurrences of other items.
     • Rules discovered (market-basket example): {Milk} --> {Coke}, {Diaper, Milk} --> {Beer}
     • Biological applications:
       • Identifying functional modules in protein interaction networks
       • Identifying transcription modules in gene expression data
       • Identifying biological entities associated with disease phenotypes
       • Biomarker discovery from genomic data, e.g. gene expression, single-nucleotide polymorphism (SNP), and metabolite data
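Rules such as {Milk} --> {Coke} are usually judged by their support (the fraction of records containing every item in the rule) and their confidence (the fraction of records containing the antecedent that also contain the consequent). The sketch below computes both for the two rules above. The transaction table is a hypothetical market-basket example and is not taken from the slide.

```python
# Hypothetical market-basket transactions, one set of items per record.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset, rows):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= row for row in rows) / len(rows)


def confidence(antecedent, consequent, rows):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(both, rows) / support(antecedent, rows)


for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(
        sorted(lhs), "-->", sorted(rhs),
        f"support={support(lhs | rhs, transactions):.2f}",
        f"confidence={confidence(lhs, rhs, transactions):.2f}",
    )
```

In a biological setting the records might be samples and the items might be genes that are over-expressed or SNP genotypes that are present, with the same two measures applied unchanged.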

  10. Discovery of Discriminative Patterns from Lung Cancer Gene Expression Data
     • 67 normal samples, 102 cancer patients, 8787 genes [Stearman et al. 2005], [Su et al. 2007], [Bhattacharjee et al. 2001]
     • [Figure: visualization of a size-10 pattern found using a new discriminative pattern finding technique]
     • The pattern is enriched with the TNF/NFkB signaling pathway, which is well known to be related to lung cancer
     • P-value: 1.4 × 10^-5 (6/10 overlap with the pathway)
     Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and A General Approach, In the Proceedings of the 15th Pacific Symposium on Biocomputing (PSB), pp. 145-156, 2010.
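An enrichment p-value of this kind is often computed with a hypergeometric test: if 10 genes were drawn at random from all measured genes, how likely is an overlap of 6 or more with the pathway? The sketch below shows that calculation. The slide does not say which test produced its p-value, and the pathway size used here is a placeholder assumption, so the printed number is not expected to match the reported 1.4 × 10^-5.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` genes without replacement from
    `population` genes, of which `successes` belong to the pathway."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favorable / total


N_GENES = 8787      # number of genes, as stated on the slide
PATTERN_SIZE = 10   # size of the discovered pattern, as stated on the slide
OVERLAP = 6         # pattern genes in the pathway, as stated on the slide
PATHWAY_SIZE = 200  # hypothetical: the slide does not give the pathway size

p_value = hypergeom_tail(N_GENES, PATHWAY_SIZE, PATTERN_SIZE, OVERLAP)
print(f"enrichment p-value under these assumptions: {p_value:.3g}")
```

The result is sensitive to the pathway size and to the background gene set, which is why both should be reported alongside any enrichment p-value.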

  11. Discriminative Metabolite Patterns from Liver Cirrhosis Data
     • 41 alcoholic liver cirrhosis cases (rows 1-41), 19 controls (rows 42-60), 3610 metabolites
     • Data from Gary Nelsestuen et al.
     • [Figure: a sample group of five metabolites with very similar (in relative terms) intensity values in cases, but mostly absent in controls. (a) The rank values (black is 10, white is 0); (b) the original intensity values.]
     Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers and Vipin Kumar, An Association Analysis Approach to Biclustering, Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 677-686, 2009.

  12. Summary
     • Data mining techniques hold great promise for data-driven hypothesis generation in the biomedical domain.
     • Ample scope exists for the development and application of novel techniques for the analysis of different types of biomedical data.

  13. For further information…
     • Visit www.cs.umn.edu/~kumar/dmbio.
     • Send email to kumar@cs.umn.edu.
     Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2005.
