
Data Management and Mining in BioArray Informatics

Prof. Yike Guo, Dept. of Computing, Imperial College, London. Goal: understand the basic bioarray technology, including microarray technology for gene expression, protein chips, NMR spectroscopy and other high-throughput devices.


Presentation Transcript


  1. Data Management and Mining in BioArray Informatics. Prof. Yike Guo, Dept. of Computing, Imperial College, London

  2. Goal: • Understand the basic bioarray technology, including microarray technology for gene expression, protein chips, NMR spectroscopy and other high-throughput devices • Learn the basic analytical techniques and how they apply to bioarray data • Learn the workflow for processing and analysing bioarray data (e.g. gene expression analysis)

  3. Lecture Overview • Lecture One: BioArray Informatics Introduction • Lecture Two: BioArray Technology • Lecture Three: Analysis Technology (1), Data Normalisation and Transformation • Lecture Four: Analysis Technology (2), Clustering and Classification • Lecture Five: Analysis Technology (3), Multivariate Statistics • Lecture Six: Analysis Applications (1), Gene Expression Analysis • Lecture Seven: Analysis Applications (2), Integrative Analysis of BioArray Data

  4. BioArray Informatics: Integrative Analysis of BioArray Data within the Biological Context [Figure: bioarray data placed in the context of sequences and alignments, secondary and tertiary structure, polymorphism, linkage, cytogenetic and physical maps, receptors, signals and pathways, expression patterns, physiology, patient records and epidemiology.]

  5. “-Omics World” and “Real World” [Figure: in the “-omics world”, gene, protein and metabolic profiles are plotted against time and feed a functional -omics analysis; in the “real world”, inputs (a noxious agent or stressor) lead to outputs (biological end-points: pathology, altered physiology and metabolism).]

  6. Dynamics in BioArray Informatics [Figure: DNA, RNA, protein and metabolites linked by interactions and expression, together with environment and growth rate.]

  7. A Mathematical Model [Figure: forward-propagated correlations from an event through mRNA, protein and metabolite levels over time.]

  8. BioArray Provides the Means for Revealing Interaction Relations [Figure: a network of genes, proteins, receptors and ligands whose edges are numbered by relation type] Relations: (1) a gene is homologous to another gene; (2) a gene encodes a protein; (3) a protein can regulate the expression of a gene; (4) a protein phosphorylates another protein; (5) a protein binds to another protein; (6) a protein lyses another protein; (7) proteins can sometimes be receptors; (8) receptors bind a ligand; (9) receptors (if bound) activate other proteins
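The nine relation types on this slide map naturally onto a typed edge list, which is one simple way to store interaction relations revealed by bioarray experiments. A minimal sketch in Python, assuming a plain in-memory store; the class names, enum member names and the entity names in the usage lines are hypothetical illustrations, not part of the lecture.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # The nine relation types enumerated on the slide (member names are hypothetical).
    HOMOLOG = 1         # gene is homologous to another gene
    ENCODES = 2         # gene encodes a protein
    REGULATES = 3       # protein regulates the expression of a gene
    PHOSPHORYLATES = 4  # protein phosphorylates another protein
    BINDS = 5           # protein binds to another protein
    LYSES = 6           # protein lyses another protein
    IS_RECEPTOR = 7     # protein acts as a receptor
    BINDS_LIGAND = 8    # receptor binds a ligand
    ACTIVATES = 9       # bound receptor activates another protein


@dataclass(frozen=True)
class Edge:
    source: str
    relation: Relation
    target: str


class InteractionGraph:
    """Minimal typed edge store for gene/protein/receptor/ligand relations."""

    def __init__(self) -> None:
        self.edges: list[Edge] = []

    def add(self, source: str, relation: Relation, target: str) -> None:
        self.edges.append(Edge(source, relation, target))

    def targets(self, source: str, relation: Relation) -> list[str]:
        """All entities reached from `source` by edges of one relation type."""
        return [e.target for e in self.edges
                if e.source == source and e.relation is relation]


# Usage with hypothetical entities:
graph = InteractionGraph()
graph.add("geneA", Relation.ENCODES, "proteinA")
graph.add("proteinA", Relation.REGULATES, "geneB")
print(graph.targets("proteinA", Relation.REGULATES))  # ['geneB']
```

A list of edges keeps the sketch short; a real warehouse would index edges by source and relation.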

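  9. BioArray: Quantitative Measurement of Biological Concepts • Microarrays¹: ~1000 bp hybridization of experiment and control samples against each ORF; readouts are R/G ratios, R and G values, and quality indicators • Affymetrix²: 25-mer probes (25-bp hybridization) per ORF, arranged as perfect-match (PM) and mismatch (MM) pairs; readouts are averaged PM-MM differences, “presence” calls and feature statistics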
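The two platforms on this slide yield different primary numbers: a red/green ratio per spot for two-colour microarrays, and an averaged PM-MM difference per probe set for Affymetrix. A minimal sketch of both, assuming red is the experiment channel and green the control channel, and using a plain mean over probe pairs; Affymetrix's own summarisation algorithms are more elaborate (e.g. outlier handling), so this is a simplification, and the function names are hypothetical.

```python
import math
from statistics import mean


def log2_ratio(red: float, green: float) -> float:
    """Log2 of the red/green intensity ratio for one spot (assumes red = experiment, green = control)."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)


def average_difference(pm: list[float], mm: list[float]) -> float:
    """Plain mean of PM - MM over the probe pairs of one probe set (simplified summary)."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal, non-empty PM and MM lists")
    return mean(p - m for p, m in zip(pm, mm))


print(log2_ratio(400, 100))                         # 2.0 (four-fold up)
print(average_difference([300, 500], [100, 100]))   # 300
```

Working on the log2 scale makes up- and down-regulation symmetric: a four-fold decrease gives -2.0.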

  10. Quantitative Analysis • Reproducibility • Confidence intervals to find significant deviations
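One way to use replicate measurements and confidence intervals to find significant deviations is to compute an interval for the mean log2 ratio of each gene and flag the gene when the interval excludes 0 (no change). A minimal sketch, assuming a normal approximation for the interval; with few replicates a Student t quantile gives wider and more appropriate intervals. Function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(values: list[float], level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for the mean of replicate measurements."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    z = NormalDist().inv_cdf(0.5 + level / 2)           # ~1.96 for level = 0.95
    half_width = z * stdev(values) / len(values) ** 0.5  # z * standard error
    m = mean(values)
    return m - half_width, m + half_width


def significant_deviation(values: list[float], baseline: float = 0.0,
                          level: float = 0.95) -> bool:
    """True if the interval excludes the baseline (0.0 = no change on the log2 scale)."""
    low, high = confidence_interval(values, level)
    return not (low <= baseline <= high)


print(significant_deviation([1.0, 1.1, 0.9, 1.0]))   # True: consistently about two-fold up
print(significant_deviation([0.5, -0.5, 0.4, -0.4])) # False: replicates disagree
```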

  11. BioArray Informatics: BioArray is the data; everything else is informatics • Data Engineering • Data Warehousing • Data Integration • Data Analysis • Knowledge Discovery • Discovery Integration • Discovery Validation • Knowledge Integration • Knowledge Warehousing

  12. Data Warehousing [Figure: operational data sources (sample & clinical data, bioarray data) and external data sources (KEGG, UniGene, GenBank) feed a data warehouse comprising an experimental/sample database, an expression database, function annotations and structure annotations.]

  13. Example: ArrayExpress

  14. Data Schema in Warehousing: A Gene Expression Example [Figure: a gene expression warehouse linking entities such as Affy fragment, known gene, sequence, sequence cluster, protein, enzyme, pathway, metabolite, SNP and disease, populated from sources including ExPASy SwissProt, ExPASy Enzyme, PDB, LocusLink, MGD, SPAD, NCBI dbSNP, UniGene, OMIM, GenBank, KEGG and NMR data.]
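A small subset of such a warehouse schema can be sketched relationally: genes, Affymetrix probe sets (fragments) mapped to genes, pathways, a gene-pathway link table, and expression values per sample. A minimal sketch using Python's built-in sqlite3, assuming only five of the slide's entities; all table and column names are hypothetical, and a real warehouse would carry many more entities and source identifiers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gene (
    gene_id TEXT PRIMARY KEY,      -- e.g. a LocusLink or UniGene identifier
    symbol  TEXT
);
CREATE TABLE affy_fragment (
    probe_set_id TEXT PRIMARY KEY,
    gene_id      TEXT REFERENCES gene(gene_id)
);
CREATE TABLE pathway (
    pathway_id TEXT PRIMARY KEY,   -- e.g. a KEGG pathway identifier
    name       TEXT
);
CREATE TABLE gene_pathway (
    gene_id    TEXT REFERENCES gene(gene_id),
    pathway_id TEXT REFERENCES pathway(pathway_id),
    PRIMARY KEY (gene_id, pathway_id)
);
CREATE TABLE expression (
    probe_set_id TEXT REFERENCES affy_fragment(probe_set_id),
    sample_id    TEXT,
    avg_diff     REAL,             -- average difference value
    PRIMARY KEY (probe_set_id, sample_id)
);
"""


def build_warehouse() -> sqlite3.Connection:
    """Create an empty in-memory warehouse with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def pathway_expression(conn: sqlite3.Connection, pathway_name: str,
                       sample_id: str) -> list[tuple[str, float]]:
    """Expression values in one sample for every probe set of the genes in a named pathway."""
    sql = """
        SELECT g.symbol, e.avg_diff
        FROM pathway p
        JOIN gene_pathway gp ON gp.pathway_id = p.pathway_id
        JOIN gene g          ON g.gene_id = gp.gene_id
        JOIN affy_fragment a ON a.gene_id = g.gene_id
        JOIN expression e    ON e.probe_set_id = a.probe_set_id
        WHERE p.name = ? AND e.sample_id = ?
        ORDER BY g.symbol
    """
    return conn.execute(sql, (pathway_name, sample_id)).fetchall()
```

A query like `pathway_expression` is the kind of cross-source join ("which genes in this pathway changed?") that motivates putting expression data and annotations in one warehouse.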

  15. GXDW: A Gene Expression Database Workflow • Data in → warehousing → queries (comparisons between 2 samples; comparisons between multiple samples) → data reduction (set a fold change, e.g. > 2X; set a higher average difference value, e.g. > 200; set A→P / P→A stringency, e.g. 80%) → user-defined analysis dataset → output (profile report, visualisation, advanced gene expression analysis)
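The data-reduction step applies three thresholds before a user-defined analysis dataset is built. A minimal sketch of one reading of those thresholds, with defaults taken from the slide's examples: fold change must exceed 2, the higher of the two average difference values must exceed 200, and at least 80% of replicate comparisons must flip their absent/present call (A→P or P→A). This interpretation of "stringency", the record layout and all names are assumptions, not the GXDW implementation.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Hypothetical record: one probe set compared between two samples."""
    probe_set_id: str
    baseline: float    # average difference value in the baseline sample
    experiment: float  # average difference value in the experimental sample
    calls: list[tuple[str, str]]  # (baseline, experiment) calls, 'A' or 'P', per replicate pair


def passes_filters(c: Comparison, min_fold: float = 2.0,
                   min_avg_diff: float = 200.0, min_call_change: float = 0.8) -> bool:
    """Apply the three data-reduction thresholds (as interpreted here)."""
    low, high = sorted((c.baseline, c.experiment))
    # Non-positive average differences make a ratio meaningless, so drop them;
    # the higher value must also exceed the average difference threshold.
    if low <= 0 or high <= min_avg_diff:
        return False
    # Fold change must exceed min_fold in either direction.
    if high / low <= min_fold:
        return False
    # Stringency: fraction of replicate pairs whose call flips A->P or P->A.
    if not c.calls:
        return False
    flips = sum(1 for before, after in c.calls if {before, after} == {"A", "P"})
    return flips / len(c.calls) >= min_call_change


def reduce_dataset(comparisons: list[Comparison], **thresholds: float) -> list[str]:
    """Return the probe set IDs that survive all filters."""
    return [c.probe_set_id for c in comparisons if passes_filters(c, **thresholds)]
```

Keeping the thresholds as keyword arguments mirrors the slide's "set ... (e.g. ...)" wording: the user chooses the values, and the slide's examples are only defaults.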

  16. Queries, Queries... • Queries to the data • Which genes are linked? • Which genes are expressed similarly to my gene XYZ? • Which genes are co-expressed under differing conditions? • Classification (of tumors, diseased tissues, etc.): which patterns are characteristic of a certain class of samples, and which genes are involved? • Functional classification of genes: are changes clustered in particular classes? • Metabolic pathway information: is a certain pathway, or a route within a pathway, affected? • Disease information and clinical follow-up: how do they correlate with expression patterns? • Phenotype information for mutants: are there correlations between particular phenotypes and expression patterns?

  17. Gene Expression Data Analysis Workflow: data in → interactive analysis → knowledge deliverables • Analysis procedures (examples, not exhaustive): cluster by genes; study outliers; correlate clinical measurements; literature analysis; time course analysis • Knowledge deliverables: defined subsets of genes; classic drug targets; known disease associations; cross-species indices

  18. (Un)fortunately, scientists never think linearly • Why are these genes co-expressed? • What do their protein products do? • What regulatory motifs do the genes in a co-expressed set share? • Can we patent them? • Do we know which metabolic pathway they are in? If not, can I synthesise one? • Are there high-throughput screening (HTS) results for any proteins in the pathway? • Are there any compounds in the HTS library that hit those proteins selectively and consistently? • Which ones have good activity and availability, and acceptable toxicity?

  19. Advanced Analysis • Discovery annotation and validation: e.g. annotating a set of co-expressed genes with conserved regulatory motifs; scoring a co-expression pattern against pathways; literature analysis to annotate biological semantics • Integrative analysis: e.g. multi-modality analysis; cross-annotation of discovered patterns • Modelling and simulation: e.g. pathway synthesis; virtual cell modelling
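Annotating a set of co-expressed genes with a conserved regulatory motif can start with something as simple as counting motif matches in each gene's upstream sequence. A minimal sketch, assuming the motif is given as an IUPAC consensus string and that upstream sequences are already extracted; both strands are searched and overlapping matches are counted. Real motif analysis usually uses position weight matrices and background models; the function names here are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(consensus: str) -> re.Pattern:
    """Compile a consensus into a zero-width lookahead so overlapping matches are all found."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?={body})")


def annotate_with_motif(upstream: dict[str, str], consensus: str) -> dict[str, int]:
    """Count motif matches on both strands of each gene's upstream sequence.

    Note: a palindromic motif is counted once per strand at the same site.
    """
    pattern = motif_regex(consensus)
    counts = {}
    for gene, seq in upstream.items():
        seq = seq.upper()
        reverse_complement = seq.translate(COMPLEMENT)[::-1]
        counts[gene] = len(pattern.findall(seq)) + len(pattern.findall(reverse_complement))
    return counts


upstream = {"g1": "GGTATAAAGG", "g2": "CCCCCC", "g3": "TTTATAGG"}
print(annotate_with_motif(upstream, "TATAWA"))  # {'g1': 1, 'g2': 0, 'g3': 1}
```

Here g3 matches only on the reverse strand, which is why both strands are scanned.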

  20. Pathway Scoring [Figure: example pathway P1]

  21. Our Approach: Analysis of Gene Expression Data with Pathway Scores, GPE-Score(Pathway)
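The slides do not give the formula behind the authors' GPE-Score, so it is not reproduced here. As a stand-in that illustrates the general idea of scoring pathways against a set of selected (e.g. co-expressed or changed) genes, here is a different and widely used technique: a hypergeometric over-representation test, where a small p-value means the pathway contains more selected genes than chance would predict. Function names and inputs are hypothetical.

```python
from math import comb


def hypergeometric_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes on the array, K: array genes in the pathway,
    n: genes in the selected set, k: selected genes that are in the pathway.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def score_pathways(selected: set[str], universe: set[str],
                   pathways: dict[str, set[str]]) -> list[tuple[str, int, float]]:
    """Return (pathway, overlap, p-value) tuples, most significant first."""
    selected = selected & universe
    results = []
    for name, members in pathways.items():
        members = members & universe  # only count pathway genes present on the array
        k = len(selected & members)
        p = hypergeometric_pvalue(k, len(selected), len(members), len(universe))
        results.append((name, k, p))
    results.sort(key=lambda r: r[2])
    return results
```

For example, with 10 genes on the array, a 2-gene pathway and 2 selected genes that are both in it, the p-value is 1/45 (about 0.022). Unlike a score built from expression values themselves, this test only uses set membership, which is one reason pathway-specific scores are developed.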
