Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006
http://www.esat.kuleuven.ac.be/~kmarchal/ • Course material: course notes + PowerPoint files • Exercises
Overview MICROARRAY PREPROCESSING • Gene expression • Omics era • Transcript profiling • Experiment design • Preprocessing • Exercises
Gene expression: DNA → (transcription) → mRNA → (translation) → protein
Gene expression: adaptation of a cell to its environment [Figure: bacterial cell receiving Signal 1 and Signal 2; an FNR box upstream of the cytN, cytO, cytQ and cytP genes] • Adaptation of a cell: • response to environmental signals • response to, e.g., hormones (cell differentiation) • The cellular response is determined by the genes that are switched on upon a signal
Gene expression: the action of genetic networks underlies the observed phenotypic behavior
Overview MICROARRAY PREPROCESSING • Gene expression • Omics era • Transcript profiling • Experiment design • Preprocessing • Exercises
Structural genomics • Comparative genomics • Functional genomics
Omics era: traditional molecular biology • Directed toward understanding the role of a particular gene or protein in a molecular biological process • Northern analysis • Mutational analysis • Expression by reporter fusions Omics era • Measurement of the expression of thousands of genes or proteins simultaneously • The function or expression of a gene is studied in the global context of the cell • Holistic approaches allow a better understanding of fundamental molecular biological processes, because a gene does not act on its own: it is always embedded in a larger network (systems biology)
Omics era: transcriptomics [Figure: RNA from a reference sample and a test sample → reference and test cDNA → detection]
Omics era proteomics
Omics era metabolomics
Omics era: consider the cell as a system (SYSTEMS BIOLOGY)
Omics era: SYSTEMS BIOLOGY • From high-throughput data to mechanistic insight into the biological system at the molecular level
Omics era • Analysis of such large-scale data is no longer trivial => computational challenges: • low signal-to-noise ratio • high dimensionality • Simple spreadsheet analysis (e.g. Excel) is no longer sufficient • More advanced data-mining procedures become necessary • Another urgent problem is how to store and organize all this information => Bioinformatics
Overview MICROARRAY PREPROCESSING • Gene expression • Omics era • Transcript profiling • Principle of microarray • Applications • Experiment design • Preprocessing • Exercises
Transcript profiling: transcriptomics [Figure: RNA from a reference sample and a test sample → reference and test cDNA → detection]
Transcript profiling • Previously: measure the expression level of one gene at a time (Northern blot analysis) • Novel techniques: measure the expression level of all genes simultaneously => EXPRESSION PROFILING • Principle: hybridisation (complementary strands stick together) • mRNA: 5'-UGACCUGACG-3' • cDNA: 3'-ACTGGACTGC-5'
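The base pairing behind hybridisation can be checked mechanically. The sketch below is illustrative only: the helper names `complementary_cdna` and `hybridizes` are hypothetical, and it models perfect Watson-Crick pairing without mismatches. It reproduces the mRNA/cDNA pair from the slide.

```python
# Watson-Crick pairing: each RNA base pairs with one DNA base (U pairs with A).
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def complementary_cdna(mrna: str) -> str:
    """Return the cDNA strand that pairs with a 5'->3' mRNA, written 3'->5'
    so that each cDNA base sits directly under its mRNA partner."""
    return "".join(RNA_TO_DNA_PARTNER[base] for base in mrna.upper())


def hybridizes(mrna: str, cdna_3to5: str) -> bool:
    """True if the cDNA (given 3'->5') is perfectly complementary to the mRNA."""
    return complementary_cdna(mrna) == cdna_3to5.upper()


if __name__ == "__main__":
    mrna = "UGACCUGACG"
    print("mRNA 5'-" + mrna + "-3'")
    print("cDNA 3'-" + complementary_cdna(mrna) + "-5'")  # cDNA 3'-ACTGGACTGC-5'
```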
Transcript profiling • Monitor molecular activities on a global level: • protein levels (proteomics) • enzyme activities • metabolites • gene expression (mRNA): transcriptomics = transcript profiling • This gives a general insight into global cell behavior (holistic) Molecular biological methods: • RT-PCR • SAGE • Protein arrays • Microarray analysis
Transcript profiling: cDNA array, an upscaled Northern hybridisation [Figure: gene (DNA) → transcript (mRNA) → cDNA; cDNA spotted on a glass slide]
Transcript profiling • Preparation of probes: • collect cDNA clones • amplify the target cDNA insert by PCR • check yield and specificity by electrophoresis • spot the PCR products on glass slides
Transcript profiling [Figure: RNA from a reference sample and a test sample → reference and test cDNA → detection]
Transcript profiling: from two signals (Signal 1, Signal 2) to a numerical value • 1. Cell culture • 2. mRNA isolation • 3. Labeling • 4. Hybridization + washing • 5. Scanning • 6. Image analysis → numerical value
Transcript profiling http://www.bio.davidson.edu/courses/genomics/chip/chip.html
Transcript profiling: superimposed color image • Transform the scanned R and G channels into color images • Superimpose the color images of the R and G channels [Figure panels: good alignment, bad alignment]
Transcript profiling: superimposed color image • black: the gene was expressed in neither the test nor the control sample • green: the gene was expressed only in the control sample • red: the gene was expressed only in the test sample • yellow: the gene was expressed in both the test and the control sample
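The color rules above amount to a small decision table on the two channel intensities. A minimal sketch, assuming red carries the test sample and green the control (as the rules imply) and using a hypothetical fixed `threshold` to decide whether a channel counts as "expressed"; real image analysis uses calibrated intensities rather than one cut-off.

```python
def spot_color(red: float, green: float, threshold: float = 100.0) -> str:
    """Classify a spot in the superimposed image.

    red       -- test-sample (red channel) intensity
    green     -- control-sample (green channel) intensity
    threshold -- hypothetical cut-off above which a channel counts as 'expressed'
    """
    in_test = red > threshold
    in_control = green > threshold
    if in_test and in_control:
        return "yellow"  # expressed in both samples
    if in_test:
        return "red"     # expressed only in the test sample
    if in_control:
        return "green"   # expressed only in the control sample
    return "black"       # expressed in neither sample


for r, g in [(20, 15), (30, 900), (850, 40), (800, 760)]:
    print(r, g, spot_color(r, g))
```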
Transcript profiling • Signal intensity is proportional to the amount of cDNA present in the sample • Cy3 signal → numerical value • Cy5 signal → numerical value • Image analysis → Data analysis
Transcript profiling: data representation • Gene profile • Experiment profile
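A gene profile and an experiment profile can be read as the rows and columns of one expression matrix. This sketch assumes the usual layout of one row per gene and one column per experiment; the gene names, experiment labels and values are made up for illustration.

```python
# Hypothetical expression matrix: one row per gene, one column per experiment.
# Values are made-up log ratios used only for illustration.
experiments = ["t0", "t10", "t20", "t30"]
genes = ["geneA", "geneB", "geneC"]
matrix = [
    [0.1, 0.8, 1.5, 0.9],      # geneA
    [-0.2, -0.9, -1.4, -0.7],  # geneB
    [0.0, 0.1, -0.1, 0.05],    # geneC
]


def gene_profile(gene: str) -> list[float]:
    """Expression of one gene across all experiments (a row)."""
    return matrix[genes.index(gene)]


def experiment_profile(experiment: str) -> list[float]:
    """Expression of all genes in one experiment (a column)."""
    col = experiments.index(experiment)
    return [row[col] for row in matrix]


print(gene_profile("geneA"))      # [0.1, 0.8, 1.5, 0.9]
print(experiment_profile("t20"))  # [1.5, -1.4, -0.1]
```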
Transcript profiling • Spotted DNA microarray • High-density oligonucleotide array
Overview MICROARRAY PREPROCESSING • Gene expression • Omics era • Transcript profiling • Experiment design • Preprocessing • Exercises
Experiment Design • The experimental design determines which mathematical approach is appropriate • Comparison of 2 samples (black/white) • Comparison of multiple arrays: • global dynamic profiling • static experiment: comparison of samples (mutants, patients)
Experiment Design Type 1: comparison of 2 samples (2-sample design) • Control sample vs induced sample • Statistical testing: retrieve statistically over- or under-expressed genes
Experiment Design • Black/white experiment description (array V, mouse genes) • Condition 1: pygmy mouse, 10 days old (test) • Condition 2: normal mouse, 10 days old (reference) • Goal: detect differentially expressed genes • Experiment design (Latin square) over Array 1 and Array 2: per gene, 4 measurements are available per condition
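With 4 measurements per gene per condition, each gene can be tested separately. The sketch below uses Welch's two-sample t statistic as one example of such a test; the choice of test and the log2 intensities are assumptions for illustration, and turning the statistic into a p-value (which needs the t distribution) is left out.

```python
import math
from statistics import mean, variance


def welch_t(test: list[float], ref: list[float]) -> float:
    """Welch's two-sample t statistic (no equal-variance assumption)."""
    standard_error = math.sqrt(variance(test) / len(test) + variance(ref) / len(ref))
    return (mean(test) - mean(ref)) / standard_error


# Hypothetical log2 intensities: 4 measurements per gene per condition,
# matching the number of replicates in the design described above.
measurements = {
    "gene1": {"test": [8.1, 8.3, 7.9, 8.2], "ref": [6.0, 6.2, 5.9, 6.1]},
    "gene2": {"test": [7.0, 7.4, 6.8, 7.1], "ref": [7.1, 6.9, 7.2, 7.0]},
}

for gene, m in measurements.items():
    print(gene, round(welch_t(m["test"], m["ref"]), 2))
```

A large absolute t (gene1) points to differential expression; a value near zero (gene2) does not.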
Experiment Design: multiple array design • Measure the expression of all genes: • over time (dynamic profile) • in different conditions • Clustering: identify coexpressed genes • Motif finding: identify the mechanism of coregulation
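Clustering groups genes whose profiles move together. The slides do not name a similarity measure; the sketch assumes Pearson correlation, a common choice, and uses made-up time-course profiles.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two gene profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical time-course profiles (log ratios at 5 time points).
profiles = {
    "geneA": [0.1, 0.9, 1.6, 0.8, 0.0],
    "geneB": [0.2, 1.1, 1.9, 1.0, 0.1],     # same shape as geneA, larger amplitude
    "geneC": [0.0, -0.8, -1.5, -0.9, 0.1],  # roughly the mirror image of geneA
}

for a, b in [("geneA", "geneB"), ("geneA", "geneC")]:
    print(a, b, round(pearson(profiles[a], profiles[b]), 3))
```

Because correlation ignores amplitude, geneA and geneB count as coexpressed even though their absolute changes differ.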
Experiment Design: multiple array design • Study of the mitotic cell cycle of Saccharomyces cerevisiae with oligonucleotide arrays (Cho et al. 1999): 15 time points (E=18) • Time points 90 and 100 min deleted (Zhang et al. 1999; Tavazoie et al. 1999) • Original dataset: 6178 genes • Preprocessing: • select the 4634 most variable genes (i.e. discard the 25% least variable) • variance normalization • adaptive quality-based clustering (32 clusters) (95%)
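The first two preprocessing steps can be sketched as follows. "Variance normalized" is read here as rescaling each gene profile to mean 0 and standard deviation 1, which is an assumption; the helper names and toy profiles are hypothetical, and the adaptive quality-based clustering step is not sketched.

```python
from statistics import mean, pstdev, variance


def filter_most_variable(profiles: dict[str, list[float]],
                         keep_fraction: float) -> dict[str, list[float]]:
    """Keep the given fraction of genes with the highest variance across time points."""
    ranked = sorted(profiles, key=lambda g: variance(profiles[g]), reverse=True)
    n_keep = round(len(ranked) * keep_fraction)
    return {g: profiles[g] for g in ranked[:n_keep]}


def variance_normalize(profile: list[float]) -> list[float]:
    """Rescale a profile to mean 0 and standard deviation 1."""
    mu, sd = mean(profile), pstdev(profile)
    return [(x - mu) / sd for x in profile]


# Hypothetical toy profiles; g1 and g3 have the same shape but different scale.
profiles = {
    "g1": [1.0, 3.0, 5.0, 3.0],
    "g2": [2.0, 2.1, 1.9, 2.0],
    "g3": [0.0, 4.0, 8.0, 4.0],
    "g4": [1.0, 1.0, 1.2, 1.0],
}
kept = filter_most_variable(profiles, keep_fraction=0.5)
print(sorted(kept))  # ['g1', 'g3']
print(variance_normalize(kept["g1"]), variance_normalize(kept["g3"]))
```

After normalization g1 and g3 become identical, so clustering groups them by shape rather than amplitude. With 6178 genes and `keep_fraction=0.75`, `round(6178 * 0.75)` gives 4634.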
Experiment Design: reference design, e.g. the Spellman dataset • Reference: unsynchronized cells • Condition: synchronized cells sampled at distinct time intervals during the cell cycle [Figure: Array 1]
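In a reference design every array measures one condition against the same reference, so two conditions can be compared indirectly by subtracting their log ratios: the reference cancels. A minimal sketch assuming ideal, noise-free intensities for a single gene (all values hypothetical):

```python
import math

# Hypothetical intensities for one gene: a common reference and two time points.
reference = 400.0
timepoint = {"t1": 800.0, "t2": 200.0}

# Each array measures one time point against the same reference: M = log2(T / R).
m = {t: math.log2(intensity / reference) for t, intensity in timepoint.items()}
print(m)  # {'t1': 1.0, 't2': -1.0}

# Indirect comparison of two time points: the reference cancels out.
t1_vs_t2 = m["t1"] - m["t2"]
print(t1_vs_t2, math.log2(timepoint["t1"] / timepoint["t2"]))  # 2.0 2.0
```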
Experiment Design Loop design
Overview MICROARRAY PREPROCESSING • Gene expression • Omics era • Transcript profiling • Experiment design • Preprocessing • Sources of Variation • General normalization steps • Slide by slide normalization • ANOVA normalization
Preprocessing: sources of variation • Overshine effects • Dye effect • Spot effects • Array effect These are consistent errors: • they complicate direct comparison of measurements of the same gene/condition • they need to be removed by preprocessing/normalization • this is tedious • it influences the downstream analysis
Preprocessing: dye effect [Figure: workflow from Signal 1 and Signal 2 to a numerical value, with the dye effect marked: 1. Cell culture • 2. mRNA isolation • 3. Labeling • 4. Hybridization + washing • 5. Scanning • 6. Image analysis]
Preprocessing: dye and condition effects (within-slide variation) • Measurement error from: • mRNA preparation • labeling and reverse transcription • The overall signal in one channel is more pronounced than in the other channel • Remedy: normalization, based on the global normalization assumption
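The global normalization assumption is commonly taken to mean that most genes on a slide are not differentially expressed, so the log ratios should be centred on 0. The sketch subtracts the median log2 ratio of the slide; using the median (rather than the mean or an intensity-dependent fit) is an assumption, and the intensities are made up.

```python
import math
from statistics import median


def global_normalize(red: list[float], green: list[float]) -> list[float]:
    """Return median-centred log2(R/G) ratios for one slide.

    Assumes most genes are not differentially expressed, so any overall
    shift of the log ratios away from 0 is attributed to the dye effect.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    shift = median(log_ratios)
    return [m - shift for m in log_ratios]


# Hypothetical slide where the red channel is globally twice as bright.
red = [200.0, 400.0, 1600.0, 100.0, 800.0]
green = [100.0, 200.0, 200.0, 50.0, 400.0]
print(global_normalize(red, green))  # [0.0, 0.0, 2.0, 0.0, 0.0]
```

Only the third gene keeps a non-zero log ratio after the global dye shift is removed.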
Preprocessing: array effect [Figure: workflow from Signal 1 and Signal 2 to a numerical value, with the array effect marked: 1. Cell culture • 2. mRNA isolation • 3. Labeling • 4. Hybridization + washing • 5. Scanning • 6. Image analysis]
Preprocessing: array effects (between-slide variation) • Differences in global intensity between slides • Hybridization differences • Direct comparison of intensities between slides is therefore impossible; remedies: • normalization within the slide • use of ratios
Preprocessing Array effects: Between slide variation
Preprocessing: pin main effects, spot effects • Measurement error: • different quantity of DNA in each spot • differences between duplicate spots • As a result, absolute levels are not comparable between genes • Ratios remove the spot effect and allow differential expression to be compared between genes: • Gene 1: test 4, ref 2 → R/G = 2 • Gene 2: test 8, ref 4 → R/G = 2
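Why the ratio removes the spot effect can be shown with the slide's two genes. The sketch assumes a simple hypothetical model in which the amount of DNA in a spot multiplies both channels by the same factor; under that model the absolute intensities change but R/G does not.

```python
def observed(true_test: float, true_ref: float, dna_amount: float) -> tuple[float, float]:
    """Hypothetical spot model: the amount of spotted DNA scales both channels equally."""
    return true_test * dna_amount, true_ref * dna_amount


# Gene 1 and gene 2 from the slide: different absolute levels, same ratio.
spots = {"gene1": (4.0, 2.0), "gene2": (8.0, 4.0)}  # (test, ref)

for gene, (test, ref) in spots.items():
    for dna_amount in (0.5, 1.0, 3.0):
        r, g = observed(test, ref, dna_amount)
        print(gene, dna_amount, r, g, "R/G =", r / g)
```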
Preprocessing: overshine effects (within-slide variation) • Non-specific Cy5 or Cy3 signal caused by overshining, i.e. emission from neighboring spots • The background intensity increases with the intensity of the neighboring spots
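Because overshine raises the local background around a spot, one common correction is to subtract a local background estimate from the spot signal. In this sketch both the estimator (median of the surrounding pixels) and the positive `floor` that keeps the value log-transformable are assumptions; the pixel values are made up.

```python
from statistics import median


def background_corrected(foreground: list[float], surrounding: list[float],
                         floor: float = 1.0) -> float:
    """Net spot signal = median foreground - median local background.

    The result is clipped at a small positive floor (hypothetical value)
    so that it can still be log-transformed afterwards.
    """
    net = median(foreground) - median(surrounding)
    return max(net, floor)


# A bright spot whose local background is raised by a bright neighbour.
print(background_corrected([520, 540, 530], [110, 130, 120]))  # 410
# A faint spot whose background exceeds its foreground: clipped to the floor.
print(background_corrected([60, 55, 58], [70, 65, 75]))        # 1.0
```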
Preprocessing • Removing sources of variation is an obligatory step: • to make comparisons within a slide possible, e.g. finding differentially expressed genes • to allow inter-slide comparisons, e.g. combining the replicates of the original experiment and the color flip (dye swap)
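Combining an array with its color flip illustrates an inter-slide comparison. The sketch assumes that a gene's dye bias `d` adds the same amount to log2(R/G) on both arrays; under that assumption half the difference of the two log ratios recovers the true log ratio. The numbers are hypothetical.

```python
def dye_swap_estimate(m_original: float, m_flip: float) -> float:
    """Combine log2(R/G) from an array and its color flip.

    Original: test in red, reference in green -> M1 = log2(T/Ref) + d
    Flip:     reference in red, test in green -> M2 = -log2(T/Ref) + d
    (M1 - M2) / 2 removes a dye bias d assumed equal on both arrays.
    """
    return (m_original - m_flip) / 2


# Hypothetical gene: true log2(T/Ref) = 1.0, dye bias of 0.3 on both arrays.
true_m, dye_bias = 1.0, 0.3
m1 = true_m + dye_bias
m2 = -true_m + dye_bias
print(dye_swap_estimate(m1, m2))
```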
Overview MICROARRAY PREPROCESSING • Gene expression • Omics era • Transcript profiling • Experiment design • Preprocessing • Sources of Variation • General normalization steps • Slide by slide normalization • ANOVA normalization
Preprocessing: array-by-array approach vs ANOVA-based approach • Both: background correction → log transformation → filtering • Further steps: normalization • Linearisation • Ratio • Test statistic (T-test) • Bootstrapping
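The ANOVA-based approach fits all arrays at once in one linear model on the log-transformed intensities. The slides do not spell the model out; one commonly used form for two-color arrays, given here as an assumption, is:

```latex
\log y_{ijkg} = \mu + A_i + D_j + V_k + G_g + (AG)_{ig} + (VG)_{kg} + \varepsilon_{ijkg}
```

Here y_{ijkg} is the intensity measured on array i with dye j for condition k and gene g; A_i, D_j, V_k and G_g are the array, dye, condition and gene main effects; (AG)_{ig} captures spot effects; and \varepsilon_{ijkg} is the residual error. The condition-by-gene interaction (VG)_{kg} carries the differential expression of interest, and its significance could then be assessed, for example by bootstrapping.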