Functional Genomics I - Microarrays. Bioinformatics Dr. Víctor Treviño vtrevino@itesm.mx A7-421. Transcriptomics Proteomics Metabolomics Genomics SNP (Single Nucleotide Polymorphisms ) CNV ( Copy Number Variation , CGH) Epigenomics. Functional Genomics Technologies.
Functional Genomics I - Microarrays. Bioinformatics. Dr. Víctor Treviño, vtrevino@itesm.mx, A7-421
Functional Genomics Technologies: Transcriptomics • Proteomics • Metabolomics • Genomics • SNP (Single Nucleotide Polymorphisms) • CNV (Copy Number Variation, CGH) • Epigenomics
Microarrays: Technology that provides measurements of thousands of molecules in the same experiment at a reasonable price and precision. Generally the size of a typical microscope slide (75 x 25 mm (3" x 1") and about 1.0 mm thick).
Biological Question Experimental Design Microarray Experiment Image Analysis Background Pre-processing Transformation Normalization Summarization Differential Expression … Clustering Prediction Biology: Verification and Interpretation
Microarrays Google Images
Gene Expression. Molecular Cell Biology [Lodish, Berk, Matsudaira, Kaiser, Krieger, Scott, Zipursky, Darnell] (5th Ed)
Measuring Gene Expression: mRNA of Gene X by PCR and QPCR. [Figure: PCR gel for RWPE-1, DU-145 and PC-3 (- / + lanes) with a 100 bp ladder (100 bp and 200 bp marked); QPCR curves for 10, 10^2, 10^3, 10^4, 10^5, 10^6 and 10^7 copies.] http://www.bio168.com/mag/1B8B368B092A/20-3.jpg
Microarray - Hybridisation Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
DNA Microarray Technology http://www.well.ox.ac.uk/genomics/facilitites/Microarray/Welcome.shtml
Microarrays Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
http://www.lbl.gov/Science-Articles/Archive/cardiac-hyper-genes.html http://www.nrc-cnrc.gc.ca/multimedia/picture/life/nrc-bri_micro-array_e.html www.niaid.nih.gov/dir/services/rtb/microarray/overview.asp http://learn.genetics.utah.edu/units/biotech/microarray/genechip.jpg Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003 http://metherall.genetics.utah.edu/Protocols/Microarray-Spotting.html
Microarray Technologies two-dyes Affymetrix Images – 1 dye
Microarray Quality Affymetrix Inkjet arrays Spotted Arrays Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
Microarrays Dr. Hugo Barrera Microarrays Course EMBO-INER 2005, Mexico City
Microarray experiment, PROCESS and PRODUCT, for TWO-DYES (REFERENCE and TEST) and ONE-DYE (TEST) technologies. Sample (Healthy/Control, Disease/Treatment) → mRNA Extraction (and amplification) → mRNA/cDNA → Labelling → Labeled mRNA → Hybridization → Microarray → Scanning → Digital Image → Image Analysis & Data Processing → Data → Statistical Analysis → Selected Genes.
Data (single values per gene): A 1, B 1, C 1, D 0, E 4, F 1, G 1, H 2, I 2, J 0, K 5, L 2.
Data (paired values per gene): A 1-1, B 1-0, C 3-3, D 0-3, E 3-0, F 0-1, G 1-1, H 2-0, I 2-2, J 0-0, K 3-0, L 2-1.
Statistical Analysis: Gene D 0.001, Gene E 0.005, Gene K 0.001, Gene J 0.003. Selected Genes: Gene D 0.001, Gene E 0.005, Gene K 0.001.
Microarray Scanning Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
Microarray – Laser and the Scanned Image: 5 µm Laser, 10 µm Laser. Dr. Hugo Barrera, Microarrays Course EMBO-INER 2005, Mexico City. Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
Microarray - Pre-Processing Purpose Input: Scanned Image File Image Analysis Background Pre-processing Transformation Normalization Summarization Output: Data File (unique "global relative" measure of expression for every gene with minimal experimental error)
Microarray Image Analysis TECHNOLOGIES y sectors (~=3) m probesets (~100) x sectors (~=3) n probesets (~100) Target (cDNA, PCR products, etc.) Oligos ~20 40nt Probes DNA Probeset Usually 3 Copies per gene Usually 1 Organization Sectors (print-tip) n x m probesets i x j spots (18x20) Sectors Empty spots landing lights perfect match probes (pm), mismatch probes (mm) Controls
Microarray - Image Analysis TECHNOLOGIES RAW DATA 10,000 genes * 20 oligos * 2 (pm,mm) * ~ 36 pixels/gene = 14,400,000 values 10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,000 values Image Analysis Pre-processing only 10,000 values only 10,000 values
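The raw-data counts on this slide are plain products of the slide's own approximate figures. A minimal sketch that recomputes them (the variable names are illustrative):

```python
# Values per array before pre-processing, using the slide's approximate figures.
genes = 10_000

# One-dye oligonucleotide array: 20 oligos per gene, PM and MM, ~36 pixels each.
one_dye_values = genes * 20 * 2 * 36

# Two-dye spotted array: 2 dyes, 3 replicate spots per gene, ~40 pixels each.
two_dye_values = genes * 2 * 3 * 40

print(f"{one_dye_values:,}")   # 14,400,000
print(f"{two_dye_values:,}")   # 2,400,000

# Pre-processing reduces both to one value per gene.
print(one_dye_values // genes, two_dye_values // genes)  # 1440 240
```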
Image Analysis. Addressing: Estimate location of spot centers. Segmentation: Classify pixels as foreground or background. Extraction: For each spot on the array and each dye • foreground intensities • background intensities • quality measures. Addressing: Done by GeneChip Affymetrix software
Image Analysis. Addressing (by grid, GenePix)
Image Analysis. Segmentation: Irregular feature shape, Circular feature. Finally compute Average
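Segmentation with a circular feature followed by extraction can be sketched as below. This is only an illustration, assuming the spot center is already known from addressing; the patch values, the radius, and the choice of mean foreground and median background are hypothetical, not what GenePix or the Affymetrix software actually does.

```python
from statistics import mean, median

def extract_spot(patch, cx, cy, radius):
    """Classify pixels by a circular mask, then summarize each class."""
    foreground, background = [], []
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    # "Finally compute Average": mean of foreground, median of background.
    return mean(foreground), median(background), len(foreground)

# Hypothetical 5 x 5 pixel patch around one spot.
patch = [
    [10, 11, 10, 12, 10],
    [11, 90, 95, 90, 10],
    [10, 95, 99, 95, 11],
    [12, 90, 95, 90, 10],
    [10, 10, 11, 10, 12],
]
fg, bg, n_pixels = extract_spot(patch, cx=2, cy=2, radius=1.5)
print(round(fg, 2), bg, n_pixels)
```

An irregular-shape segmentation would replace the `inside` test with a per-pixel intensity criterion; the extraction step stays the same.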
Background Reduction. Extraction: Determining Background
Image Analysis Results. Affymetrix: .cel file, "results" for one array (raw, no background reduced); 10,000 genes ~ 400,000 values. 2-Color (GenePix): .gpr file, "results" for one array; 10,000 genes ~ 30,000 values (.gal files: 1 file for a "list" of arrays)
Image Analysis: Segmentation (Spot detection) → Background Estimation → Value, where Value = Spot Intensity – Spot Background. Example values for Sample 1 (two columns), Gene 1 … Gene N: 98, 4209, 2, …, 9711, …, 28 and 100, 209, -7, …, 9882, …, 2298.
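The subtraction on this slide is elementwise. In the sketch below the spot intensities and backgrounds are hypothetical numbers, chosen only so that the results match some of the slide's example values; note that a negative value such as -7 appears whenever the background estimate exceeds the spot intensity.

```python
def background_corrected(spot_intensity, spot_background):
    """Value = Spot Intensity - Spot Background, one value per spot."""
    return [s - b for s, b in zip(spot_intensity, spot_background)]

# Hypothetical foreground and background estimates for four spots.
intensity = [160, 270, 55, 9950]
background = [60, 61, 62, 68]

values = background_corrected(intensity, background)
print(values)  # [100, 209, -7, 9882]

# Non-positive values cannot be log-transformed later and need special handling.
flagged = [i for i, v in enumerate(values) if v <= 0]
print(flagged)  # [2]
```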
Data Transformation – two dyes: Log2 of each channel (R = Sample 1, G = Sample 1). Example values for Gene 1 … Gene N: 98, 4209, 2, …, 9711, …, 28 and 100, 209, -7, …, 9882, …, 2298.
Data Transformation – two dyes (log2 scale): MA-Plot of M against A for R = Sample 1 and G = Sample 1 (same example values, Gene 1 … Gene N). 1 value?
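The two axes of the MA-plot follow the definitions given in the deck's summary slide, M = log2(R/G) and A = log2(R*G)/2. A minimal sketch; which column of the example is red and which is green is an assumption made here, and returning `None` for non-positive values is one possible convention, not the deck's.

```python
import math

def ma_values(red, green):
    """Return (M, A) per gene, or None where a channel is not positive."""
    result = []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            result.append(None)  # log2 is undefined for this spot
            continue
        m = math.log2(r / g)        # log ratio: one value per gene
        a = math.log2(r * g) / 2    # average log intensity
        result.append((m, a))
    return result

# Example values from the slide; the channel assignment is assumed.
red = [100, 209, -7, 9882, 2298]
green = [98, 4209, 2, 9711, 28]

for gene, ma in enumerate(ma_values(red, green), start=1):
    print(gene, ma)
```

M answers the slide's "1 value?" question: it condenses the two channels into a single relative measure per gene, while A records how bright the spot was overall.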
Normalization – 2 dyes "With-in" (2 color technologies) (assumption: Majority No change) M A
Normalization – 2 dyes "With-in" (2 color technologies) (assumption: Majority No change) Before After
Normalization – 2 dyes "With-in" Spatial (2 color technologies): Before Normalization • After loess Global Normalization • After loess by Sector (print-tip) Normalization
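The difference between global and by-sector normalization can be illustrated with a deliberately simpler technique than the loess shown on the slides: median centering of M, which relies on the same "majority no change" assumption but ignores the dependence on A. The M values and sector labels below are hypothetical.

```python
from collections import defaultdict
from statistics import median

def center_global(m_values):
    """Shift all M values so that the array-wide median is zero."""
    shift = median(m_values)
    return [m - shift for m in m_values]

def center_by_sector(m_values, sectors):
    """Shift M values so that the median within each print-tip sector is zero."""
    groups = defaultdict(list)
    for m, s in zip(m_values, sectors):
        groups[s].append(m)
    shifts = {s: median(v) for s, v in groups.items()}
    return [m - shifts[s] for m, s in zip(m_values, sectors)]

# Hypothetical array with a spatial bias: sector 1 reads high, sector 2 low.
m = [0.5, 0.7, 0.6, -0.2, -0.1, -0.3]
sectors = [1, 1, 1, 2, 2, 2]

print([round(v, 2) for v in center_global(m)])
print([round(v, 2) for v in center_by_sector(m, sectors)])
```

Global centering leaves the sector offset in place; centering by sector removes it, which is the effect the "by Sector (print-tip)" panel illustrates.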
Data Transformation – one dye: Log2 of Sample 1. Example values for Gene 1 … Gene N: 100, 209, -7, …, 9882, …, 2298.
Normalization – 1 or 2 dyes, Between-slides: Before normalization, After normalization. Methods: quantile • MAD (median absolute deviation) scale • qspline • invariantset • loess
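Of the between-slide methods listed, quantile normalization is the easiest to show in a few lines: every array is forced to share the same distribution, namely the mean of the sorted values across arrays. A minimal sketch on hypothetical data, with no special handling of ties:

```python
def quantile_normalize(arrays):
    """arrays: one equal-length list of expression values per array."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # gene indices by rank
        out = [0.0] * n
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized

# Two hypothetical arrays measuring the same three genes.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the values 1.5, 3.5 and 5.5; only the assignment of values to genes differs.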
Summarization – Affymetrix Oligonucleotide dependent technologies PM MM Summarization = "Average"(Intensities) • Usual Methods: • tukey-biweight • av-diff • median-polish The "summarization" equivalent in two-dyes technologies is the average of gene replicates within the slide.
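Median polish, one of the methods listed, fits probe and array effects to a probes-by-arrays table and reports one value per array for the probeset. The sketch below assumes log2 intensities as input and a fixed number of sweeps; both are choices made for illustration, and the data are hypothetical.

```python
from statistics import median

def median_polish(matrix, n_iter=10):
    """matrix[i][j]: log2 intensity of probe i on array j.
    Returns one summarized expression value per array."""
    z = [row[:] for row in matrix]          # residuals
    n_rows, n_cols = len(z), len(z[0])
    overall = 0.0
    row_eff = [0.0] * n_rows                # probe effects
    col_eff = [0.0] * n_cols                # array effects
    for _ in range(n_iter):
        # Sweep the median of each row into the probe effects.
        for i in range(n_rows):
            d = median(z[i])
            z[i] = [v - d for v in z[i]]
            row_eff[i] += d
        d = median(col_eff)
        col_eff = [c - d for c in col_eff]
        overall += d
        # Sweep the median of each column into the array effects.
        for j in range(n_cols):
            d = median(z[i][j] for i in range(n_rows))
            for i in range(n_rows):
                z[i][j] -= d
            col_eff[j] += d
        d = median(row_eff)
        row_eff = [r - d for r in row_eff]
        overall += d
    return [overall + c for c in col_eff]

# Hypothetical probeset: 3 probes on 2 arrays, probe effects 0, 1, 2.
probeset = [[8, 10], [9, 11], [10, 12]]
print(median_polish(probeset))  # [9.0, 11.0]
```

Because medians are used instead of means, a single defective probe shifts the summary much less than it would shift a plain average.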
Microarrays – Filtering / Treating Undefined Values. Problems: • Some spots may be defective in the printing process • Some spots could not be detected • Some spots may be damaged during the assay • Artefacts may be present (bubbles, etc). Treatment: • Use replicated spots as averages • Remove unrecoverable genes • Remove problematic spots in all arrays • Infer values using computational methods (warning)
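The first two treatments, averaging replicated spots and removing unrecoverable genes, can be sketched as follows. Representing an undefined spot as `None` and the layout of `data` are assumptions made for this example.

```python
from statistics import mean

def average_replicates(spots):
    """Average replicate spots, skipping undefined (None) values."""
    valid = [v for v in spots if v is not None]
    return mean(valid) if valid else None

def summarize(data):
    """data: {gene: [replicate spots on array 1, replicate spots on array 2, ...]}."""
    averaged = {
        gene: [average_replicates(spots) for spots in arrays]
        for gene, arrays in data.items()
    }
    # Remove genes that remain undefined on at least one array.
    return {gene: v for gene, v in averaged.items() if None not in v}

# Hypothetical data: three replicate spots per gene on two arrays.
data = {
    "Gene A": [[1.0, 1.2, None], [2.0, 2.2, 2.1]],    # one damaged spot
    "Gene B": [[None, None, None], [3.0, 3.1, 2.9]],  # unrecoverable on array 1
}
clean = summarize(data)
print(clean)
```

Gene A survives because two good replicates remain on the first array; Gene B is dropped rather than imputed, which avoids the risk flagged by the slide's "(warning)".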
Microarray – Data Filtering. More than 10,000 genes • Too much data increases computation time and analysis complexity • Remove: • Genes that do not change significantly • Undefined genes • Low expression • Keep: • Large signal-to-noise ratio • Large statistical significance • Large variability • Large expression
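A filter on expression level and variability might look like the sketch below. The thresholds `min_mean` and `min_variance` are hypothetical and would have to be chosen per data set; the example values are log2-scale numbers invented for illustration.

```python
from statistics import mean, variance

def filter_genes(expr, min_mean=7.0, min_variance=0.5):
    """Keep genes with large expression and large variability across samples."""
    kept = {}
    for gene, values in expr.items():
        if mean(values) >= min_mean and variance(values) >= min_variance:
            kept[gene] = values
    return kept

# Hypothetical log2 expression values across four samples.
expr = {
    "Gene 1": [8.0, 8.1, 7.9, 8.0],   # expressed, but does not change
    "Gene 2": [5.0, 9.0, 5.5, 9.5],   # expressed and variable
    "Gene 3": [2.0, 2.5, 1.5, 2.0],   # low expression
}
print(list(filter_genes(expr)))  # ['Gene 2']
```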
Microarray Pre-Processing Summary. a) Data Processing: Image Analysis • Background Detection & Subtraction • Transformation • Normalization • Summarization • Filtering. b) Image Analysis and Background Subtraction (Affymetrix, Microarray Two-dyes): Image Scanning, Spot Detection, Background Subtraction, Intensity Value. c) Transformation: M=log2(R/G), A=log2(R*G)/2. d) Normalization: Within, Between.
Microarray Applications Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
Microarray Data Matrix: rows Gene 1, Gene 2, Gene 3, …, Gene N; columns Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …), ….
Microarrays – What can be done with data? Differential Expression • Unsupervised Classification • Biomarker detection • Identifying genes related to survival times • Regression Analysis • Gene Copy Number and Comparative Genomic Hybridization • Epigenetics and Methylation • Genetic Polymorphisms and SNPs • Chromatin Immuno-Precipitation On-Chip • Pathogen Detection • …
Differential Expression. Data matrix: Gene 1 … Gene N by Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …). [Figure: Expression Level of Samples A and Samples B for a Positive and a Negative example of Gene Selection, each annotated µ=d.] Statistics: p-value, FDR, q-Value
Biomarker Detection. Data matrix: Gene 1 … Gene N by Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …). [Figure: Expression Level of Samples Class A and Samples Class B for a Positive and a Negative example of Gene Selection, each annotated µ=d.] Biomarker Discovery
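Gene selection by differential expression typically scores each gene with a two-class statistic and then corrects the resulting p-values for multiple testing. The sketch below shows a Welch t statistic and the Benjamini-Hochberg adjustment, one common way of controlling the FDR; the deck does not say which test or correction it uses, and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(class_a, class_b):
    """t statistic for the difference in mean expression between two classes."""
    se = sqrt(variance(class_a) / len(class_a) + variance(class_b) / len(class_b))
    return (mean(class_a) - mean(class_b)) / se

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original gene order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

print(round(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]), 3))  # -3.674

# Hypothetical per-gene p-values.
p = [0.01, 0.04, 0.03, 0.5]
print([round(q, 4) for q in benjamini_hochberg(p)])  # [0.04, 0.0533, 0.0533, 0.5]
```

Selecting genes whose adjusted value is below 0.05 keeps only the first gene here, even though three raw p-values are below 0.05.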
Unsupervised Classification. [Figure, panels a and b: heat map of genes (rows labelled A … M) by Samples (columns 1 … 9), colour scale Expression Low to High; panel titles: Unsupervised Sample Classification, Co-Expressed Genes.]
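Grouping co-expressed genes starts from a similarity measure between expression profiles. The sketch below uses 1 minus the Pearson correlation as the distance and finds the closest pair, which is the first merge an agglomerative clustering would make; the choice of distance and the profiles themselves are assumptions for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def closest_pair(profiles):
    """Return (distance, gene, gene) for the two most similar profiles."""
    names = list(profiles)
    best = None
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            d = 1 - pearson(profiles[g], profiles[h])
            if best is None or d < best[0]:
                best = (d, g, h)
    return best

# Hypothetical expression of three genes across four samples.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],   # rises with A: co-expressed
    "C": [4.0, 3.0, 2.0, 1.0],   # falls as A rises
}
distance, g1, g2 = closest_pair(profiles)
print(g1, g2, round(distance, 4))
```

Applying the same distance to the columns instead of the rows gives the unsupervised sample classification.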
Genes Associated to Survival Times and Risk. Data matrix: Gene 1 … Gene N by Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …). [Figure: two Kaplan-Meier Plots, Hazard (0.0 to 1.0) against Time, with + marks along the curves, for a Positive and a Negative example of Gene Selection by Survival Times.]
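The curves in a Kaplan-Meier plot come from the product-limit estimator: at each event time the running survival probability is multiplied by the fraction of at-risk samples that did not have the event. A minimal sketch on hypothetical follow-up data, where `False` in `events` marks a censored sample:

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time."""
    survival = 1.0
    at_risk = len(times)
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times; False marks a censored sample.
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
# 2 0.8
# 3 0.6
# 5 0.3
```

To test a gene, samples would be split by its expression (for example high versus low) and one curve estimated per group.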
Regression: Gene Association to outcome. Data matrix: Gene 1 … Gene N by Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …). [Figure: Dependent Variable against Gene Expression; Gene Selection by Regression: Positive example with Slope ≠ 0, Negative example with Slope = 0.]
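The slope in question is the least-squares slope of the dependent variable on a gene's expression. A minimal sketch with hypothetical data; a real analysis would also test whether the slope differs significantly from zero rather than compare it to zero directly.

```python
from statistics import mean

def slope_intercept(expression, outcome):
    """Ordinary least-squares fit: outcome = slope * expression + intercept."""
    mx, my = mean(expression), mean(outcome)
    sxx = sum((x - mx) ** 2 for x in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expression, outcome))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical gene expression across four samples.
expression = [1.0, 2.0, 3.0, 4.0]

print(slope_intercept(expression, [3.0, 5.0, 7.0, 9.0]))  # (2.0, 1.0): associated
print(slope_intercept(expression, [5.0, 5.0, 5.0, 5.0]))  # (0.0, 5.0): not associated
```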
CpG Methylation. [Figure: two parallel protocols, Unmethylated Fraction and Hypermethylated Fraction, each applied to Sample and Control DNA with methylated (M) sites. Step labels as they appear: Cleavage with TasI Csp6I; Cleavage with methylation-sensitive restriction enzyme; CpG specific Adaptor Ligation; Adaptor Ligation; Cleavage with methylation-sensitive restriction enzyme; CpG specific cleavage with McrBC; Adaptor-specific amplification (both protocols). Products: Unmethylated fraction and Hypermethylated fraction, each labelled Cy5 (red) and Cy3 (green) and hybridised to a Microarray.]
[Figure: SNP genotyping on microarrays (Labelling, Hybridisation, Detection) for SNP1, SNP2, SNP3, …. a) PCR products + DNA polymerase + labelled ddNTPs, Extension (1nt). b) Transcribed RNA + reverse transcriptase + ddNTPs (one labelled), Extension. c) Products of 1nt primer extension (in solution), Capture.]
Chromatin Immuno-Precipitation (ChIP-on-Chip): Fusion of Tag sequence into TF gene (Transcription Factor, Tag) → Incubation (DNA, Tagged TF) → Antibody against tag peptide → Precipitation of Antibody-TF-DNA complex → Labelling of precipitated DNA → Microarray Hybridisation