
Bioinformatics Dr. Víctor Treviño vtrevino@itesm.mx A7-421

Functional Genomics I - Microarrays. Bioinformatics, Dr. Víctor Treviño, vtrevino@itesm.mx, A7-421. Functional Genomics Technologies: Transcriptomics, Proteomics, Metabolomics, Genomics, SNP (Single Nucleotide Polymorphisms), CNV (Copy Number Variation, CGH), Epigenomics.


Presentation Transcript


  1. Functional Genomics I - Microarrays. Bioinformatics. Dr. Víctor Treviño, vtrevino@itesm.mx, A7-421

  2. Functional Genomics Technologies: Transcriptomics • Proteomics • Metabolomics • Genomics • SNP (Single Nucleotide Polymorphisms) • CNV (Copy Number Variation, CGH) • Epigenomics

  3. Microarrays: Technology that provides measurements of thousands of molecules in the same experiment at reasonable prices and precision. Generally the size of a typical microscope slide (75 x 25 mm (3" x 1") and about 1.0 mm thick).

  4. Biological Question • Experimental Design • Microarray Experiment • Image Analysis • Background • Pre-processing • Transformation • Normalization • Summarization • Differential Expression • … • Clustering • Prediction • Biology: Verification and Interpretation

  5. Microarrays Google Images

  6. Gene Expression. Molecular Cell Biology [Lodish, Berk, Matsudaira, Kaiser, Krieger, Scott, Zipursky, Darnell] (5th Ed)

  7. Measuring Gene Expression: mRNA, Gene X. PCR (gel figure: RWPE-1, DU-145, PC-3, each - and +; 100 bp ladder with 100 bp and 200 bp marks). QPCR (figure: 10, 10^2, 10^3, 10^4, 10^5, 10^6 and 10^7 copies). http://www.bio168.com/mag/1B8B368B092A/20-3.jpg

  8. Microarray - Hybridisation. Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

  9. DNA Microarray Technology http://www.well.ox.ac.uk/genomics/facilitites/Microarray/Welcome.shtml

  10. Microarrays Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

  11. http://www.lbl.gov/Science-Articles/Archive/cardiac-hyper-genes.html http://www.nrc-cnrc.gc.ca/multimedia/picture/life/nrc-bri_micro-array_e.html www.niaid.nih.gov/dir/services/rtb/microarray/overview.asp http://learn.genetics.utah.edu/units/biotech/microarray/genechip.jpg Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003 http://metherall.genetics.utah.edu/Protocols/Microarray-Spotting.html

  12. Microarrays – Probe Production

  13. Microarray Technologies two-dyes Affymetrix Images – 1 dye

  14. Microarray Quality Affymetrix Inkjet arrays Spotted Arrays Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

  15. Microarrays Dr. Hugo Barrera Microarrays Course EMBO-INER 2005, Mexico City

  16. PROCESS and PRODUCT, for TWO-DYES and ONE-DYE arrays. Samples: REFERENCE (Healthy/Control) and TEST (Disease/Treatment); the one-dye case uses TEST only. Sample → mRNA Extraction (and amplification) → mRNA/cDNA → Labelling → Labeled mRNA → Hybridization → Microarray → Scanning → Digital Image → Image Analysis & Data Processing → Data → Statistical Analysis → Selected Genes. Data (one value per gene): A 1, B 1, C 1, D 0; E 4, F 1, G 1, H 2; I 2, J 0, K 5, L 2. Data (two values per gene): A 1-1, B 1-0, C 3-3, D 0-3; E 3-0, F 0-1, G 1-1, H 2-0; I 2-2, J 0-0, K 3-0, L 2-1. Statistical Analysis: Gene D 0.001, Gene E 0.005, Gene K 0.001, Gene J 0.003. Selected Genes: Gene D 0.001, Gene E 0.005, Gene K 0.001.

  17. Microarray Scanning Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

  18. 5m Laser 10m Laser Microarray – Laser and the Scanned Image Dr. Hugo Barrera, Microarrays Course EMBO-INER 2005, Mexico City Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003

  19. Microarray - Pre-Processing Purpose Input: Scanned Image File Image Analysis Background Pre-processing Transformation Normalization Sumarization Output: Data File(unique "global relative" measure of expression for every gene withminimal experimental error)

  20. Microarray Image Analysis TECHNOLOGIES ysectors (~=3) mprobsets (~100) x sectors (~=3) n probsets (~100) Target (cDNA, PCR products, etc.) Oligos ~20 40nt Probes DNA Probeset Usually 3 Copies per gene Usually 1 Organization Sectors (print-tip) n x m probsets i x j spots (18x20) Sectors Empty spots landing lights perfect match probes (pm)mismatch probes (mm) Controls

  21. Microarray - Image Analysis TECHNOLOGIES RAW DATA 10,000 genes * 20 oligos * 2 (pm,mm) * ~ 36 pixels/gene = 14,400,00 values 10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,00 values Image Analysis Pre-processing only 10,000 values only 10,000 values
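The raw-data counts on this slide are plain products of the per-gene figures. A minimal Python sketch of that arithmetic follows; the function name `raw_value_count` is made up for illustration and the factors are the approximate ones quoted on the slide.

```python
from math import prod

def raw_value_count(genes, *factors):
    """Multiply the number of genes by each per-gene factor."""
    return genes * prod(factors)

# Oligonucleotide array: 20 oligos, pm + mm, ~36 pixels each
oligo_values = raw_value_count(10_000, 20, 2, 36)
# Two-dye array: 2 dyes, 3 copies per gene, ~40 pixels each
two_dye_values = raw_value_count(10_000, 2, 3, 40)

print(oligo_values, two_dye_values)
```

Either way, image analysis and pre-processing reduce millions of pixel values to roughly one value per gene.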

  22. Image Analysis. Addressing: Estimate location of spot centers. Segmentation: Classify pixels as foreground or background. Extraction: For each spot on the array and each dye • foreground intensities • background intensities • quality measures. This slide: Addressing, done by the GeneChip Affymetrix software.

  23. Image Analysis. Addressing: Estimate location of spot centers. Segmentation: Classify pixels as foreground or background. Extraction: For each spot on the array and each dye • foreground intensities • background intensities • quality measures. This slide: Addressing (by grid, GenePix).

  24. Image Analysis. Addressing: Estimate location of spot centers. Segmentation: Classify pixels as foreground or background. Extraction: For each spot on the array and each dye • foreground intensities • background intensities • quality measures. This slide: Segmentation (irregular feature shape, circular feature); finally compute the average.
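Segmentation followed by averaging can be sketched in a few lines of Python. This is only an illustration: a fixed intensity threshold stands in for the shape-based segmentation that real image-analysis software performs, and the pixel values and the name `segment_spot` are invented.

```python
from statistics import mean

def segment_spot(pixels, threshold):
    """Split pixel intensities into foreground and background by a fixed threshold."""
    foreground = [p for p in pixels if p >= threshold]
    background = [p for p in pixels if p < threshold]
    return foreground, background

# Made-up pixel intensities in the neighbourhood of one spot
pixels = [12, 15, 11, 240, 260, 250, 14, 13]
fg, bg = segment_spot(pixels, threshold=100)

spot_intensity = mean(fg)    # average of the foreground pixels
spot_background = mean(bg)   # average of the background pixels
print(spot_intensity, spot_background)
```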

  25. Background Reduction. Extraction: Determining Background

  26. Image Analysis. Affymetrix Results: .cel file, "results" for one array (raw, no background reduced), 10,000 genes ~ 400,000 values. 2-Color Results (GenePix): .gpr file, "results" for one array, 10,000 genes ~ 30,000 values (.gal files: 1 file for a "list" of arrays).

  27. Image Analysis: Segmentation (Spot detection) • Background Estimation • Value, where Value = Spot Intensity – Spot Background. Example columns for Sample 1 (Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N): 98, 4209, 2, …, 9711, …, 28 and 100, 209, -7, …, 9882, …, 2298.
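The formula Value = Spot Intensity – Spot Background is applied spot by spot. A small Python sketch with invented numbers (the function name is hypothetical) shows how a spot dimmer than its local background yields a negative value, like the -7 in the example column:

```python
def background_corrected(spot_intensity, spot_background):
    """Value = Spot Intensity - Spot Background, computed per spot."""
    return [s - b for s, b in zip(spot_intensity, spot_background)]

# Made-up foreground and background estimates for three spots
values = background_corrected([150, 259, 40], [50, 50, 47])
print(values)
```

Negative values need special handling before any log transformation, since the logarithm is undefined for them.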

  28. Data Transformation – two dyes: Log2 of both channels, R = Sample 1 and G = Sample 1. Example columns (Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N): 98, 4209, 2, …, 9711, …, 28 and 100, 209, -7, …, 9882, …, 2298.

  29. Data Transformation – two dyes (log2 scale): MA-Plot with axes M and A for R = Sample 1 and G = Sample 1 (annotation: 1 value?). Example columns (Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N): 98, 4209, 2, …, 9711, …, 28 and 100, 209, -7, …, 9882, …, 2298.
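The MA-plot coordinates follow the definitions given on the pre-processing summary slide, M = log2(R/G) and A = log2(R*G)/2. A minimal Python sketch, assuming background-corrected intensities as input; returning `None` for non-positive intensities is a choice made here for illustration, not something the slides prescribe.

```python
import math

def ma_values(r, g):
    """Return (M, A) for one spot, or None if either channel is not positive."""
    if r <= 0 or g <= 0:
        return None  # log2 is undefined for zero or negative intensities
    m = math.log2(r / g)          # log ratio of the two dyes
    a = 0.5 * math.log2(r * g)    # average log intensity
    return m, a

print(ma_values(8, 2))    # red is four times green
print(ma_values(-7, 2))   # negative value after background subtraction
```

M collapses the two channels into a single relative value per gene, while A keeps track of overall brightness.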

  30. Normalization – 2 dyes: "Within" (2 color technologies) (assumption: majority no change). MA-plot with axes M and A.

  31. Normalization – 2 dyes: "Within" (2 color technologies) (assumption: majority no change). Before and After.

  32. Normalization – 2 dyes: "Within", Spatial (2 color technologies). Before Normalization • After loess Global Normalization • After loess by Sector (print-tip) Normalization.
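The "majority no change" assumption means that most M values should sit around zero. The slides use loess for this; the Python sketch below swaps in a much simpler stand-in, global median centering, only to make the idea concrete. It does not correct the intensity-dependent or print-tip trends that loess addresses.

```python
from statistics import median

def median_center(m_values):
    """Shift M values so that their median is zero (simplified stand-in for loess)."""
    center = median(m_values)
    return [m - center for m in m_values]

# Made-up log ratios with a dye bias of about +1
m_before = [0.5, 1.0, 1.5, 1.0, 3.0]
m_after = median_center(m_before)
print(m_after)
```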

  33. Log2 Data Transformation – one dye. Example column for Sample 1 (Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N): 100, 209, -7, …, 9882, …, 2298.

  34. Normalization – 1 or 2 dyes: Between-slides. Before normalization and After normalization. Methods: quantile • MAD (median absolute deviation) • scale • qspline • invariantset • loess.
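Of the between-slide methods listed, quantile normalization is the easiest to sketch: every array is forced to share the same distribution by replacing each value with the mean of the values that hold the same rank across arrays. The Python below is an illustrative version written for this transcript, with ties broken by position rather than averaged.

```python
def quantile_normalize(arrays):
    """arrays: list of equal-length lists, one list of gene values per array."""
    n_genes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value across all arrays
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays)
                  for k in range(n_genes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_genes), key=lambda i: a[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two made-up arrays measuring the same three genes
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(normalized)
```

After normalization both arrays contain exactly the same set of values, only assigned to different genes.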

  35. Summarization – Affymetrix (oligonucleotide dependent technologies): PM, MM. Summarization = "Average"(Intensities) • Usual Methods: • tukey-biweight • av-diff • median-polish. The "summarization" equivalent in two-dye technologies is the average of gene replicates within the slide.
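As an illustration of summarization, the sketch below computes a bare-bones average difference: the mean of PM minus MM over the probe pairs of one probeset. It is a simplification written for this transcript and leaves out the outlier handling that production implementations apply; the intensities are invented.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one probeset (simplified)."""
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)

# Made-up perfect match and mismatch intensities for three probe pairs
summary = average_difference([120, 200, 300], [100, 150, 250])
print(summary)
```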

  36. Microarrays – Filtering / Treating Undefined Values. Problems: • Some spots may be defective in the printing process • Some spots could not be detected • Some spots may be damaged during the assay • Artefacts may be present (bubbles, etc.). Treatments: • Use replicated spots as averages • Remove unrecoverable genes • Remove problematic spots in all arrays • Infer values using computational methods (warning).
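Two of the treatments listed, averaging replicated spots and removing unrecoverable genes, can be sketched in Python as follows. Undefined spots are represented as `None`, which is a convention chosen for this example.

```python
def mean_of_replicates(values):
    """Average the defined replicate spots; return None if none is usable."""
    usable = [v for v in values if v is not None]
    return sum(usable) / len(usable) if usable else None

# Made-up replicate spots per gene; None marks an undefined spot
replicates = {"Gene 1": [2.0, None, 4.0], "Gene 2": [None, None, None]}
averaged = {gene: mean_of_replicates(v) for gene, v in replicates.items()}
# Genes with no usable spot are unrecoverable and get removed
recovered = {gene: v for gene, v in averaged.items() if v is not None}
print(recovered)
```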

  37. Microarray – Data Filtering. More than 10,000 genes • Too much data increases computation time and analysis complexity • Remove: • Genes that do not change significantly • Undefined genes • Low expression • Keep: • Large signal-to-noise ratio • Large statistical significance • Large variability • Large expression

  38. Microarray Pre-Processing Summary. a) Data Processing: Image Analysis • Background Detection & Subtraction • Transformation • Normalization • Summarization • Filtering. b) Image Analysis and Background Subtraction (Microarray Two-dyes and Affymetrix): Image Scanning • Spot Detection • Background Subtraction • Intensity Value. c) Transformation: M=log2(R/G), A=log2(R*G)/2. d) Normalization: Within • Between.
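A filter based on two of the criteria above (variability and expression level) might look like the following Python sketch. The thresholds `min_sd` and `min_mean` and the expression values are arbitrary illustrations, not recommendations from the slides.

```python
from statistics import mean, stdev

def filter_genes(matrix, min_sd=0.5, min_mean=7.0):
    """Keep genes whose log2 expression is both variable and high enough."""
    return {gene: values for gene, values in matrix.items()
            if stdev(values) >= min_sd and mean(values) >= min_mean}

# Made-up log2 expression values across four samples
matrix = {
    "geneA": [8.0, 10.0, 9.0, 11.0],   # variable and well expressed
    "geneB": [8.0, 8.1, 7.9, 8.0],     # barely changes
    "geneC": [2.0, 4.0, 3.0, 5.0],     # low expression
}
kept = filter_genes(matrix)
print(list(kept))
```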

  39. Microarray Repositories

  40. Microarray Applications Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007

  41. Microarray Data Matrix. Rows: Gene 1, Gene 2, Gene 3, …, Gene N. Columns: Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …).

  42. Microarrays – What can be done with data? • Differential Expression • Unsupervised Classification • Biomarker detection • Identifying genes related to survival times • Regression Analysis • Gene Copy Number and Comparative Genomic Hybridization • Epigenetics and Methylation • Genetic Polymorphisms and SNPs • Chromatin Immuno-Precipitation On-Chip • Pathogen Detection • …

  43. Differential Expression. Data matrix: Gene 1 … Gene N by Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …). Figure: Expression Level → for Samples A and Samples B (µ=d), with one Positive and one Negative example. Gene Selection: Differential Expression, p-value → FDR → q-Value.

  44. Biomarker Detection. Data matrix: Gene 1 … Gene N by Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …). Figure: Expression Level → for Samples Class A and Samples Class B (µ=d), with one Positive and one Negative example. Gene Selection: Biomarker Discovery.

  45. Unsupervised Classification (figure, panels a and b): Unsupervised Sample Classification (Samples 1 to 9) and Co-Expressed Genes (gene labels B, A, C, G, B, H, E, D, I, K, M, L). Colour scale: Expression, Low to High.
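The step from p-values to FDR is needed because thousands of genes are tested at once. One common way to do it is the Benjamini-Hochberg adjustment, sketched below in Python; the slide does not say which FDR method the course used, so treat the choice as an assumption. The p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
selected = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, selected)
```

Here only the first gene survives a 5% FDR cutoff, even though three of the raw p-values are below 0.05.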

  46. Genes Associated to Survival Times and Risk. Data matrix: Gene 1 … Gene N by Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …). Figure: two Kaplan-Meier Plots (vertical axis labelled Hazard, 0.0 to 1.0; horizontal axis Time →; curves marked with +), one Positive and one Negative example. Gene Selection: Survival Times.

  47. Regression: Gene Association to outcome. Data matrix: Gene 1 … Gene N by Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …). Figure: Dependent Variable → against Gene Expression →, with a Positive example (Slope ≠ 0) and a Negative example (Slope = 0). Gene Selection: Regression.

  48. CpG Methylation (workflow figure with two columns, Unmethylated Fraction and Hypermethylated Fraction, each starting from Sample and Control DNA with methylated sites marked M). Step labels in the order they appear: Cleavage with TasI Csp6I • Cleavage with methylation-sensitive restriction enzyme • CpG specific Adaptor Ligation • Adaptor Ligation • Cleavage with methylation-sensitive restriction enzyme • CpG specific cleavage with McrBC • Adaptor-specific amplification (both columns) • Unmethylated fraction and Hypermethylation fraction, each labelled Cy5 (red) and Cy3 (green) • Microarray (both columns).

  49. Figure with columns Labelling, Hybridisation, Detection for SNP1, SNP2, SNP3, …. Panel a: PCR products + DNA polymerase + labelled ddNTPs, Extension (1nt). Panel b: Transcribed RNA + reverse transcriptase + ddNTPs (one labelled), Extension. Panel c: Products of 1nt primer extension (in solution), Capture.

  50. Chromatin Immuno-Precipitation (ChIP-on-Chip): Fusion of Tag sequence into TF gene (Transcription Factor, Tag) • Incubation (DNA-Tagged TF) • Antibody against tag peptide • Precipitation of Antibody-TF-DNA complex • Labelling of precipitated DNA • Microarray Hybridisation
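A Kaplan-Meier curve estimates the fraction of patients still surviving after each observed event. The Python sketch below is a minimal product-limit estimator with invented follow-up data; in a gene-selection setting one such curve would be built for each group of samples (for example high versus low expression of a gene) and the curves compared.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one observed event.

    events: 1 = event observed, 0 = censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Made-up follow-up times; the second patient is censored
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```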
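Selecting genes by regression comes down to asking whether the slope of the outcome against expression differs from zero. A least-squares slope can be sketched in Python as below, with invented data; a real analysis would also test the slope for statistical significance, which this sketch omits.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y against x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

expression = [1, 2, 3, 4]  # made-up gene expression values
associated = least_squares_slope(expression, [2, 4, 6, 8])     # outcome tracks expression
unassociated = least_squares_slope(expression, [5, 5, 5, 5])   # outcome is flat
print(associated, unassociated)
```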
