Microarrays • Wednesday, March 1, 2006 • Dr. Tim Hughes • CCBR – 160 College St. – Room 1302 • t.hughes@utoronto.ca • Outline: • Microarray experiments • Normalization • Different types of microarrays • Other applications besides expression profiling • Clustering and interpretation
Suggested reading
• Eisen et al., 1998
• Hartigan, J.A., Clustering Algorithms, Wiley, New York and London (1975). My understanding is that it is no longer in print, but is available on CD.
• Jain et al., “Data Clustering: a review”, ACM Computing Surveys, 31(3), 1999 (http://www.amk.alt-neustadt.at/diplom/papers/Clustering/p264-jain.pdf)
• Hegde et al., A concise guide to cDNA microarray analysis. Biotechniques. 2000 Sep;29(3):548-50, 552-4, 556.
• Sherlock G. Analysis of large-scale gene expression data. Curr Opin Immunol. 2000 Apr;12(2):201-5.
Nucleic Acid Hybridization www.accessexcellence.org/AB/GG/nucleic.html
Microarray expression profiling by 2-color assay (“cDNA arrays”) • Array: PCR products, 6250 yeast ORFs • Hybridized cDNAs: green = control, red = experiment • *Schena et al., 1995
“cDNA microarrays” are essentially dot-blots on glass slides 0.45 mm • This slide was made with 16 pins • 4.5 mm pin spacing matches 384-well plates (16 x 24) • Done with robotics • Slides usually coated with poly-lysine • Spots are usually 100-150 microns • Spot spacing is usually 200-300 microns. • Slides are 25 x 75 mm • Easy to deposit 20K spots/slide http://arrayit.com/Products/Printing/Stealth/stealth.html
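As a rough check on the 20K figure, the sketch below counts how many spots fit on a rectangular grid at a given center-to-center spacing. The 20 x 60 mm printable area is an assumption made for illustration; the slide only gives the overall 25 x 75 mm slide size.

```python
def spots_per_slide(width_mm, height_mm, pitch_um):
    """Count grid positions that fit in a rectangle at a given spot spacing."""
    # Work in whole microns to avoid floating-point surprises in the division
    width_um = round(width_mm * 1000)
    height_um = round(height_mm * 1000)
    return (width_um // pitch_um) * (height_um // pitch_um)

# Hypothetical printable area of 20 x 60 mm (not stated on the slide)
for pitch in (200, 250, 300):
    print(pitch, "micron spacing:", spots_per_slide(20, 60, pitch), "spots")
```

Under that assumed area, a 250 micron spacing gives 19,200 positions, which is in line with the 20K spots/slide quoted above.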
Common ways to “label” nucleic acids • Random priming of double-stranded DNA: reaction contains labelled nucleotides • Poly-T primed cDNA synthesis: reaction contains labelled nucleotides • Direct labelling (fluors only) • Amplification: poly-T primer carrying a T7 promoter, “second strand” synthesis, then a T7 reaction that contains labelled nucleotides
Typical use of cDNA microarrays: “Internal” normalization using two colors [Diagram: cDNA pools from treatment (drug, mutation) and control are hybridized to the same array; each transcript reads out as up, down, unchanged, or not present]
A 532 nm laser (green) excites Cy3; Cy3 is detected with an emission filter that passes 557-592 nm. A 635 nm laser (red) excites Cy5; Cy5 is detected with an emission filter that passes 650-690 nm. Both are detected by a photomultiplier tube. [Figure: excitation and emission spectra; structures of Cy3 NHS ester and Cy5 NHS ester] http://www.jacksonimmuno.com/2001site/home/catalog/f-cy3-5.htm http://www.ope-tech.com/doc/Cy5_structure.htm
The primary data: two grayscale TIFF files • Cy3 channel (“green”) • Cy5 channel (“red”) http://www.axon.com/GN_GenePix4000.html
Image processing and normalization: what is microarray data? Microarray data is summary information from image files that come out of the scanner. Image processing: line up grids, flag bad spots, quantitate.
Looking at data from a single experiment [Figure: two scatter plots, “3-AT vs. No drug” and “wild-type vs. wild-type”; y-axis log10(ratio), x-axis log10(average intensity), both from -2 to 2]
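The two plot coordinates can be computed per spot from the channel intensities. This is a minimal sketch, assuming background-subtracted intensities are already in hand and taking “average intensity” to be the geometric mean of the two channels (the slide does not define it). The spot values are made up.

```python
import math

def ratio_intensity(red, green):
    """Return (log10 ratio, log10 average intensity) for one spot.

    'Average' is taken here as the geometric mean of the two channels,
    which is an assumption rather than something stated on the slide.
    """
    log_ratio = math.log10(red / green)
    log_average = 0.5 * (math.log10(red) + math.log10(green))
    return log_ratio, log_average

# Invented (Cy5, Cy3) intensity pairs: induced, unchanged, repressed
spots = [(1000.0, 100.0), (250.0, 250.0), (40.0, 400.0)]
for red, green in spots:
    print(ratio_intensity(red, green))
```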
Lowess smoothing: The names "lowess" and "loess" are derived from the term "locally weighted scatter plot smooth," as both methods use locally weighted linear regression to smooth data. http://www.mathworks.com/access/helpdesk/help/toolbox/curvefit/ch_data7.html
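To make the idea concrete, here is a small pure-Python sketch of a locally weighted linear fit, used to remove an intensity-dependent trend from log ratios. It uses tricube weights and omits the robustness iterations that full lowess implementations include. The `frac` default and the spot data are invented for illustration.

```python
def lowess(x, y, frac=0.5):
    """Locally weighted linear regression with tricube weights.

    Simplified sketch: no robustness iterations, O(n^2) time.
    """
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth = distance to the k-th nearest point
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Made-up spots whose log ratio drifts with log intensity
log_intensity = [1.0 + 0.1 * i for i in range(30)]
log_ratio = [0.2 * (a - 2.5) for a in log_intensity]

# Normalization: subtract the fitted trend from each spot's log ratio
trend = lowess(log_intensity, log_ratio)
normalized = [m - t for m, t in zip(log_ratio, trend)]
print(max(abs(v) for v in normalized))
```

After subtraction, the bulk of unchanged spots should sit near a log ratio of zero at every intensity, which is the goal of this normalization step.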
• Find spots • Manual edit • Quantitate • Normalize (“Lowess smoothing”: locally weighted scatterplot smoothing) • Confirm spots outside envelope • Save data, images, spot map
Selected tricks for processing and normalization • (1) High-pass spatial detrending. See: O. Shai, Q. Morris, and B.J. Frey (2003), Spatial Bias Removal in Microarray Images, University of Toronto Technical Report PSI-2003-21, http://www.psi.utoronto.ca/~ofer/detrendingReport.pdf • (2) VSN – “Variance Stabilizing Normalization”. See: Huber, W., Von Heydebreck, A., Sultmann, H., Poustka, A., & Vingron, M. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18, Suppl 1, S96-S104 (2002). Q. Morris, B. Frey, O. Shai
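The transform at the heart of VSN is an arsinh in place of the usual logarithm: it tracks the log at high intensities but stays finite and roughly linear near zero, including for negative background-subtracted values. The sketch below shows only that transform. The parameters `a` and `b` are placeholder values, whereas VSN estimates them from the data.

```python
import math

def arsinh_transform(intensity, a=0.0, b=0.01):
    """Generalized-log style transform, arsinh(a + b * intensity).

    a and b are illustrative placeholders, not fitted calibration values.
    """
    return math.asinh(a + b * intensity)

# Compare with the natural log of the same scaled value where it is defined
for value in (-50.0, 0.0, 50.0, 5000.0, 50000.0):
    scaled = 2 * 0.01 * value
    log_equivalent = math.log(scaled) if scaled > 0 else float("nan")
    print(value, round(arsinh_transform(value), 4), round(log_equivalent, 4))
```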
Photolithographic arrays (Affymetrix) Building up oligonucleotides on a surface: http://www.affymetrix.com/technology/manufacturing/index.affx
Photolithographic arrays (Affymetrix), aka “GeneChip”: probes are typically 25-mers, with a “mismatch” control for specificity
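By convention, a mismatch probe is the perfect-match 25-mer with its central (13th) base replaced by the complementary base. The sketch below builds one and computes a plain mean of perfect-match minus mismatch signal. The probe sequence and intensities are invented, and the averaging is a simplification rather than Affymetrix's actual summarization procedure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mismatch_probe(perfect_match):
    """Swap the central base of a probe for its complement."""
    mid = len(perfect_match) // 2  # index 12 (the 13th base) for a 25-mer
    return perfect_match[:mid] + COMPLEMENT[perfect_match[mid]] + perfect_match[mid + 1:]

def average_difference(pm_signals, mm_signals):
    """Mean of (perfect match - mismatch) over a probe set (simplified)."""
    diffs = [pm - mm for pm, mm in zip(pm_signals, mm_signals)]
    return sum(diffs) / len(diffs)

pm = "GAGTCACGGGCTGAAGAGTCACGGG"  # made-up 25-mer
print(pm)
print(mismatch_probe(pm))
print(average_difference([900.0, 1200.0, 400.0], [150.0, 300.0, 380.0]))
```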
Photolithographic arrays (Affymetrix)
Advantages:
• Density is limited essentially by the 5 micron resolution of scanners (solution: larger arrays).
• Well-developed protocols.
• “Industry standard” (largely self-driven).
Disadvantages:
• Not all probes work well. Affymetrix has evolved a complicated system to compensate for this, but even “believers” use at least four probes per gene, and usually more.
• Single color.
• Sample preparation typically requires amplification.
• Single supplier; historically intellectual property issues. (i.e. comparisons)
Ink-jet arrays (Agilent) [Figure: oligonucleotide synthesis, G A G T C A C G G G C T G A A] • 25,000 oligos / 1 x 3 inches • Sequence completely flexible • 60-mers Hughes TR et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol. 2001 Apr;19(4):342-7.
Ink-jet arrays generally agree with spotted cDNA arrays [Figure: scatter plots of yeast IJS array (~8 oligos per gene) against cDNA array, Spo vs. SC; panels for multiple oligos and single oligo; correlations shown are r = 0.97 and r = 0.96; labelled genes: HXT4, HXT3, HXT1]
Ink-jet arrays (Agilent)
Advantages:
• User-specified sequences; “no questions asked”
• Sensitivity and specificity are defined and exceed requirements for most expression profiling applications; no amplification required
• Virtually every 60-mer is functional
• Data correlate well with spotted cDNA arrays
Disadvantages:
• Density currently limited to ~45,000 spots per array.
• Single supplier (although a protocol is in press for making your own synthesizer!)
“Maskless” arrays (Nimblegen) http://www.nimblegen.com/technology/manufacture.html
“Maskless” arrays (Nimblegen)
Advantages:
• User-specified sequences.
• Density is limited essentially by the 5 micron resolution of scanners.
Disadvantages:
• New to the arena.
• Performance in initial publication (Nuwaysir et al., Genome Research, 2002) suggests that sensitivity and specificity may be lower than that of Agilent arrays.
• Single supplier, although all the parts are there for academics to build one.
• Possible IP issues. Hybridizations are done in Iceland to bypass Affymetrix IP. Nimblegen web site boasts of new partnership with Affymetrix.
Applications beyond expression profiling • DNA copy number • Genotyping • Protein-DNA associations • Molecular “Barcoding” • Protein arrays • Transformation arrays
Identifying DNA binding sites. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP, Young RA. Genome-wide location and function of DNA binding proteins. Science 2000 Dec 22;290(5500):2306-9.
Analysis of multiple experiments • Comparisons • Clustering • Predicting gene functions • Finding promoter elements
Comparing data from two experiments: scatter plot of ratios (intensity not displayed) [Figure: two scatter plots of log10(ratio), cup5 / WT against log10(ratio), vma8 / WT and against log10(ratio), mrt4 / WT, axes from -1.5 to 1.5; correlations shown are r = 0.88 and r = 0.09; labelled genes: CUP5, VMA8, MRT4] The behavior of two genes over many experiments can be compared in the same fashion
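A correlation of this kind can be computed directly from two vectors of log ratios. The sketch below assumes the r reported on the slide is the Pearson coefficient, which the slide does not state, and uses invented profiles. The same function works for two experiments compared across genes or two genes compared across experiments.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented log10 ratios for five transcripts in three experiments
experiment_a = [0.9, -0.6, 0.1, 0.4, -1.1]
experiment_b = [0.8, -0.5, 0.0, 0.5, -0.9]   # similar response
experiment_c = [0.1, 0.3, -0.2, -0.1, 0.2]   # unrelated response
print(round(pearson(experiment_a, experiment_b), 2))
print(round(pearson(experiment_a, experiment_c), 2))
```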
2-D clustering, Step 1: cut experiments and transcripts falling below P-value and ratio thresholds [Figure: heat map of 44 experiments x 407 genes; axes: experiment index and transcript response index; color scale from 10-fold repression to 10-fold induction]
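One way to implement this cut is to call a measurement significant when it passes both thresholds, then keep only the transcripts and experiments with at least one significant measurement. The threshold values (a log10 ratio of 0.3, roughly 2-fold, and P of 0.01), the `min_hits` rule, and the data are all assumptions for illustration. The slide does not give the values used.

```python
def significant(log_ratio, p_value, min_abs_log_ratio=0.3, max_p=0.01):
    """True if a measurement passes both the ratio and the P-value cut."""
    return abs(log_ratio) >= min_abs_log_ratio and p_value <= max_p

def filter_matrix(ratios, pvals, min_hits=1):
    """Return (kept genes, kept experiment indices).

    ratios and pvals map gene name -> list of values, one per experiment.
    """
    n_exp = len(next(iter(ratios.values())))
    hits = {g: [significant(r, p) for r, p in zip(ratios[g], pvals[g])]
            for g in ratios}
    genes = [g for g in ratios if sum(hits[g]) >= min_hits]
    experiments = [j for j in range(n_exp)
                   if sum(hits[g][j] for g in ratios) >= min_hits]
    return genes, experiments

# Invented log10 ratios and P-values for three genes in three experiments
ratios = {"geneA": [0.5, 0.0, -0.4], "geneB": [0.05, 0.02, 0.0], "geneC": [0.0, 0.0, 0.6]}
pvals = {"geneA": [0.001, 0.5, 0.002], "geneB": [0.4, 0.6, 0.9], "geneC": [0.3, 0.8, 0.0005]}
print(filter_matrix(ratios, pvals))
```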
2-D clustering, Step 2: cluster experiments and transcripts [Figure: clustered heat map; axes: experiment index and transcript response index; color scale from 10-fold repression to 10-fold induction; labelled experiment groups: ste mutants, RHO O/X, PKC O/X, treatment with alpha-factor] Data from Roberts et al., Science (2000)
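Two-dimensional clustering amounts to ordering the rows (transcripts) and then the columns (experiments) of the same matrix. The sketch below uses average-linkage agglomerative clustering with a correlation distance. That is one common choice and not necessarily the method behind this figure. The matrix is invented, and rows with zero variance are assumed to have been removed already.

```python
import math

def correlation_distance(a, b):
    """1 - Pearson correlation; assumes neither profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return 1.0 - sab / math.sqrt(saa * sbb)

def cluster_order(rows):
    """Average-linkage agglomerative clustering; returns a leaf order."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(correlation_distance(rows[a], rows[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]

# Invented matrix: rows are transcripts, columns are experiments
matrix = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 4.0, 6.0], [6.0, 4.0, 2.0]]
gene_order = cluster_order(matrix)
experiment_order = cluster_order([list(col) for col in zip(*matrix)])
print(gene_order, experiment_order)
```

Reordering the matrix by `gene_order` and `experiment_order` places similar profiles next to each other, which is what makes blocks visible in a heat map.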
There are many types of clustering. One example: K-means (must choose K). See: Sherlock G. Analysis of large-scale gene expression data. Curr Opin Immunol. 2000 Apr;12(2):201-5. [Figure: K-means result with K = 10; clusters #1, #2, #3 shown]
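A bare-bones K-means loop alternates between assigning each profile to its nearest center and moving each center to the mean of its members. The sketch below is a minimal version with Euclidean distance and randomly chosen starting points. The fixed iteration count, the seed, and the toy data are assumptions made for the example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance
        labels = [min(range(k),
                      key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
                  for p in points]
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Two invented, well-separated groups of expression profiles
profiles = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

The result depends on K and on the starting centers, so in practice it is worth rerunning with several seeds and values of K.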
Basics of clustering freeware: Eisen’s “Cluster” and “Treeview”. Mike Eisen's web site: rana.lbl.gov/EisenSoftware.htm. “Cluster” loads an Excel file (saved as tab-delimited text). [Figure: example input file format; screenshots of Cluster and Treeview] (also: “TreeArrange” - http://monod.uwaterloo.ca/downloads/treearrange/) There are also many commercial programs available.
[Diagram: cell, nucleus, mRNA, protein]
[Diagram linking: microarray expression data; co-regulated groups of genes; cis, trans regulators; functional categories; predict functions of new genes]
Non-overlapping yeast gene expression clusters
Cluster label | Sample genes
amino acid metabolism | TRP4, HIS3
arginine biosynthesis | ARG1, ARG3
arginine catabolism | CAR1, CAR2
aromatic AA metabolism | ARO9, ARO10
asparagine biosynthesis | ASN1, ASN2
branched chain AA synth | ILV1,2,3,6
lysine biosynthesis | LYS2, LYS9
methionine biosynthesis | MET3,16,28
sulfur AA tnsprt, metab | MUP1, MHT1
adenine biosynthesis | ADE1,4,8
aldehyde metabolism | AAD4,14,16
biotin biosynthesis | BIO3,4
citrate metabolism | CIT1,2
ergosterol biosynthesis | ERG1,5,11
fatty acid biosynthesis | FAS1, FAS2
gluconeogenesis | PGK1, TDH1,2,3
NAD biosynthesis | BNA4,6
one-carbon metabolism | GCV1,2,3
pyridoxine metabolism | SNO1, SNZ1
thiamin biosynthesis 1 | THI5,12
thiamin biosynthesis 2 | THI2,20
hexose transport | HXT4, GSY1
sodium ion transport | ENA1,2,5
polyamine transport | TPO2,3
nucleocytoplasmic transport | KAP123, NUP100
ribosome/RNA biogenesis | MAK16, CBF5
ribosomal proteins | RPS1A, RPL28
translational elongation | TEF1,2
protein folding | SSA1, HSP60
secretion | VTH1, KRE11
protein glycosylation | ALG6, CAX4
vesicle-mediated transport | VPS5, IMH1
proteasome | RPN6, RPT5
vacuole fusion | VTC1,3,4, PHO84
mitoribosome/respiration | MRPL1, MRPS5
Mitochond. electron trans. | ATP1, COX4
iron transport/TCA cycle | FRE1, FET3
Chromatin/transcription | SNF2, CHD1, DOT6
histones | HTA1, HHF1
MCM2/3/6/CDC47 | MCM2,3,6
DNA replication | RFA1, POL12
mitotic cell cycle | SPC110, CIN8
CLB1/CLB6/BBP1 | CLB1,6
cytokinesis | CTS1, EGT2
development | PAM1, GIC2
pheromone response | FUS3, FAR1
conjugation | CIK1, KAR3
sporulation/meiosis | SPO11, SPO19
response to oxidative stress | GDH3, HYR1
stress/heat shock | HSP104, SSA4
Candidate regulators (in slide order; some clusters have no entry, so rows cannot be matched here): GCN4; ARG80/81; ARG80/81/UME6/RPD3; ARO80; GCN4/HAP1/HAP2; LEU3, GCN4; LYS14; CBF1, MET28, MET32; MET31, MET32; BAS1, BAS2, GCN4; RTG3; ECM22/UPC2; INO4; GCR1; THI2/THI3; THI2/THI3; GCR1; NRG1, MIG1; HAA1; RRPE-binding factor; PAC/RRPE-binding factors; HAC1, ROX1; RLM1; XBP1; RPN4; PHO4; HAP2/3/4/5; MAC1/RCS1/AFT1/PDR1/3; HIR1, HIR2; ECB; MCB; HCM1; FKH1; ACE2, SWI4; MATALPHA2, STE12; KAR4; NDT80; ROX1, MSN2, MSN4; MSN2, MSN4
[Figure: heat maps, 424 experiments; 249 genes and 1,226 genes] Chua et al., 2004
Analyzing clusters: amino acid biosynthesis (p < 10^-14), amino acid metabolism (p < 10^-14), methionine metabolism (p = 1.07×10^-7). Some web resources for promoter analysis: YRSA (http://forkhead.cgb.ki.se/YRSA/define1.htm), AlignACE (http://atlas.med.harvard.edu/cgi-bin/fullanalysis.pl)
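P-values for the overlap between a cluster and a functional category are commonly obtained from the hypergeometric distribution. The slide does not say which test produced the values above, so treat the following as one plausible sketch. All counts in the example are invented.

```python
from math import comb

def enrichment_p(population, annotated, cluster_size, overlap):
    """P(at least `overlap` annotated genes in a random cluster).

    population   : total genes considered
    annotated    : genes in the functional category
    cluster_size : genes in the cluster
    overlap      : cluster genes that are also in the category
    """
    total = comb(population, cluster_size)
    upper = min(annotated, cluster_size)
    tail = sum(comb(annotated, k) * comb(population - annotated, cluster_size - k)
               for k in range(overlap, upper + 1))
    return tail / total

# Invented example: 6000 genes, 100 in the category, cluster of 50, 10 shared
print(enrichment_p(6000, 100, 50, 10))
```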
GO-Biological Process categories, with number of annotated genes (mouse):
• Very broad: metabolism (1548), development (2341)
• Broad: vision (163), CNS development (137)
• Mid-level: eye morphogenesis (21), ATP biosynthesis (36), pigment metabolism (25), striated muscle contraction (33)
• Narrow: eye pigment metabolism (3), insulin secretion (4)
GO-Biological Process hierarchy [Diagram: hierarchy relating metabolism, development, CNS development, pigment metabolism, eye morphogenesis, and eye pigment metabolism]
Other types of categorical annotations: • KEGG, EC numbers (describe biochemical “pathways”) • MIPS, YPD (yeast databases – older than GO) • Results of individual studies (localization, 2-hybrid screens, protein complexes, etc.) • Sequence motifs, structural domains (pfam, SMART) • Other people’s microarray clusters • etc. **When testing clusters against many different types of categorical annotations, one should consider correcting for multiple testing, and also consider that categories are often not independent
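Two standard corrections are sketched below: Bonferroni, which multiplies each P-value by the number of tests, and the Benjamini-Hochberg step-up adjustment, which controls the false discovery rate. Neither is named on the slide, so both are offered as examples, with invented P-values. Bonferroni is conservative when categories overlap, and the usual Benjamini-Hochberg guarantee assumes independent or positively dependent tests, so the caveat above about non-independent categories still applies.

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]  # invented P-values from four category tests
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```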
[Diagram: cell, nucleus, mRNA, protein]
Big questions: To what degree are functional pathways coordinately regulated? What controls the observed regulations?
Exploring mouse gene expression using Ink-jet Oligonucleotide Arrays [Figure: oligonucleotide synthesis, G A G T C A C G G G C T G A A] • 22,000 oligos / 1 x 3 inches • Sequence completely flexible • Mouse “42K” array: NCBI GenomeScan predictions (“XM”) on mouse draft sequence • Includes: 25K with cDNA (75% of 18K RefSeq genes); 30K with cDNA or EST; 12K potential new genes **Wen Zhang
Exploring mouse gene expression using Ink-jet Oligonucleotide Arrays • Collect 55 different mouse tissues from experts: Janet Rossant, Jane Aubin, Derek van der Kooy, Michael Fehlings, Benoit Bruneau** • Analyze mRNA levels on arrays (1 mg poly-A) **Wen Zhang
Analysis of 55 mouse tissues: QC
[Figure: heat map with tissue labels Testis, Olfactory bulb, Brain, Eye, ES, Skel. Muscle, Liver, Femur, Teeth, Placenta, Prostate, Lymph node, Spleen, Digit, Tongue, Trachea, Large intestine, Colon; legend: Unchar. cDNA, EST, Gene trap, Transcription factor, RNA binding/RS domain]
Description | Accession
Hypothetical protein FLJ20519 | XM_131066.1
Testis nuclear RNA binding ptn (Tenr) | XM_124039.1
DEAD box polypeptide 4 (Ddx4) | XM_127536.1
Deleted in azoospermia-like (Dazl) | XM_123141.1
RIKEN cDNA 1700001N01 | XM_125027.1
LOC235045 | XM_134745.1
Sim. to serine protease inhibitor | XM_144364.1
RIKEN cDNA 1700067I02 | XM_132042.1
LOC245536 (LOC245536), mRNA | XM_159329.1
Hematopoietic cell transcript 1 | XM_125337.1
Chr 7 expressed (D7Wsu180e) | XM_124875.1
Sim. to orphan receptor (LOC215448) | XM_122095.1
Poly(rC) binding ptn. 3 (Pcbp3) | XM_122063.1
Voltage-dep. R-type Ca++ channel a-1E | XM_123530.1
Ataxin 2 binding protein 1 | XM_147994.1
Sim. to HuC | XM_134734.1
Ventral neuron-specific ptn 1 NOVA1 | XM_138026.1
Poly(rC) binding ptn 4 (Pcbp4) | XM_125213.1
LOC217874 | XM_127170.1
LOC239368 | XM_139399.1
Zinc finger protein 97 | XM_134010.1
RIKEN cDNA 2400008B06 | XM_134886.1
Metal-response element tx factor 2 (Mtf2) | XM_132195.1
LOC231661 | XM_132381.1
LOC231903 | XM_149717.1
Related to CG7582 (LOC232810) | XM_133152.1
RIKEN cDNA 1300006E06 | XM_128315.1
Sim. to protease (LOC211700) | XM_136425.1
Hypothetical protein FLJ22774 | XM_139234.1
RIKEN cDNA 5430427O21 | XM_132158.1
Nuclear RNA export factor 6 (Nxf6) | XM_142153.1
Sim. to serine protease inhibitor 14 | XM_147352.1
Sim. to serine protease inhibitor 13 | XM_122538.1
Hypothetical ZNF protein KIAA0961 | XM_145503.1
KIAA0215 gene product | XM_135809.1
LOC227582 | XM_149095.1
Sim. to HMG-BOX tx factor BBX | XM_147194.1
LOC214566 | XM_150017.1
FN5 protein (Fn5) | XM_147333.1
LOC229850 | XM_149402.1
LOC229555 | XM_130999.1
Ribonuclease L (Rnasel) | XM_136286.1
(2-5)oligo(A) synthetase 1A | XM_132373.1
GAPDH
**Malina Bakowski, Blencowe lab
Are functional pathways coordinately regulated? Compiled annotations from 992 GO “Biological process” categories for 7,779 genes on the array (from EBI and MGI/JAX) (considered only categories with >3 and <500 genes) **GO evidence codes (and manual inspection) indicate that very few annotations are based purely on expression
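The size filter on categories is simple to express in code. In this sketch the category names and gene sets are invented placeholders, and the bounds default to the >3 and <500 limits quoted above.

```python
def filter_categories(categories, min_exclusive=3, max_exclusive=500):
    """Keep categories with more than 3 and fewer than 500 annotated genes."""
    return {name: genes for name, genes in categories.items()
            if min_exclusive < len(genes) < max_exclusive}

# Placeholder annotation table: category name -> set of gene identifiers
example = {
    "tiny category": {"g1", "g2", "g3"},
    "usable category": {"g1", "g2", "g3", "g4"},
    "huge category": {f"g{i}" for i in range(500)},
}
print(sorted(filter_categories(example)))
```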
Gene expression reflects gene function [Figure: heat map of GO categories across 55 mouse tissues/samples; color scale: ratio over median, from <1 through 3 and 7 to >20] Categories shown: Polyamine biosynthesis, Oxidative phosphorylation, Muscle contraction, Epidermal differentiation, Cell:cell adhesion, Regulation of neurotransmitter levels, Synaptic transmission, Axonogenesis, RNA splicing, Cytokinesis, Microtubule-based movement, M phase, Serine biosynthesis, Pregnancy, Fertilization, Bone remodeling, Skeletal development
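The quantity on the color scale can be computed per gene by dividing each tissue's expression level by that gene's median across all tissues. This is a minimal sketch, assuming positive, non-log intensities. The values are invented.

```python
from statistics import median

def ratio_over_median(levels):
    """Divide each tissue's level by the gene's median across tissues."""
    med = median(levels)
    return [value / med for value in levels]

# Invented intensities for one gene in five tissues; the fourth stands out
gene_levels = [120.0, 100.0, 90.0, 2500.0, 110.0]
print([round(r, 2) for r in ratio_over_median(gene_levels)])
```

A tissue-specific gene shows up as one or a few large ratios against a background near 1.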