180 likes | 285 Views
Introduction to DNA Microarrays. Todd Lowe BME 88a March 11, 2003. Topics. Goal – study many genes at once Major types of DNA microarray How to roll your own Designing the right experiment Many pretty spots – Now what? Interpreting the data. The Goal. “Big Picture” biology –
E N D
Introduction to DNA Microarrays Todd Lowe BME 88a March 11, 2003
Topics • Goal – study many genes at once • Major types of DNA microarray • How to roll your own • Designing the right experiment • Many pretty spots – Now what? • Interpreting the data
The Goal “Big Picture” biology – • What are all the components & processes taking place in a cell? • How do these components & processes interact to sustain life? One approach: What happens to the entire cell when one particular gene/process is perturbed?
Genome Sequence Flood • Typical results from initial analysis of a new genome by the best computational methods: For 1/3 of the genes we have a “good”idea what they are doing (high similarity to exp. studied genes) For 1/3 of the genes, we have a guess at what they are doing (some similarity to previously seen genes) For 1/3 of genes, we have no idea what they are doing (no similarity to studied genes)
Large Scale Approaches • Geneticists used to study only one (or a few) genes at a time • Now, thousands of identified genes to assign biological function to • Microarrays allow massively parallel measurements in one experiment (3 orders of magnitude or greater)
Several types of arrays • Spotted DNA arrays • Developed by Pat Brown’s lab at Stanford • PCR products of full-length genes (>100nt) • Affymetrix gene chips • Photolithography technology from computer industry allows building many 25-mers • Ink-jet microarrays from Agilent • 25-60-mers “printed directly on glass slides • Flexible, rapid, but expensive
Basis: The Southern Blot Basic DNA detection technique that has been used for over 30 years, known as Southern blots: • A “known” strand of DNA is deposited on a solid support (i.e. nitocellulose paper) • An “unknown” mixed bag of DNA is labelled (radioactive or flourescent) • “Unknown” DNA solution allowed to mix with known DNA (attached to nitro paper), then excess solution washed off • If a copy of “known” DNA occurs in “unknown” sample, it will stick (hybridize), and labeled DNA will be detected on photographic film
Massive Increase in Measurements • Most commonly, 5-50 samples can be tested in each traditional Southern experiment • Affymetrix chips have >250,000 oligos per chip (multiple oligos per gene) • Microarray “spotters” are high-precision robots with metal pins that dip into DNA solution & tap down on glass slide (pins work like a fountain pen) • Currently, ~48,000 different DNA spots can fit on one glass microscope slide
Spotted Arrays relative cheap to make (~$10 slide) flexible - spot anything you want Cheap so can repeat experiments many times highly variable spot deposition usually have to make your own Accuracy at extremes in range may be less Affy Gene Chips expensive ($500 or more) limited types avail, no chance of specialized chips fewer repeated experiments usually more uniform DNA feaures Can buy off the shelf Dynamic range may be slightly better Pros/Cons of Different Technologies
Types of Array Exp • mRNA transcription analysis • Single experiment (control v. experimental) • Time course (multiple samples in same exp) • Genomic DNA -- similarity of genomes • Genetic Footprinting • Species cross hybridization (existence of a specific pathway in a related species)
Image Analysis & Data Visualization Cy5 Cy3 Cy5 Cy3 log2 Cy3 Cy5 Experiments 8 4 2 fold 2 4 8 Underexpressed Overexpressed Genes
What do we want to know? • Genes involved in a specific biological process (i.e. heat shock) • “Guilt by association” - assumption that genes with same pattern of changes in expression are involved the same pathway • Tumor classification - predict outcome / prescribe appropriate treatment based on clustering with “known outcome” tumors
Developing New Methods • How do you know when your method performs better than a previous method? • A “gold standard” test set for benchmarking array data doesn’t exist • There is too much biology we don’t know: if a new method classifies a gene in the “wrong” gene group, is it recognizing new biology, or just getting it wrong??
Limitations of Arrays • Do not necessarily reflect true levels of proteins - protein levels are regulated by translation initiation & degradation as well • Generally, do not “prove” new biology - simply suggest genes involved in a process, a hypothesis that will require traditional experimental verification • Expensive! $20-$100K to make your own / buy enough to get publishable data
Array + Sequence Analysis Promoter motif extraction (Church/ • Cluster / classify genes with common response pattern • Align upstream promoter regions (Gibb’s sampler) or count over-represented X-mers • Develop profile / motif from set & search genome for new candidates w/ motif • Return to array data, look for supporting evidence for new members • Carry out experiment to support hypothesis