BIONF/BENG 203: Functional Genomics • Lecture TI 1,2 • Trey Ideker • UCSD Departments of Medicine & Bioengineering • Sources of Functional Data • Lectures 1 and 2
Instructors • Trey Ideker • Vineet Bafna • Anand Patel (TA)
Grading • 40% Problem Sets (best 4 of 5) • 30% Midterm • 30% Final Project
Topics Covered By This Course • Signal detection in bioinformatics • Large-scale data generation platforms • Understanding next-gen sequencing data • Understanding mass spectrometry data • Clustering and Classification • Genotype-phenotype association • Understanding physical & genetic networks • Gene network inference and evolution
Bioinformatics as Signal Detection Ideker, Dutkowski, Hood. Cell 2011
Power, FDR, and all that... Test Statistic t Ideker, Dutkowski, Hood. Cell 2011
An Example: Pathway-Level Integration of Genome-wide Association Studies • Segrè et al., 2010 • A.V. Segrè, L. Groop, V.K. Mootha, M.J. Daly and D. Altshuler, PLoS Genet. 6 (2010), p. e1001058.
Classes of biological measurements
1) Molecular States • DNA sequence / genotype: Next-gen sequencing, SNP & CNV arrays • Gene expression: DNA microarrays, mRNA sequencing • Protein levels, locations, mods: Mass spectrometry, fluorescence microscopy, protein arrays
2) Molecular Networks • Protein-protein interactions: Two-hybrid system, coIP, protein antibody array • Protein-DNA interactions: Chromatin IP (ChIP) sequencing • Protein-compound
3) Phenotypic traits • Physiological or disease state, binary or quantitative • Growth rate, response to stimulus or stress • Behaviors
Pyrosequencing Note: No actual houses are burned down in pyrosequencing
Pyrosequencing (454 Life Sciences / Roche) • A luciferase is an enzyme that emits light in the presence of ATP. Several organisms, such as the American firefly and the poisonous Jack-o-lantern mushroom, produce luciferases.
Detecting polymerase activity • Recall: Pyrophosphate, also known as PPi, is “two phosphate groups stuck together”. During replication, each addition of a dNTP releases pyrophosphate. • In the reaction mixture, PPi allows adenosine phosphosulfate (APS) to be converted to ATP (by the enzyme ATP sulfurylase); this ATP allows luciferase to luciferate (emit light). • Measures strand extension as it happens.
Pyrosequencing cycle • Add dATP. If light is emitted, your sequence starts with A. If not, the dATP is degraded (or elutes past the immobilized primer). • Add dGTP. If light is emitted, the next base must be a G. • Then add dTTP, then dCTP. You now know at least one (maybe more) base of the sequence. • Repeat!
Pyrosequencing output • Runs of identical bases produce proportionally higher peaks – for instance, the sequence for (a) is GGCCCTTG. • Sample (c) comes from a heterozygous individual (hence the heights in multiples of ½).
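The cycle and the peak heights described above can be sketched in a few lines of Python. This is an idealized toy model, not the instrument's base-calling software: it assumes a noise-free signal exactly proportional to run length, the A, G, T, C dispensation order from the cycle slide, and that the input string is the strand being synthesized. The function names `flow_heights` and `heterozygous_heights` are made up for illustration.

```python
from itertools import cycle, islice


def flow_heights(read, n_flows, order="AGTC"):
    """Idealized peak height for each nucleotide dispensation.

    `read` is the strand being synthesized; a run of k identical bases
    is extended in a single flow and gives a peak of height k.
    """
    pos = 0
    heights = []
    for nt in islice(cycle(order), n_flows):
        run = 0
        # Extend through every consecutive base matching this nucleotide.
        while pos < len(read) and read[pos] == nt:
            run += 1
            pos += 1
        heights.append(run)
    return heights


def heterozygous_heights(allele1, allele2, n_flows, order="AGTC"):
    """Average the two alleles' signals (assumed 50/50 mix of templates)."""
    h1 = flow_heights(allele1, n_flows, order)
    h2 = flow_heights(allele2, n_flows, order)
    return [(a + b) / 2 for a, b in zip(h1, h2)]


if __name__ == "__main__":
    print(flow_heights("GGCCCTTG", 10))
    print(heterozygous_heights("GGCT", "GGTT", 8))
```

For GGCCCTTG the non-zero peaks come out as 2, 3, 2, 1 (GG, CCC, TT, G), and the mixed GGCT / GGTT sample shows the half-height peaks expected where the two alleles differ.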
The X Prize Foundation • In October 2006, the X Prize Foundation established an initiative to promote the development of full genome sequencing technologies, called the Archon X Prize, intending to award $10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 (US) per genome." http://genomics.xprize.org/
Gene and Protein Expression • The transcriptome is the full complement of RNA molecules produced by a genome • The proteome is the full complement of proteins enabled by the transcriptome • DNA → RNA → protein • Genome → transcriptome → proteome • 30,000 genes → ??? RNAs → ??? proteins? • For example, the Drosophila gene Dscam can generate about 40,000 distinct transcripts through alternative splicing. • What is the minimum number of exons that would be required?
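One way to approach the exon question, sketched in Python. The answer depends on the splicing model, which the slide does not specify, so both models here are stated assumptions: (1) every alternative exon is independently included or skipped, giving 2^n isoforms from n exons; (2) mutually exclusive choices from exon clusters, using the commonly cited Dscam cluster sizes of 12, 48, 33 and 2 alternatives.

```python
import math


def min_cassette_exons(n_transcripts):
    """Smallest n with 2**n >= n_transcripts.

    Assumes each alternative exon is independently included or skipped.
    """
    return (n_transcripts - 1).bit_length()


def mutually_exclusive_isoforms(cluster_sizes):
    """Isoform count when exactly one exon is chosen from each cluster."""
    return math.prod(cluster_sizes)


if __name__ == "__main__":
    print(min_cassette_exons(40_000))                     # independent exons
    print(mutually_exclusive_isoforms([12, 48, 33, 2]))   # Dscam-style clusters
```

Under the independent model 16 exons suffice (2^15 = 32,768 is too few, 2^16 = 65,536 is enough). Under the cluster model the four Dscam clusters give 12 × 48 × 33 × 2 = 38,016 combinations, which is where the "about 40,000" figure comes from.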
mRNA Expression: Two dominant approaches • RNA sequencing • DNA microarrays • Others / older approaches: • EST sequencing • RT-PCR • Differential display • SAGE • Massively parallel signature sequencing (MPSS)
Microarrays • Monitor the expression level of each gene: • Is it turned on or off in a particular biological condition? • Is this on/off state different between two biological conditions? • A microarray is a rectangular grid of spots printed on a glass microscope slide, where each spot contains DNA for a different gene
Two-color DNA microarray design • [Figure: two-color design, including the reverse transcription step]
Types of microarrays • Spotted (cDNA) • Robotic transfer of cDNA clones or PCR products • Spotting on nylon membranes or glass slides coated with poly-lysine • Synthetic (oligo) • Direct oligo synthesis on solid microarray substrate • Uses photolithography (Affymetrix) or ink-jet printing (Agilent) • 100,000 features per cm² • All configurations assume the DNA on the array is in excess of the hybridized sample; thus the kinetics are linear and the spot intensity reflects the amount of hybridized sample. • Labeling can be radioactive, fluorescent (one-color), or two-color
Microarray confocal scanner • Collects sharply defined optical sections from which 3D renderings can be created • The key is spatial filtering to eliminate out-of-focus light or glare in specimens whose thickness exceeds the immediate plane of focus. • Two lasers for excitation • Two color scan in less than 10 minutes • High resolution, 10 micron pixel size
Next-Gen Sequencing of mRNAs • cDNA = complementary or copy DNA • EST = Expressed Sequence Tag • The microarray could be described as a “closed system” because information about RNAs is limited by the targets available for hybridization. RNAs not represented on the array are not interrogated. • Direct sequencing of cDNAs overcomes this problem by large-scale random sampling of sequences from a whole-cell RNA extract • Statistical counting of distinct sequences provides a precise estimate of expression level • cDNA library can be normalized to capture rare messages • Has been dramatically enabled by large scale sequencing
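The "statistical counting" bullet can be illustrated with a minimal Python sketch. It assumes each read has already been assigned to a gene (the alignment step is not shown) and normalizes raw counts to counts per million so that libraries of different depth are comparable; the function name and the example gene labels are illustrative only.

```python
from collections import Counter


def counts_per_million(read_gene_ids):
    """Estimate expression as reads per gene, scaled to one million reads.

    `read_gene_ids` holds one gene identifier per sequenced read.
    """
    counts = Counter(read_gene_ids)
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


if __name__ == "__main__":
    # Ten hypothetical reads sampled from a library.
    reads = ["ACT1"] * 6 + ["CUP1"] * 3 + ["HO"]
    for gene, cpm in counts_per_million(reads).items():
        print(gene, cpm)
```

Because reads are a random sample, a rare message may receive few or no counts at shallow depth, which is the motivation for deeper sequencing or the library normalization mentioned above.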
mRNA Sequencing: Preparation of a cDNA library in phage λ vector
Proteomics • MS / MS • 1D and 2D SDS-PAGE
Mass spectrometry • Mass spectrometers consist of 3 essential parts: • Ionization source: Converts peptides into gas-phase ions (MALDI, ESI) • Mass analyzer: Separates ions by mass-to-charge (m/z) ratio (Ion trap, time of flight, quadrupole) • Ion detector: Current over time indicates amount of signal at each m/z value
A raw fragmentation spectrum • By calculating the molecular weight difference between adjacent ions of the same type (for example, consecutive b-ions or consecutive y-ions), the sequence can be determined. • Algorithms like SEQUEST use the fragmentation pattern to search through a complete protein database to identify the sequence which best fits the pattern.
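Reading a sequence from mass differences can be sketched as follows. This is an idealized toy, not how SEQUEST works (SEQUEST scores database candidates against the spectrum rather than reading the ladder directly). It assumes a complete, noise-free series of singly charged b-ions, unmodified residues, and standard monoisotopic residue masses; leucine and isoleucine have identical mass, so both are reported as L. The function names are made up for illustration.

```python
# Monoisotopic residue masses in Da (I is isobaric with L and is omitted).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # Da


def b_ion_ladder(peptide):
    """m/z of singly charged b-ions b1..bn for an unmodified peptide."""
    total = PROTON
    ladder = []
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def read_sequence(ladder, tol=0.01):
    """Map each gap between consecutive b-ions to a residue ('?' if none)."""
    seq = []
    prev = PROTON
    for mz in ladder:
        delta = mz - prev
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        seq.append(matches[0] if matches else "?")
        prev = mz
    return "".join(seq)


if __name__ == "__main__":
    ladder = b_ion_ladder("EACDPLR")
    print([round(mz, 3) for mz in ladder])
    print(read_sequence(ladder))
```

Real spectra are missing peaks, mix several ion types, and carry noise, which is why database search against predicted fragmentation patterns is the practical route.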
Isotope Coded Affinity Tags (ICAT) • Mass spec based method for measuring relative protein abundances between two samples • ICAT reagents: biotin tag, linker (d0 or d8), thiol-specific reactive group • Heavy reagent: d8-ICAT (X = deuterium) • Normal reagent: d0-ICAT (X = hydrogen) • [Figure: chemical structure of the ICAT reagent, with the eight X positions on the linker]
Protein Quantification & Identification via ICAT Strategy • Mixture 1 and Mixture 2: ICAT-labeled cysteines (light and heavy reagents) • Combine and proteolyze (trypsin) • Affinity separation (avidin) • Quantitation: light and heavy peaks of the same peptide (figure axes: m/z 550 to 580, relative intensity 0 to 100) • Protein identification: fragmentation spectrum of NH2-EACDPLR-COOH (figure axis: m/z 0 to 800) • ICAT Flash animation: http://occawlonline.pearsoned.com/bookbind/pubbooks/bc_mcampbell_genomics_1/medialib/method/ICAT/ICAT.html
ICAT continued • The heavy (blue) and light (gray) peptides are separated and quantified to produce a ratio for each peptide – here, a single peptide ratio is shown • Each peptide is subjected to CID fragmentation in the second MS stage in order to identify it
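The arithmetic behind the light/heavy pair can be sketched in Python. Assumptions, stated plainly: the heavy reagent differs from the light one only by eight deuteriums in place of hydrogens (as the d0/d8 naming suggests), each labeled cysteine carries one tag, and peak intensity is proportional to peptide abundance. The function names are illustrative.

```python
DEUTERIUM_SHIFT = 1.006277  # Da, mass(2H) minus mass(1H)


def icat_pair_spacing(n_cysteines, charge, n_deuterium=8):
    """Expected m/z gap between the light and heavy forms of one peptide."""
    return n_cysteines * n_deuterium * DEUTERIUM_SHIFT / charge


def abundance_ratio(light_intensity, heavy_intensity):
    """Relative abundance (heavy sample over light sample) for one peptide."""
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    # One labeled cysteine (as in EACDPLR), observed as a 2+ ion.
    print(round(icat_pair_spacing(n_cysteines=1, charge=2), 4))
    # Hypothetical peak intensities for the light and heavy peaks.
    print(abundance_ratio(light_intensity=80.0, heavy_intensity=40.0))
```

A peptide with one cysteine therefore appears as a doublet about 8.05 Da apart at charge 1+ and about 4.03 apart in m/z at charge 2+; the intensity ratio within the doublet is the per-peptide ratio described above.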
Gene replacement for yeast & other model species • Using gene replacement based on homologous recombination (HR), genes can be replaced with a drug resistance cassette, tagged with GFP, epitope tagged, etc.
Systematic phenotyping • Deletion strains: yfg1Δ, yfg2Δ, yfg3Δ, … • Barcodes (UPTAG): CTAACTC, TCGCGCA, TCATAAT, … • Rich media, then growth for 6 hrs in minimal media (how many doublings?) • Harvest and label genomic DNA
Systematic phenotyping with a barcode array • Ron Davis and friends… • These oligo barcodes are also spotted on a DNA microarray • Growth time in minimal media: • Red: 0 hours • Green: 6 hours
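A minimal Python sketch of how the two-color barcode readout could be turned into a per-strain growth score. It assumes that spot intensity is proportional to barcode abundance, that red is the 0 hour sample and green the 6 hour sample as listed above, and that a small pseudocount is acceptable to avoid dividing by zero; normalization between channels is not shown. The function name, strain labels, and intensities are hypothetical.

```python
import math


def barcode_log_ratios(red_0h, green_6h, pseudocount=1.0):
    """log2(green / red) per strain; negative values mean depletion."""
    return {
        strain: math.log2(
            (green_6h[strain] + pseudocount) / (red_0h[strain] + pseudocount)
        )
        for strain in red_0h
    }


if __name__ == "__main__":
    # Hypothetical spot intensities for three deletion strains.
    red = {"yfg1": 1000.0, "yfg2": 1000.0, "yfg3": 1000.0}
    green = {"yfg1": 1000.0, "yfg2": 250.0, "yfg3": 500.0}
    for strain, ratio in barcode_log_ratios(red, green, pseudocount=0.0).items():
        print(strain, ratio)
```

With properly normalized channels, a log2 ratio of about -k suggests the strain went through roughly k fewer doublings than the bulk of the pool during the 6 hours in minimal media.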
YFP tagging for protein localization • YFP is green, transmitted light is red • NIC96: Nuclear pore • TUB1: Tubulin cytoskeleton • HHF2: Histone, nucleus • BNI4: Bud neck • Images courtesy T. Davis lab • See also work by Weissman and O’Shea labs at UCSF