BIONF/BENG 203: Functional Genomics. Lecture TI 1 Trey Ideker UCSD Department of Bioengineering. Sources of Functional Data Lectures 1 and 2. Grading. 40% Problem Sets (best 4 of 5) 30% Midterm 30% Final Project. Outline of the course. Biological data sources (2).
BIONF/BENG 203: Functional Genomics Lecture TI 1 Trey Ideker UCSD Department of Bioengineering Sources of Functional Data Lectures 1 and 2
Grading • 40% Problem Sets (best 4 of 5) • 30% Midterm • 30% Final Project
Outline of the course Biological data sources (2) Data pre-processing (6) Total of 17 lectures Project Presentations (2)
Functional Genomics Data • Expression mRNA, protein • Molecular interactions Protein, mRNA, small molecules • Knockout phenotypes 1st, 2nd, higher orders • SNP sequence (polymorphism) data • Imaging data Sub-cellular localization Cell morphology • Gene ontology
Dividing the data into two classes of information: Biological Networks and Network States 1) Directly observe the network “wires” themselves • Protein-protein interactions: • Two-hybrid system, coIP, protein antibody arrays • BIND, DIP • Protein-DNA interactions: • Chromatin IP • BIND, Transfac, SCPD • Other types not yet possible: • e.g., protein-small molecule 2) Observe molecular states that result from the interaction wiring • DNA/RNA Gene expression: • DNA microarrays, SAGE • Protein levels, locations, and modifications: • Mass spectrometry, fluorescence microscopy, protein arrays • Gross phenotypes: • e.g., growth rates of single and double deletion strains
High-throughput methods for measuring cellular states • Gene expression levels: RT-PCR, arrays • Protein levels, modifications: mass spec • Protein locations: fluorescent tagging • Metabolite levels: NMR and mass spec • Systematic phenotyping
The transcriptome and proteome • The transcriptome is the full complement of RNA molecules produced by a genome • The proteome is the full complement of proteins enabled by the transcriptome • DNA → RNA → protein • Genome → transcriptome → proteome • 30,000 genes → ??? RNAs → ??? proteins? • For example, the Drosophila gene Dscam can generate roughly 40,000 distinct transcripts through alternative splicing. • What is the minimum number of exons that would be required?
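One way to approach the closing question is sketched below. It assumes, purely for illustration, that every alternative exon is a cassette exon that is independently included or skipped, so n such exons allow 2^n isoforms. That assumption is not from the slide (Dscam itself uses clusters of mutually exclusive exons), and the function name is hypothetical.

```python
def min_cassette_exons(n_transcripts: int) -> int:
    """Smallest n with 2**n >= n_transcripts.

    Assumption: each exon is independently included or skipped.
    """
    n = 0
    while 2 ** n < n_transcripts:
        n += 1
    return n


# 2**15 = 32,768 is too few; 2**16 = 65,536 is enough.
print(min_cassette_exons(40_000))  # 16
```

Under this model, 16 independently spliced exons would be enough to produce 40,000 transcripts.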
Expression: High-throughput approaches RNA • DNA Microarrays • cDNA / EST sequencing • RT-PCR • Differential display • SAGE • Massively parallel signature sequencing (MPSS) Proteins • 2D PAGE • Mass spectrometry
Gene expression arrays They are really, really, really, really, really, really, really, really, really, really, really, really, really important
Microarrays • Monitors the level of each gene: • Is it turned on or off in a particular biological condition? • Is this on/off state different between two biological conditions? • Microarray is a rectangular grid of spots printed on a glass microscope slide, where each spot contains DNA for a different gene
Two-color DNA microarray design Reverse Transcription
Types of microarrays • Spotted (cDNA) • Robotic transfer of cDNA clones or PCR products • Spotting on nylon membranes or glass slides coated with poly-lysine • Synthetic (oligo) • Direct oligo synthesis on solid microarray substrate • Uses photolithography (Affymetrix) or ink-jet printing (Agilent) • All configurations assume the DNA on the array is in excess of the hybridized sample; thus the kinetics are linear and the spot intensity reflects the amount of hybridized sample. • Labeling can be radioactive, fluorescent (one-color), or two-color
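Because spot intensity is assumed proportional to the amount of hybridized sample, a two-color spot is commonly summarized as a log ratio of the two channels. The sketch below is a minimal illustration rather than a full normalization pipeline. The function name, the background arguments, the intensity floor, and the example intensities are all assumptions.

```python
import math


def log2_ratio(red, green, red_bg=0.0, green_bg=0.0, floor=1.0):
    """log2(red/green) for one spot after background subtraction.

    `floor` (an assumed safeguard) keeps a channel from going to zero or below.
    """
    r = max(red - red_bg, floor)
    g = max(green - green_bg, floor)
    return math.log2(r / g)


# Hypothetical spot: the red channel is four times brighter than the green.
print(log2_ratio(4000.0, 1000.0))  # 2.0
```

A value of 0 means equal signal in both samples. Positive and negative values mean higher signal in the red or green sample, respectively.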
Microarrays (continued) Imaging • Radioactive ³²P labeling: Autoradiography or phosphorimager • Fluorescent labeling: Confocal microscope (invented by Marvin Minsky!!) Feature density • Nylon membrane macroarrays 100-1000 features • Glass slide spotted array 5,000 features / cm² • Synthesized arrays 50,000 features / cm²
Microarray confocal scanner • Collects sharply defined optical sections from which 3D renderings can be created • The key is spatial filtering to eliminate out-of-focus light or glare in specimens whose thickness exceeds the immediate plane of focus. • Two lasers for excitation • Two color scan in less than 10 minutes • High resolution, 10 micron pixel size
cDNA / EST sequencing projects • cDNA = complementary or copy DNA • EST = Expressed Sequence Tag • The microarray could be described as a “closed system” because information about RNAs is limited by the targets available for hybridization. RNAs not represented on the array are not interrogated. • Direct sequencing of cDNAs (yielding ESTs) overcomes this problem by large-scale random sampling of sequences from a whole-cell RNA extract • Statistical counting of distinct sequences provides an estimate of expression level • Conversely, a cDNA library can be normalized to capture rare messages • Requires large-scale sequencing to get statistical significance
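The need for large-scale sequencing can be made concrete with a simple sampling model. The sketch below assumes clones are drawn independently from an un-normalized library, so a transcript at fraction f of the mRNA pool is missed by N clones with probability (1 - f)^N. The function names and the example frequency are illustrative assumptions.

```python
import math


def prob_seen(freq, n_clones):
    """Probability that a transcript at fraction `freq` appears at least once."""
    return 1.0 - (1.0 - freq) ** n_clones


def clones_needed(freq, confidence=0.95):
    """Number of clones to sequence to see the transcript with given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


# Hypothetical rare message: 1 in 100,000 mRNA molecules.
n = clones_needed(1e-5)
print(n, round(prob_seen(1e-5, n), 3))
```

For this rare message, roughly 300,000 clones are needed just to observe it once with 95% confidence, which is why normalized libraries and tag-based methods are attractive.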
cDNA / EST Sequencing: Preparation of a cDNA library in phage λ vector
SAGE Technology Serial Analysis of Gene Expression • Takes idea of sequence sampling to the extreme • Generates short ESTs (9-14 nt) which are joined into long concatemers and then sequenced • 4^9 is 262,144, ~5-fold the number of human genes • The count of each type of tag estimates RNA copy number • >50X more efficient than cDNA sequencing because many RNAs are represented in a single sequencing run
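The tag-space arithmetic above is easy to check. This short sketch counts the distinct sequences available at a given tag length. The function name is hypothetical, and the lengths shown are the endpoints of the 9-14 nt range quoted on the slide.

```python
def tag_space(length: int) -> int:
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length


for n in (9, 14):
    print(n, tag_space(n))
# 9 262144
# 14 268435456
```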
Steps to SAGE • Copy mRNA → ds cDNA using biotinylated oligo(dT) • Cleave with anchoring enzyme (AE), which cleaves within ~250 bp of the poly-A tail at the 3’ end. • Capture this segment on streptavidin beads • Ligate to linkers containing a type IIS restriction site, which directs cleavage of the DNA 14 bp away from this site. • Ligate sequences to each other and PCR amplify • Cleave with AE to remove linkers • Concatenate, clone, and sequence
WHY DI-TAGS? Ditags are used to detect bias in the PCR amplification step. The probability of any two tags being coupled in the same ditag is small. Biased amplification can be detected as many ditags always having the same 2 tags present. Velculescu et al. Science (1995) [Figure: ditags flanked by Primer A and Primer B]
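The ditag check described above can be sketched as a duplicate count. This is a minimal illustration rather than the published pipeline. The function names, the `max_copies` threshold, and the example sequences are assumptions. A ditag and its reverse complement are treated as the same molecule.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(ditag):
    """Strand-independent key: the smaller of the ditag and its reverse complement."""
    return min(ditag, revcomp(ditag))


def repeated_ditags(ditags, max_copies=1):
    """Ditags seen more than `max_copies` times, a possible sign of PCR bias."""
    counts = Counter(canonical(d) for d in ditags)
    return {d: c for d, c in counts.items() if c > max_copies}


# Hypothetical reads: the first ditag occurs twice, once on each strand.
reads = ["AAACCCGGGTTTAC", revcomp("AAACCCGGGTTTAC"), "ACGTTGCAAGGCTT"]
print(repeated_ditags(reads))  # {'AAACCCGGGTTTAC': 2}
```

In practice, repeated ditags are typically collapsed to a single copy before the individual tags are counted.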
SAGE (continued) Example of a concatemer: CATGACCCACGAGCAGGGTACGATGATACATGGAAACCTATGCACCTTGGGTAGCACATG TAG1 TAG2 TAG3 TAG4 Counting the tags:
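Tag counting on the example concatemer can be sketched as follows. The code assumes the anchoring-enzyme site is CATG (the example sequence begins and ends with it), that each stretch between sites is one ditag, and that each ditag contributes one tag from each end, the second read as a reverse complement. The tag length of 10 and all function names are illustrative assumptions.

```python
from collections import Counter

ANCHOR = "CATG"  # assumed anchoring-enzyme recognition site
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_ditags(concatemer):
    """Cut the concatemer at every anchor site and drop empty fragments."""
    return [d for d in concatemer.split(ANCHOR) if d]


def count_tags(concatemer, tag_len=10):
    """Count the tags taken from both ends of every ditag."""
    counts = Counter()
    for ditag in split_ditags(concatemer):
        counts[ditag[:tag_len]] += 1            # tag adjacent to the left anchor
        counts[revcomp(ditag[-tag_len:])] += 1  # tag adjacent to the right anchor
    return counts


example = "CATGACCCACGAGCAGGGTACGATGATACATGGAAACCTATGCACCTTGGGTAGCACATG"
for tag, n in count_tags(example).items():
    print(tag, n)
```

On this example the sketch finds two ditags and therefore four tags, matching the TAG1 to TAG4 labels. Tallying tags over many concatemers gives the per-gene counts.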
Proteomics SDS PAGE 2D PAGE MS/MS
An example SDS-PAGE How many proteins are in a band? Protein stains: Silver Copper Coomassie Blue
2D-PAGE Dimension 2: size Dimension 1: Isoelectric focusing gel
Mass spectrometry Mass spectrometers consist of three essential parts • Ionization source: Converts peptides into gas-phase ions (MALDI + ESI) • Mass analyzer: Separates ions by mass to charge (m/z) ratio (Ion trap, time of flight, quadrupole) • Ion detector: Current over time indicates amount of signal at each m/z value
A raw fragmentation spectrum By calculating the molecular weight difference between ions of the same type the sequence can be determined. SEQUEST uses the fragmentation pattern to search through a complete protein database to identify the sequence which best fits the pattern.
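Reading a sequence from mass differences can be sketched as follows. The code builds an idealized, singly charged b-ion ladder and reads residues back from consecutive differences. It uses standard monoisotopic residue masses and ignores noise, missing peaks, and modifications (including the ICAT label on cysteine). The function names and the 0.01 Da tolerance are assumptions. This is not how SEQUEST works, since SEQUEST matches spectra against a database.

```python
# Monoisotopic amino acid residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def b_ions(peptide):
    """Idealized singly charged b-ion m/z values (b1, b2, ...) for a peptide."""
    total, ladder = PROTON, []
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def read_ladder(mz_values, tol=0.01):
    """Assign a residue to each gap between consecutive ions of the same type."""
    calls = []
    for low, high in zip(mz_values, mz_values[1:]):
        diff = high - low
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls


print(read_ladder(b_ions("EACDPLR")))  # ['A', 'C', 'D', 'P', 'L/I', 'R']
```

Leucine and isoleucine have identical masses, so a mass ladder alone cannot tell them apart. The first residue is not recovered from differences and must come from the lowest ion itself.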
Isotope Coded Affinity Tags (ICAT) Mass spec based method for measuring relative protein abundances between two samples • Heavy reagent: d8-ICAT (X = deuterium) • Normal reagent: d0-ICAT (X = hydrogen) ICAT reagents: • Biotin tag • Linker (d0 or d8) • Thiol-specific reactive group [Figure: ICAT reagent structure, with the eight X positions on the linker]
Protein Quantification & Identification via ICAT Strategy • Mixture 1 (light) and Mixture 2 (heavy): ICAT-labeled cysteines • Combine and proteolyze (trypsin) • Affinity separation (avidin) • Quantitation: light and heavy peaks in the MS spectrum (m/z 550-580) • Protein identification: MS/MS spectrum (m/z 200-800), e.g., NH2-EACDPLR-COOH ICAT Flash animation: http://occawlonline.pearsoned.com/bookbind/pubbooks/bc_mcampbell_genomics_1/medialib/method/ICAT/ICAT.html
ICAT continued • The heavy (blue) and light (gray) peptides are separated and quantified to produce a ratio for each peptide – here, a single peptide ratio is shown • Each peptide is subjected to CID fragmentation in the second MS stage in order to identify it
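The ratio step can be sketched as a search for peak pairs separated by the d8 versus d0 mass shift. The code assumes one labeled cysteine per peptide, a known charge state, and a centroided peak list of (m/z, intensity) tuples. The function name, tolerance, and example peaks are hypothetical.

```python
# Mass shift from replacing 8 hydrogens with 8 deuteriums, about 8.05 Da.
D8_SHIFT = 8 * (2.014102 - 1.007825)


def icat_pairs(peaks, charge=1, tol=0.02):
    """Find (light m/z, heavy m/z, heavy/light ratio) for peaks one ICAT shift apart.

    Assumes a single ICAT-labeled cysteine per peptide.
    """
    delta = D8_SHIFT / charge
    pairs = []
    for mz_light, int_light in peaks:
        for mz_heavy, int_heavy in peaks:
            if abs((mz_heavy - mz_light) - delta) <= tol:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs


# Hypothetical spectrum: one light/heavy pair plus an unrelated peak.
spectrum = [(560.30, 100.0), (568.35, 50.0), (575.00, 20.0)]
print(icat_pairs(spectrum))  # [(560.3, 568.35, 0.5)]
```

A ratio of 0.5 in this made-up example would mean the peptide is half as abundant in the heavy-labeled sample as in the light-labeled one.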
Metabolomic measurements 2D NMR or mass spectrometry Currently not global and in less widespread use than microarrays, but have tremendous potential
Gene knockout and RNAi libraries for model species Example from yeast: Replacement of yeast ORFs with kanMX gene flanked by unique oligo barcodes (Yeast Deletion Project Consortium)
YFP tagging for protein localization YFP is green, transmitted light is red • NIC96: Nuclear pore • TUB1: Tubulin cytoskeleton • HHF2: Histone, nucleus • BNI4: Bud neck Images courtesy T. Davis lab See also recent work by Weissman and O’Shea labs at UCSF
Systematic phenotyping • Deletion strain: yfg1Δ, yfg2Δ, yfg3Δ, … • Barcode (UPTAG): CTAACTC, TCGCGCA, TCATAAT, … • Rich media • Growth 6hrs in minimal media (how many doublings?) • Harvest and label genomic DNA
Systematic phenotyping with a barcode array Ron Davis and friends… These oligo barcodes are also spotted on a DNA microarray Growth time in minimal media: • Red: 0 hours • Green: 6 hours
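The barcode-array readout can be summarized per strain as a log ratio of the 6-hour signal to the 0-hour signal. The sketch below uses the three UPTAG barcodes shown earlier in the deck. The intensities are made up, and the function name and dictionary layout are assumptions.

```python
import math


def barcode_log_ratios(t0, t6):
    """log2(6 h / 0 h) intensity for every barcode measured at both time points."""
    return {tag: math.log2(t6[tag] / t0[tag]) for tag in t0 if tag in t6}


# Hypothetical intensities (red = 0 hours, green = 6 hours).
red_0h = {"CTAACTC": 1000.0, "TCGCGCA": 1000.0, "TCATAAT": 1000.0}
green_6h = {"CTAACTC": 1000.0, "TCGCGCA": 250.0, "TCATAAT": 2000.0}

ratios = barcode_log_ratios(red_0h, green_6h)
for tag in sorted(ratios, key=ratios.get):  # most depleted strain first
    print(tag, ratios[tag])
# TCGCGCA -2.0
# CTAACTC 0.0
# TCATAAT 1.0
```

A strongly negative value marks a deletion strain that dropped out of the pool during growth in minimal media, which suggests the deleted gene is needed under that condition.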
Molecular Interactions Among proteins, mRNA, small molecules, and so on…
• Protein→DNA interactions (▲ Chromatin IP) / Gene levels (on/off) (▼ DNA microarray) • Protein-protein interactions (▲ Protein coIP) / Protein levels (present/absent) (▼ Mass spectrometry) • Biochemical reactions (▲ Not yet!!!) / Biochemical levels (▼ Metabolic flux measurements)
Also like sequence, protein interaction data are exponentially growing… (As are the false positives!!!) [Plots: EMBL Database Growth, total nucleotides (gigabases), and DIP Database Growth, total interactions, 1980-2000]
High-throughput methods for measuring interaction networks • 2-hybrid • co-immunoprecipitation w/ mass spec • chIP-on-chip • systematic genetic analysis
Yeast two-hybrid method Fields and Song
Detection of protein interactions with antibody arrays MacBeath and Schreiber
Kinase-target interactions Mike Snyder and colleagues