
BIONF/BENG 203: Functional Genomics

BIONF/BENG 203: Functional Genomics. Lecture TI 1 Trey Ideker UCSD Department of Bioengineering. Sources of Functional Data Lectures 1 and 2. Grading. 40% Problem Sets (best 4 of 5) 30% Midterm 30% Final Project. Outline of the course. Biological data sources (2).

Presentation Transcript


  1. BIONF/BENG 203: Functional Genomics Lecture TI 1 Trey Ideker UCSD Department of Bioengineering Sources of Functional Data Lectures 1 and 2

  2. Grading • 40% Problem Sets (best 4 of 5) • 30% Midterm • 30% Final Project

  3. Outline of the course • Biological data sources (2) • Data pre-processing (6) • Total of 17 lectures • Project Presentations (2)

  4. Functional Genomics Data • Expression: mRNA, protein • Molecular interactions: protein, mRNA, small molecules • Knockout phenotypes: 1st, 2nd, higher orders • SNP sequence (polymorphism) data • Imaging data: sub-cellular localization, cell morphology • Gene ontology

  5. Dividing the data into two classes of information: Biological Networks and Network States • 1) Directly observe the network “wires” themselves • Protein-protein interactions: two-hybrid system, coIP, protein antibody arrays (databases: BIND, DIP) • Protein-DNA interactions: chromatin IP (databases: BIND, Transfac, SCPD) • Other types not yet possible, e.g., protein-small molecule • 2) Observe molecular states that result from the interaction wiring • DNA/RNA gene expression: DNA microarrays, SAGE • Protein levels, locations, and modifications: mass spectrometry, fluorescence microscopy, protein arrays • Gross phenotypes: e.g., growth rates of single and double deletion strains

  6. High-throughput methods for measuring cellular states • Gene expression levels: RT-PCR, arrays • Protein levels, modifications: mass spec • Protein locations: fluorescent tagging • Metabolite levels: NMR and mass spec • Systematic phenotyping

  7. The transcriptome and proteome • The transcriptome is the full complement of RNA molecules produced by a genome • The proteome is the full complement of proteins enabled by the transcriptome • DNA → RNA → protein • Genome → transcriptome → proteome • 30,000 genes → ??? RNAs → ??? proteins? • For example, the Drosophila gene Dscam can generate roughly 40,000 distinct transcripts through alternative splicing. • What is the minimum number of exons that would be required?
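
One way to approach the exon question on this slide is sketched below. It assumes the simplest possible model, in which every alternative exon is an independent cassette exon that is either included or skipped, so n such exons yield 2^n transcripts. That model and the function name are assumptions made for illustration; the real Dscam gene uses clusters of mutually exclusive exons, which requires many more exons.

```python
def min_cassette_exons(n_transcripts: int) -> int:
    """Smallest n such that 2**n >= n_transcripts.

    Assumes each alternative exon is independently included or skipped
    (an illustrative model, not the actual Dscam architecture).
    """
    n = 0
    while 2 ** n < n_transcripts:
        n += 1
    return n


if __name__ == "__main__":
    # 2**15 = 32,768 is too few; 2**16 = 65,536 is enough.
    print(min_cassette_exons(40_000))
```

Under this model the answer for 40,000 transcripts is 16 alternative exons.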

  8. Expression: High-throughput approaches • RNA: • DNA Microarrays • cDNA / EST sequencing • RT-PCR • Differential display • SAGE • Massively parallel signature sequencing (MPSS) • Proteins: • 2D PAGE • Mass spectrometry

  9. Gene expression arrays They are really, really, really, really, really, really, really, really, really, really, really, really, really important

  10. Microarrays • Monitors the level of each gene: • Is it turned on or off in a particular biological condition? • Is this on/off state different between two biological conditions? • Microarray is a rectangular grid of spots printed on a glass microscope slide, where each spot contains DNA for a different gene

  11. Two-color DNA microarray design Reverse Transcription

  12. cDNA-chip of brain glioblastoma

  13. Types of microarrays • Spotted (cDNA) • Robotic transfer of cDNA clones or PCR products • Spotting on nylon membranes or glass slides coated with poly-lysine • Synthetic (oligo) • Direct oligo synthesis on solid microarray substrate • Uses photolithography (Affymetrix) or ink-jet printing (Agilent) • All configurations assume the DNA on the array is in excess of the hybridized sample, so the kinetics are linear and the spot intensity reflects the amount of hybridized sample. • Labeling can be radioactive, fluorescent (one-color), or two-color

  14. Microarray Spotter

  15. Affymetrix High Density Arrays

  16. Microarrays (continued) • Imaging • Radioactive ³²P labeling: autoradiography or phosphorimager • Fluorescent labeling: confocal microscope (invented by Marvin Minsky!!) • Feature density • Nylon membrane macroarrays: 100-1000 features • Glass slide spotted array: 5,000 features / cm² • Synthesized arrays: 50,000 features / cm²

  17. Microarray confocal scanner • Collects sharply defined optical sections from which 3D renderings can be created • The key is spatial filtering to eliminate out-of-focus light or glare in specimens whose thickness exceeds the immediate plane of focus. • Two lasers for excitation • Two color scan in less than 10 minutes • High resolution, 10 micron pixel size

  18. cDNA / EST sequencing projects • cDNA = complementary or copy DNA • EST = Expressed Sequence Tag • The microarray could be described as a “closed system” because information about RNAs is limited by the targets available for hybridization. RNAs not represented on the array are not interrogated. • Direct sequencing of cDNAs (yielding ESTs) overcomes this problem by large-scale random sampling of sequences from a whole-cell RNA extract • Statistical counting of distinct sequences provides an estimate of expression level • Conversely, a cDNA library can be normalized to capture rare messages • Requires large-scale sequencing to get statistical significance

  19. cDNA / EST Sequencing: Preparation of a cDNA library in phage λ vector

  20. SAGE Technology • Serial Analysis of Gene Expression • Takes the idea of sequence sampling to the extreme • Generates short ESTs (9-14 nt), which are joined into long concatemers and then sequenced • 4^9 is 262,144, ~5-fold the number of human genes • The count of each type of tag estimates RNA copy number • >50X more efficient than cDNA sequencing because many RNAs are represented in a single sequencing run

  21. Steps to SAGE • Copy mRNA → ds cDNA using biotinylated oligo(dT) • Cleave with an anchoring enzyme (AE), which cleaves within ~250 bp of the poly-A tail at the 3’ end • Capture this segment on streptavidin beads • Ligate to linkers containing a type IIs restriction site; the type IIs enzyme cleaves DNA 14 bp away from this site • Ligate sequences to each other and PCR amplify • Cleave with AE to remove linkers • Concatenate, clone, and sequence

  22. Velculescu et al. Science (1995) • WHY DI-TAGS? • Ditags are used to detect bias in the PCR amplification step. • The probability of any two tags being coupled in the same ditag is small. • Biased amplification can be detected as many ditags that always contain the same two tags. • (Figure: ditags flanked by linkers A and B, amplified with Primer A and Primer B)

  23. SAGE (continued) Example of a concatemer: CATGACCCACGAGCAGGGTACGATGATACATGGAAACCTATGCACCTTGGGTAGCACATG TAG1 TAG2 TAG3 TAG4 Counting the tags:
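
The tag-counting step can be sketched on the example concatemer from this slide. The sketch assumes that ditags are the stretches between CATG anchoring-enzyme sites, that each tag is 10 nt long (the slides give a range of 9-14 nt), and that the second tag of each ditag is read from the opposite strand. The function names and the fixed tag length are illustrative assumptions, not part of the original protocol description.

```python
from collections import Counter

# Example concatemer copied from the slide.
CONCATEMER = "CATGACCCACGAGCAGGGTACGATGATACATGGAAACCTATGCACCTTGGGTAGCACATG"
ANCHOR_SITE = "CATG"  # anchoring-enzyme site that punctuates the concatemer


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_ditags(concatemer: str, site: str = ANCHOR_SITE) -> list[str]:
    """Split a concatemer into ditags at each anchoring-enzyme site."""
    return [piece for piece in concatemer.split(site) if piece]


def tags_from_ditag(ditag: str, tag_len: int = 10) -> list[str]:
    """Extract the two tags of a ditag (tag_len = 10 is an assumption)."""
    left = ditag[:tag_len]
    right = reverse_complement(ditag[-tag_len:])
    return [left, right]


def count_tags(concatemer: str, tag_len: int = 10) -> Counter:
    """Count how often each tag occurs in a concatemer."""
    counts = Counter()
    for ditag in split_ditags(concatemer):
        counts.update(tags_from_ditag(ditag, tag_len))
    return counts


if __name__ == "__main__":
    for tag, n in count_tags(CONCATEMER).items():
        print(tag, n)
```

On this short example each of the four tags is seen once; over many sequenced concatemers, the counts per tag estimate relative RNA copy number.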

  24. Proteomics SDS PAGE 2D PAGE MS/MS

  25. An example SDS-PAGE How many proteins are in a band? Protein stains: Silver Copper Coomassie Blue

  26. 2D-PAGE Dimension 2: size Dimension 1: Isoelectric focusing gel

  27. 2D gel from macrophage phagosomes

  28. Mass spectrometry • Mass spectrometers consist of three essential parts • Ionization source: converts peptides into gas-phase ions (e.g., MALDI or ESI) • Mass analyzer: separates ions by mass-to-charge (m/z) ratio (e.g., ion trap, time of flight, quadrupole) • Ion detector: current over time indicates amount of signal at each m/z value

  29. MS/MS Overview

  30. MS/MS Overview

  31. A raw fragmentation spectrum • By calculating the molecular weight difference between adjacent ions of the same type, the sequence can be determined. • SEQUEST uses the fragmentation pattern to search through a complete protein database to identify the sequence which best fits the pattern.
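
The mass-difference idea can be illustrated with a small sketch. It builds an idealized, singly charged b-ion ladder for a peptide from standard monoisotopic residue masses and then reads the sequence back from the differences between consecutive ions. The peptide EACDPLR is borrowed from the ICAT slide and treated as unmodified; the complete, noise-free ladder, the function names, and the 0.01 Da tolerance are all assumptions for illustration. This is not how SEQUEST works, since SEQUEST matches spectra against a database rather than reading sequences de novo.

```python
PROTON = 1.00728  # mass of a proton in Da

# Monoisotopic residue masses in Da (isoleucine is isobaric with leucine
# and is therefore omitted; "L" stands for either).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ion_ladder(peptide: str) -> list[float]:
    """m/z values of the singly charged b ions b1..bn of a peptide."""
    total = PROTON
    ladder = []
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        ladder.append(total)
    return ladder


def read_sequence(ladder: list[float], tol: float = 0.01) -> str:
    """Infer residues from differences between consecutive b ions."""
    sequence = []
    previous = PROTON
    for mz in ladder:
        delta = mz - previous
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tol]
        sequence.append(matches[0] if matches else "?")
        previous = mz
    return "".join(sequence)


if __name__ == "__main__":
    ladder = b_ion_ladder("EACDPLR")
    print([round(mz, 3) for mz in ladder])
    print(read_sequence(ladder))
```

Real spectra are noisy and have missing ions, which is one reason database search tools are used in practice.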

  32. Tandem Mass Spec (MS/MS)

  33. Typical nanoelectrospray source

  34. Isotope Coded Affinity Tags (ICAT) • Mass spec based method for measuring relative protein abundances between two samples • ICAT reagents: • Heavy reagent: d8-ICAT (X = deuterium) • Normal reagent: d0-ICAT (X = hydrogen) • (Figure: reagent structure showing the biotin tag, the linker (d0 or d8) with eight positions X, and the thiol-specific reactive group)

  35. Protein Quantification & Identification via ICAT Strategy • Mixture 1 and Mixture 2: ICAT-labeled cysteines • Combine and proteolyze (trypsin) • Affinity separation (avidin) • Quantitation: light and heavy peaks in the mass spectrum (figure axis: m/z 550-580) • Protein identification: fragmentation spectrum (figure axis: m/z 200-800), e.g., NH2-EACDPLR-COOH • ICAT Flash animation: http://occawlonline.pearsoned.com/bookbind/pubbooks/bc_mcampbell_genomics_1/medialib/method/ICAT/ICAT.html

  36. ICAT continued • The heavy (blue) and light (gray) peptides are separated and quantified to produce a ratio for each peptide – here, a single peptide ratio is shown • Each peptide is subjected to CID fragmentation in the second MS stage in order to identify it
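
A minimal sketch of the arithmetic behind an ICAT peak pair follows. It assumes that the heavy reagent differs from the light one by exactly eight deuterium-for-hydrogen substitutions per labeled cysteine, as the d0/d8 naming on the reagent slide suggests. The function names and the peak intensities in the usage example are hypothetical.

```python
MASS_H = 1.007825   # monoisotopic mass of hydrogen (Da)
MASS_D = 2.014102   # monoisotopic mass of deuterium (Da)
LABEL_ATOMS = 8     # d8 vs d0: eight substituted positions per tag


def icat_pair_spacing(n_cysteines: int, charge: int) -> float:
    """Expected m/z distance between the heavy and light peaks of a peptide."""
    mass_shift = n_cysteines * LABEL_ATOMS * (MASS_D - MASS_H)
    return mass_shift / charge


def heavy_to_light_ratio(heavy_intensity: float, light_intensity: float) -> float:
    """Relative abundance of a peptide between the two samples."""
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    # One labeled cysteine, doubly charged peptide: peaks about 4.03 m/z apart.
    print(round(icat_pair_spacing(1, 2), 3))
    # Hypothetical peak intensities, for illustration only.
    print(heavy_to_light_ratio(5.0e5, 2.5e5))
```

The spacing identifies which two peaks belong to the same peptide, and the intensity ratio gives its relative abundance between the samples.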

  37. Metabolomic measurements 2D NMR or mass spectrometry Currently not global and in less widespread use than microarrays, but have tremendous potential

  38. Gene knockout and RNAi libraries for model species • Example from yeast: Replacement of yeast ORFs with kanMX gene flanked by unique oligo barcodes (Yeast Deletion Project Consortium)

  39. YFP tagging for protein localization • YFP is green, transmitted light is red • NIC96: Nuclear pore • TUB1: Tubulin cytoskeleton • HHF2: Histone, nucleus • BNI4: Bud neck • Images courtesy T. Davis lab • See also recent work by Weissman and O’Shea labs at UCSF

  40. Systematic phenotyping • Deletion strain: yfg1Δ, yfg2Δ, yfg3Δ, … • Barcode (UPTAG): CTAACTC, TCGCGCA, TCATAAT, … • Growth 6 hrs in minimal media (how many doublings?) vs. rich media • Harvest and label genomic DNA

  41. Systematic phenotyping with a barcode array • Ron Davis and friends… • These oligo barcodes are also spotted on a DNA microarray • Growth time in minimal media: • Red: 0 hours • Green: 6 hours
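
One common way to read such a two-color barcode array is as a log ratio per strain. The sketch below assumes the green (6 hours) and red (0 hours) intensities have already been normalized and that log2(green/red) serves as a relative growth score. The strain labels follow the deletion-strain slide, but the intensity values are made up for illustration, and the scoring function is an assumption rather than the method used by the original authors.

```python
import math


def growth_score(green_6h: float, red_0h: float) -> float:
    """log2 ratio of barcode signal after growth vs. at the start.

    Negative values mean the strain was depleted from the pool.
    """
    return math.log2(green_6h / red_0h)


# Hypothetical normalized intensities: (green at 6 hours, red at 0 hours).
INTENSITIES = {
    "yfg1Δ": (1200.0, 1200.0),  # unchanged in the pool
    "yfg2Δ": (300.0, 1200.0),   # depleted 4-fold
    "yfg3Δ": (2400.0, 1200.0),  # enriched 2-fold
}

if __name__ == "__main__":
    for strain, (green, red) in INTENSITIES.items():
        print(strain, round(growth_score(green, red), 2))
```

Strains with strongly negative scores are candidates for genes required for growth in minimal media.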

  42. Molecular Interactions Among proteins, mRNA, small molecules, and so on…

  43. Protein→DNA interactions (▲ Chromatin IP) • Gene levels (on/off) (▼ DNA microarray) • Protein-protein interactions (▲ Protein coIP) • Protein levels (present/absent) (▼ Mass spectrometry) • Biochemical reactions (▲ Not yet!!!) • Biochemical levels (▼ Metabolic flux measurements)

  44. Like sequence data, protein interaction data are growing exponentially… (As are the false positives!!!) • (Figure panels: EMBL Database Growth, total nucleotides (gigabases), 1980-2000; DIP Database Growth, total interactions)

  45. High-throughput methods for measuring interaction networks • 2-hybrid • co-immunoprecipitation w/ mass spec • ChIP-on-chip • systematic genetic analysis

  46. Yeast two-hybrid method Fields and Song

  47. Detection of protein interactions with antibody arrays MacBeath and Schreiber

  48. Kinase-target interactions Mike Snyder and colleagues
