Genome Function Project. UCSC. George Church, 24 Aug 2001. We thank for support: Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Lipper, Armenise. Corporate collaborators & sponsors: Affymetrix, GTC, Mosaic, Aventis, Dupont.
Post-Structural Genomics Data: gcggatttagctcagttgggag agcgc cagact gaaga tttgga ggtcctgtgtt cgatc cacagaattcgcacca
Post-300 Genome Sequences: 0.5 to 7 Mbp; 10 Mbp to 1000 Gbp [figure]
Function Genomics Measures & Models: Environment, Metabolites, Interactions, RNA, DNA, Protein, Growth rate, Expression
Exponential technologies: 1993 first browser; 1994 commercial www
Agenda
1. mapping human variation (haplotype map)
2. obtaining a complete and validated set of human genes, including
   - multiple alleles, transcripts, protein or structural RNA products
   - regulatory elements
3. understanding the diversity of life through genomic analysis of many organisms, and understanding how one organism works by comparative genomics with others
   - how genomes evolved
4. creating a new quantitative systems biology, beyond drawing circles and arrows on paper and labeling them with names nobody can remember
   - mapping the key interactions
   - mathematical/computational models of pathways and systems
   - dealing with multiple levels from atoms to cells
In vitro minigenome
• Steve Blackwell, HMS: pure IF, EF
• Tony Forster, BWH: tRNAs & modified bases
• Måns Ehrenberg, Dieter Söll: tRNA-synthetases
• Josh LaBaer, HMS-HIP: Expression constructs
• Jingdong Tian, HMS: Protein synthesis
• Rob Mitra & Xiaohua Huang, HMS: Polymerases, RCA
• Gloria Culver, Iowa State: ribosomal proteins & rRNA
• Harry Noller, UCSC: ribosomes
In vitro minigenome
A) From atoms to evolving minigenomes and cells. This could improve in vitro transcription/translation/replication systems and conceptually link atomic (mutational) changes via molecular and systems modeling to population evolution. The synthesis of pure systems of proteins with natural or novel modifications would be of great significance. This could give an incredible focus to structural genomics.
B) From cells to tissues. Modeling the effects of combinations of membrane signals and genome-programming on RNA and protein expression profiles would allow, among other things, manipulating stem-cell fate and stability. Stability would be key both to cell culture and to long-term avoidance of cancerous stem-cell proliferation. The ability of "programmed" cells to replace or augment small molecule drugs could be rigorously assessed.
C) From tissues to systems. Computational programming of cell and tissue morphology can develop quantitative concepts in complexity, chaos, robustness, and evolvability to engineer useful models such as sensor-effector neural feedback systems where macro aspects of the system determine the past (Darwinian) or future (prosthetic) function of the altered genomes.
Grand Challenges: goals (& details)
• The Manhattan Project ’43-45: Nuclear chain reaction (without igniting the atmosphere)
• The Apollo Project ’62-69: Send a person to the moon (& back)
• The Smallpox Eradication ’66-77: from the whole globe (including freezers)
• The Human Genome Project ’90-05: 3 billion bases (at 99.99% accuracy & searchable)
Grand Challenges: goals (& details)
• The Manhattan Project ’43-45: Nuclear chain reaction (without igniting the atmosphere)
• The Apollo Project ’62-69: Send a person to the moon (& back)
• The Smallpox Eradication ’66-77: from the whole globe (including military freezers?)
• The Human Genome Project ’90-05: 3 billion bases (at 99.99% accuracy with comparisons)
• The BioSystems Project ’02- ??
Potential BioSystems Project Challenges
Programming smart biomaterials:
1. 0.1 nanometer positioning at 1 kHz in a 50 nm cube (Foresight Feynman Challenge)
2. I/O to sub-nano memory in DNA
Programming cells & populations:
3. 10 sec. mini-cell cycle, 85 kbp genome
4. Bioremediation microbial populations
Programming ourselves:
5. Drug structure-activity prioritization
6. Universal, non-aging human stem cells
Why the genome project worked
• Sequence searching: Ulam’61-74, Staden’79, Lipman’87, Myers’87, Green’93...
• Polymer synthesis & sequencing: Hood’75-00, Hunkapiller’77-00, Caruthers’79...
• Chemistry: Tabor’93, Karger’94, Mathies’96, Mullis’84...
• Shotgun & mapping: Sanger’77, Brenner’72-02, Sulston’90, Olson’80-00...
• Infrastructure: Wada’82, DeLisi’84, Gilbert’87, Watson’88, Venter’91...
Metrics for structural & functional data
                   Automate  Data quality             Model quality                   Similarity search
X-ray diffraction  1960      resolution < 0.2 nm      |o-c|/o, R < 0.2                DALI, etc.
Sequence           1988      discrepancy, bp < 0.01%  conserved proteins              BLAST
Expression         1999      cc, t-test               shared motifs, shared function  Biclustering
Interact/growth              outliers                 optimality                      as above?
Types of Systems Interaction Models (increasing scope, decreasing resolution; from nm-fs to km-yr)
Quantum Electrodynamics     subatomic
Quantum mechanics           electron clouds
Molecular mechanics         spherical atoms (nm-fs)
Master equations            stochastic single molecules
Fokker-Planck approx.       stochastic
Macroscopic rates ODE       Concentration & time (C,t)
Flux Balance Optima         dCik/dt optimal steady state
Thermodynamic models        dCik/dt = 0, k reversible reactions
Steady State                ΣdCik/dt = 0 (sum k reactions)
Metabolic Control Analysis  d(dCik/dt)/dCj (i = chem. species)
Spatially inhomogeneous     dCi/dx
Population dynamics         as above (km-yr)
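As a minimal sketch of the steady-state condition in the list above (the net rate of change of each chemical species, summed over the k reactions, is zero), the following checks a flux vector against a stoichiometric matrix. The three-reaction pathway, the matrix STOICH, and the function names are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
# Hypothetical toy pathway: uptake -> A -> B -> export
# Rows are species (A, B); columns are reactions (uptake, A_to_B, export).
STOICH = [
    [1, -1, 0],   # A: produced by uptake, consumed by A_to_B
    [0, 1, -1],   # B: produced by A_to_B, consumed by export
]


def net_rates(stoich, fluxes):
    """Return dC_i/dt for each species i as the sum over reactions k of S[i][k] * v[k]."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in stoich]


def is_steady_state(stoich, fluxes, tol=1e-9):
    """True when every species has a net rate of change of (numerically) zero."""
    return all(abs(rate) <= tol for rate in net_rates(stoich, fluxes))


balanced = [2.0, 2.0, 2.0]     # same flux through every step: nothing accumulates
unbalanced = [2.0, 1.0, 1.0]   # uptake exceeds conversion: A accumulates

print(net_rates(STOICH, balanced), is_steady_state(STOICH, balanced))
print(net_rates(STOICH, unbalanced), is_steady_state(STOICH, unbalanced))
```

A flux-balance optimization would additionally search, among all flux vectors that pass this check, for one that maximizes an objective such as growth; that step needs a linear-programming solver and is not shown here.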
Sources of Data for BioSystems Modeling
• Capillary electrophoresis, $300,000 (DNA Sequencing): 0.4 Mb/day
• Chromatography-Mass Spectrometry (e.g. peptide LC-ESI-MS): 20 Mb/day
• Microarray scanners (e.g. RNA): 300 Mb/day
• Intel CMOS microscope: $99
Reagent costs:
• Electrophoresis (DNA Sequencing): 10 ul per 0.5 Kb
• Microarray reactions: 10 ul per 1000 Kb
RNA quantitation. Aach, Rindone, Church (2000) Genome Research 10: 431-445.
• Microarrays (1): experiment vs. control per ORF; R/G ratios; R, G values; quality indicators
• Affymetrix (2): 25-mers, PM and MM per ORF; averaged PM-MM; “presence”; feature statistics
• SAGE (3): 14-mer sequence tags for each ORF, from concatemers; counts of SAGE tags
1 DeRisi et al., Science 278:680-686 (1997)
2 Lockhart et al., Nat Biotech 14:1675-1680 (1996)
3 Velculescu et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)
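A minimal sketch of two of the summary values named on this slide: the R/G ratio for a two-color microarray spot (taken here as a log2 ratio, a common convention and an assumption on my part) and the averaged PM-MM difference over the probe pairs of one ORF. The function names and all intensity values are hypothetical; real pipelines also apply background subtraction and the quality indicators listed on the slide, which are omitted here.

```python
import math


def log2_ratio(red, green):
    """Log2 of the red/green intensity ratio for one spot (experiment vs. control)."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)


def average_difference(pm, mm):
    """Mean of (perfect match - mismatch) over the probe pairs of one ORF."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal, non-empty PM and MM lists")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities, for illustration only.
print(log2_ratio(200.0, 100.0))                          # 2-fold up
print(log2_ratio(150.0, 100.0))                          # 1.5-fold up
print(average_difference([120, 100, 90], [20, 30, 10]))  # mean PM-MM
```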
Array opportunities • 22 bp ds-RNAi array modulates single cell type • Drug array time-release or photo-release • Primer pair arrays for haplotyping • Gene & genome synthesis (DARPA)
Polypeptide arrays
• Photo-deprotect peptides (Affymax)
• Piezo or contact spotting (Harvard-CGR, Stanford)
• Phage or ribosome display capture (Bulyk)
• In situ ribosomal synthesis (Tian)
Harvard Inst. Proteomics, FLEXGene consortium
1st Round of PCR: Single molecule from library. Primer A has a 5’ immobilizing (Acrydite) modification. Primer is extended by polymerase. [Diagram labels: A, A’, B]
Sequence polonies by sequential, fluorescent single-base extensions: 1. Remove 1 strand of DNA. 2. Hybridize Universal Primer. 3. Add Red (Cy3) dTTP. 4. Wash; Scan Red Channel. [Diagram labels: 3’, 5’, B, B’; A G T C G T G ...]
Sequence polonies by sequential, fluorescent single-base extensions: 5. Add Green (FITC) dCTP. 6. Wash; Scan Green Channel. [Diagram labels: 3’, 5’, B, B’; C G A T C G C G T ...]
Polony Template. Primer Extension: 26 cycles, 34 Nucleotides.
T A T T G T T A A A G T G T G T C C T T T G T C G A T A C T G G T A …5’
3’ P’ A T A A C A A T T T C A C A C A G G A A A C A G C T A T G A C C A T 5’ P
Mean Intensity: 58, 0.5; 40, 6.5; 0.3, 48; 0.4, 43. Channels: FITC (C), CY3 (T)
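The cycle structure of the preceding slides (add one labeled nucleotide, wash, scan, repeat with the next nucleotide) can be sketched as a small simulation. This is an illustration under stated assumptions, not the lab's protocol: the slides show only dTTP followed by dCTP, so the full cycle order "TCAG" is an assumption, as are the function name and the idea that a run of identical bases is fully extended within one cycle. The short templates are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_extension(template, cycle_order="TCAG", max_cycles=100):
    """Simulate sequential single-nucleotide additions along a template.

    template: bases next to the primer site, in the order the polymerase
    reads them. Each cycle offers one nucleotide; the primer is extended
    across every consecutive template base that pairs with it.
    Returns (read, cycles_used).
    """
    pos = 0
    read = []
    cycles = 0
    while pos < len(template) and cycles < max_cycles:
        base = cycle_order[cycles % len(cycle_order)]
        cycles += 1
        while pos < len(template) and COMPLEMENT[template[pos]] == base:
            read.append(base)   # a signal would appear in this base's channel
            pos += 1
    return "".join(read), cycles


print(sequence_by_extension("AGCA"))  # hypothetical 4-base template
print(sequence_by_extension("AAG"))   # homopolymer run extended in one cycle
```

In a real experiment a homopolymer run shows up as a stronger signal in a single cycle rather than as separate calls, so run length would have to be estimated from intensity.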
Polony haplotyping Trans Cis
Function Genomics Measures & Models: Environment, Metabolites, RNAi, Insertions, SNPs, RNA, DNA, Protein, Growth rate; microbes, stem cells, cancer cells, multicellular organisms
Competition among multiple mutations & multiple homologous domains. Selective disadvantage in minimal media. [Figure labels, probes: lysC 1, 2: 10.4; thrA 1, 2, 3: 1.1, 6.7; metL 1, 2, 3: 1.8, 1.8]
Multiple mutations per gene Correlation between two selection experiments
Comparison of selection data with FBO predictions (scale up from 79 to 488 genes)
predictions          number of genes  negatively selected  not negatively selected
essential            143              80                   63
reduced growth rate  46               24                   22
non essential        299              119                  180
P-value Chi Square = 0.004
> Novel duplicates? < Position effects?
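The chi-square P-value on this slide can be checked from the counts. The sketch below assumes the numbers form a 3x2 contingency table (prediction class by negatively selected / not negatively selected) and runs a standard Pearson chi-square test of independence; whether this is exactly the test the authors ran is an assumption. With 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so only the standard library is needed. The function name is hypothetical.

```python
import math

# Counts from the slide: (negatively selected, not negatively selected)
OBSERVED = {
    "essential": (80, 63),
    "reduced growth rate": (24, 22),
    "non essential": (119, 180),
}


def chi_square_three_by_two(table):
    """Pearson chi-square test of independence for a 3x2 table.

    Returns (statistic, p_value). The p-value uses the closed form
    exp(-x/2), which is valid only for 2 degrees of freedom.
    """
    rows = list(table.values())
    if len(rows) != 3 or any(len(r) != 2 for r in rows):
        raise ValueError("expected a 3x2 table")
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    total = sum(col_totals)
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            stat += (r[j] - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


stat, p_value = chi_square_three_by_two(OBSERVED)
print(f"chi-square = {stat:.2f}, P = {p_value:.4f}")
```

Under these assumptions the statistic is about 11.0 and the P-value rounds to 0.004, in agreement with the slide.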
Function Genomics Measures & Models: Environment, Metabolites, RNA, Protein, DNA, Expression
RNA quantitation (Frequently Asked Questions)
• Is less than a 2-fold RNA-ratio ever important? Yes; 1.5-fold in trisomies.
• Why oligonucleotides rather than cDNAs? Alternative RNAs, gene families.
• Using a subset of the genome or ratios to various control RNAs? Trouble for later (meta) analyses.
Lpp mRNA start & structure. See: Selinger et al., Nat Biotech
Oligo selection
[Flowchart boxes: gene sequences; generate candidate oligos; predict cross-hybridization; filter & select oligos; experimental results; parameters (Tm, length, ...); gene-specific oligos; background sequences; generate chip layout; generate control, border oligos; controls, text, border oligos; chip layout]
• PGA/Smith group already designing software for oligo selection
• Church Lab / Lipper Center has additional tools
  • Unique oligos (cu-15s)
  • RNA string matching program
Figure courtesy of Adnan Derti
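The first steps of the flowchart on this slide (generate candidate oligos, filter on parameters such as Tm and length, reject oligos that could cross-hybridize to background sequences) can be sketched as follows. This is not the PGA/Smith or Church Lab software. The melting-temperature estimate is the simple Wallace rule (2 degrees per A/T, 4 degrees per G/C), which is only a rough guide for short oligos, and exact substring matching stands in for a real cross-hybridization predictor. All function names, sequences, and thresholds are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def candidate_oligos(gene, length):
    """Every window of the given length along the gene sequence."""
    return [gene[i:i + length] for i in range(len(gene) - length + 1)]


def select_oligos(gene, background, length, tm_min, tm_max):
    """Keep candidates inside the Tm window that do not occur in any background sequence."""
    selected = []
    for oligo in candidate_oligos(gene, length):
        if not tm_min <= wallace_tm(oligo) <= tm_max:
            continue   # fails the Tm filter
        if any(oligo in seq for seq in background):
            continue   # exact match elsewhere: stand-in for predicted cross-hybridization
        if oligo not in selected:
            selected.append(oligo)
    return selected


# Hypothetical toy sequences; real designs would use much longer oligos.
print(select_oligos("ATGCGTAC", ["TTGCGTT"], length=4, tm_min=12, tm_max=14))
```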
Combinatorial arrays for binding constants (EGR1). HMS: Martha Bulyk, Xiaohua Huang, Martin Steffen. MRC: Yen Choo. [Diagram label: ds-DNA array]
Combinatorial arrays for binding constants. [Diagram labels: pVIII, pIII, Antibodies, Phage, Combinatorial DNA-binding protein domains, ds-DNA array]
Combinatorial arrays for binding constants. Martha Bulyk et al. [Diagram labels: Phycoerythrin - 2º IgG, Phage, Combinatorial DNA-binding protein domains, ds-DNA array]
Interactions of Adjacent Basepairs in EGR1 Zinc Finger DNA Recognition Isalan et al., Biochemistry (‘98) 37:12026-12033
Wildtype EGR1 Microarray [Figure labels: high [DNA]; (+) ctrl sequence for wt binding; etc.; alignment oligos]
Motifs weight all 64 Ka(app). [Figure labels: Wildtype, RSDHLTT, RGPDLAR, REDVLIR, LRHNLET, KASNLVS; TGG 2.8 nM; GCG 16 nM; 2.5 nM; TAT 5.7 nM; AAA, AAT, ACT, AGA, AGC, AGT, CAT, CCT, CGA, CTT, TTC, TTT; AAT 240 nM]