Array quantitation for modeling mutations affecting RNA, protein interactions & cell proliferation
CHI Macroresults through Microarrays 3
George Church, 1-May-02
Thanks to:
• The Lipper Center for Computational Genetics
• Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Armenise
• Corporate collaborators & sponsors: Affymetrix, GTC, Mosaic, Aventis, Dupont, Cistran
gggatttagctcagttgggagagcgccagactgaa gat ttg gag gtcctgtgttcgatccacagaattcgcacca Post- 300 genomes & 3D structures
Biosystems Measures & Models
[Diagram labels: Environment; Metabolites; RNAi; Insertions; SNPs; Protein: in vivo & in vitro interactions; RNA; DNA; Replication rate; Microbes; Cancer & stem cells; Darwinian; In vitro replication; Small multicellular organisms]
Functional Genomics Challenges
• Systems dynamics and optimality modeling.
• Multiple genetic domains per gene: high density readout of whole genome mutant phenotypes.
• Multiple RNAs & regulatory proteins per gene.
• Many causative genes & haplotypes per disease.
• Polony RNA exon-typing
• Multiplex in situ RNA & protein analyses
• Automated differentiation
• Homologous recombination genome engineering
Human Red Blood Cell ODE model: 200 measured parameters
[Figure: metabolic network diagram. Node labels: ADP, ATP, 1,3 DPG, NADH, NAD, 3PG, GA3P, 2PG, 2,3 DPG, FDP, DHAP, PEP, F6P, PYR, R5P, G6P, GL6P, GO6P, RU5P, LACi, LACe, X5P, S7P, E4P, NADP, NADPH, GLCe, GLCi, Cl-, GSH, GSSG, K+, Na+, pH, HCO3-, ADO, ADOe, AMP, ADE, ADEe, PRPP, INO, INOe, IMP, R1P, HYPX]
Jamshidi, Edwards, Fahland, Church, Palsson, B.O. (2001) Bioinformatics 17: 286. (http://atlas.med.harvard.edu/gmc/rbc.html)
Modeling suboptimality: Segre, Edwards, Vitkup
Calculated & Observed Fluxes in wt
[Scatter plot: Calculated Flux vs. Observed Fluxes in wt. Wild type, C 0.4-limited. CC = 0.97]
Replication rate of a whole-genome set of mutants
Badarinarayana, et al. (2001) Nature Biotech. 19: 1060
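Replication rate challenge met: multiple homologous domains
[Figure: probes across the domains of three homologous genes, annotated with selective disadvantage in minimal media. lysC (domains 1, 2): 10.4. thrA (domains 1, 2, 3): 1.1, 6.7. metL (domains 1, 2, 3): 1.8, 1.8]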
Multiple mutations per gene
Correlation between two selection experiments
Badarinarayana, et al. (2001) Nature Biotech. 19: 1060
Comparison of selection data with Flux Balance Optimization predictions on 488 genes

predictions            number of genes    negatively selected    not negatively selected
essential              143                80                     63
reduced growth rate    46                 24                     22
non essential          299                119                    180

P-value Chi Square = 0.004
> Novel duplicates?
< Position effects, toxin accumulation, non-opt?
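The P-value on this slide can be checked from the counts in the table. The sketch below assumes the slide reports a standard chi-square test of independence on the 3 x 2 table (the slide does not say which test was used); the function name is illustrative only.

```python
import math

# Counts from the slide: rows are Flux Balance predictions,
# columns are (negatively selected, not negatively selected).
observed = {
    "essential": (80, 63),
    "reduced growth rate": (24, 22),
    "non essential": (119, 180),
}

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            # Expected count if prediction class and selection were independent.
            expected = row_total * col_total / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof

chi2, dof = chi_square_independence(observed)

# With exactly 2 degrees of freedom the chi-square tail probability
# has the closed form exp(-x / 2), so no statistics library is needed.
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

Under that assumption the computed P-value rounds to the 0.004 shown on the slide.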
RNA quantitation issues
• Small fold changes in RNA are important. Example: 1.5-fold in trisomies.
• Cross-hybridizing RNAs.
• Alternative RNAs, gene families.
• Mixed tissues.
• In situ hybridization has low multiplex.
Gene Expression database
Aach, Rindone, Church (2000) Genome Research 10: 431-445.
• Microarrays (1): experiment vs. control for each ORF; R/G ratios; R, G values; quality indicators
• Affymetrix (2): 25-mers, PM and MM, for each ORF; Averaged PM-MM; “presence”; feature statistics
• Lynx-MPSS (3), SAGE (4): Counts of 14-mers sequence tags for each ORF (tag shown: agactagcag)
1 DeRisi, et al., Science 278:680-686 (1997)
2 Lockhart, et al., Nat Biotech 14:1675-1680 (1996)
3 Brenner et al., Massively Parallel Signature Sequencing, Nat Biotechnol. 18:630-4 (2000)
4 Velculescu, et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)
RNA Cluster Analyses: Cell Cycle
[Figure: two clusters, N = 186. Panels for the MCB and SCB motifs show Number of sites vs. Distance from ATG (b.p.), and Number of ORFs per cluster]
Tavazoie, et al. 1999 Nature Genetics 22:281.
Combining mouse knockouts with RNA array analysis (homeobox gene Crx-/-)
Livesey, Furukawa, Steffen, Church, Cepko (2000) Current Biol. 10:301.
Combinatorial arrays for binding constants: Human/Mouse EGR1
HMS: Martha Bulyk, Xiaohua Wang, Martin Steffen
MRC: Yen Choo
[Diagram label: ds-DNA array]
Combinatorial arrays for binding constants
[Diagram labels: pVIII; pIII; Antibodies; Phage; Combinatorial DNA-binding protein domains; ds-DNA array]
Combinatorial arrays for binding constants
[Diagram labels: Phycoerythrin - 2º IgG; Phage; Combinatorial DNA-binding protein domains; ds-DNA array]
Martha Bulyk et al
Interactions of Adjacent Basepairs in EGR1 Zinc Finger DNA Recognition Isalan et al., Biochemistry (‘98) 37:12026-12033
Wildtype EGR1 Microarray
[Array image labels: high [DNA]; (+) ctrl sequence for wt binding; etc.; alignment oligos]
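Motifs weight all 64 Kaapp
[Figure: motifs for zinc finger variants Wildtype RSDHLTT, RGPDLAR, REDVLIR, LRHNLET, KASNLVS. Triplets and values as extracted, in slide order: TGG 2.8 nM; GCG 16 nM; 2.5 nM; TAT 5.7 nM; AAT 240 nM. Additional triplets listed: AAA, AAT, ACT, AGA, AGC, AGT, CAT, CCT, CGA, CTT, TTC, TTT]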
Common diseases: billions of “new” alleles plus millions of balanced polymorphisms
• 60 new mutations per generation * 5,000 generations since the major bottleneck(s) which set up the linkage patterns (= 300,000 per genome)
• Each of the 3 Gbp in the genome exists in all SNP forms: A, C, G, T, D
• 600,000 of each SNP on earth (spread over the common haplotypes).
• The population frequency will be <0.01%.
• (Aach et al, 2001 Nature 409: 856)
• Functional genomics (FG) may provide better leads for therapies & diagnostics. (Accuracy goal 1 ppb?)
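The numbers on this slide follow from a short back-of-envelope calculation. The sketch below reproduces it, assuming a world population of about 6 billion (circa 2002); that figure is not on the slide and is introduced here only to make the arithmetic explicit.

```python
# Figures stated on the slide.
new_mutations_per_generation = 60
generations_since_bottleneck = 5_000
genome_size_bp = 3_000_000_000  # 3 Gbp

# Assumption (not on the slide): world population of about 6 billion.
world_population = 6_000_000_000

# New mutations accumulated in each genome since the bottleneck.
per_genome = new_mutations_per_generation * generations_since_bottleneck

# Total new mutations carried across the whole population.
total_new_mutations = per_genome * world_population

# Average number of people carrying a new mutation at any one position.
copies_per_position = total_new_mutations / genome_size_bp

# Corresponding population frequency.
frequency = copies_per_position / world_population

print(f"new mutations per genome: {per_genome:,}")
print(f"copies per position:      {copies_per_position:,.0f}")
print(f"population frequency:     {frequency:.2%}")
```

Under that population assumption the per-position count comes out at 600,000 and the frequency at about 0.01%, the bound quoted on the slide; splitting the count among several alternative alleles at a position would lower the frequency of each one further.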
Projected costs affect our view of what is possible.
• In 1985, the dawn of the genome project, $10 per bp would have been $30B per genome.
• In 2002, Perlegen or Lynx: $3M (10^3 bits/$, 4 logs)
• In 2001, the cost of video data collection? 10^13 bits/$
Genotyping & functional genomics demand will probably be as high as permitted by costs.
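The cost figures can be related to each other with a few lines of arithmetic. This sketch assumes 2 bits of information per base pair (four possible bases), which is one plausible way to arrive at the slide's bits-per-dollar figure; the slide itself does not state the conversion.

```python
import math

genome_size_bp = 3_000_000_000  # 3 Gbp

# 1985 projection from the slide: $10 per base pair.
cost_1985 = 10 * genome_size_bp

# 2002 figure from the slide: $3M per genome.
cost_2002 = 3_000_000

# "4 logs": orders of magnitude between the two costs.
cost_drop_logs = math.log10(cost_1985 / cost_2002)

# Assumption: each base pair carries 2 bits (A, C, G or T).
bits_per_genome = 2 * genome_size_bp
bits_per_dollar_2002 = bits_per_genome / cost_2002

# Remaining gap to the video figure on the slide (10^13 bits/$).
gap_to_video_logs = 13 - math.log10(bits_per_dollar_2002)

print(f"1985 cost per genome: ${cost_1985:,}")
print(f"cost drop:            {cost_drop_logs:.1f} logs")
print(f"2002 bits per dollar: {bits_per_dollar_2002:,.0f}")
print(f"gap to video:         {gap_to_video_logs:.1f} logs")
```

With these assumptions the 2002 figure is 2,000 bits/$, the same order of magnitude as the 10^3 on the slide, and video data collection is still roughly ten orders of magnitude cheaper per bit.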
Why lower-cost, high quality “sequencing”?
• Environmental, food, & biodiversity monitoring
• Human genome haplotyping
• RNA splicing & editing
• Immune B & T cell receptor spectra
& How?
• Femtoliter (10^-15) scale & low-cost scanners
• Polymerase DNA colonies (polonies)
• Fluorescent in situ sequencing (FISSEQ)
Mitra & Church Nucleic Acids Res. 27: e34
1st Round of PCR
[Diagram: primers labeled A, A’ and B]
• Primer A has 5’ immobilizing (Acrydite) modification.
• Single Molecule From Library
• Primer is Extended by Polymerase
Sequence polonies by sequential, fluorescent single-base extensions
[Diagram: immobilized strands labeled B and B’, 5’ and 3’ ends marked, template bases A G T C G G T . . . .]
1. Remove 1 strand of DNA.
2. Hybridize Universal Primer.
3. Add Red (Cy3) dTTP.
4. Wash; Scan Red Channel
Sequence polonies by sequential, fluorescent single-base extensions
[Diagram: immobilized strands labeled B and B’, 5’ and 3’ ends marked, template bases C G A T C G C G T . . .]
5. Add Green (FITC) dCTP
6. Wash; Scan Green Channel
Primer Extension: 26 cycles, 34 Nucleotides
Polony Template:
3’ P’ T A T T G T T A A A G T G T G T C C T T T G T C G A T A C T G G T A …5’
5’ P  A T A A C A A T T T C A C A C A G G A A A C A G C T A T G A C C A T
Mean Intensity, FITC (C), CY3 (T): 58, 0.5; 40, 6.5; 0.3, 48; 0.4, 43
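The "26 cycles, 34 nucleotides" figure can be reproduced from the two strands printed on the slide. The sketch below is a toy simulation, not the authors' software: it assumes each cycle offers a single dNTP and that the polymerase keeps adding it while it pairs with the next template base, so a homopolymer run is completed in one cycle. The function name and the dNTP schedule are illustrative.

```python
from itertools import groupby

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Strands as printed on the slide, with spaces removed.
TEMPLATE_3_TO_5 = "TATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA"
EXTENDED_5_TO_3 = "ATAACAATTTCACACAGGAAACAGCTATGACCAT"

def extend(template_3_to_5, dntp_schedule):
    """Simulate single-base extension cycles along a template.

    Each cycle offers one dNTP. It is incorporated for as long as it
    pairs with the next unread template base. Returns the extended
    strand and the number of bases added in each cycle.
    """
    position = 0
    product = []
    added_per_cycle = []
    for dntp in dntp_schedule:
        added = 0
        while (position < len(template_3_to_5)
               and COMPLEMENT[template_3_to_5[position]] == dntp):
            product.append(dntp)
            position += 1
            added += 1
        added_per_cycle.append(added)
    return "".join(product), added_per_cycle

# Hypothetical schedule: one dNTP per homopolymer run of the expected product,
# i.e. the order an operator who knows the sequence would add nucleotides.
schedule = [base for base, _run in groupby(EXTENDED_5_TO_3)]

product, added_per_cycle = extend(TEMPLATE_3_TO_5, schedule)

print(f"cycles: {len(schedule)}, nucleotides added: {sum(added_per_cycle)}")
print(product)
```

Under this reading every cycle is productive, and the 34-nucleotide extension breaks into 26 homopolymer runs, which matches the counts in the slide title. A real run with a fixed dNTP order would also include cycles in which nothing is incorporated.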
RNA Exon typing
• Single molecules of RNA dispersed.
• Multiplex polonies spanning all likely variable exons
• Sequential probing of each exon.