
Thanks to the Lipper Center for Computational Genetics

Array quantitation for modeling mutations affecting RNA, protein interactions & cell proliferation. CHI Macroresults through Microarrays 3. George Church 1-May-02. Thanks to the Lipper Center for Computational Genetics Government and private grant agencies: NHLBI,


Presentation Transcript


  1. Array quantitation for modeling mutations affecting RNA, protein interactions & cell proliferation. CHI Macroresults through Microarrays 3. George Church, 1-May-02. Thanks to the Lipper Center for Computational Genetics. Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Armenise. Corporate collaborators & sponsors: Affymetrix, GTC, Mosaic, Aventis, Dupont, Cistran.

  2. Post-300 genomes & 3D structures. [Sequence shown: gggatttagctcagttgggagagcgccagactgaa gat ttg gag gtcctgtgttcgatccacagaattcgcacca]

  3. Biosystems Measures & Models. [Diagram labels: Environment; Metabolites; RNAi; Insertions; SNPs; Protein: in vivo & in vitro interactions; RNA; DNA; Replication rate; Microbes; Cancer & stem cells; Darwinian; In vitro replication; Small multicellular organisms.]

  4. Functional Genomics Challenges
     • Systems dynamics and optimality modeling.
     • Multiple genetic domains per gene: high density readout of whole genome mutant phenotypes.
     • Multiple RNAs & regulatory proteins per gene.
     • Many causative genes & haplotypes per disease.
     • Polony RNA exon-typing
     • Multiplex in situ RNA & protein analyses
     • Automated differentiation
     • Homologous recombination genome engineering

  5. Human Red Blood Cell ODE model: 200 measured parameters. [Figure: metabolic network diagram; node labels include glycolytic intermediates (G6P, F6P, FDP, GA3P, PEP, PYR, LAC), pentose phosphate intermediates (R5P, RU5P, X5P, S7P, E4P), nucleotide metabolites (ADO, AMP, ADE, INO, IMP, HYPX, PRPP), cofactors (ATP, ADP, NAD, NADH, NADP, NADPH, GSH, GSSG), and ions (Na+, K+, Cl-, HCO3-, pH).] Jamshidi, Edwards, Fahland, Church, Palsson, B.O. (2001) Bioinformatics 17: 286. (http://atlas.med.harvard.edu/gmc/rbc.html)

  6. Modeling suboptimality: Segre, Edwards, Vitkup

  7. Calculated & Observed Fluxes in wt. Wild type, C 0.4-limited; CC = 0.97. [Plot: calculated flux vs. observed flux.]

  8. Replication rate of a whole-genome set of mutants. Badarinarayana, et al. (2001) Nature Biotech. 19: 1060

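  9. Replication rate challenge met: multiple homologous domains. [Figure: genes lysC (domains 1, 2), thrA (domains 1, 2, 3), and metL (domains 1, 2, 3), with selective disadvantage in minimal media shown per probe; values as extracted: lysC 10.4; thrA 1.1, 6.7; metL 1.8, 1.8.]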

  10. Multiple mutations per gene. Correlation between two selection experiments. Badarinarayana, et al. (2001) Nature Biotech. 19: 1060

  11. Comparison of selection data with Flux Balance Optimization predictions on 488 genes

      Prediction             Number of genes   Negatively selected   Not negatively selected
      Essential              143               80                    63
      Reduced growth rate    46                24                    22
      Non-essential          299               119                   180

      > Novel duplicates?
      < Position effects, toxin accumulation, non-opt?
      P-value (Chi Square) = 0.004
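The P-value on slide 11 can be checked from the counts given there. The following is a minimal sketch, assuming the test is a standard chi-square test of independence on the 3 x 2 table (the slide does not say exactly how the test was set up). The function name is illustrative. For 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Counts from slide 11: rows are FBO predictions, columns are
# (negatively selected, not negatively selected).
observed = [
    [80, 63],    # essential
    [24, 22],    # reduced growth rate
    [119, 180],  # non-essential
]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

chi2, dof = chi_square_independence(observed)

# Closed-form tail probability; valid only because dof == 2 here.
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

Under these assumptions the statistic comes out near 11.0 and the P-value near 0.004, consistent with the value quoted on the slide.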

  12. Biosystems Measures & Models. [Diagram labels: Environment; Metabolites; RNAi; Insertions; SNPs; Protein: in vivo & in vitro interactions; RNA; DNA; Replication rate; microbes; cancer & stem cells; In vitro replication; small multicellular organisms.]

  13. RNA quantitation issues Small fold changes in RNA are important. Example: 1.5-fold in trisomies. Cross-hybridizing RNAs. Alternative RNAs, gene families. Mixed tissues. In situ hybridization has low multiplex.

  14. Gene Expression database. Aach, Rindone, Church (2000) Genome Research 10: 431-445.
      • Microarrays [1]: R/G ratios (experiment/control); R, G values; quality indicators, per ORF.
      • Affymetrix [2]: averaged PM-MM; “presence”; feature statistics; 25-mers (PM, MM), per ORF.
      • Lynx-MPSS [3], SAGE [4]: counts of 14-mer sequence tags for each ORF (example tag: agactagcag).
      [1] DeRisi et al., Science 278:680-686 (1997)
      [2] Lockhart et al., Nat Biotech 14:1675-1680 (1996)
      [3] Brenner et al., Massively Parallel Signature Sequencing, Nat Biotechnol. 18:630-4 (2000)
      [4] Velculescu et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)
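The "Averaged PM-MM" value listed on slide 14 can be illustrated with a small sketch. It assumes the simplest reading: for one ORF, subtract each mismatch (MM) probe intensity from its paired perfect-match (PM) intensity and average the differences. The function name and the intensities are hypothetical, and real pipelines typically add outlier trimming and background correction that are not shown here.

```python
def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one ORF.

    pm, mm: equal-length sequences of intensities for paired probes.
    """
    if len(pm) != len(mm) or len(pm) == 0:
        raise ValueError("pm and mm must be non-empty and the same length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

# Hypothetical intensities for three probe pairs of a single ORF.
pm_intensities = [120.0, 200.0, 160.0]
mm_intensities = [100.0, 80.0, 90.0]

avg_diff = average_difference(pm_intensities, mm_intensities)
print(avg_diff)
```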

  15. RNA Cluster Analyses: Cell Cycle (N = 186). [Figure: panels for the MCB and SCB motifs; axes: number of sites vs. distance from ATG (b.p.), and number of ORFs per cluster.] Tavazoie, et al. 1999 Nature Genetics 22:281.

  16. Combining mouse knockouts with RNA array analysis (homeobox gene Crx-/-). Livesey, Furukawa, Steffen, Church, Cepko (2000) Current Biol. 10:301.

  17. Biosystems Measures & Models. [Diagram labels: Environment; Metabolites; RNAi; Insertions; SNPs; Protein: in vivo & in vitro interactions; RNA; DNA; Replication rate; microbes; cancer & stem cells; In vitro replication; small multicellular organisms.]

  18. Combinatorial arrays for binding constants. Human/Mouse EGR1. HMS: Martha Bulyk, Xiaohua Wang, Martin Steffen. MRC: Yen Choo. [Diagram label: ds-DNA array.]

  19. Combinatorial arrays for binding constants pVIII pIII Antibodies Phage Combinatorial DNA-binding protein domains ds-DNA array

  20. Combinatorial arrays for binding constants Phycoerythrin - 2º IgG Phage Combinatorial DNA-binding protein domains ds-DNA array Martha Bulyk et al

  21. Interactions of Adjacent Basepairs in EGR1 Zinc Finger DNA Recognition Isalan et al., Biochemistry (‘98) 37:12026-12033

  22. Wildtype EGR1 Microarray high [DNA] (+) ctrl sequence for wt binding etc. alignment oligos

  23. Motifs weight all 64 Ka(app). [Figure residue, in extracted order: Wildtype RSDHLTT; TGG 2.8 nM; GCG 16 nM; 2.5 nM; TAT 5.7 nM; AAA, AAT, ACT, AGA, AGC, AGT, CAT, CCT, CGA, CTT, TTC, TTT; AAT 240 nM; variants RGPDLAR, REDVLIR, LRHNLET, KASNLVS.]

  24. Biosystems Measures & Models. [Diagram labels: Environment; Metabolites; RNAi; Insertions; SNPs; Protein: in vivo & in vitro interactions; RNA; DNA; Replication rate; microbes; cancer & stem cells; In vitro replication; small multicellular organisms.]

  25. Common diseases: billions of “new” alleles plus millions of balanced polymorphisms
      • 60 new mutations per generation × 5,000 generations since the major bottleneck(s) that set up the linkage patterns (= 300,000 per genome).
      • Each of the 3 Gbp in the genome exists in all SNP forms: A, C, G, T, D.
      • 600,000 of each SNP on earth (spread over the common haplotypes).
      • The population frequency will be <0.01%.
      • (Aach et al., 2001 Nature 409: 856)
      • Functional genomics (FG) may provide better leads for therapies & diagnostics. (Accuracy goal 1 ppb?)
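The figures on slide 25 follow from a short back-of-the-envelope calculation, sketched here. The world population of 6 billion is an assumption (the slide does not state it), and the way the 600,000 and <0.01% figures are derived is one plausible reading of the slide, not a confirmed derivation.

```python
# Values stated on slide 25.
NEW_MUTATIONS_PER_GENERATION = 60
GENERATIONS_SINCE_BOTTLENECK = 5_000
GENOME_SIZE_BP = 3_000_000_000

# Assumption, not from the slide: approximate world population around 2002.
WORLD_POPULATION = 6_000_000_000

# "New" alleles carried by one genome.
new_alleles_per_genome = NEW_MUTATIONS_PER_GENERATION * GENERATIONS_SINCE_BOTTLENECK

# Total "new" alleles across the population, spread evenly over genome positions.
copies_per_site = new_alleles_per_genome * WORLD_POPULATION // GENOME_SIZE_BP

# Upper bound on the frequency of the variants at a site; splitting the copies
# among the alternative forms (the other bases, deletion) pushes each one lower.
upper_bound_frequency = copies_per_site / WORLD_POPULATION

print(new_alleles_per_genome)           # alleles per genome
print(copies_per_site)                  # copies per genome position on earth
print(f"{upper_bound_frequency:.2%}")   # frequency bound
```

Under these assumptions the per-genome count is 300,000 and the per-site count is 600,000, matching the slide, and the frequency bound is 0.01%.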

  26. Projected costs affect our view of what is possible. In 1985, at the dawn of the genome project, $10 per bp would have been $30B per genome. In 2002, Perlegen or Lynx: $3M (10^3 bits/$, 4 logs). In 2001, the cost of video data collection? 10^13 bits/$. Genotyping & functional genomics demand will probably be as high as permitted by costs.
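The cost figures on slide 26 can be reproduced with a few lines of arithmetic. This is a sketch that assumes a 3 Gbp genome (as on slide 25) and 2 bits of information per base; the 2 bits/base conversion is an assumption, not something stated on the slide.

```python
import math

GENOME_SIZE_BP = 3_000_000_000   # 3 Gbp, as used on slide 25
BITS_PER_BASE = 2                # assumption: 4 possible bases = 2 bits

# 1985: $10 per base pair.
cost_per_genome_1985 = 10 * GENOME_SIZE_BP

# 2002: about $3M per genome (Perlegen or Lynx, per the slide).
cost_per_genome_2002 = 3_000_000

# Information obtained per dollar in 2002.
bits_per_dollar_2002 = GENOME_SIZE_BP * BITS_PER_BASE / cost_per_genome_2002

# Cost drop between 1985 and 2002, in orders of magnitude ("logs").
logs_of_improvement = math.log10(cost_per_genome_1985 / cost_per_genome_2002)

print(f"1985: ${cost_per_genome_1985:,} per genome")
print(f"2002: {bits_per_dollar_2002:,.0f} bits/$")
print(f"improvement: {logs_of_improvement:.1f} logs")
```

With these inputs the 1985 cost is $30B per genome, the 2002 yield is 2,000 bits/$ (on the order of 10^3), and the improvement is 4 logs.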

  27. Why lower-cost, high quality “sequencing”?
      • Environmental, food, & biodiversity monitoring
      • Human genome haplotyping
      • RNA splicing & editing
      • Immune B & T cell receptor spectra
      & How?
      • Femtoliter (10^-15) scale & low-cost scanners
      • Polymerase DNA colonies (polonies)
      • Fluorescent in situ sequencing (FISSEQ)
      Mitra & Church Nucleic Acids Res. 27: e34

  28. Primer A has 5’ immobilizing (Acrydite) modification. Single molecule from library. Primer is extended by polymerase. 1st round of PCR. [Diagram: immobilized primers A and A’ with free primer B around a single template molecule.]

  29. Sequence polonies by sequential, fluorescent single-base extensions. 1. Remove 1 strand of DNA. 2. Hybridize Universal Primer. 3. Add Red (Cy3) dTTP. 4. Wash; Scan Red Channel. [Diagram: templates B/B’ with 5’ and 3’ ends marked; sequence shown: A G T C G G T . . . .]

  30. Sequence polonies by sequential, fluorescent single-base extensions. 5. Add Green (FITC) dCTP. 6. Wash; Scan Green Channel. [Diagram: templates B/B’ with 5’ and 3’ ends marked; sequence shown: C G A T C G C G T . . .]

  31. Primer Extension: 26 cycles, 34 nucleotides. Polony template (3’ P’ to 5’): TATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA… Extension from primer P (5’ to 3’): ATAACAATTTCACACAGGAAACAGCTATGACCAT. Mean intensity: 58, 0.5; 40, 6.5; 0.3, 48; 0.4, 43. [Channels: FITC (C), CY3 (T).]
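The procedure on slides 29 to 31 (add one labeled nucleotide, wash, scan, repeat) can be simulated in a few lines. This is a minimal sketch, not the authors' software: it assumes an idealized polymerase that incorporates the offered base wherever it is complementary, including through homopolymer runs, and the nucleotide order "TCAG" is hypothetical (the slides show only that T is followed by C). The cycle count it produces therefore need not match the 26 cycles reported on slide 31.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def single_base_extension_cycles(template_3to5, dntp_order="TCAG", max_cycles=200):
    """Simulate sequential single-base extensions on one polony.

    template_3to5: template bases read 3'->5', starting just past the primer site.
    Returns a list of (cycle_number, base_offered, bases_incorporated).
    """
    position = 0
    cycles = []
    for cycle in range(max_cycles):
        if position >= len(template_3to5):
            break  # template fully copied
        base = dntp_order[cycle % len(dntp_order)]
        incorporated = 0
        # Extend while the offered base pairs with the next template base.
        while (position < len(template_3to5)
               and COMPLEMENT[template_3to5[position]] == base):
            position += 1
            incorporated += 1
        cycles.append((cycle + 1, base, incorporated))
    return cycles

def read_from_cycles(cycles):
    """Reconstruct the extended strand (5'->3') from per-cycle incorporations."""
    return "".join(base * count for _, base, count in cycles)

# Template and expected extension as printed on slide 31.
slide_template = "TATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA"
slide_extension = "ATAACAATTTCACACAGGAAACAGCTATGACCAT"

cycles = single_base_extension_cycles(slide_template)
read = read_from_cycles(cycles)
print(read == slide_extension, len(cycles))
```

In a real scan, a cycle with zero incorporations corresponds to a dark polony in that channel, and the signal in a homopolymer run would have to be quantified to tell one incorporation from several.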

  32. Why lower-cost, high quality “sequencing”?
      • Environmental, food, & biodiversity monitoring
      • Human genome haplotyping
      • RNA splicing & editing
      • Immune B & T cell receptor spectra
      & How?
      • Femtoliter (10^-15) scale & low-cost scanners
      • Polymerase DNA colonies (polonies)
      • Fluorescent in situ sequencing (FISSEQ)
      Mitra & Church Nucleic Acids Res. 27: e34

  33. Why lower-cost, high quality “sequencing”?
      • Environmental, food, & biodiversity monitoring
      • Human genome haplotyping
      • RNA splicing & editing
      • Immune B & T cell receptor spectra
      & How?
      • Femtoliter (10^-15) scale & low-cost scanners
      • Polymerase DNA colonies (polonies)
      • Fluorescent in situ sequencing (FISSEQ)
      Mitra & Church Nucleic Acids Res. 27: e34

  34. RNA Exon typing
      • Single molecules of RNA dispersed.
      • Multiplex polonies spanning all likely variable exons.
      • Sequential probing of each exon.
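The exon-typing idea on slide 34 amounts to giving each polony a presence/absence signature across the probed exons and then tallying signatures. The sketch below illustrates that bookkeeping only; the exon names, signal values, threshold, and function names are all hypothetical and are not taken from the presentation.

```python
from collections import Counter

def exon_signature(signals, exon_names, threshold):
    """Call each exon present if its probe signal meets the threshold."""
    present = [name for name, s in zip(exon_names, signals) if s >= threshold]
    return "+".join(present) if present else "none"

def count_isoforms(polony_signals, exon_names, threshold=0.5):
    """Tally exon signatures over all polonies (one polony = one RNA molecule)."""
    return Counter(exon_signature(s, exon_names, threshold) for s in polony_signals)

# Hypothetical normalized probe signals: one row per polony, one column per exon,
# collected by probing the same polonies once per exon.
exons = ["E1", "E2", "E3"]
signals = [
    [0.90, 0.10, 0.80],
    [0.90, 0.70, 0.80],
    [0.95, 0.05, 0.90],
]

isoform_counts = count_isoforms(signals, exons)
print(isoform_counts)
```

Because each polony derives from a single molecule, the tallies give exon combinations per transcript rather than bulk exon abundance.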

  35. Functional Genomics Challenges
      • Systems dynamics and optimality modeling.
      • Multiple genetic domains per gene: high density readout of whole genome mutant phenotypes.
      • Multiple RNAs & regulatory proteins per gene.
      • Many causative genes & haplotypes per disease.
      • Polony RNA exon-typing
      • Multiplex in situ RNA & protein analyses
      • Automated differentiation
      • Homologous recombination genome engineering

  36. For more information: arep.med.harvard.edu
