Analyzing Genomes & Transcriptomes with Structural and Functional Categories


Presentation Transcript


  1. Analyzing Genomes & Transcriptomes with Structural and Functional Categories
     1 Known Folds in Genomes: Shared and Common Folds. Fold Tree, Top-10. βαβ. Biases. Extent of MG fold assignment (65%).
     2 Predicted Folds in Genomes: TM’s Zipf law. Worm 7-TMs. Same 2º comp. but diff. a.a. comp. Biases: Extrapolation from known struct. to genomes?
     3 Folds in the Transcriptome: Top-10 by expression. Enriched: A, G, synthesis, α/β, TIM; Depleted: N, very long. Phenotypes clustering.
     Protein Abundance Analysis: Comparison with transcriptome analysis.
     Phenotypes: Clustering of phenotype profiles and growth conditions.
     W Krebs, T Johnson, Y Kluger, R Das, P Harrison, S Balasubramanian, C Wilson, V Alexandrov, S Teichmann, H Hegyi, J Lin, M Snyder, P Ross-MacDonald, A Kumar, P Coelho, D Greenbaum, P Bertone, A Drawid
     bioinfo.mbb.yale.edu/genome

  2. Genomes highlight the Finiteness of the World of Sequences
     • 1995: Bacteria, 1.6 Mb, ~1600 genes [Science 269: 496]
     • 1997: Eukaryote, 13 Mb, ~6K genes [Nature 387: 1]
     • 1998: Animal, ~100 Mb, ~20K genes [Science 282: 1945]
     • 2000?: Human, ~3 Gb, ~100K genes [???]

  3. The World of Structures is even more Finite, providing a valuable simplification. Likewise for the world of pathways, functions, regulatory systems.... Cross-referencing: http://ecocyc.pangeasystems.com/ecocyc/ecocyc.html
     [Figure labels: ~100000 genes (human); ~1000 genes (T. pallidum); ~1000 folds; 229 Folds; 91 Functions]

  4. Fold Library vs. Other Fundamental Data Structures. Parts List Database; statistical, rather than mathematical, relationships and conclusions. Larger than physics and chemistry. Similar to finance: exact finite number of objects (3,056 on NYSE by 1/98), described by standardized statistics (even abbrevs., INTC) and groups (sectors). Smaller than social surveys: indefinite number of people, not well-defined vocabulary and statistics.

  5. Building a Library of Known Folds. Alignment of Individual Structures (Hb, Mb); Fusing into a Single Core Structure Template; Statistics to Establish Relationships (P-values).
     [Figure labels: P<.000001, P<.001, P~1]

  6. Cross-Reference: Folds, Sequences, Organisms
     (1) Structures in Folds (scop)
     (2) Match Sequences (fasta, blast)
     (3) Organize Sequences by Genome or Taxon
     (4) Results in “Fold Table”
     [Figure label: 3+5]
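The four cross-referencing steps on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `build_fold_table`, the sequence identifiers, and the toy assignments are all hypothetical, and the sequence matching itself (fasta, blast) is assumed to have been run already, so its result is represented as a plain sequence-to-fold dictionary.

```python
from collections import defaultdict

def build_fold_table(seq_to_fold, seq_to_genome):
    """Count, for each fold, how many matched sequences each genome has.

    seq_to_fold:   sequence id -> fold name (result of the matching step)
    seq_to_genome: sequence id -> genome or taxon label
    """
    table = defaultdict(lambda: defaultdict(int))
    for seq_id, fold in seq_to_fold.items():
        genome = seq_to_genome.get(seq_id)
        if genome is None:
            continue  # sequence with no genome label: skipped
        table[fold][genome] += 1
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {fold: dict(counts) for fold, counts in table.items()}

# Toy input (hypothetical ids and assignments, not data from the talk).
seq_to_fold = {"seq1": "TIM barrel", "seq2": "TIM barrel",
               "seq3": "Ig", "seq4": "TIM barrel"}
seq_to_genome = {"seq1": "EC", "seq2": "SC", "seq3": "SC",
                 "seq4": "EC", "seq5": "HI"}

fold_table = build_fold_table(seq_to_fold, seq_to_genome)
```

Rows of `fold_table` are folds and columns are genomes, which is one plausible reading of the “Fold Table” named on the slide.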

  7. Venn Diagrams for Shared Folds
     • ~300-350 folds (282 folds in scop 1.32 [‘96])
     • ~120K sequences in OWL 27.1
     • 7 phylogenetic groups of organisms
     • 5 genomes -- HI, EC (bacteria), MJ (archaeon), SC (eukaryote), CE (worm, animal)
     [Figure labels: of 339; α/β]

  8. Cluster Trees Grouping Initial Genomes on Basis of Shared Folds. Fold Tree vs. “Classic” Tree.
     D = S/T, where D = shared fold dist. betw. 2 genomes, S = # shared folds, T = total # folds in both.
     Example (Venn diagram regions 20, 10, 30): D = 10/(20+10+30).
     [Figure label: 20 Genomes]
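The measure D = S/T from this slide can be written directly on sets of fold names. The sketch below is illustrative only; the function name and the placeholder fold labels are assumptions, and T is read as the size of the union of the two fold sets, which matches the slide's worked example 10/(20+10+30).

```python
def shared_fold_distance(folds_a, folds_b):
    """D = S/T: shared folds divided by total distinct folds in both genomes."""
    a, b = set(folds_a), set(folds_b)
    shared = len(a & b)   # S: folds found in both genomes
    total = len(a | b)    # T: folds found in either genome
    if total == 0:
        raise ValueError("both fold sets are empty")
    return shared / total

# Reproduce the slide's example: 20 folds only in A, 10 shared, 30 only in B.
genome_a = {f"fold{i}" for i in range(30)}       # fold0..fold29
genome_b = {f"fold{i}" for i in range(20, 60)}   # fold20..fold59
d = shared_fold_distance(genome_a, genome_b)     # 10 / 60
```

A matrix of such pairwise values over all genomes is the kind of input a standard tree-building or clustering method would need to produce a fold tree.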

  9. Distribution of Folds in Various Classes. Unusual distribution of all-beta folds.

  10. Top-10 Folds in a Genome. Depends on comparison method, DB, &c (new top superfamilies via ψ-Blast; intersection of top-10 to get shared and common).
     [Figure labels: Yeast, Archaea, Eubacteria]

  11. Characteristics of Common, Shared Folds: βαβ structure. All share α/β structure with repeated R.H. βαβ units connecting adjacent strands or nearly so (18+4+2 of 24). HI, MJ, SC vs scop 1.32. [336: 42]

  12. At What Structural Resolution Are Organisms Different?
     [Figure: scale axis 1 m, 100 Å, 10 Å, 1 Å; levels shown: person, plant; protein fold (Ig); super-secondary structure (ββ, TM-TM, αβαβ, ααα); helix, strand; individual atom (C, H, O...)]
     Practical Relevance: pathogen-only folds as possible drug targets (human vs. T. pallidum).

  13. Bias Problem -> Prediction
     • Known Structures are an Incomplete, Biased Sample from the Genome, so...
     • Resample
     • Solve Structures
     • Predict Structures
     Same Sampling Issues with US Census!!
     [Figure label: Known Struct.]

  14. Analyzing Genomes & Transcriptomes with Structural and Functional Categories (outline, as in slide 1)

  15. TM-helix “prediction”
     • TM prediction (KD, GES). Count number with 2 peaks, 3 peaks, &c.
     • Divide predictions into sure and marginal (Boyd & Beckwith’s criteria).
     • Similar conclusions to others: von Heijne, Rost, Jones, &c.
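Peak counting on a hydropathy profile, as this slide describes, can be sketched as follows. This is a minimal illustration, not the procedure used in the talk: it uses only the Kyte-Doolittle (KD) scale, and the window length (19) and threshold (1.6) are assumed defaults chosen for the example. The sure/marginal split by Boyd & Beckwith's criteria is not reproduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_averages(seq, window=19):
    """Mean KD hydropathy over each full-length sliding window."""
    scores = [KD[aa] for aa in seq]  # KeyError on non-standard residues
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def count_tm_peaks(seq, window=19, threshold=1.6):
    """Count contiguous runs of windows whose average reaches the threshold."""
    peaks = 0
    inside = False
    for avg in window_averages(seq, window):
        if avg >= threshold and not inside:
            peaks += 1          # entering a new hydrophobic stretch
            inside = True
        elif avg < threshold:
            inside = False      # left the stretch; next hit is a new peak
    return peaks

# Synthetic test sequence (not a real protein): two hydrophobic stretches.
two_helix = "D" * 30 + "L" * 22 + "D" * 30 + "I" * 22 + "D" * 30
n_peaks = count_tm_peaks(two_helix)
```

Running `count_tm_peaks` over every protein in a genome and tallying the results would give the distribution of proteins with 2 peaks, 3 peaks, and so on.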

  16. Comparative Genomics of Membrane Proteins
     • Yeast has more mem. prots., esp. 2-TMs.
     • Overall, no strong preference for particular supersecondary structures.
     • Freq. of number of TM helices follows a Zipf-like law: F = 1/[5n²].
     • In detail, worm has a peak for 7-TMs and E. coli for 12-TMs.
     • Similar conclusions to others: von Heijne, Rost, Jones, &c.
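The Zipf-like law on this slide is easy to tabulate. The sketch below assumes that n is the number of TM helices in a protein and that F is the fraction of proteins with that many helices; the slide gives only the formula, so this reading and the function name are assumptions.

```python
def zipf_tm_frequency(n):
    """Zipf-like law from the slide: F = 1 / (5 * n**2)."""
    if n < 1:
        raise ValueError("n must be a positive number of TM helices")
    return 1.0 / (5 * n ** 2)

# Expected frequencies for 1 to 12 helices under the law.
expected = {n: zipf_tm_frequency(n) for n in range(1, 13)}
```

Comparing observed counts against `expected` is one way to make deviations visible, such as the peaks the slide reports for 7-TM proteins in worm and 12-TM proteins in E. coli.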

  17. 2º Structure Prediction
     • Bulk prediction of 2º struc. in genomes
     • Same fraction of α and β (by element, half each)
     • Both overall and only for unknown soluble proteins.
     • Diff. from PDB: 31% helical and 21% strand.
     • Related results: Frishman
     Not expected since......

  18. Different Amino Acid Composition Should Give Different 2º Structure Each a.a. has different propensity for local structure -> Different Compositions (K from 4.4 in EC to 10.4 in MJ, Q too) -> Different Local Structure (but compensation?) Propensities from Regan (beta) and Baldwin (alpha)

  19. How Representative are the Known Structures of the Proteins in a Complete Genome? The issue of Bias
     • Assess 2º, TM predictions
       (+) comprehensive, statistical
       (-) predictions inaccurate (~65%)
       (-) extrapolate from PDB (esp. TM), domain problem
     • Is prediction (extrapolation) based on known structures justified?
     • Length: Genome Sequences are longer than those in Known Structures
       340 aa for avg. genome seq. (470 aa for yeast); 205 aa for PDB chain; ~160 aa for PDB domain

  20. Amino Acid Composition How Representative are the Known Structures of the Proteins in Complete Genome?

  21. Analyzing Genomes & Transcriptomes with Structural and Functional Categories (outline, as in slide 1)

  22. Gene Expression Datasets: the Transcriptome. Yeast Expression Data in Academia: levels for all 6000 genes! X-ref. with other genome data: protein fold features common in Transcriptome....
     • Young/Lander, Chips, Abs. Exp.
     • Brown, μarray, Rel. Exp. over Timecourse
     • Snyder, Transposons, Protein Exp.
     • Also: SAGE; Samson and Church, Chips; Aebersold, Protein Expression

  23. cDNA μArrays From: Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM. Expression profiling using cDNA microarrays. Nat Genet 1999 Jan;21(1 Suppl):10-4 Brown PO, Botstein D. Exploring the new world of the genome with DNA microarrays. Nat Genet 1999 Jan;21(1 Suppl):33-7

  24. What is Happening on DNA Chips From: Southern E, Mir K, Shchepinov M. Molecular interactions on microarrays. Nat Genet 1999 Jan;21(1 Suppl):5-9

  25. DNA Chip Performance Characteristics From: Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet 1999 Jan;21(1 Suppl):20-4

  26. Serial Analysis of Gene Expression (SAGE) From: Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE Jr, Hieter P, Vogelstein B, Kinzler KW. Characterization of the yeast transcriptome. Cell 1997 Jan 24;88(2):243-51

  27. Composition of Genome vs. Transcriptome (Genome Composition, Transcriptome Composition). Enriched: V, G, A. Depleted: N, S.
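One way to compare genome and transcriptome composition is to count each protein once for the genome and weight it by its expression level for the transcriptome. The sketch below follows that reading; the weighting scheme, the relative-difference measure of enrichment, the function names, and the toy sequences are all assumptions made for illustration, not definitions taken from the talk.

```python
from collections import Counter

def composition(sequences, weights=None):
    """Amino acid fractions; each sequence counts `weight` times."""
    if weights is None:
        weights = [1.0] * len(sequences)   # genome: every gene counts once
    totals = Counter()
    for seq, w in zip(sequences, weights):
        for aa, n in Counter(seq).items():
            totals[aa] += w * n
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no residues to count")
    return {aa: c / grand_total for aa, c in totals.items()}

def enrichment(genome_comp, transcriptome_comp):
    """Relative change of each amino acid, transcriptome vs. genome."""
    return {aa: (transcriptome_comp.get(aa, 0.0) - f) / f
            for aa, f in genome_comp.items()}

# Toy data (hypothetical): one highly expressed and one weakly expressed gene.
proteins = ["AAAG", "NNNS"]
expression = [9.0, 1.0]

genome_comp = composition(proteins)
transcriptome_comp = composition(proteins, expression)
enr = enrichment(genome_comp, transcriptome_comp)
```

In this toy case A and G come out positive and N and S negative, simply because the A/G-rich protein was given the higher expression level.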

  28. Amino Acid Composition in the Genome and the Transcriptome

  29. Composition of Transcriptome in terms of Broad Structural Classes

  30. Which Protein Folds are Highly Expressed? Top-10 folds in genome and transcriptome

  31. Composition of Transcriptome in terms of Functional Classes. Classes marked: Prot. Syn., energy, unclassified, transcription, transport, signaling.

  32. Broad Categories Const. in Transcriptome over Timecourse, Not Specific Genes (or Folds). Brown cDNA microarray expts. are not as useful for X-ref. at individual timepts. Nevertheless, they show the same aa composition and fold class usage at different timepts. However, the top fold changes, and also specific TM proteins....

  33. Different Classes of Membrane Proteins Have Different Changes in Expression Level (esp. 12 TMs). One column gives the expression in aerobic conditions (high sugar, second time-series data point in DeRisi et al.), and the other column, in anaerobic conditions (low sugar, high ethanol, last time-series data point in DeRisi et al.). 9 hexose permeases, 1 lactate transporter.

  34. Correlate Expression Level with Functional Category. MIPS YFC: 66 bottom classes, 10 top classes. Average correlation of uncharacterized genes is 0.16. Similar to Botstein analysis.
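A simple way to score how coherently a functional class is expressed is the mean Pearson correlation over all pairs of expression profiles in the class. The sketch below assumes that definition; the slide does not state how its averages were computed, and the function names and toy profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError for a constant profile (zero variance).
    return cov / sqrt(var_x * var_y)

def mean_pairwise_correlation(profiles):
    """Average correlation over all pairs of profiles in one class."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)

# Toy class of three genes over four timepoints (hypothetical values).
class_profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
avg_corr = mean_pairwise_correlation(class_profiles)
```

Applying the same function to randomly drawn sets of ORFs of matching size would give a baseline against which a class's value can be judged.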

  35. Results from Analysis of Correlation of Functional Class and Expression Highest Correlations • Many groups of genes categorized by MIPS do not have higher correlation than random ORFs • Smaller groups tend to have a slightly higher correlation

  36. Going One Step Further: Protein Abundance Levels Gygi/Aebersold: Quantitation of Protein Expression Levels

  37. Comparing Protein and mRNA Abundance Levels

  38. Going One Step Further: Protein Abundance Levels
     • 116 proteins from Aebersold data (small sample)
     • By definition highly expressed
     • Similar enrichment of amino acids in proteins and mRNA

  39. Analyzing Gene Expression with Transposon System (Michael Snyder)

  40. Amino Acid Enrichment: lacZ Fusions Compared with mRNA Expression Levels

  41. Whole Genome Phenotype Profiles (M Snyder). Transposon insertions into (almost) each yeast gene to see how yeast is affected in 20 conditions. Generates a phenotype pattern vector, which can be treated similarly to expression data. Clustering Conditions.
     [Figure labels: rows YBR01w, YAL009c, YCL029c, YBR102c, YMR009c, YER021w; columns <-- Conditions -->; legend: WT, Affected by Cold, Affected by Another Condition]

  42. Phenotype ORF Clustering. k-means clustering of ORFs based on “phenotype patterns,” cross-ref. to MIPS Functional Classes. Cluster showing cold phenotype (containing genes most necessary in cold) is enriched in metabolic functions.
     [Figure labels: 20 Conditions; 28 ORFs in cluster; Metabolism; Cold]
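The k-means step named on this slide can be sketched in plain Python. This is a bare-bones illustration under stated assumptions: centroids are initialised deterministically from the first k vectors, the iteration count is fixed, and the phenotype vectors are toy values over three conditions rather than the 20 conditions of the actual experiment.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two phenotype vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def k_means(vectors, k, iterations=20):
    """Cluster vectors into k groups; returns (assignments, centroids)."""
    centroids = [list(v) for v in vectors[:k]]   # assumed simple initialisation
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Toy phenotype vectors (hypothetical): columns are three growth conditions.
phenotypes = [
    [1.0, 0.0, 0.0],   # affected mainly in condition 1
    [0.0, 1.0, 1.0],   # affected in conditions 2 and 3
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 1.0],
]
assignments, centroids = k_means(phenotypes, k=2)
```

Once ORFs are grouped this way, each cluster's members can be looked up in a functional classification to check for enrichment, as the slide does with the MIPS classes.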

  43. Analyzing Genomes & Transcriptomes with Structural and Functional Categories (outline, as in slide 1)