Bacterial physiology in the post-genome era

Functional Classification of PSI Proteins to Support High Throughput Biochemical Characterization: Classes of Reciprocal Sequence Homologs (CRSH). Samuel Handelman, Nelson Tong, Jon D. Luff, David P. Lee, André Lazar, Paul Smith, Prasanna Gogate, Rohan Mallelwar and John Hunt.

Presentation Transcript


  1. Functional Classification of PSI Proteins to Support High Throughput Biochemical Characterization: Classes of Reciprocal Sequence Homologs (CRSH). Samuel Handelman, Nelson Tong, Jon D. Luff, David P. Lee, André Lazar, Paul Smith, Prasanna Gogate, Rohan Mallelwar and John Hunt

  2. Bacterial physiology in the post-genome era • Exponential growth in sequence information. • Structural information is more difficult to obtain. Evolution is key to leveraging what we do know. • Direct functional information is scarcer still: evolution and comparative studies are even more critical. [Images: genome images from BacMap (UAlberta) and VirtualLaboratory; protein structure images from NESG (Columbia/Rutgers).]

  3. Even today, most proteins are of unknown biochemical function. [Charts: H. sapiens and E. coli.] 53% “hypothetical”, “putative”, “uncharacterized” or “unknown” (01/23/08); 54% neither identical nor similar to any experimentally validated protein*. Chart labels: “Known” (~4,200 proteins); “Known” (~27,000 proteins). *Genome Information Integration Project and H-Invitational 2 (2007) Nucleic Acids Research 36:D793-799. • Closing this gap lays the groundwork for systems biology.

  4. CRSH Goal: Group Functionally Equivalent Homologs. • Homology clusters contain multiple distinct protein functions. CRSH Approach: • Identify sub-clusters such that all members have equivalent function (in bacteria only).

  5. Topic Overview • CRSH: what they are, why they’re useful • CRSH Web Interface, merits of mapping of TargetDB to protein functional groups • Using CRSH and Gene Neighborhood to predict stable tertiary interactions.

  6. Classes of Reciprocal Sequence Homologs (CRSHs) • Start: predicted proteins from 474 fully sequenced bacterial genomes. • Cluster based on BLAST scores; verify clusters on profile scores. • Split into sub-clusters when multiple members come from a single organism (likely paralogs); verify sub-clusters on profile scores. • Merge sub-clusters into classes if more similar than expected after accounting for inter-organism distances; verify final classes on profile scores. • Result: ~75,000 CRSHs => likely same function. • Main application: gene neighborhood method. Calculate “co-localization” counts for all CRSH pairs (# of times their genes are within 15 kb on chromosomes of fully diverged organisms).
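The co-localization count described on this slide can be illustrated with a small sketch. This is not the authors' implementation: the record layout (CRSH id, chromosome, position), the genome names and the coordinates are hypothetical, the 15-kilobase window is read as 15,000 base pairs, and the "fully diverged organisms" filter and circular chromosomes are not modeled.

```python
from collections import Counter
from itertools import combinations

WINDOW_BP = 15_000  # assumed reading of the slide's 15-kilobase window


def colocalization_counts(genomes, window_bp=WINDOW_BP):
    """Count, per CRSH pair, the genomes in which the pair is co-localized.

    `genomes` maps a genome name to a list of (crsh_id, chromosome, position)
    tuples. This layout is hypothetical. Each genome contributes at most one
    count per pair, and no filter for organism divergence is applied.
    """
    counts = Counter()
    for genes in genomes.values():
        pairs_here = set()
        for (crsh_a, chrom_a, pos_a), (crsh_b, chrom_b, pos_b) in combinations(genes, 2):
            if crsh_a == crsh_b or chrom_a != chrom_b:
                continue  # same class, or genes on different chromosomes
            if abs(pos_a - pos_b) <= window_bp:
                pairs_here.add(frozenset((crsh_a, crsh_b)))
        counts.update(pairs_here)
    return counts


# Made-up coordinates for three genomes.
genomes = {
    "genome_1": [("O1", "chr", 1_000), ("O3", "chr", 9_000), ("O2", "chr", 400_000)],
    "genome_2": [("O1", "chr", 50_000), ("O3", "chr", 62_000)],
    "genome_3": [("O1", "chr", 10_000), ("O3", "chr", 300_000)],
}
counts = colocalization_counts(genomes)
print(counts[frozenset(("O1", "O3"))])  # 2: co-localized in genome_1 and genome_2
```

Counting each genome once per pair keeps a single genome with many paralogs from inflating the count; whether the original method does this is not stated on the slide.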

  7. Split into sub-clusters when multiple members come from a single organism. [Figure: M. tuberculosis RV0859, E. coli PaaJ, A. tumefaciens ATU0502 and A. tumefaciens PcaF; group labels: beta-ketoadipyl CoA thiolases, acetyl-CoA acetyltransferases. Legend (symbol not preserved): indicates a pair of reciprocal closest homologs in their respective organisms.]
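The "reciprocal closest homologs" relation in the legend can be sketched as a reciprocal-best-hit test. The protein names and scores below are invented for illustration, and the nested-dictionary input is an assumed layout rather than the format used by the CRSH pipeline.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return (a, b) pairs that are each other's top-scoring hit.

    scores_ab[a][b] is the similarity score of protein a (organism A) against
    protein b (organism B); scores_ba holds the reverse search. Both inputs
    are hypothetical nested dictionaries.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up scores: organism A has two paralogs (A1, A2), as does organism B.
scores_ab = {
    "A1": {"B1": 250.0, "B2": 180.0},
    "A2": {"B1": 190.0, "B2": 120.0},
}
scores_ba = {
    "B1": {"A1": 245.0, "A2": 185.0},
    "B2": {"A1": 175.0, "A2": 118.0},
}
pairs = reciprocal_best_hits(scores_ab, scores_ba)
print(pairs)  # [('A1', 'B1')]
```

In this toy case A2 and B2 have no reciprocal partner, which is the situation in which paralogs from one organism would be placed in separate sub-clusters.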

  8. Gene Neighborhood Preview (courtesy Marco Punta). Each octagon represents a CRSH (O1, O2, O3, O4, …, ON). [Figure: O1 and O3 shown in Genome 1, Genome 2 and Genome 3.] “Co-localized” = within 15 kb. • Stronger neighborhood conservation => better function predictions. • Insight into function of unknown proteins.

  9. A Fixed Homology Threshold Fails to Reliably Segregate Functionally Equivalent Proteins • Tremendous range in sequence conservation with more or less equivalent conservation of function.

  10. Based on sequence information, you can conclude that two proteins have the same structure, even if you don’t know the structure. We’re working towards an analogous scheme for protein function, but each functional group needs its own cutoff. We propose to do this especially for proteins whose function we do not yet know. Like Rost clusters, but for function. (Graph courtesy Burkhard Rost.)

  11. We have developed a web interface for these CRSH, which is meant for use by experimentalists. • Presently hosted in India (at http://61.8.141.68:8080/Columbia/), will be hosted at the NESG (at www.orthology.org), where CRSH pages will be available for each entry in targetDB. • The CRSH Pages that follow have been mapped to targetDB, so that biologists working in the centers can access them directly.

  12. Within two months we hope for a direct link from the PSI TargetDB gateway to the CRSHs. • CRSHs already have links to BioCyc, a leading bacterial physiology database; links to other functional genomics databases are coming. • A consensus domain architecture schematic will appear shortly.

  13. The applet on the left provides a graphical display of the phylogenetic distribution. In the near future, we’ll add the info from targetDB to this applet and to the table below. • Known complexes in biocyc are targets for structural genomics efforts to solve multi-protein structures. • The genetically co-localizing CRSH are promising secondary targets, as I will explain…

  14. Gene Neighborhood Hypothesis Generation With suggested applications in structural genomics and functional genomics OR Rational ideas have consequences for action; reason necessarily has a constructive function.

  15. Known Stable Complexes Strongly Correlate with Gene Neighborhood. For every pair of CRSH for which complex-membership data is available in BioCyc, we count the instances where the two CRSH appear in a putative operon together. These counts correlate strongly with well-established, well-studied, stable and definitive physical complexes (drawn in this case from BioCyc). These probabilities are overestimated due to the methods used.

  16. Gene Neighborhood has some Correlation with Small Molecule Interaction Partners. For each CRSH, we extract from BioCyc a set of known small molecule interaction partners (ligands, substrates, products, etc.). We excluded very common partners (water, phosphate, ATP, etc.). Because proteins together in operons are often part of the same metabolic pathways or respond to similar chemical signals, it is reasonable to extrapolate small molecule interactions to the conserved gene neighbors. There is a definite correlation. This graph is preliminary; it is likely an underestimate.

  17. This view, which is still in beta, gives the known small-molecule interactions of all of the gene neighbors for a given CRSH, weighted to reflect the strength of gene neighborhood conservation. • As well as providing a starting point for interaction screening, this can make the functional insights provided by the gene neighborhood method more accessible.
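One way to picture the weighted view described on this slide is the sketch below. It assumes the weight of each gene neighbor is simply its co-localization count and that a ligand's score is the sum of the weights of the neighbors known to bind it; the actual weighting used by the web interface is not given in the slides. All CRSH ids, ligands and counts are made up.

```python
from collections import defaultdict


def neighbor_ligand_scores(neighbor_weights, known_ligands, excluded=frozenset()):
    """Rank small molecules known for the gene neighbors of one CRSH.

    neighbor_weights: neighbor CRSH id -> neighborhood-conservation weight
        (assumed here to be a co-localization count).
    known_ligands: CRSH id -> set of known small-molecule partners.
    excluded: very common partners to ignore (e.g. water, phosphate, ATP).
    """
    scores = defaultdict(float)
    for neighbor, weight in neighbor_weights.items():
        for ligand in known_ligands.get(neighbor, ()):
            if ligand not in excluded:
                scores[ligand] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical neighbors of a query CRSH and their known partners.
neighbor_weights = {"CRSH_B": 12, "CRSH_C": 5, "CRSH_D": 2}
known_ligands = {
    "CRSH_B": {"acetyl-CoA", "ATP"},
    "CRSH_C": {"acetyl-CoA", "succinyl-CoA"},
    "CRSH_D": {"water"},
}
ranked = neighbor_ligand_scores(
    neighbor_weights, known_ligands, excluded={"water", "phosphate", "ATP"}
)
print(ranked)  # [('acetyl-CoA', 17.0), ('succinyl-CoA', 5.0)]
```

A ranked list of this kind could serve as a starting point for interaction screening, with the most strongly supported ligands tried first.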

  18. Salvage Pipeline • For structural genomics targets which have been cloned and are soluble, but which have failed to crystallize, we introduce a parallel pipeline to salvage them by adding “known” or predicted protein or small molecule binding partners. • Bonus biology: whole greater than sum of parts. [Figure labels: Crystallize without Partner; Crystallize with Partner.]

  19. Concluding Remarks • We are eager to add links to PSI resources to our CRSH pages – they are intended to facilitate collaboration between structural and functional genomics, in particular. • Functional information can improve the impact of structural genomics efforts, and may provide new salvage pathways for difficult targets.

  20. Thank you: John “The Jersey Eliminator” Hunt; Paul “Schmitty” Smith; Greg “Cassis” Boel; Sai “Full Nelson” Tong; Marco “The Shark” Punta; Burkhard “Wrecking Ball” Rost; Prasanna “Crackerjack” Gogate; Rohan “The Punisher” Mallelwar; Jon “JD” Luff; Liang “Red, White and Thunder” Tong; Howard “Hurricane” Shuman; Dana “Steel Toe” Pe’er; Harmen “H-Bomb” Bussemacher; Larry “The Tank” Chasin; Dre “Enter the Dragon” Lazar; David “Intravenous” Lee; Girish “Bone Breaker” Rao; Stephanie “Bronx” Wong; Diana “1-2-3” Flynn; George “El Pato Loco” Oldan; Allison “Grid Iron” Fay; Jordi “El Chupacabra” Banach; John “Steel” Dworkin; Etay “Aces” Ziv; Chris “Fireball” Wiggins; Gerwald “Sunshine” Jogl; Cal “Howitzer” Lobel; Yongzhao “Downtown” Shao; David “Finger of Death” Draper; Gae “Knuckles” Monteleone; Mike “The Red Baron” Baran; John “Mountain Man” Everett. The Hunt Lab, The NESG, American Heart Association, CF Foundation, NSF.

  21. Consistency in CRSH sequence divergence levels between remote phyla EACH DOT IS A CRSH

  22. Deviation from Evolutionary Consensus in Protein Complexes

  23. Consistency in CRSH sequence divergence levels between remote phyla EACH DOT IS A CRSH
