The RCSB Protein Data Bank: Teaching an Old Dog New Tricks

Presentation Transcript


  1. The RCSB Protein Data Bank: Teaching an Old Dog New Tricks. Philip E. Bourne, pbourne@ucsd.edu

  2. Biocurator Perspectives: A Tribute. From the guardian of a resource (institution) to all those men and women who make biology possible – may we never take you for granted.

  3. Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External • What will the resource look like in 2-5 years?

  4. History of the Old Dog
     1970s • Community discussions about how to establish an archive of protein structures • Cold Spring Harbor meeting on protein crystallography • PDB established at Brookhaven (October 1971; 7 structures)
     1980s • Number of structures increases as technology improves • Community discussions about requiring depositions • IUCr guidelines established • Number of structures deposited increases
     1990s • Ontology defined • Structural genomics begins • PDB moves to RCSB
     2000s • wwPDB formed

  6. Unchanging Core Mission • Create and maintain a well-curated database of macromolecular structure data derived using experimental methods that is… • Always accessible to a diverse user community worldwide • Developed in collaboration with that community that will… • Facilitate and support scientific research and education

  7. Challenges - Scientific • More complex structures – molecular machines, complexes • New methods (e.g., electron microscopy) • Lack of a vocabulary for a reductionist description of complex structures • Partially solved problems in analyzing structures – structure alignments, domain definitions, functional site determination and characterization, pathway relationships, interaction partners • Integrating microscopic and macroscopic views • Disease relationships

  8. Growth and Complexity [Figure: number of released entries by year]

  9. Data Integration [Figure: primary references, derived references, and some actions]
     References: Structure • SCOP • PFAM • PubMed • Enzyme Commission • Source Organism • SWISS-PROT/GenBank IDs • NCBI Taxonomy • Gene Ontology • OMIM/Disease • Genomes (NCBI Gene) • Structural Genomics Targets • Human Proteome & Homology Models • CATH Domains/Families
     Actions: Function Coverage • Target Selection • CATH Browser • SCOP Browser • PFAM Display • Source Organism Browser • Abstract Search • Enzyme Browser • Reactome • Target Search • Disease Browser • Genome Browser • SNPs Mapped to Structure • Find Structures by SP ID • GO Browsers • Find Structures by GO ID
     NAR 2005, 33: D233-D237

  10. Challenges - Technical • Sheer numbers • Efficient visualization • Improved annotation • Demands from a more diverse user base • Centralization versus decentralization • Web 2.0

  11. Diverse User Community (180,000 individuals per month) and Diversifying Further • Structural biologists • Computational biologists • Experimental biologists • Educators • Students • Lay public

  12. Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External • What will the resource look like in 2-5 years?

  13. New Tricks – Protein Representation The conventional view of a protein (left) has had a remarkable impact on our understanding of living systems, but is it time for a new view? After all, it is not how one protein sees another.

  14. Limitations of a Cartesian Viewpoint • A local viewpoint – does not capture the global properties of the protein • Limited to a single-scale descriptor • Limits comparative analysis New Tricks – Protein Representation

  15. Protein Kinase A – Open Book View

  16. Superfamily Members – The Same But Different

  17. Alignment Violates the Triangle Inequality • Many of the features in the distance matrix may be due to “distortions” induced by the failure to satisfy the triangle inequality (TI). [Figure: protein kinase-like superfamily. Left – RMSD distance matrix. Right – number of violations of the triangle inequality for each pair of proteins.] New Tricks – Protein Representation
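
Counting violations like those in the right-hand panel only requires the pairwise distance matrix. A minimal NumPy sketch, assuming a symmetric matrix D of pairwise RMSD values; the function name and tolerance are illustrative and not taken from the original analysis:

```python
import numpy as np

def count_triangle_violations(D, tol=1e-9):
    """For each pair (i, j), count the third points k for which
    D[i, j] > D[i, k] + D[k, j], i.e. the triangle inequality fails.

    D is assumed to be a symmetric matrix of pairwise distances,
    e.g. RMSD values from pairwise structure alignments.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # via[i, k, j] = D[i, k] + D[k, j]: length of the detour through k
    via = D[:, :, None] + D[None, :, :]
    # violated[i, k, j] is True when the direct distance exceeds the detour
    violated = D[:, None, :] > via + tol
    # ignore degenerate triples where k coincides with i or with j
    idx = np.arange(n)
    violated[idx, idx, :] = False
    violated[:, idx, idx] = False
    # sum over k: result has shape (n, n), one count per protein pair
    return violated.sum(axis=1)

# Toy example: 0 and 2 are "far" (5) but each is close (1) to protein 1
D = [[0, 1, 5],
     [1, 0, 1],
     [5, 1, 0]]
print(count_triangle_violations(D))
```

In this toy matrix only the pair (0, 2) is affected, because going through protein 1 gives a shorter path than the direct distance.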

  18. An Alternative Approach: Multipolar Representation • Roots in spherical harmonics • Parameter space and boundary conditions can be a variety of properties • Order of the multipoles defines the granularity of the descriptors • Bottom line – interpreted as shape descriptors Gramada & Bourne 2006 BMC Bioinformatics 7:242
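
For orientation, the standard textbook exterior ("out") multipole expansion of the potential produced by point charges $q_i$ at positions $(r_i, \theta_i, \phi_i)$ is shown below (Gaussian units, valid for $r$ larger than every $r_i$). Treating each atom or residue as carrying a charge-like scalar $q_i$ is an assumption here; the exact parameterization and boundary conditions used by Gramada & Bourne (2006) may differ.

```latex
\Phi(r,\theta,\phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l}
  \frac{4\pi}{2l+1}\, Q_{lm}\, \frac{Y_{lm}(\theta,\phi)}{r^{\,l+1}},
\qquad
Q_{lm} = \sum_{i} q_i \, r_i^{\,l}\, Y_{lm}^{*}(\theta_i,\phi_i)
```

Truncating the sum at a maximum rank $l_{\max}$ keeps only the coarser shape features, which is the sense in which the order of the multipoles sets the granularity of the descriptors.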

  19. Results – Protein Kinase-Like Superfamily Alignment (Scheeff & Bourne 2005, PLoS Comp. Biol. 1(5) e49) • Clear distinction between families. • Some clustering within the TPKs that resembles various groups, even though there is little shape discrimination at this level. New Tricks – Protein Representation

  20. Possibilities – Structure-Based Phylogenetic Analysis [Figure panels: Scheeff & Bourne; Multipoles] New Tricks – Protein Representation

  21. Structures exist on a spectrum from order to disorder [Figure: ordered structures to disordered structures] New Tricks – Protein Motion

  22. Obtaining Protein Dynamic Information: Protein Structures Treated as a 3-D Elastic Network. Bahar, I., A.R. Atilgan, and B. Erman, Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, 1997. 2(3): p. 173-181. New Tricks – Protein Motion

  23. Gaussian Network Model • Each Cα is a node in the network. • Each node undergoes Gaussian-distributed fluctuations influenced by neighboring interactions within a given cutoff distance (7 Å). • Protein fluctuations are decomposed into a sum of contributions from different modes. New Tricks – Protein Motion
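
The GNM steps listed above fit in a few lines of NumPy. This is a minimal sketch, assuming an array of Cα coordinates in ångströms and the 7 Å cutoff from the slide; the function names are hypothetical, and the physical prefactor (3kT/γ) is omitted, so only relative fluctuations are returned.

```python
import numpy as np

def kirchhoff_matrix(coords, cutoff=7.0):
    """GNM Kirchhoff (connectivity) matrix from C-alpha coordinates.

    Gamma[i, j] = -1 if residues i != j lie within `cutoff` angstroms,
    Gamma[i, i] = number of contacts of residue i (so rows sum to zero).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma

def gnm_modes(gamma, tol=1e-8):
    """Eigen-decompose Gamma and drop the zero (rigid-body) mode(s).

    Returns the nonzero eigenvalues in ascending order with their
    eigenvectors; the slowest modes (smallest eigenvalues) dominate.
    """
    vals, vecs = np.linalg.eigh(gamma)
    keep = vals > tol
    return vals[keep], vecs[:, keep]

def mean_square_fluctuations(gamma, n_modes=None):
    """Relative <dR_i^2> as a sum over modes of u_{k,i}^2 / lambda_k.

    With all nonzero modes this equals diag(pinv(Gamma)); passing
    n_modes keeps only that many of the slowest modes.
    """
    vals, vecs = gnm_modes(gamma)
    if n_modes is not None:
        vals, vecs = vals[:n_modes], vecs[:, :n_modes]
    return (vecs ** 2 / vals).sum(axis=1)

# Hypothetical straight chain: 10 residues spaced 3.8 A apart
coords = [[3.8 * i, 0.0, 0.0] for i in range(10)]
gamma = kirchhoff_matrix(coords)
print(mean_square_fluctuations(gamma))
```

For this toy chain only consecutive residues are in contact (7.6 Å exceeds the cutoff), and the chain ends come out as the most mobile residues, as expected for an elastic network.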

  24. Functional Flexibility Score • Use correlated motions to help identify flexible regions of functional importance. • Functionally Flexible Score: for each residue, find the maximum and minimum correlation, and use them to scale the normalized fluctuation to determine functional importance. Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
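
The slide names the ingredients of the score but not the exact formula that combines them. The sketch below computes only those ingredients from a GNM Kirchhoff matrix: a per-residue normalized fluctuation (normalized here by its maximum, which is an assumption) and each residue's maximum and minimum correlation with any other residue. The published combination in Gu, Gribskov & Bourne (2006) is deliberately not reproduced, and the function names are hypothetical.

```python
import numpy as np

def gnm_correlations(gamma, tol=1e-8):
    """Normalized cross-correlations C[i, j] = G[i, j] / sqrt(G[i, i] G[j, j]),
    where G is the pseudo-inverse of the Kirchhoff matrix (all nonzero modes).
    Also returns diag(G), the relative mean-square fluctuations."""
    vals, vecs = np.linalg.eigh(gamma)
    keep = vals > tol                       # drop the rigid-body (zero) mode
    g = (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T
    d = np.sqrt(np.diag(g))
    return g / np.outer(d, d), np.diag(g)

def ffs_ingredients(gamma):
    """Per-residue inputs named on the slide: normalized fluctuation, plus the
    maximum and minimum correlation with any other residue. How these are
    combined into the published score is NOT implemented here."""
    corr, msf = gnm_correlations(gamma)
    norm_fluct = msf / msf.max()            # assumption: scale to [0, 1]
    off = corr.copy()
    np.fill_diagonal(off, np.nan)           # exclude self-correlation (always 1)
    return norm_fluct, np.nanmax(off, axis=1), np.nanmin(off, axis=1)
```

Any connected Kirchhoff matrix works as input, for example one built with a 7 Å contact cutoff from Cα coordinates.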

  25. Identifying Functionally Flexible Regions (FFRs) in HIV Protease Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

  26. Other Examples BPTI and Calmodulin Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

  27. Side Note: Gaussian Network Model vs Molecular Dynamics • GNM is relatively coarse-grained • GNM is fast to compute compared with MD • Probes larger time scales • Suitable for high-throughput analysis New Tricks – Protein Motion

  28. An Active Research Program Around the Resource is Good for the Resource

  29. Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External • What will the resource look like in 2-5 years?

  30. Single worldwide archive of macromolecular structural data • Ensures that the PDB remains a single & uniform archive publicly available to the worldwide community • 3 founding members: RCSB PDB, PDBj, MSD-EBI Virtual Communities - Internal

  31. wwPDB Activities • Collaborative projects • Remediation (taxonomy, ligands, literature) • Single data processing system Virtual Communities - Internal

  32. Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External (modeling, other….) • What will the resource look like in 2-5 years?

  33. Virtual Communities - External Consider the PDB a gathering point through which virtual and real communities interact with each other around a common interest

  34. Virtual Communities - External • Real: traveling art exhibit for lay audiences • NJ Science Olympiad • Science Expo • Virtual: website tutorials/feedback • Molecule of the Month • PDB-in-a-CAVE

  35. Virtual Communities - Modelers • Recommendations of the workshop: • PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules • A central, publicly available archive (or technical equivalent thereof) or portal should be established for models • It was unanimously agreed that methods for assessing model quality are essential (Structure 2006, to be published)

  36. Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External • What will the resource look like in 2-5 years?

  37. What Will the Resource Look Like in the Next 2-5 Years? • Upwards of 75,000 structures • Consensus (and differing) views at the micro and macro scale – domains, SNPs, gene structure, cell localization, pathways, interactions, post-translational modification… • Community annotation (cf. Wikipedia) • Distributed subsets - External Reference Files (XML) • MyPDB • PDB-in-a-box • Specialized visualization tools (mbt.sdsc.edu)

  38. Is a database really different from a biological journal? (PLoS Comp Biol 2005 1(3) e34)
      The Knowledge and Data Cycle:
      0. Full text of PLoS papers stored in a database
      1. A link brings up figures from the paper
      2. Clicking the paper figure retrieves data from the PDB, which is analyzed
      3. A composite view of journal and database content results
      4. The composite view has links to pertinent blocks of literature text and back to the PDB
      • Now assigning DOIs to structures
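
DOIs for PDB entries follow the pattern 10.2210/pdb&lt;id&gt;/pdb, with the four-character PDB ID in lower case (for example, 10.2210/pdb4hhb/pdb). A small helper that builds the DOI and a doi.org link could look like this; the function names and the simple ID check are hypothetical illustrations.

```python
def pdb_doi(pdb_id: str) -> str:
    """Return the DOI for a classic four-character PDB ID."""
    pid = pdb_id.strip().lower()
    # classic PDB IDs: a digit 1-9 followed by three alphanumeric characters
    if len(pid) != 4 or pid[0] not in "123456789" or not pid.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pid}/pdb"

def doi_url(doi: str) -> str:
    """Resolvable link through the doi.org proxy."""
    return f"https://doi.org/{doi}"

print(doi_url(pdb_doi("4HHB")))  # https://doi.org/10.2210/pdb4hhb/pdb
```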

  39. Acknowledgements • The RCSB PDB • NIH, NSF, DOE • Apostol Gramada – multipole analysis • Jenny Gu – protein motions

  40. A Protein Is More than the Union of Its Parts. Breaking the protein into parts changes the object of the comparison. In many cases this is interpreted to mean that the RMSD measure is inadequate. In reality, it is the alignment of structures that breaks the triangle inequality, not the measure per se: we are effectively comparing different objects than we say we are. From Røgen & Fain (2003), PNAS 100:119-124 New Tricks – Protein Representation
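
The claim that the measure itself is not at fault can be checked numerically. When every pair of structures is compared over the same fixed residue correspondence, RMSD after optimal superposition is a metric, so no triangle-inequality violations appear. The sketch below uses the standard Kabsch superposition and assumes equal-length coordinate arrays with row i matched to row i; the random "structures" and function names are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between equal-length coordinate sets after optimal superposition
    (translation + proper rotation) via the Kabsch algorithm.
    Row i of P is assumed to correspond to row i of Q."""
    P = np.asarray(P, float) - np.mean(P, axis=0)
    Q = np.asarray(Q, float) - np.mean(Q, axis=0)
    H = P.T @ Q                                   # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # exclude reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def triangle_violations(D, tol=1e-9):
    """Number of ordered triples (i, k, j) with D[i, j] > D[i, k] + D[k, j]."""
    D = np.asarray(D, float)
    return int((D[:, None, :] > D[:, :, None] + D[None, :, :] + tol).sum())

rng = np.random.default_rng(0)
structs = [rng.normal(size=(20, 3)) * 5 for _ in range(8)]
D = np.array([[kabsch_rmsd(a, b) for b in structs] for a in structs])
print(triangle_violations(D))
```

Violations can only appear once each pair is superposed over its own, alignment-dependent subset of residues, which is the situation described in the slide.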

  41. An Alternative Approach: Multipolar Representation – Roots in Spherical Harmonics • Spatial distribution of a scalar quantity • Charge distribution (i.e., structure) • Scalar potential + boundary conditions • Parameterization Gramada & Bourne 2006 BMC Bioinformatics 7:242 New Tricks – Protein Representation

  42. An Alternative Approach: Multipolar Representation • “Out” multipoles • For a given rank l, they form a (2l+1)-dimensional vector under 3D rotations • Vector algebra applies, so the descriptors have metric properties Gramada & Bourne 2006 BMC Bioinformatics 7:242 New Tricks – Protein Representation
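
Because each rank l transforms as a (2l+1)-component vector under rotations, its norm does not depend on orientation. The per-rank norms can therefore serve as orientation-free shape descriptors and be compared with ordinary vector distances. A minimal NumPy sketch for l = 0, 1, 2, assuming one scalar "charge" per point (for example per Cα). It uses Cartesian tensors in place of spherical harmonics; their per-rank norms agree with the spherical ones up to constant factors. The function name is hypothetical.

```python
import numpy as np

def multipole_invariants(coords, q):
    """Rotation-invariant norms of the rank-0, 1 and 2 multipoles of point
    'charges' q at coords. Rank 1 is a 3-vector and rank 2 a traceless
    symmetric tensor with 5 independent entries (2l + 1 components each)."""
    r = np.asarray(coords, float)
    r = r - r.mean(axis=0)                 # expand about the geometric centre
    q = np.asarray(q, float)
    monopole = q.sum()
    dipole = (q[:, None] * r).sum(axis=0)
    r2 = (r ** 2).sum(axis=1)
    # traceless quadrupole: sum_i q_i (3 r_i r_i^T - |r_i|^2 I)
    quad = 3 * np.einsum("i,ia,ib->ab", q, r, r) - np.eye(3) * (q * r2).sum()
    return np.array([abs(monopole), np.linalg.norm(dipole), np.linalg.norm(quad)])

rng = np.random.default_rng(1)
x = rng.normal(size=(30, 3)) * 10
q = rng.normal(size=30)
print(multipole_invariants(x, q))
```

Euclidean distance between two such invariant vectors satisfies the triangle inequality automatically, which is the metric property referred to above.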

  43. An Alternative Approach: Multipolar Representation • The multipoles can be interpreted as shape descriptors • In principle, the entire series of multipoles allows one to reconstruct the scalar field and therefore the density, i.e., the entire set of Cartesian coordinates of the structure at a geometric level of detail • Partitioning the multipole series according to the various representations of the rotation group allows for a multi-scale description of the structure Gramada & Bourne 2006 BMC Bioinformatics 7:242 New Tricks – Protein Representation
