The RCSB Protein Data Bank: Teaching an Old Dog New Tricks • Philip E. Bourne • pbourne@ucsd.edu
Biocurator Perspectives • A Tribute: From the guardian of a resource (institution) to all those men and women who make biology possible – may we never take you for granted
Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External • What will the resource look like in 2-5 years?
History of the Old Dog 1970s • Community discussions about how to establish an archive of protein structures • Cold Spring Harbor meeting in protein crystallography • PDB established at Brookhaven (October 1971; 7 structures) 1980s • Number of structures increases as technology improves • Community discussions about requiring depositions • IUCr guidelines established • Number of structures deposited increases 1990s • Ontology defined • Structural genomics begins • PDB moves to RCSB 2000s • wwPDB formed
Unchanging Core Mission • Create and maintain a well-curated database of macromolecular structure data derived using experimental methods that is… • Always accessible to a diverse user community worldwide • Developed in collaboration with that community that will… • Facilitate and support scientific research and education
Challenges - Scientific • More complex structures – molecular machines, complexes • New methods (e.g. EM) • Lack of a vocabulary to provide reductionism in complex structures • Partially solved problems in analyzing structures – structure alignments, domain definitions, functional site determination and characterization, pathway relationships, interaction partners • Integrating microscopic and macroscopic views • Disease relationships
Growth and Complexity [Figure: number of released entries by year]
Primary References Derived References Some Actions Data Integration • Function Coverage • Target Selection Human Proteome & Homology Models CATH Domains/ Families • CATH Browser • SCOP Browser • PFAM Display • Source Organism Browser Structure SCOP PFAM Pubmed Enzyme Commission Source Organism SWISS-PROT/ GenBank IDs NCBI Taxonomy • Abstract Search • Enzyme Browser • Reactome Gene Ontology OMIM/ Disease Genomes (NCBI Gene) Structural Genomics Targets • Target Search • Disease Browser • Genome Browser • SNPs Mapped to Structure • Find Structures by SP ID • GO Browsers • Find Structures by GO ID NAR 2005, 33: D233-D237
Challenges - Technical • Sheer numbers • Efficient visualization • Improved annotation • Demands from a more diverse user base • Centralization versus decentralization • Web V2
Diverse User Community (180,000 individuals per month) and Diversifying Further • Structural biologists • Computational biologists • Experimental biologists • Educators • Students • Lay public
Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External • What will the resource look like in 2-5 years?
New Tricks – Protein Representation The conventional view of a protein (left) has had a remarkable impact on our understanding of living systems, but is it time for a new view? It is not how one protein sees another after all.
Limitations of a Cartesian Viewpoint • A local viewpoint – does not capture the global properties of the protein • Limited to a single scale descriptor • Limits comparative analysis New Tricks – Protein Representation
Alignment Violates the Triangle Inequality Many of the features in the distance matrix may be due to “distortions” induced by the failure to satisfy the TI. Protein kinase like superfamily. Left - rmsd distance matrix. Right – number of violations of the triangle inequality at each pair of proteins. New Tricks – Protein Representation
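The per-pair violation count described on this slide can be sketched as follows. This is a minimal illustration, assuming a symmetric pairwise distance matrix; the function name, the tolerance, and the toy values are hypothetical, and the exact counting convention used for the slide's figure is not stated in the source.

```python
def count_triangle_violations(d, tol=1e-9):
    """For each pair (i, j), count the third structures k for which
    d[i][j] > d[i][k] + d[k][j], i.e. the triangle inequality fails."""
    n = len(d)
    counts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            violations = sum(
                1
                for k in range(n)
                if k != i and k != j and d[i][j] > d[i][k] + d[k][j] + tol
            )
            counts[i][j] = counts[j][i] = violations
    return counts


# Toy symmetric "rmsd" matrix for three hypothetical structures (made-up values).
toy = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
violations = count_triangle_violations(toy)
```

In the toy matrix, only the pair (0, 2) is violated, because going through structure 1 is shorter than the direct distance.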
An Alternative Approach: Multipolar Representation • Roots in spherical harmonics • Parameter space and boundary conditions can be a variety of properties • Order of the multipoles defines the granularity of the descriptors • Bottom line – interpreted as shape descriptors Gramada & Bourne 2006 BMC Bioinformatics 7:242
Results – Protein Kinase Like Superfamily Alignment • Scheeff & Bourne 2005 PLoS Comp. Biol., 1(5) e49 • Clear distinction between families. • Some clustering seen inside TPKs that resembles various groups, even though there is little shape discrimination at this level. New Tricks – Protein Representation
Possibilities – Structure Based Phylogenetic Analysis Scheeff & Bourne Multipoles New Tricks – Protein Representation
Structures exist in a spectrum from order to disorder New Tricks – Protein Motion Ordered Structures Disordered Structures
Obtaining Protein Dynamic Information • Protein Structures Treated as a 3-D Elastic Network • Bahar, I., A.R. Atilgan, and B. Erman. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, 1997. 2(3): p. 173-181. New Tricks – Protein Motion
Gaussian Network Model • Each Cα is a node in the network. • Each node undergoes Gaussian-distributed fluctuations influenced by neighboring interactions within a given cutoff distance (7 Å). • Decompose protein fluctuation into a summation of different modes. New Tricks – Protein Motion
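The network construction above can be sketched in a few lines. This is an illustrative sketch, not the code behind the slides: it builds the connectivity (Kirchhoff) matrix from Cα coordinates with the 7 Å cutoff and takes relative mean-square fluctuations from the diagonal of its pseudo-inverse, which equals the sum over all non-zero modes. It assumes the network is connected; the function names and the toy chain are hypothetical, and the physical prefactor (temperature and spring constant) is omitted.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """Connectivity matrix: -1 for node pairs within the cutoff,
    number of contacts on the diagonal."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular matrix; is the network connected?")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def gnm_fluctuations(coords, cutoff=7.0):
    """Relative mean-square fluctuation per node: diagonal of the
    pseudo-inverse of the Kirchhoff matrix. For a connected network,
    pinv(G) = inv(G + J/n) - J/n, where J is the all-ones matrix."""
    n = len(coords)
    gamma = kirchhoff(coords, cutoff)
    shifted = [[gamma[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [inv[i][i] - 1.0 / n for i in range(n)]


# Toy "protein": five nodes on a line, 3.8 Angstrom apart (made-up coordinates),
# so only sequence neighbours fall within the 7 Angstrom cutoff.
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
fluct = gnm_fluctuations(chain)
```

For this toy chain the chain ends fluctuate most and the central node least, which is the qualitative behaviour one would expect from an elastic network.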
Functional Flexibility Score • Utilize correlated movements to help define regional flexibility with functional importance. Functionally Flexible Score For each residue: Find Maximum and Minimum Correlation. Use to scale normalized fluctuation to determine functional importance. Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
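The first step named on this slide, finding each residue's maximum and minimum correlation, can be sketched as below. This is only an illustration under stated assumptions: the input is taken to be a residue-by-residue covariance matrix (for example from a network model), correlations are normalized in the usual way by the square root of the product of the diagonal terms, and the function names and toy values are hypothetical. The slide does not give the formula used to scale the normalized fluctuation, so that step is deliberately not reproduced here.

```python
import math


def normalized_correlations(cov):
    """Normalize a covariance matrix so that every diagonal entry is 1
    and off-diagonal entries lie in [-1, 1]."""
    n = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


def correlation_extremes(corr):
    """For each residue, return (max, min) correlation with any OTHER residue."""
    extremes = []
    for i, row in enumerate(corr):
        others = [c for j, c in enumerate(row) if j != i]
        extremes.append((max(others), min(others)))
    return extremes


# Toy covariance matrix for three hypothetical residues (made-up values).
toy_cov = [
    [4.0, 2.0, -1.0],
    [2.0, 4.0, 0.0],
    [-1.0, 0.0, 1.0],
]
corr = normalized_correlations(toy_cov)
extremes = correlation_extremes(corr)
```

Here residue 0 moves with residue 1 (correlation 0.5) and against residue 2 (correlation -0.5), so its extremes are (0.5, -0.5).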
Identifying FFRs in HIV Protease Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
Other Examples BPTI and Calmodulin Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
Side Note: Gaussian Network Model vs Molecular Dynamics • GNM is relatively coarse-grained • GNM is fast to compute compared with MD • Looks over larger time scales • Suitable for high throughput New Tricks – Protein Motion
An Active Research Program Around the Resource is Good for the Resource
Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External • What will the resource look like in 2-5 years?
Single worldwide archive of macromolecular structural data • Ensures that the PDB remains a single & uniform archive publicly available to the worldwide community • 3 founding members: RCSB PDB, PDBj, MSD-EBI Virtual Communities - Internal
wwPDB Activities • Collaborative projects • Remediation • taxonomy, ligands, literature • Single data processing system Virtual Communities - Internal
Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External (modeling, other….) • What will the resource look like in 2-5 years?
Virtual Communities - External Consider the PDB a gathering point through which a virtual and real community interacts with each other around a common interest
Virtual Communities - External • Real: Traveling art exhibit for lay audiences; NJ Science Olympiad; Science Expo • Virtual: Website; Tutorials/Feedback; Molecule of the Month; PDB-in-a-CAVE
Virtual Communities - Modelers • Recommendations of Workshop • PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules • A central, publicly available archive (or technical equivalent thereof) or portal should be established for models • It was unanimously agreed that methods for assessing model quality are essential Structure 2006 To be published
Agenda • The old dog • New tricks • Thinking differently about proteins • Virtual Communities • Internal (wwPDB) • External • What will the resource look like in 2-5 years?
What Will the Resource Look Like in the Next 2-5 Years? • Upwards of 75,000 structures • Consensus (and different) views at the micro and macro scale – domains, SNPs, gene structure, cell localization, pathways, interactions, post-translational modification… • Community annotation (cf. Wikipedia) • Distributed subsets - External Reference Files (XML) • MyPDB • PDB-in-a-box • Specialized visualization tools (mbt.sdsc.edu)
Is a database really different than a biological journal? PLoS Comp Biol 2005 1(3) e34 • The Knowledge and Data Cycle • 0. Full text of PLoS papers stored in a database • 1. A link brings up figures from the paper • 2. Clicking the paper figure retrieves data from the PDB which is analyzed • 3. A composite view of journal and database content results • 4. The composite view has links to pertinent blocks of literature text and back to the PDB • Now assigning DOIs to structures
Acknowledgements • The RCSB PDB • NIH, NSF, DOE • Apostol Gramada (Multipole Analysis) • Jenny Gu (Protein Motions)
A Protein is More than the Union of its Parts • Breaking the protein into parts changes the object of the comparison • Violation of the triangle inequality is interpreted in many cases to imply that the rmsd measure is inadequate. The reality is that it is the aligning of structures that breaks the triangle inequality, not the measure per se. The reason for the failure is that we effectively compare different objects than we say we do. From Røgen & Fain (2003), PNAS 100:119-124 New Tricks – Protein Representation
An Alternative Approach: Multipolar Representation • Roots in Spherical Harmonics • Charge distribution (i.e. structure) • Scalar potential + boundary conditions • Spatial distribution of a scalar quantity • Parameterization Gramada & Bourne 2006 BMC Bioinformatics 7:242 New Tricks – Protein Representation
An Alternative Approach: Multipolar Representation • “Out” Multipoles • For a given rank l, they form a 2l+1 dimensional vector under 3D rotations • Vector algebra applies => metric properties Gramada & Bourne 2006 BMC Bioinformatics 7:242 New Tricks – Protein Representation
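The 2l+1 statement can be made concrete for rank l = 2. The sketch below is an illustration only and is not the formulation of Gramada & Bourne: it uses the standard Cartesian traceless quadrupole tensor of a weighted point set, which is symmetric and traceless and therefore has 5 = 2l+1 independent components. Its Frobenius norm does not change when the points are rotated, which is the kind of property that lets ordinary vector algebra supply a metric. The function names, unit weights, and toy coordinates are all hypothetical.

```python
import math


def traceless_quadrupole(points, weights=None):
    """Rank-2 Cartesian multipole: Q_ab = sum_i w_i (3 x_a x_b - r^2 delta_ab)."""
    if weights is None:
        weights = [1.0] * len(points)
    q = [[0.0] * 3 for _ in range(3)]
    for w, p in zip(weights, points):
        r2 = sum(c * c for c in p)
        for a in range(3):
            for b in range(3):
                q[a][b] += w * (3.0 * p[a] * p[b] - (r2 if a == b else 0.0))
    return q


def frobenius_norm(q):
    """Euclidean length of the tensor viewed as a vector of components."""
    return math.sqrt(sum(v * v for row in q for v in row))


def rotate_z(points, angle):
    """Rotate every point about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


# Toy point set standing in for atom positions (made-up coordinates).
pts = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
q_original = traceless_quadrupole(pts)
q_rotated = traceless_quadrupole(rotate_z(pts, 0.7))
norm_original = frobenius_norm(q_original)
norm_rotated = frobenius_norm(q_rotated)
```

The individual tensor components change under the rotation, but the two norms agree to rounding error.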
An Alternative Approach: Multipolar Representation • The multipoles can be interpreted as shape descriptors • In principle, from the entire series of multipoles one can reconstruct the scalar field and therefore the density, i.e. the entire set of Cartesian coordinates, and hence the structure at a geometric level of detail • The partitioning of the multipole series according to various representations of the rotational group allows for a multi-scale description of the structure Gramada & Bourne 2006 BMC Bioinformatics 7:242 New Tricks – Protein Representation