
Anastasia Nikolskaya Lai-Su Yeh Protein Information Resource Georgetown University Medical Center




  1. PIR: a comprehensive resource for functional analysis of protein sequences and families • Anastasia Nikolskaya, Lai-Su Yeh • Protein Information Resource, Georgetown University Medical Center, Washington, DC

  2. PIR Web Site • NEW web site, soon to become public: http://pir.georgetown.edu (currently an old version) • PIR and UniProt web sites interlinked and cross-navigable • PIR-specific features: • Text Search • Sequence Search • Classification Database Search

  3. iProClass Protein Knowledgebase http://pir.georgetown.edu/iproclass • Integration of protein family, function, structure • Rich links (executive summary + hypertext links) to > 90 databases • Value-added reports for 1.96 million UniProtKB protein entries

  4. Example • Want to find info on chorismate mutases, specifically: • Start with Bacillus subtilis P19080 = CHMU_BACSU • Relatedness to other chorismate mutases: homology, domain architecture • Is it related to E. coli P07022, a well-studied bifunctional enzyme (P-protein, chorismate mutase/prephenate dehydratase)?

  5. iProClass Sequence Report

  6. Protein Analysis: I. Text Search (iProClass) • What can we find about “chorismate mutase”?

  7. Text Search Results (I) • UniProt ID

  8. Text Search Results (II) • Display options: add or remove columns

  9. Text Search Results (III) Find chorismate mutase(s) from B. subtilis

  10. Determining Protein Homology • Is B. subtilis CM P19080 homologous to E. coli P-protein P07022? To B. subtilis AroA(G) P39912? • Which domains, if any, in multidomain chorismate mutases does it correspond to? • What kinds of domain architecture exist in chorismate mutases?

  11. Batch Retrieval • Retrieve proteins by UID in batch mode • ID mapping option: can use various non-UniProt IDs

  12. Determining Protein Homology: Sequence Search • BLAST • FASTA • SSEARCH

  13. BLAST Search Results • A BLAST query with UniProt sequence P19080 returns PIRSF005965 family members as best hits

  14. Pre-compiled Related Sequences: saves time

  15. BLAST/SSEARCH Results • SSEARCH Alignment • BLAST Alignment

  16. Determining Protein Homology: Peptide Search

  17. Peptide Search Results

  18. Family Classification System: One-Stop Platform for Protein Analysis • Protein families reflect evolutionary relationships • Function often follows along the family lines • Therefore, matching a protein sequence to a protein family provides information about the protein (this requires a highly curated and annotated family) • Faster and often more accurate than searching against a protein database • Protein classification facilitates sequence and functional analysis of proteins and is used for accurate automatic annotation (PIRSF is used for UniProt annotation)

  19. PIRSF Classification System • PIRSF: reflects evolutionary relationships of full-length proteins • Definitions: • Basic unit = Homeomorphic Family • Homologous: Inferred by sequence similarity • Homeomorphic: Full-length sequence similarity and common domain architecture • Hierarchy: Flexible number of levels with varying degrees of sequence conservation; Network Structure: multiple domain parents • Advantages: • Annotation of both generic biochemical and specific biological functions • Accurate propagation of annotation and development of standardized protein nomenclature and ontology

  20. PIRSF Classification System A protein may be assigned to only one homeomorphic family, which may have zero or more child nodes and zero or more parent nodes. Each homeomorphic family may have as many domain superfamily parents as its members have domains.
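The constraints stated on this slide can be sketched as a small data structure. This is only an illustration: the class and field names are hypothetical and do not reflect PIR's actual schema, and using a Pfam identifier as a stand-in for a domain superfamily node is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a PIRSF-like classification network (hypothetical model)."""
    node_id: str
    kind: str  # "domain_superfamily", "homeomorphic_family", or "subfamily"
    parents: set = field(default_factory=set)
    children: set = field(default_factory=set)


class Classification:
    def __init__(self):
        self.nodes = {}
        self.family_of = {}  # protein accession -> homeomorphic family id

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def link(self, parent_id, child_id):
        """Record a parent/child edge; a node may have several parents."""
        self.nodes[parent_id].children.add(child_id)
        self.nodes[child_id].parents.add(parent_id)

    def assign(self, accession, family_id):
        """Assign a protein to exactly one homeomorphic family."""
        if self.nodes[family_id].kind != "homeomorphic_family":
            raise ValueError(f"{family_id} is not a homeomorphic family")
        current = self.family_of.get(accession)
        if current is not None and current != family_id:
            raise ValueError(f"{accession} already belongs to {current}")
        self.family_of[accession] = family_id


# Identifiers taken from later slides; the node kinds are assumptions.
c = Classification()
c.add_node("PF07736", "domain_superfamily")
c.add_node("PIRSF005965", "homeomorphic_family")
c.link("PF07736", "PIRSF005965")
c.assign("Q8Y5X7", "PIRSF005965")
```

Keeping parents as a set rather than a single reference is what lets a family sit under several domain superfamilies, which is why the hierarchy is a network rather than a tree.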

  21. [Workflow diagram, steps numbered 1 to 8 in the original figure] Input: unclassified UniProtKB proteins (new proteins, unassigned proteins) • Automatic procedure: automatic clustering into uncurated homeomorphic clusters (with orphans); map domains on clusters; automatic placement • Computer-assisted manual curation: merge/split clusters and add/remove members to give preliminary homeomorphic families; build hierarchies (superfamilies/subfamilies); add name, refs, abstract, domain architecture to give final families, subfamilies, superfamilies; build and test HMMs; protein name rules/site rules

  22. Tool: Curator’s Decision Maker

  23. Classification Tool: BlastClust • Curator-guided clustering • Single-linkage clustering using BLAST • Retrieve all proteins sharing a common domain • Iterative BlastClust (fixed length coverage)
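Single-linkage clustering with a length-coverage cutoff can be sketched as follows. This is a minimal illustration, not the BlastClust implementation: the input format (pairs of proteins with an alignment coverage fraction), the function name, and the 0.8 default cutoff are all assumptions.

```python
def single_linkage(proteins, hits, min_coverage=0.8):
    """Cluster proteins by single linkage.

    hits: iterable of (a, b, coverage) tuples, where coverage is the fraction
    of sequence length spanned by the alignment (hypothetical input format).
    Two proteins share a cluster if a chain of qualifying hits connects them.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent pointers to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, coverage in hits:
        if coverage >= min_coverage:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Placeholder protein names and coverage values, for illustration only.
proteins = ["A", "B", "C", "D"]
hits = [("A", "B", 0.95), ("B", "C", 0.90), ("C", "D", 0.40)]
clusters = single_linkage(proteins, hits)
```

Here A and C land in the same cluster without a direct hit, because B links them. Such chaining is characteristic of single linkage, and it is one reason curator guidance and a coverage cutoff are useful.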

  24. Family Analysis of Homologous Proteins • Fully Curated Protein Family: • Especially important when the protein of interest is underannotated or misannotated (happens often!) • Evidence types: Characterized (validated), Predicted (by computational methods) or Uncharacterized • Preliminary or Uncurated Family • Have to do some analysis OR contact PIR and ask to prioritize this family • No Family Classification • Have to do some analysis OR contact PIR and ask to prioritize this family PIRSF - blank iProClass search

  25. Providing more information Underannotated Proteins Search iProClass with PIRSF005965

  26. PIRSF SCAN (sequence search) • Returns only matches to fully curated PIRSFs • UniProt sequence Q8Y5X7 is automatically classified as chorismate mutase of the AroH class, PIRSF005965

  27. PIRSF Family Report: Curated Protein Family Information • Taxonomic distribution of a PIRSF can be used to infer the evolutionary history of its proteins • Phylogenetic tree and alignment view allows further sequence analysis

  28. PIRSF Family Report (II) • Integrated value-added information from other databases • Mapping to other protein classification databases

  29. Chorismate Mutase Results from iProClass Analysis • A BLAST search with CM from B. subtilis (P19080) does not retrieve B. subtilis AroA(G) or E. coli P-protein (or related proteins) • Contains a different Pfam domain • Identical conserved motifs are not found • Conclusion: NOT homologous • Use the PIRSF family database for the same analysis: • PIRSF reports: abstracts contain most of this info • PIRSF domain architecture (curated or uncurated): Pfam and newly defined domains • Structure information (PDB links) • Hierarchy in DAG (under development)

  30. New domain PIRSF Text Search AroA(G)

  31. Chorismate Mutase: AroQ and AroH • Convergent Evolution: EC 5.4.99.5 (Non-Orthologous Gene Displacement) • Two Distinct Sequence/Structure Types • AroQ Class: SCOP (all α), core: 6 helices, bundle • AroH Class: SCOP (α+β), core: beta-alpha-beta-alpha-beta(2) • Two Pfam Domains: PF01817, PF07736 (new Pfam domain)
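The domain-level part of this comparison can be sketched as a set intersection. The per-protein domain assignments below are read off the slides and are an assumption of the sketch. A real analysis would take them from iProClass or Pfam reports, and a missing shared domain is only one line of evidence alongside BLAST results and motif comparison.

```python
# Pfam domain content per protein, as inferred from the slides (assumption).
DOMAINS = {
    "P19080": {"PF07736"},  # B. subtilis CM, AroH class
    "P07022": {"PF01817"},  # E. coli P-protein; only its CM domain is listed
}


def shared_domains(acc_a, acc_b, domains=DOMAINS):
    """Return the Pfam domains that two proteins have in common."""
    return domains[acc_a] & domains[acc_b]


common = shared_domains("P19080", "P07022")
print(common or "no shared Pfam domain")
```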

  32. Developing DAG Viewer • The network structure (DAG) of the PIRSF classification system reflects the PIRSF family hierarchy, which is based on evolutionary relationships • Before: all chorismate mutase proteins and families hit PF01817, including PIRSF005965 (not homologous to the rest)

  33. DAG Viewer (II) • After: Pfam created a new domain, PF07736, which is found in PIRSF005965 members • “Orphans”: no family classification

  34. PIR Team • Dr. Cathy Wu, Director • Protein Classification team: Dr. Winona Barker; Dr. Lai-Su Yeh; Dr. Anastasia Nikolskaya; Dr. Darren Natale; Dr. Zhang-Zhi Hu; Dr. Raja Mazumder; Dr. CR Vinayaka; Dr. Sona Vasudevan; Dr. Cecilia Arighi • Informatics team: Dr. Hongzhan Huang; Dr. Peter McGarvey; Baris Suzek, M.S.; Sehee Chung, M.S.; Dr. Leslie Arminski; Dr. Hsing-Kuo Hua; Yongxing Chen, M.S.; Jian Zhang, M.S.; Dr. Xin Yuan • Students: Christina Fang; Vincent Hermoso; Natalia Petrova • UniProt is supported by the National Institutes of Health, grant # 1 U01 HG02712-01
