
Proteomics Resources at the EBI

Sandra Orchard, EMBL-EBI


Presentation Transcript


  1. Proteomics Resources at the EBI Sandra Orchard EMBL-EBI

  2. What do protein scientists require? 1. Protein identification: a high-quality, non-redundant protein database with maximal coverage, including splice isoforms, disease variants and PTMs, to act as a reference set; stable identifiers and sequence archiving are essential. 2. Protein annotation: detailed information on protein function, biological processes, molecular interactions and pathways, cross-referenced to external sources. 3. Reference data sets: comparative datasets for comparing tissue-specificity patterns and normal/disease protein sets.

  3. Where do we go from here? Sequence similarity programs are run against UniProt. What is UniProt? It is based on the original work on PIR, Swiss-Prot and TrEMBL, is funded mainly by the NIH, and is a collaboration between the EBI, SIB and PIR.

  4. UniProt Consortium

  5. UniProt data sources and data flow [diagram]. Database sources: INSDC (incl. WGS and environmental sequences), RefSeq, Ensembl, VEGA, FlyBase, WormBase, PDB, patent data and Sub/Peptide data. Resources shown: UniParc, UniProtKB, UniSave, UniMes, UniRef 100, UniRef 90, UniRef 50, IPI and proteome sets.

  6. UniProtKB • UniProt Knowledgebase: aims to describe in a single record all protein products derived from a given gene in a given species • Two sections: • UniProtKB/Swiss-Prot: non-redundant, high-quality manual annotation (reviewed) • UniProtKB/TrEMBL: redundant, automatically annotated (unreviewed) • www.uniprot.org
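The reviewed/unreviewed split is visible in UniProtKB FASTA headers. Swiss-Prot entries start with `>sp|` and TrEMBL entries with `>tr|`, followed by the accession and the entry name. The minimal Python sketch below reads that flag. The regular expression, the `UniProtHeader` class and `parse_header` are illustrative assumptions, not an official parser.

```python
import re
from dataclasses import dataclass

# ">sp|..." = UniProtKB/Swiss-Prot (reviewed), ">tr|..." = UniProtKB/TrEMBL (unreviewed)
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s*(.*)$")


@dataclass
class UniProtHeader:
    reviewed: bool     # True for Swiss-Prot, False for TrEMBL
    accession: str     # stable accession, e.g. "P69905"
    entry_name: str    # mnemonic entry name, e.g. "HBA_HUMAN"
    description: str   # remainder of the header line


def parse_header(line: str) -> UniProtHeader:
    """Split a UniProtKB FASTA header line into its parts (hypothetical helper)."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a UniProtKB FASTA header: {line!r}")
    db, accession, entry_name, description = m.groups()
    return UniProtHeader(db == "sp", accession, entry_name, description)


header = parse_header(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens")
print(header.reviewed, header.accession)  # True P69905
```

Because the accession sits in a fixed field, it can serve as the stable key when merging results from different searches.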

  7. What does UniProtKB give you? 1. Curated protein sequences: correction of frameshifts, premature stop sites, incorrect initiator methionines, etc.; stable identifiers, with archiving and versioning 2. Consistent nomenclature, plus synonyms 3. Identification of splice variants and/or alternative promoter usage; stable identifiers, with archiving and versioning

  8. What does UniProtKB give you? 4. Identification of variants (at amino acid level) and of PTMs; where known, the consequence is given; stable identifiers, with archiving and versioning 5. Annotation of experimental data from the literature in 27 defined fields, with increasing use of controlled vocabularies without loss of detail

  9. What does UniProtKB give you? 6. Extensive cross-referencing: a central portal to a wealth of external resources, with 81 external databases cross-referenced to UniProtKB

  10. Simple Text Search

  11. 1. Sequence curation, stable identifiers, versioning and archiving (UniSave: www.ebi.ac.uk/uniprot/unisave)

  12. Sequence curation, stable identifiers, versioning and archiving • For example, erroneous gene model predictions: frameshifts, premature stop codons, readthroughs, erroneous initiator methionines...
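Some of these gene-model errors can be flagged mechanically before a curator looks at them. The minimal sketch below assumes the predicted coding sequence (CDS) is available as a DNA string and that the standard genetic code applies. `translate` and `gene_model_warnings` are hypothetical names. The checks are simple heuristics that only flag candidates; they do not correct the model.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognised codon.
    Trailing bases that do not form a full codon are ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def gene_model_warnings(cds: str) -> list[str]:
    """Flag problems of the kind listed on the slide in a predicted CDS (heuristic)."""
    warnings = []
    if len(cds) % 3 != 0:
        warnings.append("length not a multiple of 3 (possible frameshift)")
    protein = translate(cds)
    if not protein.startswith("M"):
        warnings.append("no initiator methionine")
    # Any stop before the final codon is a premature stop (1-based residue positions).
    internal = [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]
    if internal:
        warnings.append(f"premature stop codon(s) at residue(s) {internal}")
    if not protein.endswith("*"):
        warnings.append("no terminal stop codon (possible readthrough or truncation)")
    return warnings


print(translate("ATGAAATAGGGCTAA"))            # MK*G*
print(gene_model_warnings("ATGAAATAGGGCTAA"))  # ['premature stop codon(s) at residue(s) [3]']
```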

  13. 2. Consistent nomenclature (& synonyms)

  14. 3. Identification of splice variants
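In UniProtKB each described isoform gets its own identifier: the entry accession plus a numeric suffix (for example `P12345-2`). This keeps isoforms tied to a single gene-centric record. The minimal sketch below splits and groups such identifiers. The accessions are illustrative, and the regular expression is a loose assumption rather than the full accession syntax.

```python
import re
from typing import Optional

# Accession optionally followed by "-<isoform number>", e.g. "P12345-2" (loose pattern).
ISOFORM_RE = re.compile(r"^([A-Z0-9]+)(?:-(\d+))?$")


def split_isoform_id(identifier: str) -> tuple[str, Optional[int]]:
    """Return (accession, isoform number); the number is None when no suffix is given."""
    m = ISOFORM_RE.match(identifier.strip())
    if m is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    accession, isoform = m.groups()
    return accession, int(isoform) if isoform else None


def group_by_entry(identifiers):
    """Group isoform identifiers under the accession of the entry they belong to."""
    groups = {}
    for ident in identifiers:
        accession, isoform = split_isoform_id(ident)
        groups.setdefault(accession, []).append(isoform)
    return groups


print(group_by_entry(["P12345-1", "P12345-2", "Q99999"]))
# {'P12345': [1, 2], 'Q99999': [None]}
```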

  15. 4. Identification of variants (at amino acid level)... and of PTMs... and also:

  16. Domain annotation, binding sites

  17. Splice variants, experimental mutations, sequence conflicts
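Variants, PTM sites and similar features are anchored to residue positions on a stable, versioned sequence, so they can be applied or checked programmatically. The minimal sketch below does this for single amino acid variants. The `AminoAcidVariant` record is a hypothetical representation, not the UniProtKB feature format, and the sequence is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AminoAcidVariant:
    position: int   # 1-based residue position in the reference sequence
    original: str   # one-letter code expected at that position
    variant: str    # one-letter code of the substituted residue


def apply_variant(sequence: str, v: AminoAcidVariant) -> str:
    """Return the variant protein sequence after checking the reference residue."""
    if not 1 <= v.position <= len(sequence):
        raise ValueError(f"position {v.position} outside sequence of length {len(sequence)}")
    found = sequence[v.position - 1]
    if found != v.original:
        # A mismatch usually means the variant refers to a different sequence version.
        raise ValueError(f"expected {v.original} at {v.position}, found {found}")
    return sequence[:v.position - 1] + v.variant + sequence[v.position:]


print(apply_variant("MVLSPADKTN", AminoAcidVariant(3, "L", "P")))  # MVPSPADKTN
```

The reference-residue check is where stable, versioned sequences matter. A position is only meaningful against the sequence version it was annotated on.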

  18. 5. Annotation of experimental data from the literature in 27 defined fields

  19. Controlled vocabularies are used whenever possible... but the ability to further describe each specific situation is retained

  20. Disease-specific annotation added to human entries... with supporting cross-references

  21. 6. Extensive cross-referencing, a central portal to a wealth of external resources... additional annotation (Gene Ontology)

  22. Reactome

  23. wwPDB

  24. InterPro – defines protein family membership and enables domain annotation
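In the UniProtKB flat-file format these cross-references appear on `DR` lines: the database name, the identifier, then optional fields, all separated by semicolons and ending with a full stop. Reading them is one way to harvest links to GO, Reactome, PDB or InterPro in bulk. The minimal sketch below assumes plain flat-file text as input. It covers the common layout only and is not a complete parser; `parse_dr_lines` is a hypothetical name.

```python
from collections import defaultdict


def parse_dr_lines(flat_file_text: str) -> dict[str, list[list[str]]]:
    """Collect cross-references from flat-file 'DR' lines, grouped by database name."""
    xrefs = defaultdict(list)
    for line in flat_file_text.splitlines():
        if not line.startswith("DR   "):
            continue  # not a cross-reference line
        body = line[5:].strip()
        # Drop a trailing isoform qualifier such as " [P12345-2]", if present.
        if body.endswith("]") and " [" in body:
            body = body[: body.rindex(" [")]
        if body.endswith("."):
            body = body[:-1]  # remove the line-terminating full stop
        database, *fields = [f.strip() for f in body.split(";")]
        xrefs[database].append(fields)
    return dict(xrefs)
```

For instance, a line `DR   InterPro; IPR000001; Example.` yields `{"InterPro": [["IPR000001", "Example"]]}`.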

  25. UniProtKB/TrEMBL • Redundant: only 100% identical sequences are merged • Automated clean-up of annotation carried over from the original nucleotide sequence entry • Additional value added by automatic annotation
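"Only 100% identical sequences merged" means that a single differing residue keeps two records apart. The minimal sketch below assumes the input is a list of (identifier, sequence) pairs. Hashing each sequence is one simple way to find exact duplicates, not necessarily how the TrEMBL pipeline does it; `merge_identical` is a hypothetical name.

```python
import hashlib
from collections import defaultdict


def merge_identical(records):
    """Group record identifiers whose sequences are exactly identical.

    Each sequence is reduced to a SHA-256 digest of its upper-cased residues, so only
    exact (100%) matches share a group; groups keep first-seen order."""
    groups = defaultdict(list)
    for record_id, sequence in records:
        key = hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()
        groups[key].append(record_id)
    return list(groups.values())


print(merge_identical([("a", "MKV"), ("b", "MKV"), ("c", "MKI")]))  # [['a', 'b'], ['c']]
```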

  26. Automatic Annotation • Recognises common annotation belonging to a closely related family within UniProtKB/Swiss-Prot • Identifies all members of this family using pattern/motif/HMMs in InterPro • Transfers common annotation to related family members in TrEMBL
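The three steps above amount to rule-based annotation transfer. The minimal sketch below assumes each rule is expressed as a set of InterPro entries that a protein must match, plus the annotation shared by the curated Swiss-Prot family. `Rule`, `UnreviewedEntry` and `apply_rules` are hypothetical names. Production rule systems may add further conditions that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    required_signatures: frozenset[str]  # InterPro entries a protein must match (assumed)
    annotation: dict[str, str]           # annotation common to the reviewed family


@dataclass
class UnreviewedEntry:
    accession: str
    signatures: set[str]
    annotation: dict[str, str] = field(default_factory=dict)


def apply_rules(entries, rules):
    """Copy each rule's annotation to every entry that matches all of its signatures."""
    for entry in entries:
        for rule in rules:
            if rule.required_signatures <= entry.signatures:  # subset test
                for key, value in rule.annotation.items():
                    entry.annotation.setdefault(key, value)   # never overwrite existing data
    return entries
```

Requiring every signature in the rule, rather than any one of them, keeps the transfer conservative. A protein that matches only part of the family's signature set receives nothing.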

  27. Protein sequence characterisation [diagram]: starting from basic information, BLAST finds more sequences; conserved signatures build up consensus sequences of families, domains, motifs or sites

  28. Finding conserved signatures, ranging from the simplest (limited) to those carrying more information: • Pattern • Fingerprint • Sequence clustering • Profile • HMM
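The simplest of these, the pattern, is essentially a regular expression over amino acids. PROSITE writes patterns in its own syntax; for example, `N-{P}-[ST]-{P}` is the N-glycosylation site pattern (PS00001). The minimal sketch below converts the common elements of that syntax into a Python regular expression. `prosite_to_regex` is a hypothetical name, and less common constructs such as `>` inside brackets are deliberately not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as 'N-{P}-[ST]-{P}' into a Python regex.

    Handles x (any residue), [..] (allowed), {..} (forbidden), repeat counts
    (n) or (n,m), and the '<' / '>' terminal anchors on whole elements only."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)                                  # N[^P][ST][^P]
print(re.search(regex, "AANGSAA").group())    # NGSA
```

A pattern either matches or does not. The richer signatures further down the list (profiles, HMMs) instead score how well a sequence fits, which is why they carry more information.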

  29. Integration of signatures into InterPro: the foundations of InterPro, combined through manual curation

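  30. Integration process [diagram]. Pairs of signatures from member databases (here PROSITE and Pfam) are compared on whether they match at the same positions and whether they hit the same proteins, giving four cases: 1) same positions, same protein hits; 2) same positions, different protein hits; 3) different positions, same protein hits; 4) different positions. In case 1 both signatures are placed in a single entry (IPR000001); the other panels use two entries (IPR000001 and IPR000002), with bracketed values such as (100) and (50) next to each signature.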
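The two criteria in the diagram can be computed from match coordinates. Deciding what to do with the result remains a curation step. The minimal sketch below assumes each signature's matches are given as a mapping from protein accession to a (start, end) region. The 50% overlap threshold and all function names are hypothetical choices, not InterPro's actual parameters.

```python
Match = tuple[int, int]  # (start, end) of a signature match, 1-based inclusive


def overlap_fraction(a: Match, b: Match) -> float:
    """Fraction of the shorter match covered by the overlap of the two matches."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    shortest = min(a[1] - a[0], b[1] - b[0]) + 1
    return max(shared, 0) / shortest


def classify_pair(matches_a: dict[str, Match], matches_b: dict[str, Match],
                  min_overlap: float = 0.5) -> str:
    """Place two signatures into one of the four cases shown on the slide."""
    same_hits = matches_a.keys() == matches_b.keys()
    shared_proteins = matches_a.keys() & matches_b.keys()
    # "Same positions": on every protein hit by both, the matches overlap enough.
    same_positions = bool(shared_proteins) and all(
        overlap_fraction(matches_a[p], matches_b[p]) >= min_overlap for p in shared_proteins)
    if same_positions:
        return ("same positions, same protein hits" if same_hits
                else "same positions, different protein hits")
    return "different positions, same protein hits" if same_hits else "different positions"


pfam_like = {"P1": (10, 60), "P2": (5, 55)}
prosite_like = {"P1": (12, 58), "P2": (6, 50)}
print(classify_pair(pfam_like, prosite_like))  # same positions, same protein hits
```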

  31. Signature relationships 1) Parent-child (a subgroup of more closely related proteins); applies to both domains and families. Example [diagram]: the parent, protein kinase (Pfam, 100), has two children, serine kinase (SMART, PROSITE; 75) and tyrosine kinase (SMART, PROSITE; 25), which have no proteins in common.
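One way to suggest candidate parent-child pairs is from protein hit sets: a child's hits form a strict subset of its parent's, and the children of one parent share no proteins. The minimal sketch below assumes hits are given as sets of protein accessions per signature. Both functions are hypothetical names. In InterPro the relationships are curated, so this only proposes candidates. In deeper hierarchies it also reports grandparent-grandchild pairs, which it does not filter out.

```python
def infer_parent_child(hits: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (parent, child) pairs where the child's hits are a strict subset of the parent's."""
    pairs = []
    for parent, parent_hits in hits.items():
        for child, child_hits in hits.items():
            if parent != child and child_hits and child_hits < parent_hits:
                pairs.append((parent, child))
    return pairs


def overlapping_siblings(hits, pairs):
    """List children of the same parent that share proteins (the slide expects none)."""
    children_of = {}
    for parent, child in pairs:
        children_of.setdefault(parent, []).append(child)
    clashes = []
    for children in children_of.values():
        for i, a in enumerate(children):
            for b in children[i + 1:]:
                if hits[a] & hits[b]:
                    clashes.append((a, b))
    return clashes


hits = {
    "Protein kinase": {"p1", "p2", "p3", "p4"},  # made-up protein sets
    "Serine kinase": {"p1", "p2", "p3"},
    "Tyrosine kinase": {"p4"},
}
pairs = infer_parent_child(hits)
print(pairs)  # [('Protein kinase', 'Serine kinase'), ('Protein kinase', 'Tyrosine kinase')]
print(overlapping_siblings(hits, pairs))  # []
```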

  32. Signature relationships 2) Contains / found in (describes domain composition); both families and domains can contain domains. Example [diagram]: a receptor family (Pfam) contains an N-terminal domain and a C-terminal domain (SMART and PROSITE), which are in turn found in the receptor family.
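Unlike parent-child, the contains / found in relationship is about positions. On a protein matched by both signatures, the domain's match lies inside the family's match. The minimal sketch below assumes matches are given per signature as a mapping from protein accession to a (start, end) region. The function names and coordinates are hypothetical. "Found in" is simply the reverse of each returned pair.

```python
Region = tuple[int, int]  # (start, end), 1-based inclusive


def contains(outer: Region, inner: Region) -> bool:
    """True if the inner region lies completely within the outer region."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def contains_relationships(matches: dict[str, dict[str, Region]]) -> set[tuple[str, str]]:
    """Return (container, contained) pairs: on every protein hit by both signatures,
    the contained match lies strictly inside the container's match."""
    pairs = set()
    for a, a_hits in matches.items():
        for b, b_hits in matches.items():
            shared = a_hits.keys() & b_hits.keys()
            if a == b or not shared:
                continue
            if all(contains(a_hits[p], b_hits[p]) and a_hits[p] != b_hits[p] for p in shared):
                pairs.add((a, b))
    return pairs


matches = {  # made-up coordinates for two proteins
    "Receptor family": {"p1": (1, 300), "p2": (1, 310)},
    "N-terminal domain": {"p1": (10, 120), "p2": (12, 125)},
    "C-terminal domain": {"p1": (180, 290), "p2": (190, 300)},
}
print(sorted(contains_relationships(matches)))
# [('Receptor family', 'C-terminal domain'), ('Receptor family', 'N-terminal domain')]
```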

  33. Specialisation of Databases

  34. Structural representation in InterPro [diagram]: MSD maps each PDB sequence residue by residue onto UniProt amino acid positions, which supports the sequence-structure comparison in InterPro
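Once a residue-by-residue mapping exists, translating a PDB residue number into a UniProt position is a lookup. The minimal sketch below assumes the mapping is already available as sorted, non-overlapping aligned segments. `Segment`, `to_uniprot_position` and the example numbers are hypothetical. PDB insertion codes (such as residue 52A) are not handled.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    pdb_start: int      # first PDB residue number in the aligned segment
    pdb_end: int        # last PDB residue number in the aligned segment
    uniprot_start: int  # UniProt position aligned to pdb_start


def to_uniprot_position(segments: list[Segment], pdb_residue: int) -> Optional[int]:
    """Map a PDB residue number to a UniProt position, or None if it is unmapped.

    Segments must be sorted by pdb_start and must not overlap."""
    starts = [s.pdb_start for s in segments]
    i = bisect_right(starts, pdb_residue) - 1   # last segment starting at or before residue
    if i < 0 or pdb_residue > segments[i].pdb_end:
        return None                             # before the first segment or in a gap
    seg = segments[i]
    return seg.uniprot_start + (pdb_residue - seg.pdb_start)


segments = [Segment(1, 50, 3), Segment(60, 100, 55)]
print(to_uniprot_position(segments, 50))  # 52
print(to_uniprot_position(segments, 55))  # None (unmodelled gap)
```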

  35. Structural representation [diagram]: PDB structures are displayed as striped patterns, together with structural classifications from CATH and SCOP and homology models from SWISS-MODEL (Swiss-M) and ModBase (ModB)

  36. Sequence-structure display: signatures predictive of protein annotation, shown alongside structural data for specific proteins
