
UniProt

Protein Sequence Database: UniProt. Jennifer McDowall, Ph.D., Senior InterPro Curator. What do protein scientists require? High quality protein sequence: non-redundant data with maximal coverage, including splice isoforms, disease variants and PTMs. Sequence archiving essential.


Presentation Transcript


  1. Protein Sequence Database: UniProt Jennifer McDowall, Ph.D. Senior InterPro Curator

  2. What do protein scientists require?
     • High quality protein sequence: non-redundant data with maximal coverage, including splice isoforms, disease variants and PTMs. Sequence archiving essential
     • Protein identification: stable identifiers and consistent nomenclature
     • Protein annotation: detailed information on protein function, biological processes, molecular interactions, and pathways

  3. Sequence quality in UniProt: protein existence level (human)
     • Evidence at protein level: 59%
     • Evidence at transcript level: 37.5%
     • Inferred from homology: 1%
     • Predicted: 0.5%
     • Uncertain: 2%

  4. UniProt Consortium

  5. 3 Components of UniProt
     UniProtKB (Knowledgebase)
     • Protein sequence repository
     • Swiss-Prot: non-redundant, manually annotated
     • TrEMBL: redundant, automatically annotated
     UniRef (Reference Cluster)
     • Combines similar sequences into clusters to speed up searching
     • UniRef100, UniRef90, UniRef50
     UniParc (Protein Archive)
     • History of all sequences
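A rough sketch of the clustering idea behind the UniRef100, UniRef90 and UniRef50 levels: sequences are grouped once their identity to a cluster representative reaches 100%, 90% or 50%. The greedy procedure, the `identity` helper (Python's `difflib` ratio standing in for a real alignment-based identity) and the toy sequences are all assumptions made for illustration. This is not how UniRef is actually computed, and it ignores details such as the merging of subfragments.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Assign each sequence to the first representative it matches at >= threshold."""
    clusters: dict[str, list[str]] = {}
    # Longest sequences first, so each representative is the longest member.
    for name, seq in sorted(sequences.items(), key=lambda kv: -len(kv[1])):
        for rep in clusters:
            if identity(sequences[rep], seq) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was close enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical toy sequences, not real UniProt entries.
toy = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "MKTAYIAKQRQISFVKSHFSRQ",  # identical to seqA
    "seqC": "MKTAYIAKQRQISFVKSHFSRA",  # one substitution
    "seqD": "GGGGSGGGGSGGGGSGGGGSPP",  # unrelated
}

for threshold in (1.0, 0.9, 0.5):
    print(threshold, greedy_cluster(toy, threshold))
```

With these toy sequences, a threshold of 1.0 groups only the two identical sequences, while 0.9 also pulls in the single-substitution variant.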

  6. EMBL/GenBank/DDBJ, Ensembl, PDB, RefSeq, Patent data, Model organism databases

  7. UniProtKB pipeline [Diagram: nucleotide sequencing → EMBL → translate sequence → TrEMBL → annotation → Swiss-Prot; UniProtKB partners shown: EBI, SIB, PIR]
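The "translate sequence" step of the pipeline can be sketched as a plain codon-table lookup. This is a minimal illustration using the standard genetic code and a single reading frame; the function name `translate` and the example sequence are hypothetical, and real pipelines also handle gene models, alternative genetic codes and both strands.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate reading frame 1 until the first stop codon or the end of the sequence."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks codons with ambiguous bases
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical coding sequence, for illustration only.
print(translate("ATGGCCTGTACGCTGAACTAA"))  # prints MACTLN
```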

  8. Searching UniProt: Simple text search

  9. Searching UniProt (http://www.uniprot.org/)
     Search tools include:
     • Text search
     • BLAST sequence search
     • Additional search engines through EBI (e.g. MPSearch and FASTA)

  10. Searching UniProt – Simple Search: text-based searching • Logical operators ‘&’ (and), ‘|’ (or)
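To show how the two logical operators combine search terms, here is a small in-memory sketch, with '|' read as "or". It is not the UniProt search engine: the `matches` and `search` helpers, the rule that '&' binds tighter than '|', and the entry descriptions are all assumptions for illustration.

```python
def matches(query: str, text: str) -> bool:
    """True if text satisfies the query; '&' binds tighter than '|' (an assumption)."""
    text = text.lower()
    return any(
        all(term.strip().lower() in text for term in alternative.split("&"))
        for alternative in query.split("|")
    )


# Hypothetical entry descriptions, for illustration only.
entries = {
    "entry1": "E3 ubiquitin-protein ligase, human",
    "entry2": "Cellular tumor antigen, human",
    "entry3": "E3 ubiquitin-protein ligase, mouse",
}


def search(query: str) -> list[str]:
    """Return the names of all entries whose description matches the query."""
    return [name for name, text in entries.items() if matches(query, text)]


print(search("ligase & human"))          # both terms must be present
print(search("ligase | tumor"))          # either term is enough
print(search("tumor | ligase & mouse"))  # 'tumor', or both 'ligase' and 'mouse'
```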

  11. Searching UniProt – Simple Search

  12. Searching UniProt – Search Results: each result is linked to the UniProt entry

  13. Searching UniProt – Search Results

  14. Searching UniProt – Search Results

  15. EXERCISE 1

  16. Exploring a Swiss-Prot entry: General information

  17. Splice variants Sequence Sequence features Ontologies Annotations References Nomenclature

  18. UniProt/Swiss-Prot Annotation
      Remove redundancy
      • Merge TrEMBL entries (1 gene product, 1 entry)
      Taxonomy
      • Description of biological source
      Sequence variation
      • Identify conflicts & alternative splicing
      Modifications
      • Posttranslational, e.g. carbohydrates
      Annotate sequence
      • Map domains and sites onto sequence
      General annotation
      • Descriptive comments, e.g. function
      Structure
      • Describes both secondary and quaternary
      Disease association
      • Map sequence deficiencies causing disease
      Binary interactions
      • Linked to protein-protein interaction data
      Similarity
      • To protein families and domains
      Cross references
      • Extensive integration with other databases
      Bibliography
      • Cited references
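Mapping domains and sites onto a sequence amounts to storing each feature with its coordinates and looking up the residues it covers. The sketch below assumes 1-based, inclusive positions; the `Feature` class, the `residues` helper, the sequence and the features are all hypothetical and do not reflect the actual UniProt data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    kind: str         # e.g. "domain" or "site"
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    description: str


def residues(sequence: str, feature: Feature) -> str:
    """Return the residues a feature covers, converting from 1-based inclusive coordinates."""
    if not 1 <= feature.start <= feature.end <= len(sequence):
        raise ValueError(f"{feature} lies outside the sequence")
    return sequence[feature.start - 1:feature.end]


# Hypothetical sequence and features, for illustration only.
sequence = "MKTAYIAKQRQISFVKSHFSRQ"
features = [
    Feature("domain", 3, 12, "example domain"),
    Feature("site", 17, 17, "example modified residue"),
]

for feature in features:
    print(feature.kind, feature.start, feature.end, residues(sequence, feature))
```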

  19. Customise layout: collapse section

  20. UniProtKB/Swiss-Prot Annotation

  21. Customise layout: hold down cursor to drag-and-drop sections

  22. Customise layout

  23. Entry Information: Swiss-Prot removes redundancy

  24. Entry Information: Sequence correction, versioning and archiving

  25. Entry Information: Sequence correction, versioning and archiving • Able to compare versions directly • Merged A8K2S6 with Q00987
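Comparing two versions of a sequence directly can be sketched with Python's `difflib`, which reports where residues were replaced, inserted or deleted. The `sequence_changes` helper and both sequence versions are hypothetical; this only illustrates the idea and is not the comparison tool on the UniProt site.

```python
from difflib import SequenceMatcher


def sequence_changes(old: str, new: str) -> list[tuple[str, int, str, str]]:
    """List the edits turning old into new.

    Each item is (operation, 1-based position in old where the change starts,
    residues in old, residues in new).
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, i1 + 1, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sequence versions, not taken from a real entry.
version_1 = "MCNTNMSVPTDGAVT"
version_2 = "MCNTNMSVSTDGAVTTSQ"

for change in sequence_changes(version_1, version_2):
    print(change)
```

For these two toy versions the output lists one substitution at position 9 and an extension appended after residue 15.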

  26. Entry Information: Sequence correction, versioning and archiving

  27. Entry Information: Sequence correction, versioning and archiving. Corrections cover, for example: erroneous gene model predictions, frameshifts, read-throughs, premature stop codons, erroneous initiator Met…

  28. Names and Origin: some literature search engines pull synonyms from UniProt

  29. EXERCISE 2

  30. Exploring a Swiss-Prot entry: Sequence annotation

  31. Sequence

  32. Sequence

  33. Sequence variation – conflicts

  34. Sequence variation – splicing

  35. Sequence variation – splicing

  36. Sequence variation – splicing

  37. Annotate Sequence

  38. EXERCISE 3

  39. Exploring a Swiss-Prot entry: Structural annotation

  40. Structure - secondary

  41. Structure - secondary

  42. Structure - tertiary

  43. Structure - tertiary

  44. Structure - tertiary: provides information on ordered and disordered regions of the protein

  45. Structure - tertiary

  46. Structure - quaternary

  47. EXERCISE 4

  48. Exploring a Swiss-Prot entry: General annotation

  49. General Annotation: literature-derived annotation • References provided • Controlled vocabularies used where possible

  50. General Annotation: additional annotation from Gene Ontology
