Protein Sequence Database: UniProt
Jennifer McDowall, Ph.D.
Senior InterPro Curator
What do protein scientists require?
• High quality protein sequence: non-redundant data with maximal coverage, including splice isoforms, disease variants and PTMs. Sequence archiving essential.
• Protein identification: stable identifiers and consistent nomenclature.
• Protein annotation: detailed information on protein function, biological processes, molecular interactions, and pathways.
Sequence quality in UniProt
Protein existence level (Human):
• Evidence at protein level: 59%
• Evidence at transcript level: 37.5%
• Inferred from homology: 1%
• Predicted: 0.5%
• Uncertain: 2%
3 Components of UniProt
UniProtKB (Knowledgebase)
• Protein sequence repository
• Swiss-Prot: non-redundant, manually annotated
• TrEMBL: redundant, automatically annotated
UniRef (Reference Cluster)
• Combines sequences (speed searching)
• UniRef100, UniRef90, UniRef50
UniParc (Protein Archive)
• History of all sequences
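To make the idea of "combining sequences" in UniRef concrete, here is a minimal toy sketch that groups entries whose sequences are exactly identical. It is only an illustration: UniRef100 also merges sub-fragments, and UniRef90 and UniRef50 cluster at 90% and 50% identity, none of which this sketch attempts. The function name, accessions and sequences are hypothetical.

```python
from collections import defaultdict


def cluster_identical(entries):
    """Group (accession, sequence) pairs whose sequences are exactly identical.

    Toy stand-in for the exact-identity part of UniRef100-style clustering;
    sub-fragment merging and identity thresholds are deliberately left out.
    """
    clusters = defaultdict(list)
    for accession, sequence in entries:
        # Normalise case so "mkt" and "MKT" fall into the same cluster.
        clusters[sequence.upper()].append(accession)
    return dict(clusters)


# Hypothetical accessions and sequences, for illustration only.
entries = [
    ("ACC1", "MKTAYIAK"),
    ("ACC2", "MKTAYIAK"),
    ("ACC3", "MKTAYIAR"),
]
clusters = cluster_identical(entries)
for sequence, accessions in clusters.items():
    print(sequence, accessions)
```

Searching one representative per cluster instead of every member is what makes the search faster.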
EMBL/GenBank/DDBJ, Ensembl, PDB, RefSeq, Patent data, Model organism databases
[Figure: UniProtKB pipeline. Nucleotide sequencing → EMBL → translate sequence → TrEMBL → annotation → Swiss-Prot. Consortium members shown: EBI, SIB, PIR.]
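The "translate sequence" step of the pipeline can be sketched as a codon-by-codon lookup. This is a simplified illustration using the standard genetic code and a single forward reading frame starting at position 0; it is not the procedure UniProt actually runs, which works from annotated coding regions. The function name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 0, stopping at the first stop codon.

    Assumes the input contains only A, C, G and T; any other character
    raises KeyError. Trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```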
Searching UniProt: Simple text search
Searching UniProt
Search tools include:
• Text search
• BLAST sequence search
• Additional search engines through EBI (e.g. MPSearch and FASTA)
http://www.uniprot.org/
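A text search can also be issued programmatically. The sketch below only builds a request URL and does not contact any server. The endpoint `https://rest.uniprot.org/uniprotkb/search` and the `query`, `format` and `size` parameters are assumptions that do not come from these slides, so check them against the current UniProt documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint; the slides only give http://www.uniprot.org/.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, result_format="json", size=5):
    """Return a search URL for a free-text query (no network access here)."""
    params = {"query": query, "format": result_format, "size": size}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("insulin")
print(url)
```

The resulting URL could then be fetched with any HTTP client.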
Searching UniProt: Simple Search
Text-based searching
• Logical operators ‘&’ (and), ‘|’ (or)
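As a small illustration of combining search terms with these operators, the following sketch assembles query strings. The helper names are hypothetical, and grouping with parentheses is an assumption rather than something the slides confirm.

```python
def all_of(*terms):
    """Join terms with '&' so that every term must match."""
    return " & ".join(terms)


def any_of(*terms):
    """Join terms with '|' so that any one term may match.

    The surrounding parentheses are an assumed grouping syntax.
    """
    return "(" + " | ".join(terms) + ")"


query = all_of("kinase", any_of("human", "mouse"))
print(query)
```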
Searching UniProt: Search Results
• Each result is linked to the UniProt entry
Exploring a Swiss-Prot entry: General information
Splice variants, Sequence, Sequence features, Ontologies, Annotations, References, Nomenclature
UniProt/Swiss-Prot Annotation
Taxonomy • Description of biological source
Sequence variation • Identify conflicts & alternative splicing
Modifications • Posttranslational, e.g. carbohydrates
Annotate sequence • Map domains and sites onto sequence
General annotation • Descriptive comments, e.g. function
Structure • Describes both secondary and quaternary
Disease association • Map sequence deficiencies causing disease
Binary interactions • Linked to protein-protein interaction data
Similarity • To protein families and domains
Cross references • Extensive integration with other databases
Bibliography • Cited references
Remove redundancy • Merge TrEMBL (1 gene product, 1 entry)
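Mapping domains and sites onto a sequence amounts to attaching positions to annotations. A minimal sketch of such a record follows, assuming 1-based inclusive coordinates. The `Feature` class, its field names and the example sequence are all hypothetical and do not reflect UniProt's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A hypothetical sequence annotation with 1-based inclusive positions."""
    kind: str
    start: int
    end: int
    description: str


def feature_sequence(sequence, feature):
    """Return the residues covered by a feature."""
    # Convert the 1-based inclusive range to a Python slice.
    return sequence[feature.start - 1:feature.end]


# Made-up sequence and features, for illustration only.
sequence = "MKTAYIAKQR"
features = [
    Feature("domain", 2, 4, "example domain"),
    Feature("site", 8, 8, "example modified residue"),
]
for feature in features:
    print(feature.kind, feature.start, feature.end, feature_sequence(sequence, feature))
```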
Customise layout
• Collapse section
• Hold down cursor to drag-and-drop sections
Entry Information
• Swiss-Prot removes redundancy
Entry Information: Sequence correction, versioning and archiving
• Able to compare versions directly
• Merged A8K2S6 with Q00987
• Examples of corrected errors: erroneous gene model predictions, frameshifts, read-throughs, premature stop codons, erroneous initiator Met…
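Comparing two versions of a sequence directly can be sketched with Python's standard `difflib` module. This only illustrates the idea and is not how the UniProt website computes its comparisons; the two sequences are made up.

```python
from difflib import SequenceMatcher


def sequence_changes(old, new):
    """List the differences between two sequence versions.

    Returns (operation, old_fragment, new_fragment) tuples for every
    region that is not identical in both versions.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical versions differing by a single residue.
changes = sequence_changes("MKTAYIAKQR", "MKTAYLAKQR")
print(changes)
```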
Names and Origin
• Some literature search engines pull synonyms from UniProt
Exploring a Swiss-Prot entry: Sequence annotation
Exploring a Swiss-Prot entry: Structural annotation
Structure: tertiary
• Provides information on ordered and disordered regions of the protein
Exploring a Swiss-Prot entry: General annotation
General Annotation: Literature-derived annotation
• References provided
• Controlled vocabularies used where possible
General Annotation: Additional annotation from Gene Ontology