UniProt and Apoptosis
Sandra Orchard, EMBL-EBI
What do protein scientists require?
1. A high-quality protein sequence database
A high-quality, non-redundant protein database with maximal coverage, including splice isoforms, disease variants and PTMs. Sequence archiving is essential. It is not appropriate to use a nucleotide sequence database as a source of protein sequences.
2. Protein identification
Stable identifiers and consistent nomenclature.
3. Protein annotation
Detailed information on protein function, biological processes, molecular interactions and pathways, cross-referenced to external sources.
What is UniProt?
• Based on the original work on PIR, Swiss-Prot and TrEMBL
• Funded mainly by NIH to be the highest quality, most thoroughly annotated protein sequence database
• Collaboration between EBI, SIB and PIR
UniProt data sources and data flow
[Diagram. Labels: Database sources: INSDC (incl. WGS, Env.), RefSeq, Ensembl, VEGA, PDB, FlyBase, WormBase, Patent Data, Sub/Peptide Data; UniParc, UniProtKB, UniSave, UniMes, UniRef 100, UniRef 90, UniRef 50, IPI, Proteome Sets]
UniProtKB
• UniProt Knowledgebase: 2 sections
• UniProtKB/Swiss-Prot: non-redundant, high-quality manual annotation (reviewed)
• UniProtKB/TrEMBL: redundant, automatically annotated (unreviewed)
www.uniprot.org
What does UniProtKB give you?
1. Curated protein sequences: correction of frameshifts, premature stop sites, incorrect initiator methionine...; stable identifiers, with archiving and versioning
2. Identification of splice variants and/or alternative promoter usage; stable identifiers, with archiving and versioning
3. Identification of variants (at amino acid level) and of PTMs; where known, the consequence is given; stable identifiers, with archiving and versioning
What does UniProtKB give you?
4. Consistent nomenclature, plus synonyms
5. Annotation of experimental data from the literature in 27 defined fields. Increasing use of controlled vocabularies, without loss of detail
6. Extensive cross-referencing: a central portal to a wealth of external resources, with 85 external databases cross-referenced to UniProtKB
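
The stable identifiers listed above are UniProtKB accession numbers. As a minimal sketch, a script can check that a string is at least well-formed before using it in a paper or a query. The pattern below follows the accession format published in the UniProt help pages; the function name `looks_like_accession` is hypothetical, and a well-formed string is not guaranteed to correspond to an existing entry.

```python
import re

# Accession number format as documented by UniProt (6 or 10 characters).
# A match only means "well-formed", not "exists in UniProtKB".
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_accession(text):
    """Return True if `text` has the shape of a UniProtKB accession number."""
    return ACCESSION_RE.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    # Example strings only; gene names and lowercase input are rejected.
    for candidate in ["P04637", "A0A024R161", "CASP3", "p04637"]:
        print(candidate, looks_like_accession(candidate))
```

A check like this catches gene symbols or truncated identifiers pasted where an accession was intended, which is the kind of ambiguity that consistent identifiers are meant to avoid.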
The New Website
www.beta.uniprot.org
1. Sequence curation, stable identifiers, versioning and archiving
Sequence curation, stable identifiers, versioning and archiving
• For example: erroneous gene model predictions, frameshifts
• ...premature stop codons, read-throughs, erroneous initiator methionines...
2. Identification of splice variants
3. Identification of variants (at amino acid level)... and also
...and of PTMs...
...and of binding sites
4. Consistent nomenclature (& synonyms)
5. Annotation of experimental data from the literature in 27 defined fields. Controlled vocabularies used whenever possible...
Binary interactions taken from the IntAct database
Disease-specific annotation added to human entries, with supporting cross-referencing
UniProt Keywords
UniProtKB entries are tagged with keywords that can be used to retrieve particular subsets of entries.
10 categories:
• Biological process: Apoptosis
• Cellular component
• Coding sequence diversity
• Developmental stage
• Disease
• Domain
• Ligand
• Molecular function: Oncogene, Anti-oncogene
• Post-translational modification
• Technical term
The document keywlist.txt lists all the keywords and a definition of their usage in UniProtKB.
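
Keyword-based retrieval can be pictured as a simple filter over tagged entries. The sketch below assumes entries are already loaded into a dictionary that maps an accession to its set of keywords; the accessions `ENTRY_A` to `ENTRY_C`, their keyword sets, and the function name are hypothetical placeholders, not real UniProtKB data or an official API.

```python
def entries_with_keyword(entries, keyword):
    """Return the sorted accessions of entries tagged with `keyword`.

    `entries` maps accession -> iterable of keyword strings.
    Matching is case-insensitive.
    """
    wanted = keyword.casefold()
    return sorted(
        accession
        for accession, keywords in entries.items()
        if wanted in {k.casefold() for k in keywords}
    )


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    entries = {
        "ENTRY_A": {"Apoptosis", "Protease"},
        "ENTRY_B": {"Kinase"},
        "ENTRY_C": {"Apoptosis", "Oncogene"},
    }
    print(entries_with_keyword(entries, "apoptosis"))
```

Because keywords come from a controlled list, an exact (case-insensitive) match is enough here; no free-text search is needed.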
Source references included in entry
6. Extensive cross-referencing, a central portal to a wealth of external resources...
...Additional annotation (Gene Ontology)...
InterPro: defines protein family membership and enables domain annotation
Annotation of entries in UniProtKB/Swiss-Prot
Annotation of human entries in UniProtKB/Swiss-Prot
UniProtKB/TrEMBL
• Redundant: only 100% identical sequences merged
• Automated clean-up of annotation from original nucleotide sequence entry
• Additional value added by using automatic annotation
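
To illustrate what "only 100% identical sequences merged" implies, here is a minimal sketch that collapses records sharing exactly the same sequence string and keeps everything else separate. The record identifiers and sequences are invented for the example, and the function is an illustration of the idea, not the TrEMBL production pipeline.

```python
from collections import defaultdict


def merge_identical(records):
    """Group source identifiers by exact sequence.

    `records` is an iterable of (source_id, sequence) pairs.
    Sequences that differ by even one residue stay in separate groups.
    """
    merged = defaultdict(list)
    for source_id, sequence in records:
        merged[sequence.upper()].append(source_id)
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical translated records, for illustration only.
    records = [
        ("SRC_1", "MKTAYIAK"),
        ("SRC_2", "MKTAYIAK"),  # identical to SRC_1, so merged with it
        ("SRC_3", "MKTAYIAR"),  # one residue differs, so kept separate
    ]
    for sequence, sources in merge_identical(records).items():
        print(sequence, sources)
```

Near-identical sequences (fragments, variants, sequencing differences) are not merged under this rule, which is why the resulting set remains redundant.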
Automatic Annotation
• Recognises common annotation belonging to a closely related family within UniProtKB/Swiss-Prot
• Identifies all members of this family using pattern/motif/HMMs in InterPro
• Transfers common annotation to related family members in TrEMBL
InterPro
Automatic Annotation: automated annotation in TrEMBL
[Diagram: InterPro, Swiss-Prot, TrEMBL]
1) Extract conditions from InterPro
2) Group Swiss-Prot entries by conditions
3) Extract common annotation
4) Group TrEMBL by conditions and add common annotation to TrEMBL entries
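
The four steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: a "condition" is taken to be the exact set of InterPro signatures an entry matches, "common annotation" is the intersection of annotation terms across all Swiss-Prot entries sharing that condition, and all identifiers, signature names and field names are hypothetical. The real system is more elaborate than this.

```python
from collections import defaultdict


def common_annotation_by_condition(swissprot_entries):
    """Steps 1-3: group reviewed entries by condition, keep shared annotation."""
    groups = defaultdict(list)
    for entry in swissprot_entries:
        condition = frozenset(entry["signatures"])
        groups[condition].append(set(entry["annotation"]))
    # Only annotation present in every member of a group is kept.
    return {cond: set.intersection(*anns) for cond, anns in groups.items()}


def annotate_trembl(trembl_entries, rules):
    """Step 4: transfer the common annotation to entries meeting a condition."""
    result = {}
    for entry in trembl_entries:
        condition = frozenset(entry["signatures"])
        result[entry["id"]] = sorted(rules.get(condition, set()))
    return result


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    swissprot = [
        {"id": "SP1", "signatures": {"SIG_A"},
         "annotation": {"Apoptosis", "Protease", "Cytoplasm"}},
        {"id": "SP2", "signatures": {"SIG_A"},
         "annotation": {"Apoptosis", "Protease"}},
        {"id": "SP3", "signatures": {"SIG_B"}, "annotation": {"Kinase"}},
    ]
    trembl = [
        {"id": "TR1", "signatures": {"SIG_A"}},
        {"id": "TR2", "signatures": {"SIG_C"}},
    ]
    rules = common_annotation_by_condition(swissprot)
    print(annotate_trembl(trembl, rules))
```

Taking the intersection is the conservative choice: an annotation term is transferred only if every reviewed family member carries it, and an unreviewed entry that meets no known condition receives nothing.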
Complete Proteomes
www.ebi.ac.uk/integr8
Proteome set download
Non-redundant proteome sets
• Complete experimentally determined protein sets are not yet available for higher organisms
• Predicted proteins must be included to give the full proteome
• The International Protein Index (IPI) merges data from UniProt, Ensembl and RefSeq to produce a non-redundant dataset
International Protein Index
• Non-redundant protein sets produced for human, mouse, rat, Arabidopsis, zebrafish, cow and chicken
• Effectively maintains a database of cross-references between the primary data sources
• Provides minimally redundant yet maximally complete sets of proteins for featured species (one sequence per transcript)
• Maintains stable identifiers (with incremental versioning) to allow the tracking of sequences in IPI
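
Stable identifiers with incremental versioning can be illustrated with a small registry: the identifier never changes, the version number goes up only when the sequence changes, and earlier versions stay retrievable. The class, its method names and the identifier `ID0001` are hypothetical; this is a sketch of the idea, not how IPI or UniProt store their data.

```python
class VersionedRegistry:
    """Keep every sequence version ever submitted under a stable identifier."""

    def __init__(self):
        self._history = {}  # identifier -> list of sequences, oldest first

    def submit(self, identifier, sequence):
        """Record a sequence; bump the version only if the sequence changed."""
        versions = self._history.setdefault(identifier, [])
        if not versions or versions[-1] != sequence:
            versions.append(sequence)
        return f"{identifier}.{len(versions)}"

    def fetch(self, identifier, version=None):
        """Return the latest sequence, or a specific archived version (1-based)."""
        versions = self._history[identifier]
        return versions[-1] if version is None else versions[version - 1]


if __name__ == "__main__":
    registry = VersionedRegistry()
    print(registry.submit("ID0001", "MKTAYIAK"))  # first submission
    print(registry.submit("ID0001", "MKTAYIAK"))  # unchanged sequence
    print(registry.submit("ID0001", "MKTAYIAR"))  # corrected sequence
    print(registry.fetch("ID0001", version=1))    # archived version
```

Citing the identifier together with its version lets a reader recover exactly the sequence that was used, even after the entry has been updated.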
IPI
User Input
• Feedback: if you find something wrong, outdated, missing...
• Be thorough when writing your papers: make protein identification clear, use accession numbers etc.
• Submit, Submit, Submit
With thanks to...
• The Sequence Database group, EBI
• UniProt collaborators: SIB, PIR
• InterPro consortium
• IntAct consortium
• GO consortium
• PRIDE
• HUPO-PSI
• Rolf Apweiler