
Integration of PRO and UniProtKB

Integration of PRO and UniProtKB. PRO-PO-GO Meeting. Amherst, NY May 16, 2013 Cathy H. Wu, Ph.D. PRO Framework. PRO terms are defined/annotated using other ontologies and resources via definition of relations or mappings when appropriate. Relationships Between PRO-GO-UniProtKB.


Presentation Transcript


  1. Integration of PRO and UniProtKB. PRO-PO-GO Meeting, Amherst, NY, May 16, 2013. Cathy H. Wu, Ph.D.

  2. PRO Framework • PRO terms are defined/annotated using other ontologies and resources via definition of relations or mappings when appropriate

  3. Relationships Between PRO-GO-UniProtKB
     • ProComp-ProForm: has_part
     • ProComp-GO: is_a
     • ProForm-UniProtKB: xref
     • Accessioned, species-specific protein complexes in ProComp are described using protein entities in ProForm, and are cross-referenced to species-independent complex representations in GO
     • A gene product (PR:000025358) and its isoforms and modified forms (PR:000025355; PR:000025356) are represented in PRO as separate, uniquely accessioned entities, but are described in the same UniProtKB record (UniProtKB:Q9D6R2)
     • Reference: Bult CJ, Drabkin HJ, Evsikov A, Natale D, Arighi C, Roberts N, Ruttenberg A, D'Eustachio P, Smith B, Blake JA, Wu C. (2011) The representation of protein complexes in the Protein Ontology (PRO). BMC Bioinformatics 12, 371 [PMID: 21929785]

  4. PRO ID Mapping
     • Mappings to various external databases
       • promapping.txt: tab-delimited, each line indicating the PRO ID, the database ID, and the type of mapping (is_a or exact)
       • promapping.obo: the same information as promapping.txt, but in OBO format
     • Mappings are of two types:
       • exact
         • The database object is an exact match to the PRO object
         • e.g., PR:000026497 describes an isoform of 6-phosphofructokinase type C in human only, which corresponds to UniProtKB:Q01813-1
       • is_a
         • The database object is more specific than the PRO object
         • e.g., PR:000026465 describes an (organism-nonspecific) isoform of 6-phosphofructokinase type C, so UniProtKB:Q01813-1 (human) and UniProtKB:Q9WUA3-1 (mouse) are mapped to this term
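The tab-delimited layout described on this slide can be read with a few lines of Python. This is a minimal sketch, not the PRO project's own tooling: the column order (PRO ID, database ID, mapping type) follows the slide's wording, the sample rows are assembled from the slide's two examples, and the actual promapping.txt may differ in detail. The names `SAMPLE` and `load_mappings` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Sample rows built from the slide's examples. The real promapping.txt may
# order columns or write identifiers differently; this layout is assumed.
SAMPLE = (
    "PR:000026497\tUniProtKB:Q01813-1\texact\n"
    "PR:000026465\tUniProtKB:Q01813-1\tis_a\n"
    "PR:000026465\tUniProtKB:Q9WUA3-1\tis_a\n"
)


def load_mappings(handle):
    """Group (database ID, mapping type) pairs by PRO ID."""
    by_pro = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        pro_id, db_id, kind = (field.strip() for field in row)
        if kind not in ("exact", "is_a"):
            raise ValueError(f"unexpected mapping type: {kind!r}")
        by_pro[pro_id].append((db_id, kind))
    return dict(by_pro)


mappings = load_mappings(io.StringIO(SAMPLE))
for pro_id, targets in mappings.items():
    print(pro_id, targets)
```

Grouping by PRO ID makes the difference between the two mapping types visible: the organism-specific term carries a single exact match, while the organism-nonspecific term collects one is_a mapping per species.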

  5. PRO Reasoning with ID Mapping
     • pro.obo: PRO version with no implied links
     • pro_reasoned.obo: implied link automatically realized via is_a
     • bri1/iso1/phos5 (PR:000035786) has two parents:
       • explicit one in formal definition (PR:000035785)
       • implicit one only shown in the reasoned version (PR:000028355)

     [Term]
     id: PR:000035786
     name: protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5 (Arabidopsis thaliana)
     def: "A protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5 in Arabidopsis thaliana. UniProtKB:O22476-1, Thr-872, MOD:00047|Ser-858, MOD:00046|Ser-891, MOD:00046." [PMID:22184234, PRO:LVM]
     comment: Category=organism-modification. Flag=automatic.
     synonym: "Athal-BRI1/iso:1/Phos:5" EXACT PRO-short-label [PRO:DNx]
     synonym: "At protein brassinosteroid insensitive 1 isoform 1 phosphorylated 4" RELATED []
     is_a: PR:000028355 ! implied link automatically realized ! protein brassinosteroid insensitive 1 isoform 1 (Arabidopsis thaliana)
     is_a: PR:000035785 ! implied link automatically realized ! protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5
     intersection_of: PR:000035785 ! protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5
     intersection_of: only_in_taxon NCBITaxon:3702 ! Arabidopsis thaliana

     [Diagram labels: PR:000028355, PR:000035785]
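To make the explicit/implicit distinction concrete, the sketch below separates the two kinds of parent in a reasoned stanza. It does not perform any reasoning itself: it assumes the is_a links have already been realized (as in pro_reasoned.obo) and treats a parent as explicit when it also appears as the genus of an intersection_of definition. The stanza is abbreviated from the slide, and `parse_tags` and `split_parents` are hypothetical helper names.

```python
# Abbreviated copy of the slide's stanza (name, def, and synonyms omitted).
STANZA = """\
[Term]
id: PR:000035786
is_a: PR:000028355 ! protein brassinosteroid insensitive 1 isoform 1 (Arabidopsis thaliana)
is_a: PR:000035785 ! protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5
intersection_of: PR:000035785 ! protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5
intersection_of: only_in_taxon NCBITaxon:3702 ! Arabidopsis thaliana
"""


def parse_tags(stanza):
    """Collect tag values from one OBO stanza, dropping trailing '!' comments."""
    tags = {}
    for line in stanza.splitlines():
        if line.startswith("[") or ":" not in line:
            continue  # stanza header or blank line
        tag, _, value = line.partition(":")
        value = value.split("!", 1)[0].strip()
        tags.setdefault(tag.strip(), []).append(value)
    return tags


def split_parents(tags):
    """Return (explicit, implied) is_a parents of a reasoned stanza."""
    parents = tags.get("is_a", [])
    # A genus is an intersection_of value with no relation in front of it.
    genus = {v for v in tags.get("intersection_of", []) if len(v.split()) == 1}
    explicit = [p for p in parents if p in genus]
    implied = [p for p in parents if p not in genus]
    return explicit, implied


explicit, implied = split_parents(parse_tags(STANZA))
print("explicit:", explicit)
print("implied:", implied)
```

For the slide's term this reports PR:000035785 as the parent stated in the formal definition and PR:000028355 as the parent that appears only once implied links are realized.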

  6. Ontological Representation of UniProtKB in PRO
     • PRO provides the ontological representation for UniProtKB
     • Integration of UniProt records/subrecords into the PRO ontological framework
     • Use UniProtKB protein records (labeled by accession numbers, isoform IDs, and potentially other stable identifiers within UniProtKB records) to represent organism-gene level and sequence level (and potentially modification-level) terms of PRO
       • Organism-Gene: canonical protein record
       • Organism-Sequence: isoform subrecord
       • Organism-Modification: chain/variant subrecord

  7. Organism-Gene/Sequence

  8. Ontologizing UniProtKB
     • Full-scale implementation of 12 reference genomes (others as needed)
       • Organism-Gene: canonical protein record (UniProtKB:xxxxxx)
       • Organism-Sequence: isoform subrecord (UniProtKB:xxxxxx-1)
     • Persistent URL: http://purl.obolibrary.org/obo/PR_xxxxxxxxx
     • UniProtKB URL in the ontological space, proposed as:
       • PR:xxxxxx (UniProtKB at organism-gene level)
       • PR:xxxxxx-1 (UniProtKB at organism-sequence level)
     • To consider
       • Organism-Modification: chain (UniProtKB:PRO_xxxxxxxxx)
       • Organism-Modification: variant (UniProtKB:VAR_xxxxxx)
       • Integration/coordination between ProComp and IntAct for ontological representation of protein complexes
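The identifier conventions on this slide can be sketched as two small helpers: one that turns a PRO identifier into its persistent URL, and one that guesses which PRO level a UniProtKB identifier would stand for. Both function names are hypothetical, and the accession pattern is deliberately loose for illustration; it is not UniProt's official accession grammar.

```python
import re

PURL_BASE = "http://purl.obolibrary.org/obo/"


def pro_purl(pro_id):
    """Turn a PRO identifier such as 'PR:000025358' into its persistent URL."""
    prefix, sep, local = pro_id.partition(":")
    if prefix != "PR" or not sep or not local:
        raise ValueError(f"not a PRO identifier: {pro_id!r}")
    return f"{PURL_BASE}PR_{local}"


def uniprot_level(identifier):
    """Guess the PRO level that a UniProtKB identifier would represent."""
    local = identifier.split(":", 1)[-1]  # drop an optional 'UniProtKB:' prefix
    if local.startswith("PRO_"):
        return "organism-modification (chain)"
    if local.startswith("VAR_"):
        return "organism-modification (variant)"
    # Loose patterns for illustration only (assumption, not the official rules).
    if re.fullmatch(r"[A-Z0-9]+-\d+", local):
        return "organism-sequence"
    if re.fullmatch(r"[A-Z0-9]+", local):
        return "organism-gene"
    raise ValueError(f"unrecognized identifier: {identifier!r}")


print(pro_purl("PR:000025358"))
print(uniprot_level("UniProtKB:Q96F24"))
print(uniprot_level("UniProtKB:Q96F24-2"))
```

Checking the chain and variant prefixes before the accession patterns keeps the four cases from overlapping, since a chain identifier would otherwise not match either accession pattern and fall through to the error.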

  9. UniProtKB in PRO Ontological Framework: Rich Relations • Orthologous-Gene • Ortho-Isoform • Ortho-PTM • Organism-PTM • Ortho-Complex • Organism-Complex

  10. Issues
      • Stable identifiers
        • UniProtKB would provide stable identifiers
        • ID mapping service
      • Need for sequence merging and isoform curation: when a Swiss-Prot (SP) entry exists for a given gene alongside corresponding unmerged TrEMBL (Tr) entries that may represent a new isoform, a new variant, or a duplicate
        • Unmerged Tr entries corresponding to additional isoforms with a sequence different from any mentioned in the SP entry
          • organism-gene (SP): Q96F24
          • organism-sequence (SP): Q96F24-1, Q96F24-2
          • organism-sequence (Tr): B4DWS0
      • Organism-gene only represented in the unreviewed (Tr) section: where one or multiple Tr entries exist for a given gene
        • One entry
          • organism-gene accession (Tr) = Q8VGZ9
          • organism-sequence accession (Tr; implied) = Q8VGZ9-1
        • Multiple entries
          • organism-gene accession ***???***
          • organism-sequence accession = B9E100, Q6W3E0

  11. Integrating PRO curation into UniProtKB
      • Isoforms curated by PRO curators will continue to be integrated into UniProtKB as a priority
        • PRO isoform curation (mostly done at MGI) is based on experimental information from the literature, and covers information such as UniProtKB AC, GO annotation, and comments on evidence for isoforms and expression
        • PIR curators integrate new isoforms and associated annotations into the SP entry
      • Submission of annotation for a new SP entry
        • PIR curators create new reviewed SP entries when annotating protein isoforms and PTM forms with no reference SP entry
        • Example: BUB3_XENLA
      • Other areas of PRO annotations, particularly on PTMs and complexes, could be integrated as appropriate
      • Reciprocal links from UniProtKB to PRO

  12. New isoform curation in PRO & UniProt
      • PRO literature-based annotation of isoforms 4 and 5 of a mouse protein
      • UniProt curation:
        • Merged 3 TrEMBL entries into the existing UniProtKB record (Q8BIF2)
        • Added isoform-specific subcellular localization information
        • Updated information about function and added new information

      CC   -!- SUBCELLULAR LOCATION: Nucleus. Cytoplasm.
      CC   -!- SUBCELLULAR LOCATION: Isoform 1: Nucleus.
      CC   -!- SUBCELLULAR LOCATION: Isoform 4: Cytoplasm.
      CC   -!- SUBCELLULAR LOCATION: Isoform 5: Nucleus.
      CC   -!- TISSUE SPECIFICITY: Widely expressed in brain, regions including …
      CC   -!- DEVELOPMENTAL STAGE: In the neural tube, expressed as early as
      CC       embryonic day 9.5 (E9.5) and expression is confined to the nervous …
      CC   -!- INDUCTION: By retinoic acid. Expression is up-regulated in P19
      CC       cells during neural differentiation upon retinoic acid treatment …
      CC   -!- PTM: Phosphorylated (Probable).
      CC   -!- SIMILARITY: Contains 1 RRM (RNA recognition motif) domain.
      CC   -!- CAUTION: Initial characterization was derived from usage of a
      CC       monoclonal antibody (A60) directed to an unknown protein called ...

  13. Integrating PRO curation into UniProtKB
      • Reciprocal links from UniProtKB to PRO
        • UniProtKB cross-reference (DR) lines [e.g., DR GO; GO:0006954; P:inflammatory response; IEA:Compara]
        • DR line to include PRO identifier (PURL), PRO name, and short-label
        • Link to the PRO page(s) at the exact (organism-gene) level and possibly also other PTM forms (organism-modification)
      • Other areas of PRO annotations, particularly on PTMs and complexes, could be integrated as appropriate
        • Annotation of sequence features (such as PTMs not annotated in UniProtKB) and functional annotation that applies to those features
        • Barrier to direct annotation integration: curation depth needed for all aspects of annotatable information beyond PTMs
        • Possible solution: link to information in PRO as additionally annotated data, similar to UniProt's approach of including additional bibliography
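The DR-line proposal can be illustrated by splitting the slide's GO example into fields and composing a line that carries the three PRO items named above. This is only a sketch of the idea: the `PRO` database label, the field order, and the helper names `parse_dr` and `format_pro_dr` are assumptions for illustration, not UniProtKB's actual cross-reference format. The PRO values are taken from the stanza on slide 5.

```python
def parse_dr(line):
    """Split a DR line into (database, list of remaining fields)."""
    if not line.startswith("DR"):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    return fields[0], fields[1:]


def format_pro_dr(purl, name, short_label):
    """Compose a hypothetical DR line holding PURL, PRO name, and short-label."""
    return f"DR   PRO; {purl}; {name}; {short_label}."


# The GO example quoted on the slide.
go_line = "DR GO; GO:0006954; P:inflammatory response; IEA:Compara"
print(parse_dr(go_line))

# A PRO cross-reference in the assumed layout, using slide 5's term.
pro_line = format_pro_dr(
    "http://purl.obolibrary.org/obo/PR_000035786",
    "protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5 (Arabidopsis thaliana)",
    "Athal-BRI1/iso:1/Phos:5",
)
print(pro_line)
```

Because the fields are separated by semicolons, a PRO name or short-label containing a semicolon would need escaping or a different delimiter; the sketch does not handle that case.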
