And now for our ‘Feature’ presentation: Automatic Loading of Protein Sequence Annotation Data from UniProt to Pathway / Genome PGDBs
Tomer Altman
Bioinformatics Research Group, SRI International
taltman@ai.sri.com
Protein Features in Pathway Tools
• Represent annotations along a polypeptide sequence
• Can represent anything from active sites to secondary structure
• Are defined by a set of classes rooted at ‘|Protein-Features|
• Are found in the ‘FEATURES slot of ‘|Proteins| instances
BioWarehouse UniProt Loader
• Parses the XML versions of the SwissProt and TrEMBL databases
• Loads the Feature table with the corresponding sequence annotation entries
• BioWarehouse is open-source software
• Currently being extended to support alternate sequences and sequence annotation citations
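As a rough illustration of the parsing step, the sketch below pulls feature annotations out of a UniProt-style XML fragment and turns them into flat rows of the kind a feature table might hold. It is not the BioWarehouse loader: the sample entry, the accession `P00000`, the row layout, and the function name `parse_features` are all made up for illustration, and the element names assume the public UniProt XML layout (`entry`, `accession`, `feature`, `location` with `begin`/`end` or `position` children).

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the public UniProt XML format.
NS = {"up": "http://uniprot.org/uniprot"}

# Hypothetical, heavily simplified entry; real entries carry far more data.
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>P00000</accession>
    <feature type="transmembrane region" description="Helical">
      <location><begin position="12"/><end position="32"/></location>
    </feature>
    <feature type="active site">
      <location><position position="75"/></location>
    </feature>
  </entry>
</uniprot>"""


def parse_features(xml_text):
    """Return one dict per feature: accession, type, description, begin, end."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("up:entry", NS):
        accession = entry.findtext("up:accession", namespaces=NS)
        for feat in entry.findall("up:feature", NS):
            loc = feat.find("up:location", NS)
            point = loc.find("up:position", NS)
            if point is not None:
                # Single-residue feature: begin and end coincide.
                begin = end = int(point.get("position"))
            else:
                # Range feature; this sketch assumes both ends are known.
                begin = int(loc.find("up:begin", NS).get("position"))
                end = int(loc.find("up:end", NS).get("position"))
            rows.append({
                "accession": accession,
                "type": feat.get("type"),
                "description": feat.get("description"),
                "begin": begin,
                "end": end,
            })
    return rows


rows = parse_features(SAMPLE_XML)
for row in rows:
    print(row)
```

A real loader would also need to handle positions with unknown or uncertain status, which this sketch ignores.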
Extensions to the Pathway Tools Schema
• Rooted as a sub-class under ‘|Protein-Segments|
• Mirrors protein features available from the UniProt controlled vocabulary
• Makes distinctions between variants due to human activity, variants within an organism, and variants across a strain population
UniProt Feature Importer
• PGDB proteins are mapped to entries in UniProt via UniProt Accession Numbers
• A protein feature is imported from UniProt if it does not already exist in the PGDB
• Identity is based on the associated protein object, the ‘|Protein-Feature| sub-class, and the location along the protein
• If a previously imported protein feature has been deleted from UniProt, it is removed from the PGDB
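The import rules above can be sketched as set operations over an identity key. This is only a minimal model of the logic described on the slide, not the Pathway Tools implementation: `FeatureKey`, `plan_sync`, and the sample frame names are hypothetical, and the choice to leave manually created features untouched is an assumption.

```python
from typing import NamedTuple


class FeatureKey(NamedTuple):
    """Identity of a feature: protein object, feature sub-class, location."""
    protein: str
    feature_class: str
    begin: int
    end: int


def plan_sync(imported, manual, uniprot):
    """Return (to_add, to_remove) for one import run.

    imported: keys previously imported from UniProt into the PGDB
    manual:   keys created by hand in the PGDB (assumed never removed)
    uniprot:  keys currently present in UniProt
    """
    existing = imported | manual
    to_add = uniprot - existing      # import only what does not exist yet
    to_remove = imported - uniprot   # drop imports that UniProt deleted
    return to_add, to_remove


# Hypothetical example data.
tm = FeatureKey("PROT-A", "Transmembrane-Regions", 12, 32)
metal = FeatureKey("PROT-A", "Metal-Binding-Sites", 75, 75)
stale = FeatureKey("PROT-B", "Conserved-Regions", 5, 40)
curated = FeatureKey("PROT-B", "Mutagenesis-Variants", 9, 9)

to_add, to_remove = plan_sync(
    imported={tm, stale},
    manual={curated},
    uniprot={tm, metal},
)
print("add:", to_add)
print("remove:", to_remove)
```

Because the key includes the location, a feature whose coordinates change in UniProt shows up here as one removal plus one addition.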
Current Statistics for EcoCyc
• 19032 total ‘|Protein-Features| instances (out of 75537 total frames in EcoCyc)
• 2130 manually created instances
• 16902 imported from UniProt
• 5586 ‘|Transmembrane-Regions|
• 1939 ‘|Metal-Binding-Sites|
• 1647 ‘|Mutagenesis-Variants|
• 1146 ‘|Conserved-Regions|
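A quick check of the figures above, using only the numbers on the slide: the manual and imported counts add up to the total, and the shares can be derived directly. Variable names are illustrative.

```python
total_features = 19032   # all Protein-Features instances
total_frames = 75537     # all frames in EcoCyc
manual = 2130            # manually created instances
imported = 16902         # imported from UniProt

# The two sources should account for every feature instance.
assert manual + imported == total_features

frame_share = total_features / total_frames    # features as a share of all frames
imported_share = imported / total_features     # imported share of all features

print(f"Protein features make up {frame_share:.1%} of EcoCyc frames")
print(f"{imported_share:.1%} of protein features were imported from UniProt")
```

Roughly a quarter of all EcoCyc frames are protein features, and close to nine in ten of those came from the importer.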
Current Work
• Extending the UniProt Loader to import variant sequence information and citations
• Adding an interface to the UniProt Feature Importer from Pathway Tools
• Creating databases on PublicHouse (a publicly accessible BioWarehouse instance) to allow our users to import protein features into their own PGDBs
Acknowledgements
• Alex Shearer
• Suzanne Paley
• Ingrid Keseler
• Valerie Wagner
EcoCyc.org