
Ontology Driven Data Collection for EuPathDB

Jie Zheng, Omar Harb, Chris Stoeckert. Center for Bioinformatics, University of Pennsylvania.


Presentation Transcript


  1. Ontology Driven Data Collection for EuPathDB
     Jie Zheng, Omar Harb, Chris Stoeckert
     Center for Bioinformatics, University of Pennsylvania

  2. Issues associated with Data Collection
     • Heterogeneity of free text
     • Difficulty in data integration, requires human intervention
     • Complex queries are limited

  3. Examples: GenBank

  4. Data Collection for EuPathDB
     • Apply ontology to data submission form design
       • Form to collect sequence data and information on isolates of pathogens
         • Geographic location where the isolate specimen was collected
         • Host organism information: species, age, clinical information
       • Form to collect genetic manipulation data with the resulting phenotype
         • Mutation method
         • Effects of the genetic modification on the parasite, and on the location, function, and biological process involvement of the resulting modified protein
       These data are important for parasite epidemiology and for research on vaccines and anti-parasitic drugs.
     • Enable queries
       • Compare sequence data from Plasmodium isolates restricted to East Africa with those from West Africa, controlling for age and health of hosts
       • List genes that, when knocked out, result in a defect in parasite growth during the erythrocytic cycle
       • List genes fused to green fluorescent protein (GFP) whose expressed products are located in the cell membrane
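The first query on slide 4 becomes a simple filter once isolate metadata is stored in structured fields rather than free text. The following is a minimal sketch, not the EuPathDB implementation: the `Isolate` record, the `select_isolates` helper, the region groupings, and the sample records are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass

# Hypothetical region groupings, for illustration only.
EAST_AFRICA = {"Kenya", "Tanzania", "Uganda"}
WEST_AFRICA = {"Ghana", "Mali", "Senegal"}


@dataclass(frozen=True)
class Isolate:
    """A structured isolate record (hypothetical field names)."""
    isolate_id: str
    genus: str
    country: str
    host_species: str
    host_age_years: int


def select_isolates(isolates, genus, countries, min_age, max_age):
    """Return isolates of one genus from the given countries, with host age in range."""
    return [
        iso for iso in isolates
        if iso.genus == genus
        and iso.country in countries
        and min_age <= iso.host_age_years <= max_age
    ]


# Made-up records, not real data.
isolates = [
    Isolate("iso1", "Plasmodium", "Kenya", "Homo sapiens", 4),
    Isolate("iso2", "Plasmodium", "Ghana", "Homo sapiens", 3),
    Isolate("iso3", "Plasmodium", "Kenya", "Homo sapiens", 40),
    Isolate("iso4", "Cryptosporidium", "Mali", "Homo sapiens", 2),
]

# Same host-age window on both sides, so the two groups are comparable.
east = select_isolates(isolates, "Plasmodium", EAST_AFRICA, 0, 5)
west = select_isolates(isolates, "Plasmodium", WEST_AFRICA, 0, 5)
```

With free-text metadata, the same comparison would require a curator to read every record before any filtering could happen.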

  5. EuPathDB
     EuPathDB (Eukaryotic Pathogen Database Resources) is a NIAID Bioinformatics Resource Center covering eukaryotic parasites.
     Aurrecoechea C, et al. EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res. 2010.

  6. Isolate Data
     • Need to import and integrate datasets from GenBank
     • But GenBank did not specify the metadata needed for isolates
     • Manual curation required
       • Harmonize host names to enable host queries: Human -> Homo sapiens
       • Deconvolute descriptions in free text:
         • "isolated from storm waters"
         • "isolated from Homo sapiens patient infected with HIV"
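The two curation steps on slide 6 can be sketched in code to show why they need human-maintained rules. This is an illustrative sketch only: the synonym table, the function names, and the output keys are assumptions, and a real curation pipeline would draw host names from a taxonomy resource rather than a hand-written dictionary.

```python
import re

# Hypothetical synonym table mapping free-text host names to species names.
HOST_SYNONYMS = {
    "human": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "cattle": "Bos taurus",
}


def harmonize_host(raw):
    """Map a free-text host name to a species name, or None if unknown."""
    return HOST_SYNONYMS.get(raw.strip().lower())


def deconvolute_source(description):
    """Split an 'isolated from ...' note into host or environmental fields."""
    text = description.strip()
    match = re.match(r"isolated from\s+(.*)", text, flags=re.IGNORECASE)
    if not match:
        return {"unparsed": text}
    rest = match.group(1)
    for synonym, species in HOST_SYNONYMS.items():
        # Require a word boundary so a synonym does not match inside a longer word.
        hit = re.match(re.escape(synonym) + r"\b", rest, flags=re.IGNORECASE)
        if hit:
            record = {"host": species}
            note = rest[hit.end():].strip()
            if note:
                record["host_note"] = note
            return record
    # No known host name found: treat the remainder as an environmental source.
    return {"environmental_source": rest}
```

Applied to the two slide examples, `deconvolute_source` separates an environmental source ("storm waters") from a host record with a clinical note. Anything the rules do not recognize still falls through to a curator, which is the cost the submission form is meant to avoid.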

  7. Isolate Data: GenBank -> EuPathDB

  8. Isolate Submission Form
     • Target isolate information
     • Geographic location
     • Source organism sample information, or environmental sample information
     • Sequence information

  9. Ontology-based Representation of Isolate Data
     Data collected in the submission form are shown in bold. Fields that require ontology terms are shown in thick-bordered boxes.

  10. Isolate Submission Form

  11. Ontology Selection

  12. Excel Format
      • Generally already collected in this format, according to our community advisors
      • Lowers the barrier for usage
      • Easily converted to GenBank submission-ready format automatically
      • Allows multiple sequence submission

  13. Parser for GenBank Submission
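The slides do not show the parser itself, so the following is only a sketch of the general idea: read one spreadsheet row per sequence and emit a FASTA record whose definition line carries source modifiers in bracketed `[modifier=value]` form. The column names, the column-to-modifier mapping, and the sample row are assumptions, and tab-delimited text stands in for an Excel sheet so the example needs only the standard library. Modifier names should be checked against the current GenBank submission documentation.

```python
import csv
import io

# Assumed spreadsheet columns mapped to source modifier names (hypothetical mapping).
COLUMN_TO_MODIFIER = {
    "organism": "organism",
    "isolate": "isolate",
    "country": "country",
    "host": "host",
}


def row_to_fasta(row):
    """Format one spreadsheet row as a FASTA record with bracketed modifiers."""
    modifiers = " ".join(
        f"[{modifier}={(row.get(column) or '').strip()}]"
        for column, modifier in COLUMN_TO_MODIFIER.items()
        if (row.get(column) or "").strip()  # skip empty cells
    )
    return f">{row['sequence_id'].strip()} {modifiers}\n{row['sequence'].strip()}\n"


def table_to_fasta(handle):
    """Convert a tab-delimited table (one row per sequence) to FASTA text."""
    reader = csv.DictReader(handle, delimiter="\t")
    return "".join(row_to_fasta(row) for row in reader)


# Made-up example row, not real data.
sheet = (
    "sequence_id\torganism\tisolate\tcountry\thost\tsequence\n"
    "seq1\tCryptosporidium parvum\tX1\tUnited Kingdom\tHomo sapiens\tATGCATGC\n"
)
fasta = table_to_fasta(io.StringIO(sheet))
```

Because each row is independent, the same loop handles the multiple sequence submission mentioned on slide 12.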

  14. Genetic Manipulation and Phenotype Data
      • Integrate phenotype data from other resources (GeneDB)
      • Allow individuals to submit phenotype data through the EuPathDB web site via User Comments on Gene pages
      • Either way, these are free-text descriptions, which limits their utility for data exploration
      Example: T. brucei RNAi knockdowns

  15. Genetic Manipulation and Phenotype Submission Form
      • Genetic manipulation
        • Mutation method, including selective marker and reporter if available
        • Mutation type (effect on gene function)
      • Phenotype data: impact of genetic manipulation on four possible observed features
        • Quality of the organism
        • Cellular location of gene product
        • Molecular function of gene product
        • Biological process of gene product
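The form layout on slide 15 maps naturally onto a small record type with validation. The sketch below is one possible shape, not the actual EuPathDB schema: the class names, field names, and the `validate` helper are hypothetical, and the identifiers in the usage note are placeholders rather than real gene or ontology IDs.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four observed features listed on the slide.
OBSERVED_FEATURES = (
    "quality of the organism",
    "cellular location of gene product",
    "molecular function of gene product",
    "biological process of gene product",
)


@dataclass
class PhenotypeObservation:
    feature: str            # one of OBSERVED_FEATURES
    ontology_term_id: str   # term chosen from the relevant ontology
    description: str = ""


@dataclass
class ManipulationSubmission:
    gene_id: str
    mutation_method: str
    mutation_type: str
    selective_marker: Optional[str] = None
    reporter: Optional[str] = None
    observations: List[PhenotypeObservation] = field(default_factory=list)


def validate(submission):
    """Return a list of problems; an empty list means the submission passes."""
    errors = []
    for name in ("gene_id", "mutation_method", "mutation_type"):
        if not getattr(submission, name).strip():
            errors.append(f"{name} is required")
    for obs in submission.observations:
        if obs.feature not in OBSERVED_FEATURES:
            errors.append(f"unknown observed feature: {obs.feature}")
        if not obs.ontology_term_id.strip():
            errors.append(f"ontology term missing for: {obs.feature}")
    return errors
```

For example, `validate(ManipulationSubmission("GENE_0001", "gene knock out", "loss of function", observations=[PhenotypeObservation("cellular location of gene product", "TERM:0000001")]))` returns an empty list, while a submission with an empty gene ID or a free-text feature name is rejected with a message per problem.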

  16. Ontology-based Representation of Genetic Manipulation with Resulting Phenotype Data
      Data collected in the submission form are shown in bold. Fields that require ontology terms are shown in thick-bordered boxes. The Ontology for Parasite Lifecycle (OPL) will be used to annotate life cycle stage.

  17. Ontology-based Representation of Genetic Manipulation – Gene Knock Out

  18. Genetic Manipulation Section (figure; ontology shown: OBI)

  19. Phenotype Section (figure; ontologies shown: OPL, GO, PATO, OBI; labeled fields: Cellular location, Biological process)

  20. Web-based Form
      • Collect the data directly from specific components of the EuPathDB web site
      • Change dynamically based on the user’s inputs (lifecycle stage options based on species; selective marker, reporter, and similar sections displayed only when needed)
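The dynamic behaviour described on slide 20 amounts to computing which options and sections to show from what the user has entered so far. A minimal sketch of that logic follows, assuming a hypothetical `form_sections` helper; the stage lists are illustrative and are not taken from OPL, which the real form would presumably consult.

```python
# Illustrative stage lists only; a real form would load these from OPL.
LIFECYCLE_STAGES = {
    "Plasmodium falciparum": ["sporozoite", "merozoite", "trophozoite", "schizont", "gametocyte"],
    "Trypanosoma brucei": ["bloodstream form", "procyclic form"],
}


def form_sections(species, uses_selective_marker=False, uses_reporter=False):
    """Return the lifecycle stage options and the sections to display."""
    sections = ["genetic_manipulation", "phenotype"]
    # Optional sections appear only when the user indicates they apply.
    if uses_selective_marker:
        sections.append("selective_marker")
    if uses_reporter:
        sections.append("reporter")
    return {
        "lifecycle_stage_options": LIFECYCLE_STAGES.get(species, []),
        "sections": sections,
    }


layout = form_sections("Trypanosoma brucei", uses_selective_marker=True)
```

Keeping this logic in one function means the web form and any offline spreadsheet template can share the same rules.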

  21. Future Work
      • Submission forms are at the prototype stage
      • Distribute isolate submission forms to EuPathDB users
      • Incorporate the genetic manipulation and phenotype form into the EuPathDB website
      • Evaluate the submission forms based on the data collected
      • Improve the submission forms based on feedback

  22. Acknowledgements
      • Stoeckert Lab
      • Haiming Wang and EuPathDB Team
      • EuPathDB Community: Dr. G. Robinson, Dr. R. Chalmers, Dr. C.J. Janse, Dr. G. Widmer, Dr. L. Xiao, Dr. S.M. Khan
      • Funding
        • NIH grant 5R01GM93132-1
        • National Institute of Allergy and Infectious Diseases at the National Institutes of Health, Award NO1-AI900038C, Contract No. HHSN272200900038C

  23. Thank You!
