OntoMerge – a system for computer-aided merging of anatomical ontologies
Peter Petrov, Milko Krachunov, Ernest van Ophuizen, Jack Leunissen, Ivan Popov, Dimitar Vassilev
Goals
• Computer-aided or semi-automated merging of anatomical ontologies in OBO format
• Artificial intelligence – built-in intelligence based on extensive external biomedical knowledge drawn from widely known, de facto standard knowledge bases (UMLS, FMA, and others)
• Extensibility – ability to work with multiple RDBMSs and to integrate external knowledge bases beyond UMLS and FMA
• Real-world applicability – part of a larger text searching/mining project that aims to let researchers perform cross-species searches and cross-species text mining over the available biomedical literature
• Portability – runs on both Linux and Windows
• Openness – written in an open-source dynamic programming language (Python) using open software technologies (GPL or LGPL)
Tasks #1
• Load and parse OBO files containing two anatomical ontologies (e.g. mouse and zebrafish)
• Communicate with external knowledge sources (currently two MySQL databases)
• Generate first-level (raw, basic) cross-ontology synonym (S) predictions by querying the external knowledge sources
• Present the raw synonym predictions to a curator
• Generate first-level (raw, basic) cross-ontology parent-child (P-C) predictions by querying the external knowledge sources
• Allow the curator to accept, reject or postpone a decision on each presented synonym prediction
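The first task, loading and parsing the OBO files, can be illustrated with a minimal Python sketch. It is not the pawnets parser: it assumes only the basic OBO tag-value layout, reads `[Term]` stanzas, and keeps `id`, `name`, `is_a`, `part_of` relationships and quoted synonym text, ignoring everything else (escapes, qualifiers, other stanza types). The function name `parse_obo_terms` and the `EX:` identifiers in the sample are made up for illustration.

```python
def parse_obo_terms(text):
    """Collect [Term] stanzas from OBO text into a dict keyed by term id."""
    terms = {}
    current = None  # the [Term] stanza being read, or None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are collected.
            current = ({"name": None, "is_a": [], "part_of": [], "synonyms": []}
                       if line == "[Term]" else None)
            continue
        if current is None:
            continue  # header lines or non-Term stanzas such as [Typedef]
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing "! label" comment.
            current["is_a"].append(value.split("!")[0].strip())
        elif tag == "relationship":
            parts = value.split("!")[0].split()
            if len(parts) >= 2 and parts[0] == "part_of":
                current["part_of"].append(parts[1])
        elif tag == "synonym" and value.count('"') >= 2:
            # Keep only the quoted synonym text, not its scope or xrefs.
            current["synonyms"].append(value.split('"')[1])
    return terms


# Toy input with made-up identifiers.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0001
name: anatomical entity

[Term]
id: EX:0002
name: heart
is_a: EX:0001 ! anatomical entity
relationship: part_of EX:0003 ! cardiovascular system
synonym: "cor" EXACT []

[Typedef]
id: part_of
name: part of
"""
```

Calling `parse_obo_terms(SAMPLE_OBO)` yields two terms; the `[Typedef]` stanza is skipped.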
Tasks #2
• Store the curator's decisions locally on the curator's machine in an SQLite relational database
• Infer second-level (additional) cross-ontology synonym (S) predictions, taking into account:
  - the curator's decisions on the raw synonym predictions
  - the already generated first-level cross-ontology parent-child (P-C) relations
• Present the second-level synonym predictions to the curator
• Check for and resolve ambiguity conflicts and crossing synonyms
• Curate again? If yes, repeat the whole process; if no, finish
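Storing the curator's decisions in a local SQLite database could look like the sketch below, which uses Python's built-in `sqlite3` module. The table name `decision`, its columns and the three status values (with "undecided" standing in for a postponed decision) are assumptions, not OntoMerge's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per cross-ontology synonym prediction.
SCHEMA = """
CREATE TABLE IF NOT EXISTS decision (
    left_id  TEXT NOT NULL,
    right_id TEXT NOT NULL,
    status   TEXT NOT NULL
             CHECK (status IN ('accepted', 'rejected', 'undecided')),
    PRIMARY KEY (left_id, right_id)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the local decision store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_decision(conn, left_id, right_id, status):
    """Save a decision; INSERT OR REPLACE lets the curator revise it later."""
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO decision (left_id, right_id, status) "
            "VALUES (?, ?, ?)",
            (left_id, right_id, status),
        )


def pairs_with_status(conn, status):
    """Return the (left_id, right_id) pairs that currently have a status."""
    rows = conn.execute(
        "SELECT left_id, right_id FROM decision WHERE status = ? "
        "ORDER BY left_id, right_id",
        (status,),
    )
    return [tuple(row) for row in rows]
```

Because the primary key is the term pair, re-deciding a prediction overwrites the earlier row instead of creating a duplicate.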
Underlying Math Model #1 – Linked DAGs (the two input ontologies linked)
• Two linked directed acyclic graphs (DAGs) represent the two input ontologies
• Blue ovals – nodes in the two DAGs, i.e. concepts in the two anatomical ontologies
• Red arrows – is_a relation within an ontology
• Green arrows – part_of relation within an ontology
• Pink links – parent-child relations (cross-species links)
• Orange links – synonym relations (cross-species links)
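The linked-DAG model can be held in plain Python structures. The sketch also shows one way to flag the "crossing synonyms" mentioned under Tasks #2. The slides do not define that term, so this code assumes a crossing occurs when two synonym links order their endpoints oppositely: one left term is an ancestor of the other, while on the right the ancestor relation runs the other way. The class `LinkedDAGs` and its methods are hypothetical; ancestry follows both is_a and part_of edges.

```python
from itertools import combinations


class LinkedDAGs:
    """Two ontology DAGs plus cross-ontology synonym links (Model #1 sketch)."""

    def __init__(self):
        # Per ontology: child term -> set of (parent term, relation) pairs.
        self.parents = {"left": {}, "right": {}}
        self.synonyms = set()      # (left_term, right_term): orange links
        self.parent_child = set()  # pink links; not used by the check below

    def add_edge(self, side, child, parent, relation="is_a"):
        self.parents[side].setdefault(child, set()).add((parent, relation))

    def ancestors(self, side, term):
        """All terms reachable upwards from term via is_a or part_of."""
        seen, stack = set(), [term]
        while stack:
            for parent, _relation in self.parents[side].get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def crossing_synonyms(self):
        """Pairs of synonym links whose hierarchical order disagrees."""
        crossings = []
        for (a1, b1), (a2, b2) in combinations(sorted(self.synonyms), 2):
            a2_above_a1 = a2 in self.ancestors("left", a1)
            a1_above_a2 = a1 in self.ancestors("left", a2)
            b2_above_b1 = b2 in self.ancestors("right", b1)
            b1_above_b2 = b1 in self.ancestors("right", b2)
            if (a2_above_a1 and b1_above_b2) or (a1_above_a2 and b2_above_b1):
                crossings.append(((a1, b1), (a2, b2)))
        return crossings
```

For example, linking left "organ" to a right-side ventricle term while left "heart" is linked to the right-side heart term produces a crossing, because "organ" is above "heart" on the left but the right-side ventricle is below the right-side heart.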
Underlying Math Model #2 – Super-ontology (the two input ontologies merged)
• Super-concepts (new concepts) are introduced (yellow ovals)
• Yellow ovals – generalizations of concepts from one or both of the source ontologies
• The cross-species links from Model #1 (orange, pink) are removed
• MN – terms (concepts) from the mouse anatomical ontology
• ZN – terms (concepts) from the zebrafish anatomical ontology
• Hierarchical edges (red, green, etc.) apply again (is_a, part_of, etc.)
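One simplified way to derive a super-ontology from Model #1 is to collapse every accepted synonym pair into a single super-concept and then lift the hierarchical edges onto those super-concepts. The sketch below does this with a union-find structure. It is an assumption for illustration: it covers synonym merging only, not the more general generalization concepts (yellow ovals) the slide describes, and the function name and the `L:`/`R:` labelling scheme are hypothetical.

```python
def build_super_ontology(left_edges, right_edges, synonyms):
    """Merge synonym pairs into super-concepts and lift hierarchy edges.

    left_edges / right_edges: iterables of (child, parent, relation) triples.
    synonyms: iterable of accepted (left_term, right_term) pairs.
    Returns (concept_of, edges): (side, term) -> super-concept label, and a
    set of (child_concept, parent_concept, relation) triples.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # deterministic root choice

    # Tag terms with their side so equal names in both ontologies stay apart.
    edges_in = [(("L", c), ("L", p), rel) for c, p, rel in left_edges]
    edges_in += [(("R", c), ("R", p), rel) for c, p, rel in right_edges]
    for child, par, _rel in edges_in:
        find(child)
        find(par)
    for left_term, right_term in synonyms:
        union(("L", left_term), ("R", right_term))

    # Each union-find group becomes one super-concept.
    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), []).append(term)
    concept_of = {}
    for members in groups.values():
        label = "|".join(f"{side}:{name}" for side, name in sorted(members))
        for term in members:
            concept_of[term] = label

    edges = set()
    for child, par, rel in edges_in:
        c, p = concept_of[child], concept_of[par]
        if c != p:  # skip edges that collapsed into a self-loop
            edges.add((c, p, rel))
    return concept_of, edges
```

Parallel part_of edges from both ontologies (for instance ventricle part_of heart on each side) end up as a single edge between the merged super-concepts.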
Program Structure
• Two Python components
• Two MySQL (or PostgreSQL) databases (UMLS, FMA)
• pawnets – small parser component for loading and parsing anatomical ontologies in OBO format (http://www.obofoundry.org/), e.g. the adult mouse anatomy and zebrafish anatomy ontologies
• ontomerge – large component containing:
  - the main logic module (synonym and parent-child predictions)
  - GUI modules (menus, tables, progress bars, etc.)
  - interfaces to RDBMSs (DB drivers for MySQL, PostgreSQL, etc.)
  - a graph drawing module (shows a term's ontology context in a user-friendly view)
Discovering Synonyms
• Terms t1 from the left ontology are mapped (aligned) to terms in one of the external knowledge bases (UMLS, FMA), producing a list of synonyms lst1
• Terms t2 from the right ontology are likewise mapped to terms in one of the external knowledge bases (UMLS, FMA), producing a list of synonyms lst2
• If lst1 and lst2 share common (or synonymous) elements, then t1 and t2 are synonym candidates (predicted synonyms) and are presented to the curator for acceptance or rejection
• This algorithm finds the raw (basic, first-level) predictions
• This and similar ontology-to-knowledge-base alignment algorithms are at the heart of OntoMerge: they let it predict synonym and parent-child relations by drawing on the extensive knowledge available in the external knowledge bases (UMLS, FMA, etc.)
• Additional algorithms use the curator's input and the already generated basic predictions to produce further (second-level) predictions
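The basic synonym-discovery step can be sketched in Python as follows. A plain dictionary from normalized names to concept identifiers stands in for the UMLS/FMA lookup (an assumption; the real system queries MySQL databases), and only shared concept identifiers count as common elements, so the "synonymous elements" case is not covered. An inverted index over the right ontology avoids comparing every pair of terms. `predict_synonyms`, `normalize` and the `C1`/`C2` identifiers are hypothetical.

```python
from collections import defaultdict


def normalize(name):
    """Case- and whitespace-insensitive lookup key (a simplifying assumption)."""
    return " ".join(name.lower().split())


def predict_synonyms(left_terms, right_terms, kb):
    """First-level synonym predictions via a shared knowledge base.

    left_terms / right_terms: dict term_id -> list of names (name + synonyms).
    kb: dict normalized name -> set of concept ids (stand-in for UMLS/FMA).
    Returns dict (left_id, right_id) -> sorted shared concept ids (evidence).
    """
    def concepts(names):
        # lst1 / lst2 from the slide: every concept a term's names map to.
        found = set()
        for name in names:
            found |= kb.get(normalize(name), set())
        return found

    # Inverted index: concept id -> right-ontology terms mapped to it.
    by_concept = defaultdict(set)
    for right_id, names in right_terms.items():
        for concept in concepts(names):
            by_concept[concept].add(right_id)

    predictions = defaultdict(set)
    for left_id, names in left_terms.items():
        for concept in concepts(names):
            for right_id in by_concept.get(concept, ()):
                predictions[(left_id, right_id)].add(concept)
    return {pair: sorted(ev) for pair, ev in predictions.items()}
```

Keeping the shared concept identifiers as evidence mirrors the idea of showing the curator the references on which a prediction was made.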
Program Input and Output
• Input: two ontologies in OBO format
• Output: the predicted synonyms and the curator's decisions on them, stored in a lightweight relational database
• Output: the predicted synonyms and the curator's decisions as an OBO file (super-ontology)
• Intermediate processing is presented through a typical desktop GUI application (menus, progress bars, progress percentages, alerts and notifications)
• The curator's session is saved on program shutdown and reloaded on startup
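Writing merged terms back out as an OBO file could be as simple as the hypothetical sketch below. It assumes each super-concept becomes a `[Term]` stanza that points to its source terms through `xref` lines; that convention, the `SUPER:` prefix and the function names are illustrative choices, not OntoMerge's documented output, and no escaping of special characters is attempted.

```python
def format_obo_term(term_id, name, xrefs=(), is_a=(), part_of=()):
    """Render one [Term] stanza as OBO text."""
    lines = ["[Term]", f"id: {term_id}", f"name: {name}"]
    lines += [f"xref: {x}" for x in xrefs]  # links back to source terms
    lines += [f"is_a: {p}" for p in is_a]
    lines += [f"relationship: part_of {p}" for p in part_of]
    return "\n".join(lines) + "\n"


def format_obo_document(terms):
    """Render a header plus one stanza per term dict, separated by blank lines."""
    header = "format-version: 1.2\n"
    return header + "".join("\n" + format_obo_term(**t) for t in terms)
```

Each entry passed to `format_obo_document` is a dict whose keys match the parameters of `format_obo_term`, e.g. `{"term_id": "SUPER:0001", "name": "heart", "xrefs": ["MA:1", "ZFA:1"]}`.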
Extensibility and Configurability
• Open to new ontology formats, e.g. RDF, OWL (pluggable parsers)
• Open to new external knowledge sources beyond UMLS and FMA, e.g. WordNet
• Open to new external knowledge source formats: supports DB types other than MySQL (e.g. PostgreSQL), with possible support for knowledge sources exposed as web services (a distributed, decentralized program model)
• Configurable RDBMS type – MySQL vs PostgreSQL
• Configurable DB schema type – full (all tables and columns) vs simplified (selected tables and columns)
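The pluggable-parser and configurable-RDBMS ideas can be expressed with a small registry plus a configuration check, as in this sketch. The decorator-based registry, the config keys `rdbms` and `schema`, and the stub parser are assumptions; `MySQLdb` and `psycopg2` are the common Python adapters for MySQL and PostgreSQL and are imported only when requested, so only the configured one needs to be installed.

```python
import importlib

PARSERS = {}  # file extension -> parser function


def register_parser(fmt):
    """Decorator that registers an ontology parser for a file format."""
    def decorator(func):
        PARSERS[fmt.lower()] = func
        return func
    return decorator


@register_parser("obo")
def parse_obo_stub(text):
    # Placeholder: returns term ids only; a real OBO parser would go here.
    return [line[4:] for line in text.splitlines() if line.startswith("id: ")]


def parse_ontology(filename, text):
    """Dispatch to the parser registered for the file's extension."""
    fmt = filename.rsplit(".", 1)[-1].lower()
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"no parser registered for .{fmt} files") from None
    return parser(text)


DRIVERS = {"mysql": "MySQLdb", "postgresql": "psycopg2"}  # RDBMS -> module
SCHEMA_TYPES = {"full", "simplified"}


def load_driver(config):
    """Validate the config, then import only the configured DB-API driver."""
    rdbms, schema = config["rdbms"].lower(), config["schema"].lower()
    if rdbms not in DRIVERS or schema not in SCHEMA_TYPES:
        raise ValueError(f"unsupported configuration: {config}")
    return importlib.import_module(DRIVERS[rdbms]), schema
```

Supporting OWL or RDF would then mean registering another parser function, without touching the dispatch code.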
Openness
• Technologies and tools licensed under GPL or LGPL
• Qt – GUI toolkit
• PySide – Python for Qt
• PyQt4 – Python bindings for the Qt GUI toolkit
• MySQLdb – MySQL Python driver/adapter
• Psycopg2 – PostgreSQL Python driver/adapter
• NetworkX – Python package for creating and manipulating graphs and networks
• Matplotlib – plotting library for the Python programming language
• Text-based, widely adopted, free file formats (OBO, XML, JSON)
• Bazaar – version control system
OntoMerge – Main Frame (Initial View)
• Initial view before a new or existing project is opened
• Left section: displays terms from the so-called left and right ontologies
• Bottom (gray) section: displays a graph view of the selected term's ontology context
• Right section: displays the terms' textual metadata (ontology IDs of the left and right terms, their OBO code portions, and the references on which the synonym prediction was based)
OntoMerge – Create New Project
• Load two ontologies (as OBO files) and save this project context as a new OntoMerge project file
• OntoMerge project file format – JSON
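Since the project file is JSON, saving and reloading the project context is straightforward with Python's standard library. The `Project` fields below (the two OBO paths and a version number) are a guess at a minimal project context, not OntoMerge's actual file layout, and the function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Project:
    """Hypothetical project context: the two input OBO files."""
    left_obo: str
    right_obo: str
    format_version: int = 1


def dumps_project(project):
    """Serialize a project to JSON text (sorted keys for stable diffs)."""
    return json.dumps(asdict(project), indent=2, sort_keys=True)


def loads_project(text):
    return Project(**json.loads(text))


def save_project(project, path):
    Path(path).write_text(dumps_project(project), encoding="utf-8")


def load_project(path):
    return loads_project(Path(path).read_text(encoding="utf-8"))
```

Splitting string serialization from file I/O keeps the format easy to test and to version-control.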
OntoMerge – Main Frame (Populated View)
• Left section with a table of synonym predictions (auto-inferred)
• The synonym predictions are to be curated by the user
• Synonym prediction states:
  - Accepted
  - Rejected
  - Undecided (by the curator)
OntoMerge – Term Selected
• Left section containing the synonym predictions
• Selected term highlighted (terms table in the left section)
• Selected term's ontology context shown as a graph (bottom graph section)
• The terms' ontology codes (from the two input ontologies) shown in the right section
OntoMerge – Right Section
• The section consists of three tabs:
  - Terms tab – the two terms taking part in the synonym prediction
  - OBO code tab – the OBO code of the two terms from the two input ontologies
  - References tab – the knowledge sources on which the synonym prediction is based: simple prediction, direct prediction, UMLS, FMA
OntoMerge – Graph Section
• Red nodes – terms from the left ontology
• Blue nodes – terms from the right ontology
• Arrows/links – cross-ontology semantic links: synonyms (S) and/or parent-child (P-C)
• Solid lines – is_a (specialization) relation
• Dotted lines – part_of (aggregation) relation
OntoMerge – Actions Menu
• Find basic predictions – finds the first-level (basic, raw) predictions by aligning the ontologies to the external knowledge bases
• Infer additional predictions – infers more predictions from the curator's input and the already inferred basic predictions; can be run multiple times
• Find all predictions – finds the basic predictions, then the additional predictions (based on the curator's choices in the current or a previous session)
• Refresh all/basic predictions – similar to the above, except that old predictions are not rediscovered (only new ones are added, based on new input from the curator)
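The refresh behavior, keeping what is already known and adding only genuinely new predictions, amounts to a set difference. The sketch below assumes predictions are keyed by (left term, right term) pairs and that new ones start out undecided; the function name `refresh_predictions` is hypothetical.

```python
def refresh_predictions(existing, found):
    """Merge newly found predictions without touching existing ones.

    existing: dict (left_id, right_id) -> status from earlier runs,
              including the curator's decisions.
    found: iterable of (left_id, right_id) pairs from the latest search.
    Returns (merged, added): the updated mapping and the newly added pairs.
    """
    merged = dict(existing)  # earlier entries and decisions are kept as-is
    added = []
    for pair in found:
        if pair not in merged:
            merged[pair] = "undecided"
            added.append(pair)
    return merged, added
```

Because known pairs are skipped, the curator never has to re-decide a prediction that was already accepted or rejected.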
OntoMerge – Results Menu
• Export raw predictions – saves the raw predictions (but not the curator's decisions) to the hard disk so that they can later be loaded from disk instead of the DB
• Import raw predictions – loads the raw predictions from the hard disk (not from the DB)
• These are only speed-ups (optimizations), not conceptually new functionality
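Exporting and importing raw predictions can be sketched as a JSON round trip. The record layout (`left`, `right`, `evidence`) and the function names are hypothetical, and JSON is assumed here only because the project already uses it for its project files; as on the slide, only the raw predictions are written, not the curator's decisions.

```python
import json
from pathlib import Path


def dumps_raw_predictions(predictions):
    """predictions: dict (left_id, right_id) -> list of evidence sources."""
    records = [
        {"left": left_id, "right": right_id, "evidence": list(evidence)}
        for (left_id, right_id), evidence in sorted(predictions.items())
    ]
    return json.dumps(records, indent=2)


def loads_raw_predictions(text):
    # JSON has no tuple keys, so the pair is rebuilt from each record.
    return {(r["left"], r["right"]): r["evidence"] for r in json.loads(text)}


def export_raw_predictions(predictions, path):
    Path(path).write_text(dumps_raw_predictions(predictions), encoding="utf-8")


def import_raw_predictions(path):
    return loads_raw_predictions(Path(path).read_text(encoding="utf-8"))
```

Reading this file skips the knowledge-base queries, which is the kind of speed-up the slide describes.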
Conclusion
• Merging anatomical ontologies of different species can be greatly simplified with appropriate software tools such as OntoMerge
• Novel graph models and algorithms make it possible to use massive external knowledge that has been collected, curated, extended and improved over many years; this provides a solid, intelligent foundation for the software built on top of it
• Openness of the software and its compliance with popular, free, de facto standard languages, technologies and file formats matter: they widen the software's reach and increase user interest in it
Thank You!
Peter Petrov p_a_petrov@yahoo.com
Milko Krachounov milko@3mhz.net
Ivan Popov popov.bioinfo@gmail.com
Dimitar Vassilev jim6329@gmail.com