ZOOMA is an ontology mapping application that finds optimal matches between text values and ontology terms, supporting queries with curated data. It helps curators map text values to ontology terms for better data categorization and querying. ZOOMA offers automatic mapping, error detection and mapping suggestions to streamline ontology mapping across various databases.
ZOOMA – Optimal Ontology Mapping Application
Tony Burdett, 29th April
What is ZOOMA?
ZOOMA is an ontology mapping application designed to find optimal matches between “text values” and “ontology terms”. A bit of background:
• ZOOMA grew out of the need to offer queries against EFO for data newly loaded into Atlas 2.0.
• At load time, user-supplied text values are not resolved against ontology terms.
• Curators need to be able to “map” text values for newly loaded data in order to support ontology-enabled queries.
Wider Use Case
• Our pressing requirement right now is to map text values to ontology terms in the Atlas.
• But this is a wider problem: such mappings turn up in everything we do.
• Atlas, ArrayExpress2, BII, MAGE-TAB and submitters all need to do this sort of ontology mapping.
• So it makes sense to implement the mapping logic once and reuse it!
Some Jargon
• We probably all understand what “text values” and “ontology terms” mean (or we think we do).
• “Text values” are things that a user, or perhaps a curator, has entered.
  • A text value isn’t necessarily (but probably is) some sort of controlled term.
• “Ontology terms” are things that come (only) from an ontology.
  • If you can’t find it in an ontology, it’s not an ontology term!
• A “mapping” is an assertion that some text value somehow relates to some ontology term(s).
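The three pieces of jargon can be pinned down as a tiny data model. What follows is a minimal Java sketch under stated assumptions: the record names, the accession-based `Ontology.find` lookup and the `DEMO:` accessions are hypothetical illustrations, not ZOOMA's own classes.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class Jargon {
    // A text value: free text entered by a user or curator (possibly, not necessarily, controlled).
    record TextValue(String text) {}

    // An ontology term: identified by the ontology it comes from plus its accession.
    record OntologyTerm(String ontology, String accession, String label) {}

    // A mapping: an assertion that a text value relates to one or more ontology terms.
    record Mapping(TextValue value, Set<OntologyTerm> terms) {}

    // Hypothetical in-memory ontology: accession -> label.
    record Ontology(String name, Map<String, String> labelsByAccession) {
        // "If you can't find it, it's not an ontology term": only accessions that
        // exist in this ontology yield a term.
        Optional<OntologyTerm> find(String accession) {
            String label = labelsByAccession.get(accession);
            return label == null
                    ? Optional.empty()
                    : Optional.of(new OntologyTerm(name, accession, label));
        }
    }

    public static void main(String[] args) {
        Ontology demo = new Ontology("DEMO", Map.of("DEMO:0001", "heart"));
        Optional<OntologyTerm> heart = demo.find("DEMO:0001");
        Optional<OntologyTerm> missing = demo.find("DEMO:9999");
        Mapping mapping = new Mapping(new TextValue("Heart"), Set.of(heart.orElseThrow()));
        System.out.println(mapping.terms().size() + " " + missing.isPresent()); // 1 false
    }
}
```

Obtaining terms through `find` mirrors the rule that an ontology term must come from an ontology, while a text value can be anything.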
Atlas World View
• The Atlas DB has Property, PropertyValue and OntologyTerm tables.
• Properties are the “types” (e.g. organismpart) and property values are the values (e.g. heart).
• These are controlled (reused between experiments).
• Join tables (AssayPVOntology and SamplePVOntology) link property values to OntologyTerms at the assay and sample level.
ArrayExpress2 World View
• The ArrayExpress database also has Property, PropertyValue, OntologyTerm and OntologyEntry tables.
• However, they are joined in different ways.
• In the Atlas, properties and property values are reused, and the join table determines uniqueness.
• In ArrayExpress2, there are join tables linking Properties to OntologyEntries, and Property Values to OntologyTerms (broadly speaking).
• This means properties and property values must be unique per “usage”.
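To make the contrast between the two models concrete, here is a simplified Java sketch of how the same annotation (“organismpart = heart” on two assays) would be stored in each. The row types and fields are hypothetical simplifications of the tables named above, not the real schemas.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WorldViews {
    // Hypothetical, simplified row types; the real table columns are not given on the slides.
    record PropertyValue(int id, String property, String value) {}
    record AssayPVOntology(String assayId, int pvId, String termAccession) {}

    // Atlas style: property values are reused, and uniqueness lives in the join table.
    static class AtlasStore {
        final Map<String, PropertyValue> pvs = new HashMap<>();
        final List<AssayPVOntology> joins = new ArrayList<>();

        void annotate(String assayId, String property, String value, String term) {
            // Reuse the existing property value row if this property/value pair was seen before.
            PropertyValue pv = pvs.computeIfAbsent(property + "=" + value,
                    k -> new PropertyValue(pvs.size() + 1, property, value));
            joins.add(new AssayPVOntology(assayId, pv.id(), term));
        }
    }

    // ArrayExpress2 style: every usage creates its own property value row,
    // which is joined directly to an ontology term.
    static class Ae2Store {
        final List<PropertyValue> pvs = new ArrayList<>();
        final Map<Integer, String> pvToTerm = new HashMap<>();

        void annotate(String property, String value, String term) {
            PropertyValue pv = new PropertyValue(pvs.size() + 1, property, value);
            pvs.add(pv);
            pvToTerm.put(pv.id(), term);
        }
    }

    public static void main(String[] args) {
        AtlasStore atlas = new AtlasStore();
        atlas.annotate("A1", "organismpart", "heart", "DEMO:0001");
        atlas.annotate("A2", "organismpart", "heart", "DEMO:0001");

        Ae2Store ae2 = new Ae2Store();
        ae2.annotate("organismpart", "heart", "DEMO:0001"); // usage in assay A1
        ae2.annotate("organismpart", "heart", "DEMO:0001"); // usage in assay A2

        // Atlas: 1 shared value and 2 join rows; ArrayExpress2: 2 property value rows.
        System.out.println(atlas.pvs.size() + " " + atlas.joins.size() + " " + ae2.pvs.size()); // 1 2 2
    }
}
```

Because the two stores place uniqueness in different tables, the same mapping logic needs a different data-access layer for each database.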
MAGE-TAB World View
• In MAGE-TAB, certain columns can be followed by a “Term Source REF” column, which designates a link to an ontology term.
• The value entered into the column is a text value.
• The Term Source REF indicates that we should be able to find an ontology term with a matching “name” in the given ontology.
• But this can be hit and miss! Supplying a term accession is more reliable.
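A small Java sketch of why name matching is hit and miss: against a hypothetical ontology, an exact name lookup fails on a trivial difference in case, while an accession lookup does not depend on how the text was typed. The ontology contents and method names are illustrative assumptions.

```java
import java.util.Map;
import java.util.Optional;

public class TermSourceLookup {
    // Hypothetical ontology referenced by a Term Source REF: accession -> term name.
    static final Map<String, String> ONTOLOGY = Map.of(
            "DEMO:0001", "heart",
            "DEMO:0002", "liver");

    // Name-based lookup: succeeds only if the text value exactly matches a term name.
    static Optional<String> byName(String textValue) {
        return ONTOLOGY.entrySet().stream()
                .filter(e -> e.getValue().equals(textValue))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    // Accession-based lookup: independent of how the text value was spelled.
    static Optional<String> byAccession(String accession) {
        return ONTOLOGY.containsKey(accession) ? Optional.of(accession) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(byName("heart"));          // Optional[DEMO:0001]
        System.out.println(byName("Heart"));          // Optional.empty
        System.out.println(byAccession("DEMO:0001")); // Optional[DEMO:0001]
    }
}
```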
MAGE-TAB World View (diagram): Property (the IDF may link to an OntologyEntry) → Property value → Ontology term
Now that’s over… what does ZOOMA do?
• In general, ZOOMA has three main modes of operation:
• Automatic
  • Highlights any optimal mappings that don’t need curation
  • i.e. one text value maps to the same set of terms every time
• Error detection
  • Highlights any mappings from text value to ontology term that might be in error
  • This only makes sense if you have a “repository of mappings”, e.g. the Atlas
• Mapping suggestions
  • Takes text values and proposes new mappings to terms
  • Requires a curator’s eye to decide which mapping is best
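One way the first two modes might be approximated over a repository of existing mappings is sketched below in Java. This is a hypothetical heuristic, not ZOOMA's actual logic: a text value that is always mapped to the same set of terms becomes a candidate for automatic mapping, and one mapped to differing sets is flagged as a possible error. Mapping suggestions are not covered.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MappingModes {
    // One existing mapping in a repository (e.g. the Atlas): text value -> set of term accessions.
    record Mapping(String textValue, Set<String> terms) {}

    enum Verdict { AUTOMATIC, POSSIBLE_ERROR }

    // Hypothetical heuristic: group mappings by text value and count the distinct term sets.
    static Map<String, Verdict> classify(List<Mapping> repository) {
        Map<String, Set<Set<String>>> termSetsByValue = new TreeMap<>();
        for (Mapping m : repository) {
            termSetsByValue.computeIfAbsent(m.textValue(), k -> new HashSet<>()).add(m.terms());
        }
        Map<String, Verdict> verdicts = new TreeMap<>();
        termSetsByValue.forEach((value, termSets) -> verdicts.put(value,
                termSets.size() == 1 ? Verdict.AUTOMATIC : Verdict.POSSIBLE_ERROR));
        return verdicts;
    }

    public static void main(String[] args) {
        List<Mapping> repository = List.of(
                new Mapping("heart", Set.of("DEMO:0001")),
                new Mapping("heart", Set.of("DEMO:0001")),
                new Mapping("liver", Set.of("DEMO:0002")),
                new Mapping("liver", Set.of("DEMO:0003")));
        System.out.println(classify(repository)); // {heart=AUTOMATIC, liver=POSSIBLE_ERROR}
    }
}
```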
What ZOOMA does
• These three modes are currently implemented for the Atlas, and for a list of values submitted by file.
• In the Atlas, ZOOMA will…
  • Look up property values from the Atlas (these are our text values) and find the optimal ontology term hits against EFO
  • Look up property values from the Atlas and find inherited ontology term hits from elsewhere in the database
  • Look up property values from the Atlas and find possible matches to terms not in EFO, querying BioPortal/OLS (using OntoCAT)
• Currently, the results go into a report and are then written back to the database.
• But automatic writing back to the Atlas is coming soon!
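The Atlas run amounts to applying several lookups to each text value and collecting the hits into a report. Here is a minimal Java sketch, assuming each lookup can be modelled as a function from a text value to matching term accessions; the lookup names, accessions and tab-separated layout are hypothetical, and the lambdas merely stand in for EFO, database and BioPortal/OLS queries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class AtlasReport {
    // Builds tab-separated report lines: text value, lookup name, matched term.
    static List<String> report(List<String> textValues,
                               Map<String, Function<String, List<String>>> lookups) {
        List<String> lines = new ArrayList<>();
        for (String value : textValues) {
            for (Map.Entry<String, Function<String, List<String>>> lookup : lookups.entrySet()) {
                for (String term : lookup.getValue().apply(value)) {
                    lines.add(value + "\t" + lookup.getKey() + "\t" + term);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the three lookups; LinkedHashMap keeps them in order.
        Map<String, Function<String, List<String>>> lookups = new LinkedHashMap<>();
        lookups.put("EFO", v -> v.equals("heart") ? List.of("DEMO:0001") : List.of());
        lookups.put("ATLAS_INHERITED", v -> List.of());
        lookups.put("EXTERNAL", v -> v.equals("aorta") ? List.of("EXT:0042") : List.of());
        report(List.of("heart", "aorta"), lookups).forEach(System.out::println);
    }
}
```

Keeping the lookup name in each report line lets a curator see at a glance whether a hit came from EFO itself or from an external source.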
What ZOOMA does
• When running over a submitted text file, ZOOMA can…
  • Look up user-supplied text values (from a supplied file) and find optimal hits against your ontology of choice
  • Look up user-supplied text values and recover existing mappings from the Atlas
  • Look up user-supplied text values and find possible matches in BioPortal/OLS, again using OntoCAT
• In fact, the implementation for the different inputs is the same; we just need to change where our text values come from.
• Again, this generates a report.
• Error detection doesn’t really mean anything in this case: there are no existing mappings in which to detect errors!
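The point that only the source of text values changes can be illustrated with a small Java sketch. `TextValueSource`, `FixedSource` (standing in for the Atlas) and `ReaderSource` (standing in for a submitted file) are hypothetical names; the mapping step is written once against the interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TextValueSources {
    // Hypothetical abstraction: the only thing that varies is where text values come from.
    interface TextValueSource {
        List<String> textValues() throws IOException;
    }

    // Stand-in for the Atlas: a fixed list of property values.
    record FixedSource(List<String> values) implements TextValueSource {
        public List<String> textValues() { return values; }
    }

    // A submitted file (any Reader): one text value per line, blank lines skipped.
    record ReaderSource(Reader reader) implements TextValueSource {
        public List<String> textValues() throws IOException {
            try (BufferedReader in = new BufferedReader(reader)) {
                return in.lines().map(String::trim).filter(s -> !s.isEmpty())
                        .collect(Collectors.toList());
            }
        }
    }

    // The mapping step is written once, against the interface, whatever the source.
    static Map<String, List<String>> map(TextValueSource source,
                                         Function<String, List<String>> lookup) throws IOException {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String value : source.textValues()) {
            results.put(value, lookup.apply(value));
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical lookup standing in for a real ontology search.
        Function<String, List<String>> lookup =
                v -> v.equalsIgnoreCase("heart") ? List.of("DEMO:0001") : List.of();
        System.out.println(map(new FixedSource(List.of("heart", "liver")), lookup));
        // {heart=[DEMO:0001], liver=[]}
        System.out.println(map(new ReaderSource(new StringReader("Heart\n\nkidney\n")), lookup));
        // {Heart=[DEMO:0001], kidney=[]}
    }
}
```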
Some tech…
• The idea behind ZOOMA is to isolate the mapping logic from the sources of text values and ontology terms.
• To achieve this, ZOOMA has several top-level interfaces:
  • OntologyMapper
  • OntologyMappingFormulator
  • OntologyTermRetriever
  • OntologyMappingHypothesis
  • OntologyMappingHypothesisFactory
  • OntologyMappingEvaluator
  • OntologyMappingCalculator
  • OntologyMappingOutcome
Some more tech…
• With this infrastructure, supporting a new database just means writing a new OntologyTermRetriever implementation.
• An OntologyTermRetriever fetches OntologyTerms that possibly match a text value, along with enough information to decide how good each match is (the OntologyMappingContext).
• New mapping logic can be added by writing implementations of the other interfaces; at the moment, for example, there is a “RankingBasedCalculator” that implements OntologyMappingCalculator.
• Implementations can quickly be wired together with Spring.
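A hedged Java sketch of how a retriever and a ranking calculator might fit together. The names `OntologyTermRetriever`, `OntologyMappingContext`, `OntologyMappingCalculator` and `RankingBasedCalculator` come from the slides, but every method signature, field and ranking rule here is an assumption made for illustration; the wiring in `main` is done by hand, where ZOOMA would use Spring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class RetrieverSketch {
    // Hypothetical: a candidate term plus enough context to judge the match.
    record OntologyMappingContext(String textValue, String accession, String label, String matchType) {}

    // Hypothetical signature: fetch terms that possibly match a text value.
    interface OntologyTermRetriever {
        List<OntologyMappingContext> retrieve(String textValue);
    }

    // Hypothetical signature: choose the best candidate, if any.
    interface OntologyMappingCalculator {
        Optional<OntologyMappingContext> best(List<OntologyMappingContext> candidates);
    }

    // A new data source only needs a new retriever; this one searches an in-memory label map.
    record InMemoryRetriever(Map<String, String> labelsByAccession) implements OntologyTermRetriever {
        public List<OntologyMappingContext> retrieve(String textValue) {
            List<OntologyMappingContext> hits = new ArrayList<>();
            labelsByAccession.forEach((acc, label) -> {
                if (label.equals(textValue)) {
                    hits.add(new OntologyMappingContext(textValue, acc, label, "EXACT"));
                } else if (label.equalsIgnoreCase(textValue)) {
                    hits.add(new OntologyMappingContext(textValue, acc, label, "CASE_INSENSITIVE"));
                } else if (label.toLowerCase().contains(textValue.toLowerCase())) {
                    hits.add(new OntologyMappingContext(textValue, acc, label, "PARTIAL"));
                }
            });
            return hits;
        }
    }

    // Illustrative ranking: exact beats case-insensitive, which beats partial.
    static class RankingBasedCalculator implements OntologyMappingCalculator {
        private static final List<String> RANK = List.of("EXACT", "CASE_INSENSITIVE", "PARTIAL");

        public Optional<OntologyMappingContext> best(List<OntologyMappingContext> candidates) {
            return candidates.stream()
                    .min(Comparator.comparingInt(c -> RANK.indexOf(c.matchType())));
        }
    }

    public static void main(String[] args) {
        // Manual wiring; swapping the retriever changes the data source, not the logic.
        OntologyTermRetriever retriever = new InMemoryRetriever(Map.of(
                "DEMO:0001", "heart", "DEMO:0003", "heart left ventricle"));
        OntologyMappingCalculator calculator = new RankingBasedCalculator();
        System.out.println(calculator.best(retriever.retrieve("Heart"))
                .map(OntologyMappingContext::accession)); // Optional[DEMO:0001]
    }
}
```

In this sketch, supporting another data source would mean writing another `OntologyTermRetriever`, with the calculator left untouched.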
What’s next?
• Very short term: do a production release! Ele has already been using ZOOMA to do Atlas mappings, but so far it has been a bit ad hoc.
• I need to add support for writing our new mappings directly into the Atlas database, instead of into the report.
• I want to add an ArrayExpress2 retriever, so we can determine mappings there.
• And also add write support, so we can map terms there too.
• Possibly create an OWL-driven backend?
Questions?