NIF Vocabulary Server
Maryann Martone, Ph.D.
NIF Technical Team
• Perry Miller, Yale
• Luis Marenco, Yale
• Yuli Li, Yale
• Arun Rangarajun, Cal Tech
• Hans-Michael Muller, Cal Tech
• Sredevi Polavarum, George Mason
• Jeff Grethe, UCSD
• Brian Sanders, UCSD
• Vadim Astakhov, UCSD
• Amarnath Gupta, UCSD
• Xufei Qian, UCSD
• Bill Bug, UCSD
• Maryann Martone, UCSD
Basic Architecture
• The same architecture and workflow apply to the registration process
Role of NIF Terminologies
• NIF terminologies provide a shared vocabulary for annotation of neuroscience data
• NIF terminologies provide the shared semantics for accessing resources and data through the NIF interface
• Semantic enrichment of terms enables more targeted and meaningful queries
• Ultimately, NIF terminologies are critical for data and database interoperability
Building the NIF Terminologies
• NIF Basic
  • Daniel Gardner held a series of workshops with neuroscientists to obtain sets of terms that are useful to neuroscientists
• NIFSTD (NIF Standardized)
  • Bill Bug built a set of expanded vocabularies using the structure of BIRNLex and imports of existing terminological resources
  • Provides enhanced coverage of domains in NIF Basic
  • Provides coverage of domains not included in NIF but covered by existing resources, e.g., molecules
  • Encoded in OWL/RDF
  • Provides mapping to source terminologies, including NIF Basic
  • Provides synonyms, lexical variants, abbreviations
Registering a Resource to NIF
• Level 1
  • NIF Registry: high-level descriptions from NIF vocabularies supplied by human curators
• Level 2 (not yet implemented)
  • Discovery mechanism for hidden content (Disco or SiteMaps.org)
• Level 3
  • Direct query of web-accessible database
  • Automated registration
  • Mapping of database content to NIF vocabulary by a human
Level 1 Registration
• Sites are entered by curators
• Annotation with NIF Basic vocabulary + free text
• May be searched with NIFSTD terms
Level 2 Registration
• Automated or semi-automated discovery and indexing of web sites
• Index of web sites registered to NIF Registry
• Web content is indexed against the NIFSTD vocabularies
• Discovery mechanism planned (Luis)
• XML will utilize NIFSTD
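Indexing web content against a vocabulary can be pictured with a small sketch. This is only an illustration under stated assumptions: the term identifiers, labels, page URLs, and the `build_index` function are all hypothetical, and the slides do not describe how the planned Level 2 indexer is actually built.

```python
import re

# Hypothetical vocabulary: term ID -> labels (preferred label first, then synonyms).
VOCAB = {
    "term:0001": ["hippocampus", "hippocampal formation"],
    "term:0002": ["purkinje cell", "purkinje neuron"],
}

# Hypothetical page texts keyed by URL.
PAGES = {
    "http://example.org/a": "A Purkinje neuron recording from the cerebellum.",
    "http://example.org/b": "Atlas of the hippocampal formation and the Purkinje cell layer.",
}


def build_index(pages, vocab):
    """Return {term_id: set of URLs whose text mentions any label of the term}."""
    index = {}
    for url, text in pages.items():
        lowered = text.lower()
        for term_id, labels in vocab.items():
            # Whole-word match on any label; one hit per page is enough.
            if any(re.search(r"\b" + re.escape(label) + r"\b", lowered) for label in labels):
                index.setdefault(term_id, set()).add(url)
    return index


INDEX = build_index(PAGES, VOCAB)
```

Because synonyms are indexed under the same term ID, a search for one label can retrieve pages that only use another.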
Level 3: NIF Data Federation
• Allows deep query of database content through a single interface
• Limited number of resources registered for Phase 2: proof of concept and demonstration of deep search via database mediation
• Registration process:
  • Create wrapper to allow remote NIF mediator query
  • Map content to NIFSTD
    • Semi-automatic process based on high-level mapping of fields and data values, e.g., SumsDB geography maps to NIFSTD regional part of brain
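The semi-automatic mapping step might look roughly like the sketch below: a field-level mapping picks a vocabulary branch, values are matched against that branch, and anything unmatched is left for a curator. The identifiers, the branch contents, and the `map_values` function are assumptions made for illustration; only the SumsDB geography example comes from the slide.

```python
# Field-level mapping: (source, column) -> vocabulary branch (example from the slide).
FIELD_MAP = {("SumsDB", "geography"): "regional part of brain"}

# Toy branch contents: label -> hypothetical identifier.
BRANCH_TERMS = {
    "regional part of brain": {"hippocampus": "ex:0001", "cerebellum": "ex:0002"},
}


def map_values(source, column, values):
    """Map raw column values to term IDs; return (mapped, unresolved)."""
    terms = BRANCH_TERMS[FIELD_MAP[(source, column)]]
    mapped, unresolved = {}, []
    for value in values:
        key = value.strip().lower()  # normalize case and whitespace only
        if key in terms:
            mapped[value] = terms[key]
        else:
            unresolved.append(value)  # left for a human curator
    return mapped, unresolved


MAPPED, UNRESOLVED = map_values(
    "SumsDB", "geography", ["Hippocampus", "Frontal subgyral", "cerebellum "]
)
```

Here `UNRESOLVED` holds the values that exact matching could not place, which is the part of the process that still needs human review.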
Mapping to Level 3: Concept Mapping Tool
• Java Web Start application
• Retrieves database schema + data from mediator registry
• Maps data to NIFSTD values
• Provides term mapping to mediator Term Index Source (TIS)
Why is this done by a human at the moment?
• Abbreviations, ambiguous terms, non-standard names, e.g.,
  • LPF (if this is mapped as an abbreviation in NIFSTD, then it wouldn’t be a problem)
  • Anterior cingulate: gyrus? sulcus?
  • Frontal subgyral = frontal subgyral white matter?
Your definition = my definition?
• Hippocampus (SUMS) = hippocampus (NIFSTD)?
• Can’t tell just by the string; must look at the definition
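A short sketch of why string matching alone is not enough. The label list and both helper functions are hypothetical; they only illustrate the ambiguity described above, not any NIF tool.

```python
# Hypothetical vocabulary labels.
LABELS = ["anterior cingulate gyrus", "anterior cingulate sulcus", "hippocampus"]


def candidates(query, labels):
    """Return every label that starts with the (normalized) query string."""
    q = query.strip().lower()
    return [label for label in labels if label.startswith(q)]


def needs_curator(query, labels):
    """Flag queries that do not resolve to exactly one label."""
    return len(candidates(query, labels)) != 1


AMBIGUOUS = candidates("Anterior cingulate", LABELS)
```

Even when a query resolves to a single label, as "Hippocampus" does here, the source and the vocabulary may still define the term differently, so a unique string match is only a starting point.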
[Figure: ontology modules. Labels: Subcellular Anatomy Ontology; Phenotypic Qualities (PATO); NIF Molecule; NIF Nerve Cell; OBI; Common Anatomy Reference Ontology (CARO); OBO Sequence; OBO Cell Type; BIRNLex Components; BIRNLex; Sensory; Behavior; Cognition; Organism Taxonomy; Anatomy; Disease; Investigation]
Building NIFSTD
• OBO Foundry principles and best practices
• NIFSTD is built from a set of modular ontologies
  • Anatomy: Neuronames (via BIRNLex)
  • Taxonomy: NCBI taxonomy (via BIRNLex)
  • Molecule: IUPHAR + PDSP Ki + SwissProt (neuro)
  • Cell: NIF (Senselab, Neuromorpho, CCDB)
  • Subcellular anatomy: GO + SAO
  • Disease: MeSH/UMLS + NINDS + OMIM (neuro)
  • Resource descriptors: NIF, NITRC, NCBC, OBI
  • Technique: NIF + Ontology for Biomedical Investigation (OBI)
  • Behavior: NIF, BIRN, BrainMap
  • Attributes: PATO
• Each is mapped to a unique identifier
• Single inheritance with minimal assignment of properties
• Each file is imported separately, but integrated through the Basic Formal Ontology into a single vocabulary
• Imported using manual, semi-automated and automated means
  • Degree of intervention depends on the vocabulary
  • At this point, a large degree of manual intervention is often necessary
• Link back to source ID is maintained
• Encoded in OWL/RDF
[Flowchart: batch modifications (alpha)]
• Term set submitted
• Match to BIRNLex lexical tags?
  • Yes: grab BIRNLex ID
  • No: manually map to nearest ancestor
• Mapping vetted by domain expert
• Autoconvert to BIRNLex OWL/RDF
• Lexical tags: prefLabel, synonym, abbrev, acronym, tax scientific name, tax common name, GENBANK common name, NCBI BLAST name, antiquated label, misspelling, IMSR standard name
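The matching step of this workflow can be sketched as a lookup over lexical tags, with unmatched terms queued for manual mapping. The records, identifiers, abbreviation, and function names below are invented for illustration; only the tag names come from the slide, and the real pipeline is not described in enough detail to reproduce.

```python
# A subset of the lexical tags named on the slide.
LEXICAL_TAGS = ("prefLabel", "synonym", "abbrev", "acronym", "misspelling")

# Toy records; identifiers and values are invented for illustration.
RECORDS = [
    {"id": "ex_001", "prefLabel": ["Purkinje cell"],
     "synonym": ["Purkinje neuron"], "misspelling": ["Purkinge cell"]},
    {"id": "ex_002", "prefLabel": ["hippocampus"], "abbrev": ["Hipp"]},
]


def build_lookup(records):
    """Map every lower-cased lexical value to (record ID, tag it came from)."""
    lookup = {}
    for record in records:
        for tag in LEXICAL_TAGS:
            for value in record.get(tag, []):
                lookup[value.lower()] = (record["id"], tag)
    return lookup


def triage(terms, lookup):
    """Split submitted terms into automatic matches and a manual-mapping queue."""
    matched, manual_queue = {}, []
    for term in terms:
        hit = lookup.get(term.strip().lower())
        if hit:
            matched[term] = hit
        else:
            manual_queue.append(term)  # to be mapped by hand and vetted by an expert
    return matched, manual_queue


MATCHED, QUEUE = triage(
    ["purkinje neuron", "Hipp", "globular bushy cell"], build_lookup(RECORDS)
)
```

Recording which tag produced the match makes it easier for a reviewer to see whether a term was found by its preferred label or only through a synonym, abbreviation, or known misspelling.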
Batch modification example
• IUPHAR V-gated Ion Channels (NIF)
• Row = class
• Parent prop: required to place in BIRNLex hierarchy
• Col = related property (annotations & objects)
Citations & Mappings
• Maintain link back to external knowledge source
  • For terms/concepts and for definitions
• Mappings provide parsable representation of cross-terminology synonymies
Citations & Mappings
• External IDs
  • Generic
    • externalSourceId
  • Specific (for common sources)
    • Neuroanatomy: neuronamesID/bamsID
    • Organism taxonomy: ncbiTaxID/itisID/gbifID/jaxMiceID/tacMiceID
    • Cells/Tissue: atccID
    • Disease: UmlsCui/MeSH
• URL templates
  • Use IDs to link to external source URL references (when available)
  • Automatically add ref links into tools using BIRNLex (TIS, BONFIRE, etc.)
• Definition citations as well, including URIs & publication references
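A URL template can be as simple as a format string keyed by the ID property. In the sketch below the property names are taken from the slide, but the template URLs are placeholders on example.org and the `external_link` helper is hypothetical; the actual BIRNLex templates are not given in the source.

```python
from urllib.parse import quote

# Placeholder templates keyed by ID property; real source URLs are not given here.
URL_TEMPLATES = {
    "ncbiTaxID": "https://example.org/taxonomy/{id}",
    "neuronamesID": "https://example.org/neuronames/{id}",
    "UmlsCui": "https://example.org/umls/{id}",
}


def external_link(id_property, external_id):
    """Build a reference link for an external ID, or None if no template exists."""
    template = URL_TEMPLATES.get(id_property)
    if template is None:
        return None  # "when available": some sources have no resolvable URL
    return template.format(id=quote(str(external_id), safe=""))


LINK = external_link("ncbiTaxID", "10090")
```

Keeping the templates in one table means a tool only needs the stored external ID to generate a link, and a changed source URL is fixed in a single place.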
Use Case: Cell Types
• Existing cell type ontology, but poor coverage of neuronal cells and generally agreed by the community to be “problematical”
• Senselab, CCDB, NeuroMorpho.org, NIF collated cell type terminologies
• Produced master list in an Excel spreadsheet with defined properties
  • Neurotransmitter, anatomical location, morphology, molecular constituent, circuit type
• Using Jena code written by BB, imported contents directly into Protégé OWL, matching strings against existing content, e.g., anatomy, molecules
[Figure: is-a hierarchies of neuron types. Nodes shown: Neuron, Glutamatergic Neuron, GABAergic Neuron, Spiny Cell, Granule Cell, Photoreceptor Cell, Cerebellar Granule Cell, Dentate Gyrus Granule Cell, Olfactory Granule Cell, Pyramidal Cell, Purkinje Cell, Globular Bushy Cell, Chandelier Cell, Cortical Spiny Stellate Cell, Cerebellar Basket Cell, Double Bouquet Cell, Medium Spiny Cell. The individual is-a edges are not recoverable from the extracted text.]
Maintaining NIFSTD
• Maintenance of NIFSTD at this point will probably require a human curator, although several of the functions can be automated
• Community can contribute to NIF Basic; a human curator will be needed to migrate much of the content to NIFSTD
Availability of NIFSTD
• NIFSTD OWL file available from http://purl.org/nif/ontology/nif.owl
• NIFSTD available through Bonfire (1 and 2) for programmatic access
Bonfire
• NIF vocabularies are served by the vocabulary server built by BIRN: Bonfire
  • Oracle database
  • Cross mappings between different vocabularies
  • Basic graph queries (neighborhood, shortest path)
  • Web services were developed for NIF
  • Based on the structure of UMLS
  • User interface for graph visualization and queries (not planned for NIF delivery)
• Bonfire 2
  • Optimized for NIF vocabularies
  • Postgres RDBMS + ontology access functions that we have built
    • e.g., given a term, produce its ancestry graph by following edges labeled subclass-of OR part-of
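The ancestry query mentioned for Bonfire 2 can be sketched as a breadth-first traversal that follows only edges with the chosen labels. This is an in-memory illustration with toy edges; the edge data and the `ancestry` function are assumptions, and the actual Postgres-backed access functions are not shown in the slides.

```python
from collections import deque

# Toy labeled edges (child, label, parent); contents are illustrative only.
EDGES = [
    ("Purkinje Cell", "subclass-of", "GABAergic Neuron"),
    ("GABAergic Neuron", "subclass-of", "Neuron"),
    ("Purkinje Cell", "part-of", "Cerebellum"),
    ("Cerebellum", "part-of", "Brain"),
    ("Purkinje Cell", "has-neurotransmitter", "GABA"),
]


def ancestry(term, edges, labels=("subclass-of", "part-of")):
    """Return the edges reachable upward from term, following only the given labels."""
    outgoing = {}
    for child, label, parent in edges:
        if label in labels:
            outgoing.setdefault(child, []).append((label, parent))

    result, seen, queue = [], {term}, deque([term])
    while queue:
        node = queue.popleft()
        for label, parent in outgoing.get(node, []):
            result.append((node, label, parent))
            if parent not in seen:  # avoid revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    return result


ANCESTORS = {parent for _, _, parent in ancestry("Purkinje Cell", EDGES)}
```

Passing `labels=("subclass-of",)` instead would restrict the result to the is-a ancestry alone.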
NIF Application Architecture for OntoQuest (Bonfire 2)
[Architecture diagram. Components shown: Web Client; App. Configuration; Application Logic; Term Mapper and Indexer; Fed. DB Registry; XML; NIF Registry; OntoQuest; Lucene Index; Ontology Database; Text Engine; External Database-1 (shown three times); External web sites; Docs; Ontologies; Neuroscience Web sites]
What’s next
• NIFSTD: comprehensive “is a” hierarchy, but relations are sparse, e.g., “part of”, “binds ligand”, “sequence of”, etc.
• Continue to build pipeline from loosely structured to formal ontology
• Continue to add domains
• Add relationships and definitions
• Generate additional hierarchies
• Incorporate more of the semantics into the NIF search
Evolution of Terminologies
• NIFSTD
  • Imports existing terminologies developed by other communities
  • Modular design
  • Normalizes structure according to Basic Formal Ontology (BFO)
  • Creates single inheritance “is a” tree
  • Provides mapping between NIF and NIFSTD
  • Provides synonyms, abbreviations and lexical variants
  • OWL/RDF
• NIF Basic vocabulary
  • Contributed by panels of experts
  • Coarse granularity but broad coverage
  • Loose hierarchy
  • XML
• NIF Plus
  • Relates classes through “part of” and other OBO relations
  • Consistent human and machine-readable definitions
NIF Phase I and II
Current Status and Future Work
• Prototype interfaces built upon Bonfire 1 and 2
  • NIFSTD 1.0 in Bonfire 1
  • NIFSTD 1.1 in Bonfire 2
  • Will update Bonfire 1 content after this demonstration
• Implementation and testing of vocabulary services using Bonfire 2
• Better use of lexical variants, synonyms, etc.
• Mapping of NIF Registry and NIF data federation with NIFSTD
• All resources registered will mark up more content
• Coverage of behavior (sensory, motor) and behavioral assessments will be added
• More lexical variants will be used in searches
• Improved access to annotation properties through Concept Mapper