Mental Functioning and Semantic Search in the Neuroscience Information Framework. Maryann Martone, Fahim Imam. Funded in part by the NIH Neuroscience Blueprint HHSN271200800035C via NIDA. Neuroscience Information Framework – http://neuinfo.org
The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience • A portal for finding and using neuroscience resources • A consistent framework for describing resources • Provides simultaneous search of multiple types of information, organized by category • Supported by an expansive ontology for neuroscience • Utilizes advanced technologies to search the “hidden web” • Components: Literature, Database Federation, Registry • UCSD, Yale, Cal Tech, George Mason, Washington Univ • http://neuinfo.org • Supported by NIH Blueprint
NIF takes a global view of resources • NIF’s goal: Discover and use resources (data, databases, tools, materials, services) • Federated approach: Resources are developed and maintained by the community • >150 data sources; 350M records • Agile approach: the NIF system is designed to be populated quickly and allow for incremental improvements to representation and search • Contract specifies 25 sources/year • NIF’s rules for using digital resources • #1: YOU HAVE TO FIND THEM! • #2: You have to access/open them • #3: You have to understand them • Neuroscience is inherently interdisciplinary; no one technique reveals all
What do you mean by data? Databases come in many shapes and sizes • Registries: metadata; pointers to data sets or materials stored elsewhere • Data aggregators: aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede • Single source: data acquired within a single context, e.g., Allen Brain Atlas • Primary data: data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL) • Secondary data: data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas), brain connectivity statements (BAMS) • Tertiary data: claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function of task
NIFSTD Ontologies (Bill Bug et al.) • Set of modular ontologies • 86,000+ distinct concepts + synonyms • Expressed in OWL-DL language • Supported by common DL reasoners • Currently supports OWL 2 • Closely follows OBO community best practices • Avoids duplication of efforts • Standardized to the same upper level ontologies • e.g., Basic Formal Ontology (BFO), OBO Relations Ontology (OBO-RO) • Relies on existing community ontologies • e.g., CHEBI, GO, PRO, DOID, OBI, etc. • Modules cover orthogonal domains • e.g., Brain Regions, Cells, Molecules, Subcellular parts, Diseases, Nervous system functions, etc.
Importing into NIFSTD • NIF converts to OWL and aligns to BFO, if not already • Facilitates ingestion, but can have negative consequences for search if the model adds computational complexity • Data sources do not make careful distinctions but use what is customary for the domain • Modularity: NIF seeks to have single coverage of a sub-domain • We are not UMLS or BioPortal • NIF uses MIREOT to import individual classes or branches of classes from large ontologies • NIF retains the identifier of the source • NIF uses IDs for names, not text strings • Avoids collision • Allows retiring of class without retiring the string • NIFSTD has evolved as the ontologies have evolved; we had to make many compromises based on the ontologies and tools available
NIFSTD Modules and Sources
What are the connections of the hippocampus? • Query expansion: synonyms and related concepts • Boolean queries: Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn” • Data sources categorized by “data type” and level of nervous system • Tutorials for using the full resource when getting there from NIF • Common views across multiple sources • Link back to record in original source
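The synonym-based Boolean expansion on this slide can be sketched roughly as follows. This is a minimal illustration, not NIF's implementation: the `SYNONYMS` table and the `quote` and `expand_query` helpers are hypothetical names, and in NIF the synonyms would come from NIFSTD rather than a hard-coded dictionary.

```python
# Hypothetical synonym table; in NIF the synonyms would come from NIFSTD.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term):
    """Wrap multi-word terms in double quotes so they are matched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term, synonyms=SYNONYMS):
    """Return a Boolean OR query over the term and its known synonyms."""
    candidates = [term] + synonyms.get(term.lower(), [])
    unique = []
    seen = set()
    for candidate in candidates:
        key = candidate.lower()
        if key not in seen:  # drop case-insensitive repeats, keep order
            seen.add(key)
            unique.append(candidate)
    return " OR ".join(quote(candidate) for candidate in unique)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is returned unchanged, so the expansion never narrows what the user typed.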
Entity mapping • BIRNLex_435, Brodmann.3 • Explicit mapping of database content helps disambiguate non-unique and custom terminology
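One way to picture explicit entity mapping is a lookup from a source-specific term to a shared identifier. The sketch below assumes the slide's two labels are a pair (the database term `Brodmann.3` mapped to the identifier `BIRNLex_435`); the source names `source_a` and `source_b`, the second term `BA3`, and the helper functions are hypothetical.

```python
# (source, local term) -> shared ontology identifier.
# Only the Brodmann.3 / BIRNLex_435 pairing is taken from the slide;
# the source names and the "BA3" entry are hypothetical.
ENTITY_MAP = {
    ("source_a", "Brodmann.3"): "BIRNLex_435",
    ("source_b", "BA3"): "BIRNLex_435",
}


def resolve(source, local_term, entity_map=ENTITY_MAP):
    """Return the shared identifier for a source-specific term, or None."""
    return entity_map.get((source, local_term))


def same_entity(a, b, entity_map=ENTITY_MAP):
    """True when two (source, term) pairs map to the same identifier."""
    id_a = resolve(*a, entity_map=entity_map)
    id_b = resolve(*b, entity_map=entity_map)
    return id_a is not None and id_a == id_b


print(resolve("source_a", "Brodmann.3"))
print(same_entity(("source_a", "Brodmann.3"), ("source_b", "BA3")))
```

Keying on the source as well as the term is what lets two databases use the same custom string for different things without colliding.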
NIF Concept-Based Search • Search Google: GABAergic neuron • Search NIF: GABAergic neuron • NIF automatically searches for types of GABAergic neurons • Defined by OWL axioms
Ontological Query expansion through OntoQuest • OntoQuest – NIF’s ontology management system for NIFSTD ontologies • Implements various graph search algorithms for ontological graphs • Automated query expansion for NIFSTD terms, including the ones with defined logical restrictions. Gupta et al., 2010
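As a rough illustration of graph-based query expansion, the sketch below walks a subclass graph breadth-first and collects every descendant of the search term. It is not OntoQuest's algorithm: it swaps in a small hand-written edge table (`SUBCLASSES`, hypothetical) where NIF would use classes inferred from OWL axioms, and `expand_to_subclasses` is an invented name.

```python
from collections import deque

# Hypothetical parent -> direct-subclass edges, for illustration only.
# In NIF such subclasses are inferred from OWL axioms, not listed by hand.
SUBCLASSES = {
    "GABAergic neuron": ["Basket cell", "Purkinje cell"],
    "Basket cell": ["Cerebellar basket cell"],
}


def expand_to_subclasses(term, subclasses=SUBCLASSES):
    """Return the term followed by all of its transitive subclasses."""
    ordered = [term]
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in subclasses.get(current, []):
            if child not in seen:  # guards against cycles and repeats
                seen.add(child)
                ordered.append(child)
                queue.append(child)
    return ordered


print(expand_to_subclasses("GABAergic neuron"))
```

The expanded list can then be joined into an OR query, so a search for the general concept also retrieves records annotated only with a specific cell type.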
NIF information space • NIF developed a tiered system • Concepts • Domain knowledge • What you would teach someone coming into your domain • NIFSTD/Ontoquest • All upper level BFO categories are suppressed • Claims based on data • Bridge files across domains (constructed by NIF), databases, triple stores • Text • Data • Relational databases • Spreadsheets • Tiers: Knowledge Base; Data; Concepts, Entities + data summaries • Scientists search via the terms they use, not what we would like them to use. NIF needs a broad net to find relevant resources
What genes are upregulated by drugs of abuse in the adult mouse? • Search terms: Gene, upregulated, mice, illegal drug • When searching across broad information sources, we need to search for what people are looking for
NIF “translates” common concepts through ontology and annotation standards • What genes are upregulated by drugs of abuse in the adult mouse? • Morphine • Increased expression • Adult Mouse • Arbitrary but defensible
NIFSTD and NeuroLex Wiki • Semantic wiki platform • Provides simple forms for structured knowledge • People can add concepts, properties, and annotations • Generate hierarchies without having to learn complicated ontology tools • Community can contribute • Relax rules for NIFSTD so dedicated domain scientists can contribute their knowledge and review other contributions • Teaches structuring of knowledge via red links/blue links • Process is tracked and exposed • Implemented versioning • Larson et al. • Readily indexed by Google; queries to NIF data via NIF navigator
NeuroLex Content Structure (Stephen D. Larson et al.) • NeuroLex is becoming a significant knowledge base
Top-Down vs. Bottom-Up • Top-down ontology construction (NIFSTD) • A select few authors have write privileges • Maximizes consistency of terms with each other • Making changes requires approval and re-publishing • Works best when the domain to be organized has: small corpus, formal categories, stable entities, restricted entities, clear edges • Works best with participants who are: expert catalogers, coordinated users, expert users, people with an authoritative source of judgment • Bottom-up ontology construction (NeuroLex) • Multiple participants can edit the ontology instantly • Semantics are limited to what is convenient for the domain • Not a replacement for top-down construction; sometimes necessary to increase flexibility • Necessary when the domain has: large corpus, no formal categories, no clear edges • Necessary when participants are: uncoordinated users, amateur users, naïve catalogers • Neuroscience is a domain that is less formal and neuroscientists are more uncoordinated • Larson et al.
Engaging domain scientists • [Figure: diagram with the labels Planned process, Disposition, Continuant, ? ? ?, Cognitive process, Mental Process, Mental state, Recall, Memory, Retrieval, Episodic, Non-declarative, Encoding]
Mental functioning is difficult to define and dissect • Very few behaviors are “pure” • Operationally defined through experiments • What is a mental function? • Activity, state, function, process • Subtypes are rarely disjoint • Episodic memory • Semantic memory • Procedural memory • Declarative memory • Distinctions among paradigms, assessments, tests, rating scales, tasks are often subtle • Early work done in BIRN; later terms added by students and curators
NeuroLex does not adhere strictly to BFO • Concepts and things happily co-exist; content gets reconciled over time
Nevertheless... • We do not allow duplicates • We do not allow multiple inheritance • Use “role” to shortcut many relations • We do try to re-factor contributions so as to avoid collisions across our domains • But...once they are in the wiki, they will move about and be added to as necessary Neuinfo.org/neurolex/wiki/COGPO_00123
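The two structural rules above (no duplicates, no multiple inheritance) lend themselves to an automated check. The following is a minimal sketch under stated assumptions: contributions are modeled as `(class_id, label, parent_id)` tuples, which is a hypothetical representation, and the identifiers in the usage example are invented.

```python
from collections import defaultdict


def check_contributions(entries):
    """Report duplicate labels and classes that have more than one parent.

    entries: iterable of (class_id, label, parent_id) tuples; parent_id may
    be None for a root class. This tuple layout is an assumption.
    """
    ids_by_label = defaultdict(set)
    parents_by_id = defaultdict(set)
    for class_id, label, parent_id in entries:
        ids_by_label[label.strip().lower()].add(class_id)
        if parent_id is not None:
            parents_by_id[class_id].add(parent_id)

    problems = []
    for label, ids in sorted(ids_by_label.items()):
        if len(ids) > 1:
            problems.append(f"duplicate label '{label}': {sorted(ids)}")
    for class_id, parents in sorted(parents_by_id.items()):
        if len(parents) > 1:
            problems.append(f"multiple parents for {class_id}: {sorted(parents)}")
    return problems


# Invented identifiers, for illustration only.
entries = [
    ("X_1", "Memory", None),
    ("X_2", "Recall", "X_1"),
    ("X_3", "recall", "X_1"),   # same label as X_2, different case
    ("X_2", "Recall", "X_4"),   # second parent for X_2
]
for problem in check_contributions(entries):
    print(problem)
```

Labels are compared case-insensitively here; whether near-duplicates such as plurals should also be flagged is a curation decision the sketch leaves open.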
Cognitive-related searches through NIF • fear prefrontal arousal • Attention and distraction • Passive viewing • stroop effect • sequence learning • studies done on the cognitive-behavioral model of addiction • memory recall • self-administration • Visual oddball paradigm • Sexual Orientation • Face recognition • neurophysiology of language • Olfaction • Consciousness • Gustatory Scientists tend to focus on tests and general concepts rather than deep considerations of cognitive processes
Mental Functioning: What NIF needs • Computable taxonomies of test (assessments, paradigms, tasks) types • Test types should be related to the function they purport to measure but will only be an approximation • Not just human!!! • Computable operational definitions of cognitive concepts • Translates tests into concepts used in search • Dementia rating scale scores = Dementia • Smoking assessment scores = smoker
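A computable operational definition of this kind can be as simple as a table that translates test types into the concepts used in search. The sketch below uses the two examples from the slide; the record layout (`id`, `assessments`) and the function names are hypothetical, and no score thresholds are modeled.

```python
# Test type -> search concept, taken from the two examples on the slide.
TEST_TO_CONCEPT = {
    "Dementia rating scale": "Dementia",
    "Smoking assessment": "Smoker",
}


def concepts_for_record(record, test_to_concept=TEST_TO_CONCEPT):
    """Return the concepts implied by the assessments attached to a record."""
    return {
        test_to_concept[test]
        for test in record["assessments"]
        if test in test_to_concept
    }


def search(records, concept):
    """Return ids of records whose assessments imply the given concept."""
    return [r["id"] for r in records if concept in concepts_for_record(r)]


# Hypothetical records, for illustration only.
records = [
    {"id": "r1", "assessments": ["Dementia rating scale"]},
    {"id": "r2", "assessments": ["Smoking assessment", "Stroop task"]},
    {"id": "r3", "assessments": []},
]
print(search(records, "Dementia"))
print(search(records, "Smoker"))
```

As the slide notes, the test only approximates the function it purports to measure, so a mapping like this is a retrieval aid rather than a claim about the subject.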
Concluding Remarks • NIFSTD is utilized to provide a semantic index to heterogeneous data sources • BFO allows us to promote broad semantic interoperability between biomedical ontologies • The modularity principle allows us to limit the complexity of the base ontologies • NIF defines a process for attaching complex semantics to neuroscience concepts through NIFSTD and the NeuroLex collaborative environment • NIF encourages the use of community ontologies • Moving towards building a rich knowledge base for neuroscience that integrates with larger life science communities
Points of Discussion • CogPO/CogAT/NEMO/MHO harmonization? • What kind of interplay are we looking at? • Is it about re-use of ontological vocabularies? • What should be the best practice for reuse? • Re-using a URI vs. creating a new class and mapping • Non-semantic reuse of classes as entities (e.g., MIREOT) • Is it about building new relationships between the entities covered in all four of these ontologies? • What do we achieve through doing this? • Are we trying to connect all the curated/annotated experimental data sets to a common semantic layer? • All of the above? • What should be NIF's role? • How can we help to expose your experiments and results to a broader audience through our interface? • What kind of involvement can people have in terms of re-using your ontological content or contributing to your content? • We want to be the 'host' of all the NS concepts and entities, but not necessarily the 'maintainer'.
What ontology isn’t (or shouldn’t be) • A rigid top-down fixed hierarchy for limiting expression in the neurosciences • Not about restricting expression but how to express meaning clearly and in a machine-readable form • A bottomless resource-eating pit that consumes dollars and returns nothing • A cure-all for all our problems • A completely solved area • Applied vs. theoretical • Easy to understand • Mike Bergman