LinkSuite is a framework for integrating data, information, and ontology in the life sciences, addressing complexities across structured and unstructured sources. It combines foundational ontology, information engineering, and natural language understanding to support the integration of heterogeneous databases and the extraction of structured data from free text.
LinkSuite™: formally robust ontology-based data and information integration. Werner Ceusters (a), Barry Smith (b), James Matthew Fielding (b). (a) Language & Computing nv (L&C); (b) Institute for Formal Ontology and Medical Information Science
The problem • A (simple?) question ... • What genes are involved in juvenile diabetes ? • ... may lead to many more questions: • Where is the answer to be found ? • knowledge sources: text books, scientific papers, ... • information sources: physician reports, medical records, ... • data sources: clinical laboratory databases, ... • Is there a known correct answer ? • How should the question be phrased for machine processing ? • ...
Partial solutions are available Same question – different answers
How to solve this? L&C’s LinkSuite: our approach to “ontology” • By developing a framework for data, information and ontology integration • across all levels of generalisation • including information in both structured and unstructured forms • which requires three tasks to be dealt with properly: • identifying the basic ontological foundations of a framework expressive enough to describe life science data at all levels; • carrying out the research in information engineering needed to create technology able to exploit this ontological framework in a way that can support the integration of massively heterogeneous structured and semi-structured life science databases; • developing the tools for natural language understanding in the domain of the life sciences needed to extract structured data from free text documents.
“Ontology” N. Guarino, P. Giaretta, "Ontologies and Knowledge Bases: Towards a Terminological Clarification". In Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, N. Mars (ed.), pp 25-32. IOS Press, Amsterdam, 1995.
From buzz-word to the “O-word” • “An ontology is a classification methodology for formalizing a subject's knowledge or belief system in a structured way. Dictionaries and encyclopedias are examples of ontologies.” (X1) • “A terminology (or classification) is a kind of ontology by definition and it should preserve (and "understand") the relationships between the 1,000s of terms in it or else it would become a mere dictionary (or at best a thesaurus).” (X2) • “Ontologies are Web pages that contain a mystical unifying force that gives differing labels common meaning.” (X3)
If, later, you remember just one thing from this presentation, make sure it is this: if you use the word “ontology”, ALWAYS be specific about what you understand by it.
My understanding of an ontology • a representation, understandable by a computer, of some pre-existing domain of REALITY, reflecting the properties of the objects within its domain in such a way that there obtain substantial and systematic correlations between reality and the ontology itself (modified from Barry Smith) • to be used by software (agents) in a machine, and NOT by humans • does not rely on what people know or think, hence no “concepts” • instance driven, although it accepts universals that are not instantiated • does not “create” or “constrain” reality • The T-Box has no meaning without the A-Box
Ontological theories • = theories between reality and “the ontology” (“ontology” as a representation) • Granular Partition Theory (T. Bittner & B. Smith) • Logic of Classes (B. Smith)
Theory of granular partitions (B. Smith) Think of it as Alberti’s grid
Granular partitions: main principles • a partition is the drawing of a (typically complex) fiat boundary over a certain domain • a partition typically comes with labels and/or an address system • partitions are artefacts of our cognition • a partition is transparent (veridical) • bona fide objects exist independently of our partitions, fiat objects are determined by partitions • different partitions may represent cuts through the same reality which are skew to each other • entities (existing in reality) located in the same cell of a partition share common characteristics
Logic of classes • primitive: • entities: particulars versus universals • relation inst such that: • all classes are universals; all instances are particulars • some universals are not classes, hence have no instances: pet, adult, physician • some particulars are not instances; e.g. some mereological sums • subsumption defined resorting to instances:
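The slide's definition of subsumption is cut off after the colon and is not reconstructed here. As a rough illustration of what "defined resorting to instances" can mean, the sketch below assumes one common reading: a class A is subsumed by a class B when every instance of A is also an instance of B. The class names, the particulars and the `INST` table are invented for the example and are not taken from LinKBase.

```python
from typing import Dict, Set

# Hypothetical toy extension of the inst relation:
# class name -> set of particulars that instantiate it.
INST: Dict[str, Set[str]] = {
    "Organism": {"smith", "jones", "rex"},
    "Human": {"smith", "jones"},
    "Carcinoma": {"tumour_17"},
}


def inst(particular: str, cls: str) -> bool:
    """True if the particular is recorded as an instance of the class."""
    return particular in INST.get(cls, set())


def is_subsumed_by(child: str, parent: str) -> bool:
    """Assumed reading: every instance of child is an instance of parent."""
    return all(inst(x, parent) for x in INST.get(child, set()))
```

Checked against a finite table like this, the test only approximates a claim about all instances, and a class with no recorded instances is subsumed by everything (vacuous truth). A real system would assert subsumption rather than compute it from data.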
Basic Formal Ontology • Basic Formal Ontology consists in a series of sub-ontologies (most properly conceived as a series of perspectives on reality), the most important of which are: • SnapBFO, a series of snapshot ontologies (O_ti), indexed by times • SpanBFO, a single videoscopic ontology (O_v) • Each O_ti is an inventory of all entities existing at a time. O_v is an inventory (processory) of all processes unfolding through time.
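One way to picture the two perspectives is as two separate data structures: a family of inventories keyed by time, and a single inventory of processes that extend over time. The sketch below is illustrative only. The integer time indices, the `Process` record and all entity and process names are assumptions made for the example, not part of BFO itself.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Process:
    """A process with an (assumed) integer start and end time."""
    name: str
    start: int
    end: int


# Snap-style view: one inventory of existing entities per time index.
snap: Dict[int, FrozenSet[str]] = {
    0: frozenset({"patient", "tumour"}),
    1: frozenset({"patient", "tumour"}),
    2: frozenset({"patient"}),
}

# Span-style view: a single inventory of processes unfolding through time.
span: List[Process] = [
    Process("tumour growth", 0, 1),
    Process("tumour resection", 1, 2),
]


def entities_at(t: int) -> FrozenSet[str]:
    """Entities listed in the snapshot taken at time t."""
    return snap[t]


def processes_during(t: int) -> List[str]:
    """Names of processes whose interval covers time t."""
    return [p.name for p in span if p.start <= t <= p.end]
```

Entities appear in snapshots and never in the process list, while processes are never listed in a snapshot. That separation is the point of keeping two inventories.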
UMLS Semantic Types [figure: hierarchy under the two top-level nodes Entity and Event; node labels only] • Governmental or Regulatory Activity • Educational Activity • Daily or Recreational Activity • Finding • Social Behaviour • Machine Activity • Occupational Activity • Research Activity • Intellectual Product • Behaviour • Individual Behaviour • Health Care Activity • Organism Attribute • Group • Therapeutic Procedure • Diagnostic Procedure • Laboratory Procedure • Language • Activity • Conceptual Entity • Occupation or Discipline • Organism Function • Mental Process • Genetic Function • Molecular Function • Group Attribute • Organ or Tissue Function • Cell Function • Physiologic Function • Idea or Concept • Disease or Syndrome • Mental or Behavioural Dysfunction • Organisation • Phenomenon or Process • Biologic Function • Organism • Manufactured Object • Pathologic Function • Physical Object • Human-caused Phenomenon or Process • Natural Phenomenon or Process • Substance • Neoplastic Process • Anatomical Structure • Injury or Poisoning • Experimental Model of Disease • Cell or Molecular Dysfunction • Environmental Effect of Humans
Technology overview [figure; labels only] • structured text • Instance data • Class data • LinKBase • Virtual Ontology • Information Extraction System • MaDBoKS • TeSSI indexer • LinKFactory Server • LinKFactory Client
[Figure; labels only] • Language A: Lexicon, Grammar • Language B: Lexicon, Grammar • Cassandra Linguistic Ontology • LinKBase Formal Domain Ontology • Proprietary Terminologies • ICPC • SNOMED • ICD • MEDDRA • Others ...
Based on formal ontology [figure: hierarchy of spatial link types; labels only] • HAS-SPATIAL-POINT-REFERENCE • HAS-CONNECTING-REGION • HAS-OVERLAPPING-REGION • IS-SPATIAL-PART-OF • HAS-DISCRETED-REGION • HAS-SPATIAL-PART • HAS-DISCONNECTED-REGION • IS-PROPER-SPAT.-PART-OF • IS-INSIDE-CONVEX-HULL-OF • HAS-PROPER-SPATIAL-PART • IS-PARTLY-IN-CONVEX-HULL-OF • IS-OUTSIDE-CONVEX-HULL-OF • HAS-EXTERNAL-CONNECTING-REGION • IS-NON-TANG.-SPAT.-PART-OF • IS-TANG.-SPAT.-PART-OF • IS-TOPO-INSIDE-OF • IS-GEO-INSIDE-OF • IS-SPAT.-EQUIV.-OF • HAS-NON-TANG.-SPAT.-PART • HAS-TANG.-SPAT.-PART • HAS-PARTIAL-SPATIAL-OVERLAP
Linking external ontologies [figure; labels only] • Snomed-RT : “Convulsion” and Snomed-RT : “Seizure”, linked by ISA • MESH-2001 : “Seizures” and MESH-2001 : “Convulsions”, linked by IS-narrower-than • four Has-CCC links between the external terms and the L&C entities • L&C : Health crisis, L&C : Seizure, L&C : Convulsion and L&C : Epileptic convulsion, linked by four IS-A links
Managing different views [figure; labels only] • Internal ontology • External ontology • Criteria • Mappings • Definitions • Terms
Ontological theory inside LinKBase • if a real-world entity is an instance of a domain-entity, all that is said about the domain-entity applies to the instance; • if you know that a real-world entity satisfies the Full Definition of a domain-entity-type, then you may infer that that object is an instance of that type; • the statement “A-Link-B” says something about all instances of A, but nothing about instances of B unless the Link is declared to have an inverse.
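The three rules on this slide can be sketched in a few lines. This is not LinKBase's implementation, and the internal representation is an assumption: "A-Link-B" statements are stored as triples, inverses are declared per link, and a Full Definition is modelled as a set of criteria that must all hold. The example statements and the `DECLARED_INVERSE` entry are toy data.

```python
from typing import Dict, List, Set, Tuple

Criterion = Tuple[str, str]  # (link name, target domain-entity)

# Toy "A-Link-B" statements about domain-entities (assumed, for illustration).
LINKS: List[Tuple[str, str, str]] = [
    ("Cancer patient", "HAS-HEALTHCARE-PHENOMENON", "Malignant neoplasm"),
    ("Having a healthcare phenomenon", "HAS-POSSESSOR", "Patient"),
]

# Only links listed here are declared to have an inverse.
DECLARED_INVERSE: Dict[str, str] = {"HAS-POSSESSOR": "IS-POSSESSOR-OF"}

# Full Definitions: criteria that are jointly sufficient for a type.
FULL_DEFINITIONS: Dict[str, Set[Criterion]] = {
    "Cancer patient": {
        ("IS-A", "Patient"),
        ("HAS-HEALTHCARE-PHENOMENON", "Malignant neoplasm"),
    },
}


def statements_about(entity_type: str) -> Set[Criterion]:
    """Rule 3: A-Link-B speaks about A; about B only via a declared inverse."""
    out: Set[Criterion] = set()
    for a, link, b in LINKS:
        if a == entity_type:
            out.add((link, b))
        inverse = DECLARED_INVERSE.get(link)
        if b == entity_type and inverse is not None:
            out.add((inverse, a))
    return out


def facts_for_instance(types: Set[str]) -> Set[Criterion]:
    """Rule 1: what is said about a domain-entity applies to its instances."""
    facts: Set[Criterion] = set()
    for t in types:
        facts |= statements_about(t)
    return facts


def classify(known: Set[Criterion]) -> Set[str]:
    """Rule 2: satisfying a Full Definition licenses the instance-of inference."""
    return {t for t, crit in FULL_DEFINITIONS.items() if crit <= known}
```

For example, `statements_about("Malignant neoplasm")` is empty because HAS-HEALTHCARE-PHENOMENON has no declared inverse in this toy setup, whereas `statements_about("Patient")` picks up the inverse of HAS-POSSESSOR.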
Ontology based parsing • 1. Parsing • 2. Relating • 3. Inferring • Example sentence: Mr. Smith has a pulmonary carcinoma • [figure; labels only] ONTOLOGY: Generalised Possession (Has-possessor, Has-possessed) • Having a healthcare phenomenon • Healthcare phenomenon • Human • Patient • Cancer patient • Malignant neoplasm • lung carcinoma • links: IS-A, Is-possessor-of, Has-Healthcare-phenomenon, annotated with the step numbers 1 to 3
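The three steps named on this slide (parsing, relating, inferring) can be mimicked with a toy pipeline for the example sentence. Everything below is a simplified sketch under stated assumptions. The lexicon, the IS-A edges (read off the flattened figure, so their exact shape is a guess) and the two hand-written rules are hypothetical and do not describe how L&C's parser works.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: surface phrase -> ontology entity.
LEXICON: Dict[str, str] = {
    "mr. smith": "Human",
    "has": "Generalised Possession",
    "pulmonary carcinoma": "lung carcinoma",
}

# Assumed IS-A edges (child -> parent), guessed from the figure labels.
IS_A: Dict[str, str] = {
    "lung carcinoma": "Malignant neoplasm",
    "Malignant neoplasm": "Healthcare phenomenon",
    "Cancer patient": "Patient",
    "Patient": "Human",
}


def is_a(child: Optional[str], ancestor: str) -> bool:
    """Walk the IS-A chain upwards from child looking for ancestor."""
    while child is not None:
        if child == ancestor:
            return True
        child = IS_A.get(child)
    return False


def parse(sentence: str) -> List[str]:
    """Step 1: greedy longest-match lookup of phrases in the lexicon."""
    words = sentence.lower().split()
    found: List[str] = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in LEXICON:
                found.append(LEXICON[phrase])
                i = j
                break
        else:
            i += 1  # word not covered by the lexicon, skip it
    return found


def relate(entities: List[str]) -> Optional[Tuple[str, str, str]]:
    """Step 2: specialise a generic possession into a domain link."""
    if len(entities) != 3:
        return None
    possessor, relation, possessed = entities
    if (relation == "Generalised Possession"
            and is_a(possessor, "Human")
            and is_a(possessed, "Healthcare phenomenon")):
        return (possessor, "Has-Healthcare-phenomenon", possessed)
    return None


def infer(triple: Tuple[str, str, str]) -> Optional[str]:
    """Step 3: classify the possessor from what it possesses."""
    _, link, obj = triple
    if link == "Has-Healthcare-phenomenon" and is_a(obj, "Malignant neoplasm"):
        return "Cancer patient"
    return None
```

Running `parse`, `relate` and `infer` in sequence on "Mr. Smith has a pulmonary carcinoma" ends with the classification "Cancer patient", mirroring the path the figure traces through the ontology.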
Conclusions • There is a huge need for life science data integration technology able to deal with both structured and unstructured data formats. • To keep the data manageable, the technology should be able to understand the data. • The proper sort of ontology is a means to accomplish this. • Based on several POCs, L&C’s LinKSuite can be claimed to be a successful attempt to exploit these insights. • But humble as we are, we understand that it is still far from where it should be.