BioHealth Informatics Group
Intermediate Representations: Taming the complexity monster(s)
Dr Jeremy Rogers, Manchester University
Jeremy.e.rogers@manchester.ac.uk
Ontology Engineering: Complexity Monsters
• Computation
• The Domain
• Artefacts
• Understanding
Domain Complexity: Medicine is big
• Very large and inherently complex
• 975,354 distinct UMLS concepts
• but this is still too small
Domain Complexity: Medicine is big
[Chart: number of concepts (y axis, 50,000 to 300,000) by year (x axis, 1980 to 2003) for successive terminology releases. Series labels: SNOMED II (1979), Pathology II (1980), SNOMED International (V 3.0 to V 3.5), SNOMED RT (Beta), SNOMED CT. Top data label: 344,000 (Jan 2003).]
UMLS: 975,000 different medical concepts
Domain Complexity: Medicine is changing
New entities defined…
• Charcot-Marie-Tooth (1:2500): genetically determined peripheral neuropathy
• 1975 (ICD) – one code
• 1996 (Stedman's) – 3 codes
• 2003 (OMIM) – 46 phenotypic variants, 39 identified loci
…old ones consigned to history
Domain Complexity: But this is the tip of the iceberg…
[Chart: growth over time, x axis 1965 to 2005, y axis 5,000 to 15,000, with data labels 10,000, 14,737 and 15,496]
• 15,496 OMIM UIDs (1 Feb 2006)
• Conditions probably or definitely at least partly genetically determined, including:
• 107700 Appendicitis
• 600631 Bed Wetting
• 604324 Adult Acne
• 177900 Psoriasis
• 103780 Alcoholism
• 188890 Tobacco Addiction
• 606349 Pathologic Gambling
• 119915 Cluster Headache
• 607504 Benign Sexual Headache
• 220700 Deafness
• 125480 Depression/Mania
• 109200 Male pattern baldness
• 179600 Raynaud Disease
Domain Complexity: Multiple User Views
• Different purpose
• [Physician] – where is the pain (and the lesion)?
• [Pharmacologist] – what is the physiology of pain receptors and conduction?
• [Neurologist] – in phantom limb pain, how is pain perceived?
• Different focus
• Ontology for diabetic Sx:
• Abscess:Locus(finger, hand, forearm, arm, shoulder, neck, face, scalp, chest, abdomen, back, thigh, calf, shin, forefoot or toe)
• Ontology for diabetic Rx:
• Penicillin – IndicatedFor – Abscess:Locus(Skin)
• Ischaemia:Locus(finger, hand, forearm, arm, shoulder, neck, face, scalp, chest, abdomen, back, thigh, calf, shin, forefoot or toe)
Domain Complexity: Needs of external KBs
• Need for lots of rules
• Form on [Hand] shouldhaveoptions [left, right]
• Form on [Palm of hand] shouldhaveoptions [left, right]
• Form on [Finger] shouldhaveoptions [left, right]
• Form on [Thumb] shouldhaveoptions [left, right]
• Form on [First metacarpal] shouldhaveoptions [left, right]
• Etc. etc.
• Obscure categories required for parsimony
• ALL Forms on [mirror-imaged body structures] OR [their subparts] shouldhaveoption ANY [laterality]
• ALL Forms on [respiratory disease] shouldhaveoptions [cough, wheeze]
• ALL Forms on [symptoms] shouldhaveoption ANY [severity]
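The parsimony argument on this slide can be illustrated with a minimal Python sketch. It is not taken from the talk: the `PART_OF` table, the `MIRROR_IMAGED` set and both function names are hypothetical, and the anatomy is a toy fragment. It shows one general rule over a category ("mirror-imaged structures or their subparts") standing in for a separate rule per body part.

```python
# Hypothetical toy anatomy: each structure maps to the structure it is part of.
PART_OF = {
    "Palm of hand": "Hand",
    "Finger": "Hand",
    "Thumb": "Hand",
    "First metacarpal": "Hand",
    "Hand": None,
    "Liver": None,
}

# Assumption for illustration: only "Hand" is asserted as mirror-imaged;
# its subparts inherit the property through the part-of chain.
MIRROR_IMAGED = {"Hand"}


def is_lateralisable(structure):
    """True if the structure, or anything it is part of, is mirror-imaged."""
    current = structure
    while current is not None:
        if current in MIRROR_IMAGED:
            return True
        current = PART_OF.get(current)
    return False


def form_options(structure):
    """One general rule replaces one hand-written rule per body part."""
    return ["left", "right"] if is_lateralisable(structure) else []


for name in ("Hand", "Thumb", "First metacarpal", "Liver"):
    print(name, form_options(name))
```

The price of the single rule is the "obscure category": someone has to assert and maintain which structures count as mirror-imaged.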
Domain Complexity: Polyhierarchies
• Requirement for multiaxial classification
• World is used to monoaxial classification
• Very large domain → very large polyhierarchies
• Impossible to construct accurately by hand
• Inherently confusing to navigate
• The small print, if made explicit, overwhelms us
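As a rough illustration of what a polyhierarchy means as a data structure, the sketch below stores several parents per concept and collects every ancestor. The concept names and the `PARENTS` table are hypothetical examples, not content from the talk.

```python
# Hypothetical multiaxial fragment: a concept may have more than one parent.
PARENTS = {
    "Bacterial pneumonia": ["Pneumonia", "Bacterial infection"],
    "Pneumonia": ["Lung disease"],
    "Bacterial infection": ["Infection"],
    "Lung disease": ["Disease"],
    "Infection": ["Disease"],
    "Disease": [],
}


def ancestors(concept):
    """Collect every ancestor reachable through any parent link."""
    seen = set()
    stack = list(PARENTS.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS.get(node, []))
    return seen


print(sorted(ancestors("Bacterial pneumonia")))
```

Even in this six-concept fragment one concept sits under two axes (site and cause). Maintaining such links by hand across hundreds of thousands of concepts is the problem the slide describes.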
Artefactual Complexity: Unreadable notations

<owl:Class rdf:about="#InjectionOfContrastMedium">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty>
            <owl:ObjectProperty rdf:about="#acts_on"/>
          </owl:onProperty>
          <owl:someValuesFrom>
            <owl:Class rdf:ID="RadioopaqueContrastMedium"/>
          </owl:someValuesFrom>
        </owl:Restriction>
        <owl:Class rdf:about="#Injecting"/>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</owl:Class>
<owl:Class rdf:ID="ContrastCTScan">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty>
            <owl:TransitiveProperty rdf:about="#has_part"/>
          </owl:onProperty>
          <owl:someValuesFrom>
            <owl:Class rdf:ID="InjectionOfContrastMedium"/>
          </owl:someValuesFrom>
        </owl:Restriction>
        <owl:Class rdf:about="#XrayComputedTomography"/>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</owl:Class>

(XrayComputedTomography which
  hasPart (Injecting which actsOn RadioopaqueContrastMedium))
name ContrastCTScan

(ClinicalSituation which <
  isCharacterisedBy (presence which isExistenceOf
    (ContractionProcess which <
      isSpecificFunctionOf SphincterAniMuscle
      hasImmediateConsequence Pain
      hasIntentionality (Intentionality which hasAbsoluteState involuntary)
      hasDuration (Duration which hasAbsoluteState longTerm)
      hasTemporalPattern (TemporalPattern which hasAbsoluteState ongoing)
    >))
  isCharacterisedBy (presence which isExistenceOf
    (UrgeToVoidUrineOrFaeces which
      hasProcessActivity (ProcessActivity which
        hasQuantity (Level which hasMagnitude highLevel))))
  isCharacterisedBy (presence which isExistenceOf AbdominalStraining)
>)
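To make the contrast between the two notations concrete, here is a small Python sketch that renders a nested definition in the compact "which" style shown on the slide. It only approximates the surface syntax visible here and is not a GRAIL or OWL implementation. The tuple layout and the `render` function are assumptions made for illustration.

```python
def render(term):
    """Render a nested term in a compact 'Base which link Filler' style.

    A term is either a plain name (str) or a (base, criteria) pair, where
    criteria is a list of (link, filler) pairs and each filler is a term.
    """
    if isinstance(term, str):
        return term
    base, criteria = term
    if not criteria:
        return base
    parts = [f"{link} {render(filler)}" for link, filler in criteria]
    # Several criteria are grouped in angle brackets, as on the slide.
    body = parts[0] if len(parts) == 1 else "<" + " ".join(parts) + ">"
    return f"({base} which {body})"


contrast_ct = (
    "XrayComputedTomography",
    [("hasPart", ("Injecting", [("actsOn", "RadioopaqueContrastMedium")]))],
)
print(render(contrast_ct))
```

The same two-level definition takes one line in this style and about thirty lines in the RDF/XML serialisation.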
Artefactual Complexity: Workarounds
• Limitations in formalism → Workarounds
Cognitive Complexity
• Scale of task
• Multiple authors
• Quality control and assurance
• Formal ontologies require great precision
• Poor debugging tools
• Natural language can misdirect
Complexity: Effect on the user
• Syntactic confusion – I can’t read this
• Navigational confusion – I don’t need most of this
• Navigational uncertainty – where am I?
• Editorial uncertainty – atom or primitive?
• Editorial confusion – what recipe?
Intermediate Representations: From this…

MAIN replacing
  ACTS_ON heart valve
  HAS_FEATURE induced arrest of heart

…to this

(SurgicalDeed which
  isCharacterisedBy (performance whichG
    isEnactmentOf (Dividing whichG <
      playsClinicalRole SurgicalRole
      actsSpecificallyOn HeartValve
      hasSubprocess (TemporalFeature which <
        isSpecificImmediateConsequenceOf VolitionalAct
        involves Heart
        hasSpecificConsequence (BodyProcess which <
          isSpecificFunctionOf Heart
          hasProcessActivity (ProcessActivity which
            hasQuantity (Level which hasMagnitude undetectedLevel))
        >)
        hasPathologicalStatus pathological
      >)>)))

(And back again)
Intermediate Representation: Semantic Expansion

‘DESCRIPTORS’ → GALEN Common Reference Model
  transethmoidal → (Route which passesThrough EthmoidSinus)
  leg → LowerExtremity
  excision - action → Excising
  pituitary structure → PituitaryGland
  open → surgicallyOpen
  partial → partial
  etc. → ...

‘LINKS’ → GALEN Common Reference Model
  Links: HAS_APPROACH, ACTS_ON, SITE, HAS_EXTENT, HAS_LOCATION, IS_PART_OF, etc.
  Expansions:
    hasSubprocess (Approaching hasMeans ...
    ...hasSubprocess (Approaching hasMeans (Route passesThrough ...
    ...hasSubprocess (Approaching hasMeans (TranstubalRoute hasDirection...

Example:
MAIN excision action
  HAS_APPROACH transethmoidal
  SITE pituitary structure
Intermediate Representation: Context sensitive substitution
[Diagram labels: Descriptor Mappings, Link Mappings]

RUBRIC ‘Transethmoidal hypophysectomy’
SOURCE ‘READ’ CODE ‘71000’
MAIN excision action
  HAS_APPROACH transethmoidal
  SITE pituitary structure

(SurgicalDeed which
  isMainlyCharacterisedBy (performance whichG
    isEnactmentOf ((Excising which playsClinicalRole SurgicalRole) whichG <
      hasSpecificSubprocess (SurgicalApproaching whichG
        hasPhysicalMeans ((Route which passesThrough EthmoidBone)))
      actsSpecificallyOn PituitaryGland>)))
hasProjection (('READ' schemeVersion 'default') code '71000' 'code');
extrinsically hasDissectionRubric 'READ 71000 Transethmoidal hypophysectomy'.
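The substitution idea can be sketched in Python as a pair of lookup tables. This is a deliberately simplified toy, not the GALEN expansion algorithm: real expansion is context sensitive and produces the fuller expression shown on the slides. The `DESCRIPTORS` and `LINKS` tables and the `expand` function are hypothetical, with entries loosely modelled on the slide examples.

```python
# Hypothetical descriptor mappings, modelled on the slide examples.
DESCRIPTORS = {
    "excision action": "Excising",
    "transethmoidal": "(Route which passesThrough EthmoidSinus)",
    "pituitary structure": "PituitaryGland",
}

# Hypothetical link mappings: each link expands to a template whose
# {filler} slot receives the expanded descriptor.
LINKS = {
    "HAS_APPROACH": (
        "hasSpecificSubprocess "
        "(SurgicalApproaching whichG hasPhysicalMeans {filler})"
    ),
    "SITE": "actsSpecificallyOn {filler}",
}


def expand(dissection):
    """Substitute descriptors and links to build a compact expression."""
    main = DESCRIPTORS[dissection["MAIN"]]
    criteria = [
        LINKS[link].format(filler=DESCRIPTORS[descriptor])
        for link, descriptor in dissection["links"]
    ]
    if not criteria:
        return main
    return f"({main} whichG <{' '.join(criteria)}>)"


hypophysectomy = {
    "MAIN": "excision action",
    "links": [
        ("HAS_APPROACH", "transethmoidal"),
        ("SITE", "pituitary structure"),
    ],
}
print(expand(hypophysectomy))
```

The author of a dissection only sees the three short lines of the intermediate form. The mapping tables carry the detail of the target model.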
Intermediate Representation: Supporting Infrastructure
[Diagram: workflow with components labelled Dissections, Links, Descriptors, New Links, New Descriptors, Mapping, GRAIL Expansion, Expanded Dissections, Derived Classification, Existing Classification, compare, and ClaW: Check & Iterate]

So where did the complexity go?
• Dissection Library
• Expansion Algorithm