Supporting Creativity in Science: Cooperative Knowledge Acquisition & Knowledge Refinement Systems Derek Sleeman Department of Computing Science The University ABERDEEN AB24 3FX Tel: +44 (0)1224 272296 Email: d.sleeman@abdn.ac.uk WWW: http//www.csd.abdn.ac.uk


Presentation Transcript


  1. Supporting Creativity in Science: Cooperative Knowledge Acquisition & Knowledge Refinement Systems Derek Sleeman Department of Computing Science The University ABERDEEN AB24 3FX Tel: +44 (0)1224 272296 Email: d.sleeman@abdn.ac.uk WWW: http//www.csd.abdn.ac.uk Acknowledgements: EPSRC support for the AKT Consortium Students: Eugenio Alberdi, David Corsar, Andy Aiken, Mark Winter

  2. OVERVIEW of TALK • I: Context: Advanced Knowledge Technologies (AKT) Consortium • II: Co-operative Knowledge Acquisition & Knowledge Refinement Systems • III: ReTAX system • IV: The REFINER++ System • Questions / Discussion

  3. I: AKT’s CHALLENGES • Knowledge Acquisition • Knowledge Maintenance • Knowledge Modelling • Life Cycle, Integration Issues & Testbeds • Knowledge Reuse • Knowledge Publishing • Knowledge Retrieval

  4. II: Co-operative KA & Knowledge Refinement Systems • Knowledge-based systems inevitably require a sizeable amount of domain knowledge. This can be acquired from: domain experts (KA); detailed examples (using ML techniques); etc. • However, for complex tasks these KBs are inevitably: incomplete, in which case further knowledge acquisition is needed; inconsistent, in which case the KB needs to be refined. • It is also likely that background knowledge will be incomplete, thus requiring an expert to act as an oracle. • Hence the need for: Co-operative (Problem Solving) Knowledge Acquisition & Knowledge Refinement Systems

  5. II: Co-operative KA & Knowledge Refinement Systems • KRUST (Classical KB; Classification) (Susan Craw) • STALKER (Efficient Truth Maintenance based system; Classification) (Leo Carbonara) • REFINER / Refiner++ / R5 (Case-base; Classification) (Sunil Sharma; Mark Winter; Andy Aiken) • RETAX (Revision of Taxonomies) (Eugenio Alberdi; David Corsar) • CRIMSON (Refinement of Constraints) (Mark Winter) • TIGON (Time Series Data/Causal Model; Diagnosis) (Fraser Mitchell) • SALT+ (Rules & Constraints; Propose & Revise) (Piero Leo) • References see - WWW: http//www.csd.abdn.ac.uk

  6. II: Co-operative KA & Knowledge Refinement Systems KRUST & Wine Adviser STALKER REFINER+ Attendance at Medical Clinics & Stock control CRIMSON/ConRef Stock control RETAX Botanical Taxonomies TIGON Turbines (Fault Detection & Diagnosis) SALT+ Elevators/Lifts References see - WWW: http//www.csd.abdn.ac.uk

  7. III: RETAX+ • The heuristics in RETAX are based on a study of how botanists reacted to one or more rogue items. • There are 2 (principal) rules which determine whether a taxonomy is well formed: • each child node must be more specialized than its parent • each of a node’s siblings must be unique. • Retax was used to replicate the revision of a major botanical taxonomy done “manually” in Aberdeen’s Botany dept in the 90s. • References: Middleton & Wilcox (1990) Edinburgh Journal of Botany {revision of taxonomy for Pernettya / Gaultheria} • Alberdi & Sleeman (1997) AI Journal, pp. 257-279. • Alberdi, Sleeman & Korpi (1999) Cognitive Science Journal
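The two well-formedness rules above can be sketched in a few lines of Python. This is only an illustration, not the RETAX implementation: the `Node` class, the reading of "more specialized" as "carries all of the parent's descriptors plus at least one more", and the vehicle descriptors are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    features: frozenset  # descriptors defining the node (assumed representation)
    children: list = field(default_factory=list)


def violations(node):
    """Return (rule, names) tuples for every breach of the two rules at or below `node`."""
    found = []
    for child in node.children:
        # Rule 1: a child must be more specialised than its parent, read here as
        # "a proper superset of the parent's descriptors".
        if not child.features > node.features:
            found.append(("not-specialised", (node.name, child.name)))
    for i, a in enumerate(node.children):
        for b in node.children[i + 1:]:
            # Rule 2: siblings must be unique, i.e. no two share identical descriptors.
            if a.features == b.features:
                found.append(("duplicate-siblings", (a.name, b.name)))
    for child in node.children:
        found.extend(violations(child))
    return found


# Hypothetical descriptors for a small vehicle taxonomy.
vehicle = Node("Vehicle", frozenset({"moves"}))
car = Node("Car", frozenset({"moves", "four-wheels"}))
lorry = Node("Lorry", frozenset({"moves", "carries-freight"}))
small_van = Node("Small Van", frozenset({"moves", "carries-freight", "small"}))
smaller_van = Node("Smaller Van", frozenset({"moves", "carries-freight", "small"}))
lorry.children = [small_van, smaller_van]
vehicle.children = [car, lorry]

report = violations(vehicle)
print(report)
```

With these made-up descriptors the only breach reported is the pair of identical siblings under Lorry; an empty report would mean the taxonomy is well formed under this reading of the rules.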

  8. [Figure: example vehicle taxonomy. Root: Vehicle. Second level: Train, Car, Cycle, Lorry. Lower-level nodes: Sports Car, Saloon Car, Bicycle, Motorbike, Large Lorry, Small Van, Smaller Van.]

  9. RETAX+ Let’s refer to a new object/node as N, the existing hierarchy/tree as T, and the potential parent node as P. Then possible operations are: • Is T well formed? (If not, report the nodes which violate the rules; e.g., if sibling nodes N1 & N2 are equal, then merge the 2 nodes.) • Is N already in T? • Assuming T is well formed, to which parent node, P, can N be attached without causing T to be rearranged or N modified? (The answer could be none.) • What changes have to be made to N to make it a “legal” child of node P? • What changes have to be made to T so that N can be a child of P? • Combinations of the last 2 operations
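Some of the operations listed above can be illustrated with a short sketch. It assumes, purely for illustration, that T is a nested tuple `(name, descriptors, children)` and that a node is a legal child when its descriptor set is a proper superset of its parent's; the function names and the vehicle descriptors are hypothetical and are not taken from RETAX.

```python
def walk(tree):
    """Yield every (name, features, children) node of a nested-tuple tree, parents first."""
    yield tree
    for child in tree[2]:
        yield from walk(child)


def contains(tree, n_features):
    """Is N already in T? Here: does some node carry exactly N's descriptors?"""
    return any(features == n_features for _, features, _ in walk(tree))


def legal_parents(tree, n_features):
    """Names of nodes P under which N fits without changing T or N."""
    result = []
    for name, features, children in walk(tree):
        specialises = n_features > features
        # N must not duplicate an existing child of P, nor be more general than one
        # (that would mean inserting N between P and the child, i.e. rearranging T).
        clashes = any(n_features <= child_features for _, child_features, _ in children)
        if specialises and not clashes:
            result.append(name)
    return result


def missing_descriptors(p_features, n_features):
    """Descriptors N would have to gain to become a legal child of P."""
    return p_features - n_features


# Hypothetical descriptors.
T = ("Vehicle", frozenset({"moves"}), [
    ("Car", frozenset({"moves", "four-wheels"}), [
        ("Sports Car", frozenset({"moves", "four-wheels", "fast"}), []),
    ]),
    ("Cycle", frozenset({"moves", "two-wheels"}), [
        ("Bicycle", frozenset({"moves", "two-wheels", "pedals"}), []),
    ]),
])
N = frozenset({"moves", "two-wheels", "engine"})  # e.g. a motorbike

print(contains(T, N))
print(legal_parents(T, N))
print(missing_descriptors(frozenset({"moves", "four-wheels"}), N))
```

An empty list from `legal_parents` corresponds to the "answer could be none" case. A fuller treatment would probably prefer the most specific legal parent; this sketch simply lists every candidate.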

  10. ReTAX • Family: Ericaceae • Genera: Arctostaphylos, Arbutus, Pernettya, Leucothoe, Gaultheria, Agauria, Andromeda • Species: A. uva-ursi, A. unedo, P. tasmanica, G. oppositifolia, G. rupestris, G. antipoda, A. polifolia

  11. ReTAX • Historical: In Bentham & Hooker’s (1876*) classification the main differences detected between the Pernettya & Gaultheria genera were the type of fruit and the succulence of the calyx. • *G Bentham & JD Hooker (1876). Genera Plantarum, Vol II, Part 2. (Publ: Reeves & Co, London) • Subsequent botanical investigations in the 20th century challenged this analysis, but did not suggest any further distinguishing features for the 2 genera; hence the 2 genera were combined (Middleton & Wilcox, 1990).

  12. ReTAX • Simulation (Simplified) • The descriptions of several species of the Pernettya & Gaultheria genera were replaced by others with revised features (descriptors), which affect the definitions of the parent nodes (P + G) • When the parent nodes (Pernettya & Gaultheria) are found to be the same, the system checks a set of other features (a further facility of ReTAX) to see if they are distinctive; when no differences are found, the 2 nodes (P + G) are collapsed
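The collapse step in the simplified simulation can be sketched as follows. The dictionaries, the descriptor values and the choice of which name survives the merge are all illustrative assumptions; none of them is botanical data or ReTAX code.

```python
def distinguishing(a, b):
    """Descriptors on which two descriptions (feature -> value dicts) disagree."""
    return {key for key in set(a) | set(b) if a.get(key) != b.get(key)}


def revise(genera, extra):
    """Collapse the first pair of sibling genera that no longer differ.

    genera: name -> defining descriptors
    extra:  name -> further descriptors consulted before collapsing
    Returns (possibly revised genera, merged pair or None).
    """
    names = sorted(genera)
    for i, g1 in enumerate(names):
        for g2 in names[i + 1:]:
            if distinguishing(genera[g1], genera[g2]):
                continue  # still distinct on the defining descriptors
            if distinguishing(extra.get(g1, {}), extra.get(g2, {})):
                continue  # the further features keep them apart
            merged = dict(genera)
            del merged[g2]  # keeping g1's name is an arbitrary choice here
            return merged, (g1, g2)
    return genera, None


# Placeholder values only, chosen to show the mechanism.
before = {
    "Pernettya": {"fruit": "type-1", "calyx": "dry"},
    "Gaultheria": {"fruit": "type-2", "calyx": "succulent"},
}
after = {
    "Pernettya": {"fruit": "variable", "calyx": "variable"},
    "Gaultheria": {"fruit": "variable", "calyx": "variable"},
}
extra = {
    "Pernettya": {"leaf": "placeholder"},
    "Gaultheria": {"leaf": "placeholder"},
}

print(revise(before, extra))
print(revise(after, extra))
```

With the original descriptions nothing is merged; once the revised species descriptions make the two parent descriptions identical and the extra features show no difference, the pair is collapsed into one node.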

  13. RETAX+: Current / Future activities • Use with other experts to help them formulate / refine taxonomies (eg other aspects of botany, microbiology) • Use RETAX+, or a variant, to formulate / refine ontologies (eg medical terminologies). This has resulted in the Protégé RepairTAB, which detects inconsistencies in OWL ontologies & gives advice about removing them. (Lam, Sleeman, Pan & Vasconcelos (2008) Journal on Data Semantics)

  14. IV: REFINER++ System • The Refiner++ algorithm • Sample dataset • Interaction with experts • Current / future work

  15. The Sample Dataset

  16. The Refiner++ Algorithm • Each case is assigned to a category • Category descriptions are inferred from the case values • When a case matches a category to which the expert did not assign it, this is an inconsistency • While inconsistencies exist… • A selection of disambiguation strategies is suggested • The user chooses a strategy to be performed • The list of inconsistencies is re-evaluated • The refined dataset is now consistent
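The loop above can be sketched roughly as follows, restricted to numeric fields. This is a minimal illustration under stated assumptions, not the Refiner++ code: the case layout (`id`, `category`, `values`), the DBP readings and the `reclassify_first` callback, which stands in for the expert choosing a strategy, are all hypothetical.

```python
def describe(cases):
    """Category -> {field: (low, high)}, inferred from the cases assigned to it."""
    descriptions = {}
    for case in cases:
        ranges = descriptions.setdefault(case["category"], {})
        for name, value in case["values"].items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return descriptions


def inconsistencies(cases):
    """(case id, category) pairs where a case matches a category it was not assigned."""
    descriptions = describe(cases)
    found = []
    for case in cases:
        for category, ranges in descriptions.items():
            if category == case["category"]:
                continue
            if all(low <= case["values"][f] <= high for f, (low, high) in ranges.items()):
                found.append((case["id"], category))
    return found


def refine(cases, choose):
    """Repeat until consistent: find the problems, let `choose` apply one strategy."""
    while (problems := inconsistencies(cases)):
        choose(cases, problems)  # must make progress, as the expert's choice would
    return cases


def reclassify_first(cases, problems):
    """Toy strategy: move the first problem case into the category it matched."""
    case_id, category = problems[0]
    for case in cases:
        if case["id"] == case_id:
            case["category"] = category


# Made-up readings for illustration.
cases = [
    {"id": 1, "category": "A", "values": {"DBP": 70}},
    {"id": 2, "category": "A", "values": {"DBP": 85}},
    {"id": 3, "category": "B", "values": {"DBP": 80}},
    {"id": 4, "category": "B", "values": {"DBP": 95}},
]
initial = inconsistencies(cases)
refine(cases, reclassify_first)
print(initial, inconsistencies(cases))
```

Descriptions are recomputed on every pass, which mirrors the "list of inconsistencies is re-evaluated" step; a real tool would offer several strategies per inconsistency rather than a single fixed one.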

  17. Generating Descriptions • Generalise each field • Numeric: range from lowest to highest • String: set of all unique items • Taxon: nearest common parent • Boolean: set of all unique items from the set {‘true’, ‘false’, ‘any’} • Combine to get category description
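The per-field generalisation rules above might look like this in Python. The schema format, the `parent_of` map used for taxon fields and the sample values are assumptions introduced for the sketch; only the four rules themselves come from the slide.

```python
def nearest_common_parent(taxa, parent_of):
    """Deepest node that is an ancestor of (or equal to) every taxon in `taxa`."""
    def path_to_root(taxon):
        chain = [taxon]
        while taxon in parent_of:
            taxon = parent_of[taxon]
            chain.append(taxon)
        return chain

    paths = [path_to_root(t) for t in taxa]
    for candidate in paths[0]:  # ordered from the taxon itself up to the root
        if all(candidate in path for path in paths[1:]):
            return candidate
    return None


def generalise(kind, values, parent_of=None):
    """Generalise one field's values according to its type."""
    if kind == "numeric":
        return (min(values), max(values))  # range from lowest to highest
    if kind in ("string", "boolean"):
        return set(values)  # set of all unique items
    if kind == "taxon":
        return nearest_common_parent(values, parent_of)
    raise ValueError(f"unknown field type: {kind}")


def describe(cases, schema, parent_of=None):
    """Combine the per-field generalisations into one category description."""
    return {
        name: generalise(kind, [case[name] for case in cases], parent_of)
        for name, kind in schema.items()
    }


# Hypothetical schema, taxonomy and cases for one category.
schema = {"DBP": "numeric", "Drug": "string", "Smoker": "boolean", "Disease": "taxon"}
parent_of = {"D1": "Cardiac", "D2": "Cardiac", "Cardiac": "Disease"}
category_cases = [
    {"DBP": 80, "Drug": "X", "Smoker": "true", "Disease": "D1"},
    {"DBP": 95, "Drug": "Y", "Smoker": "true", "Disease": "D2"},
]
description = describe(category_cases, schema, parent_of)
print(description)
```

A case then matches the category when each of its values falls inside the corresponding range, set or subtree of the description.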

  18. Category Descriptions • There are inconsistencies: • Cases 4 and 5 match A • Case 7 matches B • We need to remove the overlap

  19. Disambiguation Strategies • Change values for certain cases • Remove values from a category (eg, create a disjunction) • Reclassify a case • Make a case match an additional category • Shelve a problem case • Add a new field

  20. Refiner++ [Figure: diagram of categories C1, C2 and C3]

  21. Strategies for this problem • Change value of DBP in case 7 to 90 • Change value of DBP in case 5 to 95 • Reclassify case 7 to category B • Add case 7 to category B • Shelve case 7 • Change value of Disease in cases 3 and 7 to D3 • Reclassify cases 4 and 5 to category A • Add cases 4 and 5 to category A • Shelve cases 4 and 5 • Add a new field

  22. Strategy Ordering • Typically, many strategies are suggested • We need heuristics to order them • Ordered by number of times suggested; prefer strategies which are suggested many times • Ordered by number of cases affected; prefer strategies which affect fewer cases
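One way to combine the two ordering heuristics above is to sort by suggestion count first and break ties by the number of cases affected. Treating the second heuristic as a tie-breaker is an assumption of this sketch, as are the suggestion counts in the example.

```python
from collections import Counter


def order_strategies(suggestions):
    """Order distinct strategies: most often suggested first, then fewest cases affected.

    suggestions: list of (description, affected case ids), one entry per time
    a strategy was proposed.
    """
    counts = Counter(suggestions)
    return sorted(counts, key=lambda s: (-counts[s], len(s[1])))


# Illustrative input: one strategy happens to be suggested twice.
suggestions = [
    ("Reclassify cases 4 and 5 to category A", (4, 5)),
    ("Reclassify case 7 to category B", (7,)),
    ("Change value of DBP in case 7 to 90", (7,)),
    ("Reclassify case 7 to category B", (7,)),
]
ranked = [description for description, _ in order_strategies(suggestions)]
print(ranked)
```

Because `sorted` is stable, strategies that tie on both criteria keep the order in which they were first suggested.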

  23. The Refiner++ Main Screen

  24. Scalability • Measured the time taken to perform validation on randomly generated datasets with varying numbers of cases, fields and categories • For most datasets, the time taken is under 1 second

  25. Use of REFINER++ by Experts* • Refiner++ has been used with various experts including: • Pain Control Expert (Anaesthesiology) • Child psychologist • High Dependency Unit (HDU) Physician • * KCAP-2003 paper (Aiken & Sleeman)

  26. Pain Control • Pre-existing Access dataset on epidural patients • Many cases, lots of fields / descriptors • Refiner++ imported the data (almost) perfectly • Expert categorised cases based on the length of the epidural (in days) • REFINER++ took only a few seconds to create category descriptions and validate • But…

  27. Pain Control • Hundreds of inconsistencies found • Hundreds of strategies suggested • Almost all of which were ‘change value’ • Why did it not work better? • Subjective nature of the subject domain • Categories were contiguous

  28. Child Psychology • The session was a series of anecdotes and outlines of specific cases • Three types of cases were identified: • Severely autistic • Mildly autistic • Difficulties with language development

  29. Child Psychology • The expert stated that autistic children usually had the following characteristics: • Problems with language and verbal communication • Problems with social interaction • Obsessive behaviour • These characteristics were abstracted by the knowledge engineers and subsequently confirmed with the expert • The expert showed no inclination to use REFINER++, but a case set was created by the knowledge engineers

  30. HDU • Task posed by domain expert: when to move high dependency unit (HDU) patients to a general ward or to the intensive care unit (ICU), or leave them in the HDU. • Used Refiner++ with three datasets, one for each condition (cardiac, neuro & respiratory) • Expert did not use the system but did dictate the descriptors & the sets of cases to the knowledge engineers, who typed this information into REFINER. • Refiner++ found 2 categories were consistent, & in the third identified inconsistencies

  31. Inconsistent Dataset

  32. Category Descriptions • There are inconsistencies: • Case 1 matches Category SAME • Case 4 matches Category HIGHER • We need to remove the overlap • Refiner++ suggested lower and upper ‘danger zones’ for each field

  33. Future Work: Use with Domain Experts • Make the system’s GUI more intuitive (some changes already made) • Ask the expert to come along to the session with a document which summarizes the main features of the dataset they wish to discuss. (In the session, ask them to highlight the principal concepts) • For each domain expert contacted, record an AVI session of a simple but related domain (eg simple childhood diseases before approaching a paediatrician) (demo)

  34. Current Work (ICU domain) • Developed a system which is statistically based, so given a case description it returns the likelihood of that case belonging to one of the predefined categories (R5: Andy Aiken) • Acquired a data set of patients’ physiological parameters from an ICU DB, and had clinicians assign patients, on a day-by-day & hour-by-hour basis, to a 5-point severity score. (Developed in conjunction with Glasgow Royal Infirmary) • Using R5 with the above data set to assign new patient reports to a severity class. (Practically important as the descriptors include clinical interventions, which “standard” scales don’t.) • Identify & analyse (explain) anomalous / unusual cases (segments of cases)

  35. VI: Dimensional Analysis ?? • Outline issue • Pointer to TR • Pointer to WWW systems / sources

  36. Questions/Comments

  37. V: (Causal) Explanations for Anomalous Medical cases • Discuss ICU context • Experiment to detect Anomalous cases / sections of cases • Outline a typical investigation

  38. V: Seeking to Explain an anomalous Observation • EXPECTED: An injection of X will cause the heart (Organ, O) to increase its contraction rate within T seconds. • SUPPOSE that does not happen; then here are some of the investigations which might be performed: • Is the injection being given effectively? • IF so, then check whether the drug X is being transported to Organ O • Is the transport path physically / bio-chemically blocked? • Is the transport mechanism inhibited / slowed down? • IF the drug is actually arriving at Organ O & the conc is OK, then investigate: • Is the drug mechanism within the organ being blocked? • Is the organ for some reason unable to respond in the usual way (eg weakened heart muscle)?
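The ordered investigation above can be read as a walk along the causal chain from injection to organ response. The sketch below only illustrates that ordering: the step keys, the encoding of findings as True / False / unknown and the returned labels are hypothetical, and nothing here is clinical guidance or part of the described system.

```python
# Each step: (hypothetical key, question to investigate), in causal order.
STEPS = [
    ("injection_effective", "Is the injection being given effectively?"),
    ("path_unblocked", "Is the transport path physically / bio-chemically blocked?"),
    ("transport_normal", "Is the transport mechanism inhibited / slowed down?"),
    ("concentration_ok", "Is the drug arriving at Organ O at an adequate concentration?"),
    ("mechanism_unblocked", "Is the drug mechanism within the organ being blocked?"),
    ("organ_responsive", "Is the organ unable to respond in the usual way?"),
]


def next_step(findings):
    """Walk the chain in order.

    findings maps a step key to True (that link is fine) or False (fault found);
    a missing key means the link has not been investigated yet.
    """
    for key, question in STEPS:
        if key not in findings:
            return ("investigate", question)
        if findings[key] is False:
            return ("candidate explanation", key)
    return ("unexplained", None)


print(next_step({}))
print(next_step({"injection_effective": True, "path_unblocked": True,
                 "transport_normal": False}))
```

The first failed link is reported as a candidate explanation; if every link checks out, the observation stays anomalous and would need further analysis.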
