KITE Current Design and Roadmap. IBM Research J. William Murdock Christopher Welty David Ferrucci. Last Update: Mar. 6, 2006. Background. Transforming Knowledge. The transformation of knowledge from one form to another requires the explicit mapping across ontologies. Relation (ManagerOf).
KITE Current Design and Roadmap
IBM Research
J. William Murdock, Christopher Welty, David Ferrucci
Last Update: Mar. 6, 2006
Transforming Knowledge
The transformation of knowledge from one form to another requires the explicit mapping across ontologies.
Source Ontology: Entity (Person): Fred Center; Entity (Organization): Center Micros; Relation (ManagerOf)
KITE Mapping Plugins:
• Person(?x) ^ ManagerOf(?x, ?y) → Executive(?x)
• Organization(?x) → SocialAggregate(?x)
• ManagerOf(?x, ?y) → hasManager(?y, ?x)
Target Ontology: Executive: Fred Center; SocialAggregate: Center Micros; hasManager
Motivation: Why Transform Knowledge?
• Different systems have different ontologies and/or different representational schemes
  • Sometimes those differences are arbitrary
  • Other times they are specifically motivated by differences in the purposes of the systems
• In either case, interoperation requires that knowledge be transformed
Reference Scenario: Transforming extracted knowledge
• Transforming extracted knowledge into a form suited for reasoning.
• Representations and ontologies for legacy extractors tend to be radically different from those for legacy reasoners.
• Those differences are generally dramatic and are motivated by significant functional issues.
  • Extraction ontologies tend to be very close to how things are expressed in language. Types are grouped by how instances of those types can be described.
  • Reasoning ontologies tend to permit parsimonious rules. Types are grouped by the inferences that can be drawn over them.
• A powerful/flexible framework is needed to resolve these differences.
• This is not the only use for KITE, but it is an important use.
KITE-based applications
[Architecture diagram: Source Repository → Source Plugin → Source Data → Mapper Plugin(s) → Target Data → Target Plugin → Target Repository. Other components shown: Provenance Plugin and Provenance Repository; one Ontology Language Plugin for the Source Ontology and one for the Target Ontology.]
Building KITE applications
• Framework provides:
  • APIs for:
    • Mapper plugins
    • Source plugins
    • Target plugins
    • Provenance plugins
    • Language plugins
  • Classes for Data
  • Top-level control from source → mapper → target
  • Some broadly applicable plugins (of each of the types)
• Application developer provides:
  • Configuration for some of KITE’s broadly applicable plugins
  • New, application-specific plugins (if needed)
[Diagram: Mapper Plugin(s), Source Plugin, Target Plugin, Provenance Plugin, and two Ontology Language Plugins]
Some Built-in Broadly Applicable Components
• Aggregate mappers that provide control flow
  • Selection aggregate: Runs the first applicable delegate
  • Cascade aggregate: Runs each delegate in order
• Configurable primitive mappers
  • e.g., Table lookup: Configured with a table of one-to-one source → target mappings
• EKDB source, target, and provenance plugins
• “Lispy” source and target plugins
• UIMA type system ontology plugin
• OWL ontology plugin
Broad class of KITE applications: UIMA → OWL
[Architecture diagram: UIMA Analytics (recognition, coreference, etc.) → UIMA Analysis Results → Source Plugin → Source Data → Mapper Plugin(s) → Target Data → Target Plugin → OWL Store → OWL Tools (Protégé, reasoners, etc.). Other components shown: Provenance Plugin and Provenance Repository; UIMA Type System Plugin for the Type System; OWL Ontology Plugin for the OWL Ontology.]
KITE Plugins: UML class model
• interface IntegratorPlugin
• interface Mapper: Collection<Data> map(Data); <<send>> MapperNotApplicableException, MapperNotApplicableYetException
• interface AggregateMapper: addMapper(Mapper); init(Integrator); Collection<Data> integrateDeferred()
• Mapper classes: TableMapper, TypeNameMapper, IdentityMapper, SelectionAggregate, CascadeAggregate
• interface IntegratorResource: close()
• interface OntologyResource: bool subsumesClass(String, String); bool subsumesProperty(String, String); ...
• OwlModelResource: OntModel model
• TypeSystemResource: TypeSystem typeSystem
• interface SourceResource: Iterator<Instance> instanceIterator(); Iterator<Tuple> tupleIterator()
• interface TargetResource: write(Collection<Data>)
• interface ProvenanceResource: write(Collection<Data>)
KITE Data: UML class model
• Data: String typeName; String id
• Tuple: List<String> arguments
• Instance
• LabeledInstance: String canonicalForm; String[] variantForms
(assorted convenience methods & data structures not shown)
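The class model above can be sketched in Java roughly as follows. This is a minimal illustration, not the actual KITE source: the subclass links (Tuple and Instance extending Data, LabeledInstance extending Instance) are assumed from the layout of the diagram, and the constructors, the toString format, and the DataModelDemo class are hypothetical. Only the field names and the getTypeName/getId accessors appear elsewhere in the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumed reconstruction: the inheritance links are not confirmed by the slides.
class Data {
    private final String typeName;
    private final String id;

    Data(String typeName, String id) {
        this.typeName = typeName;
        this.id = id;
    }

    String getTypeName() { return typeName; }
    String getId() { return id; }

    @Override
    public String toString() { return "(" + typeName + " " + id + ")"; }
}

// An entity such as (Nation uid11).
class Instance extends Data {
    Instance(String typeName, String id) { super(typeName, id); }
}

// A relation such as (governs uid11a uid11b); arguments holds the ids it links.
class Tuple extends Data {
    private final List<String> arguments;

    Tuple(String typeName, String id, List<String> arguments) {
        super(typeName, id);
        this.arguments = new ArrayList<>(arguments);
    }

    List<String> getArguments() { return arguments; }

    @Override
    public String toString() {
        return "(" + getTypeName() + " " + String.join(" ", arguments) + ")";
    }
}

// An instance that also carries its surface strings.
class LabeledInstance extends Instance {
    private final String canonicalForm;
    private final String[] variantForms;

    LabeledInstance(String typeName, String id, String canonicalForm, String... variantForms) {
        super(typeName, id);
        this.canonicalForm = canonicalForm;
        this.variantForms = variantForms.clone();
    }

    String getCanonicalForm() { return canonicalForm; }
    String[] getVariantForms() { return variantForms.clone(); }
}

class DataModelDemo {
    public static void main(String[] args) {
        Data nation = new LabeledInstance("Nation", "uid11", "Angola", "Republic of Angola");
        Data governs = new Tuple("governs", "uid12", Arrays.asList("uid11a", "uid11b"));
        System.out.println(nation);   // (Nation uid11)
        System.out.println(governs);  // (governs uid11a uid11b)
    }
}
```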
Developing mappers in KITE
interface Mapper: Collection<Data> map(Data); <<send>> MapperNotApplicableException, MapperNotApplicableYetException
• The map method takes source Data and returns any number of target Data items
• map throws:
  • MapperNotApplicableException: Indicates that the mapper cannot be run at all on this data
  • MapperNotApplicableYetException: Indicates that the mapper could be run on this data in a different context; recommends that the caller try again later
Example: (Nation uid11) → Mapper → (NationalGovernment uid11a), (GeographicRegion uid11b), (governs uid11a uid11b)
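The difference between the two exceptions is easiest to see from the caller's side. The sketch below is one possible way a driver could treat them, not KITE's actual control code: items rejected with MapperNotApplicableException are dropped, while items rejected with MapperNotApplicableYetException are queued and retried after other items have been mapped. The simplified Data class (a tuple is a Data with a non-empty args list), the ManagerRulesMapper, and the DeferralDemo driver are all hypothetical, and treating the two exceptions as unrelated classes is an assumption. The rules encoded are the three mapping rules from the "Transforming Knowledge" slide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MapperNotApplicableException extends Exception {}
class MapperNotApplicableYetException extends Exception {}

// Simplified stand-in for KITE's Data classes: an empty args list means "instance".
class Data {
    final String typeName;
    final String id;
    final List<String> args;

    Data(String typeName, String id, String... args) {
        this.typeName = typeName;
        this.id = id;
        this.args = Arrays.asList(args);
    }

    @Override
    public String toString() {
        return args.isEmpty() ? "(" + typeName + " " + id + ")"
                              : "(" + typeName + " " + String.join(" ", args) + ")";
    }
}

interface Mapper {
    Collection<Data> map(Data d)
        throws MapperNotApplicableException, MapperNotApplicableYetException;
}

// Hypothetical mapper: Person(?x) ^ ManagerOf(?x, ?y) -> Executive(?x), plus the two simpler rules.
class ManagerRulesMapper implements Mapper {
    private final Set<String> managers = new HashSet<>();

    public Collection<Data> map(Data d)
            throws MapperNotApplicableException, MapperNotApplicableYetException {
        switch (d.typeName) {
            case "ManagerOf":
                managers.add(d.args.get(0));
                return Collections.singletonList(
                    new Data("hasManager", d.id, d.args.get(1), d.args.get(0)));
            case "Organization":
                return Collections.singletonList(new Data("SocialAggregate", d.id));
            case "Person":
                // Needs a ManagerOf tuple that may not have been seen yet.
                if (!managers.contains(d.id)) throw new MapperNotApplicableYetException();
                return Collections.singletonList(new Data("Executive", d.id));
            default:
                throw new MapperNotApplicableException();
        }
    }
}

class DeferralDemo {
    // Repeats passes over the deferred items until a pass maps nothing new.
    static List<Data> run(Mapper mapper, List<Data> source) {
        List<Data> target = new ArrayList<>();
        Deque<Data> pending = new ArrayDeque<>(source);
        boolean progress = true;
        while (!pending.isEmpty() && progress) {
            progress = false;
            for (int k = pending.size(); k > 0; k--) {
                Data d = pending.poll();
                try {
                    target.addAll(mapper.map(d));
                    progress = true;
                } catch (MapperNotApplicableYetException e) {
                    pending.add(d);   // try again in the next pass
                } catch (MapperNotApplicableException e) {
                    // this mapper can never handle d: drop it
                }
            }
        }
        return target;
    }

    static List<Data> demo() {
        return run(new ManagerRulesMapper(), Arrays.asList(
            new Data("Person", "fred"),
            new Data("ManagerOf", "r1", "fred", "cm"),
            new Data("Organization", "cm"),
            new Data("Date", "d1")));
    }

    public static void main(String[] args) {
        // prints [(hasManager cm fred), (SocialAggregate cm), (Executive fred)]
        System.out.println(demo());
    }
}
```

In this run the Person item arrives before the ManagerOf tuple it depends on, so it is deferred in the first pass and mapped in the second; the Date item is rejected outright.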
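Example primitive mapper: One-to-one lookup table
Example: (PER uid105) → (Person uid105)

    // Uses java.util.Collection, java.util.LinkedList, and java.util.List;
    // "table" is a field holding the source-type to target-type lookup table.
    public Collection<Data> map(Data d) throws MapperNotApplicableException {
        String sourceType = d.getTypeName();
        // A source type with no table entry cannot be handled by this mapper.
        if (!table.containsKey(sourceType))
            throw new MapperNotApplicableException();
        String targetType = table.get(sourceType);
        // Keep the source id; only the type name changes.
        Instance i = new Instance(targetType, d.getId());
        List<Data> retval = new LinkedList<Data>();
        retval.add(i);
        return retval;
    }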
Aggregate Mappers
• An aggregate mapper is composed of delegate (i.e., lower-level) mappers that may be primitive or aggregate
• KITE provides two built-in aggregate mapper plugins:
  • Selection aggregate: The first delegate mapper that applies to the data item is applied and the other mappers are ignored
  • Cascade aggregate: Each delegate mapper is run in sequence; the output of each is an input to the next
• KITE also provides an API for developers to build their own aggregate mapper plugins.
interface Mapper: Collection<Data> map(Data)
interface AggregateMapper: addMapper(Mapper); init(Integrator); Collection<Data> integrateDeferred()
Selection Aggregate Mapper
The first delegate mapper that applies to the data item is applied and the other mappers are ignored.
Example (Selection Aggregate with two primitive delegates):
• (Date uid15) → Temporal Entity Mapper (Primitive) → (TemporalInterval uid15a)
• (Vehicle uid16) → Physical Entity Mapper (Primitive) → (TransportationDevice uid16a)
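A selection aggregate can be sketched as a loop that returns the output of the first delegate that does not reject the item. The code below is an illustration under assumptions, not the KITE implementation: the Data class is a simplified stand-in, SingleTypeMapper and SelectionDemo are hypothetical, the "a" suffix on ids simply mimics the example above, and deferral (MapperNotApplicableYetException, init, integrateDeferred) is left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class MapperNotApplicableException extends Exception {}

// Simplified stand-in for KITE's Data classes.
class Data {
    final String typeName;
    final String id;

    Data(String typeName, String id) {
        this.typeName = typeName;
        this.id = id;
    }

    @Override
    public String toString() { return "(" + typeName + " " + id + ")"; }
}

interface Mapper {
    Collection<Data> map(Data d) throws MapperNotApplicableException;
}

// Hypothetical primitive mapper: applies to exactly one source type.
class SingleTypeMapper implements Mapper {
    private final String sourceType;
    private final String targetType;

    SingleTypeMapper(String sourceType, String targetType) {
        this.sourceType = sourceType;
        this.targetType = targetType;
    }

    public Collection<Data> map(Data d) throws MapperNotApplicableException {
        if (!sourceType.equals(d.typeName)) throw new MapperNotApplicableException();
        return Collections.singletonList(new Data(targetType, d.id + "a"));
    }
}

class SelectionAggregate implements Mapper {
    private final List<Mapper> delegates = new ArrayList<>();

    void addMapper(Mapper m) { delegates.add(m); }

    public Collection<Data> map(Data d) throws MapperNotApplicableException {
        for (Mapper delegate : delegates) {
            try {
                return delegate.map(d);   // first applicable delegate wins
            } catch (MapperNotApplicableException e) {
                // not applicable: fall through to the next delegate
            }
        }
        throw new MapperNotApplicableException();   // no delegate applied
    }
}

class SelectionDemo {
    static SelectionAggregate build() {
        SelectionAggregate selection = new SelectionAggregate();
        selection.addMapper(new SingleTypeMapper("Date", "TemporalInterval"));
        selection.addMapper(new SingleTypeMapper("Vehicle", "TransportationDevice"));
        return selection;
    }

    static String mapToString(Data d) {
        try {
            return build().map(d).toString();
        } catch (MapperNotApplicableException e) {
            return "not applicable";
        }
    }

    public static void main(String[] args) {
        System.out.println(mapToString(new Data("Date", "uid15")));     // [(TemporalInterval uid15a)]
        System.out.println(mapToString(new Data("Vehicle", "uid16")));  // [(TransportationDevice uid16a)]
    }
}
```

Because SelectionAggregate itself implements Mapper, it can be added as a delegate of another aggregate, which is what allows primitive and aggregate mappers to be nested.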
Cascade Aggregate Mapper
• Each delegate mapper is run in sequence
• The output of each is an input to the next
• Results accumulate
• Later mappers can be defined in terms of the target ontology
  • Especially useful if the target ontology is designed for reasoning
Example (Cascade Aggregate with delegates Political Entity Mapper (Primitive) and Geospatial Entity Mapper (Primitive)): (Nation uid11) → (NationalGovernment uid11a), (GeographicRegion uid11b), (governs uid11a uid11b)
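One way to read "the output of each is an input to the next" and "results accumulate" is sketched below. This is an interpretation, not the KITE implementation: each delegate is offered the original item plus everything produced so far, and all outputs are kept. The simplified Data class, the two lambda mappers, the id scheme, and the CascadeDemo class are hypothetical. The slide does not say which delegate produces the governs tuple; assigning it to the geospatial mapper, defined over the target type NationalGovernment, is an assumption chosen to illustrate a later mapper written in terms of the target ontology. Deferral is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class MapperNotApplicableException extends Exception {}

// Simplified stand-in for KITE's Data classes: an empty args list means "instance".
class Data {
    final String typeName;
    final String id;
    final List<String> args;

    Data(String typeName, String id, String... args) {
        this.typeName = typeName;
        this.id = id;
        this.args = Arrays.asList(args);
    }

    @Override
    public String toString() {
        return args.isEmpty() ? "(" + typeName + " " + id + ")"
                              : "(" + typeName + " " + String.join(" ", args) + ")";
    }
}

interface Mapper {
    Collection<Data> map(Data d) throws MapperNotApplicableException;
}

class CascadeAggregate implements Mapper {
    private final List<Mapper> delegates = new ArrayList<>();

    void addMapper(Mapper m) { delegates.add(m); }

    public Collection<Data> map(Data d) throws MapperNotApplicableException {
        List<Data> visible = new ArrayList<>();
        visible.add(d);   // index 0 is the original source item
        for (Mapper delegate : delegates) {
            // Iterate over a snapshot so a delegate never sees its own outputs.
            for (Data input : new ArrayList<>(visible)) {
                try {
                    visible.addAll(delegate.map(input));   // results accumulate
                } catch (MapperNotApplicableException e) {
                    // this delegate skips this input
                }
            }
        }
        List<Data> results = new ArrayList<>(visible.subList(1, visible.size()));
        if (results.isEmpty()) throw new MapperNotApplicableException();
        return results;
    }
}

class CascadeDemo {
    static CascadeAggregate build() {
        // Defined over the source ontology.
        Mapper political = d -> {
            if (!d.typeName.equals("Nation")) throw new MapperNotApplicableException();
            return Collections.singletonList(new Data("NationalGovernment", d.id + "a"));
        };
        // Defined over the target ontology: consumes the political mapper's output.
        Mapper geospatial = d -> {
            if (!d.typeName.equals("NationalGovernment")) throw new MapperNotApplicableException();
            String base = d.id.substring(0, d.id.length() - 1);
            Data region = new Data("GeographicRegion", base + "b");
            Data governs = new Data("governs", base + "g", d.id, region.id);
            return Arrays.asList(region, governs);
        };
        CascadeAggregate cascade = new CascadeAggregate();
        cascade.addMapper(political);
        cascade.addMapper(geospatial);
        return cascade;
    }

    static String mapToString(Data d) {
        try {
            return build().map(d).toString();
        } catch (MapperNotApplicableException e) {
            return "not applicable";
        }
    }

    public static void main(String[] args) {
        // prints [(NationalGovernment uid11a), (GeographicRegion uid11b), (governs uid11a uid11b)]
        System.out.println(mapToString(new Data("Nation", "uid11")));
    }
}
```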
EKDB HUTT → KANI: A complex KITE application
[Architecture diagram: EKDB Extraction Tables → EKDB Extraction Source Plugin → Source Data → HUTT → KANI Aggregate Mapper → Target Data → EKDB RDF Target Plugin → EKDB RDF Tables. Other components shown: EKDB Extraction → RDF Provenance Plugin and EKDB Provenance Table; UIMA Type System Plugin for the HUTT Type System; OWL Ontology Plugin for the KANI OWL Ontology.]
EKDB HUTT → KANI: A complex KITE application [simplified, for paper]
[Architecture diagram: UIMA Extraction Database → UIMA Extraction Database Source Plugin → Source Data → HUTT → KANI Aggregate Mapper → Target Data → RDF Store Target Plugin → RDF Store Database. Other components shown: Extraction → RDF Provenance Plugin and UIMA/RDF Provenance Database; UIMA Type System Plugin for the HUTT Type System; OWL Ontology Plugin for the KANI OWL Ontology.]
EKDB HUTT → KANI: A complex KITE application
[Diagram of the HUTT → KANI Aggregate Mapper. Components shown: HUTT → KANI (Selection Aggregate); HUTT → KANI lookup-table (Cascade Aggregate); HUTT → KANI type name matching (Cascade Aggregate); HoldsDuringMapper (Primitive); Table Mapper (Primitive); Type Name Matcher (Primitive); RDF labels (Primitive); HUTT → KANI ad hoc (Primitive); OWL-Time (Primitive); TimeSlice (Primitive).]
KITE for Queries
• In some cases, the ontology in which a user (or an automated system) poses a query is different from the one in which the data is encoded.
• Some KITE applications (e.g., NIMD knowledge integrator) handle this by mapping the data at indexing time.
• Other KITE applications map the query at run time.
Example: KITE for JuruXML Queries
Pipeline: JuruXML Source Plugin → Mapper Plugin(s) → JuruXML Target Plugin, with two Ontology Language Plugins
Source query: <Nation> Republic Angola</Nation>
Source data: (Nation uid1), (KEYWORD uid2 “Republic”), (KEYWORD uid3 “Angola”), (CONTAINS uid1 uid2 uid3)
Target data: (National-Government uid1a), (KEYWORD uid2a “Republic”), (KEYWORD uid3a “Angola”), (CONTAINS uid1a uid2a uid2b), (Geographic-Region uid1b), (KEYWORD uid2b “Republic”), ...
Target queries: <NationalGovernment> Republic Angola</NationalGovernment> <GeographicRegion>Republic Angola</GeographicRegion>
Mapping types in KITE
• KITE is typically used to map concrete data (instances), but it can be used to map types in an ontology (meta-instances)
• For example, KITE can map a UIMA Type System Descriptor into an OWL RDF ontology
  • The KITE built-in UIMA KLT source plugin produces one KITE “instance” for each entity type, plus KITE tuples for each relation type, plus tuples for parents of types
  • The KITE built-in OWL model target plugin takes a stream of tuples and writes them to an OWL RDF file
  • With the KITE built-in “identity” mapper: a direct translation
  • With other mappers: a partial/complex translation
• In some cases, the mappers can then be reused to map instances across the two ontologies
• In other cases, mapping instances may depend on contextual issues that are not relevant to mapping types
Example: Mapping UIMA types to OWL classes/properties
Pipeline: UIMA Knowledge-Level Types Source Plugin → Mapper Plugin(s) → OWL Target Plugin, with UIMA Type System Plugin and OWL Ontology Plugin
Source (UIMA type system descriptor):

    <typeDescription>
      <name>org.example.Nation</name>
      <supertypeName>org.example.Place</supertypeName>
    </typeDescription>
    <typeDescription>
      <name>org.example.Place</name>
      <supertypeName>org.example.TopEntity</supertypeName>
    </typeDescription>

Source data: (org.example.Nation uid1), (org.example.Place uid2), (PARENT uid1 uid2)
Target data: (example:Country uid1a), (example:Place uid2a), (PARENT uid1a uid2a)
Target (OWL):

    <owl:Class rdf:about="example:Country">
      <rdfs:subClassOf>
        <owl:Class rdf:about="example:Place"/>
      </rdfs:subClassOf>
    </owl:Class>
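The type-level translation in this example can be approximated with a lookup table and plain string building. The sketch below is only an illustration of the idea: it bypasses KITE's source, mapper, and target plugins entirely, uses no UIMA or OWL library, and the TypeMappingSketch class, its method names, and the table contents are hypothetical. It assumes that a type with no table entry is skipped and that a mapped type whose parent is unmapped is written as a bare class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TypeMappingSketch {
    // supertypeOf: UIMA type name -> UIMA supertype name (the PARENT tuples).
    // table: UIMA type name -> OWL class name (a one-to-one lookup table).
    static String toOwl(Map<String, String> supertypeOf, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : supertypeOf.entrySet()) {
            String owlClass = table.get(e.getKey());
            if (owlClass == null) continue;   // unmapped type: skipped
            String owlParent = table.get(e.getValue());
            if (owlParent == null) {
                // Parent has no counterpart in the target ontology.
                out.append("<owl:Class rdf:about=\"").append(owlClass).append("\"/>\n");
            } else {
                out.append("<owl:Class rdf:about=\"").append(owlClass).append("\">\n")
                   .append("  <rdfs:subClassOf>\n")
                   .append("    <owl:Class rdf:about=\"").append(owlParent).append("\"/>\n")
                   .append("  </rdfs:subClassOf>\n")
                   .append("</owl:Class>\n");
            }
        }
        return out.toString();
    }

    static String demo() {
        Map<String, String> supertypeOf = new LinkedHashMap<>();
        supertypeOf.put("org.example.Nation", "org.example.Place");
        supertypeOf.put("org.example.Place", "org.example.TopEntity");

        Map<String, String> table = new LinkedHashMap<>();
        table.put("org.example.Nation", "example:Country");
        table.put("org.example.Place", "example:Place");

        return toOwl(supertypeOf, table);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```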
Future Objectives
• Recall that the existing framework provides:
  • APIs for plugins (mappers, sources, targets, etc.) and classes for input/output data
  • Control flow code
  • Some broadly applicable plugins
• Future versions of the framework will provide:
  • APIs and classes that are better aligned with established products and standards (e.g., UIMA, Ecore)
  • Control flow that is more scalable
  • More built-in plugins (e.g., target plugins for existing RDF storage systems)
Tighter integration with UIMA
• Many of the capabilities of KITE seem very similar to capabilities already found in UIMA.
  • e.g., KITE allows developers to build an aggregate mapper and specify some control flow among delegate mappers; UIMA allows similar functionality for analytics.
  • If we could reuse some of that functionality, we could leverage existing UIMA infrastructure and tool support.
• Furthermore, recall that our reference scenario involves transforming extracted knowledge.
  • UIMA is frequently used for extraction.
  • Thus developers working on our reference scenario are likely to be familiar with UIMA; it will be easier for them to “get up to speed” on KITE if we are reusing UIMA capabilities in KITE.
UIMA Integration Level 1: UIMA data structures & APIs
• KITE defines various interfaces and classes (plugins, data, etc.). However, many elements of UIMA serve similar purposes, e.g.:
• We could redefine KITE to use the corresponding UIMA structures instead of its own customized structures.
• This would allow us to use UIMA descriptor language, corresponding tool support, etc.
UIMA Integration Level 2: UIMA control flow
• If KITE plugins were UIMA components, then presumably the UIMA collection processing manager (CPM) could provide flow control among them
• Flow from source → mapper → target is handled well by UIMA’s built-in “fixed flow.”
• Flow within an aggregate mapper in KITE is more complex.
  • Cascade aggregate is essentially “fixed flow” with deferment
  • Selection aggregate is a different flow and also requires deferment
• Fortunately, flow control is a pluggable element of the UIMA framework.
  • Thus (presumably) the KITE built-in aggregate mapper types could be written by KITE developers as UIMA flow control plugins.
  • If KITE application developers wanted their own aggregate mappers, they could develop their own UIMA flow control plugins.
ECore Integration
• Recall the UML model for KITE data:
  • Data: String typeName; String id
  • Tuple: List<String> arguments
  • Instance
  • LabeledInstance: String canonicalForm; String[] variantForms
• There are many existing standards for storing instances and links among them.
• ECore is one such standard that has a great deal of existing tool support.
• UIMA interoperability with ECore is currently under development.
• Maybe we should use ECore for KITE data.
Larger Scale
• We have been using the KITE-based “EKDB HUTT → KANI” application for a 2006 evaluation being conducted by the National Institute of Standards & Technology
  • Input: ~580 thousand entities, ~450 thousand relations extracted from a 169 MB text corpus with 37,442 documents
  • The KITE-based application takes about 2 hours to run on this data.
  • It requires more than 1.5 GB of Java heap space and thus can only run on a 64-bit computer.
• This application must be faster and more memory efficient if it is to effectively scale to multi-GB corpora.
  • Some improvements will be local to specific plugins used in the application (e.g., EKDB RDF Target Plugin)
  • Other improvements may involve more fundamental alterations to the KITE architecture
Open Questions
• What are the top priorities for future development?
• What external requirements are driving deadlines?
  • NIST evaluation
  • Commercialization of SAW
  • Others?
• What is the timeline?