Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views • Joachim Biskup, Universität Dortmund • David W. Embley, Brigham Young University • Funded by NSF
Leverage this … … to do this [Figure labels: Information Exchange; Source; Target; Information Extraction; Schema Matching]
Presentation Outline • Overview • Matching (Direct) • Matching (Derived) • Matching Algorithm • Summary
Requirements • f is an injective function. • f maps object sets to object sets and relationship sets to relationship sets. • f respects relationship-set arities. • f respects referential integrity. • f respects types. • f respects real-world identity. • f’s coercions are generalization/specialization (G/S) compatible. • f respects subset constraints. • f respects mutual-exclusion constraints. • f respects union constraints.
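A few of these requirements can be checked mechanically on the mapping alone. The sketch below assumes a hypothetical, simplified model in which each schema element records only its kind, arity, and value type; the names `Element` and `check_mapping` are illustrative, not from the paper. It checks injectivity, kind preservation, arities, and exact type equality; the remaining requirements (referential integrity, real-world identity, subset, mutual-exclusion, and union constraints) need populations or constraint models and are not covered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)  # frozen makes Element hashable, so it can be a dict key
class Element:
    name: str
    kind: str                          # "obj" (object set) or "rel" (relationship set)
    arity: int = 1                     # number of connected object sets; 1 for object sets
    value_type: Optional[str] = None   # e.g. "String", "Image"; None for relationship sets

def check_mapping(f: Dict[Element, Element]) -> List[str]:
    """Check some requirements on a hypothetical target-to-source mapping f."""
    problems: List[str] = []
    # f is injective: no two target elements share a source element.
    if len(set(f.values())) != len(f):
        problems.append("f is not injective")
    for t, s in f.items():
        # Object sets go to object sets, relationship sets to relationship sets.
        if t.kind != s.kind:
            problems.append(f"{t.name}: kind {t.kind} mapped to {s.kind}")
        # Relationship-set arities must agree.
        elif t.kind == "rel" and t.arity != s.arity:
            problems.append(f"{t.name}: arity {t.arity} mapped to {s.arity}")
        # Simplification: exact type equality; the real requirement allows coercions.
        elif t.kind == "obj" and t.value_type != s.value_type:
            problems.append(f"{t.name}: type {t.value_type} mapped to {s.value_type}")
    return problems
```

An empty list means only that these particular checks passed.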
User Interaction (IDS Statements) • Issue • Explains the issue • Example: units may need transformation • Default • Explains the default option • Example: if no transformation is given, no conversion is done • Suggestion • Gives a suggestion about how to resolve the issue • Example: if needed, specify the conversion
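An IDS statement is essentially a three-part message. A minimal sketch, assuming a hypothetical `IDSStatement` record and a `units_statement` helper that fills it in for the units example above; the exact wording and layout of the paper's statements are not known.

```python
from dataclasses import dataclass

@dataclass
class IDSStatement:
    """Issue / Default / Suggestion triple shown to the user (hypothetical layout)."""
    issue: str       # explains the issue
    default: str     # explains what happens if the user does nothing
    suggestion: str  # suggests how to resolve the issue

    def render(self) -> str:
        return (f"Issue: {self.issue}\n"
                f"Default: {self.default}\n"
                f"Suggestion: {self.suggestion}")

def units_statement(target: str, source: str) -> IDSStatement:
    # Mirrors the slide's units example: the values may need transformation.
    return IDSStatement(
        issue=f"{target} and {source} may be in different units and need transformation.",
        default="No transformation is specified, so values are copied without conversion.",
        suggestion="If needed, specify the conversion.",
    )
```

For example, `print(units_statement("Size", "Nr Sq Km").render())` prints the three labeled lines.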
Theorem Let f be the generated mapping from target t to source s, populated such that s has a valid interpretation. Let t’ be the submodel of t populated from s by f. Then t’ has a valid interpretation. Proof: the paper is the proof …
Matching (Direct) • Object Sets • Relationship Sets
Object-Set Type Compatibility • <a, b> • type(a) = type(b) • type(a) ⊃ type(b) • type(a) ⊂ type(b) • type(a) ≠ type(b)
type(a) = type(b) • Same type • String = String, but Airport ≠ Head Of State • Need better matching techniques • Same type, different units • Size vs. Nr Sq Km • Need unit conversion • Same type, different format • Date = Date, but 01/02/2002 vs. Jan 2, 2002 • Need format conversion • Same type, same units and format, different assumptions • Altitude = Altitude, but the altitudes of aircraft and spacecraft differ • Need same assumptions • Same type, same units and format, same assumptions, OIDs
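Unit and format conversions are ordinary value transformations. The sketch below assumes, hypothetically, that a source stores sizes in square miles while the target wants Nr Sq Km, and that the source writes dates as MM/DD/YYYY; neither assumption comes from the slides, which is exactly the "different assumptions" problem.

```python
from datetime import datetime

SQ_KM_PER_SQ_MI = 2.589988110336  # exact: 1 mi = 1.609344 km, squared

def sq_mi_to_sq_km(x: float) -> float:
    """Unit conversion, assuming (hypothetically) the source stores square miles."""
    return x * SQ_KM_PER_SQ_MI

def us_date_to_long(s: str) -> str:
    """Format conversion, assuming the source writes dates as MM/DD/YYYY.

    Under a DD/MM/YYYY assumption "01/02/2002" would be 1 Feb 2002 instead.
    """
    d = datetime.strptime(s, "%m/%d/%Y")
    # Build "Jan 2, 2002" with an unpadded day; %b is locale-dependent
    # and gives "Jan" in the default C locale.
    return f"{d:%b} {d.day}, {d.year}"
```

Choosing the wrong date convention still yields a valid Date, so this kind of mismatch cannot be caught by type checking alone.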
type(a) ⊃ type(b) and type(a) ⊂ type(b) • Real ⊃ Integer or Video ⊃ Image • Target has greater discriminating power • Can add .0 or make a video of a single image (?) • Integer ⊂ Real or Image ⊂ Video • Source has greater discriminating power • Can round off or select one of the frames (?)
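The four coercions named above can be sketched directly. This assumes, hypothetically, that a video is represented as a list of frames; which frame to select is left as a parameter because the slide marks that choice with "(?)".

```python
from typing import List, TypeVar

Frame = TypeVar("Frame")

def widen_int_to_real(n: int) -> float:
    # Target more discriminating: "add .0"; no information is lost.
    return float(n)

def narrow_real_to_int(x: float) -> int:
    # Source more discriminating: round off, losing the fraction.
    # Python's round() sends halves to the even neighbor (2.5 -> 2).
    return round(x)

def image_to_video(image: Frame) -> List[Frame]:
    # "Make a video of a single image": a one-frame sequence.
    return [image]

def video_to_image(frames: List[Frame], index: int = 0) -> Frame:
    # "Select one of the frames"; the choice of frame is a user decision.
    return frames[index]
```

Widening is always safe; narrowing discards information, which is why it is a candidate for an IDS statement.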
type(a) ≠ type(b) • Image vs. String • Mismatch, even if same attribute (e.g., both City) • Types can help discard potential matches • String(5) vs. Integer • But suppose the integer is 2 • Might work, but is “2.000” OK?
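The four cases for a candidate pair <a, b> can be told apart with a small lookup. A minimal sketch, assuming a hypothetical `NARROWER_THAN` table that lists only the two pairs from the slides; a real system would use a full type hierarchy.

```python
# Hypothetical (narrower, wider) pairs, in the sense of discriminating power.
NARROWER_THAN = {("Integer", "Real"), ("Image", "Video")}

def type_relation(a: str, b: str) -> str:
    """Relate target type a to source type b."""
    if a == b:
        return "equal"          # units, format, or assumptions may still differ
    if (b, a) in NARROWER_THAN:
        return "target_wider"   # type(a) contains type(b): widen (add .0)
    if (a, b) in NARROWER_THAN:
        return "source_wider"   # type(b) contains type(a): narrow (round off)
    return "incompatible"       # e.g. Image vs. String: discard the candidate
```

Under this scheme String(5) vs. Integer is classified as incompatible, even though the slide notes it might work for a value such as 2; catching that case would require value-level checks.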
Relationship Match Requirements • Referential integrity • Constraints • Cardinality • Mandatory/Optional
Referential Integrity [Figure labels: target object sets a and b; source object sets a’, a’’, and b’] The types of a, a’, and a’’ can all be different, but not arbitrary. Example: a (String), a’ (Integer), a’’ (Real).
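Why the types cannot be arbitrary: if a is populated from a’ and a relationship set over a from a’’, then both must be coerced to a's type consistently, or relationship tuples will reference values that do not exist in a. The sketch below is a hypothetical illustration with made-up values and a hypothetical `coerce_to_string` helper.

```python
from typing import Set, Tuple

def referential_integrity_violations(rel: Set[Tuple[str, str]],
                                     objs_a: Set[str],
                                     objs_b: Set[str]) -> Set[Tuple[str, str]]:
    """Tuples of rel whose components are missing from the object sets they connect."""
    return {(x, y) for (x, y) in rel if x not in objs_a or y not in objs_b}

def coerce_to_string(v) -> str:
    """Hypothetical coercion that renders Integer 2 and Real 2.0 alike as "2"."""
    if isinstance(v, float) and v.is_integer():
        v = int(v)
    return str(v)

a_prime = {1, 2}         # source object set a' (Integer)
rel_src = {(2.0, "x")}   # source relationship: a'' (Real) values paired with b'
b_prime = {"x"}

# Naive coercion: str(2) == "2" but str(2.0) == "2.0", so the tuple dangles.
naive = referential_integrity_violations(
    {(str(x), y) for x, y in rel_src}, {str(v) for v in a_prime}, b_prime)

# Consistent coercion keeps every referenced value inside target object set a.
careful = referential_integrity_violations(
    {(coerce_to_string(x), y) for x, y in rel_src},
    {coerce_to_string(v) for v in a_prime}, b_prime)
```

Here `naive` is `{("2.0", "x")}` and `careful` is empty.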
Relationship-Set Constraint Compatibility • <a, b> • constr(a) <=> constr(b) • (constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b)) • ¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b)) • ¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))
[Figure labels: Person, Car; relationship sets owns, drives, and ?; optional participation (o) on every side] constr(a) <=> constr(b). More information is needed to resolve the match: perhaps “?” is “purchased.”
[Figure labels: City and City Map in target (a) and source (b)] (constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b)): the target (a) expects many maps, but the source can’t supply them.
[Figure labels: City and City Map in target (a) and source (b)] ¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b)): the target (a) expects one map, but the source can supply many.
[Figure labels: City and City Map in target (a) and source (b); optional participation (o)] ¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b)): the target (a) expects at least one and potentially many maps, but the source may have none and has at most one.
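The four constraint cases can be decided mechanically if participation constraints are modeled as (min, max) ranges. A minimal sketch, assuming that representation (max `None` means "many"); one constraint implies another when its range lies inside the other's. The specific ranges used for the City/City Map examples are assumptions, since the figures are lost.

```python
from typing import Optional, Tuple

Card = Tuple[int, Optional[int]]  # (min, max); max None means "many"

def implies(c1: Card, c2: Card) -> bool:
    """True if every population satisfying c1 also satisfies c2 (range c1 inside c2)."""
    lo1, hi1 = c1
    lo2, hi2 = c2
    upper_ok = hi2 is None or (hi1 is not None and hi1 <= hi2)
    return lo2 <= lo1 and upper_ok

def compare(a: Card, b: Card) -> str:
    """Compare target constraint a with source constraint b."""
    a_from_b = implies(b, a)   # constr(a) <= constr(b)
    b_from_a = implies(a, b)   # constr(a) => constr(b)
    if a_from_b and b_from_a:
        return "equivalent"                 # may still differ in meaning (owns vs. drives)
    if a_from_b:
        return "source satisfies target"    # target many, source at most one
    if b_from_a:
        return "source may violate target"  # target one, source many
    return "incomparable"                   # target 1..*, source 0..1
```

Only the "equivalent" and "source satisfies target" cases guarantee that a source population yields a valid target population; the other two need user interaction.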
Matching (Derived) • Generalization/Specialization • Composite Values • Derived Relationship Sets • Displayable/Nondisplayable Object Sets
Generalization/Specialization • For a target object set, a source object set may: • have no overlap (just ignore) • have a proper subset (accept or find missing generalization) • have the same values (direct match) • have a proper superset (hard, except for roles) • overlap (like proper subset and proper superset) • Consider roles and missing generalizations
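The five cases compare the value sets of a target and a source object set. A minimal sketch using Python sets, with hypothetical sample values; a missing generalization shows up as several source object sets whose union matches the target.

```python
from typing import Set

def classify_overlap(target: Set[str], source: Set[str]) -> str:
    """Relate a source object set's values to a target object set's values."""
    if not target & source:
        return "no overlap"       # just ignore
    if source == target:
        return "same values"      # direct match
    if source < target:
        return "proper subset"    # accept, or find the missing generalization
    if source > target:
        return "proper superset"  # hard, except for roles
    return "overlap"              # both subset and superset issues

maps = {"Provo map", "Dortmund map", "USA map"}   # target Map (hypothetical values)
city_maps = {"Provo map", "Dortmund map"}         # source City Map
country_maps = {"USA map"}                        # source Country Map
```

Here each source set alone is a proper subset of `maps`, while `city_maps | country_maps` has the same values, which is the missing-generalization situation.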
Roles [Figure labels: target: Travel Video, City, relationship set Video With City Scene; source: role City Clip: Video, City, relationship set Video With City Scene; optional participation (o)]
Missing Generalization [Figure labels: Map: Image; City Map: Image; Country Map: Image; target; source]
Composite Values • Composite in Source (split) • Composite in Target (merge) • Examples of Derived Relationships
Composite in Source [Figure labels: Video in target and source; Nr Hours and Nr Minutes; Time composed of Nr Hours and Nr Minutes] Note also that we generated a source path.
Composite in Source [Figure labels: Video with Nr Hours and Nr Minutes in target and source]
Composite in Target [Figure labels: Video in target and source; Nr Hours and Nr Minutes; Time in target and source]
Composite in Target [Figure labels: Video with Time in target and source]
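Splitting and merging composite values are inverse transformations. A minimal sketch, assuming (hypothetically) that Time values are strings of the form "H:MM"; the slides do not show the actual representation.

```python
from typing import Tuple

def split_time(time: str) -> Tuple[int, int]:
    """Composite in source: split Time "H:MM" into (Nr Hours, Nr Minutes)."""
    hours, minutes = time.split(":")
    return int(hours), int(minutes)

def merge_time(nr_hours: int, nr_minutes: int) -> str:
    """Composite in target: merge Nr Hours and Nr Minutes into Time "H:MM"."""
    return f"{nr_hours}:{nr_minutes:02d}"
```

For example, `split_time("1:45")` gives `(1, 45)`, and `merge_time(1, 5)` gives `"1:05"`.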
Displayable/Nondisplayable Object-Set Matches • Nondisplayable in Source: find a key • Nondisplayable in Target: create a key
Nondisplayable in Source [Figure labels: Airline flys to Airport; Airport serves City; target; source] No Key: Discard Match
Nondisplayable in Source [Figure labels: Airline flys to Airport; Airport serves City; Airport Name; target; source] One Key: Choose It
Nondisplayable in Source [Figure labels: Airline flys to Airport; Airport serves City; Airport Name; Airport Code; target; source] Two or More Keys: Choose One
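The key-selection rule for a nondisplayable source object set has three cases, and a nondisplayable target needs generated keys. A minimal sketch: `choose_source_key` and `make_key_generator` are hypothetical names, the "first candidate" fallback is an assumption (the slides only say "choose one"), and the counter-based surrogate scheme is one possible way to create keys.

```python
from itertools import count
from typing import Callable, List, Optional

def choose_source_key(candidate_keys: List[str],
                      preferred: Optional[str] = None) -> Optional[str]:
    """Nondisplayable in source: find a displayable key to stand in for it."""
    if not candidate_keys:
        return None                   # no key: discard the match
    if len(candidate_keys) == 1:
        return candidate_keys[0]      # one key: choose it
    # Two or more keys: choose one; the user's preference, else the first.
    return preferred if preferred in candidate_keys else candidate_keys[0]

def make_key_generator(prefix: str) -> Callable[[], str]:
    """Nondisplayable in target: create keys (hypothetical surrogate scheme)."""
    counter = count(1)
    return lambda: f"{prefix}{next(counter)}"
```

With the airport example, `choose_source_key(["Airport Name", "Airport Code"], "Airport Code")` returns `"Airport Code"`.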
Pictorial View of Match Table [Figure labels: target; source]
Pictorial View of Match Table [Figure labels: t = target; s = source; f = the mapping; t’ = submodel; t’ has a valid interpretation]
Concluding Remarks • QED (the theorem holds) • Merge (several sources) • All sources extracted to same view • Union merge • Object identity problems • Constraint problems • Source Modeling (convert to OSM) • Framework defined, but not implemented