
Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views


Presentation Transcript


  1. Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University Funded by NSF

  2. Leverage this … … to do this [Diagram labels: Information Exchange, Source, Target, Information Extraction, Schema Matching]

  3. Presentation Outline • Overview • Matching (Direct) • Matching (Derived) • Matching Algorithm • Summary

  4. Requirements • f is an injective function. • f maps obj. sets to obj. sets and rel. sets to rel. sets. • f respects rel-set arities. • f respects referential integrity. • f respects types. • f respects real-world identity. • f’s coercions are G/S compatible. • f respects subset constraints. • f respects mutual-exclusion constraints. • f respects union constraints.
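
The first three requirements can be checked mechanically. The following is a minimal sketch, assuming the mapping is held as a dictionary from target element names to source element names; the function name `check_mapping` and the `kind` and `arity` tables are hypothetical and do not come from the slides.

```python
def check_mapping(f, kind, arity):
    """Return a list of violated requirements for a candidate mapping f.

    f     : dict from target element name to source element name
    kind  : dict from element name to "obj" or "rel"
    arity : dict from relationship-set name to its arity
    """
    problems = []
    # Injective: no two target elements may share a source element.
    if len(set(f.values())) != len(f):
        problems.append("not injective")
    for t, s in f.items():
        # Object sets map to object sets, relationship sets to relationship sets.
        if kind[t] != kind[s]:
            problems.append(f"{t} -> {s}: kind mismatch")
        # Matched relationship sets must have the same arity.
        elif kind[t] == "rel" and arity[t] != arity[s]:
            problems.append(f"{t} -> {s}: arity mismatch")
    return problems
```

The remaining requirements (referential integrity, types, real-world identity, and the constraint checks) need schema and instance information that this sketch does not model.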

  5. User Interaction (IDS Statements) • Issue • Explains the issue • Example: units, may need transformation • Default • Explains the default option • Example: if no transformation, no conversion • Suggestion • Gives a suggestion about how to resolve the issue • Example: if needed, specify the conversion
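
An IDS statement bundles three pieces of text: the issue, the default, and the suggestion. One possible representation is sketched below; the class name `IDSStatement` and its `render` method are illustrative assumptions, not part of the presented framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IDSStatement:
    """One Issue/Default/Suggestion message shown to the user."""
    issue: str
    default: str
    suggestion: str

    def render(self):
        # Present the three parts in the order the slide lists them.
        return (f"Issue: {self.issue}\n"
                f"Default: {self.default}\n"
                f"Suggestion: {self.suggestion}")


# The units example from the slide, phrased as one statement.
units = IDSStatement(
    issue="units, may need transformation",
    default="if no transformation, no conversion",
    suggestion="if needed, specify the conversion",
)
```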

  6. Theorem Let f be the generated mapping from target t to source s, populated such that s has a valid interpretation. Let t’ be the submodel of t populated from s by f. Then t’ has a valid interpretation. Proof: the paper is the proof …

  7. Target (Graphical View)

  8. Target (Textual View)

  9. Source Example (Assumed to be Populated)

  10. Matching (Direct) • Object Sets • Relationship Sets

  11. Object-Set Type Compatibility • <a, b> • type(a) = type(b) • type(a) ⊃ type(b) • type(a) ⊂ type(b) • type(a) ≠ type(b)

  12. type(a) = type(b) • Same type • string = string, but Airport  Head Of State • Need better matching techniques • Same type, different units • Size  Nr Sq Km • Need unit conversion • Same type, different format • Date  Date, but 01/02/2002  Jan 2, 2002 • Need format conversion • Same type, same units and format, different assumptions • Altitude  Altitude, but altitude of aircraft and spacecraft differ • Need same assumptions • Same type, same units and format, same assumption, OIDs
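
The unit and format conversions mentioned on this slide can be illustrated with a short sketch. It assumes the source date is written month first (so that 01/02/2002 is Jan 2, 2002, as the slide pairs them) and, purely for illustration, that the source size is in square miles; the slide does not name the source unit. Both function names are hypothetical.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# One mile is exactly 1.609344 km, so one square mile is this many sq km.
SQ_KM_PER_SQ_MILE = 1.609344 ** 2


def to_long_date(text):
    """Format conversion: '01/02/2002' (month/day/year) -> 'Jan 2, 2002'."""
    d = datetime.strptime(text, "%m/%d/%Y")
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"


def sq_miles_to_sq_km(area):
    """Unit conversion (assumed source unit: square miles)."""
    return area * SQ_KM_PER_SQ_MILE
```

The month names are spelled out in a list rather than taken from `strftime("%b")` so that the result does not depend on the locale.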

  13. type(a) ⊃ type(b) and type(a) ⊂ type(b) • Real ⊃ Integer or Video ⊃ Image • Target has greater discriminating power • Can add .0 or make a video of a single image (?) • Integer ⊂ Real or Image ⊂ Video • Source has greater discriminating power • Can round off or select one of the frames (?)
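
For the numeric case, the two coercions amount to a lossless widening and a lossy narrowing. A minimal sketch follows; the function names `widen` and `narrow` are hypothetical.

```python
def widen(n):
    """Integer source value -> Real target value ("add .0"); no information is lost."""
    return float(n)


def narrow(x):
    """Real source value -> Integer target value ("round off"); the fraction is lost."""
    # Python's round() rounds exact halves to the nearest even integer.
    return round(x)
```

Widening followed by narrowing returns the original integer, but narrowing followed by widening generally does not return the original real, which is why the slide marks these coercions with a question mark.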

  14. type(a) ≠ type(b) • Image ≠ String • Mismatch, even if same attribute (e.g. both City) • Types can help discard potential matches • String(5) ≠ Integer • But suppose the integer is 2 • Might work, but is “2.000” ok?
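
The type-compatibility cases for a pair <a, b> can be sketched as a small classifier. This assumes a hand-written table of (narrower, wider) type pairs; only the two pairs shown are taken from the slides, and the names `NARROWER_THAN` and `classify` are hypothetical.

```python
# (narrower, wider) pairs; Integer/Real and Image/Video are the slide examples.
NARROWER_THAN = {("Integer", "Real"), ("Image", "Video")}


def classify(target_type, source_type):
    """Classify the pair <a, b> by how type(a) relates to type(b)."""
    if target_type == source_type:
        return "equal"
    if (source_type, target_type) in NARROWER_THAN:
        return "target wider"   # e.g. Real target, Integer source
    if (target_type, source_type) in NARROWER_THAN:
        return "source wider"   # e.g. Integer target, Real source
    return "mismatch"           # e.g. Image target, String source
```

A "mismatch" result is the case where types help discard a potential match; the other three results still leave the unit, format, and assumption questions open.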

  15. Relationship Match Requirements • Referential integrity • Constraints • Cardinality • Mandatory/Optional

  16. Referential Integrity [Diagram: target and source; object sets a, b, a’, a’’, b’] The types of a, a’, and a’’ can all be different, but not arbitrary. Example: a (String), a’ (Integer), a’’ (Real).

  17. Relationship-Set Constraint Compatibility • <a, b> • constr(a) <=> constr(b) • (constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b)) • ¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b)) • ¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))

  18. constr(a) <=> constr(b) [Diagram: Person and Car connected by the relationship sets owns, drives, and “?”] Need more information to resolve: perhaps “?” is “purchased.”

  19. (constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b)) [Diagram: City and City Map in target (a) and source (b)] The target (a) expects many maps, but the source can’t supply them.

  20. ¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b)) [Diagram: City and City Map in target (a) and source (b)] The target (a) expects one map, but the source can supply many.

  21. ¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b)) [Diagram: City and City Map in target (a) and source (b)] The target (a) expects at least one and potentially many maps, but the source may have none or at most one.
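
The City / City Map examples can be reproduced with a small sketch. It assumes, as a simplification not stated in the slides, that each constraint is a participation range (min, max) on the number of maps per city, with `None` meaning no upper bound, and that one constraint implies another when its range is contained in the other's. The function names are hypothetical.

```python
def implies(c1, c2):
    """True if every count allowed by c1 is also allowed by c2."""
    lo1, hi1 = c1
    lo2, hi2 = c2
    upper_ok = hi2 is None or (hi1 is not None and hi1 <= hi2)
    return lo1 >= lo2 and upper_ok


def compare(a, b):
    """Compare the target constraint a with the source constraint b."""
    a_from_b = implies(b, a)   # constr(a) <= constr(b)
    a_to_b = implies(a, b)     # constr(a) => constr(b)
    if a_from_b and a_to_b:
        return "equivalent"
    if a_from_b:
        return "source stricter"
    if a_to_b:
        return "target stricter"
    return "incomparable"
```

Under these assumptions, `compare((0, None), (0, 1))` models a target that allows many maps with a source that has at most one, `compare((1, 1), (0, None))` models a target that expects one map with a source that can supply many, and `compare((1, None), (0, 1))` models the last example, where neither constraint implies the other.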

  22. Matching (Derived) • Generalization/Specialization • Composite Values • Derived Relationship Sets • Displayable/Nondisplayable Object Sets

  23. Generalization/Specialization • For a target object set, a source object set may: • have no overlap (just ignore) • have a proper subset (accept or find missing generalization) • have the same values (direct match) • have a proper superset (hard, except for roles) • overlap (like proper subset and proper superset) • Consider roles and missing generalizations
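
The five cases can be distinguished by comparing value sets. A minimal sketch, assuming both object sets are available as collections of comparable values; the function name `classify_overlap` is hypothetical.

```python
def classify_overlap(target_values, source_values):
    """Relate a source object set to a target object set by their values."""
    t, s = set(target_values), set(source_values)
    if not (t & s):
        return "no overlap"        # just ignore
    if s == t:
        return "same values"       # direct match
    if s < t:
        return "proper subset"     # accept or find missing generalization
    if s > t:
        return "proper superset"   # hard, except for roles
    return "overlap"               # like proper subset and proper superset
```

In practice the target is usually not yet populated, so a comparison like this would have to run on sample or expected values rather than on the full extents.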

  24. Roles [Diagram: Travel Video, City; target: Video With City Scene; City Clip: Video; source: Video With City Scene]

  25. Missing Generalization [Diagram: target and source; labels Map: Image, City Map, Country Map, City Map: Image, Country Map: Image]

  26. Composite Values • Composite in Source (split) • Composite in Target (merge) • Examples of Derived Relationships

  27. Composite in Source [Diagram: target and source; labels Video, Nr Hours, Nr Minutes, Time] Note also that we generated a source path.

  28. Composite in Source [Diagram: target and source; labels Video, Nr Hours, Nr Minutes]

  29. Composite in Target [Diagram: target and source; labels Video, Nr Hours, Nr Minutes, Time]

  30. Composite in Target [Diagram: target and source; labels Video, Time]
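
Splitting and merging a composite value can be sketched for the Time, Nr Hours, and Nr Minutes example. The "H:MM" text format for Time is an assumption made for illustration; the slides do not show how Time is written. Both function names are hypothetical.

```python
def split_time(time):
    """Composite in source: 'H:MM' -> (Nr Hours, Nr Minutes)."""
    hours, minutes = time.split(":")
    return int(hours), int(minutes)


def merge_time(nr_hours, nr_minutes):
    """Composite in target: (Nr Hours, Nr Minutes) -> 'H:MM'."""
    return f"{nr_hours}:{nr_minutes:02d}"
```

The two functions are inverses of each other for well-formed values, so the same pair covers both the split and the merge direction.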

  31. Displayable/Nondisplayable Object-Set Matches • Nondisplayable in Source: find a key • Nondisplayable in Target: create a key

  32. Nondisplayable in Source [Diagram: target and source; Airline flys to Airport; Airport serves City] No Key: Discard Match

  33. Nondisplayable in Source [Diagram: target and source; Airline flys to Airport; Airport serves City] No Key: Discard Match

  34. Nondisplayable in Source [Diagram: target and source; Airline flys to Airport; Airport Name; Airport serves City] One Key: Choose it

  35. Nondisplayable in Source [Diagram: target and source; Airline flys to Airport; Airport Name; Airport serves City] One Key: Choose it

  36. Nondisplayable in Source [Diagram: target and source; Airline flys to Airport; Airport Name; Airport Code; Airport serves City] Two or more Keys: Choose One

  37. Nondisplayable in Source [Diagram: target and source; Airline flys to Airport; Airport Name; Airport Code; Airport serves City] Two or more Keys: Choose One
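
The three key cases for a nondisplayable source object set can be sketched as follows. The slides say only "choose one" when there are several keys, so the tie-breaking rule here (an optional preferred key, otherwise the first candidate) is an assumption, as is the function name `choose_key`.

```python
def choose_key(candidate_keys, preferred=None):
    """Pick a displayable key for a nondisplayable source object set.

    Returns None when there is no key, meaning the match is discarded.
    """
    if not candidate_keys:
        return None                  # No key: discard match
    if len(candidate_keys) == 1:
        return candidate_keys[0]     # One key: choose it
    if preferred in candidate_keys:
        return preferred             # Two or more keys: take the preferred one
    return candidate_keys[0]         # Otherwise fall back to the first candidate
```

For the Airport example, `choose_key(["Airport Name", "Airport Code"], preferred="Airport Code")` returns "Airport Code".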

  38. Matching Algorithm

  39. Sample Match Table

  40. Pictorial View of Match Table target source

  41. Summary

  42. Concluding Remarks • QED (the theorem holds) Let f be the generated mapping from target t to source s, populated such that s has a valid interpretation. Let t’ be the submodel of t populated from s by f. Then t’ has a valid interpretation. Proof: the paper is the proof …

  43. Pictorial View of Match Table [Diagram legend: f = the mapping; t = target; t’ = submodel; s = source] t’ has a valid interpretation

  44. Concluding Remarks • QED (the theorem holds) • Merge (several sources) • All sources extracted to same view • Union merge • Object identity problems • Constraint problems • Source Modeling (convert to OSM) • Framework defined, but not implemented
