The Use of Machine-Generated Ontologies in Dynamic Information Seeking
Giovanni Modica, Avigdor Gal, Hasan M. Jamil
CoopIS’2001 Trento, Italy
Motivating example
Preliminaries
• Definition: An ontology is an explicit representation of a conceptualization (Gruber 1993).
• Conjecture I: Applications in a given domain base their information exchange on some (shared) underlying ontology.
• Observation: Applications in a given domain use different ontology representations.
• Conjecture II: Given an application A that utilizes an ontology representation O_A, and an ontology O, there exists an invertible mapping f_A such that f_A(O_A) = O.
Problem description
• Given two applications A and B, such that A utilizes an ontology representation O_A and B utilizes an ontology representation O_B, introduce a mapping f_BA such that f_BA(O_B) = O_A.
• In a perfect world:
  • O is known.
  • f_A is known.
  • f_B is known.
  • Hence O_A = f_A^-1(f_B(O_B)).
• Alas:
  • O is unknown. At best, an approximation of O exists, in the form of a standard.
  • f_A and f_B are unknown: lack of documentation, the mental state of a designer, etc.
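The perfect-world case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mappings f_A and f_B are modeled as plain dictionaries from each application's terms to terms of a shared ontology O, and all term names are hypothetical.

```python
# Hypothetical mappings: each application's terms -> terms of a shared ontology O.
f_A = {"Pick-up Date": "pickup_date", "Return Date": "return_date"}
f_B = {"Pickup Day": "pickup_date", "Dropoff Day": "return_date"}


def invert(mapping):
    """Invert a one-to-one mapping; fail loudly if it is not invertible."""
    inverse = {}
    for source, target in mapping.items():
        if target in inverse:
            raise ValueError(f"not invertible: {target!r} has two preimages")
        inverse[target] = source
    return inverse


def compose_f_BA(f_A, f_B):
    """Build f_BA = f_A^-1 o f_B for every term of B whose image f_A covers."""
    f_A_inv = invert(f_A)
    return {
        term_b: f_A_inv[shared]
        for term_b, shared in f_B.items()
        if shared in f_A_inv
    }


f_BA = compose_f_BA(f_A, f_B)
print(f_BA)
```

When O, f_A and f_B are all known, the translation from B's terms to A's terms is a pure lookup. The rest of the talk addresses the case where none of them is available.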
Proposed solution
• Given two applications A and B, such that A utilizes an ontology representation O_A and B utilizes an ontology representation O_B, introduce a mapping f_BA such that:
  • f_BA depends on the ontology representation.
  • A matching is associated with a “degree of confidence” in the matching:
    • 0 identifies non-matching terms.
    • 1 identifies a crisp matching.
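One way to picture a mapping with degrees of confidence is a table of scored term pairs from which the best candidate per term is selected. The sketch below is an assumption for illustration, not the authors' implementation: the score table, the `best_matches` helper and the 0.5 threshold are all hypothetical.

```python
# Hypothetical confidence scores in [0, 1] for (term of O_B, term of O_A) pairs.
scores = {
    ("Dropoff Location", "Return Location"): 1.0,        # crisp matching
    ("Pickup Location Code", "Pick-up location"): 0.66,  # partial matching
    ("Car Type", "Return Location"): 0.0,                # non-matching terms
}


def best_matches(scores, threshold=0.5):
    """Map each term of B to its highest-confidence term of A.

    Pairs scoring below the (assumed) threshold are treated as non-matching.
    """
    best = {}
    for (term_b, term_a), confidence in scores.items():
        if confidence < threshold:
            continue
        if term_b not in best or confidence > best[term_b][1]:
            best[term_b] = (term_a, confidence)
    return best


f_BA = best_matches(scores)
print(f_BA)
```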
Ontology representation
• Dynamic information seeking:
  • HTML forms
    • Labels
    • Input fields
    • Scripts
• Assumptions:
  • Labels represent terms in an ontology (e.g., Pick-up Date).
  • Input fields provide constraints on the value domains (e.g., {Day, 1,…31}).
  • Scripts, among other things, suggest a precedence relationship (e.g., Pick-up Locations is required before selecting a Car Type).
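The first two assumptions can be illustrated with Python's standard `html.parser` module. This is a minimal sketch, not the tool described in the talk: the `FormTermExtractor` class and the sample form are hypothetical, and the sample relies on explicit `<label>` elements, whereas real forms may need heuristics to locate label text.

```python
from html.parser import HTMLParser


class FormTermExtractor(HTMLParser):
    """Collect label texts and the option values of each <select> field."""

    def __init__(self):
        super().__init__()
        self.labels = []    # candidate ontology terms
        self.domains = {}   # field name -> list of allowed values
        self._in_label = False
        self._current_select = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._in_label = True
        elif tag == "select":
            self._current_select = attrs.get("name", "")
            self.domains[self._current_select] = []
        elif tag == "option" and self._current_select is not None:
            self.domains[self._current_select].append(attrs.get("value", ""))

    def handle_endtag(self, tag):
        if tag == "label":
            self._in_label = False
        elif tag == "select":
            self._current_select = None

    def handle_data(self, data):
        text = data.strip()
        if self._in_label and text:
            self.labels.append(text)


# Hypothetical form fragment, shortened to three day values.
SAMPLE_FORM = """
<form>
  <label for="day">Pick-up Date</label>
  <select name="day">
    <option value="1">1</option>
    <option value="2">2</option>
    <option value="31">31</option>
  </select>
</form>
"""

extractor = FormTermExtractor()
extractor.feed(SAMPLE_FORM)
print(extractor.labels, extractor.domains)
```

Precedence relationships suggested by scripts are not covered here; extracting them would require analyzing the page's script code.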
Ontology representation
• Conceptual modeling approach
• Based on Bunge:
  • Terms (things)
  • Values
  • Composition
  • Precedence
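The four constructs could be held in a small data structure such as the one below. This is a sketch under assumptions: the `Term` and `Ontology` classes are hypothetical, and treating Day as a part of Pick-up Date is an illustrative choice, not something the slides state.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A term (thing) with its value domain and its composing sub-terms."""
    name: str
    values: list = field(default_factory=list)  # allowed values
    parts: list = field(default_factory=list)   # composition: sub-terms


@dataclass
class Ontology:
    terms: dict = field(default_factory=dict)       # name -> Term
    precedence: list = field(default_factory=list)  # (earlier, later) names

    def add(self, term):
        """Register a term together with all of its sub-terms."""
        self.terms[term.name] = term
        for part in term.parts:
            self.add(part)

    def precedes(self, earlier, later):
        return (earlier, later) in self.precedence


# Assumed example: Pick-up Date is composed of a Day with values 1..31.
day = Term("Day", values=list(range(1, 32)))
car_rental = Ontology()
car_rental.add(Term("Pick-up Date", parts=[day]))
car_rental.add(Term("Pick-up Location"))
car_rental.add(Term("Car Type"))
car_rental.precedence.append(("Pick-up Location", "Car Type"))
```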
Ontology extraction and matching
[Figure: architecture diagram. Input: URL (e.g. http://www.avis.com). Phases: Phase 1 Parsing, Phase 2 Labeling, Phase 3 Ontology, Phase 4 Merging. Components shown: HTML Parsing, DOM Tree, Form Rendering, Label Identification, HTML Elements rules, FORM Elements, Ontology Creation, Candidate Ontology, Target Ontology, Target/Candidate Ontology, Thesaurus, Matching Algorithms, Refined Ontology, Submission, KB.]
Phase 1: Parsing
Phase 2: Labeling
Merging
Heuristics for the ontology merging (Frakes and Baeza-Yates, 1992):
• Textual matching: Date ↔ date, Pickup ↔ pickup
• Ignorable characters removal: *Country ↔ country
• De-hyphenation: Pick-up ↔ Pickup, Pickup ↔ Pick up
• Stop terms removal: Date of Return ↔ Return Date
  Stop terms: a, to, do, does, the, in, or, and, this, those, that, …
• Substring matching: Pickup Location Code ↔ Pick-up location (66%)
• Content matching: Dropoff Day (1,..,31) ↔ Return Day (1,..,31) (100%), hence Dropoff ↔ Return
• Thesaurus matching: Dropoff Location ↔ Return Location (100%)
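Several of these heuristics can be sketched as a normalization step followed by a word-overlap score. The code below is an assumption-laden illustration rather than the authors' algorithm: the `normalize` and `confidence` functions, the one-entry thesaurus, and the choice of word overlap as the score are all hypothetical. The stop term "of" is added because the Date of Return example implies it. Content matching and the spaced variant "Pick up" are not handled.

```python
import re

# Stop terms listed on the slide, plus "of" (implied by "Date of Return").
STOP_TERMS = {"a", "to", "do", "does", "the", "in", "or", "and",
              "this", "those", "that", "of"}

# Tiny hypothetical thesaurus: each word maps to a canonical synonym.
THESAURUS = {"dropoff": "return"}


def normalize(label):
    """Lowercase, de-hyphenate, drop ignorable characters and stop terms."""
    text = label.lower().replace("-", "")      # Pick-up -> pickup
    text = re.sub(r"[^a-z0-9 ]", "", text)     # *Country -> country
    words = [w for w in text.split() if w not in STOP_TERMS]
    return [THESAURUS.get(w, w) for w in words]


def confidence(label_a, label_b):
    """Share of common words, relative to the longer label (0.0 to 1.0)."""
    words_a, words_b = set(normalize(label_a)), set(normalize(label_b))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / max(len(words_a), len(words_b))


for pair in [("Date of Return", "Return Date"),
             ("*Country", "country"),
             ("Pickup Location Code", "Pick-up location"),
             ("Dropoff Location", "Return Location")]:
    print(pair, round(confidence(*pair), 2))
```

Under this scoring, Pickup Location Code and Pick-up location share two of three words. Whether the 66% on the slide was computed this way is not stated in the source.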
Phase 4: Merging
Preliminary Results
• Two metrics are used for performance analysis (Frakes and Baeza-Yates, 1992):
  • Recall (completeness): Recall = t_m / t_r
  • Precision (soundness): Precision = t_e / t_m
• Parameters:
  • t_r: number of terms retrieved
  • t_m: number of terms matched
  • t_e: number of terms effectively matched
Preliminary Results
Example:
• # of terms in Ontology1: 20
• # of matches identified: 15
• Recall: 75% (15/20)
• # of effective matches: 10
• Precision: 66% (10/15)
A third metric combines recall and precision. For a precision value P, a recall value R and an importance measure b, the combined metric E is calculated as (Frakes and Baeza-Yates, 1992):
E = 1 - ((1 + b^2) · P · R) / (b^2 · P + R)
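The example can be reproduced with a short Python sketch. It assumes that recall is the ratio of matched to retrieved terms, that precision is the ratio of effective to identified matches (as the 15/20 and 10/15 figures suggest), and that E takes the standard form E = 1 - (1 + b^2)PR / (b^2 P + R) with b = 1. The function names are hypothetical.

```python
def recall(terms_retrieved, terms_matched):
    """Completeness: share of the retrieved terms for which a match was found."""
    return terms_matched / terms_retrieved


def precision(terms_matched, terms_effective):
    """Soundness: share of the identified matches that are effective."""
    return terms_effective / terms_matched


def e_measure(p, r, b=1.0):
    """Combined metric E = 1 - (1 + b^2) P R / (b^2 P + R); lower is better."""
    denominator = b ** 2 * p + r
    if denominator == 0:
        return 1.0  # no precision and no recall: worst possible value
    return 1 - (1 + b ** 2) * p * r / denominator


r = recall(terms_retrieved=20, terms_matched=15)
p = precision(terms_matched=15, terms_effective=10)
e = e_measure(p, r)
print(f"recall={r:.2f} precision={p:.2f} E={e:.2f}")
```

With b = 1 both measures weigh equally, and E equals one minus their harmonic mean.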
Summary and Future Work
• We have introduced:
  • Automatic ontology creation
  • Automatic matching process
  • Preliminary results
• Future work is oriented towards:
  • Incorporation of query facilities into the tool
  • Automatic navigation of web sites for ontology extraction
  • Dynamic translation of queries against the target ontology into queries against the multiple candidate ontologies