An Ontology for Domain-oriented Semantic Similarity Search on XML Data
Anja Theobald, University of the Saarland, Germany
theobald@cs.uni-sb.de, http://www-dbs.cs.uni-sb.de/
BTW, February 25–28, 2003, Leipzig, Germany
Motivation
Query on Web data from different domains (figure: movie, astronomy, sports):
• ranking based on content data and structure (XML, …)
• using ontologies for similarity search
• grouping results by their topics
Outline
0. Why do we need ranked retrieval and ontologies?
1. XXL Search Engine
2. Ontologies - a Linguistic Challenge
3. Graph-based Ontology
4. Quantification: Edge Weights
5. Similarity of Ontology Nodes
6. Ontology-based Query Processing
XXL Search Engine
Architecture (figure): WWW, Crawler, Path Indexer with EPI Handler and index EPI, Content Indexer with ECI Handler and index ECI, Name Ontology Indexer with Name Ontology Handler and index NOI, Content Ontology Indexer with Content Ontology Handler and index COI, Query Processor, Visual XXL.
Example XML document:
… <galaxy> <object> <description>sun</description> <appearance>…light and heat…</appearance> <location>…</location> … </object> <history>…</history> … </galaxy> …
XXL query:
SELECT * FROM INDEX
WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"
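The Path Indexer and Content Indexer appear only as boxes in the architecture figure. As a rough illustration of what an element-path view of a document looks like, the sketch below parses the slide's example document with Python's standard `xml.etree.ElementTree` and maps each root-to-element tag path to the element's text. The `</>` shorthand closing tags are written out and the elided sibling elements between tags are dropped so the XML is well-formed; the helper `element_paths` is hypothetical and is not the EPI/ECI structure used by XXL.

```python
import xml.etree.ElementTree as ET

# The slide's example document, with "</>" shorthand closing tags written out
# and elided sibling elements dropped. Elided text ("…") is kept as content.
DOC = """<galaxy>
  <object>
    <description>sun</description>
    <appearance>…light and heat…</appearance>
    <location>…</location>
  </object>
  <history>…</history>
</galaxy>"""


def element_paths(root):
    """Yield (tag path, stripped text) for every element in document order.

    Hypothetical toy helper, not XXL's element path or content index.
    """
    stack = [(root, root.tag)]
    while stack:
        elem, path = stack.pop()
        yield path, (elem.text or "").strip()
        # Push children in reverse so they are popped in document order.
        for child in reversed(list(elem)):
            stack.append((child, f"{path}/{child.tag}"))


index = dict(element_paths(ET.fromstring(DOC)))
for path, text in index.items():
    print(f"{path:28} {text}")
```

A path such as `galaxy/object/description` with content `sun` is exactly the kind of match that the query condition `U.#.S ~ "star"` has to find later, once the ontology relates "star" and "sun".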
Ontologies – a linguistic challenge
Ontology: "…representational vocabulary of words including hierarchical relationships and associative relationships between these words…" [Gruber93]
Figure: the word "star" symbolizes a sense ("…a celestial body of hot gases…"); the sense refers to an object; the word stands for the object.
Word – Sense – Synset
words: $w \in \Sigma^*$
+ word senses: $U = \{(w,s) \mid w \in \Sigma^*,\ s \in S:\ \text{word } w \text{ has sense } s\}$
+ synonym relationship: $\mathrm{synset}(s) = \{w \mid (w,s) \in U\}$
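The definition of synset(s) is a simple grouping of the relation U by sense. A minimal sketch, assuming U is stored as a set of (word, sense) pairs; the sense ids are hypothetical labels, while the synonyms "universe" and "cosmos" come from the example ontology.

```python
from collections import defaultdict

# U: the word-sense relation as (word, sense) pairs.
# Sense ids are hypothetical labels chosen for this illustration.
U = {
    ("star", "s_star_celestial_body"),
    ("star", "s_star_plane_figure"),
    ("universe", "s_universe"),
    ("cosmos", "s_universe"),
}


def synsets(pairs):
    """Return {s: synset(s)} where synset(s) = {w | (w, s) in U}."""
    result = defaultdict(set)
    for word, sense in pairs:
        result[sense].add(word)
    return dict(result)


S = synsets(U)

# One word can belong to several synsets: "star" is ambiguous.
senses_of_star = [s for s, words in S.items() if "star" in words]
```

Grouping by sense puts synonyms together ("universe", "cosmos"), but it cannot tell the two senses of "star" apart, which is what the category on the next slide is for.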
Disambiguation: Synset – Category
$\mathrm{synset}(s) = \{w \mid (w,s) \in U\}$, where $U = \{(w,s) \mid \text{word } w \text{ has sense } s\}$
+ hypernym relationship: $\mathrm{category}(s) = \{\mathrm{synset}(s') \mid \mathrm{synset}(s') \text{ is a hypernym of } \mathrm{synset}(s)\}$
Example (figure): the synset {star} occurs with (among others) two senses that lie in different hypernym hierarchies:
• sense 1: (astronomy) a celestial body of hot gases…; star → celestial body, heavenly body → natural object → object, physical object → entity, physical thing
• sense 4: a plane figure with 5 or more points…; star → plane figure, 2-dim. figure → figure → shape, form → attribute → abstraction
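The sketch below reads category(s) as the synset of the direct hypernym, which matches how the example ontology labels its nodes (e.g. star [celestial body, …] versus star [plane figure, …]); the slide's definition could also be read as the set of all hypernym synsets. It encodes the two "star" hierarchies from the figure in plain dictionaries, with hypothetical sense ids, and checks that the pair (synset, category) tells the two senses apart.

```python
# Direct hypernym of each sense along the two "star" hierarchies on the slide.
# Sense ids ("star#1", "plane_figure", ...) are hypothetical labels.
HYPERNYM = {
    "star#1": "celestial_body", "celestial_body": "natural_object",
    "natural_object": "object", "object": "entity",
    "star#4": "plane_figure", "plane_figure": "figure",
    "figure": "shape", "shape": "attribute", "attribute": "abstraction",
}

SYNSET = {
    "star#1": {"star"}, "star#4": {"star"},
    "celestial_body": {"celestial body", "heavenly body"},
    "natural_object": {"natural object"},
    "object": {"object", "physical object"},
    "entity": {"entity", "physical thing"},
    "plane_figure": {"plane figure", "2-dim. figure"},
    "figure": {"figure"}, "shape": {"shape", "form"},
    "attribute": {"attribute"}, "abstraction": {"abstraction"},
}


def category(sense):
    """Synset of the direct hypernym of `sense` (empty at the top of the hierarchy)."""
    parent = HYPERNYM.get(sense)
    return frozenset(SYNSET[parent]) if parent else frozenset()


def node(sense):
    """Ontology node x = (synset(s), category(s))."""
    return (frozenset(SYNSET[sense]), category(sense))


# Both senses of "star" share the same synset, so the synset alone is ambiguous,
same_synset = SYNSET["star#1"] == SYNSET["star#4"]
# but adding the category separates them.
distinct_nodes = node("star#1") != node("star#4")
```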
Example Ontology (figure; nodes shown as synset [category]):
entity, physical thing [entity, physical thing]; group, grouping [group, grouping]; abstraction [abstraction]; food [substance, matter]; universe, cosmos [collection, …]; star [plane figure, 2-dim figure]; milk [foodstuff, …]; natural object [object, …]; galaxy, … [collection, …]; cows' milk [milk]; star [celestial body, …]; hexagram [star]; milky way [galaxy, …]; Beta Centauri [star]; sun [star]
Edge weights shown in the figure include 0.71, 0.83 and 0.94.
Graph-based Ontology
Ontology $G = (V, E)$ with nodes $x = (\mathrm{synset}(s), \mathrm{category}(s)) \in V$ and edges $e = (x, y, \mathit{type}, \mathit{weight}) \in E$
Construction:
• word: … extracted from a document, or … extracted from an existing thesaurus (the two sources are interchangeable!)
• category, type
• weight: … expresses the semantic similarity of the connected words
Use:
• sim: … expresses the semantic similarity of ontology nodes
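The graph definition maps directly onto two record types. A minimal sketch, assuming nodes are compared by value and edges are traversed in both directions; the class and method names, and the edge-type labels "associative" and "hyponym", are hypothetical. The two weights are the ones given for galaxy–star and star–sun on the edge-weight slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """x = (synset(s), category(s)) in V."""
    synset: frozenset
    category: frozenset


@dataclass(frozen=True)
class Edge:
    """e = (x, y, type, weight) in E."""
    x: Node
    y: Node
    type: str      # relationship type; the labels used below are assumptions
    weight: float  # semantic similarity of the connected words


class Ontology:
    def __init__(self):
        self.V = set()
        self.E = []

    def add_edge(self, x, y, type_, weight):
        self.V.update((x, y))
        self.E.append(Edge(x, y, type_, weight))

    def neighbours(self, x):
        """Yield (neighbour, weight) pairs, treating edges as undirected."""
        for e in self.E:
            if e.x == x:
                yield e.y, e.weight
            elif e.y == x:
                yield e.x, e.weight


def n(words, category):
    return Node(frozenset(words), frozenset(category))


galaxy = n({"galaxy", "extragalactic nebula"},
           {"collection", "aggregation", "accumulation", "assemblage"})
star = n({"star"}, {"celestial body", "heavenly body"})
sun = n({"sun"}, {"star"})

G = Ontology()
G.add_edge(galaxy, star, "associative", 0.172)
G.add_edge(star, sun, "hyponym", 0.113)
```

Freezing the dataclasses makes nodes hashable, so two nodes with the same synset and category are the same vertex in `V`.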
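Quantification: Edge Weights
The weight of an edge is the semantic similarity of the connected synsets according to their concepts, computed with vector space measures or probabilistic measures, e.g. the Dice coefficient, using web search engines for word frequencies.
Example:
galaxy, extragalactic nebula [collection, aggregation, accumulation, assemblage]
  $X := (\text{coll} \lor \dots \lor \text{ass}) \land (\text{galaxy} \lor \text{extr}\dots)$
star [celestial body, heavenly body]
  $Y := (\text{cel} \lor \text{heav}) \land (\text{star})$
  $XY := X \land Y$; weight(galaxy, star) = 0.172
sun [star]
  weight(star, sun) = 0.113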
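A minimal sketch of the Dice-based edge weight, assuming the standard definition $\mathrm{Dice}(X,Y) = 2\,|X \land Y| / (|X| + |Y|)$, where $|Q|$ is the number of documents a search engine returns for query $Q$, and assuming each node's query ORs its category words, ORs its synset words and ANDs the two groups, as in the X and Y queries above. The helper names, the quoted-phrase OR/AND syntax and the hit counts are made up; real search engines differ in query syntax.

```python
def node_query(synset, category):
    """Boolean query for a node: (category words OR-ed) AND (synset words OR-ed)."""
    def any_of(words):
        return "(" + " OR ".join(f'"{w}"' for w in sorted(words)) + ")"
    return f"{any_of(category)} AND {any_of(synset)}"


def dice(hits_x, hits_y, hits_xy):
    """Dice coefficient 2*|X and Y| / (|X| + |Y|) from hit counts."""
    if hits_x + hits_y == 0:
        return 0.0
    return 2 * hits_xy / (hits_x + hits_y)


x = node_query({"galaxy", "extragalactic nebula"},
               {"collection", "aggregation", "accumulation", "assemblage"})
y = node_query({"star"}, {"celestial body", "heavenly body"})
xy = f"{x} AND {y}"  # query for X AND Y

# Made-up hit counts; in practice each would be the result count of one query.
weight = dice(hits_x=200, hits_y=300, hits_xy=50)
```

Three queries per edge are needed (X, Y and X ∧ Y), so the cost of building the ontology grows with the number of edges rather than the number of nodes.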
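Similarity of Ontology Nodes
Example graph (figure; nodes shown as synset [category]): entity [entity], group [group], protein [macromolecule], universe [collection], milk [liquid], natural object [object], galaxy [collection], star [celestial body], cows' milk [milk], milky way [galaxy], Beta Centauri [star], sun [star]; edge weights shown include 0.1, 0.1, 0.3, 0.6, 0.2, 0.5, 0.6 and 0.8.
sim(milky way, sun): the connecting path has length |p| = 3 (milky way → galaxy → star → sun, edge weights 0.6, 0.5, 0.8). Each edge weight is scaled by a factor that decreases from 3/3 to 1/3 along the path:
$\frac{3}{3}\cdot 0.6 + \frac{2}{3}\cdot 0.5 + \frac{1}{3}\cdot 0.8 = 1.2$ (starting from milky way)
$\frac{3}{3}\cdot 0.8 + \frac{2}{3}\cdot 0.5 + \frac{1}{3}\cdot 0.6 \approx 1.3$ (starting from sun)
Results: sim(milky way, sun) = 0.42; sim(milky way, cows' milk) = 0.2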
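The two weighted sums on the slide can be reproduced as follows: the i-th edge from the start node is scaled by (|p| − i + 1)/|p|. The slide reports sim(milky way, sun) = 0.42 but does not show how the two directional sums are turned into that value. The last step in the sketch (averaging both directions and dividing by |p|) is only a guess that happens to give about 0.42; it is not stated in the source. The function name `directional_score` is hypothetical.

```python
def directional_score(weights):
    """Sum of the edge weights along a path, where the i-th edge (1-based,
    counted from the start node) is scaled by (|p| - i + 1) / |p|."""
    k = len(weights)
    return sum((k - i) / k * w for i, w in enumerate(weights))


# Edge weights on the path milky way -> galaxy -> star -> sun, as on the slide.
path_weights = [0.6, 0.5, 0.8]

forward = directional_score(path_weights)         # 3/3*0.6 + 2/3*0.5 + 1/3*0.8
backward = directional_score(path_weights[::-1])  # 3/3*0.8 + 2/3*0.5 + 1/3*0.6

# GUESS, not from the slide: average both directions and divide by |p|.
guess = (forward + backward) / 2 / len(path_weights)
print(round(forward, 2), round(backward, 2), round(guess, 2))
```

Because the scaling factors decrease along the path, the score depends on which end you start from; edges close to the start node count more.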
Ontology-based Query Processing
XXL query:
... WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"
XML documents:
… <galaxy> <object> <description>sun</description> <appearance>…light and heat…</appearance> <location>…</location> … </object> <history>…</history> … </galaxy> …
XXL query representation (figure): ~universe, two wildcard nodes %, ~appearance and the content condition ~ "star"
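Ontology-based Query Processing (result graph)
XXL query:
... WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"
XML data graph (figure): galaxy has the children object and history; object has the children description, location and appearance; description contains "sun" and appearance contains "…light and heat…".
Matching the XXL query representation against the data graph:
• ~universe matches galaxy: sim(universe, galaxy) = 0.94
• ~appearance matches appearance: sim(appearance, appearance) = 1.0
• ~ "star" matches the content "sun": sim(star, sun) · tfidf(sun) = 0.43
• further matched steps in the figure are labelled 1.0
Relevance of the result graph = 0.4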
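The slide gives the local match scores and a result relevance of 0.4, but not the rule that combines them. Multiplying the local scores reproduces that value (0.94 × 1.0 × 1.0 × 0.43 ≈ 0.40), so the sketch below assumes a product; the dictionary keys are descriptive labels made up for this illustration.

```python
from math import prod

# Local match scores read off the slide for the single result graph.
local_scores = {
    "~universe -> galaxy": 0.94,        # sim(universe, galaxy)
    "wildcard steps (%)": 1.0,          # labelled 1.0 in the figure
    "~appearance -> appearance": 1.0,   # sim(appearance, appearance)
    '~"star" -> "sun"': 0.43,           # sim(star, sun) * tfidf(sun)
}

# ASSUMPTION: the result relevance is the product of the local scores.
relevance = prod(local_scores.values())
print(f"relevance = {relevance:.2f}")
```

With a product, a single weak match (here the 0.43 for "sun" versus "star") dominates the score, while exact matches with score 1.0 leave it unchanged.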
- END - Thank you! Are there any questions?