SemSearch: A Search Engine for the Semantic Web Yuangui Lei, Victoria Uren, Enrico Motta Knowledge Media Institute The Open University EKAW 2006 Presented by Jungyeon, Yang
Outline
• Research background
• SemSearch overview
• Query interface
• Search process
• Implementation & examples
• Conclusions
Research background
• Semantic search: extending traditional search with Semantic Web technology
  • Exploiting the explicit meaning of documents (i.e., ontology-based metadata)
• Current semantic search tools
  • Form-based, e.g., SHOE, Magnet
  • QA-based, e.g., AquaLog, ORAKEL
  • Keyword-based, e.g., TAP, Squiggle, DOSE
Support for ordinary end users
• Form-based tools
  • Forms are intuitive
  • Issues: knowledge overhead; scalability
• QA-based tools
  • Easy to use
  • Issue: heavy NLP
• Keyword-based tools
  • Easy to post queries; quick response
  • Issue: typically one keyword only; general knowledge of the problem domain required
The goal of our search engine
• Hide the complexity of semantic search from end users:
  • Low barrier to access: easy to post queries
    • Avoiding the form-based routine
  • Dealing with relatively complex queries
    • Supporting multiple keywords
  • Precise and self-explanatory results:
    • Results satisfy user queries
    • Results are easy to understand
  • Quick response
    • Avoiding linguistic processing
SemSearch Architecture
End users
• Google-like User Interface Layer
  • Google-like query interface
• Text Search Layer
  • Semantic entity indexing engine
  • Semantic entity search engine
• Semantic Query Layer
  • Formal query construction engine
  • Query engine
  • Ranking engine
• Formal Query Language Layer (SPARQL, SeRQL, etc.)
• Semantic Data Layer
The Google-like query interface
• Extending traditional keyword search languages by allowing the specification of:
  • The queried subject (the type of expected search results)
  • The combination of keywords
• Three operators are used:
  • Operator ":" captures the query subject
  • "and"/"or" specify the combination of keywords
• Query formats:
  • One keyword: finding entities that have relations with the keyword match
  • Multiple keywords: <subject: keyword1 and/or keyword2 and/or keyword3>, e.g., <news: phd students>, <paper: john and enrico>
• Advantages:
  • More flexible than form-based query interfaces
  • More powerful than state-of-the-art keyword-based semantic search interfaces
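The query format above can be illustrated with a small parser. This is only a sketch of the syntax described on the slide, not SemSearch's actual parser: the names `ParsedQuery` and `parse_query` are hypothetical, and treating "and"/"or" as case-insensitive separators is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedQuery:
    subject: Optional[str]  # text before ":", or None for a one-keyword query
    keywords: List[str]     # keyword phrases, in the order they were typed
    operators: List[str]    # "and"/"or" between consecutive keywords


def parse_query(text: str) -> ParsedQuery:
    """Split '<subject: k1 and k2 or k3>' into its parts (hypothetical helper)."""
    text = text.strip().strip("<>").strip()
    subject = None
    if ":" in text:
        # The ":" operator separates the queried subject from the keywords.
        subject, text = (part.strip() for part in text.split(":", 1))
    # The capturing group keeps the operators in the result list, so
    # keywords sit at even positions and operators at odd positions.
    tokens = re.split(r"\s+(and|or)\s+", text, flags=re.IGNORECASE)
    keywords = [token.strip() for token in tokens[0::2]]
    operators = [token.lower() for token in tokens[1::2]]
    return ParsedQuery(subject, keywords, operators)
```

For example, `parse_query("<paper: john and enrico>")` yields the subject `paper`, the keywords `john` and `enrico`, and the single operator `and`; a multi-word keyword such as `phd students` stays one phrase.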
The search process
• Step 1: Making sense of the user queries
• Step 2: Translating user queries into formal queries
• Step 3: Querying the back-end semantic data repository
• Step 4: Ranking the query results
Making sense of user queries
• Finding out the semantic meaning of keywords
  • Class (e.g., the keyword "phd students")
  • Relation (e.g., "author")
  • Instance (e.g., "Enrico", "KMi director")
• Method: text search
  • Labels (rdfs:label)
  • Short literals are also used for instance matching
    • When searching for "KMi director", the matching instances can be picked up.
• Two components in the search engine
  • The semantic entity indexing engine
  • The semantic entity search engine
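The label-based matching described above can be sketched with an in-memory index. Everything here is illustrative: the `Entity` type, the `ex:` identifiers, and the exact-then-substring matching rule are assumptions, and the real indexing and search engines are not described at this level of detail on the slides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    uri: str   # hypothetical identifier
    kind: str  # "class", "relation" or "instance"
    text: str  # rdfs:label, or a short literal in the case of instances


def build_index(entities: List[Entity]) -> Dict[str, List[Entity]]:
    """Index entities by lower-cased label text (stand-in for the indexing engine)."""
    index: Dict[str, List[Entity]] = {}
    for entity in entities:
        index.setdefault(entity.text.lower(), []).append(entity)
    return index


def find_matches(index: Dict[str, List[Entity]], keyword: str) -> List[Entity]:
    """Return exact label matches, falling back to substring matches."""
    key = keyword.strip().lower()
    if key in index:
        return list(index[key])
    return [e for text, group in index.items() if key in text for e in group]


# Toy data only; these entities are made up for the example.
ENTITIES = [
    Entity("ex:PhD-Student", "class", "phd students"),
    Entity("ex:has-author", "relation", "author"),
    Entity("ex:enrico", "instance", "Enrico"),
    Entity("ex:person-42", "instance", "KMi director"),  # matched via a short literal
]
INDEX = build_index(ENTITIES)
```

With this toy data, `find_matches(INDEX, "PhD students")` returns the class entity, while `find_matches(INDEX, "director")` falls back to substring matching and picks up the instance whose literal is "KMi director".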
Translating user queries into formal queries
• The search engine takes as input the semantic matches of the user's search terms
• The search engine outputs an appropriate formal query according to the semantic meanings of the keywords
• One user query → each keyword has multiple matches → search engine → multiple formal queries
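Why one user query leads to multiple formal queries can be shown by enumerating the combinations of keyword matches. This is a sketch assuming that each combination of one match per keyword yields one candidate formal query; `enumerate_interpretations` and the `ex:` identifiers are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List


def enumerate_interpretations(
    matches_per_keyword: Dict[str, List[str]]
) -> Iterator[Dict[str, str]]:
    """Yield every way of picking one semantic match per keyword."""
    keywords = list(matches_per_keyword)
    # The Cartesian product grows multiplicatively with the number of
    # matches per keyword, which is why many formal queries can result.
    for combo in product(*(matches_per_keyword[k] for k in keywords)):
        yield dict(zip(keywords, combo))


example = {
    "news": ["ex:News-Item"],
    "phd students": ["ex:PhD-Student", "ex:Student"],
}
interpretations = list(enumerate_interpretations(example))
```

Here one match for "news" and two matches for "phd students" give two interpretations, each of which would be translated into its own formal query.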
Simple user queries
• Only two keywords are involved: <subject: keyword>
• Fixed number of combination types
• SeRQL query templates are defined for these combination types
A template example
• Pattern: Subject -> Class Cs; Keyword -> Class Ck
• Results: <Is, Relation, Ik> associated with exploratory links
• Example: news stories about phd students
  • <news "KMi success", mentions-person, Tom-Heath>
• A simplified template in Sesame SeRQL:
    select {Is}, {R}, {Ik}
    from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Is} R {Ik}
    union
    select {Is}, {R}, {Ik}
    from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Ik} R {Is}
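Filling the class-class template can be sketched as plain string substitution. This is an illustration under assumptions, not the authors' code: `class_class_query` and the example.org URIs are hypothetical, the SELECT clause lists bare variable names (as in the Rembrandt query later in the deck) instead of the braces used in the simplified template, and the resulting text has not been run against a Sesame repository.

```python
# One direction of the template; {left} and {right} are swapped for the
# second half of the union so that both relation directions are covered.
TEMPLATE = (
    "SELECT Is, R, Ik\n"
    "FROM {{Is}} rdf:type {{<{cs}>}},\n"
    "     {{Ik}} rdf:type {{<{ck}>}},\n"
    "     {{{left}}} R {{{right}}}"
)


def class_class_query(subject_class: str, keyword_class: str) -> str:
    """Build the two-way union query for a class subject and a class keyword."""
    forward = TEMPLATE.format(cs=subject_class, ck=keyword_class, left="Is", right="Ik")
    backward = TEMPLATE.format(cs=subject_class, ck=keyword_class, left="Ik", right="Is")
    return forward + "\nUNION\n" + backward


query = class_class_query(
    "http://example.org/News-Item",    # Cs: hypothetical class for "news"
    "http://example.org/PhD-Student",  # Ck: hypothetical class for "phd students"
)
```

Keeping the template as data makes it easy to add one template per combination type, which fits the fixed number of combinations for simple queries.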
Complex user queries
• <subject: keyword1 and/or keyword2 and/or …>
  • Instances of the subject which either have relations with all the keywords or have relations with some of the keywords.
• Operational problem
  • The number of combinations grows large when many keywords are involved and each keyword has many matches.
• Rules for combination reduction:
  • Only considering class entities as matches for the subject keyword
  • Choosing the closest possible matches to the keyword
  • Choosing the most specific class match among the class matches
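The three reduction rules can be sketched as small filters. The slides do not say how "closest" or "most specific" is computed, so this sketch assumes string similarity from `difflib` for closeness and a single-parent class hierarchy for specificity; all function names and `ex:` identifiers are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

Match = Tuple[str, str, str]  # (uri, kind, label)


def subject_class_matches(matches: List[Match]) -> List[Match]:
    """Rule 1: keep only class entities for the subject keyword."""
    return [m for m in matches if m[1] == "class"]


def closest_matches(keyword: str, matches: List[Match]) -> List[Match]:
    """Rule 2 (assumed metric): keep the matches whose label is most similar."""
    if not matches:
        return []
    scores = [
        SequenceMatcher(None, keyword.lower(), label.lower()).ratio()
        for _, _, label in matches
    ]
    best = max(scores)
    return [m for m, score in zip(matches, scores) if score == best]


def ancestors(uri: str, parent_of: Dict[str, str]) -> Set[str]:
    """Collect all superclasses of uri, assuming one parent per class."""
    seen: Set[str] = set()
    while uri in parent_of and parent_of[uri] not in seen:
        uri = parent_of[uri]
        seen.add(uri)
    return seen


def most_specific(classes: List[str], parent_of: Dict[str, str]) -> List[str]:
    """Rule 3: drop any class that is a superclass of another matched class."""
    covered: Set[str] = set()
    for cls in classes:
        covered |= ancestors(cls, parent_of)
    return [cls for cls in classes if cls not in covered]
```

Applied before query construction, these filters shrink the set of matches per keyword and therefore the number of combinations that have to be turned into formal queries.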
Query construction
• In SeRQL
• Three building blocks
  • Head block: what needs to be retrieved, i.e., <Is, r, Ikx>
  • Body block: how to retrieve the triples
  • Condition block: conditions that need to be satisfied
• Union block: in order to cover bidirectional relations

    SELECT DISTINCT label(ArtefactTitle), MuseumName
    FROM {Artefact} arts:created_by {} arts:first_name {"Rembrandt"},
         {Artefact} arts:exhibited {} dc:title {MuseumName},
         {Artefact} dc:title {ArtefactTitle}
    WHERE isLiteral(ArtefactTitle)
      AND lang(ArtefactTitle) = "en"
      AND label(ArtefactTitle) LIKE "*night*"
Query construction algorithm (flowchart)
• Initializing the query blocks
• Has keyword match?
  • Yes: check the type of the match
    • Is class? Yes: adding query blocks for class-class relations retrieval
    • Is property? Yes: adding query blocks for class-property relations retrieval
    • Is instance? Yes: adding blocks for class-instance relations retrieval
  • No: composing queries using the blocks
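The flowchart can be read as a loop over keyword matches that appends blocks by match type and then composes the query. The sketch below follows that reading; the concrete path expressions, variable naming (`Is`, `R1`, `Ik1`, `V1`), and the `construct_query` helper are assumptions made for illustration, and the condition and union blocks are left out for brevity.

```python
from typing import List, Tuple


def construct_query(subject_class: str, keyword_matches: List[Tuple[str, str]]) -> str:
    """Compose a SeRQL-like query string from (kind, uri) keyword matches."""
    # Initializing the query blocks: the head says what to retrieve,
    # the body says how to retrieve it.
    head = ["Is"]
    body = [f"{{Is}} rdf:type {{<{subject_class}>}}"]

    for i, (kind, uri) in enumerate(keyword_matches, start=1):
        if kind == "class":
            # Class-class relation: any relation to an instance of the class.
            head += [f"R{i}", f"Ik{i}"]
            body += [f"{{Ik{i}}} rdf:type {{<{uri}>}}", f"{{Is}} R{i} {{Ik{i}}}"]
        elif kind == "property":
            # Class-property relation: the value of the matched property.
            head.append(f"V{i}")
            body.append(f"{{Is}} <{uri}> {{V{i}}}")
        elif kind == "instance":
            # Class-instance relation: any relation to the matched instance.
            head.append(f"R{i}")
            body.append(f"{{Is}} R{i} {{<{uri}>}}")
        else:
            raise ValueError(f"unknown match kind: {kind}")

    # No keyword matches left: compose the query from the blocks.
    return "SELECT DISTINCT " + ", ".join(head) + "\nFROM " + ",\n     ".join(body)
```

For a query such as <paper: john and student>, with "john" matched to an instance and "student" to a class, the call `construct_query("http://example.org/Paper", [("instance", "http://example.org/john"), ("class", "http://example.org/Student")])` produces one SELECT with a body line per block.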
Conclusions
• A keyword-based semantic search engine has been developed
  • Google-like query interface
  • Supporting relatively complex queries
  • Providing relatively quick response
Opinions
• Pros
  • Google-like query interface (intuitive)
  • Supporting relatively complex queries
• Cons
  • Limited to one target data format (RDF)
  • Ranking
  • Simple semantic matching
• Issues
  • Finding out the semantic meaning of keywords
  • Storage modeling
  • Strategy for the semantic match between keywords and semantic entities