SemSearch : A Search Engine for the Semantic Web

SemSearch: A Search Engine for the Semantic Web Yuangui Lei, Victoria Uren, Enrico Motta Knowledge Media Institute The Open University EKAW 2006 Presented by Jungyeon, Yang

Outline • Research background • SemSearch overview • Query interface • Search process • Implementation & examples • Conclusions

Research background • Semantic search: extending traditional search with semantic web technology • Exploiting the explicit meaning of documents (i.e., ontology-based metadata) • Current semantic search tools • Form-based, e.g., SHOE, Magnet • QA-based, e.g., AquaLog, ORAKEL • Keyword-based, e.g., TAP, Squiggle, DOSE

Support for ordinary end users • Form-based tools • Forms are intuitive • Issues: knowledge overhead; scalability • QA-based tools • Easy to use • Issue: reliance on heavy NLP • Keyword-based tools • Easy to post queries; quick response • Issues: typically one keyword only; general knowledge of the problem domain required

The goal of our search engine • Hide the complexity of semantic search from end users: • Low barrier to access: easy to post queries • Avoiding the form-based routine • Dealing with relatively complex queries • Supporting multiple keywords • Precise and self-explanatory results: • Results satisfy user queries • Results are easy to understand • Quick response • Avoiding linguistic processing

SemSearch Architecture (layers, top to bottom) • End users • Google-like User Interface Layer: Google-like query interface • Text Search Layer: semantic entity indexing engine; semantic entity search engine • Semantic Query Layer: formal query construction engine; query engine; ranking engine • Formal Query Language Layer (SPARQL, SeRQL, etc.) • Semantic Data Layer

The Google-like query interface • Extending traditional keyword search languages by allowing the specification of: • The queried subject (the type of expected search results) • The combination of keywords • Three operators are used: • Operator “:” captures the query subject • “and”/“or” specify the combination of keywords • Query formats: • One keyword: finding entities that have relations with the keyword match • Multiple keywords: <subject: keyword1 and/or keyword2 and/or keyword3>, e.g., <news: phd students>, <paper: john and enrico> • Advantages: • More flexible than form-based query interfaces • More powerful than state-of-the-art keyword-based semantic search interfaces
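The query format above can be illustrated with a small parser. This is a minimal sketch, not SemSearch's actual implementation: the names ParsedQuery and parse_query are hypothetical, and the sketch assumes that "and"/"or" only appear as operators, never inside a keyword phrase.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedQuery:
    subject: Optional[str]   # text before ":"; None for a one-keyword query
    keywords: List[str]      # keyword phrases, in order of appearance
    operators: List[str]     # "and"/"or" between consecutive keywords


def parse_query(text: str) -> ParsedQuery:
    """Split '<subject: k1 and/or k2 ...>' into its parts (hypothetical helper)."""
    text = text.strip().strip("<>").strip()
    subject, sep, rest = text.partition(":")
    if sep:
        subject_value: Optional[str] = subject.strip()
    else:
        # No ":" operator: the whole text is the keyword part.
        subject_value, rest = None, text
    # The capturing group keeps the operators in the split result.
    parts = re.split(r"\s+(and|or)\s+", rest.strip(), flags=re.IGNORECASE)
    keywords = [p.strip() for p in parts[0::2]]
    operators = [p.lower() for p in parts[1::2]]
    return ParsedQuery(subject_value, keywords, operators)


if __name__ == "__main__":
    print(parse_query("<paper: john and enrico>"))
    print(parse_query("<news: phd students>"))
```

A multi-word phrase such as "phd students" stays a single keyword because only "and"/"or" act as separators in this sketch.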

The search process • Step 1: making sense of the user queries • Step 2: translating user queries into formal queries • Step 3: querying the back-end semantic data repository • Step 4: ranking the query results

Making sense of user queries • Finding out the semantic meaning of keywords • Class (e.g., the keyword “phd students”) • Relation (e.g., “author”) • Instance (e.g., “Enrico”, “KMi director”) • Method: text search • Labels (rdfs:label) • Short literals are also used when matching instances • E.g., when searching for “KMi director”, the matching instances can be picked up • Two components in the search engine • The semantic entity indexing engine • The semantic entity search engine
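The label-based matching described above can be sketched as a lookup over a toy in-memory index. Everything here is an assumption for illustration: the entity identifiers, the ENTITY_LABELS table, and the crude token normalization are hypothetical and stand in for whatever the indexing and search engines really do.

```python
from typing import Dict, List, Tuple

# Hypothetical toy index: (entity id, entity kind, rdfs:label or short literal).
ENTITY_LABELS: List[Tuple[str, str, str]] = [
    ("kmi:PhD-Student", "class", "phd student"),
    ("kmi:has-author", "property", "author"),
    ("kmi:Enrico-Motta", "instance", "Enrico Motta"),
    ("kmi:Enrico-Motta", "instance", "KMi director"),  # short literal
]


def normalize(text: str) -> List[str]:
    """Lower-case tokens and drop one trailing 's' (a crude plural rule)."""
    tokens = text.lower().split()
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def match_keyword(keyword: str,
                  index: List[Tuple[str, str, str]] = ENTITY_LABELS
                  ) -> Dict[str, List[str]]:
    """Return entities whose label contains every token of the keyword."""
    wanted = set(normalize(keyword))
    matches: Dict[str, List[str]] = {"class": [], "property": [], "instance": []}
    for entity, kind, label in index:
        if wanted and wanted <= set(normalize(label)) and entity not in matches[kind]:
            matches[kind].append(entity)
    return matches


if __name__ == "__main__":
    for kw in ("phd students", "author", "KMi director"):
        print(kw, "->", match_keyword(kw))
```

Grouping the matches by kind (class, property, instance) is what lets a later step pick the right query pattern for each keyword.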

Translating user queries into formal queries • The search engine takes as input the semantic matches of user search terms • The search engine outputs appropriate formal queries according to the semantic meanings of the keywords • One user query -> each keyword -> multiple matches -> search engine -> multiple formal queries
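Because each keyword can have several semantic matches, one user query fans out into several candidate formal queries. The sketch below shows that fan-out as a Cartesian product; the function name and the example entity identifiers are hypothetical, and turning each combination into an actual formal query is left out.

```python
from itertools import product
from typing import Dict, List, Tuple

Match = Tuple[str, str]  # (entity id, kind), both hypothetical


def enumerate_combinations(
    matches_per_keyword: Dict[str, List[Match]]
) -> List[Dict[str, Match]]:
    """Pick one match per keyword in every possible way."""
    keywords = list(matches_per_keyword)
    choices = product(*(matches_per_keyword[k] for k in keywords))
    return [dict(zip(keywords, choice)) for choice in choices]


if __name__ == "__main__":
    matches = {
        "news": [("kmi:News-Item", "class")],
        "john": [("kmi:John-A", "instance"), ("kmi:John-B", "instance")],
    }
    for combo in enumerate_combinations(matches):
        print(combo)  # each combination would yield one formal query
```

The number of combinations is the product of the match counts per keyword, which is why queries with many keywords become expensive to translate.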

Simple user queries • Only two keywords are involved: <subject: keyword> • The number of combination types is fixed • A SeRQL query template is defined for each combination type

A template example • Pattern: Subject -> Class Cs; Keyword -> Class Ck • Results: <Is, Relation, Ik> associated with exploratory links • Example: news stories about phd students • <news “KMi success”, mentions-person, Tom-Heath> • A simplified template in Sesame SeRQL:
select {Is}, {R}, {Ik} from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Is} R {Ik}
union
select {Is}, {R}, {Ik} from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Ik} R {Is}
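Filling the class-class template can be sketched as plain string substitution. This is only an illustration under stated assumptions: the template text is copied from the slide's simplified form (it has not been checked against a Sesame parser), the function name is hypothetical, and the example.org URIs are made up.

```python
# Simplified class-class template, copied from the slide; {Cs} and {Ck} are
# the placeholders for the subject class and the keyword class.
TEMPLATE = (
    "select {Is}, {R}, {Ik} "
    "from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Is} R {Ik} "
    "union "
    "select {Is}, {R}, {Ik} "
    "from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Ik} R {Is}"
)


def fill_class_class_template(subject_class: str, keyword_class: str) -> str:
    """Replace the class placeholders with URI nodes (hypothetical helper)."""
    return (
        TEMPLATE
        .replace("{Cs}", "{<%s>}" % subject_class)
        .replace("{Ck}", "{<%s>}" % keyword_class)
    )


if __name__ == "__main__":
    print(fill_class_class_template(
        "http://example.org/kmi#News-Item",    # assumed URI for "news"
        "http://example.org/kmi#PhD-Student",  # assumed URI for "phd students"
    ))
```

The two halves of the union differ only in the direction of the relation R, so both "news mentions student" and "student appears in news" style triples are covered.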

Complex user queries • <subject: keyword1 and/or keyword2 and/or …> • Results: instances of the subject which either have relations with all the keywords or have relations with some of the keywords • Operational problem • The number of combinations grows large when many keywords are involved and each keyword has many matches • Rules for combination reduction: • Only considering class entities as matches for the subject keyword • Choosing the closest possible matches to the keyword • Choosing the most specific class match among the class matches
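The three reduction rules can be sketched as filters over the candidate matches. This is one possible reading of the rules, not the engine's actual logic: the Match record, its score field (a text-similarity score, higher is closer), and its depth field (position in the class hierarchy, larger is more specific) are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    entity: str
    kind: str        # "class", "property", or "instance"
    score: float     # assumed text-similarity score, higher = closer
    depth: int = 0   # assumed class-hierarchy depth, larger = more specific


def reduce_subject_matches(matches: List[Match]) -> List[Match]:
    """Rule 1: the subject keyword is only matched against classes."""
    return [m for m in matches if m.kind == "class"]


def reduce_keyword_matches(matches: List[Match]) -> List[Match]:
    """Rules 2 and 3: keep the closest matches, then one most specific class."""
    if not matches:
        return []
    best = max(m.score for m in matches)
    closest = [m for m in matches if m.score == best]
    classes = [m for m in closest if m.kind == "class"]
    if classes:
        most_specific = max(classes, key=lambda m: m.depth)
        closest = [m for m in closest if m.kind != "class"] + [most_specific]
    return closest


if __name__ == "__main__":
    candidates = [
        Match("kmi:Student", "class", 1.0, depth=2),
        Match("kmi:PhD-Student", "class", 1.0, depth=3),
        Match("kmi:Student-Office", "instance", 0.5),
    ]
    print(reduce_keyword_matches(candidates))
```

Applying these filters before enumerating combinations keeps the number of generated formal queries small.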

Query construction • In SeRQL • Three building blocks • Head block: what needs to be retrieved, i.e., <Is, r, Ikx> • Body block: how to retrieve the triples • Condition block: conditions that need to be satisfied • A union block is added in order to cover bidirectional relations • Example SeRQL query:
SELECT DISTINCT label(ArtefactTitle), MuseumName
FROM {Artefact} arts:created_by {} arts:first_name {"Rembrandt"},
     {Artefact} arts:exhibited {} dc:title {MuseumName},
     {Artefact} dc:title {ArtefactTitle}
WHERE isLiteral(ArtefactTitle) AND lang(ArtefactTitle) = "en" AND label(ArtefactTitle) LIKE "*night*"

Query construction algorithm (flowchart) • Initializing the query blocks • Has keyword match? • Yes: Is class? If yes, adding query blocks for class-class relations retrieval • Otherwise: Is property? If yes, adding query blocks for class-property relations retrieval • Otherwise: Is instance? If yes, adding blocks for class-instance relations retrieval • No (no keyword match left): composing queries using the blocks
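The flowchart can be read as a loop over the keyword matches that appends head and body blocks per match kind and then composes the query. The sketch below follows that reading under several assumptions: the function name, the variable naming scheme (r1, Ik1, ...), and the exact path expressions are hypothetical, the output is SeRQL-like text that has not been run against Sesame, and the union block for reversed relations is omitted for brevity.

```python
from typing import List, Tuple


def build_query(subject_class: str, keyword_matches: List[Tuple[str, str]]) -> str:
    """Compose a SeRQL-like query from (entity URI, kind) keyword matches."""
    # Initializing the query blocks.
    head = ["Is"]
    body = ["{Is} rdf:type {<%s>}" % subject_class]
    conditions: List[str] = []  # left empty in this sketch

    for i, (entity, kind) in enumerate(keyword_matches, start=1):
        rel, inst = "r%d" % i, "Ik%d" % i
        if kind == "class":
            # class-class: any relation from a subject instance to an instance of the class
            body.append("{%s} rdf:type {<%s>}" % (inst, entity))
            body.append("{Is} %s {%s}" % (rel, inst))
            head += [rel, inst]
        elif kind == "property":
            # class-property: the matched property links the subject instance to a value
            body.append("{Is} <%s> {%s}" % (entity, inst))
            head.append(inst)
        elif kind == "instance":
            # class-instance: any relation from a subject instance to the matched instance
            body.append("{Is} %s {<%s>}" % (rel, entity))
            head.append(rel)

    # Composing the query using the blocks.
    query = "SELECT DISTINCT %s FROM %s" % (", ".join(head), ", ".join(body))
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


if __name__ == "__main__":
    print(build_query(
        "http://example.org/News",  # assumed URIs throughout
        [("http://example.org/PhD-Student", "class"),
         ("http://example.org/Enrico", "instance")],
    ))
```

Keeping head, body, and conditions as separate lists mirrors the building blocks named on the query construction slide and makes it easy to add a mirrored union query later.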

Simple query example

Refinement support

Complex query example

Conclusions • A keyword-based semantic search engine has been developed • Google-like query interface • Supporting relatively complex queries • Providing relatively quick response

Opinions • Pros • Google-like query interface (intuitive) • Supporting relatively complex queries • Cons • Limited to a single target data format (RDF) • Ranking • Simple semantic matching • Issues • Finding out the semantic meaning of keywords • Storage modeling • Strategy for the semantic match between keywords and semantic entities
