Semantic Search Facilitator: Concept and Current State of Development. InBCT Tekes PROJECT Chapter 3.1.3 : “Industrial Ontologies and Semantic Web” (year 2004). Industrial Ontologies Group. Researchers Vagan Terziyan Oleksandr Kononenko Andriy Zharko Oleksiy Khriyenko Olena Kaykova
Semantic Search Facilitator: Concept and Current State of Development • InBCT Tekes PROJECT Chapter 3.1.3: “Industrial Ontologies and Semantic Web” (year 2004)
Industrial Ontologies Group • Researchers • Vagan Terziyan • Oleksandr Kononenko • Andriy Zharko • Oleksiy Khriyenko • Olena Kaykova • Olga Klochko • Andriy Taranov • Contact: • e-mail: vagan@it.jyu.fi • Phone: +358 14 260 4618 • URL: http://www.cs.jyu.fi/ai/OntoGroup
Resources • Resources used from InBCT Project in 2004: 12 000 EUR, salaries for 5 months
Semantic-based Enhancement of the Information Retrieval • Motivation from Industrial Ontologies Group: “While recently there is a lack of annotated resources in the Web, which makes metadata-based search useless, we should develop an enhanced Web search tool based on Google and the WordNet ontology and provide a semantic search user interface”
Semantic Web and Information Retrieval • Semantic Web promises many advantages and benefits, but: • We are only in “transition” towards the Semantic Web • Resources are not yet annotated semantically • Not enough metadata is available on the Web for smarter search • Semantic search of non-semantic data? • Yes, why not? We need a Semantic Facilitator!
Semantic Facilitator Concept • What is it? • Search service that uses other services • Utilizes other search engines as Web services and… • … makes their performance better due to smart query generation algorithms • Supports search within heterogeneous resources (Web pages, Web databases, local file system, etc.) • Filters returned results based on user preferences • Intelligent “semantic query”-based tool that really “understands” what users want to find • What is it not? • Search engine, indexing tool, registry, etc. • Data storage, database browser, etc.
Web search: What’s the Problem? • Search in the Web is not always convenient: • Polysemy of words gives irrelevant results (for example, “mouse”) • Synonymy is not supported by search engines => loss of relevant results • There is a need to capture semantics from the search query
Semantic Search Assistant: light version of Semantic Facilitator • “Semantic Search Assistant” (SSA) is software that: • helps the user to obtain more relevant results while using a standard search engine (Google) by interaction with the WordNet ontology • finds possible contexts for the words in a search query • can broaden or narrow a search query with other relevant words and phrases to improve the results • works with non-annotated documents • is not restricted to any concrete domain
Sense Determination • WordNet is an open source ontology, which contains information about different meanings of a term, synonyms, antonyms and other lexical and semantic relations • Having several words in a search query, we can determine in which context (sense) each of them is used with the help of WordNet: • by comparing the words' synsets • by comparing the words' textual descriptions and examples • by finding common roots going up in the WordNet hierarchy tree for each word • by asking the user
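The second option above (comparing textual descriptions) can be sketched as a simple gloss-overlap ranking, in the spirit of a simplified Lesk approach. This is only an illustration, not the SSA implementation: the `SENSES` table, the stopword list and the helper names are assumptions, the glosses are abridged from the WordNet 2.0 output for "driver" quoted in this deck, and the context is taken to be the other words of the query.

```python
# Toy sense inventory (assumption): glosses abridged from the WordNet 2.0
# entry for "driver" quoted in this deck. A real tool would query WordNet.
SENSES = {
    "driver": [
        "the operator of a motor vehicle",
        "someone who drives animals that pull a vehicle",
        "a golfer who hits the golf ball with a driver",
        "a program that determines how a computer will communicate with a peripheral device",
        "a golf club with a near vertical face that is used for hitting long shots from the tee",
    ],
}

# Small hand-made stopword list (assumption).
STOPWORDS = {"a", "an", "the", "of", "that", "who", "with", "how",
             "is", "for", "from", "will", "to"}


def content_words(text):
    """Lowercase the text, split on whitespace and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def rank_senses(word, context):
    """Return (sense_number, overlap) pairs, largest overlap first.

    The overlap is the number of content words shared by a sense's gloss
    and the context; the query word itself is ignored.
    """
    ctx = content_words(context) - {word}
    scored = [
        (number, len(content_words(gloss) & ctx))
        for number, gloss in enumerate(SENSES[word], start=1)
    ]
    # sorted() is stable, so ties keep the WordNet sense order.
    return sorted(scored, key=lambda pair: -pair[1])


ranking = rank_senses("driver", "printer driver program for a peripheral device")
best_sense = ranking[0][0]
print(ranking)
```

When no sense stands out, the last option in the list (asking the user) remains the fallback.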
How does it work? • Gets a keyword query • Translates the original query into a series of queries to Google, taking into account the semantics of the keywords • Combines the returned results
Ontology Personalization • Ontology personalization is a mechanism that allows users to have their own conceptual view and to use it for semantic querying of search facilities. • [Figure: several “Driver” concepts, a common ontology, and search]
Semantic Search Enhancement • Semantic Search Facilitator uses ontologically (WordNet) defined knowledge about words and embedded support of advanced Google search query features in order to construct more efficient queries from a formal textual description of the searched information. • Semantic Search Facilitator hides from users the complexity of the query language of a concrete search engine and performs routine actions that most users do in order to achieve better performance and get more relevant results. • [Figure: a query processed with the common (linguistic) ontology and a domain ontology; result: semantic filtering]
Capturing Semantics from Search Phrases • Motivation according to our Ukrainian colleague Vadim Ermolayev: “Google query should be transformed based on domain ontology”
Algorithm for the New Query Generation
• Each query word Word(i), $i = 1, \dots, n$, has senses Sense(ij), $j = 1, \dots, p$, and each sense has synonyms Syn(ijk), $k = 1, \dots, m_{ij}$
• $n$ – number of the words from the query
• $p$ – number of the word's senses
• $m_{ij}$ – number of the word's synonyms in sense $j$
• $N_{ijk}$ – number of the synonym's senses
• $R_{ij}$ – relevance of the word's sense (scale from 1 to −1)
• $R_i$ – significance of the word in the query (scale from 1 to 0)
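Algorithm for the New Query Generation
• Synonym quality $Q_{ijk}$ of each synonym Syn(ijk) of Word(i):
$$Q_{ijk} = \frac{1}{L} \sum_{s=1}^{p} R_{is}\,\delta_{s} \cdot \frac{1}{N_{ijk}}, \qquad \delta_{s} = \begin{cases} 1, & \text{if } \mathrm{Syn}_{ijk} \text{ is a member of synset } s \\ 0, & \text{otherwise} \end{cases}$$
• $L$ – number of the synsets which contain $\mathrm{Syn}_{ijk}$
• The factor $1/N_{ijk}$ reduces the absolute value of the synonym quality
• If $Q_{ijk} \ge 0$, the synonym is used via “OR” in the query
• If $Q_{ijk} < 0$, the synonym is used via “AND NOT”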
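A minimal sketch of the synonym quality computation, assuming the slide's formula reads as: the sum of the relevances R_ij of those senses of Word(i) whose synset contains the synonym, divided by L (the number of such synsets) and by N_ijk (the number of senses of the synonym itself). The function names and the sample relevance values are hypothetical.

```python
def synonym_quality(relevances, containing_senses, n_syn_senses):
    """Quality Q of one synonym of a query word (assumed reading of the slide).

    relevances        -- R_ij for j = 1..p, each in [-1, 1]
    containing_senses -- 1-based numbers of the word's synsets that contain
                         the synonym (their count is L)
    n_syn_senses      -- N_ijk, the number of senses of the synonym itself
    """
    L = len(containing_senses)
    if L == 0 or n_syn_senses <= 0:
        raise ValueError("synonym must belong to a synset and have a sense")
    total = sum(relevances[j - 1] for j in containing_senses)
    return total / (L * n_syn_senses)


def operator_for(quality):
    """Non-negative quality joins via OR, negative via AND NOT."""
    return "OR" if quality >= 0 else "AND NOT"


# Hypothetical word with three senses rated by the user.
relevances = [1.0, -1.0, 0.5]

q_good = synonym_quality(relevances, containing_senses=[1, 3], n_syn_senses=2)
q_bad = synonym_quality(relevances, containing_senses=[2], n_syn_senses=4)
print(q_good, operator_for(q_good))
print(q_bad, operator_for(q_bad))
```

Dividing by the synonym's own number of senses means that a highly ambiguous synonym contributes with a smaller absolute weight, whichever sign it has.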
Algorithm for the New Query Generation • Algorithm 1: • For each of Word(1), …, Word(i), …, Word(n), its synonyms Syn are attached via OR or (AND NOT) • The resulting word groups are joined via AND to form the query
Algorithm for the New Query Generation • Algorithm 2: • As in Algorithm 1, the synonyms Syn of each of Word(1), …, Word(i), …, Word(n) are attached via OR or (AND NOT), and the word groups are joined via AND to form the query • In addition, filtering based on the significance of the word is applied (diagram label: $R_i > |Q|$)
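The assembly step shared by both algorithms can be sketched as follows. The output format copies the enhanced queries shown in the examples of this deck (quoted terms, `OR` inside a group, a leading `-` for excluded terms, groups separated by spaces, which Google treats as AND). The function name and the input structure are assumptions, and the selection of synonyms is taken as given.

```python
def build_query(groups):
    """Assemble a Google query string from per-word term groups.

    groups -- list of (positives, negatives) pairs, one per query word:
              positives are joined with OR, negatives are excluded with "-".
    """
    parts = []
    for positives, negatives in groups:
        if positives:
            parts.append("(" + " OR ".join(f'"{t}"' for t in positives) + ")")
        if negatives:
            parts.append("(" + " ".join(f'-"{t}"' for t in negatives) + ")")
    # Groups separated by spaces are implicitly ANDed by the search engine.
    return " ".join(parts)


# Term groups taken from the "hotel reservation agency" example in this deck.
query = build_query([
    (["hotel"], []),
    (["booking", "reserve"], ["qualification"]),
    (["bureau", "agency"], ["means"]),
])
print(query)
```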
We use Google because… • Developers write software that connects remotely to the Google Web APIs service and accesses Google's index of more than 4 billion web pages • Google Web APIs support the same search syntax as the Google.com site • Communication is performed via the Simple Object Access Protocol (SOAP), an XML-based mechanism for exchanging typed information • …but it could be virtually any existing search engine
WordNet( online access: http://www.cogsci.princeton.edu/cgi-bin/webwn )
WordNet 2.0 Search Example
• Search word: "driver"
The noun "driver" has 5 senses in WordNet.
1. driver -- (the operator of a motor vehicle)
2. driver -- (someone who drives animals that pull a vehicle)
3. driver -- (a golfer who hits the golf ball with a driver)
4. driver, device driver -- ((computer science) a program that determines how a computer will communicate with a peripheral device)
5. driver, number one wood -- (a golf club (a wood) with a near vertical face that is used for hitting long shots from the tee)
• Sense 1
driver -- (the operator of a motor vehicle)
=> busman, bus driver -- (someone who drives a bus)
=> chauffeur -- (a man paid to drive a privately owned car)
=> designated driver -- (the member of a party who is designated to refrain from alcohol and so is sober when it is time to drive home)
=> honker -- (a driver who causes his car's horn to make a loud honking sound; "the honker was fined for disturbing the peace")
=> motorist, automobilist -- (someone who drives (or travels in) an automobile)
=> owner-driver -- (a motorist who owns the car that he/she drives)
=> racer, race driver, automobile driver -- (someone who drives racing cars at high speeds)
…
WordNet – Basic Terminology • Syntactic category – part of speech {noun, verb, adjective, adverb} • Synonymic set (synset) – list of synonymic words or collocations • Every word can have several senses • Every sense of a word is associated with synonyms (synset) of the word in that specific sense • Synsets are organized in hierarchies interlinked with semantic relations
WordNet – Organization • Building Blocks: • Word forms – common word orthography • Word meanings – by synsets • Relations: • Lexical – between word forms • Semantic – between word meanings • => Pointers: • Lexical – pertain only to specific word • Semantic – pertain to all of the words in semantic set.
Features of SSA • Platform independent (written in Java) • Works in 2 modes: • common mode, implements almost all of Google functionality; • extended mode, extends common mode, makes several requests with the same semantic sense, returns compound results. • Keeps results in XML format
Common mode • SSA has a clear and simple interface, which helps the user make advanced Google searches without special knowledge • SSA transforms the values of the fields into a Google request according to the special format that Google provides for advanced search
Extended mode • More powerful than the common mode • SSA takes the user's request and tries to choose the most suitable sense with the user's help • Makes a set of requests, which extend the user's request with synonyms and exclude unsuitable words
Generating the set of requests • WordNet API and dictionaries are used for generating the set of requests • When the user enters the original request, SSA switches to the panel where the different senses of the typed word are presented
Generating the set of requests (2) • For every sense presented on this panel, the user can see a description (and sometimes an example) extracted from the WordNet dictionary • The user can also set a rate of correspondence for every sense in the range [-1, 1]
Making compound result • SSA sends the generated requests to Google one by one • It keeps the results obtained for each request separately • Finally, the user gets an integrated result, which is generated according to special rules
Integrated results: generating rules • The unique identifier of each result is its URL • SSA counts the number of appearances of each URL in the returned results and sets this number as the index of the URL • Results with a bigger index are shown first • If the indexes are equal, the results are shown in the order in which Google returned them
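The rules above can be sketched as a count-and-sort over the per-request result lists. This is an illustration under assumptions: the function name and the example.org URLs are hypothetical, and "the order in which Google returned them" is interpreted as the order of first appearance across the requests, which are sent one by one.

```python
def integrate(result_lists):
    """Merge per-request URL lists into one ranked list.

    A URL's index is the number of times it appears in all returned results.
    A higher index comes first; ties keep the order of first appearance.
    """
    counts = {}
    first_seen = {}
    for results in result_lists:
        for url in results:
            counts[url] = counts.get(url, 0) + 1
            # Remember the position of the first appearance only.
            first_seen.setdefault(url, len(first_seen))
    return sorted(counts, key=lambda u: (-counts[u], first_seen[u]))


# Hypothetical results of three generated requests.
merged = integrate([
    ["http://example.org/a", "http://example.org/b", "http://example.org/c"],
    ["http://example.org/b", "http://example.org/d"],
    ["http://example.org/c", "http://example.org/b"],
])
print(merged)
```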
Results analysis • After making all requests, SSA shows the final results • All results are also kept in files in XML format for further analysis • The user can highlight the results of a specific request if there was more than one request
Results • Methods for automatic sense determination using the WordNet lexical database were studied, and the corresponding algorithms were implemented • The algorithm for new query generation was implemented and embedded in the software package • A user interface for advanced search (with Google integration) was developed with Semantic Search Assistant functionality
Example • Initial query: hotel reservation agency (1, 7 and 5 senses respectively) • Of the first 5 results, only 3 are relevant (results with the whole sequence of query words do not even appear in the first three pages) • Generated query: ("hotel") ("booking" OR "reserve") (-"qualification") ("bureau" OR "agency") (-"means") • Of the first 5 results, all are relevant (using the synonym “booking” along with “reservation” was helpful)
Example • [Screenshots: results of the initial query; results of the generated query]
Test 1 • Initial query: cork mousepad • Enhanced query: ("phellem" OR "bobfloat" OR "bobber" OR "cork" OR "bob") ("mousepad" OR "mouse mat")
Test 2 • Initial query: flowers present shop • Enhanced query: ("flower") (-"heyday" -"prime" -"efflorescence") ("present") (-"nowadays" -"present tense") ("store" OR "shop") (-"workshop")
Test 3 • Initial query: hotel reservation agency • Enhanced query: ("hotel") ("booking" OR "reserve") (-"qualification") ("bureau" OR "agency") (-"means")