SEMANTIC SEARCH PRESENTED BY: Group No: 13 • Jai Mashalkar (113050007) • Khushraj Madnani (113050041) • Lahari Poddar (113050029)
SEMANTIC SEARCH • Semantic search seeks to improve search accuracy by understanding the searcher’s intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, in order to generate more relevant results.
Semantically Relatable Sets • Query Expansion • Relevance Feedback
SEMANTICALLY RELATABLE SET • A semantically relatable set (SRS) of a sentence is a group of unordered words in the sentence (not necessarily consecutive) that appear in the semantic graph of the sentence as linked nodes.
FORMS OF SRS • {CW,CW} • {CW,FW,CW} • {FW,CW} CW: Content Word or Clause FW: Function Words Example: The girl borrowed a book on AI from library. CW: girl, borrowed, book, AI, library FW: the, a, on, from
THE GIRL BORROWED A BOOK ON AI FROM LIBRARY [Semantic graph] Root: borrowed (past tense) • agent: girl (the: definite modifier) • object: book (a: indefinite modifier; on: modifier linking AI) • place: library (from: modifier)
THE GIRL BORROWED A BOOK ON AI FROM LIBRARY Sets Formed: • {the,girl} • {girl,borrowed} • {borrowed,book} • {book,on,AI} • {borrowed,from,library} • {a,book}
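Each of the sets formed above matches one of the three SRS forms. A minimal sketch of that check, assuming the function-word list from the example sentence; `FUNCTION_WORDS`, `SETS` and `srs_form` are hypothetical names introduced only for illustration, and building the semantic graph itself is not shown:

```python
# Function words of the example sentence (taken from the FORMS OF SRS slide).
FUNCTION_WORDS = {"the", "a", "on", "from"}

# The six sets formed for "The girl borrowed a book on AI from library".
SETS = [
    ("the", "girl"),
    ("girl", "borrowed"),
    ("borrowed", "book"),
    ("book", "on", "AI"),
    ("borrowed", "from", "library"),
    ("a", "book"),
]

def srs_form(srs):
    """Label each word as FW or CW and return the form, e.g. '{CW,FW,CW}'."""
    labels = ["FW" if w.lower() in FUNCTION_WORDS else "CW" for w in srs]
    return "{" + ",".join(labels) + "}"

for srs in SETS:
    print(srs, srs_form(srs))
```

Every set in the example falls into {CW,CW}, {CW,FW,CW} or {FW,CW}.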
THE PROFESSOR ANNOUNCED THAT HE WILL CONDUCT AN EXTRA LECTURE ON SUNDAY [Semantic graph] Root: announced • agent: professor (the: definite modifier) • object: SCOPE (that: modifier)
THE PROFESSOR ANNOUNCED THAT HE WILL CONDUCT AN EXTRA LECTURE ON SUNDAY [Semantic graph of SCOPE] Root: conduct (will: future tense) • agent: he • object: lecture (an: indefinite modifier; extra: modifier) • time: sunday (on: modifier)
SETS FORMED • {the,professor} • {professor,announced} • {announced,that,SCOPE} • SCOPE:{he,conduct} • SCOPE:{will,conduct} • SCOPE:{conduct,lecture} • SCOPE:{conduct,on,sunday} • SCOPE:{extra,lecture} • SCOPE:{an,lecture}
SRS BASED SEARCH • The relevance score for a document d is defined in terms of: • Rq(d) = relevance of the document d to the query q • |Sd| = number of sentences in the document d • rq(s) = relevance of sentence s to the query q • The relevance of the sentence s to the query q is defined in terms of: • weight(srs) = weight of the SRS srs • press(srs) = true if srs is present in sentence s, false otherwise
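The symbols above can be combined in more than one way. The sketch below shows one plausible reading, stated as an assumption rather than as the authors' formula: rq(s) is the sum of weight(srs) over the query SRSs present in s, and Rq(d) is the average of rq(s) over the |Sd| sentences of d. The function names and the example weights are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

SRS = FrozenSet[str]

def sentence_relevance(sentence_srs: Set[SRS], query_weights: Dict[SRS, float]) -> float:
    """Assumed r_q(s): sum of weight(srs) for each query SRS present in the sentence."""
    return sum(w for srs, w in query_weights.items() if srs in sentence_srs)

def document_relevance(doc: List[Set[SRS]], query_weights: Dict[SRS, float]) -> float:
    """Assumed R_q(d): mean of r_q(s) over the |S_d| sentences of the document."""
    if not doc:
        return 0.0
    return sum(sentence_relevance(s, query_weights) for s in doc) / len(doc)

# Hypothetical query weights and a two-sentence document.
query = {frozenset({"book", "on", "AI"}): 2.0, frozenset({"girl", "borrowed"}): 1.0}
doc = [
    {frozenset({"book", "on", "AI"}), frozenset({"girl", "borrowed"})},
    {frozenset({"professor", "announced"})},
]
print(document_relevance(doc, query))
```

Under these assumptions the first sentence scores 3.0, the second 0.0, and the document 1.5.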
ANALYSIS OF SRS • SRS-based search gives very high precision (the fraction of retrieved instances that are relevant) compared to tf-idf based search. • But it falls short of tf-idf based search because of its low recall (the fraction of relevant instances that are retrieved).
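The two measures can be computed directly from the sets of retrieved and relevant documents. A minimal sketch with made-up document identifiers, chosen to show the high-precision, low-recall pattern described above:

```python
def precision_recall(retrieved: set, relevant: set):
    """Return (precision, recall) for a set of retrieved and a set of relevant items."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: everything retrieved is relevant, but half the relevant set is missed.
retrieved = {"d1", "d2"}
relevant = {"d1", "d2", "d3", "d4"}
print(precision_recall(retrieved, relevant))  # (1.0, 0.5)
```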
LOW RECALL REASONS: • Morphological Divergence • Eg: Apparel for man: Clothes for men • Synonymy/Hypernymy/Hyponymy Divergence • Eg: Color: red/blue • Physical Separation Divergence • Eg: Book on AI: AI book
LOW RECALL ENHANCEMENTS: • Stemming • Eg: Moving, moved, moves → move • Word Similarity • Eg: Clothes ~ Apparel • SRS Augmentation • Eg: <Noun1 Preposition Noun2> ~ <Noun2 Noun1>
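Two of these enhancements can be sketched in a few lines. The stemmer below is a deliberately naive suffix stripper (not the Porter algorithm), and the augmentation step assumes the caller supplies the set of nouns, since no part-of-speech tagging is done here; `naive_stem`, `augment` and `nouns` are hypothetical names.

```python
def naive_stem(word: str) -> str:
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s", "e"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def augment(srs: tuple, nouns: set) -> set:
    """Add <Noun2 Noun1> as a variant of <Noun1 Preposition Noun2>."""
    variants = {tuple(srs)}
    if len(srs) == 3 and srs[0] in nouns and srs[2] in nouns:
        variants.add((srs[2], srs[0]))
    return variants

print({naive_stem(w) for w in ["Moving", "moved", "moves", "move"]})
print(augment(("book", "on", "AI"), nouns={"book", "AI", "library"}))
print(augment(("borrowed", "from", "library"), nouns={"book", "AI", "library"}))
```

The four forms of "move" collapse to one stem, "book on AI" gains the variant "AI book", and the verb-headed set is left unchanged.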
QUERY EXPANSION • Query expansion is the process of reformulating a seed query to improve retrieval performance. • Techniques involved: • Finding synonyms of words. • Finding the various morphological forms of a word through stemming.
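A minimal sketch of synonym-based expansion, assuming a tiny hand-written thesaurus; `TOY_THESAURUS` and `expand_query` are hypothetical, and a real system would draw on a much larger lexical resource:

```python
# Toy thesaurus: each term maps to synonyms or morphological variants.
TOY_THESAURUS = {
    "apparel": ["clothes", "clothing"],
    "man": ["men"],
}

def expand_query(terms, thesaurus):
    """Return the seed terms followed by their thesaurus entries, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query(["apparel", "for", "man"], TOY_THESAURUS))
# ['apparel', 'clothes', 'clothing', 'for', 'man', 'men']
```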
TYPES OF QUERY EXPANSION • GLOBAL: Examine word occurrences and relationships using a thesaurus, which can be constructed manually or automatically. • LOCAL: Analyze the top-ranked documents retrieved by the original query.
GLOBAL QUERY EXPANSION • Manual Thesaurus Generation: Use of a controlled vocabulary (maintained by human editors) that is built up from sets of synonymous names for concepts. • Automatic Thesaurus Generation: • Exploit word co-occurrence. • Exploit grammatical relations or grammatical dependencies.
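Exploiting word co-occurrence can be sketched as counting how often two terms appear in the same document and treating frequent pairs as related. This is a simplified illustration on made-up documents; the function names and the `min_count` threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count, for each unordered pair of terms, the documents containing both."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts

def related_terms(term, counts, min_count=2):
    """Terms that co-occur with `term` in at least `min_count` documents."""
    related = []
    for (a, b), c in counts.items():
        if c >= min_count and term in (a, b):
            related.append(b if a == term else a)
    return sorted(related)

docs = [
    ["semantic", "search", "query"],
    ["semantic", "search", "web"],
    ["query", "expansion"],
]
counts = cooccurrence_counts(docs)
print(related_terms("search", counts))  # ['semantic']
```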
ANALYSIS OF QUERY EXPANSION • Query expansion is effective in increasing recall of relevant documents. • But it may significantly decrease precision, particularly when the query contains ambiguous terms. • In general, a domain-specific thesaurus is required for better performance.
RELEVANCE FEEDBACK • The query given by the user is fired. • Some results are retrieved. • The results are analyzed to determine whether or not they are relevant. • A modified query is formed and fired to produce the final search results.
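The slide does not say how the modified query is built. One widely used option in the vector space model is the Rocchio update, swapped in here purely as an illustration; the weights `alpha`, `beta` and `gamma` and the example vectors are assumed values.

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant documents and away from non-relevant ones.

    Vectors are dicts mapping term -> weight; terms with non-positive weight are dropped.
    """
    new_query = defaultdict(float)
    for term, w in query.items():
        new_query[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_query[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            new_query[term] -= gamma * w / len(nonrelevant)
    return {t: w for t, w in new_query.items() if w > 0}

modified = rocchio(
    query={"inferno": 1.0},
    relevant=[{"inferno": 1.0, "os": 1.0}],
    nonrelevant=[{"inferno": 1.0, "band": 1.0}],
)
print(modified)
```

With these inputs the modified query keeps "inferno", gains "os", and drops "band" because its weight becomes negative.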
TYPES OF RELEVANCE FEEDBACK Explicit Feedback: • Feedback is taken directly from users, who assess the relevance of a given output (set of documents). • Eg: After a document is viewed, ask “Was this document helpful?”
ANALYSIS: • ADVANTAGE: It captures the actual requirements and expectations of the user. • DISADVANTAGE: • A large fraction of users may not be interested in participating in surveys and feedback. • These surveys may be biased by the personal choices of users. Eg: When searching for inferno, most people may rank the pages of the musical band named Inferno over those of the Inferno OS.
IMPLICIT FEEDBACK: • Feedback which is inferred from the actions of the user on the output documents. Factors: • Number of times a document is visited • Duration of the visit on a particular URL • Depth and number of links visited from the document
ANALYSIS: • ADVANTAGE: The interaction time with the user is eliminated, as the system takes the user's feedback implicitly. • DISADVANTAGE: • Number of Hits on URL: Users tend to click on the first document returned. Thus, if the initial search was not up to the mark, it may continue to perform poorly. • Time Spent on URL: Sometimes the time taken to reject a document is long enough for the algorithm to believe that the document is relevant. • Number and Depth of Links Visited: This will reliably rank a relevant document as relevant, but it will fail to rank a good document without links as relevant.
PSEUDO RELEVANCE FEEDBACK OR BLIND FEEDBACK: • Takes a query as input. • From the top k ranked results for that query, some keywords are selected (according to their weights) and added to the query, which is then used for a further search.
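The expansion step can be sketched as follows. The sketch assumes the initial ranking is already available as a list of tokenized documents and uses raw term frequency in the top k documents as the keyword weight; both are simplifications, and `prf_expand` is a hypothetical name.

```python
from collections import Counter

def prf_expand(query_terms, ranked_docs, k=2, n_terms=2):
    """Add the n_terms most frequent non-query terms from the top k documents."""
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(t for t in doc if t not in query_terms)
    # Sort by descending count, then alphabetically so ties are deterministic.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]
    return list(query_terms) + [term for term, _ in top]

ranked_docs = [
    ["inferno", "os", "plan9", "os"],
    ["inferno", "os", "kernel"],
    ["inferno", "band", "music"],
]
print(prf_expand(["inferno"], ranked_docs))  # ['inferno', 'os', 'kernel']
```

Only the top two documents contribute, so the third document's terms never enter the expanded query; had it been ranked first, the expansion would have drifted toward the band.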
ANALYSIS: • ADVANTAGE: It is a completely automated process and hence free from human bias. • DISADVANTAGE: • Its effectiveness depends heavily on the ranking algorithm used. If the top documents retrieved by the initial query are not very relevant, the final result will not be very good either. • The term associations obtained for query expansion are restricted to co-occurrence based relationships in the feedback documents, so other types of term associations such as lexical and semantic relations (morphological variants, synonyms) are not explicitly captured.
MULTILINGUAL PRF Given a query in one language (L1), we take the help of another language (L2) to ameliorate the well-known problems of PRF. The steps are: • Translation: L1 -> L2 • PRF performed in L2. • Result back-translation: L2 -> L1 • Combination of the feedback models of L1 and L2. • Fetch a new ranked list of documents.
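The back-translation and combination steps can be sketched as below. This is an illustration under stated assumptions, not the method of the cited paper: feedback models are term-to-probability dicts, the bilingual dictionary is a toy, probability mass is split equally among translations, and the models are mixed by linear interpolation with an assumed weight `lam`.

```python
def back_translate(model_l2, dictionary):
    """Map an L2 feedback model into L1, splitting each term's mass over its translations."""
    out = {}
    for term, p in model_l2.items():
        targets = dictionary.get(term, [])
        for t in targets:
            out[t] = out.get(t, 0.0) + p / len(targets)
    total = sum(out.values())
    return {t: p / total for t, p in out.items()} if total else {}

def combine(model_l1, model_back, lam=0.5):
    """Linear interpolation of the L1 model and the back-translated L2 model."""
    terms = set(model_l1) | set(model_back)
    return {t: (1 - lam) * model_l1.get(t, 0.0) + lam * model_back.get(t, 0.0) for t in terms}

# Toy example: French as L1, English as the assisting language L2.
model_l1 = {"voiture": 1.0}
model_l2 = {"car": 0.5, "automobile": 0.5}
en_to_fr = {"car": ["voiture"], "automobile": ["automobile", "voiture"]}

final_model = combine(model_l1, back_translate(model_l2, en_to_fr))
print(final_model)
```

Here the assisting language contributes "automobile", a related term that the original L1 feedback model did not contain.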
ANALYSIS OF MULTILINGUAL PRF • Good Feedback from Assisting Language: If the feedback model in the assisting language contains good terms, then the back-translation process will introduce the corresponding feedback terms in the source language, thus leading to improved performance. • Finding Synonyms/Morphological Variations: Another situation in which MultiPRF leads to large improvements is when it finds terms semantically or lexically related to the query terms that the original feedback model was unable to find. • Abundance of Documents: More documents are available on the web in the assisting language than in the source language.
CONCLUDING REMARKS • Semantic search is helpful for research search but not of much help for navigational search. • Semantic search performs better than traditional search methods on semantically meaningful sentences or phrases but falls short on keyword-based search. • To use semantic search engines to their full potential, users also need to get used to searching with meaningful queries instead of just keywords.
THE FUTURE AHEAD… • Semantic search may not be able to replace traditional web search completely, but it has the power to enhance it. • With semantic search the web will become more intelligent, as it will be able to understand what we mean instead of just matching keywords.