1.02k likes | 1.32k Views
From Syntactic Search to Semantic Search. מחיפוש תחבירי לחיפוש משמעותי. אריאל פרנק מחלקה למדעי המחשב אוניברסיטת בר-אילן ariel@cs.biu.ac.il. Contents. Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!.
E N D
From Syntactic Search to Semantic Search מחיפוש תחבירי לחיפוש משמעותי אריאל פרנק מחלקה למדעי המחשב אוניברסיטת בר-אילן ariel@cs.biu.ac.il A. Frank
Contents • Web 1.0, Web 2.0, Web 3.0, Web X.0… • Syntactic vs. Semantic Search • Semantic Search Engines • Spectrum of Semantic Search Problems?! A. Frank
“The good, the bad and the …” A. Frank
Semantics in Web 2.0?! , Open Gardens blog, AjitJaokar http://opengardensblog.futuretext.com/archives/2005/12/mobile_Web_20_w.html A. Frank
Web 1.0, Web 2.0, Web 3.0, Web X.0… A. Frank
Where are we headed?! A. Frank
Contents • Web 1.0, Web 2.0, Web 3.0, Web X.0… • Syntactic vs. Semantic Search • Semantic Search Engines • Spectrum of Semantic Search Problems?! A. Frank
Googlism A. Frank
Google • Presents us with a search box and allows us to enter free form queries. • Basically retrieves documents based on keyword searches. • The search algorithms are based on string/pattern matching, statistical frequencies and Page rank (based on hyperlink popularity analysis). • There is some light-weight semantics – when you search for known objects or public data, Google knows about these types of entities (OneBox) and provides factual answers. A. Frank
Actual Answers to Factual Questions A. Frank
Find Public Data A. Frank
Compare Public Data A. Frank
Current State • The current generation of search engines is severely limited in its understanding of the user's intent and the Web's content. • Semantic search has attracted a lot of attention in the past year, largely due to the growth of the Semantic Web as a whole. • The term “Semantic Search” itself is popular enough to be considered overused/overloaded. • But in general it refers to methods of searching documents beyond the syntactic level of matching keywords. A. Frank
Syntactic vs. Semantic Search • Syntactic search – can match the query against • index of the textual content of the resources • URIs (URLs, URNs) in the system • literals in the RDF metadata • or a combination of these, possibly using: • Exact, prefix or substring match, stemming, minimal edit distance • Semantic search – in addition to syntactic search, can use • index of the meaning of sentences in each resource • semantic knowledge structures and analysis • the graph structure of RDF metadata • or a combination of these, possibly using: • query expansion, classification/categorization, tagging, graph traversal, microformats, RDF & OWL inferencing and reasoning A. Frank
Can Semantic Search answer this :-?) A. Frank
Semantics & NLP • Semantics is a subfield of linguistics that is devoted to the study of meaning, as expressed by words, phrases, sentences, and even larger units of text. • Natural Language Processing (NLP) is a subfield of artificial intelligence and computational linguistics that studies the problems of automated generation and understanding of natural human languages by computers. • Semantic NLP integrates these to provide meaning and understanding within context of computer applications. A. Frank
Semantic Search in General • Collective methods that look at information available beyond the level of individual words or phrases. • Semantic search methods are diverse and cover a range of disciplines • Information Retrieval, Natural Language Processing, Semantic Analysis, the Semantic Web, Databases, Information Extraction, Information Visualization, etc. • An understanding of content at the semantic level also makes it possible to aggregate, compare and reason with content as structured data. • Enables people to type in arbitrarily complex questions and then interpret these queries and execute them. A. Frank
Contents • Web 1.0, Web 2.0, Web 3.0, Web X.0… • Syntactic vs. Semantic Search • Semantic Search Engines • Spectrum of Semantic Search Problems?! A. Frank
Types of “Semantic Search” Engines • Semantic Web SearchEngines • search on the Semantic Web data (RDF, OWL, etc). • Semantically-enhanced Search Engines • mostly augment and refine the displayed results. • NLP-based Search Engines • mainly work on the indexing and query side of the search. • Semantic-NLP-based Search Engines • employ semantic knowledge structures and analysis. • Computational-NLP-based Search Engines • employ an inference engine that enables reasoning and computation. A. Frank
Sample of “Semantic Search” Engines • Semantic Web Search (not covered here) • Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … • Semantically-enhanced Search (barely covered here) • Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … • NLP-based Search • MetaWeb Freebase, MS Powerset, … • Semantic-NLP-based Search • hakia, Cognition, (Readware?), … • Computational-NLP-based Search • True Knowledge, (Wolfram Alpha?), … A. Frank
Sample queries used here • Query involves extracting concepts • Artificial Intelligence • Query involves ambiguous term • Jaguar, Cycle • Query involves aggregating data • What did Albert Einstein do • Query involves inference • When was California’s governor born • Query involves computation • What is the distance from San Francisco to Michigan A. Frank
Results of sample queries used here A. Frank
Types of “Semantic Search” Engines • Semantic Web Search (not covered here) • Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … • Semantically-enhanced Search (barely covered here) • Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … • NLP-based Search • MetaWeb Freebase, MS Powerset, … • Semantic-NLP-based Search • hakia, Cognition, (Readware?), … • Computational-NLP-based Search • True Knowledge, (Wolfram Alpha?), … A. Frank
Google Orion • Technology that can better understand associations and concepts related to search. • It finds relationships between queries and related concepts and presents those as search refinements: • Pages are scanned in “real-time” by Google after a query is entered. • Conceptually and contextually related sites/pages are then identified and expressed in the form of the improved refinements. • Also provides longer “snippets” (text extracts containing the keywords) when users input queries of three words or more. A. Frank
Orion Search Refinements A. Frank
Yahoo! SearchMonkey • The SearchMonkey platform is mainly based on the Semantic Web approaches. • Supports the ability to modify the presentation of search results through plug-ins to the search interface. • Content providers or third-party developers can create custom data services to extract information from the Web page. • After defining the extracted properties per entry, an enhanced display can be provided. A. Frank
SearchMonkey Enhanced Display A. Frank
Types of “Semantic Search” Engines • Semantic Web Search (not covered here) • Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … • Semantically-enhanced Search (barely covered here) • Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … • NLP-based Search • MetaWeb Freebase, MS Powerset, … • Semantic-NLP-based Search • hakia, Cognition, (Readware?), … • Computational-NLP-based Search • True Knowledge, (Wolfram Alpha?), … A. Frank
MetaWeb Freebase • Open, shared database of the world’s knowledge that collects data from the Web to build a massive, collaboratively-edited database of cross-linked data. • Free for anyone to query, contribute to, build applications on top of, or integrate into their Web sites. • Focus is on organizing and managing complex data structures by use of Semantic Web technologies. • Enables extraction of ordered knowledge out of the information chaos that is the current Web. A. Frank
Freebase Interface A. Frank
Freebase Repository • Covers millions of topics in hundreds of categories. • Freebase contains structured information on many popular topics, like movies, music, people and locations – all reconciled and freely available. • A base is a new kind of Website that anyone can build using Freebase; It is a place to organize and share collections of information about a particular subject. • Freebase has two search interfaces: • Free form queries; a text search box. • MQL (MetaWeb Query Language) queries. A. Frank
Freebase Structure • Freebase spans domains, but requires that a particular topic exist only once, even if it might normally be found in multiple bases. • For example, Arnold Schwarzenegger would appear in a movie base as an actor, a political base as a governor and a bodybuilder base as a Mr. Universe. • In Freebase, there is only one topic for Arnold Schwarzenegger, with all three facets of his public persona brought together. A. Frank
Freebase vs. Wikipedia • The difference lies in the way they store information. • Wikipedia arranges information in the form of articles. • Freebase lists facts and statistics. • Its list form is good not only for people who like to glance at facts, but also for people who want to use the data to build other Web sites/software. • Topics covered by Freebase include subjects that are too obscure for Wikipedia, which strives for notability appropriate to an encyclopedia. A. Frank
Query involves extracting concepts A. Frank
Query involves ambiguous term A. Frank
Query involves aggregating data (1) A. Frank
Query involves aggregating data (2) A. Frank
Query involves inference A. Frank
Query involves computation A. Frank
Microsoft Powerset • Provides a NLP-based search engine. • Powerset reads and understands every sentence on a Web page and allows asking questions in plain English. • Powerset search and discovery experience is currently based on Wikipedia and Freebase. • In large, Powerset is trying to match the meaning of a query with the meaning of sentences in Wikipedia or facts in Freebase. A. Frank
Powerset Interface A. Frank
Powerset Services • In the search box, you can express yourself in keywords, phrases, or simple questions. • On the search results page, Powerset gives more accurate results, often answering questions directly, and aggregating information from across multiple articles. • Powerset's technology follows you into enhanced Wikipedia articles, giving you a better way to quickly digest and navigate content. A. Frank
Powerset Factz • Factz are concise representations of information extracted from sentences. • They are represented in three parts: the subject, relation and object (e.g., Oswald shot JFK). • Factz do not always represent truth, but rather propositions that are asserted in the text of Wikipedia. • On the search results page, you will often see Factz for general topic queries; these Factz are collected from pages across Wikipedia. • On a particular topic page, you can see the Factz extracted from the given page in the outline. A. Frank
Snippet of a Powerset Result Page A. Frank
Query involves extracting concepts A. Frank
Query involves ambiguous term A. Frank
Query involves aggregating data A. Frank
Query involves inference A. Frank
Query involves computation A. Frank
Types of “Semantic Search” Engines • Semantic Web Search (not covered here) • Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … • Semantically-enhanced Search (barely covered here) • Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … • NLP-based Search • MetaWeb Freebase, MS Powerset, … • Semantic-NLP-based Search • hakia, Cognition, (Readware?), … • Computational-NLP-based Search • True Knowledge, (Wolfram Alpha?), … A. Frank