Semantic Search: different meanings

xannon

Presentation Transcript


  1. Semantic Search: different meanings

  2. Semantic search: different meanings • Definition 1: Semantic search as the problem of searching documents beyond the syntactic level of matching keywords • Hakia, Powerset, SearchMonkey • Definition 2: Semantic search as the problem of searching large Semantic Web datasets • Watson, PowerAqua, Swoogle, Sindice, SWSE

  3. Facing keyword-based search problems • Relations between search terms: • “books about recommender systems” vs. “systems that recommend books” • Polysemy • “mouth” as part of the body vs. “mouth” as part of a stream • Synonymy • “movies” vs. “films” • Documents about individuals where query keywords do not appear: • “English banks”, individual “Abbey”
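The problems listed on this slide can be illustrated with a minimal sketch of purely syntactic matching. The function `keyword_match` and the three toy documents are hypothetical, made up for illustration; they are not taken from the presentation or from any of the systems it names.

```python
def keyword_match(query: str, document: str) -> bool:
    """Return True if every query term occurs verbatim in the document."""
    doc_terms = set(document.lower().split())
    return all(term in doc_terms for term in query.lower().split())


# Hypothetical toy documents (assumption, for illustration only).
docs = {
    "d1": "a list of classic films from the 1950s",
    "d2": "the mouth of the river is wide",
    "d3": "abbey is a bank based in england",
}

# Synonymy: d1 is relevant to "movies" but only contains "films".
synonymy_miss = keyword_match("movies", docs["d1"])

# Polysemy: d2 matches "mouth" even if the user meant the body part.
polysemy_hit = keyword_match("mouth", docs["d2"])

# Individuals: d3 describes an English bank, yet neither keyword appears.
individual_miss = keyword_match("english banks", docs["d3"])

print(synonymy_miss, polysemy_hit, individual_miss)
```

The first and third queries miss relevant documents, and the second returns a document that may be about the wrong sense of the word; keyword matching alone has no way to tell these cases apart.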

  4. Several attempts from the IR community • Early 80s: elaboration of conceptual frameworks and their introduction in IR models • Taxonomies (categories + hierarchical relations), e.g., the ODP (Open Directory Project) • Thesauri (categories + fixed hierarchical & associative relations), e.g., WordNet (used by linguistic approaches) • Algebraic methods such as LSA • Limitations: The level of conceptualization is often shallow (especially at the level of relations)
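As a rough sketch of how a thesaurus can be plugged into keyword retrieval, the snippet below expands each query term with its synonyms before matching. The `THESAURUS` dictionary is a tiny hand-made stand-in (an assumption), not WordNet or its API, and the function names are hypothetical.

```python
# Tiny hand-made thesaurus (assumption); a real system might draw on WordNet.
THESAURUS = {
    "movies": {"films"},
    "films": {"movies"},
}


def expand(query: str) -> list[set[str]]:
    """Turn each query term into the set containing it and its synonyms."""
    return [{term} | THESAURUS.get(term, set()) for term in query.lower().split()]


def expanded_match(query: str, document: str) -> bool:
    """True if, for every query term, the term or a synonym is in the document."""
    doc_terms = set(document.lower().split())
    return all(group & doc_terms for group in expand(query))


# Synonymy is now handled ...
synonym_found = expanded_match("classic movies", "a list of classic films")

# ... but a flat synonym table still cannot relate "English banks" to "Abbey".
individual_found = expanded_match("english banks", "abbey is a bank based in england")

print(synonym_found, individual_found)
```

The second query still fails, which is one way to read the slide's point about shallow conceptualization: a synonym table carries no relations such as "Abbey is a bank" or "a bank based in England is an English bank".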

  5. The emergence of the SW • Late 90s: introduction of ontologies as conceptual frameworks (classes + instances (KBs) + arbitrary semantic relations + rules) • Semantic search: Exploiting ontologies as richer conceptualizations & formal languages to enhance traditional keyword-based document retrieval • Semantic search: Need to search this emergent and continuously growing structured information space (the Web of Data) • DBLP, Geonames, DBpedia, BBC Music,... (http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets)

  6. The Web of Data • 2007 • 2008 • 2009 Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

  7. LOD cloud May 2007 • Facts: • Focal points: • DBpedia: RDFized version of Wikipedia; many ingoing and outgoing links • Music-related datasets • Big datasets include FOAF, US Census data • Size approx. 1 billion triples, 250k links Figure from [4] Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

  8. LOD cloud September 2008 • Facts: • More than 35 datasets interlinked • Commercial players joined the cloud, e.g., BBC • Companies began to publish and host datasets, e.g., OpenLink, Talis, or Garlik • Size approx. 2 billion triples, 3 million links Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

  9. LOD cloud March 2009 • Facts: • A large part comes from the Linking Open Drug Data cloud and the Bio2RDF project • Notable new datasets: Freebase, OpenCalais, ACM/IEEE • Size > 10 billion triples Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

  10. The LOD clouds Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

  11. Commercial interest by publishers

  12. Commercial interest by search engines • 2007 Yahoo! presents SearchMonkey

  13. Commercial interest by search engines • July-2008 Microsoft buys Powerset

  14. Commercial interest by search engines • April 2010 Facebook announced the use of the Open Graph protocol

  15. Commercial interest by search engines • May-2009 Google announces Rich Snippets and its official use of RDFa and Microformats

  16. Commercial interest by search engines • July-2010 Google buys Metaweb (the company behind Freebase)

  17. Commercial interest by search engines • November-2010 Google announced the support of the GoodRelations vocabulary for Google Rich Snippets.

  18. Challenges • Exploiting this new information space for semantic search purposes opens new research challenges: • Scalability • Heterogeneity • Uncertainty

  19. Scalability Effective exploitation of the linked data requires infrastructure that scales to a large and ever growing collection of interlinked data!

  20. Heterogeneity • Figure: schema-level alignment (Align: SW:Person, Dbpedia:Professor, Dblp:~ley/db/../author) and data-level reconciliation (Reconcile, Combine: SW:/en/rudi_studer, Dbpedia:Rudi_Studer, Dblp:Studer:Rudi.html) • Effective exploitation of the data web requires an effective mechanism for • finding the relevant data sources • integrating data sources • combining elements from different data sources
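One possible way to picture the data-level "reconcile, combine" step is to group identifiers that are declared equivalent and then pool their descriptions. The sketch below uses a small union-find over identifier pairs. The three identifiers come from the slide; the equivalence links, the property names, and the label value are assumptions made up for illustration, and the functions are hypothetical.

```python
from collections import defaultdict

# Assumed equivalence links between the identifiers shown on the slide.
same_as = [
    ("SW:/en/rudi_studer", "Dbpedia:Rudi_Studer"),
    ("Dbpedia:Rudi_Studer", "Dblp:Studer:Rudi.html"),
]

# Assumed facts, one per source (subject, property, value).
facts = [
    ("SW:/en/rudi_studer", "type", "SW:Person"),
    ("Dbpedia:Rudi_Studer", "type", "Dbpedia:Professor"),
    ("Dblp:Studer:Rudi.html", "label", "Rudi Studer"),
]


def find(parent: dict, x: str) -> str:
    """Return the representative identifier of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x


def reconcile(links) -> dict:
    """Build equivalence classes from pairs of equivalent identifiers."""
    parent: dict = {}
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    return parent


def combine(triples, parent: dict) -> dict:
    """Pool the (property, value) pairs of all identifiers in each class."""
    merged = defaultdict(set)
    for subject, prop, value in triples:
        merged[find(parent, subject)].add((prop, value))
    return dict(merged)


merged = combine(facts, reconcile(same_as))
for entity, description in merged.items():
    print(entity, sorted(description))
```

Schema-level alignment (deciding how SW:Person relates to Dbpedia:Professor) is a separate problem that this sketch does not attempt; it simply keeps both type statements side by side.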

  21. Uncertainty “Find action films directed by some Hong Kong film director and starring Chinese martial actors” • Incomplete representation of the user’s needs and of content meanings • The user cannot completely specify the need • The semantic information in the search space is incomplete • Effective exploitation requires the ability to • match the user’s needs to data in an imprecise way • rank the results • adjust flexibly to changes in constraints
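The three requirements on this slide (imprecise matching, ranking, flexible constraints) can be sketched as a scoring function over query constraints. Everything here is an illustrative assumption: the film records, the attribute names, and in particular the choice to give half credit when the data says nothing about a constraint.

```python
# Hypothetical film records; "Film B" lacks actor information on purpose.
films = [
    {"title": "Film A", "genre": "action", "director_origin": "Hong Kong",
     "actor_skill": "martial arts"},
    {"title": "Film B", "genre": "action", "director_origin": "Hong Kong"},
    {"title": "Film C", "genre": "drama", "director_origin": "France",
     "actor_skill": "none"},
]

# The user's need expressed as attribute constraints (assumed encoding).
constraints = {
    "genre": "action",
    "director_origin": "Hong Kong",
    "actor_skill": "martial arts",
}


def score(item: dict, wanted: dict, unknown_credit: float = 0.5) -> float:
    """Fraction of constraints satisfied; missing data earns partial credit."""
    total = 0.0
    for key, value in wanted.items():
        if key not in item:
            total += unknown_credit  # incomplete data: neither match nor mismatch
        elif item[key] == value:
            total += 1.0
    return total / len(wanted)


def rank(items: list, wanted: dict) -> list:
    """Return (score, title) pairs with a non-zero score, best first."""
    scored = [(score(item, wanted), item["title"]) for item in items]
    return sorted((pair for pair in scored if pair[0] > 0), key=lambda pair: -pair[0])


ranked = rank(films, constraints)
print(ranked)

# Dropping a constraint only requires changing the dictionary.
relaxed = {key: value for key, value in constraints.items() if key != "actor_skill"}
print(rank(films, relaxed))
```

Because incomplete records are scored rather than filtered out, "Film B" stays in the result list below the full match instead of disappearing.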

  22. The Search Space: different representations

  23. The search space: different representations • Unstructured search space • The Web of documents (textual and multimedia content) • Structured search space • The Web of data (ontologies + knowledge bases) • Hybrid search space • Unstructured content is enriched with metadata • Embedded annotations • Non-embedded annotations

  24. The unstructured search space • The Web of human-understandable content. • The Web of documents and links • <a href="http://creativecommons.org/licenses/by/3.0/">CC License</a> Documents Search space

  25. Search engines

  26. The structured search space • The Web of machine-understandable content. • The Web of objects and relations • <a rel="license" href="http://creativecommons.org/licenses/by/3.0/"> Creative Commons License </a> objects Search space
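The difference between the two anchor tags on slides 24 and 26 is the `rel` attribute, which names the relation a link expresses. The sketch below shows one way a program could read that relation using Python's standard `html.parser` module. The class and function names are hypothetical, and this is not a description of how any search engine mentioned in the deck processes pages.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (relation, target) pairs from <a> tags that carry a rel attribute."""

    def __init__(self):
        super().__init__()
        self.relations = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            # rel may hold several space-separated relation names
            for relation in rel.split():
                self.relations.append((relation, href))


def extract_relations(html: str) -> list:
    parser = RelLinkParser()
    parser.feed(html)
    parser.close()
    return parser.relations


# The two snippets shown on the slides.
plain = '<a href="http://creativecommons.org/licenses/by/3.0/">CC License</a>'
typed = ('<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
         ' Creative Commons License </a>')

print(extract_relations(plain))
print(extract_relations(typed))
```

The plain link yields nothing a machine can interpret beyond "there is a link", whereas the typed link yields a relation named `license` pointing at the license URL.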

  27. Search engines

  28. The hybrid search space • Enriching documents with metadata • Figure: documents and objects in the search space • How to interlink documents and data?

  29. Two ways of interlinking metadata and documents • Information Extraction • By relying on Web publishers • More in the section Data on the (Semantic) Web

  30. Search engines
