
Making the Web searchable, or the Future of Web Search

Making the Web searchable, or the Future of Web Search. Peter Mika Yahoo! Research Barcelona. Overview. Why a new vision? Context Semantic Web: metadata infrastructure Web 2.0: user-generated metadata Thesis: making the Web searchable Research challenges (SW & IR) Conclusion. Motivation.


Presentation Transcript


  1. Making the Web searchable, or the Future of Web Search Peter Mika Yahoo! Research Barcelona

  2. Overview • Why a new vision? • Context • Semantic Web: metadata infrastructure • Web 2.0: user-generated metadata • Thesis: making the Web searchable • Research challenges (SW & IR) • Conclusion

  3. Motivation • State of Web search • Picked the low-hanging fruit • Heavy investments, marginal returns • High-hanging fruit • Hard searches remain… • The Web has changed…

  4. Hard searches • Ambiguous searches • Paris Hilton • Multimedia search • Images of Paris Hilton • Imprecise or overly precise searches • Publications by Jim Hendler • Find images of strong and adventurous people (Lenat) • Searches for descriptions • Search for yourself without using your name • Product search (ads!) • Searches that require aggregation • Size of the Eiffel Tower (Lenat) • Public opinion on Britney Spears • Queries that require a deeper understanding of the query, the content and/or the world at large • Note: some of these are so hard that users don’t even try them any more

  5. Example…

  6. The Semantic Web (1996-…) • Making the content of the Web machine processable through metadata • Documents, databases, Web services • Active research, standardization, startups • Ontology languages (RDF, OWL family), query language for RDF (SPARQL) • Software support (metadata stores, reasoners, APIs)

  7. Problem: difficulties in deployment • Not enough take-up in the Web community at large • Technological challenges • Discovery • Ontology learning • Ontology mapping • Lack of attention to the social side • Over-estimating complexity for users • Need for supporting ontology creation and sharing • Focus shifts from documents to databases --the Web of Data • Enterprise/closed community applications

  8. Web 2.0 (2003-) • Simple, nimble, socially transparent interfaces • Simplified KR • e.g. tagging, microformats, Wikipedia infoboxes • In exchange for a better experience, users are willing to • Provide content, markup and metadata • Provide data on themselves and their networks • Rank, rate, filter, forward • Develop software and improve your site • …

  9. Problem: lack of foundations • No shared syntax or semantics • No linking mechanism • Example: tag semantics • flickr:ajax = del.icio.us:ajax ? • flickr:ajax:Peter = flickr:ajax:John ? • flickr:ajax:Peter:1990 = flickr:ajax:Peter:2006? • Microformats • Separate agreement required for each format

  10. Thesis: making the Web searchable • The Web has changed • Content owners want their content to be found (Web 2.0) • Cf. findability (Peter Morville), reusability (mashups), open data movement • Foundations are laid for a Semantic Web • We need to • Combine the best of Web 2.0 and the Semantic Web • Reconsider Web IR in this new world

  11. Semantic Web 2.0 • Getting the representation right • RDF++ • RDFa (RDF-in-HTML) • Innovations on the interface side • Semantic Wikis • New methods of reasoning • Semantics = syntax + statistics • Bottom-up, emergent semantics • Methods of logical reasoning combined with methods of graph mining, statistics • Scalability • Giving up soundness and/or completeness • Dealing with the mess • Social engineering • Collaborative spaces for creating and sharing ontologies, data • Connecting islands of semantics • Best practices, documentation, advocacy

  12. Example: Freebase

  13. Example: machine tags

  14. Example: folksonomies • Simplified view: “tags are just anchortext” • Can be used to generate simple co-occurrence graphs [Figure: tags hilton, paris, eiffel linked to url1, url2, url3]
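A minimal sketch of the co-occurrence idea on this slide, assuming tags are attached to URLs the way anchor text is. The tag-to-URL pairs below are hypothetical (the edges of the slide's figure are not recoverable from the transcript), and the function name `tag_cooccurrence` is illustrative only. Two tags are linked with a weight equal to the number of URLs that carry both.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (tag, url) annotations; the real edges from the slide are unknown.
annotations = [
    ("hilton", "url1"),
    ("paris", "url1"),
    ("paris", "url2"),
    ("eiffel", "url2"),
    ("eiffel", "url3"),
]


def tag_cooccurrence(pairs):
    """Return {(tag_a, tag_b): number of URLs annotated with both tags}."""
    tags_by_url = defaultdict(set)
    for tag, url in pairs:
        tags_by_url[url].add(tag)

    weights = defaultdict(int)
    for tags in tags_by_url.values():
        # Sort so each undirected edge has a single canonical key.
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return dict(weights)


cooccurrence = tag_cooccurrence(annotations)
for (a, b), weight in sorted(cooccurrence.items()):
    print(a, b, weight)
```

With this sample data, "paris" ends up connected to both "hilton" and "eiffel", which is the kind of ambiguity a plain co-occurrence graph cannot resolve by itself.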

  15. [Figure: diagram with labels A and B]

  16. The more complete picture • Folksonomies as tripartite graphs of users, urls and tags [Figure: user1, user2, user3 linked to tags hilton, paris, eiffel and to url1, url2, url3]
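A rough sketch of how a tripartite folksonomy might be folded into a one-mode tag graph, assuming each annotation is a (user, tag, url) triple. The triples are hypothetical sample data and `project` is an illustrative helper, not a method taken from the slides. Projecting through users and projecting through URLs can give different tag networks, which is the extra information the user dimension contributes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, tag, url) triples; the slide's actual edges are unknown.
triples = [
    ("user1", "hilton", "url1"),
    ("user2", "paris", "url1"),
    ("user2", "paris", "url2"),
    ("user3", "paris", "url2"),
    ("user3", "eiffel", "url3"),
]

POSITION = {"user": 0, "tag": 1, "url": 2}


def project(data, keep, via):
    """Fold the tripartite graph into a graph over `keep` nodes.

    Two `keep` nodes are linked with a weight equal to the number of
    distinct `via` nodes they share.
    """
    k, v = POSITION[keep], POSITION[via]
    members = defaultdict(set)
    for triple in data:
        members[triple[v]].add(triple[k])

    weights = defaultdict(int)
    for items in members.values():
        for a, b in combinations(sorted(items), 2):
            weights[(a, b)] += 1
    return dict(weights)


tags_via_users = project(triples, keep="tag", via="user")
tags_via_urls = project(triples, keep="tag", via="url")
print(tags_via_users)
print(tags_via_urls)
```

In this sample, the user-based projection links "paris" to "eiffel" (user3 uses both), while the URL-based projection links "paris" to "hilton" (both appear on url1).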

  17. Community-based ontology mining • Opportunities for mining community-specific interpretations of the world • Peter Mika. Ontologies are us: A unified model of social networks and semantics. Journal of Web Semantics 5 (1), pages 5-15, 2007

  18. Web IR 2.0 • Keep on improving machine technology • NLP • Information Extraction • Exploit the users for the tasks that are hard for the machine • Encourage and support users • Exploit user-generated metadata in any shape or form • Support standards of the SW architecture

  19. Vision: ontology-based search • Query: at the knowledge level • Partial description of a class/instance • Mapping of queries and resources in the conceptual space • Computing relevance in semantic terms • Novel user interfaces

  20. Ideal world • Plenty of precise metadata to harvest • User intent can be captured directly as a SPARQL query • Single ontology used both by the query and the knowledge base • Executed on a single knowledge base, gives the correct, single answer

  21. Technical challenges • Query interface • Data quality • Cleaning up metadata, tags • Spam • Ontology mapping and entity resolution • Ranking across types • Results display • How do you avoid information overload? • How do you display information you partially understand?

  22. Social challenges • Getting the users on your side • Users are unwilling to submit large amounts of structured data to a commercial entity (Google Base) • Provide a clear motivation and/or instant gratification • Trust them… but not too much (Mahalo)

  23. Example: Technorati and microformats <a href="http://technorati.com/tag/semweb" rel="tag">Semantic Web</a> http://technorati.com/posts/tag/semanticweb
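As an illustration of how the rel="tag" markup on this slide could be harvested, here is a small sketch using only the Python standard library. It follows the rel-tag convention that the tag is the last path segment of the link's href (so "semweb" here, not the link text). The class and function names are hypothetical, and this is not a description of how Technorati itself processes pages.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collects tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing.
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "tag" in rel_values and href:
            path = urlparse(href).path.rstrip("/")
            # The tag is the last path segment, URL-decoded.
            self.tags.append(unquote(path.rsplit("/", 1)[-1]))


def extract_rel_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags


snippet = '<a href="http://technorati.com/tag/semweb" rel="tag">Semantic Web</a>'
found = extract_rel_tags(snippet)
print(found)
```

Links without rel="tag" are ignored, so ordinary navigation links on the same page do not pollute the tag list.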

  24. Example: openacademia.org and RDFa <span class="foaf:Person" property="foaf:name" about="#peter_mika"> Peter Mika </span>
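The markup on this slide can be read as two statements about the resource "#peter_mika". The sketch below pulls them out with the standard library's HTML parser. It is a simplified illustration, not a conforming RDFa processor: it handles only flat, non-nested elements, does not expand prefixes such as "foaf:", and treats the `class` attribute as the resource's type, which is an assumption about what the slide's markup intends (the RDFa Recommendation uses `typeof` for this). The class name `PropertyTextParser` is hypothetical.

```python
from html.parser import HTMLParser


class PropertyTextParser(HTMLParser):
    """Extracts (subject, predicate, object) tuples from flat elements
    carrying both `about` and `property` attributes."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._current = None  # (subject, property) of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        prop = attributes.get("property")
        if subject and prop:
            if attributes.get("class"):
                # Assumption: class names the type, as the slide's markup suggests.
                self.triples.append((subject, "rdf:type", attributes["class"]))
            self._current = (subject, prop)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: any end tag closes the current element (no nesting).
        if self._current is not None:
            subject, prop = self._current
            self.triples.append((subject, prop, "".join(self._text).strip()))
            self._current = None


snippet = (
    '<span class="foaf:Person" property="foaf:name" about="#peter_mika">'
    " Peter Mika </span>"
)
parser = PropertyTextParser()
parser.feed(snippet)
for triple in parser.triples:
    print(triple)
```

For real pages, a dedicated RDFa library would be the safer choice, since the full processing rules cover nesting, prefix resolution, and several attributes this sketch ignores.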

  25. Conclusion • Why a new vision? • The opportunity: convergence • Semantic Web: metadata infrastructure • Web 2.0: user-generated metadata • Thesis: making the Web searchable • Research challenges

  26. What is there to gain? • Knowledge-based search • Sorting out hard searches • Creating new information needs • Beyond search • Analysis, design, diagnosis etc. on top of aggregated data • Personalization • Rich user profiles • Monetization • No more “buy virgins on eBay”

  27. Questions? • Peter Mika. Social Networks and the Semantic Web. Springer, July, 2007. • Special Issue on the Semantic Web and Web 2.0, Journal of Web Semantics, December, 2007.
