
Web 2.0, Tagging, Search engines, RawSugar

Frank Smadja, RawSugar, May 2006


Presentation Transcript


  1. Web 2.0, Tagging, Search engines, RawSugar
     Frank Smadja, RawSugar, May 2006

  2. What is Web 2.0
     Tim O’Reilly: Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications are those that make the most of the intrinsic advantages of that platform: delivering software as a continually-updated service that gets better the more people use it, consuming and remixing data from multiple sources, including individual users, while providing their own data and services in a form that allows remixing by others, creating network effects through an "architecture of participation," and going beyond the page metaphor of Web 1.0 to deliver rich user experiences.
     http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

  3. What is Web 2.0?
     Social Web – “Wisdom of Crowds”
     • Users are publishers
     • Network effect – SHARE
     • E.g.: blogger.com, flickr, youtube, del.icio.us, tadalist.com, i4giveu.com
     Technology:
     • Software delivery: hours; users are testers
     • AJAX (more later)
     • E.g.: 30Boxes, Writely, Google Calendar
     Business model:
     • Free for users, paid advertisements
     • Share revenues with users
     • E.g.: Google AdSense, simpy, RawSugar
     • Pageviews => $$$$

  4. Social Web – Wisdom of Crowds
     • Diversity of opinion
     • Independence of members from one another
     • Decentralization
     • A good method for aggregating opinions
     Show: Digg, amazon.com, Yahoo! Movies

  5. What is Tagging? (From Gary Larson)

  6. Tagging Example

  7. Before Tagging: Classification
     • Too hard to classify
     • Too expensive
     • Not scalable
     • Yahoo! directory
     • Dmoz
     • Semantic Web

  8. Categorization is hard!! (From Rashmi Sinha)
     Object worth remembering (article, image…)
     Multiple concepts are activated
     Choose ONE of the activated concepts
     Categorize it!
     Analysis-Paralysis!

  9. Tagging is simpler (From Rashmi Sinha)
     Object worth remembering (article, image…)
     Multiple concepts are activated
     Tag it! Note all concepts

  10. The Personal to the Social (From Rashmi Sinha)

  11. Tagging is a reality
      • Bookmarkers tag:
        • Delicious, RawSugar, Shadows, Simpy, Blinklist, …
      • Bloggers tag:
        • 27 million blogs, doubling every 6 months
        • 1/3 of blog posts now use tags (or categories)
      • Many more:
        • BBC: news site
        • Digg: news
        • YouTube: video
        • Flickr: photo publishing and tagging
        • Enterprise? Museums? Cell phones?
      Most user-generated content is tagged!

  12. What Tagging is NOT
      • NOT: generous and altruistic people classifying the Web for the sake of the community
      • NOT: smart software automatically classifying Web pages and tagging them
      • NOT: a collaborative way to classify the Web into a growing giant ontology (folksonomy)

  13. So why do People Tag?
      • Recovery/sharing of personal information:
        • Bookmarks
        • Photos
        • Videos, etc.
      • Increased traffic and findability
        • Bloggers
      • Social reward
      • Advertisement $
      Tagging brings value to the tagger

  14. Why is Tagging successful?
      • Tagging is free
      • Tagging is easy
      • Tagging brings value
      [Marlow, Naaman, Boyd & Davis 2006]

  15. RawSugar
      • Covers the last mile of search
      • Provides Guided Search on tagged pages
      • Publish guided search
      • Provide guided search to your site or blog
      • Get more traffic
      • Receive advertising revenues!
      Search and Explore
      • Navigate by topics, people, directories
      • Find experts

  16. Nothing to eat here!

  17. Still no food here!

  18. Bingo!

  19. What’s Great, What’s not Great?
      • Great:
        • You know what you’re looking for:
          • “Zibibbo restaurant”
      • Not so great:
        • You’re hungry!
        • You want to browse: discover information, explore.
        • You want to know what is popular (“restaurants, digital camera, Java Tutorial, Free Games, etc.”)

  20. State of the art: The Last Mile of Search
      • 83% unhappy with search results (WSJ survey)
      • Most searches point to a list of content websites and directories
      • Navigation of these sites is cumbersome and tedious
      • Google's two-step approach:
        • Search “restaurants”
        • While (true) { explore guide; }
        • Change the query and repeat
      “The last mile of search”
      Examples: Digital Camera, Palo Alto bike, Daily Kos, Sprol dot Com

  21. Where is the last mile?
      Google stops here
      Human knowledge:
      • Small and mid-size websites and blogs
      • Content is organized manually by humans:
        • Categorization
        • Recommendations
      • Poor search and navigation
      • Each directory is an island of information and does not connect to related directories

  22. What’s Missing? Browsing with Facets
      “Easy to discover information without prior knowledge of collection contents”
      Faceted Search Paradigm
      Not new:
      • Library systems: “American history”, “Shakespeare”, etc.
      • Search engines: Endeca, Shopping.com, Yahoo! Directories, Dmoz, etc.
      • Google/MSN/Yahoo! Local Search: browse by location
      • Current uses: e-commerce
      Problems:
      • Maintained by humans – expensive
      • Rely on a world order – brittle
      • Facets use a controlled vocabulary – not easy to define
      => Not scalable
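
The faceted search paradigm on this slide can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `ITEMS` collection, the hand-written `FACETS` vocabulary, and the function names are hypothetical, not RawSugar's or Endeca's implementation. It shows why a controlled vocabulary is needed: facet counts can only be computed for tags that someone has already assigned to a facet.

```python
from collections import Counter

# Hypothetical toy collection: each item carries a flat set of tags.
ITEMS = {
    "zibibbo": {"restaurant", "palo alto", "mediterranean"},
    "sushi-ya": {"restaurant", "palo alto", "japanese"},
    "ebisu": {"restaurant", "san francisco", "japanese"},
}

# Hypothetical controlled vocabulary: facet name -> tags belonging to it.
# Maintaining this by hand is the "expensive, brittle" part the slide names.
FACETS = {
    "location": {"palo alto", "san francisco"},
    "cuisine": {"mediterranean", "japanese"},
}


def search(items, required_tags):
    """Return the items that carry every required tag."""
    return {name: tags for name, tags in items.items() if required_tags <= tags}


def facet_counts(results, facets):
    """Count, per facet, how many results carry each facet value."""
    counts = {facet: Counter() for facet in facets}
    for tags in results.values():
        for facet, vocabulary in facets.items():
            counts[facet].update(tags & vocabulary)
    return counts


results = search(ITEMS, {"restaurant"})
print(facet_counts(results, FACETS))
# Refining means adding a facet value to the query.
print(sorted(search(ITEMS, {"restaurant", "japanese"})))
```

Refinement is just another call to `search` with one more required tag, which is what a "Refine your search" panel would trigger.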

  23. Amazon – Faceted Search: search for Tel Aviv

  24. Shopping.com Faceted Search: search for Tel Aviv

  25. RawSugar Faceted Search: refine your search

  26. RawSugar Faceted Search: Juniorbonner on del.icio.us vs. Juniorbonner on RawSugar

  27. RawSugar Into the Last Mile: RawSugar inside

  28. RawSugar Into the Last Mile: RawSugar inside

  29. RawSugar Faceted Search in the last mile: Daily Kos blog, search for Iran on RawSugar

  30. RawSugar Technology

  31. Problem 1: Searching the TagSpace
      How would you tag this? How would you search for it?
      Tags: Ikura, Uni, Ebi, Sushi, Nigiri, Japanese food, lunch in Tokyo, Ezobafun-uni, Kitamurashiuni, Murasakiuni, Akazaebi, Tenagaebi, etc.

  32. Problem 2: Exploring the TagSpace
      [Screenshot callouts: Locations; Restaurant type; Morphology; Not a restaurant!]

  33. Problem 3: Exploring the TagSpace
      Not usable!

  34. RawSugar – Tag Hierarchy: Guided Navigation
      [Screenshot callouts: Food groups; Origins groups; Locations groups]

  35. RawSugar Tag Hierarchy
      • Key idea: some users (4%) define tag hierarchies (food>sushi, european>spanish, …)
      • We mine this tag space to learn simple tag relations (ISA and RELATED) using statistics.
      • At search time, we apply this learned knowledge to group tags from results.
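
The slide does not say which statistics are used, so the following Python sketch is only one plausible reading, under stated assumptions: each user contributes `(parent, child)` pairs, a pair becomes an ISA relation once at least `min_users` distinct users declare it and fewer users declare the reverse, and results are grouped by learned parent. The names `learn_isa`, `group_tags`, and `min_users` are hypothetical.

```python
from collections import Counter, defaultdict


def learn_isa(user_hierarchies, min_users=2):
    """Learn (parent, child) ISA pairs by simple voting across users.

    Assumption: a pair is kept when at least `min_users` distinct users
    declare it and it outvotes the reversed pair.
    """
    votes = Counter()
    for pairs in user_hierarchies.values():
        for parent, child in set(pairs):  # one vote per user per pair
            votes[(parent, child)] += 1
    return {
        (parent, child)
        for (parent, child), n in votes.items()
        if n >= min_users and n > votes[(child, parent)]
    }


def group_tags(result_tags, isa):
    """At search time, group the tags found in results under learned parents."""
    groups = defaultdict(list)
    for parent, child in sorted(isa):
        if child in result_tags:
            groups[parent].append(child)
    return dict(groups)


# Hypothetical declarations from the small share of users who build hierarchies.
declared = {
    "user1": [("food", "sushi"), ("european", "spanish")],
    "user2": [("food", "sushi")],
    "user3": [("sushi", "food")],  # a dissenting, reversed declaration
}
isa = learn_isa(declared)
print(isa)
print(group_tags({"sushi", "tokyo"}, isa))
```

The voting threshold is the simplest filter against one user's idiosyncratic hierarchy; a real system would likely also weigh co-occurrence counts from the untagged majority.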

  36. RawSugar – Guided Search: Combining Hierarchy Fragments
      [Diagram: hierarchy fragments from Users 1 to 5 combined into one hierarchy. Tags shown: food, europe, cooking, recipes, UK, Scotland, Edinburgh, Spain, Asian, Chinese, Italy, Thai, Southwest, vegetarian, California, Sushi, Bay Area, San Francisco, Texas]
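
Combining hierarchy fragments can be sketched as a union of edges. The diagram's exact edges are not recoverable from the transcript, so the fragments below are hypothetical, as are the function names; the sketch only shows how short fragments from different users can chain into a deeper hierarchy that no single user defined.

```python
from collections import defaultdict


def merge_fragments(fragments):
    """Union per-user (parent, child) edges into one parent -> children map."""
    children = defaultdict(set)
    for edges in fragments:
        for parent, child in edges:
            children[parent].add(child)
    return children


def descendants(children, tag):
    """Collect every tag reachable below `tag` (safe even if edges form a cycle)."""
    seen = set()
    stack = [tag]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical fragments, one list of edges per user.
fragments = [
    [("europe", "UK")],
    [("UK", "Scotland")],
    [("Scotland", "Edinburgh"), ("europe", "Spain")],
]
merged = merge_fragments(fragments)
print(sorted(descendants(merged, "europe")))
```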

  37. RawSugar: Mining and Clustering
      • Related tags: tags that are related (collocations, synonymy, antonymy, ISA, HASA, …)
      • Related pages: pages tagged similarly
      • Related people: people with similar interests
      [Diagram: the RawSugar TagSpace links Tags, Pages, and People; examples shown: sailing, cycling group]
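
"Pages tagged similarly" can be made concrete with a set-overlap measure. The sketch below uses Jaccard similarity, which is an assumption chosen for illustration; the slides do not name the measure RawSugar uses. The same function works for related people if each person is represented by the set of tags they have used.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_related(target, tag_sets, top_n=3):
    """Rank other pages (or people) by tag overlap with `target`."""
    scored = [
        (jaccard(tag_sets[target], tags), name)
        for name, tags in tag_sets.items()
        if name != target
    ]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by name
    return [name for _, name in scored[:top_n]]


# Hypothetical pages and their tags.
pages = {
    "a": {"sailing", "boats"},
    "b": {"sailing", "boats", "racing"},
    "c": {"cycling"},
    "d": {"sailing"},
}
print(most_related("a", pages))
```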

  38. Related work
      • Rashmi Sinha: “Tag Sorting: Another tool in an information architect's toolbox” http://www.rashmisinha.com/archives/05_02/tag-sorting.html
      • Emanuele Quintarelli: “Hierarchical taxonomies from flat tag spaces” http://www.infospaces.it/wordpress/topics/information-architecture/91
      • Paul Heymann (Stanford): “Tag Hierarchies” http://i.stanford.edu/~heymann/taghierarchy.html
      • Brooks, Montanez (University of San Francisco): “Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering” http://www.cs.usfca.edu/~brooks/papers/brooks-montanez-www06.pdf
      • Siderean fac.etio.us: “Faceted search on delicious tags” http://www.siderean.com/delicious/facetious.jsp
      • Marti Hearst: “Clustering vs. Faceted Search” http://bailando.sims.berkeley.edu/papers/cacm06.pdf
      • And more …

  39. Conclusion: Questions?

  40. Backup Technology Slides

  41. What should we do? Smart Backend – Easy Tagging
      • “Tag relations improve searchability and exploration.”
      • Similar tags:
        • Spelling and morphology: macos <-> mac_os <-> mac os; tagging <-> tags <-> tagged
        • Synonyms: macos <-> tiger; films <-> movies; new york <-> nyc
        • Related: cooking <-> recipes; software development <-> programming
      • Tag groups or subtags:
        • Location -> san francisco, london, new york, etc.
        • Food -> sushi, sashimi, pizza, etc.
        • Programming -> html, java, css, etc.
      Goal: discover them by mining the tag space
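
The spelling and synonym cases above can be sketched in a few lines of Python. This is an illustration only: the `SYNONYMS` table is hand-written here, whereas the slide's goal is to discover such relations by mining the tag space, and morphological variants (tagging, tags, tagged) would need a stemmer that this sketch leaves out. All names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical hand-made synonym table, keyed by normalized spelling.
SYNONYMS = {"nyc": "newyork", "movies": "films"}


def normalize(tag):
    """Fold case and drop spaces, underscores, and hyphens, then map synonyms."""
    key = re.sub(r"[\s_\-]+", "", tag.strip().lower())
    return SYNONYMS.get(key, key)


def group_variants(tags):
    """Group raw tags that normalize to the same canonical key."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize(tag)].add(tag)
    return {key: sorted(variants) for key, variants in groups.items()}


print(group_variants(["macos", "mac_os", "mac os", "nyc", "new york"]))
```

Keeping the raw tags alongside the canonical key means the frontend can still display what users typed while the backend searches on the normalized form.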

  42. What should we do? Smart Backend – Friendly Frontend
      • Backend should not dictate frontend (Patrick Schmitz, Berkeley/Yahoo!)
      • Smart processing is done by the backend, under the hood.
      • Tagging should be as effortless as possible: assisted but not automatic. Fight Analysis-Paralysis (Rashmi Sinha)
      • Systems should be built to incite people to tag. Bring value to the tagger

  43. What is Missing? Tag relations
      • “Tag relations improve searchability and exploration.”
      • Similar tags:
        • Spelling and morphology: macos <-> mac_os <-> mac os; tagging <-> tags <-> tagged
        • Synonyms: macos <-> tiger; films <-> movies; new york <-> nyc
        • Related: cooking <-> recipes; software development <-> programming
      • Tag groups or subtags:
        • Location -> san francisco, london, new york, etc.
        • Food -> sushi, sashimi, pizza, etc.
        • Programming -> html, java, css, etc.
      Goal: discover them by mining the tag space

  44. Flickr – Clusters

  45. Clustering – Step 1: Similarity among tags

  46. Some good clusters found

  47. Tags that belong to the same clusters

  48. Dmoz – World Order

  49. Dmoz – World Order

  50. Recommendations: dpreview
