A Web of Concepts

This presentation discusses the concept of creating a semantically rich aggregate view of information on the web by transforming hyperlinked bags of words. It explores instances, instance representation, domain, usage studies, extraction techniques, application optimization, challenges, related work, and future developments.


Presentation Transcript


  1. A Web of Concepts Dalvi, et al. Presented by Andrew Zitzelberger

  2. Vision • Transform hyperlinked bags of words into a semantically rich aggregate view of information on the web.

  3. Concept • Things of interest • Searching for information • Accomplishing a task • Reservations, etc.

  4. Instances • Record of a concept • Restaurant • Gochi (19980 Homestead Rd Cupertino CA) • Academia? • Publications, research institutions

  5. Instance Representation • Loosely-structured record (lrec) • Attribute-key, value pairs • Unique id field • Entity matching problem • Metadata • Attribute list
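
The lrec described on this slide can be pictured roughly as follows. This is only a minimal sketch, not the authors' data model: the class name `LRec`, its fields, and the `restaurant:gochi` id are hypothetical names chosen for illustration. The one thing taken from the slides is the shape (a unique id, attribute-key/value pairs, and an attribute list as metadata) and the Gochi example from slide 4.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LRec:
    """Loosely-structured record: a unique id plus attribute-key/value pairs."""
    id: str        # unique id field (hypothetical format)
    concept: str   # the concept this record is an instance of
    attributes: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, key: str, value: str) -> None:
        # No fixed schema: any key is accepted and may hold several values.
        self.attributes.setdefault(key, []).append(value)

    def attribute_list(self) -> List[str]:
        # Metadata: the attribute keys this record currently carries.
        return sorted(self.attributes)


gochi = LRec(id="restaurant:gochi", concept="restaurant")
gochi.add("name", "Gochi")
gochi.add("address", "19980 Homestead Rd Cupertino CA")
print(gochi.attribute_list())
```

Allowing several values per key is an assumption made here so that conflicting values from different sources can sit side by side until entity matching resolves them.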

  6. Domain • Set of related concepts • Academic community domain = {publications, people, conferences}

  7. Usage Study: Instance vs. Concept Search • yelp.com • Month of queries resulting in a click (restaurants) • 59% specific business URL • 19% search URL either specific business or group • 11% specific group URL

  8. Usage Study: Concept Attribute Search • Remove restaurant name and location information from query • Co-occurring words: • Menu (3%), coupons (1.8%), online, weekly specials, locations (1.5%) • Nutrition, to go, delivery, careers, cod

  9. Usage Study: Aggregation Value • 59% clicked on at least one other URL • 35% clicked on at least two other URLs • Small manual evaluation indicates pages are often about the same business.

  10. Usage Study: Concepts vs. Browsing • 42% of homepage visits are from search engine • Immediately following URL • 11.5% location • 9% menu • 1% coupons • 10.5% of user trails contain more than one distinct instance of the restaurant concept

  11. Extraction • Create new records from the web • Information extraction • Linking • Analysis • Meta-data tagging (cuisine type)

  12. Domain-centric vs. Site-centric Extraction • Site-centric extraction • Wrappers for page structure • Probabilistic models (CRF) • Domain-centric extraction • Fields of interest • Statistical properties (single zip code, etc.) • Structure components (lists, link relationships)
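
The "single zip code" statistical property can be illustrated with a small check. This is a sketch under stated assumptions, not the extractor from the paper: the function names, the regular expression, and the sample addresses are all made up for illustration. It assumes US-style addresses in which the ZIP code follows a two-letter state code, which keeps a five-digit street number from being mistaken for a ZIP.

```python
import re

# Assumed pattern: two-letter state code, whitespace, 5-digit ZIP (optional +4).
ZIP_AFTER_STATE = re.compile(r"\b[A-Z]{2}\s+(\d{5})(?:-\d{4})?\b")


def distinct_zip_codes(page_text):
    """Return the set of distinct 5-digit ZIP codes found on a page."""
    return set(ZIP_AFTER_STATE.findall(page_text))


def looks_like_single_instance_page(page_text):
    """Heuristic: a page about one restaurant mentions exactly one ZIP code."""
    return len(distinct_zip_codes(page_text)) == 1


# Hypothetical page texts, not taken from any real site.
detail_page = "Example Diner, 12345 Main St, Springfield, CA 90001. Open daily."
listing_page = ("Example Diner, Springfield, CA 90001. "
                "Sample Cafe, Shelbyville, CA 90002.")

print(looks_like_single_instance_page(detail_page))
print(looks_like_single_instance_page(listing_page))
```

A check like this depends only on a property of the domain, not on any one site's page layout, which is the contrast with wrapper-based site-centric extraction that the slide draws.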

  13. Domain-centric Extraction • Aggregator mining • Learn from extracted knowledge (similar menus) • Matching • Text is “about” a record (restaurant review)
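
Matching, in the sense of deciding which record a piece of text is "about", might be sketched as a token-overlap score. This is a deliberately naive stand-in, not the matching method from the paper; the function names, the second record, and the review sentence are hypothetical. Only the Gochi name and address come from slide 4.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(text, record):
    """Fraction of the record's attribute-value tokens that occur in the text."""
    record_tokens = set()
    for value in record.values():
        record_tokens |= tokens(value)
    if not record_tokens:
        return 0.0
    return len(record_tokens & tokens(text)) / len(record_tokens)


def best_match(text, records):
    """Id of the record with the highest overlap score for this text."""
    return max(records, key=lambda rid: match_score(text, records[rid]))


records = {
    "gochi": {"name": "Gochi", "address": "19980 Homestead Rd Cupertino CA"},
    "birks": {"name": "Birk's Steakhouse"},  # hypothetical second record
}
review = "We loved dinner at Gochi in Cupertino last night."  # made-up text
print(best_match(review, records))
```

A real matcher would need to weight rare tokens more heavily and reject low-scoring text outright; this sketch always returns some record, even when nothing matches well.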

  14. Application: Aggregation

  15. Application: Session Optimization • User understanding • Historical modeling • Session modeling • Content understanding • Example: Birks • Birks and Mayors (luxury jewelers) vs. Birk’s Steakhouse

  16. Application: Browse Optimization • Alternatives: (Restaurants) • Similar type of cuisine • Similar location • Similar quality • Augmentations: (Camera) • Batteries • Memory cards
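
One way to read the "alternatives" idea is as a ranking of other records by how many of the three listed dimensions they share with the one being viewed. The sketch below is purely illustrative: the attribute names (`cuisine`, `city`, `rating`), the 0.5 rating tolerance, and all four records are assumptions, not details from the presentation.

```python
def similarity(a, b):
    """Count shared dimensions: cuisine, location, and comparable quality."""
    score = 0
    score += a["cuisine"] == b["cuisine"]              # similar type of cuisine
    score += a["city"] == b["city"]                    # similar location
    score += abs(a["rating"] - b["rating"]) <= 0.5     # similar quality (assumed tolerance)
    return score


def alternatives(target, candidates, k=2):
    """Top-k records most similar to the target, excluding the target itself."""
    others = [c for c in candidates if c["id"] != target["id"]]
    return sorted(others, key=lambda c: similarity(target, c), reverse=True)[:k]


# Hypothetical records for illustration only.
restaurants = [
    {"id": "r1", "cuisine": "japanese", "city": "Cupertino", "rating": 4.5},
    {"id": "r2", "cuisine": "japanese", "city": "Cupertino", "rating": 4.0},
    {"id": "r3", "cuisine": "japanese", "city": "San Jose", "rating": 3.0},
    {"id": "r4", "cuisine": "mexican", "city": "Cupertino", "rating": 4.5},
]
print([r["id"] for r in alternatives(restaurants[0], restaurants)])
```

Augmentations (batteries or memory cards for a camera) would need a different relation, one of complementary rather than similar records, and are not covered by this sketch.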

  17. Concept Search • Result Pages – shows multiple records • Concept Pages – information about an instance • Article Pages – a piece of authored text

  18. Advertising • Increase in targeted advertisements • Target concepts rather than keywords

  19. Challenges • Transfer learning • Transfer extractor knowledge • Tracking uncertainty • Accuracy issues • “Web of concepts is not a one time affair” • Wrapper problems • Concept updates • Relevance Measures • User satisfaction

  20. Related Work • Information Extraction/Integration Systems • Dataspace Systems • Semantic Web

  21. Future Work • Enrich representation model • Path storage to data • Provenance, versions, uncertainty • Hierarchical relationships (containment or inheritance) • Ranking of disparate sources