Wrapper Induction for End-User Semantic Content Development


Presentation Transcript


  1. Wrapper Induction for End-User Semantic Content Development • Andrew Hogue, MIT CSAIL • Interaction Design and the Semantic Web

  2. Acknowledgments • David Karger (karger@csail.mit.edu) • Haystack Group (http://haystack.csail.mit.edu)

  3. Labeling the Semantic Web • The Semantic Web requires RDF labeling of semantic data • Most existing labeling methods are geared towards content providers • Existing end-user tools require knowledge of the page's underlying HTML • Goal: an easy interface for non-technical end users

  4. Labeling the Semantic Web • Our approach: create patterns for existing semantic content • The user provides examples of semantic content in the browser • Patterns are induced from the examples • Pattern matches provide content-specific context menus

  5. Labeling the Semantic Web • Extends the Haystack information management client • Provides context-sensitive menus • Matched patterns overlay semantic context on Web documents

  6. Demo

  7. Wrapper Induction • Wrapper: a pattern created from examples • The user provides positive examples • The examples are generalized into a reusable pattern • Existing techniques: head-left-right-tail (HLRT) descriptors, hidden Markov models, support vector machines, and other machine-learning methods

  8. Wrapper Induction • Our approach: take advantage of the hierarchical structure of HTML • Each example picks out a subtree of the DOM • Calculate the tree edit distance between examples • The least-cost edit sequence gives the best mapping between their nodes • Remove unmapped nodes to make the pattern

  9. Edit Distance • The least-cost sequence of operations that transforms one tree into the other • Operations: insert, delete, or change a node • Cost of an operation = size of the subtree it affects • Byproduct: the best mapping between elements of the two trees
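The edit-distance idea on slide 9 can be sketched in a few lines of Python. This is a simplified top-down (Selkow-style) variant, not necessarily the algorithm used in the talk. It assumes a tree is a `(label, children)` tuple, that inserting or deleting a node removes or adds its whole subtree at a cost equal to the subtree's size, and that changing a label costs 1. The names `size` and `edit_distance` are illustrative.

```python
# A tree is (label, children), where children is a tuple of trees.
# Assumption: insert/delete act on whole subtrees; relabeling a node costs 1.

def size(tree):
    """Number of nodes in the subtree rooted at `tree`."""
    _, children = tree
    return 1 + sum(size(child) for child in children)

def edit_distance(a, b):
    """Top-down edit distance between two ordered trees."""
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    # dp[i][j]: cheapest way to turn the first i children of a
    # into the first j children of b.
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])      # delete subtree
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])      # insert subtree
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),        # delete subtree
                dp[i][j - 1] + size(ys[j - 1]),        # insert subtree
                dp[i - 1][j - 1] + edit_distance(xs[i - 1], ys[j - 1]),  # map
            )
    return relabel + dp[-1][-1]

# Two made-up table rows that differ only by one <b> element.
row_a = ("tr", (("td", (("b", ()),)), ("td", ())))
row_b = ("tr", (("td", ()), ("td", ())))
print(edit_distance(row_a, row_b))  # 1
```

Recording which of the three choices wins in each cell would also recover the node mapping mentioned as a byproduct. The sketch returns only the cost.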

  10. Mapping Examples

  11. Underlying Structure • Each example is built with similar HTML • Only the text is different • Tree edit distance provides us with a mapping between examples • Create a general pattern by removing unmapped nodes and replacing them with wildcards
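As a rough illustration of slide 11, the sketch below generalizes two example subtrees into one pattern. It swaps in Python's `difflib.SequenceMatcher` to align child labels instead of a full tree edit distance, so it only approximates the mapping step. Trees are assumed to be `(tag, children)` tuples with plain strings as text leaves. `WILDCARD`, `label`, and `generalize` are hypothetical names, and the sample rows are made up.

```python
from difflib import SequenceMatcher

WILDCARD = "*"  # stands for "anything here" in the induced pattern

def label(node):
    """Tag name for element nodes, '#text' for text leaves."""
    return "#text" if isinstance(node, str) else node[0]

def generalize(a, b):
    """Keep what two examples share; replace the rest with wildcards."""
    if isinstance(a, str) and isinstance(b, str):
        return a if a == b else WILDCARD          # differing text
    if isinstance(a, str) or isinstance(b, str) or a[0] != b[0]:
        return WILDCARD                           # incompatible nodes
    xs, ys = a[1], b[1]
    matcher = SequenceMatcher(None, [label(x) for x in xs],
                              [label(y) for y in ys], autojunk=False)
    pattern, i, j = [], 0, 0
    for block in matcher.get_matching_blocks():
        # Children skipped on either side are unmapped: one wildcard.
        if (block.a > i or block.b > j) and (not pattern or pattern[-1] != WILDCARD):
            pattern.append(WILDCARD)
        for k in range(block.size):
            pattern.append(generalize(xs[block.a + k], ys[block.b + k]))
        i, j = block.a + block.size, block.b + block.size
    return (a[0], tuple(pattern))

ex1 = ("tr", (("td", ("3:30 PM",)), ("td", (("b", ("Talk A",)),))))
ex2 = ("tr", (("td", ("4:00 PM",)), ("td", (("b", ("Talk B",)), ("i", ("room",))))))
print(generalize(ex1, ex2))
```

For these two rows, the shared `tr`/`td`/`b` skeleton survives, the differing text becomes wildcards, and the extra `i` element in the second example collapses into a trailing wildcard.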

  12. Mapping Examples

  13. Mapping Examples

  14. Pattern Matching • Look for document subtrees with similar structure • Find alignments of the wrapper in the document tree • Require every node in the wrapper to be mapped to some node in the document subtree • Wildcards match zero or more nodes • Each valid alignment is a match
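The matching rules on slide 14 can be sketched as a small recursive matcher. This is a minimal illustration under assumptions, not the Haystack implementation: trees are `(tag, children)` tuples with strings as text leaves, a `"*"` entry among a pattern's children absorbs zero or more sibling nodes, and document nodes not covered by a wildcard must line up exactly with pattern nodes. All function names are hypothetical.

```python
WILDCARD = "*"

def matches(pattern, node):
    """True if `pattern` aligns with the subtree rooted at `node`."""
    if pattern == WILDCARD:
        return True
    if isinstance(pattern, str) or isinstance(node, str):
        return pattern == node                    # literal text must be equal
    return pattern[0] == node[0] and match_children(pattern[1], node[1])

def match_children(pats, nodes):
    """Align a sequence of pattern children with a sequence of nodes."""
    if not pats:
        return not nodes                          # nothing may be left over
    head, rest = pats[0], pats[1:]
    if head == WILDCARD:
        # The wildcard absorbs the first k nodes, for any k >= 0.
        return any(match_children(rest, nodes[k:]) for k in range(len(nodes) + 1))
    return bool(nodes) and matches(head, nodes[0]) and match_children(rest, nodes[1:])

def find_matches(pattern, root):
    """Every subtree of `root` that the pattern aligns with."""
    found = [root] if matches(pattern, root) else []
    if not isinstance(root, str):
        for child in root[1]:
            found.extend(find_matches(pattern, child))
    return found

pattern = ("tr", (("td", ("*",)), ("td", (("b", ("*",)), "*"))))
doc = ("table", (
    ("tr", (("th", ("Time",)),)),
    ("tr", (("td", ("3:30 PM",)), ("td", (("b", ("Some talk",)),)))),
))
print(len(find_matches(pattern, doc)))  # 1
```

Only the second row matches: the header row has a `th` where the wrapper requires a `td`, so not every wrapper node can be mapped.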

  15. Matching Example

  16. Matching Example

  17. Adding Semantics • How do we tie wrappers to semantic content? • Assert RDF statements tied to the wrapper's structure • Classes are bound to wrappers • Properties are bound to wildcards

  18. Semantic Labels

  19. Semantic Matching

  20. Semantic Matching

  21. Semantic Matching • [ <rdf:type> <TalkAnnouncement> ; <series> “Dertouzos Lect…” ; <dc:title> “Distributed Hash…” ; <time> “3:30 PM” ]
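To make the binding of classes to wrappers and properties to wildcards concrete, here is a hedged sketch that turns one matched subtree into a list of property/value statements shaped like the example on slide 21. It assumes each labeled wildcard (a hypothetical `Slot`) captures the text of exactly one node and that the pattern otherwise lines up child for child. The page content and the names `Slot`, `text_of`, `extract`, and `statements` are made up for illustration.

```python
from typing import NamedTuple

class Slot(NamedTuple):
    """A wildcard that has been labeled with an RDF property."""
    prop: str

def text_of(node):
    """Concatenated text content of a subtree."""
    if isinstance(node, str):
        return node
    return " ".join(text_of(child) for child in node[1])

def extract(pattern, node, found):
    """Match `pattern` against `node`, recording slot values in `found`."""
    if isinstance(pattern, Slot):
        found[pattern.prop] = text_of(node)
        return True
    if isinstance(pattern, str) or isinstance(node, str):
        return pattern == node
    tag, pats = pattern
    node_tag, kids = node
    if tag != node_tag or len(pats) != len(kids):
        return False
    return all(extract(p, k, found) for p, k in zip(pats, kids))

def statements(rdf_class, pattern, node):
    """Statements asserted for one match, or None if there is no match."""
    found = {}
    if not extract(pattern, node, found):
        return None
    return [("rdf:type", rdf_class)] + sorted(found.items())

# The wrapper is bound to a class; its wildcards are bound to properties.
wrapper = ("tr", (("td", (Slot("time"),)), ("td", (("b", (Slot("dc:title"),)),))))
row = ("tr", (("td", ("3:30 PM",)), ("td", (("b", ("Example Talk",)),))))
for prop, value in statements("TalkAnnouncement", wrapper, row):
    print(prop, value)
```

A real system would emit proper RDF resources and literals. Plain tuples are used here only to keep the sketch self-contained.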

  22. Additional Heuristics • Let us create more flexible, reusable patterns from as few as one example • List collapse • Context • Automatic additional examples • URL prefixes

  23. Our Contributions • Ease of use • Few examples required • Wrappers bridge the syntactic-semantic gap

  24. Future Work and Applications • Document-level classes • Mozilla port • “Push” wrappers • Page reformatting • Autonomous agent interaction • Wrapper sharing • Automatic wrapper induction

  25. Contact • ahogue@csail.mit.edu • http://haystack.csail.mit.edu
