Personalized Web Information Extraction and Management Strategies

CSE 574 Extracting, Managing & Personalizing Web Information • Staffing • Dan Weld • Raphael Hoffmann • Content • Intersection of AI, ML, DB & HCI • Student Responsibilities • Reading, Reports, Discussion • Project (for those taking 3 credits)

Class Focus Extracting, Managing & Personalizing Web Information

Why Information Extraction? • Next-Generation Search • CiteSeer, Google Scholar, MSRA Libra • Google product search • Flipdog • Zvents • Zoominfo • Question Answering

People

…Continued

…Continued Some More

Making Structured Content • Information Extraction • E.g. Google Scholar • Cons: Noisy • Communal Content Creation • E.g. Wikipedia • Cons: Bootstrapping & Incentives

Why Manage? • Select • Store, Index, Aggregate • Search, Query, Explore • Share, Collaborate, “Publish” • Example: Personalized Portals (cf. DBlife, Rexa, Dontcheva UIST-07)

DBlife

Summaries - 1

Summaries - 2

Summaries - 3

Summaries - 4

Summaries - 5

Summaries - 6

Why Personalize? • Because we can.

Preliminary Schedule • Information Extraction • Traditional Machine Learning Approaches • Self-Supervised Methods • Other Issues: Coreference & Ontology • Collaborative Content Creation & UI Issues • Applying Constraints from Interaction to Learning • Decision-Theoretic Interaction • Faceted Interfaces • Community Information Management • Extraction over Evolving Text • Data Provenance • Mashups & Personalized Web • Next-Generation Search • Inference, Textual Entailment, Machine Reading • Entity Search

For next time • Read • Agichtein, Gravano. Snowball: Extracting Relations from Large Plain-Text Collections. • Add yourself to the mailing list • Look at papers on the website wiki • Add new ones • Add a summary (different from a report) • Note if you wish to present one • Think about a project (form a group?)
