PLUIE: Probability and Logic Unified for Information Extraction • Stuart Russell, Patrick Gallinari, Patrice Perny
Project Goals • “Open” information extraction • Construct knowledge bases from the web • Learn new classes, relations, linguistic patterns • Learn new predictive regularities • Integrate facts, entities across multiple documents • Support question answering • Accuracy, consistency, integration, and utility; not scale for its own sake
Approach • Probabilistic inference with the Web as evidence • Generative models when available • [Diagram labels: World, Web]
Approach, contd. • Open-universe probability models (e.g., BLOG) • First-order expressive power (objects, relations, functions, quantifiers, equality, etc.) • Allow for uncertainty about existence, identity of objects • Generative model consists of • What might be true in the world • Who might choose to say what • How they might choose to say it
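The three-part generative model above (what is true, who says what, how they say it) can be illustrated with a toy sketch. This is plain Python, not BLOG syntax, and every name, prior, and template in it is a hypothetical choice made for illustration. It samples a world with an unknown number of people, lets two sources decide which facts to mention, renders the chosen facts as sentences, and then uses simple rejection sampling to estimate a posterior over how many people exist given the observed text.

```python
import random
from collections import Counter

# Hypothetical vocabularies; nothing here comes from the project itself.
NAMES = ["Ann", "Bob", "Cat"]
COMPANIES = ["Acme", "Initech"]
TEMPLATES = ["{name} works at {employer}.", "{name} is employed by {employer}."]
N_SOURCES = 2


def sample_world(rng):
    """What might be true: an unknown number of people, each with a name and employer."""
    n_people = rng.randint(1, 3)  # uncertainty about how many objects exist
    return [
        {"name": rng.choice(NAMES), "employer": rng.choice(COMPANIES)}
        for _ in range(n_people)
    ]


def sample_mentions(world, rng, p_mention=0.5):
    """Who might choose to say what: each source mentions each fact independently."""
    return [
        person
        for _ in range(N_SOURCES)
        for person in world
        if rng.random() < p_mention
    ]


def render(person, rng):
    """How they might choose to say it: pick one surface template at random."""
    return rng.choice(TEMPLATES).format(**person)


def sample_web(rng):
    """Run the full generative story once and return (world, sentences)."""
    world = sample_world(rng)
    sentences = [render(p, rng) for p in sample_mentions(world, rng)]
    return world, sentences


def posterior_num_people(observed, n_samples=20000, seed=0):
    """Rejection sampling: keep only worlds whose generated text equals the observed text."""
    rng = random.Random(seed)
    target = sorted(observed)
    counts = Counter()
    for _ in range(n_samples):
        world, sentences = sample_web(rng)
        if sorted(sentences) == target:
            counts[len(world)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: c / total for n, c in sorted(counts.items())}


if __name__ == "__main__":
    observed = ["Ann works at Acme.", "Ann is employed by Acme."]
    print(posterior_num_people(observed))
```

The two observed sentences could describe one Ann mentioned by both sources or two distinct people who share a name and employer, so the posterior spreads over several world sizes. Rejection sampling is used only because it is easy to read; it does not scale, which is one reason efficient inference remains an open question.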
Approach, contd. • Rigorous ontological framework • Standard taxonomic hierarchy that supports distinctions needed for language • E.g., mass nouns (water) vs count nouns (lake) • Proper treatment of events and time; avoid deficient “facts” such as • Man Utd beat Chelsea; Chelsea beat Man Utd (PowerSet) • Hank Paulson is the CEO of Goldman Sachs (NELL)
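One way to see why the two example “facts” are deficient is to attach each statement to an event or a time interval. The sketch below is a minimal illustration under stated assumptions, not the project's ontology: the class names, the `match_id` field, and all identifiers and years are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatchEvent:
    """A result tied to one specific match instead of a timeless 'beat' relation."""
    match_id: str  # hypothetical identifier for a particular game
    winner: str
    loser: str


@dataclass(frozen=True)
class RoleInterval:
    """A role that holds over a time interval rather than forever."""
    person: str
    role: str
    organization: str
    start_year: int
    end_year: Optional[int]  # None means the role is still held

    def holds_in(self, year: int) -> bool:
        """Return True if the role was held during the given year."""
        if year < self.start_year:
            return False
        return self.end_year is None or year <= self.end_year


def contradictory(a: MatchEvent, b: MatchEvent) -> bool:
    """Two results conflict only if they describe the same match with different winners."""
    return a.match_id == b.match_id and a.winner != b.winner


if __name__ == "__main__":
    # Placeholder matches: both results can be true because they are different events.
    first = MatchEvent("match-A", winner="Man Utd", loser="Chelsea")
    second = MatchEvent("match-B", winner="Chelsea", loser="Man Utd")
    print(contradictory(first, second))

    # Placeholder years, chosen only to show that a role can expire.
    role = RoleInterval("Person X", "CEO", "Company Y", start_year=2000, end_year=2005)
    print(role.holds_in(2003), role.holds_in(2010))
```

With event identifiers, “Man Utd beat Chelsea” and “Chelsea beat Man Utd” no longer look contradictory, and a role such as CEO can be queried for a particular year instead of being asserted without a time.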
Open questions • Efficient inference • What is extracted? Posterior over possible worlds? • How to identify new categories and relations • HCI: Presenting infinite heterogeneous posterior distributions: Who wrote what when, when “who,” “what,” and “when” vary across worlds? • Making use of partially extracted or unextracted information – “data spaces” (Franklin, Halevy) • Adversarial data: game-theoretic analysis?
Plan • Reading group • Weekly meeting (day and time?) • Participants take turns presenting • Reading list at www.cs.berkeley.edu/~russell/pluie/readings.html • Formal project (ANR) runs 1/1/13 to 8/31/14 • Will continue indefinitely • Hiring two postdocs • Possible collaborations • Tom Mitchell’s NELL project (CMU) • Andrew McCallum (UMass) • Kevin Murphy (Google’s Knowledge Graph project)