Overview: why AI needs expressive probabilistic languages (concise models mean faster learning and sometimes faster reasoning); a brief history of expressiveness in logic and probability, from atomic to propositional to first-order/relational representations; first-order probabilistic languages from Gaifman onward, with their syntax and semantics; the case for open-universe models in applications such as vision, NLP, and information integration; and BLOG, a generative language that constructs possible worlds using dependency and number statements and specifies a unique probability distribution over them, illustrated with examples including citation matching.
Outline
• Why we need expressive probabilistic languages [a.k.a. preaching to the choir]
• Expressiveness and openness
• BLOG: a generative language
The world has things in it!!
• Expressive language => concise models => fast learning, sometimes fast reasoning
• E.g., rules of chess: 1 page in first-order logic, ~10^5 pages in propositional logic, ~10^38 pages as an atomic-state model [Note: chess is a teeny problem]
• Expressiveness is essential for general-purpose AI via learning (rather than constant reprogramming)
Hidden expressiveness
• Alpha-beta is an atomic state-space search
• Chess programs don't really express the transition model atomically; they use
  • Universal quantification: (defun f (x y) ...)
  • Loops: (loop for x from 1 to 8 do ...)
  • Complex logical terms: (cons (cons x y) z)
• But this "knowledge" is single-purpose and inaccessible, and most learning algorithms don't output procedural programs; declarative languages seem more suited
Brief history of expressiveness

               atomic        propositional    first-order/relational
  probability  17th C        20th C           21st C (be patient!)
  logic        5th C B.C.    19th C           19th C
First-order probabilistic languages
• Gaifman [1964]: distributions over first-order possible worlds
• Halpern [1990]: syntax for constraints on such distributions
• Poole [1993], Sato [1997], Koller and Pfeffer [1998], various others: KB defines distribution exactly (cf. Bayes nets); assumes unique names and domain closure, like Prolog and databases (Herbrand semantics)
Herbrand vs full first-order
Given Father(Bill,William) and Father(Bill,Junior), how many children does Bill have?
• Herbrand semantics: 2
• First-order logical semantics: between 1 and ∞
Possible worlds
• Propositional
• First-order + unique names, domain closure
• First-order open-universe
[Figure: diagrams of the possible worlds over objects A, B, C, D under each of the three semantics]
Possible worlds contd.
• First-order logic is just one way to define sets of open-universe relational worlds
• Distinction between "programming languages" and "logic" is not completely clear (cf. Prolog's dual semantics)
• Every "program" is an assertion in temporal logic with exactly one model per input
Open-universe models
• Essential for learning about what exists, e.g., vision, NLP, information integration, tracking, life
• [Note the GOFAI Gap: logic-based systems going back to Shakey assumed that perceived objects would be named correctly]
[IJCAI 97, IJCAI 99, IJCAI 01, NIPS 02, CDC 04, IJCAI 05, AI/Stats 06, UAI 06]
Tim Huang, Hanna Pasula, Brian Milch, Bhaskara Marthi, David Sontag, Songhwai Oh, Nimar Arora, Rodrigo Braz, Erik Sudderth
Open-universe models in BLOG
• Construct worlds using two kinds of steps, proceeding in topological order (a minimal sketch follows the list):
  • Dependency statements: set the value of a function or relation on a tuple of (quantified) arguments, conditioned on parent values; includes setting the referent of a constant symbol (a 0-ary function)
  • Number statements: add some objects to the world, conditioned on what objects and relations exist so far
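As a concrete illustration of the two step types, here is a minimal Python sketch of this style of world construction. This is a hypothetical analogue, not BLOG syntax; the priors (randint, choice) are invented stand-ins.

import random

def sample_world():
    """Build one possible world, BLOG-style: a number statement first,
    then a dependency statement, in topological order."""
    world = {}
    # Number statement: add some objects to the world.
    num_researchers = random.randint(1, 5)          # stand-in number prior
    world["Researcher"] = list(range(num_researchers))
    # Dependency statement: set a function's value on each object,
    # conditioned on what exists so far (here, on the set of researchers).
    world["Name"] = {r: random.choice(["smith", "jones", "lee"])
                     for r in world["Researcher"]}
    return world

print(sample_world())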
Semantics
Every well-formed* BLOG model specifies a unique proper probability distribution over open-universe possible worlds; equivalent to an infinite contingent Bayes net
* No infinite receding ancestor chains, no conditioned cycles, all expressions finitely evaluable
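Read as a contingent Bayes net, the distribution can be sketched as follows (our notation, not from the slides): if constructing a world $\omega$ instantiates basic random variables with values $v_i(\omega)$ and parent values $\mathrm{pa}_i(\omega)$, then

P(\omega) \;=\; \prod_{i} p_i\bigl(v_i(\omega) \,\big|\, \mathrm{pa}_i(\omega)\bigr)

with the product ranging over exactly those dependency and number statements activated in $\omega$; the well-formedness conditions are what guarantee each factor is well defined and the resulting distribution sums to 1.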
Example: Citation Matching
[Lashkari et al 94] Collaborative Interface Agents, Yezdi Lashkari, Max Metral, and Pattie Maes, Proceedings of the Twelfth National Conference on Articial Intelligence, MIT Press, Cambridge, MA, 1994.
Metral M. Lashkari, Y. and P. Maes. Collaborative interface agents. In Conference of the American Association for Artificial Intelligence, Seattle, WA, August 1994.
Are these descriptions of the same object?
Core task in CiteSeer, Google Scholar, and over 300 companies in the record linkage industry
(Simplified) BLOG model

#Researcher ~ NumResearchersPrior();
Name(r) ~ NamePrior();
#Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));
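For intuition, a hypothetical Python analogue of this generative process (every prior and the citation grammar are invented stand-ins):

import random

def sample_citations(num_citations=3):
    researchers = range(random.randint(1, 4))          # #Researcher
    names = {r: random.choice(["Lashkari", "Metral", "Maes"])
             for r in researchers}                      # Name(r)
    papers = [(r, random.choice(["Collaborative Interface Agents",
                                 "Software Agents"]))   # Title(p)
              for r in researchers
              for _ in range(random.randint(1, 3))]     # #Paper(FirstAuthor = r)
    # PubCited(c) ~ Uniform({Paper p}); Text(c) from a noisy-grammar stand-in
    return [f"{names[r]}. {title}." for r, title in
            (random.choice(papers) for _ in range(num_citations))]

print(sample_citations())

Inference runs this process in reverse: given only the citation strings, infer how many researchers and papers exist and which citations co-refer.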
Citation Matching Results
Four data sets of ~300-500 citations, referring to ~150-300 papers
Example: Sibyl attacks
• Typically between 100 and 10,000 real entities
• About 90% are honest and have one identity
• Dishonest entities own between 10 and 1000 identities
• Transactions may occur between identities:
  • If two identities are owned by the same entity (sibyls), a transaction is highly likely
  • Otherwise, a transaction is less likely (depending on the honesty of each identity's owner)
• An identity may recommend another after a transaction:
  • Sibyls with the same owner usually recommend each other
  • Otherwise, the probability of recommendation depends on the honesty of the two entities
#Entity ~ LogNormal[6.9, 2.3]();
Honest(x) ~ Boolean[0.9]();
#Identity(Owner = x) ~ if Honest(x) then 1 else LogNormal[4.6, 2.3]();
Transaction(x,y) ~
  if Owner(x) = Owner(y) then SibylPrior()
  else TransactionPrior(Honest(Owner(x)), Honest(Owner(y)));
Recommends(x,y) ~
  if Transaction(x,y) then
    if Owner(x) = Owner(y) then Boolean[0.99]()
    else RecPrior(Honest(Owner(x)), Honest(Owner(y)));

Evidence: lots of transactions and recommendations, maybe some Honest(.) assertions
Query: Honest(x)
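A hypothetical Python forward-sampler for the entity/identity part of this model (the lognormal parameters follow the slide; the caps are ours, added only to keep the demo small):

import random

def sample_sybil_world():
    # #Entity ~ LogNormal[6.9, 2.3](), capped so the demo stays small
    num_entities = min(100, max(1, round(random.lognormvariate(6.9, 2.3))))
    honest = {e: random.random() < 0.9 for e in range(num_entities)}  # Honest(x)
    owner = {}                                   # identity -> owning entity
    for e in range(num_entities):
        # #Identity(Owner = x): 1 if honest, else LogNormal[4.6, 2.3] (capped)
        n = 1 if honest[e] else min(50, max(1, round(random.lognormvariate(4.6, 2.3))))
        for _ in range(n):
            owner[len(owner)] = e
    return honest, owner

Transaction and Recommends would be sampled per identity pair in the same style; inference then inverts the process to recover Honest(x) from the observed transactions and recommendations.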
State Estimation for “Aircraft”
Dependency statements for simple model:

#Aircraft ~ NumAircraftPrior();
State(a, t)
  if t = 0 then ~ InitState()
  else ~ StateTransition(State(a, t-1));
#Blip(Source = a, Time = t) ~ NumDetectionsCPD(State(a, t));
#Blip(Time = t) ~ NumFalseAlarmsPrior();
ApparentPos(r)
  if (Source(r) = null) then ~ FalseAlarmDistrib()
  else ~ ObsCPD(State(Source(r), Time(r)));
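A rough Python simulation of this simple model (hypothetical stand-ins for every prior and CPD; blips record their true source here only for readability, whereas at inference time Source(r) is latent):

import random

def simulate_blips(num_steps=5):
    num_aircraft = random.randint(1, 3)                 # NumAircraftPrior
    state = {a: random.gauss(0.0, 10.0)
             for a in range(num_aircraft)}              # InitState
    blips = []
    for t in range(num_steps):
        for a in state:
            state[a] += random.gauss(0.0, 1.0)          # StateTransition
            if random.random() < 0.9:                   # NumDetectionsCPD (0 or 1)
                blips.append((t, a, state[a] + random.gauss(0.0, 0.5)))  # ObsCPD
        if random.random() < 0.1:                       # NumFalseAlarmsPrior
            blips.append((t, None, random.uniform(-50.0, 50.0)))  # FalseAlarmDistrib
    return blips

print(simulate_blips())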
Aircraft Entering and Exiting

#Aircraft(EntryTime = t) ~ NumAircraftPrior();
Exits(a, t)
  if InFlight(a, t) then ~ Bernoulli(0.1);
InFlight(a, t)
  if t < EntryTime(a) then = false
  elseif t = EntryTime(a) then = true
  else = (InFlight(a, t-1) & !Exits(a, t-1));
State(a, t)
  if t = EntryTime(a) then ~ InitState()
  elseif InFlight(a, t) then ~ StateTransition(State(a, t-1));
#Blip(Source = a, Time = t)
  if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t));

…plus the last two statements from the previous slide
Extending the Model: Air Bases
• Suppose aircraft don't just enter and exit, but actually take off and land at bases
• Want to track how many aircraft there are at each base
• Aircraft have destinations (particular bases) that they generally fly towards
• Assume the set of bases is known
Extending the Model: Air Bases

#Aircraft(InitialBase = b) ~ InitialAircraftPerBasePrior();
CurBase(a, t)
  if t = 0 then = InitialBase(a)
  elseif TakesOff(a, t-1) then = null
  elseif Lands(a, t-1) then = Dest(a, t-1)
  else = CurBase(a, t-1);
InFlight(a, t) = (CurBase(a, t) = null);
TakesOff(a, t)
  if !InFlight(a, t) then ~ Bernoulli(0.1);
Lands(a, t)
  if InFlight(a, t) then ~ LandingCPD(State(a, t), Location(Dest(a, t)));
Dest(a, t)
  if TakesOff(a, t) then ~ Uniform({Base b})
  elseif InFlight(a, t) then = Dest(a, t-1);
State(a, t)
  if TakesOff(a, t-1) then ~ InitState(Location(CurBase(a, t-1)))
  elseif InFlight(a, t) then ~ StateTrans(State(a, t-1), Location(Dest(a, t)));
Unknown Air Bases
Just add two more lines:

#AirBase ~ NumBasesPrior();
Location(b) ~ BaseLocPrior();
Experience at UC Irvine • “The first model we designed was the model implemented in BLOG. It is a very intuitive model, which seems to be true of most BLOG models. Writing the BLOG model … was nearly trivial.”
Inference
• BLOG inference algorithms (rejection sampling, importance sampling, MCMC) converge to correct posteriors for any well-formed model, for any first-order query
• Built-in MCMC is Metropolis-Hastings on partial possible worlds, with a generic proposal conditioning on parents only => SLOOOOW (a generic M-H skeleton is sketched below)
• User may substitute any other proposer
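For reference, the skeleton of a Metropolis-Hastings sampler in Python (a textbook sketch, not BLOG's implementation; BLOG's built-in sampler additionally operates on partial worlds and uses parent-conditioned proposals):

import math
import random

def metropolis_hastings(init, propose, log_joint, num_steps=10000):
    """Generic M-H: propose returns (candidate, log proposal ratio)."""
    world, samples = init, []
    for _ in range(num_steps):
        candidate, log_q_ratio = propose(world)
        log_alpha = log_joint(candidate) - log_joint(world) + log_q_ratio
        if random.random() < math.exp(min(0.0, log_alpha)):  # accept w.p. min(1, alpha)
            world = candidate
        samples.append(world)
    return samples

# Tiny demo: standard normal target, symmetric random-walk proposal (log ratio 0).
xs = metropolis_hastings(0.0, lambda x: (x + random.gauss(0.0, 1.0), 0.0),
                         lambda x: -0.5 * x * x, num_steps=5000)
print(sum(xs) / len(xs))  # should be near 0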
Experience at UC Irvine, contd. • “One author set about writing another Markov logic model, while the other began writing a custom Metropolis-Hastings proposer for the BLOG model. This turned out to be a time consuming and non-trivial task…”
BLOG status
• BLOG available online
• npBLOG (Carbonetto et al., UAI 05) provided nonparametric extensions
• DBLOG (open-universe state estimation): see Rodrigo's poster
• pyBLOG (a much faster reimplementation with generalized Gibbs and BUGS-like subproposal "experts"): see Nimar's poster
BLOG status contd.
• Blocking M-H seems to be essential for many applications with deterministic or near-deterministic relations
• Need to develop a large library of models to gain experience, develop idioms
• Structure-learning algorithms would be helpful
• Compiler technology would also be helpful
• Develop inference benchmarks
• Explore multicore implementations, or not