Overview: why AI needs expressive probabilistic languages (concise models mean faster learning and sometimes faster reasoning); a brief history of expressiveness in logic and probability, from atomic to propositional to first-order/relational representations; first-order probabilistic languages from Gaifman onward, with their syntax and semantics; the case for open-universe models in applications such as vision, NLP, and information integration; and BLOG, a generative language that constructs possible worlds using dependency and number statements and specifies a unique probability distribution over them, illustrated with examples including citation matching.
Outline
• Why we need expressive probabilistic languages [a.k.a. preaching to the choir]
• Expressiveness and openness
• BLOG: a generative language
The world has things in it!!
• Expressive language => concise models => fast learning, sometimes fast reasoning
• E.g., rules of chess: 1 page in first-order logic, ~10^5 pages in propositional logic, ~10^38 pages as an atomic-state model [Note: chess is a teeny problem]
• Expressiveness is essential for general-purpose AI via learning (rather than constant reprogramming)
Hidden expressiveness
• Alpha-beta is an atomic state-space search
• Chess programs don't really express the transition model atomically; they use
  • Universal quantification: (defun f (x y) ...)
  • Loops: (loop for x from 1 to 8 do ...)
  • Complex logical terms: (cons (cons x y) z)
• But this "knowledge" is single-purpose and inaccessible, and most learning algorithms don't output procedural programs; declarative languages seem more suited
Brief history of expressiveness

               atomic        propositional    first-order/relational
  probability  17th C        20th C           21st C (be patient!)
  logic        5th C B.C.    19th C           19th C
First-order probabilistic languages
• Gaifman [1964]: distributions over first-order possible worlds
• Halpern [1990]: syntax for constraints on such distributions
• Poole [1993], Sato [1997], Koller and Pfeffer [1998], various others: KB defines distribution exactly (cf. Bayes nets); assumes unique names and domain closure, like Prolog and databases (Herbrand semantics)
Herbrand vs full first-order
Given Father(Bill,William) and Father(Bill,Junior), how many children does Bill have?
• Herbrand semantics: 2
• First-order logical semantics: between 1 and ∞
Possible worlds
• Propositional
• First-order + unique names, domain closure
• First-order open-universe
[Figure: diagrams of the possible worlds over objects A, B, C, D under each of the three semantics]
Possible worlds contd.
• First-order logic is just one way to define sets of open-universe relational worlds
• Distinction between "programming languages" and "logic" is not completely clear (cf. Prolog's dual semantics)
• Every "program" is an assertion in temporal logic with exactly one model per input
Open-universe models
• Essential for learning about what exists, e.g., vision, NLP, information integration, tracking, life
• [Note the GOFAI Gap: logic-based systems going back to Shakey assumed that perceived objects would be named correctly]
[IJCAI 97, IJCAI 99, IJCAI 01, NIPS 02, CDC 04, IJCAI 05, AI/Stats 06, UAI 06]
Tim Huang, Hanna Pasula, Brian Milch, Bhaskara Marthi, David Sontag, Songhwai Oh, Nimar Arora, Rodrigo Braz, Erik Sudderth
Open-universe models in BLOG
• Construct worlds using two kinds of steps, proceeding in topological order (a minimal sketch follows the list):
  • Dependency statements: set the value of a function or relation on a tuple of (quantified) arguments, conditioned on parent values; includes setting the referent of a constant symbol (a 0-ary function)
  • Number statements: add some objects to the world, conditioned on what objects and relations exist so far
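As a concrete illustration of the two step types, here is a minimal Python sketch of this style of world construction. This is a hypothetical analogue, not BLOG syntax; the priors (randint, choice) are invented stand-ins.

import random

def sample_world():
    """Build one possible world, BLOG-style: a number statement first,
    then a dependency statement, in topological order."""
    world = {}
    # Number statement: add some objects to the world.
    num_researchers = random.randint(1, 5)          # stand-in number prior
    world["Researcher"] = list(range(num_researchers))
    # Dependency statement: set a function's value on each object,
    # conditioned on what exists so far (here, on the set of researchers).
    world["Name"] = {r: random.choice(["smith", "jones", "lee"])
                     for r in world["Researcher"]}
    return world

print(sample_world())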
Semantics
Every well-formed* BLOG model specifies a unique proper probability distribution over open-universe possible worlds; equivalent to an infinite contingent Bayes net
* No infinite receding ancestor chains, no conditioned cycles, all expressions finitely evaluable
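Read as a contingent Bayes net, the distribution can be sketched as follows (our notation, not from the slides): if constructing a world $\omega$ instantiates basic random variables with values $v_i(\omega)$ and parent values $\mathrm{pa}_i(\omega)$, then

P(\omega) \;=\; \prod_{i} p_i\bigl(v_i(\omega) \,\big|\, \mathrm{pa}_i(\omega)\bigr)

with the product ranging over exactly those dependency and number statements activated in $\omega$; the well-formedness conditions are what guarantee each factor is well defined and the resulting distribution sums to 1.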
Example: Citation Matching
[Lashkari et al 94] Collaborative Interface Agents, Yezdi Lashkari, Max Metral, and Pattie Maes, Proceedings of the Twelfth National Conference on Articial Intelligence, MIT Press, Cambridge, MA, 1994.
Metral M. Lashkari, Y. and P. Maes. Collaborative interface agents. In Conference of the American Association for Artificial Intelligence, Seattle, WA, August 1994.
Are these descriptions of the same object?
Core task in CiteSeer, Google Scholar, and over 300 companies in the record linkage industry
(Simplified) BLOG model

#Researcher ~ NumResearchersPrior();
Name(r) ~ NamePrior();
#Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));
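For intuition, a hypothetical Python analogue of this generative process (every prior and the citation grammar are invented stand-ins):

import random

def sample_citations(num_citations=3):
    researchers = range(random.randint(1, 4))          # #Researcher
    names = {r: random.choice(["Lashkari", "Metral", "Maes"])
             for r in researchers}                      # Name(r)
    papers = [(r, random.choice(["Collaborative Interface Agents",
                                 "Software Agents"]))   # Title(p)
              for r in researchers
              for _ in range(random.randint(1, 3))]     # #Paper(FirstAuthor = r)
    # PubCited(c) ~ Uniform({Paper p}); Text(c) from a noisy-grammar stand-in
    return [f"{names[r]}. {title}." for r, title in
            (random.choice(papers) for _ in range(num_citations))]

print(sample_citations())

Inference runs this process in reverse: given only the citation strings, infer how many researchers and papers exist and which citations co-refer.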
Citation Matching Results
Four data sets of ~300-500 citations, referring to ~150-300 papers
Example: Sibyl attacks
• Typically between 100 and 10,000 real entities
• About 90% are honest and have one identity
• Dishonest entities own between 10 and 1000 identities
• Transactions may occur between identities:
  • If two identities are owned by the same entity (sibyls), a transaction is highly likely
  • Otherwise, a transaction is less likely (depending on the honesty of each identity's owner)
• An identity may recommend another after a transaction:
  • Sibyls with the same owner usually recommend each other
  • Otherwise, the probability of recommendation depends on the honesty of the two entities
#Entity ~ LogNormal[6.9, 2.3]();
Honest(x) ~ Boolean[0.9]();
#Identity(Owner = x) ~ if Honest(x) then 1 else LogNormal[4.6, 2.3]();
Transaction(x,y) ~
  if Owner(x) = Owner(y) then SibylPrior()
  else TransactionPrior(Honest(Owner(x)), Honest(Owner(y)));
Recommends(x,y) ~
  if Transaction(x,y) then
    if Owner(x) = Owner(y) then Boolean[0.99]()
    else RecPrior(Honest(Owner(x)), Honest(Owner(y)));

Evidence: lots of transactions and recommendations, maybe some Honest(.) assertions
Query: Honest(x)
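A hypothetical Python forward-sampler for the entity/identity part of this model (the lognormal parameters follow the slide; the caps are ours, added only to keep the demo small):

import random

def sample_sybil_world():
    # #Entity ~ LogNormal[6.9, 2.3](), capped so the demo stays small
    num_entities = min(100, max(1, round(random.lognormvariate(6.9, 2.3))))
    honest = {e: random.random() < 0.9 for e in range(num_entities)}  # Honest(x)
    owner = {}                                   # identity -> owning entity
    for e in range(num_entities):
        # #Identity(Owner = x): 1 if honest, else LogNormal[4.6, 2.3] (capped)
        n = 1 if honest[e] else min(50, max(1, round(random.lognormvariate(4.6, 2.3))))
        for _ in range(n):
            owner[len(owner)] = e
    return honest, owner

Transaction and Recommends would be sampled per identity pair in the same style; inference then inverts the process to recover Honest(x) from the observed transactions and recommendations.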
State Estimation for “Aircraft”
Dependency statements for simple model:

#Aircraft ~ NumAircraftPrior();
State(a, t)
  if t = 0 then ~ InitState()
  else ~ StateTransition(State(a, t-1));
#Blip(Source = a, Time = t) ~ NumDetectionsCPD(State(a, t));
#Blip(Time = t) ~ NumFalseAlarmsPrior();
ApparentPos(r)
  if (Source(r) = null) then ~ FalseAlarmDistrib()
  else ~ ObsCPD(State(Source(r), Time(r)));
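A rough Python simulation of this simple model (hypothetical stand-ins for every prior and CPD; blips record their true source here only for readability, whereas at inference time Source(r) is latent):

import random

def simulate_blips(num_steps=5):
    num_aircraft = random.randint(1, 3)                 # NumAircraftPrior
    state = {a: random.gauss(0.0, 10.0)
             for a in range(num_aircraft)}              # InitState
    blips = []
    for t in range(num_steps):
        for a in state:
            state[a] += random.gauss(0.0, 1.0)          # StateTransition
            if random.random() < 0.9:                   # NumDetectionsCPD (0 or 1)
                blips.append((t, a, state[a] + random.gauss(0.0, 0.5)))  # ObsCPD
        if random.random() < 0.1:                       # NumFalseAlarmsPrior
            blips.append((t, None, random.uniform(-50.0, 50.0)))  # FalseAlarmDistrib
    return blips

print(simulate_blips())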
Aircraft Entering and Exiting

#Aircraft(EntryTime = t) ~ NumAircraftPrior();
Exits(a, t)
  if InFlight(a, t) then ~ Bernoulli(0.1);
InFlight(a, t)
  if t < EntryTime(a) then = false
  elseif t = EntryTime(a) then = true
  else = (InFlight(a, t-1) & !Exits(a, t-1));
State(a, t)
  if t = EntryTime(a) then ~ InitState()
  elseif InFlight(a, t) then ~ StateTransition(State(a, t-1));
#Blip(Source = a, Time = t)
  if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t));

…plus the last two statements from the previous slide
Extending the Model: Air Bases
• Suppose aircraft don't just enter and exit, but actually take off and land at bases
• Want to track how many aircraft there are at each base
• Aircraft have destinations (particular bases) that they generally fly towards
• Assume the set of bases is known
Extending the Model: Air Bases

#Aircraft(InitialBase = b) ~ InitialAircraftPerBasePrior();
CurBase(a, t)
  if t = 0 then = InitialBase(a)
  elseif TakesOff(a, t-1) then = null
  elseif Lands(a, t-1) then = Dest(a, t-1)
  else = CurBase(a, t-1);
InFlight(a, t) = (CurBase(a, t) = null);
TakesOff(a, t)
  if !InFlight(a, t) then ~ Bernoulli(0.1);
Lands(a, t)
  if InFlight(a, t) then ~ LandingCPD(State(a, t), Location(Dest(a, t)));
Dest(a, t)
  if TakesOff(a, t) then ~ Uniform({Base b})
  elseif InFlight(a, t) then = Dest(a, t-1);
State(a, t)
  if TakesOff(a, t-1) then ~ InitState(Location(CurBase(a, t-1)))
  elseif InFlight(a, t) then ~ StateTrans(State(a, t-1), Location(Dest(a, t)));
Unknown Air Bases
Just add two more lines:

#AirBase ~ NumBasesPrior();
Location(b) ~ BaseLocPrior();
Experience at UC Irvine • “The first model we designed was the model implemented in BLOG. It is a very intuitive model, which seems to be true of most BLOG models. Writing the BLOG model … was nearly trivial.”
Inference
• BLOG inference algorithms (rejection sampling, importance sampling, MCMC) converge to correct posteriors for any well-formed model, for any first-order query
• Built-in MCMC is Metropolis-Hastings on partial possible worlds, with a generic proposal conditioning on parents only => SLOOOOW (a generic M-H skeleton is sketched below)
• User may substitute any other proposer
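For reference, the skeleton of a Metropolis-Hastings sampler in Python (a textbook sketch, not BLOG's implementation; BLOG's built-in sampler additionally operates on partial worlds and uses parent-conditioned proposals):

import math
import random

def metropolis_hastings(init, propose, log_joint, num_steps=10000):
    """Generic M-H: propose returns (candidate, log proposal ratio)."""
    world, samples = init, []
    for _ in range(num_steps):
        candidate, log_q_ratio = propose(world)
        log_alpha = log_joint(candidate) - log_joint(world) + log_q_ratio
        if random.random() < math.exp(min(0.0, log_alpha)):  # accept w.p. min(1, alpha)
            world = candidate
        samples.append(world)
    return samples

# Tiny demo: standard normal target, symmetric random-walk proposal (log ratio 0).
xs = metropolis_hastings(0.0, lambda x: (x + random.gauss(0.0, 1.0), 0.0),
                         lambda x: -0.5 * x * x, num_steps=5000)
print(sum(xs) / len(xs))  # should be near 0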
Experience at UC Irvine, contd. • “One author set about writing another Markov logic model, while the other began writing a custom Metropolis-Hastings proposer for the BLOG model. This turned out to be a time consuming and non-trivial task…”
BLOG status
• BLOG available online
• npBLOG (Carbonetto et al., UAI 05) provided nonparametric extensions
• DBLOG (open-universe state estimation): see Rodrigo's poster
• pyBLOG (a much faster reimplementation with generalized Gibbs and BUGS-like subproposal "experts"): see Nimar's poster
BLOG status contd.
• Blocking M-H seems to be essential for many applications with deterministic or near-deterministic relations
• Need to develop a large library of models to gain experience, develop idioms
• Structure-learning algorithms would be helpful
• Compiler technology would also be helpful
• Develop inference benchmarks
• Explore multicore implementations, or not