This article discusses the application of Hierarchical Bayesian Models (HBMs) for understanding Universal Grammar. It explores how HBMs can represent and reason about knowledge at multiple levels of abstraction, and how they have been used in various cognitive problems such as causal reasoning, language, vision, word learning, and decision making.
Universal Grammar → Grammar (hierarchical phrase structure grammars, e.g., CFG, HPSG, TAG) → Phrase structure → Utterance → Speech signal
Vision (Han and Zhu, 2006)
Word learning • Principles: whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias • Structure • Data
Hierarchical Bayesian models • Can represent and reason about knowledge at multiple levels of abstraction. • Have been used by statisticians for many years. • Have been applied to many cognitive problems: • causal reasoning (Mansinghka et al, 06) • language (Chater and Manning, 06) • vision (Fei-Fei, Fergus, Perona, 03) • word learning (Kemp, Perfors, Tenenbaum,06) • decision making (Lee, 06)
Outline • A high-level view of HBMs • A case study • Semantic knowledge
Universal Grammar → Grammar (hierarchical phrase structure grammars, e.g., CFG, HPSG, TAG) → Phrase structure → Utterance → Speech signal, where each level is generated from the one above: P(grammar | UG), P(phrase structure | grammar), P(utterance | phrase structure), P(speech | utterance)
Hierarchical Bayesian model: Universal Grammar U generates a grammar G via P(G | U); G generates phrase structures s1, …, s6 via P(s | G); each phrase structure s_i generates an utterance u_i via P(u | s). A hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy:
$$P(\{u_i\}, \{s_i\}, G \mid U) = P(\{u_i\} \mid \{s_i\})\, P(\{s_i\} \mid G)\, P(G \mid U)$$
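The factorization can be made concrete with a toy discrete hierarchy. This is a minimal sketch: the grammars `G1`/`G2`, structures `sA`/`sB`, utterances `u1`/`u2`, and every probability in the tables are hypothetical illustration values, and it assumes the s_i are independent given G and each u_i depends only on its own s_i.

```python
# Hypothetical conditional probability tables (illustration only).
# U -> G: prior over two candidate grammars given innate knowledge U.
P_G_given_U = {"G1": 0.7, "G2": 0.3}
# G -> s: each grammar assigns probabilities to two phrase-structure types.
P_s_given_G = {"G1": {"sA": 0.8, "sB": 0.2},
               "G2": {"sA": 0.3, "sB": 0.7}}
# s -> u: each phrase structure generates one of two utterances.
P_u_given_s = {"sA": {"u1": 0.9, "u2": 0.1},
               "sB": {"u1": 0.2, "u2": 0.8}}

def joint(utterances, structures, grammar):
    """P({u_i}, {s_i}, G | U) = P({u_i} | {s_i}) P({s_i} | G) P(G | U),
    assuming the s_i are i.i.d. given G and each u_i depends only on s_i."""
    p = P_G_given_U[grammar]
    for u, s in zip(utterances, structures):
        p *= P_s_given_G[grammar][s] * P_u_given_s[s][u]
    return p

# Probability of one complete configuration of the hierarchy.
p = joint(["u1", "u2"], ["sA", "sB"], "G1")
```

Because every level is a normalized conditional distribution, summing `joint` over all grammars, structures, and utterances gives 1.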
Knowledge at multiple levels • Top-down inferences: • How does abstract knowledge guide inferences at lower levels? • Bottom-up inferences: • How can abstract knowledge be acquired? • Simultaneous learning at multiple levels of abstraction
Top-down inferences: Given the grammar G and a collection of utterances {u_i}, construct a phrase structure s_i for each utterance. Infer {s_i} given {u_i} and G:
$$P(\{s_i\} \mid \{u_i\}, G) \propto P(\{u_i\} \mid \{s_i\})\, P(\{s_i\} \mid G)$$
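A sketch of top-down inference for the same kind of toy model: with G fixed, the posterior over each phrase structure is the likelihood times the grammar's prior, renormalized. The table entries and the names `sA`, `sB`, `u1`, `u2`, `parse_posterior` are hypothetical.

```python
# Hypothetical toy model: a fixed grammar G over two structures,
# and a likelihood P(u | s) for two possible utterances.
P_s_given_G = {"sA": 0.8, "sB": 0.2}
P_u_given_s = {"sA": {"u1": 0.9, "u2": 0.1},
               "sB": {"u1": 0.2, "u2": 0.8}}

def parse_posterior(u):
    """P(s | u, G) ∝ P(u | s) P(s | G), normalized over candidate structures."""
    scores = {s: P_u_given_s[s][u] * P_s_given_G[s] for s in P_s_given_G}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}

# The s_i are conditionally independent given G, so the posterior over
# {s_i} factorizes into one posterior per utterance.
posteriors = [parse_posterior(u) for u in ["u1", "u2"]]
```

For `u2`, the grammar favors `sA` but the likelihood favors `sB` strongly enough that `sB` wins with posterior 2/3.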
Bottom-up inferences: Given a collection of phrase structures {s_i}, learn a grammar G. Infer G given {s_i} and U:
$$P(G \mid \{s_i\}, U) \propto P(\{s_i\} \mid G)\, P(G \mid U)$$
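Bottom-up inference runs Bayes' rule in the other direction: observed phrase structures score each candidate grammar. A minimal sketch, assuming two hypothetical grammars with made-up parameters and an equal prior P(G | U); it works in log space so that long lists of structures do not underflow.

```python
import math

# Hypothetical prior from U and hypothetical grammar parameters.
P_G_given_U = {"G1": 0.5, "G2": 0.5}
P_s_given_G = {"G1": {"sA": 0.8, "sB": 0.2},
               "G2": {"sA": 0.3, "sB": 0.7}}

def grammar_posterior(structures):
    """P(G | {s_i}, U) ∝ P({s_i} | G) P(G | U), with i.i.d. s_i given G."""
    log_scores = {
        G: math.log(P_G_given_U[G])
        + sum(math.log(P_s_given_G[G][s]) for s in structures)
        for G in P_G_given_U
    }
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_scores.values())
    unnorm = {G: math.exp(v - m) for G, v in log_scores.items()}
    z = sum(unnorm.values())
    return {G: v / z for G, v in unnorm.items()}

post = grammar_posterior(["sA", "sA", "sB"])
```

Here G1 assigns the data probability 0.128 and G2 assigns 0.063, so G1 receives posterior 0.128 / 0.191.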
Simultaneous learning at multiple levels: Given a set of utterances {u_i} and innate knowledge U, construct a grammar G and a phrase structure for each utterance.
• A chicken-or-egg problem: • Given a grammar, phrase structures can be constructed • Given a set of phrase structures, a grammar can be learned
Infer G and {s_i} given {u_i} and U:
$$P(G, \{s_i\} \mid \{u_i\}, U) \propto P(\{u_i\} \mid \{s_i\})\, P(\{s_i\} \mid G)\, P(G \mid U)$$
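For a model small enough to enumerate, the chicken-or-egg problem disappears: score every (G, {s_i}) pair with the joint distribution and normalize. This brute-force sketch uses hypothetical tables; realistic grammars would need approximate methods (for example MCMC or EM), which are not shown here.

```python
import itertools

# Hypothetical toy hierarchy (illustration only).
P_G_given_U = {"G1": 0.5, "G2": 0.5}
P_s_given_G = {"G1": {"sA": 0.8, "sB": 0.2},
               "G2": {"sA": 0.3, "sB": 0.7}}
P_u_given_s = {"sA": {"u1": 0.9, "u2": 0.1},
               "sB": {"u1": 0.2, "u2": 0.8}}

def joint_posterior(utterances):
    """Enumerate every (G, {s_i}) pair and normalize
    P({u_i} | {s_i}) P({s_i} | G) P(G | U)."""
    scores = {}
    for G in P_G_given_U:
        for structs in itertools.product(P_u_given_s, repeat=len(utterances)):
            p = P_G_given_U[G]
            for u, s in zip(utterances, structs):
                p *= P_s_given_G[G][s] * P_u_given_s[s][u]
            scores[(G, structs)] = p
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}

post = joint_posterior(["u1", "u1", "u2"])
# Most probable grammar and phrase structures, learned together.
best_G, best_structs = max(post, key=post.get)
# Marginal posterior of G1, summing out the phrase structures.
p_G1 = sum(v for (G, _), v in post.items() if G == "G1")
```

The cost grows exponentially with the number of utterances, which is why this is only an illustration of the equation, not a practical learner.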
Outline • A high-level view of HBMs • A case study: Semantic knowledge
Folk Biology: The relationships between living kinds are well described by tree-structured representations. • R: principles • S: structure (a tree over mouse, squirrel, chimp, gorilla) • D: data (e.g., “Gorillas have hands”)
Folk Biology • R: principles (structural form: tree) • S: structure (a tree over mouse, squirrel, chimp, gorilla) • D: data
Outline • A high-level view of HBMs • A case study: Semantic knowledge • Property induction • Learning structured representations • Learning the abstract organizing principles of a domain
Property induction • R: principles (structural form: tree) • S: structure (a tree over mouse, squirrel, chimp, gorilla) • D: data
Property Induction • R: principles (structural form: tree; stochastic process: diffusion) • S: structure (a tree over mouse, squirrel, chimp, gorilla) • D: data. Approach: work with the distribution P(D | S, R).
Property Induction • Horses have T4 cells. Elephants have T4 cells. → All mammals have T4 cells. • Horses have T4 cells. Seals have T4 cells. → All mammals have T4 cells. • Previous approaches: Rips (75), Osherson et al (90), Sloman (93), Heit (98)
Bayesian Property Induction Hypotheses
Data D: Horses have T4 cells. Elephants have T4 cells. Conclusion C: Cows have T4 cells.
Choosing a prior • Taxonomic similarity: Chimps have T4 cells. → Gorillas have T4 cells. • Jaw strength: Poodles can bite through wire. → Dobermans can bite through wire. • Food web relations: Salmon carry E. Spirus bacteria. → Grizzly bears carry E. Spirus bacteria.
Bayesian Property Induction • A challenge: • We have to specify the prior, which typically includes many numbers • An opportunity: • The prior can capture knowledge about the problem.
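To see why the prior "includes many numbers", treat each hypothesis as the set of species that have the novel property: with n species there are 2^n hypotheses, each needing a prior weight. This sketch assumes noise-free data and a hypothetical prior that favors the clades of the tree ((mouse, squirrel), (chimp, gorilla)); the weights 1.0 and 0.05 are made up. P(C | D) is the prior mass of hypotheses consistent with both D and C, divided by the mass consistent with D.

```python
from itertools import combinations

SPECIES = ["mouse", "squirrel", "chimp", "gorilla"]

# Hypothetical prior: clades of the tree get weight 1.0, every other set 0.05.
CLADES = [frozenset(c) for c in (["mouse"], ["squirrel"], ["chimp"], ["gorilla"],
                                 ["mouse", "squirrel"], ["chimp", "gorilla"], SPECIES)]

def prior(h):
    return 1.0 if h in CLADES else 0.05

# All 2^n hypotheses: every subset of species that might have the property.
HYPOTHESES = [frozenset(c) for r in range(len(SPECIES) + 1)
              for c in combinations(SPECIES, r)]

def p_conclusion(premises, conclusion):
    """P(conclusion species has the property | premise species have it).
    Assumes noise-free data: a hypothesis is consistent with D iff it
    contains every premise species."""
    consistent = [h for h in HYPOTHESES if set(premises) <= h]
    num = sum(prior(h) for h in consistent if conclusion in h)
    den = sum(prior(h) for h in consistent)
    return num / den

p_same = p_conclusion(["chimp"], "gorilla")  # taxonomically close
p_far = p_conclusion(["chimp"], "mouse")     # taxonomically distant
```

The tree-shaped prior makes the chimp → gorilla argument stronger than chimp → mouse (about 0.65 vs 0.35), which is the sense in which the prior captures knowledge about the problem.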
Property Induction • R: principles (structural form: tree; stochastic process: diffusion) • S: structure (a tree over mouse, squirrel, chimp, gorilla) • D: data
Biological properties • Structure: • Living kinds are organized into a tree • Stochastic process: • Nearby species in the tree tend to share properties
Stochastic Process • Nearby species in the tree tend to share properties. • In other words, properties tend to be smooth over the tree. (Panels: smooth vs. not smooth.)
Stochastic process Hypotheses
Generating a property: sample a continuous vector y that tends to be smooth over the tree, then threshold y to obtain a binary property h.
The diffusion process:
$$y \sim \mathcal{N}(0, K), \qquad h_i = \Theta(y_i)$$
where Θ(y_i) is 1 if y_i ≥ 0 and 0 otherwise, and the covariance K encourages y to be smooth over the graph S.
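A stdlib-only sketch of the diffusion process on a small tree. Assumptions, none of which come from the slides: the tree has three hypothetical internal nodes (root, rodents, primates) with all edge weights 1, and K is taken to be (Δ + I/σ²)^(-1), the graph Laplacian Δ plus a small ridge, in the style of the Gaussian-field construction cited from Zhu, Lafferty and Ghahramani; σ = 5 is arbitrary. Sampling uses the Cholesky factor of the precision matrix, and P(C | D) is then estimated by keeping only samples consistent with the data.

```python
import math
import random

# Hypothetical tree: internal nodes 0-2, leaves 3-6; all edge weights are 1.
NAMES = ["root", "rodents", "primates", "mouse", "squirrel", "chimp", "gorilla"]
EDGES = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
N = len(NAMES)

def precision_matrix(sigma=5.0):
    """K^{-1} = Laplacian + I / sigma^2 (assumed form; the ridge makes it invertible)."""
    Q = [[0.0] * N for _ in range(N)]
    for i, j in EDGES:
        Q[i][i] += 1.0
        Q[j][j] += 1.0
        Q[i][j] -= 1.0
        Q[j][i] -= 1.0
    for i in range(N):
        Q[i][i] += 1.0 / sigma ** 2
    return Q

def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def sample_property(rng, L):
    """Draw y ~ N(0, K) and return h with h_i = 1 if y_i >= 0, else 0."""
    z = [rng.gauss(0.0, 1.0) for _ in range(N)]
    # Back-substitute L^T y = z, so Cov(y) = (L L^T)^{-1} = K.
    y = [0.0] * N
    for i in reversed(range(N)):
        y[i] = (z[i] - sum(L[k][i] * y[k] for k in range(i + 1, N))) / L[i][i]
    return [1 if yi >= 0 else 0 for yi in y]

def conditional(samples, premises, conclusion):
    """Monte Carlo estimate of P(h_conclusion = 1 | h_i = 1 for all premises)."""
    kept = [h for h in samples if all(h[i] == 1 for i in premises)]
    return sum(h[conclusion] for h in kept) / len(kept)

rng = random.Random(0)
L = cholesky(precision_matrix())
samples = [sample_property(rng, L) for _ in range(4000)]
p_gorilla = conditional(samples, [5], 6)  # chimp -> gorilla (siblings)
p_mouse = conditional(samples, [5], 3)    # chimp -> mouse (distant)
```

Because the covariance between two leaves shrinks with their distance in the tree, the sibling argument comes out stronger than the distant one.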
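p(y | S, R): Generating a property. Let y_i be the feature value at node i, and let w_ij be the weight of the edge between nodes i and j in S. Differences between neighboring nodes are penalized, so smooth vectors y are more probable:
$$p(y \mid S, R) \propto \exp\left(-\tfrac{1}{4} \sum_{i,j} w_{ij} (y_i - y_j)^2\right)$$
(Zhu, Lafferty, Ghahramani 03)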
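The smoothness penalty can be checked numerically. This sketch uses a hypothetical seven-node tree with unit weights and two made-up value vectors; it computes the unnormalized log density as -1/4 times the sum over ordered pairs (i, j), and confirms that this equals -1/2 times the sum over undirected edges, since each edge appears twice in the ordered sum.

```python
# Hypothetical tree: node 0 is the root, 1 and 2 are internal, 3-6 are leaves.
EDGES = {(0, 1): 1.0, (0, 2): 1.0, (1, 3): 1.0, (1, 4): 1.0, (2, 5): 1.0, (2, 6): 1.0}
N = 7

# Symmetric weight matrix: w_ij = w_ji, zero where there is no edge.
W = [[0.0] * N for _ in range(N)]
for (i, j), w in EDGES.items():
    W[i][j] = W[j][i] = w

def log_p_ordered(y):
    """-1/4 * sum over all ordered pairs (i, j) of w_ij (y_i - y_j)^2."""
    return -0.25 * sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(N) for j in range(N))

def log_p_edges(y):
    """Same quantity written as -1/2 * sum over undirected edges."""
    return -0.5 * sum(w * (y[i] - y[j]) ** 2 for (i, j), w in EDGES.items())

smooth = [0.0, 1.0, -1.0, 1.2, 0.8, -0.9, -1.1]  # siblings have similar values
rough = [0.0, 1.0, -1.0, 1.2, -0.8, 0.9, -1.1]   # siblings disagree
```

The smooth vector scores -1.05 and the rough one -4.45, so the distribution prefers properties that vary little across edges.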
Biological properties • R: principles (structural form: tree; stochastic process: diffusion) • S: structure (a tree over mouse, squirrel, chimp, gorilla) • D: data. Approach: work with the distribution P(D | S, R).
Results: Human vs. Model (Osherson et al) • Cows have property P. Elephants have property P. → Horses have property P. • Dolphins have property P. Seals have property P. → Horses have property P.
Results: Human vs. Model • Gorillas have property P. Mice have property P. Seals have property P. → All mammals have property P. • Cows have property P. Elephants have property P. Horses have property P. → All mammals have property P.
Spatial model • R: principles (structural form: 2D space; stochastic process: diffusion) • S: structure (a 2D space containing squirrel, mouse, gorilla, chimp) • D: data
Tree vs 2D: Tree + diffusion vs. 2D + diffusion, on the “horse” and “all mammals” arguments