
On-Line Learning of Predictive Compositional Hierarchies by Hebbian Chunking


Presentation Transcript


  1. On-Line Learning of Predictive Compositional Hierarchies by Hebbian Chunking • Karl Pfleger • this research: Computer Science Department, Stanford University, kpfleger@cs.stanford.edu, ksl.stanford.edu/~kpfleger • now: Google, kpfleger@google.com

  2. Abstract • this talk is about compositional models, not compositionality • a compositional hierarchy (CH) is a part-whole hierarchy • predictive CHs are sensitive to statistical properties of the environment and can predict unseen data • this talk discusses on-line learning of such structures • goal: scale automatically from low-level data to higher-level representations • approach: identify frequent patterns in the data, enabling the future (hierarchical) discovery of even larger patterns

  3. Outline • Predictive compositional hierarchies • Learning compositional hierarchies • Boltzmann machine embodiment • Hebbian chunking • Key learning adjectives: unsupervised, on-line, data-driven, bottom-up, cumulative • again: not about how to represent transient compositions in one shot (e.g., sentence recognition or production), but about learning which compositions are demonstrated strongly by the environment

  4. Context and Knowledge • context allows lateral inference (fill in the blank, resolve the ambiguity, predict the future/past) • this requires knowledge of common patterns
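The slide is conceptual, but a toy example makes the claim concrete. The sketch below is my own illustration, not from the talk: bigram counts stand in for "knowledge of common patterns" and are used to fill in a blank from its left and right context.

```python
# Minimal blank-filling sketch (illustrative only): bigram counts stand in
# for learned knowledge of common patterns.
from collections import Counter

corpus = "the cat sat on the mat and the cat ran"
bigrams = Counter(zip(corpus, corpus[1:]))

def fill_blank(left, right):
    """Pick the character x maximizing count(left, x) * count(x, right)."""
    candidates = "abcdefghijklmnopqrstuvwxyz "
    return max(candidates,
               key=lambda x: bigrams[(left, x)] * bigrams[(x, right)])

print(fill_blank("t", "e"))  # 'h', completing the frequent pattern "the"
```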

  5. An Example Compositional Hierarchy • multiple edges can link the same parent and child

  6. Neural Nets and Graphical Models • symmetric recurrent neural networks and graphical models can make general predictions as required • CHs can be encoded into network structure • activation flows along part-whole links • prior work did not learn weights or structure (e.g., IAM)
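As a rough illustration of activation flowing along part-whole links, here is a tiny interactive-activation-style settling loop. The network, constants, and update rule are my own minimal rendering, not the actual IAM or the talk's model:

```python
# Toy part-whole activation flow: letters (parts) excite words (wholes),
# and words feed activation back down to fill in missing letters.
import numpy as np

letters = ["t", "h", "e", "c", "a"]
words = ["the", "cat"]
W = np.array([[1, 1, 1, 0, 0],    # "the" has parts t, h, e
              [1, 0, 0, 1, 1]],   # "cat" has parts t, c, a (t is shared)
             dtype=float)

letter_act = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # evidence for t and h only
for _ in range(10):                                # alternate up/down passes
    word_act = np.tanh(W @ letter_act)             # bottom-up: parts -> wholes
    letter_act = np.clip(letter_act + 0.1 * (W.T @ word_act), 0, 1)  # top-down

print(dict(zip(letters, letter_act.round(2))))     # 'e' gets filled in by "the"
```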

  7. Structure Learning Strategy • identify frequently occurring patterns of primitives or previously identified patterns, bottom-up • two problems to solve: • how to embody a CH in a predictive representation • how to incrementally grow the CH with new data
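To make the strategy concrete, here is a hedged sketch of the bottom-up loop as I read the slide: count co-occurrences among the current units (primitives or earlier chunks), promote frequent adjacent pairs to new units, and repeat so that still larger patterns become discoverable. The greedy re-segmentation and thresholds are assumptions for illustration:

```python
from collections import Counter

def grow_chunks(stream, rounds=3, min_count=3):
    """Bottom-up chunking sketch: frequent adjacent pairs become new units,
    which can then take part in still larger chunks on later rounds."""
    units = list(stream)
    lexicon = set(units)
    for _ in range(rounds):
        pairs = Counter(zip(units, units[1:]))
        new = {a + b for (a, b), c in pairs.items() if c >= min_count}
        if not new:
            break
        lexicon |= new
        merged, i = [], 0            # greedy left-to-right re-segmentation
        while i < len(units):
            if i + 1 < len(units) and units[i] + units[i + 1] in new:
                merged.append(units[i] + units[i + 1])
                i += 2
            else:
                merged.append(units[i])
                i += 1
        units = merged
    return lexicon

print(sorted(grow_chunks("the cat and the hat and the rat", min_count=2)))
```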

  8. Weight Sharing • we need chunks of different sizes (a hierarchy) • duplicated structure with shared weights ensures the ability to recognize small patterns at any position (sketched below)
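A minimal sketch of what the shared, duplicated structure buys: one chunk detector's weights are applied at every offset of the input, so the same small pattern is found wherever it occurs. The one-hot encoding and detector construction are my assumptions:

```python
# One shared weight vector slid over all positions of a one-hot sequence.
import numpy as np

def detect_everywhere(sequence, pattern_weights, alphabet):
    """Apply the same (shared) detector weights at every position."""
    onehot = np.eye(len(alphabet))[[alphabet.index(c) for c in sequence]]
    k = len(pattern_weights) // len(alphabet)      # width of the chunk
    return [float(onehot[i:i + k].ravel() @ pattern_weights)
            for i in range(len(sequence) - k + 1)]

alphabet = list("abcdefghijklmnopqrstuvwxyz ")
w_th = np.zeros(2 * len(alphabet))                 # detector for chunk "th"
w_th[alphabet.index("t")] = 1.0                    # slot 0 wants 't'
w_th[len(alphabet) + alphabet.index("h")] = 1.0    # slot 1 wants 'h'
print(detect_everywhere("the thin cat", w_th, alphabet))  # peaks at both "th"s
```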

  9. A Hybrid Model • Interactive Activation Model of context effects in letter perception (McClelland & Rumelhart) • SEQUITUR (Nevill-Manning): greedy CH learner (chunks as soon as any sequence repeats; toy version below) • SEQUITUR-IAM: direct encoding of the SEQUITUR-induced CH structure in an IA-style network • interesting result: with no weight tuning, it reproduces many IAM phenomena • hierarchy helps: subword chunks provide an added explanation of pronounceable non-word behavior
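For readers unfamiliar with SEQUITUR's greediness, here is a toy rendering of its core rule: the moment any adjacent pair repeats, both occurrences are replaced by a new nonterminal, and rules can nest. This simplification omits SEQUITUR's rule-utility constraint and its fully on-line bookkeeping:

```python
def sequitur_like(seq):
    """Toy SEQUITUR-style chunker: replace any repeated (non-overlapping)
    adjacent pair with a fresh nonterminal, until no pair repeats."""
    seq = list(seq)
    rules, next_id = {}, 0
    changed = True
    while changed:
        changed = False
        seen = {}
        for i in range(len(seq) - 1):
            pair = (seq[i], seq[i + 1])
            if pair in seen and seen[pair] < i - 1:     # non-overlapping repeat
                sym = f"R{next_id}"
                next_id += 1
                rules[sym] = pair
                seq[seen[pair]:seen[pair] + 2] = [sym]  # replace 1st occurrence
                seq[i - 1:i + 1] = [sym]                # 2nd shifted left by one
                changed = True
                break                                   # restart the scan
            seen[pair] = i
    return seq, rules

print(sequitur_like("abcabc"))  # (['R1', 'R1'], {'R0': ('a','b'), 'R1': ('R0','c')})
```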

  10. Categorical Boltzmann Machines • Boltzmann machines have nice probabilistic semantics, but there is no prior work embedding CHs in BMs • need to generalize binary variables to categorical ones • groups (or pools) of nodes represent a variable, with one node per value (instead of one node per variable) • connect two pools by connecting all their nodes pairwise (see the sketch below)
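Here is a hedged sketch of the categorical generalization as the slide describes it: each variable gets a pool of nodes, one node per value, and two pools are connected by linking all node pairs. The class layout and the standard BM energy form are my choices, not code from the talk:

```python
import itertools
import numpy as np

class CategoricalBM:
    """Pools of nodes, one node per value of each categorical variable."""
    def __init__(self, pool_sizes):
        self.offsets = np.cumsum([0] + list(pool_sizes))  # flat node indexing
        self.W = np.zeros((self.offsets[-1], self.offsets[-1]))

    def connect(self, p, q, rng=np.random.default_rng(0)):
        """Connect pools p and q: one weight per pair of value-nodes."""
        for i, j in itertools.product(
                range(self.offsets[p], self.offsets[p + 1]),
                range(self.offsets[q], self.offsets[q + 1])):
            self.W[i, j] = self.W[j, i] = rng.normal(scale=0.1)

    def energy(self, assignment):
        """Standard BM energy of a one-hot state (one active node per pool)."""
        s = np.zeros(self.offsets[-1])
        for p, v in enumerate(assignment):
            s[self.offsets[p] + v] = 1.0
        return -0.5 * s @ self.W @ s

bm = CategoricalBM([3, 3])   # two categorical variables, 3 values each
bm.connect(0, 1)
print(bm.energy([0, 2]))
```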

  11. Hebb-Rule Based Chunking • add new chunks by growing new hidden nodes • triggered by correlations indicated by large Hebbian weights • promote correlations to first-class entities • first use of Hebbian dynamics for chunking (a minimal sketch follows)
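The slide names the mechanism without detail, so the following is a minimal sketch of how such a trigger could work: accumulate Hebbian co-activation weights between existing units, and when one grows past a threshold, promote that correlation to a new first-class unit (a grown hidden node). The bounded update rule, ETA, and THETA are assumptions, not the talk's actual equations:

```python
import itertools
from collections import defaultdict

hebb = defaultdict(float)    # (unit_a, unit_b) -> Hebbian weight
chunks = {}                  # new chunk name -> the pair it was promoted from
ETA, THETA = 0.1, 0.5        # assumed learning rate and promotion threshold

def observe(active_units):
    """One on-line step: strengthen weights between co-active units, then
    promote any correlation that has become strong enough to a new chunk."""
    for a, b in itertools.combinations(sorted(active_units), 2):
        hebb[(a, b)] += ETA * (1.0 - hebb[(a, b)])    # bounded Hebbian update
        if hebb[(a, b)] > THETA and (a, b) not in chunks.values():
            chunks[a + "+" + b] = (a, b)   # correlation becomes first-class

for _ in range(10):
    observe({"t", "h"})      # 't' and 'h' co-occur repeatedly
observe({"x", "q"})          # a one-off co-occurrence stays below threshold
print(chunks)                # {'h+t': ('h', 't')}
```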

  12. Example Learned Chunks • successfully learns a hierarchy of frequent chunks • 2-chunks: he, th, in, nd, an, er, ng, of, ed, hi, ou, ve, st, ly, on, re, as, wa, ll, ha, be, it, co, wi, ur, sh, ow, me, gh, ma, om, wh, by, ut, ch, is, to, ck, fo, ak, ul, at, ac, av, ab, yo, pr, li, br, up, po, im, or, ex, us, ic, ev, un • 3-chunks: the, ing, and, was, her, ver, you, his, hat, wit, for, man, com, oul, hav, ugh, oug, ved, abr, con, red, all, she, eve, vin, uld, ery, hic, ich • improves prediction accuracy with more data • first system to do bottom-up on-line CH learning with statistical sensitivity for arbitrary prediction patterns

  13. Related Work on Compositional Hierarchies (in addition to the areas likely represented at this symposium...) • prespecified CHs: HEARSAY-II, IAM (McClelland & Rumelhart ’81) • non-predictive: SEQUITUR (Nevill-Manning ’96), MK10 (Wolff ’75) • SCFG induction: Stolcke & Omohundro ’94, Langley & Stromsten ’00 • hierarchical RL: Ring ’95, Drescher ’93, Andre ’98, Sun & Sessions ’00 • segmentation: Olivier ’68, Redlich ’93, Saffran et al ’96, Brent ’99, Venkataraman ’01, Cohen ’01 • hierarchical HMMs: Fine et al ’98, Murphy ’01 • layered reps. in parameterized nets: Lewicki & Sejnowski ’97, Hinton • speedup learning: Soar, EBL • misc: de Marcken, Geman & Potter, Lempel-Ziv compression

  14. Hierarchical Sparse n-grams • part of a larger program on learning compositional hierarchies (Pfleger ’02) • n-grams are non-connectionist, but simpler for direct study of frequent pattern/chunk selection (picking needles from an exponential haystack) • hierarchical sparse n-grams: multiwidth n-grams + frequent itemset pruning + on-line learning (Pfleger ’04; sketch below)
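A batch sketch of the counting-plus-pruning idea (the real system is on-line; the threshold and exact pruning rule here are my assumptions): keep tables for several widths at once, but only count an n-gram when both of its (n-1)-wide sub-grams are already frequent, in the style of frequent-itemset (Apriori) pruning:

```python
from collections import Counter

def sparse_ngrams(stream, max_n=3, min_count=3):
    """Multiwidth n-gram tables kept sparse by Apriori-style pruning."""
    kept = {1: Counter(stream)}              # unigrams are always counted
    for n in range(2, max_n + 1):
        counts = Counter()
        for i in range(len(stream) - n + 1):
            gram = stream[i:i + n]
            # only count grams whose (n-1)-wide sub-grams are frequent
            if (kept[n - 1][gram[:-1]] >= min_count
                    and kept[n - 1][gram[1:]] >= min_count):
                counts[gram] += 1
        kept[n] = Counter({g: c for g, c in counts.items() if c >= min_count})
    return kept

tables = sparse_ngrams("the cat and the hat and the bat sat", min_count=2)
print(tables[2].most_common(5))
print(tables[3].most_common(3))
```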

  15. Expressiveness vs. Learning Tradeoff • CHs are the simplest representation capable of embodying unbounded complexity • HHMMs sit in the middle of the tradeoff; learning their structure is hard (as it is for SEQUITUR extensions) • work on connectionist symbol processing and sparse distributed representations also has taxonomic aspects; compositional structure learning is largely unaddressed there; there are lessons here for that work

  16. Conclusion • constructive node creation rule for Boltzmann machines that specifically builds compositionally hierarchical representations of common patterns • extension of early non-learning CH work • frequent patterns have myriad uses in cognition (memory, communication, learning, associative thought) • future: • combine compositional and taxonomic learning • embed CH learning in other compositional reps (esp. distributed) • fold in goals/reward/transfer to help direct chunking
