Forgetting Counts: Constant Memory Inference for a Dependent Hierarchical Pitman-Yor Process
Nicholas Bartlett, David Pfau, Frank Wood
Presented by Yingjian Wang, Nov. 17, 2010
Outline • Background • The sequence memoizer • Forgetting • The dependent HPY • Experimental results
Background
• 2006, Teh, 'A hierarchical Bayesian language model based on Pitman-Yor processes': an N-gram Markov chain language model with an HPY prior.
• 2009, Wood, 'A Stochastic Memoizer for Sequence Data': the Sequence Memoizer (SM), with a linear space/time inference scheme (lossless).
• 2010, Gasthaus, 'Lossless compression based on the Sequence Memoizer': combines the SM with an arithmetic coder to build a compressor (PLUMP/dePLUMP); see www.deplump.com.
• 2010, Bartlett, 'Forgetting Counts: Constant Memory Inference for a Dependent HPY': develops constant-memory inference for the SM by using a dependent HPY (lossy).
SM: Two concepts
• Memoizer (Donald Michie, 1968): a device that returns previously computed results for the same input instead of recalculating them, in order to save time.
• Stochastic Memoizer (Wood, 2009): the returned results can change, since the predictive probability is based on a stochastic process.
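As an illustration (not from the slides), a minimal Python sketch of ordinary, deterministic memoization; the stochastic memoizer differs in that the cached "answer" is a draw from a stochastic process, so repeated queries need not return identical results:

    from functools import lru_cache

    # Deterministic memoization (Michie, 1968): results are cached, so a
    # repeated call with the same argument is a lookup, not a recomputation.
    @lru_cache(maxsize=None)
    def fib(n: int) -> int:
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(200))  # fast: each subproblem is computed exactly once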
SM: Model and trie
• The model: a hierarchy of Pitman-Yor processes, one per context, each centered on the process of the context's one-symbol-shorter suffix (see below).
• The prefix trie: one restaurant per context.
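For reference, a reconstruction of the SM model following Wood et al. (2009), where σ(u) denotes the suffix of context u with its earliest symbol removed, and all concentrations are set to zero so that chains of restaurants can later be collapsed:

    \begin{align*}
    G_{\varepsilon} \mid H &\sim \mathrm{PY}(d_0,\, 0,\, H) \\
    G_{u} \mid G_{\sigma(u)} &\sim \mathrm{PY}(d_{|u|},\, 0,\, G_{\sigma(u)}) && \text{for each non-empty context } u \\
    x_i \mid G_{x_{1:i-1}} &\sim G_{x_{1:i-1}}
    \end{align*}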
SM: The NSP (1)
• The Normalized Stable Process (Perman, 1990) is the Pitman-Yor process PY(d, c, H) with concentration c = 0; setting the discount d = 0 instead recovers the Dirichlet process:

    \mathrm{PY}(0,\, c,\, H) = \mathrm{DP}(c,\, H), \qquad
    \mathrm{PY}(d,\, 0,\, H) = \mathrm{NSP}(d,\, H).
SM: The NSP (2)
• Collapsing the middle restaurants. Theorem (Wood et al., 2009): if G_1 | G_0 ~ PY(d_1, 0, G_0) and G_2 | G_1 ~ PY(d_2, 0, G_1), then marginally G_2 | G_0 ~ PY(d_1 d_2, 0, G_0).
• The prefix trie can therefore be collapsed along non-branching paths, as in suffix trees (Weiner, 1973; Ukkonen, 1995), leaving a linear number of restaurants.
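A worked instance of the theorem, added for clarity: applying it repeatedly along a non-branching chain multiplies the discounts along the path:

    G_1 \mid G_0 \sim \mathrm{PY}(d_1, 0, G_0),\quad
    G_2 \mid G_1 \sim \mathrm{PY}(d_2, 0, G_1),\quad
    G_3 \mid G_2 \sim \mathrm{PY}(d_3, 0, G_2)
    \;\Longrightarrow\; G_3 \mid G_0 \sim \mathrm{PY}(d_1 d_2 d_3,\, 0,\, G_0).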
Forgetting
• Motivation: to achieve constant-memory inference on top of the SM. How?
• Method: forget, i.e. delete, restaurants. Restaurants are the basic memory units in the context tree.
• How to delete? Two deletion schemes: random deletion and greedy deletion.
Deletion schemes
• Random deletion: delete one leaf restaurant chosen uniformly at random.
• Greedy deletion: delete the leaf restaurant whose removal least reduces the estimated likelihood of the observed sequence.
(Figure: leaf restaurants in the context tree.)
A sketch of both selection rules follows.
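An illustrative Python sketch of the two selection rules (not the authors' code); the leaf list and its per-leaf likelihood penalties are hypothetical stand-ins for the SM's collapsed context tree:

    import random

    # Hypothetical leaves: (context, estimated log-likelihood change if this
    # restaurant is forgotten). Values closer to zero mean a cheaper deletion.
    leaves = [("ab", -0.7), ("ba", -0.1), ("bb", -1.3)]

    def pick_random_leaf(leaves):
        # Random deletion: every leaf restaurant is equally likely to go.
        return random.choice(leaves)

    def pick_greedy_leaf(leaves):
        # Greedy deletion: forget the leaf whose removal least reduces the
        # estimated likelihood of the observed sequence.
        return max(leaves, key=lambda leaf: leaf[1])

    print(pick_random_leaf(leaves))
    print(pick_greedy_leaf(leaves))  # -> ("ba", -0.1)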
The dependent HPY
• But what do we get after this deletion-and-re-addition cycle? Are the processes still independent? No: the seating arrangement in the parent restaurant has been changed, so successive models are dependent, and the result is a dependent HPY.