Using CTW as a language modeler in Dasher
Martijn van Veen
05-02-2007
Signal Processing Group, Department of Electrical Engineering, Eindhoven University of Technology
Overview
• What is Dasher
• And what is a language model
• What is CTW
• And how to implement it in Dasher
• Decreasing the model costs
• Conclusions and future work
Dasher
• Text input method
• Continuous gestures
• Language model
• Let's give it a try! (Dasher demo)
Dasher: Language model
• Conditional probability for each alphabet symbol, given the previous symbols
• Similar to compression methods
• Requirements:
  • Sequential
  • Fast
  • Adaptive
• Model is trained
• Better compression -> faster text input
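As a rough illustration of what "sequential, fast, adaptive" means here, below is a toy Python sketch of a model that returns a probability for every alphabet symbol and adapts after each entered symbol. It is not Dasher's actual interface, and unlike PPM or CTW it ignores the context; the class and method names are illustrative only.

```python
from collections import defaultdict

class ToyAdaptiveModel:
    """Toy stand-in for a Dasher language model: sequential and adaptive,
    but context-free (real models such as PPM or CTW condition on the
    previous symbols)."""
    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = defaultdict(int)
        self.total = 0

    def predict(self):
        # Add-one smoothing keeps every symbol reachable before it is seen.
        denom = self.total + len(self.alphabet)
        return {s: (self.counts[s] + 1) / denom for s in self.alphabet}

    def update(self, symbol):
        # Adapt after the user commits to a symbol.
        self.counts[symbol] += 1
        self.total += 1
```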
Dasher: Language model
• PPM: Prediction by Partial Match
• Predictions by models of different order
• Weight factor for each model
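A minimal sketch of the mixing idea, assuming fixed weights per model order; real PPM derives its weighting from escape probabilities, so this only illustrates how per-order predictions are combined.

```python
def blend(order_predictions, weights):
    """order_predictions: {order: {symbol: P(symbol | context of that order)}}
    weights: {order: weight}.  Mix the per-order distributions and renormalise."""
    mixed = {}
    for order, dist in order_predictions.items():
        for symbol, p in dist.items():
            mixed[symbol] = mixed.get(symbol, 0.0) + weights[order] * p
    total = sum(mixed.values())
    return {s: p / total for s, p in mixed.items()}
```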
Dasher: Language model
• Asymptotically, PPM reduces to a fixed-order context model
• But the incomplete model works better!
CTW: Tree model
• Source structure captured by the tree; the parameters in the leaves are memoryless
• KT estimator (a = number of zeros, b = number of ones seen so far): P(next bit = 0) = (a + 1/2) / (a + b + 1)
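A small Python sketch of the KT estimator using exact fractions: the block probability of a bit sequence is the product of the sequential probabilities given by the formula above.

```python
from fractions import Fraction

def kt_update(p_block, a, b, bit):
    # Sequential KT probability of the new bit, given a zeros and b ones so far.
    if bit == 0:
        p_next = Fraction(2 * a + 1, 2 * (a + b + 1))   # (a + 1/2) / (a + b + 1)
        return p_block * p_next, a + 1, b
    p_next = Fraction(2 * b + 1, 2 * (a + b + 1))       # (b + 1/2) / (a + b + 1)
    return p_block * p_next, a, b + 1

# Block probability of the bit sequence 0, 0, 1
p, a, b = Fraction(1), 0, 0
for bit in (0, 0, 1):
    p, a, b = kt_update(p, a, b, bit)
print(p)   # 1/16
```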
CTW: Context tree
• Context-Tree Weighting: combine all possible tree models up to a maximum depth
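A sketch of the standard CTW weighting recursion: at maximum depth a node uses its local KT estimate, and at internal nodes the weighted probability is P_w = 1/2 * P_e + 1/2 * P_w(child 0) * P_w(child 1), mixing "this node is a leaf of the tree model" with "the model splits further". The Node layout is illustrative, not the data structure from the talk.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

@dataclass
class Node:
    p_estimate: Fraction                # local KT block probability P_e
    child0: Optional["Node"] = None     # subtree for context bit 0
    child1: Optional["Node"] = None     # subtree for context bit 1

def weighted(node: Node, depth: int, max_depth: int) -> Fraction:
    # At maximum depth the node is a leaf: P_w = P_e.
    if depth == max_depth:
        return node.p_estimate
    # Empty subtrees have seen no bits, so their block probability is 1.
    p0 = weighted(node.child0, depth + 1, max_depth) if node.child0 else Fraction(1)
    p1 = weighted(node.child1, depth + 1, max_depth) if node.child1 else Fraction(1)
    return Fraction(1, 2) * node.p_estimate + Fraction(1, 2) * p0 * p1
```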
CTW: Implementation
• Current implementation:
  • Ratio of block probabilities stored in each node
  • Efficient, but patented
• Develop a new implementation:
  • Use only integer arithmetic, avoid divisions
  • Represent both block probabilities as fractions
  • Ensure the denominators are equal by cross-multiplication
  • Store the numerators, scale if necessary
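A hedged sketch of the fraction bookkeeping described above: both block probabilities of a node are brought onto a common denominator by cross-multiplication, only the numerators are stored, and both are scaled down together when they grow too large. The word size and the halving-based scaling are assumptions for illustration, not the exact scheme of the new implementation.

```python
def store_numerators(pe, pw, word_bits=16):
    """pe and pw are (numerator, denominator) pairs for a node's block
    probabilities P_e and P_w.  After cross-multiplication both share the
    denominator pe_den * pw_den, which never needs to be stored; scaling
    both numerators by the same factor preserves their ratio."""
    (pe_num, pe_den), (pw_num, pw_den) = pe, pw
    num_e = pe_num * pw_den
    num_w = pw_num * pe_den
    while max(num_e, num_w) >> word_bits:    # keep within the working word size
        num_e = max(num_e >> 1, 1)
        num_w = max(num_w >> 1, 1)
    return num_e, num_w
```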
CTW for Text
• Binary decomposition
• Adjust the zero-order estimator
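A sketch of the decomposition idea: each text symbol is split into bits, each bit is predicted by its own estimator, and the symbol probability is the product of the bit probabilities. A plain fixed-length split and a hypothetical bit_model callback are assumed here; the talk uses a dedicated decomposition tree structure.

```python
def symbol_bits(symbol_index, n_bits):
    # Fixed-length binary decomposition, most significant bit first.
    return [(symbol_index >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]

def symbol_probability(symbol_index, n_bits, bit_model):
    """P(symbol) = product of P(bit | earlier bits of this symbol).
    bit_model(prefix, bit) is a hypothetical callback standing in for the
    per-bit CTW estimate."""
    p, prefix = 1.0, []
    for bit in symbol_bits(symbol_index, n_bits):
        p *= bit_model(prefix, bit)
        prefix.append(bit)
    return p
```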
Results
• Comparing PPM and CTW language models:
  • Single file
  • Model trained with English text
  • Model trained with English text and user input
CTW: Model costs
• What are model costs?
• The extra code length incurred because the weighting has to learn the tree structure of the source, compared to coding with the actual model known in advance
CTW: Model costs
• Actual model and alphabet size fixed -> optimize the weight factor alpha
• Per tree -> not enough parameters
• Per node -> not enough adaptivity
• Optimize alpha per depth of the tree
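A sketch of how the weighting recursion changes with a per-depth weight, reusing the Node class from the CTW sketch above; alpha[d] = 0.5 for every depth recovers standard CTW. How the per-depth values are optimised is not shown here.

```python
def weighted_alpha(node, depth, max_depth, alpha):
    """CTW recursion with one tunable weight per depth instead of the fixed 1/2:
    P_w = alpha[d] * P_e + (1 - alpha[d]) * P_w(child 0) * P_w(child 1)."""
    if depth == max_depth:
        return node.p_estimate
    p0 = weighted_alpha(node.child0, depth + 1, max_depth, alpha) if node.child0 else 1
    p1 = weighted_alpha(node.child1, depth + 1, max_depth, alpha) if node.child1 else 1
    return alpha[depth] * node.p_estimate + (1 - alpha[depth]) * p0 * p1
```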
CTW: Model costs
• Exclusion: only use the betas of the actual model
• Iterative process
• Convergent?
• Approximation: to find the actual model, use alpha = 0.5
CTW: Model costs
• Compression of an input sequence
• Model costs are significant, especially for short sequences
• No decrease by optimizing alpha per depth?
CTW: Model costs
• Maximize the probability in the root, instead of the probability per depth
• Exclusion based on alpha = 0.5 almost optimal
CTW: Model costs
Results in the Dasher scenario:
• Trained model
  • Negative effect if no user text is available
• Trained with concatenated user text
  • Small positive effect if the user text is added to the training text and is very similar to it
Conclusions
• New CTW implementation
  • Only integer arithmetic
  • Avoids patented techniques
  • New decomposition tree structure
• Dasher language model based on CTW
  • 6 percent more accurate predictions than PPM-D
• Decreasing the model costs
  • Only an insignificant decrease possible with our method
Future work
• Make CTW suitable for MobileDasher
  • Decrease memory usage
  • Decrease the number of computations
• Combine language models
  • Select the locally best model, or weight models together
  • Combine languages in one model
  • Do models differ in structure or in parameters?
Thank you for your attention
Ask away!