Keyboard Acoustic Emanations Revisited
Li Zhuang, Feng Zhou, J. D. Tygar, {zl,zf,tygar}@cs.berkeley.edu, University of California, Berkeley
http://redtea.cs.berkeley.edu/~zl/keyboard
4/26/2006

Motivation
• Emanations of electronic devices leak information
• How much information is leaked by emanations?
• Apply statistical learning methods to security
• What is learned from the sound of typing on a keyboard?

Key Observation
• Build acoustic model for keyboard & typist
• Non-random typed text (English)
    • Limited number of words
    • Limited letter sequences (spelling)
    • Limited word sequences (grammar)
• Build language model
• Statistical learning theory
• Natural language processing

Acoustic Information: Previous and Ours
• Frequency information in the sound of each typed key
• Why do keystrokes make different sounds?
    • Different locations on the supporting plate
    • Each key is slightly different

[Figure: attack overview. Initial training stages: wave signal, Feature Extraction, Unsupervised Learning, Language Model Correction, Sample Collector, Classifier Builder; outputs: keystroke classifier and recovered keystrokes (e.g. “t” “h” “e”). Subsequent recognition stages: wave signal, Feature Extraction, Keystroke Classifier (use trained classifiers for each key to recognize sound samples), Language Model Correction; output: recovered keystrokes.]

Feature Extraction
• How to represent a keystroke?
• Vector of features:
    • FFT, Cepstrum
• Cepstrum features are better
    • Also used in speech recognition

Unsupervised Learning
• Group keystrokes into N clusters
• Assign each keystroke a label, 1, …, N
• Find the best mapping from cluster labels to characters
• Some character combinations are more common
    • “th” vs. “tj”
• Hidden Markov Models (HMMs)

Language Model Correction
[Figure: recovered text before and after spelling and grammar correction]

Feedback-based Training
• Feedback for more rounds of training
• Output: keystroke classifier
    • Language independent
    • Can be used to recognize random sequences of keys
        • E.g. passwords
• Representation of keystroke classifier
    • Neural networks, linear classification, Gaussian mixtures

Some Experiment Results
• 4 data sets (12~27 mins of recordings)
• 3 different models of keyboards (12 mins of recording)
• 3 different supervised learning methods in feedback
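
The Unsupervised Learning step (mapping cluster labels to characters, helped by the fact that “th” is far more common than “tj”) can be illustrated with a small Viterbi decoder for an HMM. This is a minimal sketch, not the authors' implementation: the three-letter alphabet, the two cluster labels, and every probability below are made-up values chosen only for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden character sequence for a list of cluster labels."""
    # best[i][s] = (log-probability of the best path ending in state s at step i,
    #               previous state on that path)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return "".join(reversed(path))


# Hypothetical toy model: hidden states are characters, observations are cluster labels.
states = ["t", "h", "j"]
start_p = {"t": 0.6, "h": 0.3, "j": 0.1}
# Language model (assumed values): "th" is common, "tj" is rare.
trans_p = {
    "t": {"t": 0.05, "h": 0.90, "j": 0.05},
    "h": {"t": 0.40, "h": 0.30, "j": 0.30},
    "j": {"t": 0.40, "h": 0.30, "j": 0.30},
}
# Acoustic model (assumed values): "h" and "j" fall into the same cluster (label 2),
# so sound alone cannot tell them apart.
emit_p = {
    "t": {1: 0.9, 2: 0.1},
    "h": {1: 0.1, 2: 0.9},
    "j": {1: 0.1, 2: 0.9},
}

decoded = viterbi([1, 2], states, start_p, trans_p, emit_p)
print(decoded)
```

In this toy setup the second keystroke is acoustically ambiguous between “h” and “j”; the transition probabilities break the tie in favor of “th”, which is the role the slide assigns to the language constraints.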