Keystroke Timing Attacks in a Free-Text Environment Joshua Pereyda
Outline • Introduction • Previous Research • Research and Results • Conclusion and Future Research
Introduction – Encryption • Encryption • Obscures or hides data as it travels over a network • Secures financial and personal data, state secrets, etc. • Cryptosystem • Any system that uses encryption • Common target for hackers
Introduction – Attack Vectors • Cryptosystem Attack Vectors • Encryption algorithm • Protocol • Side channel
Introduction – Side Channel Attacks • An attack based on some side-effect of the system • Side channels include • Power usage • Encryption computation time [Koc] • Keyboard acoustics [Aso] • Electromagnetic emanations • Inter-keystroke timings
Introduction – Keystroke Timing Attacks • An attack based on timing data only • More difficult than audio or EM emanation attacks
Introduction – Terminology • Inter-keystroke latency • Time between down events (some papers deal with time between one key’s up event and another key’s down event) • Keystroke Duration • Length of time a key is held down • Digraph • A pair of keystrokes
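Introduction – Terminology [Figure: timeline of key down and up events for the keys T, H, E, and Space, marking keystroke duration (down to up of one key) and inter-keystroke latency (down to down of consecutive keys)]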
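The terminology above can be made concrete with a short sketch. The `KeyEvent` record, its field names, and the sample timestamps are assumptions made for illustration; they are not taken from the thesis or from any of the data sets it cites. Latency is measured down-to-down, matching the definition given on the terminology slide.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """One keystroke (hypothetical record layout, times in milliseconds)."""
    key: str
    down_ms: int
    up_ms: int


def durations(events):
    """Keystroke duration: how long each key was held down."""
    return [(e.key, e.up_ms - e.down_ms) for e in events]


def digraph_latencies(events):
    """Inter-keystroke latency for each digraph, measured down-to-down."""
    return [
        ((a.key, b.key), b.down_ms - a.down_ms)
        for a, b in zip(events, events[1:])
    ]


# Illustrative timestamps for typing "the" followed by a space.
events = [
    KeyEvent("t", 0, 80),
    KeyEvent("h", 120, 190),
    KeyEvent("e", 230, 310),
    KeyEvent(" ", 400, 460),
]

print(durations(events))
print(digraph_latencies(events))
```

Papers that measure latency from one key's up event to the next key's down event would replace `b.down_ms - a.down_ms` with `b.down_ms - a.up_ms`.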
Introduction – Terminology • Fixed Text • Applies when the text is “constrained in some way.” • Free Text • Applies “when users are free to type whatever they want and keystroke analysis is performed on the available information.”
Background • 2001 Attack by Dawn Song, David Wagner, and Xuqing Tian: Timing Analysis of Keystrokes and Timing Attacks on SSH [Son] • Targeted SSH passwords • Extracted about 1 bit of information per character pair (estimated optimal gain of 1.2 bits) • Analyzed digraphs (pairs) only • Laboratory data set (one pair at a time)
Background • 2009 Attack by Zhang and Wang • Targeted information leak on multi-user systems (e.g., Unix) based on time taken by system calls • Different vector, same kind of attack • Improved Song et al.’s attack • Laboratory data set (one word at a time)
Background • 2009 Attack by Zhang and Wang • Spaces were distinct; used to identify word breaks in the attack • Space-to-letter transitions were, on average, much greater than letter-to-letter transitions • Took advantage of English language patterns • Letter probabilities factored into algorithm • Dictionary used to spellcheck guesses
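One way to picture how letter probabilities can be factored into a timing attack is a simple Bayesian ranking: a per-digraph Gaussian timing model supplies the likelihood, and language statistics supply the prior. This is only an illustrative sketch and not the algorithm of Song et al. or Zhang and Wang; the function names, the timing models, and the prior values are all made-up assumptions.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def rank_digraphs(latency_ms, models, priors):
    """Rank candidate digraphs by posterior probability given one latency.

    models: digraph -> (mean_ms, std_ms), assumed Gaussian timing model
    priors: digraph -> prior probability from language statistics
    """
    scores = {
        d: gaussian_pdf(latency_ms, mean, std) * priors[d]
        for d, (mean, std) in models.items()
    }
    total = sum(scores.values())
    ranked = [(d, s / total) for d, s in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical numbers chosen only to illustrate the effect of the prior.
models = {"th": (110, 20), "qz": (115, 20), " t": (300, 60)}
priors = {"th": 0.60, "qz": 0.01, " t": 0.39}

print(rank_digraphs(112, models, priors))  # "th" and "qz" have similar timing
print(rank_digraphs(320, models, priors))  # long latency favors the space pair
```

With nearly identical timing models for "th" and "qz", the observed latency alone cannot separate them; the language prior is what pushes the common pair to the top.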
Background – Authentication • Timing analysis is also used as a behavioral biometric for authentication and identification • Greater body of research • Some research explores fixed text, some free text
Background – Thesis Goal • Authentication/identification through keystroke timing analysis: studied with both fixed text and free text • Keystroke timing attacks: studied with fixed text only • Thesis goal: Explore keystroke timing attacks with free text data
Background – Data Set • Many data sets utilized in the literature • Quality issues • Privacy issues
Background – Data Set • Gunetti-Picardi data set [Gun] • First free text data set • Terence Sim and Rajkumar Janakiraman [Sim, Jan] • Very large data set • All real-world data • Unavailable due to privacy concerns • Hempstalk sm-150 data set [Hem] • Only available real-world free text data set • 10 users • 15 real-world emails each
Research – Goals • Goal: Explore keystroke timing attacks with free text data • Compare free/fixed text data • Check normality of distribution • Attempt to distinguish word breaks, as Zhang and Wang did • Observe impact of increased noise
Research – Distribution Analysis • Song et al. and Zhang and Wang found normal timing distributions • Research goal: See if this holds for free text
Research – Distribution Analysis Results: Distributions are mostly normal
Research – Distribution Analysis Some less normal than others
Research – Distribution Analysis • Conclusions • Most distributions are largely normal • A small portion of values fall outside the normal distribution • The number of anomalous values depends on the key pair being observed • Need another model • Split analysis approach – consider higher-than-usual values separately from others
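The split analysis idea can be sketched as follows: separate the higher-than-usual latencies for a key pair from the bulk of the distribution, then examine each group on its own. The threshold rule used here (median plus a multiple of the scaled median absolute deviation) is an assumption chosen because it is not distorted by the outliers it is meant to find; the slides do not state which rule the thesis used. The sample latencies are invented.

```python
import statistics


def split_latencies(latencies_ms, k=3.0):
    """Split latencies into (typical, high, threshold).

    Threshold = median + k * 1.4826 * MAD. The 1.4826 factor makes the
    MAD comparable to a standard deviation for normally distributed data.
    This rule is an illustrative assumption, not the thesis's method.
    """
    med = statistics.median(latencies_ms)
    mad = statistics.median([abs(x - med) for x in latencies_ms])
    threshold = med + k * 1.4826 * mad
    typical = [x for x in latencies_ms if x <= threshold]
    high = [x for x in latencies_ms if x > threshold]
    return typical, high, threshold


# Invented latencies (ms) for one key pair, with one long pause.
sample = [110, 120, 125, 130, 135, 140, 150, 2400]
typical, high, threshold = split_latencies(sample)

print(f"threshold: {threshold:.1f} ms")
print("typical:", typical)
print("high:", high)
```

The `typical` group can then be checked for normality, while the `high` group is analyzed separately.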
Research – Word Breaks • Zhang and Wang found that word breaks were easily distinguishable (Figure 6 from [Zha]) • Goal: Reproduce this result
Research – Word Breaks First: Look at values in the normal range
Research – Word Breaks Results: Space-letter transitions largely indistinct
Research – Word Breaks • Space-letter transitions in the normal distribution range are indistinct • Perhaps the “anomalous” values hold more useful information
Research – Word Breaks • Key pair frequencies by timing range: Participant A (Hempstalk data set) • Space-letter transitions are more common in higher timing ranges
Research – Word Breaks Key pair frequencies in 2000+ ms range: This pattern holds for most users
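The frequency-by-timing-range analysis can be sketched as a simple binning step. Only the 2000+ ms range comes from the slides; the other bin edges, the function names, and the sample data are assumptions made for illustration.

```python
from collections import Counter

# Only the 2000+ ms range appears in the slides; the lower edges are assumed.
BINS = [(0, 500), (500, 2000), (2000, float("inf"))]


def is_space_letter(pair):
    """True for a digraph that goes from the space bar to a letter."""
    first, second = pair
    return first == " " and len(second) == 1 and second.isalpha()


def space_letter_share_by_range(samples, bins=BINS):
    """Fraction of digraphs in each timing range that are space-letter pairs.

    samples: iterable of ((first_key, second_key), latency_ms)
    """
    totals = Counter()
    hits = Counter()
    for pair, latency in samples:
        for lo, hi in bins:
            if lo <= latency < hi:
                totals[(lo, hi)] += 1
                if is_space_letter(pair):
                    hits[(lo, hi)] += 1
                break
    return {b: hits[b] / totals[b] for b in bins if totals[b]}


# Invented sample, not data from the Hempstalk set.
samples = [
    (("t", "h"), 120), (("h", "e"), 110), ((" ", "t"), 300),
    (("e", " "), 170), ((" ", "a"), 2500), ((" ", "w"), 3100),
    (("n", "d"), 2200), (("a", "n"), 700),
]

for timing_range, share in space_letter_share_by_range(samples).items():
    print(timing_range, round(share, 2))
```

A share that rises with the timing range corresponds to the pattern described for Participant A.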
Research – Word Breaks • Conclusions • Word breaks are not as easily identifiable in free text as in fixed text • May be due to less natural typing in Zhang and Wang’s fixed text experiment • A subset of word breaks are distinct (for most users) • ‘Word breaks are distinct only if timing distributions are distinct’
Research – Word Breaks • Recall Figure 6 from [Zha]: • Timing distributions are distinct
Research – Word Breaks • Word breaks in free text are only sometimes distinct: User A – Distinct; User D – Indistinct
Research – Noise Impact • The previous experiment demonstrates that less controlled collection makes the attack more difficult • Final research goal: Observe impact of increased noise on the attack • Approach • Run word break experiment again with all free text “noise” included
Research – Noise Impact Space-letter pairs in the 2000+ range are distinct when considering only space and letter characters
Research – Noise Impact Space-letter pairs become much less distinct when considering all keypairs found in the data set
Research – Noise Impact • Conclusion: • The larger alphabet and less controlled collection in a free text environment both make character pairs harder to distinguish
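The effect of the larger alphabet can be illustrated by computing the share of space-letter pairs among long-latency digraphs twice: once restricted to space and letter keys, and once over every key pair. The key names such as "Backspace" and "Enter", the function name, and the sample values are assumptions for illustration, not the thesis's data.

```python
def _is_letter_or_space(key):
    """True for a single letter or the space character."""
    return key == " " or (len(key) == 1 and key.isalpha())


def high_range_space_letter_share(samples, min_ms=2000, restrict_alphabet=False):
    """Share of space-letter pairs among digraphs with latency >= min_ms.

    With restrict_alphabet=True, only digraphs made of letters and spaces
    are counted; otherwise every key pair in the sample is counted.
    Returns None if no digraph falls in the range.
    """
    selected = [
        pair for pair, latency in samples
        if latency >= min_ms
        and (not restrict_alphabet or all(_is_letter_or_space(k) for k in pair))
    ]
    if not selected:
        return None
    hits = sum(
        1 for first, second in selected
        if first == " " and len(second) == 1 and second.isalpha()
    )
    return hits / len(selected)


# Invented sample with some non-letter "noise" keys.
samples = [
    ((" ", "t"), 2500), ((" ", "a"), 2300), (("e", "r"), 2100),
    (("Backspace", "Backspace"), 2600), ((".", "Enter"), 4000),
    (("Enter", "h"), 5200), (("t", "h"), 130),
]

print(high_range_space_letter_share(samples, restrict_alphabet=True))
print(high_range_space_letter_share(samples, restrict_alphabet=False))
```

In this invented sample the share drops once the extra keys are included, which mirrors the qualitative result on the noise impact slides.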
Conclusion • Summary • This research considered, for the first time, keystroke timing attacks applied to free text data • Conclusions • Word breaks are not as consistently and easily identifiable in free text as in fixed text. • Free text key pair distributions are not truly normal • Increased noise in a real-world data set makes attacks more difficult
Conclusion – Future Work • Exploitation of free text specific data • This thesis demonstrated an attack specific to free text data – more such attacks could be sought out • E.g., backspace character seems to show anomalous behavior
Conclusion – Future Work • Investigate usefulness of keystroke durations [Sim, p. 4] • Investigate feasibility of user classification without plaintext
Conclusion – Future Work • Wider context of digraphs • Previous publications [Sim, Jan] suggest that much more information can be gained from looking at more than two keys at a time • Analyze 3+ keys or even whole words at a time • Expand to sentence-level language-based analysis • Demonstrate theoretical limit on attack
Conclusion • Keystroke timing attacks • Potentially wide application • Known impact limited at this point • Future research goals • Expand attack or • Demonstrate upper bound on impact
References • [Koc] Paul Kocher. Cryptanalysis of Diffie-Hellman, RSA, DSS, and Other Cryptosystems Using Timing Attacks. 1996 • [Aso] Dmitri Asonov and Rakesh Agrawal. Keyboard Acoustic Emanations. 2004 • [Son] Dawn Xiaodong Song, David Wagner, Xuqing Tian. Timing Analysis of Keystrokes and Timing Attacks on SSH. 2001 • [Zha] Kehuan Zhang, XiaoFeng Wang. Peeping Tom in the Neighborhood: Keystroke Eavesdropping on Multi-User Systems. 2009 • [Hem] Kathryn Hempstalk. Continuous Typist Verification using Machine Learning. PhD Thesis, The University of Waikato, 2009 • [Sim] Terence Sim, Rajkumar Janakiraman. Are Digraphs Good for Free-Text Keystroke Dynamics? 2007 • [Jan] Rajkumar Janakiraman, Terence Sim. Keystroke Dynamics in a General Setting. 2007 • [Gun] Daniele Gunetti, Claudia Picardi. Keystroke Analysis of Free Text. 2005