An introduction to cryptography and cryptanalysis, with a focus on XOR encryption schemes. Design and implement tools to crack XOR encryption, encrypt and decrypt text, and analyze different techniques. Develop a user-friendly interface.
Cracking the Code: Foundations of Cryptology A brief introduction to the underlying terms and concepts of cryptography and cryptanalysis Martina Weber
Project Definition & Requirements • Design and implement tools that allow you to quickly crack XOR-encryption schemes. • General Requirements: • XOR-Encrypt a text using a key. • Given an encrypted message, produce the original message. • Analyze the “quality” of various techniques and solutions. • Create a Human Computer Interface for the system.
The Story Line Alice needs to send a classified message to Bob; however, she does not want her archrival, Eve, to know the confidential information. Therefore Alice and Bob agree they will disguise their message by employing an encryption scheme with an agreed upon key - but Eve is clever and devious...
Defining the Terms • Plain text - the text that Alice wishes to transmit to Bob, in its original form • Cipher text - the result of Alice encrypting the text with the key • Decrypt - reconstructing the plaintext using the cipher text and the key
The Conventions • To distinguish the original text from the encoded text, plaintext characters will be written in lower case and cipher text characters in upper case. • Generally, it is standard to omit all punctuation and spaces from the plaintext. This is done to eliminate analysis based on sentence structure and word length in the cipher text.
Eve’s Attack: Cryptanalysis At first, Eve is baffled, but then she realizes that Alice and Bob only know two encryption schemes. Better yet, Eve is confident in her ability to cryptanalyze these schemes and knows she will be able to crack the code.
The Encryption Schemes • Monoalphabetic Substitution Cipher: Each plaintext character in a message is substituted with a unique alternate character to obtain the cipher text, thus any given letter of the alphabet is always enciphered by the same cipher text letter. • Polyalphabetic Substitution Cipher: The plaintext message is encoded with a keyword of length m. Thus, a character in the original text can be mapped to any of the characters in the keyword to produce the cipher text.
A Closer Look at Monoalphabetic Substitution Ciphers • When a monoalphabetic substitution cipher is used, there is a one-to-one correspondence between the characters in the plaintext and the characters in the cipher text
A Simple Example Using a Monoalphabetic Substitution Cipher • The following is the key used: [key table not reproduced] • Example using the key: Plaintext: thisiseasy Cipher text: CQRBRBNJBH • To decrypt, simply look up the encrypted character in the table and use the plaintext character listed directly above
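The lookup procedure can be sketched in Python. Only the plaintext/cipher text pair from the slide is available here, so this sketch first collects the letter pairs that the example reveals and checks that they are one-to-one. It then tests one hypothesis that is an assumption of this sketch, not something the slide states: that the full table is a uniform shift of the alphabet by 9 positions. The function names are illustrative.

```python
import string

def build_substitution(plain, cipher):
    """Collect plaintext -> cipher text letter pairs and check they are one-to-one."""
    mapping = {}
    for p, c in zip(plain, cipher):
        if mapping.setdefault(p, c) != c:
            raise ValueError(f"{p!r} maps to two different cipher letters")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("two plaintext letters share one cipher letter")
    return mapping

def shift_table(shift):
    """Hypothetical full key table: every letter moved `shift` places (an assumption)."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    return {lower[i]: upper[(i + shift) % 26] for i in range(26)}

def encrypt(text, table):
    return "".join(table[ch] for ch in text)

def decrypt(text, table):
    inverse = {c: p for p, c in table.items()}  # read the table "bottom to top"
    return "".join(inverse[ch] for ch in text)

pairs = build_substitution("thisiseasy", "CQRBRBNJBH")
table = shift_table(9)
# True if every pair seen in the example agrees with the shift-by-9 hypothesis
print(all(table[p] == c for p, c in pairs.items()))
print(encrypt("thisiseasy", table))
print(decrypt("CQRBRBNJBH", table))
```

Because the mapping is one-to-one, decryption is just the same table read in the opposite direction.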
A Closer Look at Polyalphabetic Substitution Ciphers • When a polyalphabetic substitution cipher is used, there is NO one-to-one correspondence between the characters in the plaintext and the characters in the cipher text; a character could have been encoded using any of the m letters of the keyword.
Understanding Polyalphabetic Substitution Ciphers will Require a “New” Alphabet... • Instead of using alphabetic characters, the new notation will use the numerical position (0 to 25) of a given letter. For example, A = 0, B = 1, ..., Y = 24, Z = 25.
A=0 B=1 C=2 D=3 E=4 F=5 G=6 H=7 I=8 J=9 K=10 L=11 M=12 N=13 O=14 P=15 Q=16 R=17 S=18 T=19 U=20 V=21 W=22 X=23 Y=24 Z=25
...And a Modest Mathematical Background in Modular Arithmetic • (x mod m) is evaluated as the remainder when dividing x by m. • Modular addition ([x + y] mod m) is performed by first adding x and y and then reducing the result modulo m. Adding two numbers in the range 0 to m-1 and reducing modulo m always yields a number in the range 0 to m-1. • Examples: 6 mod 3 = 0; 5 mod 3 = 2; 20 mod 7 = 6; 10 mod 7 = 3 • Let m = 26: (7 + 8) mod 26 = 15; (20 + 6) mod 26 = 0; (17 + 11) mod 26 = 2; (23 + 25) mod 26 = 22
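The examples on this slide can be checked with a few lines of Python, where `%` is the remainder operator. The helper name `add_mod` is only illustrative.

```python
def add_mod(x, y, m=26):
    """Add x and y, then reduce the sum modulo m."""
    return (x + y) % m

# Plain remainders from the slide
print(6 % 3, 5 % 3, 20 % 7, 10 % 7)

# Additions modulo 26 from the slide
for x, y in [(7, 8), (20, 6), (17, 11), (23, 25)]:
    print(f"({x} + {y}) mod 26 = {add_mod(x, y)}")
```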
A Simple Example Using a Polyalphabetic Substitution Cipher • First take the text to be encoded and convert it character by character into the respective numerical equivalent. • Then choose a key to use on the text. Convert this key into its numerical representation as well. • Next, write the converted key above the converted plaintext, repeating it as necessary, and add the characters together character by character, modulo 26. • Finally, convert the encoded numerical text back to alphabetic text (if you so wish) Example continued...
...Example continued... A Simple Example Using a Polyalphabetic Substitution Cipher • Step One: Converting plaintext to numerical equivalent • Step Two: Converting key to numerical equivalent • Step Three: Adding the plaintext with the key, modulo 26 • Step Four: Converting cipher text to its alphabetic equivalent
...Example continued A Simple Example Using a Polyalphabetic Substitution Cipher • Thus, the plaintext “havingfunyet” is encoded with the key “yes” to the cipher text “FENGRYDYFWIL” (for instance, the seventh letter is f + y = 5 + 24 = 29 mod 26 = 3 = D) • Had Alice actually sent this message to Bob, he would decode it using the inverse procedure: subtract the key from the cipher text mod 26 • Note that subtracting modulo 26 means adding the additive inverse of an element. The inverse of a nonzero number x can be found by taking 26 - x (the inverse of 0 is 0). The results of this can be seen in the “key” row of the above decryption.
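The four steps of the example translate fairly directly into Python. This is a minimal sketch that assumes lower-case plaintext and key with no spaces or punctuation, as in the conventions above. The function names are illustrative.

```python
def to_numbers(text):
    """Step one/two: map letters to positions, a/A = 0 ... z/Z = 25."""
    return [ord(ch) - ord("a") for ch in text.lower()]

def encrypt(plaintext, key):
    """Step three/four: add the repeated key modulo 26, then map back to upper case."""
    p, k = to_numbers(plaintext), to_numbers(key)
    return "".join(chr((x + k[i % len(k)]) % 26 + ord("A")) for i, x in enumerate(p))

def decrypt(ciphertext, key):
    """Inverse procedure: add the additive inverse (26 - k) mod 26 of each key letter."""
    c = to_numbers(ciphertext)
    inverse_key = [(26 - k) % 26 for k in to_numbers(key)]
    return "".join(
        chr((x + inverse_key[i % len(inverse_key)]) % 26 + ord("a"))
        for i, x in enumerate(c)
    )

cipher = encrypt("havingfunyet", "yes")
print(cipher)
print(decrypt(cipher, "yes"))
```

Decryption reuses the same addition routine with the inverted key, which mirrors the "key" row described in the slide.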
Initial Assumptions • Assumptions about the Language: • The plaintext will be based on the English language • When doing frequency analysis to determine the key used, I will assume the key is an actual word • Assumptions about the Method Used: • I will be doing analysis on the XOR polyalphabetic substitution cipher • XOR encryption can be considered addition mod 26 as used previously in the example (i.e. A = 0, B = 1, ..., Z = 25)
A Note on the Method Using addition mod 26 (instead of converting letters to binary representations and doing XOR bit-by-bit) does not take away from the learning experience. This is because in this type of cryptanalysis, the algorithm analyzes a character at a time without regard to the actual character, noting only that it is a distinct character. The addition mod 26 simply provides an easier medium for me and my peers to understand and convey the information, as we can talk about specific characters and not be concerned with abnormal or unprintable characters that would otherwise be obtained in the XOR encryption.
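To illustrate the point about unprintable characters, here is a small sketch of byte-wise XOR with a repeating key, assuming ASCII text. It is only a comparison aid and not the method analyzed in these slides. The name `xor_bytes` is illustrative.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

cipher = xor_bytes(b"havingfunyet", b"yes")
print(cipher)                     # mostly escape sequences: control characters
print(xor_bytes(cipher, b"yes"))  # XOR with the same key restores the text
```

Two lower-case ASCII letters share the same high bits, so their XOR always lands in the control-character range below 0x20. That is why the mod 26 model is easier to read and discuss.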
Exploiting the Weaknesses After Eve determines that Alice used a polyalphabetic cipher (after all, a monoalphabetic substitution cipher is too simple, even for Alice), she remembers the strategy for cracking the code: find the key length and then use a frequency analysis to determine either the plaintext or the key used for encryption.
Applicable Theories and Terms • Kasiski Test: In a polyalphabetic cipher text message, two identical segments of plaintext will be encrypted to the same cipher text whenever the distance between their occurrences in the plaintext is a multiple of the length of the keyword; therefore, if a string of characters appears repeatedly in the cipher message, it is possible that the distance between the occurrences is a multiple of the length of the keyword • Friedman Test: Used to determine whether a cipher text has been enciphered using a monoalphabetic or polyalphabetic substitution cipher. If the cipher used is polyalphabetic, the test also suggests the length of the keyword using the Index of Coincidence Continued...
...Continued Applicable Theories and Terms • Index of Coincidence: The probability of two letters randomly selected from a text being equal • The expected frequencies of the letters A through Z in the English language are known. Using these probabilities, the index of coincidence for the language is approximately 6.5%. Hence, if two letters are arbitrarily chosen from an English text, nearly 6.5% of the time the letters would be the same. • In a purely random text, the letters would occur with roughly the same frequency, resulting in the index of coincidence being about 3.8%.
Conventions and Abbreviations Employed • n = the length of the cipher text being cryptanalyzed • IC = Index of Coincidence, as discussed previously and represented by the following formula: $IC = \frac{\sum_{i=0}^{25} f_i (f_i - 1)}{n (n - 1)}$, where $f_i$ represents the frequency (number of occurrences) of the respective alphabetic character in the cipher text
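The Index of Coincidence is straightforward to compute. The sketch below assumes the text uses only the letters A to Z and ignores anything else. Returning 0.0 for texts shorter than two letters is a choice made here to avoid division by zero.

```python
from collections import Counter

def index_of_coincidence(text):
    """IC = sum of f_i * (f_i - 1) over all letters, divided by n * (n - 1)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # undefined for fewer than two letters; 0.0 is a convention here
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

print(index_of_coincidence("FENGRYDYFWIL"))
```

On a text this short the value is not meaningful; the 6.5% and 3.8% reference values only emerge for longer texts.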
Let the Code Breaking Begin... Armed with this bank of knowledge, Eve can proceed to cryptanalyze Alice’s message to Bob. What are the methods she can use and how effective are the various techniques? What is the best approach?
And Now Onto the Fun Part... Applying the theories and principles!
Determining the Key Length • I employed four distinct, yet related algorithms for finding the key length. These algorithms are outlined on the following slides. • Note: These algorithms can stand alone; however, they can be combined for increased accuracy
Algorithm One: “Plug it in” • Simply plug data into the following formula: $\text{Key Length} = \frac{0.027\,n}{(n - 1)\,IC - 0.038\,n + 0.065}$, where n is the length of the text and IC is the Index of Coincidence for a specific text (Formula taken from Cryptology by Albrecht Beutelspacher, page 39.)
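As a quick illustration, the formula can be evaluated directly. The function below takes n and IC as inputs. The guard against a non-positive denominator is an addition of this sketch, since the expression breaks down when the IC is at or below the random-text value.

```python
def estimate_key_length(n, ic):
    """Key length estimate: 0.027*n / ((n - 1)*IC - 0.038*n + 0.065)."""
    denominator = (n - 1) * ic - 0.038 * n + 0.065
    if denominator <= 0:
        raise ValueError("IC too close to the random-text value for an estimate")
    return 0.027 * n / denominator

# An IC near the English value suggests a key length of about 1 (monoalphabetic)
print(estimate_key_length(1000, 0.065))
# A lower IC suggests a longer key
print(estimate_key_length(1000, 0.045))
```

The result is generally not an integer, so it is best read as a rough guide to which lengths to try.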
Algorithm Two: Shift and Count 1. Make a duplicate copy of the cipher text. 2. Align the copy under the original, only shifted by x places. 3. Record x and the number of coincidences (i.e. positions where the letters match). 4. Increase x and go to step two. 5. The shift with the most coincidences is a likely guess for the key length. (Algorithm taken from Introduction to Cryptography with Coding Theory by Wade Trappe and Lawrence C. Washington, page 19.)
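The shift-and-count steps might look like this in Python. The `max_shift` bound is an assumption of this sketch, since the slide does not say when to stop increasing x.

```python
def count_coincidences(ciphertext, shift):
    """Count positions where the text matches a copy of itself shifted by `shift`."""
    return sum(a == b for a, b in zip(ciphertext, ciphertext[shift:]))

def shift_and_count(ciphertext, max_shift=20):
    """Record the number of coincidences for every shift from 1 to max_shift."""
    return {x: count_coincidences(ciphertext, x) for x in range(1, max_shift + 1)}

table = shift_and_count("ABCABCABC", max_shift=6)
print(table)
print(max(table, key=table.get))  # shift with the most coincidences
```

Multiples of the key length also tend to score well, so it can help to look at the whole table and not only the maximum.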
Algorithm Three: Friedman Test 1. for m = 1 to n 2. Fill the ROWS of a rectangular array (about n/m rows by m columns) with consecutive substrings of length m from the cipher text. 3. Compute the IC of each COLUMN. 4. Find the average of all the column IC’s. 5. If the average IC is approx 0.065, break and m is the likely keyword length. Else continue loop. (Algorithm adapted from Cryptological Mathematics by Robert Edward Lewand, pages 90 - 92.)
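A sketch of the loop follows. Taking every m-th letter with a slice gives the same columns as filling the array row by row. The `threshold` parameter stands in for "approx 0.065" and its default of 0.06 is an assumption, as is the `max_len` bound.

```python
from collections import Counter

def index_of_coincidence(text):
    n = len(text)
    if n < 2:
        return 0.0
    return sum(f * (f - 1) for f in Counter(text).values()) / (n * (n - 1))

def average_column_ic(ciphertext, m):
    """Split the text into m columns (every m-th letter) and average their ICs."""
    columns = [ciphertext[i::m] for i in range(m)]
    return sum(index_of_coincidence(col) for col in columns) / m

def friedman_key_length(ciphertext, max_len=20, threshold=0.06):
    """Return the first m whose average column IC reaches the threshold, else None."""
    for m in range(1, max_len + 1):
        if average_column_ic(ciphertext, m) >= threshold:
            return m
    return None

# Toy text with period 3; a high threshold is used because the alphabet is tiny
print(friedman_key_length("ABC" * 10, max_len=5, threshold=0.9))
```

On real cipher text the default threshold is a starting point only; short columns give noisy IC values.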
Algorithm Four: Kasiski Test 1. Determine repeating strings of characters in the cipher text (of length at least three). 2. Tabulate the distances between occurrences. 3. The probable key length is a divisor of the greatest common divisor (GCD) of all the distances. (Algorithm adapted from Cryptological Mathematics by Robert Edward Lewand, pages 90 - 92, and Cryptography Theory and Practice by Douglas R. Stinson, page 31.)
Theory Behind the Kasiski Test • If a string of characters is repeated in a plaintext message at a distance apart which is equal to a multiple of the length of the keyword, then the cipher text representations of these characters will be identical in each occurrence
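The Kasiski steps can be sketched as follows. This version looks only at repeated substrings of one fixed length (three by default) and measures the distance between consecutive occurrences. Both are simplifications chosen for this sketch.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeated_distances(ciphertext, length=3):
    """Distances between consecutive occurrences of each repeated substring."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    distances = []
    for pos in positions.values():
        distances.extend(b - a for a, b in zip(pos, pos[1:]))
    return distances

def kasiski_gcd(ciphertext, length=3):
    """GCD of all tabulated distances, or None if nothing repeats."""
    distances = repeated_distances(ciphertext, length)
    return reduce(gcd, distances) if distances else None

print(repeated_distances("ABCDEFABCXXXABC"))
print(kasiski_gcd("ABCDEFABCXXXABC"))
```

The key length is then sought among the divisors of the returned GCD.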
And the Winner is... • The most accurate is the Friedman Test, also the slowest algorithm • The Shift and Count algorithm is very accurate as well, taking less time than the Friedman Test • The “Plug it in!” algorithm runs the fastest, but is only accurate on small keys • The Kasiski Test almost always results in output of the correct key length or a multiple thereof, but how many possible lengths must the user try before finding the correct one?
Determining the Plain Text/Key • I used three distinct, yet related algorithms for finding the plain text/key. These algorithms are outlined on the following slides. • These algorithms all require the key length as input. Knowing the key length, the cipher text can be split into rows of that length. Looking down a column, all letters are encrypted by the same key letter - resulting in a Monoalphabetic Substitution cipher!
Algorithm One: Basic Frequency Analysis 1. Split text into rows of the same length as the key. 2. For each column, determine the frequencies of each letter. 3. Compare to expected English frequencies (these values are known and tabulated) and "guess" at encryption. 4. Repeat process on next column. (Algorithm taken from Beutelspacher and Lewand.)
Algorithm Two: Permute through All Shifts 1. Split text into rows of the same length as the key. 2. For each column, determine the frequencies of each letter. 3. Take the dot product of the column frequencies with every possible shift of the standard English alphabet frequencies. 4. The largest value is the most likely shift. 5. Repeat the process on the next column. (Algorithm taken from Introduction to Cryptography with Coding Theory by Wade Trappe and Lawrence C. Washington, pages 22 - 23.)
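The dot-product search might be sketched as below, assuming the addition mod 26 model used throughout. The `ENGLISH` table holds approximate, commonly tabulated letter frequencies. The slides do not list the values they used, so treat these as an assumption.

```python
# Approximate English letter frequencies for a..z (assumed values, sum is about 1)
ENGLISH = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
           0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
           0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001]

def best_shift(column):
    """Return the shift g whose shifted English profile best matches the column."""
    counts = [0] * 26
    for ch in column.upper():
        if "A" <= ch <= "Z":
            counts[ord(ch) - ord("A")] += 1
    # With cipher = plain + g, cipher letter (i + g) mod 26 stands for plain letter i
    scores = [sum(counts[(i + g) % 26] * ENGLISH[i] for i in range(26))
              for g in range(26)]
    return max(range(26), key=lambda g: scores[g])

def recover_key(ciphertext, key_length):
    """Apply best_shift to each column and turn the shifts into key letters."""
    return "".join(chr(best_shift(ciphertext[i::key_length]) + ord("A"))
                   for i in range(key_length))
```

For example, `recover_key(ciphertext, 3)` would propose one key letter per column. On short columns the top-scoring shift can be wrong, so the runner-up scores are worth inspecting too.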
Algorithm Three: Find Relative Shifts between Key Letters 1. Split text into rows of the same length as the key. 2. For each column, determine the frequencies of each letter. 3. Find all MIc of each column with every other column. 4. Search for the MIc's closest to 0.065; this yields the relative shift from column i to column j. 5. Form a system of equations and solve in terms of one key letter. 6. The keyword is a cyclic shift of the result. (Algorithm taken from Stinson pages 33 - 36.) Continued...
...Continued Algorithm Three: Find Relative Shifts between Key Letters • MIc is represented by the equation $MI_c(f, h_g) = \frac{\sum_{i=0}^{25} f_i \, h_{i-g}}{n \cdot m}$, where n and m are the lengths of substrings f and h, $f_i$ is the frequency of letter i in f, and $h_{i-g}$ is the frequency of letter i - g (index taken mod 26) in h, where 0 <= g <= 25.
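A sketch of the MIc computation for one pair of columns follows. Picking the g with the largest value is a simplification of "closest to 0.065" that is adopted here for brevity. The function names are illustrative.

```python
def letter_counts(text):
    """Occurrences of each letter A..Z, indexed 0..25."""
    counts = [0] * 26
    for ch in text.upper():
        if "A" <= ch <= "Z":
            counts[ord(ch) - ord("A")] += 1
    return counts

def mutual_ic(col_f, col_h, g):
    """MIc(f, h_g) = sum over i of f_i * h_(i - g mod 26), divided by n * m."""
    f, h = letter_counts(col_f), letter_counts(col_h)
    n, m = sum(f), sum(h)
    if n == 0 or m == 0:
        return 0.0
    return sum(f[i] * h[(i - g) % 26] for i in range(26)) / (n * m)

def best_relative_shift(col_f, col_h):
    """Shift g (0..25) giving the largest MIc between the two columns."""
    return max(range(26), key=lambda g: mutual_ic(col_f, col_h, g))

# Shifting a column of B's by 25 turns it into a column of A's
print(best_relative_shift("AAAA", "BBBB"))
```

Repeating this for every pair of columns gives the relative shifts that feed the system of equations in step 5.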
And the Winner is... • Permute through All Shifts Algorithm, logical winner since all possibilities are attempted • The Basic Frequency Analysis works okay for small key lengths • What about the Relative Shifts Algorithm? • I need far more computing power (or patience) to test this algorithm. • Yields accurate results when the matrix can be solved
Down the Road: Unaddressed Issues and Enhancements to Implement • When the key length is equal to the plaintext length and the key is perfectly random and never reused, this XOR encryption method is considered perfectly secure. But does the key length really have to equal the plaintext length for the encryption to be secure? Where exactly is the critical point? • What if a random key is used instead of an actual word? How will this affect the frequency analysis to determine the key?
...Continued Down the Road: Unaddressed Issues and Enhancements to Implement • I used a cipher text only attack (the only available resource to analyze is the encrypted cipher text). Consideration should be given to various types of attacks, such as cribbing (knowledge that certain words appear in the plaintext) and taking advantage of multiple cipher texts in which the same key was used (additional information is gained under these circumstances because you KNOW the keys are overlapping starting at the beginning of the cipher text - however, how do you determine initially that the same key was used?).
...Continued Down the Road: Unaddressed Issues and Enhancements to Implement • My final code requires “slimming down” to increase efficiency. • A spell checker/dictionary could be added to increase accuracy • Instead of giving the user all cyclic shifts of the key word on the Find Relative Shifts between Key Letters Algorithm, only give the user actual words • When using the other two algorithms, a spelling-auto-corrector would improve accuracy
...Continued Down the Road: Unaddressed Issues and Enhancements to Implement • In the first two find plain text algorithms, allow the user to select specific letters in the keyword or plaintext to change and display the effect of these changes. • The key length algorithm that attempts to compute the GCD could be altered to throw out “bad” data • i.e. find the number(s) that are preventing a common GCD and ignore those numbers
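One possible way to make the GCD step tolerant of "bad" distances is to let the distances vote for candidate key lengths instead of taking a strict GCD. This is a swapped-in technique offered as a sketch of the enhancement, not the author's implementation. The name `vote_for_key_length` and the `max_len` bound are hypothetical.

```python
from functools import reduce
from math import gcd

def vote_for_key_length(distances, max_len=20):
    """For each candidate length, count how many distances it divides."""
    return {m: sum(d % m == 0 for d in distances) for m in range(2, max_len + 1)}

distances = [12, 18, 24, 7]    # 7 plays the role of an accidental repeat
print(reduce(gcd, distances))  # the strict GCD is spoiled by the outlier
votes = vote_for_key_length(distances, max_len=12)
print(votes)
```

Candidates that divide most of the distances stand out even when one stray distance would otherwise drive the GCD down to 1.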
...Continued Down the Road: Unaddressed Issues and Enhancements to Implement • Combine the various algorithms so they can share the results and base results off of one another. • Finally, how about considering a new method of encryption?
Strategies & Knowledge • Research, research, research! • Understand everything you read, even how the author got from one step to the next • Trial and error, but try it. • Do an example first - ON PAPER (but make sure you do your math right) • No single part of the project was difficult to code, but implementation required an in-depth understanding of the problem
Advice to Next Year’s Seniors • Start EARLY! It goes by FAST. • It is almost impossible to stay on target with your first schedule, second schedule, third schedule... • Lofty aspirations at the beginning, but reality will hit • ASK QUESTIONS! • Different professors have different “specialty” areas, take advantage of it • Your classmates can provide great insight • Don’t re-invent the wheel, check out other solutions first