Cryptanalysis
• The general goal of cryptanalysis is to find the key used in an encryption or, at the very least, the decryption function.
• Recall that there are four possible starting points for a cryptanalysis:
  • Ciphertext only: we have only the ciphertext string y of a message.
  • Known plaintext: we have a string x of plaintext and the corresponding ciphertext string y.
  • Chosen plaintext: we have temporary access to the encryption machine and can choose a plaintext string x and obtain the corresponding ciphertext string y.
  • Chosen ciphertext: we have temporary access to the decryption machine and can choose a ciphertext string y and obtain the corresponding plaintext string x.
Cryptanalysis of the Shift Cipher
• Ciphertext only: exhaustive search is possible, since there are only 26 possible keys. If the ciphertext has at least 20 characters, it is extremely unlikely that more than one key produces a recognizable message.
• Known plaintext: all we need is the ciphertext character y for one plaintext character x, because then the key is K = (int(y) - int(x)) mod 26.
• Chosen plaintext: the encryption of ‘a’ is the key, since ‘a’ corresponds to 0.
• Chosen ciphertext: the decryption of ‘A’ is the negative of the key (mod 26).
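The two simplest attacks on the shift cipher can be sketched in a few lines of Python. This is only an illustration under the slide's conventions (lowercase plaintext, uppercase ciphertext, a = 0, ..., z = 25); the function names and the sample word are hypothetical and do not come from the slides.

```python
def shift_encrypt(plaintext, key):
    """Encrypt lowercase plaintext into uppercase ciphertext with a shift key."""
    return "".join(chr((ord(c) - ord("a") + key) % 26 + ord("A")) for c in plaintext)


def shift_decrypt(ciphertext, key):
    """Decrypt uppercase ciphertext into lowercase plaintext with a shift key."""
    return "".join(chr((ord(c) - ord("A") - key) % 26 + ord("a")) for c in ciphertext)


def all_shift_decryptions(ciphertext):
    """Ciphertext-only attack: list the candidate plaintext for each of the 26 keys."""
    return [shift_decrypt(ciphertext, key) for key in range(26)]


def key_from_known_pair(x, y):
    """Known-plaintext attack: K = (int(y) - int(x)) mod 26 for one letter pair."""
    return (ord(y.upper()) - ord(x.upper())) % 26


if __name__ == "__main__":
    secret = shift_encrypt("hello", 3)  # hypothetical sample message
    for key, candidate in enumerate(all_shift_decryptions(secret)):
        print(key, candidate)  # a reader picks out the one recognizable line
    print(key_from_known_pair("h", secret[0]))
```

With a realistic message the exhaustive search would be followed by a human (or a dictionary check) choosing the readable candidate; that step is not automated here.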
Cryptanalysis of the Affine Cipher
• Ciphertext only: exhaustive search is possible using a computer, since there are only 12 · 26 = 312 possible keys. If the message is sufficiently long, we can do a statistical analysis. More later.
• Chosen plaintext: if the key is (α, β), so that eK(x) = αx + β mod 26, then eK(‘a’) = α·0 + β = β and eK(‘b’) = α·1 + β = α + β, hence α = eK(‘b’) - β (mod 26).
• Chosen ciphertext: computing dK(‘A’) and dK(‘B’), we can find the coefficients of the decryption function by the method given for the chosen plaintext situation. We could then compute the encryption function, but why bother?
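As a small illustration of the chosen-plaintext attack, the following Python sketch recovers an affine key from the encryptions of ‘a’ and ‘b’. The `encrypt` argument stands in for the encryption machine and is an assumption of this sketch; letters are represented by their numbers 0 to 25.

```python
def recover_affine_key(encrypt):
    """Recover (alpha, beta) from an encryption oracle for x -> alpha*x + beta mod 26."""
    beta = encrypt(0)                  # e_K('a') = alpha*0 + beta = beta
    alpha = (encrypt(1) - beta) % 26   # e_K('b') = alpha + beta
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical machine with key (17, 9), used only to exercise the function.
    machine = lambda x: (17 * x + 9) % 26
    print(recover_affine_key(machine))
```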
Cryptanalysis of the Affine Cipher
• Known plaintext
  • With luck, we only need the encrypted form of two characters.
  • Even if we are not lucky, we will usually be able to reduce the number of possible keys.
• Example: suppose we know that “if” maps to “PQ”, that is, eK(8) = 15 and eK(5) = 16. With key (α, β) this gives 8α + β ≡ 15 and 5α + β ≡ 16 (mod 26)
• Subtracting the second equation from the first we get 3α ≡ -1 ≡ 25 (mod 26)
• Since 3^(-1) = 9 mod 26, we then obtain α ≡ 9 · 25 ≡ 17 (mod 26)
• Substituting α = 17 in the first equation, we get 8 · 17 + β ≡ 15 (mod 26), so β = 9
• We have thus found the key to be (17, 9)
Cryptanalysis of the Affine Cipher
• Known plaintext (continued)
• Example: suppose we know that “go” maps to “TH”, that is, eK(6) = 19 and eK(14) = 7. With key (α, β) this gives 6α + β ≡ 19 and 14α + β ≡ 7 (mod 26)
• Subtracting the first equation from the second we get 8α ≡ -12 ≡ 14 (mod 26)
• Problem: 8 does not have an inverse mod 26, since gcd(8, 26) = 2
• There are two solutions: α = 5 and α = 18. But 18 is not a possible choice for α, since it is not relatively prime to 26. Thus we have 6 · 5 + β ≡ 19 (mod 26), so β = 15
• We have thus found the key to be (5, 15)
• A problem arises if the gcd is 13. In this case, we will need an additional character pair
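The two worked examples can be checked with a short Python sketch that solves the pair of congruences for every legal key, including the case where the difference of the plaintext letters is not invertible mod 26. The function name is hypothetical, and `pow(x, -1, m)` for modular inverses assumes Python 3.8 or later.

```python
from math import gcd


def affine_keys_from_pairs(x1, y1, x2, y2, n=26):
    """Return every legal key (alpha, beta) with alpha*x + beta = y (mod n) for both pairs."""
    dx = (x2 - x1) % n
    dy = (y2 - y1) % n
    if dx == 0:
        raise ValueError("the two plaintext letters must differ")
    d = gcd(dx, n)
    if dy % d != 0:
        return []  # the pairs are inconsistent with any affine key
    m = n // d
    # Solve (dx/d) * alpha = dy/d (mod n/d); dx/d is invertible modulo n/d.
    alpha0 = (dy // d) * pow(dx // d, -1, m) % m
    keys = []
    for k in range(d):  # the d solutions modulo n
        alpha = alpha0 + k * m
        if gcd(alpha, n) == 1:  # alpha must be relatively prime to 26
            keys.append((alpha, (y1 - alpha * x1) % n))
    return keys


if __name__ == "__main__":
    print(affine_keys_from_pairs(8, 15, 5, 16))   # "if" -> "PQ"
    print(affine_keys_from_pairs(6, 19, 14, 7))   # "go" -> "TH"
    print(len(affine_keys_from_pairs(0, 5, 13, 18)))  # gcd 13: many keys remain
```

When the plaintext letters differ by 13, twelve candidate keys survive, which is why an additional character pair is needed in that case.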
Statistical Analysis Methods
• Various people have estimated the relative frequencies of the 26 letters by compiling statistics from numerous novels, magazines and newspapers
• The frequency table used here (not reproduced) was obtained by Beker and Piper
• They partition the 26 letters into 5 groups:
  1) E, having probability about 0.12
  2) T, A, O, I, N, S, H, R, each with 0.06 ≤ probability ≤ 0.09
  3) D, L, each with probability about 0.04
  4) C, M, U, W, F, G, Y, P, B, each with 0.015 ≤ probability ≤ 0.028
  5) V, K, J, X, Q, Z, each with probability ≤ 0.01
Statistical Analysis Methods
• It is also useful to consider sequences of two or three consecutive letters, called digrams and trigrams, respectively
• The 30 most common digrams are, in decreasing frequency: TH, HE, IN, ER, AN, RE, ED, ON, ES, ST, EN, AT, TO, NT, HA, ND, OU, EA, NG, AS, OR, TI, IS, ET, IT, AR, TE, SE, IE, OF
• The 12 most common trigrams are: THE, ING, AND, HER, ERE, ENT, THA, NTH, WAS, ETH, FOR, DTH
Example: Statistical Analysis
• We will use statistical methods to do a ciphertext-only attack on an Affine Cipher
• The intercepted ciphertext is
    FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH
• There are only 57 characters in the ciphertext, but this is usually enough to cryptanalyze an Affine Cipher
• The frequency analysis of the above ciphertext is as follows:
Example: Statistical Analysis
• The most frequent characters are: R (8), D (7), E, H, K (5 each) and F, V (4 each); S occurs 3 times
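The letter counts for this ciphertext are easy to reproduce. The following is a minimal Python sketch using `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

CIPHERTEXT = "FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH"

# Count how often each ciphertext letter occurs.
counts = Counter(CIPHERTEXT)

if __name__ == "__main__":
    print(len(CIPHERTEXT), "characters")
    for letter, count in counts.most_common(8):
        print(letter, count)
```

Counting this way gives 4 occurrences each for F and V and 3 for S.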
Example: Statistical Analysis
• Most frequent characters: R (8), D (7), E, H, K (5 each) and F, V (4 each)
• First guess: e → R, t → D (t is the second most frequent letter in English)
• Numerically: eK(4) = 17 and eK(19) = 3
• Thus, with key (α, β), 4α + β ≡ 17 and 19α + β ≡ 3 (mod 26)
• Unique solution: α = 6, β = 19 (in Z26), illegal since gcd(6, 26) = 2
• Next guess: e → R, t → E, from which we get α = 13, also illegal
• Next guess: e → R, t → H, which gives α = 8, also illegal
• Next guess: e → R, t → K, which gives α = 3, β = 5, LEGAL!
• Compute the decryption function dK(y) = 9y - 19 (mod 26) and apply it to the ciphertext:
• “algorithmsarequitegeneraldefinitionsofarithmeticprocesses”
• Since it is unlikely that a wrong key would yield a meaningful text, we conclude that we have “broken” this encryption.
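The sequence of guesses in this example can be replayed with a short Python sketch. It assumes e = 4 and t = 19 as plaintext numbers and tries D, E, H and K in turn as the image of t; the function names are hypothetical, and `pow(x, -1, 26)` assumes Python 3.8 or later.

```python
from math import gcd

CIPHERTEXT = "FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH"


def key_from_guess(cipher_for_e, cipher_for_t):
    """Solve 4*alpha + beta = y1 and 19*alpha + beta = y2 (mod 26) for (alpha, beta)."""
    e, t = 4, 19
    y1 = ord(cipher_for_e) - ord("A")
    y2 = ord(cipher_for_t) - ord("A")
    alpha = (y2 - y1) * pow(t - e, -1, 26) % 26  # 15 is invertible mod 26
    beta = (y1 - alpha * e) % 26
    return alpha, beta


def affine_decrypt(ciphertext, alpha, beta):
    """Apply d_K(y) = alpha^(-1) * (y - beta) mod 26 to uppercase ciphertext."""
    inverse = pow(alpha, -1, 26)
    return "".join(
        chr(inverse * (ord(c) - ord("A") - beta) % 26 + ord("a")) for c in ciphertext
    )


if __name__ == "__main__":
    for t_guess in "DEHK":
        alpha, beta = key_from_guess("R", t_guess)
        if gcd(alpha, 26) != 1:
            print(f"e->R, t->{t_guess}: alpha = {alpha} is illegal")
            continue
        print(f"e->R, t->{t_guess}: key ({alpha}, {beta})")
        print(affine_decrypt(CIPHERTEXT, alpha, beta))
```

A legal key is only a candidate; whether the resulting text is meaningful still has to be judged by reading it.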
Cryptanalysis of the Substitution Cipher
• Ciphertext obtained from a Substitution Cipher:
    YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ
    NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ
    NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ
    XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR
• Frequency analysis:
Cryptanalysis of the Substitution Cipher
• Obvious guess: dK(Z) = e
• Appearing at least 10 times: C, D, F, J, M, R, Y
• Guess: dK(C), dK(D), dK(F), dK(J), dK(M), dK(R), dK(Y) ∈ {t, a, o, i, n, s, h, r}, but the frequencies do not vary enough to tell us which is which
• Look at digrams, especially those of the form -Z or Z-, since we think Z represents e
• Most common: DZ and ZW (4 each); NZ and ZU (3 each); RZ, HZ, XZ, FZ, ZR, ZV, ZC, ZD, ZJ (2 each)
Cryptanalysis of the Substitution Cipher
• Note that ZW appears 4 times and WZ never; also, W has low frequency. Since “ed” occurs frequently in English and “de” does not, this suggests the guess dK(W) = d
• Since DZ (?e) occurs 4 times and ZD (e?) occurs twice, we suspect dK(D) ∈ {r, s, t}
• Assumptions so far: dK(Z) = e, dK(W) = d
• Note that ZRW (e?d) and RZW (?ed) both occur near the beginning of the ciphertext and RW (?d) occurs again later. Since R has high frequency and nd is a common digram, guess dK(R) = n
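The digram counts used in this argument can be reproduced with a small Python sketch. It treats the four ciphertext rows as one continuous string; the variable names are illustrative only.

```python
from collections import Counter

CIPHERTEXT = (
    "YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ"
    "NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ"
    "NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ"
    "XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR"
)

# Single-letter counts and counts of all overlapping two-letter windows.
letters = Counter(CIPHERTEXT)
digrams = Counter(CIPHERTEXT[i:i + 2] for i in range(len(CIPHERTEXT) - 1))

# Digrams of the form -Z or Z-, most frequent first.
with_z = sorted(
    ((pair, count) for pair, count in digrams.items() if "Z" in pair),
    key=lambda item: -item[1],
)

if __name__ == "__main__":
    print("Z occurs", letters["Z"], "times")
    for pair, count in with_z:
        print(pair, count)
    print("WZ occurs", digrams["WZ"], "times")
```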
Cryptanalysis of the Substitution Cipher
• Guesses so far: dK(Z) = e, dK(R) = n, dK(W) = d; dK(C), dK(D), dK(F), dK(J), dK(M), dK(Y) ∈ {t, a, o, i, s, h, r}
• Partially deciphered text (assuming these guesses), with each plaintext row above its ciphertext row:
    ------end---------e----ned---e------------
    YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ

    --------e----e---------n--d---en----e----e
    NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ

    -e---n------n------ed---e---e--ne-nd-e-e--
    NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ

    -ed-----n-----------e----ed-------d---e--n
    XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR
• Next guess: dK(N) = h. Why? Because NZ (?e) is a common digram and ZN (e?) is not.
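The partially deciphered rows can be generated mechanically from the current set of guesses. The following Python sketch does this with a dictionary from ciphertext letters to guessed plaintext letters; the helper name `partial_decrypt` is hypothetical.

```python
ROWS = [
    "YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ",
    "NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ",
    "NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ",
    "XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR",
]


def partial_decrypt(ciphertext, guesses):
    """Replace guessed ciphertext letters by plaintext letters and all others by '-'."""
    return "".join(guesses.get(c, "-") for c in ciphertext)


if __name__ == "__main__":
    guesses = {"Z": "e", "R": "n", "W": "d"}
    for row in ROWS:
        print(partial_decrypt(row, guesses))
        print(row)
        print()
    guesses["N"] = "h"  # the next guess is added the same way
    print(partial_decrypt(ROWS[0], guesses))
```

Regenerating the rows after each new guess avoids the copying slips that are easy to make when updating them by hand.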
Cryptanalysis of the Substitution Cipher
• Guesses so far: dK(Z) = e, dK(R) = n, dK(W) = d, dK(N) = h; dK(C), dK(D), dK(F), dK(J), dK(M), dK(Y) ∈ {t, a, o, i, s, r}
• Partially deciphered text:
    ------end---------e----nedh--e------------
    YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ

    h-------e----e---------nh-d---en----e-h--e
    NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ

    he---n------n------ed---e---e--ne-ndhe-e--
    NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ

    -ed-----nh---h------e----ed-------d--he--n
    XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR
• Next guess: the segment ne-ndhe suggests that dK(C) = a
Cryptanalysis of the Substitution Cipher
• Guesses so far: dK(Z) = e, dK(R) = n, dK(W) = d, dK(N) = h, dK(C) = a; dK(D), dK(F), dK(J), dK(M), dK(Y) ∈ {t, o, i, s, r}
• Partially deciphered text:
    ------end-----a---e-a--nedh--e------a-----
    YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ

    h-------ea---e-a---a---nhad-a-en--a-e-h--e
    NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ

    he-a-n------n------ed---e---e--neandhe-e--
    NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ

    -ed-a---nh---ha---a-e----ed-----a-d--he--n
    XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR
• Next: consider the second most frequent character, M
• The guess that RNM decrypts to nh? suggests that h- begins a word, so dK(M) should be a vowel; hence dK(M) ∈ {i, o}
• Since ai is a more frequent digram than ao, the ciphertext digram CM suggests the guess dK(M) = i
Cryptanalysis of the Substitution Cipher
• Guesses so far: dK(Z) = e, dK(R) = n, dK(W) = d, dK(N) = h, dK(C) = a, dK(M) = i; dK(D), dK(F), dK(J), dK(Y) ∈ {t, o, s, r}
• Partially deciphered text:
    -----iend-----a-i-e-a-inedhi-e------a---i-
    YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ

    h-----i-ea-i-e-a---a-i-nhad-a-en--a-e-hi-e
    NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ

    he-a-n-----in-i----ed---e---e-ineandhe-e--
    NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ

    -ed-a--inhi--hai--a-e-i--ed-----a-d--he--n
    XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR
• Next: which ciphertext letter decrypts to o? Since o is a common letter, we guess eK(o) ∈ {D, F, J, Y}
• Y seems most likely; otherwise we would get long strings of vowels (aoi from CFM or CJM)
Cryptanalysis of the Substitution Cipher
• Guesses so far: dK(Z) = e, dK(R) = n, dK(W) = d, dK(N) = h, dK(C) = a, dK(M) = i, dK(Y) = o; dK(D), dK(F), dK(J) ∈ {t, s, r}
• Partially deciphered text:
    o----iend--o--a-i-e-a-inedhi-e------a---i-
    YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ

    h-----i-ea-i-e-a-o-a-ionhad-a-en--a-e-hi-e
    NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ

    he-a-n--oo-in-i-o--ed-o-e-o-e-ineandhe-e--
    NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ

    -ed-a--inhi--hai--a-e-i--ed---o-a-d--he--n
    XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR
• The three most frequent remaining characters are D, F and J
• This leads to the guess that they decrypt to r, s and t in some order
• The two occurrences of the trigram NMD suggest dK(D) = s, giving the trigram his, consistent with the earlier guess dK(D) ∈ {r, s, t}
Cryptanalysis of the Substitution Cipher
• Guesses so far: dK(Z) = e, dK(R) = n, dK(W) = d, dK(N) = h, dK(C) = a, dK(M) = i, dK(Y) = o, dK(D) = s; dK(F), dK(J) ∈ {t, r}
• Partially deciphered text:
    o----iend--o--a-ise-a-inedhise------ass-i-
    YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ

    hs----iseasi-e-a-o-a-ionhad-a-en--a-e-hi-e
    NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ

    he-asn--oo-in-i-o--edso-e-o-e-ineandhese--
    NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ

    -ed-a--inhis-hai--a-e-i--ed---o-a-ds-hes-n
    XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR
• The segment HNCMF (-hai-) could be chair, leading to dK(F) = r, dK(H) = c and dK(J) = t (the latter by elimination).
Cryptanalysis of the Substitution Cipher
• Guesses so far: dK(Z) = e, dK(R) = n, dK(W) = d, dK(N) = h, dK(C) = a, dK(M) = i, dK(Y) = o, dK(D) = s, dK(F) = r, dK(H) = c, dK(J) = t
• Partially deciphered text:
    o-r-riend-ro--arise-a-inedhise--t---ass-it
    YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ

    hs-r-riseasi-e-a-orationhadta-en--ace-hi-e
    NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ

    he-asnt-oo-in-i-o-redso-e-ore-ineandhesett
    NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ

    -ed-ac-inhischair-aceti-ted--to-ardsthes-n
    XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR
• It is now not too hard to complete the key and obtain a complete decryption.
Cryptanalysis of the Substitution Cipher
• Complete decryption:
    ourfriendfromparisexaminedhisemptyglasswit
    YIFQFMZRWQFYVECFMDZPCVMRZWNMDZVEJBTXCDDUMJ

    hsurpriseasifevaporationhadtakenplacewhile
    NDIFEFMDZCDMQZKCEYFCJMYRNCWJCSZREXCHZUNMXZ

    hewasntlookingipouredsomemorewineandhesett
    NZUCDRJXYYSMRTMEYIFZWDYVZVYFZUMRZCRWNZDZJJ

    ledbackinhischairfacetilteduptowardsthesun
    XZWGCHSMRNMDHNCMFQCHZJMXJZWIEJYUCFWDJNZDIR
• With spacing and punctuation inserted: Our friend from Paris examined his empty glass with surprise, as if evaporation had taken place while he wasn’t looking. I poured some more wine and he settled back in his chair, face tilted up towards the sun.
Cryptanalysis of the Vigenère Cipher
• Direct statistical analysis is not effective in dealing with a Vigenère cipher.
• The reason is that the character frequencies are evened out, because the same plaintext letter is encrypted with different shifts at different positions.
• If we know the length of the key, we can apply statistical methods.
• For example, suppose we know the key length is 4 and let (k1, k2, k3, k4) denote the key.
• We would expect the character frequencies in positions 1, 5, 9, … to resemble the standard distribution shifted by k1.
• Thus we should be able to recover the first shift amount k1.
• Applying the same to the characters in positions 2, 6, 10, …, we would hope to find k2, etc.
Finding the Key Length
• There are several methods for getting the key length m
• One method, due to Kasiski (and Babbage), is based on the observation that two identical segments of plaintext will produce the same ciphertext when their distance in the plaintext is divisible by the key length.
• Kasiski test: find pairs of identical ciphertext segments of length at least three and record the distance between their starting positions.
• If we obtain several such distances, conjecture that m divides all these distances, hence divides their gcd.
• Another method is as follows: for each possible shift amount s, compute the number of times the character in position i is the same as the character in position i + s (mod messageLength).
• Then the shift with the highest count is a good guess for m.
• Typically, you would restrict your search to shifts that are relatively small with respect to the message length.
Finding the Key Length
• Further evidence for the value of m can be obtained from the index of coincidence (Friedman, 1920).
• Definition. Suppose x = x1x2…xn is a string of n alphabetic characters. The index of coincidence of x, denoted Ic(x), is the probability that two randomly chosen elements of x are equal.
• Let the frequencies in x of A, B, …, Z be f0, f1, …, f25.
• We can choose two elements from x in $\binom{n}{2}$ ways, and for each i there are $\binom{f_i}{2}$ ways in which both chosen elements are the ith letter. Thus we have the formula
$$I_c(x) = \frac{\sum_{i=0}^{25} \binom{f_i}{2}}{\binom{n}{2}} = \frac{\sum_{i=0}^{25} f_i (f_i - 1)}{n (n - 1)}$$
Finding the Key Length
• Let p0, p1, …, p25 denote the probabilities that a given character in an arbitrary string y of English text is A, B, …, Z, respectively.
• Then one would expect that $I_c(y) \approx \sum_{i=0}^{25} p_i^2 = 0.065$
• Why? Because p0² is the probability that two randomly selected letters are both A’s, p1² is the probability that two randomly selected letters are both B’s, etc.
• We would expect the same value if the string had been enciphered using a mono-alphabetic cipher, since such a cipher only permutes the individual probabilities.
Finding the Key Length
• Suppose we have a ciphertext string y = y1y2…yn obtained from a Vigenère cipher
• For a given m, define the strings Y1, Y2, …, Ym as follows:
    Y1 = y1 y(m+1) y(2m+1) …
    Y2 = y2 y(m+2) y(2m+2) …
    …
    Ym = ym y(2m) y(3m) …
• If m is indeed the key length, we would expect Ic(Yi) ≈ 0.065 for each i
• If m is not the key length, we would expect smaller values, since the substrings will look more random
• Since a completely random string z has Ic(z) ≈ 1/26 = 0.038, which is sufficiently far from 0.065, we will often be able to determine (or confirm) the key length by this method.
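The index of coincidence and the split into substrings Y1, …, Ym translate directly into code. The following is a minimal Python sketch; the function names and the short test strings are hypothetical and chosen only so the values are easy to check by hand.

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two distinct randomly chosen positions of text hold the same letter."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(f * (f - 1) for f in Counter(text).values()) / (n * (n - 1))


def split_columns(text, m):
    """Return Y1..Ym: the i-th string takes every m-th character starting at position i."""
    return [text[i::m] for i in range(m)]


def coincidence_by_key_length(ciphertext, max_m):
    """Index of coincidence of every substring, for each candidate key length 1..max_m."""
    return {
        m: [index_of_coincidence(col) for col in split_columns(ciphertext, m)]
        for m in range(1, max_m + 1)
    }


if __name__ == "__main__":
    print(index_of_coincidence("AABB"))   # (2*1 + 2*1) / (4*3)
    print(split_columns("ABCDEFG", 3))
```

Applied to a real ciphertext, one would look for the smallest m whose substrings all have values near 0.065.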
Finding the Key Length
• Example: ciphertext obtained from a Vigenère cipher:
    CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW
    RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK
    LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX
    VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR
    ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT
    AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI
    PEEWEVKAKOEWADREMXMYBHHCHRTKDNVRZCHRCLQOHP
    WQAIIWXNRMGWOIIFKEE
• Kasiski test: CHR occurs in five places, starting at positions 1, 166, 236, 276 and 286
• The distances from the first occurrence are 165, 235, 275 and 285
• Their gcd is 5, which is our guess for the key length
• Check using the index of coincidence of the substrings Yi:
    m = 1: 0.045
    m = 2: 0.046, 0.041
    m = 3: 0.043, 0.050, 0.047
    m = 4: 0.042, 0.039, 0.046, 0.040
    m = 5: 0.063, 0.068, 0.069, 0.061, 0.072
• This is strong evidence that the key length is 5.
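The Kasiski test on this ciphertext can be sketched in Python as follows. The sketch looks only at repeated trigrams and reports 1-based starting positions, as on the slide; the function name is hypothetical.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

CIPHERTEXT = (
    "CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW"
    "RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK"
    "LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX"
    "VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR"
    "ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT"
    "AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI"
    "PEEWEVKAKOEWADREMXMYBHHCHRTKDNVRZCHRCLQOHP"
    "WQAIIWXNRMGWOIIFKEE"
)


def repeated_trigrams(text):
    """Map each trigram that occurs more than once to its 1-based starting positions."""
    positions = defaultdict(list)
    for i in range(len(text) - 2):
        positions[text[i:i + 3]].append(i + 1)
    return {tri: pos for tri, pos in positions.items() if len(pos) > 1}


chr_positions = repeated_trigrams(CIPHERTEXT)["CHR"]
distances = [p - chr_positions[0] for p in chr_positions[1:]]
key_length_guess = reduce(gcd, distances)

if __name__ == "__main__":
    print(chr_positions)
    print(distances)
    print(key_length_guess)
```

In practice some repeated trigrams arise by chance, so one would normally weigh several repeated segments rather than rely on a single one.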
Finding the Key Length
• Result of the letter-matching test:
    count[1] = 14
    count[2] = 16
    count[3] = 13
    count[4] = 11
    count[5] = 22
    count[6] = 14
    count[7] = 12
    count[8] = 15
    count[9] = 15
• The key length is most likely 5, with count 22
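A letter-matching test of this kind can be sketched in Python as follows. The sketch compares position i with position i + s, wrapping around the end of the message; whether the slide's counts were produced with exactly this wrap-around convention is an assumption, and the short test string is hypothetical.

```python
def match_counts(text, max_shift):
    """For each shift s in 1..max_shift, count positions i with text[i] == text[i + s] (wrapping)."""
    n = len(text)
    return {
        s: sum(text[i] == text[(i + s) % n] for i in range(n))
        for s in range(1, max_shift + 1)
    }


def best_shift(text, max_shift):
    """Return the shift with the highest number of matching positions."""
    counts = match_counts(text, max_shift)
    return max(counts, key=counts.get)


if __name__ == "__main__":
    sample = "ABCABCABC"  # toy string with period 3, for illustration only
    print(match_counts(sample, 4))
    print(best_shift(sample, 4))
```

Multiples of the key length also tend to score highly, so the smallest shift with a clearly elevated count is usually the better guess.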
Finding the Key
• Now that we have the key length, it is time to find the key itself.
• Choose i with 1 ≤ i ≤ m and let f0, f1, …, f25 denote the frequencies in Yi of the letters A, B, …, Z, respectively
• Let n′ denote the length of Yi, so that n′ ≈ n/m.
• Then the probability distribution of the 26 letters in Yi is given by $f_0/n', f_1/n', \ldots, f_{25}/n'$
• Now Yi is obtained by shift encryption of a subset of the plaintext elements using a shift ki
• We would hope that the shifted probability distribution $f_{k_i}/n', f_{1+k_i}/n', \ldots, f_{25+k_i}/n'$ would be close to the ideal distribution p0, p1, …, p25.
• For each g with 0 ≤ g ≤ 25, define
$$M_g = \sum_{j=0}^{25} \frac{p_j \, f_{j+g}}{n'}$$
  with all subscripts taken mod 26
• If g = ki, one would expect Mg ≈ 0.065; for other values of g, Mg will usually be noticeably smaller
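The computation of the values Mg and the choice of the best shift for each substring can be sketched in Python. The letter probabilities below are commonly quoted approximate values for English and are an assumption of this sketch, since the slide's own table was not recovered; the function names are hypothetical.

```python
from collections import Counter

CIPHERTEXT = (
    "CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW"
    "RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK"
    "LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX"
    "VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR"
    "ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT"
    "AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI"
    "PEEWEVKAKOEWADREMXMYBHHCHRTKDNVRZCHRCLQOHP"
    "WQAIIWXNRMGWOIIFKEE"
)

# Assumed approximate probabilities of A..Z in English text.
ENGLISH = [
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
    0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
    0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001,
]


def shift_scores(column):
    """Return [M_0, ..., M_25] for one substring Y_i of the ciphertext."""
    counts = Counter(column)
    f = [counts.get(chr(ord("A") + j), 0) for j in range(26)]
    n = len(column)
    return [
        sum(ENGLISH[j] * f[(j + g) % 26] for j in range(26)) / n
        for g in range(26)
    ]


def recover_key(ciphertext, m):
    """For each of the m substrings, pick the shift g that maximizes M_g."""
    key = []
    for i in range(m):
        scores = shift_scores(ciphertext[i::m])
        key.append(max(range(26), key=scores.__getitem__))
    return key


if __name__ == "__main__":
    print(recover_key(CIPHERTEXT, 5))
```

Taking the maximum is a simplification; with short substrings it is safer to inspect the few largest values of Mg for each position.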
Finding the Key
• [Table of the values Mg for g = 0, 1, …, 25 and each substring Yi not recovered; the key values that survive from it are 9, 0, 13, 4, 19]
Decrypting
• We have now found the key to the Vigenère encryption: (9, 0, 13, 4, 19)
• Using this key, the decrypted text (with spacing and punctuation inserted) is:
    The almond tree was in tentative blossom. The days were longer, often ending with magnificent evenings of corrugated pink skies. The hunting season was over, with hounds and guns put away for six months. The vineyards were busy again as the well-organized farmers treated their vines and the more lackadaisical neighbors hurried to do the pruning they should have done in November.
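Decryption with the recovered key is a position-dependent shift, which the following Python sketch illustrates on the example ciphertext. The function name is hypothetical; the output is the unspaced lowercase plaintext, so spacing and punctuation still have to be inserted by hand.

```python
CIPHERTEXT = (
    "CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW"
    "RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK"
    "LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX"
    "VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR"
    "ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT"
    "AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI"
    "PEEWEVKAKOEWADREMXMYBHHCHRTKDNVRZCHRCLQOHP"
    "WQAIIWXNRMGWOIIFKEE"
)

KEY = [9, 0, 13, 4, 19]


def vigenere_decrypt(ciphertext, key):
    """Subtract key[i mod m] from the i-th ciphertext letter (mod 26); return lowercase text."""
    return "".join(
        chr((ord(c) - ord("A") - key[i % len(key)]) % 26 + ord("a"))
        for i, c in enumerate(ciphertext)
    )


plaintext = vigenere_decrypt(CIPHERTEXT, KEY)
key_word = "".join(chr(k + ord("A")) for k in KEY)  # the key written as letters

if __name__ == "__main__":
    print(key_word)
    print(plaintext)
```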