200 likes | 484 Views
Cryptanalysis. Four kinds of attacks (recall) The objective: determine the key ( Herckhoff principle ) Assumption: English plaintext text Basic techniques: frequency analysis based on: Probabilities of occurrences of 26 letters Common digrams and trigrams.
E N D
Cryptanalysis • Four kinds of attacks (recall) • The objective: determine the key (Herckhoff principle) • Assumption: English plaintext text • Basic techniques: frequency analysis based on: • Probabilities of occurrences of 26 letters • Common digrams and trigrams.
Cryptanalysis -- statistical analysis • Probabilities of occurrences of 26 letters • E, having probability about 0.120 (12%) • T,A,O,I,N,S,H,R, each between 0.06 and 0.09 • D,L, each around 0.04 • C,U,M,W,F,G,Y,P,B, each between 0.015 and 0.028 • V,K,J,X,Q,Z, each less than 0.01 • See table 1.1, page 26 • 30 common digrams (in decreasing order): • TH, HE, IN, ER, AN, RE,… • 12 common trigrams (in decreasing order): • THE, ING,AND,HER,ERE,…
Cryptanalysis of Affine Cipher • Suppose a attacker got the following Affine cipher • FMXVEDKAPHFERBNDKRXRSREFNORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH • Cryptanalysis steps: • Compute the frequency of occurrences of letters • R: 8, D:7, E,H,K:5, F,S,V: 4 (see table 1.2, page 27) • Guess the letters, solve the equations, decrypt the cipher, judge correct or not. • First guess: Re, Dt, i.e., eK(4)=17, eK(19)=3 • Thus, 4a+b=17 a=6, b=19, since gcd (6,26)=2, so incorrect. 19a+b=3 • Next guess: Re, Et, the result will be a=13, not correct. • Guess again: Re, Ht, the result will be a=8, not correct again. • Guess again: Re, Kt, the result will be a=3, b=5. • K=(3,5), eK(x)=3x+5 mod 26, and dK(y)=9y-19 mod 26. • Decrypt the cipher: algorithmsarequitegeneraldefinitionsofarithmeticprocesses • If the decrypted text is not meaningful, try another guess. • Need programming: compute frequency and solve equations • Since Affine cipher has 12*26=312 keys, can write a program to try all keys.
Cryptanalysis of substitution cipher • Final goal is to find the corresponding plaintext letter for each ciphertext letter. • Ciphertext: example 1.11, page 28 • Steps: • Frequency computation, see table 1.3, page 29 • Guess Ze, quite sure • C,D,F,J,M,R,Y are t,a,o,i,n,s,h,r, but not exact • Look at digrams, especially –Z or Z-. • Since ZW occurs 4 times, but no WZ, so guess Wd (because ed is a common digram, but not de) • Continue to guess • Look at the trigrams, especially THE, ING, AND,…
Cryptanalysis of Vigenere cipher • In some sense, the cryptanalysis of Vigenere cipher is a systematic method and can be totally programmed. • Step 1: determine the length m of the keyword • Kasiski test and index of coincidence • Step 2: determine K=(k1,k2,…,km) • Determine each ki separately.
Kasiski test—determine keyword length m • Observation: two identical plaintext segments will be encrypted to the same ciphertext whenever they appear positions apart in plaintext, where 0 mod m. Vice Versa. • So search ciphertext for pairs of identical segments, record the distance between their starting positions, such as 1, 2,…, then m should divide all of i’s. i.e., m divides gcd of all i’s.
25 Index of coincidence • Can be used to determine m as well as to confirm m, determined by Kasiskitest • Definition: suppose x=x1x2,…,xn is a string of length n. The index of coincidence of x, denoted by Ic(x), is defined to be the probability that two random elements of x are identical. • Denoted the frequencies of A,B,…,Z in x by f0,f1,…,f25 25 fi 2 ( ) fi(fi-1) i=0 i=0 = ( Formula IC) --Ic(x)= n 2 n(n-1) ()
Index of coincidence (cont.) Suppose x is a string of English text, denote the expected probability of occurrences of A,B,…,Z by p0,p1,…,p25 with values from table 1.1, then Ic(x) pi2 =0.0822+0.0152+…+0.0012=0.065 (since the probability that two random elements both are A is p02, both are B is p12,…) Question: if y is a ciphertext obtained by shift cipher, what is the Ic(y)? Answer: should be 0.065, because the individual probabilities will be permuted, but the pi2 will be unchanged. Therefore, supposey=y1y2…yn is the ciphertext from Vigenere cipher. For any given m, divide yinto m substrings: y1=y1ym+1y2m+1… if m is indeed the keyword length, then y2=y2ym+2y2m+2… each yi is a shift cipher, Ic(yi) is about 0.065. … ym=ymy2my3m… otherwise, Ic(yi) 26(1/26)2 = 0.038.
Index of coincidence (cont.) For purpose of verify keyword length m, divide the ciphertext into m substrings, compute the index of coincidence by formula IC for each substring. If all IC values of the substrings are around 0.065, then m is the correct keyword length. Otherwise m is not the correct keyword length. If want to use Ic to determine correct keyword length m, what to do? Beginning from m=2,3, … until an m, for which all substrings have IC value around 0.065. Now, how to determine keyword K=(k1,k2,…,km)? Assume m is given.
f0f25 n′, , n′ f0+ki n' p0 Determine keyword K=(k1,k2,…,km) • Determine each ki (from yi) independently. • Observation: • 2.1 let f0,f1,…,f25 denote the frequencies of A,B,…,Z in yi and n′=n/m • 2.2 then probability distribution of 26 letters in yi is: 2.3 if the shift key is ki, then f0+ki (i.e., A+ki) is the frequency of a in the corresponding plaintext xi , …, f25+ki (note the subscript 25+ki should be computed by modulo 26) is the frequency of z in xi. Since xi is normal English text, probability distribution of f0+kif25+ki n′, , n′ should be “close to” ideal probability distribution p0,p1,…,p25. f25+ki n' So: p25 +…+ p0, …, p25 p02+…+p252 =0.065
Mg=i=025pi n′ f0+g n' p0 Determine keyword K=(k1,k2,…,km) (cont.) On the other hand, for any g !=ki, f25+g n' will not be close to 0.065. p25 +…+ Therefore, define: fi+g When g=ki, Mg will generally be around 0.065 (i.e., i=025pi2). Otherwise Mg will be quite smaller than 0.065. So let g from 0, until 25, compute Mg, and for some g, if Mg is around 0.065, then ki=g. Note: the subscript i+g should be seen as modulo 26.
Mg=i=025pi n′ Cryptanalysis of Vigenere cipher--example • Example 1.12, page 33. • Using Kasiski test to determine the keyword length • CHR appears five times at 1,166,236,276,286 • the distance is 165, 235,275,285, the gcd is 5, so m=5. • Using index of coincidence to verify m=5. • Divide ciphertext into y1, y2, y3, y4, y5 • Compute f0,f1,…,f25 for each yi and then Ic(yi), get 0.063, 0.068,0.069,0.061,0.072, so m=5 is correct. • Determine ki for i=1,…,5. • Compute Mg for g=0,1,…,25 and if Mg 0.065, then let ki=g. where fi+g As a result, k1=9,k2=0,k3=13,k4=4,k5=19, i.e., JANET
Cryptanalysis of Hill cipher • Difficult to break based on ciphertext only • Easily to break based on both ciphertext and plaintext. • Suppose given at least m distinct plaintext-ciphertext pairs: xj=(x1,j,x2,j,…,xm,j) yj=(y1,j,y2,j,…,ym,j) then define two matrices X=(xi,j) and Y=(yi,j) Let Y=XK, if X is invertible, then K=X-1Y.
Cryptanalysis of Hill cipher--example • Suppose plaintext is: friday and ciphertext is: PQCFKU and the m=2. Then eK(f,r)=(P,Q), eK(i,d)=(C,F). That is: 15 16 2 5 5 17 8 3 ( ) = ( )K 5 17 8 3 15 16 2 5 9 1 2 15 15 16 2 5 7 19 8 3 K=( )-1( )=( )( )=( ) Then using the third pair, i.e., (a,y) and (K,U) to verify K. In case m is unknown, try m=2,3, …
Cryptanalysis of LFSR stream cipher • Vulnerable to known-plaintext attack. • Suppose m, plaintext binary string x1,x2,…,xn and ciphertext binary string y1,y2,…,yn are known, as long as n>2m, the key can be broken: • Keystream is: zi=(xi+yi) mod 2. (i=1,2,…,n) • Thenthe initialization vector of K is z1,…, zm. • Next is to determine coefficients (c0,c1,…,cm-1) of K (recall that zi+m = m-1j=0cjzi+jmod 2 for all i1) i.e, z1 z2 … zm z2z3 … zm+1 …………… zmzm+1 … z2m-1 • (zm+1,zm+2,…,z2m)=(c0,c1,…,cm-1)
z1 z2 … zm z2z3 … zm+1 …………… zmzm+1 … z2m-1 • (c0,c1,…,cm-1)=(zm+1,zm+2,…,z2m) Cryptanalysis of LFSR stream cipher (cont.) Therefore: -1
Cryptanalysis of LFSR stream cipher --example Example 1.14, page 37. Suppose LFSR 5 with the following: Ciphertext string: 101101011110010 Plaintext string: 011001111111000 Then keystream: 110100100001010 Therefore initialization vector is: 11010. For next five key elements: 01000, set up equation for coefficients (c0,c1,c2,c3,c4) and solve it. The result is: (c0,c1,c2,c3,c4) =(1,0,0,1,0) i.e., zi+5=(zi+zi+3) mod 2.