480 likes | 611 Views
Classic Cryptology. Modified version by Dr M. Sakalli Marmara University.
E N D
Classic Cryptology Modified version by Dr M. Sakalli Marmara University
William F. Friedman defines a Cipher message as the one by applying a method of cryptography to the individual letters of the plain text either singly or in groups -- to distribute each letter characteristics to the entirety of the cipher text--. • “Human ingenuity cannot concoct a cipher that human ingenuity cannot resolve”, Edgar Allan Poe, amateur cryptographer.. • Today a similar word finding analogies are employed to analyze genetic motifs called “consensus strings”. Finding words was also posed by Edgar Allan Poe (1809–1849) in his Gold Bugstory
The Gold Bug Problem Cipher message: 53++!305))6*;4826)4+.)4+);806*;48!8`60))85;]8*:+*8!83(88)5*!; 46(;88*96*?;8)*+(;485);5*!2:*+(;4956*2(5*-4)8`8*; 4069285);)6 !8)4++;1(+9;48081;8:8+1;48!85;4)485!528806*81(+9;48;(88;4(+?3 4;48)4+;161;:188;+?; • Decipher the message • Additional hints provided: • The original message is in English, • Each symbol corresponds to one letter. • Punctuation marks are excluded, • Having an idea of the subject would be plus plus..
Naive approach: Frequency of the symbols • Find the frequency count of each symbol • Compare their frequencies with the relative frequencies of the ordinary English, and matching frequency patterns of the letters. • The letter counts of the message fostering bug • From most frequent to the least: e t a o i n s r h l d c u m f p g w y b v k x j q z
The decoding result in vain.. • After substituting the letters.. sfiilfcsoorntaeuroaikoaiotecrntaeleyrcooestvenpinelefheeosnlt arhteenmrnwteonihtaesotsnlupnihtamsrnuhsnbaoeyentacrmuesotorl eoaiitdhimtaecedtepeidtaelestaoaeslsueecrnedhimtaetheetahiwfa taeoaitdrdtpdeetiwt • A better approach: • Examine frequencies of l-tuples, combinations of 2 symbols, 3 symbols, etc. • “The” is the first bug which is the most common 3-tuple in English and in cipher text this is “;48” • Make inferences of unknown symbols by examining other frequent l-tuples if possible… “To, but..”
Mapping “the” to “;48” and substituting all occurrences of the symbols: 53++!305))6*the26)h+.)h+)te06*the!e`60))e5t]e*:+*e!e3(ee)5*!t h6(tee*96*?te)*+(the5)t5*!2:*+(th956*2(5*h)e`e*th0692e5)t)6!e )h++t1(+9the0e1te:e+1the!e5th)he5!52ee06*e1(+9thet(eeth(+?3hthe)h+t161t:1eet+?t • “thet(ee” most likely means “the tree” • Infer “(“ = “r” • “th(+?3h” becomes “thr+?3h” • Can we guess “+” and “?”?
The Solution and the required knowledge • After figuring out all the mappings: AGOODGLASSINTHEBISHOPSHOSTELINTHEDEVILSSEATWENYONEDEGRE ESANDTHIRTEENMINUTESNORTHEASTANDBYNORTHMAINBRANCHSEVENT HLIMBEASTSIDESHOOTFROMTHELEFTEYEOFTHEDEATHSHEADABEELINE FROMTHETREETHROUGHTHESHOTFIFTYFEETOUT • After punctuations inserted: A GOOD GLASS IN THE BISHOP’S HOSTEL IN THE DEVIL’S SEA, TWENY ONE DEGREES AND THIRTEEN MINUTES NORTHEAST AND BY NORTH, MAIN BRANCH SEVENTH LIMB, EAST SIDE, SHOOT FROM THE LEFT EYE OF THE DEATH’S HEAD A BEE LINE FROM THE TREE THROUGH THE SHOT, FIFTY FEET OUT. • Prerequisites to solve the problem: • Need to know the relative frequencies of single letters, and combinations of two and three letters in English • Knowledge of all the words in the English dictionary helps to make accurate inferences..
Knowledge of the grammar and the idioms and the common words will make the deciphering simple. • Revealing motifs of the nucleotide sequences and the regularities that could be explored. The language of genetics, in fact there is a grammar of genetics.. Challenge is not just only a small fraction of sequences encode for motifs; but the size of data to be dealt is enormous. • The patterns revealed with no mutations: cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaatctatgcgtttccaaccatagtactggtgtacatttgatacgtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgcaaacgtacgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaattttagcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtacgtatacactgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaacgtacgtc
Definitions of five components: Finite alphabets, X, K, Y, and, Enc, Dec • X = [x1, x2, ..., xm], is the plaintext in which each xmis a member of a finite alphabet. • Y = [y1, y2, ..., yn]= E(X, K), Encryption. Y is the cipher text, where K = [k1, k2, ..., kj] are the set of keys, and each y is the member of finite cipher set. • K (Key) is the hidden part and provided to the recipientend through a safe channel but adversaries must not be able to figure it out partially or completely..
Functional description • X = D(Y, K).. Decryption • The condition, E(X, K) must be reversible. X= D(E(X, K), K) • (X, K, Enc and Dec) form an Encryption if for all x in X, and k in K, x= D(E(x, K), K). If E is randomized then this equation should hold with probability 1 over the random choices made by Encryption. • Success depends to the strength of the key. There should be no any clue accommodated in sequence Y so that any attempt will thwart the adversaries.
The ciphered message, and the methods of encryption and decryption are all open to the public; but what is not is/are the key/keys.. 1883, Kerckhoffs. • Triple DES, 168 key length, 2168, =~ 3.7*1050 keys.. And if alphabet is unknown, and is zipped.
Security! • Perfect security: • H(x)=H(X|Y) for adversary, H(X|YK)=0 for the recipient side.. • Unconditional security • no matter how powerful computer is, the cipher cannot be broken since the ciphertext provides insufficient information to uniquely determine the corresponding plaintext.. • Computational security (Applicable) • given limited computing resources (eg time needed for calculations is greater than age of universe), therefore cipher cannot be broken..
Perfect security • Suppose DM is some a priori distribution available on the space of possible messages M (for example military commands), and an adversary takes a guess g on what messages could be.. A priori probability of a successful guess is PrmDM{m=g}. Suppose adversary eavesdropping some cipher c sent through, and establishes a posteriori probability distribution on what the message could be, then the probability of having a correct guess conditioned on the c is PrmDm kK{m=g|E(m,K)=c}.. • Shannon’s definition of PS. An Encryption system satisfies perfect security with respect to distribution Dm on M, for every possible gM, and cC,if priori and posteriori probabilities are the same, if neither yield any clue about the other. (k is uniformly chosen from K) PrmDm kK{m=g|E(m,K)=c} = PrmDM{m=g}..
Shannon Secrecy • No matter what message is encrypted, the probability of getting a specific cipher is the same, which introduces the most ambiguity. • A scheme satisfies Shannon secrecy if for two m1 and m2M and for every ciphertext, c, PrkK{E(m1, k) = c} = PrkK{E(m2, k) = c}, k. is any but not the same key from the set of K, and employed just once. • Theorem: A cryptosystem is assumed to satisfy the Shannon Secrecy iff satisfies the perfect security. Proof:..
Classic Encryption Techniques • The encryption algorithms perform two processes on the plaintext: • Substitutions • Transformations • Substitution techniques map plaintext elements (characters, bits) into ciphertext elements. • Transposition techniques systematically transpose the positions of plaintext elements.
Classical Substitution Ciphers • earliest known substitution cipher by Julius Caesar replaces each letter by 3rd letter on • first attested use in military affairs • example: meet me after the toga party PHHW PH DIWHU WKH WRJD SDUWB • Cryptanalysis: Try every shift, (brute force search).
given ciphertext, just try all shifts of letters • For example should be easy to break this ciphertext "GCUA VQ DTGCM"
Affine ciphers • Defined over Zm • To remind (int)(char(x) - 'a'); (int)(char(x)- 'A'), • provides a range of 0-m, which is a kind of shift.. • The notation % represents modular process.. • The key is an ordered pair K = (a, b), where a, b in Zm and gcd(a, m)=1. Then encryption function Y = Ea,b(x) = (a* xi + b) %m • And decryption function X=D(y) = (a-1 *(yi-b)) %m If (a-1%m) does exist which means some numbers cannot be included.. .
Affine ciphers • For m=26, suppose a, b both are taken as 5, then a-1= 21.. • The odd numbers 1 to 25, except 13 are the possible values of a.. • Then the number of possible keys =12*26-1 = 311 • Caesar Cipher is a special case if affine cipher is set with a=1, b=3. • Need tow equations to break, • [c1, c2] =m[p1 p2]+bmod26, solve like linear equations. • But in fact affine ciphers are not linear,
A transformation is linear if T(x+y) = T(x)+T(y) and T(ax) = aT(x), • Affine encryption E(p)=ap+b modm E(p1+p2) = a(p1+p2)+b modm, not linear E(p1) + E(p2) = a(p1+p2)+2b modm • Attempts of consecutive encryption with another affine cipher will not bring additional complexity. K(m1, b1), K(m2, b2) K(m1m2, m2b1+b2), Give a proof!!. • The identity is the cipher with key (1, 0). • The inverse of the key, is (m-1, –m-1b)
r=a%m in C, or r=mod(a,m) in mlab • r=a%m, => a = r+nm, nI, r {0…m-1}, and congruency between a and b.. • a-n*m, is said “a is reduced to r by mod(m)”. • For negative numbers, -x, • (m +sign(x)*(abs(x) %m))%m, there should be a better solution, check this yourself please and find it out.. • If int y is multiplicative inverse of x in mod(m), then: xy == 1 mod m • For given m and x, multiplicative inverse exists iff m and x are relatively prime. gcd(a, m)=1. • Remember reducing with mod(m) at every point in the calculation, result will always be the same. Helps casting out all the mod m.
Finding Inverses, will be revisited • can extend Euclid’s algorithm: EXTENDED EUCLID(m, b) • (A1, A2, A3)=(1, 0, m); (B1, B2, B3)=(0, 1, b) 2. if B3 = 0 return A3 = gcd(m, b); no inverse 3. if B3 = 1 return B3 = gcd(m, b); B2 = b–1 mod m 4. Q = A3 div B3 5. (T1, T2, T3)=(A1 – Q B1, A2 – Q B2, A3 – Q B3) 6. (A1, A2, A3)=(B1, B2, B3) 7. (B1, B2, B3)=(T1, T2, T3) 8. goto 2
Monoalphabetic Cipher • One alphabet mapping from PT to CT. • Rather than just shifting the alphabet, • any permutation of the 26 alphabetic characters could be set as a key sequence, (shuffle the letters arbitrarily, each PT letter mapping to a random CT letter). • Then the number of possible keys is 26! or greater than 4 x 1026, in the 10 orders of magnitude greater than the key space for DES.. and would seem to eliminate brute-force techniques for cryptanalysis. • However, detecting the nature of the text: (if noncompressed English text), then the ir/regularities of the language is exploited.
Redundancy in spoken language & Cryptanalysis • eg Vowels removed, “h m gd s m shphrd shll nt wnt“, ie written Hebrew has no vowels. • In English Emost common • then T,R,N,I,O,A,S • other letters are fairly rare; Z,J,K,Q,X • have tables of single, double & triple letter frequencies, digrams, trigrams
Letter-frequency in English can be broken into 5 groups: • e (most common); • t, a, o, i, n, s, h, r; • d, l; • c, u, m, w, f, g, y, p, b; • v, k, j, x, q, z (least common) • Common digrams and trigrams (in decreasing order) • th, he, in, er, an, re, ed, on, es, st, en, at, • to, nt, ha, nd, ou, ea, ng, as, or, ti, is, et, • it, ar, te, se, hi, of • the, ing, and, her, ere, ent, tha, nth, was, • eth, for, dth
Use in Cryptanalysis • discovered by Arabian scientist (Abu al-Kindi), 9th century, "A Manuscript on Deciphering Cryptographic Messages” • calculate letter frequencies for ciphertext • compare counts/plots against known values • look for common peaks/troughs • peaks at: A-E-I triple, NO pair, RST triple • troughs at: JK, X-Z • key concept - monoalphabetic substitution ciphers do not change relative letter frequencies • for monoalphabetic must identify each letter • tables of common double/triple letters help (Lanaki)
Example Cryptanalysis, with the frequency count • UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ • count relative letter frequencies, • guess P & Z are e and t • (P 13.33, Z 11.67, S 8.33, U 8.33, O 7.50 M 6.67, ….. C,K,L,N,R 0.00) • guess ZW is th and hence ZWP is the • proceeding with trial and error finally get: it was disclosed yesterday that several informal but direct contacts have been made with political representatives of the vietcong in moscow
If the message is long enough, frequency analysis of a cipher encrypted with monoalphabetic source will be successful. Since the simple substitution with Monoalphabetic ciphers will not change the original frequency characteristics of the message. • Using multiple substitutes, known as homophones, used in rotation, or randomly. If the number of symbols assigned to each letter is proportional to the relative frequency of that letter, then the single-letter frequency information will be completely obliterated. • Gauss devised an unbreakable cipher using homophones. But, multiple-letter patterns (e.g., digram frequencies) still there in the ciphertext, making it vulnerable. • Two principal methods: to reduce the visibility of the structure in the ciphertext: • To encrypt multiple letters of plaintext within same alphabet as mentioned, • and the other is to use multiple cipher alphabets. We briefly examine each.
Playfair Cipher • Playfair Cipher, by Charles Wheatstone in 1854, but named on his friend Baron Playfair. • Encrypting one letter with multiple symbols, • a 5X5 matrix of letters based on a keyword (Matrix size can be different, filled with xxx) • Start with the keyword (without duplicating letters) • And fill the rest of matrix with the remaining letters For example. Keyword MONARCHY, I/J.. MONAR CHYBD EFGIK LPQST UVWXZ • Sayer's book "Have His Carcase", Lord Peter Wimsey solves and describes a probably word attack.. • The grouping into five characters is just a telegraphic convention and has nothing to do with actual word lengths.
Playfair • M O N A R • C H Y B D • E F G I K • L P Q S T • U V W X Z • Rules: encrypts two letters at a time in rectangular fashion: • if a pair is a repeated letter, insert a filler like 'X', eg. "balloon" encrypts as "ba lx lo on" • Replace the PT character with the other corner at the same row. eg. “hs" encrypts to “bp",and “EA" to "IM" or "JM" (as desired) • if both letters in the same row/column, encipher right/below and decipher left/above., eg. “ar” encrypts as "RM“, “mu" encrypts to "CM" • SECURITY OF CRYPTOGRAPHIC SYSTEMS, NAVYFM 34-40-2, Chapter6, Chapter7
Identification of individual digrams 676 is more difficult, and the relative frequencies of individual letters exhibitibing a much greater range than that of digrams, making frequency analysis much more difficult. • It was considered unbreakable and used by the British Army in WWI and by the U.S. Army during WWII. By Germans!! • But Playfair leaves much of the structure of the plaintext language intact. A few hundred letters of ciphertext may be generally sufficient to break. (W. Stallings)
Hill cipher • A multiletter cipher, developed by Lester Hill in 1929. Determined by n linear equations. • (a = 0, b = 1 ... z = 25). • For n = 3, the system can be described as follows: C1 k11 k12 k13 P1 • C2 = k21 k22 k23 P2 (mod 26) = KPmod(26) C3 k31 k32 k33 P3 • Decryption requires KK-1 mod(m)=I. • K=[17 17 5; 21 18 21; 2 2 14]; K-1=[4 9 15; 15 17 6; 24 0 17]; • KK-1 mod(26)= [443 442 442; 858 495 780; 494 52 395] mod(28)=[I] • As with Playfair, the Hill cipher completely hides single-letter frequencies. • The use of a larger matrix hides, thus a 3 x 3 Hill cipher hides not only single-letter but also two-letter frequency information. • Strong against a ciphertext-only attack, easily broken with a known plaintext attack. • Any block size possible, but difficult to find good keys of large blocks.
Linearity!!.. Therefore completely vulnerable to known plaintext attack. • Diffusion due to the matrix multiplication when combined with non-linear operation.. • The upper bound of the key (invertible matrices) numbers n2lg(26)=4.7n2, keys.. • 262 (1-1/2)(1-1/22)..(1-1/2n)(1-1/13)(1-1/132)..(1-1/13n).. For n=5, this is 114, wikipedia. • Chinese remainder theorem.
Polyalphabetic Ciphers, Vigenère Cipher • another approach to improving security is to use multi alphabetic substitution ciphers • makes cryptanalysis harder with more alphabets to guess and flatter frequency distribution • using a key to select which alphabet to be chosen and periodically revolves it along the message • deceptive deceptive deceptive • wearedisc overedsav eyourself • zicvtwqng rzgvtwavz hcqyglmgz • have multiple ciphertext letters for each plaintext letter hence letter frequencies are obscured but not totally lost • start with letter frequencies • see if look monoalphabetic or not • if not, then need to determine number of alphabets, since then can attack each.
Φ test-roughness of a frequency count. • SECURITY OF CRYPTOGRAPHIC SYSTEMS, NAVYFM 34-40-2, Chapter2, A measure to indicate roughness of the distribution, • Based on the coincidence probabilities of occurrences.. • Comparisons, normalized to the total number of letters 26, Φr = 0.0385 N (N – 1). N is the length of the text. • In reality the distribution is not flat, so some of them are not as frequent as 0.0385, which means building the others build up hills.. Then the expected value for plaintext coincidences Φp = 0.0667 N (N – 1).. • Total number of coincidences from indvl letters, Φ observed.. Φo = ΦA + ΦB + …+ ΦZ = Σf(f-1) And the index of coincidences for phi test ΔIC= Φo/Φr • If the results close to the expected value then the same roughness of the plaintext frequency is expected to appear which might be considered as an evidence of a simple substitution system employed for enciphering. • In chapter 2 says, in plain text with 50 to 200 of letters, the ΔIC will usually falls between 1.50 and 2.00. Obviously will vary for shorter text, and longer text will be consistently closer to 1.73. For random text ΔIC (polyalphabetic systems) should be close to 1.00.
Digraphic Φ test. • SECURITY OF CRYPTOGRAPHIC SYSTEMS, NAVYFM 34-40-2, Chapter6, How to break into a digraphic count, starting from the first or the second.. The usual expectation is AB, CD… but first one may be skipped as null letter.. A, BC, DE, … or another way. combine two methods.. AB, BC, CD… • The probability of coincidence for 262 comparisons, 0.0015 (uniform), when counted in plaintext the expected value is 0.0069, thus.. Φ2r = 0.0015 N (N – 1), Φ2p= 0.0069 N (N – 1), Φ 2o = Σf(f-1) And the index of coincidences for digraphic phi test ΔIC2p= Φo/Φr
Kasiski Attack • method developed by Babbage / Kasiski • repetitions in ciphertext give clues to period • so find same plaintext an exact period apart • which results in the same ciphertext • of course, could also be random fluke • eg repeated “VTW” in previous example • suggests size of 3 or 9 • then attack each monoalphabetic cipher individually using same techniques as before
Vernam Cipher (1918), OTP • Gilbert Vernam 1918, • Ci = Pi (XOR) Ki Remember xor is lossless.. • Pi = Ci (XOR) Ki • The higher the randomness and the longer key, the less predictable any salient future. • if a truly random key as long as the message is used, the cipher will be secure, and must be used just once.. Therefore called a One-Time pad • have problem of safe distribution of key • A practical problem: Producing large quantities of random keys. • A daunting task is the key distribution.
One-time pad, Vernam cipher, Gilbert Vernam c. 1917 xi =D(yi) = (E(xi) - ki)mod(m). 4 (E) 16 (Q) 13 (N) 21 (V) 25 (Z) ciphertext - 23 (X) 12 (M) 2 (C) 10 (K) 11 (L) key = -19 4 11 11 14 ciphertext - key = 7 (H) 4 (E) 11 (L) 11 (L) 14 (O) (ciphertext - key) (mod 26) >> plaintext http://en.wikipedia.org/wiki/One-time_pad
One-time pad, Vernam cipher, Gilbert Vernam c. 1917 yi =E(xi) = (xi + ki)mod(m) where y is a random sequence. 7 (H) 4 (E) 11 (L) 11 (L) 14 (O) message + 23 (X) 12 (M) 2 (C) 10 (K) 11 (L) key = 30 16 13 21 25 message + key = 4 (E) 16 (Q) 13 (N) 21 (V) 25 (Z) (message + key) (mod (26) >> ciphertext http://en.wikipedia.org/wiki/One-time_pad
Transposition Ciphers • now consider classical transposition or permutation ciphers • these hide the message by rearranging the letter order • without altering the actual letters used • can be recognized since has the same frequency distribution as the original text.. • RAILFENCE cipher: Write the message diagonally (or column wise) over a number of rows, then read off cipher row by row • eg. write message out as: m e m a t r h t g p r y e t e f e t e o a a t • giving ciphertext MEMATRHTGPRYETEFETEOAAT • Subsequent transpositions will improve the diffusion.
Key: 4 3 1 2 5 6 7 • Plaintext: a t t a c k p o s t p o n e d u n t i l t w o a m x y z • Ciphertext: t t n a a p t m t s u o a o d w c o i x k n l y p e t z.. Using the same key and repeating tranpostion.. t t n a a p t m t s u o a o d w c o i x k n l y p e t z • Output: NSCYAUOPTTWLTMDNAOIEPAXTTOKZ 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 03 10 17 24 04 11 18 25 02 09 16 23 01 08 15 22 05 12 19 26 06 13 20 27 07 14 21 28 17 09 05 27 24 16 12 07 10 02 22 20 03 25 15 13 04 23 19 14 11 01 26 21 18 08 06 28
Rotor Machines • before modern ciphers, rotor machines were most common product cipher • were widely used in WW2 • German Enigma, Allied Hagelin, Japanese Purple • implemented a very complex, varying substitution cipher • used a series of cylinders, each giving one substitution, which rotated and changed after each letter was encrypted • with 3 cylinders have 263=17576 alphabets
Product Ciphers • ciphers using only substitutions or only transpositions are not secure because of language characteristics • hence consider using several ciphers in succession to make harder, but: • two substitutions make a more complex substitution if nonlinearity is introduced. • two transpositions make more complex transposition • but a substitution followed by a transposition makes a new much harder cipher • this is the bridge from classical to modern ciphers
Vulnerabilities and type of attacks • The two types of attack on an encryption algorithm are cryptanalysis, based on properties of the encryption algorithm, and brute-force, which involves trying all possible keys. Types of Attacks • Cipherext only - goal, obtain plaintext, or key • Known plaintext (partially known plaintext, crib) - goal, obtain key • Chosen plaintext - goal, obtain key • Encryption key (with asymmetric cipher) - goal, obtain decryption key.