Polyalphabetic CIPHERS • Linguistics 484
Summary • The idea • How to recognize: index of coincidence • How many alphabets: Kasiski
The idea • Remove the invariant that a plaintext letter always maps to the same cryptotext letter. • Smooth out the frequency distribution, removing clues.
Monoalphabetic • [Diagram: plaintext passes through a single cryptosystem to give ciphertext]
Polyalphabetic • [Diagram: plaintext passes through several cryptosystems A, B, C, switching between them, to give ciphertext]
Polyalphabetic system • Cryptosystem with several components. • Systematic way of moving from one cryptosystem to the next.
Vigenère (simplified) • Component ciphers are shift ciphers, using so-called Direct Standard Alphabets. • You use each alphabet for one character, then move on to the next.
Vigenère (simplified) • You and your friend agree on a single-letter key, say ‘S’. • Encrypt the first letter with the ‘S’ alphabet, the second with the ‘T’ alphabet, and so on.
Vigenère (simplified) • key = "S" • BOOK → THIF
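A minimal Python sketch of this simplified scheme (illustrative, not from the slides), assuming A = 0, ..., Z = 25 and that the ‘S’ alphabet sends plaintext A to ciphertext S:

```python
# Simplified Vigenere: a single key letter names the first shift alphabet,
# and each successive plaintext letter moves on to the next alphabet (S, T, U, ...).
def simple_vigenere(plaintext, key_letter):
    shift = ord(key_letter.upper()) - ord('A')
    letters = [ch for ch in plaintext.upper() if ch.isalpha()]
    out = []
    for i, ch in enumerate(letters):
        # shift by the key alphabet, then advance one alphabet per letter
        out.append(chr((ord(ch) - ord('A') + shift + i) % 26 + ord('A')))
    return ''.join(out)

print(simple_vigenere("BOOK", "S"))  # THIF
```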
Polyalphabetic system • Cryptosystem with several components. • Systematic way of moving from one cryptosystem to the next. • But there are two weaknesses in the simplified Vigenère. • Direct standard alphabets: breaking one character gives you the whole alphabet. • The pattern of movement is too obvious.
Full Vigenère • Use a keyword to control the jump between alphabets. • The pattern of movement is no longer as obvious.
Vigenère • key = "SYMBOL" • THE ATOMIC ENERGY → L...
Exercise • Encipher THE ATOMIC ENERGY with the keyword SYMBOL • Decipher AVYUL HWLEE UCZLL LTYVI YOFJI ZSLNI knowing that the keyword is HOUSE
Vigenère • key = "HOUSE" • AVYUL → TH...
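A short Python sketch of keyword Vigenère, assuming the convention used in these slides (ciphertext = plaintext + key, mod 26, with A = 0, ..., Z = 25); the function name and the decrypt flag are illustrative. Running it reproduces both exercises above.

```python
from itertools import cycle

def vigenere(text, keyword, decrypt=False):
    sign = -1 if decrypt else 1
    keystream = cycle(ord(k) - ord('A') for k in keyword.upper())
    out = []
    for ch in text.upper():
        if not ch.isalpha():
            continue  # drop spaces; the key only advances on letters
        k = next(keystream)
        out.append(chr((ord(ch) - ord('A') + sign * k) % 26 + ord('A')))
    return ''.join(out)

print(vigenere("THE ATOMIC ENERGY", "SYMBOL"))
# LFQBHZEGOFBPJEK
print(vigenere("AVYUL HWLEE UCZLL LTYVI YOFJI ZSLNI", "HOUSE", decrypt=True))
# THECHAIRMANOFTHEFEDERALRESERVE
```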
Breaking Vigenère. • How many alphabets? • Index of coincidence • Babbage-Kasiski examination • Once you know how many alphabets there are, use frequency analysis as for regular shift ciphers.
Index of coincidence • Based on arguments about probability. • Intuition: measure the roughness of the frequency distribution. • Mathematical details follow.
Roughness of distributions • The smoothest distribution has each letter occurring 1/26th of the time. • The roughest has one letter occurring 100% of the time. • Normal English has some unevenness: less smooth than totally uniform.
Index of coincidence • Get a frequency count f[letter] for each letter. • Multiply f[letter]*(f[letter]-1) to get the number of coincidences involving that letter. Add the results for all letters together. • Divide by the number of coincidences you would get if all the letters were the same.
Index of coincidence • IC = sum(f[letter] * (f[letter] - 1)) / (N * (N - 1))
Index of coincidence • IC has a value of about 0.038 if the letters are evenly distributed, which is what you get if the polyalphabetic cipher uses very many alphabets. • It has a value of about 0.066 for English text, for monoalphabetic encryptions of English text, and for many other things.
Idea to quantify roughness • Count the number of times a pair of letters drawn at random from the text happen to be the same. • For the roughest possible text, we always get the same letter, so a text of length N has N(N-1) coincidences. • For the smoothest possible, we get far fewer: about N(N-1)/26.
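A Python sketch of the formula on the previous slides: tally each letter, sum f*(f-1) over the letters, and divide by N(N-1).

```python
from collections import Counter

def index_of_coincidence(text):
    letters = [ch for ch in text.upper() if ch.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    coincidences = sum(f * (f - 1) for f in counts.values())
    return coincidences / (n * (n - 1))

# Expect roughly 0.066 for English plaintext (or a monoalphabetic encryption
# of it) and roughly 1/26 = 0.038 when the letters are evenly distributed.
```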
Babbage-Kasiski method • Explained well in The Code Book, pp. 67-72. • Key idea: what does it mean if we find a repeated sequence of characters in a message that has been enciphered using a repeating keyword?
Babbage-Kasiski method • Key idea: what does it mean if we find a sequence of four or more repeated characters in a message that has been enciphered using a repeating keyword? • Most likely: a sequence of four or more English characters in the plaintext has been enciphered twice, starting from the same place in the repeating keyword. • Less likely: it is an accident; some other arrangement of English letters gives rise to a repeat by chance.
Repeats • OK, if a repeat is due to the same thing being enciphered twice in the same way, then the keyword must repeat a whole number of times between the two occurrences. • So, keep track of the spacing between repeats.
Repeats • So, keep track of the spacing between repeats. • Nearly every repeat will have a spacing that is evenly divisible by the length of the keyword. • So, break the spacings into factors and look for a factor that (almost) always turns up, as in the sketch below.
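A Python sketch of this bookkeeping: record the spacing between repeated four-letter sequences, then count the small factors of those spacings. The function name, seq_len and max_factor are illustrative choices, not part of the slides.

```python
from collections import Counter, defaultdict

def kasiski_factors(ciphertext, seq_len=4, max_factor=20):
    text = ''.join(ch for ch in ciphertext.upper() if ch.isalpha())
    positions = defaultdict(list)
    for i in range(len(text) - seq_len + 1):
        positions[text[i:i + seq_len]].append(i)

    # A factor that shows up for (almost) every spacing is a good guess
    # for the keyword length.
    factor_counts = Counter()
    for pos in positions.values():
        for j in range(1, len(pos)):
            spacing = pos[j] - pos[0]
            for f in range(2, max_factor + 1):
                if spacing % f == 0:
                    factor_counts[f] += 1
    return factor_counts.most_common()
```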
Breaking Vigenère. • Once you have how many alphabets, use frequency analysis as for regular shift ciphers. • If there are five different alphabets, tally up characters 1,6,11,... into one table, 2,7,12,... into a second, 3,8,13,... into the third, and so on up to the fifth. • The results will show the characteristic frequency pattern of a shifted alphabet (high A, E close to each other, low J,K next to each other, X,Y,Z all low, etc.)
Breaking Vigenère. • Once you have how many alphabets, use frequency analysis as for regular shift ciphers. • The results will show the characteristic frequency pattern of a shifted alphabet (high A, E close to each other, low J,K next to each other, X,Y,Z all low, etc.) • See if the keyword is sensible. Might be an English word. Then plug in letters and check whether the message works out.
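A Python sketch of this final step: split the ciphertext into one stream per alphabet and treat each stream as a shift cipher. As a simplification (an assumption, not from the slides), it guesses that the most frequent letter in each stream stands for plaintext E; checking the full shifted frequency pattern, as described above, is more reliable.

```python
from collections import Counter

def guess_keyword(ciphertext, num_alphabets):
    text = ''.join(ch for ch in ciphertext.upper() if ch.isalpha())
    keyword = []
    for i in range(num_alphabets):
        stream = text[i::num_alphabets]             # characters i+1, i+1+n, ...
        top = Counter(stream).most_common(1)[0][0]  # most frequent letter
        shift = (ord(top) - ord('E')) % 26          # shift that sends E there
        keyword.append(chr(shift + ord('A')))
    return ''.join(keyword)
```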