770 likes | 778 Views
Dive into the world of secret communication with steganography and cryptography. Learn how to hide messages within larger documents using microdotting and pixel modifications. Understand encryption systems, keys, and decryption methods to protect your messages. Discover the art of cryptography, frequency analysis, and Caesar cipher, along with simple encryption methods like transposition and substitution. Explore the importance of encryption keys and various improvement techniques to strengthen your message security.
E N D
Outline • Secret communication • Steganography • Cryptography • Goal • Sender has message intended only for recipient. How to protect from detection or eavesdropping? • Definitions • Examples • Please read: • Chapter 1 in The Code Book
Steganography • Secret message is hidden inside larger document • How? • Microdotting • requires special equipment • Having this equipment makes you target of suspicion • Sprinkle the message into much larger area • Ex. Add a letter to every 7th page of a book • Slightly modify pixels of an image
Cryptography • Modify the message so that if it’s discovered, only the recipient can understand it. • A “cipher system” consists of the following: • Plaintext = the message you want to send/conceal • Ciphertext = what actually gets sent • Encryption and decryption functions • Each function requires a key • To encrypt a message, sender uses the encryption function: inputs are the encryption key and plaintext message. • Receiver decrypts the message: uses the decryption function: inputs are the decryption key & ciphertext.
Cryptography (2) • Cryptography is the science of developing good cipher systems • What encryption and decryption functions should we use? • What are good key values? • Should the keys be the same or different? • Cryptanalysis = trying to figure out plaintext from the ciphertext, when you don’t have the key • Who would want to do that?
Frequency analysis • Top 10 letters • Sources: "Secret Codes and Cryptograms" by ElonkaDunin http://www.lionelwindsor.net/bibleresources/koine/grabble/frequency.htm
Text data s = "Bring me my bow of burning gold" password = "123" value = 123 • How does the computer know if a variable contains a string? • All data is internally stored in binary. • We know how numbers look in binary √ • Binary representation of characters? Two step process: First, we have to assign each symbol a number. Then, convert the number to binary.
ASCII code • To manipulate text, we need to treat each letter in a message like a number. • “American Standard Code for Information Interchange” • The letters A – Z are assigned the consecutive values 65 thru 90. Examples: ‘A’ + 10 = ‘K’ ‘K’ – 10 = ‘A’ • ASCII codes are also assigned for lowercase letters and all other symbols you can type
Caesar cipher • The key is a secret number from 1-25. This number gets added to each letter. • Julius Caesar liked the number 3 • How do you decrypt the message? • What happens if we add 3 to X, Y or Z? • Cryptanalysis: How does one try to “break” the code? OAAW FTQ DAMEF UZ HUZQSMD
Encryption • Simple methods of encryption • Transposition • Substitution • Better substitution methods • Important fact: the method you use might not be a secret, but the key must be! • Please read chapter 2 in Code Book
Transposition • Change the order of the letters in our plaintext message • Easy approach: collect the odd # letters (1st, 3rd, 5th, 7th, …), then the even # letters (2nd, 4th, 6th, 8th, …). • In this case, we say that the period or key is 2. • Can try a larger key, which means a longer period. • Ex. Key = 4 means collect the 1st, 5th, 9th, 13th, … followed by the 2nd, 6th, 10th, 14th, etc. • In other words: Write your message in rows 4 letters long. Add Z’s to end to make last row complete. The ciphertext is obtained by reading columns down!
Substitution • Most cipher systems use substitution: instead of moving letters around, change each letter into a different symbol. We can get very creative! • Caesar cipher • Easy to use, but only 26 possible keys (including a bad one) • The general substitution cipher • Not restricted in merely shifting the alphabet • We have 26! keys instead of 26
continued • Instead of adding a constant to each letter, scramble the assignment more randomly • One problem is how to “remember” the key. The key is not a number, but the sequence of 26 letters. • Ex. Identify a key by the letters of some word or phrase. Cryptography, page 26: “we hope you enjoy this book” gives the key: wehopyunjtisbkacdfglmqrvxz • Cryptanalysis? • Newspapers have “cryptogram” puzzles • Uncover a weakness of substitution method • Even worse if the message is long
Improvements • Nulls: cipher includes symbols/numbers that mean nothing • Misspell words: use unusual letters more often • Homophonic cipher: common letters can be represented by 1 of several possible values • Code words • Pure encoding means we encipher by words, not letters • Nomenclator: just encode certain words • Polyalphabetic cipher: use multiple ciphers and alternate. Useful to hide double letters
Chapter 2 • Pinprick method • Great Cipher of Louis XIV • Vigenère cipher • Cryptanalysis • Book (Beale) cipher • Some implementation details • Caesar cipher • List of lists for homophonic ciphers • Try/except statement • Please read chapter 3 of Code Book
Early examples • Pinprick method • Long history from Ancient Greece… Victorian England • Is this steganography or cryptography? • Great Cipher of Louis XIV • Encode syllables • To thwart the cryptanalyst , also add cipher values for individual letters (even silent ones) Backspace symbol to tell decrypter to delete previous symbol • Homophonic cipher • E.g. Table on page 53
Cipher system recap • Caesar – add same number to each plaintext character • Transposition – place the plaintext into rows, read off the columns • Homophonic – Assign multiple possible values to more common letters; select them randomly • “Great” cipher – encode syllables • Polyalphabetic – Use more than one Caesar, alternate what numbers you add • Vigenère – A polyalphabetic cipher: the key is itself a word or phrase
Vigenère cipher • Example of polyalphabetic cipher • Has the effect of using many Caesar ciphers simultaneously. • The key is a word or phrase • Cycle through the letters of the key, to tell you what should be added to the next plaintext letter • Repeat the key as often as needed. • If key = “DOG”: add 4, 15, 7, 4, 15, 7, 4, 15, 7, … • Considered unbreakable until mid 19th century.
Vigenère variations • When we “add” a letter from the key to a letter of plaintext, what exactly does this mean? • At least 3 possibilities • Letters A-Z represented as 1-26 • Letters A-Z represented as 0-25 • Use ASCII code values • Just an implementation detail. The choice doesn’t matter as long as you are consistent.
Breaking Vigenère • Charles Babbage performed the cryptanalysis, to show this cipher is not perfectly secure • Look for repetitions of short letter sequences • How far apart are they? • Determine length of the key • Since Vigenère is several Caesar ciphers, do each pattern separately (1,6,11,16…) ; (2,7,12,17…); etc. • Compare statistical distribution of letters. Shift it over until it matches A-Z: shift amount is the key.
Book cipher • Reminiscent of homophonic cipher: each letter can be represented by 1 of several possible numbers • The key is a long document, e.g. hundreds of words • Number the words 1,2,3… • Take note of first letter of each word • E.g. (1)take (2)note (3)of (4)first (5)letter (6)of (7)each (8)word. In this case, ‘o’ may be enciphered by 3 or 6.
Python coding • See handout for Caesar cipher example • In Python, we need to be able to: • Open files for reading and writing • Read entire file into a string variable • For each letter in the file, we need to perform arithmetic on it, before writing it to the output file • Obstacle: Need to convert string’s letter to a number so we can add/subtract; convert back to letter. • To do homophonic encoding… • Key is a list of lists! For each letter A-Z, specify what numbers could represent it.
Try / except statement • To get ready for lab, we need to understand how to gracefully handle abnormal run-time situations • General structure has 2 essential parts: try: # put the risky code here except <name of error>: # what we should do if error occurs # remainder of program follows
Examples (handout) • Distribute pieces of candy among children. But what if # children is zero? (ZeroDivisionError) • Ask user for name of input file. What if it doesn’t exist? (FileNotFoundError) • Add 13 to user’s input number. But what if the user’s input is not a number? (ValueError) • Return 10th character from input string. What if it isn’t long enough? (IndexError) • You don’t have to memorize the type of error. How would we find out what it is?
Chapter 3 • Review Python example • File I/O, loop on characters, change letters • Running the Caesar cipher • Messages sent by radio • Review polyalphabetic ciphers (Vigenère) • One time pad improvement • Please read Code Book, pp. 127-160 • covering Enigma
Radio technology • No longer necessary to send message by • Line of sight • Messenger, mail • Physical wire • Enemy can just as easily hear your message • Increased demand for cryptography by WW1 • 1 more piece of information: traffic analysis • Can identify radio operators by how they tap Morse code • Take note of direction and strength of signal • Over time, deduce movement of battalions…
Polyalphabetic • Vigenère cipher was good • How does it work? • Babbage’s cryptanalysis • We know that the key needs to repeat while enciphering. • Look for repeated substrings in ciphertext key length • Once you know key length, you have n Caesar ciphers. Check the letter frequency of each to see how much the alphabet “shifted” • But also possible to break even if key is very long & no substrings repeat • Example pp. 117-119
continued • Observations • Let’s assume that the key is a real word or phrase. • Trial & error in plaintext: find where “the” could be. See which locations yield possible words in key. “can” and “ypt” are plausible, but “bsj” is not. • When you find letters in the key that form part of a word, fill in the remaining letters of the word, to see if more of the plaintext can be deciphered. We know the plaintext has to contain real words. e.g. key fragment Egypt seems to work, but not apocalyptic • Once you have a word of the key, see if it’s part of a logical pattern, such as a list of countries.
Lesson • To use a Vigenère cipher effectively, the key: • Should be long, so that repeated letters in ciphertext are not obvious or do not occur • Should consist of random letters, because real words make it breakable, as we just saw. e.g. Inserting “the” at various places in the plaintext should not give clues about the key • The result is called a one-time pad • To get a really long key, you distribute a “code book” to your agents. • Japanese version: encode entire words. Instead of replacing with a random number, add to a random number
Random key helps • Cryptanalyst is forced to try all possible keys to see which one generates meaningful plaintext • But, many possible messages are possible. • E.g. if message length is 21, then all possible messages of size 21 will be found. Impossible to tell which one is “correct” because key does not have any clues. • 3 problems with one-time pad • How to distribute it to the field, maintain security • How to create the random key • Too difficult to use under extreme conditions. In some cases, on a negative acknowledgement, people resend message in plaintext in frustration or desperation.
Improvements • Lab recap • Improvements to Vigenère cipher • We want a long, random key • Doing it for Japanese • Automating the process: do it by machine Enigma, Typex, SIGABA • Please finish chapter 4 in Code Book
String loop • Let’s understand the loop we used in steganography for i in range (0, len(text)): if i % 100 == 0 and i/100 < len(secretMessage): outFile.write(secretMessage[i/100]) else: outFile.write(text[i]) • Let’s interpret this in English • For every 100 characters of “text” we write 1 letter of the secret message. Ex. What happens when i equals 700?
Japan’s code • One-time pad (e.g. JN-25) • Dictionary table: convert each word to a 5-digit number • Additive table: add the next random number to each word • Preface the message by indicating where in additive table you are starting the encoding • Tables may be periodically changed. • Example: encryption code book.xlsx
Enigma • Arthur Scherbius, 1918 • Used by Germany through WW2 • Commercial and military versions • Served as basis for other cipher machines • How to use • Set machine to today’s starting position • Has usual keyboard • As you type, display lights up the ciphertext letter, which you need to write down • As you type, scramblers rotate to next position, i.e. to next letter of the Vigenère “key”
Scrambler • Disk containing wires that connect each letter on keyboard with different ciphertext letter for display board. • You type “A”, may be connected to “D” • But then the scrambler rotates, so the next “A” may be connected to a different letter (p. 129) • Effect Vigenère cipher with random key of length 26 • Two scramblers • Effectively means we have two keys. • Plaintext + key1 + key2 = ciphertext • Like odometer, second scrambler rotates less often
Strength of cipher • Enigma used 3 scramblers out of a possible 5 • Length of key = 26 * 26 * 26 • Number of scrambler choices = 5 * 4 * 3 = 60 • (Similar American device “SIGABA” used 15 scramblers) • Plugboard • 6 pairs of letters were swapped before entering scrambler Ex. Change “A” to “B” before adding the cipher keys • This increases # of possible keys, making cryptanalysis more confusing • “Code book” tells operator what daily machine settings are
Chapter 4 • Let’s review Vigenère, one-time-pad and Enigma • How do you break a Vigenère cipher? • Suppose you intercept a message from Brazil. It uses a Vigenère cipher. The ciphertext says TQX. • We don’t know the key! Can try all possibilities. • If key = AHZ, what is the plaintext? • If key = ZTI, what is the plaintext? • Breaking the unbreakable… • Exploit weaknesses in the implementation. • Look for mathematical patterns in rotors. • Consider the plugboard settings separately.
Enigma • mechanized Vigenère • Breaking the code • If you can mechanize enciphering, why not mechanize cryptanalysis? • Intelligence on how Germans used Enigma • The work of Marian Rejewski and Alan Turing
Breaking the code • What skills are needed to be a cryptanalyst? • Prepare for the worst: The Germans might make your work in the future more difficult. • Day key and message key • Four messages today might begin like this • L and R are encryptions of the same letter, etc.
Letter chains • Look for patterns! Example: A F W A • All letters belong to some chain • The length of the chain is a fingerprint of the scrambler settings for today. • We separate out the problem of the letter identity • Bombe: automatically try all 263 settings. • Plugboard? Look for words that are almost spelled right.
German mistakes • Repeating the message key • Not allowing a scrambler to be in the same position (left, middle, right) two days in a row • Plugboard cannot link 2 letters next to each other on keyboard • Predictable structure of message, time of message
Alan Turing • Bletchley Park contributions • Extending the work of Rejewski • Relied on fact that weather reports start with word “wetter” and used this to deduce initial scrambler setting But what if you are off by a couple of letters? Fortunately, more help from German constraints. • Turing Machine • He was interested in theoretical boundaries of what a computer could accomplish • Machine has: instructions, internal memory, tape • Turing proved that not all problems can be solved
By the numbers • New flavor of cryptography: numbers • Transpose bits • “add” bits • Data encryption standard • Problem of sharing keys… solved! • Diffie-Hellman protocol • Please finish reading chapter 6 in Code Book
Binary • Computers represent all data, including text, in the form of binary numbers • p. 246: ASCII codes for capital letters • Cryptography can now be done at a minute level: manipulate individual bits • Transposition cipher • E.g. Every 3rd bit swaps with bit to its right • It’s possible the last bit of a letter swaps with first bit of the next letter
Vigenère in binary • If we adapt the Vigenère cipher to the binary representation of text, we get XOR cipher • Example p. 247 • Line up the bits for the addition: 0 + 0 = 0 1 + 1 = 0 0 + 1 = 1 1 + 0 = 1 • In other words, if the bits are the same, result is 0. If bits differ, result is 1. There is no carry. • Data Encryption Standard • Performs many successive steps of transposition and XOR • Key length 56, later increased.
Key problem • One lingering problem since antiquity: how do 2 people share the key privately? • Need to coordinate in person periodically, or • Hire a trusted 3rd party to relay your keys • Expensive • Whitfield Diffie’s vision • People will need privacy online • Handling keys in person only is impractical • Is there a technological solution? Yes! • Collaborated with Martin Hellman and Ralph Merkle at Stanford
Math idea • Some functions are 2-way (easy to undo) • We need a 1-way function (hard to undo) so that an eavesdropper cannot discover the key • The 1-way function they used was “mod” or remainder calculations. • Example, f(n) = 3n mod 19? • Given a value of n, calculating f(n) isn’t too hard. • But given a value of f(n), there is no obvious way to determine n except trial and error! • In practice, the “19” should be replaced by a much larger number.
Diffie - Hellman • Method for 2 people to establish a private key • Choose values p (prime) and q • Sender • chooses secret value a, and computes A = qa mod p • Sends A, p, q • Eavesdropper cannot easily determine a • Receiver • Chooses secret value b • Computes B = qb mod p and K = Ab mod p • Sends B back to sender, who can compute K = Ba mod p • Both methods of computing secret K are equivalent • Ab mod p = (qa)b mod p • Ba mod p = (qb)a mod p
Calculations • Recap Diffie-Hellman protocol – what is it? • How to calculate something like qa mod p? • Asymmetric cipher system: RSA • Review chapter 6
“mod” properties • (In Python, the symbol for mod is %) • x mod y is always less than y • If x < y, the answer is x. • If x is a multiple of y, the answer is 0. • The 10th day of the year is in which month? • What about the 20th day, the 40th, the 100th ? These are essentially mod calculations, except that the month lengths are not quite the same! • What time is it, 75 hours after midnight?
Asymmetric ciphers • Perennial problem in cryptography: key security • One solution is Diffie-Hellman protocol • Diffie thought of another approach • Use separate keys for encrypting and decrypting (this is why it’s called asymmetric) • Each person has a public encryption key, and a private decryption key • These keys don’t need to change often • No messages need to be passed to compute value of keys. It can be arbitrary & predetermined.