Cryptographic Hash Functions

Cryptographic Hash Functions CS432

Overview
• Hash functions
• Hash algorithms:
  • MD5 (Message Digest)
  • SHA-1 (Secure Hash Algorithm)

Hash Functions
• A hash function computes a fixed-length value from a variable-length source.
• Examples:
  • Checksums in communication protocols
  • Indices in databases
• It is often more convenient to handle the hash of a document than the document itself.

Cryptographic Hash Functions (answers.com)
• In cryptography, a cryptographic hash function is a hash function with additional security properties that make it suitable for use as a primitive in information security applications such as authentication and message integrity.
• A hash function takes a string (or "message") of any length as input and produces a fixed-length string as output, sometimes termed a message digest or a digital fingerprint.

Hash Functions, Definition
• A hash function is a function $f: \{0,1\}^* \to \{0,1\}^n$, i.e. it maps bit strings of any length to strings of exactly n bits.
• The output size n is a property of the function. Common values are 128, 160 and 256.
• Informally, a transformation of a message of arbitrary length into a fixed-length number is called a hash function.
• Alternative names for the hash value are fingerprint or digest.
• Commonly used hash functions are MD5, SHA and SHA-1.
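A quick way to see that n is fixed by the function, not by the input, is Python's standard `hashlib` module (not part of the slides). This minimal sketch checks the digest length of MD5, SHA-1 and SHA-256 for inputs of very different sizes:

```python
import hashlib

def digest_bits(algorithm: str, message: bytes) -> int:
    """Length in bits of the digest that `algorithm` produces for `message`."""
    return 8 * len(hashlib.new(algorithm, message).digest())

for algorithm in ("md5", "sha1", "sha256"):
    sizes = {digest_bits(algorithm, m) for m in (b"", b"a", b"x" * 100_000)}
    print(algorithm, sizes)  # one size per algorithm: {128}, {160}, {256}
```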

Simple Examples
• f(m) = first 70 bits of m
• f(m) = last 80 bits of m
• f(m) = XOR of the bytes of m
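These toy functions are easy to compute but useless cryptographically, because collisions can be written down by hand. A minimal Python sketch of two of them (the function names are illustrative; `first_bits` assumes the message is at least n bits long):

```python
from functools import reduce

def xor_of_bytes(m: bytes) -> int:
    """f(m) = XOR of the bytes of m (an 8-bit value)."""
    return reduce(lambda acc, byte: acc ^ byte, m, 0)

def first_bits(m: bytes, n: int = 70) -> int:
    """f(m) = first n bits of m, read as an integer (m must be at least n bits long)."""
    if 8 * len(m) < n:
        raise ValueError("message shorter than n bits")
    return int.from_bytes(m, "big") >> (8 * len(m) - n)

# Reordering bytes never changes the XOR, so b"ab" and b"ba" collide.
print(xor_of_bytes(b"ab") == xor_of_bytes(b"ba"))                # True
# Any two messages that agree on their first 70 bits collide.
print(first_bits(b"0123456789A") == first_bits(b"0123456789B"))  # True
```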

Properties of a Good Hash Function
Let H be a hash function.
• One-way: given x, it is infeasible to compute any v such that $H(v) = x$.
• Collision-free: it is infeasible to find $x_1$ and $x_2$ such that $H(x_1) = H(x_2)$ and $x_1 \neq x_2$.

Applications of Hash Functions
Hash functions are used for:
• message and file integrity
• secure login
• fingerprints of keys
• authentication
• digital signatures
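For message and file integrity, the usual pattern is to store or publish a digest and recompute it later. A minimal sketch using SHA-256 from Python's `hashlib` (the messages are made up for illustration); it only helps if the reference digest itself is protected, for example obtained over a trusted channel:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Hex SHA-256 digest used as the data's fingerprint."""
    return hashlib.sha256(data).hexdigest()

original = b"Transfer 100 to account 42"
reference = fingerprint(original)         # stored or published separately

received = b"Transfer 900 to account 42"  # modified in transit
print(fingerprint(original) == reference)  # True: unchanged data verifies
print(fingerprint(received) == reference)  # False: the modification is detected
```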

Required Properties of Hash Functions
• Preimage resistant: given h, it should be hard to find any m such that h = hash(m).
• Second preimage resistant: given an input m1, it should be hard to find another input m2 (not equal to m1) such that hash(m1) = hash(m2). This property is implied by collision resistance.
• Collision resistant: it should be hard to find any two different messages m1 and m2 such that hash(m1) = hash(m2). Unlike second preimage resistance, the attacker is free to choose both messages.
• Because of the birthday attack, this means the hash output must be at least twice as long as what is required for preimage resistance: an n-bit hash offers only about n/2 bits of security against collisions.

Birthday Attack
• A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday paradox, making use of a space-time tradeoff. Specifically, if a function f(x) yields any of H different outputs with equal probability and H is sufficiently large, then after evaluating the function for about $\sqrt{H}$ different arguments (on average about $1.25\sqrt{H}$) we expect to obtain a pair of different arguments $x_1$ and $x_2$ with $f(x_1) = f(x_2)$, known as a collision.

Birthday Paradox
• In probability theory, the birthday paradox states that in a group of 23 (or more) randomly chosen people, the probability that some pair of them have the same birthday is more than 50%.
• For 57 or more people, the probability is greater than 99%, although it cannot be exactly 100% unless there are at least 367 people (there are 366 possible birthdays, including February 29).[1] Calculating this probability (and related ones) is the birthday problem.
• The mathematics behind it has been used to devise a well-known cryptographic attack named the birthday attack.
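The 23 and 57 figures can be checked directly. This sketch assumes `days` equally likely birthdays (365 by default, ignoring February 29) and computes one minus the probability that all k birthdays are distinct:

```python
def birthday_probability(k: int, days: int = 365) -> float:
    """P(at least two of k people share a birthday), assuming `days` equally likely birthdays."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (days - i) / days  # person i avoids the i earlier birthdays
    return 1.0 - p_all_distinct

for k in (22, 23, 57):
    print(k, round(birthday_probability(k), 4))
# 22 is just below 0.5, 23 is just above 0.5, 57 exceeds 0.99
```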

Birthday Attack
Any function $H: \{0,1\}^* \to \{0,1\}^n$ must have infinitely many collisions, since infinitely many inputs are mapped to only $2^n$ outputs. A birthday attack needs only $O(2^{n/2})$ evaluations of H to find two messages m and m' that collide, $H(m) = H(m')$. Hence n must be reasonably large; otherwise H cannot be collision resistant.
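The $2^{n/2}$ bound can be observed on a deliberately weakened hash. In this sketch the n-bit function is an assumption made for illustration (the top n bits of SHA-256); hashing the strings "0", "1", "2", … until two of them agree typically takes on the order of $2^{n/2}$ evaluations rather than $2^n$:

```python
import hashlib
from itertools import count

def toy_hash(message: bytes, n_bits: int) -> int:
    """Hypothetical n-bit hash for the demo: the top n_bits bits of SHA-256(message)."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - n_bits)

def birthday_collision(n_bits: int):
    """Hash distinct messages until two share a value; return (m1, m2, evaluations)."""
    seen = {}                       # hash value -> first message that produced it
    for i in count():
        message = str(i).encode()   # "0", "1", "2", ... are all distinct
        h = toy_hash(message, n_bits)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message

m1, m2, evaluations = birthday_collision(24)
print(m1, m2, evaluations)  # typically a few thousand evaluations (about 2**12), not 2**24
```

The dictionary is the "space" half of the space-time tradeoff: it stores every hash seen so far, so memory also grows like $2^{n/2}$.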

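Attacks
Suppose a hash function H produces n-bit values. Compose a good document and about $2^{n/2+1}$ semantically equivalent versions of it (for example, versions that differ only in spacing or word choice). Similarly, compose a bad document and about $2^{n/2+1}$ semantically equivalent versions. With probability ½ or more there will be a version of the good document and a version of the bad document that have the same hash value. The attacker can then get the good version signed and later claim that the signature belongs to the bad version.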
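A toy version of this attack, under stated assumptions: the n-bit hash is the top 16 bits of SHA-256, and the "semantically equivalent versions" differ only in single versus double spaces between words. The sketch uses $2^{n/2+2}$ versions of each document (one bit more than the slide) so that this single deterministic run is virtually certain to find a match:

```python
import hashlib
from itertools import product

N_BITS = 16  # hypothetical hash length n, small enough for an instant search

def toy_hash(text: str) -> int:
    """Hypothetical n-bit hash: the top N_BITS bits of SHA-256(text)."""
    digest = hashlib.sha256(text.encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - N_BITS)

def versions(text: str, k: int):
    """Yield 2**k versions of text that differ only in using one or two spaces
    after each of the first k words (a reader sees the same document)."""
    words = text.split()
    for gaps in product([" ", "  "], repeat=k):
        separators = list(gaps) + [" "] * (len(words) - 1 - k)
        yield words[0] + "".join(sep + word for sep, word in zip(separators, words[1:]))

def matching_pair(good: str, bad: str, k: int):
    """Return (good_version, bad_version) with equal toy hashes, or None."""
    table = {toy_hash(v): v for v in versions(good, k)}
    for v in versions(bad, k):
        if toy_hash(v) in table:
            return table[toy_hash(v)], v
    return None

GOOD = "I agree to pay Alice ten dollars for the delivery of the goods this week"
BAD = "I agree to pay Alice ten thousand dollars for the delivery of the goods this week"
pair = matching_pair(GOOD, BAD, k=N_BITS // 2 + 2)  # 2**10 versions of each document
print(pair)
```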

Hash Algorithms
• We will consider two main examples:
  • The message digest algorithm MD5 by Ron Rivest, with 128-bit hash values.
  • The Secure Hash Algorithm SHA-1, developed by the NSA and standardized by NIST. It uses 160-bit hash values encoded as 5 × 32-bit words.
• Other variants of SHA include SHA-256, SHA-384 and SHA-512.
• Collisions in SHA-1 can be found with about $2^{63}$ attempts.
• A collision in MD5 can be found in 8 hours using a notebook PC...

MD5: Message Digest Algorithm
MD5 processes a message in 512-bit blocks and produces a 128-bit hash. A message of arbitrary length is first padded to a length k with $k \equiv 448 \pmod{512}$. A 64-bit string describing the length of the original message is then appended, so the total length is a multiple of 512. The hashing is done block by block.

Step 1: Append padding bits
• The message is padded so that its bit length is congruent to $448 \pmod{512}$ (i.e., the padded message is 64 bits short of a multiple of 512 bits).
• Padding is always added, even if the message already has the desired length, so between 1 and 512 padding bits are appended.
• Padding bits: 1000…0 (a single 1-bit followed by the necessary number of 0-bits).
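The number of padding bits follows directly from the rule above. A small sketch (the helper name is hypothetical) returning how many bits, the single 1-bit included, are appended to a message of a given bit length:

```python
def md5_padding_bits(message_bits: int) -> int:
    """Bits appended in Step 1 (one 1-bit plus 0-bits) so the length becomes 448 mod 512."""
    return (447 - message_bits) % 512 + 1

print(md5_padding_bits(0))    # 448
print(md5_padding_bits(447))  # 1
print(md5_padding_bits(448))  # 512: already 448 mod 512, so a full 512 bits are added
```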

Step 2: Append length
• A 64-bit value containing the length of the original message modulo $2^{64}$ is appended (MD5 stores it least significant byte first).
• The expanded message is $Y_0, Y_1, \ldots, Y_{L-1}$; its total length is $L \times 512$ bits.
• Equivalently, the expanded message is a sequence of 32-bit words whose count is a multiple of 16.
• Let $M[0 \ldots N-1]$ denote the words of the resulting message, where $N = L \times 16$.
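Steps 1 and 2 together can be sketched in Python for byte-aligned messages (an assumption: arbitrary bit lengths are not handled). The 1-bit plus seven 0-bits is the byte 0x80, and `struct` packs the 64-bit length little-endian:

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Steps 1 and 2 for a byte-aligned message: 0x80, zero bytes, 64-bit little-endian length."""
    bit_length = (8 * len(message)) % 2**64           # original length modulo 2^64
    padded = message + b"\x80"                         # the single 1-bit (plus seven 0-bits)
    padded += b"\x00" * ((56 - len(padded)) % 64)      # 0-bits up to 448 mod 512
    return padded + struct.pack("<Q", bit_length)      # MD5 stores the length little-endian

padded = md5_pad(b"abc")
words = struct.unpack(f"<{len(padded) // 4}I", padded)  # M[0 .. N-1], N = 16 * L
print(len(padded) * 8, len(words))  # 512 16
```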

MD5 Algorithm Architecture

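Initialization Vector
A buffer of four 32-bit words A, B, C, D is used to compute the hash value. They are initialized as follows (bytes listed in memory order, low-order byte first):
A = 01 23 45 67
B = 89 ab cd ef
C = fe dc ba 98
D = 76 54 32 10
Read as little-endian 32-bit words, these are A = 0x67452301, B = 0xEFCDAB89, C = 0x98BADCFE and D = 0x10325476, the values h0 to h3 in the pseudocode.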
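The relation between the byte listing and the word values is just little-endian decoding. A short check with Python's `struct` module:

```python
import struct

# The IV bytes as listed on the slide, lowest address first.
iv_bytes = bytes.fromhex("0123456789abcdeffedcba9876543210")
A, B, C, D = struct.unpack("<4I", iv_bytes)  # four little-endian 32-bit words
print([hex(w) for w in (A, B, C, D)])
# ['0x67452301', '0xefcdab89', '0x98badcfe', '0x10325476']
```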

   MD5 processing of a single 512-bit block (MD5 compression function)

A Typical MD5 Single Step

Truth table
x y z | F G H I
0 0 0 | 0 0 0 1
0 0 1 | 1 0 1 0
0 1 0 | 0 1 1 0
0 1 1 | 1 0 0 1
1 0 0 | 0 0 1 1
1 0 1 | 0 1 0 1
1 1 0 | 1 1 0 0
1 1 1 | 1 1 1 0

MD5 - functions
• The procedure uses four Boolean functions that operate bitwise on 32-bit words:
• $F(X,Y,Z) = (X \land Y) \lor (\lnot X \land Z)$
• $G(X,Y,Z) = (X \land Z) \lor (Y \land \lnot Z)$
• $H(X,Y,Z) = X \oplus Y \oplus Z$
• $I(X,Y,Z) = Y \oplus (X \lor \lnot Z)$
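The four functions translate directly into bitwise operators. One way to check them against the truth table (a trick not used in the slides) is to pack the eight rows into the bits of a byte: x = 0xF0, y = 0xCC, z = 0xAA cover every combination of x, y, z once, and bit r of the result is the function's value in row r, counting from the 0 0 0 row:

```python
MASK = 0xFFFFFFFF  # results are kept to 32 bits

def F(x: int, y: int, z: int) -> int:
    return (x & y) | (~x & z)     # "if x then y else z"

def G(x: int, y: int, z: int) -> int:
    return (x & z) | (y & ~z)     # "if z then x else y"

def H(x: int, y: int, z: int) -> int:
    return x ^ y ^ z              # parity

def I(x: int, y: int, z: int) -> int:
    return (y ^ (x | ~z)) & MASK  # ~z is negative in Python, so mask to 32 bits

# Bit r of each result is the function's value in truth-table row r (row 0 = x y z = 0 0 0).
x, y, z = 0xF0, 0xCC, 0xAA
for name, fn in (("F", F), ("G", G), ("H", H), ("I", I)):
    column = fn(x, y, z) & 0xFF
    print(name, [(column >> r) & 1 for r in range(8)])  # matches the table column top to bottom
```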

What is X[k]?
• The array of 32-bit words X[0..15] holds the value of the current 512-bit input block being processed.
• Within a round, each of the 16 words X[i] is used exactly once, during one step.
• The order in which these words are used varies from round to round:
  • In the first round, the words are used in their original order.
  • For rounds 2 through 4, the following permutations are used:
    • $\rho_2(i) = (1 + 5i) \bmod 16$
    • $\rho_3(i) = (5 + 3i) \bmod 16$
    • $\rho_4(i) = 7i \bmod 16$
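A quick sketch that lists the word order for each round and confirms that every formula is a true permutation of 0..15. It also matches the pseudocode's p = (5×i + 1) mod 16 with the global step i, because 5·16, 3·32 and 7·48 are all multiples of 16:

```python
def word_index(round_no: int, i: int) -> int:
    """Index k of the word X[k] used at step i (0..15) of round round_no (1..4)."""
    if round_no == 1:
        return i                  # original order
    if round_no == 2:
        return (1 + 5 * i) % 16   # rho_2
    if round_no == 3:
        return (5 + 3 * i) % 16   # rho_3
    return (7 * i) % 16           # rho_4

for round_no in range(1, 5):
    order = [word_index(round_no, i) for i in range(16)]
    assert sorted(order) == list(range(16))  # each word used exactly once per round
    print(round_no, order)
```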

T[i]
• T is constructed from the sine function: $T[i] = \lfloor 2^{32} \cdot |\sin(i)| \rfloor$ for $i = 1, \ldots, 64$, where i is in radians.
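The table can be generated in one line. This sketch keeps the slide's 1-based indexing (the pseudocode later uses 0-based indices with sin(i + 1)) and relies on double-precision floats, which is how the MD5 pseudocode itself computes T:

```python
import math

# T[1..64] with 1-based i, as on the slide; T[0] is an unused placeholder.
T = [0] + [int(abs(math.sin(i)) * 2**32) for i in range(1, 65)]
print(hex(T[1]), hex(T[2]), hex(T[64]))  # 0xd76aa478 0xe8c7b756 0xeb86d391
```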

Typical Values for T[i]

Circular Left Shift (CLS)
<<< s: circular left shift (rotation) of the 32-bit argument by s bits.
Values of s:
Round 1: 7 12 17 22
Round 2: 5 9 14 20
Round 3: 4 11 16 23
Round 4: 6 10 15 21
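Python integers are unbounded, so a 32-bit rotation has to mask explicitly. A minimal sketch of <<< s:

```python
def rotl32(x: int, s: int) -> int:
    """x <<< s: rotate the 32-bit word x left by s bits (0 <= s < 32)."""
    x &= 0xFFFFFFFF
    return ((x << s) | (x >> (32 - s))) & 0xFFFFFFFF

print(hex(rotl32(0x12345678, 4)))  # 0x23456781
print(hex(rotl32(0x80000000, 1)))  # 0x1: the top bit wraps around to bit 0
```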

//Note: All variables are unsigned 32 bits and wrap modulo 2^32 when calculating
var int[64] r, T

//r specifies the per-round shift amounts
r[ 0..15] := {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22}
r[16..31] := {5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20}
r[32..47] := {4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23}
r[48..63] := {6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21}

//Use binary integer part of the sines of integers as constants:
for i from 0 to 63
    T[i] := floor(abs(sin(i + 1)) × (2 pow 32))

//Initialize variables:
var int h0 := 0x67452301
var int h1 := 0xEFCDAB89
var int h2 := 0x98BADCFE
var int h3 := 0x10325476

//Pre-processing:
append "1" bit to message
append "0" bits until message length in bits ≡ 448 (mod 512)
append bit (not byte) length of unpadded message as 64-bit little-endian integer to message

//Process the message in successive 512-bit chunks:
for each 512-bit chunk of message
    break chunk into sixteen 32-bit little-endian words X[i], 0 ≤ i ≤ 15

    //Initialize hash value for this chunk:
    var int a := h0
    var int b := h1
    var int c := h2
    var int d := h3

    //Main loop:
    for i from 0 to 63
        if 0 ≤ i ≤ 15 then
            g := (b and c) or ((not b) and d)
            p := i
        else if 16 ≤ i ≤ 31
            g := (d and b) or ((not d) and c)
            p := (5×i + 1) mod 16
        else if 32 ≤ i ≤ 47
            g := b xor c xor d
            p := (3×i + 5) mod 16
        else if 48 ≤ i ≤ 63
            g := c xor (b or (not d))
            p := (7×i) mod 16

        temp := d
        d := c
        c := b
        b := b + leftrotate((a + g + T[i] + X[p]), r[i])
        a := temp

    //Add this chunk's hash to result so far:
    h0 := h0 + a
    h1 := h1 + b
    h2 := h2 + c
    h3 := h3 + d

digest := h0 append h1 append h2 append h3 //(each word expressed as little-endian)

//leftrotate function definition
leftrotate (x, c)
    return (x << c) or (x >> (32-c));
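The pseudocode above translates almost line for line into Python. This is a teaching sketch for comparing against the pseudocode, not a production implementation (real code should call a vetted library): it assumes byte-aligned input, masks every sum to 32 bits to emulate unsigned overflow, and checks itself against `hashlib.md5`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # emulate unsigned 32-bit arithmetic

# r: per-step rotation amounts; T: sine-derived constants (0-based, sin(i + 1)).
R = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]

def leftrotate(x: int, c: int) -> int:
    return ((x << c) | (x >> (32 - c))) & MASK

def md5(message: bytes) -> bytes:
    h0, h1, h2, h3 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    # Pre-processing (byte-aligned input assumed): 0x80, zeros to 448 mod 512, 64-bit LE length.
    bit_length = (8 * len(message)) % 2**64
    msg = message + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack("<Q", bit_length)

    for offset in range(0, len(msg), 64):
        X = struct.unpack("<16I", msg[offset:offset + 64])  # sixteen little-endian words
        a, b, c, d = h0, h1, h2, h3
        for i in range(64):
            if i < 16:
                g, p = (b & c) | (~b & d), i
            elif i < 32:
                g, p = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                g, p = b ^ c ^ d, (3 * i + 5) % 16
            else:
                g, p = c ^ (b | (~d & MASK)), (7 * i) % 16
            # temp := d; d := c; c := b; b := b + leftrotate(...); a := temp
            a, b, c, d = d, (b + leftrotate((a + g + T[i] + X[p]) & MASK, R[i])) & MASK, b, c
        h0, h1, h2, h3 = (h0 + a) & MASK, (h1 + b) & MASK, (h2 + c) & MASK, (h3 + d) & MASK

    return struct.pack("<4I", h0, h1, h2, h3)  # digest: h0..h3 as little-endian bytes

print(md5(b"abc").hex())                            # 900150983cd24fb0d6963f7d28e17f72
print(md5(b"abc") == hashlib.md5(b"abc").digest())  # True
```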

Secure Hash Algorithm (SHA)
• Developed by the NSA and standardized by NIST (National Institute of Standards and Technology)
• Published as FIPS PUB 180 in 1993
• A revised version was issued as FIPS PUB 180-1 in 1995
• Generally referred to as SHA-1
• Input: a message with a maximum length of less than $2^{64}$ bits
• Output: 160-bit message digest
• 32-bit word units, 512-bit blocks
• 4 rounds × 20 steps per block
• Closely modeled on MD4
• Slower but stronger than MD5

SHA Algorithm
• The overall structure and logic are similar to MD5.
• Step 1: Append padding bits
• Step 2: Append length (as in MD5, but the 64-bit length is appended big-endian)
• Step 3: Initialize MD buffer
  • A 160-bit buffer (five 32-bit registers A, B, C, D, E) holds the intermediate and final results of the hash function.
  • A, B, C, D, E are initialized to the following values:
    • A, B, C, D: same as in MD5; E = C3D2E1F0
  • Stored in big-endian format (most significant byte of a word in the low-address byte position)
    • E.g., word E: C3 D2 E1 F0 (low address … high address)
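The only difference between the two storage conventions is byte order. A short illustration with Python's `struct` module, using the word E from the slide:

```python
import struct

E = 0xC3D2E1F0
print(struct.pack(">I", E).hex())  # c3d2e1f0: SHA-1 stores words big-endian
print(struct.pack("<I", E).hex())  # f0e1d2c3: the little-endian order MD5 uses
```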

SHA Algorithm
• Step 4: Process the message in 512-bit (16-word) blocks
  • The heart of the algorithm is called the compression function.
  • It consists of 4 rounds of processing of 20 steps each.
  • The 4 rounds have a similar structure, but each uses a different primitive logical function, referred to as f1, f2, f3 and f4.
  • Each round takes as input the current 512-bit block ($Y_q$) and the 160-bit buffer value ABCDE, and updates the contents of the buffer.
  • Each round also uses an additive constant $K_t$, where $0 \le t \le 79$ indicates one of the 80 steps across the 4 rounds.
  • In fact, only 4 constants are used:

Step number          Hexadecimal        Integer part of
$0 \le t \le 19$     $K_t$ = 5A827999   $\lfloor 2^{30} \times \sqrt{2} \rfloor$
$20 \le t \le 39$    $K_t$ = 6ED9EBA1   $\lfloor 2^{30} \times \sqrt{3} \rfloor$
$40 \le t \le 59$    $K_t$ = 8F1BBCDC   $\lfloor 2^{30} \times \sqrt{5} \rfloor$
$60 \le t \le 79$    $K_t$ = CA62C1D6   $\lfloor 2^{30} \times \sqrt{10} \rfloor$

  • The output of the 4th round (80th step) is added to $CV_q$ (word by word, modulo $2^{32}$) to produce $CV_{q+1}$.
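The four constants can be reproduced exactly with integer arithmetic, avoiding any floating-point rounding question: $\lfloor 2^{30}\sqrt{x} \rfloor = \lfloor \sqrt{x \cdot 2^{60}} \rfloor$, which Python's `math.isqrt` (Python 3.8+) computes exactly:

```python
from math import isqrt

def sha1_round_constant(x: int) -> int:
    """Integer part of 2**30 * sqrt(x), computed exactly as isqrt(x * 2**60)."""
    return isqrt(x << 60)

for x in (2, 3, 5, 10):
    print(x, format(sha1_round_constant(x), "08X"))
# 2 5A827999, 3 6ED9EBA1, 5 8F1BBCDC, 10 CA62C1D6
```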

SHA-1 processing of a single 512-bit block (SHA-1 compression function)

Elementary SHA operation (single step)

Step                 Function name      Function value
$0 \le t \le 19$     f1 = f(t,B,C,D)    $(B \land C) \lor (\lnot B \land D)$
$20 \le t \le 39$    f2 = f(t,B,C,D)    $B \oplus C \oplus D$
$40 \le t \le 59$    f3 = f(t,B,C,D)    $(B \land C) \lor (B \land D) \lor (C \land D)$
$60 \le t \le 79$    f4 = f(t,B,C,D)    $B \oplus C \oplus D$

Truth table
B C D | f1 f2 f3 f4
0 0 0 |  0  0  0  0
0 0 1 |  1  1  0  1
0 1 0 |  0  1  0  1
0 1 1 |  1  0  1  0
1 0 0 |  0  1  0  1
1 0 1 |  0  0  1  0
1 1 0 |  1  0  1  0
1 1 1 |  1  1  1  1

• Each primitive function takes three 32-bit words as input and produces a 32-bit word as output.
• Each function performs a set of bitwise logical operations.
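The SHA-1 functions can be checked against the truth table with a bit-packing trick (not from the slides): B = 0xF0, C = 0xCC, D = 0xAA contain every combination of B, C, D once, and bit r of the result is the function's value in row r, counting from the 0 0 0 row:

```python
def f1(b: int, c: int, d: int) -> int:
    return (b & c) | (~b & d)            # steps 0-19: "if b then c else d"

def f2(b: int, c: int, d: int) -> int:
    return b ^ c ^ d                     # steps 20-39 (f4 is the same function)

def f3(b: int, c: int, d: int) -> int:
    return (b & c) | (b & d) | (c & d)   # steps 40-59: bitwise majority

f4 = f2

# Pack the 8 truth-table rows into one byte: bit r holds row r (row 0 = B C D = 0 0 0).
B, C, D = 0xF0, 0xCC, 0xAA
for name, fn in (("f1", f1), ("f2", f2), ("f3", f3), ("f4", f4)):
    column = fn(B, C, D) & 0xFF
    print(name, [(column >> r) & 1 for r in range(8)])  # matches the table column top to bottom
```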

Note: All variables are unsigned 32 bits and wrap modulo 2^32 when calculating

Initialize variables:
h0 := 0x67452301
h1 := 0xEFCDAB89
h2 := 0x98BADCFE
h3 := 0x10325476
h4 := 0xC3D2E1F0

Pre-processing:
append the bit '1' to the message
append k bits '0', where k is the minimum number >= 0 such that the resulting message
    length (in bits) is congruent to 448 (mod 512)
append length of message (before pre-processing), in bits, as 64-bit big-endian integer

Process the message in successive 512-bit chunks:
break message into 512-bit chunks
for each chunk
    break chunk into sixteen 32-bit big-endian words w[i], 0 ≤ i ≤ 15

    Extend the sixteen 32-bit words into eighty 32-bit words:
    for i from 16 to 79
        w[i] := (w[i-3] xor w[i-8] xor w[i-14] xor w[i-16]) leftrotate 1

    Initialize hash value for this chunk:
    a := h0
    b := h1
    c := h2
    d := h3
    e := h4

    Main loop:
    for i from 0 to 79
        if 0 ≤ i ≤ 19 then
            f := (b and c) or ((not b) and d)
            k := 0x5A827999
        else if 20 ≤ i ≤ 39
            f := b xor c xor d
            k := 0x6ED9EBA1
        else if 40 ≤ i ≤ 59
            f := (b and c) or (b and d) or (c and d)
            k := 0x8F1BBCDC
        else if 60 ≤ i ≤ 79
            f := b xor c xor d
            k := 0xCA62C1D6

        temp := (a leftrotate 5) + f + e + k + w[i]
        e := d
        d := c
        c := b leftrotate 30
        b := a
        a := temp

    Add this chunk's hash to result so far:
    h0 := h0 + a
    h1 := h1 + b
    h2 := h2 + c
    h3 := h3 + d
    h4 := h4 + e

Produce the final hash value (big-endian):
digest = hash = h0 append h1 append h2 append h3 append h4

The following equivalent expressions may be used to compute f in the main loop above:
(0 ≤ i ≤ 19):  f := d xor (b and (c xor d))          (alternative)
(40 ≤ i ≤ 59): f := (b and c) or (d and (b or c))    (alternative 1)
(40 ≤ i ≤ 59): f := (b and c) or (d and (b xor c))   (alternative 2)
(40 ≤ i ≤ 59): f := (b and c) + (d and (b xor c))    (alternative 3)
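As with MD5, the SHA-1 pseudocode above maps almost directly onto Python. This is a teaching sketch, not a production implementation: it assumes byte-aligned input, masks every sum to 32 bits, and checks itself against `hashlib.sha1`.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # emulate unsigned 32-bit arithmetic

def leftrotate(x: int, c: int) -> int:
    return ((x << c) | (x >> (32 - c))) & MASK

def sha1(message: bytes) -> bytes:
    h0, h1, h2, h3, h4 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0

    # Pre-processing (byte-aligned input assumed): 0x80, zeros to 448 mod 512, 64-bit BE length.
    bit_length = (8 * len(message)) % 2**64
    msg = message + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack(">Q", bit_length)

    for offset in range(0, len(msg), 64):
        w = list(struct.unpack(">16I", msg[offset:offset + 64]))  # sixteen big-endian words
        for i in range(16, 80):                                     # extend to eighty words
            w.append(leftrotate(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))

        a, b, c, d, e = h0, h1, h2, h3, h4
        for i in range(80):
            if i < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif i < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif i < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (leftrotate(a, 5) + f + e + k + w[i]) & MASK
            a, b, c, d, e = temp, a, leftrotate(b, 30), c, d

        h0, h1, h2 = (h0 + a) & MASK, (h1 + b) & MASK, (h2 + c) & MASK
        h3, h4 = (h3 + d) & MASK, (h4 + e) & MASK

    return struct.pack(">5I", h0, h1, h2, h3, h4)  # digest: h0..h4 big-endian

print(sha1(b"abc").hex())                             # a9993e364706816aba3e25717850c26c9cd0d89d
print(sha1(b"abc") == hashlib.sha1(b"abc").digest())  # True
```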
