370 likes | 631 Views
Cryptography on Graphics Processors. Francisco Chinchilla COMP 290-GPGP Presentation. Papers. “Brute Force Attack on UNIX Passwords with SIMD Computer” Gershon Kedem and Yuriko Ishihara, Computer Science Department, Duke University. Usenix ’99 Security Symposium
E N D
Cryptography on Graphics Processors Francisco Chinchilla COMP 290-GPGP Presentation
Papers • “Brute Force Attack on UNIX Passwords with SIMD Computer” • Gershon Kedem and Yuriko Ishihara, Computer Science Department, Duke University. Usenix ’99 Security Symposium • "CryptoGraphics: Secret Key Cryptography Using Graphics Cards" • Debra L. Cook, John Ioannidis, Angelos D. Keromytis, and Jake Luck, Columbia University. To appear in the Proceedings of the RSA Conference, Cryptographer's Track (CT-RSA). February 2005, San Francisco, CA. • Remotely-Keyed Cryptographics • Debra L. Cook, Ricardo Baratto, and Angelos D. Keromytis, Columbia University.
Outline • Quick Tour of Cryptography • Brute force password hacking using PixelFlow • Implementing Cryptographic systems on a GPU • Using GPU based cryptographic systems in remote keyed systems in unfriendly environments • Conclusions
Cryptography Basics • Two functions of two variables • E(keye, text) encrypts • D(keyd, E(keye, text)) decrypts • If keye==keyd cryptosystem is called symmetric, otherwise it’s asymmetric • We’ll deal with symmetric
Cipher Types • Block Cipher • Encrypts fixed-size blocks of data • Stream Cipher • Encrypts input data one bit (sometimes one byte) at a time • Useful for data with unknown length • Stream ciphers are faster in hardware, but slower in software than block ciphers • Both use XOR heavily
Attacks to Break Encryption • Techniques used • Differential • Linear • Brute Force • Underlying methods • Chosen Plaintext • Ciphertext-only • Known Plaintext • Goals • Find a password • Find the key
DES • Block cipher • Developed by IBM, made a FIPS in ‘76 • Uses only 56-bit keys, not 64, not 128, possibly due to NSA • Designed against differential cryptoanalysis, a ‘new’ (not to IBM nor NSA) field of cryptography in the late 1980’s. Takes 2^47 chosen plaintexts to break. • Not designed against linear cryptoanalysis. Takes 2^39-2^44 known plaintexts • Broken in 56 hours by EFF $250k machine in 1998
DES Encryption Details Key Schedule The Feistel (F) function Subkey(48 bits)
DES – Other Boxes • http://en.wikipedia.org/wiki/Substitution_box • http://en.wikipedia.org/wiki/DES_supplementary_material
Crypt • A variation of the DES algorithm • Used to encrypt Unix passwords • Sample invocations: • crypt(password)=pajXG3p8vEaok • crypt(God)=GoqTni0y.7CH2 • crypt(sex)=se8lvCoF9FQE. • crypt(money)=moIzVjU7RTRr6 • crypt(love)=loYMPnrPlj9gk
AES • Competition to replace DES FIPS • Competition won by Rijndael • AES is fast in both SW and HW • Relatively easy to implement • Requires little memory
AES 128-bit Details • AES operates on a state, a 4×4 array of bytes. • For encryption, each round of a total of 10 consists of four stages: • SubBytes — a non-linear substitution step where each byte is replaced with another according to a lookup table, an S-block. This is the non-linear step. • ShiftRows — a transposition step where each row of the state is shifted cyclically a certain number of steps. • MixColumns — a mixing operation which operates on the columns of the state, combining the four bytes in each column using a linear transformation. This is the “Diffusion” step and is omitted in the final round! • AddRoundKey — each byte of the state is combined with the round key; each round key is derived from the cipher key using a key schedule.
RC4 • Stream cipher • Designed by Ron Rivest in ’87 • In 1994 a description was anonymously posted, which lead to ARCFOUR • In 2001 it was found that the first 256 bytes of the output were non-random (this leads to one big vulnerability in WEP) • Unusually simple • Requires only byte-length manipulations
RC4 - details RC4 Key Scheduling Algorithm (KSA) Encrypt(bits, key){ S = new byte[256]; KSA(key,S); randomBits = PRGA(S); for i=0 to length(bits)-1{ bits[i] = XOR(bits[i],randomBits[i]) } } Decrypt(bits, key){ Encrypt(bits, key) } for i from 0 to 255 S[i] := i j: = 0 for i from 0 to 255 j := (j + S[i] + key[i mod l]) mod 256 swap(S[i],S[j]) pseudo-random gen. algorithm (PRGA) i := 0 j := 0 while GeneratingOutput: i := (i + 1) mod 256 j := (j + S[i]) mod 256 swap(S[i],S[j]) output S[(S[i] + S[j]) mod 256]
Password Hacking • Two full chassis=18 SIMD Arrays=147,456 PE’s • Find the key used to encrypt a stream of bytes using RC4 • Check Unix passwords
Password Hacking Results • Each PE checks 1,000 RC4 keys in 3.75s = 38.8M/s for the system • Each PE checks 1,000 crypt passwords in 6s =24.6M/s for the system • System encrypts 614M DES passwords/second
Secret Key Cryptography • Test capability of XOR on different cards using different ciphers • RC4 • AES • Used OpenGL and GLUT, no CG • Did not use ‘the latest greatest’ cards • Basic steps • Disable dithering! • Copy pixels between coordinates, with colormapping and XOR on or off • Why colormaps?
Test systems Note: 256KB RAM = 256 MB RAM?
RC4 on a GPU • Random bit stream must be precomputed • Simply load the bit stream and the data to the GPU and then perform the XOR
AES on a GPU • Key schedule is not done on GPU • Only store 1 byte of data in each pixel • (De/En)crypts 4*n blocks simultaneously, given that the data is stored in n pixels • Tested against reference and optimized CPU implementations
Cryptography on GPU -Conclusions • Performance isn’t great, now what? • GPU’s can still offload some cycles off the CPU? • Not really, the CPU was still fully utilized • Look for a new application, like video system described in their next paper
Remotely Keyed Cryptographics • Extend their previous work • RC4 can be used to sustain rates that are ok for live video • The video is not susceptible to attacks from within the OS • Finally? explained why they didn’t use CG: “Higher level languages do not allow the developer to specify which OpenGL commands are utilized when there are multiple ways of implementing a function via OpenGL commands and do not even guarantee the operations will be transformed into OpenGL commands but instead may transform it intoC code… that may be executed on the OS and not using OpenGL.”
Regarding compression and transmission • Encrypted data doesn’t compress • 100 Mbps = 16.7 500x500 RGB frames/s • If you know about the RFP protocol (used in VNC) or MPG video, you know that the entire screen isn’t necessarily updated from frame to frame, only the sections that need it
Test systems (again) Note: 256KB RAM = 256 MB RAM?
Remote Keyed Cryptography- Conclusion and Comments • 99% of time taken on GPU is writing the keystream • Rates still good for thin-clients • Great for encrypting very sensitive images in a system that is not completely trusted
Presentation Conclusion • GPU’s can have lots and lots of FLOPs • Can we find applications that use them as the main PU? • As a co-processor to offload cycles? • If we fail, can we apply a different twist or find a special case/market of the setting where we had failed previously?
Discussion • Could make the performance of these algorithms better if we could use fragment programs for example?
Additional References • Wikipedia • lots and lots of info and images taken from here • PixelFlow Website • In particular:Eyles, John, Steven Molnar, John Poulton, Trey Greer, Anselmo Lastra, and Nick England, "PixelFlow: The Realization", Proceedings of the Siggraph/Eurographics Workshop on Graphics Hardware, Los Angeles, CA, August 3-4, 1997, 57-68.