Learn about cryptography, encryption, decryption, and symmetric and public-key encryption. Explore the DES and RSA cryptosystems, as well as steganography and types of cryptographic attacks.
Software Security (CompSci 725): Cryptography and Software Watermarking (Handout 12). Clark Thomborson, University of Auckland
Applied Cryptography (2nd Ed.), Bruce Schneier • “Suppose a sender wants to send a message to a receiver. Moreover, this sender wants to send the message securely: She wants to make sure an eavesdropper cannot read the message.” • Exercise 1. Draw a picture of this scenario. • Exercise 2. Which of Pfleeger’s four threats is a concern to this sender?
Terminology of Cryptography • Plaintext, ciphertext, encryption, decryption: Sender → plaintext → Encryption → ciphertext → Decryption → plaintext → Receiver. • Cryptography: the art (science) of keeping messages secure. • Cryptanalysts seek to “break” ciphertexts (that is, to discover the plaintext, given the ciphertext).
A Simple Encryption Scheme • Rot(k,s) : “rotate” each character in string s by k, treating each letter as a number from 0 to 25: for( i=0; i<len(s); i++ ) s[i] = ( s[i] + k ) mod 26; return(s); • Exercise: write the corresponding decryption routine. • Exercise: how many keys must you try, before you can “break” a ciphertext Rot(k,s)? • This is a (very weak) “secret-key” encryption scheme, where the secret key is k.
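The Rot(k,s) pseudocode can be made runnable roughly as follows. This is a minimal sketch, assuming lowercase ASCII letters and assuming that other characters pass through unchanged (the slide does not say how they are handled). Only encryption is shown, so the two exercises remain open.

```python
def rot(k: int, s: str) -> str:
    """Rotate each lowercase letter of s forward by k positions (mod 26)."""
    out = []
    for ch in s:
        if "a" <= ch <= "z":
            # Map 'a'..'z' to 0..25, add the key, wrap around, map back.
            out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        else:
            # Assumption: spaces, digits, etc. are left as they are.
            out.append(ch)
    return "".join(out)


ciphertext = rot(3, "attack at dawn")
```

Returning a new string, rather than overwriting s[i] in place as the pseudocode does, is only a consequence of Python strings being immutable.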
Symmetric and Public-Key Encryption • If the decryption key p can be computed from the encryption key k, then the algorithm is called “symmetric”. • Question: is Rot(k,s) a symmetric cipher? • If the decryption key p cannot be computed (in a reasonable amount of time) from the encryption key k, then the algorithm is called “asymmetric” or “public-key”.
Algebraic Notation for Cryptography • The encryption function is E( ) or E_k( ). • The decryption function is D( ) or D_k( ). • The receiver can read the message if, and only if, they know how to compute D( ): D( E( M ) ) = M. • In public-key cryptography, we use key-pairs (k, p). • We let “everyone” know our public-key k. • Only we (and possibly a few trusted associates) know the corresponding private-key p; this knowledge is required to compute D( ) . • Anyone can send us secret messages using E_k( ); only people who know (or guess) p can read these messages.
Authentication in PK Cryptography • We can encrypt messages, using E_p( ), that everyone can read (!) • E_p(M) is a “signed message”. • Anyone who knows p can forge our “signature”. • We may want to have several public/private key pairs: • For our email, • For our bank account (our partner knows this private key too), • For our workgroup (shared with other members), … • A “public key infrastructure” (PKI) will help us discover other people’s public keys (k_1, k_2, …) if we know the name(s) under which they registered their keys. • The registry database is called a “certificate authority” (CA). • Warning: someone might register a key under your name!
Types of Cryptographic Attack • A ciphertext may be broken by… • Discovering the “restricted” algorithm (if the algorithm doesn’t require a key). • Discovering the key by non-cryptographic means (bribery, theft, ‘just asking’). • Discovering the key by “brute-force search” (through all possible keys). • Discovering the key by cryptanalysis based on other information, such as known pairs of (plaintext, ciphertext). • The weakest point in the system may not be its cryptography! • (What identification is required, when a CA accepts a key?) • See Schneier’s latest book.
Steganography • “Steganography serves to hide secret messages in other messages, such that the secret’s very existence is concealed.” [Schneier] • A cryptanalyst is given a ciphertext, and possibly some additional information such as key frequencies, cipher-plaintext pairs, “black-box” encrypters, etc. • Problem: to discover the corresponding plaintext. • Approach: try various encryption algorithms, various keys, … • A steganalyst is given a (large) collection of data. • Problem: discover a secret message, if any exists. • Approach: look in “places” where a message might be hidden. • If a secret message is enciphered, cryptanalysis is required to read the secret plaintext.
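One “place” a steganalyst might look is the least-significant bit (LSB) of each byte in an image or audio sample. The sketch below illustrates that kind of hiding place; LSB embedding is a common textbook technique and is not described in these slides, and the function names `embed` and `extract` are made up for this example.

```python
def embed(cover: bytes, secret: bytes) -> bytes:
    """Hide secret in the least-significant bits of cover, one bit per byte."""
    bits = [(byte >> i) & 1 for byte in secret for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this secret")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the cover byte, then set it to the secret bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes, n_bytes: int) -> bytes:
    """Recover n_bytes of hidden data from the low bits of stego."""
    out = bytearray()
    for j in range(n_bytes):
        value = 0
        for b in stego[8 * j : 8 * j + 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)
```

Each cover byte changes by at most 1, which is why such a message is hard to notice by inspection. If the hidden bytes are themselves a ciphertext, extraction is only the first step and cryptanalysis is still required.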
The DES Cryptosystem [Stinson, Cryptography: Theory & Practice, CRC Press, 1995] • The Data Encryption Standard (DES) was adopted in 1977 by the US government, and is one of three “FIPS-approved” algorithms. • Triple-DES (DES encryption applied three times) is the “FIPS-approved symmetric encryption algorithm of choice.” [http://csrc.nist.gov/cryptval/des.htm] • DES is somewhat slow in software (look at the bit-manipulations shown in Stinson’s article), but special-purpose hardware runs at 15 Gb/s. [http://www.xilinx.com/xapp/xapp270.pdf] • 3DES at 15 Gb/s requires 16K lookup tables (LUTs) and 16K flip-flops: one Xilinx chip.
RSA System [Stinson, Cryptography: Theory & Practice, CRC Press, 1995] • Rivest, Shamir & Adleman [1977] were the first to implement the public-key cryptosystem proposed by Diffie & Hellman [1976]. • Public-key cryptography is based on “one-way functions”. • Anyone can compute $f(\,)$. • It is infeasible to compute $f^{-1}(\,)$ unless you know the “secret”. • All NP-complete problems have hard instances (unless P = NP); but we need “average-case hardness”. • RSA encryption is $f(x) = x^b \bmod n$. • $n$ is the product of two primes: $n = pq$. • Public key is $(n, b)$. Private key is $(p, q)$. • Much slower than 3DES, even with hardware assistance.
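A toy instance of the RSA function may make the roles of n, b, p and q concrete. This is a sketch only: the primes are far too small to be secure, there is no padding, and the name `a` for the decryption exponent is an assumption not given on the slide. It relies on Python 3.8+ for the modular inverse via `pow(b, -1, phi)`.

```python
# Private key: the two primes (tiny values, for illustration only).
p, q = 61, 53

# Public key: the modulus n = pq and the encryption exponent b.
n = p * q
b = 17

# Knowing p and q lets us compute phi(n) and hence the decryption exponent a,
# which satisfies a * b = 1 (mod phi). Without p and q this is believed hard.
phi = (p - 1) * (q - 1)
a = pow(b, -1, phi)


def encrypt(x: int) -> int:
    """f(x) = x^b mod n; anyone who knows (n, b) can compute this."""
    return pow(x, b, n)


def decrypt(y: int) -> int:
    """f^-1(y) = y^a mod n; requires the secret derived from (p, q)."""
    return pow(y, a, n)


message = 65
assert decrypt(encrypt(message)) == message
```

The three-argument `pow` performs modular exponentiation without ever forming the full power, which is what keeps the computation practical for realistically large exponents.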
Public Key Infrastructure [Adams & Lloyd, Introduction to PKI, MacMillan 1999] • If we can trust our database of public keys, then we have secure public messaging services: • Authentication: if a message’s signature verifies under a public key, then it was signed by an entity who knows the corresponding private key. • Integrity: if a signed message is altered, then you can detect this (if you know the public key). • Confidentiality: only if you have the private key, can you read a message that was encrypted under a public key. • (Modern email clients, such as Outlook Express, use standard PKI technology such as X.509 certificates and S/MIME to provide these services.)
Remote vs Local Authentication • You can authenticate yourself to a local machine by • what you have (e.g. a smart card), • what you know (e.g. a password), • what you “are” (e.g. your thumbprint) • what you do (e.g. your handwriting style) • Your local machine may have a copy of your private key, and it may have networked access to a CA: • it can authenticate your outgoing messages, using PKI; • it can verify the integrity of your incoming messages; • it can send confidential messages to other people; but • you have to trust the software on your local machine!
Limitations and Usage of PKI • If the Certificate Authority is offline, you may continue to use public keys stored in your local computer. • Warning: public keys may be revoked at any time (e.g. if someone reports their “key was stolen”). • Ephemeral key-pairs may be created (at significant computational cost) for one-time use. • Entity Naming Problem: • if you receive a message signed with a private key that is registered to “Clark Thomborson”, should you believe that your instructor sent you this message? • Alternative to PKI: “Key Continuity Management” (SSH; Gutmann; Garfinkel in http://www.simson.net/thesis/pki3.pdf) • The first time someone presents a key, you decide whether or not to accept it. • When someone presents a key that you have previously accepted, it’s probably ok. • When someone presents a changed key, you should think carefully before accepting!
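The three Key Continuity Management rules can be sketched as a small lookup table of previously accepted keys. This is an illustrative sketch, not the mechanism of SSH or of any system cited above; the class `KeyStore`, its methods, and the use of a SHA-256 fingerprint are assumptions made for the example.

```python
import hashlib


class KeyStore:
    """Remembers one accepted key fingerprint per name (hypothetical helper)."""

    def __init__(self) -> None:
        self._accepted = {}  # name -> fingerprint of the key we accepted

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        return hashlib.sha256(public_key).hexdigest()

    def check(self, name: str, public_key: bytes) -> str:
        """Classify a presented key as 'new', 'match' or 'changed'."""
        seen = self._accepted.get(name)
        if seen is None:
            return "new"      # first presentation: the user must decide
        if seen == self.fingerprint(public_key):
            return "match"    # previously accepted: probably ok
        return "changed"      # different key: think carefully!

    def accept(self, name: str, public_key: bytes) -> None:
        """Record the user's decision to trust this key for this name."""
        self._accepted[name] = self.fingerprint(public_key)
```

The store never contacts a certificate authority; all trust comes from the first acceptance decision, which is the trade-off this alternative makes.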
Watermarking, Tamper-Proofing and Obfuscation – Tools for Software Protection Christian Collberg & Clark Thomborson IEEE Transactions on Software Engineering 28:8, 735-746, August 2002
Watermarking and Fingerprinting • Watermark: an additional message, embedded into a cover message. • Messages may be images, audio, video, text, executables, … • Visible or invisible (steganographic) embeddings. • Robust (difficult to remove) or fragile (guaranteed to be removed) if cover is distorted. • Watermarking (only one extra message per cover) or fingerprinting (different versions of the cover carry different messages).
Our Desiderata for (Robust, Invisible) SW Watermarks • Watermarks should be stealthy -- difficult for an adversary to locate. • Watermarks should be resilient to attack -- resisting attempts at removal even if they are located. • Watermarks should have a high data-rate -- so that we can store a meaningful message without significantly increasing the size of the object.
Attacks on Watermarks • Subtractive attacks: remove the watermark (WM) without damaging the cover. • Additive attacks: add a new WM without revealing “which WM was added first”. • Distortive attacks: modify the WM without damaging the cover. • Collusive attacks: examine two fingerprinted objects, or a watermarked object and its unwatermarked cover; find the differences; construct a new object without a recognisable mark.
Defenses for Robust Software Watermarks • Obfuscation: we can modify the software, so that a reverse engineer will have great difficulty figuring out how to reproduce the cover without also reproducing the WM. • Tamperproofing: we can add integrity-checking code that (almost always) renders the object unusable if it is modified.
Classification of Software Watermarks • Static code watermarks are stored in the section of the executable that contains instructions. • Static data watermarks are stored in other sections of the executable. • Dynamic data watermarks are stored in a program’s execution state. Such watermarks are resilient to distortive (obfuscation) attacks.
Dynamic Watermarks • Easter Eggs are revealed to any end-user who types a special input sequence. • Execution Trace Watermarks are carried (steganographically) in the instruction execution sequence of a program, when it is given a special input. • Data Structure Watermarks are built (steganographically) by a program, when it is given a special input sequence (possibly null).
Easter Eggs • The watermark is visible -- if you know where to look! • Not resilient, once the secret is out. • See www.eeggs.com
Goals for Dynamic Datastructure Watermarks • Stealth. Our WM should “look like” other structures created by the cover (search trees, hash tables, etc.) • Resiliency. Our WM should have some properties that can be checked, stealthily and quickly at runtime, by tamperproofing code (triangulated graphs, biconnectivity, …) • Data Rate. We would like to encode 100-bit WMs, or 1000-bit fingerprints, in a few KB of data structure. Our fingerprints may be 1000-bit integers that are products of two primes.
Permutation Graphs (Harary) • The WM is 1-3-5-6-2-4. • High data rate: $\lg(n!)/n \approx \lg(n/e)$ bits per node. • High stealth, low resiliency (?) • Tamperproofing may involve storing the same permutation in another data structure. • But… what if an adversary changes the node labels? • Node labels may be obtained from node positions on another list. • [Figure: a permutation graph on nodes 1 to 6.]
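The data-rate claim rests on there being n! permutations of n nodes, so a permutation can carry any integer below n!. One possible way to map a watermark integer to a permutation and back is the factorial number system, sketched below. The slides do not say which encoding was used, so this scheme and the function names are assumptions.

```python
from math import factorial


def int_to_perm(w: int, n: int) -> list:
    """Encode integer w (0 <= w < n!) as a permutation of 1..n."""
    if not 0 <= w < factorial(n):
        raise ValueError("watermark does not fit in a permutation of this size")
    items = list(range(1, n + 1))
    perm = []
    for i in range(n, 0, -1):
        # The next factorial-base digit selects one of the remaining items.
        idx, w = divmod(w, factorial(i - 1))
        perm.append(items.pop(idx))
    return perm


def perm_to_int(perm: list) -> int:
    """Decode a permutation of 1..n back to the integer it encodes."""
    items = sorted(perm)
    w = 0
    for i, value in enumerate(perm):
        idx = items.index(value)
        w += idx * factorial(len(perm) - 1 - i)
        items.pop(idx)
    return w


mark = perm_to_int([1, 3, 5, 6, 2, 4])
assert int_to_perm(mark, 6) == [1, 3, 5, 6, 2, 4]
```

With n = 6 there are 720 permutations, so the example permutation carries a little under 10 bits in total.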
Oriented Trees • Represent as “parent-pointer trees”. • There are approximately $c\,\alpha^n / n^{3/2}$ oriented trees on $n$ nodes, with $c \approx 0.44$ and $\alpha \approx 2.956$, so the asymptotic data rate is $\lg(\alpha) \approx 1.6$ bits/node. • [Figure: a few of the 48 trees for n = 7.] • Could you “hide” this data structure in the code for a compiler? For a word processor?
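The count of 48 trees for n = 7 and the rate of about 1.6 bits/node can be checked numerically. The sketch below uses the standard recurrence for counting unlabeled rooted trees; this recurrence does not appear in the slides and is brought in here only as a check, and the function name is made up for the example.

```python
from math import log2


def oriented_tree_counts(n_max: int) -> list:
    """Return a list a where a[n] is the number of oriented trees on n nodes."""
    a = [0, 1]  # a[0] is unused padding; there is one tree on a single node
    for n in range(1, n_max):
        total = 0
        for k in range(1, n + 1):
            # Sum of d * a[d] over all divisors d of k.
            s = sum(d * a[d] for d in range(1, k + 1) if k % d == 0)
            total += s * a[n - k + 1]
        a.append(total // n)  # this is a[n + 1]; the division is always exact
    return a


counts = oriented_tree_counts(100)
assert counts[7] == 48

# The ratio of successive counts approaches alpha, so its base-2 logarithm
# approximates the number of bits gained per additional node.
bits_per_node = log2(counts[100] / counts[99])
```

At n = 100 the estimate is still slightly below the limit, because the $n^{3/2}$ factor in the asymptotic formula lowers the ratio of successive counts for finite n.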
Planted Plane Cubic Trees • One root node (in-degree 1). • Trivalent internal nodes, with rotation on edges. • We add edges to make all nodes trivalent, preserving planarity and distinguishing the root. • Simple enumeration (Catalan numbers). • Data rate is ~2 bits per leaf node. • Excellent tamperproofing. • [Figure: example trees for n = 1, 2, 3 and 4.]
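The “~2 bits per leaf” figure follows from the growth rate of the Catalan numbers, which is roughly $4^n$. The sketch below checks this numerically. It assumes that the number of trees of size n is the n-th Catalan number; the slides say only that the enumeration uses Catalan numbers and do not state how n relates to the leaf count.

```python
from math import comb, log2


def catalan(n: int) -> int:
    """The n-th Catalan number, C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def bits_per_unit(n: int) -> float:
    """Bits encodable by choosing one of C_n trees, divided by n."""
    return log2(catalan(n)) / n


small_counts = [catalan(n) for n in range(1, 5)]
rate = bits_per_unit(1000)
```

The rate approaches 2 from below as n grows, so small trees encode somewhat less than 2 bits per unit of size.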
Open Problems in Watermarking • We can easily build a “recogniser” program to find the WM and therefore demonstrate ownership… but can we release this recogniser to the public without compromising our watermarks? • Can we design a “partial recogniser” that preserves resiliency, even though it reveals the location of some part of our WM?
State of the Art in SW Watermarking • Davidson and Myhrvold (1996) encode a static watermark by rearranging the basic blocks of a program. • Venkatesan et al. (2001) add arcs to the control-flow graph. • The first dynamic data structure watermarks were published by us (POPL’99), with further development: • http://www.cs.arizona.edu/sandmark/ (2000- ) • Palsberg et al. (ACSAC’00) • Charles He (MSc 2002) • Collberg et al. (WG’03) • Thomborson et al. (AISW’04) • Jasvir Nagra, a PhD student under my supervision, is implementing execution-trace watermarks (IHW’04)
Software Obfuscation • Many authors, websites and even a few commercial products offer “automatic obfuscation” as a defense against reverse engineering. • Existing products generally operate at the lexical level of software, for example by removing or scrambling the names of identifiers. • We were the first (in 1997) to use “opaque predicates” to obfuscate the control structure of software.
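Opaque Predicates • [Figure: three control-flow fragments that obfuscate the sequence {A; B}. In each, block A is followed by an opaque predicate (P^T or P^?) whose true (T) and false (F) branches lead to B, to a buggy copy B_bug, or to an equivalent copy B’. Panels: “always true”, “indeterminate”, “tamperproof”. (“Always false” is not shown.)]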
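An “always true” opaque predicate can be illustrated with a simple arithmetic identity. The predicate below, that x(x+1) is always even, is a standard textbook example and is not taken from these slides; the graph-based predicates the authors propose are harder to analyse statically. The function names are made up for this sketch.

```python
def opaquely_true(x: int) -> bool:
    """Always True: one of x and x + 1 is even, so their product is even."""
    return (x * (x + 1)) % 2 == 0


def a_then_b(x: int, log: list) -> list:
    """The sequence {A; B}, obfuscated with an always-true predicate."""
    log.append("A")
    if opaquely_true(x):
        log.append("B")       # the real continuation
    else:
        log.append("B_bug")   # dead code, present only to mislead a reader
    return log


trace = a_then_b(7, [])
```

A reverse engineer who cannot prove that the predicate is constant must treat both branches as reachable, which is the source of the obfuscation.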
Opaque Predicates on Graphs • Dynamic analysis is required! • [Figure: pointers f and g into dynamically built graph structures, with the operation g.Merge(f).] • f.Insert(); g.Move(); g.Delete(); if (f == g) then …
Conclusion • New art in software obfuscation can make it more difficult for pirates to defeat standard tamperproofing mechanisms, or to engage in other forms of reverse engineering. • New art in software watermarking can embed “ownership marks” in software, that will be very difficult for anyone to remove. • More R&D is required before robust obfuscating and watermarking tools are easy to use and readily available to software developers.