Software Security, CompSci 725
Cryptography and Software Watermarking (Handout 12)
Clark Thomborson, University of Auckland
Applied Cryptography (2nd Ed.), Bruce Schneier • “Suppose a sender wants to send a message to a receiver. Moreover, this sender wants to send the message securely: She wants to make sure an eavesdropper cannot read the message.” • Exercise 1. Draw a picture of this scenario. • Exercise 2. Which of Pfleeger’s four threats is a concern to this sender?
Terminology of Cryptography • Plaintext, ciphertext, encryption, decryption: • Cryptography: the art (science) of keeping messages secure. • Cryptanalysts seek to “break” ciphertexts (that is, to discover the plaintext, given the ciphertext). Sender: plaintext → Encryption → ciphertext → Decryption → plaintext (Receiver)
A Simple Encryption Scheme • Rot(k,s): “rotate” each letter in string s by k positions (letters encoded as 0..25): for( i=0; i<len(s); i++ ) s[i] = ( s[i] + k ) mod 26; return(s); • Exercise: write the corresponding decryption routine. • Exercise: how many keys must you try, before you can “break” a ciphertext Rot(k,s)? • This is a (very weak) “secret-key” encryption scheme, where the secret key is k.
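The rotation scheme above can be sketched in Python (assuming lowercase ASCII letters; the names rot and unrot are mine):

```python
def rot(k, s):
    """Rotate each lowercase letter of s by k positions (mod 26)."""
    return "".join(chr((ord(c) - ord('a') + k) % 26 + ord('a')) for c in s)

def unrot(k, s):
    """Decryption is rotation by -k: the same secret key works both ways."""
    return rot(-k % 26, s)
```

Note the tiny key space: there are only 26 possible values of k.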
Symmetric and Public-Key Encryption • If the decryption key p can be computed from the encryption key k, then the algorithm is called “symmetric”. • Question: is Rot(k,s) a symmetric cipher? • If the decryption key p cannot be computed (in a reasonable amount of time) from the encryption key k, then the algorithm is called “asymmetric” or “public-key”.
Algebraic Notation for Cryptography • The encryption function is E( ) or Ek( ). • The decryption function is D( ) or Dk( ). • The receiver can read the message if, and only if, they know how to compute D( ): D( E( M ) ) = M. • In public-key cryptography, we use key-pairs (k, p). • We let “everyone” know our public-key k. • Only we (and possibly a few trusted associates) know the corresponding private-key p; this knowledge is required to compute D( ). • Anyone can send us secret messages using Ek( ); only people who know (or guess) p can read these messages.
Authentication in PK Cryptography • We can encrypt messages, using Ep( ), that everyone can read (!) • Ep(M) is a “signed message”. • Anyone who knows p can forge our “signature”. • We may want to have several public/private key pairs: • For our email, • For our bank account (our partner knows this private key too), • For our workgroup (shared with other members), … • A “public key infrastructure” (PKI) will help us discover other people’s public keys (k1, k2, …) if we know the name(s) under which they registered their keys. • The registry database is called a “certificate authority” (CA). • Warning: someone might register a key under your name!
Types of Cryptographic Attack • A ciphertext may be broken by… • Discovering the “restricted” algorithm (if the algorithm doesn’t require a key). • Discovering the key by non-cryptographic means (bribery, theft, ‘just asking’). • Discovering the key by “brute-force search” (through all possible keys). • Discovering the key by cryptanalysis based on other information, such as known pairs of (plaintext, ciphertext). • The weakest point in the system may not be its cryptography! • (What identification is required, when a CA accepts a key?) • See Schneier’s latest book.
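A “brute-force search” on the Rot cipher from the earlier slide can be sketched as follows (Python; the crib parameter, a known plaintext fragment used to recognise a successful break, is my assumption):

```python
def rot(k, s):
    # Rotate each lowercase letter of s by k positions (mod 26).
    return "".join(chr((ord(c) - ord('a') + k) % 26 + ord('a')) for c in s)

def brute_force(ciphertext, crib):
    """Try all 26 keys; return the key whose decryption contains the crib."""
    for k in range(26):
        plaintext = rot(-k % 26, ciphertext)
        if crib in plaintext:
            return k, plaintext
    return None
```

With only 26 keys, the search is instant; real ciphers need key spaces large enough that exhaustive search is infeasible.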
Steganography • “Steganography serves to hide secret messages in other messages, such that the secret’s very existence is concealed.” [Schneier] • A cryptanalyst is given a ciphertext, and possibly some additional information such as key frequencies, cipher-plaintext pairs, “black-box” encrypters, etc. • Problem: to discover the corresponding plaintext. • Approach: try various encryption algorithms, various keys, … • A steganalyst is given a (large) collection of data. • Problem: discover a secret message, if any exists. • Approach: look in “places” where a message might be hidden. • If a secret message is enciphered, cryptanalysis is required to read the secret plaintext.
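One classic hiding “place” is the least-significant bit of each byte of a cover (e.g. the pixels of an image). A minimal sketch, assuming a raw byte-array cover (the names hide and reveal are mine):

```python
def hide(cover, secret):
    """Embed secret bits in the least-significant bit of each cover byte."""
    bits = [(byte >> i) & 1 for byte in secret for i in range(8)]
    assert len(bits) <= len(cover), "cover too small"
    stego = bytearray(cover)
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | b   # overwrite only the low bit
    return bytes(stego)

def reveal(stego, length):
    """Recover `length` secret bytes from the LSBs of the stego data."""
    out = bytearray()
    for j in range(length):
        byte = 0
        for i in range(8):
            byte |= (stego[8 * j + i] & 1) << i
        out.append(byte)
    return bytes(out)
```

Each cover byte changes by at most 1, so the embedding is imperceptible in typical image or audio data; a steganalyst, however, knows to look at exactly these bits.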
The DES Cryptosystem [Stinson, Cryptography: Theory & Practice, CRC Press, 1995] • The Data Encryption Standard (DES) was adopted in 1977 by the US government, and is one of three “FIPS-approved” algorithms. • Triple-DES (DES encryption applied three times) is the “FIPS-approved symmetric encryption algorithm of choice.” [http://csrc.nist.gov/cryptval/des.htm] • DES is somewhat slow in software (look at the bit-manipulations shown in Stinson’s article), but special-purpose hardware runs at 15 Gb/s. [http://www.xilinx.com/xapp/xapp270.pdf] • 3DES at 15 Gb/s requires 16K lookup tables (LUTs) and 16K flip-flops: one Xilinx chip.
RSA System [Stinson, Cryptography: Theory & Practice, CRC Press, 1995] • Rivest, Shamir & Adleman [1977] were the first to implement the public-key cryptosystem proposed by Diffie & Hellman [1976]. • Public-key cryptography is based on “one-way functions”. • Anyone can compute f( ). • Infeasible to compute f^-1( ) unless you know the “secret”. • All NP-complete problems have hard instances (unless P = NP); but we need “average-case hardness”. • RSA encryption is f(x) = x^b mod n. • n is the product of two primes: n = pq. • Public key is (n, b). Private key is (p, q). • Much slower than 3DES, even with hardware assistance.
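A toy RSA sketch in Python, using the small textbook primes 61 and 53 (real moduli are thousands of bits; this is for illustration only):

```python
p, q = 61, 53               # two (tiny) secret primes
n = p * q                   # public modulus: 3233
phi = (p - 1) * (q - 1)     # Euler's totient of n: 3120
b = 17                      # public exponent, coprime to phi
a = pow(b, -1, phi)         # private exponent: a*b = 1 (mod phi)

def encrypt(x):
    return pow(x, b, n)     # f(x) = x^b mod n

def decrypt(y):
    return pow(y, a, n)     # y^a mod n inverts f, given the factorisation of n
```

Recovering the private exponent a from the public key (n, b) alone requires factoring n, which is infeasible at realistic key sizes.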
Public Key Infrastructure [Adams & Lloyd, Introduction to PKI, MacMillan 1999] • If we can trust our database of public keys, then we have secure public messaging services: • Authentication: if a message verifies under a public key, then it was written & signed by an entity who knows the corresponding private key. • Integrity: if a signed message is altered, then you can detect this (if you know the public key). • Confidentiality: only if you have the private key, can you read a message that was encrypted under a public key. • (Modern email clients, such as Outlook Express, use standard PKI technology such as X.509 certificates and S/MIME to provide these services.)
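Signing and verifying can be sketched with toy RSA machinery: hash the message, then apply the private exponent to the digest. This is a sketch only; real systems use padded, standardised schemes such as RSASSA-PSS, and the tiny modulus here offers no security.

```python
import hashlib

p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
b = 17                   # public exponent (published in the PKI)
a = pow(b, -1, phi)      # private exponent (kept secret)

def digest(msg):
    # Reduce a SHA-256 digest mod n so it fits the toy modulus.
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg):
    return pow(digest(msg), a, n)         # only the private-key holder can do this

def verify(msg, sig):
    return pow(sig, b, n) == digest(msg)  # anyone with the public key can check
```

Altering either the message or the signature breaks verification, which is the integrity property from the slide above.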
Remote vs Local Authentication • You can authenticate yourself to a local machine by • what you have (e.g. a smart card), • what you know (e.g. a password), • what you “are” (e.g. your thumbprint), • what you do (e.g. your handwriting style). • Your local machine may have a copy of your private key, and it may have networked access to a CA: • it can authenticate your outgoing messages, using PKI; • it can verify the integrity of your incoming messages; • it can send confidential messages to other people; but • you have to trust the software on your local machine!
Limitations and Usage of PKI • If the Certificate Authority is offline, you may continue to use public keys stored in your local computer. • Warning: public keys may be revoked at any time (e.g. if someone reports their “key was stolen”). • Ephemeral key-pairs may be created (at significant computational cost) for one-time use. • Entity Naming Problem: • if you receive a message signed with a private key that is registered to “Clark Thomborson”, should you believe that your instructor sent you this message? • Alternative to PKI: “Key Continuity Management” (SSH; Gutmann; Garfinkel in http://www.simson.net/thesis/pki3.pdf) • The first time someone presents a key, you decide whether or not to accept it. • When someone presents a key that you have previously accepted, it’s probably ok. • When someone presents a changed key, you should think carefully before accepting!
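Key Continuity Management can be sketched as a trust-on-first-use table, in the style of SSH’s known_hosts file (the class and method names here are mine):

```python
class KeyContinuity:
    """Trust-on-first-use: accept the first key a name presents,
    and flag any later change of key for that name."""

    def __init__(self):
        self.known = {}   # name -> remembered key

    def check(self, name, key):
        if name not in self.known:
            self.known[name] = key      # first contact: accept and remember
            return "accepted (first use)"
        if self.known[name] == key:
            return "ok (matches remembered key)"
        return "WARNING: key changed!"
```

Note that the first acceptance is a leap of faith, and a changed key may mean either a legitimate re-keying or an attack: the user must decide.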
Watermarking, Tamper-Proofing and Obfuscation – Tools for Software Protection
Christian Collberg & Clark Thomborson
IEEE Transactions on Software Engineering 28:8, 735-746, August 2002
Watermarking and Fingerprinting Watermark: an additional message, embedded into a cover message. • Messages may be images, audio, video, text, executables, … • Visible or invisible (steganographic) embeddings • Robust (difficult to remove) or fragile (guaranteed to be removed) if cover is distorted. • Watermarking (only one extra message per cover) or fingerprinting (different versions of the cover carry different messages).
Our Desiderata for (Robust, Invisible) SW Watermarks • Watermarks should be stealthy -- difficult for an adversary to locate. • Watermarks should be resilient to attack -- resisting attempts at removal even if they are located. • Watermarks should have a high data-rate -- so that we can store a meaningful message without significantly increasing the size of the object.
Attacks on Watermarks • Subtractive attacks: remove the watermark (WM) without damaging the cover. • Additive attacks: add a new WM without revealing “which WM was added first”. • Distortive attacks: modify the WM without damaging the cover. • Collusive attacks: examine two fingerprinted objects, or a watermarked object and its unwatermarked cover; find the differences; construct a new object without a recognisable mark.
Defenses for Robust Software Watermarks • Obfuscation: we can modify the software, so that a reverse engineer will have great difficulty figuring out how to reproduce the cover without also reproducing the WM. • Tamperproofing: we can add integrity-checking code that (almost always) renders the object unusable if it is modified.
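As a toy illustration of integrity checking (not the paper’s mechanism): record a digest of a function’s code and refuse to run if it no longer matches. Real tamperproofing checksums machine code and hides the checks throughout the program.

```python
import hashlib

def protected(x):
    return x * 2            # the code we want to protect

def _digest(fn):
    # Digest the function's bytecode together with its constants.
    code = fn.__code__
    return hashlib.sha256(code.co_code + repr(code.co_consts).encode()).hexdigest()

EXPECTED = _digest(protected)   # recorded at "build time"

def run_protected(x):
    """Refuse to run if the function's code no longer matches the digest."""
    if _digest(protected) != EXPECTED:
        raise RuntimeError("tampering detected")
    return protected(x)
```

An obvious check like this is easy to find and patch out, which is why such checks must themselves be stealthy and replicated.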
Classification of Software Watermarks • Static code watermarks are stored in the section of the executable that contains instructions. • Static data watermarks are stored in other sections of the executable. • Dynamic data watermarks are stored in a program’s execution state. Such watermarks are resilient to distortive (obfuscation) attacks.
Dynamic Watermarks • Easter Eggs are revealed to any end-user who types a special input sequence. • Execution Trace Watermarks are carried (steganographically) in the instruction execution sequence of a program, when it is given a special input. • Data Structure Watermarks are built (steganographically) by a program, when it is given a special input sequence (possibly null).
Easter Eggs • The watermark is visible -- if you know where to look! • Not resilient, once the secret is out. • See www.eeggs.com
Goals for Dynamic Datastructure Watermarks • Stealth. Our WM should “look like” other structures created by the cover (search trees, hash tables, etc.) • Resiliency. Our WM should have some properties that can be checked, stealthily and quickly at runtime, by tamperproofing code (triangulated graphs, biconnectivity, …) • Data Rate. We would like to encode 100-bit WMs, or 1000-bit fingerprints, in a few KB of data structure. Our fingerprints may be 1000-bit integers that are products of two primes.
Permutation Graphs (Harary) • The WM is the permutation 1-3-5-6-2-4 (shown as a graph in the figure). • High data rate: lg(n!) ≈ n·lg(n/e) bits, i.e. about lg(n/e) bits per node. • High stealth, low resiliency (?) • Tamperproofing may involve storing the same permutation in another data structure. • But… what if an adversary changes the node labels? Node labels may be obtained from node positions on another list.
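The embedding behind such a watermark can be sketched via the factorial number system: any integer below n! maps to a unique permutation of n nodes (the function names here are mine):

```python
from math import factorial

def embed(wm, n):
    """Encode an integer 0 <= wm < n! as a permutation of 0..n-1."""
    avail = list(range(n))
    perm = []
    for i in range(n - 1, -1, -1):
        f = factorial(i)
        perm.append(avail.pop(wm // f))   # pick the (wm // i!)-th unused node
        wm %= f
    return perm

def extract(perm):
    """Recover the embedded integer from the permutation."""
    avail = list(range(len(perm)))
    wm = 0
    for pos, v in enumerate(perm):
        i = avail.index(v)
        wm += i * factorial(len(perm) - 1 - pos)
        avail.pop(i)
    return wm
```

This achieves the lg(n!) data rate: a 100-bit watermark fits in a list of roughly 30 nodes, since lg(29!) ≈ 100.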
Oriented Trees • Represent as “parent-pointer trees”. • There are roughly c·α^n / n^(3/2) oriented trees on n nodes, with c ≈ 0.44 and α ≈ 2.956, so the asymptotic data rate is lg(α) ≈ 1.6 bits/node. (A few of the 48 oriented trees on n = 7 nodes are shown in the figure.) Could you “hide” this data structure in the code for a compiler? For a word processor?
Planted Plane Cubic Trees (figure: the trees for n = 1, 2, 3, 4) • One root node (in-degree 1). • Trivalent internal nodes, with rotation on edges. • We add edges to make all nodes trivalent, preserving planarity and distinguishing the root. • Simple enumeration (Catalan numbers). • Data rate is ~2 bits per leaf node. • Excellent tamperproofing.
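The Catalan enumeration and the ~2 bits/leaf data rate can be checked numerically (the exact indexing of trees by Catalan numbers is glossed over here):

```python
from math import comb, log2

def catalan(n):
    """The n-th Catalan number: C_n = C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

def bits_per_leaf(n):
    """Watermark data rate: lg(C_n)/n, which approaches 2 as n grows,
    since C_n grows roughly like 4^n."""
    return log2(catalan(n)) / n
```

So a tree with a few hundred leaves can carry a watermark of a few hundred bits.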
Open Problems in Watermarking • We can easily build a “recogniser” program to find the WM and therefore demonstrate ownership… but can we release this recogniser to the public without compromising our watermarks? • Can we design a “partial recogniser” that preserves resiliency, even though it reveals the location of some part of our WM?
State of the Art in SW Watermarking • Davidson and Myhrvold (1996) encode a static watermark by rearranging the basic blocks of a code. • Venkatesan et al. (2001) add arcs to the control-flow graph. • The first dynamic data structure watermarks were published by us (POPL’99), with further development: • http://www.cs.arizona.edu/sandmark/ (2000- ) • Palsberg et al. (ACSAC’00) • Charles He (MSc 2002) • Collberg et al (WG’03) • Thomborson et al (AISW’04) • Jasvir Nagra, a PhD student under my supervision, is implementing execution-trace watermarks (IHW’04)
Software Obfuscation • Many authors, websites and even a few commercial products offer “automatic obfuscation” as a defense against reverse engineering. • Existing products generally operate at the lexical level of software, for example by removing or scrambling the names of identifiers. • We were the first (in 1997) to use “opaque predicates” to obfuscate the control structure of software.
Opaque Predicates • To obfuscate the sequence {A; B}, insert a predicate P between A and B (figure shows three variants): • “always true” (P^T): the true branch leads to B; the false branch is never taken. • “tamperproof” (P^T): the never-taken false branch leads to buggy code B_bug. • “indeterminate” (P^?): either branch may be taken; both lead to equivalent code B and B'. • (“always false” is not shown.)
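An “always true” opaque predicate can be sketched like this (a deliberately simple example; practical predicates rest on much harder number-theoretic or pointer-aliasing facts):

```python
def opaquely_true(x):
    """x*(x+1) is a product of two consecutive integers, hence always even.
    True for every x, but not obvious to a casual static analysis."""
    return (x * (x + 1)) % 2 == 0

def obfuscated_increment(a):
    # Original code was simply {A; B} = {...; return a + 1}.
    # The false arm holds decoy "buggy" code (B_bug) that never executes.
    if opaquely_true(a):
        return a + 1        # B: the real computation
    return a - 42           # B_bug: unreachable decoy
```

An automated deobfuscator must prove the predicate constant before it can remove the branch and the decoy code.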
Opaque Predicates on Graphs (dynamic analysis is required!) g.Merge(f); f.Insert(); g.Move(); g.Delete(); if (f == g) then …
Conclusion • New art in software obfuscation can make it more difficult for pirates to defeat standard tamperproofing mechanisms, or to engage in other forms of reverse engineering. • New art in software watermarking can embed “ownership marks” in software, that will be very difficult for anyone to remove. • More R&D is required before robust obfuscating and watermarking tools are easy to use and readily available to software developers.