Insights into Steganography: Detection Methods & Applications

Security and Error Correction/Detection in 802.1x and GSM Hide and Seek: An Introduction to Steganography Niels Provos and Peter Honeyman, University of Michigan IEEE Security and Privacy Journal, May-June 2003 (Vol. 1, No. 3) Sweety Chauhan October 24, 2005 CMSC 691I Clandestine Channels

Overview • New and Significant • What is Steganography? • Previous Work • Steganographic systems for JPEG images • Steganography Detection on the Internet • Results

New and Significant • Detection of Steganographic systems via statistical steganalysis • Practical application of detection algorithms

What is Steganography? • Art and science of hiding communication • A steganographic system embeds hidden content in unremarkable cover media • A steganographic system consists of: • Identifying the cover medium’s redundant bits • An embedding process that creates a stego medium by replacing the redundant bits with hidden message data

Statistical Steganalysis • Modern steganography’s goal is to keep its mere presence undetectable • But steganographic systems leave behind detectable traces in the cover medium • Although the secret content is not revealed, its existence can be detected • Modifying the cover medium changes its statistical properties • Eavesdroppers can detect the distortions in the resulting stego medium’s statistical properties • The process of finding these distortions is called statistical steganalysis

Information Hiding Systems • Three different aspects in information-hiding systems contend with each other: • Capacity – amount of information that can be hidden in the cover medium • Security – an eavesdropper’s inability to detect hidden information • Robustness – amount of modification the stego medium can withstand before an adversary can destroy hidden information • Watermarking systems – high level of robustness • Steganography – high security and capacity • Hidden information is fragile

Steganographic Systems • Classical Steganography system • Security relies on the encoding system’s secrecy • e.g. – Roman General shaving slave’s head and tattooing a message on it. After the hair grew back, the slave was sent to deliver the hidden message • Modern Steganography • Attempts to be detectable only if secret information is known (secret key) • Similar to Kerckhoffs’ Principle of cryptography which holds that “a cryptographic system’s security should rely solely on the key material”

Modern Steganography • Steganographic communication senders and receivers agree on a : • steganographic system • a shared secret key – determines how message is encoded in the cover medium

Overview of Encoding Step • To send a hidden message, for example, • Alice creates a new image with a digital camera • Alice supplies the steganographic system with her shared secret and message • The steganographic system uses the shared secret to determine how the hidden message should be encoded in the redundant bits • The result is the stego image that Alice sends to Bob • When Bob receives the image, he uses the shared secret and the agreed steganographic system to retrieve the hidden message

Hide and Seek in JPEG images • Why steganographic systems for the JPEG format? • Systems operate in a transform space • Not affected by visual attacks (as in BMP images) • Modifications are in the frequency domain instead of the spatial domain • Neil F. Johnson and Sushil Jajodia showed that steganographic systems for palette-based images leave easily detected distortions

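Discrete Cosine Transform (DCT) For each color component, the JPEG image format uses a discrete cosine transform (DCT) to transform successive 8 x 8 pixel blocks of the image into 64 DCT coefficients each. The DCT coefficients F(u, v) of an 8 x 8 block of image pixels f(x, y) are given by

$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$

where C(u) = 1/\sqrt{2} when u = 0 and C(u) = 1 otherwise (likewise for C(v)). The following operation quantizes the coefficients:

$$F^{Q}(u,v) = \mathrm{round}\!\left(\frac{F(u,v)}{Q(u,v)}\right)$$

where Q(u, v) is a 64-element quantization table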
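The transform and quantization steps can be sketched in a few lines of Python. This is only an illustration of the standard 8 x 8 DCT formula: the function names `dct_8x8` and `quantize` are made up for this example, the JPEG level shift (subtracting 128 from each pixel) is left out, and Python's `round` (which rounds ties to even) stands in for the codec's integer rounding.

```python
import math

def dct_8x8(block):
    """Forward 2-D DCT of one 8x8 block (a list of 8 rows of 8 numbers)."""
    def c(k):
        # Normalization factor: 1/sqrt(2) for the zero frequency, 1 otherwise.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs

def quantize(coeffs, table):
    """Divide each coefficient by its quantization-table entry and round."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]
```

A flat block has all of its energy in the DC coefficient F(0, 0); every other coefficient is zero up to floating-point error, which is why smooth image regions quantize to mostly zeros.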

Steganographic Systems • Sequential – for example: JSteg • Pseudo Random – for example: Outguess 0.1 • Subtraction – for example: F5 • Statistics aware embedding

Sequential Embedding (I) • The least-significant bits of the quantized DCT coefficients are used as redundant bits to embed the hidden message • Derek Upham’s JSteg algorithm does not require a shared secret

  Input: message, cover image
  Output: stego image
  while data left to embed do
    get next DCT coefficient from cover image
    if DCT ≠ 0 and DCT ≠ 1 then
      get next LSB from message
      replace DCT LSB with message LSB
    end if
    insert DCT into stego image
  end while

• As a result, anyone who knows the steganographic system can retrieve the message hidden by JSteg
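A minimal Python sketch of the sequential embedding loop, assuming the quantized DCT coefficients are already available as a flat list of integers. The names `jsteg_embed` and `jsteg_extract` are illustrative, not taken from the JSteg source, and the sketch omits any length header the real tool writes.

```python
def jsteg_embed(coeffs, message_bits):
    """Replace the LSB of each usable coefficient, in order, with a message bit."""
    stego = list(coeffs)
    bits = list(message_bits)
    pos = 0
    for i, c in enumerate(stego):
        if pos == len(bits):
            break                      # no data left to embed
        if c in (0, 1):
            continue                   # coefficients 0 and 1 are skipped
        stego[i] = (c & ~1) | bits[pos]
        pos += 1
    if pos < len(bits):
        raise ValueError("cover has too few usable coefficients")
    return stego

def jsteg_extract(stego, n_bits):
    """Read the LSBs of usable coefficients in the same sequential order."""
    bits = [c & 1 for c in stego if c not in (0, 1)]
    return bits[:n_bits]
```

Because no key is involved, `jsteg_extract` needs only the stego coefficients, which is exactly the weakness the slide points out. Embedding never turns a usable coefficient into 0 or 1, so the receiver skips the same positions as the sender.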

Sequential Embedding Steganalysis (I) • Andreas Westfeld and Andreas Pfitzmann noticed that • steganographic systems that change least-significant bits sequentially cause distortions detectable by steganalysis • for a given image, the embedding of high-entropy data (often due to encryption) changes the histogram of color frequencies in a predictable way • Embedding uniformly distributed message bits reduces the frequency difference between adjacent DCT coefficients • By observing differences in the DCT coefficients’ frequencies, embedding can be detected

Frequency Histograms • Histogram before (a) and after (b) a hidden message is embedded in a JPEG image • Sequential changes to the least-significant bits of discrete cosine transform coefficients tend to equalize the frequency of adjacent DCT coefficients in the histogram of the (b) modified image compared with the (a) original

Sequential Embedding Steganalysis (II) • Westfeld and Pfitzmann χ²-test • determines whether the observed frequency distribution in an image matches a distribution that shows distortion from embedding hidden data • The probability of embedding is determined by calculating p for a sample from the DCT coefficients • The samples start at the beginning of the image and for each measurement the sample size is increased
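The statistic behind this test can be sketched as follows, assuming the usual pairs-of-values formulation: for each pair of adjacent coefficient values (2i, 2i+1), the expected count after embedding is the mean of the two observed counts. The function name `chi_square_pairs` is illustrative. Turning the statistic into the probability of embedding p requires the χ² cumulative distribution function (for example from a statistics library), which is left out here.

```python
from collections import Counter

def chi_square_pairs(coeffs):
    """Return (chi2, degrees_of_freedom) for the pairs-of-values test."""
    # Coefficients 0 and 1 are not used for embedding, so they are ignored.
    hist = Counter(c for c in coeffs if c not in (0, 1))
    even_values = {c & ~1 for c in hist}   # one representative per pair
    chi2 = 0.0
    for even in even_values:
        observed = hist[even]
        expected = (hist[even] + hist[even + 1]) / 2
        chi2 += (observed - expected) ** 2 / expected
    return chi2, len(even_values) - 1
```

A small statistic means the pair counts are nearly equal, which is what embedding uniformly distributed bits produces; a large statistic is typical of an unmodified image.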

Sequential Embedding Steganalysis (III) • A high probability of embedding indicates that the image contains steganographic content • The length of a message hidden with JSteg can also be determined, from the point at which the probability of embedding drops

Pseudo Random Embedding • Niels Provos’s Outguess 0.1 steganographic system • Improves the encoding step by using a pseudo-random generator to select DCT coefficients at random • The LSB of a selected DCT coefficient is replaced with encrypted message data

OutGuess 0.1 Algorithm • The algorithm replaces the least-significant bit of pseudo-randomly selected discrete cosine transform (DCT) coefficients with message data

  Input: message, shared secret, cover image
  Output: stego image
  initialize PRNG with shared secret
  while data left to embed do
    get pseudo-random DCT coefficient from cover image
    if DCT ≠ 0 and DCT ≠ 1 then
      get next LSB from message
      replace DCT LSB with message LSB
    end if
    insert DCT into stego image
  end while
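The pseudo-random variant can be sketched the same way. Here a seeded shuffle of the coefficient indices stands in for the pseudo-random selection; Python's `random.Random` is an assumption made for illustration only and is neither the generator OutGuess uses nor cryptographically secure. Message encryption is also omitted, and the function names are hypothetical.

```python
import random

def outguess01_embed(coeffs, message_bits, shared_secret):
    """Embed bits into LSBs of coefficients visited in a secret-dependent order."""
    stego = list(coeffs)
    order = list(range(len(stego)))
    random.Random(shared_secret).shuffle(order)   # PRNG seeded with the secret
    bits = list(message_bits)
    pos = 0
    for i in order:
        if pos == len(bits):
            break
        if stego[i] in (0, 1):
            continue                              # unusable coefficient
        stego[i] = (stego[i] & ~1) | bits[pos]
        pos += 1
    if pos < len(bits):
        raise ValueError("cover has too few usable coefficients")
    return stego

def outguess01_extract(stego, n_bits, shared_secret):
    """Regenerate the same visiting order from the secret and read the LSBs."""
    order = list(range(len(stego)))
    random.Random(shared_secret).shuffle(order)
    bits = [stego[i] & 1 for i in order if stego[i] not in (0, 1)]
    return bits[:n_bits]
```

Unlike the sequential scheme, the changes are spread over the whole image, so a test that only examines a growing prefix of the coefficients no longer sees a sharp boundary.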

Embedded Message Detection (I) • The χ²-test can be extended to detect local distortions in an image • Two identical distributions produce about the same χ² values in any part of the distribution • Instead of increasing the sample size and applying the test at a constant position, • a constant sample size is used and the sample position is slid across the image
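A rough sketch of the sliding-window idea, assuming the same pairs-of-values statistic as the sequential test. The window size and step are free parameters chosen by the analyst, and the names `pair_chi2` and `sliding_chi2` are illustrative.

```python
from collections import Counter

def pair_chi2(sample):
    """Pairs-of-values chi-square statistic for one sample of coefficients."""
    hist = Counter(c for c in sample if c not in (0, 1))
    chi2 = 0.0
    for even in {c & ~1 for c in hist}:
        expected = (hist[even] + hist[even + 1]) / 2
        chi2 += (hist[even] - expected) ** 2 / expected
    return chi2

def sliding_chi2(coeffs, window, step):
    """Apply the test to fixed-size windows at increasing positions."""
    return [(start, pair_chi2(coeffs[start:start + window]))
            for start in range(0, len(coeffs) - window + 1, step)]
```

Windows whose statistic is unusually low are candidates for locally embedded data.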

Embedded Message Detection (II) • The extended χ²-test detects pseudo-randomly embedded messages in JPEG images • The detection rate depends on • the hidden message’s size • the number of DCT coefficients in an image • It can be improved by applying a heuristic that eliminates coefficients likely to lead to false negatives • The graph shows the detection rates for three different false-positive rates • The change rate refers to the fraction of discrete cosine transform (DCT) coefficients available for embedding a hidden message that have been modified

Subtraction • Andreas Westfeld’s steganographic system, F5 • Instead of replacing the least-significant bit of a DCT coefficient with message data, • F5 decrements its absolute value in a process called matrix encoding • There is no coupling of any fixed pair of DCT coefficients • The χ²-test cannot detect F5

Matrix Encoding • Matrix encoding computes an appropriate (1, 2^k - 1, k) Hamming code by calculating the message block size k from • the message length and • the number of nonzero non-DC coefficients • The Hamming code (1, 2^k - 1, k) encodes a k-bit message word m into an n-bit code word a with n = 2^k - 1 • It can recover from a single bit error in the code word
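The core of matrix encoding can be illustrated on a plain list of bits: k message bits are carried by n = 2^k - 1 cover bits while at most one of them changes. The sketch below assumes the common hash f that XORs together the (1-based) positions of all set bits. It flips a bit directly, whereas F5 decrements a coefficient's absolute value and must handle coefficients that become zero; the function names are hypothetical.

```python
def syndrome(lsbs):
    """Hash f: XOR of the 1-based positions of all bits that are set."""
    s = 0
    for position, bit in enumerate(lsbs, start=1):
        if bit:
            s ^= position
    return s

def matrix_embed(lsbs, message, k):
    """Embed a k-bit message (as an int) into n = 2**k - 1 bits, changing at most one."""
    n = 2 ** k - 1
    if len(lsbs) != n or not 0 <= message < 2 ** k:
        raise ValueError("need exactly 2**k - 1 cover bits and a k-bit message")
    s = syndrome(lsbs) ^ message
    out = list(lsbs)
    if s != 0:
        out[s - 1] ^= 1        # flipping position s makes the hash equal the message
    return out

def matrix_extract(lsbs):
    """The receiver simply recomputes the hash."""
    return syndrome(lsbs)
```

With k = 3, three message bits cost at most one change among seven cover bits, which is why matrix encoding lowers the number of modifications per embedded bit.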

The F5 algorithm

  Input: message, shared secret, cover image
  Output: stego image
  initialize PRNG with shared secret
  permute DCT coefficients with PRNG
  determine k from image capacity
  calculate code word length n ← 2^k - 1
  while data left to embed do
    get next k-bit message block
    repeat
      G ← {n non-zero AC coefficients}
      s ← k-bit hash f of LSB in G
      s ← s ⊕ k-bit message block
      if s ≠ 0 then
        decrement absolute value of DCT coefficient G_s
        insert G_s into stego image
      end if
    until s = 0 or G_s ≠ 0
    insert DCT coefficients from G into stego image
  end while

F5 Detection Algorithm • Embedding information with F5 leads to double compression when the cover image is already a JPEG • Most images are already stored in the JPEG format, so this double compression could confuse the detection algorithm • Fridrich and her group proposed a method for eliminating the effects of double compression by estimating the quality factor used to compress the cover image

Statistics-aware embedding • The previously discussed algorithms overwrite image data without directly considering the distortions that the embedding will cause • To embed a single bit, • a DCT coefficient’s value can be either incremented or decremented, which allows the coefficient’s least-significant bit to be changed in two different ways • Creating groups of DCT coefficients and using the parity of their least-significant bits as message bits • For every DCT block, the space of all possible changes is searched to find a configuration that minimizes the change to image statistics

Detection Algorithms • Two Different classes of algorithms: • Based on inherent statistical properties • no need to find a representative training set • estimate an embedded message’s length • Based on class discrimination • Creating a representative training set is often difficult • Do not provide an estimate of the hidden message’s length

Steganography Detection on the Internet • How can the previously discussed steganalytic methods be used in a real-world setting? • Created a steganography detection framework that • gets JPEG images off the Internet and • uses steganalysis to identify subsets of the images likely to contain steganographic content

Steganography Systems in use • JSteg • supports content encryption and compression before JSteg embeds the data • uses the RC4 stream cipher for encryption • JPHide • uses Blowfish as a PRNG • Version 0.5 supports additional compression of the hidden message • uses slightly different headers to store embedding information • Before the content is embedded, the content is Blowfish-encrypted with a user-supplied pass phrase • OutGuess • All three use some form of least-significant bit embedding and are detectable with statistical analysis

Detection Framework • Stegdetect is an automated utility that can analyze JPEG images that have content hidden with JSteg, JPHide, and OutGuess 0.13b • Stegdetect’s output lists • the steganographic systems it finds in each image or • writes “negative” if it couldn’t detect any • Stegdetect’s false-negative rate depends on: • The steganographic system and the embedded message’s size • The smaller the message, the harder it is to detect by statistical means. • Stegdetect is very reliable in finding images that have content embedded with JSteg • For JPHide, detection depends also on the size and the compression quality of the JPEG images

Detection Results Using Stegdetect over the Internet. (a) JPHide and (b) JSteg produce different detection results for different test images and message sizes

Finding Images • Images were taken from eBay auctions and from discussion groups in the Usenet archive for analysis • Developed Crawl, a simple, efficient Web crawler that makes a local copy of any JPEG images it encounters on a Web page • Crawl performs a depth-first search and has two key features: • Images and Web pages can be matched against regular expressions • Hence, Web pages can be included in or excluded from the search • Minimum and maximum image size can be specified • Hence, images that are too small to contain hidden messages can be excluded • Calculation of true positive rate – the probability that an image detected by Stegdetect really has steganographic content
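The true positive rate mentioned in the last bullet is a conditional probability, and one way to compute it is Bayes' rule. The sketch below is an assumption about how such a calculation could be set up, not the authors' code; the function name and the numbers in the usage note are hypothetical.

```python
def true_positive_rate(p_stego, p_detect_given_stego, p_detect_given_clean):
    """P(image has hidden content | detector flags it), by Bayes' rule.

    p_stego              -- prior fraction of images carrying hidden content
    p_detect_given_stego -- detector's hit rate (1 - false-negative rate)
    p_detect_given_clean -- detector's false-positive rate
    """
    hits = p_stego * p_detect_given_stego
    false_alarms = (1 - p_stego) * p_detect_given_clean
    return hits / (hits + false_alarms)
```

With made-up inputs of a 0.1% prior, a 90% hit rate, and a 1% false-positive rate, only about 8% of flagged images would actually contain hidden content, which is why flagged images still need to be verified.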

Percentages of (false) positives for analyzed images

  Test        eBay     Usenet
  JSteg       0.003    0.007
  JPHide      1        2.1
  OutGuess    0.1      0.14

• After processing 2 million eBay images with Stegdetect • Over 1% of all the images seemed to contain hidden content • JPHide was detected most often

Verifying Hidden Content • Stegdetect cannot guarantee a hidden message’s existence • To verify the hidden content, Stegbreak launches a dictionary attack against the JPEG files • JSteg-Shell, JPHide, and OutGuess all hide content based on a user-supplied password • An attacker can try to guess the password by taking a large dictionary and trying every single word in it to retrieve the hidden message • These systems embed header information, so attackers can verify a guessed password using that header information
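The shape of such a dictionary attack can be shown with a toy model. Everything here is a stand-in: `make_header` fakes a password-dependent header with a truncated SHA-256 digest, which is not how JSteg-Shell, JPHide, or OutGuess derive their headers, and `dictionary_attack` is a hypothetical name rather than Stegbreak's interface.

```python
import hashlib

def make_header(password):
    """Toy stand-in for a password-dependent header embedded with the message."""
    return hashlib.sha256(password.encode("utf-8")).digest()[:4]

def dictionary_attack(header, words):
    """Try each dictionary word; a matching header confirms the guess."""
    for word in words:
        if make_header(word) == header:
            return word
    return None   # password not in the dictionary
```

The attack succeeds only if the password is in the word list, which matches the first explanation offered later for why no hidden messages were found.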

Stegbreak Performance on a 1,200-MHz Pentium III System

  System      One image (words/second)    Fifty images (words/second)
  JPHide      4,500                       8,700
  OutGuess    18,000                      34,000
  JSteg       36,000                      47,000

Results: Steganography Detection on the Internet • From the eBay and Usenet research • Not a single hidden message was found • Explanations for the inability to find steganographic content on the Internet: • All steganographic system users carefully choose passwords that are not susceptible to dictionary attacks • Maybe images from sources that were not analyzed carry steganographic content • Nobody uses steganographic systems that researchers could find • All messages are too small for analysis to detect • Either the researchers are looking in the wrong place or there is no widespread use of steganography on the Internet

Conclusion • Today, computer and network technologies provide easy-to-use communication channels for steganography • Research work • Provides an overview of existing steganographic systems • presents methods for detecting them via statistical steganalysis

Future Work • Research new algorithms to • Hide information • Improve Steganalysis

References • Hide and Seek: An Introduction to Steganography, Niels Provos, Peter Honeyman, IEEE Security and Privacy Journal, May-June 2003 • Cyber warfare: steganography vs. steganalysis , Huaiqing Wang, Shuozhong Wang , Communications of the ACM, Volume 47, Issue 10, October 2004 • http://www.outguess.org/detection.php • http://www.jjtc.com/Security/stegtools.htm • http://www.stack.nl/~galactus/remailers/index-stego.html

Thanks a lot … For Your Presence And Patience

Any Questions

Homework Presentation Slides and Research Papers are available at : www.umbc.edu/~chauhan2/CMSC691I/
