This article introduces steganalysis schemes for detecting hidden messages in multimedia files, focusing on LSB encoding and JPEG compatibility methods. It covers the goals, automatic checking methods, and application scenarios of steganalysis, and presents a proposed scheme and detection algorithm for identifying images that carry embedded secret messages. Limitations, experimental results, parameter optimization, and threshold selection are also discussed.
Introduction to Steganalysis Schemes Multimedia Security
Outline • Steganalysis of LSB encoding • Steganalysis based on JPEG compatibility • Some discussions
Introduction • Steganography • The art of secret communication • Stego content (e.g. images) should not contain any easily detectable artifacts due to message embedding • The less information is embedded, the smaller the probability of introducing detectable artifacts
Watermarking vs. Steganography • [Figure: watermarking and steganography compared along three axes: fidelity, capacity, and robustness]
Goal • To inspect one or more images for statistical artifacts caused by message embedding in color images using the LSB method • To find out which images are likely to contain secret messages • To estimate the reliability of decisions • Type I error (false alarm) and Type II error (miss)
Application Scenarios • [Figure: automatic checking, with labels: Internet; Internet node with a special filter; images in a seized computer; images sent to a certain address; forensics expert]
LSB Encoding • Replacing the LSB of every gray level of each color channel with message bits • On average, 50% of the LSBs are changed • Logic behind this scheme • LSBs in scanned or camera-taken images are essentially random • Encrypted (randomized) messages are random • So, supposedly, no statistical artifacts will be introduced
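The replacement step described on this slide can be sketched in a few lines. This is a minimal illustration, assuming pixels are (R, G, B) tuples of 8-bit integers and that bits are written sequentially from the first channel of the first pixel; the function names `embed_lsb` and `extract_lsb` are hypothetical and the sequential order is an assumption, not something the slides specify.

```python
def embed_lsb(pixels, bits):
    """Replace the LSB of successive channel values with message bits.

    pixels: list of (R, G, B) tuples; bits: list of 0/1 values.
    Sequential embedding order is an assumption made for this sketch.
    """
    flat = [c for px in pixels for c in px]
    if len(bits) > len(flat):
        raise ValueError("message longer than LSB capacity")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def extract_lsb(pixels, n_bits):
    """Read back the first n_bits LSBs in the same sequential order."""
    flat = [c for px in pixels for c in px]
    return [c & 1 for c in flat[:n_bits]]
```

Each channel value changes by at most 1, which is why the change is invisible to the eye yet still leaves the statistical trace discussed on the following slides.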
Important Observation • Number of unique colors in cover images • Typically smaller than the number of pixels in the image • 1:2 for high-quality scans in BMP format • 1:6 or lower for JPEG images or video • Many true-color images have a relatively small "palette" • After LSB embedding, the new color palette will have a distinct feature • Many pairs of close colors • Evidence of LSB encoding-based steganography
Formulations • U: number of unique colors in an image • P: number of close color pairs • Two colors (R1,G1,B1) and (R2,G2,B2) are close if |R1-R2|≤1 and |G1-G2|≤1 and |B1-B2|≤1 • R: ratio between the number of close pairs of colors and the number of all pairs of colors • R=P/C(U, 2), where C(n, k) is the number of combinations of k items out of n
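The quantities U, P, and R can be computed directly from the definitions on this slide. The sketch below assumes an image given as a list of (R, G, B) tuples; the function name `close_pair_ratio` is hypothetical. Instead of testing all C(U, 2) pairs, it looks up the 26 neighbors of each unique color, which counts the same close pairs.

```python
from itertools import product
from math import comb

# All 26 nonzero offsets with each component in {-1, 0, 1}
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def close_pair_ratio(pixels):
    """Return (U, P, R) for a list of (R, G, B) tuples."""
    colors = set(pixels)
    u = len(colors)
    p = 0
    for r, g, b in colors:
        for dr, dg, db in OFFSETS:
            if (r + dr, g + dg, b + db) in colors:
                p += 1
    p //= 2  # each close pair was counted once from each side
    ratio = p / comb(u, 2) if u > 1 else 0.0
    return u, p, ratio
```

For example, the pixel list `[(10, 10, 10), (11, 10, 10), (50, 50, 50)]` has U = 3 unique colors and P = 1 close pair, so R = 1/3.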
The Proposed Scheme • After embedding, U will increase to U', and we can count the new number of close pairs P' • The value of R for an image that does not contain a message will be smaller than that of an image that already has a message embedded in it
The Proposed Scheme (cont.) • It is impossible to find a single threshold of R that works for all images • Due to the large variation of U • Observations for reliable distinguishing • For an image that already contains a large message • Embedding another message in it does not modify R significantly • For an image that does not contain a message • R increases significantly • Use the relative comparison of R as the decision criterion
Detection Algorithm • To find out whether or not an image has a secret message • Calculate R=P/C(U, 2) • Embed a test message using LSB embedding in randomly selected pixels • Size of the test message: 3·a·M·N bits (for M by N color images) • Calculate R'=P'/C(U',2) • Decide whether the image already contains a message • R'≈R: the image already had a large message hidden in it • R'>R: the image did not have a message in it • R'/R: the separating statistic
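The steps of the detection algorithm can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: pixels are (R, G, B) tuples, the test message is random bits written to all three channel LSBs of a fraction `a` of randomly chosen pixels (giving 3·a·M·N bits), and the names `embed_test_message` and `separating_statistic` are hypothetical. Returning infinity when R = 0 is also a choice made only for this sketch.

```python
import random
from itertools import product
from math import comb

OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def close_pair_ratio(pixels):
    """R = P / C(U, 2) for a list of (R, G, B) tuples."""
    colors = set(pixels)
    u = len(colors)
    p = sum((r + dr, g + dg, b + db) in colors
            for r, g, b in colors for dr, dg, db in OFFSETS) // 2
    return p / comb(u, 2) if u > 1 else 0.0


def embed_test_message(pixels, a, rng):
    """Overwrite the three LSBs of a fraction `a` of randomly chosen pixels."""
    out = list(pixels)
    for idx in rng.sample(range(len(out)), int(a * len(out))):
        out[idx] = tuple((c & ~1) | rng.getrandbits(1) for c in out[idx])
    return out


def separating_statistic(pixels, a=0.1, seed=0):
    """Return R'/R; values near 1 suggest a message is already present."""
    rng = random.Random(seed)
    r = close_pair_ratio(pixels)
    r_prime = close_pair_ratio(embed_test_message(pixels, a, rng))
    return r_prime / r if r > 0 else float("inf")
```

A caller would compare the returned value against a threshold Th chosen from training data; the slides discuss how that threshold is selected.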
Limitations • If the secret message is too small • The two ratios will be very close to each other • We cannot distinguish images with messages from images without them
Experiments • Using an image database of 300 color images • 350x250 pixels • JPEG compressed • Capacity for each image: 32.8 KB (350x250x3/8 bytes) • A message of length 20 KB (roughly 2/3 of maximal capacity) was embedded into each image to form a new database of images with messages • The detection algorithm is run on both databases, and the message presence is tested by embedding a test message of size 1 KB (a=1/30)
Experimental Results • [Figure: solid line: original database; dotted line: embedded database]
Parameter Optimization • Model the density functions as Gaussian distributions • N(μ, σ) and N(μs, σs) • Different sizes of secret messages, denoted s, and of test messages are tested • Secret messages: 1% to 50% • Test messages: a=0.01 – 0.5 • Results • μ>μs for all s • As s decreases, N(μs, σs) becomes flatter and its peak moves right • As s increases, N(μs, σs) becomes narrower and its peak moves left • It is easier to separate the two peaks for larger secret message sizes
Threshold Selection • Choose Th so that Type I error = Type II error (equivalent to minimizing the overall error) • Shift the threshold Th when missing an image with a secret message is costlier than a false alarm, accepting more false alarms in exchange for fewer misses
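Under the Gaussian model N(μ, σ) and N(μs, σs) with μ > μs, the equal-error threshold has a closed form. The sketch below makes several assumptions that the slides do not state: σ and σs are standard deviations, an image is flagged as carrying a message when R'/R falls below Th, and the numeric values in the usage lines are purely illustrative, not experimental results. Setting Φ((Th-μ)/σ) = 1 - Φ((Th-μs)/σs) gives Th = (μ·σs + μs·σ)/(σ + σs).

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def equal_error_threshold(mu, sigma, mu_s, sigma_s):
    """Threshold Th at which Type I error equals Type II error."""
    return (mu * sigma_s + mu_s * sigma) / (sigma + sigma_s)


def error_rates(th, mu, sigma, mu_s, sigma_s):
    """Return (Type I, Type II) error for the rule 'flag if R'/R < th'."""
    type1 = phi((th - mu) / sigma)               # clean image flagged
    type2 = 1.0 - phi((th - mu_s) / sigma_s)     # stego image missed
    return type1, type2


# Illustrative (hypothetical) parameters only
th = equal_error_threshold(mu=1.3, sigma=0.1, mu_s=1.0, sigma_s=0.05)
false_alarm, miss = error_rates(th, 1.3, 0.1, 1.0, 0.05)
```

Raising Th above this value lowers the miss rate and raises the false-alarm rate, which is the adjustment the slide describes.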
Experimental Results • [Figures: plots not preserved]
Experimental Results (cont.) • [Figures: plots not preserved]
Conclusions • The probability of an erroneous decision is mainly determined by the size of the secret message • The influence of the test message size is much smaller • The optimal test message size differs for different secret message sizes • The detection algorithm mainly targets images with a small number of unique colors • The results for high-quality scanned and losslessly compressed images (U>0.5MN) may be unreliable
Image Steganography • Image formats • Uncompressed (BMP) • Offering the highest capacity and best overall security • Palette (GIF) • Difficult to provide security with reasonable capacity • Lossy compressed (JPEG, JPEG 2000) • Difficult to hide messages in a JPEG stream securely while keeping the capacity practical
Goal of this Paper • To show that images may be extremely poor candidates for cover images if • Initially acquired as JPEG images and later decompressed to a lossless format • Steganographic methods keep the distortion minimal to reduce visible artifacts • As a result, message embedding will not erase the characteristic structure created by JPEG compression • Analyzing the DCT coefficients of the image can even recover the values of the JPEG quantization table • Evidence for steganography • An image stored in a lossless format that bears a strong fingerprint of JPEG compression, yet is not fully compatible with any JPEG compressed image
JPEG Compression • Uncompressed image block $B_{orig}$ • DCT: $d_k(i)$, $i=0,\dots,63$ • Quantization with the JPEG quantization matrix Q: $D_k(i)=\mathrm{round}(d_k(i)/Q(i))$ • Zigzag scan • Huffman coder
JPEG Decompression • Huffman decoding • $QD_k(i)=Q(i)\,D_k(i)$ • Multiplying each quantized DCT coefficient by its quantization step • $B_{raw}=\mathrm{DCT}^{-1}(QD)$ • Inverse DCT • $B=[B_{raw}]$ • Rounded to integers in the range 0-255
Observations • If the block B has no pixels saturated at 0 or 255 • $\|B_{raw}-B\|^2 \le 16$, where $\|\cdot\|$ is the L2 norm • Since $|B_{raw}(i)-B(i)| \le 0.5$ for all 64 pixels i, the squared error is at most $64 \times 0.5^2 = 16$
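The bound of 16 can be checked numerically on a single block. The sketch below is a simplified, pure-Python round trip through quantization and dequantization; it is not a full JPEG codec (no zigzag scan or Huffman coding), it uses the orthonormal 8x8 DCT, and it applies the usual level shift of 128, which the slides do not mention. The function names and the sample block are hypothetical.

```python
import math

N = 8
# Orthonormal 1-D DCT-II basis, T[u][x]
T = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]


def dct2(block):
    """2-D DCT of an 8x8 block, computed as T * block * T^T."""
    tmp = [[sum(T[u][x] * block[x][y] for x in range(N)) for y in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][y] * T[v][y] for y in range(N)) for v in range(N)]
            for u in range(N)]


def idct2(coef):
    """Inverse 2-D DCT, computed as T^T * coef * T."""
    tmp = [[sum(T[u][x] * coef[u][v] for u in range(N)) for v in range(N)]
           for x in range(N)]
    return [[sum(tmp[x][v] * T[v][y] for v in range(N)) for y in range(N)]
            for x in range(N)]


def jpeg_round_trip(block, q):
    """Quantize and dequantize one block; return (B_raw, B)."""
    d = dct2([[p - 128 for p in row] for row in block])  # level shift assumed
    qd = [[q[u][v] * round(d[u][v] / q[u][v]) for v in range(N)]
          for u in range(N)]
    b_raw = [[x + 128 for x in row] for row in idct2(qd)]
    b = [[min(255, max(0, round(x))) for x in row] for row in b_raw]
    return b_raw, b


def squared_error(a, b):
    return sum((a[x][y] - b[x][y]) ** 2 for x in range(N) for y in range(N))


block = [[100 + 4 * x + 3 * y for y in range(N)] for x in range(N)]
q = [[8] * N for _ in range(N)]  # hypothetical flat quantization matrix
b_raw, b = jpeg_round_trip(block, q)
err = squared_error(b_raw, b)
```

For any block without saturated pixels, `err` stays at or below 16 because each of the 64 rounding errors is at most 0.5 in magnitude.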
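The Proposed Scheme • Question • Given an arbitrary 8x8 block B of pixel values, could this block have arisen through the process of JPEG decompression with the quantization matrix Q (if available)? • Let $QD'=\mathrm{DCT}(B)$. By Parseval's equality, $\|B-B_{raw}\|^2=\|\mathrm{DCT}(B)-\mathrm{DCT}(B_{raw})\|^2=\|QD'-QD\|^2 \le 16$ • Since every $QD(i)$ is an integer multiple of $Q(i)$, $\|QD'-QD\|^2 \ge \sum_i \left|QD'(i)-Q(i)\,\mathrm{round}\!\left(QD'(i)/Q(i)\right)\right|^2 = S$, so a compatible block must have $S \le 16$ • Additional check • Find candidates with $\sum_i \left(QD'(i)-q_{p(i)}(i)\right)^2 \le 16$, where $q_{p(i)}(i)$ are integer multiples of $Q(i)$ close to $QD'(i)$ • Test whether $B=[\mathrm{DCT}^{-1}(QD)]$, where $QD(i)=q_{p(i)}(i)$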
Algorithm • 1. Divide the image into 8x8 blocks • 2. Arrange the blocks in a list, and remove all saturated blocks from the list • T: number of remaining blocks • 3. Extract the quantization matrix Q from all T blocks • If all elements of Q are 1s, the image is not processed further
Algorithm (cont.) • 4. For each block B, calculate S • 5. If S>16, B is not compatible with JPEG compression; otherwise, perform the additional check • 6. After going through all T blocks, if no incompatible block is found, no evidence of steganography is available • 7. Repeat the algorithm for different 8x8 divisions to detect cropped images
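Steps 2, 4, and 5 can be sketched as below. The code assumes the quantization matrix Q is already known (step 3 is not shown), uses the orthonormal 8x8 DCT with the usual level shift of 128 (not mentioned on the slides), and covers only the first test S > 16; blocks with S ≤ 16 would still need the additional check, which is omitted here. The function names are hypothetical.

```python
import math

N = 8
# Orthonormal 1-D DCT-II basis, T[u][x]
T = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]


def dct2(block):
    """2-D DCT of an 8x8 block, computed as T * block * T^T."""
    tmp = [[sum(T[u][x] * block[x][y] for x in range(N)) for y in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][y] * T[v][y] for y in range(N)) for v in range(N)]
            for u in range(N)]


def incompatibility_score(block, q):
    """S: squared distance from QD' = DCT(B) to the nearest multiples of Q."""
    qd_prime = dct2([[p - 128 for p in row] for row in block])
    return sum((qd_prime[u][v] - q[u][v] * round(qd_prime[u][v] / q[u][v])) ** 2
               for u in range(N) for v in range(N))


def is_saturated(block):
    """True if any pixel sits at 0 or 255 (such blocks are skipped)."""
    return any(p in (0, 255) for row in block for p in row)


def find_incompatible_blocks(blocks, q):
    """Indices of unsaturated blocks that fail the S <= 16 test."""
    return [i for i, b in enumerate(blocks)
            if not is_saturated(b) and incompatibility_score(b, q) > 16]
```

As a usage example, a constant block of value 130 with all Q(i) = 16 gives S close to 0, while adding 1 to half of its pixels in a checkerboard pattern raises S to about 32, so the modified block is reported as incompatible.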
References • J. Fridrich, R. Du and M. Long, "Steganalysis of LSB encoding in color images," ICME 2000, New York, 2000 • J. Fridrich, M. Goljan and R. Du, "Steganalysis based on JPEG compatibility," SPIE Multimedia Systems and Applications IV, Denver, 2001 • G. Goth, "Steganalysis gets past the hype," IEEE Distributed Systems Online, April 2005