A Markov Process Based Approach to Effective Attacking JPEG Steganography

  1. A Markov Process Based Approach to Effective Attacking JPEG Steganography By Y. Q. Shi, Chunhua Chen, Wen Chen NJIT Presented by Hanlin Hu and Xiao Zhang

  3. What is Steganography? • Art and science of invisible communication that conceals the very existence of hidden messages • Images can carry large messages • Because images are non-stationary, image steganography is hard to attack • JPEG is a popular format for steganography because JPEG images can be compressed up to a 1:10 ratio without significant loss

  4. Modern Stego Techniques • OutGuess • F5 • MB (model-based)

  5. Modern Stego Techniques: OutGuess • Hidden data is embedded using the redundancy of the cover image • OutGuess preserves the statistics of the BDCT coefficient histogram • It takes two measures before embedding data: - Identifies redundant BDCT coefficients, which have the least effect on the cover image - Adjusts the untouched coefficients

  6. Modern Stego Techniques (cont’d.): F5 • Works on the JPEG format only • Two main security measures against steganalysis attacks: - Straddling: scatters the message uniformly over the cover image - Matrix embedding: improves embedding efficiency (number of message bits embedded per change to a BDCT coefficient)

  7. Modern Stego Techniques (cont’d.): MB (Model-based Steganography) • Correlates the embedded data with the cover image • Splits the cover image into two parts: - Models the distribution of the second part given the first part - Encodes the second part using the model and the hidden message - Combines the two parts to form the stego image • MB1 operates on JPEG images and uses a Cauchy distribution to model the JPEG histogram

  9. Steganalysis • Art of detecting hidden messages in stego images

  10. Previous Work on Steganalysis • Universal steganalyzer proposed by Farid - Based on the image’s higher-order statistics • Universal steganalysis proposed by Shi et al. - Based on statistical moments of the characteristic functions of the image, its prediction-error image and their DWT subbands

  11. Previous Work on Steganalysis • Fridrich proposed a set of distinguishing features from the BDCT and spatial domains for detecting messages embedded in JPEG images • Steganalysis specific to spread-spectrum data hiding, by Sullivan et al.: - Inter-pixel dependencies are used and a Markov chain model is adopted - Some information loss is inevitable due to random feature selection - Markov chains are used only in the horizontal direction

  13. Markov Processes – Wikipedia • Named after the mathematician Andrey Markov; models the random evolution of a memoryless system • Definition: a stochastic process with state X(t) at time t > 0 and history of states x(s) for times s < t is a Markov process if the probability of its being in state y at time t+h, conditioned on its state x(t) at time t, equals the probability of its being in state y conditioned on its state at time t and at all earlier times. In other words, given the present state, the future state is independent of the past states.
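  The verbal definition above can be written compactly. This is a standard formulation of the Markov property in the slide's own symbols (X, x, y, t, h, s), offered as a reading aid rather than a quotation from the paper:

```latex
P\bigl(X(t+h) = y \,\big|\, X(s) = x(s)\ \text{for all } s \le t\bigr)
  = P\bigl(X(t+h) = y \,\big|\, X(t) = x(t)\bigr),
  \qquad \text{for all } h > 0 .
```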

  15. Feature Construction for Steganalysis • Goal: classify an image as stego or non-stego • In this steganalysis scheme, second-order statistics are used to detect JPEG steganographic methods • Steps: - Define the JPEG 2-D array - Form difference JPEG 2-D arrays in different directions - Model each difference array with a Markov random process (transition probability matrix) - Apply a thresholding technique to reduce computational cost

  16. Defining JPEG 2-D array • Features are generated from the 8 x 8 BDCT domain to attack steganography • The JPEG 2-D array has the same size as the given image, with each 8 x 8 block filled with the corresponding quantized 8 x 8 JPEG BDCT coefficients • The absolute value of each coefficient is taken, giving the resulting array as shown

  17. Difference JPEG 2-D array • The disturbance caused by steganographic methods in JPEG images can be enlarged by observing the difference between an element and one of its neighbors • 4 difference JPEG 2-D arrays are generated (horizontal, vertical, main diagonal and minor diagonal): Fh(u, v) = F(u, v) - F(u+1, v); Fv(u, v) = F(u, v) - F(u, v+1); Fd(u, v) = F(u, v) - F(u+1, v+1); Fmd(u, v) = F(u+1, v) - F(u, v+1)
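  The four difference arrays can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `difference_arrays`, the nested-list layout `F[u][v]`, and the toy coefficient values are all assumptions made for the example. Each output array is one element smaller along every axis in which a neighbor is needed.

```python
def difference_arrays(F):
    """Return (Fh, Fv, Fd, Fmd) for a JPEG 2-D array F indexed as F[u][v].

    F is assumed to already hold absolute values of quantized BDCT coefficients.
    """
    Su, Sv = len(F), len(F[0])
    # Fh(u, v) = F(u, v) - F(u+1, v)
    Fh = [[F[u][v] - F[u + 1][v] for v in range(Sv)] for u in range(Su - 1)]
    # Fv(u, v) = F(u, v) - F(u, v+1)
    Fv = [[F[u][v] - F[u][v + 1] for v in range(Sv - 1)] for u in range(Su)]
    # Fd(u, v) = F(u, v) - F(u+1, v+1)
    Fd = [[F[u][v] - F[u + 1][v + 1] for v in range(Sv - 1)] for u in range(Su - 1)]
    # Fmd(u, v) = F(u+1, v) - F(u, v+1)
    Fmd = [[F[u + 1][v] - F[u][v + 1] for v in range(Sv - 1)] for u in range(Su - 1)]
    return Fh, Fv, Fd, Fmd


# Toy 3 x 3 example (hypothetical coefficient values, not real JPEG data).
coeffs = [[-5, 2, 0],
          [1, -1, 0],
          [0, 0, 3]]
F = [[abs(c) for c in row] for row in coeffs]  # take absolute values first
Fh, Fv, Fd, Fmd = difference_arrays(F)
print(Fh, Fv, Fd, Fmd)
```

  A real pipeline would read the quantized coefficients from the JPEG file; that step depends on the decoder used and is left out here.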

  18. Defining JPEG 2-D array (Cont’d.) • Why we choose the absolute value of the coefficients: - BDCT coefficients do not obey a Gaussian distribution - The power of an 8 x 8 block of DCT coefficients is highly concentrated in the DC and low-frequency coefficients - These coefficients are non-increasing along the zig-zag order, and they are correlated - The difference of the absolute values of two immediate neighbors is highly concentrated around 0, with a Laplacian-like distribution

  19. Difference JPEG 2-D array • The distribution of the difference array elements is Laplacian-like, with most values close to 0 • Most of the elements in the difference array lie in [-T, T] as long as T is large enough

  20. Transition Probability Matrix • A Markov random process with a one-step transition probability matrix is used • Second-order statistics are used in order to reduce computational complexity dramatically • To reduce complexity further, a thresholding technique is used, which reduces the dimensionality of the matrix to (2T+1) x (2T+1) • Choosing a proper value of T gives good steganalysis capability with manageable computational complexity

  21. Transition Probability Matrix (Cont’d.) • With one transition probability matrix per direction, we have 4 x (2T+1) x (2T+1) elements • Choosing a proper value of T gives steganalysis capability with manageable computational complexity
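  The thresholding and transition-matrix steps might look like the sketch below. Several things here are assumptions rather than statements from the slides: thresholding is taken to mean clipping values outside [-T, T] to -T or T, the names `transition_matrix` and `feature_count` are hypothetical, the helper only handles non-negative steps (so the minor-diagonal case, which steps in opposite directions along the two axes, would need a small extension), and T = 4 is used purely as an illustrative value.

```python
def transition_matrix(D, T, du, dv):
    """Estimate P(D[u+du][v+dv] = n | D[u][v] = m) after clipping D to [-T, T].

    Returns a (2T+1) x (2T+1) nested list; entry [m + T][n + T] holds the
    estimated probability. du and dv must be non-negative in this sketch.
    """
    def clip(x):
        return max(-T, min(T, x))

    size = 2 * T + 1
    counts = [[0] * size for _ in range(size)]
    rows, cols = len(D), len(D[0])
    for u in range(rows - du):
        for v in range(cols - dv):
            m = clip(D[u][v])
            n = clip(D[u + du][v + dv])
            counts[m + T][n + T] += 1

    # Normalize each row by the number of times the conditioning value m occurred.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def feature_count(T):
    """Total features from four (2T+1) x (2T+1) matrices."""
    return 4 * (2 * T + 1) ** 2


# Toy difference array (hypothetical values); one step along the first index.
D = [[0, 1, 0],
     [1, 0, 0],
     [5, -1, 0]]
P = transition_matrix(D, T=1, du=1, dv=0)
print(P)
print(feature_count(4))  # illustrative threshold T = 4
```

  Rows of the matrix whose conditioning value never occurs are left as zeros here; how such rows are treated in practice is a design choice the slides do not specify.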

  22. Feature Formation Procedure

  23. Support Vector Machine • Classifier for pattern recognition • Easier to use than neural networks for image analysis, with comparable performance • SVM is based on the idea of a hyperplane classifier • The optimal separating hyperplane is found using Lagrange multipliers • SVM can be used for both the linearly and the nonlinearly separable case • In the linear case, SVM looks for a hyperplane H and two planes H1 and H2 parallel to H, and maximizes the distance between these two planes with no data points in between • In the nonlinear case, SVM uses kernel functions (e.g. a polynomial kernel) to locate a linear hyperplane in a transformed feature space
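  For the linearly separable case, the hyperplanes H, H1 and H2 and the margin maximization can be stated as follows. This is the standard formulation found in SVM tutorials such as the Burges reference; the symbols w, b, x_i and y_i (weight vector, offset, training vectors, and labels in {-1, +1}) are notation introduced here and do not appear on the slide:

```latex
H:\ \mathbf{w}\cdot\mathbf{x} + b = 0, \qquad
H_1:\ \mathbf{w}\cdot\mathbf{x} + b = +1, \qquad
H_2:\ \mathbf{w}\cdot\mathbf{x} + b = -1

\text{distance}(H_1, H_2) = \frac{2}{\lVert\mathbf{w}\rVert}
\quad\Longrightarrow\quad
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \ \ \text{for all } i .
```

  Maximizing the distance between H1 and H2 is therefore equivalent to minimizing the norm of w under the constraint that no training point falls between the two planes.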

  25. Experiments and Results • The image set consists of 7560 JPEG images with quality factors (QF) ranging from 70 to 90 • Each image is cropped to 768 x 512 or 512 x 768 • Chrominance components are set to zero and luminance is left untouched before embedding

  26. Experiments and Results (Cont’d.) • Stego image generation: - Embedding rate is the ratio of the message length to the number of non-zero elements in the JPEG 2-D array, measured in bpc - Embedding rates considered for OutGuess: 0.05, 0.1, 0.2 bpc, with 7498, 7452 and 7215 stego images generated, respectively - Embedding rates considered for F5 and MB1: 0.05, 0.1, 0.2, 0.4 bpc, with 7560 stego images generated - Step size equal to two for MB1
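  As a small worked example of the embedding-rate definition above, the sketch below divides a message length in bits by the count of non-zero elements. The function name `embedding_rate_bpc` and the toy array are assumptions made for illustration; the denominator follows the slide's wording (non-zero elements of the JPEG 2-D array).

```python
def embedding_rate_bpc(message_bits, jpeg_array):
    """Ratio of message length (in bits) to the number of non-zero array elements."""
    nonzero = sum(1 for row in jpeg_array for c in row if c != 0)
    if nonzero == 0:
        raise ValueError("array has no non-zero elements")
    return message_bits / nonzero


# Toy array with 5 non-zero elements (hypothetical values).
toy_array = [[5, 0, 1, 0],
             [0, 2, 0, 0],
             [3, 0, 0, 1]]
rate = embedding_rate_bpc(1, toy_array)  # 1 bit over 5 non-zero elements
print(rate)
```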

  27. Results obtained using SVM • Half of the non-stego and stego image pairs are selected to train the SVM classifier, and the rest are used to test the trained classifier • 4 steganalysis schemes are compared, as shown, in detecting OutGuess, F5 and MB • Result: the proposed steganalyzer outperforms the prior art by a significant margin • F5 has a lower detection rate than MB1 at the same embedding rate

  28. Results with features from one direction at a time • The contributions from the horizontal and vertical directions are larger than those from the main and minor diagonal directions • The contribution from the main diagonal direction is larger than that from the minor diagonal direction

  30. Discussion • Taking absolute values in the JPEG 2-D array is an advantage: - Not taking absolute values degrades performance - Without absolute values, the dynamic range of the JPEG 2-D array is increased - The following table compares performance with and without absolute values for MB1

  31. Discussion (Cont’d.) • Detection rates of F5: detection rates for MB1 are higher than for F5 at the same embedding rates • Reasons: - F5 reduces the magnitude of a non-zero DCT AC coefficient by 1 in order to embed a bit, and has a larger probability of keeping the elements of the difference JPEG 2-D array unchanged after data embedding - The following statistics show that at low rates F5 changes fewer DCT coefficients than MB1, while the reverse holds at higher rates

  32. Conclusion • Taking absolute values in the JPEG 2-D array reduces computational complexity and raises steganalysis capability • The difference JPEG 2-D arrays along the horizontal, vertical, main diagonal and minor diagonal directions enlarge the changes caused by steganographic methods • The thresholding technique reduces the dimensionality of the feature vectors to a manageable extent • By modeling the difference JPEG 2-D arrays with a Markov process and using all elements of the transition probability matrices as features, second-order statistics are used

  33. References • C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, 2(2):121-167, 1998 • H. Farid, “Detecting hidden messages using higher-order statistical models,” International Conference on Image Processing, Rochester, NY, USA, 2002 • Y. Q. Shi, G. Xuan, D. Zou, J. Gao, C. Yang, Z. Zhang, P. Chai, W. Chen, and C. Chen, “Steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image, and neural network,” International Conference on Multimedia and Expo, Amsterdam, Netherlands, 2005 • www.wikipedia.org

