Breaking HUGO – the Process Discovery
presented jointly with Steganalysis of Content-Adaptive Steganography in Spatial Domain
Jessica Fridrich, Jan Kodovský, Miroslav Goljan, Vojtěch Holub
Are there “issues” with adaptive stego?
• Content-adaptive embedding leaks information about the placement of embedding changes.
• Is HUGO’s probabilistically known selection channel a problem?
• Why should it be a problem?
• It is all about how well we can model the content.
• Honestly, fellow BOSS competitors, you all started here, didn’t you?
Fridrich, Kodovský, Holub, Goljan
Probability of embedding change
… can be estimated from the stego image fairly well.
[Figure panels: cover, estimated, actual changes, true]
Complex texture of 512×512 images
[Figure panels: 512×512 image, 4MP image]
Look at what HUGO did …
Seven images from BOSSrank can be detected visually as stego images.
[Figure: BOSSrank image No. 235 and a close-up of its LSB plane]
Weighted-Stego attack for HUGO?
Assume that we can estimate …
Problem: E[c] varies strongly with content and cannot be easily thresholded or calibrated, despite the fact that E[c] < E[s] in general (sometimes by as much as 60%, but on average by only 1.74%).
Pixel domain is not useful, right?
• HUGO approximately preserves ~10^7 statistics computed from neighboring pixels. Intimidating, isn’t it?
• Forget the pixel domain, go to a different domain. Wavelet perhaps?
• We brushed the dust off WAM, put it on steroids, and whacked HUGO with it.
What we tried:
• added moments from the LL band to inform the steganalyzer about content (makes sense for content-adaptive stego)
• added the same feature vector from a re-embedded image (relying on the “saturation effect” with re-embedding)
• replaced the Wiener filter in WAM with an adaptive filter based on the estimated probability of change
BOSSrank score: 59%
Go back to pixel domain!
• Your best chances for detection are in the embedding domain.
• Compute the residual rij = xij − x̂ij, where x̂ij is an estimator of xij from its local neighborhood.
• Advantages of computing detection statistics from rij:
  • narrower dynamic range
  • image content suppressed
  • higher SNR between stego-signal and noise
• Undoubtedly, the best estimator of xij is xij itself. However, x̂ij should not depend on xij, to avoid a biased estimate (this is why denoising filters do not work well).
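The residual idea on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the predictor (mean of the left and right neighbors) and the function name `residual_horizontal_mean` are assumptions chosen because that predictor never reads the center pixel.

```python
import numpy as np

def residual_horizontal_mean(x):
    """Residual r[i, j] = x[i, j] - pred, with pred = mean of left/right neighbors.

    The predictor is an illustrative assumption; it never touches x[i, j],
    so the estimate is not biased by the pixel being predicted.
    Only interior columns are returned.
    """
    x = np.asarray(x, dtype=np.float64)
    pred = (x[:, :-2] + x[:, 2:]) / 2.0
    return x[:, 1:-1] - pred

# A horizontal ramp is "content" that a linear predictor removes completely.
ramp = np.tile(np.arange(8.0), (4, 1))
r = residual_horizontal_mean(ramp)

# A single +1 change stands out in the residual although the content is gone.
stego = ramp.copy()
stego[2, 3] += 1.0
r_stego = residual_horizontal_mean(stego)
```

Here `r` is zero everywhere, while `r_stego` is nonzero only around the changed pixel, which is the "content suppressed, narrower dynamic range" effect listed above.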
Higher-order local models (HOLMES)
• HUGO approximately preserves the joint distribution of three 1st-order differences among four neighboring pixels.
• We need to get out of HUGO’s model:
  • Use four or more differences – cooc dimension grows too fast, bins in coocs become empty or underpopulated.
  • Use higher-order differences – they “see” beyond 4 pixels.
• The SPAM feature set uses a locally constant model.
[Figure labels: constant model, linear model, quadratic model, …]
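For reference, the constant, linear, and quadratic models named here correspond to finite differences of increasing order. The transcript does not include the slide's formulas, so the horizontal direction, the indexing, and the sign convention below are assumptions:

```latex
\begin{align*}
\text{constant model (1st order):}\quad  & r_{ij} = x_{i,j+1} - x_{ij} \\
\text{linear model (2nd order):}\quad    & r_{ij} = x_{i,j-1} - 2x_{ij} + x_{i,j+1} \\
\text{quadratic model (3rd order):}\quad & r_{ij} = x_{i,j-1} - 3x_{ij} + 3x_{i,j+1} - x_{i,j+2}
\end{align*}
```

Three adjacent 1st-order differences span four pixels, whereas three adjacent 2nd-order differences already span five, which is one way to read "they see beyond 4 pixels".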
Higher-order local models, cont’d
• HUGO is likely to embed here even though the content is modelable in the vertical direction.
• However, pixel differences will mostly be in the marginal.
• Linear or quadratic models bring the residual back inside the cooc matrix.
[Figure panels: image with many edges, edge close-up]
Quantize and truncate
Before computing the coocs, the residual is first quantized and then truncated.
Note that we marginalize instead of cutting. The marginals (bins at the boundary) are very important!
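A small sketch of this step, under the assumption that "quantize" means dividing by the step q and rounding, and "truncate" means clamping to [−T, T]. The function name is hypothetical, and NumPy's `round` uses round-half-to-even, which may differ from the authors' rounding.

```python
import numpy as np

def quantize_truncate(r, q, T):
    """Quantize residuals with step q, then clamp them into [-T, T].

    Clamping (rather than discarding) keeps out-of-range values in the
    boundary bins, i.e. they are marginalized instead of cut.
    """
    quantized = np.round(np.asarray(r, dtype=np.float64) / q)
    return np.clip(quantized, -T, T).astype(np.int64)

# With q = 2 and T = 4, the values -10 and 12 end up in the marginal bins -4 and 4.
rq = quantize_truncate([-10, -4, 0, 2, 6, 12], q=2, T=4)
```

A larger q pulls large residuals from textured regions back inside the range the cooc matrix covers, at the price of coarser resolution near zero.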
First successful features
Take the min/max of 2nd-order residuals in 4 directions.
Features are two 3D cooc matrices:
MINMAX: T = 4, q = 1, dim = 2×(2T+1)^3 = 1458
QUANT: T = 4, q = 2, dim = 1458
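To make the dimension 2×(2T+1)^3 concrete, here is a hedged sketch of one 3D co-occurrence matrix built from horizontally adjacent triples of an already quantized and truncated residual. The function name and the choice of the horizontal direction are assumptions; the min/max step itself is not reproduced.

```python
import numpy as np

def cooc3_horizontal(rq, T):
    """Count triples (rq[i,j], rq[i,j+1], rq[i,j+2]); entries must lie in [-T, T]."""
    rq = np.asarray(rq, dtype=np.int64)
    n = 2 * T + 1
    # Shift values to 0..2T so they can serve as array indices.
    a, b, c = rq[:, :-2] + T, rq[:, 1:-1] + T, rq[:, 2:] + T
    flat = (a * n + b) * n + c
    return np.bincount(flat.ravel(), minlength=n ** 3).reshape(n, n, n)

rq = np.array([[0, 1, -1, 0, 1, -1]])
C = cooc3_horizontal(rq, T=1)          # the triple (0, 1, -1) occurs twice

# Two such matrices (e.g. one for "min", one for "max") with T = 4:
dim = 2 * cooc3_horizontal(np.zeros((3, 5), dtype=np.int64), T=4).size
```

With T = 4 each matrix has 9^3 = 729 bins, so two of them give the 1458 features quoted on the slide.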
Encouraging results
Early October
Features: MINMAX, dim 1458
Training database: 2×9074 BOSSbase 0.91
Classifier: FLD
BOSSrank: 71%

Features: MINMAX+QUANT, dim 2916
Training database: 2×9074 BOSSbase 0.91
Classifier: G-SVM
BOSSrank: 73%
Unexpected stego-source mismatch
BOSSbase 0.91 was prepared with parameter values 4 and 10, BOSSrank with 1. BOSSbase 0.92 was embedded with 1.
Retraining our classifier on the correct stego database gave:
October 14
Features: MINMAX+QUANT, dim 2916
Training database: 2×9074 BOSSbase 0.92
Classifier: G-SVM
BOSSrank: 75%
Do not say “hop” before you jump
[Plot: “Hugobreakers’ frustration”, BOSSrank score (axis 74 to 79) between Oct 14 and Nov 13]
This is when BOSS became GOSS: “Guess Our Steganographic Source”
The dreaded cover-source mismatch
• The tell-tale symptom of the mismatch: adding more features improved the score on BOSSbase but worsened the BOSSrank score.
• The problem: we trained on one source but tested on another (different) source. Our detector lacked robustness.
• Note that this is an issue of robustness rather than overtraining.
• Well recognized in detection and estimation.
• A very difficult problem, as the mismatch can take so many different forms.
Trying to resolve the CSM
a) Train on a more diverse source
   (adding 6000 images to BOSSbase lowered BOSSrank – making mismatch worse?)
b) Use classifiers with a simpler decision boundary (L-SVM)
   (the same problem and lower accuracy)
c) Contaminate the training set with BOSSrank images:
   - put in denoised BOSSrank covers (use adaptive denoising based on estimated probabilities)
   - put in re-embedded BOSSrank stego
   (unable to obtain consistent results with contamination when experimenting with BOSSbase, decided to toss it)
d) Find out more about the cover source
   - estimate resampling artifacts – we could obtain info about the original image size (no artifacts detected by Farid’s code)
   - extract fingerprints from BOSSbase cameras, detect them in images from BOSSrank, train on images from the right source.
Forensic analysis of BOSSrank
• Fingerprints extracted from all 7 BOSSbase cameras and detected in BOSSrank.
• ~500 images tested positive for Leica M9, no other camera tested positive.
[Plot: PCE for BOSSrank images; series: Leica, Rebel]
Forensic analysis of BOSSrank, cont’d Most images taken in Pacific North-West
Forensic analysis of BOSSrank, cont’d
• Fingerprint extracted from 25 JPEG images from Tomas Filler’s camera (Panasonic Lumix DMC-FZ50) taken previously at SPIE conferences.
• Resized to 512×512 using the same script.
• Positively identified in ~77 BOSSrank images.
• Could not use this for BOSS, as other competitors did not have this opportunity.
• We closed our investigation with ~50% from Leica, the rest declared unknown.
[Plot: PCE for BOSSrank images]
Forensic-aided steganalysis
Option #1: Buy a Leica M9 and generate our own database. Oops … the price is $7,000!!
Option #2: LensRentals.com, rent it for a week. Took 7,301 images with the Leica M9.
Experiment #1: Train two classifiers – one trained only on Leica to analyze only Leica images, and one trained on all to analyze the rest. Merge the prediction files.
Experiment #2: Add the Leica images to the BOSSbase database and train on all.
Result: BOSSrank score either the same or slightly worse. Bummer.
Can a cover source be replicated?
• A cover source is a very complex entity shaped by:
  • Camera and its settings
    • short exposure → lower dark current
    • high ISO → increased level of noise
    • stopping the lens at 5.6 → sharper images than when stopped at 2.0
  • Lens
    • short focus → low depth of field → easier for analysis
  • Content
    • Binghamton in Fall is a poor replacement for the French Riviera.
    • Average amount of edges, smooth regions.
• We rented the wrong lens (50 mm), Patrick used 35 mm.
Model diversity is the key
QUANT, go 4D, use 3rd-order differences (quadratic model), merge.

Difference order   Cooc.   T   q   dim
2nd                3D      3   2   686
3rd                3D      3   2   686
2nd                4D      2   2   1250
3rd                4D      2   2   1250

November 13
Features: dim 3872
Training database: 2×9074 BOSSbase 0.92
Classifier: G-SVM
BOSSrank: 76%
With increased dimensionality, machine learning became a serious bottleneck.
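The dimensions on this slide follow from the same bin-count rule as the earlier 2×(2T+1)^3 formula, generalized to 4D. A quick arithmetic check, assuming each subset consists of two co-occurrence arrays (the helper name is hypothetical):

```python
def cooc_feature_dim(T, order, n_matrices=2):
    """Number of bins in n_matrices co-occurrence arrays with `order` axes."""
    return n_matrices * (2 * T + 1) ** order

# (difference order, cooc order, T) for the four subsets on the slide;
# the quantization step q does not change the number of bins.
subsets = [("2nd", 3, 3), ("3rd", 3, 3), ("2nd", 4, 2), ("3rd", 4, 2)]
dims = [cooc_feature_dim(T, order) for _, order, T in subsets]
total = sum(dims)
print(dims, total)
```

The four subsets give 686, 686, 1250 and 1250 bins, which add up to the 3872 features used on November 13.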
Ensemble classifier (SVM)
• To facilitate further development, we started using ensemble classifiers instead of SVMs.
1. Set l = 1.
2. Randomly select k features out of d, k < d.
3. Train an FLD on this random subspace on all BOSSbase images, set the threshold to obtain minimum PE, and store the eigenvector el.
4. Make decisions on BOSSrank (fj is the jth feature vector):
   fj · el > 0 ⇒ Dec(l,j) = 1 (stego)
   fj · el < 0 ⇒ Dec(l,j) = 0 (cover)
5. Repeat steps 2–4 L times to obtain L decisions Dec(1..L, 1..1000) for each test image.
6. For each image, fuse the decisions by voting.
• Advantages
  • Low complexity: training of a 9288-dim set on 2×17,000 images with L = 31 and k = 1600 takes only 8 minutes on a PC.
  • Performance comparable to SVM.
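The procedure on this slide (random feature subsets, one FLD per subset, majority vote) can be sketched as follows. This is an illustration on synthetic Gaussian data, not the authors' implementation: the function names, the small ridge term, and the midpoint threshold (used here in place of the minimum-PE threshold) are all assumptions.

```python
import numpy as np

def train_fld(X0, X1, ridge=1e-6):
    """Fisher linear discriminant; returns (w, b) with w @ f + b > 0 meaning 'stego'."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    Sw = Sw + ridge * np.eye(Sw.shape[0])      # keeps the solve well conditioned
    w = np.linalg.solve(Sw, m1 - m0)
    b = -w @ (m0 + m1) / 2.0                   # midpoint threshold (assumption)
    return w, b

def ensemble_predict(X_cover, X_stego, X_test, L=31, k=8, seed=0):
    """Train L FLDs on random k-feature subspaces and fuse their votes."""
    rng = np.random.default_rng(seed)
    d = X_cover.shape[1]
    votes = np.zeros(X_test.shape[0], dtype=np.int64)
    for _ in range(L):
        idx = rng.choice(d, size=k, replace=False)
        w, b = train_fld(X_cover[:, idx], X_stego[:, idx])
        votes += (X_test[:, idx] @ w + b > 0)
    return (2 * votes > L).astype(np.int64)    # 1 = stego, 0 = cover

# Synthetic stand-in for feature vectors: "stego" features are shifted by 1.
rng = np.random.default_rng(1)
d = 40
X_cover = rng.normal(0.0, 1.0, size=(200, d))
X_stego = rng.normal(1.0, 1.0, size=(200, d))
X_test = np.vstack([rng.normal(0.0, 1.0, size=(100, d)),
                    rng.normal(1.0, 1.0, size=(100, d))])
y_test = np.repeat([0, 1], 100)
pred = ensemble_predict(X_cover, X_stego, X_test)
accuracy = float((pred == y_test).mean())
```

Each base learner only solves a k×k linear system, which is why training cost grows slowly with the full feature dimension d; an odd L avoids ties in the vote.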
Scaling up feature dim seemed to work
Mid November
Feature set: Previous 3872 + 1458 (MINMAX) = 5330
Training database: 2×9074 BOSSbase v. 0.92
Classifier: Ensemble, L = 31, k = 1600
BOSSrank: 77%
However, adding more features computed from various residuals did not improve BOSSrank, despite steady improvement on BOSSbase.
A little more empirical magic …
Train on 2N images where N is about 20–50% larger than the feature dimension.
November 29
Feature set: 5330 + QUANT4 + SQUARE + KB = 9288
Training database: 2×9074 + 2×6500 = 2×15,574
Classifier: Ensemble, L = 31, k = 1600
BOSSrank: 78%
New subsets: QUANT4, SQUARE (+ “square” cooc), KB; dimensions annotated on the slide: 2500, 2500, 1458
KB (Ker-Böhme) kernel, cooc = H + V:
  -1/4   1/2  -1/4
   1/2    0    1/2
  -1/4   1/2  -1/4
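The KB kernel has a zero center, so it predicts each pixel from its eight neighbors only. A hedged sketch of the corresponding residual, assuming the sign convention "pixel minus prediction" and returning interior pixels only (the function name is hypothetical):

```python
import numpy as np

# Ker-Böhme kernel as given on the slide; the center weight is 0.
KB = np.array([[-0.25, 0.5, -0.25],
               [ 0.5,  0.0,  0.5],
               [-0.25, 0.5, -0.25]])

def kb_residual(x):
    """Residual x[i, j] - sum(KB * 3x3 neighborhood of x[i, j]) for interior pixels."""
    x = np.asarray(x, dtype=np.float64)
    h, w = x.shape
    pred = np.zeros((h - 2, w - 2))
    for di in range(3):
        for dj in range(3):
            pred += KB[di, dj] * x[di:di + h - 2, dj:dj + w - 2]
    return x[1:-1, 1:-1] - pred

# The weights sum to 1 and are symmetric, so any plane a*i + b*j + c is predicted exactly.
i, j = np.mgrid[0:6, 0:7]
plane = 2.0 * i + 3.0 * j + 5.0
r_plane = kb_residual(plane)
```

The explicit double loop avoids a dependency on a convolution library; for large images a vectorized 2D convolution would be the more usual choice.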
The final behemoth of dim 24,933
• Combination of 32 feature subsets containing
  • 1st–6th order differences
  • multiple versions with different values of q (quantization)
  • EDGE residuals (effective around edges)
  • Calibrated features (from a low-pass filtered image)
  • 5D coocs with T = 1
December 31
Feature set: 24,933
Training database: 2×34,719
Classifier: Ensemble, L = 71, k = 2400
BOSSrank: 81%
Accuracy on Leica: 82.3%
Accuracy on Panasonic: 70.0%
Score progress
Detecting HUGO without cover source mismatch
alias Steganalysis of Content-Adaptive Steganography in Spatial Domain
Effect of quantization
Quantization allows the features to sense changes in textured areas and around edges.
3D coocs are best quantized with q = c = central coefficient in the residual computation.
[Figure panels: c = 1, c = 2, c = 3, c = 6, c = 10, c = 20]
Best quantization value for 3D and 4D coocs

Feature set MINMAX, 4th-order differences, 3D, T = 4
q     2     4     6     8     10    12
PE    30.5  26.8  26.1  26.8  27.7  28.2

Feature set MINMAX, 4th-order differences, 4D, T = 2
q     2     4     6     8     10    12
PE    34.2  30.7  28.2  26.8  27.5  28.4

For 3D coocs, the best q is c.
For 4D coocs, the best q is 1.5c.
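The central coefficients quoted in this talk (1, 2, 3, 6, 10, 20 for orders 1 to 6) are the largest binomial coefficients of the corresponding finite differences. The sketch below computes them and applies the rule of thumb stated above; the helper names are hypothetical and the alternating-sign convention is an assumption.

```python
from math import comb

def difference_coefficients(order):
    """Coefficients of the finite difference of the given order, e.g. [1, -2, 1]."""
    return [(-1) ** k * comb(order, k) for k in range(order + 1)]

def central_coefficient(order):
    """Largest coefficient magnitude, called c on the slides."""
    return max(abs(a) for a in difference_coefficients(order))

def suggested_q(order, cooc_dim):
    """Rule of thumb from the slide: q = c for 3D coocs, q = 1.5c for 4D coocs."""
    c = central_coefficient(order)
    return c if cooc_dim == 3 else 1.5 * c

cs = [central_coefficient(k) for k in range(1, 7)]
q_3d = suggested_q(4, 3)   # 4th-order differences, 3D cooc
q_4d = suggested_q(4, 4)   # 4th-order differences, 4D cooc
```

For 4th-order differences c = 6, so the rule suggests q = 6 for 3D coocs, where the table has its minimum, and q = 9 for 4D coocs, which lies between the two best tested values q = 8 and q = 10.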
Testing higher-order residuals
Average accuracy when training on 8074 and testing on 1000 images from BOSSbase, repeated 100 times (all results with ensemble).

Fea. type     (diff, q, T)     d      PE     Best   Worst   L    k
“SPAM”(3D)*   (2nd, 1, 4)      1458   71.4   74.5   69.0    31   1000
MINMAX(3D)    (2nd, 1, 4)      1458   72.7   74.9   68.7    31   1000
QUANT(3D)     (2nd, 2, 4)      1458   73.8   76.8   71.6    31   1000
QUANT(3D)+    (2nd–6th, c, 4)  7290   80.0   82.2   77.4    81   1600
QUANT(4D)+    (2nd–6th, c, 2)  6250   79.1   81.0   76.5    81   1600

* “SPAM” is a direct equivalent of the SPAM vector with 1st-order differences replaced with 2nd-order differences.
+ 2nd–6th is a merger of QUANT features from 2nd–6th differences quantized with q = c = central coefficient in the residual.
Accuracy on BOSSbase across cameras
Accuracy per image of BOSSbase on 1000 splits 8074/1000 (trn/tst). Lines = averages for each camera.
• 6627 cover images always classified as cover
• 6647 stego images always classified as stego
• 4836 images always classified correctly as cover AND stego
Pentax K20D is the easiest
[Figure: ROC and scatter plot with QUANT (dim 1458)]
Canon Rebel is the hardest
[Figure: scatter plot with QUANT (dim 1458)]
Accuracy correlates with texture
[Figure: FLD scatter plot with QUANT (dim 1458); axis: average absolute 2nd difference]
Leica images
[Figure: typical Leica image histogram (possibly caused by the resizing script)]
Decreased dynamic range makes detection of embedding easier.
Scatter plot for LSB matching (QUANT 1458)
Dependence on content is much weaker!
Comparison to ±1 embedding and CDF
… ensemble with the 33,963-dim behemoth
HUGO with BOSS payload, accuracy 84.2%
Implications for steganalysis
• As steganography becomes more sophisticated, steganalysis needs to use more complex models to capture more subtle dependencies among pixels.
• The key is diversity! The model should be rich – a union of smaller submodels.
• Feature dimensionality will inevitably increase.
• Automatic handling of the dimensionality problem is preferable to hand-tweaking – ensemble classifiers scale well w.r.t. feature dim and training set size and are suitable for this task.
• Detectability of HUGO embedding in larger images will increase faster than what the Square Root Law dictates, because neighboring pixels will be more correlated.
• Cover source mismatch is an extremely difficult problem that will hamper deployment of steganalysis in practice.
• Robust machine learning is badly needed.
Implications for steganography
• Adaptive stego implemented to minimize distortion in model space is the way to go.
• Critical: choice of model and distortion function.
• HUGO’s model is high-dim but too narrow.
• By making the model more diverse (rich), better steganography can likely be built.
• Despite progress made during BOSS, HUGO remains the most secure stego algorithm we ever tested.
BOSS jump-started new directions
• Optimal choice of residual and its quantization?
  • Perhaps learning both from a given source and for a given stego algorithm?
• Alternative to coocs as statistical descriptors of the random field of residuals?
• Helped us develop ensemble classification as an alternative to SVMs
• Drew attention to CSM
  • training set contamination
  • training only on (processed) test images
Our current results on detection of HUGO and much more in the Rump Session.
Some more interesting stats
1000 splits of BOSSbase into 8074/1000
BEST … images always classified correctly as cover AND stego
FAs … images always classified as stego when cover
MDs … images always classified as cover when stego

Images   Avg. gray   Satur. pixels   Texture
BEST     74.1        2046            1.73
FAs      101.3       4415            4.66
MDs      102.0       5952            3.95

Texture: scaled average |xij – xi,j+1|
Effect of quantization
[Figure labels: original distribution (thick marginal); quantized distribution (thin marginal); cooc covers only this range]
Changes to elements from the marginal are undetected.
Another example
[Figure panels: 4MP image; after scaling to 512×512]