The Unseen Challenge Data Sets Anderson Rocha Walter Scheirer Siome Goldenstein Terrance Boult
The Data Sets • Two data sets are provided • PNG: lossless compression • JPEG: lossy compression • Both formats are prevalent among images on the Internet • Sources: Google Images, Yahoo Images, and Flickr
Message Sizes • For each tool, we provide four different embedding sizes: • Tiny: < 5% of the channel capacity • Small: between 5% and 15% of the channel capacity • Medium: between 15% and 40% of the channel capacity • Large: > 40% of the channel capacity • For the PNG set, the message size is explicitly stated • For the JPEG set, the message size is NOT stated
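The size buckets are a ratio of message length to channel capacity. A minimal sketch in Python, assuming both quantities are measured in bits; `size_category` is a hypothetical helper, and because the slide does not say where a payload of exactly 5%, 15%, or 40% falls, the sketch assigns each boundary value to the larger bucket.

```python
def size_category(message_bits: int, capacity_bits: int) -> str:
    """Map a payload to the Tiny/Small/Medium/Large buckets.

    Assumption: exact boundary ratios (5%, 15%, 40%) go to the larger bucket,
    since the slide leaves them unspecified.
    """
    if capacity_bits <= 0:
        raise ValueError("capacity_bits must be positive")
    ratio = message_bits / capacity_bits
    if ratio < 0.05:
        return "tiny"
    if ratio < 0.15:
        return "small"
    if ratio < 0.40:
        return "medium"
    return "large"


# Example: payloads hidden in a channel with 100,000 bits of capacity.
for bits in (3_000, 10_000, 25_000, 50_000):
    print(bits, size_category(bits, 100_000))
```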
Message Content • Random bit sequences • Snippets of MP3 songs • Plain text • Other images
Categories • Each set consists of clean and stego images • Clean set • Modified: cropping, overlay, object-appending • Non-modified: original • Stego set • 4 categories for JPEG, 3 categories for PNG, one for each tool
Categories • JPEG subcategories • Stego • Animals • Business • Maps • Natural • Tourist • Vacation • Clean • Misc
Clean Manipulated Images • [Figure: examples of Object Appending, Image Cropping, and Overlay]
PNG Tools • Camaleão (http://www.ic.unicamp.br/~rocha/sci/stego) • Simple LSB insertion/modification software • Uses cyclic permutations and block ciphering to hide messages in LSBs • SecurEngine (http://www.sharewareplaza.com/SecurEngine-download_4268.html) • Incorporates 5 crypto algorithms: Blowfish, GOST, Vernam, CAST-256, and MARS • LSB encoding
PNG Tools • Stash-It (http://www.smalleranimals.com/stash.htm) • Windows-based stego tool • Simple LSB insertion/modification software • No encryption feature
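The "simple LSB insertion/modification" these PNG tools perform can be sketched as below. This is an illustration under stated assumptions, not the code of Camaleão, SecurEngine, or Stash-It: the image is taken as an already-decoded flat list of 8-bit samples, message bits are written sequentially, and the encryption and cyclic-permutation steps are omitted. `embed_lsb` and `extract_lsb` are hypothetical names.

```python
from typing import List


def embed_lsb(pixels: List[int], payload: bytes) -> List[int]:
    """Write payload bits (MSB first) into the LSBs of successive 8-bit samples."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("payload exceeds LSB capacity (1 bit per sample)")
    stego = list(pixels)
    for idx, bit in enumerate(bits):
        stego[idx] = (stego[idx] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(pixels: List[int], n_bytes: int) -> bytes:
    """Reassemble n_bytes from the LSBs of the first 8 * n_bytes samples."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[8 * b + i] & 1)
        out.append(value)
    return bytes(out)


cover = [200, 201, 37, 38, 90, 91, 15, 16] * 4  # 32 samples: 4 bytes of capacity
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))  # b'hi'
```

Each sample changes by at most one gray level, which is why such changes are visually imperceptible yet leave statistical traces.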
JPEG Tools • F5 (http://www.inf.tu-dresden.de/~aw4) • Resilient to the χ² statistical attack • Instead of replacing LSBs directly, F5 decreases the absolute value of the DCT coefficients • Chooses DCT coefficients randomly • Matrix embedding • JPHide (http://linux01.gwdg.de/~alatham) • Uses Blowfish to generate a stream of pseudo-random control bits that define the bit encodings • Large embeddings are trivial to detect
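Matrix embedding, listed for F5, lets k message bits ride on n = 2^k - 1 carrier bits while changing at most one of them. The sketch below shows the Hamming-code (1, n, k) construction on abstract carrier bits; it assumes those bits stand for the parities of n usable DCT coefficients, and it leaves out F5's coefficient decrementing, shrinkage handling, and permutation. Function names are hypothetical.

```python
from functools import reduce
from typing import List


def syndrome(bits: List[int]) -> int:
    """XOR of the 1-based positions of all set bits."""
    return reduce(lambda acc, pos: acc ^ pos,
                  (i for i, b in enumerate(bits, start=1) if b), 0)


def matrix_embed(bits: List[int], message: int, k: int) -> List[int]:
    """Hide a k-bit message in n = 2**k - 1 carrier bits, flipping at most one."""
    n = 2 ** k - 1
    if len(bits) != n or not 0 <= message < 2 ** k:
        raise ValueError("need exactly 2**k - 1 carrier bits and a k-bit message")
    out = list(bits)
    s = message ^ syndrome(out)   # position that must change (0 means none)
    if s:
        out[s - 1] ^= 1           # in F5, this flip would be an |coefficient| decrement
    return out


def matrix_extract(bits: List[int]) -> int:
    """The receiver recovers the message as the syndrome of the carrier bits."""
    return syndrome(bits)


carrier = [1, 0, 1, 1, 0, 0, 1]   # k = 3, so n = 7
stego = matrix_embed(carrier, 0b101, 3)
print(stego, matrix_extract(stego))  # [1, 0, 1, 0, 0, 0, 1] 5
```

Here 3 message bits cost at most one change in 7 coefficients, instead of up to 3 changes with direct LSB replacement; fewer changes mean fewer statistical traces.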
JPEG Tools • JSteg (http://zooid.org/~paul/crypto/jsteg) • 40-bit RC4 encryption • Channel capacity determination • LSB encoding in quantized DCT coefficients • Outguess (http://www.outguess.org/detection.php) • Preserves statistics based on frequency counts • Seed-based iterator available to choose embedding locations • Change-minimization calculation for each seed • Remains one of the most difficult tools to detect
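JSteg's LSB encoding in quantized DCT coefficients can be sketched as follows, using the commonly described rule that coefficients equal to 0 or 1 are skipped. The sketch assumes the coefficients are already available as a flat list of integers in scan order and omits the RC4 encryption and any message framing; `jsteg_embed` and `jsteg_extract` are hypothetical names. Overwriting LSBs swaps values within pairs such as 2/3 and -2/-1, which tends to equalize the histogram counts of each pair; this is the effect the chi-square (χ²) attack measures, and the kind of frequency-count statistic Outguess tries to preserve.

```python
from typing import List, Sequence

SKIP = (0, 1)  # coefficients left untouched (the commonly described JSteg rule)


def jsteg_embed(coeffs: Sequence[int], bits: Sequence[int]) -> List[int]:
    """Overwrite the LSB of each usable coefficient, in scan order."""
    usable = [i for i, c in enumerate(coeffs) if c not in SKIP]
    if len(bits) > len(usable):
        raise ValueError("message exceeds the number of usable coefficients")
    out = list(coeffs)
    for i, bit in zip(usable, bits):
        # Python ints behave as two's complement under & and |, so this maps
        # 2<->3, 4<->5, -2<->-1, ... and never produces a 0 or 1.
        out[i] = (out[i] & ~1) | bit
    return out


def jsteg_extract(coeffs: Sequence[int], n_bits: int) -> List[int]:
    """Read LSBs back from the same usable positions."""
    return [c & 1 for c in coeffs if c not in SKIP][:n_bits]


cover = [-3, 0, 1, 2, 5, -1, 0, 7, 4, 1, -6]
message = [1, 0, 1, 1, 0, 0, 1]
stego = jsteg_embed(cover, message)
print(stego)  # [-3, 0, 1, 2, 5, -1, 0, 6, 4, 1, -5]
```

Because embedding never creates or destroys a 0 or 1, the receiver visits exactly the same coefficients during extraction.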
PNG Data Set - Breakdown • Training • 4,000 total images in the PNG clean category • 4,731 total images in the PNG stego category
PNG Data Set - Breakdown • Testing • 2,993 total images in the PNG stego category
JPEG Data Set - Breakdown • Training • 29,185 total images in the JPEG stego category
JPEG Data Set - Breakdown • Testing • 4,596 total images in the JPEG stego category
Sample Usage: stegdetect • JPEG Training Set • Detected, C: correct algorithm detected • Detected, I: incorrect algorithm detected • Overall false-detection rate for the clean image set is 8.6%
Sample Usage: stegdetect • JPEG Testing Set • Overall false-detection rate for the clean image set is 8.0%
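The metrics reported for stegdetect, per-image outcomes on stego images (correct algorithm, incorrect algorithm, or missed) and the false-detection rate on clean images, can be computed as below. This does not parse stegdetect's actual output; it assumes results have already been reduced to hypothetical `(true_tool, detected_tool)` pairs, with `None` meaning clean or nothing detected, and the tool labels in the toy data are placeholders.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# Hypothetical record: (true_tool, detected_tool); None = clean / nothing detected.
Record = Tuple[Optional[str], Optional[str]]


def summarize(records: Sequence[Record]) -> dict:
    counts: Counter = Counter()
    for truth, detected in records:
        if truth is None:                 # clean image
            counts["clean"] += 1
            if detected is not None:      # any detection on a clean image is false
                counts["false_detect"] += 1
        elif detected is None:            # stego image, nothing found
            counts["missed"] += 1
        elif detected == truth:           # "Detected, C"
            counts["detected_C"] += 1
        else:                             # "Detected, I"
            counts["detected_I"] += 1
    clean = counts["clean"]
    return {
        "false_detect_rate": counts["false_detect"] / clean if clean else 0.0,
        "detected_C": counts["detected_C"],
        "detected_I": counts["detected_I"],
        "missed": counts["missed"],
    }


# Toy data (placeholder labels): 25 clean images, 3 of them falsely flagged.
records = ([(None, None)] * 22 + [(None, "jphide")] * 3
           + [("jsteg", "jsteg"), ("jsteg", "outguess"), ("f5", None)])
print(summarize(records))
```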
Sample Usage: stegdetect • Detailed results for JPHide Test Set
Sample Usage: stegdetect • Conclusions • Significant differences between the training and testing results • Weaker overall performance on the testing set • The testing set was deliberately designed to be more difficult • Stegdetect performs poorly for large embeddings (counterintuitive), as well as for small and tiny embeddings (expected)
The Unseen Challenge Data Sets • Lossy (JPEG) and Lossless (PNG) imagery • 3 tools for PNG set, 4 tools for JPEG set • 4 distinct embedding sizes for PNG, varying sizes for JPEG • Clean imagery across all sets
The Unseen Challenge Data Sets • Valid approaches for use: • Detection • Detection and recovery (size or content) • Detection and destruction • Fusion No standard data set exists for steg evaluation! This set is a step in that direction!
Download! http://www.liv.ic.unicamp.br/wvu/datasets.php