Effect of Linearization on Normalized Compression Distance

Effect of Linearization on Normalized Compression Distance Jonathan Mortensen Julia Wu DePaul University July 2009

Introduction • Kolmogorov Complexity is an emerging similarity metric • Transformation Distance • Universal Similarity Measure • Does not require feature identification and selection • How can it be applied to images? • CBIR, Classification • Investigate its effectiveness • Discovered some fundamentals have been overlooked thus far

Outline • Background • Kolmogorov Complexity and Complearn • Research Topics • Spatial Transformations • Intensity Transformations • Image Groupings • Conclusion • Future Work

Background • Li (2004): successful clustering of phylogeny trees, music, text files • 1D to 2D data? • Tran (2007): NCD not a good predictor of visual indistinguishability • Only one photograph used, one type of linearization (row-by-row) • Gondra (2008): CBIR using NCD produced statistically significant measures against H0 of random retrieval and other similarity measures • Test set of hundreds of images, inconsistent methods of compression and concatenation, linearization unclear

Kolmogorov Complexity • K(x): the length of the shortest program (or string x*) that produces x • K(x|y): the length of the shortest binary program that outputs x given input y • Information distance: E(x,y) = max{K(x|y), K(y|x)} • Normalized Information Distance: NID(x,y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}

Kolmogorov Complexity • Universal, in that it captures all other semi-computable normalized distance measures • Therefore also semi-computable • A lossless compressor shortens strings, so the compressed length C(x) is used as a computable approximation of K(x) • “The human brain is incapable of creating anything which is really complex.” (Kolmogorov, A.N., Statistical Science, 6, p. 314, 1990)
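Replacing K with a compressed length C gives the Normalized Compression Distance, NCD(x,y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}. The sketch below is a minimal illustration that assumes zlib as the compressor; the slides do not say which compressor CompLearn was configured with, and the helper names are hypothetical.

```python
import zlib

def compressed_size(data: bytes) -> int:
    # C(x): length in bytes of the zlib-compressed string (assumed compressor).
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Two byte strings with the same symbols in a different order.
a = bytes(range(256)) * 8
c = bytes((i * 7 + 3) % 256 for i in range(256)) * 8

print("NCD(a, a) =", ncd(a, a))  # a string compared with itself: small
print("NCD(a, c) =", ncd(a, c))  # unrelated ordering: larger
```

With a real compressor NCD(x,x) is small but usually not exactly 0, which is one reason NCD values are best read comparatively.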

CompLearn • Open source package that approximates Kolmogorov complexity through compression • Developed by Rudi Cilibrasi, Anna Lissa Cruz, Steven de Rooij, and Maarten Keijzer • Uses standard Linux compression tools to build the comparison map

Images from “Google Similar Images”

Initial Questions • Linearization methods and alternatives • How to preserve a 2D signal • Linearization’s effect on NCD under spatial transformations and intensity shifts • Do additional feature images lower NCD? • CBIR: Can K-Complexity be used with feature vectors or image semantics?

Spatial Transformations • Applied 4 types of linearization to 800 images (original and 7 transformations) • Found that each linearization type produced distinctly different NCDs • Certain linearizations result in lower NCDs for certain transformations

Linearization Methods • Row Major • Column Major • Hilbert-Peano SPC: images transformed to 128x128 • SCPO: images transformed to 35% of original size
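To make the difference between these orderings concrete, here is a small sketch of three of them on an image stored as a list of rows. It is an illustration only: the Hilbert traversal uses the standard index-to-coordinate conversion and assumes a square image whose side is a power of two (consistent with the 128x128 resizing above), it may not match the exact Hilbert-Peano variant used in the study, and SCPO is not sketched. All function names are hypothetical.

```python
def row_major(img):
    # Read pixels left to right, top to bottom.
    return [p for row in img for p in row]

def column_major(img):
    # Read pixels top to bottom, one column at a time.
    return [img[r][c] for c in range(len(img[0])) for r in range(len(img))]

def hilbert_d2xy(n, d):
    # Map position d along a Hilbert curve to (x, y) on an n x n grid,
    # where n is a power of two.
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate the quadrant
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert(img):
    # Read pixels in Hilbert-curve order (square, power-of-two side assumed).
    n = len(img)
    return [img[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]

tiny = [[1, 2],
        [3, 4]]
print(row_major(tiny), column_major(tiny), hilbert(tiny))
```

Row and column order keep neighbours along one axis adjacent in the string, while the Hilbert order keeps every consecutive pair of samples spatially adjacent, which is why the same image can compress differently under each.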

Spatial Transformations • Original Image • Down Shift • Left Shift • 180° Rotation • 90° Rotation • 270° Rotation • Reflection about the Y Axis • Reflection about the X Axis
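The seven transformations can be sketched on a list-of-rows image as follows. Two details are assumptions, since the slides do not state them: the shifts wrap around (circular), and the 90° rotation is taken clockwise.

```python
def shift_down(img, k):
    # Circular shift of rows by k (wrap-around is an assumption).
    k %= len(img)
    return img[-k:] + img[:-k]

def shift_left(img, k):
    # Circular shift of columns by k (wrap-around is an assumption).
    k %= len(img[0])
    return [row[k:] + row[:k] for row in img]

def rotate_90(img):
    # Clockwise quarter turn (direction is an assumption).
    return [list(r) for r in zip(*img[::-1])]

def rotate_180(img):
    return [row[::-1] for row in img[::-1]]

def rotate_270(img):
    # Three clockwise quarter turns, i.e. one counterclockwise turn.
    return [list(r) for r in zip(*img)][::-1]

def reflect_y_axis(img):
    # Mirror left to right.
    return [row[::-1] for row in img]

def reflect_x_axis(img):
    # Mirror top to bottom.
    return img[::-1]

tiny = [[1, 2],
        [3, 4]]
print(rotate_90(tiny), reflect_y_axis(tiny))
```

Note how the choice of linearization interacts with these: a reflection about the X axis only reorders whole rows in a row-major string, whereas a 90° rotation turns rows into columns.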

Intensity Transformations • Additive Constant • Three types of noise • Gaussian • Speckle • Salt and Pepper • Least Significant Bit (LSB) Steganography • Contrast Windowing

Additive Constant • P = Intensity + Constant • Constants: +4, +8, +12, … +100 • 16 bit: 255 (+4) -> 259 • Truncation: 255 (+4) -> 255 • Wrap: 255 (+4) -> 4 • Figure: Image 937.jpg with +32 and +64 respectively
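The three overflow policies can be sketched as below. This is an illustration with hypothetical names, and it assumes the wrap is ordinary 8-bit wrap-around (modulo 256), under which 255 + 4 becomes 3. The slide lists 4 for that case, which would correspond to wrapping modulo 255; the slides do not say which was used.

```python
def add_constant(pixels, constant, mode):
    # Add a constant to 8-bit intensities under one of three overflow policies.
    out = []
    for p in pixels:
        v = p + constant
        if mode == "16bit":
            pass                 # wider storage: keep the true sum
        elif mode == "truncate":
            v = min(v, 255)      # saturate at the 8-bit maximum
        elif mode == "wrap":
            v %= 256             # 8-bit wrap-around (assumed modulus)
        else:
            raise ValueError("unknown mode: " + mode)
        out.append(v)
    return out

pixels = [100, 255]
for mode in ("16bit", "truncate", "wrap"):
    print(mode, add_constant(pixels, 4, mode))
```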

Additive Constant

Various Noise • Gaussian (Statistical) • Speckle (Multiplicative) • Salt and Pepper (Drop-off) • Figure: 0.32 and 0.64 variance/noise density respectively
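A rough sketch of the three noise models follows, using intensities normalized to [0, 1]. The exact tool and parameterization used in the study are not given in the slides, so the choices here (zero-mean Gaussian noise for both the additive and multiplicative cases, equal salt and pepper probability, clipping to [0, 1]) are assumptions, and the function names are hypothetical.

```python
import random

def clip01(v):
    return max(0.0, min(1.0, v))

def gaussian_noise(pixels, variance, rng):
    # Additive: p + n, with n drawn from N(0, variance).
    sigma = variance ** 0.5
    return [clip01(p + rng.gauss(0.0, sigma)) for p in pixels]

def speckle_noise(pixels, variance, rng):
    # Multiplicative: p + p * n, so dark pixels are disturbed less.
    sigma = variance ** 0.5
    return [clip01(p + p * rng.gauss(0.0, sigma)) for p in pixels]

def salt_and_pepper(pixels, density, rng):
    # Replace a fraction `density` of pixels with pure white or pure black.
    out = []
    for p in pixels:
        if rng.random() < density:
            out.append(1.0 if rng.random() < 0.5 else 0.0)
        else:
            out.append(p)
    return out

rng = random.Random(0)  # fixed seed so runs are repeatable
flat = [0.5] * 8
print(gaussian_noise(flat, 0.32, rng))
print(salt_and_pepper(flat, 0.32, rng))
```

Gaussian and speckle noise perturb nearly every pixel with a fresh value, while salt and pepper leaves the untouched pixels exactly as they were.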

Noise, Continued • Gaussian and Speckle noise do not compress well • Gaussian and Salt and Pepper experience some posterior decay

Least Significant Bit Steganography • Hide4PGP • “Scrambles” message • Changes pixel bit to most similar color with opposite bit assignment • Spreads secret data over entire file • True Grayscale: Changes two bits per pixel • Figures: image with no text; image hiding the “Gettysburg Address”

LSB Steganography

Hamming Distance
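The slide does not spell out how Hamming distance was applied; one plausible reading is the number of differing bits between the cover image and the stego image. The sketch below illustrates that reading with a deliberately simple one-bit-per-pixel LSB embedding. It is not Hide4PGP's algorithm (which scrambles and spreads the message), and the function names are hypothetical.

```python
def embed_lsb(pixels, bits):
    # Overwrite the least significant bit of the first len(bits) pixels.
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover image")
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def hamming_distance(a, b):
    # Count the bit positions in which two equal-length pixel sequences differ.
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

cover = [200, 201, 202, 203]
stego = embed_lsb(cover, [1, 1, 0, 0])
print(stego, hamming_distance(cover, stego))
```

Only pixels whose lowest bit disagrees with the message bit change, so the Hamming distance is at most the message length in this simple scheme.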

Contrast Windowing • Computed Tomography image enhancement that increases contrast in certain structures • Brief Medical Exploration

Contrast Windowing • Lung Window (-200 HU, width 2000 HU) • Bone Window (300 HU, width 1500 HU) • Soft Tissue Window (50 HU, width 350 HU) • Figure: Patient 5, original image top left
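Windowing can be sketched as a linear mapping of a Hounsfield-unit range onto display gray levels, with everything outside the range clamped. The sketch assumes the first number in each window above is the window center (level) and that output is 8-bit; the slides state neither, and the names are hypothetical.

```python
# (center HU, width HU), read from the slide; "center" is an assumption.
WINDOWS = {
    "lung": (-200, 2000),
    "bone": (300, 1500),
    "soft_tissue": (50, 350),
}

def apply_window(hu_values, center, width, out_max=255):
    # Map [center - width/2, center + width/2] linearly onto [0, out_max].
    lo = center - width / 2
    hi = center + width / 2
    out = []
    for hu in hu_values:
        if hu <= lo:
            out.append(0)
        elif hu >= hi:
            out.append(out_max)
        else:
            out.append(round((hu - lo) / (hi - lo) * out_max))
    return out

samples = [-1000, -100, 40, 400, 1500]  # illustrative HU values
for name, (center, width) in WINDOWS.items():
    print(name, apply_window(samples, center, width))
```

A narrow window such as soft tissue sends most of the HU range to pure black or white, so large parts of the image become constant runs, while a wide window preserves more distinct values.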

P1 P3 P5

Cross Dicom Comparison

Conclusion: "How Many" vs "How Little" • NCD for Ordinal Comparisons • Numerical Redundancy • [Diagram labels: Selective, Entire Picture; Gaussian Noise, Speckle Noise, Salt and Pepper Noise, Steganography, Additive Constants, Contrast Windowing; Larger NCD, Smaller NCD]

Feature Image Comparison and Grouping • Feature Image: Pixel based values derived from the original image • 3 Main Types of Linearization • Avg NCD inter > Avg NCD intra • The greater inter - intra, the better NCD finds groupings
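The inter versus intra comparison can be sketched as two averages over pairwise distances. The helper below takes any distance function; in the study that role is played by NCD, but a toy numeric distance is used here so the arithmetic is easy to follow. The function names and the toy data are hypothetical.

```python
from itertools import combinations

def average(values):
    values = list(values)
    return sum(values) / len(values)

def intra_inter(groups, dist):
    # Average distance within groups, and average distance across groups.
    intra = [dist(a, b)
             for members in groups.values()
             for a, b in combinations(members, 2)]
    inter = [dist(a, b)
             for ga, gb in combinations(list(groups.values()), 2)
             for a in ga for b in gb]
    return average(intra), average(inter)

# Toy stand-in data: numbers instead of images, |a - b| instead of NCD.
groups = {"low": [1, 2, 3], "high": [10, 11, 12]}
intra, inter = intra_inter(groups, lambda a, b: abs(a - b))
print("intra:", intra, "inter:", inter, "gap:", inter - intra)
```

A larger gap between the two averages means the distance separates the predefined groups more cleanly.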

Feature Image Linearization • Image-At-Once – row-order one feature image at a time • Row Concatenation – Appends all images, then performs row-order linearization • Pixel Order – Selects value from same pixel of each feature image in row-order fashion • Gray Row-Major – Grayscales an image and follows row-order on intensities
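These orderings can be sketched for a list of equally sized feature images. Two readings are assumptions: "Row Concatenation" is taken to mean placing the images side by side before the row-order pass, and the grayscale conversion is a plain mean of R, G, B, since the slides do not give the formula. All names are hypothetical.

```python
def image_at_once(features):
    # Finish one feature image in row order before starting the next.
    return [p for img in features for row in img for p in row]

def row_concatenation(features):
    # Images placed side by side (assumed), then read in row order.
    height = len(features[0])
    return [p for r in range(height) for img in features for p in img[r]]

def pixel_order(features):
    # For each pixel position in row order, take that pixel from every image.
    height, width = len(features[0]), len(features[0][0])
    return [img[r][c]
            for r in range(height) for c in range(width) for img in features]

def gray_row_major(rgb_img):
    # Grayscale by averaging channels (assumed formula), then row order.
    return [round((r + g + b) / 3) for row in rgb_img for (r, g, b) in row]

f1 = [[1, 2], [3, 4]]
f2 = [[5, 6], [7, 8]]
print(image_at_once([f1, f2]))
print(row_concatenation([f1, f2]))
print(pixel_order([f1, f2]))
```

The three multi-image orderings contain the same values and differ only in how far apart corresponding pixels of different feature images end up in the string.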

Data Set and Methods • Corel Image Database with 10 predefined groupings • Linearized by 5 methods • NCDs were found within a group and then to the left and to the right

Results • Nearly every linearization produced statistically different NCDs • Intra-group NCD was always less than inter-group NCD • Gray provided the greatest inter - intra difference • This was first attributed to file size • Gray concatenated three times, giving an equal file size, produced an even greater difference

Conclusion • NCD is a good model for predefined human groupings and linearization has little impact on this • Gray-Triple Row-Major may be the best form of linearization • Direction of concatenation does not matter • Defined a methodology for any number of feature images

Conclusion • Compressor Errors • Numerical Redundancy • Ordinal Variables vs Nominal Variables • EX: 195 195 195 195 <=> 198 198 198 198 • NCD = 0.100000 • 199 199 199 199 <=> 202 202 202 202 • NCD = 0.128205 • NCD needs refinement • 2D image as a 1D string?
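The ordinal versus nominal point can be explored with a small experiment: a general-purpose compressor sees 195 and 198 as different symbols, not as nearby quantities. The sketch below assumes zlib and a text encoding of the values, neither of which is confirmed by the slides, so it will not reproduce the NCD figures quoted above; it only shows how to compare a numerically close pair with a numerically distant one.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    # NCD with zlib as the (assumed) compressor.
    size = lambda data: len(zlib.compress(data, 9))
    cx, cy, cxy = size(x), size(y), size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def constant_block(value: int, count: int = 1024) -> bytes:
    # A flat "image": the same intensity written out `count` times as text.
    return " ".join([str(value)] * count).encode("ascii")

near = ncd(constant_block(195), constant_block(198))  # intensities differ by 3
far = ncd(constant_block(195), constant_block(101))   # intensities differ by 94

print("NCD(195, 198) =", near)
print("NCD(195, 101) =", far)
```

If NCD tracked intensity as an ordinal quantity, the second value would be clearly larger than the first; how far it falls short of that is a direct view of the numerical redundancy problem.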

Future Work • Image Scaling and Normalization • Additional Feature Images • New Forms of Image concatenation • Investigate Compressors (Numeric?)

References • A. Itani and D. Manohar. Self-describing context-based pixel ordering. Lecture Notes in Computer Science, pages 124-134, 2002. • M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12), 2004. • R. Dafner, D. Cohen-Or, and Y. Matias. Context-based space filling curves. Computer Graphics Forum, volume 19, pages 209-218. Blackwell Publishers Ltd, 2000. • R. Cilibrasi, A. L. Cruz, S. de Rooij, and M. Keijzer. CompLearn home. http://www.complearn.org/. • R. Cilibrasi, P. Vitányi, and R. de Wolf. Algorithmic clustering of music. arXiv preprint cs.SD/0303025, 2003. • N. Tran. The normalized compression distance and image distinguishability. Proceedings of SPIE, 6492:64921D, 2007. • I. Gondra and D. R. Heisterkamp. Content-based image retrieval with the normalized information distance. Computer Vision and Image Understanding, 111(2):219-228, 2008.

Questions
