
Effect of Linearization on Normalized Compression Distance

Jonathan Mortensen and Julia Wu, DePaul University, July 2009.


Presentation Transcript


  1. Effect of Linearization on Normalized Compression Distance Jonathan Mortensen Julia Wu DePaul University July 2009

  2. Introduction • Kolmogorov complexity underlies an emerging similarity metric • Transformation distance • Universal similarity measure • Does not require feature identification and selection • How can it be applied to images? • CBIR, classification • Investigate its effectiveness • We found that some fundamentals have been overlooked so far

  3. Outline • Background • Kolmogorov Complexity and CompLearn • Research Topics • Spatial Transformations • Intensity Transformations • Image Groupings • Conclusion • Future Work

  4. Background • Li (2004): successful clustering of phylogenetic trees, music, and text files • Can this extend from 1D to 2D data? • Tran (2007): NCD is not a good predictor of visual indistinguishability • But only one photograph and one type of linearization (row-by-row) were used • Gondra (2008): CBIR using NCD produced statistically significant results against the null hypothesis (H0) of random retrieval and against other similarity measures • But: a test set of hundreds of images, inconsistent compression and concatenation methods, and an unclear linearization

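  5. Kolmogorov Complexity • $K(x)$: the length of the shortest program $x^*$ that outputs $x$ • $K(x|y)$: the length of the shortest binary program that outputs $x$ given input $y$ • Information distance: $E(x,y) = \max\{K(x|y), K(y|x)\}$ • Normalized Information Distance: $\mathrm{NID}(x,y) = \frac{\max\{K(x|y), K(y|x)\}}{\max\{K(x), K(y)\}}$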

  6. Kolmogorov Complexity • Universal: the normalized information distance minorizes every upper semi-computable normalized distance, so it captures the similarities all of those measures detect • It is itself only upper semi-computable, so it cannot be computed exactly • Lossless compression gives a computable approximation, C(x) • "The human brain is incapable of creating anything which is really complex." (A. N. Kolmogorov, Statistical Science, 6, p. 314, 1990)
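A minimal sketch of how a real compressor stands in for K(x), using the standard NCD formula NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}. Python's `zlib` and `bz2` modules are assumptions chosen for illustration; the function names are hypothetical and are not CompLearn's API.

```python
import bz2
import zlib


def compressed_size(data: bytes, compressor: str = "zlib") -> int:
    """C(x): length in bytes of the losslessly compressed input."""
    if compressor == "zlib":
        return len(zlib.compress(data, 9))
    if compressor == "bz2":
        return len(bz2.compress(data, 9))
    raise ValueError(f"unknown compressor: {compressor}")


def ncd(x: bytes, y: bytes, compressor: str = "zlib") -> float:
    """NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}."""
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    fox = b"the quick brown fox jumps over the lazy dog " * 20
    cat = b"the quick brown fox jumps over the lazy cat " * 20
    ramp = bytes(range(256)) * 4
    print(f"NCD(fox, cat)  = {ncd(fox, cat):.3f}")   # near-duplicates: small distance
    print(f"NCD(fox, ramp) = {ncd(fox, ramp):.3f}")  # unrelated data: close to 1
```

Because C only upper-bounds K, the values depend on the compressor; a weaker compressor tends to inflate every distance.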

  7. CompLearn • Open-source package that approximates Kolmogorov complexity through compression (NCD) • Developed by Rudi Cilibrasi, Anna Lissa Cruz, Steven de Rooij, and Maarten Keijzer • Uses standard Linux compression tools to build the pairwise distance matrix

  8. Images from “Google Similar Images”

  9. Initial Questions • Linearization methods and alternatives • How can a 2D signal be preserved? • How does linearization affect NCD under spatial transformations and intensity shifts? • Do additional feature images lower NCD? • CBIR: Can Kolmogorov complexity be used with feature vectors or image semantics?

  10. Spatial Transformations • Applied 4 types of linearization to 800 images (original and 7 transformations) • Found that each linearization type produced distinctly different NCDs • Certain linearizations result in lower NCDs for certain transformations

  11. Linearization Methods • Row Major • Column Major • Hilbert-Peano SPC: images resized to 128x128 • SCPO: images resized to 35% of original size
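The row-major, column-major and Hilbert-Peano orderings can be sketched as follows for a grayscale image stored as a list of rows (a hypothetical representation chosen for illustration). The Hilbert mapping is the common iterative index-to-coordinate conversion, which only works on a square power-of-two grid; that is consistent with resizing to 128x128. SCPO is context-dependent and is not sketched here.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as a list of rows


def row_major(img: Image) -> List[int]:
    """Read rows top to bottom, each row left to right."""
    return [p for row in img for p in row]


def column_major(img: Image) -> List[int]:
    """Read columns left to right, each column top to bottom."""
    return [img[r][c] for c in range(len(img[0])) for r in range(len(img))]


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map index d along the Hilbert curve to (x, y) on an n x n grid (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the sub-square so the curve stays continuous
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(img: Image) -> List[int]:
    """Read pixels in Hilbert-curve order; requires a square power-of-two image."""
    n = len(img)
    if any(len(row) != n for row in img) or n & (n - 1):
        raise ValueError("Hilbert ordering needs a square, power-of-two image")
    return [img[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]


if __name__ == "__main__":
    img = [[4 * r + c for c in range(4)] for r in range(4)]
    print(row_major(img))
    print(column_major(img))
    print(hilbert_order(img))
```

Unlike row-major order, consecutive Hilbert positions are always neighbouring pixels, so 2D locality is better preserved in the 1D string.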

  12. Spatial Transformations • Original image • Down shift • Left shift • 180° rotation • 90° rotation • 270° rotation • Reflection about the Y axis • Reflection about the X axis
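The seven spatial transformations could be produced as below. Two assumptions are labelled in the code: shifts wrap pixels around the border (the slides do not say whether shifted-out pixels wrap or are padded), and rotations are clockwise. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as a list of rows


def shift_down(img: Image, k: int) -> Image:
    """Move rows down by k; bottom rows wrap to the top (circular shift assumed)."""
    h = len(img)
    return [img[(r - k) % h][:] for r in range(h)]


def shift_left(img: Image, k: int) -> Image:
    """Move columns left by k; left columns wrap to the right (circular shift assumed)."""
    w = len(img[0])
    return [[row[(c + k) % w] for c in range(w)] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate 90 degrees clockwise (direction assumed)."""
    return [list(col) for col in zip(*img[::-1])]


def rotate_180(img: Image) -> Image:
    """Rotate 180 degrees."""
    return [row[::-1] for row in img[::-1]]


def rotate_270(img: Image) -> Image:
    """Rotate 270 degrees clockwise, i.e. 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*img)][::-1]


def reflect_y_axis(img: Image) -> Image:
    """Mirror left-right across the vertical (Y) axis."""
    return [row[::-1] for row in img]


def reflect_x_axis(img: Image) -> Image:
    """Mirror top-bottom across the horizontal (X) axis."""
    return [row[:] for row in img[::-1]]


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for f in (rotate_90, rotate_180, rotate_270, reflect_y_axis, reflect_x_axis):
        print(f.__name__, f(img))
    print("shift_down", shift_down(img, 1), "shift_left", shift_left(img, 1))
```

Under row-major linearization a Y-axis reflection reverses every row, while a 90° rotation turns rows into columns, which is one reason different linearizations react differently to the same transformation.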

  13. Intensity Transformations • Additive Constant • Three types of noise • Gaussian • Speckle • Salt and Pepper • Least Significant Bit (LSB) Steganography • Contrast Windowing

  14. Additive Constant • Example: image 937.jpg with +32 and +64, respectively • P = intensity + constant, for constants +4, +8, +12, …, +100 • Overflow handling: • 16-bit: 255 (+4) → 259 • Truncation: 255 (+4) → 255 • Wrap: 255 (+4) → 4
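A sketch of the three overflow policies for P = intensity + constant on 8-bit pixels; `add_constant` and `to_bytes` are hypothetical helper names. Wrap is assumed here to be ordinary modulo-256 overflow, under which 255 + 4 gives 3, so the slide's example value of 4 may reflect a different wrapping convention. In the 16-bit policy each pixel needs two bytes, which also changes what the compressor sees.

```python
from typing import List


def add_constant(pixels: List[int], constant: int, mode: str) -> List[int]:
    """P = intensity + constant, with one of three overflow policies."""
    out = []
    for p in pixels:
        v = p + constant
        if mode == "16bit":
            out.append(v)                    # keep the true sum; needs wider storage
        elif mode == "truncate":
            out.append(min(max(v, 0), 255))  # clip to the 8-bit range
        elif mode == "wrap":
            out.append(v % 256)              # assumed: modulo-256 overflow
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


def to_bytes(pixels: List[int], mode: str) -> bytes:
    """Serialize for the compressor: 2 bytes per pixel in 16-bit mode, else 1."""
    if mode == "16bit":
        return b"".join(v.to_bytes(2, "big") for v in pixels)
    return bytes(pixels)


if __name__ == "__main__":
    row = [250, 253, 255]
    for mode in ("16bit", "truncate", "wrap"):
        print(mode, add_constant(row, 4, mode))
```

The sweep on the slide would correspond to calling `add_constant` for each constant in `range(4, 101, 4)`.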

  15. Additive Constant

  16. Various Noise • Gaussian (statistical, additive) • Speckle (multiplicative) • Salt and pepper (drop-off) • Examples: variance/noise density of 0.32 and 0.64, respectively
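The three noise models can be sketched on intensities scaled to [0, 1], an assumption consistent with variances such as 0.32. Speckle is modelled as p + n·p with n uniform, zero mean and the given variance, and salt-and-pepper replaces a fraction `density` of pixels with 0 or 1. These follow common conventions and may not match the exact tool used in the study; the function names are hypothetical.

```python
import math
import random
from typing import List


def _clip(v: float) -> float:
    """Keep intensities inside the assumed [0, 1] range."""
    return min(1.0, max(0.0, v))


def gaussian_noise(pixels: List[float], variance: float, rng: random.Random) -> List[float]:
    """Additive zero-mean Gaussian noise with the given variance."""
    sd = math.sqrt(variance)
    return [_clip(p + rng.gauss(0.0, sd)) for p in pixels]


def speckle_noise(pixels: List[float], variance: float, rng: random.Random) -> List[float]:
    """Multiplicative noise p + n * p, with n uniform, zero mean, given variance."""
    a = math.sqrt(3.0 * variance)  # Uniform(-a, a) has variance a**2 / 3
    return [_clip(p + p * rng.uniform(-a, a)) for p in pixels]


def salt_and_pepper(pixels: List[float], density: float, rng: random.Random) -> List[float]:
    """With probability `density`, replace a pixel by 0 (pepper) or 1 (salt)."""
    return [rng.choice((0.0, 1.0)) if rng.random() < density else p for p in pixels]


if __name__ == "__main__":
    rng = random.Random(2009)  # fixed seed so runs are repeatable
    row = [0.2, 0.4, 0.6, 0.8]
    for level in (0.32, 0.64):
        print(level, gaussian_noise(row, level, rng), speckle_noise(row, level, rng),
              salt_and_pepper(row, level, rng))
```

Gaussian and speckle noise alter nearly every pixel with a fresh random value, whereas salt-and-pepper leaves most pixels untouched, which fits the observation that the first two compress poorly.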

  17. Noise, cont. • Gaussian and speckle noise do not compress well • Gaussian and salt-and-pepper noise show some posterior decay

  18. Least Significant Bit Steganography • Tool: Hide4PGP • "Scrambles" the message • Changes a pixel to the most similar color with the opposite bit assignment • Spreads the secret data over the entire file • True grayscale: changes two bits per pixel • Examples: image with no text; image hiding the "Gettysburg Address"
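The sketch below is plain sequential LSB embedding, not Hide4PGP's algorithm (which scrambles the message, picks the most similar color and spreads data over the file). It only illustrates why LSB hiding changes each pixel value by at most 1, and uses a bitwise Hamming distance to count how many bits differ between cover and stego image. All names are hypothetical.

```python
from typing import List


def embed_lsb(cover: List[int], message: bytes) -> List[int]:
    """Write message bits, most significant first, into the LSBs of successive pixels."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("message is too long for this cover image")
    stego = list(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(stego: List[int], n_bytes: int) -> bytes:
    """Read n_bytes back from the LSBs of the first 8 * n_bytes pixels."""
    out = bytearray()
    for k in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (stego[8 * k + i] & 1)
        out.append(value)
    return bytes(out)


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Number of differing bits between two equal-length pixel sequences."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    cover = [(i * 37) % 256 for i in range(400)]
    secret = b"Four score and seven years ago"
    stego = embed_lsb(cover, secret)
    print(extract_lsb(stego, len(secret)))
    print("bits changed:", hamming_distance(cover, stego))
```

Since only LSBs change, the Hamming distance is at most the number of embedded bits, even though the byte strings seen by the compressor are no longer identical.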

  19. LSB Steganography

  20. Hamming Distance

  21. Contrast Windowing • Computed Tomography image enhancement that increases contrast in certain structures • Brief Medical Exploration

  22. Contrast Windowing • Patient 5: original image (top left) • Lung window (-200 HU, width 2000 HU) • Bone window (300 HU, width 1500 HU) • Soft-tissue window (50 HU, width 350 HU)
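Contrast windowing can be sketched as a linear map from a Hounsfield-unit interval onto 8-bit gray levels, assuming the first number on the slide is the window level (center) and the second its width. Values below level - width/2 become black and values above level + width/2 become white. The function and dictionary names are illustrative.

```python
from typing import Dict, List, Tuple

# (level, width) in Hounsfield units, as listed on the slide
WINDOWS: Dict[str, Tuple[float, float]] = {
    "lung": (-200, 2000),
    "bone": (300, 1500),
    "soft tissue": (50, 350),
}


def apply_window(hu_values: List[float], level: float, width: float) -> List[int]:
    """Map HU values in [level - width/2, level + width/2] linearly onto 0..255."""
    lower = level - width / 2
    upper = level + width / 2
    out = []
    for hu in hu_values:
        if hu <= lower:
            out.append(0)      # below the window: black
        elif hu >= upper:
            out.append(255)    # above the window: white
        else:
            out.append(round((hu - lower) / width * 255))
    return out


if __name__ == "__main__":
    sample = [-1000, -500, 0, 50, 400, 1200]  # illustrative HU values
    for name, (level, width) in WINDOWS.items():
        print(name, apply_window(sample, level, width))
```

A narrow window such as the soft-tissue one saturates most of the image to pure black or white, which produces long runs of identical bytes and therefore very compressible input.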

  23. P1 P3 P5

  24. Cross-DICOM Comparison

  25. Conclusion: "How Many" vs "How Little" • NCD for ordinal comparisons • Numerical redundancy • [Diagram on a scale from larger NCD to smaller NCD, grouping changes as selective vs. entire picture: Gaussian noise, speckle noise, salt-and-pepper noise, steganography, additive constants, contrast windowing]

  26. Feature Image Comparison and Grouping • Feature image: pixel-based values derived from the original image • 3 main types of linearization • Average inter-group NCD > average intra-group NCD • The greater the inter - intra difference, the better NCD finds the groupings
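A sketch of the intra/inter comparison: average NCD over pairs within the same group versus pairs from different groups, with zlib as a stand-in compressor. It simply compares every pair, a simplification of the study's pairing scheme, and the toy groups are made-up byte strings rather than Corel images. All names are hypothetical.

```python
import zlib
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple


def csize(data: bytes) -> int:
    """Compressed length, used as C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def intra_inter_ncd(groups: Dict[str, List[bytes]]) -> Tuple[float, float]:
    """Average NCD within groups (intra) and across groups (inter), over all pairs."""
    labelled = [(label, item) for label, items in groups.items() for item in items]
    intra, inter = [], []
    for (la, a), (lb, b) in combinations(labelled, 2):
        (intra if la == lb else inter).append(ncd(a, b))
    return mean(intra), mean(inter)


if __name__ == "__main__":
    groups = {
        "apples": [b"red apple " * 50, b"red apples " * 50],
        "ramps": [bytes(range(256)) * 4, bytes(range(256)) * 3 + b"xyz"],
    }
    intra, inter = intra_inter_ncd(groups)
    print(f"intra = {intra:.3f}, inter = {inter:.3f}, inter - intra = {inter - intra:.3f}")
```

A larger `inter - intra` gap means the compressor separates the predefined groups more clearly.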

  27. Feature Image Linearization • Image-At-Once – row-order one feature image at a time • Row Concatenation – Appends all images, then performs row-order linearization • Pixel Order – Selects value from same pixel of each feature image in row-order fashion • Gray Row-Major – Grayscales an image and follows row-order on intensities
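The four orderings can be sketched for equally sized feature images stored as lists of rows. Two readings are assumptions: Row Concatenation is taken to place the feature images side by side before reading row by row (otherwise it would coincide with Image-At-Once), and grayscale conversion uses ITU-R BT.601 luma weights. `gray_triple` shows one plausible reading of the "triple concatenated gray" variant; all names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # one feature image as a list of rows


def image_at_once(features: List[Image]) -> List[int]:
    """Each feature image in row-major order, one after another."""
    return [p for img in features for row in img for p in row]


def row_concatenation(features: List[Image]) -> List[int]:
    """Images placed side by side (assumed), then read row by row."""
    h = len(features[0])
    return [p for r in range(h) for img in features for p in img[r]]


def pixel_order(features: List[Image]) -> List[int]:
    """For each pixel in row-major order, emit that pixel from every feature image."""
    h, w = len(features[0]), len(features[0][0])
    return [img[r][c] for r in range(h) for c in range(w) for img in features]


def gray_row_major(red: Image, green: Image, blue: Image) -> List[int]:
    """Grayscale with BT.601 weights (assumed), then row-major order."""
    return [
        round(0.299 * r + 0.587 * g + 0.114 * b)
        for rr, gr, br in zip(red, green, blue)
        for r, g, b in zip(rr, gr, br)
    ]


def gray_triple(red: Image, green: Image, blue: Image) -> List[int]:
    """Gray sequence repeated three times, matching the length of 3-channel orderings."""
    return gray_row_major(red, green, blue) * 3


if __name__ == "__main__":
    feats = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(image_at_once(feats), row_concatenation(feats), pixel_order(feats))
```

All three multi-image orderings contain the same values; only their order differs, which is exactly the variable a compressor-based distance is sensitive to.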

  28. Data Set and Methods • Corel image database with 10 predefined groupings • Linearized by 5 methods • NCDs were computed within each group, then against the groups to its left and right

  29. Results • Nearly every linearization produced statistically different NCDs • Intra-group NCD was always less than inter-group NCD • Gray gave the greatest inter - intra difference • We suspected this was due to file size • Gray concatenated three times to equalize file size: found an even greater difference

  30. Conclusion • NCD is a good model for predefined human groupings and linearization has little impact on this • Gray-Triple Row-Major may be the best form of linearization • Direction of concatenation does not matter • Defined a methodology for any number of feature images

  31. Conclusion • Compressor errors • Numerical redundancy • Ordinal vs. nominal variables • Example: 195 195 195 195 <=> 198 198 198 198: NCD = 0.100000 • 199 199 199 199 <=> 202 202 202 202: NCD = 0.128205 • NCD needs refinement • Should a 2D image be treated as a 1D string?
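The ordinal-versus-nominal problem can be reproduced in spirit by compressing runs of decimal numbers written as text. zlib is an assumption (the study's compressor and file layout are not given here), so it will not reproduce the slide's exact values; the sketch only shows the mechanism. Both pairs differ numerically by 3, yet the compressor sees only different digit strings, so nothing forces their NCDs to be equal. The helper names are hypothetical.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    cx, cy = len(zlib.compress(x, 9)), len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_of_values(a: int, b: int, repeat: int = 4) -> float:
    """NCD between two runs of decimal numbers written as text, e.g. '195 195 195 195'."""
    x = " ".join([str(a)] * repeat).encode("ascii")
    y = " ".join([str(b)] * repeat).encode("ascii")
    return ncd(x, y)


if __name__ == "__main__":
    for a, b in ((195, 198), (199, 202)):
        # same numeric gap, but the compressor only compares digit characters
        print(f"{a} vs {b}: numeric gap = {b - a}, NCD = {ncd_of_values(a, b):.6f}")
```

A numeric-aware compressor, or storing differences between values instead of raw digits, would be one way to make NCD respect ordinal structure.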

  32. Future Work • Image Scaling and Normalization • Additional Feature Images • New Forms of Image concatenation • Investigate Compressors (Numeric?)

  33. References • A. Itani and D. Manohar. Self-describing context-based pixel ordering. Lecture Notes in Computer Science, pages 124–134, 2002. • M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12), 2004. • R. Dafner, D. Cohen-Or, and Y. Matias. Context-based space filling curves. In Computer Graphics Forum, volume 19, pages 209–218. Blackwell Publishers Ltd, 2000. • R. Cilibrasi, A. L. Cruz, S. de Rooij, and M. Keijzer. CompLearn home. http://www.complearn.org/. • R. Cilibrasi, P. Vitányi, and R. de Wolf. Algorithmic clustering of music. arXiv preprint cs.SD/0303025, 2003. • N. Tran. The normalized compression distance and image distinguishability. Proceedings of SPIE, 6492:64921D, 2007. • I. Gondra and D. R. Heisterkamp. Content-based image retrieval with the normalized information distance. Computer Vision and Image Understanding, 111(2):219–228, 2008.

  34. Questions
