Binary Image Compression via Monochromatic Pattern Substitution: A Sequential Speed-Up. Luigi Cinque and Sergio De Agostino, Computer Science Department, Sapienza University of Rome, Italy. Luca Lombardi, Computer Science Department, University of Pavia, Italy.
The Binary Image Compression Scheme The image is read by a raster scan. If the 4 x 4 subarray in position (i,j) is monochromatic, then we compress the largest monochromatic rectangle in that position; otherwise the 4 x 4 subarray is left uncompressed. The positions covered by the detected monochromatic rectangles and by the non-monochromatic 4 x 4 subarrays are skipped in the linear scan of the image.
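The raster-scan parsing described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `largest_mono_rect` and `parse`, the item tuple layout, the row-by-row greedy search for the largest rectangle, and the clipping of 4 x 4 subarrays at the image border are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # 0 = white pixel, 1 = black pixel
# (kind, row, column, width, height, color); color is None for raw blocks
Item = Tuple[str, int, int, int, int, Optional[int]]


def largest_mono_rect(img: Image, i: int, j: int) -> Tuple[int, int]:
    """Return (width, height) of the largest-area monochromatic rectangle
    with top-left corner (i, j), found by an assumed greedy row-by-row search."""
    rows, cols = len(img), len(img[0])
    color = img[i][j]
    best_w, best_h = 0, 0
    width = cols - j  # the candidate width can only shrink as rows are added
    for r in range(i, rows):
        # Measure the monochromatic run in row r, bounded by the current width.
        w = 0
        while w < width and img[r][j + w] == color:
            w += 1
        if w == 0:
            break
        width = w
        height = r - i + 1
        if width * height > best_w * best_h:
            best_w, best_h = width, height
    return best_w, best_h


def parse(img: Image) -> List[Item]:
    """Raster-scan the image, emitting monochromatic rectangles or raw blocks
    and skipping every position that is already covered."""
    rows, cols = len(img), len(img[0])
    covered = [[False] * cols for _ in range(rows)]
    items: List[Item] = []
    for i in range(rows):
        for j in range(cols):
            if covered[i][j]:
                continue
            # 4 x 4 subarray at (i, j), clipped at the border (assumption).
            h4, w4 = min(4, rows - i), min(4, cols - j)
            block = [img[r][c] for r in range(i, i + h4) for c in range(j, j + w4)]
            if len(set(block)) == 1:
                w, h = largest_mono_rect(img, i, j)
                items.append(("rect", i, j, w, h, img[i][j]))
            else:
                w, h = w4, h4
                items.append(("raw", i, j, w, h, None))
            for r in range(i, i + h):
                for c in range(j, j + w):
                    covered[r][c] = True
    return items
```

For instance, an 8 x 8 image whose top-left 4 x 4 corner is a checkerboard and whose remaining pixels are white is parsed into one raw block and two white rectangles; the two rectangles overlap in the bottom-right quadrant, since a rectangle may extend over pixels that an earlier one already covers.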
The Binary Image Compression Scheme The encoding scheme starts each coding item with a flag field indicating whether there is a monochromatic rectangle (0 for white, 10 for black) or raw data (11). If the flag field is 11, then 16 uncompressed bits follow; otherwise the width and the length of the monochromatic rectangle must be encoded. A variable-length coding technique is used, which has the same compression effectiveness as the block matching method (Storer and Helfgott [97], The Computer Journal) and is suitable for implementation on large-scale parallel systems.
The Coding Technique Twelve bits are a practical upper bound for encoding the width or the length of a rectangle. Either 12, 8 or 4 bits are used to encode the width, and likewise the length, which defines 3 x 3 = 9 classes of rectangles. Therefore, the flag fields 0 and 10 are followed by a second flag field indicating one of the nine classes. We can partition an image into a thousand blocks and apply compression via monochromatic pattern substitution to each block independently with no relevant loss of effectiveness.
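A coding item under this scheme might be assembled as in the sketch below. The flag fields (0, 10, 11), the 16 raw bits, and the 4-, 8- or 12-bit fields come from the slides; the rest is assumed for illustration, since the slides do not specify it: the second flag field is written as a fixed 4-bit class index, widths and lengths are stored as plain binary values with the most significant bit first, and the helper names `bits_needed`, `encode_rect` and `encode_raw` are hypothetical.

```python
from typing import List


def bits_needed(value: int) -> int:
    """Smallest field size among 4, 8 and 12 bits that can hold `value`."""
    for nbits in (4, 8, 12):
        if 0 <= value < (1 << nbits):
            return nbits
    raise ValueError("dimension does not fit in 12 bits")


def encode_rect(color: int, width: int, length: int) -> str:
    """Encode a monochromatic rectangle as a string of '0'/'1' characters."""
    flag = "0" if color == 0 else "10"  # 0 = white, 10 = black
    wb, lb = bits_needed(width), bits_needed(length)
    # Assumed layout of the second flag field: a 4-bit index in 0..8,
    # combining the width class and the length class.
    class_index = 3 * (wb // 4 - 1) + (lb // 4 - 1)
    return (flag
            + format(class_index, "04b")
            + format(width, "0{}b".format(wb))
            + format(length, "0{}b".format(lb)))


def encode_raw(block: List[int]) -> str:
    """Encode a non-monochromatic 4 x 4 subarray: flag 11, then 16 raw bits."""
    if len(block) != 16:
        raise ValueError("a raw block must contain exactly 16 pixels")
    return "11" + "".join(str(bit) for bit in block)
```

Under these assumptions a white 8 x 8 rectangle costs 1 + 4 + 4 + 4 = 13 bits, while a raw block always costs 18 bits.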
Experimental Results on the Speed-Up In order to implement decompression on an array of processors with distributed memory and no interconnections, we indicate the end of the encoding of a block with 111, changing the flag field 11 to 110. We obtained the expected speed-up of the compression and decompression times, achieving parallel running times about twenty times faster than the sequential ones with up to 32 processors of a machine with 256 Intel Xeon 3.06 GHz processors (avogadro.cilea.it) on a test set of large (4096 x 4096 pixels) binary images.
Compression Efficiency The monochromatic pattern substitution technique requires O(M log M) time to compute a rectangle of size M, and the worst-case sequential time is Ω(n log M) for an image of size n. The technique has the same complexity as the block matching method, but it is about twice as fast in practice. In conclusion, the monochromatic pattern substitution method is more scalable and more efficient.
Worst Case Running time ≈ M(1 + 1/2 + 1/3 + … + 1/M) = Θ(M log M)
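The step from the sum to the Θ(M log M) bound is the standard estimate of the harmonic number. Writing T(M) for the worst-case running time and H_M for the M-th harmonic number (both symbols are introduced here for convenience and do not appear on the slide):

```latex
\[
T(M) \approx M \sum_{k=1}^{M} \frac{1}{k} = M\,H_M,
\qquad
\ln(M+1) \le H_M \le 1 + \ln M
\;\Longrightarrow\;
T(M) = \Theta(M \log M).
\]
```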
Waste Factor The waste factor is the average number of rectangles covering one pixel in the parsing of the image. On realistic data, it is conjectured that the waste factor is always less than two. It follows that the sequential time of the monochromatic pattern substitution technique is O(n log M). In practice, the time is linear, as for the square block matching technique. The waste factor decreases to 1 when the parallel procedure is applied to an image partitioned into up to 256 blocks.
Waste Factor The same behavior is observed on the CCITT test image set (1728 x 2376 pixels) and on the set of 4096 x 4096 pixel images.
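As a rough illustration of the definition, the waste factor of a parsing can be computed as the total area of the parsed items divided by the number of pixels. The sketch assumes that every pixel is covered by at least one item and that raw 4 x 4 subarrays count alongside the rectangles; the function name `waste_factor` and the (width, height) input format are hypothetical.

```python
from typing import List, Tuple


def waste_factor(items: List[Tuple[int, int]], rows: int, cols: int) -> float:
    """Average number of parsed items covering one pixel.

    `items` holds one (width, height) pair per parsed item. Assuming every
    pixel is covered at least once, the average coverage per pixel equals
    the summed item areas divided by the image size n = rows * cols.
    """
    total_area = sum(width * height for width, height in items)
    return total_area / (rows * cols)
```

For an 8 x 8 image parsed into a 4 x 4 raw block, a 4 x 8 rectangle and an 8 x 4 rectangle, the summed area is 80 over 64 pixels, giving a waste factor of 1.25; a parsing without overlaps gives exactly 1.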
Sequential Speed-up [Tables: average running times on the 4096 x 4096 pixel images (cs.); average running times on the CCITT set (cs.)]
Speeding up Parallel Computation The CCITT times were obtained with a single core of a quad-core CPU (Intel Core 2 Quad Q9300, 2.5 GHz). Since the sequential speed-up is also obtained with smaller partitions, it can be applied to parallel computation as well. Using the full power of the quad-core, the compression and decompression times are lowered to 1.5 and 0.6 cs., respectively, with partitions of 64 blocks.
Speeding up Parallel Computation The times for the 4096 x 4096 pixel images were obtained with a single core of the 256-processor machine avogadro.cilea.it. Using 16 processors, we lowered the compression and decompression times to 3 and 1 cs., respectively, with partitions of 16 blocks (5 and 3 cs. are the times obtained for compression and decompression with no sequential speed-up). Generally speaking, such a sequential speed-up can be applied to small-scale parallel systems.
Future work As future work, we wish to implement compression and decompression via monochromatic pattern substitution on a graphics processing unit (GPU). With such a device, we will have more cores available for experiments. If monochromatic pattern substitution can be realized on a general-purpose GPU, it will be straightforward to observe the effects of the sequential speed-up presented here.