Lossless Image Compression by Block Matching on Practical Massively Parallel Architectures. Luigi Cinque and Sergio De Agostino, Computer Science Department, Sapienza University of Rome, Italy.
Lossless image compression by block matching is an extension of the LZ1 method to bi-level images. A square greedy matching heuristic using a simple perfect hashing scheme provides a linear-time implementation (Storer [1996], Proc. IEEE Data Compression Conf.). A slower rectangle greedy matching technique requires O(M log M) time to compute a match, where M is the size of the match, and the worst-case sequential time is Ω(n log M) for an image of size n (Storer and Helfgott [1997], The Computer Journal). The image is scanned in some linear order and the window is unrestricted.
The Compression Scheme The image is read by a raster scan and the matching algorithm works with a perfect hashing table with one position for each possible 4x4 subarray. All-zero and all-one rectangles are handled differently. The encoding scheme starts each pointer with a flag field indicating whether there is a monochromatic rectangle (0 for white, 10 for black), a match (110) or raw data (111).
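The flag fields listed above form a prefix-free code, so a decoder can recognize the kind of pointer by reading bits from left to right. The following is a minimal sketch of that step only, for the sequential encoding; the names FLAGS and read_flag are hypothetical, and the fields that follow the flag (position, width, length, raw bits) are not modeled.

```python
# Flag prefixes as stated in the text: 0 = white rectangle, 10 = black
# rectangle, 110 = match, 111 = raw data. Names are illustrative only.
FLAGS = {"white": "0", "black": "10", "match": "110", "raw": "111"}


def read_flag(bits, pos=0):
    """Return (kind, next_pos) for the flag field starting at bits[pos]."""
    for kind, code in FLAGS.items():
        # The code is prefix-free, so at most one entry can match here.
        if bits.startswith(code, pos):
            return kind, pos + len(code)
    raise ValueError("truncated or invalid flag field")


kind, nxt = read_flag("110")   # kind == "match", nxt == 3
```

Because no flag is a prefix of another, the order in which the table is searched does not affect the result.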
If the 4 x 4 subarray in position (i,j) is monochromatic, then we compute the largest monochromatic rectangle in that position and encode the width and the length. Otherwise we compute the largest rectangular match in the position provided by the hash table, encode its position, width and length, and update the table with the current position. If the subarray is not hashed, then it is left uncompressed and added to the hash table with its current position. The positions covered by matches are skipped in the linear scan of the image.
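Two ingredients of the step described above can be sketched in a few lines: a perfect hash key for a 4 x 4 subarray, and a greedy row-by-row search for the largest-area rectangular match between the current position and the hashed one. This is an illustration under assumptions, not the authors' implementation: the image is a list of rows of 0/1 values, the function names are hypothetical, and the sketch ignores which pixels are already covered by earlier matches.

```python
def key4x4(img, r, c):
    """Pack the 4x4 subarray at (r, c) into a 16-bit integer.

    Distinct subarrays get distinct keys, so a table of 2**16 entries
    is a perfect hash with one slot per possible subarray.
    """
    k = 0
    for dr in range(4):
        for dc in range(4):
            k = (k << 1) | img[r + dr][c + dc]
    return k


def largest_rect_match(img, src, dst):
    """Return (height, width) of the largest-area rectangle that is
    identical when anchored at src and at dst (both are (row, col))."""
    (r1, c1), (r2, c2) = src, dst
    rows, cols = len(img), len(img[0])
    best = (0, 0)
    width = cols - max(c1, c2)      # widest the first row can be
    h = 0
    while r1 + h < rows and r2 + h < rows and width > 0:
        w = 0
        # The match width can only shrink as more rows are added.
        while w < width and img[r1 + h][c1 + w] == img[r2 + h][c2 + w]:
            w += 1
        width = w
        h += 1
        if h * width > best[0] * best[1]:
            best = (h, width)
    return best


img = [[1, 1, 0, 1, 1, 0],
       [1, 0, 0, 1, 0, 1]]
match = largest_rect_match(img, (0, 0), (0, 3))   # (2, 2)
```

In the example, the first row matches over three columns and the second over two, so the two-row rectangle of width 2 (area 4) beats the one-row rectangle of width 3.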
Worst-case running time ≈ M(1 + 1/2 + 1/3 + … + 1/M) = Θ(M log M)
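One way to read this bound, stated here as an assumption about the argument rather than a quotation of it: if the greedy search scans the match row by row and w_i is the width still matching after i rows, then the i x w_i rectangle is itself a match, so i · w_i ≤ M and row i costs at most M/i comparisons. Summing over the rows gives the harmonic series:

```latex
\[
  \sum_{i=1}^{M} w_i \;\le\; \sum_{i=1}^{M} \frac{M}{i}
  \;=\; M\,H_M
  \;=\; M\bigl(\ln M + O(1)\bigr)
  \;=\; \Theta(M \log M),
\]
% H_M = 1 + 1/2 + ... + 1/M is the M-th harmonic number.
```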
The Parallel Encoder A work-optimal parallel encoder requiring O(log M log n) time on the PRAM EREW can be implemented with the same parallel complexity on the mesh of trees. The parallel encoder partitions an m x m' image into w x l rectangular areas A_{i,j} for 1 ≤ i ≤ ⌈m/w⌉ and 1 ≤ j ≤ ⌈m'/l⌉, where w and l are Θ(log^(1/2)(m x m')), and applies the sequential parsing algorithm to each area. Larger monochromatic rectangles are computed by merging adjacent monochromatic areas.
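The partition can be made concrete with a small sketch. It assumes w = l, a constant factor c = 1 in the Θ bound, base-2 logarithms, and that m is the extent along the first index of A_{i,j}; none of these choices is fixed by the text, and the function names are hypothetical.

```python
import math


def area_grid(m, m_prime, c=1.0):
    """Return (w, l, rows_of_areas, cols_of_areas) for an m x m' image.

    The side length is ceil(c * sqrt(log2(m * m'))), so each area
    holds Theta(log n) pixels, where n = m * m'.
    """
    n = m * m_prime
    side = max(1, math.ceil(c * math.sqrt(math.log2(n))))
    w = l = side
    return w, l, math.ceil(m / w), math.ceil(m_prime / l)


def area_bounds(i, j, m, m_prime, w, l):
    """Half-open index ranges of area A_{i,j} (1-based i and j),
    clipped at the image border."""
    return ((i - 1) * w, min(i * w, m)), ((j - 1) * l, min(j * l, m_prime))


w, l, ni, nj = area_grid(1024, 1024)       # 5, 5, 205, 205
last = area_bounds(ni, 1, 1024, 1024, w, l)
```

For a 1024 x 1024 image, log2(n) = 20, so the side is ⌈√20⌉ = 5 and the last row of areas is clipped to 4 pixels.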
The encoder outputs the sequence of pointers in the order produced by the raster scan of each area, and the areas themselves are ordered by a raster scan as well. A pointer coding a monochromatic rectangle is associated with the area containing its upper-left corner. The end of the sequence of pointers corresponding to a given area is indicated with 1111 followed by the next area index (the flag field 111 is changed to 1110).
On the mesh of trees, the encoder is implemented with the same complexity.
The Parallel Decoder The parallel decoder requires O(log M log n) time and O(n/log n) processors on the PRAM EREW and the mesh of trees, and it has three phases:
Phase 1: For each area A_{i,j}, identify the corresponding sequence of pointers, if any.
Phase 2: Decode the pointers corresponding to proper matches and raw data.
Phase 3: Identify the monochromatic rectangles to complete the decoding.
Phase 1 Parsing the binary sequence encoding the image into pointers can be reduced to finding a path from a leaf to a root of a doubly linked forest. The path can be found by pointer jumping in O(log n) time with O(n/log n) processors on the PRAM EREW by means of the Euler tour technique. We obtain the pointers for each area by identifying on the path the positions corresponding to 1111.
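The doubling idea behind pointer jumping can be illustrated with a sequential simulation. This is only a sketch of the basic technique: it reads shared cells concurrently, so it models a CREW rather than an EREW PRAM, and it does not include the Euler tour machinery mentioned above. The function name and the convention that a root points to itself are assumptions.

```python
def pointer_jump(parent):
    """Simulate synchronous pointer jumping on a forest.

    parent[v] is the parent of node v, and parent[r] == r marks a root.
    Returns (root, depth, rounds): the root and depth of every node and
    the number of doubling rounds that were needed.
    """
    n = len(parent)
    ptr = list(parent)
    dist = [0 if parent[v] == v else 1 for v in range(n)]
    rounds = 0
    while any(ptr[v] != ptr[ptr[v]] for v in range(n)):
        # Every update in a round reads the arrays from the previous
        # round, as all processors would do in lockstep on a PRAM.
        dist = [dist[v] + dist[ptr[v]] for v in range(n)]
        ptr = [ptr[ptr[v]] for v in range(n)]
        rounds += 1
    return ptr, dist, rounds


# A chain 4 -> 3 -> 2 -> 1 -> 0 with root 0.
root, depth, rounds = pointer_jump([0, 0, 1, 2, 3])
```

Each round doubles the distance covered by every pointer, which is why a path of length n is resolved in about log2 n rounds.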
Phase 2 For each area A_{i,j} one processor decodes the corresponding sequence of pointers, if any. For each monochromatic rectangle, the upper-left portion corresponding to a given area A_{i,j} is decoded. i and j are odd values.
Phase 3 Step 1: The upper-left monochromatic area A_{2i-1,j} is copied onto A_{2i,j} if it is monochromatic and has the same color (information provided by the pointer); the same is done horizontally.
Step k: Areas A_{(i-1)2^(k-1)+1, j} … A_{i·2^(k-1), j}, with i odd, are copied respectively onto A_{i·2^(k-1)+1, j} … A_{(i+1)2^(k-1), j} if they are monochromatic with the same color (similarly on the vertical boundaries).
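The doubling pattern of the steps above can be simulated along a single direction. The sketch below makes a simplifying assumption that is not in the text: the monochromatic rectangle starts at the first area of the row of areas and covers `span` consecutive areas, of which only the first has been decoded by Phase 2. The function name is hypothetical, and the loops over i and t, which run sequentially here, stand for copies performed in parallel.

```python
def phase3_steps(span):
    """Return the number of doubling steps needed to fill `span` areas.

    filled[t] is True once the area with 0-based index t is decoded.
    """
    filled = [True] + [False] * (span - 1)
    k = 0
    while not all(filled):
        k += 1
        half = 2 ** (k - 1)              # block length copied in step k
        for i in range(1, span // half + 1, 2):      # odd block index i
            for t in range(half):
                src = (i - 1) * half + t
                dst = i * half + t
                # Copy only inside the rectangle described by the pointer.
                if dst < span and filled[src]:
                    filled[dst] = True
    return k


steps = phase3_steps(5)   # 3
```

After step k the first min(2^k, span) areas are filled, so the number of steps is ⌈log2(span)⌉.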
Copying one area takes O(log n) time so the parallel decoder requires O(log M log n) time on the PRAM EREW with O(n/log n) processors.
Decoding on the Mesh of Trees The first phase of the PRAM EREW algorithm corresponds to the input process and has standard solutions for a distributed memory system such as the mesh of trees. Phases 2 and 3 have the same complexity on the mesh of trees as on the PRAM EREW.
The Pyramid Implementation [Figure: an N x N pyramid network with N = 4]
Two assumptions are needed to obtain an O(log M log n) time encoder/decoder with O(n/log n) processors on the pyramid when the PRAM EREW procedure is implemented:
1. the number of monochromatic matches with length or width ≥ 2^k ⌈log^(1/2) n⌉ is O(N^2/2^(2k)) for 1 ≤ k ≤ log N – 1;
2. each pixel is covered by a small constant number of monochromatic matches.
It follows that the amount of information each processor at level k must broadcast is constant for 1 ≤ k ≤ log N – 1.
Copying one area takes O(log n) time so the parallel encoder/decoder requires O(log M log n) time on the pyramid with O(n/log n) processors.