Introduction

Introduction • Much of the information is in form of images • Images are handled by machines as a matrix of digital picture elements, or pixels • The appearance of an image depends on • image type • resolution Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Types of images & Resolution • bilevel (black & white) • e.g. faxes • grayscale • color • dot per inches (dpi) • 600 x 600 – actual medium quality laser printer • 1200 x 1200 – low cost phototypesetter • 4800 x 4800 – high resolution phototypesetter Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Bilevel images: CCITT fax standard • fax: facsimile • CCITTComité Consultatif International Téléphonique et Télégraphique, it is part of the ITUInternational Telecommunication Union, one of the specialized agencies of the United Nations • In the late 70s CCITT starts thinking about a standard for fax transmission • 1980 CCITT Group 3 standard • group 1 & 2 are earlier attempt, which use simpler encoding and modulations techniques, resulting in very slow transmissions Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

CCITT Group 3 - I • It is the most common standard for fax transmission • It is accepted worldwide, almost every fax machine supports this standard • It uses compression algorithms for bilevel images Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

1728 bits/line 1188 lines/page CCITT Group 3 - II • Paper size: international A4 (not US letter) • standard resolution 204x98 dpi (200x100) • high resolution 204x196 dpi (200x200) • bilevel image  1 bit/pixel • image size: 1728x1188 bits at standard resolution  about 2 Mbit • Transmission rate: 4.8 Kbit/s • today is usually higher, 14.4 – 33.6 Kbit/s • At 4.8Kbit/s in std resolution one page would take about 430 sec, but only 1 minute on average with Group 3 algorithms

Run-length coding • Each scan line is composed by sequences of pixel of the same color • Count the number of element of each run • Example 3w 4b 9w 2b 2w 6b 5w 2b 5w... Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

G3 1D • Group 3 One-Dimensional coding (G3 1D) is called Modified Huffman (MH) as it encodes run lengths using a predefined Huffman code • In order to maintain black/white syncronization, each line begins with a white run, eventually of zero length Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

G3 1D 1000 011 10100 11 0111 0010 ... • predefined Huffman codewords have been found from the probabilities of the runs in typical handwritten documents Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

G3 1D • As one line has 1728 bits, we have to define a codeword for all 1728 black and white run lengths • As shorter runs occur more frequently that longer runs, we code each run length in an additive form • there is a terminating and makeup codeword • Lengths form 0 to 63 are coded with a single terminating codeword • Longer runs are coded with one or more makeup codewords and a terminating codeword • Each line is terminated with a EOL symbol composed of eleven 0 and one 1 Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

G3 2D • Group 3 Two-Dimensional coding (G3 2D) is called Modified READ (MR) as it is a variant of a previously defined code, called READ (Relative Element Address Designate) • Many images have a high degree of vertical coherence between consecutive lines • changing elements are coded w.r.t. a “nearby” change position of the same color in the previous (reference) line

G3 2D • Nearby means within an interval of radius 3 pixels • If there are changing elements in the current line without correspondents in the reference line  switch to horizontal mode (1D) • On the opposite if the ref line has a run with no counterpart in the current line  special pass code Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

G3 2D reference line current line vertical mode horizontal mode pass code vertical mode ... -1 0 +2 -2 from a Huffman table, with codewords for -3, -2, -1, 0, +1, +2, +3 <mode | length of preceding white run | length of black run> 0001 generated code

G3 2D • Two dimensional coding is more prone to transmission errors • In the G3 1D an error may cause problems in the entire line, but syncronization is forced back by EOL codeword • Here an error in the reference line is likely propagated in all the other lines • For this reason there are 1 reference line for each k lines (i.e. k-1 are coded w.r.t. each ref line) • standard resolution  k=2 • high resolution  k=4 Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

CCITT fax standard compression performances • Standard resolution (~200x100 dpi) • G3 1D  0.13 bits/pixel 57s. for A4 at 4.8 Kbps • G3 2D (k=2)  0.11 bits/pixel 47s. for A4 at 4.8 Kbps • High resolution (~200x200 dpi) • G3 2D (k=4)  0.09 bits/pixel 74s. for A4 at 4.8 Kbps • Compression is very good for office image where run lengths are long • It would be very bad for bilevel natural images

Continuous-tone images: why lossless compression? • lossy compression is often preferred to have remarkably more compressed images, with good quality • However there are some situations in which using an approximation may not be adequate • medical images • historical documents • images with legal relevance Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Continuous-tone images: lossless compression • GIF standard • PNG standard • JPEG-LS • It is a quite new standard. The original JPEG standard included a lossless mode, but its performances were not close to ‘state of the art’ • extimation of pixel value using quite simple context: effective and low cost solution • www.hpl.hp.com/loco

GIF image format - I • Adopted by CompuServe to minimize the time required to download images over a modem link • The most widely used lossless image format until 1995 • 8-bit pixel description • 256 color images, but it is possible to use a color map Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

GIF image format - II • The color map can be specified for each image or can be omitted • if specified, it is included as an header into image file, in uncompressed form • color map is composed of 256 24-bit entries, that specify 256 RGB colors • Compression scheme used is LZW • Alphabet symbols are the 256 colors of the color map plus a “clear” code and an “end-of-information” code Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

GIF image format - III • Even if this feature is not widely used, GIF files may contain more than one image, and it is possible to share the color map • LZW-coded information is grouped into blocks preceded by a byte-count, in order to skip an image without decompressing it • In 1995 Unisys announced that there would be royalties on GIF implementations due to an old patent they held on LZW • This catalyzed the development of a new lossless image format, designed for public domain and with the last improvements

PNG image format - I • Portable Network Graphics(pronounced “ping”) • it uses gzip compression scheme • through some improvements compression obtained is about 10-30% better than GIF • By default it encodes the pixels in raster scan order, but some other methods are available • it is possible to code horizontal difference, i.e. the difference between current pixel value and the previous one or vertical difference, i.e. the difference w.r.t. the above pixel • average difference, the difference with the average of above and next pixel • ... Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

PNG image format - II • It is possible to use more than 256 colors, up to 16 bit grayscale and 48 bit color • GIF uses one special pixel value to indicate transparency, PNG uses 256 different values per pixel, allowing for picture progressively fading into the background • It seems inevitable that PNG format will gradually assume the role of standard lossless image format for the WWW, replacing GIF Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Continuous-tone images: why lossy compression? • Digital images are yet an approximation of the real analog phenomenon • lossy techniques allow to obtain very good compression with a modest lost of details • This is useful for storing and trasmitting images Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Continuous-tone images: lossy compression • JPEG • JPEG2000 • a new image coding system that uses state-of-the-art compression techniques based on wavelet technology • file extension .jp2 • With very compressed files, if image size is the same, perceived quality of JPEG2000 images is better w.r.t. JPEG images Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

JPEG format - I • JPEG is a standard defined by the Joint Photographic Experts Group in 1992 • It was conceived to transmit images at 64 Kbps • It has a lossy mode and a lossless mode (not so much used, and today replaced by the JPEG-LS standard) • With lossy mode it allows to obtain very good quality at about 1 bit/pixel • Implementation complexity is reasonable Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

JPEG format - II • It could be used with graylevel and color images • Each channel of the color space (RGB, YUV...) is treated separately • it allows progressive transmission (that is much better suited for WWW than raster transmission) Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Raster vs. progressive transmission

Quantization Discrete Cosine Transform Binary Encoder JPEG Coder - I Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

JPEG Coder - II • Image is divided in 8x8-pixel squares • Preprocessing • Apply Discrete Cosine Transform on each square • Coefficient quantization • Bit stream encoding Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Preprocessing: color space transformation & downsampling • from RGB into YUV • The Y component represents the brightness of a pixel, and the U and V components together represent the hue and saturation • Human eye can see more detail in the Y component than in the U and V, that can be compressed more aggressively • 4:4:4 no downsampling • 4:2:2 horizontal downsampling of a factor 2 • 4:2:0 both horizontal and vertical downsampling Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Discrete Cosine Transform - I • The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers • It is used in JPEG because it is fast and quite easy to implement efficiently Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Discrete Cosine Transform - II where • the block is pixels (in JPEG, 8x8) • A(i,j) is the value of pixel of position (i,j) • is the DCT coefficient of position • low values for corresponds to low vertical frequencies, low values for to low horizontal frequencies • Generally higher frequencies have very low values Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Discrete Cosine Transform - III • DCT function basis • each 8x8 square is reduced to 64 coefficient Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Discrete Cosine Transform - IV • Knowing with infinite precision the 64 DCT coefficient it is possible to reconstruct exactly the pixels of the square • But • finite precision • quantization of the coefficients (always) • Some coefficient related to high frequency are not transmitted. This allows higher compression without sacrifying too much quality as human eye is less responsible Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Quantization - I • The DCT matrix obtained is scaled differently in each component, dividing each by a diferent factor • the factor for each component has been decided based on human sensitivity to changes at each frequency • In practice the matrix of factor is usually

Quantization - II • Next, all values are rounded to nearest integer • This leads to a quite high number of 0s in the high frequency zone, as factors are bigger Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Zig-zag scan • Low frequency coefficients are transmitted before higher frequency coefficients • This allows for progressive visualization of this 8x8 block Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Raster vs. progressive transmission • Raster transmission • DCT coefficient of the upper left block, then those of all the others in the upper part of the image and so on • Progressive transmission • first all (0,0) coefficients, than all (0,1) and so on, following zig-zag scan in each block Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Binary coding • DCT(0,0) has usually a very slow variation from one block to the next, as it is the mean value • For this reason it is convenient to encode the difference from the previous value • Tipically the bit stream is coded with Huffman • It is possible to use arithmetic scheme, gaining some compression at cost of decoding speed • Huffman codes are predefined, or it is possible to build optimal tables and insert them in the stream Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

Dequantization Inverse DCT Binary Decoder JPEG Decoder Some values are lost! Good quality, but reconstruction is not exact Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

JPEG performances - I Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

JPEG performances - II Original Quality factor 75 Quality factor 20 Quality factor 3

Introduction

Introduction

Presentation Transcript

Introduction to introduction to introduction to … Optimization

INTRODUCTION/ INTRODUCTION

Introduction

INTRODUCTION

Introduction

Introduction