Compression Algorithm. Data and Multimedia Technology (Data dan Teknologi Multimedia), Session 08. Nofriyadi Nurdam
Course Outlines • Introduction
Introduction • Data compression involves encoding information using fewer bits than the original representation would use. • Compression is useful for reducing the consumption of hard disk space or transmission bandwidth. • On the downside, compressed data must be decompressed before it can be used, and this extra processing may be detrimental to some applications.
Introduction • For instance, compressed video may require expensive hardware to decompress it fast enough to be viewed as it is being decompressed. • The alternative of decompressing the video in full before watching it may be inconvenient, and it requires storage space for the decompressed video.
Introduction • Designing a data compression scheme involves trade-offs among the degree of compression, the amount of distortion introduced, and the computational resources required to compress and decompress the data. • Compression was one of the main drivers for the growth of information during the past two decades. • There are two compression concepts: lossy and lossless compression.
Lossless Data Compression • Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. • The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange for better compression rates.
Lossless Data Compression • Lossless data compression is used in many applications, such as the ZIP file format and the gzip tool • It is also used as a component within lossy data compression technologies
Lossless Data Compression • Lossless compression is used if it is important that the original and the decompressed data be identical • Some image file formats, like PNG or GIF, use only lossless compression, while others like TIFF and MNG may use either lossless or lossy methods • Lossless audio formats are most often used for archiving or production purposes
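The "exact reconstruction" property can be checked directly. The following is a minimal sketch, assuming Python's standard `zlib` module as a stand-in for the ZIP/gzip family (both commonly use the same DEFLATE method); the sample input is made up for illustration.

```python
import zlib

# Hypothetical sample input: repetitive data compresses well.
original = b"multimedia data " * 64

compressed = zlib.compress(original)    # DEFLATE-based lossless compression
restored = zlib.decompress(compressed)  # must give back every original byte

print(len(original), "->", len(compressed), "bytes")
print("identical:", restored == original)
```

If `restored` ever differed from `original` by a single byte, the method would not count as lossless.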
Lossless Compression Techniques • Most lossless compression programs do two things in sequence: first they generate a statistical model for the input data, and then they use this model to map input data to bit sequences in such a way that frequently encountered ("probable") data produces shorter output than "improbable" data. • The primary algorithms used to produce the bit sequences are Huffman coding and arithmetic coding.
Lossless Compression Techniques • Arithmetic coding achieves compression rates close to the best possible for a particular statistical model. • Huffman coding is simpler and faster, but produces poor results for models that deal with symbol probabilities close to 1.
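To see why Huffman coding struggles when one probability is close to 1, compare the Shannon entropy (the "best possible" rate for a given model) with the one whole bit that Huffman coding must spend on every symbol. This is an illustrative sketch; the two-symbol source with probabilities 0.99 and 0.01 is an assumption chosen for the example, not taken from the slides.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits per symbol for a list of symbol probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Assumed source: two symbols, one of them with probability 0.99.
skewed = [0.99, 0.01]
ideal = entropy_bits(skewed)  # rate an arithmetic coder can approach
huffman = 1.0                 # a Huffman code cannot use less than 1 bit per symbol

print(f"entropy: {ideal:.3f} bits/symbol, Huffman: {huffman:.1f} bits/symbol")
```

For this source the entropy is roughly 0.08 bits per symbol, so a Huffman code uses more than ten times the ideal number of bits, while arithmetic coding can get close to the ideal.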
Lossless Compression Techniques • There are two primary ways of constructing statistical models: static model and adaptive model • In a static model, the data is analyzed and a model is constructed, then this model is stored with the compressed data
Lossless Compression Techniques • This approach is simple and modular, but has the disadvantage that the model itself can be expensive to store, and also that it forces a single model to be used for all data being compressed, and so performs poorly on files containing heterogeneous data
Lossless Compression Techniques • Adaptive models dynamically update the model as the data is compressed. Both the encoder and decoder begin with a trivial model, yielding poor compression of initial data, but as they learn more about the data, performance improves. • Most popular types of compression used in practice now use adaptive coders.
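The "poor at first, better later" behaviour of an adaptive model can be sketched as follows. This is a simplified illustration, not any particular compressor: it assumes an order-0 byte model that starts with every byte equally likely (the "trivial model") and reports the ideal code length, in bits, that a coder driven by this model would spend on each symbol. The function name and the sample data are hypothetical.

```python
import math
from collections import Counter

def adaptive_costs(data, alphabet_size=256):
    """Ideal cost in bits of each symbol under a model that is updated as it goes.

    The decoder can run the same loop, because each probability depends only
    on symbols that have already been decoded.
    """
    counts = Counter()
    seen = 0
    costs = []
    for sym in data:
        # Add-one smoothing: an unseen symbol still gets a non-zero probability.
        p = (counts[sym] + 1) / (seen + alphabet_size)
        costs.append(-math.log2(p))
        counts[sym] += 1  # update the model after coding the symbol
        seen += 1
    return costs

costs = adaptive_costs(b"ab" * 100)
print(f"first 20 symbols: {sum(costs[:20]):.1f} bits")
print(f"last 20 symbols:  {sum(costs[-20:]):.1f} bits")
```

The first symbol costs a full 8 bits because the model knows nothing yet; later symbols cost far less once the counts reflect the data.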
Lossy Compression • Lossy compression is a data encoding method that compresses data by discarding (losing) some of it • The procedure aims to minimize the amount of data that need to be held, handled, and/or transmitted by a computer. • Typically, a substantial amount of data can be discarded before the result is sufficiently degraded to be noticed by the user.
Lossy Compression • Lossy compression is most commonly used to compress multimedia data (audio, video, and still images), especially in applications such as streaming media and internet telephony • By contrast, lossless compression is required for text and data files, such as bank records and text articles
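One simple way of "discarding some data" is quantization. The sketch below is an assumption-laden illustration rather than a real codec: it keeps only the top 4 bits of each 8-bit sample, so the stored data is halved but the original values can only be approximated afterwards. The function names and sample values are made up.

```python
def quantize(samples, keep_bits=4):
    """Keep only the top `keep_bits` bits of each 8-bit sample (the lossy step)."""
    shift = 8 - keep_bits
    return [s >> shift for s in samples]

def dequantize(codes, keep_bits=4):
    """Rebuild approximate 8-bit samples; the discarded low bits are gone for good."""
    shift = 8 - keep_bits
    half_step = (1 << shift) >> 1  # reconstruct mid-bucket to reduce the worst-case error
    return [(c << shift) + half_step for c in codes]

samples = [0, 17, 100, 101, 128, 200, 255]  # hypothetical 8-bit pixel or audio values
codes = quantize(samples)
approx = dequantize(codes)

print("codes: ", codes)
print("approx:", approx)
print("max error:", max(abs(a - b) for a, b in zip(samples, approx)))
```

Unlike the lossless case, `approx` is not equal to `samples`; the error is bounded, which is acceptable for images or audio but not for bank records.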
Run Length Encoding • Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run • This is most useful on data that contains many such runs, for example simple graphic images such as icons, line drawings, and animations • It is not useful for files that do not have many runs, as it could greatly increase the file size.
Run Length Encoding • For example, consider a screen with black text on a solid white background: a line of pixels contains long runs of white pixels, interrupted by short runs of black pixels for the text • With B for a black pixel and W for a white pixel, one line might read WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW • RLE converts this to 12W1B12W3B24W1B14W • The run-length code represents the original 67 characters in only 18.
Run Length Encoding • Run-length encoding is lossless data compression and is well suited to palette-based iconic images • It does not work well at all on continuous-tone images such as photographs
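The black-and-white pixel example can be reproduced with a few lines of code. This is a minimal sketch of RLE using the count-then-symbol text format from the slide; the function names are illustrative, and the decoder assumes the data itself never contains digits.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of identical characters with '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(encoded):
    """Expand '<count><char>' pairs; assumes the original data contains no digits."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(char * int(count) for count, char in pairs)

# The pixel row from the example: 12 W, 1 B, 12 W, 3 B, 24 W, 1 B, 14 W.
row = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
encoded = rle_encode(row)

print(encoded, f"({len(row)} -> {len(encoded)} characters)")
print(rle_encode("ABCD"))  # no runs: the output is longer than the input
```

The last line shows the downside mentioned earlier: for data without runs, every character gains a count and the "compressed" form doubles in size.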
Fixed-Length Encoding • Original text: ADA ATE APPLE • There are 7 distinct symbols (A, D, E, L, P, T and space) with frequencies: 4 As, 2 Ps, 2 Es, 2 spaces, 1 D, 1 T and 1 L • Each symbol is represented by 3 bits: A: 000 D: 001 E: 010 L: 011 P: 100 T: 101 Space: 110 • The encoded text needs 13 × 3 = 39 bits (compared with 13 × 8 = 104 bits for the original text at 8 bits per character)
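The 39-bit figure can be verified with a short sketch. The code table is the one from the slide; the variable names are illustrative, and the 104-bit figure assumes 8 bits per character in the original text.

```python
import math

text = "ADA ATE APPLE"
symbols = ["A", "D", "E", "L", "P", "T", " "]  # same order as the slide's table

width = math.ceil(math.log2(len(symbols)))     # 3 bits are enough for 7 symbols
codes = {sym: format(i, f"0{width}b") for i, sym in enumerate(symbols)}

encoded = "".join(codes[ch] for ch in text)

# Decoding simply cuts the bit string into chunks of `width` bits.
lookup = {code: sym for sym, code in codes.items()}
decoded = "".join(lookup[encoded[i:i + width]] for i in range(0, len(encoded), width))

print(codes)
print(len(encoded), "bits instead of", 8 * len(text))
```

Because every code has the same width, the decoder needs no separators and no tree, only the table.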
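Variable Length Encoding • Original text: ADA ATE APPLE • There are 7 distinct symbols (A, D, E, L, P, T and space) with frequencies: 4 As, 2 Ps, 2 Es, 2 spaces, 1 D, 1 T and 1 L • Each symbol is represented by a code whose length depends on its frequency, with shorter codes for more frequent symbols: A: 0 P: 10 E: 110 Space: 1110 D: 11110 T: 111110 L: 111111
Variable Length Encoding • The prefix property: no code is a prefix of any other code, so the bit stream can be decoded without separators • The encoded text needs 4+4+6+8+5+6+6 = 39 bits (with this particular code, the same total as the 3-bit fixed-length code; an optimal variable-length code needs fewer) • In general, variable-length encoding is better than fixed-length encoding • Decoding is done with a tree structure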
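The prefix property and tree-based decoding can be sketched as follows. The code table is the one from the slide; representing the tree as nested dictionaries is an assumption made for brevity, and all function names are illustrative.

```python
codes = {"A": "0", "P": "10", "E": "110", " ": "1110",
         "D": "11110", "T": "111110", "L": "111111"}

def is_prefix_free(table):
    """True if no code word is a prefix of another code word."""
    words = list(table.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def build_tree(table):
    """Nested dicts keyed by '0'/'1'; a leaf is stored as the symbol itself."""
    root = {}
    for sym, word in table.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = sym
    return root

def decode(bits, root):
    """Walk the tree bit by bit; emit a symbol and restart at each leaf."""
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root
    return "".join(out)

text = "ADA ATE APPLE"
encoded = "".join(codes[ch] for ch in text)
decoded = decode(encoded, build_tree(codes))

print(encoded, len(encoded), "bits")
print(decoded)
```

The decoder never needs to look ahead: because of the prefix property, reaching a leaf always means exactly one symbol has been read.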
Huffman Coding • Variable-length coding • The tree structure is built bottom-up • The lowest level consists of the symbols with the fewest occurrences
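The bottom-up construction can be sketched with a priority queue: repeatedly take the two least frequent subtrees and merge them, so the rarest symbols end up deepest in the tree. This is a minimal illustration under stated assumptions (the input has at least two distinct symbols, and ties are broken by insertion order); the exact code words depend on tie-breaking, but the total encoded length does not.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Return a {symbol: bit string} table built bottom-up from symbol frequencies."""
    freq = Counter(text)
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent subtree
        f2, _, right = heapq.heappop(heap)  # second least frequent subtree
        # Merging adds one level above both subtrees: prefix their codes with 0 / 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

text = "ADA ATE APPLE"
codes = huffman_codes(text)
total_bits = sum(len(codes[ch]) for ch in text)

print(codes)
print(total_bits, "bits")
```

For ADA ATE APPLE this gives 35 bits, compared with 39 bits for the 3-bit fixed-length code.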