1. Representation of Strings Background
Huffman Encoding
2. Representing Strings How much space do we need?
Assume we represent every character.
How many bits to represent each character?
Depends on |Σ|, the size of the alphabet Σ
3. Bits to encode a character Two character alphabet{A,B}
one bit per character:
0 = A, 1 = B
Four character alphabet{A,B,C,D}
two bits per character:
00 = A, 01 = B, 10 = C, 11 = D
Six character alphabet {A,B,C,D,E, F}
three bits per character:
000 = A, 001 = B, 010 = C, 011 = D, 100=E, 101 =F, 110 =unused, 111=unused
4. More generally The bit sequence representing a character is called the encoding of the character.
There are 2^n different bit sequences of length n, so
ceil(lg|Σ|) bits are required to represent each character in the alphabet Σ
If we use the same number of bits for each character, then the length of the encoding of a word w is |w| * ceil(lg|Σ|)
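As a rough illustration of the formula above, the following sketch computes the number of bits per character and the fixed-length encoding size of a word. The function names `bits_per_char` and `encoded_length` are made up for this example and do not come from the slides.

```python
def bits_per_char(alphabet_size: int) -> int:
    """Return ceil(lg(alphabet_size)), the smallest n with 2**n >= alphabet_size."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one character")
    # (k - 1).bit_length() equals ceil(log2(k)) for every integer k >= 1,
    # and avoids floating-point rounding.
    return (alphabet_size - 1).bit_length()


def encoded_length(word: str, alphabet: str) -> int:
    """Length in bits of `word` when every character uses the same number of bits."""
    return len(word) * bits_per_char(len(alphabet))


if __name__ == "__main__":
    for alphabet in ("AB", "ABCD", "ABCDEF"):
        print(len(alphabet), "characters:", bits_per_char(len(alphabet)), "bits each")
    print(encoded_length("FACADE", "ABCDEF"), "bits for FACADE")
```

With the six-character alphabet, two of the eight 3-bit sequences go unused, which matches the table on slide 3.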
5. Can we do better? If the alphabet Σ is very small, we might use run-length encoding
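The slides do not spell out a particular run-length scheme, so the following is only a minimal sketch of one common form: each run of repeated characters is replaced by a (character, count) pair. The names `rle_encode` and `rle_decode` are illustrative.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, run length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, run length) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)


if __name__ == "__main__":
    pairs = rle_encode("AAAABBBAA")
    print(pairs)
    print(rle_decode(pairs))
```

This pays off only when runs are long, which is more likely when the alphabet has few characters.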
6. Taking a step back Why do we need compression?
rate of creation of image and video data
image data from digital camera
today 1k by 1.5k is common = 1.5 mbytes
need 2k by 3k to equal a 35mm slide = 6 mbytes
video at even a low resolution of
512 by 512, 3 bytes per pixel, and 30 frames/second
7. Compression basics video data rate
23.6 mbytes/second
2 hours of video = 169 gigabytes
mpeg-1 compresses
23.6 mbytes down to 187 kbytes per second
169 gigabytes down to 1.3 gigabytes
compression is essential for both storage and transmission of data
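The video figures above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 mbyte = 10^6 bytes, 1 gigabyte = 10^9 bytes), which is an assumption, since the slides do not say which convention they use; the constant names are illustrative.

```python
WIDTH = 512            # pixels per row
HEIGHT = 512           # rows per frame
BYTES_PER_PIXEL = 3
FRAMES_PER_SECOND = 30
SECONDS_IN_TWO_HOURS = 2 * 60 * 60

# Uncompressed data rate and total size for two hours of video.
bytes_per_second = WIDTH * HEIGHT * BYTES_PER_PIXEL * FRAMES_PER_SECOND
bytes_in_two_hours = bytes_per_second * SECONDS_IN_TWO_HOURS

# Ratio between the uncompressed rate and the 187 kbytes/second quoted for mpeg-1.
compression_ratio = bytes_per_second / 187_000

if __name__ == "__main__":
    print(f"{bytes_per_second / 1e6:.1f} mbytes/second")
    print(f"{bytes_in_two_hours / 1e9:.1f} gigabytes for 2 hours")
    print(f"compression ratio of roughly {compression_ratio:.0f} to 1")
```

Under these assumptions the two-hour total comes out just under 170 gigabytes, in line with the 169 gigabytes quoted.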
8. Compression basics compression is very widely used
jpeg, gif for single images
mpeg-1, 2, 3, 4 for video sequences
zip for computer data
mp3 for sound
based on two fundamental principles
spatial coherence and temporal coherence
similarity with spatial neighbor
similarity with temporal neighbor
9. Basics of compression character = basic data unit in the input stream -- represents byte, bit, etc.
strings = sequences of characters
encoding = compression
decoding = decompression
codeword = data elements used to represent input characters or character strings
codetable = list of codewords
10. Codeword encoding/compression takes characters/strings as input and uses the codetable to decide which codewords to produce
decoding/decompression takes codewords as input and uses the same codetable to decide which characters/strings to produce
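To make the roles of encoder, decoder, and shared codetable concrete, here is a minimal sketch that uses fixed-length codewords like those on slide 3. It is not Huffman coding, and the helper names `build_codetable`, `encode`, and `decode` are hypothetical.

```python
def build_codetable(alphabet: str) -> dict[str, str]:
    """Assign each character a fixed-width binary codeword, in alphabet order."""
    width = max(1, (len(alphabet) - 1).bit_length())
    return {ch: format(i, f"0{width}b") for i, ch in enumerate(alphabet)}


def encode(text: str, codetable: dict[str, str]) -> str:
    """Look up each input character and emit its codeword."""
    return "".join(codetable[ch] for ch in text)


def decode(bits: str, codetable: dict[str, str]) -> str:
    """Split the bit string into fixed-width codewords and map them back to characters."""
    width = len(next(iter(codetable.values())))
    reverse = {code: ch for ch, code in codetable.items()}
    return "".join(reverse[bits[i:i + width]] for i in range(0, len(bits), width))


if __name__ == "__main__":
    table = build_codetable("ABCDEF")
    bits = encode("FACE", table)
    print(table)
    print(bits)
    print(decode(bits, table))
```

Both sides must hold the same codetable; the decoder here simply inverts the table that the encoder used.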