350 likes | 368 Views
Week 4: Data Representation: PART II READING: Chapter 3. EECS 1520 -- Computer Use: Fundamentals. Representing Real Numbers. Real numbers have a whole part and a fractional part. Example of real numbers in base 10: 12.05, 35.75, ….
E N D
Week 4: Data Representation: PART II • READING: Chapter 3
EECS 1520 -- Computer Use: Fundamentals Representing Real Numbers • Real numbers have a whole part and a fractional part • Example of real numbers in base 10: 12.05, 35.75, …. • We have learned how to convert the whole part to binary digits, the question now is, how do convert the fractional part to binary numbers?
EECS 1520 -- Computer Use: Fundamentals Representing Real Numbers • Example: 12.45 = 1 * 101+ 2 * 100+ 4 * 10-1 + 5 * 10-2 6.451 = 6 * 100+ 4 * 10-1 + 5 * 10-2 + 1 * 10-3 • The positions to the right of the decimal point work the same way, except that the powers are negative.
EECS 1520 -- Computer Use: Fundamentals Representing Real Numbers • Same rules apply to base 2 • Example: What is the decimal representation of the binary number 01010.110? 1 * 23 + 1 * 20+ 1 * 2-1 + 1 * 2-2 = 8 + 2 + 0.5 + 0.25 = 10.75
EECS 1520 -- Computer Use: Fundamentals Representing Real Numbers • How do we convert a real number in base 10 to its binary representation? • Example: What is the binary representation of the decimal number 24.25? 2 steps: 1. find the binary number of the integer part 2. find the binary number of the fractional part In step 2, we multiply by the new base, the fractional part of the result is then multiplied by the new base. The process continues until the fractional part of the result is zero
EECS 1520 -- Computer Use: Fundamentals Number systems: Two’s Complement Step 1: convert 24 to binary: QR 24/2 12 0 12/2 6 0 6/2 3 0 3/2 1 1 1/2 0 1 11000 Step 2: convert 0.25 to binary 0.25*2 0.5 0.5*2 1.0 01 24.25 in binary is: 11000.01
EECS 1520 -- Computer Use: Fundamentals Representing Real Numbers • What is the binary representation of the decimal number 0.4? 0.4*2 0.8 0.8*2 1.6 0.6*2 1.2 0.2*2 0.4 0.4*2 0.8 0.8*2 1.6 . . . . . . Starts repeating • We can say the binary representation of 0.4 is: 0.011001….. • The more bits that you get, the closer you will get to 0.4
EECS 1520 -- Computer Use: Fundamentals Representing Real Numbers • Adding two unsigned binary numbers with the fractional part: • Example: 6.375 3.25 9.625
EECS 1520 -- Computer Use: Fundamentals Representing Real Numbers • Subtracting two unsigned binary numbers with the fractional part: • Example: 5.625 3.25 2.375 • We can look at the corresponding decimal representation to confirm the final results.
EECS 1520 -- Computer Use: Fundamentals Representing Text • English language includes 26 letters • Uppercase and lowercase letter have to be treated separately • so 52 unique characters will be required • Punctuation characters (i.e. ; “” ! ,) • Numeric digits (i.e. actual character ‘0’, ‘1’,… ‘9’) Etc… Two character sets: • 1. ASCII Character Set • 2. The Unicode Character Set
EECS 1520 -- Computer Use: Fundamentals ASCII Character Set ASCII (American Standard Code for Information Interchange): • Each character is coded as 1 byte (8 bits) • The codes are expressed as decimal numbers, these values are translated to their binary equivalent for storage in the computer.
EECS 1520 -- Computer Use: Fundamentals ASCII Character Set ASCII (American Standard Code for Information Interchange): • Each character is coded as 1 byte (8 bits) • For example, the character “5” is represented as ASCII value 53 8-bit binary string: 0011 0101
EECS 1520 -- Computer Use: Fundamentals ASCII Character Set • Example: • Given that the ASCII code for B is 66, expressed as a decimal vale, what is the ASCII code, in hexadecimal, for the letter G? • Characters in the ASCII table are arranged in alphabetical order, hence: • If the character “B” is 66 in decimal, ….”G” will be 71 in decimal 71 in decimal is 47 in hexadecimal
EECS 1520 -- Computer Use: Fundamentals Unicode Character Set • ASCII character set provides 256 characters (i.e. 8 bits) • Onlyenough to cover the characters in English • Unicode was designed to be a superset of ASCII, that is, the first 256 characters in the Unicode character set correspond exactly to the extended ASCII character set • 16-bit standard • 65,536 possible codes (= 216)
EECS 1520 -- Computer Use: Fundamentals Representing Text • It is important to find ways to store and transmit text efficiently Data compression: is a reduction in the amount of space needed to store a piece of data Compression ratio: is the size of the compressed data divided by the size of the original data
EECS 1520 -- Computer Use: Fundamentals Representing Text • It is important to find ways to store and transmit text efficiently • 3 types of text compression: • Keyword encoding • Run-length encoding • Huffman encoding
EECS 1520 -- Computer Use: Fundamentals Representing Text: Keyword Encoding • Idea: substitute a frequently used word with a single character • Example: • “ To do well on the test” 22 characters (including space) • “ To do % on - test” 17 characters (including space) Compression ratio = 17/22 = 0.773
EECS 1520 -- Computer Use: Fundamentals Representing Text: Keyword Encoding • Idea: substitute a frequently used word with a single character Limitations: - these characters can't be part of the text - frequently used words tend to be short, so not much compression - word variations not handled: The vs. the
EECS 1520 -- Computer Use: Fundamentals Representing Text: Run-Length Encoding • Also called recurrence coding • A single character may be repeated over and over again in a long sentence. This type of repetition doesn’t generally take place in English text, but can occur in large data streams • Idea: replace long series of a repeated character with a count of the repetition • A sequence of repeated characters is replaced by: • a “flag character” • Followed by the repeated character • Followed by a single digit that indicates how many times the character is repeated • Example: AAAAAAA can be replaced with *A7
EECS 1520 -- Computer Use: Fundamentals Representing Text: Run-Length Encoding • Example: replace AAAAAAA with *A7 • The character “A” is represented as ASCII value 65 • So replace: 01000001 01000001 01000001 01000001 01000001 01000001 01000001 • With: 00101010 01000001 00000111 A A A A A A A * A 7
EECS 1520 -- Computer Use: Fundamentals Representing Text: Run-Length Encoding • Idea: substitute a frequently used word with a single character • Limitations: • It’s not worth it to encode strings of two or three (i.e. it takes 3 characters to encode a repetition sequence) • In the case of two repeated characters, encoding would actually make the string longer
EECS 1520 -- Computer Use: Fundamentals Representing Text: HuffmanEncoding • Idea: Using variable-length bit strings to represent each character • The approach is to: Use only a few bits to represent characters that appear often and Use longer bit strings for character that don’t appear often • Example: • The word “BELL” would be: 1010 01 100 100 • Only 12 bits are required • Compared to the fixed-size bit string, for example, if 8 bits are required to represent each character, it would need 32 bits • A compression ratio (vs ASCII) of 12/32 = 0.375!
EECS 1520 -- Computer Use: Fundamentals Representing Text: HuffmanEncoding • Idea: Using variable-length bit strings to represent each character • Example: • How about decoding the bit string with Huffman Encoding? • For example, can we decode 0100111? • “0100111” would be decoded into the word “EAR”
EECS 1520 -- Computer Use: Fundamentals Representing Audio Data • A stereo sends an electrical signal to a speaker to produce sound, this signal is an analog representation of the sound wave. • The magnitude of the voltage signal varies in direct proportion to the sound wave magnitude
EECS 1520 -- Computer Use: Fundamentals Representing Audio Data • To represent audio data on a computer, we must digitize the sound wave data. • That is, we take the electric signal that represents the sound wave and represent it as a series of discrete numeric values Analog signal Sampling period To digitize the signal, we periodically measure the voltage of the signal, a process called “Sampling” Higher sampling rate produces better-quality sound
EECS 1520 -- Computer Use: Fundamentals Representing Audio Data • A sampling rate of around 40,000 times per second is enough to create a reasonable sound reproduction Part of the data is lost Low sampling rate High sampling rate
EECS 1520 -- Computer Use: Fundamentals Representing Audio Data • Popular audio data formats: WAV, AU, MP3 • All use some form of compression • MP3 offers the strongest compression ratio than other formats MP3 • MP3 is short for MPEG-2, audio layer 3 file • Uses both lossy and lossless compression • Lossless: bit stream compressed by a form of Huffman encoding • Lossy: uses mathematical models of human psychoacoustics to discard information the human ear can’t hear
EECS 1520 -- Computer Use: Fundamentals Representing Images and Graphics • Representing Color: • 3 basic colors: red, green, and blue • Color is often expressed as an RGB (red-green-blue) value, which is actually three numbers that indicate the relative contribution of each of these three primary colors • If each number in the triple is given on a scale of 0 to 255, 0 means no contribution of that color and 255 means full contribution
EECS 1520 -- Computer Use: Fundamentals Representing Images and Graphics • The amount of data that is used to represent a color is called the “color depth” • For example, TrueColor indicates a 24-bit color depth. With this scheme, each number in an RGB value gets 8 bits, which gives the range of 0 to 255 for each • This results in the ability to represent more than 16.7million unique colors!
EECS 1520 -- Computer Use: Fundamentals Representing Images and Graphics • For example, (255,255,0) means no contribution from “blue”, and this corresponds to bright yellow” “TrueColor” RGB values Three-dimensional color space
EECS 1520 -- Computer Use: Fundamentals Representing Images and Graphics • A photograph is an analog representation of an image, it is continuous across its surface • Digitizing a picture means representing the picture as a collection of individual dots called “pixels” • Each pixel is composed of a single color • The number of pixels used to represent a picture is called the “resolution”
EECS 1520 -- Computer Use: Fundamentals Representing Images and Graphics Digitized Images and Graphics: • High resolution image (i.e. more pixels) • Low resolution image • (i.e. less pixels)
EECS 1520 -- Computer Use: Fundamentals Representing Images and Graphics • The storage of image information on a pixel-by-pixel basis is called a raster-graphics format. • Several popular raster-graphics file formats are: bitmap (BMP), JPEG
EECS 1520 -- Computer Use: Fundamentals Representing Images and Graphics • Another technique for representing images is called vector graphics. • Instead of assigning colors to pixels, vector graphics format describes an image in terms of lines and geometric shapes • For example, a vector graphics can be a series of commands that describe a line’s direction, thickness and color. • Suitable for line art and cartoon-style drawing
EECS 1520 -- Computer Use: Fundamentals Representing Videos • Video (film) is a stream of images/frames (at 24 or 30 fps [frame per second]) • A video codec (COmpressor/DECompressor) refers to the methods used to shrink the size of a movie Two types of compression in video codec: • Temporal compression: keyframe + series of delta frames that record only changes from keyframe (good if image changes little, i.e. such as a scene that has little movement) • Spatial compression: removes redundant info within a frame (essentially jpeg like compression on each frame). This compression often groups pixels into blocks that have the same color (for example, a portion of a clear blue sky). Instead of storing each pixel, the color and the coordinates of the area are stored.