I Need More Data: Too much information running through my brain.
What is 'data'? 3.14 Pi • Data: the quantities, characters, or symbols on which operations are performed by computers and which may be stored or transmitted in the form of electrical signals. • Information: that which is obtained by the processing of data • Knowledge comes from careful investigation of information. • Information is represented/encoded as data. • What information is represented by an abacus? How? • What information is represented on a DVD? How? • What information is encoded on a credit card? How?
How can real-world information become data? • How can a picture or a sound or a temperature reading become data? • Data comes in two types: • Continuous: can take any value within a range (infinitely many possible values) • Discrete: limited to a finite number of separate values or choices • How much could an orange weigh? • 200 grams? • 229.3 grams? • 229.31533 grams? • 229.31533480185993 grams?
Continuous or Discrete? How long might it take a light bulb to burn out? What was your ACT score? How tall are you? How many books did you read this year? How much water did you drink this week? How many gen-ed courses have you taken at UW-L?
Continuous/Discrete • In electronics, signals are known as either • analog (meaning a continuous signal) • digital (meaning a discrete signal)
Why are computers digital? • Information needs to be encoded in such a way as to be processed. • Electrical signals can be processed. • Even analog signals can be processed, but digital is simpler. • In computers, there are two discrete (digital) signals: on and off. It's easy to tell if an electrical signal is on or off: • Electric fence • Electric socket • Light bulb
On or Off http://www.flickr.com/photos/tudor/31803307/sizes/o/in/photostream/ http://www.flickr.com/photos/my-other-eye/5300224495/sizes/z/in/photostream/
What is a bit? • Bit: short for "binary digit". A bit is the smallest (atomic) unit of computer data. • A bit is either ON or OFF. • You can think of a bit as an extremely small battery that can be quickly charged and discharged. When charged, the bit is ON. When discharged, the bit is OFF. This is roughly how a DRAM memory cell works: a tiny capacitor whose charge is controlled by a single transistor. • Mathematically speaking, a bit is usually understood as the value 0 when OFF and the value 1 when ON. • Since there are only two values, a bit is known as a 'binary' digit.
Bit Patterns • A bit-string is a sequence of bits. • How many different patterns could there be in a bit-string of length 2?
Bit Patterns • How many different patterns could there be in a bit-string of length 3?
Bit Patterns • How many patterns could there be in a bit-string of length 4? • How many patterns could there be in a bit-string of length N? • With more bits you can store more information. • Each additional bit doubles the number of possible patterns.
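The following Python sketch checks the doubling rule by listing every pattern of a given length. It uses `itertools.product` from the standard library. The name `bit_patterns` is a hypothetical helper made up for this example.

```python
from itertools import product

def bit_patterns(n):
    """Return every bit-string of length n, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

for n in range(1, 5):
    patterns = bit_patterns(n)
    # Each extra bit doubles the count, so len(patterns) == 2 ** n
    print(n, len(patterns), patterns if n <= 3 else "...")
```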
How is data capacity measured? • One bit is too small to be a practical unit of measurement. • Nobody says: "I've got a 10 Gigabit iPod" • Measures of data capacity are based on the byte. • 1 byte = 8 bits • 1 byte can have 256 ($2^8$) different patterns • 1 byte is big enough to represent many kinds of things
Data Capacity • How many bits would you need to encode • The results of a coin toss? • There are 2 values : 1 bit. • A day of the week? • There are 7 values : 3 bits • A month of the year? • There are 12 values : 4 bits • One keyboard symbol? • There are about 104 values : 7 bits • The day of the year? • There are 365 values : 9 bits
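Each answer above is the smallest number of bits $b$ for which $2^b$ is at least the number of values. The Python sketch below computes this exactly with integers. The name `bits_needed` is hypothetical. The value counts are taken from the slide.

```python
def bits_needed(num_values):
    """Smallest number of bits whose 2**bits patterns can label num_values distinct values."""
    if num_values < 1:
        raise ValueError("need at least one value")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)), computed without floating point
    return (num_values - 1).bit_length()

examples = {"coin toss": 2, "day of the week": 7, "month of the year": 12,
            "keyboard symbol": 104, "day of the year": 365}
for name, count in examples.items():
    print(f"{name}: {count} values -> {bits_needed(count)} bits")
```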
Numeral System Numeral system: a way of representing numbers in written form. Consider the following three numbers. What numbers are represented?
Tally Marking • Tally marking: a numeral system • A number is represented by making one tally mark for each unit in the number. • You may have used tally marking when keeping score in a game of tic-tac-toe.
Roman Numerals • Roman numerals: a numeral system • The Roman numeral system has well-defined rules for representing integers • The system is not widely used today because its rules are hard to master and larger numbers are hard to decode. • For example, the latest Super Bowl is XLVII (47)
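The following Python sketch shows one rule that makes Roman numerals harder to decode: the subtractive rule, where a smaller symbol placed before a larger one is subtracted. The name `roman_to_int` is hypothetical. The sketch assumes the input is already a well-formed numeral and does not validate it.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Decode a well-formed Roman numeral (no validation is performed)."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Subtractive rule: a smaller symbol before a larger one is subtracted (XL = 50 - 10)
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("XLVII"))  # 47
```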
Decimals • Decimal numerals : a positional numeral system using powers-of-10 • The rules of the decimal numbering system are simple enough to easily decode even large numbers.
Binary • Binary numerals: a positional numeral system using powers-of-2 • The rules of the binary numbering system are simple enough to easily decode even large numbers.
Numeral Systems The numeral systems we described begin to look different once the numbers become larger. The number 5 is given as five tally marks (|||||), V in Roman numerals, 5 in decimal, and 101 in binary.
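The following Python sketch writes the same number in tally marks, Roman numerals, decimal, and binary. It is one way to watch the systems drift apart as numbers grow. The helpers `to_tally` and `to_roman` are hypothetical. The Roman encoder uses a common greedy table that includes the subtractive pairs. Binary comes from Python's built-in `format(n, "b")`.

```python
ROMAN_TABLE = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
               (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
               (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n):
    """Greedy encoding: repeatedly take the largest value (or subtractive pair) that fits."""
    if n < 1:
        raise ValueError("Roman numerals have no zero or negative numbers")
    parts = []
    for value, symbol in ROMAN_TABLE:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def to_tally(n):
    """One mark per unit, grouped in fives (each bundle drawn here as five bars)."""
    groups = ["|||||"] * (n // 5)
    if n % 5:
        groups.append("|" * (n % 5))
    return " ".join(groups)

for n in (5, 13, 47):
    print(f"{n:>3}: tally={to_tally(n)}  roman={to_roman(n)}  binary={format(n, 'b')}")
```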
Digital Numbers • All digital data is a sequence of bits: a bit-string. • How can we represent an integer as a bit-string? • Consider the decimal number 515. • A sequence of digits • Digits are one of: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 • Meaning of a digit depends on its position: a power of 10 • $515 = 5\times10^2 + 5\times10^1 + 5\times10^0$ • Consider the binary number 101. • Binary uses a base-2 (radix-2) system rather than base 10 • Digits are one of: 0, 1 • Meaning of a digit depends on its position: a power of 2 • $101_2 = 1\times2^2 + 0\times2^1 + 1\times2^0 = 5$
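The positional rule is the same in both bases: multiply each digit by the base raised to the digit's position, counting from 0 at the right. The Python sketch below applies that rule literally. The name `positional_value` is hypothetical. It reads digits with `int(digit)`, so it handles only the characters 0 to 9, which limits it to bases 2 through 10.

```python
def positional_value(digits, base):
    """Sum digit * base**position, with position 0 at the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        d = int(digit)
        if not 0 <= d < base:
            raise ValueError(f"digit {digit} is not valid in base {base}")
        total += d * base ** position
    return total

print(positional_value("515", 10))  # 515
print(positional_value("101", 2))   # 5
```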
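Convert binary to decimal • $101110_2$ = • $1\times2^5 + 0\times2^4 + 1\times2^3 + 1\times2^2 + 1\times2^1 + 0\times2^0$ • $1\times32 + 0\times16 + 1\times8 + 1\times4 + 1\times2 + 0\times1$ • $32 + 0 + 8 + 4 + 2 + 0$ • $46$ • $110001_2$ = • $1\times2^5 + 1\times2^4 + 0\times2^3 + 0\times2^2 + 0\times2^1 + 1\times2^0$ • $1\times32 + 1\times16 + 0\times8 + 0\times4 + 0\times2 + 1\times1$ • $32 + 16 + 0 + 0 + 0 + 1$ • $49$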
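The slide expands each power of 2 separately. An equivalent shortcut (Horner's method, not the slide's method) reads the bits left to right, doubling the running total and adding each bit. The Python sketch below uses that shortcut and checks it against the two worked examples. The name `binary_to_decimal` is hypothetical. Python's built-in `int(bits, 2)` gives the same result.

```python
def binary_to_decimal(bits):
    """Horner's method: for each bit, double the running total and add the bit."""
    value = 0
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        value = value * 2 + int(bit)
    return value

for bits in ("101110", "110001"):
    print(bits, "->", binary_to_decimal(bits), "(built-in:", int(bits, 2), ")")
```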
What's the base? • It's easy to get confused and not be sure what base a number is written in. For example, is 111: • One hundred eleven? • Seven? • A subscript can be used to specify the base whenever it is unclear. • $111_2$ is equal to seven • $111_{10}$ is equal to one hundred eleven.
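Programming languages face the same ambiguity. In Python, you can parse one string in a chosen base with the built-in `int(text, base)`. A `0b` prefix marks a binary literal, playing the role that the subscript plays on the slide.

```python
text = "111"
as_binary = int(text, 2)    # 7
as_decimal = int(text, 10)  # 111
print(as_binary, as_decimal)

# A 0b prefix, rather than a subscript, marks a binary literal in Python source code
print(0b111)  # 7
```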
Biggest Binary Number • What is the biggest number you can have with • Two bits? • Three bits? • Four bits? • Five bits? • N bits? http://www.flickr.com/photos/barnoid/2025811494/sizes/o/in/photostream/
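With every bit set to 1, the pattern is the largest value that N bits can hold. That value is one less than the $2^N$ possible patterns. The Python sketch below computes it with a bit shift. The name `largest_unsigned` is hypothetical, and the sketch assumes unsigned (non-negative) numbers.

```python
def largest_unsigned(n_bits):
    """All n bits set to 1: 2**n - 1, one less than the number of patterns."""
    return (1 << n_bits) - 1

for n in (2, 3, 4, 5, 8):
    value = largest_unsigned(n)
    print(f"{n} bits: {format(value, 'b')} = {value}")
```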
What about real numbers? • Is it possible to represent a real number such as 2.31 or 2.125 in base 2? • In base 10, the value 2.125 means: • $2\times10^0 + 1\times10^{-1} + 2\times10^{-2} + 5\times10^{-3}$ • In base 2, the value 1.101 means: • $1\times2^0 + 1\times2^{-1} + 0\times2^{-2} + 1\times2^{-3}$ • $1\times1 + 1\times\frac{1}{2} + 0\times\frac{1}{4} + 1\times\frac{1}{8}$ • $1 + 0.5 + 0 + 0.125$ • $1.625$
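The following Python sketch extends the positional rule past the point: the digit k places to the right is worth $2^{-k}$. It uses the standard `fractions.Fraction` type so that the sums stay exact. The name `binary_to_fraction` is hypothetical.

```python
from fractions import Fraction

def binary_to_fraction(text):
    """Exact value of a binary numeral such as '1.101'."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    # The digit k places right of the point is worth 2**-k
    for k, bit in enumerate(frac, start=1):
        value += int(bit) * Fraction(1, 2 ** k)
    return value

print(binary_to_fraction("1.101"), float(binary_to_fraction("1.101")))    # 13/8 1.625
print(binary_to_fraction("10.001"), float(binary_to_fraction("10.001")))  # 17/8 2.125
```

The second line shows that 2.125, from the slide, is written 10.001 in base 2.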
What about real numbers? • How many decimal digits does it take to represent 1/3 exactly? • $1/3 = 0.33333333333333333333333\ldots$ • How many decimal digits does it take to represent 1/5 exactly? How many binary digits? • $1/5 = 0.2_{10}$ • $1/5 = 0.00110011001100110011\ldots_2$ • Some real numbers would need an infinite number of bits to store exactly. A computer has only a finite number of bits available, so it must store an approximation, and its arithmetic can be imprecise.
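The following Python sketch finds the binary digits of a fraction between 0 and 1 by repeated doubling. Each time the value is doubled, the whole-number part that appears is the next bit. For 1/5 the pattern 0011 repeats forever. The name `binary_fraction_digits` is hypothetical, and exact `Fraction` arithmetic is used so that rounding does not distort the digits.

```python
from fractions import Fraction

def binary_fraction_digits(x, count):
    """First `count` binary digits after the point for 0 <= x < 1, by repeated doubling."""
    digits = []
    for _ in range(count):
        x *= 2
        bit = int(x >= 1)  # the whole part produced by doubling is the next bit
        digits.append(str(bit))
        x -= bit
    return "".join(digits)

print("1/5 = 0." + binary_fraction_digits(Fraction(1, 5), 20) + "...")
print("1/8 = 0." + binary_fraction_digits(Fraction(1, 8), 8))
```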
Sources of Error • Precision: some numbers require more bits to encode exactly than are available, so we must encode an approximation instead. • Example: 1/3 • Underflow: occurs when a computer operation produces a value that is too small to encode with the available bits • Example: two 8-bit numbers, each storing a value in the range 0-255. What is 0-1? • Overflow: occurs when a computer operation produces a value that is too large to encode with the available bits • Example: two 8-bit numbers. What is 255+1?
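Python integers never overflow, so the Python sketch below imitates an 8-bit unsigned register by keeping only the low 8 bits of each result. This assumes wraparound behavior, as with unsigned integers in C. Other systems may raise an error or clamp the value instead. The helpers `add_u8` and `sub_u8` are hypothetical. The last line shows the precision problem: a Python float holds only an approximation of 1/3.

```python
from fractions import Fraction

MASK_8_BIT = 0xFF  # keep only the low 8 bits, like an 8-bit unsigned register

def add_u8(a, b):
    return (a + b) & MASK_8_BIT

def sub_u8(a, b):
    return (a - b) & MASK_8_BIT

print(sub_u8(0, 1))    # 255: the result below 0 wraps around to the top of the range
print(add_u8(255, 1))  # 0: the result above 255 wraps around to the bottom

# Precision: the float closest to 1/3 is not exactly 1/3
print(Fraction(1 / 3) == Fraction(1, 3))  # False
```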
What about text? Can text be represented as a sequence of binary digits (bits)? Text is made of pictures (also known as symbols or characters). Each character can be associated with an integer.
What about text? • The number associated with each character can be stored as a bit-string • About how many unique numbers are required for English text? (Asked another way: how many unique characters did William Shakespeare ever use?) • One byte has enough capacity to store an English character. • About how many unique numbers are required for Chinese text? • Two bytes are enough for the characters of most languages: 电脑
ASCII Table Most computers that are configured for English writers use the ASCII table. This table associates the numbers 0 to 127 with English letters, digits, punctuation marks, and control characters.
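In Python, the built-in `ord` returns a character's number, and `chr` turns a number back into a character. For plain English text these numbers are the ASCII codes. The Python sketch below stores a short word as one byte per character. The helper names are hypothetical.

```python
def text_to_codes(text):
    # For plain English characters, ord() returns the ASCII code (0 to 127)
    return [ord(ch) for ch in text]

def codes_to_bits(codes):
    # One byte (8 bits) per character, padded with leading zeros
    return " ".join(format(code, "08b") for code in codes)

codes = text_to_codes("Hi")
print(codes)                           # [72, 105]
print(codes_to_bits(codes))            # 01001000 01101001
print("".join(chr(c) for c in codes))  # Hi
```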
What about colors? • How might a computer store a 'color'? • What are the primary colors of pigment? • Cyan, magenta, yellow • What are the primary colors of light? • Red, green, blue
RGB Color Model • RGB color model • Uses red, green, and blue as the primary colors. • Any color can be represented by combining different amounts of these three primaries. • Consider a flashlight that has a slider that chooses the strength of light emitted. • Setting the slider to zero, the flashlight is turned completely off • Setting the slider to 255, the flashlight generates as much light as it is capable of generating. • Consider three such flashlights • Each light emits purely red; green; or blue light. If all three flashlights are aimed at the same spot on a white wall any color can be projected onto the wall by adjusting the slider values on the three lights in different ways.
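Each flashlight slider runs from 0 to 255, so it fits in exactly one byte, and one color takes three bytes (24 bits). The Python sketch below packs the three slider values into a single 24-bit number and prints it in the familiar hexadecimal form. The name `pack_rgb` is hypothetical. Purple is assumed here to be half-strength red plus half-strength blue.

```python
def pack_rgb(red, green, blue):
    """Combine three 0-255 slider values into one 24-bit number (8 bits per light)."""
    for level in (red, green, blue):
        if not 0 <= level <= 255:
            raise ValueError("each slider must be between 0 and 255")
    return (red << 16) | (green << 8) | blue

colors = {"red": (255, 0, 0), "green": (0, 255, 0), "purple": (128, 0, 128),
          "black": (0, 0, 0), "white": (255, 255, 255)}
for name, rgb in colors.items():
    print(f"{name:6} {rgb} -> #{pack_rgb(*rgb):06X}")
```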
CMY Color Model • CMY color model • Uses cyan, magenta, and yellow as the primary colors. • A color can be obtained by combining different amounts of these three primaries. • Consider an artist working with a palette of three paint colors. The artist mixes them together with a ratio given as: • If a color has a zero ratio, that color is not used at all • If a color has a ratio value of 255, that color is used maximally. • Any color can be generated on the canvas by adjusting the ratio of these pigments in different ways.
What about colors? • How to store the color: • red? • green? • purple? • black? • white? • How many bits to record one color? • A digital image is a table (or grid) of pixels. • Pixel is short for "picture element" • A pixel is one color
What about pictures? • Could you encode an image as a sequence of bits? • Starting from the upper-left pixel, scan the image left-to-right, top-to-bottom • Record each pixel that you encounter. • How many bits would be required for a • 100x100 image? • 1024x768 image? • A 1024x768 JPG file is usually much smaller than this raw size. How?
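The following Python sketch follows the scan order described above and counts the raw bits. It assumes 24 bits per pixel (one byte each for red, green, and blue) and no compression. The names `scan` and `raw_image_bits` are hypothetical.

```python
def scan(image):
    """Flatten a grid of pixels: upper-left first, left-to-right, top-to-bottom."""
    return [pixel for row in image for pixel in row]

def raw_image_bits(width, height, bits_per_pixel=24):
    # The scan records every pixel exactly once
    return width * height * bits_per_pixel

for width, height in [(100, 100), (1024, 768)]:
    bits = raw_image_bits(width, height)
    print(f"{width}x{height}: {bits:,} bits = {bits // 8:,} bytes")
```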
What about pictures? • There are many different ways to encode the same information. Some ways use more bits than others. • Consider a black & white 8x8 image. • Use 0 for white and 1 for black • This is known as 'raw' or 'bitmap' format
What about pictures? • Run length encoding is another way to encode images • A 'run' is the length of successive like-colored pixels • Store the lengths of these runs for each row, starting with white • Run length code for the 8x8 image, one group per row: 2,4,2 / 1,1,4,1,1 / 1,1,1,2,1,1,1 / 1,6,1 / 1,1,1,2,1,1,1 / 1,2,2,2,1 / 1,1,4,1,1 / 2,4,2 • Can you think of numbers in the run length code above that are not needed?
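The following Python sketch implements the run length scheme as described: runs alternate white, black, white, and so on, and every row starts with a white run, which may have length 0. The 8x8 `IMAGE` is not in the text. It is rebuilt here by decoding the slide's run length code, using 0 for white and 1 for black. The helper names are hypothetical.

```python
def rle_encode_row(row):
    """Run lengths for one row of '0' (white) / '1' (black) pixels, starting with white."""
    runs = []
    current, length = "0", 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)  # a row that starts black records a white run of 0
            current, length = pixel, 1
    runs.append(length)
    return runs

def rle_decode_row(runs):
    """Rebuild a row by alternating colors, starting with white."""
    pixels, color = [], "0"
    for length in runs:
        pixels.append(color * length)
        color = "1" if color == "0" else "0"
    return "".join(pixels)

# Raw bitmap reconstructed from the run length code on the slide
IMAGE = ["00111100", "01000010", "01011010", "01111110",
         "01011010", "01100110", "01000010", "00111100"]

for row in IMAGE:
    runs = rle_encode_row(row)
    print(row, "->", ",".join(map(str, runs)))
```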
Compression • When data is compressed, information is encoded using fewer bits. • This speeds transmission • Reduces storage cost (smaller drives) • May increase processing (must un-compress to view/process) • For pictures, there are two types: • Lossless: No information is lost • Lossy: Information may be lost
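The following Python sketch contrasts the two types of compression. The lossless half uses the standard `zlib` module, a general-purpose DEFLATE compressor rather than an image format. It shrinks a repetitive, made-up bitmap and then restores it exactly. The lossy half is a toy quantization, not how JPG works: it drops the low 4 bits of some 8-bit brightness levels, so nearby levels merge and cannot be told apart afterwards.

```python
import zlib

# Lossless: a made-up, highly repetitive "image" of 800 identical 8-pixel rows
raw = b"00111100" * 800
compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)
print(len(raw), "->", len(compressed), "bytes; identical after decompression:", restored == raw)

# Lossy (toy example): keep only the high 4 bits of each 8-bit brightness level
levels = [200, 201, 203, 198, 57, 60]
lossy = [level & 0xF0 for level in levels]
print(lossy)  # [192, 192, 192, 192, 48, 48]; the original levels cannot be recovered
```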
Compression • Football uses “compression” to call plays. • Describing a full play in English words would take too long. • A compressed play-calling scheme might use four digits to communicate a single play. • The first digit might indicate the snap count • the second, the blocking scheme • the third, the alignment and motion of the backs • the fourth, the nature of the play itself. • The play 3518 might be understood as 3: snap count; 5: zone blocking to the right; 1: fullback aligns right, halfback motions left; 8: quarterback hands off to the right.
Sound • Sound is a physical phenomenon caused by waves that propagate through the air. • A microphone transforms the waveform into an analog electric signal • The analog signal is sampled to produce a sequence of numbers. • Frequency is the rate at which a sound wave repeats its cycle. • Hertz (Hz): number of cycles per second. • Human hearing: about 20 Hz to 20,000 Hz (20 kHz)
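The following Python sketch shows sampling. A mathematical sine wave stands in for the microphone's analog signal, and the sketch measures it at evenly spaced instants to produce a list of numbers. The function name, the 440 Hz tone, and the 8,000 Hz sampling rate are illustrative assumptions. A real recorder would also round each sample to a fixed number of bits, which this sketch omits.

```python
import math

def sample_tone(frequency_hz, sample_rate_hz, duration_s):
    """Measure a pure sine wave at evenly spaced instants, producing a list of numbers."""
    count = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz) for n in range(count)]

samples = sample_tone(440, 8000, 0.01)  # 10 ms of a 440 Hz tone
print(len(samples), [round(s, 3) for s in samples[:5]])
```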
Sound • The Nyquist–Shannon sampling theorem • Samples must be taken at a rate of more than twice the highest frequency in the signal being measured • The highest-frequency sound wave that the human ear can hear is about 20 kHz. • A sampling rate of more than 40 kHz is therefore required when capturing sound • This explains why compact discs (CDs) are sampled at 44.1 kHz and digital audio tapes (DATs) at 48 kHz
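The following Python sketch shows what goes wrong when the theorem is ignored. A pure tone above half the sampling rate produces the same samples as a lower tone, an effect known as aliasing. `alias_frequency` folds a tone's frequency back into the range 0 to rate/2. The helper names and the 30 kHz and 15 kHz test tones are illustrative assumptions. The 44.1 kHz rate is the CD rate from the slide.

```python
import math

def min_sample_rate(highest_hz):
    # Nyquist rate: the sampling rate must exceed twice the highest frequency
    return 2 * highest_hz

def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after sampling, folded into 0..rate/2."""
    return abs(tone_hz - sample_rate_hz * round(tone_hz / sample_rate_hz))

print(min_sample_rate(20_000))          # 40000
print(alias_frequency(30_000, 44_100))  # 14100: too high to capture, so it poses as a lower tone
print(alias_frequency(15_000, 44_100))  # 15000: below 22,050 Hz, so it is captured correctly

# The samples of a 30 kHz tone and a 14.1 kHz tone really are indistinguishable at 44.1 kHz
rate = 44_100
for n in range(5):
    a = math.cos(2 * math.pi * 30_000 * n / rate)
    b = math.cos(2 * math.pi * 14_100 * n / rate)
    assert math.isclose(a, b, abs_tol=1e-9)
```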