Data Representation CPIS 210 John Beckett

Useful Skills
• Understand how data is organized
  • Helps development & debugging
  • Helps understand performance issues for planning & design
• Understand how to transform data
  • Is it related to other data?
  • Is there a transliteration?
  • What is truncated or expanded in the process?
• Understand how to quantify performance

A Brief History
• Pictograph – represents ideas by pictures
• Alphabet – represents phonemes (sounds) by letters
• Morse Code – represents characters by strings of on/off states over time
• Electrical Analog – represents variations in physical reality with variations in voltage
  • Audio: air pressure over time
  • Video: locations on a raster
New media tend not to replace the old, but to encapsulate it (e.g. movies on TV).

Digital Representation
• Everything must be funneled into 1s and 0s
• If the information is discrete symbols (e.g. the alphabet) or events, there is a code list
  • Hollerith, BCDIC, EBCDIC – IBM
  • Baudot – 5 bits, requires a shift code to switch between letters and figures
  • ASCII – 7 bits
  • Unicode – variable-width encodings (e.g. UTF-8), extends ASCII
• If the information represents physical state, there is a standard format (e.g. WAV, MP3)
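
To make the idea of a code list concrete, here is a minimal Python sketch. The helper name `code_bits` is made up for illustration; it looks up a character's ASCII code and shows it as 7 bits, then checks that UTF-8 (one variable-width Unicode encoding) leaves ASCII text unchanged.

```python
def code_bits(ch: str) -> str:
    """Return the 7-bit ASCII code of one character as a string of 1s and 0s."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    return format(code, "07b")

# UTF-8 extends ASCII: plain ASCII characters keep the same single byte.
ascii_bytes = "A".encode("ascii")
utf8_plain = "A".encode("utf-8")
# A character outside ASCII needs more than one byte in UTF-8.
utf8_accented = "\u00e9".encode("utf-8")  # e with acute accent

print(code_bits("A"))             # 1000001
print(ascii_bytes == utf8_plain)  # True
print(len(utf8_accented))         # 2
```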

Conversions
• Binary to Hex: Group bits by 4, starting from the low-order (rightmost) end, then map each group to the hex sequence:
  • 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
• Hex to Binary: Turn each digit into 4 bits
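
The two conversions can be sketched in a few lines of Python. The function names are illustrative, and bit strings are assumed to contain only the characters 0 and 1.

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits: str) -> str:
    """Group bits by 4 from the right and map each group to a hex digit."""
    # Left-pad with zeros so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)

def hex_to_binary(digits: str) -> str:
    """Turn each hex digit into exactly 4 bits."""
    return "".join(format(HEX_DIGITS.index(d), "04b") for d in digits.upper())

print(binary_to_hex("10011001"))  # 99
print(binary_to_hex("111010"))    # 3A
print(hex_to_binary("3A"))        # 00111010
```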

IPv4 Addresses
• Four bytes, each of which contains 8 bits
• 8 bits can hold any unsigned whole number from 0 to 255
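
As a rough illustration of the four-byte layout, this Python sketch packs a dotted-quad address into one 32-bit number and unpacks it again. The function names and the sample address are assumptions chosen for the example.

```python
def ipv4_to_int(address: str) -> int:
    """Pack four 0-255 byte values into one 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part  # shift earlier bytes up, append this byte
    return value

def int_to_ipv4(value: int) -> str:
    """Unpack a 32-bit integer into dotted-quad notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(ipv4_to_int("192.168.1.77"))  # 3232235853
print(int_to_ipv4(3232235853))      # 192.168.1.77
```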

Masking
• If two numbers are combined with the “AND” operator, a “1” bit results where both bits were “1”, and a “0” bit otherwise
  • This is how IP net masks work
• If two numbers are combined with the “OR” operator, a “1” bit results where either bit was “1”, and a “0” bit results only if both were “0”
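
A small Python sketch of both operators, using an address and net mask picked for illustration. AND with the mask keeps the network part, AND with the inverted mask keeps the host part, and OR puts the two parts back together.

```python
address = 0xC0A8014D  # 192.168.1.77 (sample value)
netmask = 0xFFFFFF00  # 255.255.255.0 (sample value)

# AND: a result bit is 1 only where both inputs have a 1.
network = address & netmask

# Invert the mask, limited to 32 bits, to select the host bits instead.
host_mask = ~netmask & 0xFFFFFFFF
host = address & host_mask

# OR: a result bit is 1 where either input has a 1.
recombined = network | host

print(hex(network))           # 0xc0a80100, i.e. 192.168.1.0
print(hex(host))              # 0x4d, i.e. 77
print(recombined == address)  # True
```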
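
Masking
• If we know that a certain bit (let’s say, the bit with value 8) indicates whether a charge transaction was accepted, we can use an “AND” mask to access only the relevant bit.
Example in php:
    $data = 128 + 16 + 8 + 1;    // 153; the 8 bit is set
    $mask = 8;                   // selects only the 8 bit
    print $data . "<br />";
    $result = $data & $mask;     // bitwise AND (&), not logical AND (&&)
    print $result . "<br />";
Result: 153, then 8
With the 8 bit clear ($data = 128 + 16 + 1): 145, then 0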

Representing Multiple Bits
• Parallel: Separate in space
  • Issue: Synchronizing multiple lines
  • Issue: Multiplies cost of transmission
• Serial: Separate in time
  • Issue: Aggregate speed may be very high
• Current practice is gravitating toward serial in all domains except the shortest/fastest (e.g. video display)

Re-Coding Methods
• If code size is the same and character set is similar, use a lookup table (e.g. EBCDIC to ASCII)
• Video and audio re-coding can be particularly complex. A preliminary or intermediate format:
  • Preserves all nuances of every representation
  • Probably native to the equipment you use
  • Probably consumes a great deal of data space
  • Example: “raw” format in digital cameras
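
The lookup-table approach for same-size codes might look like this in Python. One assumption to note: Python's built-in `cp037` codec stands in for "EBCDIC" here, although several EBCDIC variants exist. The table and function names are illustrative.

```python
# 256-entry table: index = EBCDIC byte, value = the matching Latin-1 byte
# (ASCII is the lower half of Latin-1). Built once from the cp037 codec.
EBCDIC_TO_ASCII = (
    bytes(range(256)).decode("cp037").encode("latin-1", errors="replace")
)

def ebcdic_to_ascii(data: bytes) -> bytes:
    """Re-code byte by byte: each input byte indexes into the table."""
    return data.translate(EBCDIC_TO_ASCII)

hello_ebcdic = b"\xc8\xc5\xd3\xd3\xd6"  # "HELLO" in EBCDIC (cp037)
print(ebcdic_to_ascii(hello_ebcdic))    # b'HELLO'
```

Because both codes use one byte per character, the conversion is a single table lookup per byte with no parsing step.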

Communicating Data
• Issue: How do you synchronize?
  • Async: start and stop bits on every character cost time in high-volume situations
  • Sync: Must keep the channel active (special “SYN” pattern that can be re-captured easily)
• Issue: Which bit goes first?
  • RS-232 sends low-order bit first
• Issue: Routing versus content?
  • TCP/IP organizes in terms of “packets” which include both types of information
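
To illustrate bit order and async overhead, this Python sketch frames one byte for an asynchronous serial line. It assumes the common "8N1" setup (one start bit, 8 data bits, no parity, one stop bit), which the slide does not specify. The function name is made up for the example.

```python
def frame_byte(value: int) -> list:
    """Return the bits of one async frame in the order they go on the wire."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    data_bits = [(value >> i) & 1 for i in range(8)]  # low-order bit first
    return [0] + data_bits + [1]                      # start bit, data, stop bit

frame = frame_byte(ord("A"))  # "A" is 0x41, binary 01000001
print(frame)                  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]

# Two of every ten bits carry no data: the time lost in async transmission.
overhead = (len(frame) - 8) / len(frame)
print(overhead)               # 0.2
```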

Close to the Metal
• As we develop more-complex data management schemes, we use up more computer power
• Need extreme speed?
  • Simplify the requirements
  • Use available speed for performance, not ease-of-use and management features

The Challenge Continues
• Was: How to get data into the computer
• Then: How to transfer data to a new system
• Now: How to establish live (or more lively) links easily
• One solution: XML
  • Provides for structure
  • Separate schema may be used
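
As a small illustration of how XML carries structure along with the data, this Python sketch parses a made-up order document with the standard library. The element and attribute names (`order`, `customer`, `item`, `sku`, `qty`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: nesting and attributes describe the structure.
document = """<order id="1001">
  <customer>Pat</customer>
  <item sku="A-1" qty="2"/>
  <item sku="B-7" qty="1"/>
</order>"""

root = ET.fromstring(document)
order_id = root.get("id")
customer = root.findtext("customer")
items = [(item.get("sku"), int(item.get("qty"))) for item in root.findall("item")]

print(order_id, customer, items)  # 1001 Pat [('A-1', 2), ('B-7', 1)]
```

Note that `xml.etree.ElementTree` only parses the document. It does not validate it against a separate schema, which would need an additional tool.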