ITEC 1000 “Introduction to Information Technology” Lecture 4 Data Formats
Lecture Template: • Data Forms • Data conversion and representation • Data Formats • Alphanumeric Data • Image Data • Audio Data • Data Input • Data Compression • Internal Computer Data Format
Data Forms • Human communication • Includes language, images and sounds • Computers • Process and store all forms of data in binary format • Conversion to computer-usable representation using data formats • Define the different ways human data may be represented, stored and processed by a computer
Data formats • Proprietary formats • Unique to a product or company • E.g., Microsoft Word, WordPerfect • Standards (evolve in two ways): • Proprietary formats become de facto standards (e.g., Adobe PostScript) • Defined by an international standards organization (e.g., the Moving Picture Experts Group, MPEG)
Alphanumeric Data • Characters (r, T), number digits (0..9), punctuation (!, ;), special purpose characters ($, &) • Four codes/standards to represent letters and numbers: • BCD (Binary-Coded Decimal) • Unicode • ASCII (American Standard Code for Information Interchange) • EBCDIC (Extended Binary Coded Decimal Interchange Code)
Standard Alphanumeric Formats • BCD • ASCII • EBCDIC • Unicode
Binary-Coded Decimal (BCD) • Four bits per digit • Note: the following six bit patterns are not used: 1010, 1011, 1100, 1101, 1110, 1111
BCD: Example • 7093₁₀ = ? (in BCD) • 7 = 0111, 0 = 0000, 9 = 1001, 3 = 0011 • Result: 0111 0000 1001 0011
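The digit-by-digit conversion above can be sketched in a few lines of Python. This is only an illustration; the function names `to_bcd` and `from_bcd` are hypothetical and not part of the lecture.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer as BCD: one 4-bit group per decimal digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(bits: str) -> int:
    """Decode space-separated 4-bit groups back into an integer."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if value > 9:
            # 1010 through 1111 are the six unused patterns
            raise ValueError(f"{group} is not a valid BCD digit")
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(7093))  # 0111 0000 1001 0011
```

Note that BCD spends 16 bits on 7093, whereas plain binary would need only 13; the unused patterns are the price of keeping each decimal digit separate.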
ASCII Features • Developed by ANSI (American National Standards Institute) • Defined in ANSI document X3.4-1977 • 7-bit code • 8th bit is unused (or used for a parity bit or to indicate “extended” character set) • 2^7 = 128 different codes • Two general types of codes: • 95 are “Printing” codes (displayable on a console) • 33 are “Control” codes (control features of the console or communications channel) • Represents • Latin alphabet, Arabic numerals, standard punctuation characters • Plus small set of accents and other European special characters (Latin-1 ASCII)
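As an illustration of using the 8th bit for parity, the sketch below sets it so that every byte carries an even number of 1 bits. Even parity is an assumption here (the slide does not say which parity scheme is meant), and `add_even_parity` is a hypothetical helper name.

```python
def add_even_parity(code: int) -> int:
    """Return the 8-bit value for a 7-bit ASCII code with an even-parity top bit."""
    if not 0 <= code < 128:
        raise ValueError("ASCII codes are 7 bits wide (0..127)")
    ones = bin(code).count("1")
    parity_bit = ones % 2          # 1 only when the 7-bit code has an odd number of 1s
    return (parity_bit << 7) | code


for ch in "aA":
    print(ch, format(ord(ch), "07b"), format(add_even_parity(ord(ch)), "08b"))
# a 1100001 11100001
# A 1000001 01000001
```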
ASCII Table (figure not reproduced; the table is indexed by the most significant and least significant bits of each code) • e.g., ‘a’ = 1100001 • 95 printing codes: alphabetic codes, numeric codes, punctuation, etc. • 33 control codes • e.g., 74₁₆ = 111 0100
Example: “Hello, world” in ASCII
Char     Binary    Hex  Decimal
H        1001000   48   72
e        1100101   65   101
l        1101100   6C   108
l        1101100   6C   108
o        1101111   6F   111
,        0101100   2C   44
(space)  0100000   20   32
w        1110111   77   119
o        1101111   6F   111
r        1110010   72   114
l        1101100   6C   108
d        1100100   64   100
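The per-character codes in this example can be reproduced with Python's built-in `ord`. The helper name `ascii_rows` is hypothetical; this is just a minimal sketch.

```python
def ascii_rows(text: str):
    """Return (character, 7-bit binary, hex, decimal) for each ASCII character."""
    rows = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        rows.append((ch, format(code, "07b"), format(code, "02X"), code))
    return rows


for ch, binary, hexa, decimal in ascii_rows("Hello, world"):
    print(f"{ch!r:5} {binary}  {hexa}  {decimal}")
```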
Common Control Codes (with hexadecimal code) • CR 0D carriage return • LF 0A line feed • HT 09 horizontal tab • DEL 7F delete • NUL 00 null
EBCDIC • 8-bit code • Developed by IBM • IBM and compatible mainframes only • Rarely used today (common in archival data) • Character codes differ from ASCII • Conversion software to/from ASCII available
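A small sketch of ASCII/EBCDIC conversion using Python's standard codecs. EBCDIC exists in many code pages; `cp037` (a US/Canada variant shipped with Python) is assumed here purely for illustration, and the function names are hypothetical.

```python
def ascii_to_ebcdic(data: bytes) -> bytes:
    """Re-encode ASCII bytes as EBCDIC (code page 037 assumed)."""
    return data.decode("ascii").encode("cp037")


def ebcdic_to_ascii(data: bytes) -> bytes:
    """Re-encode EBCDIC (code page 037 assumed) bytes as ASCII."""
    return data.decode("cp037").encode("ascii")


ascii_bytes = b"A1"
ebcdic_bytes = ascii_to_ebcdic(ascii_bytes)
print(ascii_bytes.hex(), ebcdic_bytes.hex())  # 4131 c1f1
```

The same two characters get entirely different codes in the two schemes, which is why conversion software is needed when moving data between them.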
Unicode • Most common 16-bit form represents 65,536 characters • ASCII Latin-1 is a subset of Unicode • Values 0 to 255 in Unicode table • Multilingual: defines codes for • Nearly every character-based alphabet • Large set of ideographs for Chinese, Japanese and Korean • Composite characters for vowels and syllabic clusters required by some languages • Allows software modifications for local languages
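The claim that Latin-1 occupies values 0 to 255 of the Unicode table can be checked directly in Python, as sketched below. The 16-bit form is illustrated with the `utf-16-be` codec, which is one concrete 16-bit encoding chosen here as an assumption; `latin1_matches_unicode` is a hypothetical helper name.

```python
def latin1_matches_unicode() -> bool:
    """True if every Latin-1 byte value decodes to the same-numbered Unicode code point."""
    return all(bytes([value]).decode("latin-1") == chr(value) for value in range(256))


print(latin1_matches_unicode())          # True
print("A".encode("utf-16-be").hex())     # 0041  (ASCII 'A' padded to 16 bits)
print("中".encode("utf-16-be").hex())    # 4e2d  (a CJK ideograph, U+4E2D)
```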
Collating Sequence • Collating Sequence – the order of the codes in the representation table • Determines sorting and selection of the alphanumeric data • Collating Sequences are different in ASCII and EBCDIC: • Small letters precede capitals in EBCDIC; reverse in ASCII • Numbers collate first in ASCII; in EBCDIC, last
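The effect of the collating sequence on sorting can be seen by ordering the same items by their ASCII codes and by their EBCDIC codes. This sketch assumes EBCDIC code page 037 (Python's `cp037` codec) as a representative EBCDIC variant.

```python
items = ["b", "B", "2", "a", "A", "1"]

# Sort by the numeric code each character has in the given encoding.
ascii_order = sorted(items, key=lambda s: s.encode("ascii"))
ebcdic_order = sorted(items, key=lambda s: s.encode("cp037"))

print(ascii_order)   # ['1', '2', 'A', 'B', 'a', 'b']
print(ebcdic_order)  # ['a', 'b', 'A', 'B', '1', '2']
```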
Two Classes of Codes • Printing characters • Produce output on the screen or printer • Control characters • Control position of output on screen or printer • Cause action to occur • Communicate status between computer and I/O device
Escape Sequences • Extend the capability of the ASCII code set • For controlling terminals and formatting output • Defined by ANSI in documents X3.41-1974 and X3.64-1977 • The escape code is ESC = 1B₁₆ • An escape sequence begins with two codes: ESC [ = 1B₁₆ 5B₁₆
Escape Sequences: Examples • Erase display: ESC [ 2 J • Erase line: ESC [ K
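The two sequences above can be built as ordinary strings, as in this minimal sketch. The constant and function names are hypothetical; whether a given terminal honours the sequences depends on the terminal.

```python
ESC = "\x1b"                 # escape code, 1B hex
CSI = ESC + "["              # the two-code prefix: 1B 5B

ERASE_DISPLAY = CSI + "2J"   # ESC [ 2 J
ERASE_LINE = CSI + "K"       # ESC [ K


def hex_codes(sequence: str) -> str:
    """Show the hexadecimal code of every character in an escape sequence."""
    return " ".join(format(ord(ch), "02X") for ch in sequence)


print(hex_codes(ERASE_DISPLAY))  # 1B 5B 32 4A
print(hex_codes(ERASE_LINE))     # 1B 5B 4B
```

Writing `ERASE_DISPLAY` to a terminal that supports these sequences (for example with `print(ERASE_DISPLAY, end="")`) should clear the screen.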
Alphanumeric Input: Keyboard • Scan code • Two different binary scan codes generated: one when the key is struck and one when it is released • Converted to Unicode, ASCII or EBCDIC by software in terminal or PC • Received by the host as a stream of text and other characters, i.e. in the sequence typed • Advantage • Easily adapted to different languages or keyboard layout • Separate scan codes for key press/release for multiple key combinations • Examples: shift and control keys
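Shift Key • Inhibits (clears) bit 5 in the ASCII code, counting from bit 0 at the least significant end • e.g., a (1100001) with Shift gives A (1000001)
Control Key • Inhibits (clears) bits 5 and 6 in the ASCII code • e.g., c (1100011) with Ctrl gives the control code 0000011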
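The bit manipulation performed by the Shift and Control keys can be sketched with bit masks. Bits are numbered from 0 at the least significant end (an assumption about the slide's numbering that matches the examples), the sketch only makes sense for lowercase letters, and the function names are hypothetical.

```python
BIT5 = 0b0100000  # 0x20
BIT6 = 0b1000000  # 0x40


def shift(letter: str) -> str:
    """Clear bit 5 of a lowercase letter's ASCII code, giving the capital."""
    return chr(ord(letter) & ~BIT5)


def ctrl(letter: str) -> int:
    """Clear bits 5 and 6, giving the corresponding control code."""
    return ord(letter) & ~(BIT5 | BIT6)


print(shift("a"))                # A
print(format(ctrl("c"), "07b"))  # 0000011
```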
Keyboard Input • Three letters are typed: “D”, “I”, “R”, followed by the carriage return • Four scan codes translated to ASCII binary codes: 1000100, 1001001, 1010010, 0001101
OCR (optical character recognition) • Scans text and inputs it as character data • Special OCR software required • Used to read specially encoded characters • Example: magnetically printed check numbers • Attempts to recognize hand-written input (limited, only carefully printed)
Bar Code Readers • Used in applications that require fast, accurate and repetitive input with minimal employee training • Examples: supermarket checkout counters and inventory control • Alphanumeric data in bar code (e.g., 780471 108801 90000) is read optically using a wand that converts it into electrical binary signals • A bar code translation module converts the binary input into a sequence of number codes, one code per digit, which are then translated to Unicode or ASCII.
Other Alphanumeric Input • Magnetic stripe reader: alphanumeric data from credit cards • Voice • Digitized audio recording common but conversion to alphanumeric data difficult • Requires knowledge of sound patterns in a language (phonemes) plus rules for pronunciation, grammar, and syntax
Image Data • Photographs, figures, icons, drawings, charts and graphs • Two approaches: • Bitmap or raster images of photos and paintings with continuous variation (e.g., GIF, JPEG) • Object or vector images composed of graphical shapes like lines and curves defined geometrically • Differences include: • Quality of the image • Storage space required • Time to transmit • Ease of modification
Image Input • Image scanning: the scanner moves over the image, converting it dot by dot into a stream of binary numbers (pixels) representing black or white, levels of gray, or a colour (bitmap image) • Digital/video cameras (bitmap image) • Pointing devices such as a mouse or pen (object image)
Bitmap Images • Each individual pixel (picture element) in a graphic is stored as a binary number • Pixel: a small area with an associated coordinate location • Example (figure not reproduced): each point is represented by a 4-bit code corresponding to 1 of 16 shades of gray
Bitmap Display • Monochrome: black or white • 1 bit per pixel • Gray scale: black, white or 254 shades of gray • 1 byte per pixel • Color graphics: 16 colors, 256 colors, or 24-bit true color (16.7 million colors) • 4, 8, and 24 bits respectively
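The relationship between bits per pixel and the number of distinct values a pixel can take is simply a power of two, as this short sketch shows (`color_count` is a hypothetical helper name).

```python
def color_count(bits_per_pixel: int) -> int:
    """Number of distinct values representable with the given bits per pixel."""
    return 2 ** bits_per_pixel


for bits in (1, 4, 8, 24):
    print(f"{bits:2d} bits per pixel: {color_count(bits):,} values")
```

For the 8-bit gray scale case, the 256 values are black, white and the 254 shades in between.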
Storing Bitmap Images • Frequently large files • Example: 600 rows of 800 pixels with 1 byte for each of 3 colors gives 600 × 800 × 3 = 1,440,000 bytes, a file of roughly 1.4 MB • File size affected by • Resolution (the number of pixels per inch) • Amount of detail affecting clarity and sharpness of an image • Levels: number of bits for displaying shades of gray or multiple colors • Palette: color translation table that uses a code for each pixel rather than actual color value • Data compression
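The size of an uncompressed bitmap follows directly from its dimensions and bits per pixel. The sketch below ignores any file header or palette (an assumption made for simplicity), and `bitmap_size_bytes` is a hypothetical helper name.

```python
def bitmap_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Raw pixel data size in bytes, rounded up to a whole byte."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


print(bitmap_size_bytes(800, 600, 24))  # 1440000  (24-bit true color)
print(bitmap_size_bytes(800, 600, 8))   # 480000   (256 colors or gray scale)
print(bitmap_size_bytes(800, 600, 1))   # 60000    (monochrome)
```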
GIF (Graphics Interchange Format) • First developed by CompuServe in 1987 • GIF89a enabled animated images • Allows images to be displayed sequentially at fixed time intervals • Color limitation: 256 • Image compressed by LZW (Lempel-Ziv-Welch) algorithm • Preferred for line drawings, clip art and pictures with large blocks of solid color • Lossless compression
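To give a feel for why LZW suits images with large blocks of solid color, here is a minimal sketch of the textbook LZW algorithm on bytes. It is not GIF's exact variant (GIF adds variable-width codes and clear/end codes, omitted here), and the function names are hypothetical.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Textbook LZW: emit one code per longest already-seen byte string."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    output = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            output.append(table[current])    # emit code for the longest match
            table[candidate] = next_code     # learn the new string
            next_code += 1
            current = bytes([byte])
    if current:
        output.append(table[current])
    return output


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the original bytes, reconstructing the table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    result = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:              # code defined by this very step
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        result += entry
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(result)


row = bytes([7] * 64)                        # 64 pixels of one solid color
codes = lzw_compress(row)
print(len(row), "bytes ->", len(codes), "codes")
```

Because decompression recovers the input exactly, the scheme is lossless; long runs of identical pixels collapse into a handful of codes.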