bits and text
http://proglit.com/
byte (the size of a cell of addressable memory) • 8 bits on all modern systems • octet = 8 bits
kilobyte 1,000 (10^3) bytes or 1,024 (2^10) bytes
megabyte 1,000,000 (10^6) bytes or 1,048,576 (2^20) bytes
gigabyte 1,000,000,000 (10^9) bytes or 1,073,741,824 (2^30) bytes
terabyte (10^12 bytes or 2^40 bytes) • petabyte (10^15 bytes or 2^50 bytes) • exabyte (10^18 bytes or 2^60 bytes) • zettabyte (10^21 bytes or 2^70 bytes)
kibibyte (2^10 bytes) • mebibyte (2^20 bytes) • gibibyte (2^30 bytes)
kilobit (10^3 bits or 2^10 bits) • megabit (10^6 bits or 2^20 bits) • gigabit (10^9 bits or 2^30 bits) • etc.
kilobit (kb) kilobyte (kB)
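The two readings of each prefix, and the bit/byte distinction, can be checked with a little arithmetic. The following is a minimal sketch; the names DECIMAL, BINARY, and bits_to_bytes are made up for illustration, and an 8-bit byte is assumed.

```python
# Decimal (power-of-ten) and binary (power-of-two) readings of each prefix.
DECIMAL = {"kilo": 10**3, "mega": 10**6, "giga": 10**9}
BINARY = {"kilo": 2**10, "mega": 2**20, "giga": 2**30}


def bits_to_bytes(bits: int) -> float:
    """Convert a bit count to bytes, assuming 8-bit bytes (octets)."""
    return bits / 8


for prefix in DECIMAL:
    # Show how far apart the two readings are for each prefix.
    print(prefix, DECIMAL[prefix], BINARY[prefix])

# 100 megabits (decimal prefix) expressed in megabytes (decimal prefix).
megabytes = bits_to_bytes(100 * DECIMAL["mega"]) / DECIMAL["mega"]
print(megabytes)  # 12.5
```

The gap between the two readings grows with each prefix: about 2.4% at kilo, about 7.4% at giga.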
b a n a n a 2 1 14 1 14 1 “banana”
2 1 14 1 14 1 • b a n a n a 2 1 14 1 14 1
b a n a n a 52 97 4 97 4 97 “banana”
character set (a mapping of characters to numbers) • ASCII (American Standard Code for Information Interchange) • 128 characters
whitespace character (a character representing spacing)
A b a n a n a 65 32 98 97 110 97 110 97 “A banana”
whitespace character (a character representing spacing) space, tab, linefeed, carriage return
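ASCII numbers can be looked up directly in most languages. As a small sketch in Python, the built-in ord() returns the number for a character and chr() goes the other way; the variable names here are chosen only for illustration.

```python
# Map each character of the string to its ASCII number.
text = "A banana"
codes = [ord(ch) for ch in text]
print(codes)  # [65, 32, 98, 97, 110, 97, 110, 97]

# The mapping is reversible: numbers back to characters.
restored = "".join(chr(n) for n in codes)
print(restored)  # A banana

# The four whitespace characters named above and their ASCII numbers.
whitespace = {
    "space": ord(" "),
    "tab": ord("\t"),
    "linefeed": ord("\n"),
    "carriage return": ord("\r"),
}
print(whitespace)
```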
control character (signals an action response to the reader) • LF (line feed) • CR (carriage return) • FF (form feed) • BEL (bell)
plain text • (no formatting, only characters) • no italics, underline, or bold • no fonts, font sizes, or colors • no margins, columns, or page breaks • etc.
character (a unit of written language and notation) • glyph (an actual visual representation of a character)
character encoding (scheme for representing characters as bits) • c a t • 99 97 116 • 0x63 0x61 0x74 • ASCII = 1 byte per character
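To see an encoding at work, a string can be turned into bytes and inspected. This is a minimal sketch using Python's built-in str.encode with the "ascii" codec; the variable names are illustrative.

```python
# Encode the text as ASCII: one byte per character.
encoded = "cat".encode("ascii")

decimal_values = list(encoded)             # each byte as a decimal number
hex_values = [hex(b) for b in encoded]     # the same bytes in hexadecimal

print(decimal_values)  # [99, 97, 116]
print(hex_values)      # ['0x63', '0x61', '0x74']
print(len(encoded))    # 3
```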
Unicode (the world standard character set and its encodings) • U+0000 to U+10FFFF
• U+0000 – U+FFFF plane 0, BMP (Basic Multilingual Plane)
• U+10000 – U+1FFFF plane 1, SMP (Supplementary Multilingual Plane)
• U+20000 – U+2FFFF plane 2, SIP (Supplementary Ideographic Plane)
• U+30000 – U+3FFFF plane 3, TIP (Tertiary Ideographic Plane)
• U+40000 – U+DFFFF planes 4 to 13 currently unassigned
• U+E0000 – U+EFFFF plane 14, SSP (Supplementary Special-purpose Plane)
• U+F0000 – U+FFFFF plane 15, PUA (Private Use Area)
• U+100000 – U+10FFFF plane 16, PUA (Private Use Area)
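Because each plane spans exactly 0x10000 code points, the plane number is simply the code point with its low 16 bits dropped. A minimal sketch, with plane_of as a made-up helper name:

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (0 to 16) of a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000 to U+10FFFF")
    # Each plane holds 2**16 code points, so shift away the low 16 bits.
    return code_point >> 16


for cp in (0x0065, 0x1F600, 0x2F010, 0xE0001, 0x10FFFF):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```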
UTF-32 • U+3FF01 0000_0000 0000_0011 1111_1111 0000_0001 • 00 03 FF 01 • U+40077 0000_0000 0000_0100 0000_0000 0111_0111 • 00 04 00 77 • U+0065 0000_0000 0000_0000 0000_0000 0110_0101 • 00 00 00 65 (4 bytes per character)
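UTF-32 is the simplest of the three encodings: the code point is written out as a 4-byte number. The sketch below assumes big-endian byte order, matching the byte listings above; utf32_be and hex_bytes are hypothetical helper names.

```python
def utf32_be(code_point: int) -> bytes:
    """Encode one code point as 4 big-endian bytes (UTF-32)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    return code_point.to_bytes(4, "big")


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02X}" for b in data)


for cp in (0x3FF01, 0x40077, 0x0065):
    print(f"U+{cp:04X}", hex_bytes(utf32_be(cp)))
```

For comparison, Python's built-in codec gives the same bytes: "e".encode("utf-32-be") equals utf32_be(0x65).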
UTF-16 • U+0065 0000_0000 0110_0101 • 00 65 • U+F10F 1111_0001 0000_1111 • F1 0F (2 or 4 bytes per character)
UTF-16 (2 or 4 bytes per character)
1101_10xx xxxx_xxxx 1101_11xx xxxx_xxxx (fixed bits: 1101_10 and 1101_11; x bits: plane and character)
surrogates: U+D800 to U+DFFF
• U+3F010 1101_1000 1011_1100 1101_1100 0001_0000 • D8 BC DC 10
• U+10FF00 1101_1011 1111_1111 1101_1111 0000_0000 • DB FF DF 00
• U+17711 1101_1000 0001_1101 1101_1111 0001_0001 • D8 1D DF 11
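The surrogate bit patterns can be reproduced with a few shifts and masks. One step the bit layout does not show on its own is that 0x10000 is subtracted from the code point first, leaving a 20-bit value that is split into two 10-bit halves. A minimal sketch, with utf16_surrogates as a made-up function name:

```python
def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Return the (high, low) UTF-16 surrogate pair for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 | (offset >> 10)         # 1101_10xx xxxx_xxxx (top 10 bits)
    low = 0xDC00 | (offset & 0x3FF)        # 1101_11xx xxxx_xxxx (bottom 10 bits)
    return high, low


for cp in (0x3F010, 0x10FF00, 0x17711):
    high, low = utf16_surrogates(cp)
    print(f"U+{cp:04X} -> {high:04X} {low:04X}")
```

The result can be compared against Python's own codec, for example chr(0x3F010).encode("utf-16-be").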
UTF-8 (1 to 4 bytes per character)
• U+0000 – U+007F: 0xxx_xxxx
• U+0080 – U+07FF: 110x_xxxx 10xx_xxxx
• U+0800 – U+FFFF: 1110_xxxx 10xx_xxxx 10xx_xxxx
• U+10000 – U+10FFFF: 1111_0xxx 10xx_xxxx 10xx_xxxx 10xx_xxxx
UTF-8 U+0031: 0011_0001 U+0700: 1101_1100 1000_0000 U+86FF: 1110_1000 1001_1011 1011_1111 U+50000: 1111_0001 1001_0000 1000_0000 1000_0000 (1 to 4 bytes per character)
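The four UTF-8 byte patterns translate directly into shifts and masks. The following is a sketch of a hand-written encoder for a single code point; utf8_encode is a made-up name, and in practice Python's built-in str.encode("utf-8") does this job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not encodable in UTF-8")
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0x7F:        # 0xxx_xxxx
        return bytes([code_point])
    if code_point <= 0x7FF:       # 110x_xxxx 10xx_xxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:      # 1110_xxxx 10xx_xxxx 10xx_xxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:    # 1111_0xxx 10xx_xxxx 10xx_xxxx 10xx_xxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("outside the Unicode range")


for cp in (0x0031, 0x0700, 0x86FF, 0x50000):
    bits = " ".join(f"{b:08b}" for b in utf8_encode(cp))
    print(f"U+{cp:04X}: {bits}")
```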
UTF-8 (1 to 4 bytes per character) • U+0031: (valid) 0011_0001 • U+0031: (invalid, longer than necessary) 1111_0000 1000_0000 1000_0000 1011_0001
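A conforming decoder should reject the padded 4-byte form. As a quick check, the sketch below feeds both byte sequences to Python's built-in UTF-8 decoder; the variable names are illustrative.

```python
# The shortest form of U+0031 and the padded 4-byte form of the same value.
valid = bytes([0b0011_0001])
overlong = bytes([0b1111_0000, 0b1000_0000, 0b1000_0000, 0b1011_0001])

decoded = valid.decode("utf-8")
print(decoded)  # 1

try:
    overlong.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    # Python's strict decoder refuses the padded sequence.
    rejected = True

print(rejected)  # True
```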
text editor (a program for creating and editing text files) • notepad • vi/vim • emacs