Information Management
DIG 3563 – Lecture 14: Data Formats
J. Michael Moshell, University of Central Florida
Original image* by Moshell et al. Imagery is from Wikimedia except where marked with *. Licensing is listed.
Data Formats
• We begin with a visit to the chart at http://en.wikipedia.org/wiki/Graphics_file_format_summary
• You know all this stuff, right? You're DM people!
Image: www.tech-notes.tv
I had asked you ...
• Pick ONE item in this chart and be ready to discuss it.
• So, let's see what you picked!
Image: www.tech-notes.tv
Key Concepts for Final Exam
• Lossy and lossless compression
• Raster and vector imagery
• Color depth
• Indexed color
• Metadata
• High Dynamic Range (HDR)
Image: www.tech-notes.tv
What's up.doc (?)
Image: Warner Brothers
File Extensions
• Original concept from Unix
• Adopted in CP/M -> DOS -> Windows
• Tell the system what application(s) to use with the data
• A crude form of metadata
• Macintosh had SEPARATE type and app fields, but OS X now uses extensions as well
http://en.wikipedia.org/wiki/File_extension
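To make the "crude form of metadata" idea concrete, here is a minimal sketch of how a program might choose an application from a file extension. The table `APP_FOR_EXTENSION` and the function `app_for` are made up for this illustration; a real operating system keeps a much larger registry.

```python
from pathlib import PurePath

# Hypothetical extension-to-application table (illustration only).
APP_FOR_EXTENSION = {
    ".doc": "word processor",
    ".jpg": "image viewer",
    ".txt": "text editor",
}

def app_for(filename):
    """Guess an application from the last extension of a file name."""
    ext = PurePath(filename).suffix.lower()  # ".JPG" and ".jpg" are treated alike
    return APP_FOR_EXTENSION.get(ext, "unknown")

print(app_for("whats_up.doc"))
print(app_for("photo.JPG"))
print(app_for("README"))  # no extension, so no hint at all
```

Note how little the extension really says: renaming a file changes the "metadata" without changing a single byte of the data.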
From Bytes to Characters
• The Enemy: "Big Blue" (IBM) and the 7 Dwarfs
• BCD and EBCDIC were IBM's codes
• ASCII was the big rival, from DEC and Teletype
• DEC and ASCII won ...
• 7-bit code + 1 parity bit (odd) on punched tape; odd parity means every row has at least one hole!
• 7 bits -> 128 characters (quite limited)
Image: www.nic.funet.fi
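The odd-parity rule can be sketched in a few lines. This is an illustration only: the function name `with_odd_parity` is invented here, and putting the parity bit in the top (eighth) position is an assumption about the tape layout.

```python
def with_odd_parity(ch):
    """Return the 8-bit value for a 7-bit ASCII character plus an odd parity bit."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    ones = bin(code).count("1")
    # Choose the parity bit so that the total number of 1 bits is odd.
    parity = 0 if ones % 2 == 1 else 1
    return (parity << 7) | code  # assumption: parity goes in the top bit

for ch in ["A", "C", "\x00"]:
    print(repr(ch), format(with_odd_parity(ch), "08b"))
```

Even the all-zero NUL character gets its parity bit set, which is why every row of the tape has at least one hole.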
Moving to 8 bits
• ISO 8859-1: an 8-bit standard (256 codes)
• Added Western European characters such as accented letters
• Long the default for web pages
• But ... there are hundreds of other languages!
• Unicode: a system to get 'em ALL ... in several variants
• UTF-8 is the most common: one byte for ASCII characters, two to four bytes for everything else
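A quick way to see how many bytes UTF-8 spends on different characters is to encode a few and count. This is a small sketch; the helper `utf8_length` and the sample characters are chosen here just for illustration.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

# ASCII letter, accented Latin letter, Cyrillic letter, Chinese character, emoji
samples = ["A", "é", "я", "我", "\U0001F600"]
for ch in samples:
    print(ch, utf8_length(ch), ch.encode("utf-8").hex())

# ISO 8859-1 stores é in a single byte, but has no code at all for 我.
print("é".encode("iso-8859-1"))
try:
    "我".encode("iso-8859-1")
except UnicodeEncodeError:
    print("我 cannot be written in ISO 8859-1")
```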
Unicode
• A is the same as A in any font (Unicode does not represent styling, size, etc.)
• Codes 0-127 are the same as ASCII; codes 0-255 are the same as ISO 8859-1
• CODE POINTS: numbers between 0 and 10FFFF hex, i.e. 1 to 6 hex digits, normally written with at least 4
• A code point is designated like U+0058, which represents LATIN CAPITAL LETTER X (hex 58 = decimal 88)
• 我 (wo3) is Unicode &#25105; (the decimal number 25105, which is hex 6211)
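Python's built-in `ord` and the standard `unicodedata` module make it easy to check code points like these. The helper `describe` below is a name invented for this sketch.

```python
import unicodedata

def describe(ch):
    """Return (U+ notation, decimal value, official Unicode name) for one character."""
    cp = ord(ch)
    return f"U+{cp:04X}", cp, unicodedata.name(ch)

print(describe("X"))
print(describe("我"))

# The first 256 code points line up with the 256 byte values of ISO 8859-1.
all_bytes = bytes(range(256))
print(all_bytes.decode("iso-8859-1") == "".join(chr(i) for i in range(256)))
```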
Unicode
• Aside: what's the decimal value of the largest 4-digit hexadecimal integer, FFFF hex?
• Well, it's one less than 10000 hex, which is 16^4 = 256^2 = 65536 (the so-called 64K), so FFFF hex = 65535
• 我 (wo3) is Unicode &#25105; (a decimal number)
• 25105 is less than 65536, so it's a 4-digit hex code
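The arithmetic in this aside is easy to double-check. A few lines, offered only as a sanity check (the variable names are arbitrary):

```python
largest_4_digit = int("FFFF", 16)   # parse a hex string as an integer
print(largest_4_digit)              # 65535
print(16**4, 256**2)                # 65536 65536

wo = 25105                          # the decimal number for 我
print(wo < 16**4)                   # True: it fits in 4 hex digits
print(format(wo, "X"))              # 6211
```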
MIME Types
• Internet e-mail follows SMTP, the Simple Mail Transfer Protocol (ASCII)
• But many attachments are not ASCII
• MIME is an Internet standard way of classifying attachments, and is now used for most media, not just e-mail
• MIME-Version: 1.0 is a "standard" even though it changes(!)
• Content-Type default is text/plain, written as type/subtype
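Python's standard `email` package follows these rules, so it can serve as a small demonstration. The sketch below only inspects headers; the body text is an arbitrary example.

```python
from email.message import EmailMessage

msg = EmailMessage()
default_type = msg.get_content_type()   # no Content-Type header yet, so the default applies
msg.set_content("Plain ASCII body")     # adds Content-Type and MIME-Version headers

print(default_type)
print(msg["MIME-Version"])
print(msg.get_content_maintype(), msg.get_content_subtype())
```

The last line splits the content type into its two halves: the type ("text") and the subtype ("plain").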
MIME Types
The types come from a short list:
• text
• image – requires a display device
• audio – requires an audio output device
• video – requires the ability to show moving images
• application – usually binary data for use by a specific application; subtype "octet-stream" if no application is specifically associated
• multipart – consists of multiple entities of independent data types
• message – rare (and I don't understand it)
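To see "multipart" and "octet-stream" in action, one can attach some arbitrary binary data to a text message and list the resulting parts. This is a sketch using Python's standard `email` package; the file name `data.bin` and the three bytes of payload are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("See the attached file.")

# Binary data with no specific application associated: application/octet-stream.
msg.add_attachment(b"\x00\x01\x02", maintype="application",
                   subtype="octet-stream", filename="data.bin")

# walk() visits the container first, then each part inside it.
part_types = [part.get_content_type() for part in msg.walk()]
print(part_types)
```

Adding the attachment turns the message into a multipart container holding a text part and an application part.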
MIME Types
• Text
• Content-Type: text/plain; charset=iso-8859-1
• If you have text, then you must have a character set, no? Yes!
• Here's a good reference for looking up MIME types: http://www.iana.org/assignments/media-types/index.html
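Why the character set matters becomes clear when decoding a body. The sketch below parses a tiny hand-written message with Python's standard `email` package; the message itself (including the word "café" and the 8bit transfer encoding header) is made up for the example.

```python
from email import message_from_bytes

# A hand-written message whose body contains the ISO 8859-1 byte E9 (é).
raw = (b"MIME-Version: 1.0\r\n"
       b"Content-Type: text/plain; charset=iso-8859-1\r\n"
       b"Content-Transfer-Encoding: 8bit\r\n"
       b"\r\n"
       b"caf\xe9\r\n")

msg = message_from_bytes(raw)
charset = msg.get_content_charset()      # read from the Content-Type header
body_bytes = msg.get_payload(decode=True)
body = body_bytes.decode(charset)        # the charset says how to read the bytes

print(charset)
print(body.strip())
```

Without the charset parameter, the receiver would have to guess what the byte E9 means.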