End-to-End Data • Outline • Presentation Formatting • Data Compression
Problem • Ensuring that the sender and receiver see the same data is often called the presentation format problem. • The efficiency of the encoding involves error detection/correction and data compression.
Presentation Formatting • The transformation of data from the representation used by the application into a form suitable for transmission over the network is called presentation formatting. • The sending program encodes data into a message, and the receiving application decodes the message into data. • Encoding is sometimes called argument marshalling, and decoding is called unmarshalling.
Presentation Formatting • [Figure: application data passes through presentation encoding at the sender, crosses the network as a series of messages, and passes through presentation decoding back into application data at the receiver.] • Data types we consider • integers • floats • strings • arrays • structs • Types of data we do not consider • images • video • multimedia documents
Difficulties • Representation of base types • floating point: IEEE 754 versus non-standard • integer: big-endian versus little-endian (e.g., 34,677,374) • Compiler layout of structures • [Figure: the four bytes of the integer 34,677,374 in each byte order, with the two ends of each row labeled High address and Low address. Big-endian: (2) 00000010, (17) 00010001, (34) 00100010, (126) 01111110. Little-endian: (126) 01111110, (34) 00100010, (17) 00010001, (2) 00000010.]
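The two byte orders for the example value 34,677,374 can be sketched in C as follows. This is a minimal illustration, not code from the slides: the function names store_be32, store_le32, load_be32, and byte_order_check are hypothetical, and the stores use shifts so that the result does not depend on the byte order of the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v most-significant byte first (big-endian order). */
void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Store v least-significant byte first (little-endian order). */
void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Rebuild a value from big-endian bytes; works on any host. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Hypothetical self-check using the slide's example value. */
void byte_order_check(void)
{
    const uint8_t big[4]    = { 2, 17, 34, 126 };
    const uint8_t little[4] = { 126, 34, 17, 2 };
    uint8_t buf[4];

    store_be32(buf, 34677374u);
    assert(memcmp(buf, big, 4) == 0);
    store_le32(buf, 34677374u);
    assert(memcmp(buf, little, 4) == 0);
}
```

Writing the bytes with shifts rather than copying the integer's memory is one way to get a fixed wire order regardless of the sending machine.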
Taxonomy • Data types • base types (e.g., ints, floats); must convert • flat types (e.g., structures, arrays); must pack • complex types (e.g., pointers); must linearize • Conversion Strategy • canonical intermediate form • receiver-makes-right (an N x N solution) • [Figure: an application data structure passing through a marshaller.]
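Taxonomy (cont) • Tagged versus untagged data • [Figure: a tagged integer with type = INT, len = 4, value = 417892.] • Stubs • compiled • interpreted • [Figure: a stub compiler reads the interface descriptor for procedure P and generates code for the client stub and the server stub. The client's call to P hands its arguments to the client stub, the marshalled arguments travel in an RPC message, and the server stub hands the arguments to P on the server.]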
eXternal Data Representation (XDR) • Defined by Sun for use with SunRPC • C type system (without function pointers) • Canonical intermediate form • Untagged (except array length) • Compiled stubs
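[Figure: XDR encoding of an item. Count: 3. Name: length 7, then J O H N S O N. List: length 3, then 497, 8321, 265.]
#include <rpc/rpc.h>

#define MAXNAME 256
#define MAXLIST 100

struct item {
    int count;
    char name[MAXNAME];
    int list[MAXLIST];
};

/* Encode or decode one item, depending on the direction set in xdrs. */
bool_t xdr_item(XDR *xdrs, struct item *ptr)
{
    /* xdr_string and xdr_array take the address of a pointer, not of an array. */
    char *name = ptr->name;
    char *list = (char *)ptr->list;

    return xdr_int(xdrs, &ptr->count) &&
           xdr_string(xdrs, &name, MAXNAME) &&
           xdr_array(xdrs, &list, (u_int *)&ptr->count, MAXLIST,
                     sizeof(int), (xdrproc_t)xdr_int);
}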
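To make the wire layout concrete, here is a hand-written sketch that produces the bytes for the example item without using the Sun RPC library. It assumes the usual XDR rules (4-byte big-endian integers, a string sent as a length followed by its bytes padded with zeros to a multiple of 4, a variable-length array sent as a count followed by its elements). The names put_u32, encode_item, and ITEM_WIRE_MAX are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAXNAME 256
#define MAXLIST 100
/* Worst case: count + name length + padded name + list count + list. */
#define ITEM_WIRE_MAX (4 + 4 + MAXNAME + 4 + 4 * MAXLIST)

struct item {
    int count;
    char name[MAXNAME];
    int list[MAXLIST];
};

/* Append a 32-bit value in big-endian order; returns the new offset. */
size_t put_u32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off]     = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)v;
    return off + 4;
}

/* Encode *it into buf (at least ITEM_WIRE_MAX bytes); returns bytes used. */
size_t encode_item(const struct item *it, uint8_t *buf)
{
    const char *end = memchr(it->name, '\0', MAXNAME);
    size_t off = 0;
    size_t len;

    assert(end != NULL);                        /* name must be terminated */
    assert(it->count >= 0 && it->count <= MAXLIST);
    len = (size_t)(end - it->name);

    off = put_u32(buf, off, (uint32_t)it->count);   /* count field */
    off = put_u32(buf, off, (uint32_t)len);         /* string length */
    memcpy(buf + off, it->name, len);
    off += len;
    while (off % 4 != 0)                            /* pad to 4 bytes */
        buf[off++] = 0;
    off = put_u32(buf, off, (uint32_t)it->count);   /* array length */
    for (int i = 0; i < it->count; i++)
        off = put_u32(buf, off, (uint32_t)it->list[i]);
    return off;
}
```

For count = 3, name = "JOHNSON", and list = {497, 8321, 265}, this yields 32 bytes: 4 for the count, 12 for the string, 4 for the array length, and 12 for the three elements.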
Abstract Syntax Notation One (ASN.1) • An ISO standard • Essentially the C type system • Canonical intermediate form • Tagged • Compiled or interpreted stubs • BER: Basic Encoding Rules (tag, length, value) • [Figure: nested encoding in which the value of an outer (type, length, value) triple itself contains further (type, length, value) triples.]
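As a small illustration of the (tag, length, value) layout, the sketch below encodes a 32-bit signed integer as a BER INTEGER: tag byte 0x02, a one-byte length, then the value in big-endian two's complement using the fewest bytes that preserve the sign. The function name ber_encode_int32 is hypothetical, and only this one type with a short-form length is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode v as a BER INTEGER into out (needs up to 6 bytes).
 * Returns the number of bytes written. */
size_t ber_encode_int32(int32_t v, uint8_t *out)
{
    uint32_t u = (uint32_t)v;
    uint8_t be[4];
    size_t start = 0;
    size_t len;

    assert(out != NULL);
    be[0] = (uint8_t)(u >> 24);
    be[1] = (uint8_t)(u >> 16);
    be[2] = (uint8_t)(u >> 8);
    be[3] = (uint8_t)u;

    /* Drop a leading byte while it only repeats the sign of the next one. */
    while (start < 3 &&
           ((be[start] == 0x00 && (be[start + 1] & 0x80) == 0) ||
            (be[start] == 0xFF && (be[start + 1] & 0x80) != 0)))
        start++;

    len = 4 - start;
    out[0] = 0x02;            /* tag: universal INTEGER */
    out[1] = (uint8_t)len;    /* length: short form */
    memcpy(out + 2, be + start, len);
    return 2 + len;
}
```

With this sketch the value 417892 (0x066064) encodes as the five bytes 02 03 06 60 64, while 128 needs a leading zero byte (02 02 00 80) so that it is not read as negative.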
Network Data Representation (NDR) • Defined by DCE • Essentially the C type system • Receiver-makes-right (architecture tag) • Individual data items untagged • Compiled stubs from IDL • 4-byte architecture tag • IntegerRep: 0 = big-endian, 1 = little-endian • CharRep: 0 = ASCII, 1 = EBCDIC • FloatRep: 0 = IEEE 754, 1 = VAX, 2 = Cray, 3 = IBM
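The receiver-makes-right idea can be sketched as follows: the sender copies an integer in its own native byte order and prefixes a tag saying which order that is, and the receiver converts only by interpreting the bytes according to the tag. This is a deliberate simplification, not the real NDR layout: it uses a single tag byte holding only IntegerRep instead of the 4-byte architecture tag, and the names host_is_little_endian, encode_native_u32, and decode_tagged_u32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { REP_BIG_ENDIAN = 0, REP_LITTLE_ENDIAN = 1 };   /* IntegerRep values */

/* Detect the host byte order by inspecting the first byte of a known value. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Sender: tag byte followed by v in the sender's native byte order. */
void encode_native_u32(uint32_t v, uint8_t msg[5])
{
    msg[0] = host_is_little_endian() ? REP_LITTLE_ENDIAN : REP_BIG_ENDIAN;
    memcpy(msg + 1, &v, 4);
}

/* Receiver: interpret the four value bytes as the tag says. */
uint32_t decode_tagged_u32(const uint8_t msg[5])
{
    const uint8_t *p = msg + 1;

    assert(msg[0] == REP_BIG_ENDIAN || msg[0] == REP_LITTLE_ENDIAN);
    if (msg[0] == REP_BIG_ENDIAN)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

When sender and receiver share a byte order, no reordering of the data is needed, which is the appeal of this strategy over a canonical intermediate form.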
Compression Overview • Encoding and Compression • Huffman codes • Lossless • data received = data sent • used for executables, text files, numeric data • Lossy • data received != data sent • used for images, video, audio
Huffman Codes • Huffman coding [1952] can be used as a reasonable approximation to the theoretical limit (the entropy of the source). • 1. Write down the symbols and their probabilities: A = .50, B = .30, C = .15, D = .05. They are the terminal nodes. • 2. Find and mark the two smallest unmarked nodes. Add a node with arcs to the two marked nodes. • 3. Set the probability of the new node to the sum of the probabilities of the marked nodes. • 4. Repeat steps 2 and 3 until all nodes have been marked except one, the root. • 5. The encoding of a symbol is found by tracing the path from the root to that symbol, with left = 0 and right = 1.
Huffman Codes
            ()
           /  \
         0/    \1
         /      ()
        /      /  \
       /     0/    \1
      /      /      ()
     /      /      /  \
    /      /     0/    \1
   /      /      /      \
  (A)    (B)    (C)     (D)
  .5     .3     .15     .05
  0      10     110     111
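The merging procedure for Huffman codes can be sketched in C as below. Several choices here are assumptions made for illustration: weights are integers (percentages) to avoid floating-point ties, the heavier of the two merged nodes is placed on the left (bit 0), and on equal weights the older node goes left. With those choices the example probabilities reproduce the codes in the tree above. The names huffman_codes, MAX_SYMS, and MAX_CODE are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 16
#define MAX_CODE MAX_SYMS   /* longest code is n - 1 bits, plus the NUL */

/* Build Huffman codes for n symbols; codes[i] receives a "0"/"1" string. */
void huffman_codes(const unsigned weights[], int n, char codes[][MAX_CODE])
{
    unsigned w[2 * MAX_SYMS];
    int parent[2 * MAX_SYMS];   /* -1 means "not yet marked" */
    char bit[2 * MAX_SYMS];     /* branch label on the arc to the parent */
    int total = n;

    assert(n >= 2 && n <= MAX_SYMS);
    for (int i = 0; i < n; i++) {
        w[i] = weights[i];
        parent[i] = -1;
        bit[i] = '0';
    }

    for (int step = 0; step < n - 1; step++) {
        int a = -1, b = -1;     /* a: smallest unmarked, b: second smallest */
        int left, right;

        for (int i = 0; i < total; i++) {
            if (parent[i] != -1)
                continue;
            if (a == -1 || w[i] < w[a]) {
                b = a;
                a = i;
            } else if (b == -1 || w[i] < w[b]) {
                b = i;
            }
        }
        /* Heavier node on the left; on a tie, the lower index goes left. */
        if (w[b] > w[a] || (w[b] == w[a] && b < a)) {
            left = b;
            right = a;
        } else {
            left = a;
            right = b;
        }
        w[total] = w[a] + w[b];     /* new node carries the summed weight */
        parent[total] = -1;
        parent[left] = total;
        bit[left] = '0';
        parent[right] = total;
        bit[right] = '1';
        total++;
    }

    /* Read each code by walking leaf-to-root, then reversing the bits. */
    for (int i = 0; i < n; i++) {
        char rev[MAX_CODE];
        int len = 0;

        for (int v = i; parent[v] != -1; v = parent[v])
            rev[len++] = bit[v];
        for (int k = 0; k < len; k++)
            codes[i][k] = rev[len - 1 - k];
        codes[i][len] = '\0';
    }
}
```

Other tie-breaking or left/right conventions give different bit patterns with the same code lengths, so the lengths (1, 2, 3, 3 here) are the part that is determined by the probabilities.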
Lossless Algorithms • Run Length Encoding (RLE) • Replace consecutive occurrences of a given symbol with one copy of the symbol plus a count of how many times it occurs. • example: AAABBCDDDD encodes as 3A2B1C4D • good for scanned text (8-to-1 compression ratio) • can increase size for data with variation (e.g., some images) • Differential Pulse Code Modulation (DPCM) • First output a reference symbol; then, for each symbol in the data, output the difference between that symbol and the reference symbol. • example: AAABBCDDDD encodes as A0001123333 • change the reference symbol if the delta becomes too large • works better than RLE for many digital images (1.5-to-1)
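Both schemes are short enough to sketch directly. The code below is an illustration under simplifying assumptions: input is a NUL-terminated string, RLE writes counts as decimal text, and DPCM writes each difference as a single digit relative to a fixed reference (the first symbol) without ever switching the reference. The names rle_encode and dpcm_encode are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Run-length encode in into out as <count><symbol> pairs.
 * Returns the encoded length, or 0 if out (cap bytes) is too small. */
size_t rle_encode(const char *in, char *out, size_t cap)
{
    size_t o = 0;

    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = 0; in[i] != '\0'; ) {
        size_t run = 1;
        int n;

        while (in[i + run] == in[i])    /* stops at the terminating NUL */
            run++;
        n = snprintf(out + o, cap - o, "%zu%c", run, in[i]);
        if (n < 0 || (size_t)n >= cap - o)
            return 0;
        o += (size_t)n;
        i += run;
    }
    return o;
}

/* DPCM sketch: emit the reference symbol, then one digit per input symbol
 * giving its difference from the reference. Returns the encoded length. */
size_t dpcm_encode(const char *in, char *out, size_t cap)
{
    size_t n = strlen(in);
    char ref;

    assert(n > 0 && cap >= n + 2);
    ref = in[0];
    out[0] = ref;
    for (size_t i = 0; i < n; i++) {
        int delta = in[i] - ref;

        assert(delta >= 0 && delta <= 9);   /* simplification: one digit */
        out[1 + i] = (char)('0' + delta);
    }
    out[n + 1] = '\0';
    return n + 1;
}
```

On the slide's input AAABBCDDDD these produce 3A2B1C4D and A0001123333. In text form the DPCM output is longer than the input; the saving in practice comes from storing each small difference in fewer bits than a full symbol.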
Dictionary-Based Methods • Build a dictionary of variable-length strings of common terms • Transmit an index into the dictionary for each term • For example, replace ‘compression’ with 9293, because ‘compression’ is the 9293rd word in /usr/share/dict/words. • Lempel-Ziv (LZ): the Unix compress command is the best-known example. • Commonly achieves a 2-to-1 ratio on text • A variation of LZ is used to compress GIF images • first reduce 24-bit color to 8-bit color • treat common sequences of pixels as terms in the dictionary • not uncommon to achieve 10-to-1 compression (x3)
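A dictionary that is built adaptively while encoding can be sketched with LZW, the Lempel-Ziv variant used by compress and GIF. This is an encoder-only illustration under stated assumptions: the dictionary starts with the 256 single-byte strings, grows to at most 4096 entries, is searched linearly, and the output is an array of 16-bit codes rather than a packed bit stream. The names lzw_encode and LZW_MAX are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZW_MAX 4096    /* assumed dictionary limit (12-bit codes) */

/* Encode n input bytes into out (room for n codes); returns code count. */
size_t lzw_encode(const uint8_t *in, size_t n, uint16_t *out)
{
    /* Entry k >= 256 stands for the string of entry prefix[k] + suffix[k]. */
    uint16_t prefix[LZW_MAX];
    uint8_t suffix[LZW_MAX];
    uint16_t next = 256;
    uint16_t cur;
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (n == 0)
        return 0;

    cur = in[0];                      /* codes 0..255 are single bytes */
    for (size_t i = 1; i < n; i++) {
        int found = -1;

        /* Is (current string + next byte) already in the dictionary? */
        for (uint16_t k = 256; k < next; k++) {
            if (prefix[k] == cur && suffix[k] == in[i]) {
                found = k;
                break;
            }
        }
        if (found >= 0) {
            cur = (uint16_t)found;    /* extend the current match */
        } else {
            out[o++] = cur;           /* emit the longest match */
            if (next < LZW_MAX) {     /* learn the new, longer string */
                prefix[next] = cur;
                suffix[next] = in[i];
                next++;
            }
            cur = in[i];
        }
    }
    out[o++] = cur;                   /* flush the final match */
    return o;
}
```

For the 24-byte input TOBEORNOTTOBEORTOBEORNOT this emits 16 codes, the later ones referring to multi-byte strings learned earlier in the same input. A matching decoder can rebuild the same dictionary from the code stream, so the dictionary itself is never transmitted.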
Image Compression • JPEG (Joint Photographic Experts Group) is an ISO/IEC group of experts that develops and maintains standards for a suite of compression algorithms for computer image files. • JPEG is also a term for any graphic image file produced by using a JPEG standard. • A JPEG file is created by choosing from a range of compression qualities (actually, from one of a suite of compression algorithms). • Lossy still-image compression
MPEG • The Moving Picture Experts Group (MPEG) develops standards for digital video and digital audio compression. • Lossy compression of video • First approximation: JPEG on each frame • Also remove inter-frame redundancy