End-to-End Data

Presentation Transcript


  1. End-to-End Data • Outline • Presentation Formatting • Data Compression

  2. Problem • Ensuring that the sender and receiver see the same data is often called the presentation format problem. • Making the encoding efficient involves error detection/correction and data compression.

  3. Presentation Formatting • The transformation of network data from the representation used by the application into a form suitable for transmission is called presentation formatting. • The sending program encodes data into a message, and the receiving application decodes the message back into data. • Encoding is sometimes called argument marshalling, and decoding is called unmarshalling.

  4. Presentation Formatting [Figure: application data passes through presentation encoding on the sender, travels as a sequence of messages, and passes through presentation decoding to become application data on the receiver] • Data types we consider • integers • floats • strings • arrays • structs • Types of data we do not consider • images • video • multimedia documents

  5. Difficulties • Representation of base types • floating point: IEEE 754 versus non-standard • integer: big-endian versus little-endian (e.g., 34,677,374) • Compiler layout of structures [Figure: byte order for the integer 34,677,374, with addresses labelled from high to low. Big-endian: (2) (17) (34) (126) = 00000010 00010001 00100010 01111110. Little-endian: (126) (34) (17) (2) = 01111110 00100010 00010001 00000010]
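
The byte-order difference on this slide can be made concrete with a small sketch. It assumes nothing beyond standard C; the helper names `put_be32`, `put_le32`, and `get_be32` are invented for this illustration and are not part of any library discussed in the deck.

```c
#include <stdint.h>

/* Store a 32-bit value most-significant byte first (big-endian). */
void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Store a 32-bit value least-significant byte first (little-endian). */
void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Rebuild the value from big-endian bytes. Shifts work on values, not on
 * memory, so the result does not depend on the host's own byte order. */
uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Calling `put_be32` on 34,677,374 yields the bytes 2, 17, 34, 126, and `put_le32` yields the same bytes in reverse, matching the figure.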

  6. Taxonomy • Data types • base types (e.g., ints, floats); must convert • flat types (e.g., structures, arrays); must pack • complex types (e.g., pointers); must linearize • Conversion Strategy • canonical intermediate form • receiver-makes-right (an N x N solution) [Figure: an application data structure is passed to the marshaller]

  7. Taxonomy (cont) • Tagged versus untagged data [Figure: a tagged integer: type = INT, len = 4, value = 417892] • Stubs • compiled • interpreted [Figure: a stub compiler reads the interface descriptor (specification) for procedure P and generates code for the client stub and the server stub; the client calls P, its arguments are marshalled into an RPC message, and the server stub recovers the arguments for P]

  8. eXternal Data Representation (XDR) • Defined by Sun for use with SunRPC • C type system (without function pointers) • Canonical intermediate form • Untagged (except array length) • Compiled stubs

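  9. [Figure: XDR encoding of an example struct item: Count = 3; Name = length 7, "JOHNSON"; List = length 3, values 497, 8321, 265]

     #include <rpc/rpc.h>

     #define MAXNAME 256
     #define MAXLIST 100

     struct item {
         int  count;
         char name[MAXNAME];
         int  list[MAXLIST];
     };

     /* Encode or decode one item, depending on the direction of xdrs. */
     bool_t xdr_item(XDR *xdrs, struct item *ptr)
     {
         /* xdr_string and xdr_array take the address of a pointer, so point
            locals at the fixed-size arrays inside the struct. */
         char *name = ptr->name;
         char *list = (char *)ptr->list;

         return xdr_int(xdrs, &ptr->count) &&
                xdr_string(xdrs, &name, MAXNAME - 1) &&  /* leave room for the NUL */
                xdr_array(xdrs, &list, (u_int *)&ptr->count, MAXLIST,
                          sizeof(int), (xdrproc_t)xdr_int);
     }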
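
To show what ends up on the wire for this example, here is a hand-rolled sketch of the same layout. It does not use the Sun XDR library; `xdr_put_u32` and `encode_item` are hypothetical names made up for this illustration. It assumes the standard XDR rules: 4-byte big-endian integers, and length-prefixed strings and arrays with strings zero-padded to a multiple of 4 bytes.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Append a 32-bit big-endian integer; returns the new write offset. */
static size_t xdr_put_u32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off]     = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)v;
    return off + 4;
}

/* Encode count, a length-prefixed padded string, and a length-prefixed
 * int array. The caller must supply a buffer that is large enough.
 * Returns the number of bytes written. */
size_t encode_item(uint8_t *buf, int32_t count, const char *name,
                   const int32_t *list)
{
    size_t off = 0;
    uint32_t len = (uint32_t)strlen(name);
    uint32_t padded = (len + 3u) & ~3u;      /* round up to a multiple of 4 */

    off = xdr_put_u32(buf, off, (uint32_t)count);
    off = xdr_put_u32(buf, off, len);        /* string length */
    memset(buf + off, 0, padded);            /* zero the padding bytes */
    memcpy(buf + off, name, len);
    off += padded;
    off = xdr_put_u32(buf, off, (uint32_t)count);   /* array length */
    for (int32_t i = 0; i < count; i++)
        off = xdr_put_u32(buf, off, (uint32_t)list[i]);
    return off;
}
```

For count = 3, name = "JOHNSON", and list = {497, 8321, 265}, this writes 32 bytes: 4 for the count, 4 + 8 for the string, and 4 + 12 for the array. No type tags appear anywhere, which is what "untagged (except array length)" means in practice.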

  10. Abstract Syntax Notation One (ASN.1) • An ISO standard • Essentially the C type system • Canonical intermediate form • Tagged • Compiled or interpreted stubs • BER: Basic Encoding Rules (tag, length, value) [Figure: nested encodings, where each value is a (type, length, value) triple whose value field may itself contain further triples]
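
As an illustration of the (tag, length, value) idea, the sketch below encodes a 32-bit integer the way BER encodes an INTEGER: tag byte 0x02, a one-byte length, then the value in the fewest two's-complement bytes. The function name `ber_encode_int` is hypothetical, and only the short length form is handled.

```c
#include <stdint.h>
#include <stddef.h>

/* Encode a signed 32-bit integer as a BER INTEGER.
 * Writes at most 6 bytes to out and returns the number written. */
size_t ber_encode_int(uint8_t *out, int32_t value)
{
    uint32_t u = (uint32_t)value;
    int n = 4;   /* number of content bytes still kept */

    /* Drop a leading byte while it is pure sign extension of the next one. */
    while (n > 1) {
        uint8_t top  = (uint8_t)(u >> (8 * (n - 1)));
        uint8_t next = (uint8_t)(u >> (8 * (n - 2)));
        if ((top == 0x00 && (next & 0x80) == 0) ||
            (top == 0xFF && (next & 0x80) != 0))
            n--;
        else
            break;
    }

    out[0] = 0x02;            /* type: universal INTEGER */
    out[1] = (uint8_t)n;      /* length, short form */
    for (int i = 0; i < n; i++)
        out[2 + i] = (uint8_t)(u >> (8 * (n - 1 - i)));   /* value, big-endian */
    return (size_t)n + 2;
}
```

The value 417892 encodes as the five bytes 02 03 06 60 64. Because the receiver learns the type and length from the data itself, it can parse the message without knowing the layout in advance.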

  11. Network Data Representation (NDR) • IntegerRep • 0 = big-endian • 1 = little-endian • CharRep • 0 = ASCII • 1 = EBCDIC • FloatRep • 0 = IEEE 754 • 1 = VAX • 2 = Cray • 3 = IBM • Defined by DCE • Essentially the C type system • Receiver-makes-right (architecture tag) • Individual data items untagged • Compiled stubs from IDL • 4-byte architecture tag
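
Receiver-makes-right can be sketched as follows: the sender transmits integers in its own byte order, and the receiver consults the IntegerRep field of the architecture tag to decide how to read them. This is a simplified illustration, not the actual DCE wire format; `ndr_read_u32` and the enum names are invented here, and only the two IntegerRep values listed on this slide are handled.

```c
#include <stdint.h>

/* IntegerRep values as listed on the slide. */
enum { INT_BIG_ENDIAN = 0, INT_LITTLE_ENDIAN = 1 };

/* Receiver side: interpret four wire bytes according to the sender's
 * IntegerRep tag. The sender never converts; the receiver "makes right". */
uint32_t ndr_read_u32(const uint8_t b[4], int integer_rep)
{
    if (integer_rep == INT_LITTLE_ENDIAN)
        return  (uint32_t)b[0]        | ((uint32_t)b[1] << 8) |
               ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);

    /* INT_BIG_ENDIAN */
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

A big-endian sender would transmit 34,677,374 as bytes 2, 17, 34, 126 with IntegerRep = 0, and a little-endian sender as 126, 34, 17, 2 with IntegerRep = 1; the receiver recovers the same value in both cases.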

  12. Compression Overview • Encoding and Compression • Huffman codes • Lossless • data received = data sent • used for executables, text files, numeric data • Lossy • data received != data sent • used for images, video, audio

  13. Huffman Codes • Huffman coding [1952] can be used as a reasonable approximation to the theoretical limit. • (1) Write down the symbols and their probabilities (A = .50, B = .30, C = .15, D = .05); they are the terminal nodes. • (2) Find and mark the two smallest unmarked nodes, and add a node with arcs to the marked nodes. • (3) Set the probability of the new node to the sum of the marked nodes. • (4) Repeat steps 2 and 3 until all nodes have been marked except one, the root. • The encoding is found by tracing the path from the root to the symbol, with left = 0 and right = 1.

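  14. Huffman Codes [Figure: Huffman tree. From the root, branch 0 leads to A and branch 1 to an internal node; from that node, branch 0 leads to B and branch 1 to a second internal node, whose branches 0 and 1 lead to C and D] • A (.5) = 0 • B (.3) = 10 • C (.15) = 110 • D (.05) = 111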
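
The construction on the two Huffman slides can be sketched in C. This is a minimal illustration rather than a production encoder: it computes only the code length of each symbol (its depth in the tree) using a simple quadratic search, and the names `huffman_lengths`, `average_length`, and `MAX_SYMS` are made up for this example.

```c
#include <stddef.h>

#define MAX_SYMS 16

/* Compute Huffman code lengths for n symbols (2 <= n <= MAX_SYMS) by
 * repeatedly merging the two least-probable unmarked nodes. */
void huffman_lengths(const double *prob, size_t n, int *length)
{
    double weight[2 * MAX_SYMS];
    int parent[2 * MAX_SYMS];
    int marked[2 * MAX_SYMS];   /* 1 once a node has been given a parent */
    size_t nodes = n;

    for (size_t i = 0; i < n; i++) {
        weight[i] = prob[i];
        parent[i] = -1;
        marked[i] = 0;
    }

    /* n leaves need n - 1 merges to reach a single root. */
    for (size_t step = 0; step + 1 < n; step++) {
        int a = -1, b = -1;     /* a: smallest, b: second smallest */
        for (size_t i = 0; i < nodes; i++) {
            if (marked[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = (int)i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = (int)i;
            }
        }
        weight[nodes] = weight[a] + weight[b];   /* new node = sum of the pair */
        parent[nodes] = -1;
        marked[nodes] = 0;
        parent[a] = parent[b] = (int)nodes;
        marked[a] = marked[b] = 1;
        nodes++;
    }

    /* Code length = number of arcs from the leaf up to the root. */
    for (size_t i = 0; i < n; i++) {
        int depth = 0;
        for (int p = parent[i]; p >= 0; p = parent[p])
            depth++;
        length[i] = depth;
    }
}

/* Expected number of bits per symbol for the given code lengths. */
double average_length(const double *prob, const int *length, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += prob[i] * length[i];
    return sum;
}
```

For the probabilities .50, .30, .15, .05 this gives lengths 1, 2, 3, 3, which agree with the codes 0, 10, 110, 111, and an average of 1.7 bits per symbol compared with 2 bits for a fixed-length code.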

  15. Lossless Algorithms • Run Length Encoding (RLE) • Replace consecutive occurrences of a given symbol with one copy of the symbol plus a count of how many times it occurs. • example: AAABBCDDDD is encoded as 3A2B1C4D • good for scanned text (8-to-1 compression ratio) • can increase size for data with variation (e.g., some images) • Differential Pulse Code Modulation (DPCM) • First output a reference symbol; then, for each symbol in the data, output the difference between that symbol and the reference symbol. • example: AAABBCDDDD is encoded as A0001123333 • change the reference symbol if the delta becomes too large • works better than RLE for many digital images (1.5-to-1)
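
Both schemes are short enough to sketch directly. The code below is an illustration under simplifying assumptions: the RLE output is text in "count symbol" form as in the slide's example (so it is ambiguous if the data itself contains digits), and the DPCM version keeps one fixed reference symbol rather than changing it when deltas grow. The names `rle_encode` and `dpcm_encode` are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Run-length encode the string src into "count symbol" pairs,
 * e.g. "AAAB" -> "3A1B". dst_size must be at least 1.
 * Returns the length of the encoded string; stops early if dst is full. */
size_t rle_encode(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    dst[0] = '\0';
    while (*src) {
        size_t run = 1;
        while (src[run] == src[0])     /* the terminating NUL ends the run */
            run++;
        int n = snprintf(dst + out, dst_size - out, "%zu%c", run, src[0]);
        if (n < 0 || (size_t)n >= dst_size - out)
            break;                     /* out of space */
        out += (size_t)n;
        src += run;
    }
    return out;
}

/* DPCM against a fixed reference (the first symbol): stores the reference
 * in *ref and one signed delta per input symbol in delta[].
 * Returns the number of deltas written. */
size_t dpcm_encode(const char *src, size_t n, char *ref, int *delta)
{
    if (n == 0)
        return 0;
    *ref = src[0];
    for (size_t i = 0; i < n; i++)
        delta[i] = src[i] - *ref;
    return n;
}
```

On AAABBCDDDD, `rle_encode` produces 3A2B1C4D and `dpcm_encode` produces reference A with deltas 0 0 0 1 1 2 3 3 3 3, matching the examples above. The deltas stay small, so they can be stored in fewer bits than the original symbols.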

  16. Dictionary-Based Methods • Build a dictionary of variable-length strings of common terms • Transmit an index into the dictionary for each term • For example, replace ‘compression’ with 9293; ‘compression’ is the 9293rd word in /usr/share/dict/words. • Lempel-Ziv (LZ): the compress command is the best-known example. • Commonly achieves a 2-to-1 ratio on text • A variation of LZ is used to compress GIF images • first reduce 24-bit color to 8-bit color • treat common sequences of pixels as terms in the dictionary • not uncommon to achieve 10-to-1 compression (x3)
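
A dictionary coder can build its dictionary on the fly instead of shipping one in advance. The sketch below uses LZW, the Lempel-Ziv variant behind both `compress` and GIF, in a deliberately simplified form: codes are returned as plain integers rather than packed bits, the dictionary is searched linearly, and it simply stops growing when full. The names `lzw_encode`, `dict_find`, and `DICT_MAX` are invented for this illustration.

```c
#include <stddef.h>

#define DICT_MAX 4096

/* Each learned entry is (prefix code, next byte).
 * Codes 0..255 implicitly stand for the single bytes. */
static int dict_prefix[DICT_MAX];
static unsigned char dict_byte[DICT_MAX];

/* Linear search for the string "prefix + c"; returns its code or -1. */
static int dict_find(int dict_size, int prefix, unsigned char c)
{
    for (int i = 256; i < dict_size; i++)
        if (dict_prefix[i] == prefix && dict_byte[i] == c)
            return i;
    return -1;
}

/* LZW-encode n bytes into codes[]; returns the number of codes written.
 * codes must have room for n entries. */
size_t lzw_encode(const unsigned char *src, size_t n, int *codes)
{
    size_t out = 0;
    int dict_size = 256;

    if (n == 0)
        return 0;

    int cur = src[0];                  /* code of the longest match so far */
    for (size_t i = 1; i < n; i++) {
        int next = dict_find(dict_size, cur, src[i]);
        if (next >= 0) {
            cur = next;                /* extend the current match */
        } else {
            codes[out++] = cur;        /* emit the match ... */
            if (dict_size < DICT_MAX) {
                dict_prefix[dict_size] = cur;     /* ... and learn match + byte */
                dict_byte[dict_size] = src[i];
                dict_size++;
            }
            cur = src[i];
        }
    }
    codes[out++] = cur;                /* flush the final match */
    return out;
}
```

Encoding the seven bytes of ABABABA yields four codes: 65, 66, 256, 258. The decoder can rebuild the same dictionary from the code stream, so the dictionary itself is never transmitted.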

  17. Image Compression • JPEG (Joint Photographic Experts Group) is an ISO/IEC group of experts that develops and maintains standards for a suite of compression algorithms for computer image files. • JPEG is also a term for any graphic image file produced by using a JPEG standard. • A JPEG file is created by choosing from a range of compression qualities (actually, from one of a suite of compression algorithms). • Lossy still-image compression

  18. MPEG • The Moving Picture Experts Group (MPEG) develops standards for digital video and digital audio compression. • Lossy compression of video • First approximation: JPEG on each frame • Also removes inter-frame redundancy
