
NATO Consultation, Command and Control Agency

NATO Consultation, Command and Control Agency. COMMUNICATIONS & INFORMATION SYSTEMS Decreasing “Bit Pollution” through “Sequence Reduction”. Dr. Davras Yavuz yavuz@nc3a.nato.int. You will find this presentation and the accompanying paper at www.nc3a.info/MCC2006



Presentation Transcript


  1. NATO Consultation, Command and Control Agency. COMMUNICATIONS & INFORMATION SYSTEMS. Decreasing “Bit Pollution” through “Sequence Reduction”. Dr. Davras Yavuz, yavuz@nc3a.nato.int. NATO UNCLASSIFIED

  2. You will find this presentation and the accompanying paper at www.nc3a.info/MCC2006 from where both can be viewed and/or downloaded (the four other NC3A presentations can also be found at the above URL) NATO UNCLASSIFIED

  3. Terminology. “Sequence Reduction”: originates with Peribit (~2000); the founder’s Ph.D. on genome mapping (Biomedical Informatics, Stanford University) uses the term “Molecular Sequence Reduction” (MSR). “Bit Pollution”: link/network pollution caused by the repetition of redundant digital sequences over transmission media (especially significant for mobile/deployed networks/links). Other related terms: WAN optimizer, Application Accelerator/Optimizer or Application Controller-Optimizer, Performance Enhancement Proxies (PEP), WAN Expanders, latency (= delay) removers/compensators/mitigators, etc. This is a new and dynamic field: many terms will continue to appear and coalesce; some will catch on, others will disappear.

  4. Terminology • “Next Generation Compression”, “Bit Pollution Reduction”, “Sequence Reduction” (latter Peribit/Dr. Amit Singh) • WAN Expander (WX), WAN Optimizer, WAN Optimization Controller (WOC)(Juniper/Peribit) • Application Accelerator/Optimizer/Controller-Optimizer • Latency Remover/Optimizer(replace Latency by “Delay” ) • Especially for networks with SATCOM links • In general; use of a-priori knowledge of data comms protocols required by application to optimize the data input/output • Combinations of above • Unfortunately all present implementations “proprietary” • Unrealistic to expect “standards” soon, technology too new and lucrative

  5. Why “Bit Pollution” ? Most of us deal daily with various electronic files/ information Taking MS Office as an example; Word, PPT, Excel, Project, HTML, Access, …. Files …and/or many other electronic files, data-bases, forms, etc.,.. On many occasions we make small changes and send them back and/or forward to others Repetitive traffic over communication links can, in general, be classified broadly into 3 categories: 1) Application & protocol overheads 2) Commonly used words, phrases, strings, objects (logos, images, audio clips, etc.) 3) Process flows (data-base updates/views, forms, templates, etc. going back & forth) NATO UNCLASSIFIED

  6. SEQUENCE REDUCTION: Next Generation Compression - Examples. 256 Kbps satellite link • 20 Mbytes PPT file (48 slides) sent 1st time: ~12 minutes (700 secs) • 6 of the slides modified, file size change <0.5 Mbytes • Modified file sent 6 hours later, time taken: ~8 secs • Same modified file sent 24 hours later: ~18 secs • Sent 7 days later: ~24 secs • Original file sent 7 days later: ~14 secs • Similar results for Word, Excel files and web pages • Less but still significant improvement for PDF files • Smallest improvement for zipped files (reduction by ~2.5 to 3). The amount of “new” files sent between repetitions and the SR RAM/HD capacities have a strong effect on the duration of repeat transmissions (dynamic library updates). The above results are based on Peribit SRs: German MOD, Syracuse University “Real World” Labs (Network Computing, Nov 2004) and NC3A. The GE MOD results are based on operational traffic, the others on test traffic. Ref [6] of paper: “Record for throughput was ~60Mbps through a T1. It came about when copying 1.5GB file twice!”
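The timings on this slide can be put in perspective with a little arithmetic. The sketch below is only an illustration: it assumes 1 Mbyte = 10^6 bytes and 256 Kbps = 256000 bit/s, ignores protocol overhead, and the function names are hypothetical. It compares the ideal transfer time with the quoted ~700 s first send and expresses the repeat sends as speed-up factors.

```python
def ideal_transfer_seconds(size_bytes, link_bps):
    """Time to push size_bytes over a link with no overhead and no reduction."""
    return size_bytes * 8 / link_bps


def speedup(first_send_seconds, repeat_send_seconds):
    """How many times faster a repeat send is than the first send."""
    return first_send_seconds / repeat_send_seconds


# Figures quoted on the slide (units are assumptions, see the lead-in).
FILE_BYTES = 20_000_000   # "20 Mbytes PPT file"
LINK_BPS = 256_000        # "256 Kbps satellite link"
FIRST_SEND_S = 700        # "~12 minutes (700 secs)"
REPEAT_SENDS_S = {
    "modified file, 6 hours later": 8,
    "modified file, 24 hours later": 18,
    "modified file, 7 days later": 24,
    "original file, 7 days later": 14,
}

if __name__ == "__main__":
    ideal = ideal_transfer_seconds(FILE_BYTES, LINK_BPS)
    print(f"ideal first send: {ideal:.0f} s (slide quotes ~{FIRST_SEND_S} s)")
    for label, seconds in REPEAT_SENDS_S.items():
        print(f"{label}: {seconds} s, about {speedup(FIRST_SEND_S, seconds):.0f}x faster")
```

The gap between the ideal figure and the quoted first-send time is consistent with ordinary protocol overhead; the repeat sends are one to two orders of magnitude faster because most of the file is already in the dictionary at both ends.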

  7. Mobile/Tactical Comms Divergence • Fixed communications – WANs with all users/nodes fixed • Fiber-optic/photonic revolution: Essentially unlimited capacity is now possible/available if/when a cable can be installed • Mobile comms: Networks with mobile/deployable users • No technological revolution similar to photonic foreseen • Radio propagation will be the limiting factor • Mainstay will be radio: Tactical LOS tens/hundreds of Kbps, BLOS (rough terrain, long distances) few Kbps • Star-wars scenarios : Moving laser beams ??? • LEO satellites will provide some 100s of Kbps at a cost • Divergence will continue • Another factor: Input into the five senses : ~100 Shannon/ Entropy bps • For transmission redundancy : x 10 = 1 Kbps Therefore: we must treat mobile/tactical comms differently NATO UNCLASSIFIED

  8. Deployable, Mobile, On-the-Move Communications • At least one end of a link moving/deployed • Networks which have nodes/users moving/deployed • Such links/networks essential for survivability and rapid reaction • Will be taking on increasingly more critical tasks • Present approach: Use applications developed for fixed links/networks for deployed/mobile units • Must consider the very different characteristics of such networks when choosing applications. Can we measure “information”, so that we can determine the performance of links/networks in terms of “information” transported, not just bits/bytes?

  9. Can we measure “information”? Yes we can! Shannon defined the concept of “Entropy”, a logarithmic measure, in the 1940s (while working on cryptography); it has stood the test of time • The first suggestion of a log measure was Hartley’s (base 10), but Shannon used the idea to develop a complete “theory of information & communication” • Shannon preferred log2 and called the “unit” bits • Base e is also sometimes used (nats). The smaller the probability of occurrence of an event, the higher the “information delivered” when it occurs

  10. [Figure: symbol sets {Si} and {Rj}, discrete and countable. C. E. Shannon (BSTJ 1948)]

  11. Entropy. Entropy (H) in the case of two possibilities/events/symbols: probability of one = p, of the other q = 1 - p. $H = -(p \log p + q \log q)$ [Figure: H versus p]
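As a minimal sketch of the two quantities introduced on slides 9 and 11, the code below computes the information delivered by an event of probability p and the two-symbol entropy H with base-2 logarithms. The function names are illustrative and not taken from the presentation.

```python
import math


def self_information_bits(p):
    """Information delivered by an event of probability p (0 < p <= 1), in bits."""
    return -math.log2(p)


def binary_entropy_bits(p):
    """H = -(p log2 p + q log2 q) with q = 1 - p; taken as 0 at p = 0 or p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99):
        print(f"p={p:<5} information={self_information_bits(p):6.3f} bits  "
              f"H={binary_entropy_bits(p):.3f} bits")
```

Printing the table reproduces the shape of the H-versus-p curve: zero at the ends and a maximum of 1 bit at p = 0.5, while rarer events carry more information when they do occur.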

  12. Let us take a “Natural Language” English as an example • English has 26 letters (characters) • Space as a delimiter • TOTAL 27 characters (symbols) • One could include punctuation, special characters, etc., for example we could use the full 256 ASCII symbol set - methodology is the same • Extension to other natural languages readily made • Extension to images also possible (same methodology) NATO UNCLASSIFIED

  13. Structure of a “Natural Language” - English • Defined by many characteristics: Grammar, semantics, etymology, usage, …., historical developments, …. • Until early 70s there was substantial belief that “Natural Languages” and “computer programming languages” (finite automata instructions) had similarities • Noam Chomsky’s work (Professor at MIT) completely destroyed those expectations • Natural Languages can be studied through probabilistic (Markov) models • Shannon’s approach (1940s, no computers, Bell Labs staff flipped through many pages of books to get the probabilities) • He was actually working on cryptography and made important contributions in that area also NATO UNCLASSIFIED

  14. Various Markov model examples here, skipped here for continuity, may be found at the end NATO UNCLASSIFIED

  15. Zipf’s Law: “Principle of Least Effort” • George Kingsley Zipf, Professor of Linguistics, Harvard (1902 – 1950) • If the “words” in a language are ordered (“ranked”) from the most frequently used down, the probability $P_n$ of the $n$th word in this list is $P_n \approx 0.1 / n$ • Implies a maximum vocabulary size of 12366 words, since the probabilities must sum to 1 and $\sum 1/n$ is not finite when summed from 1 to $\infty$. For details of the above see DY, IEEE Transactions on Information Theory, September 1974. There are many other applications of “Zipf’s Law”; if interested, just make a Google/Internet search
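The vocabulary limit quoted on this slide can be checked numerically. The sketch below assumes the reading $P_n = 0.1/n$ exactly and simply accumulates probabilities until adding one more word would push the total above 1; the function name is hypothetical.

```python
def max_vocabulary(scale=0.1):
    """Largest N such that sum_{n=1..N} scale / n does not exceed 1."""
    total = 0.0
    n = 0
    while True:
        candidate = total + scale / (n + 1)
        if candidate > 1.0:
            return n
        total = candidate
        n += 1


if __name__ == "__main__":
    print(max_vocabulary(0.1))  # → 12366
```

Because the harmonic series diverges, some such cut-off exists for any positive scale factor; 0.1 happens to give the 12366 words quoted above.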

  16. Zipf’s Law (Principle of Least Effort) ~ million words, various texts From “Symbols, Signals & Noise” J. R. Pierce

  17. Entropy bits/character - English Amazingly it turns out to be about the same for most “Natural Languages” for which the analysis has been done (Arabic, French, German, Hebrew, Latin, Spanish, Turkish, .…). These languages also follow Zipf’s Law. NATO UNCLASSIFIED

  18. Entropy of Natural Languages • Between 1 & 2 bits per letter/character; 1.5 bits per letter is commonly used • English has ~4.5 letters per word on average • 4.5 x 1.5 = 6.75, or ~7 bits per word on average • Normal speech: 1 - 2 words per second • Hence information per second ~ 5 bits

  19. Extension to Images • Same concept and definitions • Letters replaced by pixels/groups of pixels, etc. • Words could be analogous to sets of pixels, objects • The numbers are much larger • E.g. a 400 x 600 = 240000 pixel image with each pixel capable of taking on one of 16 brightness levels • $16^{240000}$ possible images • Assume all these images are equally likely (*): the probability of one of these images is $1/16^{240000}$ and the information provided by that image is $240000 \log_2 16 = 0.96 \times 10^6$ bits • A real image contains much less “information”, since adjacent/nearby pixels are not independent of each other • Movies: frame to frame only small/incremental changes. (*) The “equally likely” assumption is clearly not realistic
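The image figure above follows directly from the equally-likely assumption; a short check, with illustrative variable names:

```python
import math

WIDTH, HEIGHT = 400, 600
LEVELS = 16  # brightness levels per pixel

pixels = WIDTH * HEIGHT
bits_per_pixel = math.log2(LEVELS)
# Information of one image when all LEVELS**pixels images are equally likely.
image_bits = pixels * bits_per_pixel

if __name__ == "__main__":
    print(f"{pixels} pixels x {bits_per_pixel:.0f} bits = {image_bits:.0f} bits")
```

This is an upper bound: as the slide notes, neighbouring pixels in a real image are correlated, so the actual information content is much lower.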

  20. Speech Coding. ~5 b/s is the irreducible information content; multiply by 10 to introduce redundancy, so we should be able to communicate speech “information” at ~50 bps. Examples of speech coding we use: 64000 bps, 32000 bps PC, 16000 bps CVSD, 2400 bps LPC and MELP, 1200 and 600 bps MELP. All of the above are “waveform” codecs; they also convey “non-measurable” (intangible) information. Speech codec technology based on recognition at the transmitter and synthesis at the receiver could conceivably go lower than 600 bps, but would not contain the intangible component!

  21. A QUICK REFRESHER ON CONVENTIONAL COMPRESSION May be found at the end NATO UNCLASSIFIED

  22. SEQUENCE REDUCTION: Next Generation Compression. Dictionary based – implements learning algorithm • Dynamically learns the “language” of the communications traffic and translates into “short-hand” • Continuously updates/improves “knowledge” of link “language” • Frequent patterns move up in dictionary, infrequent patterns move down and eventually can age out • No fixed packet or window boundaries • Unlike e.g. LZ which generally uses 2048 byte window • Once a pattern is learned and put in dictionary it will be compressed wherever it appears • Data compression is based on previously seen data • Performance improves with time as “learning” increases • Very quickly at first (10 – 20 minutes) and then slowly • When a new application comes in, SR adapts to its “language”
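The behaviour listed on this slide can be illustrated with a toy model. The sketch below is not Peribit's MSR algorithm, which is proprietary; it is a deliberately simplified, hypothetical stand-in in which both ends of a link keep identical dictionaries of fixed-size chunks, previously seen chunks are replaced by short references, and the least recently used entries age out. Real sequence reducers match patterns of any size without fixed boundaries, which this toy does not attempt. All class names, function names and token sizes are assumptions made for illustration.

```python
from collections import OrderedDict


class ToySequenceReducer:
    """One end of a link; sender and receiver each hold their own instance."""

    def __init__(self, chunk_size=8, capacity=4096):
        self.chunk_size = chunk_size
        self.capacity = capacity
        self.by_chunk = OrderedDict()  # chunk -> id, least recently used first
        self.by_id = {}                # id -> chunk
        self.next_id = 0

    def _learn(self, chunk):
        # Age out the least recently used pattern when the dictionary is full.
        if len(self.by_chunk) >= self.capacity:
            _old_chunk, old_id = self.by_chunk.popitem(last=False)
            del self.by_id[old_id]
        self.by_chunk[chunk] = self.next_id
        self.by_id[self.next_id] = chunk
        self.next_id += 1

    def encode(self, data):
        """Turn bytes into tokens: ("ref", id) for known chunks, ("raw", bytes) otherwise."""
        tokens = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            if chunk in self.by_chunk:
                self.by_chunk.move_to_end(chunk)  # frequent patterns stay fresh
                tokens.append(("ref", self.by_chunk[chunk]))
            else:
                self._learn(chunk)
                tokens.append(("raw", chunk))
        return tokens

    def decode(self, tokens):
        """Rebuild the bytes, applying the same dictionary updates as the encoder."""
        out = bytearray()
        for kind, value in tokens:
            if kind == "ref":
                chunk = self.by_id[value]
                self.by_chunk.move_to_end(chunk)
            else:
                chunk = value
                self._learn(chunk)
            out += chunk
        return bytes(out)


def wire_size(tokens, ref_bytes=4, raw_overhead=1):
    """Rough on-the-wire size of a token stream (token sizes are assumptions)."""
    return sum(ref_bytes if kind == "ref" else len(value) + raw_overhead
               for kind, value in tokens)


if __name__ == "__main__":
    sender, receiver = ToySequenceReducer(), ToySequenceReducer()
    message = b"The quick brown fox jumps over the lazy dog. " * 20
    for attempt in (1, 2):
        tokens = sender.encode(message)
        assert receiver.decode(tokens) == message
        print(f"send {attempt}: {len(message)} bytes in, ~{wire_size(tokens)} bytes on the wire")
```

Because the decoder applies exactly the same learn and refresh steps as the encoder, the two dictionaries stay in step without any dictionary data being transmitted. The fixed-size chunking is the main simplification: inserting a single byte early in a file shifts every later chunk, which is exactly the problem that boundary-free pattern matching avoids.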

  23. MOLECULAR SEQUENCE REDUCTION Relative positioning of statistical and substitutional compression algorithms (from Peribit, A. P. Singh)

  24. “Molecular Sequence reduction” NATO UNCLASSIFIED www.Peribit.com

  25. MSR – Technology. Origins in DNA pattern matching • Real time, high speed, low latency • Continuously learns and updates dictionary • Transparently operates on all traffic (optimized for IP) • Eliminates patterns of any size, anywhere in stream • Patent-pending technology

  26. MSR – Molecular Sequence Reduction “Next-gen dictionary-based compression” NATO UNCLASSIFIED www.peribit.com

  27. Government/Military use examples • Many thousands of units in use in USA (mostly corporate but also government agencies) • GE MOD has been using Peribit SRs for ~2 years • INMARSAT German Navy WAN (encrypted) • Links to GE Navy ships in/around South Africa • Satellite links to GE units in Afghanistan • Plans for some 64 Kbps landlines • GE MOD total: 300+ units • Also other nations, some with initial trials

  28. Reduction rates observed (reduced by % amount given), GE Armed Forces results. Versions: 3.0, V 4.02, V 5.0. Values by traffic type, in slide order: HTTP: 30 %, 40 %, 46 %; MAIL: 61 %, 67 %; NetBios: 59 %, 62 %; CIFS: 92 %, 92 %; FTP: 69 %, 73 %; TELNET: 65 %, 69 %, 93 %

  29. From German MOD NATO UNCLASSIFIED

  30. From German MOD Startup behavior example NATO UNCLASSIFIED

  31. From German MOD NATO UNCLASSIFIED

  32. From German MOD NATO UNCLASSIFIED

  33. From Peribit.com (not GE MOD data) NATO UNCLASSIFIED

  34. Peribit (screen capture), NC3A WAN (NL – BE). Effective WAN capacity increased by 2.80; data reduction by 64.34 %. Panel labels: “No data compression & no reduction”; “With data compression & reduction”
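The two figures on this slide are two views of the same measurement: if a fraction r of the bits is removed, the link carries 1 / (1 - r) times as much user data. A minimal sketch, with a hypothetical function name:

```python
def capacity_factor(reduction_fraction):
    """Effective capacity multiplier for a given data reduction (0 <= r < 1)."""
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction_fraction)


if __name__ == "__main__":
    # 64.34 % is the NC3A figure quoted above; the others are for comparison.
    for percent in (30.0, 46.0, 64.34, 92.0):
        print(f"{percent:5.2f} % reduction -> x{capacity_factor(percent / 100):.2f} capacity")
```

The relationship is strongly non-linear: going from 46 % to 92 % reduction doubles the percentage but raises the capacity multiplier from under 2 to more than 12.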

  35. NATO UNCLASSIFIED

  36. Peribit Sequence Reducers www.peribit.com NATO UNCLASSIFIED

  37. NC3A TEST RESULT SUMMARY: Expand Model 4800 “WAN Link Accelerators”. 512 kbps satellite link, multiplexed TCP/IP. [Chart legend: link with SCPS-TP acceleration; link with application accelerator & IP data compressor; un-accelerated link]

  38. NC3A TEST RESULT SUMMARY. 512 kbps satellite link, multiplexed TCP/IP. [Chart legend: link with SCPS-TP acceleration; link with application accelerator & IP data compressor; un-accelerated link]

  39. 512 Kbps satellite link, 10 multiplexed TCP/IP sessions. [Chart legend: link with SCPS-TP acceleration; link with application accelerator & IP data compressor; un-accelerated link]

  40. Packeteer NATO UNCLASSIFIED

  41. Industry New area but many & increasing number of companies Peribit.com (now Juniper Networks) Expand.com (Expand Networks) Packeteer.com Riverbed.com Silver-peak.com ….. National authorities (e.g. USA & GE) also working with industry to incorporate SR/WX technology into national crypto devices NATO UNCLASSIFIED

  42. SEQUENCE REDUCTION: Next Generation Compression, Summary (1). WANs will form the backbone of Network Enabled Operation. This technology provides significant improvements in capacity. Dictionary based – implements learning algorithm • Dynamically learns the “language” of the communications traffic and translates into “short-hand” • Continuously updates/improves “knowledge” of link “language” • Frequent patterns move up in dictionary, infrequent patterns move down and eventually can age out • No fixed packet or window boundaries • Unlike conventional compression which operates over 1-2 Kbytes • Once a pattern is learned and put in dictionary it will be compressed wherever it appears • Data compression is based on previously seen data • Performance improves with time as “learning” increases • Very quickly at first (10 – 20 minutes) and then slowly • When a new application comes in, SR adapts to its “language”

  43. SEQUENCE REDUCTION: Next Generation Compression, Summary (2) • Significant advantages for WANs where capacity is an issue (i.e. deployed/mobile/tactical) • Removes redundant/repetitive transmissions • Packet-flow acceleration (latency removal) can be easily added • Quality of Service & Policy Based Multipath can also be implemented • Does not impact security implementations (cryptos between SRs). However • Presently available from only a few sources, each with its “proprietary” technology

  44. Conclusions • Shannon Information Theory provides tools for measuring “information” as “Entropy” • Has formed the basis for most of the coding, data transmission/detection results since the 1950s • DNA / Genome mapping process has also apparently benefited from it • In the 90s the estimate for the human genome was 20-30 years; it took 2-3 years with the computational developments of the late 90s • A new form of compression, “Sequence Reduction”, provides significant reductions by reducing redundancies in transmitted data • Will provide important advantages for mobile/deployable/moving WAN link applications

  45. Questions Comments This presentation & associated paper can be found at www.nc3a.info/MCC2006 NATO UNCLASSIFIED

  46. NC3A Brussels. Visiting address: Bâtiment Z, Avenue du Bourget 140, B-1110 Brussels. Telephone +32 (0)2 7074111, Fax +32 (0)2 7078770. Postal address: NATO C3 Agency, Boulevard Leopold III, B-1110 Brussels - Belgium. NC3A The Hague. Visiting address: Oude Waalsdorperweg 61, 2597 AK The Hague. Telephone +31 (0)70 3743000, Fax +31 (0)70 3743239. Postal address: NATO C3 Agency, P.O. Box 174, 2501 CD The Hague, The Netherlands

  47. Markov model examples NATO UNCLASSIFIED

  48. Zeroth approximation to English (zero memory) [Zero order Markov: equally likely letters, 27 numbers] AZEWRTZYNSADXESYJRQY_WGECIJJ_OB _KRBQPOZB_YMBUAWVLBTQCNIKFMP_KMVUUGBSAXHLHSIE_MAULEXJ_NATSKI All logs base 2. Entropy $= \sum_{i=1}^{27} p_i \log (1/p_i) = \log 27 = 4.75$ bits / letter (or symbol)

  49. First approximation to English (zero memory) [Zero order Markov: letter probabilities, 27 numbers] AI_NGAE__ITF__NR_ASAEV_OIE_BAINTHHHYROO_POER_SETRYGAIETRWCO__ EHDUARU_ EU_C_FT_NSREM_DIY_EESE_ F_O_SRIS_R __UNNASHOR_CIE_AT_XEOIT_UTKLOOUL_E Entropy $= \sum_{i=1}^{27} p_i \log (1/p_i) \approx 4$ bits / letter

  50. Second approximation to English (memory) [First order Markov: e.g. prob(a|a), prob(b|a), prob(c|a), … , 27 x 27 = 729 numbers, some zero] URTESHETHING_AD_E AT_FOULE_ ITHALIORT_WACT_D_STE_MINTSAN_OLINS__TWID_OULY_TE_THIGHE_CO_YS_TH_HR_ UPAVIDE_PAD_CTAVED_QUES_E Entropy $= \sum_{i,k} p_{i,k} \log (1/p_{i|k})$, summed over the 729 (= 27 x 27) letter pairs, $\approx 3.3$ bits / letter
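The three approximations can be reproduced on any text sample. The sketch below is an illustration under stated assumptions: text is mapped onto the 27-symbol alphabet used on these slides (26 letters plus `_` for a run of whitespace, other characters dropped), probabilities are plain relative frequencies, and the function names are hypothetical. The estimates depend on the sample, so it will not exactly reproduce the ~4 and ~3.3 bits quoted above.

```python
import math
from collections import Counter


def normalise(text):
    """Map text to lowercase letters plus '_' for whitespace runs; drop the rest."""
    out = []
    for ch in text.lower():
        if "a" <= ch <= "z":
            out.append(ch)
        elif ch.isspace() and out and out[-1] != "_":
            out.append("_")
    return "".join(out)


def uniform_entropy_bits(n_symbols=27):
    """Zeroth approximation: all symbols equally likely."""
    return math.log2(n_symbols)


def letter_entropy_bits(seq):
    """First approximation: sum of p_i * log2(1 / p_i) over single symbols."""
    counts = Counter(seq)
    total = len(seq)
    return sum(c / total * math.log2(total / c) for c in counts.values())


def conditional_entropy_bits(seq):
    """Second approximation: sum of p_(i,k) * log2(1 / p_(i|k)) over symbol pairs."""
    pair_counts = Counter(zip(seq, seq[1:]))
    prev_counts = Counter(seq[:-1])
    total_pairs = len(seq) - 1
    h = 0.0
    for (prev, _cur), c in pair_counts.items():
        p_pair = c / total_pairs      # joint probability of (previous, current)
        p_cond = c / prev_counts[prev]  # probability of current given previous
        h += p_pair * math.log2(1.0 / p_cond)
    return h


if __name__ == "__main__":
    sample = normalise(
        "Shannon defined the concept of entropy, a logarithmic measure, "
        "and it has stood the test of time. The smaller the probability of "
        "an event, the higher the information delivered when it occurs."
    )
    print(f"zeroth: {uniform_entropy_bits():.2f} bits/letter")
    print(f"first:  {letter_entropy_bits(sample):.2f} bits/letter")
    print(f"second: {conditional_entropy_bits(sample):.2f} bits/letter")
```

A short sample like this one underestimates the pair entropy because most of the 729 pairs never occur; meaningful estimates need a much larger body of text.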
