
Applications of Computing in Industry: What is Low Latency All About?

Divyakant Bengani. Undergrad degree in Management and IT from Manchester. Vice President at CS, responsible for eFX Core Technologies. Working in the banking industry since 2003 and at CS for ~3 years.





Presentation Transcript


  1. Applications of Computing in Industry: What is Low Latency All About? eFX – January 2014

  2. Divyakant Bengani • Undergrad degree in Management and IT from Manchester • Vice President at CS, responsible for eFX Core Technologies • Working in the banking industry since 2003 & CS for ~3 years

  3. EFX - What do we do? • Cash FX Only • Spot, Forwards and Swaps • Continuous Publication of Prices • Streaming Executable Rates • Response to Request for Quotes • Acceptance and Booking of Trades

  4. Key Statistics • ~200 Currency Pairs (e.g. EURUSD, GBPJPY) • 3 billion prices broadcast a day • 60,000 trades a day • >200 client connections

  5. Technologies Used • Java • C# for UIs • GWT for Web UIs • Oracle Coherence • Oracle DB • Derby DB • Azul Zing JVM • Low-Latency FIX Engine

  6. Protocols • Socket Connections • Asynchronous JMS • Java RMI • HTTP (JSON, Hessian)

  7. Payloads • Google Protobuf • Fixed Length Byte Arrays • FIX - Industry Standard • JMS Map Messages • Java Serialization

  8. EFX - Overall Architecture

  9. Service Discovery • Zeroconf • Dynamically add and remove services • Applications do not need to know about each other - they just pick up what’s advertised

  10. Automated Testing

  11. Code Quality Analysis

  12. Continuous Integration

  13. How to Achieve Low Latency

  14. Daniel Nolan-Neylan • Graduated from UCL in 2004 • Started working at Credit Suisse in 2006 • First, networking for 4 years • Now, Application Developer in FX IT • Different projects: • Distributed caching system for static data • Simplified credit checking library • Pricing and trading gateway (now team lead) Corporate Design, HCBC 1

  15. Wait a second! • Reminder: • 1 second is: • 1,000 milliseconds • 1,000,000 microseconds • 1,000,000,000 nanoseconds

  16. Latency Numbers Every Programmer Should Know (by Jeff Dean: http://research.google.com/people/jeff/)
      Operation                               Time (ns)     Time (ms)   Comparison
      L1 cache reference                            0.5
      Branch mispredict                               5
      L2 cache reference                              7                 14x L1 cache
      Mutex lock/unlock                              25
      Main memory reference                         100                 20x L2 cache, 200x L1 cache
      Compress 1K bytes with Zippy                3,000
      Send 1K bytes over 1 Gbps network          10,000        0.01
      Read 4K randomly from SSD*                150,000        0.15
      Read 1 MB sequentially from memory        250,000        0.25
      Round trip within same datacenter         500,000        0.5
      Read 1 MB sequentially from SSD*        1,000,000        1        4x memory
      Disk seek                              10,000,000       10        20x datacenter round trip
      Read 1 MB sequentially from disk       20,000,000       20        80x memory, 20x SSD
      Send packet CA->Netherlands->CA       150,000,000      150

  17. FX Trading – Latency Numbers • 250ms – A human responding to price update • 30ms – Bank accepting trade • 10ms – Credit checking client • 9ms – JVM Garbage Collecting • 5ms – Persisting a trade to disk • 2ms – JMS networking round-trip • 1ms – Raw socket networking round-trip • 0.5ms – Max wire-to-wire pricing latency • 0.05ms – Min pricing latency • 0.005ms – Writing price to FIX engine

  18. Optimization Quotes • Michael A. Jackson: “The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” • Rob Pike: “Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.”

  19. Where to Optimize? Use a Profiler

  20. Measuring Milliseconds and Nanoseconds in Java • Measure the time taken for operations and log it: • System.currentTimeMillis() • Good for taking a time/date that can be compared against other systems. Accuracy depends on the OS, but 1ms accuracy is achievable on a modern Unix-based OS (Linux) • Bad if more precise measurements are required • System.nanoTime() • Good for sub-millisecond measurements • Bad if a time comparable with other systems is required • Realistically, need to use both
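As a minimal sketch of using both clocks together (the class and method names here are illustrative, not from the presentation): take a wall-clock timestamp with `System.currentTimeMillis()` for cross-system comparison, and measure the duration of the operation itself with `System.nanoTime()`.

```java
final class TimingSketch {
    /** Runs the task and returns its elapsed time in nanoseconds (monotonic clock). */
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Wall-clock time: comparable with timestamps logged by other systems.
        long startedAtEpochMillis = System.currentTimeMillis();

        // Elapsed time: only meaningful as a difference within this JVM.
        long nanos = elapsedNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable; keeps the loop live");
            }
        });

        System.out.println("started at " + startedAtEpochMillis
                + " ms since epoch, took " + (nanos / 1_000) + " us");
    }
}
```

A single measurement like this is noisy; in practice the elapsed values would be collected over many operations and summarized.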

  21. Quote Journalling – log latency of every price

  22. Our Soak Test Harness

  23. …and the graphs it can produce

  24. Removing Millisecond Delays • Identify the longest-running tasks • Usually I/O delays • Disk • Database activity • Synchronous logging • Writing files • Network • Calling network services • Remote services far away (e.g. across the Atlantic, ~50ms)

  25. Removing Millisecond Delays (2) • Analyze whether delays can be eliminated • Disk • Database activity -> Use a cache • Synchronous logging -> Use asynchronous logging • Writing files -> Use buffers and write asynchronously • Network • Calling network services -> Cache where possible • Remote services far away -> Co-locate in same place
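One way to picture the "use asynchronous logging" advice is the sketch below. It assumes a hypothetical `AsyncLogger` class (not part of the presentation or of any logging library): the calling thread only enqueues the message, and a single background thread performs the slow write.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class AsyncLogger implements AutoCloseable {
    // Unique sentinel instance, compared by identity to signal shutdown.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread writer;

    /** The sink does the slow work (e.g. a file write); it runs only on the writer thread. */
    AsyncLogger(Consumer<String> sink) {
        writer = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take();
                    if (line == STOP) {
                        return;
                    }
                    sink.accept(line);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "async-logger");
        writer.setDaemon(true);
        writer.start();
    }

    /** Called on the latency-sensitive thread: enqueue only, no I/O. */
    void log(String line) {
        queue.add(line);
    }

    /** Flushes everything queued so far, then stops the writer thread. */
    @Override
    public void close() {
        queue.add(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (AsyncLogger logger = new AsyncLogger(System.out::println)) {
            logger.log("trade accepted");
        }
    }
}
```

The trade-off is that messages still in the queue are lost if the process dies, and an unbounded queue can grow if the sink cannot keep up.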

  26. FX Trading – RFQ Example • E.g. Incoming request for a price, target response time is 10ms • Need to: • Validate request parameters • Internally subscribe for prices • Obtain a globally unique transaction ID • Perform a credit check • How to get all this done in just 10ms?

  27. FX Trading – RFQ Example (2) • Credit check • Old one took 30-200ms • New one takes 5-10ms • Using Caching and Co-location • Parallelize all validation • Pre-cache prices • by opening up price streams in advance of being required
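The "parallelize all validation" point can be sketched with `CompletableFuture`. The two checks below are hypothetical stand-ins (the real parameter validation and credit check are not shown in the slides); the point is only that independent checks run concurrently, so the total time approaches that of the slowest check rather than the sum.

```java
import java.util.concurrent.CompletableFuture;

final class RfqSketch {
    // Hypothetical stand-in: a currency pair is six letters, e.g. "EURUSD".
    static boolean validateParameters(String currencyPair) {
        return currencyPair != null && currencyPair.length() == 6;
    }

    // Hypothetical stand-in for a (cached, co-located) credit check.
    static boolean creditCheck(String clientId) {
        return clientId != null && !clientId.isEmpty();
    }

    /** Starts both checks at once and combines their results when both finish. */
    static CompletableFuture<Boolean> canQuote(String clientId, String currencyPair) {
        CompletableFuture<Boolean> params =
                CompletableFuture.supplyAsync(() -> validateParameters(currencyPair));
        CompletableFuture<Boolean> credit =
                CompletableFuture.supplyAsync(() -> creditCheck(clientId));
        return params.thenCombine(credit, (paramsOk, creditOk) -> paramsOk && creditOk);
    }

    public static void main(String[] args) {
        System.out.println(canQuote("client-1", "EURUSD").join());
    }
}
```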

  28. Don’t Optimize Too Soon • Remember: • Only optimize what you need to optimize • Remove longest delays first • No point removing micros if you still have delays of millis or worse • Always measure your operations carefully • Determine the minimum, maximum, mean, standard deviation, and percentiles (99%, 99.9%, etc.) • Watch for jitter and solve separately
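The statistics listed above can be computed from recorded latency samples along these lines. This is a sketch with illustrative names; it uses the nearest-rank definition of a percentile and the population standard deviation, which are assumptions rather than anything the slides specify.

```java
import java.util.Arrays;

final class LatencyStats {
    /** Nearest-rank percentile; p is in (0, 100] and the array must be sorted ascending. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    /** Population standard deviation (divides by n, not n - 1). */
    static double stdDev(long[] samples) {
        double m = mean(samples);
        double sumOfSquares = 0;
        for (long s : samples) {
            sumOfSquares += (s - m) * (s - m);
        }
        return Math.sqrt(sumOfSquares / samples.length);
    }

    public static void main(String[] args) {
        long[] micros = {50, 52, 49, 51, 9_000, 50, 53, 48};
        long[] sorted = micros.clone();
        Arrays.sort(sorted);
        System.out.println("min=" + sorted[0] + " max=" + sorted[sorted.length - 1]
                + " mean=" + mean(micros) + " stdDev=" + stdDev(micros)
                + " p99=" + percentile(sorted, 99));
    }
}
```

In the sample data a single 9,000 µs outlier dominates the mean, which is why the slide asks for percentiles and a separate look at jitter.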

  29. Removing Microsecond Delays • Intra-process delays • Unbalanced / slow queues • Slow algorithms • Expensive loops repeated many times • Poor use of object creation / memory allocation • Contended memory controlled with locks • Wasted effort calculating unwanted results

  30. FX Trading – Pricing Example • Achieving wire-to-wire latencies of 50μs • Google protobuf parsers replaced with low-garbage creating versions • each GC stops the JVM for 9,000μs (i.e. 9ms) • LMAX Disruptors used instead of queues • Busy spin consumer threads / single-writer principle • “PriceBigDecimal” class to replace Java BigDecimal class • BigDecimal slow to instantiate and impossible to mutate • No synchronous logging or network calls • Pre-cache static data before starting price stream
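The slides do not show the “PriceBigDecimal” class itself. The hypothetical `MutablePrice` below only illustrates the general idea behind such a replacement: keep the value as a primitive `long` plus a scale and update it in place, so the pricing path allocates no new objects for the garbage collector to reclaim.

```java
import java.math.BigDecimal;

final class MutablePrice {
    private long unscaled; // e.g. 135791 with scale 5 represents 1.35791
    private int scale;     // number of decimal places

    /** Overwrites this instance in place so it can be reused for every price tick. */
    MutablePrice set(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
        return this;
    }

    /** Adds a delta expressed at the same scale, without allocating. */
    MutablePrice addUnscaled(long delta) {
        unscaled += delta;
        return this;
    }

    long unscaled() {
        return unscaled;
    }

    int scale() {
        return scale;
    }

    /** Allocates; intended for logging and debugging, not for the hot path. */
    @Override
    public String toString() {
        return BigDecimal.valueOf(unscaled, scale).toPlainString();
    }

    public static void main(String[] args) {
        MutablePrice price = new MutablePrice().set(135791, 5);
        price.addUnscaled(2); // widen by 0.2 pips, in place
        System.out.println(price);
    }
}
```

Mutability trades safety for speed: an instance like this must be owned by one thread at a time, which fits the single-writer approach mentioned on the slide.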

  31. Disruptor or Blocking Queues?

  32. Java BigDecimal or use Low Latency replacement?

  33. Removing Nanoseconds? • Use specialist hardware (such as FPGA) • Understand low-level CPU interconnectivity with memory, and how CPU caching works (including cache-lines) • http://mechanical-sympathy.blogspot.com • eFX – No need to pursue this level of performance at the moment

  34. Latency vs Throughput • Latency - time taken (typically mean, percentile or worst case) to complete a task • Throughput – the number of tasks completed in a given time period (typically, per second) • Throughput is 1/latency (per pipeline)

  35. Increasing Throughput • Identify delays • Throughput constrained by latency • Blocking I/O calls delay unprocessed messages • Data bursts • What’s the peak throughput required? • What’s the gap typically between bursts?

  36. Techniques to Increase Throughput • Batching • Sometimes high-latency calls are unavoidable • Batching removes the overhead of making one call per transaction • Cost of batching is the delay incurred waiting for new items to add to the batch • More difficult to accurately measure delay per item when multiple items are in a batch
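A simple form of batching can be sketched with a `BlockingQueue`: wait for the first item, then take whatever else has already queued up, up to a maximum batch size. The `Batcher` class is illustrative and is not the presenters' implementation. This variant never waits for a batch to fill, so it adds no extra delay when traffic is light.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class Batcher {
    /**
     * Blocks until at least one item is available, then drains any further
     * items that are already queued, up to maxBatch in total.
     * Returns an empty list if the thread is interrupted while waiting.
     */
    static <T> List<T> nextBatch(BlockingQueue<T> queue, int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        try {
            batch.add(queue.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return batch;
        }
        queue.drainTo(batch, maxBatch - 1);
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> trades =
                new LinkedBlockingQueue<>(List.of("T1", "T2", "T3", "T4", "T5", "T6", "T7"));
        System.out.println(nextBatch(trades, 5)); // one remote call would carry this batch
        System.out.println(nextBatch(trades, 5));
    }
}
```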

  37. FX Trading – Batching Example • Legacy global server in London • Regional trade acceptance components • Latency between New York and London - 50ms • Per thread: 1/0.05 = 20 trades per second max • How to increase? • More threads • Add batching per thread • Now, with batch size of 5, 100 trades per second per thread.
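The arithmetic on this slide can be written out as a small helper (names are illustrative). It assumes each blocking round trip carries one batch and that the thread does nothing else between calls.

```java
final class ThroughputSketch {
    /**
     * Maximum items per second for one thread making blocking calls:
     * (1000 ms / round-trip latency in ms) calls per second, times items per call.
     */
    static double maxPerSecond(long roundTripMillis, int batchSize) {
        return 1000.0 / roundTripMillis * batchSize;
    }

    public static void main(String[] args) {
        // New York to London, 50 ms round trip.
        System.out.println(maxPerSecond(50, 1)); // unbatched: 20 trades per second
        System.out.println(maxPerSecond(50, 5)); // batch of 5: 100 trades per second
    }
}
```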

  38. Techniques to Increase Throughput (2) • Use Asynchronous callbacks • Synchronous calls: • boolean doCall() • Wait for response • Can be delayed for varying time • Asynchronous calls: • void doCall(Callback callback) • Do not wait and keep processing more events • Can additionally overlay timeouts to improve resilience

  39. FX Trading – Asynchronous Callbacks • Submission of trade to price service for verification – was originally synchronous • Call blocks for 50ms – max 20 trades per second per thread • After converting to asynchronous callbacks, the only delay is putting packets on the network buffer (μs), so effectively no delay – max number of trades is very high!
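The callback style with an overlaid timeout might look like the sketch below. It uses `CompletableFuture` (with `orTimeout`, available from Java 9) in place of the slides' `Callback` interface, and `verifyTrade` is a hypothetical stand-in for the remote price-service call. A timeout or failure is reported to the callback as `false`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class AsyncVerify {
    // Hypothetical stand-in for the remote verification call.
    static CompletableFuture<Boolean> verifyTrade(String tradeId) {
        return CompletableFuture.supplyAsync(() -> !tradeId.isEmpty());
    }

    /**
     * Returns immediately; the callback runs later with the verification
     * result, or with false if the call fails or exceeds the timeout.
     */
    static CompletableFuture<Void> submit(String tradeId, Consumer<Boolean> callback,
                                          long timeoutMillis) {
        return verifyTrade(tradeId)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(failure -> false)
                .thenAccept(callback);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> done =
                submit("T1", ok -> System.out.println("verified: " + ok), 100);
        // The submitting thread is free to process other events here.
        done.join(); // only so the demo does not exit before the callback runs
    }
}
```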

  40. Q & A eFX – January 2014
