
A Software-Defined Networking based Approach for Performance Management of Analytical Queries on Distributed Data Stores

Pengcheng Xiong (NEC Labs America), Hakan Hacigumus (NEC Labs America), Jeffrey F. Naughton (Univ. of Wisconsin)





Presentation Transcript


  1. A Software-Defined Networking based Approach for Performance Management of Analytical Queries on Distributed Data Stores • Pengcheng Xiong (NEC Labs America), Hakan Hacigumus (NEC Labs America), Jeffrey F. Naughton (Univ. of Wisconsin)

  2. Agenda • Why? • Motivation and background • How? • System architecture and implementation • So what? • Real system and benchmark query evaluation • Conclusion

  3. Motivation • Data analytics applications or data scientists query data from distributed stores. • Operations such as joins generate a huge amount of data traffic on the network. • Many applications want to share a cluster, e.g., data backup and video streaming. • Response time is critical: deadline-driven reports. • Query service differentiation: batch queries vs. interactive queries.

  4. An example query (TPC-H Q14) • We assume that tables are distributed across relational data stores, which are connected by a network.

  5. Network changes imply plan performance changes [Figure across network Phases 1, 2, and 3, annotated: (1) huge performance gap; (2) when the network status changes, the best plan can become the worst one]

  6. What if? What if the query optimizer could dynamically monitor the network bandwidth and adaptively choose a plan? Then the adaptive plan is chosen and query execution time is kept short. [Figure: Phases 1, 2, and 3]
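The adaptive choice on this slide amounts to re-costing every candidate plan with the bandwidth observed at query time and keeping the cheapest. Below is a minimal Python sketch, assuming each plan is summarized only by the bytes it ships per link and that transfers over different links run one after another. The plan names, link labels such as `"A->B"`, byte counts, and bandwidth snapshots are hypothetical values made up for illustration, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass
class CandidatePlan:
    name: str
    bytes_per_link: dict[str, float]  # bytes the plan ships over each network link


def estimated_transfer_seconds(plan: CandidatePlan, bandwidth_bps: dict[str, float]) -> float:
    """Network cost of a plan: sum of bytes / currently available bandwidth over its links."""
    return sum(8 * nbytes / bandwidth_bps[link] for link, nbytes in plan.bytes_per_link.items())


def choose_plan(plans: list[CandidatePlan], bandwidth_bps: dict[str, float]) -> CandidatePlan:
    """Pick the plan with the lowest estimated transfer time under the current bandwidth."""
    return min(plans, key=lambda plan: estimated_transfer_seconds(plan, bandwidth_bps))


# Hypothetical plans and bandwidth snapshots (bits per second) for two network phases.
plans = [
    CandidatePlan("ship part to lineitem site", {"B->A": 2e9}),
    CandidatePlan("ship lineitem to part site", {"A->B": 6e9}),
]
phase1 = {"A->B": 1e8, "B->A": 1e9}  # link A->B is congested
phase2 = {"A->B": 1e9, "B->A": 5e7}  # congestion has moved to B->A
for label, bandwidth in (("phase 1", phase1), ("phase 2", phase2)):
    print(label, choose_plan(plans, bandwidth).name)
```

Re-running `choose_plan` whenever a new bandwidth snapshot arrives is what lets the preferred plan flip between phases.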

  7. Network busy implies no good plan. User: Run the query right now and right away. I need it ASAP to catch my deadline! Distributed DBMS: Well… I am sorry. None of the candidate plans can meet your deadline due to the current busy network status.

  8. What if? What if the query optimizer could control the network? User: Run the query right now and right away. I need it ASAP to catch my deadline! Distributed DBMS: OK. Although the network is currently busy, I can control it to prioritize bandwidth for the query.

  9. Can the distributed query optimizer monitor and control the network?

  10. Sounds like mission impossible • Databases have always treated the underlying network as a black box: unable to monitor it, let alone control it • With software-defined networking, a database can • inquire about the current status of the network, or • control the network with directives [Figure: without SDN, the network can be neither monitored nor controlled; with SDN, it can be inquired and controlled]
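One common way for an SDN controller to inquire about the current status of the network is to poll a switch port's cumulative byte counter (OpenFlow port-statistics replies carry counters such as `tx_bytes`) and divide its growth by the polling interval. The sketch below assumes that approach and a known link capacity. `PortSample` and both helper functions are hypothetical names, the counter readings are invented, and counter wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """One poll of a port's cumulative transmit-byte counter (e.g. tx_bytes from port stats)."""
    timestamp_s: float
    tx_bytes: int


def used_bandwidth_bps(earlier: PortSample, later: PortSample) -> float:
    """Average throughput between two polls, in bits per second."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return 8 * (later.tx_bytes - earlier.tx_bytes) / elapsed


def available_bandwidth_bps(capacity_bps: float, earlier: PortSample, later: PortSample) -> float:
    """Bandwidth left for a new query flow: link capacity minus what other traffic uses."""
    return max(0.0, capacity_bps - used_bandwidth_bps(earlier, later))


# Hypothetical readings on a 1 Gbit/s link polled one second apart.
s1 = PortSample(timestamp_s=10.0, tx_bytes=1_000_000_000)
s2 = PortSample(timestamp_s=11.0, tx_bytes=1_075_000_000)
print(used_bandwidth_bps(s1, s2))            # 600000000.0
print(available_bandwidth_bps(1e9, s1, s2))  # 400000000.0
```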

  11. Sounds interesting, but how? [Figure: Ethernet switch/router]

  12. [Figure: control path (software); data path (hardware)]

  13. [Figure: distributed query optimizer; API (our contribution); OpenFlow controller; OpenFlow protocol (SSL/TCP); control path (OpenFlow); data path (hardware)]
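The slide places the contribution at the API between the distributed query optimizer and the OpenFlow controller. A hedged sketch of what such an API could look like, with one monitoring call and one control call, follows. `NetworkControlAPI`, `InMemoryController`, and their method names are hypothetical, and the in-memory class only does bookkeeping instead of talking to a real controller.

```python
from abc import ABC, abstractmethod

Path = tuple[str, str]  # (source site, destination site)


class NetworkControlAPI(ABC):
    """Hypothetical optimizer-facing API exposed on top of an SDN controller."""

    @abstractmethod
    def available_bandwidth_bps(self, src: str, dst: str) -> float:
        """Monitor: bandwidth currently left over on the path src -> dst."""

    @abstractmethod
    def reserve_bandwidth(self, src: str, dst: str, rate_bps: float) -> bool:
        """Control: try to reserve rate_bps on src -> dst; True if admitted."""


class InMemoryController(NetworkControlAPI):
    """Toy stand-in for a real controller; it keeps bookkeeping only and touches no switch."""

    def __init__(self, capacity_bps: float, background_bps: dict[Path, float]):
        self.capacity_bps = capacity_bps
        self.background_bps = dict(background_bps)
        self.reserved_bps: dict[Path, float] = {}

    def available_bandwidth_bps(self, src: str, dst: str) -> float:
        path = (src, dst)
        used = self.background_bps.get(path, 0.0) + self.reserved_bps.get(path, 0.0)
        return max(0.0, self.capacity_bps - used)

    def reserve_bandwidth(self, src: str, dst: str, rate_bps: float) -> bool:
        # Admission is checked against link capacity only; best-effort background
        # traffic is assumed to be squeezed by the switch queues.
        path = (src, dst)
        already = self.reserved_bps.get(path, 0.0)
        if already + rate_bps > self.capacity_bps:
            return False
        self.reserved_bps[path] = already + rate_bps
        return True


ctrl = InMemoryController(1e9, {("A", "B"): 7e8})
print(ctrl.available_bandwidth_bps("A", "B"))  # 300000000.0
print(ctrl.reserve_bandwidth("A", "B", 8e8))   # True
print(ctrl.reserve_bandwidth("A", "B", 3e8))   # False
```

Keeping the optimizer behind an abstract interface like this means the cost model does not depend on which controller sits underneath.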

  14. System architecture

  15. System implementation [Figure: NEC PFS5240 switch; Beacon OpenFlow controller]

  16. Plan generation [Figure: one data store stores the lineitem table; another stores the part table]
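With lineitem and part at different stores, the join in TPC-H Q14 (which matches `l_partkey` to `p_partkey` and keeps one month of `l_shipdate`) can run at either site. The simplest candidate set is therefore "move one input to the other site". The sketch assumes exactly that two-plan set, which is not necessarily the plan space the authors use. The site names and shipped byte counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TablePlacement:
    table: str
    site: str
    # Hypothetical: bytes that would cross the network if this input were moved,
    # after any local filtering (Q14 keeps one month of lineitem by l_shipdate).
    shipped_bytes: float


def generate_join_plans(left: TablePlacement,
                        right: TablePlacement) -> list[tuple[str, str, float]]:
    """Enumerate the two ship-one-input plans for a join of tables stored at two sites.

    Each plan is (description, join site, bytes sent over the network).
    """
    return [
        (f"ship {left.table} to {right.site}", right.site, left.shipped_bytes),
        (f"ship {right.table} to {left.site}", left.site, right.shipped_bytes),
    ]


lineitem = TablePlacement("lineitem", "store1", shipped_bytes=6e9)
part = TablePlacement("part", "store2", shipped_bytes=2e9)
for description, join_site, nbytes in generate_join_plans(lineitem, part):
    print(f"{description}: join at {join_site}, {nbytes:.0f} bytes over the network")
```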

  17. Cost estimation • Cost model for the network operator: cost = amount of data transferred / real-time transfer speed • SDN support for the real-time transfer speed: • (Monitor) take whatever bandwidth is left • (Control) assign the highest priority, or make a bandwidth reservation
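Read as a formula, the network cost divides the data an operator ships by the speed it can obtain, and SDN determines which speed to plug in. The sketch below encodes one idealized reading of the options on the slide: monitoring takes whatever the background traffic leaves, highest priority gets the whole link, and a reservation gets at least its reserved rate. These rules, the function names, and the example numbers are assumptions for illustration.

```python
def network_cost_seconds(bytes_transferred: float, transfer_speed_bps: float) -> float:
    """Cost of a network operator: amount of data transferred / real-time transfer speed."""
    if transfer_speed_bps <= 0:
        return float("inf")  # no bandwidth available: the transfer never finishes
    return 8 * bytes_transferred / transfer_speed_bps


def sdn_transfer_speed_bps(mode: str, capacity_bps: float, background_bps: float,
                           reserved_bps: float = 0.0) -> float:
    """Idealized transfer speed for each SDN option (an assumption, not the paper's model)."""
    if mode == "monitor":      # take any bandwidth left by other traffic
        return max(0.0, capacity_bps - background_bps)
    if mode == "priority":     # highest priority: other traffic yields entirely
        return capacity_bps
    if mode == "reservation":  # at least the reserved rate, more if the link has spare room
        return max(reserved_bps, capacity_bps - background_bps)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical: ship 1 GB over a 1 Gbit/s link that carries 900 Mbit/s of iperf traffic.
for mode in ("monitor", "priority", "reservation"):
    speed = sdn_transfer_speed_bps(mode, 1e9, 9e8, reserved_bps=5e8)
    print(mode, network_cost_seconds(1e9, speed))  # 80.0, 8.0, 16.0 seconds respectively
```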

  18. Evaluation • Setup • TPC-H, scale factor 100, Q14 • Small tables (supplier, nation, region) are replicated. • Each of the other tables is placed at a single data store site. • Neighbor traffic generator: iperf • [Table: summary of case studies]

  19. Case 1: single user, single thread, iperf • Based on SDN, the query optimizer can dynamically monitor the network bandwidth and adaptively choose the best plan. [Figure: Phases 1, 2, and 3, each with a bottleneck marked]

  20. Case 3: multiple users, multiple threads, no contention traffic, priority queue • Based on SDN, premium queries run faster than regular ones. • Based on SDN, all queries run faster.
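A strict priority queue serves the premium class first and gives the regular class only what remains. The fluid-model sketch below is an idealization that ignores packet-level effects. It illustrates why premium queries finish sooner when both classes compete for one link. `strict_priority_rates` and the demand figures are hypothetical.

```python
def strict_priority_rates(capacity_bps: float, demands_bps: list[float]) -> list[float]:
    """Idealized strict-priority allocation.

    demands_bps is ordered from highest to lowest priority; each class receives
    min(its demand, capacity not yet taken by higher-priority classes).
    """
    rates = []
    remaining = capacity_bps
    for demand in demands_bps:
        rate = min(demand, remaining)
        rates.append(rate)
        remaining -= rate
    return rates


# Hypothetical: premium and regular query traffic each want 700 Mbit/s on a 1 Gbit/s link.
premium_bps, regular_bps = strict_priority_rates(1e9, [7e8, 7e8])
print(premium_bps, regular_bps)  # 700000000.0 300000000.0
```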

  21. Case 5: single user, multiple threads, iperf, weighted-fair queue • Based on SDN, a larger bandwidth reservation makes queries run faster.
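Under weighted fair queueing, each backlogged queue receives link capacity in proportion to its weight, so raising the weight of the query's queue raises its rate and shortens its transfer. The sketch assumes every queue always has data to send and treats the reservation as the queue weight. `wfq_rates`, the weights, and the 1 GB transfer are illustrative assumptions.

```python
def wfq_rates(capacity_bps: float, weights: dict[str, float]) -> dict[str, float]:
    """Idealized (fluid) weighted fair queueing with every queue backlogged:
    each queue gets capacity * weight / sum(weights)."""
    total = sum(weights.values())
    return {name: capacity_bps * w / total for name, w in weights.items()}


transfer_bytes = 1e9  # hypothetical 1 GB of intermediate data for the query
for query_weight in (1, 4):
    rates = wfq_rates(1e9, {"query": query_weight, "iperf": 1})
    seconds = 8 * transfer_bytes / rates["query"]
    print(f"query weight {query_weight}: {rates['query']:.0f} bit/s, {seconds:.1f} s")
```

With equal weights the query gets half the link (16.0 s for the transfer); with weight 4 it gets four fifths (10.0 s).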

  22. Conclusion • SDN can be effectively exploited for performance management of analytical queries on distributed data stores • Directly monitor the network and adaptively pick the best plan. • Control the priority of network traffic or make network bandwidth reservations to differentiate the query service. • Lots of opportunities

  23. Thanks!
