Explore how research on system scalability, database optimization, and hardware acceleration can improve application performance and scalability in data-intensive environments. Learn about cloud examples and the benefits of hardware acceleration implemented in various systems.
System Scalability Research
Andrew A. Chien
William Eckhardt Professor of Computer Science, The University of Chicago
Senior Scientist, Argonne National Laboratory
IRIS-HEP Kickoff, September 7, 2018
Traditional Software Scalability
• Does the application scale to larger data units? (one SW App, much larger data unit)
• Does the application scale up over large data sets and experiments? (many instances of the app and data)
[Figure: one SW App on a much larger data unit; many SW App instances over many data units]
• Good, but this doesn’t take the impact on INFRASTRUCTURE into account.
• How to scale the bandwidth of HEP community research experiments?
System Scalability Research
• Database Example
  • Query optimization: predicate/filter push down
  • Exploit selectivity, increase scalability/performance of system
  • Hardware acceleration
• Cloud Example
  • S3 Select
  • Optimize infrastructure use AND improve application performance
• IRIS-HEP Examples
  • Distributed analysis, filtering
  • Optimized data movement across wide area, increase scalability/performance of system
  • Hardware acceleration
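A minimal sketch of the predicate push down idea from the Database Example, written in Python. It is an illustration, not the system described in the talk: the data set, the `rows_moved` counter, and the function names are all hypothetical. It compares filtering after the data has been moved with filtering at the storage side, and counts how many rows cross the "network" in each case.

```python
from typing import Callable, Iterable, Iterator

def scan(rows: Iterable[dict], counter: dict) -> Iterator[dict]:
    """Simulate moving rows from storage to compute, counting each row moved."""
    for row in rows:
        counter["rows_moved"] += 1
        yield row

def filter_after_transfer(storage: list, predicate: Callable[[dict], bool]):
    """Move every row first, then filter on the compute side."""
    counter = {"rows_moved": 0}
    result = [r for r in scan(storage, counter) if predicate(r)]
    return result, counter["rows_moved"]

def filter_pushed_down(storage: list, predicate: Callable[[dict], bool]):
    """Evaluate the predicate at the storage side; move only matching rows."""
    counter = {"rows_moved": 0}
    selected = (r for r in storage if predicate(r))
    result = list(scan(selected, counter))
    return result, counter["rows_moved"]

# Hypothetical data: 1000 rows, one in ten is not "old" (10% selectivity).
storage = [{"id": i, "age": "old" if i % 10 else "young"} for i in range(1000)]

def not_old(row: dict) -> bool:
    return row["age"] != "old"

late, moved_late = filter_after_transfer(storage, not_old)
early, moved_early = filter_pushed_down(storage, not_old)
print(moved_late, moved_early)
```

Both variants return the same rows; only the amount of data moved differs, and the gap grows as the predicate becomes more selective.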
Traditional Query Execution Plan
Query: SELECT COUNT(*), Product.color, Age FROM Sale WHERE DNN(Product.comment) AS Score > 0.9, Age != “old” GROUP BY Product, Age ORDER BY Product.price, Age
[Figure: operator tree over Sale. Operators shown: readP, tokenize, word2vec lookup, DNN inference, filter: Score > 0.9; readA, protobuf decode, decompress, filter: Age != “old”; pre-hash aggregate, hash aggregate, Timsort]
Query Optimized ATO Plan
[Figure: the operator tree over Sale with operators assigned to accelerators. Legend: Accelerated Transformation Operator (e.g. UDP); Accelerated Inference Operator (e.g. Nvidia DLA). Operators shown: readA, protobuf decode, decompress, filter: Age != “old”; readP, tokenize, word2vec lookup, DNN inference, filter: Score > 0.9; pre-hash aggregate, hash aggregate, Timsort]
QO ATO Plan + Flexible Encodings
[Figure: the operator tree over Sale with encoding steps added. Legend: Accelerated Transformation Operator (e.g. UDP); Accelerated Inference Operator (e.g. Nvidia DLA); CPU Radix Sort Operator; Accelerated Scan Operator (e.g. SIMD [bitweaving, SIGMOD’13]). Operators shown: readA, protobuf decode, decompress, transpose, dict-encode: product, dict-encode: age, huff-encode: (product,age), pack: array of age, pack (product,age), pack (price,age), filter: Age != “old”; readP, tokenize, word2vec lookup, DNN inference, filter: Score > 0.9; pre-hash aggregate, hash aggregate, radix sort]
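As a rough illustration of the "dict-encode: age" and "pack: array of age" steps named in the plan, the Python sketch below dictionary-encodes a column, packs the codes into a single integer at a fixed bit width, and evaluates the Age != “old” filter on the integer codes instead of the strings. The sample values and helper names are hypothetical, and the real operators run on accelerators rather than in Python.

```python
def dict_encode(values):
    """Return (dictionary, codes): each distinct value gets a small integer code."""
    dictionary = {}
    codes = []
    for v in values:
        # First occurrence of a value is assigned the next unused code.
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, codes

def pack(codes, bits):
    """Pack fixed-width codes into one integer, code i at bit offset i * bits."""
    word = 0
    for i, c in enumerate(codes):
        word |= c << (i * bits)
    return word

def unpack(word, bits, n):
    """Inverse of pack: recover n codes of the given bit width."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(n)]

# Hypothetical column values.
ages = ["old", "young", "old", "adult", "young", "old"]
dictionary, codes = dict_encode(ages)

# Bits needed to represent the largest code (at least 1).
bits = max(1, (len(dictionary) - 1).bit_length())
packed = pack(codes, bits)

# The filter compares small integers, not strings.
old_code = dictionary["old"]
keep = [i for i, c in enumerate(unpack(packed, bits, len(codes))) if c != old_code]
print(dictionary, codes, bits, keep)
```

Here three distinct values fit in 2 bits each, so six string values shrink to a 12-bit word, which is the kind of compact layout a SIMD scan operator can exploit.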
QO ATO Plan + Flexible Encodings + Operator Fusion
[Figure: the operator tree over Sale with the decode and encode steps collapsed into three "fused" operators. Legend: Accelerated Transformation Operator (e.g. UDP); Accelerated Inference Operator (e.g. Nvidia DLA); CPU Radix Sort Operator; Accelerated Scan Operator (e.g. SIMD [bitweaving, SIGMOD’13]). Operators shown: fused (x3), filter: Age != “old”; word2vec lookup, DNN inference, filter: Score > 0.9; pre-hash aggregate, hash aggregate, radix sort]
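Operator fusion can be sketched in a few lines of Python. The example below is only an analogy for the fused operators in the plan: the record format ("product,age" text lines) and the function names are assumptions. The unfused version materializes an intermediate list after every operator; the fused version decodes, filters, and aggregates each record in a single pass.

```python
from collections import Counter

def decode(line):
    """Decode one hypothetical 'product,age' record."""
    product, age = line.split(",")
    return {"product": product, "age": age}

def unfused(lines):
    """Three separate operators, each materializing its full output."""
    decoded = [decode(line) for line in lines]          # operator 1: decode
    kept = [r for r in decoded if r["age"] != "old"]    # operator 2: filter
    return Counter((r["product"], r["age"]) for r in kept)  # operator 3: aggregate

def fused(lines):
    """One pass per record: decode, filter, and aggregate with no intermediates."""
    counts = Counter()
    for line in lines:
        product, age = line.split(",")
        if age != "old":
            counts[(product, age)] += 1
    return counts

# Hypothetical input records.
lines = ["shirt,old", "shirt,young", "hat,young", "shirt,young", "hat,old"]
print(unfused(lines) == fused(lines), fused(lines))
```

The two functions compute the same counts; fusion only removes the intermediate buffers and the extra passes over the data.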
Example Benefit
• Overall query benefit can be 10-100x! (looking hard at the data that matters)
A Cloud Example: Data Analysis
• Iterators over all objects in an S3 bucket
• S3 Select
• Interesting: pricing and business model (when you own the endpoints and network COST)
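For readers unfamiliar with S3 Select, the sketch below shows the shape of a request using the boto3 `select_object_content` call, which sends a SQL expression to S3 so that only matching records come back. The function takes the client as a parameter; against AWS one would pass `boto3.client("s3")`. The bucket name, object key, column names, and the `StubS3Client` class are hypothetical; the stub only mimics the response shape so the function can be tried offline.

```python
def select_rows(client, bucket, key, sql):
    """Run a SQL filter inside S3 and return only the matching CSV text."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=sql,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; only "Records" events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")

class StubS3Client:
    """Offline stand-in (hypothetical) that mimics the response shape."""
    def select_object_content(self, **kwargs):
        self.last_request = kwargs
        return {"Payload": [
            {"Records": {"Payload": b"shirt,young\n"}},
            {"Records": {"Payload": b"hat,young\n"}},
            {"Stats": {}},
            {"End": {}},
        ]}

stub = StubS3Client()
rows = select_rows(
    stub,
    "example-bucket",   # hypothetical bucket
    "sales.csv",        # hypothetical object key
    "SELECT s.product, s.age FROM S3Object s WHERE s.age <> 'old'",
)
print(rows)
```

The alternative named on the slide, iterating over every object in the bucket, would download each object in full and filter on the client.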
Hardware Acceleration: Big Wins
• Xeon E5620: 8 threads, 340 mm², 80 W
• UDP: 64 lanes, 8.7 mm², 0.86 W
[Figure: UDP Hardware Implementations]
CSV Parsing
• 1 UDP lane is 4x faster than 1 CPU thread
• UDP is 1000x more energy-efficient than the CPU
• 64-lane UDP: 12 GB/s; 8-thread CPU: 0.4 GB/s
Snappy Compression
• 1 UDP lane matches 1 CPU thread
• UDP is 270x more energy-efficient than the CPU
• 64(21)-lane UDP: 3.2 GB/s; 8-thread CPU: 1.0 GB/s
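The per-lane and per-thread claims on the CSV Parsing and Snappy Compression slides can be checked against the quoted throughputs with simple arithmetic, sketched here in Python. One assumption is needed: "64(21)-lane" is read as 21 of the 64 lanes being active for Snappy, which is an interpretation and not stated in the slides.

```python
def per_unit(throughput_gb_s, units):
    """Throughput per lane or per thread, in GB/s."""
    return throughput_gb_s / units

# CSV parsing: 64-lane UDP at 12 GB/s vs. 8-thread CPU at 0.4 GB/s.
csv_lane_vs_thread = per_unit(12.0, 64) / per_unit(0.4, 8)
csv_total_speedup = 12.0 / 0.4

# Snappy: 3.2 GB/s vs. 1.0 GB/s.
# ASSUMPTION: "64(21)-lane" means 21 active lanes.
snappy_lane_vs_thread = per_unit(3.2, 21) / per_unit(1.0, 8)
snappy_total_speedup = 3.2 / 1.0

print(round(csv_lane_vs_thread, 2), round(csv_total_speedup, 1))
print(round(snappy_lane_vs_thread, 2), round(snappy_total_speedup, 1))
```

Under these numbers one UDP lane is about 3.75x a CPU thread on CSV parsing (roughly the 4x quoted) and about 1.2x on Snappy (roughly a match), while the whole-device speedups are about 30x and 3.2x.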
64-lane UDP vs. 8-thread CPU
• Significant speedup on all ETL workloads, mean speedup >20x
What does this mean for IRIS-HEP?
• Distributed Data Lake, shared general data format (across experiments)
• Scalable analysis pulls data from the Lake and ships it to computing resources [analysis]
• Variety in analysis experiments, data use, and availability of compute resources IMPLIES large data movement
<rob gardner picture>
Example Research Topics
• Vertical (distributed) partitioning and filtering
• Programmable hardware acceleration [10-100x size reduction]
• => Can dramatically increase system scalability and HEP application science capability
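A toy Python sketch of vertical partitioning combined with filtering: data is stored column by column, and an analysis ships only the columns it needs, for only the rows that pass a cut. The column names (`event_id`, `pt`, `eta`, `raw_hits`), the cut, and the data are hypothetical and synthetic, so the resulting reduction factor is an illustration and not a measurement.

```python
def full_transfer(columns):
    """Number of values moved if every column of every row is shipped."""
    return sum(len(col) for col in columns.values())

def project_and_filter(columns, needed, filter_col, predicate):
    """Ship only the needed columns, and only rows whose filter_col passes."""
    keep = [i for i, v in enumerate(columns[filter_col]) if predicate(v)]
    return {name: [columns[name][i] for i in keep] for name in needed}

# Synthetic column store: 1000 events, 4 columns.
n = 1000
columns = {
    "event_id": list(range(n)),
    "pt": [float(i % 100) for i in range(n)],
    "eta": [0.0] * n,
    "raw_hits": [b"\x00" * 8] * n,
}

# Hypothetical analysis: needs event_id and pt for events with pt > 90.
subset = project_and_filter(columns, ["event_id", "pt"], "pt", lambda v: v > 90.0)

moved_full = full_transfer(columns)
moved_subset = sum(len(col) for col in subset.values())
print(moved_full, moved_subset, moved_full / moved_subset)
```

The reduction comes from two independent factors, column projection and row selectivity, which multiply; how large either factor is for real HEP analyses is exactly the research question the slide raises.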
Discussion
Backup