Towards Secure Dataflow Processing in Open Distributed Systems
Juan Du, Wei Wei, Xiaohui (Helen) Gu, Ting Yu
Outline
• Introduction
• Design and Algorithms
• Experimental Evaluation
• Related Work
• Conclusion
Dataflow Processing in Distributed System
[Figure: component providers host data processing components f1 to f5. ADUs …, di, … enter the dataflow and are transformed hop by hop into f1(di), f2(f1(di)), and f3(f2(f1(di))). Legend: component provider, data processing component, ADU di, dataflow.]
Run in Open Distributed Systems
• Dataflow Processing Applications
  • Network traffic monitoring
  • Sensor data analysis
  • Audio/video surveillance
  • Scientific data processing
• Advantages in Open Distributed Systems
  • Highly scalable and available infrastructures
  • No need to maintain hardware and software
• Challenges in Open Distributed Systems
  • Component providers come from different security domains
  • Not all data processing components are trustworthy
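ADU Attack
[Figure: the same dataflow graph with a malicious component. Input ADUs …, d2, d1 become f1(d2), f1(d1) after f1; the streams shown further downstream are f2(f1(d1)) and f2(f1(d1)), d0. Legend: component provider, malicious component, data processing component, ADU di, dataflow.]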
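Dataflow Topology Attack
[Figure: the same dataflow graph with a malicious component. Labels shown: f1(d2) after the first hop, and two end results, f3(f2(f1(d2))) and f3(f5(f2(f1(d2)))), the latter passing through an extra component f5. Legend: component provider, malicious component, data processing component, ADU di, dataflow.]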
Function Integrity Attack
[Figure: the same dataflow graph with a malicious component. The component receives f1(d2) and emits f0(f1(d2)). Legend: component provider, malicious component, data processing component, ADU di, dataflow.]
System Design
• Attack Models
  • ADU attack
  • Dataflow topology attack
  • Function integrity attack
• Assumptions
  • Third-party component providers could be malicious
  • Composers and users are trusted
  • PKI is deployed in advance
• Goals
  • Provide integrity and confidentiality for dataflow processing applications
  • Focus on discussing integrity issues
Provenance-based ADU Protection
[Figure: s1 sends ADU d to s2; s2 returns a receipt to s1.]
• "Receipt" packet: [sqn, session_Id, hash(d)]sign_s2
• ADU dropping attack
  • s2 may claim it did not receive d
  • s1 may claim it sent d when it did not
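The receipt exchange can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide specifies a signature by s2 under the pre-deployed PKI, whereas the sketch substitutes an HMAC from Python's standard library, which does not give the non-repudiation a real public-key signature would. The names `make_receipt` and `verify_receipt`, the JSON encoding, and SHA-256 are assumptions made for the example.

```python
import hashlib
import hmac
import json


def make_receipt(receiver_key: bytes, sqn: int, session_id: str, adu: bytes) -> dict:
    """Build [sqn, session_Id, hash(d)] and tag it on behalf of the receiver s2.

    ASSUMPTION: HMAC stands in for s2's digital signature (sign_s2).
    """
    body = {
        "sqn": sqn,
        "session_id": session_id,
        "hash": hashlib.sha256(adu).hexdigest(),
    }
    encoded = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(receiver_key, encoded, hashlib.sha256).hexdigest()
    return {"body": body, "sig": tag}


def verify_receipt(receiver_key: bytes, receipt: dict,
                   sqn: int, session_id: str, adu: bytes) -> bool:
    """Check that the receipt covers exactly this ADU, sequence number, and session."""
    expected = make_receipt(receiver_key, sqn, session_id, adu)
    same_body = expected["body"] == receipt["body"]
    same_sig = hmac.compare_digest(expected["sig"], receipt["sig"])
    return same_body and same_sig
```

With such a receipt in hand, s1 can show that s2 acknowledged d, and a missing receipt leaves s1 unable to back a claim that it sent d.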
Provenance-based ADU Protection
[Figure: d enters f1, f1(d) enters f2, and f2(f1(d)) leaves; each component emits evidence over its input and output.]
• Provenance evidence
  • From s1: [[h(d), h(f1(d))]sign_s1]key_c
  • From s2: [[h(f1(d)), h(f2(f1(d)))]sign_s2]key_c
  • Cached or carry-on evidence
  • Consistency verification between different components
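The consistency verification between components can be illustrated with a short sketch. It assumes the composer has already decrypted the evidence and verified each signature, so only the hash chaining is shown: the output hash reported by one component must equal the input hash reported by the next. The helper names (`h`, `evidence`, `first_mismatch`) and the use of SHA-256 are assumptions for the example.

```python
import hashlib


def h(data: bytes) -> str:
    """Hash of an ADU (SHA-256 is an assumption; the slides only write h(.))."""
    return hashlib.sha256(data).hexdigest()


def evidence(component: str, adu_in: bytes, adu_out: bytes) -> dict:
    """Evidence [h(input), h(output)] as one component would report it."""
    return {"component": component, "in": h(adu_in), "out": h(adu_out)}


def first_mismatch(source_adu: bytes, chain: list, final_adu: bytes):
    """Return where the hash chain first breaks, or None if it is consistent.

    A break at component X only localizes the problem to the link into X:
    either X or its predecessor reported something inconsistent.
    """
    expected = h(source_adu)
    for ev in chain:
        if ev["in"] != expected:
            return ev["component"]
        expected = ev["out"]
    if expected != h(final_adu):
        return "output"
    return None
```

For example, if s2 reports an input hash that differs from the output hash reported by s1, `first_mismatch` returns `"s2"`, flagging the s1 to s2 link for further checking.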
Dataflow Topology Protection
[Figure: composer C builds a layered route C → s1 (f1) → s2 (f2) → s3 (f3) → C. Each hop entry is signed by C (sig_c) and the layers are encrypted with key_s1, key_s2, and key_s3. Labels along the path: at s1, [s1]sig_c [s2]sig_c [s3]sig_c [C]sig_c with the key_s2 and key_s3 layers remaining; at s2, [s2]sig_c [s3]sig_c [C]sig_c with the key_s3 layer remaining; at s3, [s3]sig_c [C]sig_c.]
• Cascading topology encryption
  • No component can change the dataflow topology
  • Each component only knows its previous hop and next hop
  • Onion routing [Goldschlag, et al., 1999]
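The layering idea behind cascading topology encryption can be sketched in the onion-routing style. The exact layer layout on the slide is not fully recoverable, so this sketch assumes each layer holds only the next hop and the remaining encrypted route, and it omits the composer's signatures. The cipher (`toy_seal`) is a hash-derived XOR keystream written only so the example runs with the standard library; it is NOT secure and all names here are hypothetical.

```python
import hashlib
import json


def _keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key (toy construction, not secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def toy_seal(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream. Applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def build_onion(route: list, keys: dict, final_hop: str = "C") -> bytes:
    """Composer side: wrap the route from the last hop inward to the first."""
    blob = b""
    next_hop = final_hop
    for hop in reversed(route):
        layer = json.dumps({"next": next_hop, "inner": blob.hex()}).encode()
        blob = toy_seal(keys[hop], layer)
        next_hop = hop
    return blob


def peel(key: bytes, blob: bytes):
    """Component side: remove one layer, learning only the next hop."""
    layer = json.loads(toy_seal(key, blob))
    return layer["next"], bytes.fromhex(layer["inner"])
```

A component that peels its own layer sees the next hop and an opaque blob for the rest of the route, which matches the slide's point that each component knows only its neighbours.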
Function Integrity Attestation
[Figure: composer C sends ADUs d1, d2, d3 through providers s1, s2, s5, s6 offering f1 and f2, and sends duplicates d2', d3' through functionally equivalent providers s3, s4, s7, s8. C then checks whether f2(f1(d2)) == f2(f1(d2')) and f2(f1(d3)) == f2(f1(d3')).]
• Randomized data attestation
  • Achieve scalable function integrity attack detection
  • Duplicate a random subset of ADUs
  • Send duplicates to selected functionally equivalent components
  • Check result consistency
  • Continuously perform randomized data attestation
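The attestation loop described above can be sketched as follows. This is a simplified model under stated assumptions: each callable stands for a whole pipeline (for example f2 applied after f1), the functions are stateless and deterministic, and one replica is picked per duplicated ADU. The function name `attest_stream` and its parameters are hypothetical.

```python
import random


def attest_stream(adus, primary, replicas, dup_prob, rng):
    """Process ADUs on `primary`; re-check a random subset on a replica.

    adus      -- iterable of input ADUs
    primary   -- callable standing for the pipeline that serves the dataflow
    replicas  -- list of functionally equivalent callables
    dup_prob  -- probability of duplicating each ADU
    rng       -- random.Random instance, passed in for reproducibility
    Returns (results, flagged), where flagged lists ADUs whose results disagree.
    """
    results = []
    flagged = []
    for adu in adus:
        result = primary(adu)
        results.append(result)
        # Duplicate this ADU only with probability dup_prob.
        if replicas and rng.random() < dup_prob:
            checker = rng.choice(replicas)
            if checker(adu) != result:
                flagged.append(adu)
    return results, flagged
```

A disagreement shows only that one of the two pipelines misbehaved. Colluding components that return the same wrong answer would pass this check, which is why collusion is treated as a separate scenario in the evaluation.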
Implementation and Experimental Setup
• Implementation
  • Implement a prototype of the secure dataflow processing system
  • Follow the design of IBM System S
• Experiment setup
  • Conduct experiments on PlanetLab
  • Use about 200 hosts
  • One host represents one component provider
  • Composer deployed on a pre-defined PlanetLab host
Evaluation
• Overhead caused by basic protection schemes
• Randomized data attestation
• Overhead
  • In terms of dataflow processing delay
  • (time of d_n getting out - time of d_1 getting in) / n
• Detection probability
  • Non-collusion
  • Collusion
Overhead of Basic Protection Schemes
• The overhead is about 10-15% for both secure dataflow schemes
Overhead of Randomized Data Attestation
• Number of redundant components k = 5
• Data size = 1 KB
• Data rate = 10 ADUs/sec
• Duration = 30 s
• Average dataflow processing delay increases with the number of redundant components used
  • Due to sub-optimal dataflow topology
Detection Probability
• Detection probability increases with the duplication probability p_u and the number of redundant components used
• Detection is harder in collusion scenarios than in non-collusion scenarios
Related Work
• Distributed dataflow processing
  • Focuses on resource and performance management issues
  • Assumes that data processing components are trustworthy
• Trust management in distributed systems
  • Distributed messaging systems [Haeberlen, et al., SOSP 2007]
  • Pub-sub overlay [Srivatsa, et al., CCS 2005]
  • None of them addresses secure and scalable dataflow processing in open distributed systems
• Byzantine fault-tolerance
  • In wide-area networks [Amir, et al., DSN 2006]
  • No trusted party
Conclusion
• Finished Work
  • The first attempt to address the integrity of dataflow processing application delivery on open distributed systems
  • Identify and classify major security attacks
  • Propose a set of effective protection schemes
• Future Work
  • Non-linear dataflow topology
  • Integrity attestation on stateful functions
  • Further identify malicious components
Thank you
• Questions?