Applying Control Theory to Stream Processing Systems
Wei Xu (xuw@cs.berkeley.edu), Bill Kramer (kramer@lbl.gov), Joe Hellerstein (hellers@us.ibm.com)
Description of the system
• TCQ has a complex internal structure
• TCQ drops tuples silently if the result queue is full
[Figure: Data Source, Input Buffer, TCQ]
Why do we need control?
• The data source does not provide an accurate data rate
• The TCQ node drops tuples when the result queue fills up
[Figure: Source, Buffer, TCQ, Result Q]
Control Problems
• Provide an accurate data source
• Get the actual data rate
• Regulate queue length on the TCQ node
• Prevent dropping tuples
• Maximize throughput (and adapt when disturbances happen)
System with Control
[Figure labels: Queue Length Monitor; Controlled Data Source; Output Rate Controller]
The Control Architecture
[Figure labels: PI Controller; P Controller]
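
The P and PI controllers named on this slide can be illustrated with a minimal sketch. It assumes a discrete-time control law driven by the error between a reference and a measured output, and a first-order plant of the form y(k+1) = a*y(k) + b*u(k). The class name, function name, gains, and plant values are hypothetical and chosen only for illustration; they are not the values used in the talk.

```python
class PIController:
    """Discrete-time PI controller: u(k) = kp*e(k) + ki*sum(e(0..k)).

    Setting ki = 0 gives a pure P controller.
    All names and gains here are illustrative assumptions.
    """

    def __init__(self, kp, ki, reference):
        self.kp = kp
        self.ki = ki
        self.reference = reference
        self.integral = 0.0  # running sum of past errors

    def update(self, measured):
        """Return the control input for the latest measurement."""
        error = self.reference - measured
        self.integral += error
        return self.kp * error + self.ki * self.integral


def simulate(controller, a, b, steps, y0=0.0):
    """Run the controller against the toy plant y(k+1) = a*y(k) + b*u(k)."""
    y = y0
    history = []
    for _ in range(steps):
        u = controller.update(y)
        y = a * y + b * u
        history.append(y)
    return history


if __name__ == "__main__":
    p_only = simulate(PIController(kp=0.2, ki=0.0, reference=10.0), a=0.5, b=1.0, steps=200)
    with_i = simulate(PIController(kp=0.2, ki=0.1, reference=10.0), a=0.5, b=1.0, steps=200)
    print(round(p_only[-1], 3), round(with_i[-1], 3))
```

On this toy plant the P controller settles with a steady-state error (near 2.86 for a reference of 10), while the PI controller's integral term drives the output to the reference. That difference is the usual reason for choosing PI when the target must be hit exactly.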
Result – An accurate data source
[Figure panels: P Controller with Pre-compensation; PI Controller]
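
One common way to define pre-compensation for a P controller is to scale the reference so that the steady-state output matches it. The derivation below is a sketch under the assumption that the data source follows a first-order model y(k+1) = a y(k) + b u(k). The symbols K (controller gain), P (pre-compensation gain), r (reference), and y_ss (steady-state output) are introduced here for illustration; the talk does not state which form it used.

```latex
\begin{align*}
u(k) &= K\,\bigl(P\,r - y(k)\bigr) \\
y(k+1) &= a\,y(k) + bK\,\bigl(P\,r - y(k)\bigr) \\
y_{ss}\,(1 - a + bK) &= bK\,P\,r
  \quad \text{(steady state: } y(k+1) = y(k) = y_{ss}\text{)} \\
P &= \frac{1 - a + bK}{bK}
  \quad \text{(so that } y_{ss} = r\text{)}
\end{align*}
```

Because P depends on a and b, this approach is only as accurate as the estimated model, whereas a PI controller removes the steady-state error without needing exact parameters.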
Result – Regulating queue length
[Figure: Source, Buffer, TCQ, Result Q]
Result – Under CPU contention
[Figure: Source, Buffer, TCQ, Result Q]
Why is theory useful?
• One of my implementations .. What happened?
[Figure: Source, Buffer, TCQ, Result Q]
What is going on?
[Figure labels: Controlled Output Thread (Code Reuse); Queue Length Controller; Desired Queue Length; Data Rate to TCQ; Actual Queue Length]
Theory meets reality
[Plot: Output Y from simulation; axes: Queue length, Time]
Tricky part of parameter estimation
• Model evaluation – making the system operate in the desired range
• Easy for data source, but queue length ..
[Plot: Data rate vs free space; axis: Free Space; annotation: Non-Linear range]
Settling time and overshoot matter
• A lot of small disturbances in a Java program
• Incremental garbage collection
[Figure panels: P Controller; PI Controller]
Conclusion
• Advantages of feedback control
  • Makes the system more robust under disturbance
  • Treats complex systems as black boxes
  • Copes with the system's characteristics instead of having to change them
  • Encourages reporting system statistics
  • Implementation is easy and has theoretical guarantees
Future Work
• Load balancer
• Smaller sample time to reduce disturbance caused by Java GC?
• Controller for scheduling a system shared by multiple streams
Outline
• Problems and Motivation
• Controller design
• Result
• Discussion
Description of the System
[Figure labels: Data Source; Tuple Blocks; Input Buffer; Load Splitter; Routing Logic; Tuples; TCQ Node; TCQ Node; Queue length]
• Operation of Load Splitter
  • Arriving blocks wait in the Input Buffer
  • Tuples are routed to balance TCQ queue lengths
  • Routing stops if the queue length is too large, to avoid tuple discards
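
The load splitter's operation can be sketched as follows. This is a minimal illustration that assumes a greedy shortest-queue policy and a single fixed threshold; the function name `route_tuples`, the parameter `max_queue_length`, and the policy itself are assumptions made here, since the slide does not specify the actual routing logic.

```python
from collections import deque


def route_tuples(input_buffer, queue_lengths, max_queue_length):
    """Route buffered tuples to the TCQ node with the shortest queue.

    input_buffer: deque of tuples waiting to be routed (consumed in place)
    queue_lengths: list of current queue lengths, one per node (updated in place)
    max_queue_length: hypothetical threshold above which routing stops
    Returns a list of (node_index, tuple) assignments.
    """
    assignments = []
    if not queue_lengths:
        return assignments  # no nodes to route to
    while input_buffer:
        # Pick the node with the shortest queue (ties go to the lowest index).
        node = min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
        if queue_lengths[node] >= max_queue_length:
            break  # every queue is at the limit; keep the rest buffered
        assignments.append((node, input_buffer.popleft()))
        queue_lengths[node] += 1
    return assignments


if __name__ == "__main__":
    buffer = deque(["a", "b", "c", "d", "e"])
    lengths = [2, 0]
    print(route_tuples(buffer, lengths, max_queue_length=3))
    print(lengths, list(buffer))
```

Tuples that cannot be routed stay in the input buffer rather than being dropped, which mirrors the slide's goal of avoiding tuple discards.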
Compare to Open Loop Control
• We know y(k), and we know what we want y(k+1) to be
• Use the transfer function to solve for u(k)
• (Expected result – accuracy and disturbance) -- to be done
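
The open-loop calculation described here can be written out explicitly. The sketch assumes the first-order model y(k+1) = a y(k) + b u(k) given on the transfer-function slide; y* denotes the desired next output, a symbol introduced here for illustration.

```latex
\begin{align*}
y(k+1) &= a\,y(k) + b\,u(k) \\
y^{*} &= a\,y(k) + b\,u(k)
  \quad \text{(set } y(k+1) = y^{*}\text{)} \\
u(k) &= \frac{y^{*} - a\,y(k)}{b}, \qquad b \neq 0
\end{align*}
```

Since u(k) is computed from the model alone, any error in a or b, or any unmodeled disturbance, shows up directly in the output. That is the property a feedback controller is meant to correct.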
Estimation of the transfer function
• y(k+1) = a y(k) + b u(k)
• Regression
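
As a sketch of how the regression step might look, the function below fits a and b by ordinary least squares from logged input and output samples, solving the 2x2 normal equations directly. The function name `estimate_first_order` and the data in the usage example are hypothetical; the talk does not describe which regression procedure or tool was used.

```python
def estimate_first_order(y, u):
    """Least-squares fit of y(k+1) = a*y(k) + b*u(k).

    y: output samples y[0..N] (one more sample than u)
    u: input samples u[0..N-1]
    Returns the estimated (a, b).
    """
    n = len(u)
    if len(y) != n + 1:
        raise ValueError("y must have exactly one more sample than u")

    # Sums that appear in the normal equations.
    s_yy = sum(y[k] * y[k] for k in range(n))
    s_uu = sum(u[k] * u[k] for k in range(n))
    s_yu = sum(y[k] * u[k] for k in range(n))
    s_y1y = sum(y[k + 1] * y[k] for k in range(n))
    s_y1u = sum(y[k + 1] * u[k] for k in range(n))

    det = s_yy * s_uu - s_yu * s_yu
    if det == 0:
        raise ValueError("data do not vary enough to identify a and b")

    # Cramer's rule on:  a*s_yy + b*s_yu = s_y1y,  a*s_yu + b*s_uu = s_y1u
    a = (s_y1y * s_uu - s_y1u * s_yu) / det
    b = (s_y1u * s_yy - s_y1y * s_yu) / det
    return a, b


if __name__ == "__main__":
    # Synthetic, noise-free data generated from known parameters.
    a_true, b_true = 0.8, 0.5
    u = [1.0, 0.0, 2.0, -1.0, 3.0, 0.5]
    y = [0.0]
    for uk in u:
        y.append(a_true * y[-1] + b_true * uk)
    print(estimate_first_order(y, u))
```

The input sequence has to vary enough for both parameters to be identifiable; a constant input together with a constant output, for example, makes the normal equations singular.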
Tricky part of parameter estimation
• Model evaluation – a data rate that makes it operate in the linear range