Applying Control Theory to Data Stream Processing Systems
Wei Xu (xuw@cs.berkeley.edu), Bill Kramer, Peter Bodik

[Figure: system overview. Labels: online service, raw log data, Data Collection, preprocessing (TCQ query Q, TCQ query R), Repository, Sanitized Data, Automatic analysis]

Preprocessing Data
• Logs are in different formats
• Information we need may be implicit
• Merge information from various sources
• Sampling
• Sanitize the data

Data stream processing
• Continuous queries
• Using TelegraphCQ (TCQ)
• Preprocessing expressed as SQL queries
• Queries over a sliding time window
• Run multiple instances for scalability

Scalable Software Architecture for Data Stream Processing
[Figure: Source, load splitter, SLT 1, SLT 2, combiner, TCQ query Q; numbered tuples 1 to 6 distributed across the two paths]

Controlled Data Source
• Problem: Actual output is not the same as the desired rate, for various reasons
• Goal: Provide an accurate data source using feedback control, by controlling the "desired data rate" setting on the output thread
[Figure: Output Rate Controller. Labels: Source, Controlled Output Thread (Code Reuse), Output Y from simulation, feedback loop]

Queue Length Controller
• Problem: TCQ drops tuples when the result queue is full
• Goal of control:
  • By controlling the data rate to the TCQ node
  • Regulate queue length on the TCQ node
  • Prevent dropping tuples
  • Maximize throughput (and adapt when a disturbance happens)
[Figure: Queue Length Monitor. Labels: Desired Queue Length, Data Rate to TCQ, Actual Queue Length, Source, Buffer, TCQ, Result Q; panels: P Controller with Pre-compensation, PI Controller]

Automatic analysis
• Feature Selection, Clustering, Visualization (see poster)
• Clustering DNS Problems
• Failure Detection

Decision Trees

Client write duration is an outlier:

bytes-served <= 67958
|   R_error-code = yes
|   |   R_content-type = yes: true (253/6)
|   |   R_content-type != yes: false (17)
|   R_error-code != yes
|   |   gmt = 2003-06-24 00:01:07: true (54)
|   |   gmt != 2003-06-24 00:01:07
|   |   |   user-id = 96848766314153157: true (99/6)
|   |   |   user-id != 96848766314153157
|   |   |   |   gmt = 2003-06-24 02:23:28: true (45)
|   |   |   |   gmt != 2003-06-24 02:23:28
|   |   |   |   |   visit-url = 8227...: true (43)
|   |   |   |   |   visit-url != 8227...: false (18005)
bytes-served > 67958: true (17733/55)

Error Code:

bytes-served <= 195: 145 (135/9)
bytes-served > 195
|   R_content-len = yes: 32 (98)
|   R_content-len != yes
|   |   R_not-cached-reason = yes: 32 (45/19)
|   |   R_not-cached-reason != yes
|   |   |   duration <= 15.2
|   |   |   |   bytes-received <= 2680: -13 (39)
|   |   |   |   bytes-received > 2680
|   |   |   |   |   bytes-received <= 2805: 131 (30/7)
|   |   |   |   |   bytes-received > 2805: -13 (85/13)
|   |   |   duration > 15.2: 131 (69/6)

If not careful with feedback control …
• The system can become unstable under normal load
• Control theory analysis helps make a correct design
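
The stability warning above can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: it assumes a simple discrete-time queue model (queue grows by the input rate and shrinks by a fixed service rate each step), a textbook PI controller without anti-windup, and hypothetical values for the gains, capacity, set point, and service rates. The function name `simulate` and all of its parameters are illustrative only.

```python
def simulate(kp, ki, steps=300, desired=100.0, capacity=200.0):
    """Regulate a bounded queue with a PI controller (illustrative model).

    Returns the queue-length history and the total number of dropped tuples.
    All numbers are assumptions chosen for the sketch, not measured values.
    """
    queue = 0.0      # actual queue length (tuples)
    integral = 0.0   # accumulated error for the I term
    dropped = 0.0    # tuples lost because the queue was full
    history = []

    for k in range(steps):
        # Assumed disturbance: the consumer slows down halfway through.
        service = 50.0 if k < 150 else 30.0

        # PI control law: the controller output is the data rate sent to the queue.
        error = desired - queue
        integral += error
        rate = max(0.0, kp * error + ki * integral)

        # Queue dynamics: arrivals minus departures, bounded by [0, capacity].
        queue = queue + rate - service
        if queue > capacity:
            dropped += queue - capacity   # overflow is dropped
            queue = capacity
        queue = max(0.0, queue)
        history.append(queue)

    return history, dropped


if __name__ == "__main__":
    for kp, ki in [(0.5, 0.1), (2.5, 0.1)]:
        history, dropped = simulate(kp, ki)
        print(f"kp={kp}, ki={ki}: final queue={history[-1]:.1f}, dropped={dropped:.1f}")
```

With the moderate gains the queue settles at the set point and recovers after the service rate changes, without filling up. With the larger proportional gain the same loop overshoots the capacity on the first step and drops tuples, even though the load itself is unchanged. For this assumed model, the closed-loop characteristic polynomial is λ² − (2 − kp − ki)λ + (1 − kp) = 0, and the loop is stable only when both roots lie inside the unit circle, which is the kind of check a control-theoretic analysis provides before deployment.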