
Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Niko Pollner, Klaus Meyer-Wegener. 2011-09-23. IDEAS 2011.


Presentation Transcript


  1. Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems
     Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Niko Pollner, Klaus Meyer-Wegener
     2011-09-23, IDEAS 2011

  2. Agenda
     • Problem Statement
     • Calibration of Cost Models
     • Function Approximation
     • Estimating the Costs of Single Operators
     • Evaluation
     • Summary
     • Perspective: Cost Estimation for Federated DSMS

  3. Problem Statement
     • DSAM: heterogeneous distributed data stream processing
     • Automatic cost-based query distribution
     • Problem: hardware-specific and DSMS-specific cost models are needed

  4. Calibration of Cost Models: Things we know a priori
     • Operator graph
       • Topology
     • Stream characteristics
       • Data rates
       • Selectivity
       • Distribution of certain values
     • For some operators: cost model

  5. Things we do not know a priori
     • Hardware-specific and DSMS-specific parameters of cost models
     • System costs
     • For some operators: cost model
     • Function approximation

  6. Calibration of Cost Models - Parameter Estimation
     • Cost model consists of
       • Stream-dependent and operator-dependent parameters
       • Constant values
       • Hardware-, system-, and implementation-dependent values
     • Test queries and input streams
       • Different values for the stream-dependent and operator-dependent parameters
     • Cost measurements
       • Least squares
       • Outlier detection (e.g. RANSAC)
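
The calibration step on this slide can be illustrated with a minimal sketch. It assumes a hypothetical linear cost model, cost = c0 + c1 * input_rate, and made-up measurements; neither the model nor the numbers come from the talk. The function names are illustrative, and the RANSAC loop is a simplified variant, not necessarily the one the authors used.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return the coefficients c that minimize ||X c - y||^2."""
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def fit_ransac(X, y, n_iter=200, threshold=1.0, seed=0):
    """Simplified RANSAC: fit random minimal subsets, keep the largest inlier set."""
    rng = np.random.default_rng(seed)
    n_samples, n_params = X.shape
    best_inliers = np.zeros(n_samples, dtype=bool)
    for _ in range(n_iter):
        subset = rng.choice(n_samples, size=n_params, replace=False)
        candidate = fit_least_squares(X[subset], y[subset])
        inliers = np.abs(X @ candidate - y) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares fit on the inliers of the best candidate only.
    return fit_least_squares(X[best_inliers], y[best_inliers]), best_inliers

# Hypothetical test runs: one measurement per input rate (assumed values).
rates = np.arange(100.0, 1100.0, 100.0)
X = np.column_stack([np.ones_like(rates), rates])  # columns: constant, rate
cost = 5.0 + 0.02 * rates
cost[3] += 50.0  # one corrupted measurement, e.g. a CPU spike

coeffs, inliers = fit_ransac(X, cost)
print("calibrated parameters:", coeffs)
print("outlier indices:", np.flatnonzero(~inliers))
```

The columns of `X` hold the stream-dependent and operator-dependent inputs that are varied across test queries; the fitted coefficients play the role of the hardware-dependent and system-dependent values.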

  7. Function Approximation – Nonparametric Models
     • No appropriate cost model
       • Operator without existing cost model
       • Existing cost models could not be fitted to a specific system
     • Solution: function approximation
       • Radial Basis Function Network (RBFN)
       • Function approximation instead of interpolation
       • Fewer centers than input points
       • Moore-Penrose pseudoinverse → least-squares solution
     • Improving the function approximation
       • Iterative approach
       • Naive function approximation
       • Improving areas of interest (e.g. discontinuities, high gradient)
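
As an illustration of the RBFN idea on this slide, the following sketch fits a network with fewer centers than input points and obtains the weights from the Moore-Penrose pseudoinverse. The Gaussian basis, the width, the bias column, and the sample data are assumptions made for the example; the talk does not specify them.

```python
import numpy as np

def rbf_design_matrix(x, centers, width):
    """Gaussian basis functions evaluated at x, plus a constant bias column."""
    d = x[:, None] - centers[None, :]
    phi = np.exp(-(d ** 2) / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])

def fit_rbfn(x, y, centers, width):
    """Least-squares weights via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(rbf_design_matrix(x, centers, width)) @ y

def predict_rbfn(x, centers, width, weights):
    """Evaluate the fitted network at the points x."""
    return rbf_design_matrix(x, centers, width) @ weights

# Hypothetical measurements: 50 input points, but only 10 centers,
# so the network approximates the data instead of interpolating it.
x = np.linspace(0.0, 10.0, 50)
y = np.sin(x) + 0.1 * x
centers = np.linspace(0.0, 10.0, 10)

weights = fit_rbfn(x, y, centers, width=1.0)
y_hat = predict_rbfn(x, centers, 1.0, weights)
print("max abs error on training points:", np.max(np.abs(y_hat - y)))
```

An iterative refinement, as listed on the slide, could add centers where the residual is largest; that step is left out here.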

  8. Estimating the Costs of Single Operators
     • Assumptions
       • Only the system costs can be measured
       • The costs of a single operator are independent of other operators → additivity
     • System costs depend linearly on the number of operators
       • Parallel instances of the same operator
     • Latency
       • Parallel operators → latency does not depend on the number of operators
       • Operators have to be connected in series

  9. Evaluation
     • Coral8
     • Test setting
       • Synthetic input streams with constant properties (rate, attribute value distribution)
       • Every test query runs for two minutes
       • The test data collected in the first minute is discarded
     • Measured values
       • Latency
       • Memory consumption (resident set size)
       • CPU usage
     • Coral8 status stream
       • Input and output rate
       • Query latency
       • Application memory

  10. Coral8 Measurements
     • Filter operator
     • Application memory
     • CPU usage
     • Unexpected behavior: steps and peaks

  11. Costs of Single Operators
     • CPU usage depends linearly on the number of operators
     • The slope equals the costs of a single operator
     [Plots with x-axis "Operators"]
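
The slope argument can be sketched as a straight-line fit of system cost against the number of parallel operator instances. The measurements below are invented for illustration, and `single_operator_cost` is a hypothetical helper name.

```python
import numpy as np

def single_operator_cost(n_operators, system_cost):
    """Fit system_cost = slope * n_operators + intercept.

    Under the additivity assumption, the slope is the cost of one operator
    and the intercept is the operator-independent base cost of the system.
    """
    slope, intercept = np.polyfit(n_operators, system_cost, deg=1)
    return slope, intercept

# Hypothetical CPU usage (percent) for queries with k parallel filter operators.
n_operators = np.array([10, 20, 40, 80, 100])
cpu_usage = np.array([3.1, 5.0, 9.2, 16.9, 21.0])

slope, intercept = single_operator_cost(n_operators, cpu_usage)
print(f"cost per operator: {slope:.4f}, base cost: {intercept:.3f}")
```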

  12. Model Calibration and RBFN
     • Application memory of the aggregate operator
     • Left side: calibrated cost model
       • Linear cost model
     • Right side: function approximation
       • Adapts to the steps

  13. Cost Estimation for Operator Graphs
     • Operator graph consisting of 100 parallel filter operators
     • Cost estimation using function approximation

  14. Summary
     • Cost estimation for black-box systems without cost estimators
       • Calibration of a cost model
         • Default cost model
         • System-specific cost model
       • Function approximation
     • Calibration of a cost model for unknown systems
       • Behavior conforming to the cost model is required
       • Nonconforming behavior can be detected (automatically) after some measurements
     • Evaluation
       • CPU usage and memory consumption can be estimated
       • Latency: queuing theory

  15. Application: Cost Estimation for Federated DSMS
     • Cost formulas as metadata
       • Cost formulas contain constants, variables, and parameters
     • Cost estimation
       • Hardware-dependent and system-dependent parameters are loaded from the metadata catalog
       • Operator-dependent variables are supplied by a metadata provider
       • Stream-dependent variables are supplied by a monitoring component or an estimator
       • An interpreter calculates the costs
     • Advantages
       • Both default and system-specific cost formulas are possible
       • Cost models are interchangeable at runtime
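
One way to picture cost formulas stored as metadata is a small interpreter that evaluates a formula string against values from three sources. Everything in this sketch is hypothetical: the formula syntax, the names `c_base`, `c_tuple`, `rate`, and `selectivity`, and the dictionaries standing in for the metadata catalog, the metadata provider, and the monitoring component. The talk does not describe the actual formula language.

```python
import ast
import operator

# Arithmetic operators the interpreter accepts; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate_formula(formula, bindings):
    """Evaluate an arithmetic cost formula using only names found in bindings."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return bindings[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return visit(ast.parse(formula, mode="eval"))

# Hypothetical formula and values (all assumed for illustration).
formula = "c_base + c_tuple * rate * selectivity"
catalog_params = {"c_base": 2.0, "c_tuple": 0.01}   # hardware/system parameters
operator_vars = {"selectivity": 0.5}                # from a metadata provider
stream_vars = {"rate": 1000.0}                      # from monitoring or an estimator

cost = evaluate_formula(formula, {**catalog_params, **operator_vars, **stream_vars})
print("estimated cost:", cost)
```

Because the formula is plain data, swapping in a system-specific formula at runtime only requires replacing the string and the catalog entries.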

  16. Any questions…?

  17. Generating Test Data and Test Queries
     • Identifying parameters
       • Cost model based
         • Identifying query-dependent or stream-dependent parameters
         • Generating a set of test data for the parameters
         • Mapping the parameters to the query language and stream properties
       • Operator or query language based
         • No existing cost model
         • Function approximation
         • Identifying important parameters based on the query language and possible stream properties
         • Generating a set of test data

  18. Problem statement: Distributed Query Processing (SSDBM 2010)
     • Relevant metadata about inner streams unknown
     [Diagram: global query graph with operators Op1 to Op6 distributed over Node 1, Node 2, and Node 3. The input streams Stream1 and Stream2 are annotated with data rate, density, and statistics; the inner streams and the output are marked "???".]
     Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Klaus Meyer-Wegener

  19. Propagation of Densities (SSDBM 2010)
     • Propagation of input streams’ statistics
       • Propagation of statistics for inner streams between operators
       • Propagation of statistics for output streams
     • Statistical objective: attribute value distribution (density)
     • Analytic operator model
       • Accurate formulas
     • Numerical operator model
       • Discrete mappings
       • Training of mapping relation
     [Diagram: an operator with an input stream and an output stream, each annotated with data rate, density, and statistics; the operator model is either analytic or numerical.]
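
To make the propagation idea concrete, here is a minimal sketch of what an analytic operator model could look like for a simple filter of the form value < threshold. The histogram representation of the density, the restriction that the threshold lies on a bin edge, and the function name `propagate_filter` are all assumptions for this example; the slide does not give the actual formulas.

```python
import numpy as np

def propagate_filter(rate, bin_edges, density, threshold):
    """Propagate rate and density through a filter 'value < threshold'.

    The density is a histogram (one value per bin) that integrates to 1.
    The threshold is assumed to coincide with a bin edge.
    """
    widths = np.diff(bin_edges)
    keep = bin_edges[1:] <= threshold            # bins that pass the filter
    selectivity = float(np.sum(density[keep] * widths[keep]))
    out_density = np.where(keep, density, 0.0)
    if selectivity > 0.0:
        out_density = out_density / selectivity  # renormalize to integrate to 1
    return rate * selectivity, out_density, selectivity

# Hypothetical input stream: 1000 tuples/s, attribute uniform on [0, 4).
bin_edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
density = np.array([0.25, 0.25, 0.25, 0.25])

out_rate, out_density, selectivity = propagate_filter(1000.0, bin_edges, density, 2.0)
print(out_rate, out_density, selectivity)
```

The output rate and density can then serve as the input statistics of the next operator in the graph, which is how statistics for inner streams would be obtained.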
