This paper explores the problem of data shedding in data stream processing under overload conditions and proposes a fair shedding mechanism based on the Source Information Content (SIC) metric. It also discusses the challenges of implementing shedding in a distributed setup and explores the possibility of a self-aware system for fair data shedding.
THEMIS: Fairness in Data Stream Processing under Overload. Marco Fiscato (Imperial College London, UK), Evangelia Kalyvianaki (City University London, UK), Theodoros Salonidis (IBM Research, USA), Peter Pietzuch (Imperial College London, UK). 15041 Model-driven Algorithms and Architectures for Self-Aware Computing Systems, Dagstuhl 2015
The Puzzle of Big Data: Real-Time Processing Engines in Data Centres. Queries overload data centre resources. How to efficiently allocate resources across clusters/engines?
Data Shedding: A well-known technique to handle transient overload conditions is to discard data. How much data should we shed from queries? How to measure shedding across queries? How to implement shedding in this distributed setup?
How to measure shedding across queries? Shedding data: reduced correctness, degraded performance. Different dropped data: different degrees of degradation. The Source Information Content (SIC) metric measures the contribution of data from sources to results. Example: 11/6 (degraded processing) < 3 (perfect processing). SIC is a data-stream-processing-aware metric. But can we have a metric that is operator- or query-aware?
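The SIC figures on this slide can be reproduced with a small sketch. It assumes, and this is an assumption not spelled out on the slide, that each source contributes a total SIC of 1 per window, split evenly across the tuples it emits, and that a result's SIC is the sum over the tuples that were actually processed. The source names and tuple counts below are hypothetical, chosen only so that the totals match the slide's 11/6 and 3.

```python
from fractions import Fraction


def result_sic(emitted, kept):
    """Sum the SIC of the tuples that survived shedding in one window.

    emitted: dict mapping source name -> number of tuples the source produced
    kept:    dict mapping source name -> number of those tuples processed
    Assumption: every source carries a total SIC of 1 per window,
    divided equally among its emitted tuples.
    """
    total = Fraction(0)
    for source, n_emitted in emitted.items():
        if n_emitted == 0:
            continue  # a silent source contributes nothing
        per_tuple = Fraction(1, n_emitted)
        total += per_tuple * kept.get(source, 0)
    return total


# Hypothetical window: three sources emitting 1, 2 and 3 tuples.
emitted = {"A": 1, "B": 2, "C": 3}
perfect = result_sic(emitted, emitted)                       # nothing shed
degraded = result_sic(emitted, {"A": 1, "B": 1, "C": 1})     # B and C shed tuples
print(perfect, degraded)
```

Using exact fractions keeps the comparison between degraded and perfect processing free of floating-point noise.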
Fair Shedding for Equalising SIC values: each local shedder equalises the SIC values of its own queries; global coordination is achieved with local informed shedding.
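One way to picture a local shedder that equalises SIC values is a greedy loop that always admits the next tuple of the query whose accumulated SIC is currently lowest. This is a minimal sketch under that assumption, not the THEMIS algorithm itself; the function name `fair_shed`, the `capacity` parameter (how many tuples the node can process this round) and the example queries are all hypothetical.

```python
import heapq
from fractions import Fraction


def fair_shed(pending, capacity):
    """Greedily keep tuples so that per-query SIC values stay balanced.

    pending:  dict mapping query id -> list of per-tuple SIC values
    capacity: number of tuples this node can process in the round
    Returns a dict mapping query id -> total SIC of the tuples kept;
    everything not kept is considered shed.
    """
    kept = {q: Fraction(0) for q in pending}
    queues = {q: list(values) for q, values in pending.items()}
    # Min-heap keyed on accumulated SIC, so the worst-off query goes first.
    heap = [(Fraction(0), q) for q in pending if queues[q]]
    heapq.heapify(heap)
    while capacity > 0 and heap:
        sic, q = heapq.heappop(heap)
        sic += queues[q].pop()
        kept[q] = sic
        capacity -= 1
        if queues[q]:
            heapq.heappush(heap, (sic, q))
    return kept


# Hypothetical load: q1 has many low-SIC tuples, q2 has few high-SIC tuples.
pending = {"q1": [Fraction(1, 10)] * 10, "q2": [Fraction(1, 4)] * 4}
balanced = fair_shed(pending, capacity=7)
print(balanced)
```

The shedder uses only information available on its own node, which matches the slide's point that coordination comes from local, informed decisions.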
SIC Fair Shedder: to address nodes' heterogeneity and workload variations, an online cost model estimates the time to process an average tuple. Could we build the system to be goal-aware?
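The slide does not say how the online cost model is computed. As an illustration only, the sketch below uses an exponentially weighted moving average of the observed per-tuple processing time, then converts it into the number of tuples the node can handle in the next interval. The class name `TupleCostModel`, the smoothing factor `alpha` and the sample measurements are assumptions.

```python
class TupleCostModel:
    """Online estimate of the time needed to process an average tuple.

    Assumption: an exponentially weighted moving average (EWMA) is used;
    the original system may estimate the cost differently.
    """

    def __init__(self, alpha=0.2):
        self.alpha = alpha      # weight given to the newest measurement
        self.avg_cost = None    # seconds per tuple, unknown until first sample

    def observe(self, batch_seconds, batch_tuples):
        """Update the estimate from one processed batch."""
        if batch_tuples <= 0:
            return
        sample = batch_seconds / batch_tuples
        if self.avg_cost is None:
            self.avg_cost = sample
        else:
            self.avg_cost = self.alpha * sample + (1 - self.alpha) * self.avg_cost

    def capacity(self, interval_seconds):
        """Number of tuples the node can process in the given interval."""
        if not self.avg_cost:
            return 0  # no measurements yet
        return int(interval_seconds / self.avg_cost)


# Hypothetical measurements: two batches of 4 tuples each.
model = TupleCostModel(alpha=0.5)
model.observe(batch_seconds=1.0, batch_tuples=4)
model.observe(batch_seconds=3.0, batch_tuples=4)
budget = model.capacity(interval_seconds=2.0)
print(model.avg_cost, budget)
```

Because the estimate is refreshed from each node's own measurements, a slower node or a heavier workload yields a smaller tuple budget without any manual tuning.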
A self-aware autonomic system for data processing in real-time: Systems already have (some) adaptation and (some) self-awareness, but could we extend to (full) self-awareness? For example, can we build a self-aware system to perform fair data shedding for data stream processing and databases and filesystems in overload? Thank you! Questions? evangelia.kalyvianaki.1@city.ac.uk http://www.staff.city.ac.uk/~sbbj913/