Quality Management in Multimedia Databases and Data Stream Management Systems Yicheng Tu Department of Computer Sciences Purdue University Advisor: Prof. Sunil Prabhakar Final Exam, May 25, 2007
Quality? The nature, kind, or character (of something). Hence, the degree or grade of excellence, etc. possessed by a thing. Restricted to cases in which there is comparison (expressed or implied) with other things of the same kind. - Oxford English Dictionary Character with respect to fineness, or grade of excellence … - Dictionary.com
Our Definition • A series of parameters that describe the characteristics of data processing and lead to different degrees of user satisfaction • Overlaps with the concept of Quality-of-Service (QoS) • Not data quality
Problems • Two types of problems • Determine the quality of concurrent applications for maximal user satisfaction • Maintain the quality of applications in highly dynamic environments • Problems are system- and application-specific • Various techniques/solutions are involved • Resource reservation • Application adaptation
Roadmap • Introduction • Controlling delays in data stream management systems (DSMSs) • Quality-aware (media) data replication • Other works
Data Stream Management Systems • Data-active query-passive model • Continuous query • Continuous data, discarded after being processed • Applications • Financial analysis • Mobile services • Sensor networks • Network monitoring
Load Shedding • Data processing in DSMS is quality-critical • Tuple processing delay • Data loss • Sampling rate, window size, … • Overloading during spikes degrades quality (processing delay) • Solution: load shedding (i.e., adjust data loss) • Eliminating excessive load by dropping data items • Users tolerate approximate query results
Load Shedding: Challenges • Constantly discarding most packets would work • What happens to query accuracy? • The real (and hard) problem is: how to maintain processing delays while minimizing data loss? • Specifically • When? • How much? • For how long? • Which ones to discard?
State-of-the-Art • Data triage (Reiss & Hellerstein, ICDE06) • Put data into a fast-track analyzer upon overloading • LoadStar (Chi et al., VLDB05) • Accuracy of aggregate queries under load shedding (Babcock et al., ICDE04) • QoS-driven load shedding (Tatbul et al., VLDB03, 06) • All utilize intuitive rule-of-thumb algorithms to decide when, how much, and how long • These do not work well under bursty arrival patterns and variable tuple processing costs
Our Approach • Insight: treat load shedding as a control problem • Control: manipulation of system states (outputs) by adjusting input(s) to the system • In our problem • processing delay -> output • amount of load injected -> input • Problem reformulation: make the output (delay) track its desired value by changing the amount of load discarded
Feedback Control • Suitable for rejecting the effects of disturbances • Main components form a feedback control loop [Figure: block diagram of the loop. The reference value yd and the measured output y(k) meet at a summing junction (+/–) that forms the error e(k) = yd - y(k); the error drives the Controller, Actuator, and Plant, and a disturbance acts on the loop.] • Plant: DSMS engine • Actuator: load shedder • y: average data processing delay • yd: desired processing delay • e: control error • u: allowed load into DSMS
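The loop on this slide can be illustrated with a small fluid simulation. This is only a sketch under stated assumptions: the plant is reduced to a single queue with a fixed per-tuple cost, and the controller is a plain proportional rule with a capacity feedforward term, which is not the pole-placement controller designed in the talk. The function name and the parameters `cost`, `period`, and `gain` are hypothetical.

```python
def simulate_load_shedding(arrivals, target_delay, cost=0.01, period=1.0, gain=0.5):
    """Fluid simulation of a feedback-controlled load shedder on one queue.

    arrivals: number of tuples offered in each control period.
    Returns (delays, dropped), one entry per period.
    Assumption: delay is approximated as queue length times per-tuple cost.
    """
    capacity = period / cost            # tuples the engine can process per period
    queue = 0.0
    delays, dropped = [], []
    for offered in arrivals:
        delay = queue * cost            # y(k): estimated processing delay
        error = target_delay - delay    # e(k) = yd - y(k)
        # u(k): allowed load = what the engine can absorb plus a correction
        # proportional to the control error (assumed controller, not the talk's).
        allowed = max(0.0, capacity + gain * error / cost)
        admitted = min(offered, allowed)
        dropped.append(offered - admitted)   # load shed by the actuator
        queue = max(0.0, queue + admitted - capacity)
        delays.append(queue * cost)
    return delays, dropped


if __name__ == "__main__":
    delays, dropped = simulate_load_shedding([300] * 60, target_delay=2.0)
    print(round(delays[-1], 6), dropped[-1])
```

Under a sustained overload of 300 tuples per period against a capacity of 100, the simulated delay climbs toward the 2-second target and the shedder settles at dropping the excess; when arrivals stay below capacity nothing is dropped.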
Issues • System modeling • Critical for control loop design • Analytical models desirable but not currently available • Experimental methods can be used • Controller design • Database-specific challenges • Lack of real-time measurement of output signal y • Actuator may not be able to implement control signal correctly
Modeling Borealis • Interestingly, system identification of Borealis shows a first-order model with single-queue characteristics • In other words (block diagram)
Controller Design • Design based on pole placement • Locations of pole(s) determine how fast/well the system responds • Guaranteed performance targets • Convergence rate - responsiveness • Damping - smoothness • The controller:
DSMS-specific challenges • A database system is different from a traditional control system in many ways • Lack of real-time measurement of output signal y • Actuator may not be able to implement control signal correctly • Solutions are provided in the context of DSMS • Need more systematic study from a control viewpoint
Experiments • Controller and load shedder implemented in a real DSMS - Borealis • Synthetic (“Pareto”) and real (“Web”) data streams • Query network with variable average processing cost • Experiments for comparison • Aurora - open loop • Baseline - primitive feedback control
Experiments: Inputs
Main Results - Synthetic Data
Main Results - Real Data
Main Results - Data Loss
Summary on Load Shedding • Load shedding is an effective quality adaptation method in DSMSs • Ad hoc solutions do not work well under dynamic load • A load shedding approach based on feedback control theory shows promising results in a real-world DSMS • Control theory could provide solutions to other database problems • However, we need to address new challenges that are unique to database problems
Roadmap • Introduction • Controlling delays in data stream management systems (DSMSs) • Quality-aware (media) data replication • Other works
Quality-Aware Queries in Multimedia DBMS • Quality = QoS • Querying the DB with quality parameters SELECT vid:[s] FROM VidLib1 WHERE (vid, s) IN FindVideoWithObject( Someone ) QUALITY Resolution = High, Color_depth = Low
Quality-aware Data Retrieval • Quality (QoS) critical for media data • Varieties of user quality requirements • Determined by user preference and resource availability • Large number of quality combinations • Adaptation techniques to satisfy quality needs • Dynamic adaptation: online transcoding • Static adaptation: retrieve precoded replica from disk
Dynamic Adaptation • Transcoding is very expensive in terms of CPU cost • Situation may improve in the future • Layered coding • Not standardized yet • Less popular than people expected
Static Adaptation • Little CPU cost • Choice of many commercial service providers • What about storage cost? • On the order of the total number of quality points • Ignored in previous research, which assumed: • Very few quality profiles • Storage is dirt cheap • Excessively high for service providers
Quality-Aware Replication • Replicas are of different “quality” • Destination: point(s) in a metric quality space • Costs of transformation among different qualities are very high • Applications • Multimedia • Materialized view • Biological structure • Good news: read-only • Bad news: too much storage needed
Two Quality Models • Hard-Quality: Users are strict in their quality needs • Quality A cannot serve a request for quality B • Online transcoding is needed • Soft-Quality: Users are willing to negotiate/compromise • Quality A can serve a request for quality B • With some penalties (quantified by utility functions)
Hard-Quality Systems • Problem is to minimize the reject rate (probability) P under an overall storage constraint C, given • f_k: query rate for quality k • u_k: service time for quality k • s_k: storage consumption for quality k • c_k: CPU consumption for quality k • Map the system to a multi-rate Erlang loss system • Reduce the problem to a 0-1 knapsack problem • A (good) heuristic solution: • Sort all qualities by their f_k/s_k values and fill in the storage C
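The sort-and-fill heuristic in the last bullet can be sketched as follows. The function name, the input layout, and the choice to skip a quality that does not fit and continue with the next one are assumptions made for illustration; the slide does not say how items that overflow the storage are handled.

```python
def fill_by_density(qualities, capacity):
    """Pick replicas by descending f_k / s_k until storage C is used up.

    qualities: dict mapping a quality name to (f_k, s_k), i.e. query rate
               and storage consumption.
    capacity:  total storage C.
    Assumption: a quality that no longer fits is skipped, not truncated.
    """
    ranked = sorted(qualities.items(),
                    key=lambda item: item[1][0] / item[1][1],
                    reverse=True)
    chosen, used = [], 0
    for name, (rate, size) in ranked:
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen


if __name__ == "__main__":
    demo = {"high": (10, 8), "mid": (9, 3), "low": (4, 1)}
    print(fill_by_density(demo, capacity=5))  # ['low', 'mid']
```

In the demo, "low" and "mid" have the best rate-to-storage ratios and together use 4 of the 5 storage units, so "high" (8 units) is left to online transcoding or rejection.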
Soft-quality system: the fixed-storage replica selection (FSRS) Problem • An optimization: get the highest utility given the popularity (f_k) and storage cost (s_k) of all quality points under total storage S • u(j,k): the utility when a request on quality j is served by quality k • Utility is given as a function of distance in quality space • Requests served by the closest replica
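Read together, the bullets on this slide suggest the following formulation. It is a sketch: the set Q of all quality points, the selected set P, the distance d, and the mapping g from distance to utility are notational assumptions that do not appear on the slide.

```latex
\[
\max_{P \subseteq Q} \; U(P) = \sum_{j \in Q} f_j \, \max_{k \in P} u(j,k)
\qquad \text{subject to} \qquad \sum_{k \in P} s_k \le S,
\qquad u(j,k) = g\bigl(d(j,k)\bigr).
\]
```

The inner maximum encodes "requests served by the closest replica", assuming g is non-increasing in the distance so that the closest replica is also the one with the highest utility.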
The FSRS Algorithms (I) • Problem is NP-hard: a variation of k-means • We propose a heuristic algorithm named Greedy • Aggressively selects replicas based on the ratio of marginal utility gain (∆u) to cost (s_k) • Time complexity: O(m^2 I), where I is the # of replicas selected and m the total # of possible replicas • Pseudocode:
  selected replica set P := ∅
  available storage s’ := S
  while s’ > 0
    add the quality point that yields the largest ∆u/s_k value to P
    decrease s’ by s_k
  return P
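A runnable reading of the Greedy pseudocode is sketched below. The data layout (`freq`, `size`, a `utility(j, k)` callable) and the function names are hypothetical, and two details are assumptions added so the loop terminates cleanly: candidates that no longer fit in the remaining storage are skipped, and selection stops when no candidate yields a positive gain.

```python
def total_utility(selected, freq, utility):
    """U(P): every request j is served by its best selected replica."""
    if not selected:
        return 0.0
    return sum(f * max(utility(j, k) for k in selected)
               for j, f in freq.items())


def greedy_fsrs(storage, freq, size, utility):
    """Select replicas by the largest marginal utility gain per storage unit."""
    selected, remaining = [], storage
    while remaining > 0:
        base = total_utility(selected, freq, utility)
        best, best_ratio = None, 0.0
        for k in freq:
            if k in selected or size[k] > remaining:
                continue  # already chosen, or does not fit (assumption)
            gain = total_utility(selected + [k], freq, utility) - base
            ratio = gain / size[k]          # the ∆u / s_k criterion
            if ratio > best_ratio:
                best, best_ratio = k, ratio
        if best is None:
            break                           # nothing fits or nothing helps
        selected.append(best)
        remaining -= size[best]
    return selected


if __name__ == "__main__":
    # Quality points on a line; utility decays with distance (assumed form).
    freq = {0: 6, 1: 1, 2: 1, 3: 5}
    size = {0: 1, 1: 1, 2: 1, 3: 1}
    closeness = lambda j, k: 1.0 / (1 + abs(j - k))
    print(greedy_fsrs(2, freq, size, closeness))  # [0, 3]
```

Each round evaluates up to m candidates and each evaluation scans all m request types, which matches the O(m^2 I) bound on the slide for I selected replicas.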
The FSRS Algorithms (II) • Greedy could pick some bad replicas, especially among the earlier selections • Remedy: remove those bad choices and re-select • Time complexity: same as Greedy with a larger coefficient • The Iterative Greedy algorithm:
  P ← a solution given by Greedy
  while there exists a solution P’ s.t. U(P’) > U(P) do
    P ← P’
  return P
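The pseudocode does not say how the candidate solution P’ is generated. The sketch below assumes one plausible reading of "remove those bad choices and re-select": drop one selected replica at a time, let the greedy rule refill the freed storage, and keep the result only if total utility strictly improves. All names and the data layout are hypothetical.

```python
def total_utility(selected, freq, utility):
    """U(P): every request j is served by its best selected replica."""
    if not selected:
        return 0.0
    return sum(f * max(utility(j, k) for k in selected)
               for j, f in freq.items())


def greedy_fill(selected, storage, freq, size, utility):
    """Extend `selected` greedily by marginal utility gain per storage unit."""
    selected = list(selected)
    remaining = storage - sum(size[k] for k in selected)
    while True:
        base = total_utility(selected, freq, utility)
        best, best_ratio = None, 0.0
        for k in freq:
            if k in selected or size[k] > remaining:
                continue
            ratio = (total_utility(selected + [k], freq, utility) - base) / size[k]
            if ratio > best_ratio:
                best, best_ratio = k, ratio
        if best is None:
            return selected
        selected.append(best)
        remaining -= size[best]


def iterative_greedy(storage, freq, size, utility):
    """Start from Greedy, then repeatedly drop one replica and re-select."""
    current = greedy_fill([], storage, freq, size, utility)
    improved = True
    while improved:
        improved = False
        for drop in list(current):
            kept = [k for k in current if k != drop]
            candidate = greedy_fill(kept, storage, freq, size, utility)
            if (total_utility(candidate, freq, utility)
                    > total_utility(current, freq, utility) + 1e-12):
                current, improved = candidate, True
                break   # restart the scan from the improved solution
    return current


if __name__ == "__main__":
    freq = {0: 4, 1: 3, 2: 4}
    size = {0: 1, 1: 1, 2: 1}
    closeness = lambda j, k: 1.0 / (1 + abs(j - k))
    print(greedy_fill([], 2, freq, size, closeness))       # [1, 0]
    print(sorted(iterative_greedy(2, freq, size, closeness)))  # [0, 2]
```

In the demo, plain Greedy first grabs the middle point and ends with utility 9, while dropping that early choice and re-selecting reaches the two outer points with utility 9.5. Because every accepted step strictly increases utility over a finite set of selections, the loop terminates.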
Other Extensions • Our FSRS algorithms can be easily extended to handle • Multiple media objects • Further user-specified constraints on replicas to be selected • Multiple servers
Dynamic Replication • Popularity f of replicas could change over time • We only consider the situation where popularity of all replicas of a media object changes together • Reasonable assumption in many systems • Competition for storage among media objects • Desirable dynamic replication algorithms: • Find solutions as optimal as those by static FSRS algorithms • Fast enough to make online decisions • Naïve solution: run Greedy every time a change of f occurs
Replication Roadmap (RR) • Consider the order in which replicas are selected by Greedy: the selections follow a predefined path (the RR) for each media object • RRs are all convex • Exchanges of storage may happen between two media objects, triggered by the increase/decrease of f • The one that becomes more popular takes storage from the least popular one • The one that becomes less popular gives up storage to the most popular one • It is efficient to make exchanges at the frontiers of the RRs; there is no need to look inside
Replication Roadmap (continued) • Storage exchanges, example: media A should take storage from media B because the slope of A’s current RR segment is greater than that of B’s
Dynamic FSRS algorithm • Based on the RR idea • Proved performance: results given are as optimal as those chosen by Greedy • Preprocess phase: • Build the RRs • Online phase: • Perform exchanges until total utility converges • Time complexity: O(I log V), where I is the # of storage exchanges that occur and V is the # of media objects
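The online phase can be illustrated with a simplified sketch of exchanges at the roadmap frontiers. Several assumptions are made here that the slides do not state: every roadmap step costs one unit of storage, each roadmap is given as a non-increasing list of per-step utility gains (the convexity property), and the best taker and giver are found by linear scans rather than the priority queues that an O(I log V) implementation would presumably use. All names are hypothetical.

```python
def rebalance(roadmaps, popularity, alloc):
    """Exchange storage between media objects at the frontiers of their RRs.

    roadmaps[v]:   per-step utility gains along object v's roadmap,
                   non-increasing (assumed unit storage per step).
    popularity[v]: current popularity f of object v (scales its gains).
    alloc[v]:      number of roadmap steps object v currently holds.
    Returns (new allocation, number of exchanges performed).
    """
    alloc = dict(alloc)

    def next_gain(v):
        # Utility gained if v advances one step along its roadmap.
        i = alloc[v]
        if i >= len(roadmaps[v]):
            return float("-inf")
        return popularity[v] * roadmaps[v][i]

    def last_gain(v):
        # Utility lost if v gives up its most recent step.
        i = alloc[v]
        if i == 0:
            return float("inf")
        return popularity[v] * roadmaps[v][i - 1]

    exchanges = 0
    while True:
        taker = max(roadmaps, key=next_gain)
        giver = min(roadmaps, key=last_gain)
        # Stop when no exchange would strictly increase total utility.
        if taker == giver or next_gain(taker) <= last_gain(giver):
            return alloc, exchanges
        alloc[taker] += 1
        alloc[giver] -= 1
        exchanges += 1


if __name__ == "__main__":
    roadmaps = {"A": [10, 6, 2], "B": [9, 5, 1]}
    # B has just become three times as popular as A.
    print(rebalance(roadmaps, {"A": 1, "B": 3}, {"A": 2, "B": 1}))
    # ({'A': 1, 'B': 2}, 1)
```

In the demo, B's next step is worth 15 after its popularity rises while A's last step is worth only 6, so one unit of storage moves from A to B and the process then stops. Only the frontier step of each roadmap is inspected, which is the point of the "no need to look inside" remark.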
Effectiveness of FSRS Algorithms • For comparison: • The optimal solution (by CPLEX) • Random selections • Local popularity-based
Efficiency of FSRS Algorithms • CPLEX < Iterative Greedy < Greedy < Random < Local • Results on a P4 2.4 GHz CPU:
Dynamic Replication Results • Randomly generated changes of f • Compare with Greedy • Results with (almost) the same optimality as Greedy • Reason: small number of storage exchanges
Summary on media replication • Storage cost in static adaptation prohibits replication of all qualities • Optimize toward the lowest reject rate (hard-quality) or the highest utility (soft-quality) given storage constraints • Two heuristics are proposed for static replication that give near-optimal choices • An online algorithm for a dynamic replication problem
Other Works • VDBMS - a multimedia DBMS • Quality-of-Service Aware Query Processing [EDBT04] • System architecture [MMJS04, DMS03, ICDE02] • Peer-to-peer media streaming • Performance analysis [MMCN04, TOMCCAP05] • Genetic algorithms [JEC07] • Other topics in data stream systems • Entity-based query processing [VLDB05] • Stream data compression [GSN06] • Signal processing [JMASM07, CSC05]
Ongoing and Future Research • Further investigate load shedding problem • Handle actuator uncertainty • Other control targets • Is the optimal achievable? • Quality-aware replication: • General case of dynamic replication, why is a random solution not so bad? • Conjecture: Greedy is 4/3-competitive? • Application of control theory in other database topics • Self-tuning databases
Publications-1
[TKDE07] Y. Tu, J. Yan, G. Shen and S. Prabhakar. Multi-Quality Data Replication in Multimedia Databases. IEEE Transactions on Knowledge and Data Engineering (TKDE). 19(5):679-694, May 2007.
[JMASM07] L. Qu and Y. Tu. Change Point Estimation of Bi-Level Functions. Journal of Modern Applied Statistical Methods. 5(2), May 2007.
[JEC] H. Fang, Q. Wang, Y. Tu and M.F. Horstemeyer. An Efficient Non-Dominated Sorting Algorithm for Evolutionary Algorithms. Accepted to Journal of Evolutionary Computation.
[ICDE07] Y. Tu, S. Liu, S. Prabhakar, B. Yao, and W. Schroeder. Using Control Theory for Load Shedding in Data Stream Management. In Procs. of ICDE, pp.490-491, Istanbul, Turkey, April 2007.
[GSN06] Y. Xia, Y. Tu, M. Atallah, and S. Prabhakar. Efficient Data Compression in Location Based Services. In Procs. of 2nd International Conference on Geosensor Networks, Boston, MA, October 2006.
[VLDB06] Y. Tu, S. Liu, S. Prabhakar, and B. Yao. Load Shedding in Stream Databases - A Control-Based Approach. In Proceedings of VLDB, pp.787-798, September 2006.
[TOMCCAP05] Y. Tu, J. Sun, M. Hefeeda, and S. Prabhakar. An Analytical Study of Peer-to-Peer Media Streaming Systems. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP). 1(4):354-376, November 2005.
Publications-2
[VLDB05] R. Cheng, B. Kao, S. Prabhakar, A. Kwan, and Y. Tu. Adaptive Stream Filters for Entity-Based Queries with Non-Value Tolerance. In Proceedings of VLDB, pp.37-48, August 2005.
[DEXA05a] Y. Tu, J. Yan, and S. Prabhakar. Quality-Aware Replication of Multimedia Data. In Proceedings of DEXA, pp.240-249, August 2005.
[DEXA05b] Y. Tu, M. Hefeeda, Y. Xia, S. Prabhakar, and S. Liu. Control-based Quality Adaptation in Data Stream Management Systems. In Proceedings of DEXA, pp.746-755, August 2005.
[CSC05] L. Qu and Y. Tu. Change Point Estimation of Bar Code Signals. In Proceedings of International Conference on Scientific Computing, pp.109-114, Las Vegas, USA, June 2005.
[MMJS04] W. Aref, A. Catlin, A. Elmagarmid, J. Fan, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, Y. Tu and X. Zhu. VDBMS: A Testbed Facility for Research in Video Database Benchmarking. ACM/Springer Multimedia Systems. 9(6):575-585, June 2004.
[EDBT04] Y. Tu, S. Prabhakar, A. Elmagarmid and R. Sion. QuaSAQ: An Approach to Enabling End-to-End QoS for Multimedia Databases. In Proceedings of Extending Database Technology (EDBT), pp.694-711, Heraklion, Greece, March 2004.
[MMCN04] Y. Tu, J. Sun and S. Prabhakar. Performance Analysis of A Hybrid Media Streaming System. In Proceedings of ACM/SPIE Conf. on Multimedia Computing and Networking (MMCN), pp.69-82, San Jose, CA, January 2004.
Publications-3
[DMS03] W. Aref, A. Catlin, A. Elmagarmid, J. Fan, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, Y. Tu and X. Zhu (alphabetical order). VDBMS: A Testbed Facility for Research in Video Database Benchmarking. In Proceedings of Intl. Conf. on Distributed Multimedia Systems (DMS) 2003, pp.160-166.
[ICDE02] W. Aref, A. Elmagarmid, J. Fan, J. Guo, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, A. Rezgui, A. Teoh, E. Terzi, Y. Tu, A. Vakali, X. Zhu (alphabetical order). A Distributed Database Server for Continuous Media. Procs. of ICDE, pp.490-491, San Jose, CA, March 2002.
[ICDE06] Y. Tu and S. Prabhakar. Control-Based Load Shedding in Data Stream Management Systems. PhD Workshop, in conjunction with ICDE 2006.
Submitted: Using control theory for self-tuning databases. Submitted to journal.
Thank you! Questions?
QuaSAQ • Quality-of-Service-Aware Query processing • Users do not need to know low-level details • Cost evaluation toward global optimization goals • Throughput • Utilizing current system/network QoS support to deliver the query results • Theory first presented in Bertino et al., 2003 • Prototyping is essential
QuaSAQ Architecture • Our approach: • Augment the query evaluation and optimization modules to directly take QoS into account • Major components • Offline multimedia processor • Transcode media objects into copies with different QoS/formats • Estimate resource use • Online components • QoS Browser • Quality Manager • QoS APIs