Answering Arbitrary Conjunctive Queries over Incomplete Data Stream Histories
Alasdair J.G. Gray (1), M. Howard Williams (1), Werner Nutt (2)
(1) School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
(2) Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
5th December 2006
Overview
• Publishing distributed data streams
• Incomplete stream histories
• Answering conjunctive queries
• Conclusions and Future Work
A.J.G. Gray, M.H. Williams, and W. Nutt, iiWAS2006
Streams of Data
• Main sources: sensors
• Characteristics: unbounded, append only, frequency
• Managed by: sensor networks, network/Grid monitoring, ubiquitous/pervasive computing environments
R-GMA as a Data Publishing Service
• Grid Monitoring and Information Service
• Strategy
  • Primary Producers (PP) and Secondary Producers (SP) register in the Registry using the global schema
  • Consumer issues queries over the agreed global schema
  • Mediator translates the global query into local queries over the sources
[Diagram: Consumers, Secondary Producer, Registry, Primary Producers]
Data Stream Histories
• Three types of queries
• Primary Producers publish a stream of data
• Secondary Producer
  • Collects streams
  • Stores history in a database
  • Only stores a finite amount
• Consumer queries the stream history
[Diagram: Consumer, Secondary Producer with Store, Primary Producers]
Problem of Incompleteness
• Distribution: streams are published by distributed sources
  • Network failures, lost data
  • Configuration errors
• Finite memory: Secondary Producers store a finite amount of history
  • Each SP has a Retention Period
  • Old tuples are discarded
• Different SPs may store similar data but with different history length, frequency, etc.
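The effect of a Retention Period can be sketched as follows. This is a minimal illustration, not R-GMA code: the `Reading` type, the `apply_retention` function, and the use of integer seconds for timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    key: str    # hypothetical channel key, e.g. a machine identifier
    value: int  # measured value
    ts: int     # timestamp in seconds


def apply_retention(history, now, retention_period):
    """Keep only the tuples that are no older than the retention period."""
    cutoff = now - retention_period
    return [r for r in history if r.ts >= cutoff]


stream = [Reading("ce1", 3, 0), Reading("ce1", 7, 3600), Reading("ce1", 9, 7200)]

# Two hypothetical Secondary Producers storing the same stream
# with different retention periods end up with different histories.
sp_short = apply_retention(stream, now=7200, retention_period=3600)
sp_long = apply_retention(stream, now=7200, retention_period=7200)
```

Here `sp_short` has lost the oldest tuple while `sp_long` still holds it, which is one way two SPs can disagree on the history of the same stream.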
Representing Missing Data
• We assume that a producer can detect when tuples are missing, e.g.
  • If the PP produces at a fixed frequency
  • Sensor sequence number
• A stream is made up of channels
  • Channel: tuples agree on key values
• For each channel, the missing tuples can be represented by a gap consisting of [start, end]
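A minimal sketch of gap detection under the fixed-frequency assumption is given below. The function names, the `(key, timestamp)` tuple layout, and the choice to report a gap as the expected timestamps of the first and last missing tuple are assumptions for illustration; the slides only state that a gap is a [start, end] pair per channel.

```python
from collections import defaultdict


def find_gaps(timestamps, period):
    """Return (start, end) gaps for one channel publishing every `period` seconds.

    Assumes timestamps are aligned to the period, so a gap spans the
    expected timestamps of the first and last missing tuple.
    """
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > period:
            gaps.append((prev + period, cur - period))
    return gaps


def gaps_by_channel(tuples, period):
    """Group (key, ts) tuples into channels by key and find gaps in each."""
    channels = defaultdict(list)
    for key, ts in tuples:
        channels[key].append(ts)
    return {key: find_gaps(ts_list, period) for key, ts_list in channels.items()}


stream = [("ce1", 0), ("ce1", 10), ("ce1", 40),
          ("ce2", 0), ("ce2", 10), ("ce2", 20)]
gaps = gaps_by_channel(stream, period=10)
```

In this example channel `ce1` is missing the tuples expected at 20 and 30, so it gets the single gap `(20, 30)`, while `ce2` has no gaps. Tuples missing before the first or after the last stored tuple are not detected by this sketch.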
Query Answers
Query q can have 3 types of answer tuples:
• Certain positive answer: the tuple would be returned over the complete data set
• Certain negative answer: the tuple would not be returned over the complete data set
• Possible answer: the tuple may be returned over the complete data set
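The three answer types can be illustrated for a simple selection over one stream. This is a sketch under stated assumptions, not the algorithm from the paper: it assumes the stored tuples are a correct subset of the complete stream, that a candidate is possible whenever one of its channel's gaps overlaps the queried window, and the names `Answer` and `classify` are hypothetical.

```python
from enum import Enum


class Answer(Enum):
    CERTAIN_POSITIVE = "certain positive"
    CERTAIN_NEGATIVE = "certain negative"
    POSSIBLE = "possible"


def classify(key, stored, gaps, predicate, window):
    """Classify candidate `key` for a selection query over one stream.

    stored: list of (key, value, ts) tuples held by the Secondary Producer
    gaps:   dict mapping a channel key to a list of (start, end) gaps
    window: (start, end) of the history the query asks for
    """
    lo, hi = window
    # A stored tuple that satisfies the query is also in the complete stream.
    if any(k == key and lo <= ts <= hi and predicate(v) for k, v, ts in stored):
        return Answer.CERTAIN_POSITIVE
    # A gap overlapping the window could hide a satisfying tuple.
    if any(g_start <= hi and g_end >= lo for g_start, g_end in gaps.get(key, [])):
        return Answer.POSSIBLE
    return Answer.CERTAIN_NEGATIVE


stored = [("ce1", 7, 10), ("ce2", 2, 10), ("ce3", 1, 10)]
gaps = {"ce2": [(20, 30)]}


def more_than_five(value):
    return value > 5


results = {k: classify(k, stored, gaps, more_than_five, (0, 100))
           for k in ("ce1", "ce2", "ce3")}
```

Here `ce1` is a certain positive answer, `ce2` is only possible because of its gap, and `ce3` is a certain negative answer.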
Example: Grid Monitoring Query 1
• Machines with more than 5 running jobs in the last 24 hours
q1(CEId) ← compEle(CEId, fCPUs, rJobs, ts) /\ rJobs > 5 /\ [history = 24hrs]
Example: Grid Monitoring Query 2
• Machines with more than 5 running jobs in the last 12 hours that are linked to a storage element
q2(CEId) ← compEle(CEId, fCPUs, rJobs, ts) /\ CESEBind(CEId, SEId) /\ rJobs > 5 /\ [history = 12hrs]
• Query answer: (2)
• The query can be answered completely despite the missing data.
Example: Grid Monitoring Query 3
• Machines with more than 5 running jobs in the last 24 hours that are linked to a storage element with a load greater than 75
q3(CEId) ← compEle(CEId, fCPUs, rJobs, ts1) /\ storEle(SEId, cIO, ts2) /\ CESEBind(CEId, SEId) /\ rJobs > 5 /\ cIO > 75 /\ [history = 24hrs]
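As an illustration, q3 can be evaluated over the stored tuples as a join of the two streams through the binding relation. This is a sketch, not the Mediator's implementation: the tuple layouts follow the atoms in q3, while the function name `q3`, the sample data, and the assumption that the 24-hour history applies to both streams are made up for the example. It returns only answers derivable from stored data and ignores gaps.

```python
HOUR = 3600


def q3(comp_ele, stor_ele, ce_se_bind, now, history=24 * HOUR):
    """Evaluate q3 over stored tuples only.

    comp_ele:   (CEId, fCPUs, rJobs, ts1) tuples
    stor_ele:   (SEId, cIO, ts2) tuples
    ce_se_bind: (CEId, SEId) pairs
    """
    cutoff = now - history
    # ts1 and ts2 are not joined in q3, so each stream can be filtered on its own.
    busy_ce = {ce for ce, _f_cpus, r_jobs, ts in comp_ele
               if r_jobs > 5 and cutoff <= ts <= now}
    loaded_se = {se for se, c_io, ts in stor_ele
                 if c_io > 75 and cutoff <= ts <= now}
    return sorted({ce for ce, se in ce_se_bind
                   if ce in busy_ce and se in loaded_se})


comp_ele = [("ce1", 4, 8, 40 * HOUR), ("ce2", 2, 9, 40 * HOUR)]
stor_ele = [("se1", 80, 41 * HOUR), ("se2", 50, 41 * HOUR)]
ce_se_bind = [("ce1", "se1"), ("ce2", "se2")]

answers = q3(comp_ele, stor_ele, ce_se_bind, now=48 * HOUR)
```

Both machines have more than 5 running jobs, but only `ce1` is bound to a storage element with a load above 75, so it is the only answer.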
Co-operative Answer
• Gaps affect the answer to q3
• Return information about the relevant gaps
• Allows users to reason about the effects of the incompleteness
Very unlikely that there were any answers to q3
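A co-operative answer can be sketched as the query result packaged with the gaps that might have affected it. The names below are hypothetical, and relevance is simplified to "the gap overlaps the queried window"; the paper's notion of a relevant gap may be more precise.

```python
def overlaps(gap, window):
    """True if the closed intervals gap and window share at least one point."""
    return gap[0] <= window[1] and gap[1] >= window[0]


def cooperative_answer(answers, gaps, window):
    """Return the answers together with the gaps that overlap the window.

    gaps: dict mapping a channel key to a list of (start, end) gaps
    """
    relevant = {}
    for channel, channel_gaps in gaps.items():
        hits = [g for g in channel_gaps if overlaps(g, window)]
        if hits:
            relevant[channel] = hits
    return {"answers": answers, "relevant_gaps": relevant}


# Hypothetical gaps, in hours: one on a compute element channel,
# one on a storage element channel.
gaps = {"ce1": [(5, 8)], "se2": [(30, 40)]}
result = cooperative_answer(answers=[], gaps=gaps, window=(24, 48))
```

With an empty answer set, the user still learns that channel `se2` was missing data between hours 30 and 40 of the queried window, while the older gap on `ce1` is left out because it cannot affect the answer.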
Conclusions
• Data streams are often incomplete
• Stored histories of the stream will be incomplete
• Presented a model for representing incompleteness
• Developed algorithms for:
  • Answering conjunctive queries
  • Providing meta-data about the answer
Future Work
• Investigate answering queries under different assumptions
• Extend expressivity of queries to allow aggregate functions
• Develop an implementation by extending R-GMA’s Mediator