
Adaptive Cleaning for RFID Data Streams

Shawn Jeffery (UC Berkeley), Minos Garofalakis (Intel Research Berkeley), Michael Franklin (UC Berkeley). Presented by: Hamid Haidarian Shahri.


Presentation Transcript


  1. Adaptive Cleaning for RFID Data Streams • Shawn Jeffery (UC Berkeley), Minos Garofalakis (Intel Research Berkeley), Michael Franklin (UC Berkeley) • Presented by: Hamid Haidarian Shahri

  2. Where Are We? Look at the Signs!

  3. Looking at Signs – Before Jumping In • S. Chaudhuri, U. Dayal, "An Overview of Data Warehousing and OLAP Technology," SIGMOD Record, 1997. • 800+ citations • DW and information integration • “Data cleaning” term publicized • Identified its importance in integration • Extensive research followed

  4. VLDB 2001 • Session R12: DATA QUALITY & CLEANING • Declarative data cleaning: language, model, and algorithms Helena Galhardas (INRIA Rocquencourt), Daniela Florescu (Propel), Dennis Shasha (NYU), Eric Simon, and Cristian-Augustin Saita (INRIA Rocquencourt) • Potter's wheel: an interactive data cleaning system Vijayshankar Raman and Joseph M. Hellerstein (University of California at Berkeley) • Update propagation strategies for improving the quality of data on the Web Alexandros Labrinidis and Nick Roussopoulos (University of Maryland)

  5. Data Cleaning Previous Work - 2006 • Hamid Haidarian Shahri, S.H. Shahri, “Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework," IEEE Intelligent Systems, Vol. 21, No. 5, 2006.

  6. Putting Things into Context • Data cleaning required after integration • No unified standard across sources • NOW: sensor/hardware errors inevitable; research opportunity • Data modeling (Amol Deshpande) • An important use case is cleaning

  7. VLDB 2006 – Three weeks ago • Research Session 5: Sensor Data (dedicated to cleaning!) • Title: Adaptive Cleaning for RFID Data Streams • Authors: Shawn R. Jeffery, Minos Garofalakis, Michael J. Franklin • Title: A Deferred Cleansing Method for RFID Data Analytics • Authors: Jun Rao, Sangeeta Doraiswamy, Hetal Thakkar, Latha S. Colby • Title: Online Outlier Detection in Sensor Data Using Non-Parametric Models • Authors: Sharmila Subramaniam, Themis Palpanas, Dimitris Papadopoulos, Vana Kalogeraki, Dimitrios Gunopulos

  8. RFID: Radio Frequency IDentification

  9. RFID data is dirty • A simple experiment: • 2 RFID-enabled shelves • 10 static tags • 5 mobile tags

  10. RFID data has many dropped readings • Typically, use a smoothing filter to interpolate • RFID data cleaning with a smoothing filter: SELECT DISTINCT tag_id FROM RFID_stream [RANGE '5 sec'] GROUP BY tag_id • But how to set the size of the window? • [Figure: raw readings and smoothed output plotted over time]
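
The fixed-window smoothing filter on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: readings arrive as hypothetical `(epoch, tag_id)` pairs, and the window is counted in epochs rather than seconds. The function name `smooth_fixed_window` is made up for the example.

```python
from collections import defaultdict


def smooth_fixed_window(readings, window, num_epochs):
    """Report a tag as present in epoch t if it was read at least once
    in the last `window` epochs (t - window + 1 through t).

    readings: iterable of (epoch, tag_id) pairs (assumed input format).
    Returns a list with one set of tag ids per epoch.
    """
    by_epoch = defaultdict(set)
    for epoch, tag in readings:
        by_epoch[epoch].add(tag)

    last_seen = {}  # tag id -> most recent epoch in which it was read
    output = []
    for t in range(num_epochs):
        for tag in by_epoch[t]:
            last_seen[tag] = t
        present = {tag for tag, seen in last_seen.items() if t - seen < window}
        output.append(present)
    return output


# The tag is really present in epochs 0-4 but is only read in 0, 3 and 4.
raw = [(0, "fido"), (3, "fido"), (4, "fido")]
small = ["fido" in s for s in smooth_fixed_window(raw, window=1, num_epochs=8)]
large = ["fido" in s for s in smooth_fixed_window(raw, window=3, num_epochs=8)]
print(small)  # gaps at epochs 1 and 2 (dropped readings show through)
print(large)  # gaps filled, but the tag is still reported in epochs 5 and 6
```

The small window leaves holes where readings were dropped, while the large window fills them at the cost of reporting the tag after it has gone, which is the window-size dilemma the slide asks about.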

  11. Window Size for RFID Smoothing • [Figure: timelines labeled Reality, Raw readings, Small window, and Large window, across a "Fido moving" phase and a "Fido resting" phase] • → Need to balance completeness vs. capturing tag movement

  12. Truly Declarative Smoothing • Problem: window size non-declarative • Application wants a clean stream of data • Window size is how to get it • Solution: adapt the window size in response to data

  13. Itinerary • Introduction: RFID data cleaning • A statistical sampling perspective • SMURF • Per-tag cleaning • Multi-tag cleaning • Ongoing work • Conclusions

  14. A Statistical Sampling Perspective • Key Insight: RFID data ≈ random sample of present tags • Map RFID smoothing to a sampling experiment

  15. RFID's Gory Details • [Figure: antenna and reader interrogating tags; each read cycle (epoch), E0 to E9, produces a tag list (for Alien readers) covering Tag 1 to Tag 4]

  16. RFID Smoothing to Sampling • → Now use sampling theory to drive adaptation!

  17. SMURF • Statistical Smoothing for Unreliable RFID Data • Adapts window based on statistical properties • Mechanisms for: • Per-tag and multi-tag cleaning

  18. Per-Tag Smoothing: Model and Background • Use a binomial sampling model • [Figure: plot over time in epochs E0 to E9 with a vertical axis from 0 to 1; labels: S_i, p_i, p_i^avg (read rate of tag i), smoothing window w_i, Bernoulli trials]
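
A minimal sketch of the binomial model follows. It rests on assumptions the slide does not spell out: the reader is taken to report a per-epoch read rate for each tag it saw, `p_avg` is taken to be the mean of those rates over the epochs in which the tag was read, and each epoch in the window is treated as an independent Bernoulli trial with success probability `p_avg`. The function name `window_stats` and the use of `None` for a missed epoch are invented for the example.

```python
import math


def window_stats(read_rates):
    """Summarize one tag's smoothing window under a binomial model.

    read_rates: one entry per epoch in the window; None if the tag was
    not read in that epoch, else its read rate in (0, 1] (assumed format).
    Returns (num_observed, p_avg, expected_reads, std_dev).
    """
    observed = [p for p in read_rates if p is not None]
    w = len(read_rates)
    p_avg = sum(observed) / len(observed) if observed else 0.0
    expected_reads = w * p_avg                    # mean of Binomial(w, p_avg)
    std_dev = math.sqrt(w * p_avg * (1 - p_avg))  # its standard deviation
    return len(observed), p_avg, expected_reads, std_dev


print(window_stats([0.5, None, 0.5, 0.5]))
```

Under this model the number of epochs in which the tag is read has mean `w * p_avg` and variance `w * p_avg * (1 - p_avg)`, the two quantities any later significance check would compare against.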

  19. Per-Tag Smoothing: Completeness • If the tag is there, read it with high probability → Want a large window • [Figure: p_i (between 0 and 1) over time in epochs E0 to E9; a reading with a low p_i prompts expanding the window]

  20. Per-Tag Smoothing: Completeness • [Equation with annotations: desired window size for tag i; with probability 1 − δ; expected epochs needed to read]
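
The equation itself did not survive in this transcript, so the sketch below is a reconstruction from the binomial model rather than a copy of the slide. If each epoch reads the tag independently with probability `p_avg`, the chance of missing it in all `w` epochs is `(1 - p_avg)^w <= exp(-p_avg * w)`, which is at most δ once `w >= (1 / p_avg) * ln(1 / δ)`. Here `1 / p_avg` is the expected number of epochs needed to read the tag. The function name `completeness_window` is hypothetical.

```python
import math


def completeness_window(p_avg, delta):
    """Smallest window (in epochs) allowed by the bound
    w >= (1 / p_avg) * ln(1 / delta), so that a present tag is read at
    least once in the window with probability at least 1 - delta."""
    if not 0 < p_avg <= 1:
        raise ValueError("p_avg must be in (0, 1]")
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    return max(1, math.ceil(-math.log(delta) / p_avg))


for p in (0.9, 0.5, 0.1):
    w = completeness_window(p, delta=0.05)
    print(p, w, (1 - p) ** w)  # miss probability stays below 0.05
```

A tag with a low read rate gets a much larger window than a reliably read one, which matches the "expand the window" annotation on the previous slide.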

  21. Per-Tag Smoothing: Transitions • Detect transitions as statistically significant changes in the data • [Figure: p_i (between 0 and 1) over time in epochs E0 to E9, annotated "The tag has likely left by this point" and "Statistically significant difference"] • Flag a transition and shrink the window

  22. Per-Tag Smoothing: Transitions • Statistically significant: compare # observed readings with # expected readings • Is the difference “statistically significant”?
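
One way to make "statistically significant" concrete is sketched below. The slide names only the observed and expected reading counts; the two-standard-deviation threshold is an assumption chosen for the example (a common rule of thumb under a normal approximation to the binomial), and the function name `transition_detected` is hypothetical.

```python
import math


def transition_detected(num_observed, window, p_avg, num_std=2.0):
    """Flag a transition when the observed number of readings in the
    window differs from the binomial expectation window * p_avg by more
    than `num_std` standard deviations (threshold is an assumption)."""
    expected = window * p_avg
    std_dev = math.sqrt(window * p_avg * (1 - p_avg))
    return abs(num_observed - expected) > num_std * std_dev


# Tag normally read in about 80% of epochs, window of 10 epochs.
print(transition_detected(7, window=10, p_avg=0.8))  # near expectation
print(transition_detected(1, window=10, p_avg=0.8))  # far below expectation
```

When the check fires, the tag has probably left partway through the window, so a cleaner built on this idea would shrink the window rather than keep interpolating over the gap.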

  23. SMURF in Action • [Figure: SMURF output during the "Fido moving" and "Fido resting" phases] • → Experiments with real and simulated data show similar results

  24. Multi-tag Cleaning • Some applications only need aggregates • E.g., count of items on each shelf • Don’t need to track each tag! • Use statistical mechanisms for both: • Aggregate computation • Window adaptation

  25. Aggregate Computation • π-estimators (Horvitz-Thompson) • Count: [equation] • P[tag i seen in a window of size w]: [equation] • Use small windows to capture movement • Use the estimator to compensate for lost readings
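
The slide's equations are missing from this transcript, so the following sketch uses the textbook Horvitz-Thompson form rather than the slide's exact notation: each tag that was seen contributes `1 / π_i` to the count, where `π_i` is the probability that tag i is seen at all in the window. Taking `π_i = 1 - (1 - p_i)^w` is an assumption that follows from treating each epoch as an independent Bernoulli trial. The function names are hypothetical.

```python
def seen_probability(p_avg, window):
    """P[tag seen at least once in `window` epochs], assuming independent
    per-epoch reads with probability p_avg."""
    if not 0 < p_avg <= 1:
        raise ValueError("p_avg must be in (0, 1]")
    return 1 - (1 - p_avg) ** window


def estimate_count(read_rates_of_seen_tags, window):
    """Horvitz-Thompson style count: every tag observed in the window is
    weighted by the inverse of its probability of being observed."""
    return sum(1 / seen_probability(p, window) for p in read_rates_of_seen_tags)


# Three tags were seen in a 2-epoch window, each with read rate 0.5.
# Each had a 0.75 chance of being seen, so the estimate is 3 / 0.75.
print(estimate_count([0.5, 0.5, 0.5], window=2))
```

Because the weights compensate for tags that were probably present but missed, the window can stay small enough to follow movement without systematically undercounting.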

  26. Window Adaptation • Upper bound window similar to per-tag • “Transition” based on variance within subwindows • [Figure: count over time in epochs E0 to E9, with labels N_w and N_w']

  27. Multi-tag Scenario

  28. Ongoing Work: Spatial Smoothing • With multiple readers, more complicated • [Figure: two rooms, two readers per room, labeled A, B, C, D] • Reinforcement: A? B? A ∪ B? A ∩ B? • Arbitration: A? C? • → All are addressed by statistical framework!

  29. Beyond RFID • Other sensor data: • π-estimator for other aggregates • Use SMURF for sensor networks • Other streaming data: • Use SMURF in general streaming systems (e.g., TelegraphCQ) • Remove RANGE clause from CQL

  30. Related Work • Commercial RFID middleware • Smoothing filters: need to set smoothing window • RFID-related work • Rao et al., StreamClean: complementary • Intel Seattle, HiFi, ESP: static window size • BBQ, MauveDB • Heavyweight, model-based • SMURF is non-parametric, sampling-based • Statistical filters (digital signal processing & DB) • Non-linear digital filters inspired SMURF design

  31. Conclusions • Current smoothing filters not adequate • Not declarative! • SMURF: Declarative smoothing filter • Uses statistical sampling to adapt window size

  32. Thanks! Questions?
