MICS Workshop, 29.06.2010
Order Reconstruction and Data Integrity Testing of Sensor Network Data
Matthias Keller, ETH Zürich

PermaSense Matterhorn Deployment
• August 2008 – today
• Single base station
• 17 sensor nodes
• TinyOS/Dozer [Burri2007]
• Constant rate < 0.1 MByte/node/day
Problem in Finding Temporal Order of Generation
Inconsistencies between packet generation timestamp and sequence number
Approach
[Figure: block diagram with the labels Sensor Network, SN Data, System/Error Model, Packet Analysis, System Status, Filtered/Annotated Data, Data Model, Data Analysis, User, Domain Research, and a Feedback path]
Related Work
• Post-mortem time reconstruction
  • Volcano deployment: problems with FTSP [WernerAllen2006]
  • SunDial: reconstructs global time from light intensity measurements [Gupchup2009]
  • Phoenix: time reconstruction under frequent loss of local node state [Gupchup2010]
  • This work does not reconstruct timestamps, but annotates the data with information on the temporal order of generation
• Data integrity in data warehousing
  • Cleaning of erroneous user inputs, e.g. with the help of dictionaries [Rahm2000]
  • Data integrity based on conformance of observed system behavior to a system/error model of the system
Research Questions
• How can we model networked embedded systems for analyzing data integrity and network status?
• What information can we reliably extract from sensor network data?
  • Node resets, topology changes, …
  • Temporal order of generation, duplicates, lost data, …
• How can we design observable systems?
  • Minimally needed status information, timing information, …
What follows is a simple first attempt.
System Model
• Sensor node
  • Periodic data sampling with period ps
  • Packet forwarding
  • Local state
    • Unique sender address
    • Local clock with bounded drift
    • Sequence number counter
    • Packet queue
• Network: dynamic multi-hop tree topology
• Base station
  • Sink of data collection tree
  • Synchronized GMT clock
  • Only component with a global notion of time
Example: Journey of a Single Packet
Stages along the path: GEN → WAIT → TX → RX → WAIT → TX → RX. The elapsed time field is updated after each wait ("updated packet").

Source address   Sequence number s   Elapsed time te   Payload d   Arrival timestamp ta
1                2040                0                 abcd        -
1                2040                0+2               abcd        -
1                2040                2                 abcd        -
1                2040                2+4               abcd        -
1                2040                6                 abcd        2010-06-29 10:27:15

Estimated packet generation time: tp = ta – te = 2010-06-29 10:27:09
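The arithmetic of this example can be traced in a few lines of Python. This is only a minimal sketch: the function name `estimate_generation_time` is hypothetical, and the code assumes that elapsed time is counted in seconds, which the example values suggest but the slide does not state.

```python
from datetime import datetime, timedelta

def estimate_generation_time(arrival: datetime, elapsed_s: float) -> datetime:
    """Estimated generation time tp = ta - te (te assumed to be in seconds)."""
    return arrival - timedelta(seconds=elapsed_s)

# Waiting times accumulated on the two nodes of the example (0+2, then 2+4).
sojourn_times = [2, 4]
elapsed = sum(sojourn_times)  # te carried in the packet on arrival

t_a = datetime(2010, 6, 29, 10, 27, 15)  # arrival timestamp at the base station
t_p = estimate_generation_time(t_a, elapsed)
print(t_p)  # 2010-06-29 10:27:09
```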
Error Model
• Clock drift $\rho \in [-\hat{\rho}; +\hat{\rho}]$
  • Directly affects measurement of
    • Sampling period ps
    • Contribution to elapsed time te
  • Indirectly leads to inconsistencies between time stamp order (tp) and order of packet generation (s)
• Node reboots
  • Hard reboot: power cycle
  • Soft reboot: watchdog reset
  • Shortens the packet period (an interval < ps instead of ps)
  • Queue reset: waiting packets are dropped, leaving an empty queue (data loss)
• Packet duplicates
  • A lost 1-hop ACK triggers a retransmission
Formal System Model with Drift
Considering a single sensor node with incrementing packet index $i$:
• Sampling period: $p_s$
• Clock drift: $\rho(i) \in [-\hat{\rho}; +\hat{\rho}]$
• Packet generation time: $t_g(i) = t_0 + i \cdot p_s \cdot (1 + \rho)$
• Packet sequence number: $s(i) = i \bmod s_{max}$
• Sojourn time on node $n$: $t_s(i, n)$
• Elapsed time: $t_e(i) = \sum_n t_s(i, n)$
• Arrival time at base station: $t_a(i)$
• Estimated generation time: $t_p(i) = t_a(i) - t_e(i)$
• Error bound on generation time calculation: $|t_p(i) - t_g(i)| = \left| \sum_n [\, t_s(i, n) \cdot \rho(i) \,] \right| \le t_e(i) \cdot \hat{\rho}$
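The error bound translates directly into an interval around the estimated generation time. The sketch below is an illustration under stated assumptions, not the talk's implementation: the function name `generation_time_bounds` and all numeric values are hypothetical, times are plain seconds, and only the drift bound is used.

```python
def generation_time_bounds(t_a: float, t_e: float, rho_hat: float) -> tuple[float, float]:
    """Interval that contains t_g, given |t_p - t_g| <= t_e * rho_hat."""
    t_p = t_a - t_e           # estimated generation time
    margin = t_e * rho_hat    # worst-case error caused by clock drift
    return t_p - margin, t_p + margin

# Hypothetical values: arrival at t = 1000 s, 6 s elapsed, drift bound of 50 ppm.
t_l, t_u = generation_time_bounds(t_a=1000.0, t_e=6.0, rho_hat=50e-6)
print(t_l, t_u)
```

The interval widens linearly with the elapsed time, so packets that wait longer in the network carry a looser estimate of their generation time.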
Packet Analysis
Considering data of a single sensor node:
• Packet input format: (s, te, d, ta)
  • Sequence number s, elapsed time te, payload d, arrival time ta
• Packet output format: (s, te, d, ta, id, [tl, tu])
  • Unique packet identifier id reflects temporal order of generation
  • Bound on packet generation time [tl, tu]
Goals of packet analysis:
• Add information id, tl, tu to input packets that comply with the system and error model
• Classify all other packets as incorrect: they are witnesses for model violations
Analysis Concepts
• Remove uncertainty caused by sequence number s(i) = i mod smax
  • Assign packets to epochs
  • Determine unique packet id
• Determine upper and lower bounds on packet generation time, tg ∈ [tl, tu]
  • Use forward and backward reasoning
• Remove non-compliant packets
  • Duplicated packets
  • Empty generation time intervals
  • Incorrect epochs (duplicated s, too long)
Problems: clock drift, reboots
Separate Data into Epochs
• Epoch: packets generated between two consecutive resets of the sequence number
• Epoch center TC: timestamp of the (hypothetical) packet having sequence number smax/2
• Sequence numbers are unique within an epoch
• Nominal epoch duration: $\hat{T} = s_{max} \cdot p_s$
[Figure: timeline of consecutive epochs i, i+1, i+2 with their epoch centers]
Mapping of Packets to Epochs
[Figure: packets plotted by timestamp tp and sequence number s, with s ranging from 0 to smax]
Epoch Assignment with Reboots and Drift Ensure clear assignment of packets to epochs: Bound on elapsed time te:
Epoch Assignment Algorithm
Process packets from a single node:
• Order packets by generation timestamp tp
• Initialize algorithm: i = 0, epoch e(i) = 0
• If $t_p(i) - t_p(i-1) < L_{max} - L_{min} + 2 \cdot \hat{\rho} \cdot \hat{t}_e$
  • e(i) = e(i-1)
• Else if $t_p(i) - t_p(i-1) \ge L_{max} - L_{min} + 2 \cdot \hat{\rho} \cdot \hat{t}_e$
  • e(i) = e(i-1) + 1
• id(i) = e(i) · smax + s(i)
• Increment i
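The steps above can be sketched in Python as follows. This is a literal transcription of the rule on the slide, not a verified reimplementation: `assign_epochs`, the dictionary keys, and every numeric value are hypothetical, and Lmax and Lmin are simply taken as given parameters because the transcript does not define them.

```python
def assign_epochs(packets, s_max, l_max, l_min, rho_hat, te_hat):
    """Annotate packets of one node with an epoch number and a unique id."""
    # Threshold from the slide: Lmax - Lmin + 2 * rho_hat * te_hat.
    threshold = l_max - l_min + 2 * rho_hat * te_hat
    ordered = sorted(packets, key=lambda p: p["tp"])  # order by generation timestamp
    annotated = []
    epoch = 0
    prev_tp = None
    for p in ordered:
        # A timestamp gap at or above the threshold starts a new epoch.
        if prev_tp is not None and p["tp"] - prev_tp >= threshold:
            epoch += 1
        annotated.append({**p, "epoch": epoch, "id": epoch * s_max + p["s"]})
        prev_tp = p["tp"]
    return annotated

# Made-up input: sequence numbers wrap after s_max = 4 packets.
packets = [
    {"s": 0, "tp": 0.0}, {"s": 1, "tp": 1.0}, {"s": 2, "tp": 2.0},
    {"s": 3, "tp": 3.0}, {"s": 0, "tp": 10.0}, {"s": 1, "tp": 11.0},
]
result = assign_epochs(packets, s_max=4, l_max=10.0, l_min=5.0,
                       rho_hat=50e-6, te_hat=100.0)
print([p["id"] for p in result])  # [0, 1, 2, 3, 4, 5]
```

Because id(i) = e(i) · smax + s(i), two packets that share a sequence number but fall into different epochs receive different identifiers.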
Packet Analysis
Processing pipeline: SN Data → Duplicate Filter → Duplicate-free Data → Epoch Assignment → Epochs
• The duplicate filter removes duplicates
• Under the given system and error model, the output is split into violating data and correct data with annotated id
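The slides do not say how the duplicate filter recognizes duplicates. The sketch below is therefore an assumption, not the method from the talk: it treats a packet as a duplicate when an earlier arrival has the same sequence number and payload and an estimated generation time within a hypothetical tolerance `window`. The name `filter_duplicates` and the example values are also made up.

```python
def filter_duplicates(packets, window):
    """Split the packets of one node into (kept, duplicates)."""
    kept, duplicates = [], []
    last_tp = {}  # (s, d) -> estimated generation time of the last kept packet
    for p in sorted(packets, key=lambda p: p["ta"]):  # process in arrival order
        key = (p["s"], p["d"])
        tp = p["ta"] - p["te"]  # estimated generation time tp = ta - te
        if key in last_tp and abs(tp - last_tp[key]) <= window:
            duplicates.append(p)  # same content, same generation time: retransmission
        else:
            kept.append(p)
            last_tp[key] = tp
    return kept, duplicates

packets = [
    {"s": 1, "te": 6.0, "d": "abcd", "ta": 100.0},
    {"s": 1, "te": 8.0, "d": "abcd", "ta": 102.0},  # retransmitted copy, same tp
    {"s": 2, "te": 5.0, "d": "ef", "ta": 130.0},
]
kept, duplicates = filter_duplicates(packets, window=1.0)
print(len(kept), len(duplicates))  # 2 1
```

Comparing generation times as well as content keeps packets apart that reuse a sequence number in a later epoch and happen to carry the same payload.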
Epochs: Known Good Network Operation
[Figure: equally spaced epoch centers]
Epochs: Unstable Network Operation
[Figure: epoch centers over time, with annotations for a phase shift due to a reset, the expected distance between epoch centers, and unexpected data]
Conclusions and Outlook
• Data integrity testing and order reconstruction based on a system and error model of a real system
• Give guarantees on data quality
  • Duplicate-free data
  • Correct temporal order of generation
  • Correct logical ordering
• Improve analysis method and system model
  • Reduce unexplained packets
  • Integrate results of data filtering based on physical models
[Figure labels: Physical values, Temporal order, Logical order]