Toward Community Sensing. Andreas Krause, Carnegie Mellon University. Joint work with Eric Horvitz, Aman Kansal, Feng Zhao, Microsoft Research. Information Processing in Sensor Networks | April 24, 2008.
Motivation: Traffic monitoring • Deployed sensors (detector loops, traffic cameras) give high-accuracy speed data • What about 148th Ave? • How can we get accurate road speed estimates everywhere?
Cars as traffic sensors • Many cars have Personal Navigation Devices (PNDs) • Know exact location and speed! • Fuse GPS, map information, engine speed, … • Modern PNDs have network connection Can use cars as speed sensors! Example: Dash Express (GPS + GPRS/WiFi)
Community Sensing Vision • Realize the full potential of population-owned sensors • Must respect privacy and preferences about sharing! • Common goals: estimate spatial phenomena (traffic, weather, …), construct 3D cities, news coverage [Diagram labels: SenseWeb; privately held sensors; contribute sensor data; request data]
Privacy concern of GPS traces • Dense GPS traces make it possible to identify people’s locations, activities, intents, etc. • Even anonymization or strong obfuscation doesn’t help. • Key idea: Avoid dense sampling! Need to predict from sparse samples (Images courtesy of John Krumm)
Phenomenon modeling • Which segments should we monitor? • (Normalized) speeds as random variables • Joint distribution allows modeling correlations • Can predict unmonitored speeds from monitored speeds using P(S5 | S1, S9) [Figure: road network with segments s1–s12]
Minimizing uncertainty • Can estimate prediction error at segment Si: Var(Si | SA = sA) • Expected error at segment Si: Var(Si | SA), the expectation of Var(Si | SA = sA) over the observations sA • Expected mean squared error: EMSE(A) = Σ_i Var(Si | SA) • A* = argmin_{|A| ≤ k} EMSE(A) • Does not take “importance” of Si into account [Figure: with A = {S1, S2, S3, S4}, two different sets of observed speeds (s1=.5, s2=.6, s3=.8, s4=.6 and s1=.9, s2=1, s3=1, s4=1) give different P(S5 | sA), with Var(S5 | sA) = .01 and .1; segments are labelled less travelled / frequently travelled, and conditional variances for S5, S6, S7 are added up]
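A minimal sketch of EMSE(A), assuming the speeds are jointly Gaussian (in which case Var(Si | SA = sA) does not depend on the observed values sA). The 3-segment covariance matrix `COV` and the helper names are hypothetical, chosen only for illustration; conditioning is done one observed segment at a time.

```python
from itertools import combinations

# Hypothetical covariance of 3 normalized segment speeds (illustration only).
COV = [[1.0, 0.8, 0.5],
       [0.8, 1.0, 0.8],
       [0.5, 0.8, 1.0]]


def condition_on(cov, a):
    """Covariance of jointly Gaussian variables after observing variable a."""
    n = len(cov)
    pivot = cov[a][a]
    return [[cov[i][j] - cov[i][a] * cov[a][j] / pivot for j in range(n)]
            for i in range(n)]


def emse(cov, observed):
    """EMSE(A): sum over all segments i of Var(S_i | S_A)."""
    post = cov
    for a in observed:
        post = condition_on(post, a)
    return sum(post[i][i] for i in range(len(post)))


def best_subset(cov, k):
    """Brute-force argmin over |A| <= k; only feasible for tiny examples."""
    n = len(cov)
    candidates = [c for r in range(k + 1) for c in combinations(range(n), r)]
    return min(candidates, key=lambda a: emse(cov, a))


print(emse(COV, []))        # prior uncertainty
print(emse(COV, [1]))       # after monitoring the middle segment
print(best_subset(COV, 1))  # index of the single most informative segment
```

In this toy matrix the middle segment is correlated with both neighbours, so monitoring it removes the most variance.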
Taking demand into account • Model demand Di as random variables (e.g., Poisson), e.g., Di = #cars on segment Si • Demand-weighted MSE: DMSE(A) = Σ_i E[Di] Var(Si | SA) • Error reduction: R(A) = DMSE(∅) − DMSE(A) • Want: A* = argmax_{|A| ≤ k} R(A) • NP-hard optimization problem [Figure: example demands (50, 10, 200) and conditional variances (.08, .1, .3) on segments S5–S7, combined into a demand-weighted sum]
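A small worked example of how demand weighting can change which selection looks best. All numbers below (expected demands, prior and posterior variances, and the two candidate selections X and Y) are hypothetical and not taken from the talk's data.

```python
def dmse(expected_demand, variance):
    """DMSE = sum_i E[D_i] * Var(S_i | S_A) for given per-segment variances."""
    return sum(d * v for d, v in zip(expected_demand, variance))


def error_reduction(expected_demand, prior_var, posterior_var):
    """R(A) = DMSE(empty set) - DMSE(A)."""
    return dmse(expected_demand, prior_var) - dmse(expected_demand, posterior_var)


# Hypothetical values for three segments.
DEMAND = [50, 10, 200]      # E[D_i]
PRIOR = [1.0, 1.0, 1.0]     # Var(S_i) with nothing observed
POST_X = [0.1, 0.1, 0.9]    # Var(S_i | S_A) under selection X
POST_Y = [0.9, 0.6, 0.3]    # Var(S_i | S_A) under selection Y

# Unweighted, X leaves less total variance than Y ...
print(sum(POST_X), sum(POST_Y))
# ... but Y reduces error where the demand is, so its R(A) is larger.
print(error_reduction(DEMAND, PRIOR, POST_X))
print(error_reduction(DEMAND, PRIOR, POST_Y))
```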
Selecting informative locations • Greedy algorithm: A ← ∅; for i = 1:k do: s* = argmax_s R(A ∪ {s}); A ← A ∪ {s*} • How well does this heuristic do? [Figure: road network with the greedily selected segments highlighted]
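The greedy loop above can be sketched in a few lines. The utility used here is a hypothetical stand-in (demand-weighted coverage of neighbouring segments), not the error reduction R(A) computed from the speed model; `COVERS` and `DEMAND` are made-up inputs.

```python
# Hypothetical stand-in: monitoring a segment "explains" itself and neighbours.
COVERS = {
    "s1": {"s1", "s2"},
    "s2": {"s1", "s2", "s3"},
    "s3": {"s2", "s3", "s4"},
    "s4": {"s3", "s4", "s5"},
    "s5": {"s4", "s5"},
}
DEMAND = {"s1": 50, "s2": 10, "s3": 200, "s4": 30, "s5": 80}


def utility(selected):
    """Total demand on segments explained by the selected ones."""
    explained = set()
    for s in selected:
        explained |= COVERS[s]
    return sum(DEMAND[s] for s in explained)


def greedy_select(candidates, utility, k):
    """A <- empty; repeat k times: add the s maximizing utility(A + [s])."""
    selected = []
    for _ in range(k):
        remaining = [s for s in candidates if s not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda s: utility(selected + [s]))
        selected.append(best)
    return selected


chosen = greedy_select(list(COVERS), utility, 2)
print(chosen, utility(chosen))
```

Each round re-evaluates the utility for every remaining candidate, so the cost is about k times the number of candidates utility evaluations.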
Diminishing returns • Selection A: observing a new location s’ helps a lot (large improvement) • Selection B, with A ⊆ B: observing s’ doesn’t help much (small improvement) • Utility R(A) is submodular*! • Submodularity: for A ⊆ B, F(A ∪ {s’}) − F(A) ≥ F(B ∪ {s’}) − F(B) • *See store for details
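The diminishing-returns inequality can be checked by brute force on a tiny ground set. This is only a sanity-check sketch: it enumerates all pairs A ⊆ B, so it is exponential in the number of elements. The coverage function and `SETS` are hypothetical examples.

```python
from itertools import combinations

# Hypothetical sets covered by three candidate observations.
SETS = {"X": {1, 2, 3, 4}, "Y": {1, 2, 5}, "Z": {3, 4, 6}}


def coverage(chosen):
    """Number of items covered by the chosen observations (submodular)."""
    covered = set()
    for name in chosen:
        covered |= SETS[name]
    return len(covered)


def subsets(items):
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(ground, f):
    """Check f(A+s) - f(A) >= f(B+s) - f(B) for all A <= B and s not in B."""
    ground = frozenset(ground)
    for b in subsets(ground):
        for a in subsets(b):
            for s in ground - b:
                gain_a = f(a | {s}) - f(a)
                gain_b = f(b | {s}) - f(b)
                if gain_a < gain_b - 1e-12:
                    return False
    return True


print(is_submodular(SETS, coverage))                 # coverage has diminishing returns
print(is_submodular(SETS, lambda a: len(a) ** 2))    # increasing returns: not submodular
```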
Why is submodularity useful? • Theorem [Nemhauser et al. ’78]: the greedy algorithm gives a constant-factor approximation, F(A_greedy) ≥ (1 − 1/e) F(A_opt), i.e., ~63% of optimal • Greedy algorithm gives a near-optimal set of locations to observe • But: we have no control over where the sensors (cars, cell phones) are going to be!
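To see the bound on a concrete case, the sketch below compares greedy selection with the brute-force optimum on a small, hypothetical coverage instance chosen so that greedy is suboptimal. The instance and names are illustrative assumptions; the guarantee in the theorem applies to monotone submodular utilities such as this one.

```python
import math
from itertools import combinations

# Hypothetical instance: X looks best alone, but Y and Z together cover more.
SETS = {"X": {1, 2, 3, 4}, "Y": {1, 2, 5}, "Z": {3, 4, 6}}


def coverage(chosen):
    covered = set()
    for name in chosen:
        covered |= SETS[name]
    return len(covered)


def greedy(k):
    chosen = []
    for _ in range(k):
        rest = [n for n in SETS if n not in chosen]
        chosen.append(max(rest, key=lambda n: coverage(chosen + [n])))
    return chosen


def optimum(k):
    """Exhaustive search over all subsets of size k (tiny instances only)."""
    return max(combinations(SETS, k), key=coverage)


g, o = greedy(2), optimum(2)
ratio = coverage(g) / coverage(o)
print(g, coverage(g), o, coverage(o))
print(ratio >= 1 - 1 / math.e)  # the greedy value stays above (1 - 1/e) of optimal
```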
Querying a roving sensor • How can we cope with uncertain sensor availability? • Query → Response: “I’m at S2, going 55 mph” (s2 = .9) • Query → No response (no data)
Modeling sensor availability • Set W of observations (cars) we can select from • If we select car Cj, we observe Si with probability P(i | Cj) • Observations W = {C1,…,Cm}; pick B ⊆ W • Road segments V = {S1,…,Sn}; random A ⊆ V drawn from P(A | B), with utility R(A) • Goal: maximize expected utility: B* = argmax_{|B| ≤ k} Σ_A P(A | B) R(A)
Optimizing community sensing • Lemma: Whenever R(A) is submodular, the function F(B) = Σ_{A: |A| ≤ k} P(A | B) R(A) is submodular • Can use the greedy algorithm to optimize selection • But F(B) is a sum over exponentially many terms • Theorem: For any ε, δ > 0, can find a set B’ such that F(B’) ≥ (1 − 1/e) max_{|B| ≤ k} F(B) − ε with probability 1 − δ, using independent samples of R(A)
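Because F(B) sums over exponentially many outcomes A, it can be approximated by averaging R(A) over sampled outcomes. The sketch below assumes each selected car independently reports at most one segment drawn from P(i | Cj); the availability table, the demand-based utility and the sample count are hypothetical and do not reproduce the sample bound of the theorem.

```python
import random

# Hypothetical availability model P(i | Cj); None means "no response".
AVAILABILITY = {
    "c1": {"s1": 0.5, None: 0.5},
    "c2": {"s2": 0.7, "s3": 0.3},
    "c3": {"s1": 1.0},
}
DEMAND = {"s1": 10.0, "s2": 4.0, "s3": 6.0}  # hypothetical stand-in for R


def utility(observed):
    """Stand-in R(A): total demand on the distinct observed segments."""
    return sum(DEMAND[s] for s in observed)


def sample_observed(cars, rng):
    """Draw one random outcome A given the selected cars B."""
    observed = set()
    for car in cars:
        segments = list(AVAILABILITY[car])
        weights = [AVAILABILITY[car][s] for s in segments]
        seg = rng.choices(segments, weights=weights)[0]
        if seg is not None:
            observed.add(seg)
    return observed


def estimate_f(cars, n_samples=4000, seed=0):
    """Monte Carlo estimate of F(B) = E[R(A) | B]."""
    rng = random.Random(seed)
    total = sum(utility(sample_observed(cars, rng)) for _ in range(n_samples))
    return total / n_samples


print(estimate_f(["c1"]))        # close to 0.5 * 10
print(estimate_f(["c2"]))        # close to 0.7 * 4 + 0.3 * 6
print(estimate_f(["c1", "c3"]))  # s1 is always observed via c3
```

A greedy selection over cars would call `estimate_f` in place of the exact F(B) when comparing candidate additions.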
Handling user preferences • Need to respect user preferences • “Sample my speed at most once per day” • “Don’t measure my speed for the next hour” • “Never sample close to my home” • “Wait at least 10 minutes between samples” • Can accommodate preferences using constrained optimization: B* = argmax_B F(B) subject to C(B) ≤ L, where C is a complex cost function and L is the sensing budget • Can still get near-optimal solutions (details in paper)
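As a rough illustration of what the quoted preferences mean operationally, the sketch below encodes three of them as a feasibility check applied before a car is considered for querying. This is a simplified stand-in, not the constrained optimization with cost function C(B) described in the paper; the `Preferences` fields and the `may_probe` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Preferences:
    """Hypothetical per-user sharing preferences."""
    max_probes_per_day: int = 1                  # "at most once per day"
    min_gap_minutes: float = 10.0                # "wait at least 10 minutes"
    blocked_segments: frozenset = frozenset()    # "never sample close to my home"


def may_probe(prefs, probes_today, now, segment):
    """Return True if probing now respects the user's preferences.

    probes_today: times (minutes since midnight) of earlier probes today.
    now: current time in minutes since midnight.
    segment: road segment the car is expected to be on.
    """
    if segment in prefs.blocked_segments:
        return False
    if len(probes_today) >= prefs.max_probes_per_day:
        return False
    if probes_today and now - max(probes_today) < prefs.min_gap_minutes:
        return False
    return True


prefs = Preferences(max_probes_per_day=2, blocked_segments=frozenset({"home"}))
print(may_probe(prefs, [], 480, "s1"))      # no earlier probes today
print(may_probe(prefs, [475], 480, "s1"))   # only 5 minutes since last probe
print(may_probe(prefs, [], 480, "home"))    # blocked location
```

A selection routine could then restrict its candidate set to the cars for which `may_probe` returns True.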
Community Sensing Summary • Optimize value of probing roving sensors • Utility (expected error reduction) • Demand (usage: “utilitarian” impact) • Sensor availability: predict location based on history • Preferences: abide by preferences, e.g., frequency / number of probes, min. inter-probe interval; other constraints, e.g., “Not near my home!” [Diagram labels: Phenomenon; Demand; Availability & Preferences]
Phenomenon modeling • 3 months of data from 534 segments across 7 highways and interstates near Seattle, WA • Samples at 15 minute intervals • Use Gaussian Process to model road speeds (covariance function based on road network topology) • Can compute utility R(A) in closed form!
Demand modeling • Demand = #cars on road segment • Estimate demand based on 3166 ClearFlow route requests [Figure: expected demand (rush hour)]
Evaluating model accuracy • Accurate estimation of prediction error! [Plot: demand-weighted RMS vs. number of locations; lower is better]
Demand-driven querying • 65% error reduction using only 10 (of 534) observations! • Optimized sensing requires 10x fewer samples! [Plot: lower is better]
Availability modeling • Microsoft Multiperson Location Survey (MSMLS) [Krumm ’06] • GPS traces from 85 drivers, 6+ days each • Associate GPS readings with road segments (“map matching”) • Two models of sensor availability: spatial obfuscation and sparse querying [Image: GPS used in MSMLS]
Spatial obfuscation • Motivation: Privacy through enforcing uncertainty about sensor location • Community Sensing Service → population of sensors: request road speed at some location in area X • Population of sensors → Community Sensing Service: anonymized response from a random car in cell X (if available)
Spatial obfuscation • Discretization ≈ utility / privacy knob • High accuracy even with coarse discretization [Plot: lower is better]
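A minimal sketch of the cell-based query described on the two slides above, assuming a square grid over planar coordinates. The grid scheme, the car records and the function names are hypothetical; `cell_size` plays the role of the discretization knob.

```python
import random


def cell_of(x, y, cell_size):
    """Map a position to the index of its grid cell."""
    return (int(x // cell_size), int(y // cell_size))


def query_cell(cars, cell, cell_size, rng):
    """Return an anonymized speed reading from a random car in the cell.

    The response carries only the cell and the speed: no car ID and no
    exact position. Returns None if no car is currently in the cell.
    """
    in_cell = [c for c in cars if cell_of(c["x"], c["y"], cell_size) == cell]
    if not in_cell:
        return None
    car = rng.choice(in_cell)
    return {"cell": cell, "speed": car["speed"]}


# Hypothetical car positions (km) and speeds (mph).
CARS = [
    {"id": "a", "x": 0.5, "y": 0.5, "speed": 60},
    {"id": "b", "x": 3.2, "y": 0.1, "speed": 30},
]
rng = random.Random(0)
print(query_cell(CARS, (0, 0), 1.0, rng))  # fine grid: only car "a" is in the cell
print(query_cell(CARS, (0, 0), 4.0, rng))  # coarse grid: either car may answer
print(query_cell(CARS, (5, 5), 1.0, rng))  # empty cell: no data
```

Larger cells hide the responding car among more candidates but tie the reading less precisely to a road segment.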
Obfuscation by sparse querying • Associate roving sensors with anonymous IDs • Learn an availability model for each sensor from data • Community Sensing Service → population of sensors: request road speed and location from car Ci • Population of sensors → Community Sensing Service: response from car Ci (if connected to the network)
Obfuscation by sparse monitoring • Biggest difference in the “important” part of the curve • 50% error reduction over mean if querying 10 “cars” [Plot: lower is better]
Mobile vs. fixed sensors • When does it “pay off” to use mobile vs. fixed sensors? • Experiment: cost C(B) = #fixed(B) + #mobile(B); max F(B) s.t. C(B) ≤ L (fixed budget) • Mobile sensors pay off if fixed sensors are 4x as expensive
Extensions / Future work • Spatio-temporal models (see paper) • How to quickly learn good models (see paper) • Other applications: • Population fitness? • News coverage? • Reconstruction of 3D cities? • Formal privacy guarantees?
Related work • Travel time estimation using cell phones [Wunnava et al ’07] • Privacy-aware querying of cars with GPS & cell phones [Bayen et al ’08, forthcoming] • Spatial monitoring, experimental design etc. (see paper)
Conclusions • Presented an integrated approach to community sensing (phenomenon, demand, availability & preferences) • Theoretical analysis: near-optimal sensing policies • Extensive empirical evaluation on a traffic monitoring case study