Nonmyopic Adaptive Informative Path Planning for Multiple Robots
Amarjeet Singh (UCLA), Andreas Krause (Caltech), William Kaiser (UCLA)
rsrg@caltech ..where theory and practice collide

Monitoring rivers and lakes [IJCAI ’07]
• Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, …
• Use robotic sensors to cover large areas, e.g., NIMS (Kaiser et al., UCLA)
• Can only make a limited number of measurements, so we must predict at unobserved locations
• Where should we sense to get the most accurate predictions?
[Figure: actual and predicted temperature (color) by depth and location across the lake]

Urban Search & Rescue
How can we coordinate multiple search & rescue helicopters to quickly locate moving survivors?
[Figure: detection range and detected survivors]

Related work
Information gathering problems have been considered in
• Experimental design (Lindley ’56, Robbins ’52, …)
• Value of information (Howard ’66)
• Spatial statistics (Cressie ’91, …)
• Machine learning (MacKay ’92, …)
• Robotics (Sim & Roy ’05, …)
• Sensor networks (Zhao et al. ’04, …)
• Operations research (Nemhauser ’78, …)
Existing algorithms are typically either
• Heuristics: no guarantees! Can do arbitrarily badly.
• Optimal solvers (mixed integer programming, POMDPs): very difficult to scale to bigger problems.
Want algorithms that have theoretical guarantees and scale to large problems!

How to quantify collected information?
• Sensing quality function F(A) assigns a utility to a set A of locations, e.g., the expected reduction in MSE for predictions based on a GP model
• Example: F(A1) = 4, F(A2) = 10
• Want to pick sensing locations A ⊆ V to maximize F(A)

Selecting sensing locations
Given: finite set V of locations
Want: A* ⊆ V such that A* = argmax F(A) subject to |A| ≤ k
Typically NP-hard!
Greedy algorithm:
  Start with A = ∅
  For i = 1 to k
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}
How well does the greedy algorithm do?
[Figure: locations G1, G2, G3, G4]

Key observation: Diminishing returns
• Selection A = {Y1, Y2}: adding a new observation Y’ will help a lot (large improvement)
• Selection B = {Y1, …, Y5}: adding Y’ doesn’t help much (small improvement)
Submodularity: for A ⊆ B, F(A ∪ {Y’}) − F(A) ≥ F(B ∪ {Y’}) − F(B)
Many sensing quality functions are submodular*:
• Information gain [Krause & Guestrin ’05]
• Expected mean squared error [Das & Kempe ’08]
• Detection time / likelihood [Krause et al. ’08]
• …
*See paper for details

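The diminishing-returns inequality can be checked by brute force on a small ground set. The sketch below is only an illustration: the coverage data in `COVERS` and the helper names `coverage` and `is_submodular` are assumptions made for this example and do not come from the talk.

```python
from itertools import combinations

# Hypothetical data: which regions each candidate observation covers.
COVERS = {"Y1": {1, 2}, "Y2": {2, 3}, "Y3": {3, 4}, "Y4": {1, 4}}


def coverage(A):
    """F(A): number of distinct regions covered by the observations in A."""
    covered = set()
    for s in A:
        covered |= COVERS[s]
    return len(covered)


def is_submodular(ground, F):
    """Check F(A + y) - F(A) >= F(B + y) - F(B) for all A subset of B, y not in B."""
    items = sorted(ground)
    subsets = [frozenset(c)
               for r in range(len(items) + 1)
               for c in combinations(items, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for y in items:
                if y in B:
                    continue
                gain_small = F(A | {y}) - F(A)
                gain_large = F(B | {y}) - F(B)
                if gain_small < gain_large:
                    return False
    return True


coverage_ok = is_submodular(COVERS, coverage)
squared_ok = is_submodular(COVERS, lambda A: len(A) ** 2)
```

The coverage function passes the check, while the set-size-squared function fails it because its marginal gains grow as the set grows. The exhaustive check is exponential in the number of locations, so it is only practical for tiny examples.
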
Selecting sensing locations: how well does greedy do?
Greedy algorithm: start with A = ∅; for i = 1 to k, set s* := argmax_s F(A ∪ {s}) and A := A ∪ {s*}
Theorem [Nemhauser et al. ’78]: for monotone submodular F, F(A_G) ≥ (1 − 1/e) F(OPT), where A_G is the set selected by the greedy algorithm
Greedy near-optimal!

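The greedy rule above can be sketched in a few lines of Python. This is a minimal sketch, assuming a toy coverage objective; the data in `COVERS` and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical data: regions observed from each candidate location.
COVERS = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5, 6, 7},
    "s4": {1, 7},
}


def coverage(A):
    """F(A): number of distinct regions observed from the locations in A."""
    covered = set()
    for s in A:
        covered |= COVERS[s]
    return len(covered)


def greedy_select(candidates, F, k):
    """Pick k locations, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        current = F(selected)
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining),
                   key=lambda s: F(selected + [s]) - current)
        selected.append(best)
        remaining.remove(best)
    return selected


picked = greedy_select(COVERS, coverage, k=2)
```

Here greedy first picks `s3` (gain 4) and then `s1` (gain 3), covering all 7 regions. Each round evaluates F once per remaining candidate, so the sketch needs on the order of k·|V| evaluations of F.
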
Challenges for informative path planning
Use robots to monitor the environment. It is not enough to select the best k locations A for a given F(A). We need to
• take into account the cost of traveling between locations
• cope with environments that change over time
• efficiently coordinate multiple agents
Want to scale to very large problems and have guarantees

Outline and Contributions
• Path constraints
• Dynamic environments
• Multi-robot coordination

Informative path planning
So far: max F(A) s.t. |A| ≤ k
• Most informative locations might be far apart!
• Robot needs to travel between selected locations
• Locations V are nodes in a graph
• C(A) = cost of the cheapest path connecting the nodes A
Path-constrained problem: max F(A) s.t. C(A) ≤ B
• Greedy algorithm fails arbitrarily badly!
• Known as the submodular orienteering problem. Best known algorithms (Chekuri & Pal ’05, Singh et al. ’07) are superpolynomial!
Can we exploit additional structure to get better algorithms?
[Figure: graph over locations s1, s2, s3, s4, s5, s10, s11 with edge costs 1 and 2]

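To make the path-constrained problem concrete, the sketch below solves a four-node instance by brute force. Everything in it is an assumption for illustration: the distances in `DIST`, the additive rewards in `VALUE`, and the helper names. It is not one of the algorithms cited on the slide, and it only works for tiny instances.

```python
from itertools import combinations, permutations

# Hypothetical symmetric travel costs (assumed to be shortest-path distances).
DIST = {("a", "b"): 1, ("a", "c"): 4, ("a", "d"): 5,
        ("b", "c"): 1, ("b", "d"): 4, ("c", "d"): 1}
# Hypothetical additive rewards: the two best locations, a and d, are far apart.
VALUE = {"a": 5, "b": 1, "c": 1, "d": 5}


def dist(u, v):
    if u == v:
        return 0
    return DIST[(u, v)] if (u, v) in DIST else DIST[(v, u)]


def path_cost(nodes):
    """C(A): cost of the cheapest path visiting every node in A (brute force)."""
    nodes = sorted(nodes)
    if len(nodes) <= 1:
        return 0
    return min(sum(dist(p[i], p[i + 1]) for i in range(len(p) - 1))
               for p in permutations(nodes))


def F(A):
    """Toy sensing quality: sum of the rewards of the selected locations."""
    return sum(VALUE[s] for s in A)


def best_within_budget(ground, budget):
    """Enumerate all subsets and keep the best one with C(A) <= budget."""
    best, best_val = frozenset(), 0
    items = sorted(ground)
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            if path_cost(c) <= budget and F(c) > best_val:
                best, best_val = frozenset(c), F(c)
    return best, best_val


best_set, best_val = best_within_budget(VALUE, budget=2)
```

With a budget of 2, the two highest-reward locations `a` and `d` cannot both be visited (their path costs 5), so the best feasible set is `{a, b, c}` with value 7. A selection rule that ignores travel cost would aim for `a` and `d` and exceed the budget.
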
Additional structure: Locality
• If A, B are observation sets close by, then F(A ∪ B) < F(A) + F(B)
• If A, B are observation sets at least r apart, then F(A ∪ B) ≈ F(A) + F(B)
• Sensors that are far apart are approximately independent
• Holds for many objective functions (e.g., GPs with decaying covariance)
• We showed locality is empirically valid! [We only assume F(A ∪ B) ≥ γ (F(A) + F(B))]
Call such an F (r,γ)-local
[Figure: observation sets A1, A2 with utility F(A) and B1, B2 with utility F(B), separated by distance r]

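A small numeric illustration of locality, under stated assumptions: a Gaussian process with unit prior variance, a squared-exponential covariance, noise variance 0.1, and information gain as F. These modeling choices and all function names are assumptions for this sketch; the talk only says that GPs with decaying covariance have this property. For two single-sensor sets the information gain has a closed form via a 2×2 determinant.

```python
import math


def kernel(x, y, length_scale=1.0):
    """Squared-exponential covariance; decays with the distance |x - y|."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)


def info_gain_single(noise=0.1):
    """F({x}) = 0.5 * log(1 + k(x, x) / noise) with k(x, x) = 1."""
    return 0.5 * math.log(1.0 + 1.0 / noise)


def info_gain_pair(x, y, noise=0.1):
    """F({x, y}) = 0.5 * log det(I + K / noise) for the 2x2 covariance K."""
    a = 1.0 + 1.0 / noise
    b = kernel(x, y) / noise
    return 0.5 * math.log(a * a - b * b)


def locality_ratio(x, y, noise=0.1):
    """F(A u B) / (F(A) + F(B)) for A = {x}, B = {y}."""
    return info_gain_pair(x, y, noise) / (2.0 * info_gain_single(noise))


near = locality_ratio(0.0, 0.5)   # sensors close together
far = locality_ratio(0.0, 10.0)   # sensors far apart
```

For nearby sensors the ratio is clearly below 1 (roughly 0.78 with these settings), and for distant sensors it is numerically 1, which mirrors the two bullets above. The ratio plays the role of γ for this pair of sets.
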
The pSPIEL_OR Algorithm
Based on the sensor placement algorithm by Krause, Guestrin, Gupta, Kleinberg (IPSN ’06)
pSPIEL: efficient nonmyopic algorithm (padded Sensor Placements at Informative and cost-Effective Locations)
• Select starting and ending locations s1 and sB
• Decompose the sensing region into small, well-separated clusters
• Solve the cardinality-constrained problem per cluster (greedy)
• Combine the solutions using an orienteering algorithm
• Smooth the resulting path
[Figure: clusters C1 to C4 with greedy selections g1,1 … g4,4, connected by a path from s1 to sB]

Guarantees for pSPIEL_OR
Based on results by Krause, Guestrin, Gupta, Kleinberg (IPSN ’06)
Theorem: For (r,γ)-local submodular F, pSPIEL finds a path A with
• submodular utility F(A) ≥ Ω(γ) OPT_F
• path length C(A) ≤ O(r) OPT_C
*See paper for details

pSPIEL Results: Search & Rescue
Sensor Planning Research Challenge
• Coordination of multiple mobile sensors to detect survivors of a major urban disaster
• Buildings obstruct the viewfield of the camera
• F(A) = expected # of people detected
pSPIEL outperforms existing algorithms for informative path planning
[Figure: expected number of survivors rescued vs. number of timesteps (0 to 50) for pSPIEL, Greedy, and Heuristic (Chao et al.); the map shows rescue range, detection range, detected survivors, and rescued survivors]

Outline and Contributions
• Path constraints: pSPIEL_OR exploits (r,γ)-locality to near-optimally solve submodular orienteering
• Dynamic environments
• Multi-robot coordination

Dynamic environments
So far: max_A F(A) s.t. C(A) ≤ B
• Assumes we know the sensing quality F in advance
• Plans a fixed (nonadaptive) path / placement A
In practice:
• Model unknown; need to learn as we go
• Environment changes dynamically
Active learning: find an adaptive policy that modifies the solution based on observations
• Gigantic POMDP (intractable)
Can we efficiently find a good solution?

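Sequential sensing
• A sensing policy π is a tree: it picks the next location to observe (X5 = ?, then X3 = ?, X2 = ?, X7 = ?, …) depending on the values seen so far (e.g., X5 = 17 or X5 = 21)
• Each path through the tree has a utility, e.g., F(X5=17, X3=16, X7=19) = 3.4; other branches give F(…) = 2.1 and F(…) = 2.4
• F(π) = 3.1 is the expected utility over the outcomes of the observations
Want to pick sensing policy π to maximize F(π)
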
NAÏVE Algorithm [Singh, K, Kaiser, IJCAI ’09]
At each timestep t
• Plan nonadaptive solution A* = argmax_A F_t(A) (efficient, e.g., using pSPIEL)
• Execute first step of nonadaptive solution
• Receive observations obs
• Update sensing quality F_{t+1}(A) = F_t(A | obs) for all A
Defines a Nonmyopic Adaptive informatIVE policy: NAIVE
How well does this policy compare to the optimal policy?

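The plan, execute, observe, update loop can be sketched as follows. This is a toy stand-in, not the authors' implementation: it assumes a single hidden target, a noise-free detector, and an additive F_t equal to the belief mass on the planned cells, so the nonadaptive planner reduces to picking the top-k cells. In the talk the planner would be pSPIEL. All names are hypothetical.

```python
def plan_nonadaptive(belief, k):
    """Stand-in planner: for this additive F_t, the top-k cells by belief are optimal."""
    return sorted(belief, key=lambda c: (-belief[c], c))[:k]


def update(belief, cell, found):
    """F_{t+1} = F_t(. | obs): Bayes update for a noise-free detector."""
    if found:
        return {c: (1.0 if c == cell else 0.0) for c in belief}
    rest = {c: (0.0 if c == cell else p) for c, p in belief.items()}
    total = sum(rest.values())  # assumes the target has positive prior mass
    return {c: p / total for c, p in rest.items()}


def run_naive(belief, target, horizon, k=2):
    """Replan at every timestep and execute only the first step of each plan."""
    visited = []
    for _ in range(horizon):
        plan = plan_nonadaptive(belief, k)    # plan A* = argmax F_t(A)
        cell = plan[0]                        # execute the first step only
        found = (cell == target)              # receive observation
        visited.append(cell)
        belief = update(belief, cell, found)  # update sensing quality
        if found:
            break
    return visited


prior = {"c1": 0.5, "c2": 0.3, "c3": 0.2}
trace = run_naive(prior, target="c3", horizon=5)
```

Each iteration discards the rest of the previous plan and replans from the updated belief, which is what makes the overall policy adaptive even though each individual plan is nonadaptive.
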
Guarantees for NAÏVE-pSPIEL [Singh, K, Kaiser, IJCAI ’09]
Theorem (see paper for details): At every timestep t it holds that
F_t(NAIVE) = Ω(1) F_t(OPT) − O(H(Θ | obs))
• F_t(OPT): value of the optimal policy OPT
• H(Θ | obs): uncertainty in model parameters
• Application specific
Key idea: replace F_t by G_t(π) = F_t(π) + λ I(Θ | π), where λ ≥ 0 is a learning rate parameter
Need to trade off exploration (reducing H(Θ)) and exploitation (maximizing F(A))

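The effect of the learning rate parameter in the augmented objective G_t = F_t + λ·I can be illustrated with two candidate plans. The utility and information values below are made-up numbers, and the names `CANDIDATES`, `augmented_score`, and `pick` are hypothetical.

```python
# Hypothetical (F_t, information about model parameters) for two candidate plans.
CANDIDATES = {
    "exploit": (3.0, 0.2),   # high immediate utility, learns little
    "explore": (2.0, 2.5),   # lower utility, learns a lot about the model
}


def augmented_score(utility, info, lam):
    """G_t = F_t + lam * I: utility plus a weighted information bonus."""
    return utility + lam * info


def pick(lam):
    """Return the candidate plan with the highest augmented score."""
    return max(sorted(CANDIDATES),
               key=lambda name: augmented_score(*CANDIDATES[name], lam))


choice_no_bonus = pick(0.0)
choice_with_bonus = pick(0.5)
```

With λ = 0 the planner ignores information and chooses the exploiting plan; with λ = 0.5 the information bonus tips the choice toward exploration (3.25 against 3.1).
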
Exploration-exploitation tradeoff
Intermediate values of λ lead to best performance
[Figure: expected number of survivors rescued vs. number of timesteps (0 to 50) for λ = 0, 0.1, 0.5, and 0.9]

Results: Search & Rescue
Adaptive planning leads to significant performance improvement!
[Figure: expected number of survivors rescued vs. number of timesteps (0 to 50) for NAIVE-pSPIEL_OR, NAIVE-Greedy, pSPIEL_OR, and Greedy]

Example paths
[Figure: paths found by the greedy algorithm and by pSPIEL_OR]

Results: environmental monitoring
• Monitor photosynthetically active regions under forest canopy
• F(A) = # of “critical” regions covered
Adaptive planning leads to significant performance improvement!
[Figure: % of critical locations observed (0 to 0.2) vs. number of timesteps (0 to 40) for NAIVE-pSPIEL and pSPIEL]

Outline and Contributions
• Path constraints: pSPIEL_OR exploits (r,γ)-locality to near-optimally solve submodular orienteering
• Dynamic environments: NAÏVE-pSPIEL implicitly trades off exploration and exploitation to obtain a near-optimal adaptive policy
• Multi-robot coordination

Multi-robot coordination
• Can use the single-robot algorithm to plan a joint policy
• Exponential increase in complexity with the number of robots
max_{π1,…,πk} F(π1 ∪ π2 ∪ … ∪ πk) s.t. C(π1) ≤ B; C(π2) ≤ B; …; C(πk) ≤ B
[Figure: policies π1, π2, …, πk, each a path from s to t]

Sequential allocation
• Use pSPIEL to find policy P1 for the first robot: max_{π1} F(π1) s.t. C(π1) ≤ B
• Optimize for the second robot (P2), committing to nodes in P1: max_{π2} F(π1 ∪ π2) s.t. C(π2) ≤ B
• Optimize for the k-th robot (Pk), committing to nodes in P1, …, Pk−1: max_{πk} F(π1 ∪ π2 ∪ … ∪ πk) s.t. C(πk) ≤ B
[Figure: policies π1, π2, …, πk, each a path from s to t]

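Sequential allocation can be sketched as a loop around any single-robot planner. In the sketch below the planner is a stand-in that chooses among three precomputed candidate paths, each assumed to satisfy the budget already; the talk uses pSPIEL for this step. The coverage data, path names, and function names are all hypothetical.

```python
# Hypothetical data: regions observed from each node, and candidate paths
# that are assumed to satisfy the budget constraint C(path) <= B.
COVERS = {"n1": {1, 2}, "n2": {2, 3, 4}, "s1": {5, 6}, "s2": {6, 7}}
PATHS = {"north": ["n1", "n2"], "mid": ["n2", "s1"], "south": ["s1", "s2"]}


def F(nodes):
    """Toy sensing quality: number of distinct regions observed."""
    covered = set()
    for v in nodes:
        covered |= COVERS[v]
    return len(covered)


def plan_single(committed):
    """Stand-in single-robot planner: best marginal gain given committed nodes."""
    base = F(committed)
    name = max(sorted(PATHS),
               key=lambda p: F(committed | set(PATHS[p])) - base)
    return PATHS[name]


def sequential_allocation(num_robots):
    """Plan robots one at a time; each commits to the nodes chosen before it."""
    committed = frozenset()
    paths = []
    for _ in range(num_robots):
        path = plan_single(committed)
        paths.append(path)
        committed = committed | frozenset(path)
    return paths, F(committed)


paths, reward = sequential_allocation(2)
```

In this toy instance the first robot takes the `mid` path and the second takes `north`, for a total of 6 regions, while the pair `north` and `south` would cover all 7. The example shows that committing one robot at a time is an approximation, in exchange for calling the single-robot planner only once per robot.
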
Performance comparison
• Greedy selection of nodes with no path cost constraint: arbitrarily poor
• NAÏVE-pSPIEL_OR policy planning: Reward_PS ≥ Reward_Opt / η, with η = O(1/γ)
• Sequential allocation for multiple robots (greedy over policies): ??
Theorem: Reward_SA ≥ Reward_Opt / (1 + η)
• Works for any single-robot adaptive path planning algorithm!
• Independent of the number of robots used!
• Key tool for analysis: extension of submodular functions to adaptive policies

Multi-robot results
Diminishing returns as the number of robots increases
[Figure: average number of survivors rescued vs. number of timesteps (0 to 50) for 1, 2, and 3 robots]

Conclusions
• New algorithm, pSPIEL_OR, for nonadaptive informative path planning for (r,γ)-local submodular functions
• New algorithm, NAÏVE-pSPIEL_OR, for adaptive informative path planning using implicit exploration-exploitation analysis
• Extensions to multiple robots by sequential allocation
• Perform well on real-world problems