Rescuing an Endangered Species with Monte Carlo AI
Tom Dietterich, based on work by Dan Sheldon et al.
Overview
• Collaborative project to develop optimal conservation strategies for the Red-Cockaded Woodpecker (RCW)
• Institute for Computational Sustainability (Cornell and OSU): Daniel Sheldon, Bistra Dilkina, Adam Elmachtoub, Ryan Finseth, Ashish Sabharwal, Jon Conrad, Carla P. Gomes, David Shmoys
• The Conservation Fund: Will Allen, Ole Amundsen, Buck Vaughan
• Recent paper: Maximizing the Spread of Cascades Using Network Design, UAI 2010
Red-Cockaded Woodpecker
• Originally a widespread species in the southern US
• Population has now shrunk to 1% of its original size
  • 5,000 breeding groups
  • ~12,000 birds
  • Federally listed endangered species
• Lifestyle:
  • Nests in holes in longleaf pine trees that are 80+ years old
  • Sap from the trees defends the nest
  • Takes several years to excavate the hole
  • Will colonize man-made holes
(Image: Wikipedia)
Spatial Conservation Planning
• What is the best land acquisition and management strategy to support the recovery of the Red-Cockaded Woodpecker (RCW)?
Problem Setup
[Figure: map with legend: available parcels, conserved parcels, current patches, potential patches]
Given a limited budget, which parcels should we conserve to maximize the expected number of occupied patches in T years?
Metapopulation Model
• Population dynamics in a fragmented landscape
• Stochastic patch occupancy model (SPOM)
• Patches = occupied / unoccupied
• Colonization
• Local extinction
SPOM: Stochastic Patch Occupancy Model
[Figure: patch occupancy at Time 1 and Time 2]
• Patches are either occupied or unoccupied
• Two types of stochastic events:
  • Local extinction: occupied → unoccupied
  • Colonization: unoccupied → occupied (from a neighbor)
• Independence among all events
Network Cascades
• Models for diffusion in (social) networks
• Spread of information, behavior, disease, etc.
• E.g.: suppose each individual passes a rumor to friends independently with probability ½
Note: “activated” nodes are those reachable by red edges
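SPOM Probability Model
[Figure: patches $i$, $j$, $k$, $l$ at two consecutive time steps; colonization edges into $j$ are labeled $p_{ij}$ and $p_{lj}$, and the survival edge from $j$ to itself is labeled $1-\beta_j$]
• To determine the occupancy of patch $j$ at time $t$:
  • For each patch $i$ occupied at time $t-1$, flip a coin with probability $p_{ij}$ to see if $i$ colonizes $j$
  • If $j$ is occupied at time $t-1$, flip a coin with probability $1-\beta_j$ to determine survival (non-extinction)
  • If any of these events occurs, $j$ is occupied at time $t$
• Parameters:
  • $p_{ij}$: colonization probability
  • $\beta_j$: extinction probability
  • Simple parametric functions of patch size, inter-patch distance, etc.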
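
The coin-flipping rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `spom_step` and `occupancy_probability`, the dictionary layout for the colonization probabilities `p` and extinction probabilities `beta`, and the toy numbers are all hypothetical and not taken from the slides.

```python
import random


def spom_step(occupied, p, beta, rng):
    """One SPOM time step: patches occupied at t-1 -> patches occupied at t.

    occupied: set of patch names occupied at time t-1
    p:        dict {(i, j): colonization probability from i to j} (assumed layout)
    beta:     dict {j: extinction probability of patch j} (assumed layout)
    """
    nxt = set()
    for j in sorted(beta):
        # Survival coin: only relevant if j was occupied at t-1.
        survives = j in occupied and rng.random() < 1.0 - beta[j]
        # One independent colonization coin per occupied source patch i.
        colonized = False
        for i in sorted(occupied):
            if i != j and rng.random() < p.get((i, j), 0.0):
                colonized = True
        if survives or colonized:
            nxt.add(j)
    return nxt


def occupancy_probability(j, occupied, p, beta):
    """Closed form implied by independence: 1 - P(every coin fails)."""
    all_fail = beta[j] if j in occupied else 1.0
    for i in occupied:
        if i != j:
            all_fail *= 1.0 - p.get((i, j), 0.0)
    return 1.0 - all_fail


# Toy example (made-up numbers): patch "a" always goes extinct but always
# colonizes "b"; patch "b" never goes extinct.
p = {("a", "b"): 1.0}
beta = {"a": 1.0, "b": 0.0, "c": 0.0}
state = spom_step({"a"}, p, beta, random.Random(0))
```

Because every coin is independent, the probability that patch $j$ is occupied at time $t$ is one minus the product of all failure probabilities, which is what `occupancy_probability` computes.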
Monte Carlo Simulation of a SPOM
[Figure: layered graph with patches a to e as rows and time steps 1 to 5 as columns; edges are either non-extinction (same patch) or colonization (between patches)]
Key idea: a metapopulation model is a cascade in the layered graph representing patches over time
Metapopulation = Cascade
[Figure: a cascade spreading through the layered graph of patches a to e over time steps 1 to 5]
Monte Carlo Simulations
[Figure: layered graph of patches a to e over time steps 1 to 5]
Each simulation can produce a different cascade
Insight #1: Objective as Network Connectivity
[Figure: layered graph of patches i to m over time, with live edges highlighted and the nodes in the final layer marked as targets]
Conservation objective: maximize the expected # of occupied patches at time T
Cascade objective: maximize the expected # of target nodes reachable by live edges
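
The live-edge view can be sketched as follows: flip every coin up front to get one random layered graph, then count the target nodes reachable from the initially occupied patches. The helper names (`sample_live_edges`, `reachable_targets`), the `(patch, t)` node encoding, and the use of layers 0 to T (the slides number time steps from 1) are assumptions made for this sketch.

```python
import random
from collections import deque


def sample_live_edges(patches, p, beta, T, rng):
    """Flip all coins in advance; return the live edges of the layered graph.

    Nodes are (patch, t) pairs for t = 0..T. An edge (i, t) -> (j, t+1) is
    live with probability p[(i, j)] (colonization), and (i, t) -> (i, t+1)
    is live with probability 1 - beta[i] (non-extinction).
    """
    live = {}
    for t in range(T):
        for i in patches:
            out = []
            if rng.random() < 1.0 - beta[i]:
                out.append((i, t + 1))
            for j in patches:
                if j != i and rng.random() < p.get((i, j), 0.0):
                    out.append((j, t + 1))
            live[(i, t)] = out
    return live


def reachable_targets(live, sources, T):
    """Breadth-first search from the sources at layer 0; count nodes at layer T."""
    frontier = deque((s, 0) for s in sources)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        for nxt in live.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sum(1 for (_, t) in seen if t == T)


# Toy example (made-up numbers): "a" always colonizes "b" and then dies out.
patches = ["a", "b", "c"]
p = {("a", "b"): 1.0}
beta = {"a": 1.0, "b": 0.0, "c": 0.0}
live = sample_live_edges(patches, p, beta, T=2, rng=random.Random(0))
occupied_at_T = reachable_targets(live, ["a"], T=2)
```

A patch is occupied at time T in the simulation exactly when its node in the last layer is reachable, so averaging `reachable_targets` over many sampled graphs estimates the conservation objective.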
Insight #2: Management as Network Building
[Figure: initial network with candidate parcels Parcel 1 and Parcel 2]
Conserving parcels adds nodes and (stochastic) edges to the network
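Monte Carlo Evaluation of a Proposed Purchase Plan
• $R_T(y)$: set of reachable nodes at time $T$, where $y$ is our purchasing plan
• Goal is to maximize $\mathbb{E}[|R_T(y)|]$
• Run $N$ simulations. In simulation $k$, count the number of occupied patches $|R_T^{(k)}(y)|$ at time $T$. Compute the average: $\hat{\mu} = \frac{1}{N}\sum_{k=1}^{N} |R_T^{(k)}(y)|$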
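Research Question
• How many samples do we need to get a good estimate?
• Answer: we can use basic statistical methods (confidence intervals and hypothesis tests) to measure the accuracy of our estimate
• 95% confidence interval for the mean: with probability of about 0.95, $|\hat{\mu} - \mu| \le 1.96\,\hat{\sigma}/\sqrt{N}$, where $\hat{\mu}$ is our estimate (the average over $N$ simulations), $\mu$ is the true value, and $\hat{\sigma}$ is the sample standard deviation of the simulation outcomes
• We can increase $N$ until the accuracy is high enough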
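
A minimal sketch of this procedure, assuming the usual normal-approximation interval. The function names, the doubling schedule for the sample size, and the `toy_simulation` stand-in (ten patches, each occupied at time T with probability 0.5) are hypothetical; a real run would plug in a SPOM simulation of the purchase plan instead.

```python
import math
import random
import statistics


def estimate_with_ci(simulate_once, n, rng, z=1.96):
    """Average n simulation outcomes; return (mean, half-width of the ~95% CI)."""
    samples = [simulate_once(rng) for _ in range(n)]
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples) if n > 1 else 0.0
    return mean, z * sd / math.sqrt(n)


def sample_until_accurate(simulate_once, target_half_width, rng,
                          n_start=100, n_max=1_000_000):
    """Double the sample size until the CI half-width is small enough."""
    n = n_start
    while n <= n_max:
        mean, half = estimate_with_ci(simulate_once, n, rng)
        if half <= target_half_width:
            return n, mean, half
        n *= 2
    raise RuntimeError("target accuracy not reached within n_max samples")


def toy_simulation(rng):
    """Hypothetical stand-in for one SPOM run: # of occupied patches at time T."""
    return sum(1 for _ in range(10) if rng.random() < 0.5)


n, mean, half = sample_until_accurate(toy_simulation, 0.1, random.Random(0))
```

Since the half-width shrinks like $1/\sqrt{N}$, halving the error takes roughly four times as many simulations.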
Evaluating a Purchase Plan
[Figure: initial network with candidate parcels Parcel 1 and Parcel 2, redrawn for each plan]
• Plan 1: Purchase nothing
• Plan 2: Purchase Parcel 1
• Plan 3: Purchase Parcels 1 and 2
How many different purchasing plans are there for $n$ parcels?
• $2^n$: each parcel is either purchased or not
• We can’t afford to evaluate them all
Solution Strategy (aka Sample Average Approximation)
• Assume we own all parcels. Run multiple simulations of bird propagation
• Join all of those simulations into a single giant graph
• The goal of maximizing the expected # of occupied patches at time $T$ is approximated by the # of reachable patches in the giant graph
• Define a set of variables $y_p \in \{0, 1\}$, one for each parcel $p$ that we can buy
• Solve a mixed integer program to decide which variables are 1 (buy) and which are 0 (do not buy)
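
The sketch below illustrates the sample average approximation idea on a toy instance. It deliberately swaps the mixed integer program for brute-force enumeration of all affordable plans, which only works for a handful of parcels. All names (`sample_cascade`, `occupied_at_T`, `saa_best_plan`), the data layout, and the toy numbers are assumptions for illustration.

```python
import itertools
import random


def sample_cascade(patches, p, beta, T, rng):
    """Flip all coins once, assuming every parcel is owned.

    Returns a list of T layers; layer[i] is the set of patches that patch i
    would occupy at the next time step if i is occupied now.
    """
    layers = []
    for _ in range(T):
        layer = {}
        for i in patches:
            out = {j for j in patches
                   if j != i and rng.random() < p.get((i, j), 0.0)}
            if rng.random() < 1.0 - beta[i]:
                out.add(i)  # non-extinction edge
            layer[i] = out
        layers.append(layer)
    return layers


def occupied_at_T(layers, sources, allowed):
    """Follow live edges, but only through patches on conserved land."""
    current = set(sources) & allowed
    for layer in layers:
        current = {j for i in current for j in layer[i] if j in allowed}
    return len(current)


def saa_best_plan(parcels, cost, budget, base, sources, samples):
    """Pick the affordable plan with the best average over the FIXED samples.

    parcels: dict {parcel: set of patches it contains}
    base:    patches that are already conserved
    """
    best_plan, best_value = (), -1.0
    names = sorted(parcels)
    for r in range(len(names) + 1):
        for plan in itertools.combinations(names, r):
            if sum(cost[q] for q in plan) > budget:
                continue
            allowed = set(base).union(*(parcels[q] for q in plan))
            value = sum(occupied_at_T(s, sources, allowed)
                        for s in samples) / len(samples)
            if value > best_value:
                best_plan, best_value = plan, value
    return best_plan, best_value


# Toy instance (made-up numbers): a chain s -> m -> g of patches.
patches = ["s", "m", "g"]
p = {("s", "m"): 1.0, ("m", "g"): 1.0}
beta = {"s": 0.0, "m": 0.0, "g": 0.0}
rng = random.Random(0)
samples = [sample_cascade(patches, p, beta, T=2, rng=rng) for _ in range(10)]
plan, value = saa_best_plan({"P1": {"m"}, "P2": {"g"}}, {"P1": 1, "P2": 1},
                            budget=2, base={"s"}, sources={"s"}, samples=samples)
```

The key point is that the random samples are drawn once and then held fixed, so choosing a plan becomes a deterministic optimization problem over the sampled graphs.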
Solving the Deterministic Problem
• CPLEX commercial optimization package (sold by IBM; free to universities)
• Applies a method known as Branch and Bound
• The problem is NP-hard, so solving can take a long time, but the solver often finds a solution if the instance isn’t too big or too hard
Experiments
• 443 available parcels
• 2500 territories
• 63 initially occupied
• 100 years
• Population model is parameterized based (loosely) on RCW ecology
• Short-range colonizations (<3 km) within the foraging radius of the RCW are much more likely than long-range colonizations
Greedy Baselines
• Adapted from previous work on influence maximization
• Start with the empty set and add actions until the budget is exhausted
• Greedy-uc: choose the action that results in the biggest immediate increase in the objective [Kempe et al. 2003]
• Greedy-cb: choose the action with the best ratio of benefit to cost [Leskovec et al. 2007]
• These heuristics lack performance guarantees!
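
The two greedy rules differ only in how they score a candidate action, as this sketch shows. It is not the authors' implementation: the `greedy` function, its `use_ratio` flag, the choice to stop once no affordable action improves the objective, and the toy coverage objective are all assumptions for illustration.

```python
def greedy(actions, cost, budget, objective, use_ratio):
    """Greedy selection under a budget.

    use_ratio=False: score = immediate gain in the objective (greedy-uc style)
    use_ratio=True:  score = gain divided by cost (greedy-cb style)
    Stops when no affordable action gives a positive gain (an assumption of
    this sketch; zero-gain purchases would not change the objective anyway).
    """
    chosen, spent = [], 0
    current = objective(chosen)
    remaining = set(actions)
    while True:
        best, best_score, best_value = None, 0.0, current
        for a in sorted(remaining):
            if spent + cost[a] > budget:
                continue  # cannot afford this action
            value = objective(chosen + [a])
            gain = value - current
            score = gain / cost[a] if use_ratio else gain
            if score > best_score:
                best, best_score, best_value = a, score, value
        if best is None:
            return chosen, current
        chosen.append(best)
        spent += cost[best]
        remaining.remove(best)
        current = best_value


# Toy objective (made up): number of distinct items covered by the chosen actions.
cover = {"A": {1, 2, 3, 4}, "B": {1, 2}, "C": {3}}
cost = {"A": 4, "B": 1, "C": 1}


def coverage(plan):
    return len(set().union(*(cover[a] for a in plan)))


plan_uc, value_uc = greedy(cover, cost, 4, coverage, use_ratio=False)
plan_cb, value_cb = greedy(cover, cost, 4, coverage, use_ratio=True)
```

On this toy instance the gain-only rule spends the whole budget on the expensive action `A`, while the ratio rule picks the two cheap actions and ends up with a lower objective, a small reminder that neither rule comes with a guarantee.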
Results
[Figure: results plots with the upper bound marked, shown alongside a map of the initial population and the conservation reservoir. M = 50, N = 10, Ntest = 500]
Conservation Strategies
[Figure: map showing the source population and the conservation reservoir]
• Both approaches build outward from the source
• Greedy buys the best patches next to currently owned patches
• The optimal solution builds toward areas of high conservation potential
• In this case, the two strategies are very similar
A Harder Instance
Move the conservation reservoir so that it is more remote.
Conservation Strategies
[Figure: purchased parcels for the Greedy Baseline (build outward from sources) and the SAA Optimum, our approach (path-building, goal-setting), at budgets of $150M, $260M, and $320M]
Shortcomings of the Method
• All parcels are purchased up front, at the initial time step
  • Reality: money arrives incrementally
• All parcels are assumed to be for sale at the initial time step
  • Reality: parcel availability and price can vary from year to year
• How about a Markov decision process (MDP)? Each year we can see where the birds actually spread to and then update our purchase plans accordingly
  • This is a very hard MDP with no known solution method
• Current method is very slow
Status
• The Conservation Fund is making purchasing decisions based (partially) on the plans computed using this model
• Alan Fern, Shan Xue, and Dan Sheldon have developed an extension that proposes a schedule for purchasing the parcels