External Memory Value Iteration
Stefan Edelkamp, Shahid Jabbar
Chair for Programming Systems, University of Dortmund, Germany
Blai Bonet
Departamento de Computacion, Universidad Simon Bolivar, Caracas, Venezuela
Motivation: Reinforcement Learning
[Figure: agent and environment exchanging state s_t, cost c_t, and action a_t]
• Aim: write a controller that acts successfully in the environment
• Minimize cost / maximize rewards
Edelkamp, Jabbar & Bonet
Motivation: External Reinforcement Learning
• Cover deterministic, non-deterministic, and probabilistic environments (and games)
• But what to do if the agent's state space or policy space is too large to be computed and stored in RAM?
• Disk space is cheap (500 GB ≈ $100)
External Memory Algorithm
Overview
• Uniform Search Model
• Internal Memory Value Iteration
• Existing External Memory Model and BFS
• External Memory Value Iteration
• Experimental Highlights
• Summary & Outlook
Uniform Search Model
• Deterministic
• Non-deterministic
• Probabilistic
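The formulas for the three cases did not survive in this transcript. As a hedged reminder, they are commonly written as Bellman equations of the following form. The symbols A(u) (applicable actions), c(a,u) (action cost), Succ(u,a) (successor set), and P_a(v|u) (transition probability) are notational assumptions and are not taken from the slide; h(u) = 0 is assumed for goal states.

```latex
\begin{align*}
\text{Deterministic:} \quad
  & h(u) = \min_{a \in A(u)} \Bigl[ c(a,u) + h(v) \Bigr],
    \quad \mathrm{Succ}(u,a) = \{v\} \\
\text{Non-deterministic:} \quad
  & h(u) = \min_{a \in A(u)} \Bigl[ c(a,u) + \max_{v \in \mathrm{Succ}(u,a)} h(v) \Bigr] \\
\text{Probabilistic:} \quad
  & h(u) = \min_{a \in A(u)} \Bigl[ c(a,u) + \sum_{v \in \mathrm{Succ}(u,a)} P_a(v \mid u)\, h(v) \Bigr]
\end{align*}
```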
Internal Memory Value Iteration
• ε-optimal for solving MDPs, AND/OR trees…
• Problem: needs to have the whole state space in main memory.
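A minimal sketch of in-memory value iteration for the probabilistic case, to show why the whole state space must be resident: the table `h` holds one entry per state and is read at arbitrary positions during each backup. The function name, its parameters, and the callback signatures are assumptions made for illustration; every non-goal state is assumed to have at least one applicable action.

```python
def value_iteration(states, actions, transitions, cost, goals, epsilon=1e-4):
    """In-memory value iteration for a cost-minimizing MDP (illustrative sketch).

    actions(s)        -> iterable of actions applicable in s (hypothetical callback)
    transitions(s, a) -> list of (probability, successor) pairs (hypothetical callback)
    cost(s, a)        -> non-negative cost of applying a in s (hypothetical callback)
    """
    h = {s: 0.0 for s in states}          # one value per state, all held in RAM
    while True:
        residual = 0.0
        new_h = {}
        for s in states:
            if s in goals:
                new_h[s] = 0.0            # goal states cost nothing
                continue
            # Bellman backup: cheapest action by expected cost-to-go.
            best = min(
                cost(s, a) + sum(p * h[t] for p, t in transitions(s, a))
                for a in actions(s)
            )
            residual = max(residual, abs(best - h[s]))
            new_h[s] = best
        h = new_h
        if residual < epsilon:            # stop once no value changed by epsilon or more
            return h
```

The lookups `h[t]` are the random accesses that become expensive once `h` no longer fits in RAM.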
Why External Memory Algorithms?
• Search algorithms perform well only as long as they fit in RAM!
• Virtual memory slows down the performance!
[Figure: virtual address space from 0x000…000 to 0xFFF…FFF mapped to memory pages; label: 7 I/Os]
External Memory Model [Vitter and Shriver, 94]
If the input size is very large, running time depends on the I/Os rather than on the number of instructions.
[Figure: internal memory of size M, block size B, input of size N >> M]
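The later complexity slides use scan(·) and sort(·) without defining them. A small sketch, assuming the textbook bounds scan(N) = ⌈N/B⌉ and sort(N) ≈ (N/B) · ⌈log_{M/B}(N/B)⌉ and ignoring constant factors; the function names are hypothetical, and N, M, B are all counted in items.

```python
import math


def scan_ios(n, b):
    """I/Os to read n items sequentially in blocks of b items."""
    return math.ceil(n / b)


def sort_ios(n, m, b):
    """Approximate I/Os of an external merge sort with memory m and block size b."""
    blocks = scan_ios(n, b)
    fan_in = m // b                      # how many blocks fit in memory at once
    if fan_in < 2:
        raise ValueError("memory must hold at least two blocks")
    # Integer computation of ceil(log_fan_in(blocks)), avoiding float rounding.
    passes, reach = 0, 1
    while reach < blocks:
        reach *= fan_in
        passes += 1
    return blocks * max(1, passes)       # at least one pass over the data
```

Under these assumptions, sorting costs only a small multiple of scanning as long as M/B is large.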
External Breadth-First Search (Munagala and Ranade, SODA'99)
[Figure: layers Open(0), Open(1), Open(2) for an example graph on nodes A to E; Open(2) is externally sorted, compacted, and then has duplicates removed w.r.t. the 2 previous layers]
• For undirected graphs, subtracting two layers is enough [Munagala & Ranade, 99].
• For directed graphs, the longest back-edge has to be taken into account [Zhou & Hansen, 05].
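The layer construction can be sketched in memory as follows. This is a simulation under stated assumptions, not a disk-based implementation: Python lists stand in for files, `sorted(set(...))` stands in for the external sort and compaction, and the function name and the `successors` callback are hypothetical.

```python
def layered_bfs(start, successors, locality=2):
    """Breadth-first layers with delayed duplicate detection (in-memory stand-in)."""
    layers = [[start]]                          # layers[i] plays the role of Open(i)
    while layers[-1]:
        # Expansion: generate all successors of the last layer, duplicates included.
        generated = [t for s in layers[-1] for t in successors(s)]
        # Sort and compact: removes duplicates within the new layer.
        candidates = sorted(set(generated))
        # Subtraction: remove states already present in the previous layers.
        for previous in layers[-locality:]:
            seen = set(previous)
            candidates = [s for s in candidates if s not in seen]
        layers.append(candidates)
    layers.pop()                                # drop the final empty layer
    return layers
```

With `locality=2` this matches the undirected case on the slide; for directed graphs the parameter would have to cover the longest back-edge.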
External Memory Algorithms for Implicit Graphs
• Frontier Search [Korf, 03]
• External A* [Edelkamp, Jabbar, Schrödl, 04]
• Structured Duplicate Detection [Zhou & Hansen, 04]
• Cost-Optimal External Planning [Edelkamp, Jabbar, 06]
• Model Checking for Linear Temporal Logic
  • [Jabbar & Edelkamp, 05] for safety error detection
  • [Edelkamp & Jabbar, 06] for liveness detection (cycle)
  • [Barnat, Brim, Simecek, 07] for liveness detection (cycle)
• Real-Time Model Checking/Scheduling [Edelkamp, Jabbar, 06]
External Memory Algorithm for Value Iteration
• What makes value iteration different from the usual external memory search algorithms?
• Answer: propagation of information from states to predecessors! Edges are more important than the states.
Ext-VI works on edges:
External Memory Value Iteration
Phase I: Generate the edge space by External BFS.
    Open(0) = Init; i = 1
    while (Open(i-1) != empty)
        Open(i) = Succ(Open(i-1))
        Externally-Sort-and-Remove-Duplicates(Open(i))
        for loc = 1 to Locality(Graph)        // remove previous layers
            Open(i) = Open(i) \ Open(i - loc)
        i++
    endwhile
Merge all BFS layers into one edge list on disk:
    Opent = Open(0) ∪ Open(1) ∪ … ∪ Open(DIAM)
    Temp = Opent
    Sort Opent w.r.t. the successors; sort Temp w.r.t. the predecessors
Working of Ext-VI, Phase II
[Figure: example graph with states 1 to 10, initial state I, terminal states T, and h-values at the nodes]
Temp: edge list on disk, sorted on predecessors
    edge  (Ø,1)  (1,2)  (1,3)  (1,4)  (2,3)  (2,5)  (3,4)  (3,8)  (4,6)  (5,6)  (5,7)  (6,9)  (7,8)  (7,10) (9,8)  (9,10)
    h       3      2      2      2      2      1      2      0      1      1      1      1      0      0      0      0
Opent: edge list on disk, sorted on successors
    edge  (Ø,1)  (1,2)  (1,3)  (2,3)  (1,4)  (3,4)  (2,5)  (4,6)  (5,6)  (5,7)  (3,8)  (7,8)  (9,8)  (6,9)  (7,10) (9,10)
    h       3      2      2      2      2      2      1      1      1      1      0      0      0      1      0      0
    h'      3      2      1      1      2      2      2      2      2      1      0      0      0      1      0      0
Alternate sorting and update until residual < epsilon
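The alternation of sorting and scanning can be sketched as below. This is an in-memory illustration under several assumptions: the deterministic model with a uniform edge cost, Python lists standing in for the disk files, and a separate sorted (state, value) list in place of the dummy edge (Ø, 1). The function and parameter names are hypothetical, states must be mutually comparable, and `h_init` must contain every state.

```python
def ext_vi_sketch(edges, goals, h_init, cost=1.0, epsilon=1e-4):
    """Edge-based value iteration by alternating sort and scan (illustrative sketch)."""
    values = sorted(h_init.items())                   # (state, h), sorted by state
    records = [(u, v, h_init[v]) for u, v in edges]   # (pred, succ, h(succ))
    while True:
        # Scan 1: sorted on predecessors, so all edges leaving a state are adjacent.
        records.sort(key=lambda r: r[0])
        new_values, residual, i = [], 0.0, 0
        for state, old in values:
            succ_values = []
            while i < len(records) and records[i][0] == state:
                succ_values.append(records[i][2])
                i += 1
            if state in goals or not succ_values:
                new = old                             # nothing to back up
            else:
                new = min(cost + hv for hv in succ_values)
            residual = max(residual, abs(new - old))
            new_values.append((state, new))
        # Scan 2: sorted on successors, merged with the sorted value list so that
        # every edge receives the updated value of its successor.
        records.sort(key=lambda r: r[1])
        refreshed, j = [], 0
        for u, v, _ in records:
            while new_values[j][0] != v:
                j += 1
            refreshed.append((u, v, new_values[j][1]))
        records, values = refreshed, new_values
        if residual < epsilon:
            return dict(values)
```

Both scans read their inputs strictly front to back, which is the property that lets the real algorithm run on disk files instead of a RAM-resident state table.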
Complexity Analysis
• Phase-I: External Memory Breadth-First Search.
  • Expansion: scanning the red bucket: O(scan(|E|))
  • Duplicates removal: sorting the green bucket, which has one state for every edge from the red bucket, followed by scanning and compaction: O(sort(|E|))
  • Subtraction: removing states of the blue buckets (duplicate-free) from the green one: O(l × scan(|E|))
• Complexity of Phase-I: O(l × scan(|E|) + sort(|E|)) I/Os
Complexity Analysis
• Phase-II: Backward Update
  • Update: simple block-wise scanning. Scanning time for red and green files: O(scan(|E|)) I/Os
  • External sort: sorting the blue file with the updated values, to be used as the red file later: O(sort(|E|)) I/Os
  • Fast external sort: if |E| / M < max. file pointers, O(scan(|E|)) I/Os
[Figure: file sorted on preds, file sorted on states, updated h-values]
• Total complexity of Phase-II: for t_max iterations, O(t_max × sort(|E|)) I/Os
• With fast external sort: O(t_max × scan(|E|)) I/Os
Experiments: 3x3 Sliding Tiles Puzzle
Number of iterations differs!
3x4 Sliding Tile Puzzle with p = 0.9 (state space: 12!/2 ≈ 239 × 10^6)
• On 2 gigabytes, VI could not generate the state space.
• External VI finished:
  • Took 45 GB of disk space for the edges.
  • Total 1,357,171,197 edges.
  • Took 437 hours and 72 iterations to converge.
  • ε = 0.0001
  • RAM used: 1.4 gigabytes
Race Track Domain
• Example
Summary
Achievements
• First I/O-efficient disk-based algorithm for solving Markov Decision Processes.
• I/O complexity analysis.
Features
• General cost model
• Can pause and resume execution to add more hard disks.
Refinements
• Disk space eaten by duplicate states: start “early” delayed duplicate detection
Outlook
• Application to Bellman-Ford
• Parallel External Value Iteration: during the time of internal update, the hard disk is not in use.