Advanced Topics. NP-complete reports. Continue on NP, parallelism. Reprise: Non-determinism. Informal: add to any algorithm taking a guess at one or more places forking and pursuing one or more possibilities
Advanced Topics NP-complete reports. Continue on NP, parallelism
Reprise: Non-determinism • Informal: add to any algorithm • taking a guess at one or more places • forking and pursuing one or more possibilities • If there is a Non-deterministic algorithm, then there is a regular/standard algorithm • just try all the possibilities • may take a long time
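The "just try all the possibilities" idea can be sketched in Python. This is a minimal illustration, not taken from the slides: it assumes the subset sum problem as the example, and the function name `subset_sum_brute_force` is hypothetical. Each number is a place where a non-deterministic algorithm would guess "take it" or "skip it"; the standard algorithm tries every combination of guesses instead.

```python
from itertools import product


def subset_sum_brute_force(numbers, target):
    """Try every combination of include/exclude guesses (2**n of them)."""
    for guesses in product([False, True], repeat=len(numbers)):
        # One tuple of guesses corresponds to one path the
        # non-deterministic algorithm could have followed.
        chosen = [n for n, take in zip(numbers, guesses) if take]
        if sum(chosen) == target:
            return chosen
    return None  # no path of guesses leads to the target


result = subset_sum_brute_force([3, 34, 4, 12, 5, 2], 9)
print(result, sum(result))
```

With n numbers there are 2**n combinations of guesses, which is why this "may take a long time".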
Reprise: the class P • … is all problems for which there exists an algorithm with complexity bounded by a polynomial.
Reprise: the class NP • all problems for which there is an algorithm, possibly non-deterministic, that, assuming you take the right paths, is bounded by a polynomial • Alternative definition: you can check that a proposed answer is correct in polynomial time.
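The "check the answer" definition can be made concrete with graph coloring. The sketch below is an assumption-laden illustration rather than anything from the slides: the graph is given as a list of edges, a proposed answer is a dictionary from vertex to color, and `is_valid_coloring` is a hypothetical name. Checking takes one pass over the edges, which is polynomial, even though finding a coloring may not be.

```python
def is_valid_coloring(edges, coloring, k):
    """Check a proposed k-coloring in time proportional to the number of edges."""
    if len(set(coloring.values())) > k:
        return False  # uses more than k colors
    for u, v in edges:
        if u not in coloring or v not in coloring:
            return False  # every endpoint must be colored
        if coloring[u] == coloring[v]:
            return False  # adjacent vertices share a color
    return True


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(is_valid_coloring(triangle, {"a": 0, "b": 1, "c": 2}, 3))
print(is_valid_coloring(triangle, {"a": 0, "b": 1, "c": 0}, 3))
```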
Reprise: does P = NP? • Is it possible to find actual standard algorithms for these NP problems? • THE great problem of computer science. • Proving it false would also be significant. • Theoretical problem with considerable practical value.
NP complete • A set of NP problems that can be translated into each other in polynomial time so… • If one of the problems can be solved in polynomial time • aka tractable • …. they all can.
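What "translated in polynomial time" means can be sketched with a well-known pair of problems. This example is not from the slides: it assumes the Partition problem (split a list of integers into two halves with equal sums) and the subset sum problem, and all function names are hypothetical. The translation itself only adds up the list, so it is cheap; the solver it feeds is a brute-force stand-in for whatever subset sum solver is available.

```python
from itertools import combinations


def partition_to_subset_sum(numbers):
    """Translate a Partition instance into a subset sum instance.

    Returns (numbers, target), or None when the total is odd and an
    equal split is impossible.
    """
    total = sum(numbers)
    if total % 2 == 1:
        return None
    return numbers, total // 2


def subset_sum_exists(numbers, target):
    """Brute-force stand-in for any subset sum solver (exponential time)."""
    return any(
        sum(combo) == target
        for size in range(len(numbers) + 1)
        for combo in combinations(numbers, size)
    )


def can_partition(numbers):
    """Solve Partition by translating it and calling the subset sum solver."""
    instance = partition_to_subset_sum(numbers)
    if instance is None:
        return False
    return subset_sum_exists(*instance)


print(can_partition([1, 5, 11, 5]))
print(can_partition([1, 2, 5]))
```

If `subset_sum_exists` were ever replaced by a polynomial-time solver, `can_partition` would become polynomial too, which is the point of the slide.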
NP-hard • A problem is NP-hard if there is an NP-complete problem that can be translated into it in polynomial time. • but not necessarily the other way. • NP-hard problems are at least as hard as NP-complete problems.
NP-hard example • Robot path planning in a dynamic environment
Reports on NP-complete problems • Tetris • Knapsack problem • Steiner Tree problem • Graph coloring • Minesweeper • Subset problem
Note • There are methods for getting answers to NP problems, but they aren't guaranteed to be optimal. • Called heuristics or approximations
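A small heuristic for graph coloring shows what "not guaranteed to be optimal" looks like. This is a sketch under assumptions, not a method from the slides: the graph is an adjacency dictionary, vertices are visited in dictionary order, and `greedy_coloring` is a hypothetical name. The result is always a valid coloring, but the number of colors depends on the visiting order.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color not used by its colored neighbors."""
    coloring = {}
    for vertex in adjacency:
        used = {coloring[n] for n in adjacency[vertex] if n in coloring}
        color = 0
        while color in used:
            color += 1
        coloring[vertex] = color
    return coloring


# A two-sided graph: every "a" vertex connects only to "b" vertices,
# so 2 colors are enough (one per side).
graph = {
    "a1": ["b2", "b3"],
    "b1": ["a2", "a3"],
    "a2": ["b1", "b3"],
    "b2": ["a1", "a3"],
    "a3": ["b1", "b2"],
    "b3": ["a1", "a2"],
}
coloring = greedy_coloring(graph)
print(coloring)
print("colors used:", len(set(coloring.values())))
```

With this visiting order the heuristic uses 3 colors, although 2 would do. It runs quickly and gives a usable answer, just not the best one.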
Distributed computing • Approach to NP problems: fork a new process • That is, use distributed computing to investigate the different choices • Some problems may be embarrassingly parallelizable.
Sources • Many • Google: http://code.google.com/edu/parallel/mapreduce-tutorial.html • Note: there is controversy re: MapReduce • may be issue of patent • Is it the right framework • ??
Concepts • key/value pair • Master / Worker • nodes on network • may be one Master and many Workers • hashing: quick way to find data (key/value data) • piece / partition / split / shard
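How hashing and shards fit together can be sketched as follows. This is an illustrative assumption, not the scheme of any particular MapReduce product: keys are strings, `zlib.crc32` is used as the hash because it gives the same value on every machine, and `shard_for` and `partition` are hypothetical names.

```python
import zlib


def shard_for(key, num_shards):
    """Map a key to a shard number in the range 0 .. num_shards - 1."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(keys, num_shards):
    """Split keys into shards; the same key always lands in the same shard."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


for number, shard in enumerate(partition(["apple", "pear", "plum", "fig"], 3)):
    print(number, shard)
```

Finding the data for a key then means computing its shard number and asking only the worker holding that shard.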
Example from Google tutorial • Compute pi using many workers, each doing a calculation using pseudo-random function. • no data (NOT typical MapReduce problem) • Worker picks a random point in the square. If it is in the circle, worker increments a counter. • http://faculty.purchase.edu/jeanine.meyer/processing/piEstimate/applet/
Formulas • Area_of_circle = pi * r^2 • Area_of_square containing circle = 4 * r^2 • So r^2 = Area_of_square / 4 • Let Ac be Area_of_circle and As be Area_of_square • Then pi = 4 * Ac / As • Estimate for pi is 4 * counter / Number_of_points_tried
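The last formula can be tried directly on a single machine. This is a minimal sketch, assuming a circle of radius 1 centered in the square from -1 to 1; the function name `estimate_pi` and the fixed `seed` (used only to make runs repeatable) are not from the slides.

```python
import random


def estimate_pi(num_points, seed=0):
    """Estimate pi as 4 * counter / Number_of_points_tried."""
    rng = random.Random(seed)
    counter = 0
    for _ in range(num_points):
        # Random point in the square.
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Inside the circle of radius 1?
        if x * x + y * y <= 1.0:
            counter += 1
    return 4 * counter / num_points


print(estimate_pi(100_000))
```

More points give a better estimate, but the improvement is slow: roughly 100 times as many points for one more correct digit.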
Informal proof • The chance of a random point in the square being in the circle equals the ratio of the areas, Ac / As. • Choosing many points randomly estimates this ratio. • We could [simply] use for-loops and do the calculation for every point.
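The for-loop alternative can be sketched by replacing random points with a regular grid. This assumes "every point" means the centers of an n-by-n grid of cells covering the square from -1 to 1; the name `estimate_pi_grid` is hypothetical.

```python
def estimate_pi_grid(n):
    """Count grid-cell centers that fall inside the circle of radius 1."""
    inside = 0
    for i in range(n):
        for j in range(n):
            # Center of cell (i, j) in the square from -1 to 1.
            x = -1 + (2 * i + 1) / n
            y = -1 + (2 * j + 1) / n
            if x * x + y * y <= 1:
                inside += 1
    return 4 * inside / (n * n)


print(estimate_pi_grid(400))
```

No random numbers are needed, and the same grid always gives the same estimate.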
MapReduce • Model for distributed (aka parallel) computing • There are different products that implement MapReduce. From a google search: • Google • Apache Hadoop: Open source • Teradata • Amazon • Greenplum • Platform
MapReduce • Programmer sets up programs for the Master and for the Workers. Typically, the Master program sets up and partitions input array(s). • Typically, data is key/value pairs. • Programmers write • Map functions that process data, possibly making use of functions in the MapReduce library • Reduce functions that combine the results • Workers work on Map tasks and/or Reduce tasks. The Map task is applied to the worker's piece (aka shard) of the input array.
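A more typical MapReduce job, with real key/value data, is counting words. The sketch below runs in one process and does not use any MapReduce library; the function names are hypothetical, and the `shuffle` step (grouping values by key between Map and Reduce) is an assumption about how a framework would connect the two, since the slide does not describe it.

```python
from collections import defaultdict


def map_words(shard):
    """Map task: turn one shard of text into (word, 1) key/value pairs."""
    return [(word.lower(), 1) for word in shard.split()]


def shuffle(pairs):
    """Group all values that share a key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce task: combine all values for one key."""
    return key, sum(values)


def word_count(shards):
    pairs = []
    for shard in shards:  # each shard would go to a different worker
        pairs.extend(map_words(shard))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


print(word_count(["the cat sat", "The dog sat"]))
```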
MapReduce for pi estimate • Not typical in that there is no data • The map function does the calculation • When all done, the reduce function adds up all the individual counters and calculates the estimate for pi
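The map and reduce roles for the pi estimate can be sketched as below. This is not the code from the Google tutorial and uses no MapReduce library: threads in one Python process stand in for workers on separate machines (so no real speed up should be expected), and every name here is hypothetical. Each worker gets its own seed so the workers do not all try the same points.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count_hits(task):
    """Map: one worker tries its points and returns its own counter."""
    worker_id, num_points = task
    rng = random.Random(worker_id)  # separate random stream per worker
    hits = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def reduce_add(total, counter):
    """Reduce: add one worker's counter to the running total."""
    return total + counter


def mapreduce_pi(num_workers, points_per_worker):
    tasks = [(worker_id, points_per_worker) for worker_id in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        counters = list(pool.map(map_count_hits, tasks))
    total_hits = reduce(reduce_add, counters, 0)
    return 4 * total_hits / (num_workers * points_per_worker)


print(mapreduce_pi(8, 20_000))
```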
Speed up for pi estimate • Suppose • each trial (getting the 2 random values and determining if in circle) takes K time units • 1000 workers together calculate 1000000 values, so 1000 values each • adding 2 numbers takes 1 time unit • Time without distributed computing: 1000000*K • Time with distributed computing: 1000*K + 1000 • Speed up is 1000*K / (K + 1), slightly less than 1000 when K is large
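The arithmetic on this slide can be checked with a few lines of Python. The function name `speedup` and the sample value K = 100 are assumptions for illustration. As on the slide, the cost of combining the counters is taken to be one time unit per worker.

```python
def speedup(total_points, workers, k, add_cost=1):
    """Ratio of single-machine time to distributed time."""
    sequential = total_points * k
    # Each worker does its share of trials, then the counters are added up.
    distributed = (total_points // workers) * k + workers * add_cost
    return sequential / distributed


for k in (1, 10, 100):
    print(k, speedup(1_000_000, 1000, k))
```

The speed up approaches 1000 as K grows, because the fixed cost of adding up the 1000 counters matters less when each trial is expensive.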
Follow-up • Look up examples using MapReduce • Note: one example is Google maintaining its keyword index by scanning (crawling) the web
Speaker Twitter: @kmwinterfield • IBM Smarter Cities • Social media for political campaigns • World Community Grid
Homework • Prepare question for Kevin • follow on twitter and send message OR • post on moodle • Continue with postings • Research unique NP complete problem and post summary and source!