What is Computer Science About? Part 2: Algorithms
Design and Analysis of Algorithms
• Why we study algorithms:
  • many tasks can be reduced to abstract problems
  • if we can recognize them, we can use known solutions
example: Graph Algorithms
• graphs could represent friendships among people, adjacency of states on a map, or links between web pages...
• determining connected components (reachability)
• finding the shortest path between two points
  • MapQuest; Traveling Salesman Problem
• finding cliques
  • completely connected sub-graphs
• uniquely matching up pairs of nodes
  • e.g. a buddy system based on friendships
• determining whether 2 graphs have the same connectivity (isomorphism)
  • useful for visual shape recognition (e.g. tanks from aerial photographs + edge detection)
• finding a spanning tree (acyclic sub-graph that touches all nodes)
  • minimal-cost communication networks
Kruskal’s Algorithm for Minimum-Spanning Trees

// input: graph G with a set of vertices V
// and edges (u,v) with weights (lengths)
KRUSKAL(G):
  A = ∅
  foreach vi ∈ V:
    cluster[vi] ← i                  // singletons
  foreach edge (u,v) ordered by increasing weight:
    if cluster[u] ≠ cluster[v]:
      A = A ∪ {(u,v)}
      foreach w ∈ V:
        if cluster[w] = cluster[u]:
          cluster[w] ← cluster[v]    // merge
  return A                           // subset of edges

• it is greedy
• is it correct? (does it always produce an MST?)
• is it optimal? (how long does it take?)
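The pseudocode above can be sketched as runnable Python. This follows the slide's cluster-label scheme directly (relabeling clusters by scanning all vertices); production implementations would use a union-find structure instead. The example graph is invented for illustration.

```python
def kruskal(vertices, edges):
    """vertices: iterable of labels; edges: list of (weight, u, v)."""
    cluster = {v: i for i, v in enumerate(vertices)}  # singletons
    mst = []
    for weight, u, v in sorted(edges):                # increasing weight
        if cluster[u] != cluster[v]:
            mst.append((u, v, weight))
            old, new = cluster[u], cluster[v]
            for w in cluster:                         # merge the two clusters
                if cluster[w] == old:
                    cluster[w] = new
    return mst

edges = [(1, "A", "B"), (4, "A", "C"), (2, "B", "C"), (7, "C", "D")]
tree = kruskal("ABCD", edges)
print(tree)  # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 7)]
```

With the cluster-scan merge this runs in O(V·E) time after the O(E log E) sort; swapping in union-find brings it down to nearly O(E log E), which is one answer to the "is it optimal?" question.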
characterize algorithms in terms of efficiency
• note: we count the number of steps, rather than seconds
  • wall-clock time depends on machine, compiler, load, etc...
  • however, optimizations are important for real-time systems, games
• are there faster ways to sort a list? invert a matrix? find a completely connected sub-graph?
• scalability for larger inputs (think: human genome): how much more time/memory does the algorithm take?
  • polynomial vs. exponential run-time (in the worst case)
• depends a lot on the data structure (representation)
  • hash tables, binary trees, etc. can help a lot
• proofs of correctness
  • can you prove Euclid’s algorithm is correct?
  • can you prove an algorithm is guaranteed to output the longest palindrome in a string?
  • is the code for billing long-distance calls correct?
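To make the Euclid correctness question concrete, here is the algorithm itself; its correctness proof rests on the loop invariant gcd(a, b) = gcd(b, a mod b), plus termination because b strictly decreases.

```python
def gcd(a, b):
    """Euclid's algorithm for the greatest common divisor."""
    while b != 0:
        a, b = b, a % b   # invariant: gcd of the pair is unchanged
    return a              # gcd(a, 0) = a

print(gcd(1071, 462))  # 21
```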
Why do we care so much about polynomial run-time?
• consider 2 programs that take an input of size n (e.g. length of a string, number of nodes in a graph, etc.)
• run-time of one scales up as n^2 (polynomial), and the other as 2^n (exponential)
• exponential algorithms are effectively “unsolvable” for n > ~16, even if we used computers that were 100 times as fast! a computational “cliff”
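The cliff is easy to see by just counting hypothetical steps for the two growth rates:

```python
# Compare step counts for an n^2 algorithm vs. a 2^n algorithm.
for n in (10, 20, 30, 40):
    print(f"n={n:2d}  n^2={n**2:>5,}  2^n={2**n:>17,}")
```

By n = 40 the exponential program needs over a trillion steps while the polynomial one needs 1,600. A machine 100 times faster only pushes the exponential curve out by log2(100), about 7 more units of n.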
helpful rules of thumb:
• 2^10 ~ 1 thousand (1,024)
• 2^20 ~ 1 million (1,048,576)
• 2^30 ~ 1 billion (1,073,741,824)
• 2^32 = 4,294,967,296 ~ 4 billion; a 32-bit address space covers 4 GB
Moore’s Law (named after Gordon Moore, co-founder of Intel)
• the number of transistors on CPU chips appears to double about once every 18 months
• similar statements hold for CPU speed, network bandwidth, disk capacity, etc.
• but waiting a couple of years for computers to get faster is not an effective solution to NP-hard problems
[chart: transistor counts over time, from the Motorola 6800 through the 80486 and Pentium-4 to the Dual-Core Itanium; source: Wikipedia]
P vs. NP (CSCE 411)
• problems in “P”: solvable in polynomial time with a deterministic algorithm
  • examples: sorting a list, inverting a matrix...
• problems in “NP”: solvable in polynomial time with a non-deterministic algorithm
  • given a “guess”, we can check whether it is a solution in polynomial time
  • no polynomial-time algorithm is known, and enumerating and trying all the guesses would take exponential time in the worst case
  • example: given a set of k vertices in a graph, we can check whether they form a completely connected clique; but there are exponentially many possible sets to choose from
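The clique example above separates the two steps cleanly: *verifying* one guess is cheap (O(k^2) edge lookups), even though the number of candidate subsets is exponential. A small sketch, with a made-up adjacency structure:

```python
from itertools import combinations

def is_clique(adjacency, vertices):
    """Check in O(k^2) whether every pair in `vertices` is connected.
    adjacency: dict mapping each vertex to its set of neighbors."""
    return all(v in adjacency[u] for u, v in combinations(vertices, 2))

adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": set()}
print(is_clique(adj, ["A", "B", "C"]))  # True
print(is_clique(adj, ["A", "B", "D"]))  # False
```

Fast verification of a guess is exactly the defining property of NP; what nobody knows how to do in polynomial time is *find* the right guess among exponentially many subsets.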
P vs. NP (CSCE 411)
• most computer scientists believe P ≠ NP, though it has yet to be rigorously proved
• what does this mean?
  • that there are intrinsically “hard” problems for which a polynomial-time algorithm will never be found
[diagram: nested complexity classes, P inside NP inside even harder problems; in P: sorting a list, inverting a matrix, minimum-spanning tree...; in NP: graph clique, subset cover, Traveling Salesman Problem, satisfiability of Boolean formulas, factoring of integers...]
Being able to recognize whether a problem is in P or NP is fundamentally important to a computer scientist
• many combinatorial problems are in NP
  • knapsack problem (given n items with sizes wi and values vi, fit items into a knapsack of limited capacity L so as to maximize total value)
  • traveling salesman problem (shortest circuit visiting every city)
  • scheduling – e.g. of machines in a shop, to minimize a manufacturing process
• finding the shortest path in a graph between 2 nodes is in P
  • there is an algorithm that scales up polynomially with the size of the graph: Dijkstra’s algorithm
  • however, finding the longest path is NP-hard! (hence we do not expect there are complete and efficient solutions for all cases)
• applications to logistics, VLSI circuit layout...
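Dijkstra's algorithm, mentioned above as the polynomial-time shortest-path method, can be sketched compactly with a heap; the example graph is invented for illustration.

```python
import heapq

def dijkstra(graph, source):
    """graph: dict of node -> list of (neighbor, weight), non-negative weights.
    Returns a dict of shortest distances from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry; skip
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w           # found a shorter route to v
                heapq.heappush(heap, (dist[v], v))
    return dist

g = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 6)], "C": [("D", 3)]}
print(dijkstra(g, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 6}
```

With a binary heap this runs in O((V + E) log V), comfortably polynomial; the structurally similar-looking longest-path problem admits no such trick.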
not all hope is lost...
• even if a problem is in NP, there might be an approximation algorithm that solves it efficiently (in polynomial time)
• however, it is important to determine the error bounds
  • for example, an approx. alg. might find a subset cover that is “no more than twice the optimal size”
• a simple greedy algorithm for the knapsack problem:
  • put in the item with the largest value-to-weight ratio first, then the next largest, and so on...
  • one can show this fills the knapsack to within 2 times the optimal value
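The greedy heuristic above can be sketched as follows; the item values and weights are invented. (Note: the standard factor-2 guarantee for 0/1 knapsack actually requires taking the better of this greedy fill and the single most valuable item that fits; plain ratio-greedy alone can do worse.)

```python
def greedy_knapsack(items, capacity):
    """items: list of (value, weight). Greedily pack by value-to-weight
    ratio; returns (total_value, chosen_items)."""
    chosen, total = [], 0
    ranked = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    for value, weight in ranked:          # best ratio first
        if weight <= capacity:
            chosen.append((value, weight))
            capacity -= weight
            total += value
    return total, chosen

items = [(60, 10), (100, 20), (120, 30)]   # (value, weight) pairs
print(greedy_knapsack(items, 50))          # (160, [(60, 10), (100, 20)])
```

Here the greedy choice yields value 160, while the true optimum for capacity 50 is 220 (the 100- and 120-value items), illustrating why the error bound matters.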