Distributed Control Algorithms for Artificial Intelligence by Avi Nissimov, DAI seminar @ HUJI, 2003
Control methods • Goal: deciding which task should be executed, and when it should be executed • Control in centralized algorithms • Loops, branches • Control in distributed algorithms • Control messages • Control for distributed AI • Search coordination
Centralized versus Distributed computation models • “Default” centralized computation model: Turing machine • Open issues in distributed models: • Synchronization • Predefined structure of the network • Processors' knowledge of the network graph structure • Processor identification • Processor roles
Notes about proposed computational model • Asynchronous • (and therefore non-deterministic) • Unstructured (connected) network graph • No global knowledge – neighbors only • Each processor has unique id • No server-client roles, but there is a computation initiator
Complexity measures • Communication • Number of exchanged messages • Time • Measured in units of the slowest message (no weights on network graph edges); local processing is ignored • Storage • Total number of bits/words required
Control issues • Graph exploration • Communication over the graph • Termination detection • Detecting the state in which no node is active and no message is in transit
Graph exploration: Tasks • Routing messages from node to node • Broadcasting • Connectivity determination • Communication capacity usage
Echo algorithm • Goal: spanning tree building • Intuition: got a message – pass it on • On the first reception of the message, send it to all other neighbors; ignore subsequent receptions • Termination detection – once all neighbors have responded, send an [echo] message to the father
Echo alg.: implementation
    receive [echo] from w;
    father := w; received := 1;
    for all (v in Neighbors - {w})
        send [echo] to v;
    while (received < Neighbors.size) do
        receive [echo]; received++;
    send [echo] to father
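To make the message flow concrete, here is a minimal Python simulation of the echo algorithm (not part of the original slides). The dictionary-of-sets graph representation, the FIFO delivery order, and the sample graph are illustrative assumptions; the real model is asynchronous, so other delivery orders yield other spanning trees.

    from collections import deque

    def echo(graph, root):
        """Simulate the echo algorithm on an undirected graph {node: set of
        neighbors}; returns each node's father, i.e. a spanning tree."""
        father = {}
        received = {v: 0 for v in graph}
        # FIFO delivery is an assumption; the real model is fully asynchronous.
        queue = deque((root, v) for v in graph[root])  # initiator floods neighbors
        while queue:
            sender, node = queue.popleft()             # deliver one [echo] message
            received[node] += 1
            if node != root and node not in father:    # first reception: adopt father
                father[node] = sender
                for v in graph[node]:
                    if v != sender:
                        queue.append((node, v))
            if node != root and received[node] == len(graph[node]):
                queue.append((node, father[node]))     # all neighbors answered: echo up
        return father

    # A 4-cycle: every edge carries exactly two messages.
    g = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
    print(echo(g, 1))   # {2: 1, 3: 1, 4: 2} – one possible spanning tree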
Echo algorithm - properties • Very useful in practice, since no exploration can be faster • Reasonable assumption – “fast” edges tend to stay fast • The theoretical model allows worst-case executions, since every spanning tree can be an outcome of the algorithm
DFS spanning tree algorithm: Centralized version
    DFS(u, father)
        if (visited[u]) then return;
        visited[u] := true;
        father[u] := father;
        for all (neigh in neighbors[u])
            DFS(neigh, u);
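The centralized routine translates almost verbatim into Python; the adjacency-list representation and the sample graph are assumptions for illustration.

    def dfs_tree(graph, root):
        """Centralized DFS spanning tree; `graph` maps a node to its neighbors."""
        father, visited = {}, set()

        def dfs(u, f):
            if u in visited:
                return
            visited.add(u)
            father[u] = f
            for neigh in graph[u]:
                dfs(neigh, u)

        dfs(root, root)
        return father

    g = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
    print(dfs_tree(g, 1))   # {1: 1, 2: 1, 4: 2, 3: 4}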
DFS spanning tree algorithm: Distributed version
    On reception of [dfs] from v
        if (visited) then
            send [return] to v; status[v] := returned; return;
        visited := true;
        status[v] := father;
        sendToNext();
DFS spanning tree algorithm: Distributed version (cont.)
    On reception of [return] from v
        status[v] := returned;
        sendToNext();

    sendToNext()
        if there is w s.t. status[w] = unused then
            send [dfs] to w;
        else
            send [return] to father
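Since the algorithm is sequential, exactly one message is in flight at any time, so a simple loop suffices to simulate it. The Node class and the scheduling below are illustrative assumptions, not the original code.

    UNUSED, FATHER, RETURNED = "unused", "father", "returned"

    class Node:
        def __init__(self, uid, neighbors):
            self.id, self.visited = uid, False
            self.status = {v: UNUSED for v in neighbors}
            self.father = None

    def run_dfs(graph, root):
        """Simulate distributed DFS; returns the list of tree edges."""
        nodes = {u: Node(u, graph[u]) for u in graph}
        tree = []                        # collected tree edges (father, child)
        inflight = ("dfs", root, root)   # (kind, sender, receiver); root wakes itself

        def send_to_next(u):
            node = nodes[u]
            for w, st in node.status.items():
                if st == UNUSED:
                    return ("dfs", u, w)          # visit an unused neighbor
            if u == root:
                return None                       # initiator exhausted: done
            return ("return", u, node.father)     # exhausted: return to father

        while inflight:
            kind, v, u = inflight
            node = nodes[u]
            if kind == "dfs" and not node.visited:   # first visit: v is the father
                node.visited = True
                node.father = v
                if v != u:
                    node.status[v] = FATHER
                    tree.append((v, u))
                inflight = send_to_next(u)
            elif kind == "dfs":                      # already visited: bounce back
                node.status[v] = RETURNED
                inflight = ("return", u, v)
            else:                                    # a [return] received from v
                node.status[v] = RETURNED
                inflight = send_to_next(u)
        return tree

    g = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
    print(run_dfs(g, 1))   # [(1, 2), (2, 3)]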
Discussion, complexity analysis • Sequential in nature • There are 2 messages on each edge, therefore • Communication complexity is 2m • All the messages are sent in sequence, therefore • Time complexity is 2m as well • Explicitly fails to utilize parallel execution
Awerbuch's linear time algorithm for DFS tree • Main idea: why send [dfs] to a node that has already been visited? • Each node, when first visited, sends a [visited] message in parallel to all its neighbors • Neighbors thus update their knowledge of the node's status before they themselves are visited, in O(1) time per node (in parallel)
Awerbuch's algorithm: complexity analysis • Let (u,v) be an edge, and suppose u is visited before v. Then u sends a [visited] message on (u,v), and v sends back an [ok] message to u. • If (u,v) is also a tree edge, [dfs] and [return] messages are sent as well. • Communication complexity: 2m+2(n-1) • Time complexity: 2n+2(n-1)=4n-2
Relaxation algorithm - idea • DFS-tree property: if (u,v) is an edge of the original graph, then v lies on the path (root,…,u) or u lies on the path (root,…,v) • The union of the lexically minimal simple paths (lmsp) satisfies this property • Therefore, all we need is to find the lmsp of each node in the graph
Relaxation algorithm – Implementation
    On arrival of [path, <path>]
        if (currentPath > (<path>, u)) then
            currentPath := (<path>, u);
            send [path, currentPath] to all neighbors   // (in parallel, of course)
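A sequential Python sketch of the relaxation idea (an assumption for illustration): synchronous sweeps over the edges stand in for the asynchronous [path] messages, and a path is a tuple of node ids compared lexicographically.

    def lmsp(graph, root):
        """Relaxation until fixed point: lexically minimal simple paths."""
        path = {root: (root,)}
        changed = True
        while changed:                   # one sweep ~ one asynchronous round
            changed = False
            for u in list(path):
                for v in graph[u]:
                    if v in path[u]:
                        continue         # keep the candidate path simple
                    cand = path[u] + (v,)
                    if v not in path or cand < path[v]:
                        path[v] = cand   # relax: lexically smaller path found
                        changed = True
        return path

    g = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
    print(lmsp(g, 1))   # {1: (1,), 2: (1, 2), 3: (1, 3), 4: (1, 2, 4)}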
Relaxation algorithm – analysis and conclusions • Advantages – low complexity: • After k steps, every node whose lmsp has length k is set up; therefore time complexity is n • Disadvantages: • Unlimited message length • Termination detection required (see further)
Other variations and notes • Minimal spanning tree • Requires edge weights; works much like Kruskal's MST algorithm • BFS • Very hard, since there is no synchronization; behaves much like iterative-deepening DFS • Linear-message solution • Like the centralized version: sends all the information to the next node; unlimited message length
Connectivity Certificates • Idea: let G be the network graph. Drop some edges from G while preserving, for each pair {u,v}, k paths when G contains them – and all the paths when G itself contains fewer than k • Applications: • Network capacity utilization • Ensuring transport reliability
Connectivity certificate: Goals • The main idea of certificates is to use as few edges as possible; the whole graph is always a trivial certificate • Finding a minimal certificate is an NP-hard problem • A sparse certificate is one that contains no more than k·n edges
Sparse connectivity certificate: Solution • Let E(i) be a spanning forest of G \ (E(1) ∪ … ∪ E(i-1)); then E(1) ∪ … ∪ E(k) is a sparse connectivity certificate • Algorithm idea – compute all the forests simultaneously: if an edge closes a cycle in a tree of the i-th forest, add it to the (i+1)-th forest (the rank of the edge is then i+1)
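A sequential Python sketch of this forest decomposition (the distributed version follows on the next slides). The union-find helper and the K4 example are implementation assumptions.

    class DSU:
        """Union-find: tests whether an edge closes a cycle in a forest."""
        def __init__(self):
            self.parent = {}
        def find(self, x):
            self.parent.setdefault(x, x)
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]   # path halving
                x = self.parent[x]
            return x
        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return False            # (a, b) would close a cycle
            self.parent[ra] = rb
            return True

    def edge_ranks(edges):
        """Give each edge the index of the first forest E(i) that accepts it;
        the union of E(1)..E(k) is then a sparse k-connectivity certificate."""
        forests, rank = [], {}
        for e in edges:
            i = 0
            while True:
                if i == len(forests):
                    forests.append(DSU())
                if forests[i].union(*e):   # e is a forest edge of E(i+1)
                    rank[e] = i + 1
                    break
                i += 1                     # e closes a cycle: try next forest
        return rank

    edges = [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4), (2, 4)]   # K4
    print(edge_ranks(edges))
    # {(1, 2): 1, (2, 3): 1, (1, 3): 2, (3, 4): 1, (1, 4): 2, (2, 4): 2}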
Distributed certificate algorithm
    Search(father)
        if (not visited) then
            for all neighbors v s.t. rank[v] = 0
                send [give_rank] to v;
                receive [ranked, <i>] from v;
                rank[v] := i;
            visited := true;
Distributed certificate algorithm (cont.)
    Search(father) (cont.)
        for all w s.t. needs_search[w] and rank[w] >= rank[father], in decreasing order
            needs_search[w] := false;
            send [search] to w;
            receive [return];
        send [return] to father
Distributed certificate algorithm (cont.)
    On receipt of [give_rank] from v
        rank[v] := min i s.t. i > rank[w] for all w;
        send [ranked, <rank[v]>] to v;

    On receipt of [search] from father
        Search(father);
Complexity analysis and discussion • There is no reference to k in the algorithm; it computes sparse certificates for all k's at once • There are at most 4 messages on each edge – therefore time and communication complexity are at most 4m=O(m) • By ranking the nodes in parallel, we can achieve 2n+2m complexity
Termination detection: definition • Problem: detect the state in which all nodes are passive, waiting for messages • Similar to the garbage collection problem – determine which nodes can no longer receive messages (until “reallocated” – reactivated) • Two approaches: tracing vs. probe
Processor states of execution: global picture • Send • pre-condition: {state=active}; • action: send[message]; • Receive • pre-condition: {message queue is not empty}; • action: state:=active; • Finish activity • pre-condition: {state=active}; • action: state:=passive;
Tracing • Similar to the “reference counting” garbage collection algorithm • On sending a message, a node increases its children counter • On receiving a [finished_work] message, it decreases the children counter • When a node has finished its work and its children counter equals zero, it sends a [finished_work] message to its father
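Below is a Python sketch of the tracing idea (essentially the Dijkstra–Scholten scheme); the Net/Node classes, the FIFO delivery, and the hard-coded work lists are illustrative assumptions.

    from collections import deque

    class Net:
        """FIFO message pump; synchronous delivery is a simplifying assumption."""
        def __init__(self):
            self.nodes, self.q = {}, deque()
        def send(self, kind, dest, sender=None):
            self.q.append((kind, dest, sender))
        def run(self):
            while self.q:
                kind, dest, sender = self.q.popleft()
                self.nodes[dest].handle(kind, sender)

    class Node:
        """Tracing a la reference counting: every basic message adds a child;
        a passive node with zero children reports [finished_work] upward."""
        def __init__(self, uid, net, work=()):
            self.id, self.net, self.work = uid, net, list(work)
            self.father, self.children, self.active = None, 0, False
            net.nodes[uid] = self

        def handle(self, kind, sender):
            if kind == "finished_work":
                self.children -= 1                 # a child has fully finished
            elif self.father is None:              # first activation: adopt father
                self.father, self.active = sender, True
                for dest in self.work:             # the "work": send basic messages
                    self.children += 1
                    self.net.send("basic", dest, self.id)
                self.active = False                # finish activity
            else:                                  # already traced: report at once
                self.net.send("finished_work", sender)
            if not self.active and self.children == 0 and self.father is not None:
                if self.father == self.id:         # the initiator detects the end
                    print("termination detected by initiator", self.id)
                else:
                    self.net.send("finished_work", self.father)
                self.father = None                 # detach from the tree

    net = Net()
    Node(1, net, work=[2, 3]); Node(2, net, work=[3]); Node(3, net)
    net.send("basic", 1, sender=1)   # the initiator activates itself
    net.run()                        # -> termination detected by initiator 1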
Analysis and discussion • Main disadvantage: doubles (!!) the communication complexity • Advantages: simplicity, immediate termination detection (because the message is initiated by the terminating node) • Variations may send the [finished_work] message only for selected messages; so-called “weak references”
Probe algorithms • Main idea: once per some period, “collect garbage” – compute the number of sent minus the number of received messages per processor • If the sum of these numbers is 0, then no message is in transit on the network • In parallel, find out whether there is an active processor
Probe algorithms – details • We introduce a new role – the controller – and assume it is in fact connected to each node • Once per period (delta), the controller sends a [request] message to all the nodes • Each processor sends back [deficit = <sent_count - received_count>]
Think it works? Not yet… • Suppose U sends a message to V and becomes passive; then U receives the [request] message and (immediately) replies [deficit=1] • Next, processor W receives the [request] message; it replies [deficit=0], since it has received no message yet • Meanwhile V activates W by sending it a message, receives W's reply, and stops; then V receives [request] and replies [deficit=-1] • The deficits sum to 0, but W is still active…
How to work it out? • As we saw, a message can pass “behind the back” of the controller, since the model is asynchronous • Yet if we add an extra boolean variable on each processor, such as “was active since the last request”, we can deal with this problem • But this means we detect termination only within 2·delta time after it actually occurs
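A small Python sketch of the resulting decision rule at the controller (the function name and the reply format are assumptions):

    def check_termination(replies):
        """One probe round. Each reply is the pair (sent_count - received_count,
        was_active_since_last_request). The deficits must sum to 0 (no message
        in transit) AND no node may have been active since the previous request;
        this is why detection may lag termination by up to 2*delta."""
        total_deficit = sum(d for d, _ in replies)
        return total_deficit == 0 and not any(flag for _, flag in replies)

    # The U/V/W race above: deficits sum to 0, but the activity flags of U and V
    # prevent a false detection.
    print(check_termination([(1, True), (-1, True), (0, False)]))    # False
    # A later quiet round: all deficits 0, nobody active since the last request.
    print(check_termination([(0, False), (0, False), (0, False)]))   # True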
Variations, discussion, analysis • If a node is more than one edge away from the controller, use the “echo” algorithm with the controller as initiator, computing the sum inline • Detection is not immediate, and is initiated by the controller • A small delta causes a communication bottleneck, while a large delta causes a long period before detection
CSP and Arc Consistency • Formal definition: find x(i) ∈ D(i) such that every constraint Cij(x(i), x(j)) is satisfied • The problem is NP-complete in general • The arc-consistency problem is removing all redundant values: if Cij(v,w)=false for all w ∈ D(j), then remove v from D(i) • Of course, arc consistency is just the primary step of a CSP solution
Sequential AC4 algorithm
    for all Cij, v in Di, w in Dj
        if Cij(v,w) then
            count[i,v,j]++;
            Supp[j,w].insert(<i,v>);
    for all Cij, v in Di
        checkRedundant(i,v,j);
    while not Q.empty
        <j,w> := Q.dequeue();
        for all <i,v> in Supp[j,w]
            count[i,v,j]--;
            checkRedundant(i,v,j);
Sequential AC4 algorithm: redundancy check
    checkRedundant(i,v,j)
        if (count[i,v,j] = 0) then
            Q.enqueue(<i,v>);
            Di.remove(v);
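For concreteness, a compact Python rendering of sequential AC4 under the slides' naming (count, Supp, Q). The constraint representation and the x1 < x2 < x3 example are assumptions.

    from collections import deque

    def ac4(domains, constraints):
        """Sequential AC4. `domains`: variable -> set of values; `constraints`:
        ordered pair (i, j) -> predicate Cij(v, w). Prunes domains in place."""
        count, supp, q = {}, {}, deque()

        # Initialization: count supports and build the support lists.
        for (i, j), c in constraints.items():
            for v in list(domains[i]):
                count[i, v, j] = 0
                for w in domains[j]:
                    if c(v, w):
                        count[i, v, j] += 1
                        supp.setdefault((j, w), []).append((i, v))
                if count[i, v, j] == 0 and v in domains[i]:   # no support in Dj
                    domains[i].discard(v)
                    q.append((i, v))

        # Propagation: a removed (j, w) may leave some (i, v) unsupported.
        while q:
            j, w = q.popleft()
            for i, v in supp.get((j, w), []):
                if v in domains[i]:
                    count[i, v, j] -= 1
                    if count[i, v, j] == 0:
                        domains[i].discard(v)
                        q.append((i, v))
        return domains

    # x1 < x2 and x2 < x3 over {1, 2, 3}: arc consistency forces 1 < 2 < 3.
    doms = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}}
    cons = {(1, 2): lambda v, w: v < w, (2, 1): lambda v, w: w < v,
            (2, 3): lambda v, w: v < w, (3, 2): lambda v, w: w < v}
    print(ac4(doms, cons))   # {1: {1}, 2: {2}, 3: {3}}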
Distributed Arc consistency • Assume that each variable x(i) is assigned to a separate processor, and that mutually dependent variables are assigned to neighboring processors • The main idea of the algorithm: D(i) and the counters count[v,j] reside on processor i (there is no Supp list – supports are recomputed from Cij); when w is removed from D(j), processor j sends a message so that its neighbors can update their counters
Distributed AC4: Initialization
    Initialization:
        for all Cij, v in Di
            for all w in Dj_initial
                if Cij(v,w) then count[v,j]++;
            if count[v,j] = 0 then Redundant(v);

    Redundant(v):
        if v in Di then
            Di.remove(v);
            SendQueue.enqueue(v);
Distributed AC4: messaging
    On not SendQueue.empty
        v := SendQueue.dequeue();
        for all Cji
            send [remove v] to j;

    On reception of [remove w] from j
        for all v in Di such that Cij(v,w)
            count[v,j]--;
            if count[v,j] = 0 then Redundant(v);
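A Python sketch of the distributed variant (an assumption for illustration): one shared deque simulates the network, and the counters of processor i are namespaced as count[i, v, j].

    from collections import deque

    def distributed_ac4(domains, constraints):
        """Processor i keeps D(i) and its support counters; removing w from D(j)
        makes processor j send [remove w] to every processor holding a Cji."""
        initial = {i: set(d) for i, d in domains.items()}   # Dj_initial
        count, net = {}, deque()    # net: in-flight (dest i, sender j, value w)

        def redundant(i, v):
            if v in domains[i]:
                domains[i].discard(v)
                for (a, b) in constraints:   # send [remove v] to all j with Cji
                    if b == i:
                        net.append((a, i, v))

        # Initialization: processor i counts supports against Dj_initial.
        for (i, j), c in constraints.items():
            for v in list(initial[i]):
                count[i, v, j] = sum(1 for w in initial[j] if c(v, w))
                if count[i, v, j] == 0:
                    redundant(i, v)

        # Message pump: on [remove w] from j, processor i updates its counters.
        while net:
            i, j, w = net.popleft()
            c = constraints[(i, j)]
            for v in list(domains[i]):
                if c(v, w):
                    count[i, v, j] -= 1
                    if count[i, v, j] == 0:
                        redundant(i, v)
        return domains

    doms = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}}
    cons = {(1, 2): lambda v, w: v < w, (2, 1): lambda v, w: w < v,
            (2, 3): lambda v, w: v < w, (3, 2): lambda v, w: w < v}
    print(distributed_ac4(doms, cons))   # {1: {1}, 2: {2}, 3: {3}}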
Distributed AC4: complexity • Assume A=max{|Di|}, m=|{Cij}| • Sequential execution: both loops pass over all Cij, v in Di and w in Dj => O(mA^2) • Distributed execution: • Communication complexity: each edge can carry at most A messages => O(mA) • Time complexity: each node sends its at most A messages in parallel => O(nA) • Local computation: O(mA^2), because of the initialization
Dist. AC4 – Final details • Termination detection is not obvious, and requires explicit implementation • Usually the probe algorithm is preferred, because of the large number of messages • AC4 ends in one of three possible states: • Contradiction • Solution • Arc-consistent subset
Task assignment for AC4 • Our assumption was that each variable is assigned to a different processor • A special case is a multiprocessor computer, where all the resources are at hand • In fact, minimizing the communication cost when the assignment has to be done by the computer is an NP-hard problem => heuristic approximation algorithms
From AC4 to CSP • There are many heuristics, taught mainly in introductory AI courses (such as most-restricted variable and most-restricting value), that tell which variable should be instantiated next once arc consistency is reached • Termination with a contradiction is what triggers backtracking
Loop cut-set example • Definition: a pit of loop L is a vertex of the directed graph such that both of L's edges at it are incoming • Goal: break all loops in the directed graph • Formulation: let G=<V,E> be a graph; find C, a subset of V, such that every loop in G contains at least one non-pit vertex of C • Applications: belief network algorithms
Sequential solution • It can be shown that finding a minimal cut-set is an NP-hard problem, therefore approximations are used instead • The best-known approximation – by Suermondt and Cooper – is shown on the next slide • Main idea: at each step, drop all leaves; then, among the vertices with at most 1 incoming edge, pick the one common to the maximal number of cycles
Suermondt and Cooper algorithm
    C := empty;
    while not V.empty do
        remove all v such that deg(v) <= 1;
        K := {v in V : indeg(v) <= 1};
        v := argmax{deg(v) : v in K};
        C.insert(v);
        V.remove(v);
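A Python sketch of the heuristic (the digraph representation is an assumption); the fallback for an empty K anticipates the subtlety discussed on the next slide.

    def loop_cutset(vertices, edges):
        """Suermondt-Cooper heuristic; `edges` is a set of directed (u, v) pairs."""
        V, E = set(vertices), set(edges)
        def deg(v): return sum(v in e for e in E)        # total degree
        def indeg(v): return sum(w == v for _, w in E)   # incoming degree

        C = set()
        while V:
            leaves = {v for v in V if deg(v) <= 1}       # drop all leaves
            V -= leaves
            E = {(u, w) for (u, w) in E if u in V and w in V}
            if not V:
                break
            K = {v for v in V if indeg(v) <= 1}
            if not K:             # not covered by the slide: see the next slide
                K = V
            v = max(K, key=deg)   # heuristically covers the most cycles
            C.add(v)
            V.remove(v)
            E = {(u, w) for (u, w) in E if v not in (u, w)}
        return C

    # Two directed triangles sharing vertex 3.
    V = {1, 2, 3, 4, 5}
    E = {(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 3)}
    print(loop_cutset(V, E))   # e.g. {1, 3} (ties are broken arbitrarily)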
Edge case • There is still one subtlety (not described in Tel's article) – what to do when K is empty while V is not (for example, when G is an Euler path on the octahedron)