Functional Collection-Oriented Programming
Guy Blelloch
Carnegie Mellon University
Collection-oriented programming
• Programmer emphasis is on operations over collections of values (data driven).
  • Array based: APL, Nial, FP, Matlab
  • Database: SQL, LINQ
  • Scripting: SETL, Python
  • Data parallel: *Lisp, HPF, NESL, Id, ZPL
  • Map-reduce
• All of these support some form of map and some form of reduce.
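To make the shared map and reduce operations concrete, here is a minimal sketch in Python (one of the scripting languages listed above). The sample data and variable names are made up for illustration and do not come from the slides.

```python
from functools import reduce
from operator import add

# Hypothetical sample collection, chosen only for illustration.
values = [1, 2, 3, 4]

# Map: apply a function independently to every element.
squares = [x * x for x in values]

# Reduce: combine all elements with an associative operator.
total = reduce(add, squares, 0)
```

Because each element is mapped independently and `add` is associative, both steps could in principle be evaluated in parallel without changing the result.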
Collection-oriented programming
• Concise code
  • Promotes a functional style of programming
  • Has become popular even without parallelism (Matlab, Python, SQL, …)
• Parallelism
  • Map is naturally parallel
  • Many collection operations are parallel: reduce, scan, collect, flatten, transpose, …
  • Most often deterministic
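The claim that map is naturally parallel and deterministic can be sketched with Python's standard `concurrent.futures` module. The function `collatz_steps` is a hypothetical stand-in for any pure function; it is not taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

def collatz_steps(n):
    """Count Collatz steps from n down to 1 (an arbitrary pure function)."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

inputs = list(range(1, 50))

# Sequential map.
sequential = [collatz_steps(n) for n in inputs]

# Parallel map: Executor.map yields results in input order regardless of
# which worker finishes first, so the output equals the sequential map.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(collatz_steps, inputs))
```

In standard CPython builds, threads are unlikely to speed up CPU-bound work like this; the sketch is only meant to show that a pure function mapped over a collection gives the same answer however the work is scheduled.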
Collection-oriented programming
• “Concurrency” (non-deterministic environment)
  • On its own, not really useful for “concurrent” applications (e.g. operating systems, or the front end of a web server).
Flat vs. Nested
Can collections contain collections? Can arbitrary functions be mapped?
• Flat languages: APL, SQL, Map-reduce, HPF, Matlab
• Nested languages: SETL, Python, NESL, Id
I conjecture that flat CO languages will never be general purpose: they are not good for trees, divide-and-conquer, …
Quicksort in NESL
function quicksort(S) =
  if (#S <= 1) then S
  else let
    a = S[rand(#S)];
    S1 = {e in S | e < a};
    S2 = {e in S | e == a};
    S3 = {e in S | e > a};
    R = {quicksort(v) : v in [S1, S3]};
  in R[0] ++ S2 ++ R[1];
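For readers who do not know NESL, the same quicksort can be sketched in Python. This is a sequential transliteration for illustration only: the comprehension over `(s1, s3)` marks where NESL's nested parallel map would run the two recursive calls in parallel.

```python
import random

def quicksort(s):
    """Collection-oriented quicksort, mirroring the NESL slide."""
    if len(s) <= 1:
        return s
    a = s[random.randrange(len(s))]      # random pivot
    s1 = [e for e in s if e < a]         # three independent filters
    s2 = [e for e in s if e == a]
    s3 = [e for e in s if e > a]
    # Map quicksort over a collection of collections (nested parallelism
    # in NESL; evaluated sequentially here).
    r = [quicksort(v) for v in (s1, s3)]
    return r[0] + s2 + r[1]
```

Calling `quicksort([3, 1, 2, 3, 0])` returns the sorted list whichever pivots are chosen, since the random choice affects only the running time.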
Quicksort in X10
double[] quicksort(double[] S) {
  if (S.length < 2) return S;
  double a = S[rand(S.length)];
  double[] S1,S2,S3;
  finish {
    async { S1 = quicksort(lessThan(S,a)); }
    async { S2 = eqTo(S,a); }
    S3 = quicksort(grThan(S,a));
  }
  return append(S1,append(S2,S3));
}
Matrix Multiplication
Fun A*B {
  if #A < k then baseCase..
  A11,A12,A21,A22 = QuadSplit(A)
  B11,B12,B21,B22 = QuadSplit(B)
  Parallel {
    C11 = A11*B11 + A12*B21
    C12 = A11*B12 + A12*B22
    C21 = A21*B11 + A22*B21
    C22 = A21*B12 + A22*B22
  }
  return QuadJoin(C11,C12,C21,C22)
}
Need to be able to program for locality.
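The recursive quadrant scheme above can be sketched in plain Python. The helper names `quad_split`, `quad_join` and `mat_add`, the schoolbook base case, and the default cutoff `k=2` are assumptions made for this sketch; it also assumes square matrices whose size is a power of two. The eight recursive products are independent, which is what the slide's `Parallel` block exploits; here they run sequentially.

```python
def quad_split(m):
    """Split a square matrix (list of rows) into four quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])

def quad_join(c11, c12, c21, c22):
    """Inverse of quad_split."""
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom

def mat_add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def mat_mul(a, b, k=2):
    n = len(a)
    if n < k:
        # Assumed base case: schoolbook multiplication.
        return [[sum(a[i][t] * b[t][j] for t in range(n))
                 for j in range(n)] for i in range(n)]
    a11, a12, a21, a22 = quad_split(a)
    b11, b12, b21, b22 = quad_split(b)
    # The four quadrant results are independent of each other.
    c11 = mat_add(mat_mul(a11, b11, k), mat_mul(a12, b21, k))
    c12 = mat_add(mat_mul(a11, b12, k), mat_mul(a12, b22, k))
    c21 = mat_add(mat_mul(a21, b11, k), mat_mul(a22, b21, k))
    c22 = mat_add(mat_mul(a21, b12, k), mat_mul(a22, b22, k))
    return quad_join(c11, c12, c21, c22)
```

Each recursive call touches only a quadrant of its inputs, which is the property that makes this formulation friendly to locality.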
Question:
• How general is functional CO programming?
• Advantages
  • High-level/concise
  • Natural/intuitive
  • Deterministic parallelism (for all partial results)
  • No need for annotations, commutativity, regions
  • No speculation
  • Simple cost model (even including locality)
• Potential disadvantages
  • Performance
  • Major rewriting of code
  • Does not support “concurrency” on its own
Barnes Hut
function bTree(pts,box as (x0,y0,s)) =
  if #pts = 0 then EMPTY
  else if #pts = 1 then LEAF(pts[0])
  else
    let xm = x0 + s/2; ym = y0 + s/2;
    parallelLet
      T1 = bTree({(x,y,d) in pts | x<xm & y<ym}, (x0,y0,s/2));
      T2 = bTree({(x,y,d) in pts | x<xm & y>=ym}, (x0,y0+s/2,s/2));
      ..
    in NODE(cmass(T1,T2,T3,T4),box,T1,T2,T3,T4)
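The tree construction can be sketched in Python as follows. Several things here are assumptions rather than content from the slide: the tuple encoding of `EMPTY`, `LEAF` and `NODE`, the reading of `d` as a positive mass, the body of `cmass`, and the third and fourth quadrants, which the slide elides. The sketch also assumes all points are distinct, since coincident points would recurse forever.

```python
def mass_of(tree):
    """Return (x, y, mass) summarising a subtree (assumed representation)."""
    if tree[0] == "EMPTY":
        return (0.0, 0.0, 0.0)
    return tree[1]  # a LEAF stores its point, a NODE stores its centre of mass

def cmass(*trees):
    """Mass-weighted centre of the given subtrees (assumed definition)."""
    parts = [mass_of(t) for t in trees]
    m = sum(p[2] for p in parts)
    x = sum(p[0] * p[2] for p in parts) / m
    y = sum(p[1] * p[2] for p in parts) / m
    return (x, y, m)

def b_tree(pts, box):
    x0, y0, s = box
    if len(pts) == 0:
        return ("EMPTY",)
    if len(pts) == 1:
        return ("LEAF", pts[0])
    xm, ym = x0 + s / 2, y0 + s / 2
    # Quadrants 3 and 4 are elided on the slide; these are assumptions.
    quads = [
        ([p for p in pts if p[0] < xm and p[1] < ym], (x0, y0, s / 2)),
        ([p for p in pts if p[0] < xm and p[1] >= ym], (x0, ym, s / 2)),
        ([p for p in pts if p[0] >= xm and p[1] < ym], (xm, y0, s / 2)),
        ([p for p in pts if p[0] >= xm and p[1] >= ym], (xm, ym, s / 2)),
    ]
    # The four recursive builds are independent (parallelLet on the slide).
    t1, t2, t3, t4 = [b_tree(q, b) for q, b in quads]
    return ("NODE", cmass(t1, t2, t3, t4), box, t1, t2, t3, t4)
```

For example, `b_tree([(1, 1, 1.0), (3, 3, 1.0)], (0, 0, 4))` puts one point in the first quadrant and one in the fourth, leaving the other two children empty.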
Barnes Hut
function force(p,LEAF(p')) = force(p,p')
  | force(p,EMPTY) = 0
  | force(p,NODE(c,box,T1,T2,T3,T4)) =
      if far(p,box) then forceC(p,c)
      else force(p,T1)+force(p,T2)+force(p,T3)+force(p,T4)

function forces(Points,T) = {move(p,force(p,T)) : p in Points};
“Algorithms in the Real World”
• Compression:
  • JPEG
* Easily expressed with no shared writeable state
^ Depends on algorithm
Barnes Hut
function force(p,LEAF(p')) = force(p,p')
  | force(p,EMPTY) = 0
  | force(p,NODE(c,box,T1,T2,T3,T4)) =
      if far(p,box) then forceC(p,c)
      else force(p,T1)+force(p,T2)+force(p,T3)+force(p,T4)

function forces(Points,T) = {force(p,T) : p in Points};
Graph Connectivity
[Figure: example graph on vertices 0 to 6]
Edge list representation:
Edges = [(0,1), (0,2), (2,3), (3,4), (3,5), (3,6), (1,3), (1,5), (5,6), (4,6)]
Graph Contraction
[Figure: the example graph on vertices 0 to 6 contracted in three steps: form stars, relabel, contract]
Graph Connectivity
[Figure: example graph on vertices 0 to 6]
Edge list representation:
Edges = [(0,1), (0,2), (2,3), (3,4), (3,5), (3,6), (1,3), (1,5), (5,6), (4,6)]
Hooks = [(0,1), (1,3), (1,5), (3,6), (4,6)]
Graph Connectivity
L = Vertex Labels, E = Edge List
function connectivity(L, E) =
  if #E = 0 then L
  else let
    FL = {coinToss(.5) : x in [0:#L]};
    H = {(u,v) in E | FL[u] and not(FL[v])};
    L = L <- H;
    E = {(L[u],L[v]): (u,v) in E | L[u] /= L[v]};
  in connectivity(L,E);
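A Python sketch of this random-mate contraction follows. It uses a loop in place of the tail recursion and takes an explicit random generator `rng` so that runs are repeatable. The helper `component_roots` is an addition of this sketch, not part of the slide: the labels returned by the contraction form a forest of parent pointers, so a final pass follows them to a root to obtain one label per component.

```python
import random

def connectivity(labels, edges, rng):
    """Random-mate graph contraction, mirroring the NESL slide."""
    labels = list(labels)
    while edges:
        # Flip a coin per vertex: True = heads, False = tails.
        heads = [rng.random() < 0.5 for _ in labels]
        # An edge hooks when it goes from a heads vertex to a tails vertex.
        hooks = [(u, v) for (u, v) in edges if heads[u] and not heads[v]]
        # NESL's `L <- H`: write v at index u (any one hook per u may win).
        for u, v in hooks:
            labels[u] = v
        # Relabel edge endpoints and drop edges inside a contracted star.
        edges = [(labels[u], labels[v]) for (u, v) in edges
                 if labels[u] != labels[v]]
    return labels

def component_roots(labels):
    """Added helper (not on the slide): follow parent pointers to roots."""
    def root(x):
        while labels[x] != x:
            x = labels[x]
        return x
    return [root(x) for x in range(len(labels))]
```

With the example edge list and `random.Random(0)` as the generator, `component_roots(connectivity(range(7), edges, rng))` assigns every vertex the same root, since the example graph is connected.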
Conclusions/Questions
• Perhaps functional programming is adequate for most/all parallel applications.
• Collections seem to encourage a functional style even in non-functional languages.
• Functional CO programming gives fully deterministic results, including partial results.
Quicksort in Multilisp
(defun quicksort (L) (qs L nil))

(defun qs (L rest)
  (if (null L) rest
      (let* ((a (car L))
             (L1 (filter (lambda (b) (< b a)) (cdr L)))
             (L3 (filter (lambda (b) (>= b a)) (cdr L))))
        (qs L1 (future (cons a (qs L3 rest)))))))

(defun filter (f L)
  (if (null L) nil
      (if (f (car L))
          (future (cons (car L) (filter f (cdr L))))
          (filter f (cdr L)))))
Quicksort in Multilisp (futures)
Work = O(n log n)
Span = O(n)
Not a very good parallel algorithm: the available parallelism, Work/Span, is only O(log n).
Scan code
function addscan(A) =
  if (#A <= 1) then [0]
  else let
    sums = {A[2*i] + A[2*i+1] : i in [0:#A/2]};
    evens = addscan(sums);
    odds = {evens[i] + A[2*i] : i in [0:#A/2]};
  in interleave(evens,odds);
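The recursive scan can be sketched in Python as below. Like the slide's version, it computes an exclusive prefix sum and assumes the input length is a power of two; the inline interleave is a stand-in for NESL's `interleave`.

```python
def addscan(a):
    """Exclusive prefix sum by pairwise contraction (length: power of two)."""
    if len(a) <= 1:
        return [0]
    half = len(a) // 2
    # Contract: add adjacent pairs, halving the problem size.
    sums = [a[2 * i] + a[2 * i + 1] for i in range(half)]
    # Recursive scan gives the results for the even positions.
    evens = addscan(sums)
    # Odd positions add the preceding even-position input element.
    odds = [evens[i] + a[2 * i] for i in range(half)]
    # Interleave evens and odds back into a single sequence.
    return [x for pair in zip(evens, odds) for x in pair]
```

For example, `addscan([1, 2, 3, 4, 5, 6, 7, 8])` returns `[0, 1, 3, 6, 10, 15, 21, 28]`. Each level is a map over half as many elements, so the recursion depth is logarithmic in the input length.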
Fourier Transform
function fft(a,w) =
  if #a == 1 then a
  else let
    r = {fft(b, w[0:#w:2]) : b in [a[0:#a:2], a[1:#a:2]]}
  in {a + b * w : a in r[0] ++ r[0]; b in r[1] ++ r[1]; w in w};
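The same recursion can be sketched in Python using the standard `cmath` module. The helper `roots_of_unity` and its sign convention (`exp(-2πik/n)`, the usual forward transform) are assumptions of this sketch, since the slide does not say how `w` is built; the input length is assumed to be a power of two.

```python
import cmath

def roots_of_unity(n):
    """Assumed convention: w[k] = exp(-2*pi*i*k/n) for k = 0..n-1."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]

def fft(a, w):
    """Recursive FFT mirroring the NESL slide."""
    if len(a) == 1:
        return a
    # Transform even- and odd-indexed elements with every second root.
    r = [fft(b, w[0::2]) for b in (a[0::2], a[1::2])]
    # Combine: out[k] = even[k mod n/2] + w[k] * odd[k mod n/2].
    return [x + y * z for x, y, z in zip(r[0] + r[0], r[1] + r[1], w)]
```

Calling `fft([1, 2, 3, 4], roots_of_unity(4))` gives, up to floating-point rounding, `[10, -2+2j, -2, -2-2j]`.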
Sparse Vector Matrix Multiply
function sparseMxV(M,v) = {sum({v[i]*w : (i,w) in row}) : row in M};
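A Python sketch of the same one-liner, assuming each row of the sparse matrix is stored as a list of `(column index, value)` pairs, which is how the comprehension on the slide reads:

```python
def sparse_mxv(m, v):
    """Multiply a sparse matrix (rows of (column, value) pairs) by a dense vector."""
    # Outer map over rows, inner map and sum over each row's nonzeros.
    return [sum(v[i] * w for i, w in row) for row in m]

# Hypothetical 3x3 example; the last row has no nonzero entries.
matrix = [[(0, 2.0), (2, 1.0)], [(1, 3.0)], []]
vector = [1.0, 2.0, 3.0]
result = sparse_mxv(matrix, vector)
```

Rows may have very different lengths, which is why this example needs nested collections rather than a flat array.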
MapReduce
function mapReduce(MAP,REDUCE,documents) =
  let temp = flatten({MAP(d) : d in documents});
  in flatten({REDUCE(k,vs) : (k,vs) in collect(temp)});

function wordcount(docs) =
  mapReduce(d => {(w,1) : w in wordify(d)},
            (w,c) => [(w,sum(c))],
            docs);

wordcount(["this is is document 1",
           "this is document 2"]);
[("1",1),("this",2),("is",3),("document",2),("2",1)]
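The same structure can be sketched in Python. The helpers `flatten` and `collect` are minimal stand-ins for the NESL operations of the same names, and `str.split` stands in for `wordify`; these are assumptions made for illustration.

```python
from collections import defaultdict

def flatten(nested):
    """Concatenate a collection of collections into one list."""
    return [x for xs in nested for x in xs]

def collect(pairs):
    """Group (key, value) pairs into (key, [values]) pairs."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return list(groups.items())

def map_reduce(map_fn, reduce_fn, documents):
    temp = flatten(map_fn(d) for d in documents)
    return flatten(reduce_fn(k, vs) for k, vs in collect(temp))

def wordcount(docs):
    return map_reduce(lambda d: [(w, 1) for w in d.split()],
                      lambda w, c: [(w, sum(c))],
                      docs)

counts = dict(wordcount(["this is is document 1", "this is document 2"]))
```

The counts agree with the result on the slide, although the order of the pairs depends on how `collect` groups keys and may differ between implementations.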