NESL: Revisited
    Guy Blelloch, Carnegie Mellon University
Experiences from the Lunatic Fringe
    Guy Blelloch, Carnegie Mellon University
    Title: 1995 Talk on NESL at ARPA PI Meeting
NESL: Motivation
    Language for describing parallel algorithms
        • Ability to analyze runtime
        • To describe known algorithms
    Portable across different architectures
        • SIMD and MIMD
        • Shared and distributed memory
    Simple
        • Easy to program, analyze, and debug
NESL: In a nutshell
    Simple call-by-value functional language
        + Built-in parallel type (nested sequences)
        + Parallel map (apply-to-each)
        + Parallel aggregate operations
        + Cost semantics (work and depth)
    *Sequential Semantics*
    Some non-pure features at “top level”
NESL: History
    • Developed in 1990
    • Implemented on CM, Cray, MPI, and sequentially, using a stack-based intermediate language
    • Interactive environment with remote calls
    • Over 100 algorithms and applications written; used to teach parallel algorithms
    • Mostly dormant since 1997
Original “mapquest”
    Web-based interface for finding addresses
    Zooming, panning, finding restaurants
NESL: Nested Sequences
    Built-in “parallel” type:
        [3.0, 1.0, 2.0] : [float]
        [[4, 5, 1, 6], [2], [8, 11, 3]] : [[int]]
        "Yoknapatawpah County" : [char]
        ["the", "rain", "in", "Spain"] : [[char]]
        [(3, "Italy"), (1, "sun")] : [int*[char]]
NESL: Parallel Map
    A = [3.0, 1.0, 2.0]
    B = [[4, 5, 1, 6], [2], [8, 11, 3]]
    C = "Yoknapatawpah County"
    D = ["the", "rain", "in", "Spain"]
    Sequence comprehensions:
        {x + .5 : x in A}   ->  [3.5, 1.5, 2.5]
        {sum(b) : b in B}   ->  [16, 2, 22]
        {c in C | c < 'n}   ->  "Ykaaaah C"
        {w[0] : w in D}     ->  "triS"
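For readers without a NESL system at hand, the comprehensions on this slide map closely onto Python list comprehensions. The following is a minimal sequential sketch, not NESL itself: Python evaluates the elements one at a time, whereas NESL is free to evaluate them in parallel. The variable names `shifted`, `row_sums`, `before_n`, and `initials` are introduced here for illustration only.

```python
# Sequential Python analogue of the NESL sequence comprehensions.
A = [3.0, 1.0, 2.0]
B = [[4, 5, 1, 6], [2], [8, 11, 3]]
C = "Yoknapatawpah County"
D = ["the", "rain", "in", "Spain"]

# Apply-to-each over a flat sequence.
shifted = [x + 0.5 for x in A]

# Apply-to-each over a nested sequence: one sum per inner sequence.
row_sums = [sum(b) for b in B]

# Filter: keep characters that sort before "n".
# Python compares by code point, so "Y", "C", and the space also pass.
before_n = "".join(c for c in C if c < "n")

# First character of each word, collected into a string.
initials = "".join(w[0] for w in D)
```

Because every element is computed independently of the others, each of these comprehensions could in principle be evaluated in any order, which is what makes the construct a parallel map.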
NESL: Aggregate Operations
    A = [3.0, 1.0, 2.0]
    D = ["the", "rain", "in", "Spain"]
    E = [(3, "Italy"), (1, "sun")]
    Parallel write : ['a] * [int*'a] -> ['a]
        D <- E           ->  ["the", "sun", "in", "Italy"]
    Prefix sum : ('a*'a->'a) * 'a * ['a] -> ['a]*'a
        scan('+, 2.0, A) ->  ([2.0, 5.0, 6.0], 8.0)
        plus_scan(A)     ->  [0.0, 3.0, 4.0]
        sum(A)           ->  6.0
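The two aggregate operations on this slide can be sketched sequentially in Python. This is an illustration of what they compute, not of how NESL implements them. The function names `write` and `scan` mirror the slide, and the last-update-wins rule for repeated indices is an assumption of this sketch, since the slide does not say how conflicting writes are resolved.

```python
from itertools import accumulate
import operator


def write(dest, updates):
    """Return a copy of dest with dest[i] replaced by v for each (i, v).

    Assumption of this sketch: if an index repeats, the last update wins.
    """
    out = list(dest)
    for i, v in updates:
        out[i] = v
    return out


def scan(op, identity, seq):
    """Exclusive prefix scan: returns (prefixes, total)."""
    running = list(accumulate(seq, op, initial=identity))
    # running holds identity, then every partial result; the last is the total.
    return running[:-1], running[-1]


A = [3.0, 1.0, 2.0]
D = ["the", "rain", "in", "Spain"]
E = [(3, "Italy"), (1, "sun")]

written = write(D, E)                       # ["the", "sun", "in", "Italy"]
prefixes, total = scan(operator.add, 2.0, A)  # ([2.0, 5.0, 6.0], 8.0)
plus_scan = scan(operator.add, 0.0, A)[0]   # [0.0, 3.0, 4.0]
```

Note that the scan is exclusive: each output position holds the combination of everything strictly before it, which is why the starting value appears first and the grand total is returned separately.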
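NESL: Cost Model
    Combining for parallel map:
        pexp = {exp(e) : e in A}
        The work of pexp is the sum of the work of each exp(e); its depth is the maximum of their depths.
    Can prove runtime bounds for PRAM:
        T = O(W/P + D log P)
        where W is the work, D is the depth, and P is the number of processors.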
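The combining rule for a parallel map and the PRAM bound can be made concrete with a small calculation. The sketch below is a hedged illustration: the helper names `map_cost` and `pram_time_bound` are hypothetical, the extra unit charged for the map itself is an assumption, and the second function evaluates only the expression inside the O(...), ignoring constant factors.

```python
import math


def map_cost(costs):
    """Combine (work, depth) pairs of the element computations in a parallel map.

    Work adds up across elements; depth is the maximum over elements.
    Assumption: one extra unit of work and depth is charged for the map itself.
    """
    work = 1 + sum(w for w, _ in costs)
    depth = 1 + max((d for _, d in costs), default=0)
    return work, depth


def pram_time_bound(work, depth, processors):
    """Evaluate W/P + D log P, the expression inside the O(...) bound."""
    return work / processors + depth * math.log2(processors)


# Three element computations with (work, depth) costs chosen for illustration.
work, depth = map_cost([(10, 3), (4, 2), (7, 5)])  # (22, 6)
bound_on_4 = pram_time_bound(work, depth, 4)       # 22/4 + 6*2 = 17.5
```

With a single processor the logarithmic term vanishes and the bound reduces to the work, which matches the intuition that work is the sequential running time.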
NESL: Other Libraries
    • String operations
    • Graphical interface
    • CGI interface for web applications
    • Dictionary operations (hashing)
    • Matrices
Example: Quicksort (Version 1)
    function quicksort(S) =
      if (#S <= 1) then S
      else let
          a  = S[rand(#S)];
          S1 = {e in S | e < a};
          S2 = {e in S | e == a};
          S3 = {e in S | e > a};
        in quicksort(S1) ++ S2 ++ quicksort(S3);
    D = O(n)    W = O(n log n)
Example: Quicksort (Version 2)
    function quicksort(S) =
      if (#S <= 1) then S
      else let
          a  = S[rand(#S)];
          S1 = {e in S | e < a};
          S2 = {e in S | e == a};
          S3 = {e in S | e > a};
          R  = {quicksort(v) : v in [S1, S3]};
        in R[0] ++ S2 ++ R[1];
    D = O(log n)    W = O(n log n)
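The difference between the two versions is only in how the depths of the recursive calls combine: Version 1 runs the calls one after the other, so their depths add, while Version 2 places both calls inside one parallel map, so the depth is the larger of the two. The Python sketch below runs sequentially but tracks that depth explicitly. The `parallel_calls` flag and the choice to charge one unit of depth per level are simplifying assumptions of this sketch.

```python
import random


def quicksort(S, rng, parallel_calls):
    """Return (sorted copy of S, depth under a simplified cost model).

    parallel_calls=False models Version 1: depths of the two calls add.
    parallel_calls=True models Version 2: depth is the maximum of the two.
    Assumption: each level costs one unit of depth for pivot and filters.
    """
    if len(S) <= 1:
        return list(S), 1
    a = S[rng.randrange(len(S))]       # random pivot
    S1 = [e for e in S if e < a]
    S2 = [e for e in S if e == a]
    S3 = [e for e in S if e > a]
    R1, d1 = quicksort(S1, rng, parallel_calls)
    R3, d3 = quicksort(S3, rng, parallel_calls)
    depth = 1 + (max(d1, d3) if parallel_calls else d1 + d3)
    return R1 + S2 + R3, depth


data = [7, 3, 9, 3, 1, 8, 2, 5]
# The same seed gives both versions the same sequence of pivots.
sorted_v1, depth_v1 = quicksort(data, random.Random(0), parallel_calls=False)
sorted_v2, depth_v2 = quicksort(data, random.Random(0), parallel_calls=True)
```

Both runs produce the same sorted sequence and do the same work; only the depth differs, and with identical pivots the Version 2 depth can never exceed the Version 1 depth.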
Example: Representing Graphs
    [Figure: graph on vertices 0, 1, 2, 3, 4]
    Edge list representation:
        [(0,1), (0,2), (2,3), (3,4), (1,3), (1,0), (2,0), (3,2), (4,3), (3,1)]
    Adjacency list representation:
        [[1,2], [0,3], [0,3], [1,2,4], [3]]
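The two representations carry the same information, and one can be derived from the other. As a small sketch, the hypothetical helper `adjacency_from_edges` below builds the adjacency lists from the directed edge list shown on the slide; sorting each neighbor list is a choice made here so that the result is easy to compare.

```python
def adjacency_from_edges(num_vertices, edges):
    """Group the targets of a directed edge list by source vertex."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
    # Sort each neighbor list so the output does not depend on edge order.
    return [sorted(neighbors) for neighbors in adj]


edges = [(0, 1), (0, 2), (2, 3), (3, 4), (1, 3),
         (1, 0), (2, 0), (3, 2), (4, 3), (3, 1)]
adjacency = adjacency_from_edges(5, edges)
# adjacency == [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
```

Each undirected edge appears twice in the edge list, once per direction, which is why every vertex ends up in the neighbor list of each of its neighbors.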
Example: Graph Connectivity
    L = vertex labels, E = edge list
    function randomMate(L, E) =
      if #E == 0 then L
      else let
          FL = {randBit(.5) : x in [0:#L]};
          H  = {(u,v) in E | FL[u] and not(FL[v])};
          L  = L <- H;
          E  = {(L[u],L[v]) : (u,v) in E | L[u] /= L[v]};
        in randomMate(L, E);
    Use hashing to avoid non-determinism
    D = O(log n)    W = O(m log n)
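The random-mate step can be traced in a sequential Python sketch. Several things here are assumptions rather than part of the slide: the recursion is written as a loop, the labels start out as the identity (vertex i labeled i), conflicting writes are resolved by letting the last one win, and a hypothetical helper `component_label` follows labels to a root, because a vertex hooked in an early round keeps pointing at a vertex that may itself be hooked later.

```python
import random


def random_mate(L, E, rng):
    """Hook vertices together until no edge joins two different labels."""
    L = list(L)
    while E:
        # One random bit per vertex.
        flags = [rng.random() < 0.5 for _ in L]
        # An edge (u, v) hooks u onto v when u is flagged and v is not.
        hooks = [(u, v) for (u, v) in E if flags[u] and not flags[v]]
        for u, v in hooks:  # stand-in for L <- H; the last write wins here
            L[u] = v
        # Relabel the edges and drop those inside a single component.
        E = [(L[u], L[v]) for (u, v) in E if L[u] != L[v]]
    return L


def component_label(L, v):
    """Follow labels from v until reaching a vertex labeled with itself."""
    while L[v] != v:
        v = L[v]
    return v


edges = [(0, 1), (0, 2), (2, 3), (3, 4), (1, 3),
         (1, 0), (2, 0), (3, 2), (4, 3), (3, 1)]
labels = random_mate(list(range(5)), edges, random.Random(1))
roots = {component_label(labels, v) for v in range(5)}
```

For the connected five-vertex example graph, `roots` contains a single vertex. Which vertex that is depends on the random bits, and when several hooks target the same vertex the winner depends on write order, which is the source of non-determinism in a truly parallel write.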
Lesson 1: Sequential Semantics
    • Debugging is much easier without non-determinism
    • Analyzing correctness is much easier without non-determinism
    • If it works on one implementation, it works on all implementations
    • Some problems are inherently concurrent; these aspects should be separated
Lesson 2: Cost Semantics
    • Need a way to analyze cost, at least approximately, without knowing details of the implementation
    • Any cost model based on processors is not going to be portable: there are too many different kinds of parallelism
Lesson 3: Too Much Parallelism
    Needed ways to back out of parallelism
    • Memory problem
    • The “flattening” compiler technique was too aggressive on its own
    • Need for depth-first schedules or other scheduling techniques
    • Various bounds shown on memory usage
Limitations
    Communication was a bottleneck on machines available in the mid-1990s and required “micromanaging” data layout for peak performance.
    Language would need to be extended
        • PSCICO Project (Parallel Scientific Computing) was looking into this
    Hard to get users for a new language
Relevance to Multicore Architecture
    • Communication is hopefully better than across chips
    • Can make use of multiple forms of parallelism (multiple threads, multiple processors, multiple function units)
    • Schedulers can take advantage of shared caching [SPAA04]
    • Aggregate operations can possibly make use of on-chip hardware support