Are New Languages Necessary for Manycore? David I. August Department of Computer Science Princeton University
THIS is the Problem! [Figure: SPEC CPU integer performance over time, with 2004 marked and a "?" over the years that follow]
Why New Multicore Languages Will Fail • Money is earned by relieving customer pain • The Market • Legacy, Legacy, Legacy • Programmers adopt new programming models • Parallel programming is more difficult • Parallel programming models have longevity issues • Automatic Thread Extraction (ATE)
Automatic Thread Extraction “That isn't to say we are parallelizing arbitrary C code, that's a fool's errand!” – Richard Lethin “Compiler can’t determine a tree from a graph…” – Burton Smith “Compiler can’t determine dependences without type information. Even then…” – Burton Smith “Decades of automatic parallelization work has been a failure…” – James Larus “All that icky pointer chasing code...” – Tim Mattson
How To Get Parallelism For Multicore? • Nine months ago, with an open mind… • A priori select ALL C programs from SPEC CINT 2000 • Our objective function (in priority order): • Extract meaningful parallelism • Prefer automatic over manual • Minimize impact to the programmer when manual
Our Results M.L.O.P.: 5 Generations 32 Cores 5.3x Speedup
Our Recipe Recent Compiler Technology: • Decoupled Software Pipelining (DSWP) [MICRO 05] • Parallel-Stage DSWP (PS-DSWP) • Speculative DSWP (Spec-DSWP) [PACT 07] • Existing Technology: Speculative DOALL, TLS • Targeted Memory Profiling • Procedure Boundary Elimination [PLDI 06] Hardware Support: • Compiler-Controlled Speculation • Streaming Communication [MICRO 06]
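The core idea behind DSWP can be sketched on a pointer-chasing loop: the loop body is split into stages that run on separate threads, and values flow one way through a queue. This is only an illustrative sketch, not the compiler's actual output. The function names (`traverse`, `process`, `dswp_squares`), the dictionary-based linked list, and the squaring "work" are all assumptions made for the example, and a Python queue stands in for the streaming communication hardware. Python threads share one interpreter lock, so this shows the structure of the transformation rather than a real speedup.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def build_list(values):
    """Build a singly linked list of dicts (hypothetical data layout)."""
    head = None
    for v in reversed(values):
        head = {"value": v, "next": head}
    return head


def traverse(head, out_q):
    """Stage 1: the pointer-chasing recurrence (node = node.next)."""
    node = head
    while node is not None:
        out_q.put(node["value"])
        node = node["next"]
    out_q.put(DONE)


def process(in_q, results):
    """Stage 2: the per-node work, fed only by the queue."""
    while True:
        item = in_q.get()
        if item is DONE:
            break
        results.append(item * item)  # stand-in for the real loop body


def dswp_squares(values):
    """Run both stages concurrently; dependences flow one way only."""
    q = queue.Queue(maxsize=32)  # bounded buffer decouples the stages
    results = []
    stages = [
        threading.Thread(target=traverse, args=(build_list(values), q)),
        threading.Thread(target=process, args=(q, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results
```

Because stage 2 never sends anything back to stage 1, the traversal can run ahead of the work by as much as the queue allows, which is what makes the pipeline tolerant of communication latency.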
Typical Example: 197.parser Threads run on multicore model with Itanium 2 cores. [Figure: three pipeline stages, Find English Sentences → Parse Sentences (95%) → Emit Results, parallelized with DSWP and with PS-DSWP (Spec DOALL Middle Stage)]
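The three-stage shape of the 197.parser example can be sketched as a pipeline whose middle stage is replicated across several workers while the first and last stages stay sequential. This is a rough sketch under stated assumptions: the sentence splitting on periods, the word count used as a stand-in for parsing, and all function names are hypothetical, and the sketch omits the speculation that the slide's "Spec DOALL" middle stage relies on.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def find_sentences(text, work_q, n_workers):
    """Stage 1 (sequential): split the input and number each sentence."""
    index = 0
    for raw in text.split("."):
        sentence = raw.strip()
        if sentence:
            work_q.put((index, sentence))
            index += 1
    for _ in range(n_workers):  # one sentinel per middle-stage worker
        work_q.put(DONE)


def parse_worker(work_q, done_q):
    """Stage 2 (replicated): each sentence is handled independently."""
    while True:
        item = work_q.get()
        if item is DONE:
            done_q.put(DONE)
            return
        index, sentence = item
        done_q.put((index, len(sentence.split())))  # stand-in for parsing


def emit(done_q, n_workers, out):
    """Stage 3 (sequential): restore the original sentence order."""
    pending, next_index, finished = {}, 0, 0
    while finished < n_workers:
        item = done_q.get()
        if item is DONE:
            finished += 1
            continue
        index, result = item
        pending[index] = result
        while next_index in pending:
            out.append(pending.pop(next_index))
            next_index += 1


def run_pipeline(text, n_workers=4):
    work_q, done_q, out = queue.Queue(), queue.Queue(), []
    threads = [threading.Thread(target=find_sentences,
                                args=(text, work_q, n_workers))]
    threads += [threading.Thread(target=parse_worker, args=(work_q, done_q))
                for _ in range(n_workers)]
    threads.append(threading.Thread(target=emit,
                                    args=(done_q, n_workers, out)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Replicating only the middle stage matches the slide's observation that parsing accounts for about 95% of the work; the emit stage reorders results so the output is the same as the sequential program's.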
What We Learned • A new way of thinking about dependences: Go With the Flow • TLP is easier to extract than ILP • A holistic approach is better • A limitation exists in the sequential model: Determinism
Determinism: A Double Edged Sword
while(<cond>):
    <work>
    x = Rand()
    <work>

int Rand():
    state = f2(state)
    return f1(state)

[Figure: iterations 1 2 3 4 scheduled as DOALL versus SEQUENTIAL]
56 LOCs in 11 programs: 22 annotations • Only 2 programs needed more • Most common culprit: Custom Allocators
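The `Rand()` loop above can be sketched in runnable form to show why determinism blocks DOALL execution and what relaxing it buys. Every call updates the shared `state`, so a strict sequential reading forces iteration 2 to wait for iteration 1. If the programmer states that calls to `Rand()` may happen in any order as long as each one is atomic, the iterations become independent. The sketch below assumes that form of annotation and models it with a lock; the slides do not show the annotation syntax, so the `Rand` class, the linear congruential update standing in for `f2`, and the shift standing in for `f1` are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Rand:
    """Stateful generator; each call is atomic but call order is free."""

    def __init__(self, seed):
        self.state = seed
        self._lock = threading.Lock()

    def __call__(self):
        with self._lock:
            # state = f2(state): an LCG step, chosen only for illustration
            self.state = (self.state * 1103515245 + 12345) % 2**31
            # return f1(state): drop the low-order bits
            return self.state >> 16


def sequential_loop(seed, n):
    """Iterations run in program order, so the result is fully determined."""
    rand = Rand(seed)
    return [rand() for _ in range(n)]


def doall_loop(seed, n, workers=4):
    """Iterations run in any order; each still gets one atomic Rand() call."""
    rand = Rand(seed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: rand(), range(n)))
```

Both loops draw the same set of values from the generator; only the assignment of values to iterations may differ. That difference is the non-determinism the programmer agrees to accept in exchange for parallel execution.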
What about Manycore? Multicore • New languages aren’t necessary • Legacy code easily adjusted Manycore • Implicitly Parallel Sequential Programming • No optimization for sequential (custom allocators) • Points of non-determinism specified • Parallel algorithms in sequential codes • Debuggability, Understandability, Sanity
The Answer Originates with ATE The Old Way: PL folks would write languages, Architecture folks would make HW, and Compiler folks would dutifully connect the two. This will fail for Manycore: • Unduly burden the programmer • Performance will suffer There’s a New Way…
SPEC 2006: 403.gcc Threads run on multicore model with Itanium 2 cores.