Cilk++ Kristoffer Stensen, Bjørn Fevang
History • Developed since 1994 at the MIT Laboratory for Computer Science • Commercial version, Cilk++, developed by Cilk Arts, Inc. • Intel Corporation acquired Cilk Arts in 2009 • Intel released Intel Cilk Plus in 2010
Principle • The programmer is responsible for exposing parallelism • The run-time environment divides the work between processors
Three keywords • cilk_spawn • cilk_sync • cilk_for • Faithful linguistic extension of C++ • Serial elision: removing the Cilk keywords leaves a valid serial C++ program
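To make the keywords and the serial elision concrete, here is a minimal sketch rather than code from the slides. It assumes the Intel Cilk Plus header `<cilk/cilk.h>` and assumes that Cilk-aware compilers define the `__cilk` macro; with any other compiler the three keywords are defined away, which is the serial elision itself. The functions `fib` and `fib_table` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

#ifdef __cilk                 // assumption: defined by Cilk-aware compilers
#include <cilk/cilk.h>        // Intel Cilk Plus header (assumed available)
#else
#define cilk_spawn            // serial elision: the keyword disappears
#define cilk_sync             // serial elision: nothing to wait for
#define cilk_for for          // serial elision: an ordinary loop
#endif

// Recursive Fibonacci: the spawned call may run in parallel with the
// continuation of the parent.
std::uint64_t fib(unsigned n) {
    assert(n <= 93);                          // fib(94) overflows 64 bits
    if (n < 2) return n;
    std::uint64_t x = cilk_spawn fib(n - 1);  // child may run in parallel
    std::uint64_t y = fib(n - 2);             // parent keeps working
    cilk_sync;                                // wait for the spawned child
    return x + y;
}

// cilk_for: iterations are independent, so they may run in parallel.
std::vector<std::uint64_t> fib_table(unsigned count) {
    std::vector<std::uint64_t> table(count);
    cilk_for (unsigned i = 0; i < count; ++i) {
        table[i] = fib(i);                    // each iteration owns slot i
    }
    return table;
}
```

Compiled without Cilk support, `fib(10)` runs as plain recursive C++ and returns the same value as a parallel run would.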
DAG Model of Multithreading • Vertices: instructions • Edges: dependencies between instructions • x precedes y (x ≺ y): x must complete before y starts • Neither x ≺ y nor y ≺ x: x and y are parallel (x ∥ y)
The Work Law • Work: the total amount of time spent in all instructions • Equals the execution time on 1 processor: T1 • With TP the execution time on P processors: TP ≥ T1/P
The Span Law • Span: the longest path of dependencies in the DAG • Equals the theoretically fastest time in which the DAG could be executed on a computer with an infinite number of processors: T∞ • TP ≥ T∞
Parallelism • The ratio of work to span: T1/T∞
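The three measures can be computed mechanically for a small DAG. The sketch below is an illustration under assumptions: the `Vertex` and `Measures` types, the per-vertex `cost` field and the requirement that vertices arrive in topological order are all hypothetical choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical DAG vertex: an instruction cost plus the indices of the
// vertices it depends on. Vertices are assumed to be in topological
// order, so every predecessor index is smaller than the vertex's own.
struct Vertex {
    double cost;
    std::vector<std::size_t> preds;
};

struct Measures {
    double work;  // T1: total cost of all vertices
    double span;  // T-infinity: cost of the longest dependency path
};

Measures measure(const std::vector<Vertex>& dag) {
    Measures m{0.0, 0.0};
    std::vector<double> finish(dag.size(), 0.0);  // longest path ending at i
    for (std::size_t i = 0; i < dag.size(); ++i) {
        double start = 0.0;
        for (std::size_t p : dag[i].preds) {
            assert(p < i);                        // topological order
            start = std::max(start, finish[p]);
        }
        finish[i] = start + dag[i].cost;
        m.work += dag[i].cost;
        m.span = std::max(m.span, finish[i]);
    }
    return m;
}

// Parallelism: the ratio of work to span.
double parallelism(const Measures& m) { return m.work / m.span; }

// Best possible running time on P processors according to the Work Law
// and the Span Law together.
double lower_bound(const Measures& m, unsigned P) {
    assert(P > 0);
    return std::max(m.work / P, m.span);
}
```

For a diamond-shaped DAG of four unit-cost vertices the work is 4 and the span is 3, so no number of processors can finish it in less than 3 time units.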
Runtime System • Multiprocessor scheduling is NP-complete • Cilk++ employs work-stealing • Provably tight bounds • The runtime system exploits an arbitrary number of cores near-optimally • Negligible overhead on a single core (less than 2%)
Performance Bounds • Expected running time: TP ≤ T1/P + O(T∞)
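A worked example shows how this bound combines with the Work and Span Laws. The numbers are hypothetical, and c stands for the constant hidden in the O-notation.

```latex
\begin{align*}
T_1 &= 1000, \qquad T_\infty = 10, \qquad P = 4 \\
T_1 / T_\infty &= 100 \gg P \\
T_P &\ge \max\left(T_1/P,\; T_\infty\right) = \max(250,\; 10) = 250 \\
\mathbb{E}[T_P] &\le T_1/P + c\,T_\infty = 250 + 10c
\end{align*}
```

Because the parallelism (100) is much larger than P, the T1/P term dominates both bounds and the speedup is close to linear.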
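Performance Bounds • Bounds on stack space: SP ≤ P·S1, where S1 is the stack space of a serial execution and SP the stack space on P processors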
Work Stealing • The runtime system allocates as many operating-system threads (workers) as there are processors • Each worker’s stack operates like a queue • A spawned subroutine’s activation frame is pushed onto the bottom of the stack • The frame is popped from the bottom when the subroutine returns
Work Stealing • Workers that run out of work become thieves and steal the top frame from a victim worker’s stack • The stack is therefore a double-ended queue • Sufficient parallelism leads to infrequent stealing • Negligible communication and synchronization costs
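The deque discipline described on these slides can be sketched as follows. This is a single-threaded model for illustration only: `Frame`, `Worker` and `steal` are hypothetical names, and the synchronization a real runtime needs between thief and victim is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for an activation frame.
using Frame = std::string;

class Worker {
public:
    // A spawned subroutine's frame is pushed onto the bottom.
    void spawn(Frame f) { frames_.push_back(std::move(f)); }

    // When a subroutine returns, its frame is popped from the bottom.
    bool pop_bottom(Frame& out) {
        if (frames_.empty()) return false;
        out = std::move(frames_.back());
        frames_.pop_back();
        return true;
    }

    // A thief removes the oldest frame from the top.
    bool steal_top(Frame& out) {
        if (frames_.empty()) return false;
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }

    std::size_t size() const { return frames_.size(); }

private:
    std::deque<Frame> frames_;  // front = top, back = bottom
};

// The thief takes the victim's top frame and continues working on it.
// Returns false when the victim has nothing to steal.
bool steal(Worker& thief, Worker& victim) {
    assert(&thief != &victim);
    Frame f;
    if (!victim.steal_top(f)) return false;
    thief.spawn(std::move(f));
    return true;
}
```

The owner and the thief work at opposite ends of the deque, which is one way to see why stealing can stay cheap when it is infrequent.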
Work Stealing • Adapts well to multiprogrammed computing environments • Performance-composable programs
Race Detection • Strand: a sequence of serially executed instructions containing no parallel control • Data race: logically parallel strands access the same shared location, with no locks in common, and at least one of the strands writes to the location
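A small example may help to recognise the definition in code. The sketch below is illustrative: it assumes the Intel Cilk Plus header and the `__cilk` macro when a Cilk-aware compiler is used, and falls back to an ordinary loop otherwise. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef __cilk                 // assumption: defined by Cilk-aware compilers
#include <cilk/cilk.h>        // Intel Cilk Plus header (assumed available)
#else
#define cilk_for for          // serial elision: an ordinary loop
#endif

// Racy when run in parallel: the iterations are logically parallel
// strands, all of them write the shared variable `total`, and they hold
// no lock in common.
long racy_sum(const std::vector<int>& v) {
    long total = 0;
    cilk_for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];        // unsynchronized read-modify-write
    }
    return total;
}

// Race-free: each strand writes only to its own location out[i].
std::vector<int> doubled(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    cilk_for (std::size_t i = 0; i < v.size(); ++i) {
        out[i] = 2 * v[i];
    }
    assert(out.size() == v.size());
    return out;
}
```

Under the serial elision `racy_sum` always returns the correct sum; the race only shows up when the iterations actually run in parallel, which is why such bugs are easy to miss in testing.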
Cilkscreen • Race detector based on provably good algorithms • Guarantees to report a race bug if the race bug is exposed • Identifies the parallel control constructs in the executing application precisely • Tracks the series-parallel relationships of strands • Localizes the race in the application source code
Reducer Hyperobjects • Mitigate races on nonlocal variables without creating lock contention or requiring code restructuring
Locking • May create a bottleneck • Can destroy parallelism • Jumbles up the order of updates
Restructuring • Accumulate and concatenate lists • Time-consuming • May require expert skill
Reducer Hyperobject • Linguistic construct • Strands have different «views» of the same object • A strand can access and change its view’s state independently • Views are combined with a reduce() method
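The view and reduce() idea can be modelled by hand. The sketch below is not the Cilk++ reducer library: `ListAppendReducer` and `collect_even` are hypothetical names, and the strands are simulated by a serial loop over chunks so that the example compiles with any C++ compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hand-rolled model of a list-append reducer: one private view per strand.
class ListAppendReducer {
public:
    explicit ListAppendReducer(std::size_t strands) : views_(strands) {}

    // A strand touches only its own view, so no lock is needed.
    std::list<int>& view(std::size_t strand) {
        assert(strand < views_.size());
        return views_[strand];
    }

    // reduce(): append the right view to the left one. Appending is
    // associative, so combining views left to right restores serial order.
    static void reduce(std::list<int>& left, std::list<int>& right) {
        left.splice(left.end(), right);
    }

    // Combine all views into the final value.
    std::list<int> get_value() {
        std::list<int> result;
        for (std::list<int>& v : views_) reduce(result, v);
        return result;
    }

private:
    std::vector<std::list<int>> views_;
};

// Collect the even numbers of `data`, in their original order.
std::list<int> collect_even(const std::vector<int>& data, std::size_t strands) {
    assert(strands > 0);
    ListAppendReducer out(strands);
    const std::size_t chunk = (data.size() + strands - 1) / strands;
    // Each value of s models one strand; the chunks share no state, so
    // this loop is the part that could run in parallel.
    for (std::size_t s = 0; s < strands; ++s) {
        const std::size_t lo = s * chunk;
        const std::size_t hi = std::min(data.size(), lo + chunk);
        for (std::size_t i = lo; i < hi; ++i) {
            if (data[i] % 2 == 0) out.view(s).push_back(data[i]);
        }
    }
    return out.get_value();
}
```

The result is the same for any number of strands, which is the property that lets a reducer replace both the lock and the manual restructuring.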
Reference • Charles E. Leiserson, The Cilk++ concurrency platform, 2010