Parallel Programming
Introduction
• The idea has been around since the 1960s
  • pseudo-parallel systems on multiprogrammable computers
• True parallelism
  • many processors connected to run in concert
  • multiprocessor systems
  • distributed systems: stand-alone systems connected together
    • more complex now with high-speed networks
Programming Languages
• Used to express algorithms that solve the problems presented by parallel processing systems
• Used to write operating systems that implement these solutions
• Used to harness the capabilities of multiple processors efficiently
• Used to implement and express communication across networks
Two Kinds of Parallelism
• Parallelism existing in the underlying hardware
• Parallelism as expressed in the programming language
  • may not result in actual parallel processing
  • could be implemented with pseudo-parallelism
• Concurrent programming expresses only the potential for parallelism
Some Basics
• Process
  • an instance of a program or program part that has been scheduled for independent execution
• Heavy-weight process
  • a full-fledged independent entity, with all the memory and other resources ordinarily allocated by the OS
• Light-weight process, or thread
  • shares resources with the program it came from
Primary Requirements for Organization
• There must be a way for processors to synchronize their activities
  • e.g., a first processor inputs and sorts data while a second waits to perform computations on the sorted data
• There must be a way for processors to communicate data among themselves
  • e.g., the second processor needs the sorted data (see the Java sketch below)
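A minimal Java sketch of this sort-then-compute scenario, assuming a shared array and using join() for the synchronization (the class and variable names are illustrative):

      import java.util.Arrays;

      public class SortThenCompute {
          public static void main(String[] args) throws InterruptedException {
              int[] data = {5, 3, 8, 1, 9, 2};  // shared data

              // First activity: input (elided) and sort the data
              Thread sorter = new Thread(() -> Arrays.sort(data));
              sorter.start();

              sorter.join();  // second activity synchronizes: waits for the sorter

              // Second activity: compute on the now-sorted data
              System.out.println("median = " + data[data.length / 2]);
          }
      }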
Architectures
• SIMD (single-instruction, multiple-data)
  • one processor is the controller
  • all processors execute the same instructions on their respective registers or data sets
  • multiprocessing
  • synchronous (all processors operate at the same speed)
    • implicit solution to the synchronization problem
• MIMD (multiple-instruction, multiple-data)
  • all processors act independently
  • multiprocessor or distributed-processor systems
  • asynchronous (synchronization is a critical problem)
OS Requirements for Parallelism
• A means of creating and destroying processes
• A means of managing the number of processors used by processes
• A mechanism for ensuring mutual exclusion on shared-memory systems
• A mechanism for creating and maintaining communication channels between processors on distributed-memory systems
Language Requirements
• Machine independence
• Adherence to language design principles
• Some languages use the shared-memory model and provide facilities for mutual exclusion through a library
• Some assume the distributed-memory model and provide communication facilities
• A few include both
Common Mechanisms
• Threads
• Semaphores (see the sketch below)
• Monitors
• Message passing
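Threads and monitors are illustrated later in these notes; semaphores are not, so here is a minimal Java sketch using java.util.concurrent.Semaphore (the demo names are illustrative, but Semaphore and its acquire/release methods are the actual library API). A binary semaphore, one with a single permit, gives mutual exclusion:

      import java.util.concurrent.Semaphore;

      public class SemaphoreDemo {
          static int counter = 0;
          static final Semaphore mutex = new Semaphore(1);  // one permit = mutual exclusion

          public static void main(String[] args) throws InterruptedException {
              Runnable work = () -> {
                  for (int i = 0; i < 100000; i++) {
                      try {
                          mutex.acquire();          // P operation: wait for the permit
                          try {
                              counter++;            // critical section
                          } finally {
                              mutex.release();      // V operation: signal
                          }
                      } catch (InterruptedException e) {
                          Thread.currentThread().interrupt();
                          return;
                      }
                  }
              };
              Thread t1 = new Thread(work), t2 = new Thread(work);
              t1.start(); t2.start();
              t1.join(); t2.join();
              System.out.println(counter);  // always 200000 with the semaphore in place
          }
      }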
Two Common Sample Problems
• Bounded buffer problem
  • similar to the producer-consumer problem
• Parallel matrix multiplication (a Java sketch follows)
  • the standard algorithm takes N³ steps
  • assign a process to compute each element, each process on a separate processor: N steps
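A Java sketch of the parallel matrix-multiplication idea, coarsened to one thread per row rather than one process per element, since few systems have N² processors (all names are illustrative):

      public class ParMatMul {
          static final int N = 100;
          static double[][] a = new double[N][N], b = new double[N][N], c = new double[N][N];

          public static void main(String[] args) throws InterruptedException {
              Thread[] workers = new Thread[N];
              for (int i = 0; i < N; i++) {
                  final int row = i;
                  workers[i] = new Thread(() -> {
                      for (int j = 0; j < N; j++)        // each thread fills one row of c
                          for (int k = 0; k < N; k++)
                              c[row][j] += a[row][k] * b[k][j];
                  });
                  workers[i].start();
              }
              for (Thread w : workers)
                  w.join();  // wait for all rows to finish
          }
      }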
Without Explicit Language Facilities
• One approach is not to be explicit at all
  • possible in some functional, logic, and object-oriented languages
  • a certain inherent parallelism is implicit
  • language translators use optimization techniques to make use, automatically, of OS utilities that assign different processors to different parts of the program
  • suboptimal
Another Alternative Without Explicit Language Facilities
• The translator offers compiler options that let the programmer indicate explicitly where parallelism is called for
  • most effective in nested loops
  • example: FORTRAN (below)
Notes on the example:
• m_set_procs sets the number of processes
• share: variables accessed by all processes; local: variables local to each process
• the compiler directive synchronizes the processes: all processes wait for the entire loop to finish, and one process continues after the loop

      integer a(100, 100), b(100, 100), c(100, 100)
      integer i, j, k, numprocs, err
      numprocs = 10
C     code to read in a and b goes here
      err = m_set_procs(numprocs)
C$doacross share(a, b, c), local(j, k)
      do 10 i = 1, 100
        do 10 j = 1, 100
          c(i, j) = 0
          do 10 k = 1, 100
            c(i, j) = c(i, j) + a(i, k) * b(k, j)
10    continue
      call m_kill_procs
C     code to write out c goes here
      end
A Third Way, with Explicit Constructs
• Provide a library of functions
  • this passes the facilities provided by the OS directly to the programmer
  • (if the translator knows about the library, this amounts to providing the facility in the language itself)
• Example: C with the library parallel.h (below)
Notes: m_set_procs sets the number of processes; m_fork then creates them, all instances of multiply.

      #include <parallel.h>
      #define SIZE 100
      #define NUMPROCS 10

      shared int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

      void multiply(void)
      {
          int i, j, k;
          /* each process handles every NUMPROCS-th row, starting at its own id */
          for (i = m_get_myid(); i < SIZE; i += NUMPROCS)
              for (j = 0; j < SIZE; j++) {
                  c[i][j] = 0;
                  for (k = 0; k < SIZE; k++)
                      c[i][j] += a[i][k] * b[k][j];
              }
      }

      int main(void)
      {
          /* code to read in a and b goes here */
          m_set_procs(NUMPROCS);
          m_fork(multiply);
          m_kill_procs();
          /* code to write out c goes here */
          return 0;
      }
A Fourth and Final Alternative
• Simply rely on the OS
• Example: pipes in Unix
      ls | grep "java"
  • runs ls and grep in parallel
  • the output of ls is piped to grep
Languages with Explicit Mechanisms
• Two basic ways to create new processes
• SPMD (single program, multiple data)
  • split the current process into two or more processes that execute copies of the same program
• MPMD (multiple program, multiple data)
  • a segment of code is associated with each new process
  • the typical case is the fork-join model, in which a process creates several child processes, each with its own code (a fork), and then waits for the children to complete their execution (a join); a Java sketch follows
  • the last example was similar, but m_kill_procs took the place of the join
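A small Java sketch of the fork-join pattern (all names are illustrative; for brevity every child here runs the same body, though in true MPMD each child would have its own code):

      public class ForkJoinSketch {
          public static void main(String[] args) throws InterruptedException {
              Thread[] children = new Thread[4];
              for (int i = 0; i < children.length; i++) {
                  final int id = i;
                  children[i] = new Thread(() ->
                      System.out.println("child " + id + " working"));
                  children[i].start();        // the "fork"
              }
              for (Thread child : children)
                  child.join();               // the "join": parent waits for children
              System.out.println("all children done");
          }
      }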
Granularity
• The size of the code assignable to separate processes
  • fine-grained: statement-level parallelism
  • medium-grained: procedure-level parallelism
  • large-grained: program-level parallelism
• Can be an issue in program efficiency
  • too fine-grained: process-creation overhead dominates
  • too large-grained: may not exploit all opportunities for parallelism
Threads
• Provide fine-grained or medium-grained parallelism without the overhead of full-blown process creation
Issues
• Does the parent suspend execution while its child processes are executing, or does it continue to execute alongside them?
• What memory, if any, does a parent share with its children, or the children share among themselves?
Answers in the Last Example
• The parent process suspended execution (during m_fork)
• Global variables shared by all processes were indicated explicitly (with shared)
Process Termination
• Simplest case
  • a process executes its code to completion and then ceases to exist
• More complex case (a Java sketch follows)
  • a process may need to continue executing until a certain condition is met, and only then terminate
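A common Java idiom for the more complex case (all names are illustrative): the worker runs until a termination condition, here a shared flag, is met.

      public class StoppableWorker {
          private static volatile boolean done = false;  // volatile: visible across threads

          public static void main(String[] args) throws InterruptedException {
              Thread worker = new Thread(() -> {
                  while (!done) {
                      // ... do a unit of work ...
                  }
                  // condition met: run() returns and the thread terminates
              });
              worker.start();
              Thread.sleep(1000);  // let it work for a while
              done = true;         // signal the termination condition
              worker.join();       // worker ceases to exist once run() returns
          }
      }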
Statement-Level Parallelism
      parbegin
        S1;
        S2;
        ...
        Sn;
      parend;
• the statements S1 ... Sn execute in parallel
• parbegin/parend (also written cobegin/coend) is a historical pseudocode construct, not actual Ada syntax
Statement-Level Parallelism (Fortran 95)
• a FORALL body may contain only assignments, so the inner product is written with the SUM intrinsic; every element assignment may execute in parallel

      FORALL (I = 1:100, J = 1:100)
        C(I,J) = SUM(A(I,:) * B(:,J))
      END FORALL
Procedure-Level Parallelism
      x = newprocess(p);
      ...
      killprocess(x);
• where p is a declared procedure and x is a process designator
• pseudocode; similar in spirit to tasks in Ada
Program-Level Parallelism (Unix)
• fork creates a process that is an exact copy of the calling process

      if (fork() == 0)
      { /* child executes this part */ }
      else
      { /* parent executes this part */ }

• a return value of 0 indicates that the process is the child
Java Threads
• Threads are built into Java
  • the Thread class is part of the java.lang package
  • the reserved word synchronized establishes mutual exclusion
• To use: create an instance of a Thread object and define the run method that executes when the thread starts
Java Threads
• Two ways to create a thread; the second, more versatile way is shown in the example that follows (a sketch of the first way appears below)
  • define a class that implements the Runnable interface (i.e., define its run method)
  • then pass an object of this class to the Thread constructor
• Note: every Java program already executes inside a thread, the main thread that runs main
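For contrast, a minimal sketch of the first way, subclassing Thread directly (the class name is illustrative):

      class MyThread extends Thread {
          public void run() {
              // code to execute in the new thread goes here
          }
      }

      MyThread t = new MyThread();
      t.start();  // runs MyThread's run method in a new thread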
Java Thread Example
      class MyRunner implements Runnable {
          public void run() { ... }
      }

      MyRunner m = new MyRunner();
      Thread t = new Thread(m);
      t.start();  // t will now execute the run method
Destroying Threads
• Let each thread run to completion
• Wait for other threads to finish:
      t.start();
      // do some other work
      t.join();       // wait for t to finish
• Interrupt it:
      t.start();
      // do some other work
      t.interrupt();  // tell t we are waiting
      t.join();       // wait for t to finish
Mutual Exclusion
      class Queue {
          ...
          public synchronized Object dequeue() {
              if (empty()) throw ...
              ...
          }
          public synchronized void enqueue(Object obj) {
              ...
          }
          ...
      }
Mutual Exclusion
      class Remover implements Runnable {
          public Remover(Queue q) { ... }
          public void run() { ... q.dequeue() ... }
      }

      class Inserter implements Runnable {
          public Inserter(Queue q) { ... }
          public void run() { ... q.enqueue(...) ... }
      }
Mutual Exclusion
      Queue myqueue = new Queue(...);
      ...
      Remover r = new Remover(myqueue);
      Inserter i = new Inserter(myqueue);
      Thread t1 = new Thread(r);
      Thread t2 = new Thread(i);
      t1.start();
      t2.start();
Manually Stalling a Thread and Then Reawakening It
      class Queue {
          ...
          public synchronized Object dequeue() {
              try {
                  while (empty()) wait();  // stall while the queue is empty
              } catch (InterruptedException e) {
                  // reset the interrupt
                  ...
              }
              ...
          }
          public synchronized void enqueue(Object obj) {
              ...
              notifyAll();  // reawaken any stalled threads
          }
          ...
      }
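Putting these pieces together, a complete sketch of the bounded buffer from the earlier sample problems (the capacity, field names, and circular-buffer layout are illustrative choices): enqueue now also stalls when the buffer is full.

      class BoundedQueue {
          private final Object[] items;
          private int front = 0, count = 0;

          public BoundedQueue(int capacity) { items = new Object[capacity]; }

          public synchronized Object dequeue() throws InterruptedException {
              while (count == 0) wait();             // stall consumer while empty
              Object obj = items[front];
              front = (front + 1) % items.length;
              count--;
              notifyAll();                           // reawaken stalled producers
              return obj;
          }

          public synchronized void enqueue(Object obj) throws InterruptedException {
              while (count == items.length) wait();  // stall producer while full
              items[(front + count) % items.length] = obj;
              count++;
              notifyAll();                           // reawaken stalled consumers
          }
      }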