1 / 34

Paraguin Compiler

Paraguin Compiler. Examples. Examples. Matrix Addition (the complete program) Traveling Salesman Problem (TSP) Sobel Edge Detection. Matrix Addition. The complete program. Matrix Addition (complete). #define N 512 #ifdef PARAGUIN typedef void* __builtin_va_list; #endif

elaineclark
Download Presentation

Paraguin Compiler

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Paraguin Compiler Examples

  2. Examples • Matrix Addition (the complete program) • Traveling Salesman Problem (TSP) • Sobel Edge Detection

  3. Matrix Addition The complete program

  4. Matrix Addition (complete) #define N 512 #ifdef PARAGUIN typedef void* __builtin_va_list; #endif #include <stdio.h> #include <math.h> #include <sys/time.h> print_results(char *prompt, float a[N][N]); int main(int argc, char *argv[]) { int i, j, error = 0; float a[N][N], b[N][N], c[N][N]; char *usage = "Usage: %s file\n"; FILE *fd; double elapsed_time; struct timeval tv1, tv2;

  5. Matrix Addition (complete) if (argc < 2) { fprintf (stderr, usage, argv[0]); error = -1; } if ((fd = fopen (argv[1], "r")) == NULL) { fprintf (stderr, "%s: Cannot open file %s for reading.\n", argv[0], argv[1]); fprintf (stderr, usage, argv[0]); error = -1; } #pragma paraguin begin_parallel #pragma paraguin bcast error if (error) return -1; #pragma paraguin end_parallel

  6. Matrix Addition (complete) // Read input from file for matrices a and b. // The I/O is not timed because this I/O needs // to be done regardless of whether this program // is run sequentially on one processor or in // parallel on many processors. Therefore, it is // irrelevant when considering speedup. for (i = 0; i < N; i++) for (j = 0; j < N; j++) fscanf (fd, "%f", &a[i][j]); for (i = 0; i < N; i++) for (j = 0; j < N; j++) fscanf (fd, "%f", &b[i][j]); fclose (fd);

  7. Matrix Addition (complete) ; #pragma paraguin begin_parallel // This barrier is here so that we can take a time stamp // Once we know all processes are ready to go. #pragma paraguin barrier #pragma paraguin end_parallel // Take a time stamp gettimeofday(&tv1, NULL); #pragma paraguin begin_parallel #pragma paraguin scatter a b // Parallelize the following loop nest assigning iterations // of the outermost loop (i) to different partitions. #pragma paraguin forall for (i = 0; i < N; i++) for (j = 0; j < N; j++) c[i][j] = a[i][j] + b[i][j];

  8. Matrix Addition (complete) ; #pragma paraguin gather c #pragma paraguin end_parallel // Take a time stamp. This won't happen until after the master // process has gathered all the input from the other processes. gettimeofday(&tv2, NULL); elapsed_time = (tv2.tv_sec - tv1.tv_sec) + ((tv2.tv_usec - tv1.tv_usec) / 1000000.0); printf ("elapsed_time=\t%lf (seconds)\n", elapsed_time); // print result print_results("C = ", c); }

  9. Matrix Addition (complete) print_results(char *prompt, float a[N][N]) { int i, j; printf ("\n\n%s\n", prompt); for (i = 0; i < N; i++) { for (j = 0; j < N; j++) { printf(" %.2f", a[i][j]); } printf ("\n"); } printf ("\n\n"); }

  10. Matrix Addition • After compiling with the command: • scc –DPARAGUIN –D__x86_64__ matrixadd.c –cc mpicc \ –o matrixadd.out • This produces: • matrixadd.out • After compiling with the command: • scc –DPARAGUIN –D__x86_64__ matrixadd.c -.out.c • This produces: • matrixadd.out .c (MPI source code) All on one line

  11. Traveling Salesman Problem (TSP)

  12. The Traveling Salesman Problem is simply to find the shortest circuit (Hamiltonian circuit) that visits every city in a set of cities at most once

  13. This problem falls into the class of “NP-hard” problems • What that means is that there is no known “polynomial” time (“big-oh” of a polynomial) algorithm that can solve it • The only know algorithm to solve it is to compare the distances of all possible Hamiltonian circuits. • But there are N! possible circuits of N cities.

  14. Yes, heuristics can be applied to find a “good” solution fast, but there’s no guarantee it is the best • The “brute force” algorithm is to consider all possible permutations of the N cities • First we’ll fix the first city since there are N equivalent circuits where we rotate the cities • We will consider the reverse directions to be different circuits but that’s hard to account for

  15. If we number the cities from 0 to N-1, and 0 is the origination city, then the possible permutations of 4 cities are: • 0->1->2->3->0 • 0->1->3->2->0 • 0->2->3->1->0 • 0->2->1->3->0 • 0->3->1->2->0 • 0->3->2->1->0 Notice that there are some permutations that are the reverse of other. These are equivalent permutations. Since we are fixing origination city, there are (N-1)! permutations instead of N!.

  16. We can compute the distances between all pairs of locations (O(N2)) • This is the input

  17. Problem: Iterating through the possible permutations is recursive, but we need a straight forward for loop to parallelize • Solution: Use a for loop to assign the first two cities • Since city 0 is fixed, there are n-1 choices for city 1 and n-2 choices for city 2 • That means there are (n-1)(n-2) = n2 – 3n + 2 combinations of the first two cities

  18. Assignment of cities 0-2 N = n*n - 3*n + 2; // (n-1)(n-2) perm[0] = 0; for (i = 0; i < N; i++) { perm[1] = i / (n-2) + 1; perm[2] = i % (n-2) + 1; ...

  19. // This structure is used for the "minloc" reduction. We want to find the // minimum distance as well as which processor found it so that we can // get the final minimum circuit. struct { float minDist; int rank; } myAnswer, resultAnswer; int main(intargc, char *argv[]) { inti, j, k, N, p; int perm[MAX_NUM_CITIES], minPerm[MAX_NUM_CITIES+1]; float D[MAX_NUM_CITIES][MAX_NUM_CITIES]; float dist, minDist, finalMinDist; int abort; To use minloc, we need a value to minimize and the location (rank)

  20. abort = processArgs(argc, argv); if (!abort) { for (i = 0; i < n; i++) { D[i][i] = 0.0f; for (j = 0; j < i; j++) { fscanf (fd, "%f", &D[i][j]); D[j][i] = D[i][j]; } } } else { if (n <= 1) printf ("0 0 0\n"); else n = 0; } #pragmaparaguinbegin_parallel #pragmaparaguinbcast abort if (abort) return -1; Read in the lower triangular matrix of relative distances between cities. The values are mirrored across the diagonal.

  21. #pragmaparaguinbcast n D perm[0] = 0; minDist = 9.0e10; // Near the largest value we can represent with a float if (n == 2) { perm[1] = 1; // If n = 2, the N = 0, and we are done. minPerm[0] = perm[0]; minPerm[1] = perm[1]; minDist = computeDist(D, n, perm); } N = n*n - 3*n + 2; // N = (n-1)(n-2)

  22. #pragmaparaguinforall for (p = 0; p < N; p++) { perm[1] = p / (n-2) + 1; perm[2] = p % (n-2) + 1; if (perm[2] >= perm[1]) perm[2]++; initialize(perm, n, 3); do { dist = computeDist(D, n, perm); if (minDist > dist) { minDist = dist; for (i = 0; i < n; i++) minPerm[i] = perm[i]; } } while (increment(perm,n)); } After cities 0, 1, and 2 have been determined, initialize the rest of the cities for the first permutation. Keep the shortest circuit seen thus far. Move to the next permutation

  23. myAnswer.minDist = minDist; myAnswer.rank = __guin_rank; #pragmaparaguin reduce minlocmyAnswerresultAnswer #pragmaparaguinbcastresultAnswer if (__guin_rank == resultAnswer.rank) { printf (" %f ", minDist); for (i = 0; i < n; i++) printf ("%d ", minPerm[i]); printf ("%d\n", minPerm[0]); } #pragmaparaguinend_parallel struct { float minDist; int rank; } myAnswer, resultAnswer; If the current processors is the one who found the solution, then report the solution.

  24. Demonstration

  25. Sobel Edge Detection

  26. Sobel Edge Detection • Given an image, the problem is to detect where the “edges” are in the picture

  27. Sobel Edge Detection

  28. Sobel Edge Detection Algorithm /* 3x3 Sobel masks. */ GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1; GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2; GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1; GY[0][0] = 1; GY[0][1] = 2; GY[0][2] = 1; GY[1][0] = 0; GY[1][1] = 0; GY[1][2] = 0; GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1; for(x=0; x < N; ++x){ for(y=0; y < N; ++y){ sumx = 0; sumy = 0; // handle image boundaries if(x==0 || x==(h-1) || y==0 || y==(w-1)) sum = 0; else{

  29. Sobel Edge Detection Algorithm //x gradient approx for(i=-1; i<=1; i++) for(j=-1; j<=1; j++) sumx += (grayImage[x+i][y+j] * GX[i+1][j+1]); //y gradient approx for(i=-1; i<=1; i++) for(j=-1; j<=1; j++) sumy += (grayImage[x+i][y+j] * GY[i+1][j+1]); //gradient magnitude approx sum = (abs(sumx) + abs(sumy)); } edgeImage[x][y] = clamp(sum); } } There are no loop-carried dependencies. Therefore, this is a Scatter/Gather pattern.

  30. Loop Carried Dependency • A Loop-Carried Dependency is when a value is computed in one iteration and used in another for (i = 1; i < n; i++) { A[i] = f(A[i-1]); <1>: a[1] = f(a[0]); <2>: a[2] = f(a[1]); <3>: a[3] = f(a[2]); <4>: a[4] = f(a[3]); ... • If we run this loop as a forall, then inter-processors communication is needed

  31. Sobel Edge Detection Algorithm • Inputs (that need to be broadcast or scattered): • GX and GY arrays • grayImage array • w and h (width and height) • There are 4 nested loops (x, y, i, and j) • The final answer is the array • edgeImage

  32. Sobel Edge Detection Algorithm #pragma paraguin begin_parallel /* 3x3 Sobel masks. */ GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1; GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2; GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1; GY[0][0] = 1; GY[0][1] = 2; GY[0][2] = 1; GY[1][0] = 0; GY[1][1] = 0; GY[1][2] = 0; GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1; #pragma paraguin bcast grayImage w h #pragma paraguin forall 1 for(x=0; x < N; ++x){ for(y=0; y < N; ++y){ sumx = 0; sumy = 0; ... These are the inputs Partition the x loop (outermost loop) Using cyclic scheduling

  33. Sobel Edge Detection Algorithm ... edgeImage[x][y] = clamp(sum); } } ; #pragma paraguin gather edgeImage Gather all elements of the edgeImage array

  34. Questions

More Related