Investigation of Parallel Processing using the E1350 IBM eServer Cluster
Ayaz ul Hassan Khan (g201002860)
Objectives
• Explore the architecture of the E1350 IBM eServer Cluster
• Parallel programming:
  • OpenMP
  • MPI
  • MPI+OpenMP
• Analyze the effects of the above programming models on speedup
• Identify overheads and optimize as much as possible
Cluster System
• The cluster is unique in its dual-boot capability, running either Microsoft Windows HPC Server 2008 or Red Hat Enterprise Linux 5.
• The cluster has 3 master nodes: one for Red Hat Linux, one for Windows HPC Server 2008, and one for cluster management.
• The cluster has 128 compute nodes.
• Each compute node is an IBM x3550 server with two 2.0 GHz quad-core Intel Xeon E5405 processors.
• The total number of cores in the cluster is 1024.
• Each master node has 1 TB of hard disk space; each compute node has 500 GB.
• Each master node has 8 GB of RAM.
• Each compute node has 4 GB of RAM.
• The interconnect is 10GBASE-SR.
Experimental Environment
• Nodes: hpc081, hpc082, hpc083, hpc084
• Compilers:
  • icc: for sequential and OpenMP programs
  • mpiicc: for MPI and MPI+OpenMP programs
• Profiling tools:
  • ompP: for OpenMP profiling
  • mpiP: for MPI profiling
Applications Used/Implemented
• Jacobi Iterative Method
  • Max speedup = 7.1 (OpenMP, threads = 8)
  • Max speedup = 3.7 (MPI, nodes = 4)
  • Max speedup = 9.3 (MPI+OpenMP, nodes = 2, threads = 8)
• Alternating Direction Integration (ADI)
  • Max speedup = 5.0 (OpenMP, threads = 8)
  • Max speedup = 0.8 (MPI, nodes = 1)
  • Max speedup = 1.7 (MPI+OpenMP, nodes = 1, threads = 8)
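Speedup throughout is the usual ratio of sequential to parallel execution time; stated explicitly (a standard definition, not shown on the slides):

$$S(p) = \frac{T_{\text{seq}}}{T_{\text{par}}(p)}, \qquad E(p) = \frac{S(p)}{p}.$$

For example, the Jacobi OpenMP result of S = 7.1 on 8 threads corresponds to a parallel efficiency of about 7.1/8 ≈ 0.89, while the ADI MPI result of S = 0.8 means that version actually runs slower than the sequential code.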
Jacobi Iterative Method • An iterative method for solving systems of linear equations Ax = b
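The update rule implemented by the code on the following slides is the standard Jacobi iteration (formula added for reference):

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \ne i} a_{ij}\, x_j^{(k)}\Bigr), \qquad i = 1,\dots,N.$$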
Jacobi Iterative Method • Sequential Code

for (i = 0; i < N; i++) {
    x[i] = b[i];                       /* initial guess */
}
for (i = 0; i < N; i++) {
    sum = 0.0;
    for (j = 0; j < N; j++) {
        if (i != j) {
            sum += a[i][j] * x[j];
            new_x[i] = (b[i] - sum) / a[i][i];
        }
    }
}
for (i = 0; i < N; i++)
    x[i] = new_x[i];
Jacobi Iterative Method • OpenMP Code

#pragma omp parallel private(k, i, j, sum)
{
    for (k = 0; k < MAX_ITER; k++) {
        #pragma omp for
        for (i = 0; i < N; i++) {
            sum = 0.0;
            for (j = 0; j < N; j++) {
                if (i != j) {
                    sum += a[i][j] * x[j];
                    new_x[i] = (b[i] - sum) / a[i][i];
                }
            }
        }
        #pragma omp for
        for (i = 0; i < N; i++)
            x[i] = new_x[i];
    }
}
Jacobi Iterative Method • OpenMP Performance
Jacobi Iterative Method • ompP results (barrier)

R00002 jacobi_openmp.c (46-55) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.09    100   0.07      0.01   0.00
  1    0.08    100   0.07      0.00   0.00
  2    0.08    100   0.07      0.01   0.00
  3    0.08    100   0.07      0.01   0.00
  4    0.08    100   0.07      0.01   0.00
  5    0.08    100   0.07      0.01   0.00
  6    0.08    100   0.07      0.01   0.00
  7    0.08    100   0.07      0.01   0.00
SUM    0.65    800   0.59      0.06   0.00

R00003 jacobi_openmp.c (56-58) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.00    100   0.00      0.00   0.00
  1    0.00    100   0.00      0.00   0.00
  2    0.00    100   0.00      0.00   0.00
  3    0.00    100   0.00      0.00   0.00
  4    0.00    100   0.00      0.00   0.00
  5    0.00    100   0.00      0.00   0.00
  6    0.00    100   0.00      0.00   0.00
  7    0.00    100   0.00      0.00   0.00
SUM    0.01    800   0.00      0.01   0.00
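The "(nowait)" figures on the next slide were presumably obtained by adding a nowait clause to the worksharing loops so that threads skip the implicit barrier at the end of each loop; the exact change is not shown on the slides, but a minimal sketch of the idea is:

        #pragma omp for nowait      /* threads do not wait at the end of this loop */
        for (i = 0; i < N; i++) {
            /* ... same loop body as above ... */
        }

Removing the barrier lets a fast thread run ahead into the copy loop (x[i] = new_x[i]) while others may still be reading x, so it relaxes the ordering between the two loops; here it mainly serves to quantify the barrier overhead reported in the exitBarT column.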
Jacobi Iterative Method • ompP results (nowait)

R00002 jacobi_openmp.c (43-52) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.08    100   0.08      0.00   0.00
  1    0.08    100   0.08      0.00   0.00
  2    0.08    100   0.08      0.00   0.00
  3    0.08    100   0.08      0.00   0.00
  4    0.08    100   0.08      0.00   0.00
  5    0.08    100   0.08      0.00   0.00
  6    0.08    100   0.08      0.00   0.00
  7    0.08    100   0.08      0.00   0.00
SUM    0.63    800   0.63      0.00   0.00

R00003 jacobi_openmp.c (53-55) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.00    100   0.00      0.00   0.00
  1    0.00    100   0.00      0.00   0.00
  2    0.00    100   0.00      0.00   0.00
  3    0.00    100   0.00      0.00   0.00
  4    0.00    100   0.00      0.00   0.00
  5    0.00    100   0.00      0.00   0.00
  6    0.00    100   0.00      0.00   0.00
  7    0.00    100   0.00      0.00   0.00
SUM    0.00    800   0.00      0.00   0.00
Jacobi Iterative Method • MPI Code

MPI_Scatter(a, N * N/P, MPI_DOUBLE, apart, N * N/P, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

for (i = myrank*N/P, k = 0; k < N/P; i++, k++)
    bpart[k] = x[i];

for (k = 0; k < MAX_ITER; k++) {
    for (i = 0; i < N/P; i++) {
        sum = 0.0;
        for (j = 0; j < N; j++) {
            index = i + ((N/P) * myrank);
            if (index != j) {
                sum += apart[i][j] * x[j];
                new_x[i] = (bpart[i] - sum) / apart[i][index];
            }
        }
    }
    MPI_Allgather(new_x, N/P, MPI_DOUBLE, x, N/P, MPI_DOUBLE, MPI_COMM_WORLD);
}
Jacobi Iterative Method • MPI Performance
Jacobi Iterative Method • mpiP results

@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
Call        Site   Time    App%    MPI%   COV
Allgather      1   60.1    6.24   19.16  0.00
Allgather      2   58.8    6.11   18.77  0.00
Allgather      3   57.3    5.96   18.29  0.00
Scatter        4   34.6    3.59   11.03  0.00
Scatter        3   31.8    3.30   10.14  0.00
Scatter        1   30.1    3.13    9.61  0.00
Scatter        2   27      2.81    8.62  0.00
Bcast          2   7.05    0.73    2.25  0.00
Allgather      4   4.33    0.45    1.38  0.00
Bcast          3   2.25    0.23    0.72  0.00
Bcast          1   0.083   0.01    0.03  0.00
Bcast          4   0.029   0.00    0.01  0.00
Jacobi Iterative Method • MPI+OpenMP Code

MPI_Scatter(a, N * N/P, MPI_DOUBLE, apart, N * N/P, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

for (i = myrank*N/P, k = 0; k < N/P; i++, k++)
    bpart[k] = x[i];

omp_set_num_threads(T);
#pragma omp parallel private(k, i, j, index)
{
    for (k = 0; k < MAX_ITER; k++) {
        #pragma omp for
        for (i = 0; i < N/P; i++) {
            sum = 0.0;
            for (j = 0; j < N; j++) {
                index = i + ((N/P) * myrank);
                if (index != j) {
                    sum += apart[i][j] * x[j];
                    new_x[i] = (bpart[i] - sum) / apart[i][index];
                }
            }
        }
        #pragma omp master
        {
            MPI_Allgather(new_x, N/P, MPI_DOUBLE, x, N/P, MPI_DOUBLE, MPI_COMM_WORLD);
        }
    }
}
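In this hybrid version only the master thread issues MPI calls, which corresponds to the MPI_THREAD_FUNNELED support level. The slides do not show how MPI is initialized; a minimal sketch of a typical setup for this pattern (an assumption, not taken from the slides):

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int provided;
        /* Request FUNNELED support: only the thread that initialized MPI
           (the OpenMP master thread) will make MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        if (provided < MPI_THREAD_FUNNELED) {
            /* The MPI library cannot safely support this hybrid pattern. */
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        /* ... hybrid MPI+OpenMP Jacobi iteration as shown above ... */
        MPI_Finalize();
        return 0;
    }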
Jacobi Iterative Method • MPI+OpenMP Performance
Jacobi Iterative Method • ompP results

R00002 jacobi_mpi_openmp.c (55-65) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.03    100   0.02      0.01   0.00
  1    0.24    100   0.02      0.23   0.00
  2    0.24    100   0.02      0.22   0.00
  3    0.24    100   0.02      0.22   0.00
  4    0.24    100   0.02      0.22   0.00
  5    0.24    100   0.02      0.22   0.00
  6    0.24    100   0.02      0.22   0.00
  7    0.24    100   0.02      0.22   0.00
SUM    1.72    800   0.15      1.56   0.00

R00003 jacobi_mpi_openmp.c (67-70) MASTER
TID   execT  execC
  0    0.22    100
SUM    0.22    100
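The large exitBarT values for threads 1-7 line up with the roughly 0.22 s that thread 0 spends in the master region (R00003) calling MPI_Allgather: a master construct has no implied barrier, so the other threads run ahead to the next iteration's worksharing loop and then sit at its implicit barrier until thread 0 finishes communicating. A hedged observation, not stated on the slides: an explicit #pragma omp barrier after the master region would also guarantee that the worker threads see the refreshed x before the next iteration reads it.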
Jacobi Iterative Method • mpiP results

@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
Call        Site   Time    App%    MPI%   COV
Scatter        8   34.7    9.62   14.11  0.00
Allgather      1   32.6    9.05   13.28  0.00
Scatter        6   31.3    8.70   12.76  0.00
Scatter        2   30.2    8.39   12.31  0.00
Allgather      3   29.9    8.30   12.18  0.00
Allgather      5   27.6    7.67   11.25  0.00
Scatter        4   27.1    7.51   11.02  0.00
Allgather      7   22.1    6.14    9.00  0.00
Bcast          4   7.12    1.98    2.90  0.00
Bcast          6   2.81    0.78    1.14  0.00
Bcast          2   0.09    0.02    0.04  0.00
Bcast          8   0.033   0.01    0.01  0.00
ADI • Alternating Direction Integration
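Written out, the row-wise forward elimination and back substitution that the code below performs are (a transcription of the code, not a formula given on the slides):

$$x_{i,j} \leftarrow x_{i,j} - x_{i,j-1}\,\frac{a_{i,j}}{b_{i,j-1}}, \qquad b_{i,j} \leftarrow b_{i,j} - \frac{a_{i,j}^{2}}{b_{i,j-1}}, \qquad j = 1,\dots,N-1,$$

followed by $x_{i,N-1} \leftarrow x_{i,N-1}/b_{i,N-1}$ and the backward sweep

$$x_{i,j} \leftarrow \frac{x_{i,j} - a_{i,j+1}\,x_{i,j+1}}{b_{i,j}}, \qquad j = N-2,\dots,2,$$

with the same two sweeps then applied along the columns.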
ADI • Sequential Code

////// ADI forward & backward sweep along rows //////
for (i = 0; i < N; i++) {
    for (j = 1; j < N; j++) {
        x[i][j] = x[i][j] - x[i][j-1]*a[i][j]/b[i][j-1];
        b[i][j] = b[i][j] - a[i][j]*a[i][j]/b[i][j-1];
    }
    x[i][N-1] = x[i][N-1]/b[i][N-1];
}
for (i = 0; i < N; i++)
    for (j = N-2; j > 1; j--)
        x[i][j] = (x[i][j] - a[i][j+1]*x[i][j+1])/b[i][j];

////// ADI forward & backward sweep along columns //////
for (j = 0; j < N; j++) {
    for (i = 1; i < N; i++) {
        x[i][j] = x[i][j] - x[i-1][j]*a[i][j]/b[i-1][j];
        b[i][j] = b[i][j] - a[i][j]*a[i][j]/b[i-1][j];
    }
    x[N-1][j] = x[N-1][j]/b[N-1][j];
}
for (j = 0; j < N; j++)
    for (i = N-2; i > 1; i--)
        x[i][j] = (x[i][j] - a[i+1][j]*x[i+1][j])/b[i][j];
ADI • OpenMP Code

#pragma omp parallel private(iter)
{
    for (iter = 1; iter <= MAXITER; iter++) {
        ////// ADI forward & backward sweep along rows //////
        #pragma omp for private(i,j) nowait
        for (i = 0; i < N; i++) {
            for (j = 1; j < N; j++) {
                x[i][j] = x[i][j] - x[i][j-1]*a[i][j]/b[i][j-1];
                b[i][j] = b[i][j] - a[i][j]*a[i][j]/b[i][j-1];
            }
            x[i][N-1] = x[i][N-1]/b[i][N-1];
        }
        #pragma omp for private(i,j)
        for (i = 0; i < N; i++)
            for (j = N-2; j > 1; j--)
                x[i][j] = (x[i][j] - a[i][j+1]*x[i][j+1])/b[i][j];

        ////// ADI forward & backward sweep along columns //////
        #pragma omp for private(i,j) nowait
        for (j = 0; j < N; j++) {
            for (i = 1; i < N; i++) {
                x[i][j] = x[i][j] - x[i-1][j]*a[i][j]/b[i-1][j];
                b[i][j] = b[i][j] - a[i][j]*a[i][j]/b[i-1][j];
            }
            x[N-1][j] = x[N-1][j]/b[N-1][j];
        }
        #pragma omp for private(i,j)
        for (j = 0; j < N; j++)
            for (i = N-2; i > 1; i--)
                x[i][j] = (x[i][j] - a[i+1][j]*x[i+1][j])/b[i][j];
    }
}
ADI • OpenMP Performance
ADI • ompP results

R00002 adi_openmp.c (43-50) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.18    100   0.18      0.00   0.00
  1    0.18    100   0.18      0.00   0.00
  2    0.18    100   0.18      0.00   0.00
  3    0.18    100   0.18      0.00   0.00
  4    0.18    100   0.18      0.00   0.00
  5    0.18    100   0.18      0.00   0.00
  6    0.18    100   0.18      0.00   0.00
  7    0.18    100   0.18      0.00   0.00
SUM    1.47    800   1.47      0.00   0.00

R00003 adi_openmp.c (52-57) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.11    100   0.10      0.01   0.00
  1    0.11    100   0.10      0.01   0.00
  2    0.11    100   0.10      0.01   0.00
  3    0.10    100   0.10      0.00   0.00
  4    0.11    100   0.10      0.01   0.00
  5    0.10    100   0.10      0.01   0.00
  6    0.10    100   0.10      0.01   0.00
  7    0.10    100   0.10      0.00   0.00
SUM    0.84    800   0.78      0.06   0.00

R00004 adi_openmp.c (61-68) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.38    100   0.38      0.00   0.00
  1    0.31    100   0.31      0.00   0.00
  2    0.35    100   0.35      0.00   0.00
  3    0.29    100   0.29      0.00   0.00
  4    0.35    100   0.35      0.00   0.00
  5    0.36    100   0.36      0.00   0.00
  6    0.36    100   0.36      0.00   0.00
  7    0.37    100   0.37      0.00   0.00
SUM    2.77    800   2.77      0.00   0.00

R00005 adi_openmp.c (70-75) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.16    100   0.16      0.00   0.00
  1    0.23    100   0.15      0.07   0.00
  2    0.19    100   0.14      0.05   0.00
  3    0.25    100   0.16      0.09   0.00
  4    0.19    100   0.14      0.05   0.00
  5    0.18    100   0.17      0.01   0.00
  6    0.18    100   0.17      0.01   0.00
  7    0.17    100   0.17      0.01   0.00
SUM    1.55    800   1.26      0.29   0.00
ADI • MPI Code

MPI_Bcast(a, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

for (i = myrank*(N/P), k = 0; k < N/P; i++, k++)
    for (j = 0; j < N; j++)
        apart[k][j] = a[i][j];

for (iter = 1; iter <= 2*MAXITER; iter++) {
    ////// ADI forward & backward sweep along rows //////
    for (i = 0; i < N/P; i++) {
        for (j = 1; j < N; j++) {
            xpart[i][j] = xpart[i][j] - xpart[i][j-1]*apart[i][j]/bpart[i][j-1];
            bpart[i][j] = bpart[i][j] - apart[i][j]*apart[i][j]/bpart[i][j-1];
        }
        xpart[i][N-1] = xpart[i][N-1]/bpart[i][N-1];
    }
    for (i = 0; i < N/P; i++)
        for (j = N-2; j > 1; j--)
            xpart[i][j] = (xpart[i][j] - apart[i][j+1]*xpart[i][j+1])/bpart[i][j];
ADI • MPI Code (continued)

    MPI_Gather(xpart, N*N/P, MPI_FLOAT, x, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Gather(bpart, N*N/P, MPI_FLOAT, b, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

    // transpose matrices
    trans(x, N, N);
    trans(b, N, N);
    trans(a, N, N);

    MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

    for (i = myrank*(N/P), k = 0; k < N/P; i++, k++)
        for (j = 0; j < N; j++)
            apart[k][j] = a[i][j];
}
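The trans() helper called above is not shown on the slides; a minimal sketch consistent with how it is called, assuming it performs an in-place transpose of a square N-by-N float matrix (the actual implementation may differ):

    /* In-place transpose of an N-by-N matrix of floats (reconstruction). */
    void trans(float mat[N][N], int rows, int cols)
    {
        int i, j;
        float tmp;
        for (i = 0; i < rows; i++) {
            for (j = i + 1; j < cols; j++) {
                tmp = mat[i][j];
                mat[i][j] = mat[j][i];
                mat[j][i] = tmp;
            }
        }
    }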
ADI • MPI Performance
ADI • mpiP results

@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
Call      Site   Time       App%    MPI%   COV
Gather       1   8.63e+04   22.83  23.54  0.00
Gather       3   6.29e+04   16.63  17.15  0.00
Gather       2   6.08e+04   16.10  16.60  0.00
Gather       4   5.83e+04   15.43  15.91  0.00
Scatter      4   3.31e+04    8.76   9.03  0.00
Scatter      2   3.08e+04    8.14   8.39  0.00
Scatter      3   2.87e+04    7.58   7.81  0.00
Scatter      1   5.53e+03    1.46   1.51  0.00
Bcast        2   50.8        0.01   0.01  0.00
Bcast        4   50.8        0.01   0.01  0.00
Bcast        3   49.5        0.01   0.01  0.00
Bcast        1   40.4        0.01   0.01  0.00
Reduce       1   2.57        0.00   0.00  0.00
Reduce       3   0.259       0.00   0.00  0.00
Reduce       2   0.056       0.00   0.00  0.00
Reduce       4   0.052       0.00   0.00  0.00
ADI • MPI+OpenMP Code

MPI_Bcast(a, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

omp_set_num_threads(T);
#pragma omp parallel private(iter)
{
    int id, sindex, eindex;
    int m, n;

    id = omp_get_thread_num();
    sindex = id * node_rows/T;
    eindex = sindex + node_rows/T;

    int l = myrank*(N/P);
    for (m = sindex; m < eindex; m++) {
        for (n = 0; n < N; n++)
            apart[m][n] = a[l+m][n];
        l++;
    }
ADI • MPI+OpenMP Code (continued)

    for (iter = 1; iter <= 2*MAXITER; iter++) {
        ////// ADI forward & backward sweep along rows //////
        #pragma omp for private(i,j) nowait
        for (i = 0; i < N/P; i++) {
            for (j = 1; j < N; j++) {
                xpart[i][j] = xpart[i][j] - xpart[i][j-1]*apart[i][j]/bpart[i][j-1];
                bpart[i][j] = bpart[i][j] - apart[i][j]*apart[i][j]/bpart[i][j-1];
            }
            xpart[i][N-1] = xpart[i][N-1]/bpart[i][N-1];
        }
        #pragma omp for private(i,j)
        for (i = 0; i < N/P; i++)
            for (j = N-2; j > 1; j--)
                xpart[i][j] = (xpart[i][j] - apart[i][j+1]*xpart[i][j+1])/bpart[i][j];

        #pragma omp master
        {
            MPI_Gather(xpart, N*N/P, MPI_FLOAT, x, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
            MPI_Gather(bpart, N*N/P, MPI_FLOAT, b, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        }
        #pragma omp barrier
ADI • MPI+OpenMP Code (continued)

        #pragma omp sections
        {
            #pragma omp section
            { trans(x, N, N); }
            #pragma omp section
            { trans(b, N, N); }
            #pragma omp section
            { trans(a, N, N); }
        }
        #pragma omp barrier

        #pragma omp master
        {
            MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
            MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        }

        l = myrank*(N/P);
        for (m = sindex; m < eindex; m++) {
            for (n = 0; n < N; n++)
                apart[m][n] = a[l+m][n];
            l++;
        }
    }
    #pragma omp barrier
}
ADI • MPI+OpenMP Performance
ADI • ompP results

R00002 adi_mpi_scatter_openmp.c (89-96) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.05    200   0.05      0.00   0.00
  1    0.05    200   0.05      0.00   0.00
  2    0.08    200   0.08      0.00   0.00
  3    0.08    200   0.08      0.00   0.00
  4    0.08    200   0.08      0.00   0.00
  5    0.08    200   0.08      0.00   0.00
  6    0.08    200   0.08      0.00   0.00
  7    0.08    200   0.08      0.00   0.00
SUM    0.58   1600   0.58      0.00   0.00

R00003 adi_mpi_scatter_openmp.c (99-104) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
  0    0.06    200   0.05      0.01   0.00
  1   34.23    200   0.05     34.18   0.00
  2   34.22    200   0.05     34.17   0.00
  3   34.22    200   0.05     34.17   0.00
  4   34.21    200   0.05     34.16   0.00
  5   34.20    200   0.05     34.15   0.00
  6   34.21    200   0.05     34.16   0.00
  7   34.20    200   0.05     34.15   0.00
SUM  239.54   1600   0.39    239.14   0.00
ADI • ompP results

R00005 adi_mpi_scatter_openmp.c (113) BARRIER
TID   execT  execC  taskT
  0    0.00    200   0.00
  1   64.29    200   0.00
  2   64.29    200   0.00
  3   64.29    200   0.00
  4   64.29    200   0.00
  5   64.29    200   0.00
  6   64.29    200   0.00
  7   64.29    200   0.00
SUM  450.02   1600   0.00

R00004 adi_mpi_scatter_openmp.c (106-111) MASTER
TID   execT  execC
  0   64.28    200
SUM   64.28    200

R00006 adi_mpi_scatter_openmp.c (116-130) SECTIONS
TID   execT  execC  sectT  sectC  exitBarT  mgmtT  taskT
  0    0.85    200   0.85    200      0.00   0.00   0.00
  1    0.85    200   0.83    200      0.02   0.00   0.00
  2    0.85    200   0.44    200      0.41   0.00   0.00
  3    0.85    200   0.00      0      0.85   0.00   0.00
  4    0.85    200   0.00      0      0.85   0.00   0.00
  5    0.85    200   0.00      0      0.85   0.00   0.00
  6    0.85    200   0.00      0      0.85   0.00   0.00
  7    0.85    200   0.00      0      0.85   0.00   0.00
SUM    6.80   1600   2.12    600      4.67   0.01   0.00
ADI • ompP results

R00007 adi_mpi_scatter_openmp.c (132) BARRIER
TID   execT  execC  taskT
  0    0.00    200   0.00
  1    0.00    200   0.00
  2    0.00    200   0.00
  3    0.00    200   0.00
  4    0.00    200   0.00
  5    0.00    200   0.00
  6    0.00    200   0.00
  7    0.00    200   0.00
SUM    0.01   1600   0.00

R00008 adi_mpi_scatter_openmp.c (134-138) MASTER
TID   execT  execC
  0   34.46    200
SUM   34.46    200

R00009 adi_mpi_scatter_openmp.c (149) BARRIER
TID   execT  execC  taskT
  0    0.00      1   0.00
  1    0.28      1   0.00
  2    0.28      1   0.00
  3    0.28      1   0.00
  4    0.28      1   0.00
  5    0.28      1   0.00
  6    0.28      1   0.00
  7    0.28      1   0.00
SUM    1.94      8   0.00
ADI • mpiP results

@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
Call      Site   Time       App%    MPI%   COV
Gather       2   8.98e+04   23.32  23.52  0.00
Gather       6   6.57e+04   17.05  17.19  0.00
Gather       8   6.45e+04   16.74  16.89  0.00
Gather       4   6.17e+04   16.03  16.16  0.00
Scatter      4   3.39e+04    8.79   8.87  0.00
Scatter      8   3.1e+04     8.06   8.13  0.00
Scatter      6   2.96e+04    7.68   7.75  0.00
Scatter      2   5.4e+03     1.40   1.41  0.00
Bcast        7   49.5        0.01   0.01  0.00
Bcast        3   49.3        0.01   0.01  0.00
Bcast        5   47.8        0.01   0.01  0.00
Bcast        1   40          0.01   0.01  0.00
Scatter      1   30.5        0.01   0.01  0.00
Scatter      5   30.3        0.01   0.01  0.00
Scatter      7   30.3        0.01   0.01  0.00
Scatter      3   28.8        0.01   0.01  0.00
Reduce       1   1.8         0.00   0.00  0.00
Reduce       5   0.062       0.00   0.00  0.00
Reduce       3   0.049       0.00   0.00  0.00
Reduce       7   0.049       0.00   0.00  0.00
Thanks • Q & A • Any Suggestions?