JavaSeis Parallel Arrays • JavaSeis data structures • Synchronous parallel model • Arrays in Java • mpiJava • Parallel Distributed Arrays • Examples
JavaSeis Data Structures • View data as N-dimensional array • Most common view is a series of 3D volumes • Use generic names for each axis: Sample, Trace, Frame, Volume, Hypercube, … • Associate “LogicalNames” with each axis: • Time, Offset, X-Line, InLine • Time, Channel, Shot, Swath
JavaSeis Dataset Logical View Frame Sample Volume Trace
JavaSeis Bounding Box X-Line InLine
Parallel Distributed Objects • Figure: Node 0, Node 1, Node 2 and Node 3 each run pdoRead, then pdoExec, then pdoWrite
The Transpose Problem • Figure: CPU 0 through CPU 3, showing local access versus remote access
Data Parallel Transpose • Figure: CPU 0 through CPU 3, each performing a local tile transpose
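The tile exchange behind a data parallel transpose can be sketched in a single process. This is only an illustration under stated assumptions: the class and method names are hypothetical, the matrix is square, its size is divisible by the node count, and plain array copies stand in for the send/receive pairs a real cluster would use.

```java
// Illustrative single-process simulation; names are hypothetical, not JavaSeis API.
final class TileTransposeDemo {

    // Builds slabs[p][c][r]: node p holds global column (p*t + c), row r,
    // of an n x n matrix whose entries are A[row][col] = 10*row + col.
    static float[][][] distribute(int n, int nodes) {
        int t = n / nodes;                       // columns per node
        float[][][] slabs = new float[nodes][t][n];
        for (int p = 0; p < nodes; p++)
            for (int c = 0; c < t; c++)
                for (int r = 0; r < n; r++)
                    slabs[p][c][r] = 10 * r + (p * t + c);
        return slabs;
    }

    // Returns the slabs of the transposed matrix, built tile by tile.
    static float[][][] transpose(float[][][] slabs, int n) {
        int nodes = slabs.length;
        int t = n / nodes;                       // tile edge; assumes n % nodes == 0
        float[][][] out = new float[nodes][t][n];
        for (int src = 0; src < nodes; src++) {
            for (int dst = 0; dst < nodes; dst++) {
                // One t x t tile moves from src to dst (a message on a real cluster).
                for (int c = 0; c < t; c++) {
                    for (int r = 0; r < t; r++) {
                        // Local tile transpose: row and column swap roles.
                        out[dst][r][src * t + c] = slabs[src][c][dst * t + r];
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][][] out = transpose(distribute(4, 2), 4);
        // Node 1, local column 0 (global column 2), row 3 now holds A[2][3].
        System.out.println(out[1][0][3]);
    }
}
```

Every node only ever touches tiles of size t x t, which is what turns the remote element access of the naive transpose into local work plus bulk tile transfers.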
Arrays in Java • 1D arrays or “arrays of arrays” • Subarrays and multi-dimensional “views” of a 1D array are not supported by the language • Subarrays are constructed by passing the full array and an upper and lower bound • Example reference from Matuszek, University of Pennsylvania
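The bounds-passing idiom from the bullets above can be sketched as follows. The class and method names are illustrative only and are not taken from JavaSeis or from the cited reference.

```java
// Illustrative sketch; names are hypothetical.
final class SubarrayDemo {

    // Sums elements lo..hi-1 of the full array. The "subarray" is described
    // by its bounds, so no elements are copied.
    static float sum(float[] a, int lo, int hi) {
        float total = 0f;
        for (int i = lo; i < hi; i++) {
            total += a[i];
        }
        return total;
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f, 4f, 5f, 6f};
        // Treat elements 2..4 as a subarray of the full array.
        System.out.println(sum(data, 2, 5)); // prints 12.0
    }
}
```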
Design Sources • NCAR / UCAR NetCDF • University Corporation for Atmospheric Research • High Performance Fortran • Ken Kennedy, Rice University 1993 • Colorado School of Mines • Dave Hale, Mines Java Toolkit • Landmark ProWESS / SeisSpace • ARCO Parallel Seismic Environment
DistributedArray Class Structure TransposeType mpiJava java.lang.Array IMultiArray MultiArray ITranspose Transpose IParallelContext MPIContext Decomposition DistributedArray
“parallel” and “array” packages • org.javaseis.parallel • IParallelContext, ParallelContext • Message passing support • Decomposition • Define decompositions for array dimensions across processors • org.javaseis.array • IMultiArray, MultiArray • Containers for Fortran style multidimensional arrays • ITranspose, Transpose, TransposeType • Transpose operations for Fortran style arrays • DistributedArray • Extends MultiArray to distribute across processors
MultiArray Design Targets • 1D Java arrays of primitive elements or Objects • A superimposed "shape" that follows Fortran conventions • Access via "range" triplets (start,end,increment) • Ranges for Java zero based indexing or Fortran 1 to N based indexing • Access to the "native" storage array for more arbitrary access • Array "elements" can have multiple values (i.e. complex, multi-component) • Designed to be extended to provide JavaSeis DistributedArrays • Allow use of other array utility classes (java.util.Arrays, edu.mines.jtk.dsp.Array)
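A minimal sketch of the first three design targets: a 1D storage array, a superimposed Fortran-style shape, and access through a (start, end, increment) triplet. ShapedArray and its methods are hypothetical names chosen for illustration and do not reproduce the MultiArray API. The sketch assumes zero-based indexing with an inclusive end, and fillRange handles the 3D case only.

```java
// Illustrative sketch; ShapedArray is a hypothetical name, not the MultiArray API.
final class ShapedArray {
    final float[] data;   // "native" 1D storage
    final int[] shape;    // Fortran-style shape: first index varies fastest

    ShapedArray(int... shape) {
        this.shape = shape.clone();
        int n = 1;
        for (int s : shape) n *= s;
        this.data = new float[n];
    }

    // Column-major offset of a zero-based index vector.
    int offset(int... index) {
        int off = 0;
        for (int d = shape.length - 1; d >= 0; d--) {
            off = off * shape[d] + index[d];
        }
        return off;
    }

    // Sets the first-axis elements selected by the triplet (start, end, inc),
    // end inclusive, at fixed positions j and k on the second and third axes.
    void fillRange(int start, int end, int inc, int j, int k, float value) {
        for (int i = start; i <= end; i += inc) {
            data[offset(i, j, k)] = value;
        }
    }

    public static void main(String[] args) {
        ShapedArray a = new ShapedArray(4, 3, 2);   // 4 samples, 3 traces, 2 frames
        a.fillRange(0, 3, 2, 1, 1, 9f);             // samples 0 and 2 of trace 1, frame 1
        System.out.println(a.offset(0, 1, 1));      // 0 + 4*(1 + 3*1) = 16
    }
}
```

Because the storage stays a plain float[], it can still be handed to utilities such as java.util.Arrays, which is the point of the last design target.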
Transpose Operations • TransposeType • Java “enum” that defines the set of available transpose operations (i.e. T312, T1243, T21) • ITranspose, Transpose • Interface and “pure java” implementation • In-place 2D transpose is the basic operation • Extended to “132” transpose for 3D arrays • Combinations yield full set of 3D transposes • A single “1243” transpose provided for 4D
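To make the “132” operation concrete, here is a sketch for a Fortran-ordered 3D array held in a 1D Java array. It is deliberately out-of-place for readability, whereas the slide describes an in-place implementation. The class and method names are hypothetical and are not the Transpose API.

```java
// Illustrative sketch; out-of-place, unlike the in-place operation on the slide.
final class Transpose132Demo {

    // Input has shape (n1, n2, n3) in column-major order; the result has
    // shape (n1, n3, n2). Each axis-1 vector is copied as a unit.
    static float[] tran132(float[] a, int n1, int n2, int n3) {
        float[] b = new float[a.length];
        for (int k = 0; k < n3; k++) {
            for (int j = 0; j < n2; j++) {
                int src = n1 * (j + n2 * k);   // start of vector (j, k) in the input
                int dst = n1 * (k + n3 * j);   // start of vector (k, j) in the output
                System.arraycopy(a, src, b, dst, n1);
            }
        }
        return b;
    }

    public static void main(String[] args) {
        float[] a = new float[12];
        for (int i = 0; i < a.length; i++) a[i] = i;
        float[] b = tran132(a, 2, 2, 3);
        System.out.println(java.util.Arrays.toString(b));
    }
}
```

With n1 = 1 this reduces to a plain 2D transpose, which matches the slide's description of the 2D transpose as the basic operation.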
Message Passing • IParallelContext • Interface for the minimal set of message passing needed to support JavaSeis Parallel Arrays • Send, Receive, getSize, getRank • Barrier, Broadcast (optional) • Shift, Transpose, BinaryTree built from the above • Init and Finish
MPI for Java • mpiJava from Syracuse University (NPAC) selected for SeisSpace • Java wrappers for native MPI calls • Support for sending serialized objects • MPIContext implements IParallelContext • MPICH for native methods • mpirun -np 16 -machinefile machines.txt java ClassName arguments
DistributedArray • Extends MultiArray • Requires IParallelContext for constructor • Adds distributed tiled transpose (ttran) • Last dimension is spread across processors (Decomposition, BLOCK or CIRCULAR) • Transpose operations support arbitrary distribution of a single dimension • Multiple decompositions possible but not currently supported
Decomposition • Design concept from High Performance Fortran • Default decomposition is BLOCK • Allocates a fixed number of array indices per node • Remainder is “pushed” to the edge, NOT evenly allocated • May result in zero elements on high rank nodes • Simple start,end indexing with stride 1 • CIRCULAR decomposition • Round robin allocation • Remainder spread across nodes • Good for load balancing • Permutation logic required to keep track of indices
BLOCK vs CIRCULAR Decomposition: 13 array indices on 4 nodes • Ranges per node as start:end:increment • BLOCK: 1:4:1, 5:8:1, 9:12:1, 13:13:1 • CIRCULAR: 1:13:4, 2:10:4, 3:11:4, 4:12:4
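The ranges in this example can be reproduced with a few lines of index arithmetic. The sketch below assumes that BLOCK assigns ceil(n / nodes) indices per node, which is consistent with the 13-on-4 example but is not confirmed as the rule the Decomposition class uses. All names are hypothetical.

```java
// Illustrative sketch; names and the ceil-based block size are assumptions.
final class DecompositionDemo {

    // Returns {start, end, increment} with 1-based inclusive indices,
    // or null when the node holds no indices.
    static int[] blockRange(int n, int nodes, int rank) {
        int chunk = (n + nodes - 1) / nodes;          // ceil(n / nodes)
        int start = rank * chunk + 1;
        if (start > n) return null;                   // high-rank node left empty
        int end = Math.min(n, start + chunk - 1);     // remainder lands on the edge
        return new int[] {start, end, 1};
    }

    // Round-robin allocation: node r gets r+1, r+1+nodes, r+1+2*nodes, ...
    static int[] circularRange(int n, int nodes, int rank) {
        int start = rank + 1;
        if (start > n) return null;
        int end = start + ((n - start) / nodes) * nodes;   // last index <= n
        return new int[] {start, end, nodes};
    }

    static String format(int[] r) {
        return r == null ? "empty" : r[0] + ":" + r[1] + ":" + r[2];
    }

    public static void main(String[] args) {
        for (int rank = 0; rank < 4; rank++) {
            System.out.println("node " + rank
                + " BLOCK " + format(blockRange(13, 4, rank))
                + " CIRCULAR " + format(circularRange(13, 4, rank)));
        }
    }
}
```

Under the same assumption, 9 indices on 4 nodes give BLOCK ranges 1:3:1, 4:6:1 and 7:9:1 with node 3 empty, which illustrates the "zero elements on high rank nodes" case.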
Transform - Transpose Pattern • Figure: arrays distributed across nodes 0 to 3; axis labels Time, X-Line, InLine and Frequency
Transform - Transpose Pattern
// Create a 3D distributed array
DistributedArray a = new DistributedArray( Seis.getParallelContext(), 3, float.class,
    new int[] {512,256,128}, Decomposition.BLOCK );
// Transform x axis of an array in xyz order
computeTransform1D( a );
// Transpose to yxz
a.tran213();
// Transform y axis
computeTransform1D( a );
// Transpose to zyx
a.tran132();
// Transform z axis
computeTransform1D( a );
// Transpose back to xyz
a.tran321();
Distributed Array Padding • Decomposition will likely have a remainder that requires padding • Constructor allocates an array that accounts for padding • Use the constructor that takes an array of Decompositions if transpose operations will be used • Index and range methods only traverse the “live” section of the array
Distributed Array Padding • Figure: padded array showing the decomposition padding, a partial array section and the live section
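The relationship between the padded allocation and the live section can be sketched numerically. The sketch assumes that every node allocates the same ceil(n / nodes) slots along the distributed dimension. That is one plausible padding rule, not a statement of what the DistributedArray constructor does, and the names are hypothetical.

```java
// Illustrative sketch; the padding rule and all names are assumptions.
final class PaddingDemo {

    // Slots allocated per node along the distributed dimension.
    static int slotsPerNode(int n, int nodes) {
        return (n + nodes - 1) / nodes;               // ceil(n / nodes)
    }

    // Total padded length across all nodes.
    static int paddedLength(int n, int nodes) {
        return slotsPerNode(n, nodes) * nodes;
    }

    // Number of "live" indices on a node under a BLOCK layout.
    static int liveCount(int n, int nodes, int rank) {
        int chunk = slotsPerNode(n, nodes);
        return Math.max(0, Math.min(chunk, n - rank * chunk));
    }

    public static void main(String[] args) {
        int n = 13, nodes = 4;
        System.out.println("padded length: " + paddedLength(n, nodes));
        for (int rank = 0; rank < nodes; rank++) {
            int live = liveCount(n, nodes, rank);
            int pad = slotsPerNode(n, nodes) - live;
            System.out.println("node " + rank + ": live " + live + ", padding " + pad);
        }
    }
}
```

For 13 indices on 4 nodes this gives a padded length of 16, with three padding slots on the last node. Index and range methods would then visit only the live counts.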
Planned Additions • Support for other patterns: • Transpose-Reduce • Transpose-Overlap • Arrays of Arrays – optional variable length float[][] a = new float[10][]; for (int i=0; i<10; i++) a[i] = new float[i]; • Parallel Sorting • Requires variable length “array of arrays”
Reduce - Transpose Pattern PDO ( x, y | f ) PDO ( x, y | n ) PDO ( x, n | y )
The Overlap-Transpose Pattern • Figure: a distributed array on nodes 0 to 3 is overlap-expanded locally, then transposed to a distributed overlap
Parallel Data Sort Sort
Parallel Data Sort Block parallel output Variable length Transpose and resort within tile