A Grid Parallel Application Framework Jeremy Villalobos, PhD student, Department of Computer Science, University of North Carolina at Charlotte
Overview • Parallel Applications on the Grid • Latency Hiding by Redundant Processing (LHRP) • PGAFramework • Related work • Conclusion
Parallel Applications on the Grid • Advantages • Access to more resources • Lower costs • Potential future profits from a Grid economy? • Challenges • The I/O problem • Need for an easy-to-use interface • Heterogeneous hardware
Latency Hiding by Redundant Processing • Latency Hiding problem • LHRP Algorithm • CPU type • CPU task assigned to each CPU type • Versioning system • Mathematical model to describe LHRP • Results
LHRP • Conventional Latency Hiding • Latency Hiding by Redundant Processing
LHRP Algorithm • Internal: Communicates only with LAN CPUs. • Border: Communicates with LAN CPUs and one Buffer CPU. • Buffer: Communicates with the LAN Border CPU and receives data from the WAN Border CPU.
Computation and Communication Stages • Internal: • Computes borders • Transfers borders (Non-blocking) • Computes core matrix • Waits for transfer ACK
Computation and Communication Stages • Border: • Computes borders • Transfers borders (non-blocking) • Sends far border • Computes core matrix • Waits for transfer ACK • Checks on far border transfer ACK (waits if it is the last iteration)
Computation and Communication Stages • Buffer: • Computes borders • Transfers borders (non-blocking) • Receives far border • Computes core matrix • Waits for transfer ACK • Checks on far border transfer ACK (waits if it is the last iteration)
LHRP Algorithm Review • Node types: • Internal • Border • Buffer • Far Border transfer • Buffer Node Versioning system
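The per-node-type stage orderings above can be sketched in code. This is a minimal illustration of the slides' stage lists; the enum values and method names are assumptions for exposition, not PGAFramework's actual API.

```java
import java.util.List;

public class LhrpStages {
    // Stage names mirror the bullets on the "Computation and Communication
    // Stages" slides; they are illustrative, not framework identifiers.
    enum Stage { COMPUTE_BORDERS, TRANSFER_BORDERS, SEND_FAR_BORDER,
                 RECEIVE_FAR_BORDER, COMPUTE_CORE, WAIT_TRANSFER_ACK,
                 CHECK_FAR_BORDER_ACK }

    // Returns the per-iteration stage order for each LHRP node type.
    static List<Stage> stagesFor(String nodeType) {
        switch (nodeType) {
            case "internal":
                return List.of(Stage.COMPUTE_BORDERS, Stage.TRANSFER_BORDERS,
                               Stage.COMPUTE_CORE, Stage.WAIT_TRANSFER_ACK);
            case "border":
                return List.of(Stage.COMPUTE_BORDERS, Stage.TRANSFER_BORDERS,
                               Stage.SEND_FAR_BORDER, Stage.COMPUTE_CORE,
                               Stage.WAIT_TRANSFER_ACK, Stage.CHECK_FAR_BORDER_ACK);
            case "buffer":
                return List.of(Stage.COMPUTE_BORDERS, Stage.TRANSFER_BORDERS,
                               Stage.RECEIVE_FAR_BORDER, Stage.COMPUTE_CORE,
                               Stage.WAIT_TRANSFER_ACK, Stage.CHECK_FAR_BORDER_ACK);
            default:
                throw new IllegalArgumentException("unknown node type: " + nodeType);
        }
    }

    public static void main(String[] args) {
        for (String t : List.of("internal", "border", "buffer"))
            System.out.println(t + ": " + stagesFor(t));
    }
}
```

Note that only Border and Buffer nodes carry the extra far-border traffic; Internal nodes run the conventional latency-hiding pattern unchanged.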
Estimated Algorithm Performance • G: Grid latency • I: Internal latency • B: Number of data tuples used by the Buffer Node • W: Total amount of work for all CPUs • C: Number of CPUs doing non-redundant work
Experimental Result: Memory Footprint • 21% increase in memory use over the conventional form of latency hiding. • Causes: • An extra matrix in the Buffer Node to store old column versions • Extra far-border buffers
PGAFramework • Objective • Design Requirements • Implementation technology choices • API Design • API Workpool Example • Other API features • Synchronization option • Recursive option
PGAFramework Objective: To create an efficient parallel application framework for the Grid that gives the user programmer easy access to Grid resources.
Design Requirements • Platform independence • Self Deployment • Easy-to-Use Interface • Provide the following services without requiring extra effort on the part of the user programmer: • Load Balancing • Scheduling • Fault tolerance • Latency/Bandwidth tolerance
Design: Layered architecture diagram — the user's applications sit on top of the PGAFramework API (interface); the framework provides load balancing, scheduling, fault tolerance, and latency/bandwidth tolerance; PGAFramework runs over Globus and a job scheduler (Condor) on the hardware resources.
Node Deployment: Diagram — a job-submit node uses a scheduling service with resource discovery (possibly GridWay) to deploy through Globus onto Condor, SGE, and PBS back ends, reaching desktop PCs, cluster compute nodes, and supercomputers.
Implementation • Java • Platform independence • JXTA (JXSE) • Peer-to-peer API • Provides tools to work around NATs and firewalls • Provides runtime loading of libraries and modules
Motivation for API Design • Video codecs • Codecs follow an interface • What happens inside the codec does not matter • Only the input and output of the codec need to be specified • Diagram: a video player (displays a GUI, loads files, ...) feeds an MPEG-encoded stream into a codec (mpeg, ogg, h.264), which outputs raw video data to the screen.
PGAFramework API • Schedules processes on resources • Loads user data • There may be multiple "template" APIs • Each API has interfaces that the user implements • The user "inserts" his module into the framework • Give data to the framework • Create the network • Determine topology and network behavior • Send the user process to the compute nodes • Get data from the framework • Compute on the data • Return the processed data • Request sync (optional) • Get data from the user class • Send to the master node • Get data from the framework • Store or pipe the data • Repeat the process in a loop until done
API Sample Code
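The sample code itself is not reproduced in this transcript, so the sketch below only illustrates the template pattern the previous slide describes: the user implements an interface, "inserts" the module, and the framework drives it. The names `WorkpoolTask` and `Workpool` are assumptions, not PGAFramework's actual API, and the sequential runner stands in for the framework's grid-side scheduling over JXTA.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "workpool" template interface the user programmer implements.
interface WorkpoolTask<I, O> {
    O compute(I input);  // user code: process one unit of work
}

// Stand-in for the framework side: in the real system this would schedule
// tasks on remote compute nodes; here it runs them locally to show the contract.
class Workpool {
    static <I, O> List<O> run(WorkpoolTask<I, O> task, List<I> inputs) {
        List<O> results = new ArrayList<>();
        for (I in : inputs)
            results.add(task.compute(in));  // framework calls user module
        return results;
    }
}

public class WorkpoolExample {
    public static void main(String[] args) {
        // The user "inserts" a module by supplying an implementation.
        WorkpoolTask<Integer, Integer> square = x -> x * x;
        System.out.println(Workpool.run(square, List.of(1, 2, 3)));
    }
}
```

As with the codec analogy, only the input and output types of the user module are specified; how the framework distributes the calls is hidden behind the interface.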
Synchronization option • RemoteHandler provides an interface to synchronize data • Data is synced without blocking • The user creates blocking procedures if needed
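The non-blocking sync with an optional user-side blocking point can be sketched as below. This `RemoteHandler` is a simplified assumption (a background thread standing in for a JXTA transfer), not the framework's real class.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical RemoteHandler: sync() returns immediately; the transfer
// completes in the background, and the Future lets the user block if needed.
class RemoteHandler {
    private final ExecutorService net = Executors.newSingleThreadExecutor();

    Future<Void> sync(int[] data) {
        return net.submit(() -> {
            // In the real framework: serialize `data` and send it over JXTA.
            return null;
        });
    }

    void shutdown() { net.shutdown(); }
}

public class SyncExample {
    public static void main(String[] args) throws Exception {
        RemoteHandler handler = new RemoteHandler();
        Future<Void> ack = handler.sync(new int[]{1, 2, 3});  // non-blocking
        ack.get();  // user-created blocking point, only if the app needs one
        handler.shutdown();
        System.out.println("synced");
    }
}
```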
Recursive Feature • Allows multiple levels of parallelism (granularity) • Diagram: a video task split across levels — decode video, cut the raw video into pictures, blur the pictures, blur a portion of each picture — combining pipeline, work-pool, and synchronous patterns.
Related Work • MPI implementations for the Grid • MPICH-G2 • GridMPI • MPICH-V2 (MPICH-V1) • Peer-to-peer parallel frameworks • P2P-MPI (for cluster computing) • P3 (for cluster computing) • Self-deploying frameworks • Jojo
Conclusions • Parallel Applications on the Grid • Latency Hiding by Redundant Processing (LHRP) • PGAFramework • Related work