A Grid Parallel Application Framework Jeremy Villalobos, PhD student, Department of Computer Science, University of North Carolina at Charlotte
Overview • Parallel Applications on the Grid • Latency Hiding by Redundant Processing (LHRP) • PGAFramework • Related work • Conclusion
Parallel Applications on the Grid • Advantages • Access to more resources • Lower costs • Potential future profits from a Grid economy? • Challenges • The I/O problem • Need for an easy-to-use interface • Heterogeneous hardware
Latency Hiding by Redundant Processing • Latency Hiding problem • LHRP Algorithm • CPU type • CPU task assigned to each CPU type • Versioning system • Mathematical model to describe LHRP • Results
LHRP • Conventional Latency Hiding • Latency Hiding by Redundant Processing
LHRP Algorithm • Internal: Communicates only with LAN CPUs. • Border: Communicates with LAN CPUs and one Buffer CPU. • Buffer: Communicates with the LAN Border CPU and receives data from the WAN Border CPU.
Computation and Communication Stages • Internal: • Computes borders • Transfers borders (Non-blocking) • Computes core matrix • Waits for transfer ACK
Computation and Communication Stages • Border: • Computes borders • Transfers borders (non-blocking) • Sends far border • Computes core matrix • Waits for transfer ACK • Checks on far border transfer ACK (waits if it is the last iteration)
Computation and Communication Stages • Buffer: • Computes borders • Transfers borders (non-blocking) • Receives far border • Computes core matrix • Waits for transfer ACK • Checks on far border transfer ACK (waits if it is the last iteration)
LHRP Algorithm Review • Node types: • Internal • Border • Buffer • Far Border transfer • Buffer Node Versioning system
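The per-node-type stage orderings above can be sketched in code. This is a minimal illustration of the slides' stage lists; the enum values and method names are assumptions for exposition, not PGAFramework's actual API.

```java
import java.util.List;

public class LhrpStages {
    // Stage names mirror the bullets on the "Computation and Communication
    // Stages" slides; they are illustrative, not framework identifiers.
    enum Stage { COMPUTE_BORDERS, TRANSFER_BORDERS, SEND_FAR_BORDER,
                 RECEIVE_FAR_BORDER, COMPUTE_CORE, WAIT_TRANSFER_ACK,
                 CHECK_FAR_BORDER_ACK }

    // Returns the per-iteration stage order for each LHRP node type.
    static List<Stage> stagesFor(String nodeType) {
        switch (nodeType) {
            case "internal":
                return List.of(Stage.COMPUTE_BORDERS, Stage.TRANSFER_BORDERS,
                               Stage.COMPUTE_CORE, Stage.WAIT_TRANSFER_ACK);
            case "border":
                return List.of(Stage.COMPUTE_BORDERS, Stage.TRANSFER_BORDERS,
                               Stage.SEND_FAR_BORDER, Stage.COMPUTE_CORE,
                               Stage.WAIT_TRANSFER_ACK, Stage.CHECK_FAR_BORDER_ACK);
            case "buffer":
                return List.of(Stage.COMPUTE_BORDERS, Stage.TRANSFER_BORDERS,
                               Stage.RECEIVE_FAR_BORDER, Stage.COMPUTE_CORE,
                               Stage.WAIT_TRANSFER_ACK, Stage.CHECK_FAR_BORDER_ACK);
            default:
                throw new IllegalArgumentException("unknown node type: " + nodeType);
        }
    }

    public static void main(String[] args) {
        for (String t : List.of("internal", "border", "buffer"))
            System.out.println(t + ": " + stagesFor(t));
    }
}
```

Note that only Border and Buffer nodes carry the extra far-border traffic; Internal nodes run the conventional latency-hiding pattern unchanged.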
Estimated Algorithm Performance • G: Grid latency • I: Internal latency • B: Number of data tuples used by the Buffer Node • W: Total amount of work for all CPUs • C: Number of CPUs doing non-redundant work
Experimental Result: Memory Footprint • 21% increase in memory use over the conventional form of latency hiding. • Causes: • An extra matrix in the Buffer Node to store old column versions • Extra far-border buffers
PGAFramework • Objective • Design Requirements • Implementation technology choices • API Design • API Workpool Example • Other API features • Synchronization option • Recursive option
PGAFramework Objective: To create an efficient parallel application framework for the Grid that gives the user programmer easy access to Grid resources.
Design Requirements • Platform independence • Self Deployment • Easy-to-Use Interface • Provide the following services without requiring extra effort on the part of the user programmer: • Load Balancing • Scheduling • Fault tolerance • Latency/Bandwidth tolerance
Design: Layered architecture diagram — the user's applications sit on top of the PGAFramework API (interface); the framework provides load balancing, scheduling, fault tolerance, and latency/bandwidth tolerance; PGAFramework runs over Globus and a job scheduler (Condor) on the hardware resources.
Node Deployment: Diagram — a job-submit node uses a scheduling service with resource discovery (possibly GridWay) to deploy through Globus onto Condor, SGE, and PBS back ends, reaching desktop PCs, cluster compute nodes, and supercomputers.
Implementation • Java • Platform independence • JXTA (JXSE) • Peer-to-peer API • Provides tools to work around NATs and firewalls • Provides runtime loading of libraries and modules
Motivation for API Design • Video codecs • Codecs follow an interface • What happens inside the codec does not matter • Only the input and output of the codec need to be specified • Diagram: a video player (displays a GUI, loads files, ...) feeds an MPEG-encoded stream into a codec (mpeg, ogg, h.264), which outputs raw video data to the screen.
PGAFramework API • Schedules processes on resources • Loads user data • There may be multiple "template" APIs • Each API has interfaces that the user implements • The user "inserts" his module into the framework • Give data to the framework • Create the network • Determine topology and network behavior • Send the user process to the compute nodes • Get data from the framework • Compute on the data • Return the processed data • Request sync (optional) • Get data from the user class • Send to the master node • Get data from the framework • Store or pipe the data • Repeat the process in a loop until done
API Sample Code
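The sample code itself is not reproduced in this transcript, so the sketch below only illustrates the template pattern the previous slide describes: the user implements an interface, "inserts" the module, and the framework drives it. The names `WorkpoolTask` and `Workpool` are assumptions, not PGAFramework's actual API, and the sequential runner stands in for the framework's grid-side scheduling over JXTA.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "workpool" template interface the user programmer implements.
interface WorkpoolTask<I, O> {
    O compute(I input);  // user code: process one unit of work
}

// Stand-in for the framework side: in the real system this would schedule
// tasks on remote compute nodes; here it runs them locally to show the contract.
class Workpool {
    static <I, O> List<O> run(WorkpoolTask<I, O> task, List<I> inputs) {
        List<O> results = new ArrayList<>();
        for (I in : inputs)
            results.add(task.compute(in));  // framework calls user module
        return results;
    }
}

public class WorkpoolExample {
    public static void main(String[] args) {
        // The user "inserts" a module by supplying an implementation.
        WorkpoolTask<Integer, Integer> square = x -> x * x;
        System.out.println(Workpool.run(square, List.of(1, 2, 3)));
    }
}
```

As with the codec analogy, only the input and output types of the user module are specified; how the framework distributes the calls is hidden behind the interface.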
Synchronization option • RemoteHandler provides an interface to synchronize data • Data is synced without blocking • The user creates blocking procedures if needed
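The non-blocking sync with an optional user-side blocking point can be sketched as below. This `RemoteHandler` is a simplified assumption (a background thread standing in for a JXTA transfer), not the framework's real class.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical RemoteHandler: sync() returns immediately; the transfer
// completes in the background, and the Future lets the user block if needed.
class RemoteHandler {
    private final ExecutorService net = Executors.newSingleThreadExecutor();

    Future<Void> sync(int[] data) {
        return net.submit(() -> {
            // In the real framework: serialize `data` and send it over JXTA.
            return null;
        });
    }

    void shutdown() { net.shutdown(); }
}

public class SyncExample {
    public static void main(String[] args) throws Exception {
        RemoteHandler handler = new RemoteHandler();
        Future<Void> ack = handler.sync(new int[]{1, 2, 3});  // non-blocking
        ack.get();  // user-created blocking point, only if the app needs one
        handler.shutdown();
        System.out.println("synced");
    }
}
```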
Recursive Feature • Allows multiple levels of parallelism (granularity) • Diagram: a video task split across levels — decode video, cut the raw video into pictures, blur the pictures, blur a portion of each picture — combining pipeline, work-pool, and synchronous patterns.
Related Work • MPI implementations for the Grid • MPICH-G2 • GridMPI • MPICH-V2 (MPICH-V1) • Peer-to-peer parallel frameworks • P2P-MPI (for cluster computing) • P3 (for cluster computing) • Self-deploying frameworks • Jojo
Conclusions • Parallel Applications on the Grid • Latency Hiding by Redundant Processing (LHRP) • PGAFramework • Related work