Ccain: Essential CCA (or, a framework for picky users)
Chris Rickett, Craig Rasmussen, Matthew Sottile
April 2005 CCA Meeting, Lincoln City, Oregon
This paper will also be presented at Parallel CFD '05 (these slides will not).
Outline of talk
• CCA design pattern
• Ccain design goals
  • CCAIN = CCA INtegration framework
• A case study: n-body simulations
• Evaluation of Ccain
• Discussion
The Essence of CCA
• What CCA is:
  • A design pattern.
  • A reasonable implementation of the pattern requires:
    1. A factory to create components
       • createInstance("ParticlePusher")
    2. A name service to find components
       • Port = getPort("pusher")
    3. Well-defined language bindings
• What CCA is not:
  • Implementation details
    • dlopen: CCA doesn't mandate shared objects
  • Language interoperability tools
• The frameworks are just realizations of the pattern.
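The two structural requirements listed on this slide (a factory and a name service) fit in a few dozen lines of plain C. The sketch below is only an illustration of the pattern, not Ccain's actual API: `PusherPort`, `create_instance`, `add_provides_port`, and `get_port` are hypothetical names chosen to echo the `createInstance` and `getPort` calls on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical port: a small table of function pointers. */
typedef struct {
    double (*push)(double x, double v, double dt);
} PusherPort;

static double euler_push(double x, double v, double dt) { return x + v * dt; }
static PusherPort particle_pusher = { euler_push };
static void *make_particle_pusher(void) { return &particle_pusher; }

/* 1. Factory: look up a constructor by component name. */
typedef struct {
    const char *name;
    void *(*create)(void);
} FactoryEntry;

static const FactoryEntry factory[] = {
    { "ParticlePusher", make_particle_pusher },
};

void *create_instance(const char *name) {
    for (size_t i = 0; i < sizeof factory / sizeof factory[0]; i++) {
        if (strcmp(factory[i].name, name) == 0) return factory[i].create();
    }
    return NULL; /* unknown component name */
}

/* 2. Name service: register and find ports by name. */
#define MAX_PORTS 8
static struct { const char *name; void *port; } registry[MAX_PORTS];
static size_t n_ports = 0;

int add_provides_port(const char *name, void *port) {
    if (n_ports == MAX_PORTS) return -1; /* registry full */
    registry[n_ports].name = name;
    registry[n_ports].port = port;
    n_ports++;
    return 0;
}

void *get_port(const char *name) {
    for (size_t i = 0; i < n_ports; i++) {
        if (strcmp(registry[i].name, name) == 0) return registry[i].port;
    }
    return NULL; /* no such port */
}

/* Create a component, publish its port, then find and call it by name. */
double demo_factory_and_lookup(void) {
    void *component = create_instance("ParticlePusher");
    assert(component != NULL);
    if (add_provides_port("pusher", component) != 0) return -1.0;
    PusherPort *port = get_port("pusher");
    return port->push(1.0, 2.0, 0.5); /* x + v*dt */
}
```

Calling `demo_factory_and_lookup()` exercises both lookups. Nothing in the sketch needs shared objects or an interoperability tool, which is the point the slide makes about what CCA does not mandate.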
Ccain design goals
• Ccain was designed to respond to what our customers want.
• Simple
  • Implementation difficulty should scale with component complexity (at worst).
• High performance
  • Agnostic of parallel implementation.
    • Shared and distributed memory models.
  • Efficient implementation.
• Portable
  • No platform-specific code.
  • Minimal tool chain.
    • Framework requires a C compiler.
• Non-intrusive
  • No interference with user programming style.
  • Data structure freedom.
Ccain implementation
• Pure C core.
  • Services, BuilderServices implementation.
  • 1700 lines of code, 1 week to develop a working, debugged implementation.
  • Thread-safe implementation (pthreads now).
• Fortran 90/95/2003, C++, and C bindings.
  • C++ binds through extern "C" interfaces, and so does Fortran.
  • Python, ZPL, ... bindings are trivial through the C interface.
• Static component palette.
  • ...but component composition is dynamic.
    • After all, they're just pointers...
  • Removes many headaches.
    • Portability, framework complexity, no $!@*ing LD_LIBRARY_PATH, etc.
    • With a static palette, failures occur at link time, not 3 weeks into a run.
  • We are not convinced people want a dynamic palette.
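The "static palette, dynamic composition" idea can be sketched as follows. This is a hedged illustration rather than Ccain's code: `AcceleratorPort`, `PaletteEntry`, `Driver`, and `connect_by_name` are hypothetical names, and the two toy providers exist only so that rewiring is visible. The palette is an ordinary array fixed at link time, so a missing provider shows up as an unresolved symbol when linking; the connections are plain pointer assignments that can change while the program runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical provides-port: one function pointer. */
typedef struct {
    double (*accelerate)(double m, double r);
} AcceleratorPort;

static double plain(double m, double r)    { return m / (r * r); }
static double softened(double m, double r) { return m / (r * r + 1.0); }

/* Static palette: the set of components is fixed when the program links. */
typedef struct {
    const char *name;
    AcceleratorPort port;
} PaletteEntry;

static PaletteEntry palette[] = {
    { "Plain",    { plain } },
    { "Softened", { softened } },
};

/* A user component with a single uses-port slot. */
typedef struct {
    AcceleratorPort *accel;
} Driver;

/* Dynamic composition: connecting is just storing a pointer. */
int connect_by_name(Driver *d, const char *provider) {
    for (size_t i = 0; i < sizeof palette / sizeof palette[0]; i++) {
        if (strcmp(palette[i].name, provider) == 0) {
            d->accel = &palette[i].port;
            return 0;
        }
    }
    return -1; /* provider is not in the palette */
}

/* Wire the driver to one provider, call it, then rewire at run time. */
double demo_rewire(void) {
    Driver d = { NULL };
    double total = 0.0;
    if (connect_by_name(&d, "Plain") != 0) return -1.0;
    total += d.accel->accelerate(8.0, 2.0);   /* 8 / 4 */
    if (connect_by_name(&d, "Softened") != 0) return -1.0;
    total += d.accel->accelerate(8.0, 1.0);   /* 8 / (1 + 1) */
    assert(d.accel == &palette[1].port);
    return total;
}
```

Swapping providers costs one pointer store, and no dynamic loader or `LD_LIBRARY_PATH` is involved.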
A case study: n-body simulations
• The component design:
  • Driver provides a go port and uses an accelerator port and a pusher port.
  • N-body component provides the accelerator and pusher ports.
• Two source files.
  • Driver.c: main loop that calls accelerate and push over and over.
  • Nbody_naïve.f03: a basic O(n^2) implementation of accelerator and a simple pusher. No fancy algorithms in here.
  • Good test though: O(n^2) really hammers particle data structures.
[Figure: component diagram with ports labeled push and accelerate]
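A minimal C sketch of this design is given below, under several assumptions. The port structs and the names `driver_go`, `nbody_accelerate`, and `nbody_push` are hypothetical, and the pairwise force is a placeholder (the slides do not give the force law, and the original accelerator is written in Fortran 2003). What the sketch does preserve is the shape of the case study: a driver that only sees two ports, an O(n^2) accelerator, and a simple pusher.

```c
#include <assert.h>
#include <stddef.h>

struct particle { double x, y, z, vx, vy, vz, m; };

/* Hypothetical port types: each is a single function pointer. */
typedef struct {
    void (*accelerate)(struct particle *p, int n, double dt);
} AcceleratorPort;

typedef struct {
    void (*push)(struct particle *p, int n, double dt);
} PusherPort;

/* O(n^2) velocity update. PLACEHOLDER force law, chosen for simplicity. */
void nbody_accelerate(struct particle *p, int n, double dt) {
    const double eps = 1.0e-3; /* softening so the divisor is never zero */
    for (int i = 0; i < n; i++) {
        double ax = 0.0, ay = 0.0, az = 0.0;
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            double dx = p[j].x - p[i].x;
            double dy = p[j].y - p[i].y;
            double dz = p[j].z - p[i].z;
            double w = p[j].m / (dx * dx + dy * dy + dz * dz + eps);
            ax += w * dx;
            ay += w * dy;
            az += w * dz;
        }
        p[i].vx += ax * dt;
        p[i].vy += ay * dt;
        p[i].vz += az * dt;
    }
}

/* Simple pusher: advance positions by velocity * dt. */
void nbody_push(struct particle *p, int n, double dt) {
    for (int i = 0; i < n; i++) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}

/* Driver main loop: calls accelerate and push over and over. */
void driver_go(const AcceleratorPort *accel, const PusherPort *pusher,
               struct particle *p, int n, double dt, int steps) {
    assert(accel != NULL && pusher != NULL);
    for (int s = 0; s < steps; s++) {
        accel->accelerate(p, n, dt);
        pusher->push(p, n, dt);
    }
}
```

The driver never touches the particle fields itself, so the provider behind either port can be replaced without changing `driver_go`.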
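Ccain F03 Pusher Port Implementation

    subroutine NBody_push(c_self, p, n, dt) bind(C)
      use, intrinsic :: iso_c_binding
      ! type(particle) must be visible here and interoperable (bind(C))
      type(C_PTR), value, intent(in) :: c_self
      integer(C_INT), value, intent(in) :: n
      real(C_DOUBLE), value, intent(in) :: dt   ! VALUE cannot be combined with intent(inout)
      type(particle), intent(inout) :: p(n)
      integer :: i

      ! advance each particle's position by velocity * dt
      do i = 1, n
        p(i)%x = p(i)%x + p(i)%vx * dt
        p(i)%y = p(i)%y + p(i)%vy * dt
        p(i)%z = p(i)%z + p(i)%vz * dt
      end do
    end subroutine NBody_push

• No modification to data structure usage
• Core code unmodified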
Performance: Experimental setup
• Ran tests with 500 particles and 1000 iterations, using:
  • Five non-component implementations testing baseline performance of different data access methods and representations.
  • Babel/Ccaffeine components with arrays of doubles and arrays of structs.
  • Ccain components with arrays of doubles and arrays of structs.

    struct particle { double x,y,z,vx,vy,vz,m; };

    TYPE particle
      REAL(KIND=8) :: x,y,z,m
      REAL(KIND=8) :: vx,vy,vz
    END TYPE particle

    double particles[][] =
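The two data representations compared in these tests can be put side by side in C. The sketch below assumes a row-per-particle layout for the array of doubles (the slide's `double particles[][]` declaration is cut off, so the actual layout is unknown), and the names `push_structs`, `push_doubles`, and the field enum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Column indices for the array-of-doubles layout (ASSUMED row-per-particle). */
enum { X, Y, Z, VX, VY, VZ, M, NFIELDS };

struct particle { double x, y, z, vx, vy, vz, m; };

/* Pusher over an array of structs. */
void push_structs(struct particle *p, int n, double dt) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}

/* The same pusher over a 2-D array of doubles. */
void push_doubles(double q[][NFIELDS], int n, double dt) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        q[i][X] += q[i][VX] * dt;
        q[i][Y] += q[i][VY] * dt;
        q[i][Z] += q[i][VZ] * dt;
    }
}

/* Returns 1 when both layouts give identical positions after one push. */
int demo_layouts_agree(void) {
    struct particle p[2] = { {0, 0, 0, 1, 2, 3, 1}, {1, 1, 1, -1, 0, 1, 2} };
    double q[2][NFIELDS] = { {0, 0, 0, 1, 2, 3, 1}, {1, 1, 1, -1, 0, 1, 2} };
    push_structs(p, 2, 0.5);
    push_doubles(q, 2, 0.5);
    for (int i = 0; i < 2; i++) {
        if (p[i].x != q[i][X] || p[i].y != q[i][Y] || p[i].z != q[i][Z]) return 0;
    }
    return 1;
}
```

Both functions perform the same arithmetic; only the addressing differs, which is the variable the baseline tests were designed to isolate.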
Performance: Runtime Data (seconds)
• Matt's original F95 code, 5 flavors. Mean:
• Comparing 3 versions of the code (2.4 GHz Xeon, Suse Linux)
• Analysis: Set/get calls likely slowed down Babel/Ccaffeine. (We followed the instructions!)
Fortran 2003 Status
• ISO C binding module becoming available
  • Already in IBM XLF, Cray Fortran, (Sun?)
  • Coming soon to Intel, ...
  • Chris adding to gfortran.
    • Craig taking credit for Chris's slave labor.
• These F2003 compilers have ... "issues".
  • 'i','c','k','y',' ','s','t','r','i','n','g','s',C_NULL_CHAR
• Ccain works with standard F9x too.
Measures of Complexity

                          Ccain     CCA/Babel
    Framework:
      Time to build:      2.5 s     ~1 hour
      # files:            10        1117
      # .in files:        2         75
      # extra tools:      0         4
      ls -l *.tar         .7 MB     70 MB
    User:
      # make files:       1         40
      # line mod:         ~230      same or less?
      # files             9         142 (10% u)
      development time:   ~1 day    ~10 days
Mission Accomplished
• Goals revisited:
• Simple
  • 4 lines per port routine + the usual stuff (setServices, port definitions)
  • No stubs or skeletons.
    • Well... one for F90.
• High performance
  • Ran at the same speed as non-component code
• Portable
  • Ran on P4/Xeon, G4, PPC970 (32 and 64 bit modes), AMD64, Cray X1
    • This is all we had available to test with last week.
  • Compiled with Intel, IBM, Cray, Visual C++, and GNU compilers.
  • Ran successfully under Suse and Yellowdog Y-HPC Linux, MacOS X, Windows XP Pro, UNICOS
  • ... We are compiler, architecture, and operating system agnostic!
• Non-intrusive
  • No data structure changes from original code.
  • Data types restricted by languages, NOT the framework.
Ccain Fortran Port Definition

    ! NEED comparable C header file too
    type :: PusherPort
      include "CcainBasePort.fh"
      type(C_FUNPTR) :: push
    end type PusherPort

    interface
      subroutine push(c_self, particles, numParticles, dt) bind(C)
        use, intrinsic :: iso_c_binding
        import :: particle   ! make the host's particle type visible in the interface body
        type(C_PTR), value, intent(in) :: c_self
        integer(C_INT), value, intent(in) :: numParticles
        real(C_DOUBLE), value, intent(in) :: dt   ! VALUE cannot be combined with intent(inout)
        type(particle), dimension(numParticles) :: particles
      end subroutine push
    end interface
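The comparable C header that this slide calls for might look roughly like the sketch below. It is an assumption-laden illustration: the contents of `CcainBasePort.fh` are not shown in the slides, so `HYPOTHETICAL_BASE_PORT_FIELDS` is a stand-in with a made-up field, and `push_fn`, `example_push`, and `demo_call_through_port` are hypothetical names. The function-pointer signature follows from the `bind(C)` interface: `type(C_PTR), value` maps to `void *`, the particle array to a pointer, and the `value` integer and real arguments to `int` and `double`.

```c
#include <assert.h>
#include <stddef.h>

struct particle { double x, y, z, vx, vy, vz, m; };

/* ASSUMPTION: placeholder for whatever fields CcainBasePort.fh declares. */
#define HYPOTHETICAL_BASE_PORT_FIELDS void *component;

/* C view of the bind(C) push interface. */
typedef void (*push_fn)(void *c_self, struct particle *particles,
                        int numParticles, double dt);

typedef struct PusherPort {
    HYPOTHETICAL_BASE_PORT_FIELDS
    push_fn push; /* corresponds to type(C_FUNPTR) :: push */
} PusherPort;

/* C stand-in for the Fortran NBody_push routine. */
void example_push(void *c_self, struct particle *particles,
                  int numParticles, double dt) {
    (void)c_self; /* unused in this sketch */
    assert(particles != NULL || numParticles == 0);
    for (int i = 0; i < numParticles; i++) {
        particles[i].x += particles[i].vx * dt;
        particles[i].y += particles[i].vy * dt;
        particles[i].z += particles[i].vz * dt;
    }
}

/* Fill in the port and call the routine through it. */
double demo_call_through_port(void) {
    struct particle p[1] = { {1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0} };
    PusherPort port = { NULL, example_push };
    port.push(port.component, p, 1, 0.25);
    return p[0].x; /* 1.0 + 4.0 * 0.25 */
}
```

For the two languages to share particle data, the field order of `struct particle` has to match the Fortran derived type, and that type has to be declared with `bind(C)`.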
Ccain setServices

    subroutine NBody_setServices(c_self, services) &
        bind(C, name="NBody_setServices")
      type(C_PTR), value :: c_self, services
      type(NaiveNBody), pointer :: self
      type(PusherPort), pointer :: pPort
      ! null-terminated port name: 8 characters plus C_NULL_CHAR
      character(kind=C_CHAR, len=9) :: pPortName = C_CHAR_"PushPort" // C_NULL_CHAR
      ! pPortType (the port type name) is assumed to be declared elsewhere

      call c_f_pointer(c_self, self)
      self%myServices = services
      call c_f_pointer(self%c_pPort, pPort)
      pPort%push = c_funloc(NBody_push)
      call addProvidesPort(services, self%c_pPort, pPortName, pPortType)
    end subroutine NBody_setServices
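For readers more at home in C, the same three steps (recover the component from the opaque handle, store the routine's address in the port, register the port by name) can be sketched as below. Everything here is hypothetical scaffolding rather than the Ccain API: the one-slot `Services` struct, `add_provides_port`, the simplified `push` signature, and the `"PusherPort"` type string (the slide does not show the value of `pPortType`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical port with a simplified push signature. */
typedef struct {
    void (*push)(void *c_self, double *x, const double *v, int n, double dt);
} PusherPort;

/* Hypothetical services object holding a single provides-port slot. */
typedef struct {
    const char *name;
    const char *type;
    void *port;
} Services;

int add_provides_port(Services *s, void *port,
                      const char *name, const char *type) {
    if (s == NULL || port == NULL || s->port != NULL) return -1;
    s->port = port;
    s->name = name;
    s->type = type;
    return 0;
}

/* Component state, loosely mirroring type(NaiveNBody). */
typedef struct {
    Services *myServices;
    PusherPort pPort;
} NaiveNBody;

void nbody_push(void *c_self, double *x, const double *v, int n, double dt) {
    (void)c_self;
    for (int i = 0; i < n; i++) x[i] += v[i] * dt;
}

int nbody_set_services(void *c_self, Services *services) {
    NaiveNBody *self = c_self;     /* plays the role of C_F_POINTER */
    self->myServices = services;
    self->pPort.push = nbody_push; /* plays the role of C_FUNLOC */
    return add_provides_port(services, &self->pPort, "PushPort", "PusherPort");
}

/* Register the port, then fetch it from the services object and call it. */
double demo_set_services(void) {
    Services services = { NULL, NULL, NULL };
    NaiveNBody body = { NULL, { NULL } };
    double x[2] = { 0.0, 1.0 };
    const double v[2] = { 2.0, -2.0 };
    if (nbody_set_services(&body, &services) != 0) return -1.0;
    assert(strcmp(services.name, "PushPort") == 0);
    PusherPort *port = services.port;
    port->push(&body, x, v, 2, 0.5);
    return x[0] + 10.0 * x[1];
}
```

In C the handle conversion and the function address are both implicit, which is why the Fortran version needs the explicit `C_F_POINTER` and `C_FUNLOC` calls.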