Experiences deploying Clusterfinder on the grid Arthur Carlson (MPE) 7th AstroGrid-D Meeting TUM, 11th-12th June 2007
Experiences deploying Clusterfinder on the grid What is the deployment problem? A prototype solution using “grid-modules” “environments” Status and conclusions
Deployment is when ... each of many users can (build and) run each of many applications on each of many hosts ("each" = >90%, "many" = >10).

Obstacles along the way: certificates/password files, VOs (updating the grid-mapfile, sharing software), firewalls, repository/distribution/version control, data access, and the "standard software" environment (compiler, ...).
grid-modules
A prototype system for getting software from where it is maintained to where it is used.
• Inspired by the environment-modules package
• load/unload (PATH)
• initadd/initclear (.profile)
• For software from a remote repository
• update/deinstall
• build/clean
• test
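The load/initadd operations above follow the environment-modules pattern of editing PATH and startup files. As a hedged sketch only (the real grid-module script is not shown in these slides; MODULE_ROOT and the fragment path are illustrative assumptions based on the set_module_info example later), a "load" might amount to:

```shell
# Illustrative sketch, NOT the actual grid-module source: a "load"
# prepends a module's bin directory to PATH, in the spirit of the
# environment-modules package. Layout under $HOME/grid-modules is assumed.
MODULE_ROOT="${MODULE_ROOT:-$HOME/grid-modules}"   # assumed install root
frag="gridmod/bin"                                 # per-module fragment
bindir="$MODULE_ROOT/$frag"
case ":$PATH:" in
  *":$bindir:"*) : ;;                              # already loaded, no-op
  *) PATH="$bindir:$PATH"; export PATH ;;          # prepend exactly once
esac
```

An "unload" would be the inverse edit, and "initadd" would append the same line to ~/.profile so batch jobs see it too.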
grid-modules: install and use
• grid-modules-clone NEWHOST(LIST)
• also copies ~/.subversion for passwords
• grid-module [update|load|initadd|build|test] [gridmod|env|gmon|cf|proc|gat]
grid-modules: adding modules
• set_module_info

agd_rep='svn://svn.gac-grid.org/software'
all_modules='gridmod cf'
case $module in
  gridmod) rep=$agd_rep/grid-modules; frag=gridmod/bin;;
  cf)      rep=$agd_rep/clusterfinder; frag=unknown;;
  *)       rep=unknown; frag=unknown;;
esac

• customization scripts
grid-modules: adding modules
• set_module_info

agd_rep='svn://svn.gac-grid.org/software'
planck_rep='http://www.mpa-garching.mpg.de/svn/planck-group/planckbranches'
all_modules='gridmod cf proc'
case $module in
  gridmod) rep=$agd_rep/grid-modules; frag=gridmod/bin;;
  cf)      rep=$agd_rep/clusterfinder; frag=unknown;;
  proc)    rep=$planck_rep/ProC-2.3; frag=proc/build/dist/bin;;
  *)       rep=unknown; frag=unknown;;
esac

• customization scripts

===== proc.build =====
cd ~/grid-modules/proc/ProC-base
ant

===== proc.load =====
mkdir -p $HOME/.planck
echo "allowIncompleteConf = true" > "$HOME/.planck/pipelinecoordinator.pref"

===== proc.unload =====
rm -r $HOME/.planck
environments
A prototype system for making different hosts look alike.
• Does a required software package exist on a remote host, and where is it installed?
  export IMAGEMAGICK_HOME=/usr/local/ImageMagick-6.3.2
• Make it available!
  export PATH=$PATH:/usr/local/ImageMagick-6.3.2/bin
• Host-specific information must be maintained by somebody somewhere.
• require modules or take the bull by the horns
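The idea can be condensed into a few lines (a minimal sketch, assuming the ImageMagick location from the slide; the default path may not exist on any given machine):

```shell
# Minimal sketch of the "environments" idea: a host-specific script may
# already have exported IMAGEMAGICK_HOME; if not, fall back to a default.
# Application code then relies only on the variable, so every host
# "looks alike". The default path is the example from the slide.
: "${IMAGEMAGICK_HOME:=/usr/local/ImageMagick-6.3.2}"
PATH="$PATH:$IMAGEMAGICK_HOME/bin"       # make the tools reachable
export IMAGEMAGICK_HOME PATH
```

The `:=` default expansion is what lets a host script override the value while the common code stays unchanged.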
environments: load_env
The trick is to find the right scripts to execute for each host.

if ! hostname=`hostname -f 2>/dev/null`; then hostname=`hostname`; fi
scripts=`sed -n "s/^ *$hostname *//p" <<EOF
astrogrid.aei.mpg.de aei
buran.aei.mpg.de aei
lx32i1.cos.lrz-muenchen.de lrz g95 lrz-32
lx64a2.cos.lrz-muenchen.de lrz g95 lrz-64
...
EOF`
cd ~/grid-modules/env/bin
source ./default
if [[ -f local ]]; then
  echo sourcing local environment script
  source local
elif [[ "$scripts" ]]; then
  echo For $hostname sourcing these scripts: $scripts
  for script in $scripts; do source ./$script; done
fi

This may need to be changed when adding a new host.
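The sed lookup at the heart of load_env can be exercised on its own. This standalone snippet (host names taken from the table above, hard-wired here instead of coming from `hostname -f`) prints the script list for one host:

```shell
# Standalone illustration of the load_env lookup: sed strips the
# matching hostname from its table line and prints what remains,
# i.e. the space-separated list of environment scripts for that host.
hostname=lx32i1.cos.lrz-muenchen.de
scripts=`sed -n "s/^ *$hostname *//p" <<EOF
astrogrid.aei.mpg.de aei
lx32i1.cos.lrz-muenchen.de lrz g95 lrz-32
lx64a2.cos.lrz-muenchen.de lrz g95 lrz-64
EOF`
echo "$scripts"   # → lrz g95 lrz-32
```

Only the matching line is printed (sed's -n with the p flag), so an unknown host yields an empty list and load_env falls through to the defaults.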
environments: scripts
The work is done in the scripts.

===== default =====
export GSL_INCL=-I/usr/include
export GSL_LIBS=-L/usr/lib
export IMAGEMAGICK_INCL=-I/usr/include/
export IMAGEMAGICK_LIBS=-L/usr/lib/
export FC='gfortran -std=gnu -fno-second-underscore'
export F_PORTABILITY_FLAGS=-DPLANCK_GFORTRAN
export F_COMMONFLAGS='-W -Wall -Wno-uninitialized -Wno-unused -O2 -Wfatal-errors $(F_PORTABILITY_FLAGS)'
export FCFLAGS='-c $(F_COMMONFLAGS) -I$(INCDIR)'
export CC=gcc
export CCFLAGS_NO_C='-W -Wall -I$(INCDIR) $(GSL_INCL) $(IMAGEMAGICK_INCL) -fno-strict-aliasing -O2 -g0 -s -ffast-math'
export CCFLAGS='$(CCFLAGS_NO_C) -c'

===== lrz =====
export GSL_INCL='$(GSL_INC)'
export GSL_LIBS='$(GSL_SHLIB) $(GSL_BLAS_SHLIB)'
export ANT_HOME=/lrz/sys/apache-ant-1.6.5
module load gsl
module load java
module load gcc/4.1.0
module load g95
module load mpi.shmem/gcc
export PATH=/lrz/sys/jdk1.5.0_07/bin:${PATH}

===== g95 =====
export FC=g95
export F_PORTABILITY_FLAGS=-DPLANCK_G95

New scripts may need to be written for new hosts. Defaults work in most cases, cooperate with modules, and can be overridden.
Status • ca. 23 AGD hosts + 9 DGI hosts are accessible • F90 build of Clusterfinder successful on 22 hosts (70%) • Some of the problems experienced: • difficulty finding FQDNs of resources, hosts listed by mistake • gsissh disabled • default job factory type disabled for globusrun-ws • no gsiscp installed, or unexpected default ports • svn not installed, too old, or not allowed connections • shell not bash, .profile not processed with batch jobs • file quota too small • some hosts (lx[32|64]ia1 at LRZ) share a file system • no F90 compiler installed, or hard to find • deep changes in grid-modules are hard to update
Conclusions • Clusterfinder has been deployed on “many” hosts using a prototype deployment system that is “easily” extendable to many users and many applications. • The system handles diversity without standing in the way of defining standards. • AGD should use this system or decide on something better, but should not diverge.