The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How)
CE+WN+siteBDII Installation and configuration
Bouchra RAHIM (rahim@cnrst.ma)
Africa 6 2010 - Joint EUMEDGRID-Support/EPIKH School for Grid Site Administrators
Rabat, 01.06.2011
www.epikh.eu
Outline
• Computing Element overview
• Worker Node overview
• CE CREAM overview
• gLite stack overview
• gLite CE siteBDII
• gLite CE CREAM and WN
gLite overview
• User Interface (UI): the point of access for users to gLite grid services
• WMS: the component that optimizes resource usage
• CE: the machine that manages the worker nodes
• WN: the machines that actually execute applications
• SE: the machines where files are stored
• LFC: used to “find” files on the grid
• BDII: the services responsible for publishing all information about your site
• Logging and Bookkeeping: as its name says, it logs job events and alerts the user when a job is finished
Computing Element Overview
• The Computing Element provides some of the main services of a site.
• Main functionalities:
  • job management (job submission, job control)
  • job status updates for the WMS
  • communication with the site BDII, which publishes all information regarding the Computing Element
• It can run several kinds of batch system:
  • Torque + MAUI
  • LSF
  • SGE
  • Condor
Torque + MAUI
• Torque server service:
  • pbs_server provides basic batch services such as receiving/creating a batch job.
• Torque client service:
  • pbs_mom places jobs into execution. It is also responsible for returning the job's output to the user.
• MAUI system service:
  • the job scheduler contains the site's policy to decide which job is going to be executed and when.
Site BDII*
• By default it used to be installed on the CE, but now it is better to install it on a dedicated server, physical or virtual.
• It collects the information from all site GRISes** (for example SE, RB, LFC, etc.)
• The service is named bdii
• Log file: /opt/bdii/var/bdii.log
*BDII = Berkeley Database Information Index
**GRIS = Grid Resource Information Service
Worker Node Overview
• Worker nodes are the machines that actually execute your job.
• Users can only access their services through a Computing Element.
• Their characteristics are collected by the Computing Element, which publishes all information through the BDII services.
CE CREAM overview
• CREAM: Computing Resource Execution And Management
• It accepts job submission requests coming from a WMS, and other job management requests.
• It exposes a web service interface.
Requirements
• Three or more machines:
  • one will be used for the CE installation;
  • one will be used for the site BDII installation;
  • the others will be used for the WN installation.
• Architecture: 64 bit
• Operating System: Scientific Linux 5
• Two machines (CE and BDII) with a public IP address and both direct and reverse address resolution on a DNS
• The CE machine must be equipped with an X509 certificate
Preparing the Linux machine
• Network Time Protocol settings
  # yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (for example with WinSCP)
• Synchronize the date
  # /etc/init.d/ntpd stop
  # ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
  # /etc/init.d/ntpd start
  # chkconfig ntpd on
Preparing the Linux machine
• Disable SELinux: make sure /etc/selinux/config contains the line:
  SELINUX=disabled
• Check that you have a valid hostname
  # hostname -f
  # cat /etc/hosts
• Stop iptables
  # /etc/init.d/iptables stop
  # chkconfig iptables off
• Reboot
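The two checks on this slide can be scripted before rebooting. The following is only a sketch: the function names selinux_is_disabled and looks_like_fqdn are hypothetical helpers, not part of gLite or Scientific Linux, and the hostname test only looks at the shape of the name, it does not query the DNS.

```shell
# Succeed if the given SELinux config file contains the line SELINUX=disabled.
# The path is a parameter so the check can be tried on a copy first.
selinux_is_disabled() {
    grep -Eq '^[[:space:]]*SELINUX=disabled[[:space:]]*$' "$1"
}

# Succeed if the name looks fully qualified: host.domain, no spaces,
# no leading, trailing or doubled dots.
looks_like_fqdn() {
    case $1 in
        ''|*' '*|.*|*.|*..*) return 1 ;;
        *.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the machine itself this could be used as: selinux_is_disabled /etc/selinux/config && looks_like_fqdn "$(hostname -f)" && echo OK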
Repository set up-BDII
• Add the repositories specific to the middleware to the system repositories
  # cd /etc/yum.repos.d/
  # mv dag.repo dag.repo.stop
  # export MREPO=http://repo.magrid.ma/yumrepo/glite32
  # REPOS="dag lcg-CA glite-BDII_site"
  # for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
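Before running the download loop as root, it can help to print the commands it would execute. The sketch below is a dry run only; print_repo_downloads is a hypothetical helper name, and the URL layout ($MREPO/<name>.repo) is taken from the slide.

```shell
# Print one wget command per repository name instead of running it.
# usage: print_repo_downloads BASE_URL NAME [NAME...]
print_repo_downloads() {
    mrepo=$1
    shift
    for name in "$@"; do
        printf 'wget %s/%s.repo -O /etc/yum.repos.d/%s.repo\n' "$mrepo" "$name" "$name"
    done
}
```

Called as print_repo_downloads "$MREPO" $REPOS, it prints nothing when the list variable is empty or misspelled, which makes a wrong variable name easy to spot.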
Package installation-BDII
• Use yum to install the needed packages
  # yum install lcg-CA ca-policy-egi-core ca-policy-lcg
  # yum install glite-BDII_site
Yaim Configuration
• All the sample configuration files are located in the /opt/glite/yaim/examples/siteinfo directory
• It is better to work on a copy of the original files
  # mkdir /opt/glite/yaim/etc/siteinfo/
  # mkdir /opt/glite/yaim/etc/siteinfo/services/
  # cp /opt/glite/yaim/examples/siteinfo/site-info.def /opt/glite/yaim/etc/siteinfo/site-info.def
  # cp /opt/glite/yaim/examples/siteinfo/services/glite-bdii_site /opt/glite/yaim/etc/siteinfo/services/glite-bdii_site
  # cp /opt/glite/yaim/examples/users.conf /opt/glite/yaim/etc/siteinfo/users.conf
  # cp /opt/glite/yaim/examples/groups.conf /opt/glite/yaim/etc/siteinfo/groups.conf
  # cp /opt/glite/yaim/examples/siteinfo/edgusers.conf /opt/glite/yaim/etc/siteinfo/edgusers.conf
Yaim Configuration
• You can find some template files in: ftp://repo.magrid.ma/pub/CE_WN_BDII/
• Edit the site-info.def file and change the following variables:
  SITE_NAME=MA-ZZ-School (name of the site)
  CE_HOST=pcXX.magrid.ma (XX: the machine that will be the CE)
  SITE_BDII_HOST=pcYY.magrid.ma (the current machine)
• Edit the services/glite-bdii_site file and change the following variables:
  SITE_NAME=MA-ZZ-School
  SITE_DESC="MA-ZZ-School"
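Editing KEY=VALUE lines by hand is easy to get wrong when the same file is prepared on several machines. As a sketch only, the helper below (set_var is a hypothetical name, not a YAIM tool) replaces an existing assignment or appends a new one. It assumes plain one-line assignments and keys without regular-expression metacharacters, which holds for the variables named on this slide.

```shell
# Set KEY=VALUE in a shell-style configuration file.
# Replaces every existing "KEY=" line, or appends the assignment if none exists.
# usage: set_var FILE KEY VALUE
set_var() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        $0 ~ ("^" k "=") { print k "=" v; found = 1; next }   # rewrite the assignment
        { print }                                             # keep every other line
        END { if (!found) print k "=" v }                     # append when missing
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example: set_var /opt/glite/yaim/etc/siteinfo/site-info.def SITE_NAME MA-ZZ-School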
Yaim Configuration-BDII
• Run the configuration command:
  # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-BDII_site
• If everything is OK, run a basic test:
  # ldapsearch -x -h pcYY.magrid.ma -p 2170 -b "mds-vo-name=local,o=grid"
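The ldapsearch output is long, so a quick way to judge the basic test is to count the entries it returns. This is a minimal sketch: count_ldif_entries is a hypothetical helper that relies only on the LDIF convention that each entry starts with a "dn:" line.

```shell
# Count LDAP entries in LDIF text read from standard input.
# Each entry begins with a line starting with "dn:".
count_ldif_entries() {
    # grep -c prints 0 but exits non-zero when nothing matches; hide that status.
    grep -c '^dn:' || true
}
```

Usage: ldapsearch -x -h pcYY.magrid.ma -p 2170 -b "mds-vo-name=local,o=grid" | count_ldif_entries. A result of 0 suggests the BDII is not publishing anything yet.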
Preparing the Linux machine (CE)
• Network Time Protocol settings
  # yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (for example with WinSCP)
• Synchronize the date with an NTP server
  # /etc/init.d/ntpd stop
  # ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
  # /etc/init.d/ntpd start
  # chkconfig ntpd on
Preparing the Linux machine (CE)
• Disable SELinux: make sure /etc/selinux/config contains the line:
  SELINUX=disabled
• Check that you have a valid hostname
  # hostname -f
  # cat /etc/hosts
• Stop iptables
  # /etc/init.d/iptables stop
  # chkconfig iptables off
• Reboot
Repository set up-CE
• Add the repositories specific to the middleware to the system repositories
  # cd /etc/yum.repos.d/
  # mv dag.repo dag.repo.stop
  # export MREPO=http://repo.magrid.ma/yumrepo/glite32
  # REPOS="dag lcg-CA glite-CREAM glite-TORQUE_server glite-TORQUE_utils"
  # for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
Package installation-CE
• Use yum to install the needed packages
  # yum clean all
  # yum install lcg-CA ca-policy-egi-core ca-policy-lcg
• Due to a dependency problem within the Tomcat distribution in SL5, first install xml-commons-apis:
  # yum install xml-commons-apis
• Then install the CE packages:
  # yum install glite-CREAM
  # yum install glite-TORQUE_server glite-TORQUE_utils
Before configuration-Host certificates
• Some preliminary steps before configuration:
• Copy the host certificate and key to the default path:
  # cd
  # mv /root/pcXXcert.pem /etc/grid-security/hostcert.pem
  # mv /root/pcXXkey.pem /etc/grid-security/hostkey.pem
  # chmod 400 /etc/grid-security/hostkey.pem
  # chmod 600 /etc/grid-security/hostcert.pem
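Wrong permissions on the host key are a common cause of failed service start-up, so it is worth verifying them after the copy. The sketch below assumes GNU stat (the one shipped with Scientific Linux); file_mode and key_is_private are hypothetical helper names.

```shell
# Print the octal permission bits of a file (GNU stat syntax).
file_mode() {
    stat -c '%a' "$1"
}

# Succeed only if the private key is readable by its owner alone (mode 400).
key_is_private() {
    [ "$(file_mode "$1")" = "400" ]
}
```

For example: key_is_private /etc/grid-security/hostkey.pem || echo "fix hostkey.pem permissions"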
YAIM configuration-CE
• The main file to edit is site-info.def, where you specify some general settings and the parameters of other components (CE CREAM)
• Other files to be edited are: wn-list.conf, users.conf, groups.conf, services/glite-creamce
• Set the variables to the correct values, replacing the example ones.
  # vi services/glite-creamce
  CEMON_HOST=pcXX.$MY_DOMAIN
  CREAM_DB_USER=eumed
  CREAM_DB_PASSWORD=grid2011
  BLPARSER_HOST=pcXX.$MY_DOMAIN
YAIM configuration-CE
• Declare the worker nodes in wn-list.conf, one host name per line
  # vi wn-list.conf
  pcAA.magrid.ma
  pcBB.magrid.ma
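A typo or a duplicated host in wn-list.conf only shows up later, when a node is missing from the batch system. As a sketch under the assumption that the file holds one fully qualified host name per line, the hypothetical helper check_wn_list reports suspicious lines before YAIM is run.

```shell
# Report problems in a worker-node list file: lines that are not
# host.domain names, and duplicate entries.
# Prints one message per problem and returns non-zero if any were found.
check_wn_list() {
    awk '
        /^[ \t]*$/ { next }    # ignore blank lines
        !/^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/ {
            print "line " NR ": not a fully qualified name: " $0
            bad = 1
            next
        }
        seen[$0]++ { print "line " NR ": duplicate entry: " $0; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

Usage: check_wn_list /opt/glite/yaim/etc/siteinfo/wn-list.conf && echo "wn-list.conf looks consistent"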
YAIM configuration-CE
  CE_HOST=pcXX.magrid.ma
  CE_CPU_MODEL=XEON #cat /proc/cpuinfo
  CE_CPU_VENDOR=Intel
  CE_CPU_SPEED=2230
  CE_OS=ScientificSL
  CE_OS_RELEASE=5.5 #cat /etc/redhat-release
  CE_OS_VERSION="Boron"
  CE_OS_ARCH=x86_64
  CE_MINPHYSMEM=512 #cat /proc/meminfo on WN
  CE_MINVIRTMEM=512
  CE_PHYSCPU=1 #total cpu in site
  CE_LOGCPU=4
  CE_SMPSIZE=4
  CE_OUTBOUNDIP=TRUE
  CE_INBOUNDIP=FALSE
  CE_OTHERDESCR="Cores=4,Benchmark=6.5-HEP-SPEC06"
• See also: http://gkswiki.fzk.de/index.php5/Configuration_of_the_CREAM_CE
YAIM configuration-CE
• How to set CE_SI00, CE_SF00, CE_CAPABILITY, CE_OTHERDESCR?
• Look up the value for your processor at one of these links:
  • http://www.italiangrid.org/grid_operations/site_manager/HEP-SPEC06
  • https://hepix.caspur.it/benchmarks/doku.php?id=bench:results_sl5_x86_64_gcc_412
  • https://hepix.caspur.it/processors/dokuwiki/doku.php?id=benchmarks:results
• For example, if you have an Intel Xeon 5520 at 2.23 GHz without Hyper-Threading, the table in the previous link gives a value of 95, and with a conversion factor of 1 HS06 = 40 (95 x 40 = 3800) you get:
  CE_SI00=3800
  CE_SF00=3800
  CE_CAPABILITY="CPUScalingReferenceSI00=3800"
  CE_OTHERDESCR="Cores=4,Benchmark=23.75-HEP-SPEC06"
• Where (3800/40)/4 = 23.75
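The arithmetic behind these values can be reproduced with a short helper, which is handy when the site has several processor types. This is only a sketch: si00_from_hs06 and hs06_per_core are hypothetical names, and the conversion factor is passed in as a parameter (40 in the slide's example) rather than assumed.

```shell
# CE_SI00 from a HEP-SPEC06 score and a conversion factor.
# usage: si00_from_hs06 HS06 FACTOR
si00_from_hs06() {
    awk -v h="$1" -v f="$2" 'BEGIN { printf "%d\n", h * f }'
}

# Per-core benchmark value for CE_OTHERDESCR: (SI00 / FACTOR) / CORES.
# usage: hs06_per_core SI00 FACTOR CORES
hs06_per_core() {
    awk -v s="$1" -v f="$2" -v c="$3" 'BEGIN { printf "%.2f\n", (s / f) / c }'
}
```

With the slide's numbers, si00_from_hs06 95 40 prints 3800 and hs06_per_core 3800 40 4 prints 23.75.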
YAIM configuration-CE
  BATCH_SERVER=$CE_HOST
  JOB_MANAGER=lcgpbs
  CE_BATCH_SYS=pbs
  BATCH_LOG_DIR=/var/spool/pbs
  APEL_DB_PASSWORD=grid2011
  DGAS_ACCT_DIR=/var/spool/pbs/server_priv/accounting
  VOS="eumed"
  QUEUES="eumed"
  EUMED_GROUP_ENABLE="eumed"
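The last variable on this slide is named after the queue: queue eumed gives EUMED_GROUP_ENABLE. The sketch below derives that name for any queue. group_enable_var is a hypothetical helper, and the rule that "." and "-" in a queue name become "_" is stated here as an assumption about YAIM's naming convention that should be checked against the YAIM variable documentation.

```shell
# Derive the name of the <QUEUE>_GROUP_ENABLE variable from a queue name:
# upper-case it and replace "." and "-" with "_" (assumed YAIM convention).
group_enable_var() {
    printf '%s_GROUP_ENABLE\n' "$(printf '%s' "$1" | tr 'a-z.-' 'A-Z__')"
}
```

For the queue used in this school, group_enable_var eumed prints EUMED_GROUP_ENABLE.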
YAIM configuration-CE
• After editing, launch the configuration command:
  # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -n TORQUE_server -n TORQUE_utils
• Then run the config_cream_blparser function on its own:
  # /opt/glite/yaim/bin/yaim -r -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -f config_cream_blparser
• See also: http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream32
Check the CE
• http://grid.pd.infn.it/cream/field.php?n=Main.CheckYourCREAMCEConfiguration
• Download the script:
  # wget http://grid.pd.infn.it/cream/CheckCreamConf/current/CheckCreamConf.pl
  # chmod +x CheckCreamConf.pl
• Run it:
  # ./CheckCreamConf.pl
• Check the output:
  CheckCreamConf.log
Preparing the Linux machine (WN)
• Network Time Protocol settings
  # yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (for example with WinSCP)
• Synchronize the date
  # /etc/init.d/ntpd stop
  # ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
  # /etc/init.d/ntpd start
  # chkconfig ntpd on
Preparing the Linux machine (WN)
• Disable SELinux: make sure /etc/selinux/config contains the line:
  SELINUX=disabled
• Check that you have a valid hostname
  # hostname -f
  # cat /etc/hosts
• Stop iptables
  # /etc/init.d/iptables stop
  # chkconfig iptables off
• Reboot
Repository set up-WN
• Add the repositories specific to the middleware to the system repositories
  # cd /etc/yum.repos.d/
  # mv dag.repo dag.repo.stop
  # export MREPO=http://repo.magrid.ma/yumrepo/glite32
  # REPOS="dag lcg-CA glite-WN glite-TORQUE_client"
  # for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
Package installation-WN
• Use yum to install the needed packages
  # yum clean all
  # yum install -y lcg-CA ca-policy-egi-core ca-policy-lcg
  # yum groupinstall glite-WN
  # yum install glite-TORQUE_client
WN - YAIM Configuration
• You can use the same configuration files edited on the CE:
  • this can be done on all worker nodes of a site;
  • so you don't need to re-edit anything!
• Copy the configuration files from the CE machine using the scp command:
  # mkdir /opt/glite/yaim/etc/siteinfo/
  # mkdir /opt/glite/yaim/etc/siteinfo/services
  # Copy site-info.def, users.conf, groups.conf and wn-list.conf from the CE, for example:
  # scp root@pcXX:/opt/glite/yaim/etc/siteinfo/site-info.def /opt/glite/yaim/etc/siteinfo/
  # Copy the glite-wn file from examples/services
WN - YAIM Configuration
• Ready to configure now
  # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-WN -n TORQUE_client
• A basic test:
  • Check the status of pbs_mom
    # pbsnodes -a
Tests on CE
• Log in to the CE over SSH to test whether the CE can see the WNs and whether all main services are up and running
  # pbsnodes
  # /etc/init.d/gLite status
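With more than a few worker nodes, the pbsnodes listing becomes hard to scan by eye. The sketch below condenses it to one "node state" pair per line. node_states is a hypothetical helper, and it assumes the usual layout in which node names start in the first column and attributes follow as indented "key = value" lines.

```shell
# Read "pbsnodes -a" style text on standard input and print "node state" pairs.
node_states() {
    awk '
        /^[^ \t]/ { node = $1 }                         # unindented line: a node name
        $1 == "state" && $2 == "=" { print node, $3 }   # the state attribute of that node
    '
}
```

Usage: pbsnodes -a | node_states. Nodes reported as down or offline are the ones to investigate first.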
Tests on CE
• Log in to the CE over SSH and then become a pool account user of the VO:
  # su - eumed001
• Create a file test.sh with the following content:
  $ vi test.sh
  #!/bin/sh
  sleep 20 # (useful to see the job status)
  hostname
• Set the right permissions to make it executable:
  $ chmod 700 test.sh
Tests on CE
• Launch the job locally on the CE
  $ qsub -q eumed test.sh
• Then check the list of jobs in execution on the CE
  $ qstat -a
  ce.localdomain:
                                                              Req'd  Req'd   Elap
  Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  0.pc22.magrid.ma eumed001 short   test.sh      5839  --  --     -- 00:15 R    --
• In case you want more information:
  $ qstat -f 3
• In case you want to abort a job execution:
  $ qdel 3 # 3 is the job ID
Tests on CE
• If the “qstat -a” command returns no output, no jobs are being executed on the CE. This means your previous job has terminated, so you can now list its output.
  $ ls
  test.sh.e3 test.sh.o3
  $ cat test.sh.e3 #error file
  $ cat test.sh.o3 #output file
  wn.localdomain
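The names of the two files follow from the job name and the numeric part of the job ID, as the listing on this slide shows for job 3. A small sketch of that naming rule, with job_output_files as a hypothetical helper:

```shell
# Print the output and error file names for a job:
# <jobname>.o<N> and <jobname>.e<N>, where N is the part of the
# job ID before the first dot (for example 3 in 3.pc22.magrid.ma).
# usage: job_output_files JOBNAME JOBID
job_output_files() {
    seq_no=${2%%.*}
    printf '%s.o%s %s.e%s\n' "$1" "$seq_no" "$1" "$seq_no"
}
```

For example, job_output_files test.sh 3.pc22.magrid.ma prints test.sh.o3 test.sh.e3.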
JDL example
  $ vim hostname-cream.jdl
  Type = "Job";
  JobType = "Normal";
  Executable = "/bin/hostname";
  Arguments = "-f";
  StdOutput = "hostname.out";
  StdError = "hostname.err";
  OutputSandbox = {"hostname.err","hostname.out"};
  OutputSandboxBaseDestUri = "gsiftp://localhost/tmp";
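Text copied from slides or word processors often carries typographic quotes and dashes, which neither the JDL parser nor the shell accepts in place of the plain ASCII characters. As a sketch, the hypothetical helper find_typographic_chars lists the lines of a file that contain such characters so they can be retyped.

```shell
# Print (with line numbers) the lines of a file that contain typographic
# quotes or an en dash. Succeeds if at least one such line is found.
find_typographic_chars() {
    grep -n -F -e '“' -e '”' -e '‘' -e '’' -e '–' "$1"
}
```

Usage: find_typographic_chars hostname-cream.jdl || echo "no typographic characters found"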
Working test
• Log in to the UI over SSH to test whether the CE can receive and execute a simple job
  $ ssh gridXX@ui01.magrid.ma #password: gridXX
• Set up the user certificate
  [root@ui01 ~]# mkdir /home/grid01/.globus
  [root@ui01 ~]# cp /root/user_cert/usercert.pem /home/grid01/.globus/usercert.pem
  [root@ui01 ~]# cp /root/user_cert/userkey.pem /home/grid01/.globus/userkey.pem
  [root@ui01 ~]# chown grid01 /home/grid01/.globus/usercert.pem
  [root@ui01 ~]# chown grid01 /home/grid01/.globus/userkey.pem
  [root@ui01 ~]# chmod 400 /home/grid01/.globus/userkey.pem
  [root@ui01 ~]# su - grid01
• Create a VOMS proxy
  [grid01@ui01 ~]$ voms-proxy-init --voms eumed
  Enter GRID pass phrase: [grid2011]
• Submit the job (-a requests automatic proxy delegation) and check its status
  $ glite-ce-job-submit -a -r pc22.magrid.ma:8443/cream-pbs-eumed -o ID hostname-cream.jdl
  $ glite-ce-job-status -i ID
Troubleshooting
• Which logs should you check if something goes wrong?
  • /var/log/messages, for general errors
  • /opt/glite/var/log (especially glite-ce-cream.log)
  • /var/spool/pbs/server_priv/accounting/<date>, if even local submission on the batch system doesn't work.
References
• INFNGRID generic installation guide:
  http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:install-3_2
• YAIM configuration variables:
  https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables
• CE CREAM installation guide:
  GLITE Cream CE 3.2 SL5 Installation Guide [INFNGRID Release Wiki]
• YAIM system administrator guide:
  https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400
• EUMEDGRID wiki:
  http://wiki.eumedgrid.eu/bin/view
• EuMedGRID sites installation and setup tips:
  http://wiki.eumedgrid.eu/twiki/bin/view/InfrastructureStatus/EumedSiteInstallation
• How To Check And Test Your CREAMCE:
  http://grid.pd.infn.it/cream/field.php?n=Main.HowToCheckAndTestYourCREAMCE
Thank you for your kind attention! Any questions?