WORKER NODE



  1. WORKER NODE GIUSEPPE PLATANIA INFN Catania 30 June - 4 July, 2008

  2. OUTLINE
     OVERVIEW
     INSTALLATION & CONFIGURATION
     TESTING
     FIREWALL SETUP
     TROUBLESHOOTING

  3. OVERVIEW The Worker Node is the service where the jobs run. Its main functions are: execute the jobs; report the status of the jobs back to the Computing Element. It can run several kinds of batch system client: Torque, LSF, SGE, Condor.

  4. TORQUE client The Torque client is composed of pbs_mom, which places the job into execution and is also responsible for returning the job’s output to the user.

  5. Worker Node installation & configuration using YAIM

  6. WHAT KIND OF WN? There are several kinds of metapackages to install:
     ig_WN: "generic" Worker Node.
     ig_WN_noafs: like ig_WN but without AFS.
     ig_WN_LSF: LSF Worker Node. IMPORTANT: provided for consistency only; it does not install the LSF software but it applies some fixes via ig_configure_node.
     ig_WN_LSF_noafs: like ig_WN_LSF but without AFS.
     ig_WN_torque: Torque Worker Node.
     ig_WN_torque_noafs: like ig_WN_torque but without AFS.

  7. Repository settings
     REPOS="ca dag ig jpackage gilda glite-wn_torque"
     (note: no .repo suffix here, since the loop below appends it)
     Download and store the repo files:
     for name in $REPOS; do
       wget http://grid018.ct.infn.it/mrepo/repos/$name.repo \
         -O /etc/yum.repos.d/$name.repo
     done

  8. INSTALLATION
     yum install jdk java-1.5.0-sun-compat
     yum install lcg-CA
     yum install ig_WN_torque_noafs
     In case you want AFS installed:
     yum install openafs openafs-client kernel-module-openafs-`uname -r`
     yum install ig_WN_torque
     Gilda rpms:
     yum install gilda_utils gilda_applications

  9. Copy the users and groups example files to /opt/glite/yaim/etc/gilda/:
     cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
     cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/
     Append the gilda users and groups definitions:
     cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf
     cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf
     Customize ig-site-info.def
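The append step above can be tried out safely before touching the real files. A minimal sketch, using /tmp stand-ins for the /opt/glite/yaim/etc/gilda/ paths and made-up one-line entries (the real files use YAIM's colon-separated format):

```shell
# Stand-in directory and files; the real paths live under /opt/glite/yaim/etc/gilda/
mkdir -p /tmp/yaim-demo
printf 'base_user:1001:basegrp:1001::\n' > /tmp/yaim-demo/ig-users.conf
printf 'gilda001:2001:gilda:2001:gilda:\n' > /tmp/yaim-demo/gilda_ig-users.conf

# The key point: site definitions are APPENDED (>>), not copied over,
# so the example entries already in ig-users.conf are preserved.
cat /tmp/yaim-demo/gilda_ig-users.conf >> /tmp/yaim-demo/ig-users.conf
wc -l < /tmp/yaim-demo/ig-users.conf
```

After the append the file holds both the original and the gilda entries; running the cat twice would duplicate them, so append only once.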

  10. Copy the ig-site-info.def template file provided by ig_yaim into the gilda dir and customize it:
     cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/<your_site-info.def>
     Open the /opt/glite/yaim/etc/gilda/<your_site-info.def> file with a text editor and set the following values according to your grid environment:
     CE_HOST=<the hostname of the CE you are installing>
     TORQUE_SERVER=$CE_HOST
     Customize ig-site-info.def

  11. WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf
     The file specified in WN_LIST has to contain the hostnames of all your WNs.
     WARNING: it is important to set this file up before running the configure command.
     Customize ig-site-info.def
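A minimal sketch of building the file named by WN_LIST, one fully qualified WN hostname per line. The /tmp path and the wnN.example.infn.it hostnames are hypothetical; on a real site point WN_LIST at the gilda dir and use your actual WN hostnames:

```shell
# Build the WN list file: one FQDN per line, nothing else.
WN_LIST=/tmp/wn-list.conf   # stand-in for /opt/glite/yaim/etc/gilda/wn-list.conf
: > "$WN_LIST"              # truncate/create the file
for i in 1 2 3; do
  echo "wn$i.example.infn.it" >> "$WN_LIST"
done
cat "$WN_LIST"
```

Remember to regenerate this file and re-run the configuration whenever WNs are added or removed.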

  12. GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
     USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
     JAVA_LOCATION="/usr/java/j2sdk1.4.2_12"
     JOB_MANAGER=lcgpbs
     BATCH_BIN_DIR=/usr/bin
     BATCH_VERSION=torque-2.1.9-4
     VOS="gilda"
     ALL_VOMS="gilda"
     Customize ig-site-info.def

  13. QUEUES="short long infinite"
     SHORT_GROUP_ENABLE=$VOS
     LONG_GROUP_ENABLE=$VOS
     INFINITE_GROUP_ENABLE=$VOS
     To configure a dedicated queue for a single VO:
     QUEUES="short long infinite gilda"
     SHORT_GROUP_ENABLE=$VOS
     LONG_GROUP_ENABLE=$VOS
     INFINITE_GROUP_ENABLE=$VOS
     GILDA_GROUP_ENABLE="gilda"
     Customize ig-site-info.def
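Each queue in QUEUES needs a matching <QUEUE>_GROUP_ENABLE variable (queue name uppercased). A sketch of a sanity check for that naming convention; the values mirror the single-VO example above, and the check itself is an assumption of this tutorial, not a YAIM tool:

```shell
# Mirror the single-VO example configuration.
QUEUES="short long infinite gilda"
SHORT_GROUP_ENABLE="gilda"
LONG_GROUP_ENABLE="gilda"
INFINITE_GROUP_ENABLE="gilda"
GILDA_GROUP_ENABLE="gilda"

# For each queue, verify that <QUEUE>_GROUP_ENABLE is set and non-empty.
missing=0
for q in $QUEUES; do
  var="$(echo "$q" | tr '[:lower:]' '[:upper:]')_GROUP_ENABLE"
  eval "val=\$$var"
  if [ -z "$val" ]; then
    echo "MISSING $var"
    missing=$((missing+1))
  fi
done
echo "queues without a GROUP_ENABLE setting: $missing"
```

A queue reported as MISSING here would be configured without any enabled VO groups.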

  14. WN Torque CONFIGURATION Now we can configure the node:
     /opt/glite/yaim/bin/ig_yaim -c -s /opt/glite/yaim/etc/gilda/<your_site-info.def> -n ig_WN_torque_noafs

  15. Worker Node testing

  16. Verify that pbs_mom is active and that the node status is free:
     [root@wn root]# /etc/init.d/pbs_mom status
     pbs_mom (pid 3692) is running...
     [root@wn root]# pbsnodes -a
     wn.localdomain
       state = free
       np = 2
       properties = lcgpro
       ntype = cluster
       status = arch=linux,uname=Linux wn.localdomain 2.4.21-37.EL.cern 1 Tue Oct 4 16:45:05 CEST 2005 i686,sessions=5892 5910 563 1703 2649,3584,nsessions=6,nusers=1,idletime=1569,totmem=254024kb,availmem=69852kb,physmem=254024kb,ncpus=1,loadave=0.30,rectime=1159016111
     Testing
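The "state" field is the part worth checking in scripts. A sketch of extracting it from pbsnodes -a style output; the output is canned in a variable so the parsing can be tried without a live Torque server (on a real WN, pipe `pbsnodes -a` in instead):

```shell
# Canned pbsnodes -a output for one node (real runs: pbs_out=$(pbsnodes -a)).
pbs_out='wn.localdomain
     state = free
     np = 2
     properties = lcgpro'

# Split on " = " and print the value of the line containing "state".
state=$(printf '%s\n' "$pbs_out" | awk -F' = ' '/state/ {print $2}')
echo "node state: $state"
```

A state of "free" means the node is accepting jobs; "down" points at the pbs_mom restart shown in the troubleshooting section.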

  17. First of all, check that a generic user on the WN can ssh to the CE without typing a password:
     [root@wn root]# su - gilda001
     [gilda001@wn gilda001] ssh ce
     [gilda001@ce gilda001]
     The same test has to pass between the WNs in order to run MPI jobs:
     [gilda001@wn gilda001] ssh wn1
     [gilda001@wn1 gilda001]
     Testing
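The manual checks above can be automated. A sketch, assuming a hypothetical host list; BatchMode=yes makes ssh fail instead of prompting, so a password request shows up as a failure (run this as a pool user such as gilda001):

```shell
# Hosts to check from this WN: the CE and the other WNs (hypothetical names).
HOSTS="ce wn1"
fail=0
for h in $HOSTS; do
  # BatchMode=yes: never prompt for a password; ConnectTimeout bounds hangs.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
    echo "$h: passwordless ssh OK"
  else
    echo "$h: passwordless ssh FAILED"
    fail=$((fail+1))
  fi
done
echo "hosts failing the check: $fail"
```

Any host reported FAILED needs the shosts.equiv / known_hosts fix described in the troubleshooting section.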

  18. FIREWALL setup

  19. /etc/sysconfig/iptables
     *filter
     :INPUT ACCEPT [0:0]
     :FORWARD ACCEPT [0:0]
     :OUTPUT ACCEPT [0:0]
     :RH-Firewall-1-INPUT - [0:0]
     -A INPUT -j RH-Firewall-1-INPUT
     -A FORWARD -j RH-Firewall-1-INPUT
     -A RH-Firewall-1-INPUT -i lo -j ACCEPT
     -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
     -A RH-Firewall-1-INPUT -p tcp -s <ip_you_want> --dport 22 -j ACCEPT
     -A RH-Firewall-1-INPUT -p all -s <your CE ip address> -j ACCEPT
     -A RH-Firewall-1-INPUT -p all -s <your WN ip address> -j ACCEPT
     -A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
     -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
     COMMIT

  20. IPTABLES STARTUP
     /sbin/chkconfig iptables on
     /etc/init.d/iptables start

  21. Troubleshooting

  22. [root@wn root]# su - gilda001
     [gilda001@wn gilda001] ssh ce
     gilda001@ce’s password:
     Probably this WN’s hostname is not in /etc/ssh/shosts.equiv, or its ssh keys were not created and stored in /etc/ssh/ssh_known_hosts on the CE.
     Solution (to run on the CE): ensure that the WN is in the pbs node list:
     [root@ce root]# pbsnodes -a
     And then:
     [root@ce root]# /opt/edg/sbin/edg-pbs-shostsequiv
     [root@ce root]# /opt/edg/sbin/edg-pbs-known-hosts
     Troubleshooting
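The first half of that diagnosis can be sketched as a quick check: is the WN's hostname listed in shosts.equiv? Here a /tmp copy stands in for /etc/ssh/shosts.equiv on the CE, and all hostnames are made up:

```shell
# Stand-in for /etc/ssh/shosts.equiv on the CE, with two known WNs.
SHOSTS=/tmp/shosts.equiv
printf 'wn.localdomain\nwn1.localdomain\n' > "$SHOSTS"

# A WN that was added to the pool but not yet to shosts.equiv.
WN=wn2.localdomain
if grep -qx "$WN" "$SHOSTS"; then
  verdict="present"
else
  verdict="missing: re-run /opt/edg/sbin/edg-pbs-shostsequiv on the CE"
fi
echo "$WN $verdict"
```

grep -x matches the whole line, so a hostname that is merely a prefix of another entry is not mistaken for present.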

  23. [root@wn root]# pbsnodes -a
     wn.localdomain
       state = down
       np = 2
       properties = lcgpro
       ntype = cluster
     Solution:
     [root@wn root]# /etc/init.d/pbs_mom restart
     Troubleshooting
