DPM Installation Rosanna Catania – Consorzio COMETA Joint EELA-2/EGEE-III tutorial for trainers Catania, 30 June – 4 July 2008
Outline • Overview • Installation • Administration and troubleshooting • References
DPM Overview • "A file is considered to be a Grid file if it is both physically present in a SE and registered in the file catalogue." [gLite 3.1 User Guide, p. 103] • The Storage Element (SE) is the service that allows a user or an application to store data for future retrieval. All data in a SE must be considered read-only and therefore cannot be changed unless physically removed and replaced. Different VOs might enforce different policies for space quota management. • The Disk Pool Manager (DPM) is a lightweight solution for disk storage management that offers the SRM (Storage Resource Manager) interfaces (SRM 2.2 is supported since DPM version 1.6.3).
DPM Overview • Each DPM-type Storage Element (SE) is composed of a head node and at least one disk server; in the simplest setup both run on the same machine. • The DPM head node must have at least one file system in its pool; an arbitrary number of additional disk servers can then be added via YAIM. • The DPM manages the storage on the disk servers through pools: a pool is a group of file systems located on one or more disk servers, and each disk server can contribute multiple file systems to a pool (see the sketch below).
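As a concrete illustration, a pool is created and file systems are attached to it with the dpm-* administration tools. The sketch below is minimal: the pool name matches the YAIM default used later in this tutorial, the hostname and mount points are placeholders, and further dpm-addpool options (see its man page) are omitted:
# Run on the head node as root; names and paths are illustrative
dpm-addpool --poolname Permanent
dpm-addfs --poolname Permanent --server disk01.example.org --fs /data01
The resulting layout can be inspected at any time with dpm-qryconf, as shown later in this tutorial.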
DPM Overview • Usually the DPM head node hosts: • SRM server (srmv1 and/or srmv2): receives the SRM requests and passes them to the DPM server; • DPM server: keeps track of all the requests; • DPM name server (DPNS): handles the namespace for all the files under DPM control; • DPM RFIO server: handles the transfers for the RFIO protocol; • DPM GridFTP server: handles the transfers for the GridFTP protocol.
DPM Overview • The Storage Resource Manager (SRM) has been designed to be the single interface (through the corresponding SRM protocol) for the management of disk and tape storage resources. Any type of Storage Element in WLCG/EGEE offers an SRM interface except for the Classic SE, which is being phased out. SRM hides the complexity of the resource setup behind it and allows the user to request files, keep them on a disk buffer for a specified lifetime (SRM 2.2 only), reserve space for new entries, and so on. SRM also offers third-party transfer between different endpoints, although not all SE implementations support it. It is important to note that SRM is a storage management protocol, not a file access protocol.
DPM strengths • Easy to install/configure • Few configuration files • Manageable storage • Logical Namespace • Easy to add/remove file systems • Low maintenance effort • Supports as many disk servers as needed • Low memory footprint • Low CPU utilization
What kind of machines? • 2 GHz processor with 512 MB of memory (not a hard requirement) • Dual power supply • Mirrored system disk • Database backups
Before installing • For each VO, what is the expected load? • Does the DPM need to be installed on a separate machine? • How many disk servers do I need? • Disk servers can easily be added or removed later • Which file system type? • At my site, can I open ports: • 5010 (Name Server) • 5015 (DPM server) • 8443 (srmv1) • 8444 (srmv2) • 8446 (srmv2.2) • 5001 (rfio) • 20000-25000 (rfio data ports) • 2811 (DPM GridFTP control port) • 20000-25000 (DPM GridFTP data ports)
Firewall Configuration • The following ports have to be open: • DPM server: port 5015/tcp must be open at least locally at your site (incoming access is also fine), • DPNS server: port 5010/tcp must be open at least locally at your site (incoming access is also fine), • SRM servers: ports 8443/tcp (SRMv1) and 8444/tcp (SRMv2) must be open to the outside world (incoming access), • RFIO server: port 5001/tcp must be open to the outside world (incoming access) if your site wants to allow direct RFIO access from outside, • GridFTP server: control port 2811/tcp and data ports 20000-25000/tcp (or any range specified by GLOBUS_TCP_PORT_RANGE) must be open to the outside world (incoming access).
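For reference, a minimal iptables sketch that opens these ports on an SL4-style host follows; it assumes the stock iptables service and inserts plain ACCEPT rules into the INPUT chain, so adapt it to whatever chains and policies your site actually uses:
# Hedged sketch: open the DPM-related ports (run as root)
for p in 5010 5015 8443 8444 8446 5001 2811; do
  iptables -I INPUT -p tcp --dport $p -j ACCEPT
done
# RFIO and GridFTP data port range
iptables -I INPUT -p tcp --dport 20000:25000 -j ACCEPT
# Persist across reboots (Red Hat style)
service iptables save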
Outline • Overview • Installation • Administration and troubleshooting • References
Operating System Installation • Install SL4 using an SL4.X repository (CERN mirror), choosing the following rpm groups: • X Window System • Editors • X Software Development • Text-based Internet • Server Configuration Tools • Development Tools • Administration Tools • System Tools • Legacy Software Development • For 64-bit machines, you also have to select the following groups (not tested): • Compatibility Arch Support • Compatibility Arch Development Support
Installation Pre-requisites • Start from a machine with Scientific Linux CERN 4.X i386 installed. • Prepare the file systems (e.g. /data, not /dpm!). All the file systems must have the following ownership and permissions: ls -ld /data01 drwxrwx--- 3 dpmmgr dpmmgr 4096 Jun 9 12:14 data01 • Time synchronization among all gLite nodes is mandatory. It can be achieved with the NTP protocol and a time server. • Install ntp if not already available on your system: yum install ntp • Add your time server in /etc/ntp.conf: restrict <time_server_IP_address> mask 255.255.255.255 nomodify notrap noquery server <time_server_IP> (you can use the NTP server ntp-1.infn.it) • Edit /etc/ntp/step-tickers adding your time server(s) hostname(s) • Activate the ntpd service with the following commands: ntpdate <your ntp server name> service ntpd start chkconfig ntpd on
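Once ntpd is running, a quick sanity check (standard ntp tooling, nothing DPM-specific) confirms the node is actually syncing:
# The selected peer is marked with '*'; the offset column should be small (ms)
ntpq -p
date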
Repository settings • For ig_SE_dpm_disk: REPOS="ca dag glite-se_dpm_disk ig jpackage gilda" • For ig_SE_dpm_mysql: REPOS="ca dag glite-se_dpm ig jpackage gilda" • For both on the same machine: REPOS="ca dag glite-se_dpm glite-se_dpm_disk ig jpackage gilda" • Then fetch the repository files and update: for name in $REPOS; do wget http://grid018.ct.infn.it/mrepo/repos/$name.repo -O /etc/yum.repos.d/$name.repo; done yum clean all yum update
Installation Pre-requisites • Install JDK 1.5.0 before installing the metapackage: • yum install jdk java-1.5.0-sun-compat • or, alternatively: • rpm -ihv http://grid-it.cnaf.infn.it/mrepo/ig_sl4-i386/RPMS.3_1_0_externals/jdk-1.5.0_14-fcs.i586.rpm • rpm -ihv http://grid-it.cnaf.infn.it/mrepo/ig_sl4-i386/RPMS.3_1_0_externals/java-1.5.0-sun-compat-1.5.0.14-1.sl4.jpp.noarch.rpm
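To confirm the JDK landed where YAIM will later look for it (the path below matches the JAVA_LOCATION value set in the site configuration later in this tutorial):
/usr/java/jdk1.5.0_14/bin/java -version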
Installation • We are now ready to install a DPM server and a disk server on the same machine; this command will download and install all the needed packages: • yum install ig_SE_dpm_mysql ig_SE_dpm_disk • Install all Certificate Authorities: • yum install lcg-CA • If you plan to use certificates released by CAs not supported by EGEE, be sure that their public key, signing policy and CRLs (usually distributed as an rpm) are installed in /etc/grid-security/certificates. Install ca_GILDA and gilda-vomscerts: • yum install gilda_utils
Installation • If the metapackage installation reports some missing dependencies, this is probably due to the protection normally set on the OS repositories: the metapackage requires a higher version of a package than the one present in the OS repository, usually provided by the DAG repository. • perl-XML-NamespaceSupport 100% |=========================| 2.1 kB 00:00 • ---> Package perl-XML-NamespaceSupport.noarch 0:1.08-6 set to be updated • --> Running transaction check • --> Processing Dependency: perl-SOAP-Lite >= 0.67 for package: gridview-wsclient-common • --> Finished Dependency Resolution • Error: Missing Dependency: perl-SOAP-Lite >= 0.67 is needed by package gridview-wsclient-common • In that case, fetch and install the newer package by hand: • wget http://linuxsoft.cern.ch/dag/redhat/el4/en/i386/RPMS.dag/perl-SOAP-Lite-0.69-1.el4.rf.noarch.rpm • yum localinstall perl-SOAP-Lite-0.69-1.el4.rf.noarch.rpm
Security • Check that hostname -f returns the fully qualified host name. • Install the host certificate: • Download your certificates in /etc/grid-security: • mv hostxx-cert.pem /etc/grid-security/hostcert.pem • mv hostxx-key.pem /etc/grid-security/hostkey.pem • and set proper permissions: • chmod 644 /etc/grid-security/hostcert.pem • chmod 400 /etc/grid-security/hostkey.pem http://security.fi.infn.it/CA/docs
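Before going further, it is worth verifying that the certificate is valid and matches the key; the openssl one-liners below are a generic check, not DPM-specific:
# Subject and validity dates
openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject -dates
# The two modulus hashes must be identical for a matching cert/key pair
openssl x509 -in /etc/grid-security/hostcert.pem -noout -modulus | md5sum
openssl rsa -in /etc/grid-security/hostkey.pem -noout -modulus | md5sum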
Site Configuration File (1/4) • All the site-specific configuration values have to be set in a site configuration file using key-value pairs. • This file is shared among all the different gLite node types, so edit it once and keep it in a safe place. • Copy the /opt/glite/yaim/examples/siteinfo/ig-site-info.def template (coming from the lcg-yaim RPM) to your reference directory for the installation (e.g. /root): cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/gilda-site-info.def • The general syntax of the file is a sequence of bash-like variable assignments (<variable>=<value>, no spaces allowed around =). • A good syntax test for your site configuration file is to source it manually: source my-site-info.def
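A slightly more thorough variant of that test, as a minimal sketch assuming bash and the variable names introduced on the next slides:
# Abort on syntax errors, then spot-check a few key variables
bash -n my-site-info.def && source my-site-info.def
echo "DPM_HOST=$DPM_HOST VOS=$VOS SE_LIST=$SE_LIST"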
Site Configuration File (2/4) • Set the following variables: MY_DOMAIN=trigrid.it JAVA_LOCATION="/usr/java/jdk1.5.0_14" DPM_HOST=hostxx.$MY_DOMAIN DPMPOOL=Permanent # (or Volatile); the DPM can handle two different kinds of file systems: • volatile: the files contained in a volatile file system can be removed by the system at any time, unless they are pinned by a user. • permanent: the files contained in a permanent file system cannot be removed by the system.
Site Configuration File (3/4) • Set the following variables: DPM_FILESYSTEMS="$DPM_HOST:/data" DPM_DB_USER=dpmmgr DPM_DB_PASSWORD=dpmmgr_password DPM_DB_HOST=$DPM_HOST DPMFSIZE=200 MYSQL_PASSWORD=your_DB_root_passwd VOS="gilda" SE_LIST="$DPM_HOST" SE_ARCH="multidisk" ALL_VOMS_VOS="gilda" RFIO_PORT_RANGE="20000 25000"
Site Configuration File (4/4) • Check: • Copy the users and groups example files to /opt/glite/yaim/etc/gilda/: • cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/ • cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/ • Append the gilda and geclipsetutor users and groups definitions: • cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf • cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf • Define the new paths of your USERS_CONF and GROUPS_CONF files in /opt/glite/yaim/etc/gilda/<your_site-info.def>: • GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf • USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
gLite Middleware Configuration • Now we can configure the node: • /opt/glite/yaim/bin/ig_yaim -c -s site-info.def -n ig_SE_dpm_mysql -n ig_SE_dpm_disk • After configuration, remember to manually run the script /etc/cron.monthly/create-default-dirs-DPM.sh, as suggested by the yaim log. This script creates and sets the correct permissions on the VO storage directories; it will then be run monthly via cron.
Outline • Overview • Installation • Administration and troubleshooting • References
Adding a Disk Server (1/2) • On the new disk server, repeat slides 14-23, then edit the site-info.def and add your new file system: • DPM_FILESYSTEMS="disk_server02.ct.infn.it:/storage02" • # yum install ig_SE_dpm_disk # /opt/glite/yaim/bin/ig_yaim -c -s site-info.def -n ig_SE_dpm_disk • On the head node: # dpm-addfs --poolname Permanent --server Disk_Server_Hostname --fs /storage02
Adding a Disk Server (2/2) [root@wm-user-25 root]# dpm-qryconf POOL testpool DEFSIZE 200.00M GC_START_THRESH 0 GC_STOP_THRESH 0 DEF_LIFETIME 7.0d DEFPINTIME 2.0h MAX_LIFETIME 1.0m MAXPINTIME 12.0h FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GIDS 0 S_TYPE - MIG_POLICY none RET_POLICY R CAPACITY 9.82G FREE 2.59G ( 26.4%) wm-user-25.gs.ba.infn.it /data CAPACITY 4.91G FREE 1.23G ( 25.0%) wm-user-24.gs.ba.infn.it /data01 CAPACITY 4.91G FREE 1.36G ( 27.7%) [root@wm-user-25 root]#
Load balancing • DPM automatically round-robins between the file systems of a pool. • Example: • disk01: one 1 TB file system • disk02: very fast, 5 TB • Solution 1: one file system per disk server • A file will be stored on either disk with equal probability, as long as space is left • Solution 2: one file system on disk01, two file systems on disk02 (as sketched below) • A file will end up on disk02 more often (two of every three), which is what you want
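Solution 2 translates into dpm-addfs calls like the following; the hostnames and mount points are illustrative placeholders:
# One file system on the slower server, two on the faster one
dpm-addfs --poolname Permanent --server disk01.example.org --fs /fs01
dpm-addfs --poolname Permanent --server disk02.example.org --fs /fs01
dpm-addfs --poolname Permanent --server disk02.example.org --fs /fs02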
Restrict a pool to one or several VOs/groups By default, a pool is generic: users from all VOs/groups will be able to write in it. • But it is possible to restrict a pool to one or several VOs/groups. See the dpm-addpool and dpm-modifypool man pages. • For instance: • * Possibility to dedicate a pool to several groups $ dpm-addpool --poolname poolA --group alice,cms,lhcb $ dpm-addpool --poolname poolB --group atlas • * Add groups to existing list $ dpm-modifypool --poolname poolB --group +dteam • * Remove groups from existing list $ dpm-modifypool --poolname poolA --group -cms • * Reset list to new set of groups (= sign optional for backward compatibility) $ dpm-modifypool --poolname poolA --group =dteam • * Add group and remove another one $ dpm-modifypool --poolname poolA --group +dteam,-lhcb
Obtained Configuration (1) • RFIO and GridFTP parent processes run as root • Dedicated user/group • DPM, DPNS, SRM daemons run as dpmmgr • Several directories/files belong to dpmmgr • Host certificate and key:
> ll /etc/grid-security/ | grep pem
-rw-r--r-- 1 root root 5430 May 28 22:02 hostcert.pem
-r-------- 1 root root 1675 May 28 22:02 hostkey.pem
> ll /etc/grid-security/dpmmgr/ | grep pem
-rw-r--r-- 1 dpmmgr dpmmgr 5430 May 28 22:02 dpmcert.pem
-r-------- 1 dpmmgr dpmmgr 1675 May 28 22:02 dpmkey.pem
Obtained Configuration (2) • Database connect strings: • /opt/lcg/etc/NSCONFIG • /opt/lcg/etc/DPMCONFIG • Format: <username>/<password>@<mysql_server> • Daemons: • service <service_name> {start|stop|status} • Important: services are not restarted by an RPM upgrade!
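For illustration only, with the database values set earlier in site-info.def, /opt/lcg/etc/DPMCONFIG would contain a single line of the form below (the exact layout may differ between DPM versions):
dpmmgr/dpmmgr_password@hostxx.trigrid.it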
Obtained Configuration (3): Virtual Ids • Each user and each group is internally mapped to a "virtual Id". • The mappings are stored in: • the Cns_userinfo table, for the users • the Cns_groupinfo table, for the groups • mysql> use cns_db; mysql> select * from Cns_groupinfo; +-------+-----+-----------+ | rowid | gid | groupname | +-------+-----+-----------+ | 1 | 101 | dteam | | 2 | 102 | atlas | | 3 | 103 | cms | | 4 | 104 | babar | | 5 | 105 | infngrid | +-------+-----+-----------+ • mysql> select * from Cns_userinfo; +-------+--------+-------------------------------------------------------+ | rowid | userid | username | +-------+--------+-------------------------------------------------------+ | 1 | 101 | /C=CH/O=CERN/OU=GRID/CN=Sophie Lemaitre 2268 | | 2 | 102 | /C=CH/O=CERN/OU=GRID/CN=Sophie Lemaitre 2268 - geant4 | | 3 | 103 | /C=CH/O=CERN/OU=GRID/CN=Jean-Philippe Baud 7183 | +-------+--------+-------------------------------------------------------+ The user and group ids are completely independent from the UNIX uids/gids.
Testing a DPM (1/7) • Try to query the DPM: [root@infn-se-01 root]# dpm-qryconf POOL Permanent DEFSIZE 200.00M GC_START_THRESH 0 GC_STOP_THRESH 0 DEF_LIFETIME 7.0d DEFPINTIME 2.0h MAX_LIFETIME 1.0m MAXPINTIME 12.0h FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GID 0 S_TYPE - MIG_POLICY none RET_POLICY R CAPACITY 21.81T FREE 21.81T (100.0%) infn-se-01.ct.pi2s2.it /gpfs CAPACITY 21.81T FREE 21.81T (100.0%) [root@infn-se-01 root]#
Testing a DPM (2/7) • Browse the DPNS: [root@infn-se-01 root]# dpns-ls -l / drwxrwxr-x 1 root root 0 Jun 12 20:17 dpm [root@infn-se-01 root]# dpns-ls -l /dpm drwxrwxr-x 1 root root 0 Jun 12 20:17 ct.pi2s2.it [root@infn-se-01 root]# dpns-ls -l /dpm/ct.pi2s2.it drwxrwxr-x 4 root root 0 Jun 12 20:17 home [root@infn-se-01 root]# dpns-ls -l /dpm/ct.pi2s2.it/home drwxrwxr-x 0 root 104 0 Jun 12 20:17 alice drwxrwxr-x 1 root 102 0 Jun 13 23:11 cometa drwxrwxr-x 0 root 105 0 Jun 12 20:17 infngrid [root@infn-se-01 root]#
Testing a DPM (3/7) • Try the previous two tests from a UI, after you have initialized a valid proxy and exported the following variables: [rosanna@infn-ui-01 root]# export DPM_HOST=your_dpm [rosanna@infn-ui-01 root]# export DPNS_HOST=your_dpns
Testing a DPM (4/7) • Try a globus-url-copy: [rosanna@infn-ui-01 rosanna]$ globus-url-copy file://$PWD/hostname.jdl gsiftp://infn-se-01.ct.pi2s2.it/tmp/myfile [rosanna@infn-ui-01 rosanna]$ [rosanna@infn-ui-01 rosanna]$ globus-url-copy gsiftp://infn-se-01.ct.pi2s2.it/tmp/myfile file://$PWD/hostname.jdl [rosanna@infn-ui-01 rosanna]$ [rosanna@infn-ui-01 rosanna]$ edg-gridftp-ls gsiftp://infn-se-01.ct.pi2s2.it/dpm [rosanna@infn-ui-01 rosanna]$ [rosanna@infn-ui-01 rosanna]$ dpns-ls -l /dpm/ct.pi2s2.it/home/cometa
Testing a DPM (5/7) • lcg_utils (from a UI) • If the DPM is not in the site BDII yet: • export LCG_GFAL_INFOSYS=hostxx.trigrid.it:2170 • lcg-cr -v --vo infngrid -d hostxx.trigrid.it file:/dir/file • Otherwise: • export LCG_GFAL_INFOSYS=hostxx.trigrid.it:2170 • lcg-infosites --vo gilda se | grep <your_SE> • lcg-cr -v --vo dteam -d dpm01.cern.ch file:/path/to/file • lcg-cp --vo gilda guid:<your_guid> file:/dir/file • rfio (from a UI) • export LCG_RFIO_TYPE=dpm • export DPNS_HOST=dpm01.cern.ch • export DPM_HOST=dpm01.cern.ch • rfdir /dpm/cern.ch/home/myVO • rfcp /dpm/cern.ch/home/myVO/myfile /tmp/myfile
Testing a DPM (6/7) • Try to create a replica: [rosanna@infn-ui-01 rosanna]$ lfc-mkdir /grid/cometa/test [rosanna@infn-ui-01 rosanna]$ lfc-ls /grid/cometa test [...] [rosanna@infn-ui-01 rosanna]$ lcg-cr --vo cometa file:/home/rosanna/hostname.jdl -l lfn:/grid/cometa/test05.txt -d infn-se-01.ct.pi2s2.it guid:99289f77-6d3b-4ef2-8e18-537e9dc7cccf [rosanna@infn-ui-01 rosanna]$ lcg-cp --vo cometa lfn:/grid/cometa/test05.txt file:$PWD/test05.rep.txt [rosanna@infn-ui-01 rosanna]$
Testing a DPM (7/7) • From a UI: [rosanna@infn-ui-01 rosanna]$ lcg-infosites --vo cometa se Avail Space(Kb) Used Space(Kb) Type SEs ---------------------------------------------------------- 7720000000 n.a n.a inaf-se-01.ct.pi2s2.it 21810000000 n.a n.a infn-se-01.ct.pi2s2.it 4090000000 n.a n.a unime-se-01.me.pi2s2.it 21810000000 n.a n.a infn-se-01.ct.pi2s2.it 21810000000 n.a n.a infn-se-01.ct.pi2s2.it 14540000000 n.a n.a unipa-se-01.pa.pi2s2.it [rosanna@infn-ui-01 rosanna]$ [rosanna@infn-ui-01 rosanna]$ ldapsearch -x -H ldap://infn-ce-01.ct.pi2s2.it:2170 -b mds-vo-name=resource,o=grid | grep AvailableSpace GlueSAStateAvailableSpace: 21810000000 (the used space is published as GlueSAStateUsedSpace)
Log Files • Logs to check: • /var/log/messages • /var/log/fetch-crl-cron.log • /var/log/edg-mkgridmap.log • /var/log/lcgdm-mkgridmap.log
Log Files • DPM server • /var/log/dpm/log • DPM Name Server • /var/log/dpns/log • SRM servers • /var/log/srmv1/log • /var/log/srmv2/log • /var/log/srmv2.2/log • RFIO server • /var/log/rfiod/log • DPM-enabled GridFTP • /var/log/dpm-gsiftp/gridftp.log • /var/log/dpm-gsiftp/dpm-gsiftp.log
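When something misbehaves, a quick first pass over the daemon logs listed above can be done with a loop like this:
# Scan the tail of each daemon log for recent errors
for f in /var/log/dpm/log /var/log/dpns/log /var/log/srmv1/log \
         /var/log/srmv2/log /var/log/srmv2.2/log /var/log/rfiod/log; do
  echo "== $f =="
  tail -n 200 "$f" | grep -i error
done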
Checking • Check and eventually fix the ownership and permissions of: # ls -ld /etc/grid-security/gridmapdir drwxrwxr-x 2 root dpmmgr 12288 Jun 1 14:25 /etc/grid-security/gridmapdir • Also check the permissions of all the file systems on each disk server (they must match the prerequisites): # ls -ld /data01 drwxrwx--- 3 dpmmgr dpmmgr 4096 Jun 9 12:14 data01
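If either listing differs, the fix-up below restores the ownership and modes shown above (path names are the ones used in this tutorial):
# gridmapdir: owned by root, group dpmmgr, group-writable
chown root:dpmmgr /etc/grid-security/gridmapdir
chmod 775 /etc/grid-security/gridmapdir
# each DPM file system: owned by dpmmgr, mode 770
chown dpmmgr:dpmmgr /data01
chmod 770 /data01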
Checking • On the disk server: [root@aliserv1 root]# df -Th Filesystem Type Size Used Avail Use% Mounted on /dev/sda1 ext3 39G 3.2G 34G 9% / /dev/sda3 ext3 25G 20G 3.8G 84% /data none tmpfs 1.8G 0 1.8G 0% /dev/shm /dev/gpfs0 gpfs 28T 2.3T 26T 9% /gpfsprod [root@aliserv1 root]#
Services and their starting order • On the DPNS server machine:service dpnsdaemon start • On each disk server managed by the DPM :service rfiod start • On the DPM and SRM server machine(s) :service dpm start service srmv1 start service srmv2 start service srmv2.2 start • On each disk server managed by the DPM :service dpm-gsiftp start
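For the all-in-one setup used in this tutorial (head node and disk server on the same machine), the loop below starts everything in that same order; it is only a convenience sketch:
# Head node + disk server on one machine; order matters
for s in dpnsdaemon rfiod dpm srmv1 srmv2 srmv2.2 dpm-gsiftp; do
  service $s start
done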
Outline • Overview • Installation • Administration and troubleshooting • References
Other problems? • gLite 3.1 User Guide • INFN Grid 3.1 installation guide: http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:install-3_1 • GILDA gLite 3.1 Wiki: https://grid.ct.infn.it/twiki/bin/view/GILDA/GliteElementsInstallation • Main DPM documentation page: https://twiki.cern.ch/twiki/bin/view/LCG/DataManagementTop • DPM Admin Guide: https://twiki.cern.ch/twiki/bin/view/LCG/DpmAdminGuide • LFC & DPM Troubleshooting: https://twiki.cern.ch/twiki/bin/view/LCG/LfcTroubleshooting