
How to get started London Tier2




  1. How to get started: London Tier2. O. van der Aa

  2. UK HEP Grid: GridPP, One T1, Four T2 • ScotGrid: Durham, Edinburgh, Glasgow • NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield • SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD • London Tier2: Brunel, Imperial, QMUL, RHUL, UCL Running the LT2

  3. Imperial College • Spread across two sites • Physics Department • 465 KSI2K (dual-core Intel Woodcrest) running SGE 6 • 60 TB running dCache • Computing Department • 177 KSI2K (Opterons) running SGE 6 • Uses the Physics Department storage • All running RHEL4 and RHEL3 and using the LCG tarball • Local physicists: CMS/LHCb/DZero

  4. 324 KSI2K across two clusters, with two CEs running pbs/maui • 6.5 TB of storage running DPM • Complex networking situation: the Grid is in a demilitarized zone limited to 200 Mb/s • Local physicists are mainly from CMS

  5. Biggest cluster in London • Mixture of Athlons, Xeons and Opterons • Total of 1200 KSI2K running separate pbs/maui • Cluster shared by Astronomy/HEP/Material Sciences • Storage: 18 TB running poolfs and DPM • Expect to use worker node local disks with Lustre: 400 TB • Local community is ATLAS oriented

  6. pbs/maui separate from the CE • 160 KSI2K • 8 TB running DPM • ATLAS/ILC community • Running SLC3 • Will soon buy 265 KSI2K and 136 TB, expected around April

  7. Situation similar to Imperial: • Physics department • 24 KSI2K and ~1 TB • Computing department • Shared cluster with 50 KSI2K • 1.5 TB running DPM • Running CentOS 3, SGE

  8. Resource Summary • CPU: 2.5 MSI2K • Storage: 94 TB
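As a rough cross-check of the summary, the per-site figures from slides 3 to 7 can be added up. This is only a sketch: it keys the numbers by slide number, treats the "~1 TB" on slide 7 as exactly 1 TB (an assumption), and leaves out the planned purchases and the expected Lustre space. The totals come out close to, but not exactly at, the quoted 2.5 MSI2K and 94 TB.

```python
# Per-slide capacity figures quoted earlier in this talk.
# Keys are slide numbers; "~1 TB" on slide 7 is assumed to be exactly 1 TB.
CPU_KSI2K = {3: 465 + 177, 4: 324, 5: 1200, 6: 160, 7: 24 + 50}
STORAGE_TB = {3: 60, 4: 6.5, 5: 18, 6: 8, 7: 1 + 1.5}

total_cpu_msi2k = sum(CPU_KSI2K.values()) / 1000  # KSI2K -> MSI2K
total_storage_tb = sum(STORAGE_TB.values())

print(f"CPU: {total_cpu_msi2k:.1f} MSI2K")
print(f"Storage: {total_storage_tb:.1f} TB")
```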

  9. How are the resources used? • Currently around 70%

  10. How to contact us • Our mailing list: lt2-technical@imperial.ac.uk • The coordinator: o.van-der-aa@ic.ac.uk • The T2 manager: d.colling@ic.ac.uk • Via GGUS: http://www.ggus.org • Specify UKI-LT2 and the university in the subject field • Use it for any specific problem once you are set up • Our wiki: http://wiki.gridpp.ac.uk/wiki/London_Tier2 • Describes the infrastructure • Gives links to monitoring pages

  11. How to start? • “The Three Steps” … • Register for a certificate (as explained in the NGS talk). • https://ca.grid-support.ac.uk/ • With your certificate, register with the ltwo virtual organisation • https://voms.gridpp.ac.uk:8443/voms/ltwo/ • Get access to a user interface • Ask via the lt2-technical mailing list. Each university in the LT2 has a user interface

  12. Summary, main Grid Components • The User Interface (UI) is where the user sits to submit jobs • The Virtual Organisation Membership Service (VOMS) is involved in authorizing and authenticating users • The Information System (IS) publishes the individual site information (CE queue names, SE contact points, #waiting jobs, #running jobs, etc.) • The Workload Management System (WMS) takes the user's job, finds a compatible site and submits the job to that site's CE • The Computing Element (CE) is the entry point for jobs into the computing cluster • The Storage Element (SE) is the equivalent of the CE, but for data

  13. The Main Grid Components [diagram not reproduced; surviving labels: voms, wms]

  14. Information System • Tree structure showing all available resources in the Grid • Implemented as an LDAP server • Top-level view at lcg-bdii.gridpp.ac.uk, port 2170 • Worth having a look • Use the JXplorer LDAP browser • http://www.jxplorer.org/

  15. Submitting your first job • Get a login on a user interface • In this case gfe03.hep.ph.ic.ac.uk • Initialize your proxy • voms-proxy-init --voms ltwo • Prepare your JDL (Job Description Language) • The name of the executable • The files you want to transfer before the job starts • Your constraints, for example: • How much CPU time you need • Which subset of resources you want to use

  16. The files
      • Hello.jdl
          Executable = "/bin/sh";
          Arguments = "Hello.sh";
          StdOutput = "std.out";
          StdError = "std.err";
          InputSandbox = {"Hello.sh"};
          OutputSandbox = {"std.out", "std.err"};
      • Hello.sh
          #!/bin/sh
          echo 'Hello LT2 Workshop'
          whoami
          hostname
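If you prefer to generate the JDL rather than type it, the file above can be rendered from a dictionary. This is a minimal sketch, not part of the Grid tools: `jdl_value` and `render_jdl` are hypothetical helper names, and the sketch only covers the two value shapes used on this slide (quoted strings and brace-enclosed lists of strings).

```python
def jdl_value(value):
    """Render a Python value in the JDL syntax used on the slide:
    strings are double-quoted, lists become {"a", "b"}."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Return the JDL text, one 'Name = value;' line per attribute."""
    lines = [f"{name} = {jdl_value(value)};" for name, value in attributes.items()]
    return "\n".join(lines) + "\n"


# The attributes of Hello.jdl, copied from the slide.
HELLO_JDL = {
    "Executable": "/bin/sh",
    "Arguments": "Hello.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["Hello.sh"],
    "OutputSandbox": ["std.out", "std.err"],
}

jdl_text = render_jdl(HELLO_JDL)
print(jdl_text, end="")
```

Writing `jdl_text` to a file named Hello.jdl gives the same content as the hand-written version.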

  17. Submitting • Finding matching resources • edg-job-list-match Hello.jdl
      ***************************************************************************
      COMPUTING ELEMENT IDs LIST
      The following CE(s) matching your job requirements have been found:
      *CEId*
      ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-30min
      ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-72hr
      ce1.pp.rhul.ac.uk:2119/jobmanager-pbs-ltwogrid
      dgc-grid-35.brunel.ac.uk:2119/jobmanager-lcgpbs-short
      gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs-ltwo
      mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-10min
      mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-12hr
      mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-1hr
      mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-24hr
      mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-30min
      mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-3hr
      mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-6hr
      mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-72hr
      gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default
      ***************************************************************************

  18. Submitting • The actual submission • edg-job-submit Hello.jdl
      *********************************************************************************************
      JOB SUBMIT OUTCOME
      The job has been successfully submitted to the Network Server.
      Use edg-job-status command to check job current status.
      Your job identifier (edg_jobId) is:
      - https://gfe01.hep.ph.ic.ac.uk:9000/izz75vlTThizfJVP-7VGdQ
      *********************************************************************************************
      This is your job identifier; you need to keep track of it.
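Since the job identifier is needed for every later command, it can help to pull it out of the submission output automatically. The sketch below assumes the layout shown on this slide, with the identifier on its own line after a leading "- "; `extract_job_id` is a hypothetical helper, not part of the edg tools.

```python
import re

# Assumption: the identifier sits alone on a line of the form "- https://...".
JOB_ID_PATTERN = re.compile(r"^\s*-\s*(https://\S+)\s*$", re.MULTILINE)


def extract_job_id(submit_output):
    """Return the job identifier found in the submission output, or None."""
    match = JOB_ID_PATTERN.search(submit_output)
    return match.group(1) if match else None


# Output text copied from the slide (banner lines omitted).
SAMPLE_OUTPUT = """\
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
- https://gfe01.hep.ph.ic.ac.uk:9000/izz75vlTThizfJVP-7VGdQ
"""

job_id = extract_job_id(SAMPLE_OUTPUT)
print(job_id)
```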

  19. Checking the state of your job • edg-job-status [your job id]

  20. Getting the result • edg-job-get-output [jobid] • Will store your OutputSandbox files in /tmp/ • std.out • std.err • Content of std.out:
      ---------
      Hello LT2 Workshop
      lt2-ltwo007
      mars092.mars.lesc.doc.ic.ac.uk
      ---------

  21. JDL: more complex requirements
      • Specify a CE in a domain
          Requirements = RegExp(".*mars.lesc.doc.ic.ac.uk.*$", other.GlueCEUniqueID);
      • Require some CPU time (minutes)
          Requirements = RegExp(".*mars.lesc.doc.ic.ac.uk.*$", other.GlueCEUniqueID) && (other.GlueCEPolicyMaxCPUTime > 600);
      • Require some CPU*KSI2K time
          Requirements = (other.GlueCEPolicyMaxCPUTime > (30 * 500 / other.GlueHostBenchmarkSI00));
      More on how to master JDL at http://tinyurl.com/28oje9
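The CPU*KSI2K requirement above scales a CPU time by the speed of the host. One reading of the expression, offered here as an interpretation rather than a statement from the slides, is "30 minutes on a reference machine rated at 500 SpecInt2000": a faster host needs a proportionally smaller queue limit. The sketch below works through that arithmetic; `min_cpu_minutes` is a hypothetical name.

```python
def min_cpu_minutes(reference_minutes, reference_si00, host_si00):
    """CPU minutes needed on a host rated host_si00, for a job that takes
    reference_minutes on a machine rated reference_si00 (SpecInt2000)."""
    return reference_minutes * reference_si00 / host_si00


# Same numbers as the JDL expression: 30 * 500 / GlueHostBenchmarkSI00.
for host_rating in (250, 500, 1000):
    needed = min_cpu_minutes(30, 500, host_rating)
    print(f"SI00={host_rating}: queue must allow more than {needed:.0f} CPU minutes")
```

A queue matches when its GlueCEPolicyMaxCPUTime exceeds the value computed for its own benchmark rating.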

  22. Data Management • In the previous example • All files are transferred via the SandBox • The SandBox is limited to 100Mb • Clearly something additional is required to transfer bigger datasets • Data Management tools: lcg-utils, GFAL

  23. Catalogue services • A “file” is identified by a GUID • Several aliases (LFN) can be attached to the GUID • One “file” can be located at several places (PFN)
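The relationship between GUID, LFN and PFN can be pictured as a small in-memory model. This is only an illustration of the data structure, not how the real catalogue is implemented; the class names and the example.org storage URLs are made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    guid: str                                 # unique identifier of the "file"
    lfns: set = field(default_factory=set)    # human-friendly aliases
    pfns: list = field(default_factory=list)  # physical replica locations


class Catalogue:
    """Toy catalogue: one GUID, many aliases (LFN), many replicas (PFN)."""

    def __init__(self):
        self._by_guid = {}
        self._guid_by_lfn = {}

    def register(self, guid, pfn, lfn=None):
        entry = self._by_guid.setdefault(guid, CatalogueEntry(guid))
        if pfn not in entry.pfns:
            entry.pfns.append(pfn)
        if lfn is not None:
            self.add_alias(guid, lfn)
        return entry

    def add_alias(self, guid, lfn):
        self._by_guid[guid].lfns.add(lfn)
        self._guid_by_lfn[lfn] = guid

    def replicas(self, lfn):
        """Resolve an alias to the list of physical locations."""
        return list(self._by_guid[self._guid_by_lfn[lfn]].pfns)


catalogue = Catalogue()
# Hypothetical GUID, alias and storage URLs, for illustration only.
catalogue.register("guid:0001", "srm://se-a.example.org/data/f1", lfn="lfn:/grid/demo/f1")
catalogue.register("guid:0001", "srm://se-b.example.org/data/f1")
catalogue.add_alias("guid:0001", "lfn:/grid/demo/same-file")
print(catalogue.replicas("lfn:/grid/demo/same-file"))
```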

  24. Uploading a file to a storage element (SE) • Finding the list of SEs • lcg-infosites --vo dteam se • If you don’t specify an SE, the one closest to the cluster will be used • Uploading • lcg-cr --vo dteam -d gfe02.hep.ph.ic.ac.uk file:myfile.dta • Returns: guid:ec362b1a-6f88-4860-a72b-68d4ad55eb59

  25. GUID? • GUIDs are not human friendly to remember • You can give an alias (LFN) to a GUID • lcg-aa --vo dteam guid:ec362b1a-6f88-4860-a72b-68d4ad55eb59 lfn:/grid/home/lt2wk.dta • You can also give an alias when registering the file • lcg-cr --vo dteam -d gfe02.hep.ph.ic.ac.uk file:myfile.dta -l lfn:/grid/dteam/lt2wk.dta

  26. More on moving files • Copying files back to your UI • lcg-cp --vo dteam lfn:/grid/dteam/lt2wk.dta file:`pwd`/myfile.dta • Replicating files somewhere else • lcg-rep -d se1.pp.rhul.ac.uk --vo dteam lfn:/grid/dteam/lt2wk.dta

  27. Listing files • Listing replicas: • lcg-lr --vo [yourvo] lfn:<name> • Listing the GUID: • lcg-lg --vo [yourvo] lfn:<name> • Example • lcg-lr --vo dteam lfn:/grid/dteam/lt2wk.dta
      srm://gfe02.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/dteam/generated/2007-04-16/filec6b6fba2-c854-4ee6-a0db-68bd6cd6e0dd
      srm://se1.pp.rhul.ac.uk/dpm/pp.rhul.ac.uk/home/dteam/generated/2007-04-16/file5642a5ea-b63f-411a-b56c-84a75137d716
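To see at a glance which storage elements hold a replica, the host part of each SRM URL in the listing can be extracted. A small sketch, assuming one URL per line as in the example above; `replica_hosts` is a hypothetical helper.

```python
from urllib.parse import urlparse


def replica_hosts(lcg_lr_output):
    """Return the storage element host of every srm:// line in the listing."""
    hosts = []
    for line in lcg_lr_output.splitlines():
        line = line.strip()
        if line.startswith("srm://"):
            hosts.append(urlparse(line).hostname)
    return hosts


# The two replica URLs shown on the slide.
LISTING = """\
srm://gfe02.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/dteam/generated/2007-04-16/filec6b6fba2-c854-4ee6-a0db-68bd6cd6e0dd
srm://se1.pp.rhul.ac.uk/dpm/pp.rhul.ac.uk/home/dteam/generated/2007-04-16/file5642a5ea-b63f-411a-b56c-84a75137d716
"""

hosts = replica_hosts(LISTING)
print(hosts)
```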

  28. Sending your job where your files are • In your JDL • InputData = {"lfn:/grid/dteam/lt2wk.dta"}; • DataAccessProtocol = {"file", "srm", "gridftp"}; • Then you have to use the lcg- commands to copy the files • Alternatively you can link to the gfal library and stream the data (man gfal)

  29. Conclusions • In London you have • Around 2500 CPUs • 94 TB • All available through the ltwo VO • To learn more about how to use it • http://www.gridpp.ac.uk/deployment/users/ • Get registered with the ltwo VO • See the GANGA talk for higher-level tools that submit jobs without having to write JDL

  30. LT2: thanks to all of the team • M. Aggarwal, D. Colling, A. Chamberlin, S. George, K. Georgiou, M. Green, W. Hay, P. Hobson, P. Kyberd, A. Martin, G. Mazza, D. Rand, G. Rybkine, G. Sciacca, K. Septhon, B. Waugh

  31. BACKUP

  32. Listing the SE. Removing files • lcg-infosites --vo ltwo se • Don’t forget to remove your files • lcg-del

  33. RLS remembers file locations

  34. VOMS: Virtual Organization Membership Service. • Provides information on the user's relationship with her Virtual Organization: her groups, roles and capabilities. • Provides the list of users for a given VO

  35. GridLoad: https://gfe03.hep.ph.ic.ac.uk:4175/cgi-bin/load • Tool to monitor the sites: • Updates every 5 minutes • Uses the RTM data and stores it in rrd files • Shows the number of jobs in any state • VO view: stacks the jobs by VO • CE view: stacks the jobs by CE • Still a prototype. Will add: • View by GOC and ROC • Error checking • Usage (running cpu/total cpu) • Improved look and feel • Could interface with Nagios for raising alarms (high abort rate)

  36. GridLoad: what can it be used for? [plot of #Aborted Jobs not reproduced; annotations: "Home dir full", "Problem solved"] • Can be used as a single measure of the health of the system • We can then use Nagios to find out more • Avoid the too-many-alarms syndrome! • You can query the cgi to get graphs for your site

  37. Extracting the private and public keys • You have to create a .globus directory and extract the keys into it • Extract your public key: • openssl pkcs12 -in cert.p12 -clcerts -nokeys -out usercert.pem • chmod 644 usercert.pem • Extract your private key: • openssl pkcs12 -in cert.p12 -nocerts -out userkey.pem • Protect it: chmod 400 userkey.pem
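Wrong permissions on the extracted files are a common cause of proxy creation failures, so a quick check can save time. The sketch below assumes mode 644 for usercert.pem and owner-read-only (400) for userkey.pem; the 400 value is the usual convention for a private key and is stated here as an assumption. `wrong_permissions` is a hypothetical helper, and the demo runs in a temporary directory with empty stand-in files rather than a real .globus directory.

```python
import os
import stat
import tempfile

# Assumed target modes: readable certificate, owner-read-only private key.
EXPECTED_MODES = {"usercert.pem": 0o644, "userkey.pem": 0o400}


def wrong_permissions(directory, expected=EXPECTED_MODES):
    """Return {filename: (actual, wanted)} for files whose mode differs."""
    problems = {}
    for name, wanted in expected.items():
        path = os.path.join(directory, name)
        actual = stat.S_IMODE(os.stat(path).st_mode)
        if actual != wanted:
            problems[name] = (oct(actual), oct(wanted))
    return problems


# Demo with empty stand-in files.
with tempfile.TemporaryDirectory() as globus_dir:
    for name in EXPECTED_MODES:
        open(os.path.join(globus_dir, name), "w").close()
    os.chmod(os.path.join(globus_dir, "usercert.pem"), 0o644)
    os.chmod(os.path.join(globus_dir, "userkey.pem"), 0o644)  # deliberately too open
    before_fix = wrong_permissions(globus_dir)
    os.chmod(os.path.join(globus_dir, "userkey.pem"), 0o400)
    after_fix = wrong_permissions(globus_dir)

print(before_fix)
print(after_fix)
```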

  38. Initialize your proxy • The proxy is a temporary key pair that is signed by your private key. It lets you delegate your credentials to another machine where your job will run. • To create a proxy (which will be a file in the /tmp directory) you need to run • voms-proxy-init --voms ltwo • Type the pass phrase to decrypt your private key • You should see this:
      Your identity: /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
      Enter GRID pass phrase:
      Creating temporary proxy ............................................... Done
      Contacting gm01.hep.ph.ic.ac.uk:15002 [/C=UK/O=eScience/OU=Imperial/L=Physics/CN=host/gm01.hep.ph.ic.ac.uk/emailAddress=o.van-der-aa@imperial.ac.uk] "ltwo" Done
      Creating proxy ............................................ Done
      Your proxy is valid until Tue Dec 6 23:45:14 2005

  39. Preparing to submit jobs • A simple job program is available in /tmp/Lecture.tar.gz • Copy it to your home directory and untar it • To submit a job you need to create a file that contains your requirements: the so-called JDL (Job Description Language) file • We will submit jobs as members of the London Tier 2 VO (LTWO), so we need to specify that they run on sites that support it • For the moment the site that supports it is the Imperial College HEP site

  40. Submit the job • edg-job-submit --config-vo gridpp_wl_vo_ltwo.conf --config gridpp_wl_cmd_var.conf hello.jdl • Or runjob.sh hello.jdl • The configuration files (gridpp_...) specify the Imperial Resource Broker, since it is the only one that knows about the ltwo VO
      *********************************************************************************************
      JOB SUBMIT OUTCOME
      The job has been successfully submitted to the Network Server.
      Use edg-job-status command to check job current status.
      Your job identifier (edg_jobId) is:
      - https://gfe01.hep.ph.ic.ac.uk:9000/kvexAiToJyvcBvxdMBTdoA
      *********************************************************************************************
      This is your job ID

  41. Check the status of your job • edg-job-status [your job ID] will get the status of your job

  42. Managing large files • To transfer large files you should not use the input and output sandboxes. They are limited to 9MB. • File replication should be used instead • The LTWO VO does not have a catalog to register the files, so I will describe what can be done

  43. globus-url-copy • You can copy a file to our SE using the globus-url-copy command • globus-url-copy file:////myfile gsiftp://gw38.hep.ph.ic.ac.uk/stage2/lcg2-data/ltwo/myfilename • But this does not use the catalog, so you have to know where your file really is

  44. Hello.jdl and finding matching resources • In the Lecture directory • See file Hello.jdl
          Executable = "/bin/hostname";
          #Arguments = "none";
          StdOutput = "std.out";
          StdError = "std.err";
          OutputSandbox = {"std.out", "std.err"};
      • Executable: name of the executable • OutputSandbox: files you want to retrieve • Check which resources match your requirements:
          edg-job-list-match --config-vo gridpp_wl_vo_ltwo.conf --config gridpp_wl_cmd_var.conf hello.jdl

  45. Exercise • Find out what the GridCR program does • Submit 5 jobs. The output of the GridCR program should be stored on the classic SE • Using your job's standard output, retrieve the files that have been generated

  46. Check the validity of your proxy • voms-proxy-info will tell you how many hours your delegation is valid
      subject : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)/CN=proxy
      issuer : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
      identity : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
      type : proxy
      strength : 512 bits
      path : /tmp/x509up_u37227
      timeleft : 11:58:43
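Before submitting a long job it is worth checking that the proxy will outlive it. The sketch below reads the timeleft field from output shaped like the example above and converts it to seconds. It assumes the "name : value" layout and the HH:MM:SS format shown on the slide; `timeleft_seconds` is a hypothetical helper.

```python
def timeleft_seconds(proxy_info_output):
    """Return the remaining proxy lifetime in seconds, or None if absent."""
    for line in proxy_info_output.splitlines():
        # Split on the first colon only: the value itself contains colons.
        key, _, value = line.partition(":")
        if key.strip() == "timeleft":
            hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
            return hours * 3600 + minutes * 60 + seconds
    return None


# The last lines of the output shown on the slide.
SAMPLE_INFO = """\
type : proxy
strength : 512 bits
path : /tmp/x509up_u37227
timeleft : 11:58:43
"""

remaining = timeleft_seconds(SAMPLE_INFO)
print(remaining)
```

Comparing `remaining` with the CPU time requested in the JDL gives an early warning that the proxy should be renewed first.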

  47. Finding which CEs support the ltwo VO • To get a list of CEs that support the ltwo VO, use the lcg-infosites command • lcg-infosites --vo ltwo ce
      gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs-ltwo
      This is the CE of the HEP group. • If you run lcg-infosites --vo dteam ce you will get a list of CEs in LCG

  48. lcg-cr, lcg-rep • To register a file in a catalog and copy it to your beloved SE:
          lcg-cr --vo [yourvo] file://`pwd`/<name> \
              -l lfn:<name> -d yourse
      If you do not give an SE, the local one will be used. • To replicate the same file to a different SE • lcg-rep --vo [yourvo]
