Co-allocation Using HARC
IV. ResourceManagers
HARC Workshop, University of Manchester
Philosophy
• New types of RMs can be written by others
• Existing RMs can be customized
• Interfaces can be enhanced or changed
• None of this means changing the acceptor code
• API is extensible too
• Good community contribution model
• CCT keeps control of the acceptor code
• The acceptor code will become very stable (already less than one commit per month)
• The community evolves the system
Are RMs Easy to Install?
• Harder than client software
• Much easier than Acceptors
• Complexity is in the right place:
  • Only a few people install and configure Acceptors (infrastructure), which is hard
  • Some people modify/write RMs, which is not too hard
  • More people install and configure RMs, which is easy
  • Many people install and configure the Client software, which is trivial
Pre-installation - Perl
• RMs are written in Perl, to make installation trivial
• However, they need a large number of CPAN modules to be installed
  • Some of these, e.g. Net::SSLeay and Crypt::SSLeay, are not trivial
• There is a document which contains things to watch out for
  • Lists previously seen problems, with solutions
  • Basically a list of exceptions
  • Now 7 pages of text!
  • There’s a lot of AIX content...
Pre-installation - Certificate
• HARC RM needs a certificate
  • We don’t recommend re-using the host certificate
  • Get a service certificate
• UK e-Science CA now supports:
  • harccrm for Compute RMs (CRMs)
    • /C=UK/O=eScience/OU=Manchester/L=MC/CN=harccrm/man2.nw-grid.ac.uk/emailAddress=...
  • harcacceptor for Acceptors
    • /C=UK/O=eScience/OU=Manchester/L=MC/CN=harcacceptor/man4.nw-grid.ac.uk/emailAddress=....
Installation Procedure
• There’s an installer which installs stuff from the CVS tree - this may change
• HARC environment variable points to the root of the repo (“negotiation” directory)
• You have a subdirectory in
  • $HARC/rm-service/config
• For example
  • $HARC/rm-service/config/nw-grid/man2
Installation Procedure
1. Create Contents
  • install.config - more shortly
  • grid-mapfile - GT-style mapfile for cert to username mapping (usually a sym-link to /etc/grid-security/grid-mapfile)
  • acceptor_mapfile - a list of the Acceptor DNs, and also their CA cert DNs
  • cacerts directory, containing CA Certs for your cert and the Acceptor certs, in PEM format, suffix .crt
2. Then a trivial Install
  • install-rm nw-grid/man2 /usr/local/man2-rm
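
The contents listed in step 1 can be checked before running the installer. The following is a minimal Python sketch, not part of HARC: the file and directory names come from the slide, while the function name `missing_config_items` and the checking logic are assumptions made for illustration.

```python
from pathlib import Path

# File names taken from the slide; the checking logic itself is an assumption.
REQUIRED_FILES = ("install.config", "grid-mapfile", "acceptor_mapfile")


def missing_config_items(config_dir):
    """Return a list of human-readable problems found in an RM config directory."""
    root = Path(config_dir)
    problems = []
    for name in REQUIRED_FILES:
        # exists() follows sym-links, so a grid-mapfile link to a real file passes.
        if not (root / name).exists():
            problems.append(f"missing {name}")
    cacerts = root / "cacerts"
    if not cacerts.is_dir():
        problems.append("missing cacerts directory")
    elif not any(cacerts.glob("*.crt")):
        problems.append("cacerts contains no .crt files")
    return problems
```

Calling `missing_config_items` on a directory such as `$HARC/rm-service/config/nw-grid/man2` returns an empty list when everything from step 1 is in place. It does not validate the contents of the mapfiles or certificates.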
install.config
RM_INNER_TYPE=SimpleCompute
RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
RM_COMPUTE_BATCH_TYPE=TorqueMaui
RM_COMPUTE_MEMORY_MB_PER_CPU=4096
RM_COMPUTE_CPUS=8
RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
RM_HOST=130.88.200.242
RM_URL=man2-rm
RM_PORT=9393

<Resource>
  <Compute>man2.nw-grid.ac.uk</Compute>
  <Endpoint type="REST">
    <RESTEndpoint>https://man2.nw-grid.ac.uk:9393/man2-rm/</RESTEndpoint>
  </Endpoint>
</Resource>
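
The REST endpoint in the Resource element appears to be built from the node name, RM_PORT and RM_URL in install.config. The Python sketch below illustrates that reading; it assumes the config file uses shell-style KEY=value lines, and both function names are hypothetical rather than part of the HARC code.

```python
import shlex


def parse_install_config(text):
    """Parse shell-style KEY=value lines into a dict (assumed file format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the quotes around values such as the description.
        settings[key.strip()] = " ".join(shlex.split(raw))
    return settings


def rest_endpoint(settings):
    """Assumed mapping: https://<RM_COMPUTE_NODENAME>:<RM_PORT>/<RM_URL>/"""
    return "https://{RM_COMPUTE_NODENAME}:{RM_PORT}/{RM_URL}/".format(**settings)
```

With the values from the slide, `rest_endpoint` produces the same URL as the RESTEndpoint element shown there. Whether the real RM uses the node name or RM_HOST when it binds its listener is not stated on the slide.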
Installation Step
• Before Installing
  • Need PERL5LIB and LD_LIBRARY_PATH to be defined in your environment when you install
  • Or can add these to the config file
  • Don’t have to set these if you don’t need to
• Then a trivial Install
  • install-rm nw-grid/man2 /usr/local/man2-rm
  • Script is in $HARC/rm-service/scripts
• What does this do?
What happens?
• Installs Source files
• Creates a crontab & scripts for restarting the RM
• Customizes some scripts for stopping/starting the RM
• Installs and hashes CA certificates
• Output:
rm-service $ scripts/install-rm nw-grid/man2 /Users/jonmaclaren/man2-rm
Makefile.crt ... Skipped
cct-ca.crt ... 5fb2fc80.0
old-uk-escience-ca.crt ... 01621954.0
uk-escience-ca.crt ... adcbc9ef.0
uk-escience-root.crt ... 8175c1cd.0
Notice: Don't forget to place your certificate and key files at:
/Users/jonmaclaren/man2-rm/x509/server_cert.pem
/Users/jonmaclaren/man2-rm/x509/server_key.pem
What’s in /usr/local/man2-rm?
• Some Perl Modules
  • And OuterRM.pl which gets run
• commands - which configures and runs the RM (based on install.config, etc.)
• rerun - runs “commands” in the background from crontab
• crontab - crontab line which can be added directly to your crontab (don’t cut and paste!)
• start-rm, stop-rm - control whether rerun will actually start the RM, using a control file (.do_not_restart)
  • ./stop-rm
  • ./start-rm [ -w ]
• x509 - subdirectory containing all the CA certs, mapfiles, etc.
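
The interplay between rerun, start-rm, stop-rm and the .do_not_restart control file can be pictured as follows. This is a Python sketch of the control-file idea only; the real scripts are not shown on the slides, so the function names and the exact behaviour (for example, whether stop-rm also kills a running RM) are assumptions.

```python
from pathlib import Path

CONTROL_FILE = ".do_not_restart"  # control file name from the slide


def stop_rm(install_dir):
    """Create the control file so that later rerun invocations do nothing."""
    (Path(install_dir) / CONTROL_FILE).touch()


def start_rm(install_dir):
    """Remove the control file so that rerun may start the RM again."""
    (Path(install_dir) / CONTROL_FILE).unlink(missing_ok=True)


def rerun_should_start(install_dir, rm_is_running):
    """Decide whether a cron-driven rerun should launch the RM."""
    if (Path(install_dir) / CONTROL_FILE).exists():
        return False  # an operator asked for the RM to stay down
    return not rm_is_running
```

The point of the design is that cron can call rerun unconditionally: an operator who has stopped the RM on purpose will not see it come back until start-rm removes the control file.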
Perl Modules
• Just an overview here...
• There is a doc online which has some details on these
Key Modules
• OuterRM - just does the HTTP listening and Acceptor Cert authN/authZ
• MainLoop - handles each request
• TransactionManager - remembers what transactions (by TID) are running, and what their states are
• InnerRM - the main class for different types of RM
  • SimpleComputeRM
  • SimpleNetworkRM
  • Both inherit from InnerRM
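
The TransactionManager's role, remembering transactions by TID together with their states, can be sketched in a few lines. This is a Python illustration, not the Perl module: the state names and all method names are hypothetical.

```python
class TransactionManager:
    """Remember which transactions are known, keyed by TID, and their states."""

    # Hypothetical state names; the real module may use different ones.
    STATES = ("prepared", "committed", "aborted")

    def __init__(self):
        self._states = {}

    def begin(self, tid):
        if tid in self._states:
            raise KeyError(f"transaction {tid} already known")
        self._states[tid] = "prepared"

    def set_state(self, tid, state):
        if state not in self.STATES:
            raise ValueError(f"unknown state {state!r}")
        if tid not in self._states:
            raise KeyError(f"unknown transaction {tid}")
        self._states[tid] = state

    def state(self, tid):
        """Return the state for tid, or None if the TID was never seen."""
        return self._states.get(tid)

    def running(self):
        """TIDs that have been prepared but neither committed nor aborted."""
        return sorted(t for t, s in self._states.items() if s == "prepared")
```

Because the table lives in memory only, this sketch shares the limitation noted later in the deck: a restart loses everything it knew.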
SimpleComputeRM
• Handles batch queue systems
• Deals only with processors/memory
• To talk to the scheduler, a subclass of SCBatch is used
  • SCBatchTorqueMaui.pm
  • SCBatchTorqueMoab.pm
  • SCBatchLoadLeveler.pm - not in CVS yet...
  • Chosen at runtime - RM_COMPUTE_BATCH_TYPE
• Simple modules
  • Less than 200 lines
  • Override
    • initialize
    • makeReservation
    • cancelReservation
    • getStatus
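
The pattern described here, a small backend class that overrides four methods and is selected at runtime from RM_COMPUTE_BATCH_TYPE, looks roughly like this. The sketch is in Python rather than Perl; the four method names come from the slide, but their argument lists, the `SCBatchInMemory` backend and the `load_batch` helper are invented for illustration.

```python
class SCBatch:
    """Base class; subclasses override the four methods named on the slide."""

    def initialize(self, config):
        self.config = config

    def makeReservation(self, start, end, cpus):
        raise NotImplementedError

    def cancelReservation(self, reservation_id):
        raise NotImplementedError

    def getStatus(self, reservation_id):
        raise NotImplementedError


class SCBatchInMemory(SCBatch):
    """Hypothetical backend; keeps reservations in a dict instead of a scheduler."""

    def initialize(self, config):
        super().initialize(config)
        self._reservations = {}
        self._next_id = 1

    def makeReservation(self, start, end, cpus):
        # Only checks the total CPU count; overlapping reservations are not detected.
        if cpus > int(self.config["RM_COMPUTE_CPUS"]):
            return None
        rid = f"res{self._next_id}"
        self._next_id += 1
        self._reservations[rid] = (start, end, cpus)
        return rid

    def cancelReservation(self, reservation_id):
        return self._reservations.pop(reservation_id, None) is not None

    def getStatus(self, reservation_id):
        return "reserved" if reservation_id in self._reservations else "unknown"


def load_batch(config, registry):
    """Pick the backend class named by RM_COMPUTE_BATCH_TYPE at runtime."""
    cls = registry["SCBatch" + config["RM_COMPUTE_BATCH_TYPE"]]
    backend = cls()
    backend.initialize(config)
    return backend
```

A real backend would replace the dict operations with calls to the scheduler's reservation commands, which is presumably why each module stays under 200 lines.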
Customizing InnerRM
• Startup/shutdown
  • initialize/remove
• Parsing (validating) the XML
  • parseResourceElement
  • parseWorkElement
  • maybe parseScheduleElement
• Co-allocation
  • tryMakeAction
  • tryCancelAction
  • addResourceBookings
  • completeTransactionBookings
• Others for getTimetable/getStatus
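
One way to picture the co-allocation hooks is as a two-step booking: a "try" step that holds a slot for a transaction, and a "complete" step that turns the held slot into a booking or discards it. The Python sketch below shows that idea for a single exclusive resource. It is an assumption about how the hooks fit together, not the InnerRM implementation, and the class and method names are hypothetical.

```python
class BookingTable:
    """Tentative and confirmed bookings for a single exclusive resource."""

    def __init__(self):
        self.confirmed = []   # list of (start, end) pairs
        self.tentative = {}   # tid -> (start, end)

    def _overlaps(self, start, end):
        taken = self.confirmed + list(self.tentative.values())
        # Half-open intervals: [0, 10) and [10, 20) do not overlap.
        return any(start < e and s < end for s, e in taken)

    def try_make(self, tid, start, end):
        """Hold the slot for tid if it is free; report success or failure."""
        if start >= end or self._overlaps(start, end):
            return False
        self.tentative[tid] = (start, end)
        return True

    def complete(self, tid, committed):
        """Turn a held slot into a booking on commit, or drop it on abort."""
        slot = self.tentative.pop(tid, None)
        if slot is not None and committed:
            self.confirmed.append(slot)
        return slot is not None
```

Holding a slot tentatively means two transactions cannot both be promised the same time window while the Acceptors are still deciding the outcome.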
Steps for creating a new RM
• Design your XML
  • Resource element
  • Work element
• Create a new subclass of InnerRM.pm
  • Use the utility classes where possible
• To extend the API, create subclasses of
  • Resource.java
  • Work.java
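
The first two steps, designing the XML and subclassing InnerRM to parse it, might look like the sketch below. It is written in Python as a stand-in for the Perl base class. The Resource element follows the Compute example shown earlier in the deck, while the Work element with a `Processors` child and the `DemoComputeRM` class are invented for illustration.

```python
import xml.etree.ElementTree as ET


class InnerRM:
    """Stand-in for the Perl InnerRM base class (hook names from the slides)."""

    def parseResourceElement(self, element):
        raise NotImplementedError

    def parseWorkElement(self, element):
        raise NotImplementedError


class DemoComputeRM(InnerRM):
    """Hypothetical RM type that validates its own Resource and Work XML."""

    def __init__(self, nodename):
        self.nodename = nodename

    def parseResourceElement(self, element):
        compute = element.find("Compute")
        if compute is None or (compute.text or "").strip() != self.nodename:
            raise ValueError("Resource element does not name this RM")
        return self.nodename

    def parseWorkElement(self, element):
        # <Processors> is an invented child element for this sketch.
        text = (element.findtext("Processors") or "").strip()
        if not text.isdecimal() or int(text) < 1:
            raise ValueError("Work element needs a positive Processors count")
        return {"processors": int(text)}
```

For example, `DemoComputeRM("man2.nw-grid.ac.uk").parseWorkElement(ET.fromstring("<Work><Processors>4</Processors></Work>"))` returns a dict with the requested processor count, and malformed input is rejected before any booking is attempted.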
Caveats for RMs
• Need to restart to re-read grid-mapfile
• When restarted, they forget the bookings
  • Want to add persistence so that it’s trivial for RM developers to utilize
• Thread handling needs work (soon!)
What’s next?
• Discussion on MPIg...
• Beer?
But first... ...Any Questions?