Experiences with a distributed patch certification

Experiences with a distributed patch certification. Presenter: John Walsh Location: PIC, Barcelona, ES. Motivations. My view of Testing and Certification Should adhere to general scientific principles and methods ‘Deployment and Testing’ is an ‘experiment’

Presentation Transcript


  1. Experiences with a distributed patch certification Presenter: John Walsh Location: PIC, Barcelona, ES

  2. Motivations • My view of Testing and Certification • Should adhere to general scientific principles and methods • ‘Deployment and Testing’ is an ‘experiment’ • Must be independently repeatable • Results must be independently reproducible

  4. EGEE Testbed types • Testbed Types • Multi Site TB • Wide area network • Medium to Large Scale deployments • Must be highly coordinated • “Controlled” environment difficult • A single service may make whole TB unusable for periods • Single Isolated TB • Generally small scale, limited external access • Can replicate components / simulate conditions of “real world” • Does not reproduce all conditions of the “real world” • Need not reproduce complete infrastructure • Highly controlled, fewer variables • Single tester can control environment • SAM integration difficult, but SAM standalone possible

  6. TCD Testbeds • TCD runs a medium scale set of Isolated TBs • Xen extensively used • Non-trivial setup • Isolated ELgrid, e-Learning grid: • Replicates core Grid-Ireland infrastructure • 18 sites with 4 WNs each, and national services • Look and feel without impacting on production services • Isolated TestGrid, allows multiple Grid Infrastructures: • Certification infrastructure: • 18 Grid-Ireland sites, 4 WNs each, and national services • Tests Quattor profile changes • Quality control before deployment on production • R-GMA testing infrastructure: WMS, R-GMA, 1 site, 4 WNs • Experimental and Porting infrastructure: >150 nodes, multiple sites • TestGrid allows mixed public and private network address spaces

  7. TCD infrastructure [Figure: diagram of the TCD infrastructure showing the R-GMA TB and the Certification TB]

  8. R-GMA testbed Example [Figure: R-GMA Certification TB comprising an R-GMA registry, an R-GMA MON, and a Xen hypervisor hosting VM1: WMS, VM2: CE, VM3: UI, VM4: SE, VM5-n: WNs] • Implements core set of service nodes • Top level • R-GMA registry/browser/schema • gLite WMS (Xen) • Site • R-GMA site mon • gLite UI (Xen) • gLite CE + site BDII + torque (Xen) • gLite Classic SE (Xen) • >2 WN (Xen) • Installation via YAIM • Quattor in catchup mode (even on Production) • 5 TB Fileserver for image backups

  9. Simple Install Procedure • Xen Nodes • Basic SL3 image (copied from repository) • Java 1.4.2 • NTP • Minimal network settings • APT • Basic SL repository • For each node • Install latest (certified) YAIM • Central YAIM configuration • Defines Basic Site Configuration • A 3-way diff can check for changes in configurations • Each node configured as per type • WMS requires extra Condor repository
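
The 3-way diff of configurations mentioned on this slide could be sketched roughly as follows. This is only a minimal illustration, assuming the configuration is a flat file of KEY=VALUE lines (as a siteInfo.def-style file would be); the function names `parse_config` and `three_way_diff` are hypothetical and not part of YAIM.

```python
def parse_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def three_way_diff(base, local, new):
    """Classify each key by comparing local and new against a common base.

    Returns a dict mapping key -> one of:
      'local-only'   changed only in the site's copy
      'new-only'     changed only in the new release's copy
      'same-change'  both changed, to the same value
      'conflict'     both changed, to different values
    Keys that are unchanged everywhere are omitted.
    """
    report = {}
    for key in sorted(set(base) | set(local) | set(new)):
        b, l, n = base.get(key), local.get(key), new.get(key)
        if l == b and n == b:
            continue
        if l != b and n == b:
            report[key] = "local-only"
        elif l == b and n != b:
            report[key] = "new-only"
        elif l == n:
            report[key] = "same-change"
        else:
            report[key] = "conflict"
    return report
```

Here `base` would be the last certified configuration, `local` the site's current copy, and `new` the configuration shipped with the patch under test; only the 'conflict' entries need a human decision.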

  10. Simple Upgrade Procedure • A nice feature is the way images can be used • Each node image should be copied to backup server • Known (good) state • Rollback possible • Images can then be used to instantiate nodes very quickly • Can prepare siteInfo.def off-line and copy it to node • Do YAIM install • Fixes up repos in /etc/apt/sources.list.d/lcg.list • Problems? • Raise problem in Savannah Patch discussion • Do YAIM configure • Problems(?) • As above
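
The store and rollback steps on this slide can be illustrated with a short sketch. It assumes each node image is an ordinary file that can be copied while the node is shut down; the function names and the idea of keeping a checksum next to each backup are illustrative assumptions, not the procedure actually used at TCD.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_image(image, backup_dir):
    """Copy a node image to the backup area and record its checksum."""
    image, backup_dir = Path(image), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / image.name
    shutil.copy2(image, copy)
    checksum = sha256_of(copy)
    (backup_dir / (image.name + ".sha256")).write_text(checksum + "\n")
    return checksum


def rollback_image(image, backup_dir):
    """Restore a node image from backup after verifying its checksum."""
    image, backup_dir = Path(image), Path(backup_dir)
    copy = backup_dir / image.name
    expected = (backup_dir / (image.name + ".sha256")).read_text().strip()
    if sha256_of(copy) != expected:
        raise ValueError(f"backup of {image.name} fails its checksum")
    shutil.copy2(copy, image)
    return expected
```

Calling `store_image` before the YAIM install captures the known (good) state; if the patch misbehaves, `rollback_image` puts the node back to that state.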

  11. R-GMA certification • Hey presto, we now have a TB • R-GMA testbed can only be used for testing: • Correct behaviour of YAIM • That a patch fixes its target problem • Basic R-GMA components: • rgma-client-check OK • rgma-server-check (mon and reg) OK • Daemon startup scripts OK • Basic R-GMA testsuite OK • That the R-GMA daemons are stable(?) • Whether there are any new tests that can be added to TestSuite • A new SAM test(?)
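
A checklist like the one on this slide is easier to repeat if each check is run and recorded by a small harness. The sketch below assumes every check is a command-line program that exits with status 0 on success; that exit-code convention, and the names `run_checks` and `summarise`, are assumptions made for illustration.

```python
import subprocess


def run_checks(checks, timeout=300):
    """Run each (name, argv) check and collect its outcome.

    A check counts as OK when the command exits with status 0
    (an assumption about the tools being wrapped).
    """
    results = []
    for name, argv in checks:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout)
            status = "OK" if proc.returncode == 0 else "FAILED"
            detail = (proc.stdout + proc.stderr).strip()
        except FileNotFoundError:
            status, detail = "MISSING", "command not found"
        except subprocess.TimeoutExpired:
            status, detail = "TIMEOUT", f"no result after {timeout}s"
        results.append({"check": name, "status": status, "detail": detail})
    return results


def summarise(results):
    """Return one 'name status' line per check, in run order."""
    return [f"{r['check']} {r['status']}" for r in results]
```

On a testbed node the list might contain entries such as `("rgma-client-check", ["rgma-client-check"])`; whether that command needs arguments is not stated on the slide, so the bare invocation is a guess. Keeping the captured output alongside the status gives a record that can be attached to the patch discussion.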

  12. Stability • R-GMA stability • Tomcat daemon can take days to become unstable • How stable are the components R-GMA depends on? • MySQL, Java JDBC connectors, etc • Is the default configuration OK? • Can it be improved? • Stress testing is vital • Should attempt to keep stats on system and component behaviour • Memory usage (any leaks?) • Disk usage, number of files, etc • File descriptors (any descriptors leaking?) • Log files OK? • Rotation policy OK?
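
Keeping stats on memory and file descriptors, as this slide suggests, could look something like the sketch below. It assumes a Linux node where `/proc/<pid>/status` and `/proc/<pid>/fd` are readable; the helper names are hypothetical, and the least-squares growth heuristic is an illustrative choice rather than anything described in the talk.

```python
import os


def sample_process(pid):
    """Read resident memory (kB) and open descriptor count from /proc.

    Linux-specific: relies on /proc/<pid>/status and /proc/<pid>/fd.
    """
    rss_kb = None
    with open(f"/proc/{pid}/status") as handle:
        for line in handle:
            if line.startswith("VmRSS:"):
                rss_kb = int(line.split()[1])
                break
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return {"rss_kb": rss_kb, "open_fds": open_fds}


def growth_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(values, threshold):
    """Flag a series whose average growth per sample exceeds threshold."""
    return growth_per_sample(values) > threshold
```

Since a daemon can take days to become unstable, samples would need to be collected over a comparably long period (for example from cron) before the trend means anything; the threshold has to be chosen per metric.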

  13. Patch Problems • Patch may introduce a new problem • Important to discuss with the developers and within SA3 • Issues involved • Evaluation of problem • Will applying this patch cause more problems than it solves? • Will it become a showstopper?

  14. Summary • Isolated testbed experience has been positive • Xen lessens hardware costs • Can create custom TBs on demand • Large range of testing scenarios possible • Extra layer of quality control • Non-trivial setup • But once completed it becomes a good scientific testbed • Requires extra infrastructure nodes to be installed • Simple store/test/rollback procedure • Isolated testbed does not capture all scenarios • Scaling of tests may not always be possible • In future intend to add network emulation to help • PPS plays critical post-certification role

  15. TestGrid Simple CA • Many nodes are (re)installed repeatedly • Host certificates must be securely saved • Copy to chosen media and store safely • TestGrid now uses a simpleCA for private network nodes • Allows greater flexibility in generating certs • CA controlled by small team of administrators • Does not require standard cert issuing procedure • Faster turnaround on cert generation • Certificates cannot be used outside of Testbed environment • Namespace is disjoint from EUGridPMA namespace • Initial overhead in setting up simpleCA • Learning curve • Best set up with a local CA expert • Extra RPM for the simpleCA deployed on required nodes
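
The point about disjoint namespaces can be made concrete with a small check on certificate subject names. This sketch assumes a CA's namespace is described by a subject DN prefix written in the OpenSSL-style `/C=.../O=...` form; the function names and any DNs passed to them are hypothetical, not the actual TestGrid or EUGridPMA values.

```python
def dn_components(dn):
    """Split an OpenSSL-style '/C=XX/O=Example' DN into its components.

    Simplification: attribute values containing '/' are not handled.
    """
    return [part for part in dn.split("/") if part]


def in_namespace(subject_dn, namespace_prefix):
    """True when the subject DN starts with every component of the prefix."""
    subject = dn_components(subject_dn)
    prefix = dn_components(namespace_prefix)
    return subject[:len(prefix)] == prefix


def namespaces_disjoint(prefix_a, prefix_b):
    """Two prefix namespaces overlap only if one prefix contains the other."""
    return not (in_namespace(prefix_a, prefix_b)
                or in_namespace(prefix_b, prefix_a))
```

Comparing whole components rather than raw string prefixes avoids treating `O=Example Testbed Two` as part of the `O=Example Testbed` namespace.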
