
Code Migration Plans

Code Migration Plans. James Bellinger, University of Wisconsin at Madison, 22-Feb-2010. Straw Man Proposal. Proposed target dates: 1-Aug-2010: no further CDF use of SL3 at Fermilab; 1-Jun-2010: users warned to move off SL3; 1-May-2010: no further builds on SL3.

Presentation Transcript


  1. Code Migration Plans • James Bellinger, University of Wisconsin at Madison • Straw Man Proposal, 22-Feb-2010

  2. Proposed Target Dates • 1-Aug-2010: No further CDF use of SL3 at Fermilab • 1-Jun-2010: Users warned to move off SL3 • 1-May-2010: No further builds on SL3 • 1-Apr-2010: Announce ports of 6.1.4 and other releases • 15-Mar-2010: Announce validation of 6.1.6m • 1-Mar-2010: Announce ports of 6.1.6m and 6.1.4mc.m; request user tests
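Because the slide lists the milestones latest-first, the spacing between them is easy to misread. A minimal Python sketch that re-sorts them and computes the gaps; taking 22-Feb-2010 (the date of this proposal) as the start is an assumption, and `timeline` is a hypothetical helper name.

```python
from datetime import date

# Milestones as listed on the slide (the slide orders them latest-first).
MILESTONES = [
    (date(2010, 8, 1), "No further CDF use of SL3 at Fermilab"),
    (date(2010, 6, 1), "Users warned to move off SL3"),
    (date(2010, 5, 1), "No further builds on SL3"),
    (date(2010, 4, 1), "Announce ports of 6.1.4 and other releases"),
    (date(2010, 3, 15), "Announce validation of 6.1.6m"),
    (date(2010, 3, 1), "Announce ports of 6.1.6m and 6.1.4mc.m; request user tests"),
]


def timeline(milestones, start):
    """Return (date, days since start, days since previous milestone, label),
    earliest milestone first. `start` is an assumed reference date."""
    rows = []
    previous = start
    for when, label in sorted(milestones):  # sort chronologically
        rows.append((when, (when - start).days, (when - previous).days, label))
        previous = when
    return rows


if __name__ == "__main__":
    for when, since_start, gap, label in timeline(MILESTONES, date(2010, 2, 22)):
        print(f"{when.isoformat()}  +{since_start:3d} d  (gap {gap:3d} d)  {label}")
```

Run as a script, it shows, for example, only 14 days between announcing the 6.1.6m port and announcing its validation.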

  3. Deliverables • 6.1.6.m, development, 6.1.4mc.m, 6.1.4.m, and other releases • Compiled and built on SL4 and SL5 • Validated to produce the same results as their normal SL3 counterparts • Using the new Oracle client, to match the server • 6.1.4mc.m tarballs for SL4 and SL5 • An estimate of the systematic error in analyses due to the OS change • An RPM list for SL5 that allows SL3 binaries to run

  4. RPMs for Compatibility • To run SL3 binaries on SL4: tcl-8.3.5-92.2 • To run SL3 binaries on SL5: tcl-8.3.5-92.2 and readline-4.3-5.2
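A pre-flight check for the compatibility RPMs could be as simple as comparing the slide's lists against what a node has installed. A minimal sketch, assuming the installed set is gathered separately as name-version-release strings; `COMPAT_RPMS` and `missing_compat_rpms` are hypothetical names.

```python
# Compatibility RPMs from the slide, keyed by the OS on which SL3 binaries run.
COMPAT_RPMS = {
    "SL4": ["tcl-8.3.5-92.2"],
    "SL5": ["tcl-8.3.5-92.2", "readline-4.3-5.2"],
}


def missing_compat_rpms(target_os, installed):
    r"""Return the compatibility RPMs for target_os that are absent from `installed`.

    `installed` is assumed to be a set of name-version-release strings, e.g.
    collected with: rpm -qa --queryformat '%{NAME}-%{VERSION}-%{RELEASE}\n'
    """
    try:
        required = COMPAT_RPMS[target_os]
    except KeyError:
        raise ValueError(f"no compatibility list for {target_os!r}") from None
    # Preserve the slide's ordering in the report.
    return [pkg for pkg in required if pkg not in installed]
```

An empty result means the node already has everything the slide lists for that OS.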

  5. Validation • Verify that code changes have no effect in the 100K test • Verify, using PerfiDia on a run section, that OS library changes have no effect • Verify that there are no significant random-number changes in the 100K test • Verify that files created on SL3 and read on SL5 can be rewritten on SL5 and re-read on SL3

  6. Validation Details • The thorough OS library validation using PerfiDia needs to be done only once • Each migrated release will be compared only to its mother release, and only on 100K events
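The slides do not define how "compared to its mother release" is measured; one plausible reading is an event-by-event comparison over the same 100K events. The sketch below assumes each release's output has already been reduced to a mapping from an event key to named quantities; the data layout, the name `compare_releases`, and the tolerances are hypothetical and are not PerfiDia's interface.

```python
import math


def compare_releases(mother, migrated, rel_tol=1e-9, abs_tol=0.0):
    """Compare per-event outputs of a migrated release with its mother release.

    Both arguments map an event key, e.g. (run, event), to a dict of named
    numeric quantities. Returns (missing, extra, mismatches):
      missing    - event keys present only in the mother release
      extra      - event keys present only in the migrated release
      mismatches - (key, name, mother_value, migrated_value) tuples for
                   quantities that are absent on one side or differ beyond
                   the given tolerances
    """
    missing = sorted(mother.keys() - migrated.keys())
    extra = sorted(migrated.keys() - mother.keys())
    mismatches = []
    for key in sorted(mother.keys() & migrated.keys()):
        a, b = mother[key], migrated[key]
        for name in sorted(a.keys() | b.keys()):
            if name not in a or name not in b:
                mismatches.append((key, name, a.get(name), b.get(name)))
            elif not math.isclose(a[name], b[name], rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches.append((key, name, a[name], b[name]))
    return missing, extra, mismatches
```

A release would pass this check when all three lists are empty; how tight the tolerance should be is a physics decision the slides leave open.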

  7. Limitations • The only Monte Carlo release we will port is 6.1.4mc • We will port only a selection of releases: those used by at least 1% of the collaboration, as measured by the number of jobs submitted on the farms in the past 6 months
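The 1% selection rule can be stated precisely as a share of recent farm jobs. A sketch under stated assumptions: job records are (submission date, release) pairs, six months is approximated as 183 days, and the separate Monte Carlo restriction (port only 6.1.4mc) is not applied here; `releases_to_port` is a hypothetical name.

```python
from collections import Counter
from datetime import date, timedelta


def releases_to_port(jobs, today, threshold=0.01, window_days=183):
    """Return releases used by at least `threshold` of recent farm jobs.

    `jobs` is an iterable of (submission_date, release) pairs. Only jobs
    submitted within `window_days` before `today` (about six months) count.
    """
    cutoff = today - timedelta(days=window_days)
    counts = Counter(release for when, release in jobs if cutoff <= when <= today)
    total = sum(counts.values())
    if total == 0:
        return []
    # Keep releases whose share of recent jobs meets the threshold.
    return sorted(r for r, n in counts.items() if n / total >= threshold)
```

Measuring usage by job count, as the slide does, favors releases run in many small jobs; weighting by CPU time would be an alternative, but the slide does not propose it.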

  8. Procedure for Release X • Tools are almost ready • Review the patches and modify them for this release • Experts then verify • Produce the X.migrate release • Validate the X.migrate release • Produce the X.m release • At least 2 more releases need to be migrated: development and 6.1.4

  9. Review Patches and Modify • The older the release, the more complicated this becomes • Guru: 2 days • Apprentice: 4 days • If there are invasive code changes, the experts also need to review them • Experts: 2 days • The build uses CPU on the build machines; contention with other builds is possible

  10. Produce X.migrate Release • Apply patches and build • Guru: 1 day • Apprentice: 2 days • If bugs appear, triple these numbers

  11. Validate X.migrate Release • 10K test: 1 day • 100K test: 4 days • Parallel releases are hard to test without a farm • Jobs must be able to specify the OS

  12. Produce X.m Release • Take the patches and build the final release: 1 day • Requires CPU on the build machines; contention with other builds is possible
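Slides 9 to 12 give enough numbers to estimate the elapsed days for one release. A minimal sketch that simply adds the per-step figures, assuming the steps run one after another, the expert review applies only to invasive changes, and build-machine contention adds nothing; `release_effort` is a hypothetical name. Summing the smooth-path guru figures gives 9 days, close to the 10 days per release quoted under Notes on Operations.

```python
# Day counts from the slides "Review Patches and Modify", "Produce X.migrate
# Release", "Validate X.migrate Release" and "Produce X.m Release".
REVIEW_DAYS = {"guru": 2, "apprentice": 4}
MIGRATE_BUILD_DAYS = {"guru": 1, "apprentice": 2}
EXPERT_REVIEW_DAYS = 2    # only when the patches contain invasive code changes
BUG_FACTOR = 3            # "If bugs appear, triple these numbers"
VALIDATION_DAYS = 1 + 4   # 10K test + 100K test
FINAL_BUILD_DAYS = 1      # build the X.m release


def release_effort(who, invasive=False, bugs=False):
    """Estimate elapsed days for one release, assuming strictly serial steps.

    `who` is "guru" or "apprentice"; build-machine contention is ignored.
    """
    days = REVIEW_DAYS[who]
    if invasive:
        days += EXPERT_REVIEW_DAYS
    build = MIGRATE_BUILD_DAYS[who]
    days += build * BUG_FACTOR if bugs else build
    days += VALIDATION_DAYS + FINAL_BUILD_DAYS
    return days
```

The apprentice path comes out 3 days longer than the guru path, which is the trade-off behind the "parallelize patch checking" item under Experience.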

  13. Status as of 21-Feb • Distribution tools are being tested • The 6.1.6.migrate release is being tested by users • A problem has appeared, possibly in the configuration • PerfiDia next: expect 2 weeks to test, plus 2 weeks of personnel delay • The 6.1.4mc.migrate release is in validation • Only a few events so far; so far so good • New Oracle client: needs testing

  14. Experience • We need to lift some of the burden from Lynn • Parallelize patch checking • This requires some additional time from Lynn to answer Apprentice questions • Contention for build machines complicates parallelizing release builds

  15. Notes on Operations • If all goes smoothly, 10 days per requested release • Problems have required 20 unexpected days • Illness etc. not included • If 10 new releases are to be supported: 30 days of Guru time, possibly 20x10 • If 10 new releases are done unparallelized, they will not be finished until mid-June • Remote sites can take on some of the validation testing if the X.migrate release is pullable • Pulling a build takes half a day
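To turn a per-release figure into a calendar date, the serial and parallel cases differ only in how many rounds of releases are needed. A sketch assuming a Monday-to-Friday working week with no holidays, identical durations for every release, and no contention between parallel workers; `add_working_days` and `completion_date` are hypothetical names.

```python
import math
from datetime import date, timedelta


def add_working_days(start, n):
    """Return the date n working days (Mon-Fri) after `start`; holidays are ignored."""
    day = start
    while n > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            n -= 1
    return day


def completion_date(start, releases, days_per_release, workers=1):
    """Finish date when releases are processed `workers` at a time.

    Assumes every release takes the same number of working days and that
    parallel workers never contend for build machines.
    """
    rounds = math.ceil(releases / workers)
    return add_working_days(start, rounds * days_per_release)
```

For example, 10 releases at 10 working days each, started on Monday 1-Mar-2010, finish on 19-Jul-2010 when done serially and on 10-May-2010 with two parallel workers.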

  16. Resources Required (N = number of releases) • Minimum of 3N days of guru time, possibly much more • Minimum of 4N days of validation time, more if tests must be repeated • Less time if more machines can be used • Minimum of 14 days for running ntuple generation and PerfiDia
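The minimums above scale linearly with N, taken here to be the number of releases migrated. A sketch that encodes only the stated lower bounds; `minimum_resources` is a hypothetical name. With N = 10 it gives 30 guru days, matching the figure under Notes on Operations.

```python
def minimum_resources(n_releases):
    """Lower bounds, in days, from the "Resources Required" slide.

    Actual needs may be much larger (repeated validation, bugs, illness).
    """
    return {
        "guru": 3 * n_releases,          # 3N days of guru time
        "validation": 4 * n_releases,    # 4N days of validation time
        "ntuple_and_perfidia": 14,       # fixed cost, independent of N
    }
```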
