
Level 3 Review


Presentation Transcript


  1. Level 3 Review • Short Term • NT Support • Long Term The Level 3 Group: Michael Clements, Doug Chapin, Dave Cutts, Andy Haas, Sean Mattingly, Gordon Watts

  2. Short Term Goal Build an L3 Filter that can handle the data rates between now and September 1 (or start of Linux farm). Bug fixes too. Requires • Faster build/release system • Verification (minimal). • Better filter author access to NT development environment & experts

  3. Faster Build/Release System New Machine • Initial tests indicate a x2 speed up. • Multiple logins allowed (flexibility) Status • Final release system script testing • Porting of packages to NT5 (we know what has to be changed) • Commit to schedule Do not think there is any real development work left here.

  4. Faster Build/Release System Improve Build System Bug Fixes? • Initial experiments indicate ctest_nt x10 faster than gmake • Recent upgrade allows one to run tests inline. Status • Must be incorporated into release structure • How to deal with “legacy” packages that don’t support the CTEST interface. Likely 1 week of development work here, then some amount of integration time. Initially L3 group, then L3 group & release managers

  5. NT5 Port Status • Done • Changes must be fed back into cvs • Some have (SSS) • Most script changes • Less than 10 individual changes • All Scripts • Full Build system must be tested • Most complex parts have been tested (bldpara) • Initial scripts, etc. Less than a week of work for L3 Group, Rel Mgrs. Simple Author Updates

  6. Verification Verification System ~7000 events/hour • The old build system • more effective use of the 4 cpus. • Have money for extra disk • Keep local copies of files. • Initially by hand, later by script Status • Have to use build systems until verification is ready • Just starting (ftp doesn’t like more than 256 character file names!). 2 weeks of development time & setup time. Requires both L3 Group and Filters Group

  7. Better Development Author Node • Maintained environment for users to develop code • No more installs on own machine • Fast machine. Go to Central Processor Model. • Install just too complex • All platforms, particularly NT Status • One up and going. Configuration complete? • Unix access • Second one ordered • $$ left over for third and possibly fourth • So cheap now! No real work left (one works, others will).

  8. Code State State Of Code • Minimal number of changes to get nt41 to build • Looks like nt43 went! • Test-Script Work • Package authors should translate filename locations using cygwin • About half already do. Status • New Tools Coming in (CPS, Tracking) • Most Errors are understood by us • What can we do to help get them fixed? Feedback Loop? Figuring out the error requires very little time
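The filename translation asked of package authors on this slide can be sketched as below. This is a minimal illustration, not the group's actual test-script code: the function name `cygwin_to_windows` and the sample path are hypothetical, and it assumes the `/cygdrive/<letter>/...` mount convention. On a real Cygwin installation the `cygpath -w` utility does this job.

```python
import re

# Matches Cygwin drive paths such as /cygdrive/d/some/dir (assumed convention).
_CYGDRIVE = re.compile(r"^/cygdrive/([a-zA-Z])(/.*)?$")


def cygwin_to_windows(path):
    """Translate a /cygdrive/<letter>/... path into a native NT path.

    Hypothetical helper for illustration; paths that do not start with
    /cygdrive/<letter> are returned unchanged.
    """
    match = _CYGDRIVE.match(path)
    if match is None:
        return path
    drive = match.group(1).upper()
    rest = match.group(2) or "/"
    return drive + ":" + rest.replace("/", "\\")


# Hypothetical file location, used only to show the translation.
print(cygwin_to_windows("/cygdrive/d/releases/nt43/test.dat"))
```

A test script would call such a helper on every file location it hands to a native NT executable, so the same script works whether it is launched from a Cygwin shell or not.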

  9. Expert Access Contacting Us? • Since last week there has been a flood • Almost too much for us to keep up with! • Most questions are one time fixes • Get them fixed, will not pop up again. • Quite gratified with the communication • SMT tracking, Jet Finding, real filtering in the control room! • Many Many people contributed to this! Always room for improvement; ears & Inbox open

  10. Long Term Goal Maintainable and smooth process for building, developing, etc. L3 Executables. Complete verification. Requires • Faster build/release system • Verification • Better filter author access to NT development environment & experts

  11. Build System Improvements • Recommend moving away from SRT_D0 no matter what • Horribly inefficient build system. • On NT, move away from complex cygwin dependencies • Get rid of gmake if at all possible • Base work on ctest_nt • Fast, already well supported by L3 Group Schedule • Proof of principle by July 1. • First production by August 1. • Improvements through fall. 0.5 FTEs for duration

  12. Build System Maintain It • Once stable, deliberately take the view that it need not be modified often • L3 filters have a much more restrictive environment than offline packages. • A few extra features for L3 do not compete with all of L3 • Fewer changes, more stability.

  13. Doing Builds Goal • Make build system boring • Close to Ferbelizing it • Remove hand art. • Will still need someone to do builds continuously • Target: 0.25 FTE! 0.5 down to 0.25 FTEs for first year Decreases until Run 2b upgrade due to reduced changes??

  14. Verification What Can One CPU Do? Email from Amber • Same speed as L3 Farm • Not memory limited, etc. • 100ms per event • 0.25 MB/event L1/L2 Issues • 0.86 million events/day • 2.5 MB/sec • 210 Gig Bug Fixes approximately one day’s worth of processing • 2-4 nodes, dual CPU, data local for speed Or, use production system…
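The single-CPU figures on this slide follow from the two quoted inputs (100 ms per event, 0.25 MB per event). The sketch below reproduces that arithmetic; the function name is illustrative, and it assumes the CPU runs around the clock and that the "Gig" figure counts 1024 MB per GB.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_verification_load(ms_per_event, mb_per_event):
    """Return (events/day, MB/sec, GB/day) for one CPU running all day."""
    events_per_sec = 1000.0 / ms_per_event
    events_per_day = events_per_sec * SECONDS_PER_DAY
    mb_per_sec = events_per_sec * mb_per_event
    # Assumption: 1 GB = 1024 MB, which lands close to the slide's 210 Gig.
    gb_per_day = mb_per_sec * SECONDS_PER_DAY / 1024.0
    return events_per_day, mb_per_sec, gb_per_day


events, rate, disk = daily_verification_load(100, 0.25)
print(f"{events / 1e6:.2f} million events/day")  # 0.86
print(f"{rate:.1f} MB/sec")                      # 2.5
print(f"{disk:.0f} GB/day")                      # 211
```

The result is roughly 0.86 million events, 2.5 MB/sec and about 210 GB for one day of processing, matching the slide.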

  15. Verification Production Release • Order of magnitude larger • 10-20 million events • Terabyte of disk space ($2K?) • SAM Integration

  16. NT Environment Author Nodes • Just implemented • Undoubtedly will need refinement. • Increase number of nodes depending upon demand • Each node is very cheap (l3 group) 0.1 FTEs, but in clumps Maintain Author Nodes 0.1 FTEs for life of experiment (L3 group & Rel Mgrs?)

  17. Software Changes OS Upgrades • MS’s slavish commitment to backwards compatibility has some benefits • Level 3 is a conservative system • Imagine OS upgrades will be about once every two years. Testing: 0.05 FTE (based on NT4->NT5 experience). Implementing: 0.05 FTE Some true no matter what we do

  18. Farm Management OS, general management • Automatic tools (AutoStart, etc.). • Minimal security updates (default deny ACLs). 0.05 FTEs (L3 Group) General Management • Configure changes, etc. • All based on the web True no matter the scheme we pick 0.05 FTEs (L3 Group)

  19. Expert Access Brown & UW Commitment We will maintain the Level 3 Trigger/DAQ no matter what its form while the experiment takes data! Providing NT expertise to the experiment is part of this commitment! 0.1 FTE, in clumps (L3 Group)

  20. Where are we now? Releases • nt44 is an excellent release • Small set of filters in release • Several more poised to come into release. Build System • Works, but slow • ctest_nt improvements with eye to release building • Close to ready to move to build machine Verification • Ignoring L3 Farm Issues • Just starting • Needs our effort • Will have to occur no matter what we do

  21. Stages

  22. Doing Both Now Linux L4 development work June 1 Minimal Set of Filters Sept 1 Sept 1 Debugged Filters + unpacking Linux L4 Farm Initially Ready

  23. Option 2 [Diagram; labels: Lots of CPU • Front End Crates • SB • four L3 Node / Linux Node pairs • Feynman (SAM) • Online C/R • Examines • MCH • FCH]

  24. Possible 3rd Alternative Use Level 3 as a pre-filter/pipeline Raw Bandwidth: 5 MB/sec GB already? Unpacked bandwidth/node: 20 MB/sec Simple Filters and Tools: x5 reduction 200 Hz out Unpacked bandwidth/node: 4 MB/sec Simple Static • Make use of idle CPU power • Maintain what you’ll have by the end of September

  25. Conclusions • Short term effort • Has a well-functioning system by end of summer/fall • Long Term • Supportable and maintainable • What is missing? • Better understanding of verification • If we go the Linux route, we have to support both; not thought through. • Third alternative • Makes use of available CPU • Maintains current investment work.
