D0 Grid Data Production Initiative: Coordination Mtg 9

Version 1.0 (meeting edition) 06 November 2008 Rob Kennedy and Adam Lyon Attending: …. D0 Grid Data Production Initiative: Coordination Mtg 9. Outline. Summary and News Open Action Items: none to call out Deployment “Feature List”: drives what is critical No change since last week


Presentation Transcript


  1. D0 Grid Data Production Initiative: Coordination Mtg 9. Version 1.0 (meeting edition), 06 November 2008. Rob Kennedy and Adam Lyon. Attending: …

  2. Outline
     • Summary and News
     • Open Action Items: none to call out
     • Deployment “Feature List”: drives what is critical
       • No change since last week
     • Task Status (4 slides): most of our time
     • Deployment 1 Planning: start today

  3. Summary and News
     • Summary
       • Umbrella Packages, Installation Manuals: good
       • Post-Install Tests: not good yet; the problems seem more like “add user” issues than “FWD Node” issues
       • A day or so behind schedule; deployment next week
     • News
       • A job did run via FWD4 in the first test!
       • …

  4. Open Action Items (Green = effectively done, Yellow = added notes, Blue = coming week)
     • <none to call out>

  5. Current Deployment “Feature” Lists
     • Deployment 1: Split Data/MC Production Services (NO CHANGE)
       • Time frame: November 13-17, with 1 week+ observation before holidays
       • 1. Config: Basic Splitting of Fwd, Que Services between Data and MC Production, with 2 Fwd nodes assigned to each, plus 1 Fwd dedicated to all Merging
       • 2. Fwd4 deployed (w/o virtualization)
       • 3. Fwd5 deployed
       • 4. Que2 deployed, with client software to enable parallel use of 2 QUE nodes
       • 5. New SAM Station (moved off of FWD1)
       • 6. Condor 7 via “new” 1.10.1m official release from UWisc
       • 7. FileMax increase on all Fwd nodes to handle large nJob actions
       • 8. D0Runjob Upgrade for Data Production: Prerequisite for deploying new SAM-Grid release
     • Deployment 2: Optimize Data and MC Production Configurations (NO CHANGE)
       • Time frame: December 8-10, with 1 week+ observation before holidays
       • 1. Config: Optimize Configurations separately for Data and MC Production, especially to increase Data Production “queue” length
       • 2. New SAM-Grid Release with support for new Job status value at Queuing node
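The Deployment 1 split (2 Fwd nodes for Data, 2 for MC, 1 for Merging) can be written down as a small table and checked mechanically. The sketch below is illustrative only: the slides give the counts but not which forwarding node takes which role, so the node-to-role assignment, the role names, and the `check_split` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical assignment: the slides state only the counts (2 Data, 2 MC,
# 1 Merging), not which forwarding node serves which role.
FWD_ROLES = {
    "fwd1": "data",
    "fwd2": "data",
    "fwd3": "mc",
    "fwd4": "mc",
    "fwd5": "merge",
}

# Counts taken from the Deployment 1 feature list.
REQUIRED_FWD_COUNTS = {"data": 2, "mc": 2, "merge": 1}


def check_split(roles, required):
    """Return a list of problems; an empty list means the split matches."""
    counts = Counter(roles.values())
    problems = []
    for role, want in required.items():
        have = counts.get(role, 0)
        if have != want:
            problems.append(f"{role}: expected {want} node(s), found {have}")
    for role in counts:
        if role not in required:
            problems.append(f"unknown role: {role}")
    return problems


if __name__ == "__main__":
    print(check_split(FWD_ROLES, REQUIRED_FWD_COUNTS) or "split OK")
```

A check like this could run before the configuration step, so that a node left unassigned or double-booked shows up before any service is restarted.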

  6. Task Status (1 of 4) (Red = critical tasks, Green = done, Blue = in progress, Yellow = added notes)
     • 1.1.1 Forwarding Node 4 (Fwd4)
       • <Snip some completed tasks>
       • 1.1.1.13 Fwd4: Install with Version-Based FWD Umbrella Product AL JB Thu 10/30/08 Thu 10/30/08 1d
       • 1.1.1.9 Fwd4: Few Jobs FileMax=As-Is Test AL JB Mon 11/3/08 Wed 11/5/08 3d
       • 1.1.1.10 Fwd4: Pre-Deployment FileMax=16k Test AL JS Thu 11/6/08 Mon 11/10/08 3d
       • 1.1.1.11 Milestone: Fwd4 Ready to Deploy AL Mon 11/10/08 Mon 11/10/08 0d
     • 1.1.2 Forwarding Node 5 (Fwd5)
       • <Snip some completed tasks>
       • 1.1.2.10 Fwd5: Install with Version-Based FWD Umbrella Product AL JB Thu 10/30/08 Thu 10/30/08 1d
       • 1.1.2.7 Fwd5: Few Jobs FileMax=As-Is Test AL JB Mon 11/3/08 Wed 11/5/08 3d
       • 1.1.2.8 Fwd5: Pre-Deployment FileMax=16k Test AL JS Thu 11/6/08 Mon 11/10/08 3d
       • 1.1.2.9 Milestone: Fwd5 Ready to Deploy AL Mon 11/10/08 Mon 11/10/08 0d
     • 1.1.8 FWD and QUE Packaging with Version-Based Umbrella Product
       • <Snip some completed tasks>
       • Milestone: FWD Umbrella Product ready to use "GG,AL" Wed 10/29/08 Wed 10/29/08 0d
       • 1.1.8.6 Umbrella Product: Update FWD Installation Procedure AL JB Fri 11/7/08 Mon 11/10/08 2d
     • Change in scheme: Red = ALL critical tasks for deployment 1 completion.
     • Notes…

  7. Task Status (2 of 4) (Red = critical tasks, Green = done, Blue = in progress, Yellow = added notes)
     • 1.1.8 FWD and QUE Packaging with Version-Based Umbrella Product
       • 1.1.8.7 Umbrella Product: Initial QUE Umbrella Product Release GG PM Thu 10/30/08 Thu 10/30/08 1d
       • 1.1.8.8 Umbrella Product: Rework QUE Installation Procedure AL PM Fri 10/31/08 Fri 10/31/08 1d
       • 1.1.8.9 Milestone: QUE Umbrella Products ready to use GG PM Fri 10/31/08 Fri 10/31/08 0d
       • 1.1.8.12 Umbrella Product: Update QUE Umbrella… AL PM Mon 11/3/08 Mon 11/3/08 0.5d
       • 1.1.8.10 Umbrella Product: Update QUE Installation Procedure AL JB Mon 11/10/08 Tue 11/11/08 2d
       • 1.1.8.13 Umbrella Product: FWD, QUE Installation Procedures archive AL REX Wed 11/12/08 Thu 11/13/08 2d
       • 1.1.8.11 Milestone: FWD and QUE Packaging … done "GG,AL" Thu 11/13/08 Thu 11/13/08 0d
     • 1.1.3 Queuing Node 2 (Que2)
       • <Snip some completed tasks>
       • 1.1.3.12 Que2: Install with Version-Based FWD Umbrella Product AL JB Tue 11/4/08 Tue 11/4/08 1d
       • 1.1.3.10 Que2: Jim_Client 2-QUE Support: Client Deployment AL REX Wed 11/5/08 Wed 11/5/08 1d
       • 1.1.3.8 Que2: Regression Test w/1-QUE Client (skipped by ABa) AL JB Thu 11/6/08 Fri 11/7/08 2d
       • 1.1.3.9 Que2: Integration Test w/2-QUE Client AL JB Mon 11/10/08 Mon 11/10/08 1d
       • 1.1.3.11 Milestone: Que2 Ready to Deploy AL Mon 11/10/08 Mon 11/10/08 0d
     • Notes…

  8. Task Status (3 of 4) (Red = critical tasks, Green = done, Blue = in progress, Yellow = added notes)
     • 1.1.5 New Distinct SAM Station
       • <Snip some completed tasks>
       • 1.1.5.4 SAM Station: Install and Setup Station AL RI? Thu 11/6/08 Fri 11/7/08 2d
       • 1.1.5.5 SAM Station: Pre-Deployment Test AL RI? Mon 11/10/08 Mon 11/10/08 1d
       • 1.1.5.6 SAM Station: Deployment Plan (Deactivate old/Activate new) AL AL Tue 11/11/08 Tue 11/11/08 1d
       • 1.1.5.7 Milestone: SAM Station Ready to Deploy AL Tue 11/11/08 Tue 11/11/08 0d
       • 1.1.5.8 SAM Station: Setup Context Server AL AL Thu 11/13/08 Fri 11/14/08 2d
       • Not done, original resource busy. This is now late and at risk.
       • Notes…
     • 1.1.6 Deployment Stage 1
       • 1.1.6.1 Dep 1: Plan: Split Data/MC Prod Services AL ALL Mon 11/10/08 Wed 11/12/08 3d
       • 1.1.6.2 Deployment 1: Execute AL REX Thu 11/13/08 Mon 11/17/08 3d
       • 1.1.6.3 Deployment 1: Monitor AL REX Tue 11/18/08 Mon 11/24/08 5d
       • 1.1.6.4 Deployment 1: Sign-off AL REX Tue 11/25/08 Tue 11/25/08 1d
       • 1.1.6.5 MILE 1: Deployment 1 Completed AL Tue 11/25/08 Tue 11/25/08 0d
       • Bootstrap this today to work ahead: rough work list and known order/priorities
       • Meeting on Monday (RDK to arrange, I propose 9-10:30am) to work out the details
       • 17 November 2008 is the drop-dead date for deployment; what is deployed by then is what we run for one week before sign-off.
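The durations in the Deployment Stage 1 rows appear to be counted in working days. As a rough cross-check, assuming a plain Monday-to-Friday calendar with no holidays (an assumption, not something the slides state), the listed start and finish dates can be compared against the listed durations. The `working_days` helper is hypothetical.

```python
from datetime import date, timedelta


def working_days(start, finish):
    """Count Monday-Friday days from start to finish, inclusive."""
    days = 0
    current = start
    while current <= finish:
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            days += 1
        current += timedelta(days=1)
    return days


# (task, start, finish, duration in days as listed on the slide)
TASKS = [
    ("1.1.6.1 Plan", date(2008, 11, 10), date(2008, 11, 12), 3),
    ("1.1.6.2 Execute", date(2008, 11, 13), date(2008, 11, 17), 3),
    ("1.1.6.3 Monitor", date(2008, 11, 18), date(2008, 11, 24), 5),
    ("1.1.6.4 Sign-off", date(2008, 11, 25), date(2008, 11, 25), 1),
]

if __name__ == "__main__":
    for name, start, finish, listed in TASKS:
        computed = working_days(start, finish)
        print(f"{name}: computed {computed}d, listed {listed}d")
```

Under that assumption the Execute window (Thursday 13 November to Monday 17 November) spans three working days, and the one-week Monitor window ends the day before the 25 November sign-off.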

  9. Task Status (4 of 4) (Red = critical tasks, Green = done, Blue = in progress, Yellow = added notes)
     • 1.3.1 SAM-Grid Job Status Info
       • <snip some tasks>
       • 1.3.1.4 Upgrade D0Runjob version used by Data Production AL "MD,AL" Thu 10/23/08 Fri 10/24/08 2d
     • 1.3.2 Slow Fwd-CAB Job Transition
       • Note: FileMax change requires a schedd restart (ST). Work into deployment plans.
     • 1.3.3 Improved H/w Uptime
     • 1.4 Metrics
       • nSubmissions plot for Sep ’08 Mike?
       • Ganglia-based D0Farm plot from Keith
     • Notes…
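Because a FileMax change needs a schedd restart, it may help to check each node's current value before the deployment window. The sketch below rests on two assumptions the slides do not confirm: that "FileMax" corresponds to the Linux `fs.file-max` kernel setting (readable at `/proc/sys/fs/file-max`), and that "16k" means 16384. The function names are hypothetical.

```python
# Assumption: "16k" on the slides means 16 * 1024 = 16384.
TARGET_FILE_MAX = 16 * 1024


def parse_file_max(text):
    """Parse the contents of a file-max style file into an integer."""
    return int(text.strip())


def needs_increase(current, target=TARGET_FILE_MAX):
    """True when the current limit is below the target."""
    return current < target


def read_file_max(path="/proc/sys/fs/file-max"):
    # Assumption: FileMax maps to the Linux fs.file-max kernel setting.
    with open(path) as handle:
        return parse_file_max(handle.read())


if __name__ == "__main__":
    try:
        current = read_file_max()
    except OSError as error:
        print(f"could not read file-max: {error}")
    else:
        if needs_increase(current):
            print(f"file-max is {current}; raise it and plan a schedd restart")
        else:
            print(f"file-max is {current}; no change needed")
```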

  10. Deployment 1 Work
     • Rough Work List
       • Verify FWD4-5, QUE2 installed; FWD4-5 FileMax increased
       • FWD1-3 install/upgrade via umbrella package; increase FileMax
       • QUE1 install/upgrade via umbrella package
       • Deactivate SAM station on FWD1
       • Activate new SAM station
       • Configure FWD1-5
       • Configure QUE1-2
       • Test system: Data Production, MC Production, Reco/MC Merge Jobs
       • …
     • Work on Client Side: adapt to use new jim client package
       • …
     • Post-Deployment Work: move context server?
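The rough work list implies some ordering between steps. One way to make that ordering explicit is a dependency map that is sorted topologically. The dependencies below are illustrative assumptions only (the slides leave the detailed order to the planning meeting), and `work_order` is a hypothetical helper built on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

VERIFY = "verify FWD4-5, QUE2 installed"
UPGRADE_FWD = "upgrade FWD1-3 via umbrella package"
UPGRADE_QUE = "upgrade QUE1 via umbrella package"
DEACTIVATE = "deactivate SAM station on FWD1"
ACTIVATE = "activate new SAM station"
CONFIG_FWD = "configure FWD1-5"
CONFIG_QUE = "configure QUE1-2"
TEST = "test system"

# Each step maps to the steps assumed to come before it.
# These dependencies are guesses for illustration, not the agreed plan.
DEPENDS_ON = {
    VERIFY: set(),
    UPGRADE_FWD: {VERIFY},
    UPGRADE_QUE: {VERIFY},
    DEACTIVATE: {UPGRADE_FWD},
    ACTIVATE: {DEACTIVATE},
    CONFIG_FWD: {UPGRADE_FWD},
    CONFIG_QUE: {UPGRADE_QUE},
    TEST: {ACTIVATE, CONFIG_FWD, CONFIG_QUE},
}


def work_order(depends_on):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for number, step in enumerate(work_order(DEPENDS_ON), start=1):
        print(f"{number}. {step}")
```

Steps with no dependency between them (for example the FWD and QUE upgrades) may come out in either order, which matches the "rough" nature of the list.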
