
Run II Experiments and the Grid Amber Boehnlein Fermilab September 16, 2005



Presentation Transcript


  1. Run II Experiments and the Grid Amber Boehnlein Fermilab September 16, 2005

  2. DØ Status
     • DØ is running SAMGrid for MC production and reprocessing
     • SAMGrid is a first-generation production system
     • Typical configuration, installation and robustness issues are being addressed
     • The LCG-SAMGrid interoperability prototype is going well
     • An OSG resource selector will be developed to provide functionality similar to that available with LCG
     Run II Computing Review

  3. CDF Status
     • CDF has prototype grid job submission based on the CDF Analysis Facility (CAF) that uses Condor Glide-in
     • Running well and usefully in “owner/operator” mode on a few sites
     • Does not have integrated data handling
     • May not be handling tarballs
     • Requires installation on a head node, and outbound node connectivity
     • Has some legacy security policies to address
     • CAF is Kerberos based

  4. Why?
     • Glide-in technology is attractive in many ways.
     • There is always a certain appeal in the next great thing.
     • Illustrative of a general tension for the Run II experiments
     • Competing agendas: it is difficult for CDF to turn down effort. The Italians support Glide-CAF.
     • CDF wants to do analysis on the Grid, and they do not want the user interface to change.
     • That requirement could probably have been met in other ways; however, CDF is also vested in the CAF as a model.
     • Ultimately probably beneficial to both CDF and DØ
     • If Glide-in works in production on a reasonable time scale, might be able to use it
     • Support for VO-specific services is a motivation for the Edge Services pre-proposal for OSG; Edge Services will almost certainly benefit DØ

  5. Run II computing in the LHC Era
     • Grid is the strategic direction for FNAL CD to meet commitments to Run II, CMS and other stakeholders.
     • The 2005 Run II computing review complimented DØ and CDF on moving towards grid models
     • Run II effort task force acknowledges the strategy
     • Concerns about
       • Availability of resources, especially disk
       • Urged to make more formal agreements
       • “Expenses” involved in operating a production Grid
       • Heavyweight and nonstandard interfaces on the production system
       • Real-world issues for the prototype
     • Mitigations
       • DØ and FNAL CD are proposing an installation team, supported by the review
       • Move towards standard, more robust interfaces
       • Guest Scientist positions could be used to leverage knowledge and expertise, particularly in cases where physics potential would also be leveraged.

  6. OSG Pre-Proposals
     • The OSG pre-proposal call was targeted at core functionality
     • SAMGrid was built with the support of PPDG funds.
     • Noted that a service without customers is of limited use.
     • Some calls to work closely with TeraGrid.
     • Still working through details for a full proposal
     • Encouraged to make a proposal for an OSG that will thrive!

  7. Summary

  8. Run II Department Roles
     • Operations: running the systems, standing pager rotations/shifts, researching the latest technologies
       • Purchasing and deploying equipment
       • Tracking down and fixing problems
       • Code management
     • Development: exploring use cases, writing code, introducing new features, testing, documenting, exploring technologies
     • Integration: testing, more testing, training users, transition from development to operations
     • Planning: how best to use resources to meet stakeholder needs, facility issues
     • Interfacing: serving in experiment management roles, bridging the CD and the experiments and CD department to CD department, hosting guest scientists
     • Participating in physics analysis as collaboration members; 30% of department FTEs hold scientific positions

  9. Risks, expanded
     • Increased calls on FNAL CD as effort and equipment migrate to the LHC
     • Declining equipment and operations budgets are already limiting the data collection rate.
     • Over time, limits in the equipment and operating budget will create delays
     • Operational performance of user code
       • DØ reconstruction code performance and release turn-around
       • CDF user code has caused inefficiencies on the CAF
     • COTS Computing
       • Experiments need the best price/performance, which introduces risk.
       • Moore’s law
       • Have a good process in place for evaluation, purchase and acceptance.
       • Each purchase of worker nodes presents challenges
       • FNAL CD plays the engineering/integrator role by default
       • Commodity fileservers are maintenance intensive

  10. Risks, expanded
     • Data Handling
       • SAM system, dCache and hardware are working well
       • User patterns are still evolving; there are sometimes conflicts between wanting to get results out and using standard production.
       • Scaling with data sample size might have unanticipated consequences.
       • Counting on next-generation tape drives to mitigate tape costs
     • Longevity of hardware components and software applications
       • Starting to use a 4-year replacement cycle for worker nodes, so the equipment is off warranty in the final year.
       • 5-year life cycle on major components; replacement will be needed again around 2010, when the budget for Run II will be extremely limited.
       • Migrating either experiment from its existing mode of operation or user interfaces would be time intensive and costly.
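
The replacement-cycle arithmetic on slide 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the department's planning tool: the purchase year of 2005 is taken from the date of the talk, the 3-year worker-node warranty is inferred from "off warranty the final year" of a 4-year cycle, and the `Purchase` class, its field names and the 5-year warranty on major components are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    kind: str            # hypothetical label for the equipment class
    year_bought: int     # assumed purchase year
    lifetime_years: int  # years in service before replacement
    warranty_years: int  # assumed warranty length


def replacement_year(p: Purchase) -> int:
    """Year in which the equipment is due to be replaced."""
    return p.year_bought + p.lifetime_years


def off_warranty_years(p: Purchase) -> list[int]:
    """Years in which the equipment is still in service but out of warranty."""
    return list(range(p.year_bought + p.warranty_years, replacement_year(p)))


# Assumption: bought in 2005; 3-year warranty inferred from the slide.
worker_nodes = Purchase("worker nodes", 2005, lifetime_years=4, warranty_years=3)
# Assumption: bought in 2005; the warranty length here is purely illustrative.
major_components = Purchase("major components", 2005, lifetime_years=5, warranty_years=5)

print(replacement_year(worker_nodes))      # 2009
print(off_warranty_years(worker_nodes))    # [2008]
print(replacement_year(major_components))  # 2010
```

Under these assumptions, worker nodes bought in 2005 run unprotected for one year (2008), and major components bought in 2005 come due in 2010, which matches the date given on the slide.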
