Israel ATLAS TIER-2 Status April 2011 Lorne Levinson

Presentation Transcript


  1. Israel ATLAS TIER-2 Status April 2011 Lorne Levinson Israel ATLAS Tier2 Status

  2. Israel HEP community
     • ATLAS is the only LHC experiment in which we participate
       • also Phenix (Heavy Ion @ BNL), ILC, ZEUS
     • Israel is “1.35% of ATLAS” (MoU pledge, authors, common fund)
     • 25-30 people doing physics analysis
     • 3 sites:
       • Tel Aviv University, Tel Aviv (1956)
         • a university
       • The Technion Israel Institute of Technology, Haifa (1924)
         • a university
       • Weizmann Institute of Science, Rehovot (1934)
         • a research institute (Biology, Chemistry, Physics, Math & CS) with graduate school (no undergrads)
     • longest travel is Weizmann ↔ Technion: 2 hours office-to-office

  3. Organization
     • we are a distributed Tier2/Tier3
       • each site combines Tier2 and Tier3 resources in the same cluster
       • all resources shared flexibly between T2 and T3 (Lustre/Storm)
     • single management and budget, single purchasing
     • three sites as identical as possible
     • Steering Committee for overall policy
     • Management & Operations team for the three sites
     • stable funding approved until 2012

  4. Storage
     Continues to be the biggest reliability issue.
     • Our hardware is now stable:
       • replaced DDN 6620’s with DDN 9900
       • Fully redundant, 300 disk slots, 8x8Gb/s FC ports → 5GB/s
       • two Lustre “OSS” servers
       • WI servers have 10Gb/s to the cluster; TAU and Technion will install 10G in April
     • Gave up on Thumpers+Lustre and Thumpers+iSCSI+Lustre.
       • We NFS mount Thumpers with Solaris+ZFS for extra "archive" storage, home directories or /opt/exp_soft
     • Lustre + Storm → problem is that the Storm team does not test new Storm releases on Lustre
       • Storm-Lustre community must solve this
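As a rough sanity check on the controller figures above, the nominal aggregate of the Fibre Channel ports can be worked out in a few lines. This is a back-of-the-envelope sketch, not a measurement: it assumes each "8Gb/s" port carries a nominal 8 Gbit/s, ignores encoding and protocol overhead, and the function name is made up for illustration.

```python
def aggregate_gbytes_per_s(ports: int, gbits_per_port: float) -> float:
    """Nominal aggregate bandwidth in GB/s, ignoring encoding/protocol overhead."""
    return ports * gbits_per_port / 8.0  # 8 bits per byte


# 8 ports at a nominal 8 Gbit/s each (assumption: nominal line rate per port)
nominal = aggregate_gbytes_per_s(ports=8, gbits_per_port=8.0)
quoted = 5.0  # GB/s, the figure given on the slide

print(f"nominal ceiling: {nominal:.1f} GB/s, quoted: {quoted:.1f} GB/s")
```

Under these assumptions the quoted 5GB/s sits below the nominal port ceiling, which is what one would expect of a sustained figure.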

  5. Storm/Lustre
     • Storm allows LCG SRM storage and our local global file name space to share the same physical storage.
       • No rigid boundary
       • Jobs in cluster can do Linux file I/O to read SRM files
     • Storm can run over Lustre (open source) or GPFS (IBM)
     • Lustre:
       • Object Storage Targets serve (stripes of) file data
       • Meta-Data Server holds directories
       • redundant failover of MDS’s will soon be supported
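The "stripes of file data" idea can be illustrated with a simplified round-robin (RAID-0 style) model of how a byte offset in a file maps onto the objects held by the Object Storage Targets. This is a sketch of the general technique only: it calls no Lustre API, the function name is hypothetical, and the stripe size and stripe count are illustrative values rather than this site's configuration.

```python
from typing import Tuple


def locate(offset: int, stripe_size: int, stripe_count: int) -> Tuple[int, int]:
    """Map a file offset to (object index, offset inside that object)
    under plain round-robin striping."""
    stripe_number = offset // stripe_size          # which stripe of the file
    obj_index = stripe_number % stripe_count       # which object/OST holds it
    # Each object stores every stripe_count-th stripe back to back.
    obj_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return obj_index, obj_offset


MIB = 1024 * 1024
# Illustrative layout: 1 MiB stripes spread over 4 objects.
for off in (0, MIB, 4 * MIB, 5 * MIB + 10):
    print(off, locate(off, stripe_size=MIB, stripe_count=4))
```

Consecutive stripes land on different objects, so a large sequential read is served by several storage servers in parallel, while the metadata server is only consulted for the directory and layout information.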

  6. Storage – installed SRM + local capacity, net TB (table not reproduced in this transcript)

  7. Group disks
     • We are hosting four ATLASGROUPDISK areas
       • Muon performance (Technion)
       • Top (Weizmann)
       • Heavy Ion (Weizmann)
       • Standard Model (TAU) (empty)

  8. CPU
     • Last purchase was dual Intel E5520 quad-core
     • May delivery purchase is dual Intel X5650 hex-core
       • again 4 motherboards per 2U box with redundant power supply
     • We benefit a lot from other groups placing some cores in our cluster:
       • Weizmann: ATLAS+Phenix/Heavy-Ion, HEP Theory, Condensed matter
       • Technion: HEP Theory and Bio-informatics
       • TAU: HEP Theory

  9. Services nodes
     Virtualize most services
     • Two 8-core servers, 48GB
     • Failover
     • Easier management
     • VM images
     • Roll-back
     • Image sharing
     • Easier testing: temp machines
     • May delivery of HW
     • Deciding among: VMware, Xen, Citrix, KVM
     • SE not included

  10. Networking
      Our networking is not good
      • Geant connection is 2 x 1.5G (subscribed on 2 x 2.5G infrastructure)
      • “Political” limits: TAU 500M, Technion 350M, WI 400M
        • Because a 1G line is shared with institute traffic and the shared router is not really able to do 1G duplex
      • We suspect that the gross mismatch with SARA/NIKHEF’s 10G causes failed connections due to dropped packets.
        • Lowering the # of files & streams to avoid dropped packets leaves us with even worse net BW
      • Expensive because it is an undersea fiber and one (Italian) company owns the fibers.
        • An Israeli competitor is installing another fiber now
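To give a feel for what the per-site caps above mean in practice, here is a small estimate of how long a dataset takes to move at each limit. It is a sketch under stated assumptions: "M" and "G" are read as Mbit/s and Gbit/s, the link is assumed to be fully and exclusively used (an upper bound on real performance), units are decimal, and the 1 TB dataset size is an arbitrary illustration.

```python
# Per-site caps from the slide, read as Mbit/s (assumption)
CAPS_MBPS = {"TAU": 500, "Technion": 350, "WI": 400}


def transfer_hours(size_tb: float, rate_mbps: float) -> float:
    """Hours to move size_tb terabytes at rate_mbps, with no overhead or contention."""
    bits = size_tb * 8e12                # 1 TB = 8e12 bits (decimal units)
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 3600


for site, cap in CAPS_MBPS.items():
    print(f"{site}: {transfer_hours(1.0, cap):.1f} h per TB at {cap} Mbit/s")

print(f"full 1G line: {transfer_hours(1.0, 1000):.1f} h per TB")
```

Even in this idealized case the capped sites need roughly two to three times as long per terabyte as a fully usable 1G line would; dropped packets and reduced stream counts only widen the gap.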

  11. Networking (diagram not reproduced in this transcript)

  12. GEANT (diagram not reproduced in this transcript)

  13. Networking plans
      May 2011(?):
      • Increase international connection from 3Gb/s to 4Gb/s.
        • 5G might be possible later this year, but not budgeted.
      • Replace old routers at entrances to institutes with 10G-capable equipment.
        • This should increase our throughput and reliability and allow us to actually use a major share of the 1G BW to the sites
      • Negotiating 10G academic backbone
      • Could have 10G to Geant in spring 2012

  14. SAM/NAGIOS
      • Our NGI did not take on the SAM/NAGIOS monitoring responsibility
      • After the new NAGIOS tests replaced SAM tests, we received no alerts on failed tests.
        • This was a severe problem
      • Finally, in December, EGI, our NGI and we agreed that we would deploy a NAGIOS test service for Israel until our NGI is able to do so.
        • The only functioning grid sites in Israel are our 3 ATLAS sites
      • Our NAGIOS service was up and running in January.

  15. Upcoming work
      • Deploy Zenoss fabric and service monitor on all three clusters
        • currently in test at Weizmann
      • Deploy Puppet configuration system on all three clusters
        • We gave up on Quattor after having finally succeeded in getting it to run
        • Clear that it was unsustainable
        • Currently used for worker nodes at Weizmann
        • Needs to include gLite nodes
      • Virtualization of services (excl. SE)
      • Address Storm “untested new version” problem

  16. End
