Site Report – US CMS T2 Workshop
Samir Cury, on behalf of the T2_BR_UERJ team
Servers' hardware profile
• SuperMicro machines
• 2 x Intel Xeon dual-core @ 2.0 GHz
• 4 GB RAM
• RAID 1 – 120 GB HDs
Nodes hardware profile (40)
• Dell PowerEdge 2950
• 2 x Intel Xeon quad-core @ 2.33 GHz
• 16 GB RAM
• RAID 0 – 6 x 1 TB hard drives
• CE resources
  • 8 batch slots
  • 66.5 HS06
  • 2 GB RAM / slot
• SE resources
  • 5.8 TB usable for dCache or Hadoop (private network only)
Nodes hardware profile (2 + 5)
• Dell R710 (2 of them are Xen servers, not worker nodes)
• 2 x Intel Xeon quad-core @ 2.4 GHz
• 16 GB RAM
• RAID 0 – 6 x 2 TB hard drives
• CE resources
  • 8 batch slots (or more?)
  • 124.41 HS06
  • 2 GB RAM / slot
• SE resources
  • 11.8 TB for dCache or Hadoop (private network only)
First-phase nodes profile (82)
• SuperMicro servers
• 2 x Intel Xeon single-core @ 2.66 GHz
• 2 GB RAM
• 500 GB hard drive + 40 GB hard drive
• CE resources
  • Not used – old CPUs & low RAM per node
• SE resources
  • 500 GB per node
Plans for the future – Hardware
• Buying 5 more Dell R710s
• Deploying 5 R710s when the disks arrive
• 80 more cores
• 120 TB more storage
• 1,244 HS06 more in total
• CE – 40 PE 2950 + 10 R710 = 400 cores, ~3.9 kHS06 (breakdown below)
• SE – 240 + 120 + 45 = 405 TB
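For reference, the ~3.9 kHS06 CE total follows from the per-node figures on the earlier slides:

    40 \times 66.5 + 10 \times 124.41 \approx 2660 + 1244 \approx 3904\ \mathrm{HS06} \approx 3.9\ \mathrm{kHS06}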
Software profile – CE
• OS – CentOS 5.3, 64-bit
• 2 OSG gatekeepers, both running OSG 1.2.x
• Maintenance tasks eased by redundancy – fewer downtimes
• GUMS 1.2.15
• Condor 7.0.3 for job scheduling
Software profile – SE
• OS – CentOS 4.7, 32-bit
• dCache 1.8
• 4 GridFTP servers
• PNFS 1.8
• PhEDEx 3.2.0
Plans for the future – Software/Network
• SE migration
  • Right now we use dCache/PNFS; we plan to migrate to BeStMan/Hadoop
  • Some of this effort is already producing results
  • Next steps: add the new nodes to the Hadoop SE, migrate the data, then test with a real production environment (jobs and users accessing it) – see the sketch below
• Network improvement
  • RNP (our network provider) plans to deliver a 10 Gbps link to us before the next SuperComputing conference
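A minimal sketch of the kind of spot check such a migration test could use, assuming the standard dCache (dccp) and Hadoop command-line clients; the door hostname and paths below are hypothetical placeholders, not the site's actual namespace:

    # Hypothetical example only -- hostname and paths are placeholders.
    # Copy one file out of the dCache SE...
    dccp dcap://dcache-door.example.br/pnfs/example.br/data/cms/test.root /tmp/test.root

    # ...stage it into the Hadoop SE and confirm it arrived with the expected size.
    hadoop fs -put /tmp/test.root /store/test/test.root
    hadoop fs -ls /store/test/test.root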
T2 analysis model & associated physics groups
• We have reserved 30 TB for each of these groups:
  • Forward Physics
  • B-Physics
• Studying the possibility of reserving space for Exotica
• The group has had several MSc & PhD students working on CMS analysis for a long time – these users get good support
• Some Grid users submit jobs, sometimes run into trouble and give up without asking for support
Developments
• A Condor mechanism based on suspension that gives priority to a small pool of important users (sketched below):
  • 1 pair of batch slots per core
  • When a priority user's job arrives, it pauses the normal job on the paired batch slot
  • Once the priority job finishes and vacates its slot, the paired job automatically resumes
  • Documentation can be made available to anyone interested
  • Developed by Diego Gomes
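A minimal sketch of how this kind of slot pairing can be expressed in a Condor startd configuration, assuming an 8-core node with slot N paired to slot N+8. This is not the actual mechanism as implemented by Diego Gomes; the user list, slot layout and expressions below are illustrative assumptions.

    # Illustrative sketch only -- not the site's actual configuration.
    # Advertise two slots per physical core: slots 1-8 take normal jobs,
    # slots 9-16 are reserved for the priority users.
    NUM_CPUS         = 16
    SLOT_TYPE_1      = cpus=1
    NUM_SLOTS_TYPE_1 = 8
    SLOT_TYPE_2      = cpus=1
    NUM_SLOTS_TYPE_2 = 8

    # Publish each slot's State into the other slots' ads as slot<N>_State,
    # so a normal slot can see whether its priority partner is busy.
    # (Older Condor releases call this knob STARTD_SLOT_EXPRS.)
    STARTD_SLOT_ATTRS = State

    # Hypothetical list of priority users.
    IsPriorityUser = ( TARGET.Owner == "priority_user1" || TARGET.Owner == "priority_user2" )
    # Priority slots (9-16) only accept jobs from those users.
    START = (SlotID <= 8) || $(IsPriorityUser)

    # Suspend rather than preempt, and resume when the partner slot frees up.
    # Written out here for one pair (slot1 / slot9); the real policy repeats
    # the same pattern for every pair.
    WANT_SUSPEND = True
    SUSPEND  = (SlotID == 1) && (slot9_State =?= "Claimed")
    CONTINUE = (SlotID == 1) && (slot9_State =!= "Claimed")

The point of suspension, as opposed to ordinary preemption, is that the normal job stays resident and simply resumes where it left off once the priority job vacates the paired slot.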
Developments
• Condor4Web
  • Web interface for visualizing the Condor queue
  • Shows grid DNs
  • Useful for Grid users who want to know how their jobs are being scheduled inside the site – http://monitor.hepgrid.uerj.br/condor (see the example below)
  • Available at http://condor4web.sourceforge.net
  • Still has much room to evolve, but already works
  • Developed by Samir
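For context, the kind of information such a page shows (job id, owner, status, grid DN) can be pulled straight from condor_q; the command below is only an illustration of the idea, not how Condor4Web itself is implemented. The x509userproxysubject job attribute carries the grid DN for proxy-based jobs.

    # Illustration only: list job id, local owner, status code and grid DN.
    condor_q -global \
      -format "%d." ClusterId -format "%d  " ProcId \
      -format "%s  " Owner    -format "%d  " JobStatus \
      -format "%s\n" x509userproxysubject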
CMS Centre @ UERJ
• During LISHEP 2009 (January) we inaugurated a small control room for CMS at UERJ
Shifts @ CMS Centre
• Our computing team has participated in tutorials and we now have four potential CSP shifters
CMS Centre (quick) profile
• Hardware
  • 4 Dell workstations with 22" monitors
  • 2 x 47" TVs
  • Polycom SoundStation
• Software
  • All conferences, including those with the other CMS Centres, are done via EVO
Cluster & Team
• Alberto Santoro (General supervisor)
• Andre Sznajder (Project coordinator)
• Eduardo Revoredo (Hardware coordinator)
• Jose Afonso (Software coordinator)
• Samir Cury (Site admin)
• Fabiana Fortes (Site admin)
• Douglas Milanez (Trainee)
• Raul Matos (Trainee)
2009/2010 goals
• In 2009 we worked mostly on getting rid of infrastructure problems:
  • Insufficient electrical capacity
  • Air conditioning – many downtimes due to this
  • These are solved now
• Despite those problems, we were:
  • Running official production on small workflows
  • Doing private production & analysis for local and Grid users
• 2010 goals
  • Use the new hardware and infrastructure for a more reliable site
  • Run heavier workflows and increase participation and presence in official production
Thanks!
• I want to formally thank Fermilab, USCMS and OSG for their financial help in bringing a UERJ representative here
• I also want to thank USCMS for this very useful meeting