UTA Site Report Jae Yu Univ. of Texas, Arlington 6th DOSAR Workshop University of Mississippi Apr. 17 – 18, 2008
Introduction • UTA is a partner of ATLAS SWT2 • Actively participating in ATLAS production • Kaushik De is co-leading Panda development • Phase I implementation at UTACC completed and running • Phase II implementation at the Physics and Chemistry Research Building completed • MonALISA-based OSG Panda monitoring back online • DDM monitoring project completed • Working on Achelois, the ATLAS Operations Support System • HEP group working with other disciplines on shared use of existing computing resources • Interacting with the campus HPC community • Working with HiPCAT, the Texas grid community • Co-Linux Condor cluster setup began but is going very s-l-o-w-l-y
UTA DPCC – The 2003 Solution • UTA HEP-CSE + UTSW Medical joint project through NSF MRI • Primary equipment for D0 re-reconstruction and MC production up to 2005 • Now primarily participating in ATLAS MC production and reprocessing as part of SWT2 resources • Other disciplines also use this facility, but at a minimal level • Biology, Geology, UTSW Medical, etc. • Hardware Capacity • PC-based Linux system assisted by some 70TB of IDE disk storage • 3 IBM PS157 Series shared-memory systems • Now being turned into a T3 cluster
UTA – DPCC • 84 P4 Xeon 2.4GHz CPUs = 202 GHz • 5TB of FBC + 3.2TB IDE internal • GFS file system • 100 P4 Xeon 2.6GHz CPUs = 260 GHz • 64TB of IDE RAID + 4TB internal • NFS file system • Total CPU: 462 GHz • Total disk: 76.2TB • Total memory: 168GB • Network bandwidth: 68Gb/sec • HEP – CSE joint project • DØ + ATLAS • CSE research
UTA Tier3 • Conversion of UTA_DPCC • Commissioned 2003 • NSF-MRI grant between CSE, HEP, UT-SWM • Shared resource • 160 cores (Dual Processor 32bit Xeon) • 1GB RAM / core • 45 TB NAS based storage • Rocks 4.3 with SLC 4.6 • Several head nodes
Tier 3 specific customizations • Exploring a Scalla topology • Provide rootd storage access to desktops • Provide a platform for experimenting with PROOF (see the PyROOT sketch below) • Testbed for T2–T3 interactions • Still maintaining production activities • Panda development
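As a rough illustration of what these Tier 3 customizations aim to enable, below is a minimal PyROOT sketch of a desktop user reading a file through a Scalla/xrootd redirector and attaching to a PROOF master. The hostnames and file path are hypothetical placeholders, not the actual UTA Tier3 configuration.

```python
# Minimal PyROOT sketch (placeholder hostnames/paths): read a file served by
# an xrootd redirector and attach to a PROOF master for interactive analysis.
import ROOT

# Open a file over xrootd; redirector and path are illustrative only.
f = ROOT.TFile.Open("root://tier3-redirector.uta.edu//atlas/user/test.root")
if f and not f.IsZombie():
    f.ls()  # list file contents as a quick connectivity check

# Attach to a (hypothetical) PROOF master and report the worker count.
proof = ROOT.TProof.Open("proof-master.uta.edu")
if proof:
    print("PROOF workers:", proof.GetParallel())
```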
SWT2 • Joint effort between UTA, OU, LU and UNM • 2,000 ft² in the new building • Designed for 3,000 1U nodes • Could go up to 24k cores • 1MW total power capacity • Cooling with 5 Liebert units
UTA SWT2 Clusters • UTA_SWT2 (Phase I) • Supporting MC Production only • Initial deployment of ATLAS project funds • SWT2_CPB (Phase II) • Supporting MC production and analysis • Main cluster for UTA • Project Funded • UTA_DPCC • Supporting MC production and local analysis • Tier3 prototype • Supporting production system software development • NSF MRI Grant
Installed SWT2 Phase I Equipment • 160-node cluster (Dell SC1425) • 320 cores (3.2GHz Xeon EM64T) • 2GB RAM/core • 160GB SATA local disk drive • 8 head nodes (Dell 2850) • Dual 3.2 GHz Xeon EM64T • 8GB RAM • 2x 73GB (RAID1) SCSI storage • 16TB storage system • DataDirect Networks S2A3000 system • 80x250GB SATA drives • 6 I/O servers • IBRIX Fusion file system • Dedicated internal storage network (Gigabit Ethernet) • Has been operating and conducting Panda production for over a year
UTA_SWT2 (Phase I) Services • OSG 0.8 Compute Element • ATLAS specific Gatekeeper • Local Replica Catalog • DQ2 Site-services (supports all of SWT2) • Redundant GUMS server for UTA resources
SWT2 Phase II (CPB) Equipment • 50-node cluster (SC1435) • 200 cores (2.4GHz dual Opteron 2216) • 8GB RAM (2GB/core) • 80GB SATA disk • 2 head nodes • Dual Opteron 2216 • 8GB RAM • 2x73GB (RAID1) SAS drives • 75TB (raw) storage system • 10x MD1000 enclosures • 150x500GB SATA disk drives • 8 I/O nodes • dCache will be used for aggregating storage • 10Gb internal network capacity • In service • Future purchases will focus more on storage
SWT2_CPB Services • OSG 0.8 Compute Element • Local Replica Catalog • GUMS server for all UTA Resources
ATLAS SWT2 Phase I and II • SWT2-PH1 @ UTACC • 320 Xeon 3.2 GHz cores = 940 GHz • 2GB RAM/core = 640GB • 160GB internal/unit = 25.6TB • 8 dual-core server nodes • 16TB of storage assisted by 6 I/O servers • Dedicated Gbit internal connections • SWT2-PH2 @ UTA CPB • 200 Opteron 3.2 GHz cores = 640 GHz • 2GB RAM/core = 400GB • 80GB SATA/unit = 4TB • 8 dual-core server nodes • 75TB of storage in 10 Dell MD1000 RAID enclosures assisted by 8 I/O servers • Dedicated Gbit internal connections
SWT2 Growth (courtesy M. Ernst, US ATLAS Transparent Distributed Facility Workshop, 3/04/08)
ATLAS SWT2 Capacity • Current capacity for project-funded resources • Total CPU ~ 1,200K SI2K • Total disk ~ 92TB (usable) • Additions to SWT2_CPB • Total CPU ~ 2,200K SI2K • Total disk ~ 300TB (usable) • Future purchases will be weighted more toward storage growth
Network Capacity History at UTA • Had DS3 (44.7 Mbits/sec) until late 2004 • Choked the heck out of the network for about a month downloading D0 data for re-reconstruction (see the transfer-time sketch below) • Met with the VP of Research at UTA and emphasized the importance of the network backbone for attracting external funds • Increased to OC3 (155 Mbits/s) in early 2005 • OC12 as of early 2006 • Connected to NLR (10Gb/s) through LEARN (http://www.tx-learn.org/) via a 1Gb connection to NTGP • $9.8M ($7.3M for optical fiber network) in state of Texas funds approved in Sept. 2004 • Most areas on LEARN are lit • The "last mile" connection problem still exists
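To put these link speeds in perspective, here is a back-of-the-envelope sketch; the 10 TB dataset size is an illustrative assumption, not the actual D0 re-reconstruction volume.

```python
# Idealized transfer times for an assumed 10 TB dataset at the link speeds
# mentioned above (ignores protocol overhead and competing traffic).
LINKS_MBPS = {
    "DS3": 44.7,       # Mbit/s
    "OC3": 155.0,
    "OC12": 622.0,
    "1 Gb/s (LEARN uplink)": 1000.0,
}

DATASET_TB = 10.0                      # assumed size, for illustration only
dataset_bits = DATASET_TB * 1e12 * 8   # terabytes -> bits

for name, mbps in LINKS_MBPS.items():
    seconds = dataset_bits / (mbps * 1e6)
    print(f"{name:>22}: {seconds / 86400:5.1f} days")
```

Under this assumption, the DS3 link alone needs roughly three weeks for the dataset, consistent with the month-long saturation described above.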
LEARN Status http://www.tx-learn.org/networkmap.cfm
NLR – National LambdaRail • 10Gb/sec connections • Regional networks shown: ONENET, LONI, LEARN
Software Development Activities • MonALISA-based ATLAS distributed analysis monitoring (reporting sketch below) • A good, scalable system • Software development and implementation completed • ATLAS-OSG sites are on the LHC Dashboard • New server brought back into service for OSG at UTA • Completed a DDM monitoring project • Working on Achelois, the ATLAS Operations Support System
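For concreteness, here is a minimal sketch of the kind of metric reporting a MonALISA-based monitor depends on, assuming the ApMon Python client is installed; the destination host, cluster, node, and parameter names are placeholders rather than the actual UTA/OSG monitoring configuration.

```python
# Minimal ApMon sketch: publish a few job-count metrics to a MonALISA
# service.  Requires the ApMon Python client (apmon module); the destination
# and parameter names below are illustrative placeholders.
import apmon

# Destination "host:port" of a MonALISA service (placeholder).
apm = apmon.ApMon(["monalisa.example.org:8884"])

# Report a snapshot of Panda job states for a hypothetical site/queue.
apm.sendParameters(
    "UTA_SWT2_jobs",            # cluster name as it will appear in MonALISA
    "gatekeeper01",             # node name (placeholder)
    {"jobs_running": 220, "jobs_queued": 45, "jobs_failed": 3},
)

apm.free()  # flush and stop ApMon's background threads
```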
CSE Student Exchange Program • Joint effort between HEP and CSE • David Levine, Gergely Zaruba and Manfred Huber are the primary contacts at CSE • A total of 10 CSE MS students have worked in the SAM-Grid team • Five generations of students • Many of them playing leading roles in the grid community • Abhishek Rana at UCSD • Parag Mhashilkar at FNAL • Sudhamsh Reddy worked for UTA and is back in the Ph.D. program • New program with BNL implemented • First student completed the tenure and is in on-the-job training • Second set of two Ph.D. students at BNL, completing their tenure • Participating in the ATLAS Panda monitoring project • One student working on a pilot factory using Condor glide-ins (submit sketch below) • Finalizing the integration into the regular service • Facing some budget cut issues here
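The pilot-factory work mentioned above builds on Condor glide-ins. The sketch below shows only the general mechanism: a grid-universe submit description handed to condor_submit, with a placeholder gatekeeper contact and startup script. It is not the actual Panda pilot factory code.

```python
# Sketch of the glide-in mechanism behind a pilot factory: write a Condor
# grid-universe submit description that sends a glide-in startup script to a
# site gatekeeper, then hand it to condor_submit.  Gatekeeper and script
# names are placeholders.
import subprocess
import tempfile

SUBMIT_TEMPLATE = """\
universe      = grid
grid_resource = gt2 gatekeeper.example.edu/jobmanager-condor
executable    = glidein_startup.sh
output        = glidein.$(Cluster).out
error         = glidein.$(Cluster).err
log           = glidein.log
queue 1
"""

with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as sub:
    sub.write(SUBMIT_TEMPLATE)
    submit_file = sub.name

# condor_submit must be available on the submit host.
subprocess.run(["condor_submit", submit_file], check=True)
```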
The LAW Project • The Co-Linux based Condor system OU established serves as the model (see the pool-status sketch below) • Had a meeting between the OU team and UTA IT management in Feb. during the HiPCAT meeting • We are very grateful to the OU team for being willing to come and help • A team of UTA IT personnel has been designated to work on this project, looking into various aspects • The Head of Academic Computing Services is the lead • Had a meeting with her two weeks ago • A new 28-machine lab in Engineering will serve as the trial case • IT wants to start with about 5 machines first • IT will manage and support the base software, OS and Co-Linux • All other software-related responsibilities fall on HEP
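Once lab machines join the pool, a quick way to verify the setup is to ask the Condor collector what slots it sees. A minimal sketch, assuming condor_status is on the submit host's PATH:

```python
# Quick sanity check of a (hypothetical) campus Condor pool built from the
# Co-Linux lab machines: list the slot names reported by the collector.
import subprocess

result = subprocess.run(
    ["condor_status", "-format", "%s\n", "Name"],
    capture_output=True, text=True, check=True,
)
slots = [line for line in result.stdout.splitlines() if line]
print(f"{len(slots)} slots visible in the pool")
for name in slots[:5]:
    print("  ", name)
```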
Conclusions • MonALISA-based Panda monitoring is providing info to the LHC Dashboard • New server was brought up; waiting for BNL to separate the Panda monitor servers out • Completed a project in DDM monitoring • Completed the EG2 (photon) CSC note • Plan to connect to the 10Gb/s NLR via a 1Gb connection to UTD • Quite expensive to make the final connection • Involved in ATLAS Computing Operations Support • Working closely with HiPCAT on state-wide grid activities • Started working on setting up a Co-Linux Condor cluster