AMS Computing Y2001-Y2002
AMS Technical Interchange Meeting, MIT, Jan 22-25, 2002
Vitali Choutko, Alexei Klimentov
Outline
• AMS Production Farm: requirements, architecture, prototyping, test of HW and SW components
• HW and SW evaluation for AMS02 Ground Segment
• Data Transmission SW
• Y2002 Milestones
AMS Ground Centers (diagram): real-time data, commanding and monitoring links between POIC@MSFC, AL (HOSC web server and xterm, TReK workstations, "voice" loop, video distribution, external communications) and the AMS POCC; selected AMS science data, monitoring/H&S data, flight ancillary data and command archives flow through the GSE (buffer data, retransmit to SOC) to the Science Operations Center, whose PC farm performs NRT data processing, primary storage, archiving, distribution and science analysis of AMS data, NASA data and metadata; AMS remote centers provide data servers, MC production, data mirror archiving and analysis facilities for the AMS stations.
AMS Production Farm (requirements)
A complex system of computing components (I/O nodes, worker nodes, data storage and network switches) that should perform as a single system. Requirements:
• Reliability – high (24 h/day, 7 days/week)
• Performance goal – process data "quasi-online" (typical delay < 1 day)
• Disk space – 12 months of data kept "online"
• Minimal human intervention (automatic data handling, job control and book-keeping)
• System stability – months
• Scalability
• Price/performance
AMS Production Farm (considerations)
Considerations based on AMS01 data-processing experience and the Y2000-2001 MC production:
• Uniform node architecture (dual-CPU Pentiums and AMDs)
• Uniform operating system (RedHat Linux)
• Computing capacity equivalent to 400 x 450 MHz PII processors (including 20% contingency and reprocessing)
• Total of 10 TByte of data stored online
• Two types of computers:
  • "Processing node" with cheap IDE disks for transient data storage
  • "Server node" with IDE and SCSI RAID disks for persistent data storage
Y2001 Milestones
• HW evaluation to choose the platform and architecture (the "official" AMS02 simulation/reconstruction code was used for the benchmarking)
• Functional goal: AMS01 STS91 data rerun and AMS02 MC production using the production farm prototype and SW
AMS02 Benchmarks 1)
Execution time of the AMS "standard" job compared to the CPU clock.
1) V.Choutko, A.Klimentov, AMS note 2001-11-01
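For illustration only, a minimal sketch of what such a benchmark amounts to: run the "standard" job, measure the elapsed time, and normalize by the CPU clock so different machines can be compared on one scale. The job command and the /proc/cpuinfo parsing below are assumptions for the sketch; the actual benchmarking used the official AMS02 simulation/reconstruction code (AMS note 2001-11-01).

```python
# Minimal benchmarking sketch (not the official AMS benchmark code):
# time a "standard" job and normalize the result by the CPU clock speed.
# The job command and the /proc/cpuinfo parsing are assumptions.
import re
import subprocess
import time

JOB = ["./amsjob", "standard.card"]   # hypothetical AMS "standard" job

def cpu_mhz():
    """Read the nominal CPU clock (MHz) from /proc/cpuinfo (Linux only)."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            m = re.match(r"cpu MHz\s*:\s*([\d.]+)", line)
            if m:
                return float(m.group(1))
    raise RuntimeError("CPU clock not found")

def run_benchmark():
    t0 = time.time()
    subprocess.check_call(JOB)        # run the standard job to completion
    elapsed = time.time() - t0
    mhz = cpu_mhz()
    # Clock-normalized time (s x GHz) puts CPUs of different clocks on one scale.
    print("elapsed: %.1f s on a %.0f MHz CPU -> %.2f s*GHz"
          % (elapsed, mhz, elapsed * mhz / 1000.0))

if __name__ == "__main__":
    run_benchmark()
```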
AMS01 STS91 Data Rerun (performance)
AMS02 Benchmarks (summary)
• The α-ev68 866 MHz and the AMD Athlon MP 1800+ have nearly the same performance and are the best candidates for the "AMS processing node" (an α-ev68 based system costs about twice as much as a comparable AMD Athlon system)
• Although the PIV Xeon has lower performance (about a 15% overhead compared with the AMD Athlon MP 1800+), the high-reliability requirement for the "AMS server node" dictates the choice of a Pentium machine
• SUN and COMPAQ SMP machines are candidates for the AMS analysis computer (the choice is postponed until L-12 months)
Conclusion: the total power of the AMS02 processing farm must be equivalent to 50 AMD Athlon MP 1800+ computers.
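A rough consistency check of that conclusion, using only numbers quoted on these slides (the required capacity of 400 x 450 MHz PII processors and dual-CPU processing nodes); the implied ratio of one Athlon MP 1800+ CPU to four PII 450 MHz CPUs is an inference from these figures, not an official benchmark result:

```latex
% Back-of-the-envelope check, assuming 1 Athlon MP 1800+ CPU ~ 4 x PII 450 MHz
% (the ratio implied by the quoted figures, not an official AMS number).
\[
  \frac{400~\text{PII-450 equivalents}}{4~\text{PII-450 per Athlon CPU}} = 100~\text{Athlon CPUs}
  \quad\Longrightarrow\quad
  \frac{100~\text{CPUs}}{2~\text{CPUs per node}} = 50~\text{dual-CPU machines}.
\]
```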
Production Farm ("AMS processing node" architecture)
• Processor: dual-CPU, 1.5+ GHz
• Chip set: currently AMD
• Memory: 1 GB RAM
• System disk: LVD SCSI
• Disk controller: 3Ware IDE RAID
• Disks (transient storage): 6 x 120+ GB IDE
• Ethernet adapters: "public" 100 Mbit/sec; "AMS private" 2 x 1 Gbit/sec
Production Farm ("AMS server node" architecture)
• Processor: dual-CPU, 1.4+ GHz
• Chip set: currently Intel
• Memory: 1 GB RAM
• System disk: LVD SCSI
• Disk controller (permanent storage): IPC SCSI RAID
• Disks (permanent storage): 8 x 180+ GB SCSI
• Disk controller (transient storage): 3Ware IDE RAID
• Disks (transient storage): 7 x 120+ GB IDE
• Ethernet adapters: "public" 100 Mbit/sec; "AMS private" 2 x 1 Gbit/sec
Production Farm HW
• Tape drive ("raw" data backup): IBM LTO Ultrium (connected to the "server node" prototype)
  • data transfer (write), RAID 5 array -> tape: 11 MByte/sec
  • data transfer (read), tape -> null device: 19 MByte/sec
  • tape -> RAID 5 array: 11 MByte/sec
  • tape capacity: 200 GB
  • (see also http://cscct.home.cern.ch/cscct/ultrium)
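One practical implication of these figures (simple arithmetic on the numbers above, not a measured result): filling a single cartridge at the quoted write rate takes roughly five hours.

```latex
% Time to fill one 200 GB LTO Ultrium cartridge at the measured 11 MByte/sec
% write rate (straight division, ignoring tape-change and verification overhead).
\[
  \frac{200\,000~\text{MB}}{11~\text{MB/s}} \approx 1.8\times10^{4}~\text{s}
  \approx 5~\text{hours per cartridge}.
\]
```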
AMS Science Operation Center Computing Facilities (diagram): a production farm of dual-CPU (2x2 GHz+) Linux PCs organized in cells #1-#8, each built around a dual-CPU (2x2 GHz) Linux server with SCSI RAID and interconnected by Gigabit (1 Gbit/sec) switches; tape servers and a disk server for archiving and staging; data servers (including 2xSMP Compaq or SUN machines) holding AMS data, NASA data, metadata and simulated (MC) data; analysis facilities attached via a Gigabit switch. A.Klimentov, Jan 15, 2002
AMS Computing Y2001 (SW)
• AMS production process/process communication and control SW (PPCC) and monitoring: client/server CORBA technology (V.Choutko); process monitoring package (M.Boschini, V.Choutko, A.Klimentov)
• Data handling: ORACLE DB to store metadata and catalogues (M.Boschini, A.Klimentov)
• Data transmission package: based on bbftp (A.Elin, A.Klimentov, AMS note 2001-11-02)
AMS Production Highlights
• Excellent HW stability (uptime more than 3 months)
• AMS01 STS91 data rerun (10 Linux boxes, 19 CPUs)
• Average efficiency 95% (CPU time / elapsed time)
• Process communication and control via CORBA
• LSF for process submission
• Oracle server on an AS4100 Alpha and Oracle clients on Linux
• Oracle RDBMS:
  • Tag DB with 100M entries
  • Conditions DB with 100K entries
  • Book-keeping: production status, runs history, file catalogues
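For illustration only: a minimal sketch of what the book-keeping catalogues above could look like. Python's built-in sqlite3 stands in for the Oracle RDBMS actually used, and the table and column names are invented for this example.

```python
# Illustrative book-keeping sketch; sqlite3 stands in for the Oracle RDBMS,
# and the schema (runs, file_catalogue) is invented for this example.
import sqlite3

db = sqlite3.connect("bookkeeping.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS runs (
    run_id   INTEGER PRIMARY KEY,
    status   TEXT,                  -- e.g. 'queued', 'processing', 'done'
    started  TEXT,
    finished TEXT
);
CREATE TABLE IF NOT EXISTS file_catalogue (
    path       TEXT PRIMARY KEY,
    run_id     INTEGER REFERENCES runs(run_id),
    size_bytes INTEGER,
    kind       TEXT                 -- 'raw', 'dst', 'ntuple', ...
);
""")

def register_file(path, run_id, size_bytes, kind):
    """Record one produced file and mark its run as being processed."""
    with db:                        # commits (or rolls back) the transaction
        db.execute("INSERT OR REPLACE INTO file_catalogue VALUES (?,?,?,?)",
                   (path, run_id, size_bytes, kind))
        db.execute("UPDATE runs SET status='processing' WHERE run_id=?",
                   (run_id,))
```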
Data Transmission SW 1)
• High-rate data transfer between MSFC and POCC/SOC, POCC and SOC, and SOC and MasterCopy repository(ies) will be of paramount importance (tests with TReK between MIT and CERN; TReK is the best candidate for AMS commanding and for transferring data samples)
• What should be used for the bulk data transfer? Why not FileTransferProtocol (ftp), ncftp, etc.? We need:
  • to speed up data transfer
  • to encrypt sensitive data while leaving bulk data unencrypted
  • to run in batch mode with automatic retry in case of failure
• We started to look around and came up with bbftp in September (bbftp was developed in BaBar and is used to transmit data from SLAC to IN2P3@Lyon); we adapted it for AMS and wrote service and control programs
1) A.Elin, A.Klimentov, AMS note 2001-11-02; P.Fisher, A.Klimentov, AMS Note 2001-05-02
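A minimal sketch of the "batch mode with automatic retry" requirement, driving bbftp as an external command. The host, account and exact bbftp command line are assumptions for illustration and should be checked against the bbftp documentation; the real AMS service and control programs are those described in AMS note 2001-11-02.

```python
# Batch-mode transfer with automatic retry around the bbftp client.
# The bbftp command line below is an assumption for illustration;
# check the flags against the bbftp manual before use.
import subprocess
import time

REMOTE_HOST = "pcams00.cern.ch"   # hypothetical AMS server node
REMOTE_USER = "amsprod"           # hypothetical account
MAX_RETRIES = 5
RETRY_DELAY = 60                  # seconds between attempts

def transfer(local_path, remote_path):
    """Try to send one file with bbftp, retrying on failure."""
    cmd = ["bbftp",
           "-u", REMOTE_USER,
           "-e", "put %s %s" % (local_path, remote_path),
           REMOTE_HOST]
    for attempt in range(1, MAX_RETRIES + 1):
        if subprocess.call(cmd) == 0:
            return True            # transmitted successfully
        print("attempt %d failed, retrying in %d s" % (attempt, RETRY_DELAY))
        time.sleep(RETRY_DELAY)
    return False                   # give up; the file stays for the next pass
```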
Data Transmission SW (the inside details)
Server:
• copy data files between directories (optional)
• scan the data directories and make a list of files to be transmitted
• purge successfully transmitted files and keep book-keeping of transmission sessions
Client:
• periodically connect to the server and check whether new data are available
• bbftp the new data and update the transmission status in the catalogues
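A schematic sketch of that server/client split. The directory names, the book-keeping file and transfer() are invented for illustration (transfer() stands for the bbftp retry wrapper sketched above); the real service and control programs are those of AMS note 2001-11-02.

```python
# Schematic server/client loop for the transmission scheme described above.
# Paths, the sent-files log and transfer() are placeholders for this sketch.
import os
import time

OUTBOX = "/data/outbox"        # hypothetical: files waiting to be transmitted
SENT_LOG = "/data/sent.log"    # hypothetical book-keeping of transmitted files

def transfer(local_path, remote_name):
    """Stand-in for the bbftp retry wrapper; pretends every transfer succeeds."""
    print("would bbftp", local_path, "->", remote_name)
    return True

def files_to_send():
    """Server side: scan the data directory, skipping files already transmitted."""
    sent = set()
    if os.path.exists(SENT_LOG):
        sent = set(open(SENT_LOG).read().split())
    return [os.path.join(OUTBOX, name) for name in sorted(os.listdir(OUTBOX))
            if os.path.join(OUTBOX, name) not in sent]

def purge(path):
    """Server side: book-keep the successful transmission, then delete the file."""
    with open(SENT_LOG, "a") as log:
        log.write(path + "\n")
    os.remove(path)

def client_loop(poll_interval=300):
    """Client side: periodically check for new data, ship it, update the status."""
    while True:
        for path in files_to_send():
            if transfer(path, os.path.basename(path)):
                purge(path)
        time.sleep(poll_interval)
```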
Data Transmission SW (tests) 1)
• Server and client: dual-CPU Intel PIII, Linux OS, bbftp release 2.1.2
• Transmitted AMS01 "raw" data and AMS01 data summary files (Ntuples)
• Duration: 12-24 h
1) M.Boschini installed bbftp in INFN Milano
AMS Computing Y2001: the Y2001 milestones have been fulfilled.
AMS Computing Y2002
• Build the AMS02 "production cell" and use it for MC production
• Build the AMS02 "analysis cell"
• AMS02 process and data control SW (migrate from OpenSource CORBA to the licensed version)
• "bbftp" tests between MIT and CERN, and between GSC@MSFC and MIT/CERN
• Evaluate archiving and staging system for AMS (Jan 2002 - 4 TB)
AMS Computing Y2002 ("production cell")
• Processing nodes #1-#5: dual-CPU Athlon 1900+, 1 GB RAM, 3Ware IDE RAID with 6 x 120 GB Western Digital disks, 1 Gbit/sec Ethernet, 2 x 100 Mbit/sec Ethernet
• Server node #1: dual-CPU Xeon or PIII, 1 GB RAM, 3Ware IDE RAID with 7 x 120 GB Western Digital disks, IPC SCSI RAID with 8 x 160 GB WD disks, 1 Gbit/sec Ethernet, 2 x 100 Mbit/sec Ethernet; hosts the analysis programs
• (Diagram: the dual-CPU AMD processing nodes and the dual-CPU Intel server node are connected by a 1 Gbit/sec "AMS private" network and attached to the 100 Mbit/sec CERN backbone)
AMS Computing Y2002 ("analysis cell")
• 2 dual-CPU AMD Athlon machines dedicated to AMS analysis and Geant4 simulation
• Architecture similar to the "AMS processing node" (but with a 4-channel IDE RAID controller and 4 x 120 GB WD HDDs)
Y2002 Milestones • AMS computers upgrade (1Q) • AMS "production cell" (1Q) • AMS "analysis cell" (2Q) • Data transmission tests (2Q) • Evaluation of archiving and staging systems (technical meeting with CASPUR Feb/Mar, system choice 3Q) • AMS data handling and PPCC SW, Licensed CORBA package (3Q)
Growth of computers and data storage in Science Operation Center