Summary of the “one day data challenge” for the heavy ion run
Alberto Pace, for the IT-DSS group
Current situation
• The ALICE and CMS disk pools have been enlarged so that all data can be kept on disk for the entire duration of the run.
• Together with the tape copy, this gives multiple independent copies at CERN, which matters because Tier-1 replication may be delayed.
• Current pool sizes (TB):
  • ALICE: 2,412 (T0) + 2,542 (ALICEDISK)
  • CMS: 1,205 (T0) + 1,005 (T0STREAMER) + 1,727 (CMSCAF) + 1,618 (various additional pools)
  • ATLAS: 4,603 (various pools)
  • LHCb: 1,301 (various pools)
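To see the total disk capacity each experiment has available, the pool sizes above can simply be summed. This is a small sketch; the dictionary layout and the key names `other` and `various` are hypothetical labels for the aggregated "various pools" entries on the slide.

```python
# Disk pool sizes in TB, as listed on the slide. "other" and "various" are
# hypothetical keys standing in for the aggregated "various pools" entries.
POOL_SIZES_TB = {
    "ALICE": {"T0": 2412, "ALICEDISK": 2542},
    "CMS": {"T0": 1205, "T0STREAMER": 1005, "CMSCAF": 1727, "other": 1618},
    "ATLAS": {"various": 4603},
    "LHCb": {"various": 1301},
}

def total_per_experiment(pools):
    """Sum the pool sizes of each experiment (TB)."""
    return {exp: sum(sizes.values()) for exp, sizes in pools.items()}

totals = total_per_experiment(POOL_SIZES_TB)
for exp, tb in totals.items():
    print(f"{exp:6s} {tb:6,d} TB")
print(f"{'All':6s} {sum(totals.values()):6,d} TB")
```

This gives roughly 5.0 PB for ALICE and 5.6 PB for CMS, the two experiments whose pools were enlarged for the run.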
The HI test
• Originally planned for Nov 1-4, but brought forward to Oct 22 and shortened to only 24 hours. CASTOR was upgraded for the HI run just in time (on Oct 19). The test took place during the CHEP conference.
• ALICE:
  • The test lasted 24 hours as planned (from 15:30 on 21/10 until 16:00 on 22/10).
  • Sustained data rate of 2.5-3 GB/s, with peaks at 7 GB/s (intra-pool replications).
  • Average file size of 3 GB, allowing efficient use of the tape drives.
• CMS:
  • The test started late and, because of intra-pool replication delays, data reached the T0 pool only from 4:00 on 22/10 and flowed for only 6-7 hours.
  • Data rate of 1.2 GB/s, with peaks at 2.6 GB/s.
  • Average file size of 30 GB, allowing efficient use of the tape drives.
• CASTOR handled all the data successfully, with no sign of bottlenecks or scalability issues.
[Plots: ALICE and CMS]
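A back-of-the-envelope estimate of how much data each experiment delivered follows from rate × duration. This sketch assumes decimal units (1 TB = 1000 GB) and a constant rate over the whole window; the 7 GB/s and 2.6 GB/s peaks are ignored, and `volume_tb` is a hypothetical helper.

```python
def volume_tb(rate_gb_s, hours):
    """Volume in TB (decimal: 1 TB = 1000 GB) moved at a constant rate."""
    return rate_gb_s * hours * 3600 / 1000

# ALICE: 15:30 on 21/10 to 16:00 on 22/10 is 24.5 hours at 2.5-3 GB/s sustained.
alice_hours = 24.5
alice_low, alice_high = volume_tb(2.5, alice_hours), volume_tb(3.0, alice_hours)

# CMS: 1.2 GB/s for 6-7 hours.
cms_low, cms_high = volume_tb(1.2, 6), volume_tb(1.2, 7)

print(f"ALICE: {alice_low:.1f}-{alice_high:.1f} TB")   # ALICE: 220.5-264.6 TB
print(f"CMS:   {cms_low:.1f}-{cms_high:.1f} TB")       # CMS:   25.9-30.2 TB
```

Under these assumptions ALICE delivered on the order of ten times more data than CMS during the test, mainly because CMS data flowed for only 6-7 hours.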
Seen from the TAPE side
[Plot: ALICE and CMS; only 2 hours of overlap]
The TAPE subsystem
• The tape subsystem absorbed the aggregated throughput at a sustained rate of 4 GB/s (the sum of the red and blue lines in the plot).
• We used fewer than 50 drives, i.e. an average performance above 80 MB/s per drive, with peaks at 110-120 MB/s.
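The per-drive figure follows from dividing the aggregate rate by the number of drives. A minimal sketch, assuming decimal units and that all drives stream concurrently; `per_drive_mb_s` and `drives_needed` are hypothetical helpers.

```python
import math

def per_drive_mb_s(aggregate_gb_s, drives):
    """Average throughput per drive in MB/s (decimal units: 1 GB/s = 1000 MB/s)."""
    return aggregate_gb_s * 1000 / drives

def drives_needed(aggregate_gb_s, drive_mb_s):
    """Smallest number of drives whose combined rate covers the aggregate rate."""
    return math.ceil(aggregate_gb_s * 1000 / drive_mb_s)

# 4 GB/s spread over 50 drives gives 80 MB/s each; fewer drives means more per drive.
print(per_drive_mb_s(4, 50))                          # 80.0
# At the observed per-drive peaks, fewer drives would in principle suffice.
print(drives_needed(4, 120), drives_needed(4, 110))  # 34 37
```

Because fewer than 50 drives carried 4 GB/s, the true average per drive is strictly above 80 MB/s.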
Validated CASTOR 2.1.9 features
• Read mounts did not interfere with production, because concurrent reads are now capped.
• The plot shows read mounts (blue) limited to 40 concurrent since the CMS upgrade on 19/10.
• Write mounts related to the test are shown in grey.
[Plot annotation: "Concurrent read ceiling since 19/10"]
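The slides do not describe how CASTOR 2.1.9 enforces the read ceiling internally. Purely as an illustration of the idea, here is a hypothetical admission controller (the `ReadMountLimiter` class and its methods are invented for this sketch, not CASTOR's API) that caps concurrent read mounts at 40 and queues further requests in FIFO order.

```python
from collections import deque

class ReadMountLimiter:
    """Hypothetical cap on concurrent tape read mounts.

    Only read mounts are tracked here; write mounts for data taking are
    outside this limiter and therefore never blocked by it.
    """

    def __init__(self, max_concurrent_reads=40):
        self.max_concurrent_reads = max_concurrent_reads
        self.active_reads = set()
        self.pending = deque()

    def request_read(self, tape_id):
        """Mount now if under the ceiling, otherwise queue; True if mounted."""
        if len(self.active_reads) < self.max_concurrent_reads:
            self.active_reads.add(tape_id)
            return True
        self.pending.append(tape_id)
        return False

    def release_read(self, tape_id):
        """Unmount a tape and promote the oldest queued request, if any."""
        self.active_reads.discard(tape_id)
        if self.pending and len(self.active_reads) < self.max_concurrent_reads:
            self.active_reads.add(self.pending.popleft())

limiter = ReadMountLimiter(max_concurrent_reads=40)
mounted = [limiter.request_read(f"T{i:05d}") for i in range(50)]
print(sum(mounted), len(limiter.pending))              # 40 10
limiter.release_read("T00000")
print(len(limiter.active_reads), len(limiter.pending))  # 40 9
```

Queuing rather than rejecting excess requests means recalls are delayed, not lost, while the drives left over stay available for writes.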
A new record for the tape subsystem
• 250 TB of raw data written in a single day!
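For scale, the daily record corresponds to an average write rate over the full 24 hours. A short sketch, assuming decimal units; `average_rate_gb_s` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 3600

def average_rate_gb_s(tb_per_day):
    """Average rate in GB/s needed to write a daily volume (decimal units)."""
    return tb_per_day * 1000 / SECONDS_PER_DAY

rate = average_rate_gb_s(250)
print(f"{rate:.2f} GB/s")   # 2.89 GB/s
```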
Conclusion
• From the Tier-0 perspective, the test was successful.
• However, the validation is not yet complete:
  • The two tests overlapped only briefly: 2 hours.
  • The CMS data rate was lower than requested. Is this what we should expect from CMS, or will it increase during production?
  • ATLAS was not sending data.
• The data we received was not white noise, so we achieved a higher compression factor on tape.
• On 22/10, we wrote 250 TB to 153 tapes.
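The last two bullets can be linked with a little arithmetic: 250 TB on 153 tapes is the average amount of raw data per cartridge. The sketch below assumes every one of the 153 tapes held only data from that day; the cartridge's native capacity is not given on the slides, so it is left as an explicit parameter. Since partly filled tapes pull the average down, the ratio is only a lower bound on the compression achieved. Both helpers are hypothetical.

```python
def average_tb_per_tape(tb_written, tapes):
    """Average raw (uncompressed) data per cartridge, in TB."""
    return tb_written / tapes

def compression_lower_bound(tb_written, tapes, native_capacity_tb):
    """Lower bound on the compression factor, assuming partly filled tapes
    only lower the average (native_capacity_tb must be supplied by the reader)."""
    return average_tb_per_tape(tb_written, tapes) / native_capacity_tb

avg = average_tb_per_tape(250, 153)
print(f"{avg:.2f} TB per tape")   # 1.63 TB per tape
```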