LFC Replication Tests - LCG 3D Workshop - Barbara Martelli
Objectives of LFC Replication Tests
• Understand if and how Streams replication impacts LFC behaviour.
• Understand if the achievable throughput, in entries inserted per second, is suitable for LHCb needs.
• Understand if the achievable sustained rate, in entries inserted per second, is suitable for LHCb needs.
• Measure the replication delay for a single entry.
• Measure the maximum throughput achievable in our configuration.
• Measure the maximum sustained rate achievable in our configuration.
• Compare read performance between the present setup and the Streams setup (it is expected to improve with a replica).
LHCb Access Pattern on LFC
• At the moment LFC is used for DC06, MC production, stripping and analysis.
• It is really difficult to estimate the future access pattern, but we can take a snapshot of what happens today.
• Read access (end 2006)
  • 10M PFNs expected; read access is mainly for analysis, and an average user starts O(100) jobs.
  • Each job contacts the LFC twice: once for DIRAC optimization and once to create an XML POOL slice that the application uses to access the data.
  • Every 15 minutes, 1000 users are expected to submit jobs, each user contacting the LFC 200 times.
  • 24*4*1000*200 ~ 20M LFC requests per day for analysis, i.e. about 200 Hz of read-only requests (see the sketch after this list).
• Write access (today)
  • MC production: 10-15 inserts per day.
  • DC06: about 40 MB/s of transfers from CERN to the T1s with a file size of about 100 MB -> one replicated file every 3 seconds. For every 30 files processed, 2 are created.
  • So we can expect about 1 Hz of write access.
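A quick back-of-the-envelope check of the rate figures above, as a sketch; the numbers are the ones quoted in the slide, the script itself is only illustrative:

```python
# Back-of-the-envelope estimate of the LHCb load on the LFC,
# using the figures quoted above (illustrative only).

USERS_PER_SLOT = 1000       # users submitting jobs in each 15-minute slot
REQUESTS_PER_USER = 200     # O(100) jobs per user, 2 LFC calls per job
SLOTS_PER_DAY = 24 * 4      # number of 15-minute slots in a day

requests_per_day = SLOTS_PER_DAY * USERS_PER_SLOT * REQUESTS_PER_USER
read_rate_hz = requests_per_day / 86400.0   # seconds in a day

print("%.1fM read requests per day" % (requests_per_day / 1e6))  # ~19.2M
print("~%.0f Hz read-only rate" % read_rate_hz)                  # ~220 Hz, quoted as ~200 Hz

# Write side: ~40 MB/s of transfers with ~100 MB files -> ~0.4 files
# replicated per second, which together with the catalogue inserts per
# file gives roughly 1 Hz of write access.
write_rate_hz = 40.0 / 100.0
print("~%.1f replicated files per second" % write_rate_hz)
```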
LFC Local Test Description (Feasibility Test)
• 40 LFC clients, 40 LFC daemon threads, Streams pool.
• Client's actions (see the sketch below):
  • Check whether the LFN already exists in the database (select from cns_file_metadata).
  • If yes -> add an SFN for that LFN (insert the SFN into cns_file_replica).
  • If not -> add both the LFN and the SFN (insert the LFN into cns_file_metadata, then insert the SFN into cns_file_replica).
• For each LFN, 3 SFNs are inserted.
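A minimal sketch of that client logic at the SQL level, assuming direct access with cx_Oracle; the real clients go through the LFC daemon, and all column names below are assumptions, only the table names cns_file_metadata and cns_file_replica come from the slide:

```python
# Sketch of one feasibility-test client (assumption: direct SQL access via
# cx_Oracle; column names are illustrative, table names are from the slide).
import uuid
import cx_Oracle

def add_lfn_with_replicas(conn, lfn, sfns):
    cur = conn.cursor()
    # Check whether the LFN already exists.
    cur.execute("SELECT guid FROM cns_file_metadata WHERE name = :lfn",
                lfn=lfn)
    row = cur.fetchone()
    if row:
        guid = row[0]
    else:
        # LFN not present: insert it first, then add its replicas.
        guid = str(uuid.uuid4())
        cur.execute("INSERT INTO cns_file_metadata (guid, name) "
                    "VALUES (:guid, :lfn)", guid=guid, lfn=lfn)
    # Three SFNs are inserted for each LFN in the test.
    for sfn in sfns:
        cur.execute("INSERT INTO cns_file_replica (guid, sfn) "
                    "VALUES (:guid, :sfn)", guid=guid, sfn=sfn)
    conn.commit()

# Example use (hypothetical connect string, path and SFNs):
# conn = cx_Oracle.connect("lfc/password@LFCMASTER")
# add_lfn_with_replicas(conn, "/grid/lhcb/test/file0001",
#                       ["srm://se%d.example.org/file0001" % i for i in range(3)])
```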
LFC Master HW Configuration
• Dual Xeon 3.2 GHz, 4 GB memory.
• 2-node RAC on Oracle 10gR2.
• RHEL 4, kernel 2.6.9-34.ELsmp.
• 14 Fibre Channel disks (73 GB each).
• HBA Qlogic Qla2340 - Brocade FC Switch.
• Disk storage managed with Oracle ASM (striping and mirroring).
• [Diagram: the two RAC nodes rac-lhcb-01 and rac-lhcb-02, connected through Gigabit switches and private LHCb links, share a Dell 224F array with 14 x 73 GB disks managed by ASM.]
LFC Slave Configuration
• LFC read-only replica.
• Dual Xeon 2.4 GHz, 2 GB RAM.
• Oracle 10gR2 (Oracle RAC, but used as a single instance).
• RHEL 3, kernel 2.4.21.
• 6 x 250 GB disks in RAID 5.
• HBA Qlogic Qla2340 - Brocade FC Switch.
• Disk storage formatted with OCFS2.
Performance
• About 75 transactions per second on each cluster node.
• Inserted and replicated 1700k entries in 4 hours (118 inserts per second).
• Almost real-time replication with Oracle Streams, without significant delays (<< 1 s).
CERN to CNAF LFC Replication
• At CERN: 2 LFC servers connected to the same LFC master DB backend (single instance).
• At CNAF: 1 LFC server connected to the replica DB backend (single instance).
• Oracle Streams propagates entries from the master DB at CERN to the replica DB at CNAF.
• Population clients: a Python script starts N parallel clients, which write entries and replicas into the master LFC at CERN (see the sketch after this list).
• Read-only clients: a Python script reads entries from both the master and the replica LFC.
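A minimal sketch of such a population-client launcher, assuming the LFC Python bindings (lfc module) expose lfc_creatg and lfc_addreplica as in the C client API and that LFC_HOST selects the target server; the paths, GUIDs and replica attributes below are made up:

```python
# Sketch of the population-client launcher (assumptions: lfc Python bindings
# with lfc_creatg / lfc_addreplica, LFC_HOST selects the server; paths and
# storage-element names are illustrative).
import os
import uuid
from multiprocessing import Process

import lfc  # LFC Python bindings shipped with the LFC client

def populate(client_id, n_entries):
    for i in range(n_entries):
        lfn = "/grid/lhcb/streams-test/client%03d/file%06d" % (client_id, i)
        guid = str(uuid.uuid4())
        if lfc.lfc_creatg(lfn, guid, 0o664) != 0:
            continue  # entry may already exist; skip it
        # Three SFNs (replicas) per LFN, as in the local feasibility test.
        for r in range(3):
            host = "se%d.example.org" % r
            sfn = "srm://%s/%s" % (host, guid)
            # Replica status '-' and type 'P' are assumed default values.
            lfc.lfc_addreplica(guid, None, host, sfn, "-", "P", "", "")

if __name__ == "__main__":
    os.environ["LFC_HOST"] = "lxb0716.cern.ch"  # one of the R-W master servers
    procs = [Process(target=populate, args=(c, 1000)) for c in range(20)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```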
LFC Replication Testbed
• [Diagram] Population clients write into the two LFC R-W servers at CERN (lxb0716.cern.ch, lxb0717.cern.ch), which sit in front of the master DB on the LFC Oracle server rls1r1.cern.ch.
• Oracle Streams propagates the changes over the WAN to the replica DB on lfc-replica.cr.cnaf.infn.it, which backs the LFC read-only server lfc-streams.cr.cnaf.infn.it at CNAF.
• Read-only clients query both the master and the replica.
Test 1: 40 Parallel Clients
• 40 parallel clients, equally divided between the two LFC master servers.
• Inserted 3700 replicas per minute during the first two hours.
• Very good performance at the beginning, but after a few hours the master fell into a Flow Control state.
• Flow Control means that the master is notified by the consuming (apply) side that the update rate is too fast; the master slows down to avoid Spill Over on that side (a monitoring sketch follows this list).
• Spill Over means that the buffer of the Streams queue is full, so Oracle has to write the entries to disk (the persistent part of the queue), which decreases performance.
• Since the apply side of Streams replication (the slave) is usually slower than the capture side (the master), we argue that the insert rate has to be decreased to achieve good sustained performance.
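As a side note, flow control and queue spilling can be spotted from the standard Streams dynamic views; a minimal monitoring sketch, assuming the usual 10gR2 view names and a hypothetical strmadmin account:

```python
# Sketch of a flow-control / spill-over check on the master (assumptions:
# the 10gR2 views v$streams_capture and v$buffered_queues are readable by
# the monitoring account; the connect string is hypothetical).
import cx_Oracle

conn = cx_Oracle.connect("strmadmin/password@LFCMASTER")
cur = conn.cursor()

# A state of 'PAUSED FOR FLOW CONTROL' means the downstream side cannot
# keep up and capture has been throttled.
cur.execute("SELECT capture_name, state FROM v$streams_capture")
for name, state in cur:
    print(name, state)

# spill_msgs > 0 means messages are overflowing from the in-memory buffered
# queue to its persistent (disk) part.
cur.execute("SELECT queue_name, num_msgs, spill_msgs FROM v$buffered_queues")
for queue, in_memory, spilled in cur:
    print(queue, in_memory, spilled)
```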
Test 2: 20 Parallel Clients
• 20 parallel clients, equally divided between the two LFC master servers.
• Inserted 3000 replicas per minute, i.e. 50 replicas per second.
• Apply parallelism increased: 4 parallel apply processes on the slave (see the sketch after this list).
• After some hours the rate decreases, but it settles at a stable 33 replicas per second.
• Achieved a sustained rate of 33 replicas per second.
• No flow control has been detected on the master.
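For reference, the apply parallelism is normally raised with DBMS_APPLY_ADM.SET_PARAMETER; a minimal sketch, assuming an apply process named LFC_APPLY on the replica (the apply name and connect string are hypothetical):

```python
# Raise the apply parallelism on the replica to 4 parallel apply servers
# (sketch; 'LFC_APPLY' and the connect string are hypothetical, and the
# apply process may need to be stopped and restarted for the new value
# to take effect).
import cx_Oracle

conn = cx_Oracle.connect("strmadmin/password@LFCREPLICA")
cur = conn.cursor()
cur.execute("""
    BEGIN
      DBMS_APPLY_ADM.SET_PARAMETER(
        apply_name => 'LFC_APPLY',
        parameter  => 'parallelism',
        value      => '4');
    END;""")
```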
Conclusions
• Even though this test setup is less powerful than the production one, the sustained insertion rate achieved is higher than what LHCb needs.
• We still need to test random read access, to understand if and how replication impacts the response time.
• It could be interesting to find the best replication rate achievable with this setup, even if it is not requested by the experiments.