TAG Catalog Replication Using Streams Florbela Viegas, CERN ADP

Overview of the TAG Information System
The TAG Information System is composed of several databases scattered across Europe and America and a suite of services used to access the data. The event data is composed of POOL Collections, where a run from a collection maps directly to a merged TAG dataset produced at Tier-0 and the Tier-1s. The TAG catalog records which data is available at which site. It is accessed by the ELSSI Suite services at each application site, presently CERN, BNL and TRIUMF. The TAG catalog is replicated from CERN to the other sites using materialized views. It presently resides in the TAG databases, next to the data.
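As an illustration of the catalog's role (recording which data is available where), here is a minimal sketch in Python. It is a toy in-memory stand-in, not the actual catalog schema: the class `TagCatalog`, its methods, and the dataset names are all hypothetical; only the site names come from the slides.

```python
from collections import defaultdict


class TagCatalog:
    """Toy stand-in for the TAG catalog: maps a dataset to the sites holding it.

    Hypothetical structure for illustration only; the real catalog is a set
    of database tables whose layout is not described in these slides.
    """

    def __init__(self):
        self._sites_by_dataset = defaultdict(set)

    def register(self, dataset, site):
        """Record that `dataset` is available at `site`."""
        self._sites_by_dataset[dataset].add(site)

    def sites_for(self, dataset):
        """Return the sites holding `dataset`, sorted by name."""
        # .get() avoids creating an empty entry for unknown datasets.
        return sorted(self._sites_by_dataset.get(dataset, ()))


catalog = TagCatalog()
catalog.register("example_dataset_A", "CERN")  # dataset names are made up
catalog.register("example_dataset_A", "BNL")
catalog.register("example_dataset_B", "TRIUMF")

print(catalog.sites_for("example_dataset_A"))
```

A service such as ELSSI would consult this kind of mapping to decide which site can answer a query for a given dataset.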

[Diagram: TAG Data Architecture. Sites shown: CERN, DESY, PIC, BNL, TRIUMF, RAL. Components shown: COMA DB, TASK DB, ELSSI Suite. Data labels: All Data except Monte Carlo; Monte Carlo & Other recent Data; Most Recent Data (no MC); September Reprocessing; All 2010 Data except Monte Carlo; December Reprocessing.]

Present weaknesses of the catalog
The TAG catalog master is installed in the CERN ATLARC database. In the event of failure, recovery can and will take days. The consequence of a failure at CERN is that Tier-0 upload must stop completely, and a backlog will quickly build up during busy periods. Conceptually, there is no reason why the catalog must be in the same database as the data. In fact, the catalog should remain available for writing if one of the data sites goes down, including CERN. To address this, I propose to move the TAG catalog master to ATLR as a first step.

Replication of the catalog
The TAG catalog will very soon include a "Service catalog", which will track the current state of all the services in the TAG Information System. This will make it possible to take decisions at run time about failover and load balancing. For this, the latency between the replicas must be lower than what simple materialized views offer. I propose to replicate the TAG catalog from ATLR to the Tier-1 3D databases using Streams. I see no caveats in this setup, as the transaction volume is very small and the size of the catalog is 38 MB. I would like your input on any issues that might arise from this decision.
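To show why replica latency matters for the Service catalog, here is a minimal Python sketch of a run-time failover and load-balancing decision. Everything in it is an assumption made for illustration: the `ServiceState` record, its fields, the `pick_site` function and the session counts are hypothetical, and the slides do not describe the actual schema or selection policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceState:
    """One hypothetical row of the Service catalog as seen by a replica."""
    site: str
    service: str
    is_up: bool
    active_sessions: int


def pick_site(states, service):
    """Choose the least-loaded site where `service` is reported up.

    Returns None when no site reports the service as available.
    Ties on load are broken by site name so the choice is deterministic.
    """
    candidates = [s for s in states if s.service == service and s.is_up]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (s.active_sessions, s.site)).site


# Made-up snapshot: the service at CERN is down, so traffic must fail over.
states = [
    ServiceState("CERN", "ELSSI", is_up=False, active_sessions=0),
    ServiceState("BNL", "ELSSI", is_up=True, active_sessions=12),
    ServiceState("TRIUMF", "ELSSI", is_up=True, active_sessions=4),
]

print(pick_site(states, "ELSSI"))
```

The decision is only as good as the snapshot it reads: if a replica still reports a failed service as up because the last refresh has not yet happened, requests are routed to a dead site. That is the motivation for preferring lower-latency replication here.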

[Diagram: TAG Data Architecture after the move. Labels shown: Tier-0; CERN ATLR, TRIUMF 3D and BNL 3D, connected by Streams; CERN ATLARC, BNLTAGS and TRIUMFTAGS; COMA DB, TASK DB and ELSSI Suite components. Data labels: All Data except Monte Carlo; Reprocessed Data (no MC); All Data except Monte Carlo.]
