Database Replication and Monitoring in ATLAS Computing Operations
Suijian Zhou
LCG Database Readiness Workshop, Rutherford, UK, March 23, 2006
The ATLAS Tiers and roles
• Tier-0:
 1). Calibration and alignment
 2). First-pass ESD and AOD production, and TAG production
 3). Archiving and distribution of RAW, ESD, AOD and TAG data
• Tier-1:
 1). Storage of RAW, ESD, calibration data, meta-data, analysis data, simulation data and databases
 2). Reprocessing of RAW → ESD
• Tier-2:
 1). Data processing for calibration and alignment tasks
 2). Monte Carlo simulation and end-user analysis – batch and interactive
The ATLAS Databases
• Detector production, detector installation
• Survey data
• Detector geometry
• Online configuration, online run book-keeping, run conditions (DCS and others)
• Online and offline calibrations and alignments
• Offline processing configuration and book-keeping
• Event tag data
Conditions Database of ATLAS
• Refers to nearly all the non-event data produced during the operation of the ATLAS detector, as well as the data required to perform reconstruction and analysis
• Varies with time; characterized by an "interval of validity" (IOV)
• It includes:
 1). data archived from the ATLAS detector control system (DCS)
 2). online book-keeping data, online and offline calibration and alignment data
 3). monitoring data characterizing the performance of the detector
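The interval-of-validity idea is easiest to see in a small sketch. The snippet below is purely illustrative (the table layout, column names and payload values are invented, not the real conditions DB schema): each conditions record carries a [since, until) validity range, and a lookup returns the payload valid for a given run number or time.

```python
import sqlite3

# Illustrative IOV table: each payload is valid for the range [since, until)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iov (since INTEGER, until INTEGER, payload TEXT)")
conn.executemany("INSERT INTO iov VALUES (?, ?, ?)", [
    (0,    1000, "calib_v1"),   # valid for runs 0-999
    (1000, 2000, "calib_v2"),   # valid for runs 1000-1999
])

def lookup(run):
    # Return the payload whose validity interval contains this run number
    row = conn.execute(
        "SELECT payload FROM iov WHERE since <= ? AND ? < until", (run, run)
    ).fetchone()
    return row[0] if row else None

print(lookup(1500))  # -> calib_v2
```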
ATLAS DB Replication Task
• The Conditions DB should be distributed worldwide to support the data processing tasks at Tier-1s and Tier-2s
• Conditions DB updates (e.g. improved calibration constants) generated worldwide should be brought back to the central CERN-based DB servers for subsequent distribution to all sites that require them
• To avoid overloading the central Tier-0 server at CERN (thousands of jobs requiring the database at the same time may exhaust the resources of a single DB server or even crash it), slave DB servers need to be deployed at no fewer than 10 Tier-1s
The Conditions DB: COOL
• Interval-of-Validity (IOV) based storage and retrieval, with validity expressed as a range of absolute times or run and event numbers
• Data is stored in folders, which are arranged in a hierarchical structure of foldersets
• Implemented using the Relational Access Layer (RAL), which makes it possible for a COOL database to be stored in Oracle, MySQL or SQLite technology (a client-side sketch follows)
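As a rough sketch of what this looks like from the client side, the fragment below reads a conditions folder through the PyCool bindings. It is written from memory and under assumptions: the connection strings, folder path, validity key and channel number are hypothetical examples, and the exact method names should be checked against the COOL documentation for this release.

```python
# A rough, hedged sketch of reading COOL data via PyCool; the connection
# strings, folder path and validity key below are hypothetical examples.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()

# Thanks to RAL, the same COOL folders can sit behind different backends --
# only the connection string changes:
oracle_conn = "oracle://ATLAS_COOLPROD;schema=ATLAS_COOL;dbname=CONDDB"  # hypothetical
sqlite_conn = "sqlite://;schema=mycond.db;dbname=CONDDB"                 # local replica

db = dbSvc.openDatabase(sqlite_conn)            # open the local SQLite replica
folder = db.getFolder("/CALO/Calib/Constants")  # folders live in a folderset tree
obj = folder.findObject(1000, 0)                # (point inside the IOV, channel id)
print(obj.payload())
```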
ATLAS DB Replication Strategies (1)
• Conditions data in POOL ROOT format can be replicated using the standard tools of the ATLAS Distributed Data Management (DDM) system, DQ2
• Small databases such as the Geometry DB are replicated using MySQL and SQLite technologies
• Native Oracle Streams replication from Tier-0 → Tier-1s, where data are replicated 'real-time' from master to slave databases (works for any Oracle data, also event TAG data etc.)
ATLAS DB Replication Strategies (2)
• COOL API-level replication from Oracle → SQLite. The PyCoolCopy tool in PyCoolUtilities (Python-based COOL utilities) enables subsets of COOL folder trees to be copied from one database to another. Currently 'static'; will become 'dynamic' in the future
• CORAL Frontier-based replication. Frontier translates SQL database requests into HTTP requests at the client; a Tomcat web server interacting with an Oracle database backend returns the query results to the client as HTML pages. Squid web-proxy cache servers are set up at Tier-0 and the Tier-1s (see the toy sketch below)
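To make the Frontier mechanism concrete, here is a toy illustration of tunnelling an SQL request over HTTP so that a squid proxy can cache the reply. The servlet URL and parameter encoding below are invented for illustration only; the real Frontier client/servlet protocol encodes requests differently.

```python
# Illustration only: an SQL request tunnelled over HTTP so a squid proxy can
# cache the result. The servlet URL and query parameter are hypothetical.
import base64
import urllib.parse
import urllib.request

frontier_url = "http://frontier.example.org:8080/Frontier/query"   # hypothetical
sql = "SELECT payload FROM conditions WHERE since <= 1500 AND 1500 < until"

# Encode the SQL so it travels as an ordinary, cacheable GET request
params = urllib.parse.urlencode({"q": base64.urlsafe_b64encode(sql.encode()).decode()})
request = urllib.request.Request(f"{frontier_url}?{params}")

# Identical GETs are answered from the local squid cache instead of Oracle
with urllib.request.urlopen(request) as response:
    print(response.read())
```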
The Octopus Replicator for Database Replication (1)
• Works between different database backends as long as they contain equivalent schemas (e.g. the ATLAS Geometry Database, Tag Database, etc.)
• Configured here to replicate between Oracle, MySQL and SQLite; it also supports other databases and file formats: Access, MSQL, CJDBC, Excel, Informix, PostgreSQL, XML, etc.
• Other functions include database backup/restore and database synchronization
The Octopus Replicator for Database Replication (2)
• The Octopus Replicator works in two steps (illustrated below):
 1). Generation of the database schema description and conversion scripts (generate)
 2). The actual database replication itself (load)
• Typical configurations for ATLAS tasks:
 Geometry Database:
 • Oracle → MySQL
 • Oracle → SQLite
 Tag Database:
 • MySQL → MySQL
 • MySQL → Oracle
 • Oracle → MySQL
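The two-step "generate then load" principle can be illustrated with a small stand-alone sketch. This is not Octopus itself (Octopus is a Java tool driven by configuration files); the snippet below just mimics the idea with two hypothetical SQLite files.

```python
# Not Octopus itself -- a minimal sketch of its two-step idea using two
# SQLite files: (1) "generate" the schema on the target, (2) "load" the rows.
import sqlite3

src = sqlite3.connect("geometry_src.db")   # hypothetical source database
dst = sqlite3.connect("geometry_dst.db")   # hypothetical target database

# Step 1: generate -- recreate each table's schema on the target
tables = "SELECT name, sql FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'"
for name, ddl in src.execute(tables).fetchall():
    dst.execute(ddl)

# Step 2: load -- bulk-copy the rows table by table
for name, _ in src.execute(tables).fetchall():
    rows = src.execute(f"SELECT * FROM {name}").fetchall()
    if rows:
        placeholders = ",".join("?" * len(rows[0]))
        dst.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
dst.commit()
```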
Database replication monitoring (1)
• A dedicated machine, "atlmysql04", is being set up for database replication monitoring and tests. Currently installed on this server:
 mysql-standard-4.0.26
 MonALISA v1.4.14
 MonAMI v0.4
• The server is registered in MonALISA under the farm name "atlasdbs"
Database replication monitoring (2)
• MonALISA and MonAMI are used to monitor the DB replication activities (e.g. the Tier-0 → Tier-1 DB servers)
• System information of the DB servers (load, free memory, etc.)
• Network information (traffic, flows, connectivity, topology, etc.)
• The MonAMI monitoring daemon (by Paul Millar et al.) uses a plugin architecture to mediate between "monitoring targets" (a MySQL database, an Apache web server, etc.) and "reporting targets" (MonALISA, Ganglia, etc.); a toy sketch of this plugin idea follows
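A toy sketch of the plugin idea: one plugin samples a "monitoring target" and another forwards the sample to a "reporting target". MonAMI itself is a C daemon configured through plugin sections; the class names and metric values below are invented purely to illustrate the architecture.

```python
# Toy illustration of the monitoring-target / reporting-target plugin pattern.
# These classes are invented for illustration; they are not MonAMI code.
import time

class MySQLTarget:
    """Monitoring target: samples metrics from a database server."""
    def sample(self):
        # In reality this would query the DB server (e.g. via SHOW STATUS)
        return {"mysql.connections": 12, "mysql.slow_queries": 0}

class MonALISAReporter:
    """Reporting target: forwards samples to a monitoring framework."""
    def __init__(self, farm_name):
        self.farm_name = farm_name
    def publish(self, metrics):
        for key, value in metrics.items():
            print(f"[{self.farm_name}] {key} = {value}")

# The daemon's core loop simply wires monitoring plugins to reporting plugins
target, reporter = MySQLTarget(), MonALISAReporter("atlasdbs")
for _ in range(3):
    reporter.publish(target.sample())
    time.sleep(1)
```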
Next tasks
• Support in MonAMI for plugins to monitor Oracle databases
• Deploy and test the monitoring as soon as possible