

  1. Database Replication and Monitoring in ATLAS Computing Operations Suijian Zhou LCG Database Readiness Workshop Rutherford, UK, March 23, 2006

  2. The ATLAS Tiers and roles
  • Tier-0:
    1) Calibration and alignment
    2) First-pass ESD, AOD and TAG production
    3) Archiving and distribution of RAW, ESD, AOD and TAG data
  • Tier-1:
    1) Storage of RAW, ESD, calibration data, meta-data, analysis data, simulation data and databases
    2) Reprocessing of RAW → ESD
  • Tier-2:
    1) Data processing for calibration and alignment tasks
    2) Monte Carlo simulation and end-user analysis, both batch and interactive

  3. The ATLAS Databases
  • Detector production, detector installation
  • Survey data
  • Detector geometry
  • Online configuration, online run book-keeping, run conditions (DCS and others)
  • Online and offline calibrations and alignments
  • Offline processing configuration and book-keeping
  • Event tag data

  4. Conditions Database of ATLAS
  • Refers to nearly all the non-event data produced during the operation of the ATLAS detector, together with the data required to perform reconstruction and analysis
  • Varies with time; each entry is characterized by an “interval of validity” (IOV)
  • It includes:
    1) Data archived from the ATLAS Detector Control System (DCS)
    2) Online book-keeping data, online and offline calibration and alignment data
    3) Monitoring data characterizing the performance of the detector

  5. ATLAS DB Replication Task
  • The Conditions DB must be distributed worldwide to support the data processing tasks at Tier-1s and Tier-2s
  • Conditions DB updates generated worldwide (e.g. improved calibration constants) must be brought back to the central CERN-based DB servers for subsequent distribution to all sites that require them
  • To avoid overloading the central Tier-0 server at CERN (thousands of jobs requesting the database at the same time may exhaust the resources of a single DB server or even crash it), slave DB servers need to be deployed on at least ten Tier-1 sites

  6. The Conditions DB -- COOL
  • Interval-of-validity (IOV) based storage and retrieval: validity is expressed as a range of absolute times or of run and event numbers (a toy sketch follows below)
  • Data are stored in folders, which are arranged in a hierarchical structure of foldersets
  • Implemented using the Relational Access Layer (RAL), which makes it possible for a COOL database to be stored in Oracle, MySQL or SQLite
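To make the IOV mechanism concrete, here is a minimal Python sketch of a conditions folder whose payloads carry validity ranges. This is illustrative only: the real COOL API (PyCool) differs, and names such as CondFolder, store_object and find_object are invented.

```python
import bisect

class CondFolder:
    """Toy conditions folder: payloads stored with an interval of validity.
    Illustrative only -- the real COOL API differs."""

    def __init__(self, path):
        self.path = path          # e.g. "/Detector/Calo/Calibration"
        self._since = []          # sorted IOV start points
        self._objects = []        # (since, until, payload) tuples

    def store_object(self, since, until, payload):
        """Register a payload valid for [since, until)."""
        i = bisect.bisect(self._since, since)
        self._since.insert(i, since)
        self._objects.insert(i, (since, until, payload))

    def find_object(self, when):
        """Return the payload whose IOV contains 'when', or None."""
        i = bisect.bisect(self._since, when) - 1
        if i >= 0:
            since, until, payload = self._objects[i]
            if since <= when < until:
                return payload
        return None

# Usage: look up the calibration constants valid at a given time/run point.
folder = CondFolder("/Detector/Calo/Calibration")
folder.store_object(1000, 2000, {"gain": 1.02})
folder.store_object(2000, 3000, {"gain": 1.05})
print(folder.find_object(2500))   # -> {'gain': 1.05}
```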

  7. ATLAS DB Replication Strategies (1)
  • Conditions data in POOL ROOT format can be replicated using the standard tools of the ATLAS Distributed Data Management (DDM) system, DQ2
  • Small databases, such as the Geometry DB, are replicated using MySQL and SQLite technologies
  • Native Oracle Streams replication from Tier-0 → Tier-1s, where data are replicated ‘real-time’ from the master to the slave databases (applicable to any Oracle data, including event TAG data); a sketch of backend-specific connection strings follows below
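A practical consequence of this multi-backend strategy is that a job can reach the same logical database through whichever replica is closest. A hedged sketch using COOL/CORAL-style connection strings; the host names, schema and database names below are invented placeholders, and the exact string syntax varies by release.

```python
# Hypothetical connection strings for one logical Geometry DB;
# hosts, schema and dbname are illustrative placeholders only.
REPLICAS = [
    # master Oracle server at Tier-0 (CERN)
    "oracle://atlas-oracle-t0.cern.ch;schema=ATLAS_GEOM;dbname=GEOMDB",
    # MySQL slave at a Tier-1
    "mysql://atlas-mysql-t1.example.org;schema=ATLAS_GEOM;dbname=GEOMDB",
    # local SQLite snapshot shipped to a Tier-2 worker node
    "sqlite://;schema=geomdb.sqlite;dbname=GEOMDB",
]

def pick_replica(tier):
    """Choose a replica by site tier, so jobs avoid the Tier-0 master."""
    order = {"tier0": 0, "tier1": 1, "tier2": 2}
    return REPLICAS[order.get(tier, 2)]

print(pick_replica("tier2"))   # Tier-2 jobs read the local SQLite copy
```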

  8. ATLAS DB Replication Strategies (2)
  • COOL API-level replication from Oracle → SQLite: the PyCoolCopy tool in PyCoolUtilities (Python-based COOL utilities) enables subsets of COOL folder trees to be copied from one database to another; currently ‘static’, to become ‘dynamic’ in the future
  • CORAL Frontier-based replication: SQL database requests are translated into HTTP requests at the client; a Tomcat web server interacting with an Oracle database backend returns the query results to the client over HTTP, and Squid web-proxy cache servers are set up at Tier-0 and the Tier-1s (see the sketch below)
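The caching gain in the Frontier approach comes from turning a read-only SQL query into a repeatable HTTP GET, which an intermediate Squid can answer without touching Oracle. A minimal client-side sketch of that idea; this is not the real Frontier wire protocol, and the URL, host name and query encoding are invented.

```python
import base64
import urllib.request

# Hypothetical Frontier-style endpoint fronted by a site Squid cache.
FRONTIER_URL = "http://squid.tier1.example.org:8000/Frontier/query"

def frontier_get(sql):
    """Encode a read-only SQL query into a GET URL so that identical
    queries from many jobs can be answered from the Squid cache
    instead of the Oracle backend behind the Tomcat server."""
    encoded = base64.urlsafe_b64encode(sql.encode()).decode()
    url = f"{FRONTIER_URL}?q={encoded}"
    with urllib.request.urlopen(url) as resp:   # served by Squid if cached
        return resp.read()

# Usage (would contact the site Squid; the host above is fictional):
# rows = frontier_get("SELECT gain FROM calib WHERE iov_since <= 2500")
```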

  9. The Octopus Replicator for Database Replication (1)
  • Works between different database backends as long as they contain equivalent schemas (e.g. the ATLAS Geometry Database, the TAG database, etc.)
  • Configured to replicate between Oracle, MySQL and SQLite; it also supports other databases and file formats: Access, MSQL, CJDBC, Excel, Informix, PostgreSQL, XML, etc.
  • Other functions: database backup/restore and database synchronization

  10. The Octopus Replicator for Database Replication (2)
  • The Octopus Replicator works in two steps (a simplified sketch follows below):
    1) Generation of the database schema description and conversion scripts (“generate”)
    2) The actual database replication itself (“load”)
  • Typical configurations considered for ATLAS tasks:
    Geometry Database:
    • Oracle → MySQL
    • Oracle → SQLite
    TAG Database:
    • MySQL → MySQL
    • MySQL → Oracle
    • Oracle → MySQL
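As a concrete, if simplified, illustration of the generate-then-load pattern, the sketch below replicates one table between two SQLite databases in those same two steps. Octopus itself is a Java tool driven by configuration files, so this Python sketch is only an analogy of the workflow, not its actual interface.

```python
import sqlite3

def generate(src):
    """Step 1 ('generate'): read the source schema and produce the DDL
    for the target. Here both ends are SQLite, so no type conversion."""
    cur = src.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'")
    return [row[0] for row in cur]

def load(src, dst, tables):
    """Step 2 ('load'): bulk-copy the rows, table by table."""
    for table in tables:
        rows = src.execute(f"SELECT * FROM {table}").fetchall()
        if rows:
            marks = ",".join("?" * len(rows[0]))
            dst.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    dst.commit()

# Demo: replicate a toy geometry table from a 'master' to a 'slave'.
master = sqlite3.connect(":memory:")
master.execute("CREATE TABLE geom (volume TEXT, x REAL, y REAL, z REAL)")
master.execute("INSERT INTO geom VALUES ('pixel_layer_0', 0.0, 0.0, 5.05)")
master.commit()

slave = sqlite3.connect(":memory:")
for ddl in generate(master):      # step 1: schema description / scripts
    slave.execute(ddl)
load(master, slave, ["geom"])     # step 2: the replication itself
print(slave.execute("SELECT * FROM geom").fetchall())
```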

  11. Database replication monitoring (1)
  • A dedicated machine, “atlmysql04”, is being set up for database replication monitoring and tests. Currently installed on this server:
    mysql-standard-4.0.26
    MonALISA v1.4.14
    MonAMI v0.4
  • This server is registered on MonALISA under the “farm_name” “atlasdbs”

  12. Database replication monitoring (2)
  • MonALISA and MonAMI are used to monitor the DB replication activities (e.g. from the Tier-0 → Tier-1 DB servers)
  • System information of the DB servers (load, free memory, etc.)
  • Network information (traffic, flows, connectivity, topology, etc.)
  • The MonAMI monitoring daemon (by Paul Millar et al.) uses a plugin architecture to mediate between “monitoring targets” (a MySQL database, an Apache web server, etc.) and “reporting targets” (MonALISA, Ganglia, etc.); a toy sketch of that pattern follows below
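The plugin architecture can be pictured as monitoring targets that produce metric samples and reporting targets that consume them, with the daemon wiring the two together. Below is a toy Python rendering of that pattern; it is not MonAMI's actual code, and every class and method name here is invented.

```python
import time

class MySQLTarget:
    """Monitoring target: a real plugin would query the MySQL server
    (e.g. via 'SHOW STATUS') instead of returning canned numbers."""
    def sample(self):
        return {"mysql.connections": 42, "mysql.slow_queries": 3}

class ConsoleReporter:
    """Reporting target: a real plugin would push to MonALISA or Ganglia."""
    def publish(self, metrics):
        for name, value in metrics.items():
            print(f"{time.strftime('%H:%M:%S')} {name}={value}")

def run_daemon(monitors, reporters, cycles=2, period=1.0):
    """Daemon core: poll every monitoring target, fan the merged sample
    out to every reporting target, then sleep until the next cycle."""
    for _ in range(cycles):
        merged = {}
        for m in monitors:
            merged.update(m.sample())
        for r in reporters:
            r.publish(merged)
        time.sleep(period)

run_daemon([MySQLTarget()], [ConsoleReporter()])
```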

  13. The MonALISA monitoring system

  14. Next tasks
  • Support in MonAMI for plugins to monitor Oracle databases
  • Deploy and test the monitoring as soon as possible
