COOL deployment in ATLAS

A brief overview to give a flavour of COOL activities in ATLAS
• COOL usage online and offline
• Current COOL database instances
• Conditions database deployment model
• Testing - from online to Tier-n
• Some ATLAS feedback

Richard Hawkings (CERN)
LCG COOL meeting, 03/7/06
COOL usage in ATLAS

• COOL is now widely deployed as the ATLAS conditions database solution
  • Only some legacy 2004 combined testbeam data remains in Lisbon MySQL - migrating…
• COOL usage in online software
  • CDI - interface between the online distributed information system (IS) and COOL
    • Archives IS 'datapoints' into COOL to track their history - run parameters, status, monitoring
  • PVSS2COOL - application for copying selected DCS data from the PVSS Oracle archive into COOL, for use in offline/external analysis
  • Interfaces between the TDAQ 'OKS' configuration database and COOL (oks2cool)
  • Direct use of the COOL and CORAL APIs in subdetector configuration code
• COOL in offline software (Athena)
  • Fully integrated for reading (DCS, calibration, …) and for calibration data writing (a read sketch follows below)
  • Using inline data payloads (including CLOBs), and COOL references to POOL files
  • Supporting tools developed:
    • COOL_IO - interface to text and ROOT files; new AtlCoolCopy tool
    • Use of PyCoolConsole and PyCoolCopy
    • Use of Torre's web browser, plus new Lisbon and Orsay Java/Athena-plugin browsers
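As an aside, offline reading of COOL data from one of the SQLite replicas mentioned above can be pictured with a minimal PyCool sketch. This is not ATLAS production code: the connection string, database name, folder path and payload field are hypothetical, and the method names follow later PyCool releases rather than necessarily the COOL 1.3 API of the time.

```python
# Minimal sketch (hypothetical names): open a COOL SQLite replica read-only,
# locate the object valid at a given point in time for one channel, and read
# a payload field.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.openDatabase('sqlite://;schema=mycond.db;dbname=CONDDB', True)  # read-only

folder = db.getFolder('/MYDET/MYFOLDER')         # hypothetical folder path
pointInTime = 1234567890                         # validity key (run/LB or ns timestamp)
channelId = 0
obj = folder.findObject(pointInTime, channelId)  # object whose IOV contains this point
payload = obj.payload()
print('IOV [%d,%d) value=%s' % (obj.since(), obj.until(), payload['value']))

db.closeDatabase()
```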
COOL database instances in ATLAS

• Now have around 25 GB of COOL data on the production RAC
  • Most ATLAS subdetectors have something; the largest volume comes from DCS (PVSS)
  • Current effort to understand how best to use PVSS smoothing/filtering techniques to reduce the data volume without reducing information content
• Data split between several database instances for offline production (data challenges / physics analysis, …), hardware commissioning and the 2004 combined testbeam
  • Mostly using COOL 1.3, but some COOL 1.2 data from ID cosmic tests
    • Gymnastics using replication to SQLite files to allow this data to be read in the offline software (COOL 1.3)
• Some of this data is replicated nightly out of Oracle to SQLite files (a schematic sketch of the idea follows below)
  • 6 MB of COOL SQLite file data is used in offline software simulation/reconstruction
    • These files are included in release 'kits' shipped to outside locations - for this 'statically' replicated data there is no need to access the CERN central Oracle servers from the outside world
  • The ATLAS COOL replica from ID cosmics is a 350 MB SQLite file - still works fine
    • (but it takes 10-15 minutes to produce the replica using the C++ version of PyCoolCopy)
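The nightly Oracle-to-SQLite 'static' replication can be illustrated with the following schematic. It is not the actual AtlCoolCopy/PyCoolCopy implementation: the connection strings and folder path are placeholders, the destination folder is assumed to exist already with the same payload specification, and the API shown follows later PyCool releases.

```python
# Schematic of copying one COOL folder from a source (e.g. Oracle) database
# into an SQLite replica file. All names are hypothetical placeholders.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
src = dbSvc.openDatabase('oracle://SRVR;schema=MYSCHEMA;dbname=CONDDB', True)
dst = dbSvc.openDatabase('sqlite://;schema=replica.db;dbname=CONDDB', False)

folderPath = '/MYDET/MYFOLDER'
srcFolder = src.getFolder(folderPath)
dstFolder = dst.getFolder(folderPath)   # assumed created beforehand, same payload spec

# browse the full validity range and all channels of the source folder
objs = srcFolder.browseObjects(cool.ValidityKeyMin, cool.ValidityKeyMax,
                               cool.ChannelSelection.all())
while objs.goToNext():
    obj = objs.currentRef()
    dstFolder.storeObject(obj.since(), obj.until(), obj.payload(), obj.channelId())
objs.close()

src.closeDatabase()
dst.closeDatabase()
```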
Conditions data deployment model

[Diagram: the Online Oracle DB at the ATLAS pit, fed by the online / PVSS / HLT farm, is replicated via Oracle Streams over a dedicated 10 Gbit link through the ATCN/CERN GPN gateway to the Offline Oracle master CondDB in the computer centre; from there Streams replication feeds Tier-1 replicas in the outside world, SQLite replication feeds the Tier-0 farm, and calibration updates are written back into the master.]

• At present, all data is on the ATLAS RAC
• A separate online server will be introduced soon, once tests are complete
Conditions database testing

• Tests of the online database server
  • Using the COOL verification client from David Front, we have started writing data from the pit to the online Oracle server - using COOL as an 'example application' alongside other online applications (see the sketch below)
  • Scale: 500 COOL folders, 200 channels/folder, 100 bytes/channel, written every 5 minutes
    • A 3 GB/day DCS-type load (which would come from the PVSS Oracle archive via PVSS2COOL)
  • Working - will add Oracle Streams replication to the 'offline' server soon
  • Working towards the correct network configuration to bring the online Oracle server into production (it will be on the private ATCN network, not visible on the CERN GPN)
• Replication for the HLT - a challenge
  • The online HLT farm (level 2 and event filter) needs to read 10-100 MB of COOL conditions data at the start of each fill into each of O(10000) processes, as fast as possible
  • Possible solutions under consideration (work getting underway):
    • Replication of the data for the required run to an SQLite file which is distributed to all hosts
    • Replication into MySQL slave database servers on each HLT rack fileserver
    • Running squid proxies on each fileserver and using Frontier (same data for each)
    • Using a more specialised DBProxy that understands e.g. the MySQL protocol and can even do multicast to a set of nodes (worries about local network bandwidth to the HLT nodes)
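To put the quoted numbers in perspective: 500 folders × 200 channels × 100 bytes is 10 MB of payload every 5 minutes, i.e. 288 × 10 MB ≈ 2.9 GB/day, consistent with the 3 GB/day figure. The sketch below shows one write cycle for a single folder in the spirit of such a load; it is not David Front's verification client, the connection string, folder path and payload fields are hypothetical, and the API follows later PyCool releases.

```python
# One write cycle of a DCS-type load for a single folder (hypothetical names):
# 200 channels of ~100 bytes each, inserted through COOL's storage buffer so
# that the whole cycle is committed as one bulk operation.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.openDatabase('sqlite://;schema=onltest.db;dbname=CONDDB', False)
folder = db.getFolder('/DCS/TESTFOLDER')        # assumed to exist already

spec = folder.payloadSpecification()            # reuse the folder's payload layout
since = 1152000000 * 1000000000                 # IOV start in ns (example value)
until = cool.ValidityKeyMax                     # open-ended until the next update

folder.setupStorageBuffer()                     # buffer the 200 inserts ...
for channel in range(200):
    data = cool.Record(spec)
    data['status'] = 1                          # hypothetical payload fields,
    data['value'] = 0.5 * channel               # ~100 bytes per channel in total
    folder.storeObject(since, until, data, channel)
folder.flushStorageBuffer()                     # ... and commit them in one go

db.closeDatabase()
```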
Conditions database testing, continued

• Replication for Tier-0
  • Tier-0 does prompt reconstruction, run-by-run, with jobs starting incoherently
    • Several hours per job, data spanning O(1 minute), 1000s of jobs in parallel
  • The easiest solution is to extract all COOL data needed for each run (O(10-100 MB?)) once into an SQLite file, and distribute that to the worker nodes (sketched below)
  • A solution with SQLite files (and POOL payload data) on replicated AFS is being tested now
• Replication outside CERN
  • Reprocessing of RAW data will be done at Tier-1s (Tier-0 busy with new data)
    • Need replication of all COOL data needed for offline reconstruction
    • Use Oracle Streams replication - being tested in the 3D project 'throughput phase'
    • Once done, do some dedicated ATLAS tests (as in online -> offline), then production
  • Tier-2s and beyond need subsets (some folders) of the conditions data
    • Analysis, Monte Carlo simulation, calibration tasks, …
    • Either use COOL API-based dynamic replication to MySQL servers in the Tier-2s, or Frontier web-cache-based replication from Tier-1 Oracle
      • With squids at Tier-1s and Tier-2s, need to solve stale-cache problems (by policy?)
    • First plans for testing this are being made - David Front, Argonne
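For the per-run Tier-0 extraction, the copy itself would follow the same browseObjects/storeObject pattern as the earlier sketch; the point is only how the validity window and replica file are keyed to the run. A minimal sketch, assuming the usual COOL convention that run/lumi-block folders encode the run number in the upper 32 bits of the 63-bit validity key (the run number and file name are examples only):

```python
# Compute the validity window and replica file name for one run (example
# values); the actual folder copy would proceed as in the earlier sketch.
run = 12345
since = run << 32                                   # first lumi block of this run
until = (run + 1) << 32                             # start of the following run
replica = 'sqlite://;schema=run%07d.db;dbname=CONDDB' % run
print('extract IOVs in [%d,%d) into %s' % (since, until, replica))
```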
Some ATLAS feedback on COOL

• COOL is (at last) being heavily used, both online and offline
  • It seems to work well, so far so good…
  • The online applications are stressing the performance, offline less so
  • The ability to switch between Oracle, (MySQL) and SQLite is very useful
• Commonly-heard comments from subdetector users
  • Why can't we have payload queries?
    • This causes people to think about reinventing COOL, or accessing the COOL data tables directly, e.g. via CORAL
  • We would like to have COOL tables holding foreign keys to other tables
    • Want to do COOL queries that include the 'join' to the payload data
    • Can emulate this with a 2-step COOL+CORAL query (illustrated below), but it is not efficient for bulk access
    • A headache for replication …
  • COOL is too slow - e.g. in multichannel bulk insert
  • We need a better browser (one that can handle large amounts of data)
  • Why can't COOL 1.3 read the COOL 1.2 schema?
• I know the 'COOL-team' answers to these questions, but it is still useful to give them here - feedback from the end-users!
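The '2-step COOL+CORAL' emulation of a payload join mentioned above can be illustrated roughly as follows. For simplicity the sketch queries an SQLite replica with Python's sqlite3 module in place of CORAL; the folder, table and column names are hypothetical, and the PyCool method names follow later releases.

```python
# Step 1: read a foreign-key value from a COOL payload.
# Step 2: query the referenced payload table directly with SQL
#         (here via sqlite3 instead of CORAL, hypothetical table/columns).
import sqlite3
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.openDatabase('sqlite://;schema=mycond.db;dbname=CONDDB', True)
folder = db.getFolder('/MYDET/CALIBKEYS')           # hypothetical folder
obj = folder.findObject(1234567890, 0)              # step 1: COOL IOV lookup
calibKey = obj.payload()['calibKey']                # foreign key stored inline
db.closeDatabase()

conn = sqlite3.connect('mycond.db')                 # step 2: direct SQL against the
cur = conn.execute(                                 # referenced payload table
    'SELECT parameter, value FROM MYDET_CALIB WHERE calibKey = ?', (calibKey,))
for parameter, value in cur:
    print(parameter, value)
conn.close()
```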