COOL deployment in ATLAS

A brief overview to give a flavour of COOL activities in ATLAS
• COOL usage online and offline
• Current COOL database instances
• Conditions database deployment model
• Testing - from online to Tier-n
• Some ATLAS feedback

Richard Hawkings (CERN)
LCG COOL meeting, 03/7/06
COOL usage in ATLAS

• COOL is now widely deployed as the ATLAS conditions database solution
  • Only some legacy 2004 combined testbeam data remains in Lisbon MySQL - migrating…
• COOL usage in online software
  • CDI - interface between the online distributed information system (IS) and COOL
    • Archives IS 'datapoints' into COOL to track their history - run parameters, status, monitoring
  • PVSS2COOL - application for copying selected DCS data from the PVSS Oracle archive into COOL, for use in offline/external analysis
  • Interfaces between the TDAQ 'OKS' configuration database and COOL (oks2cool)
  • Direct use of the COOL and CORAL APIs in subdetector configuration code
• COOL in offline software (Athena)
  • Fully integrated for reading (DCS, calibration, …) and for calibration data writing (a read sketch follows below)
  • Using inline data payloads (including CLOBs), and COOL references to POOL files
  • Supporting tools developed:
    • COOL_IO - interface to text and ROOT files; new AtlCoolCopy tool
    • Use of PyCoolConsole and PyCoolCopy
    • Use of Torre's web browser, plus new Lisbon and Orsay Java/Athena-plugin browsers
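As an aside, offline reading of COOL data from one of the SQLite replicas mentioned above can be pictured with a minimal PyCool sketch. This is not ATLAS production code: the connection string, database name, folder path and payload field are hypothetical, and the method names follow later PyCool releases rather than necessarily the COOL 1.3 API of the time.

```python
# Minimal sketch (hypothetical names): open a COOL SQLite replica read-only,
# locate the object valid at a given point in time for one channel, and read
# a payload field.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.openDatabase('sqlite://;schema=mycond.db;dbname=CONDDB', True)  # read-only

folder = db.getFolder('/MYDET/MYFOLDER')         # hypothetical folder path
pointInTime = 1234567890                         # validity key (run/LB or ns timestamp)
channelId = 0
obj = folder.findObject(pointInTime, channelId)  # object whose IOV contains this point
payload = obj.payload()
print('IOV [%d,%d) value=%s' % (obj.since(), obj.until(), payload['value']))

db.closeDatabase()
```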
COOL database instances in ATLAS

• Now have around 25 GB of COOL data on the production RAC
  • Most ATLAS subdetectors have something; the largest volume comes from DCS (PVSS)
  • Current effort to understand how best to use PVSS smoothing/filtering techniques to reduce the data volume without reducing information content
• Data split between several database instances for offline production (data challenges / physics analysis, …), hardware commissioning and the 2004 combined testbeam
  • Mostly using COOL 1.3, but some COOL 1.2 data from ID cosmic tests
    • Gymnastics using replication to SQLite files to allow this data to be read in the offline software (COOL 1.3)
• Some of this data is replicated nightly out of Oracle to SQLite files (a schematic sketch of the idea follows below)
  • 6 MB of COOL SQLite file data is used in offline software simulation/reconstruction
    • These files are included in release 'kits' shipped to outside locations - for this 'statically' replicated data there is no need to access the CERN central Oracle servers from the outside world
  • The ATLAS COOL replica from ID cosmics is a 350 MB SQLite file - still works fine
    • (but it takes 10-15 minutes to produce the replica using the C++ version of PyCoolCopy)
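The nightly Oracle-to-SQLite 'static' replication can be illustrated with the following schematic. It is not the actual AtlCoolCopy/PyCoolCopy implementation: the connection strings and folder path are placeholders, the destination folder is assumed to exist already with the same payload specification, and the API shown follows later PyCool releases.

```python
# Schematic of copying one COOL folder from a source (e.g. Oracle) database
# into an SQLite replica file. All names are hypothetical placeholders.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
src = dbSvc.openDatabase('oracle://SRVR;schema=MYSCHEMA;dbname=CONDDB', True)
dst = dbSvc.openDatabase('sqlite://;schema=replica.db;dbname=CONDDB', False)

folderPath = '/MYDET/MYFOLDER'
srcFolder = src.getFolder(folderPath)
dstFolder = dst.getFolder(folderPath)   # assumed created beforehand, same payload spec

# browse the full validity range and all channels of the source folder
objs = srcFolder.browseObjects(cool.ValidityKeyMin, cool.ValidityKeyMax,
                               cool.ChannelSelection.all())
while objs.goToNext():
    obj = objs.currentRef()
    dstFolder.storeObject(obj.since(), obj.until(), obj.payload(), obj.channelId())
objs.close()

src.closeDatabase()
dst.closeDatabase()
```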
Conditions data deployment model

[Diagram: the Online Oracle DB at the ATLAS pit, fed by the online / PVSS / HLT farm, is replicated via Oracle Streams over a dedicated 10 Gbit link through the ATCN/CERN GPN gateway to the Offline Oracle master CondDB in the computer centre; from there Streams replication feeds Tier-1 replicas in the outside world, SQLite replication feeds the Tier-0 farm, and calibration updates are written back into the master.]

• At present, all data is on the ATLAS RAC
• A separate online server will be introduced soon, once tests are complete
Conditions database testing

• Tests of the online database server
  • Using the COOL verification client from David Front, we have started writing data from the pit to the online Oracle server - using COOL as an 'example application' alongside other online applications (see the sketch below)
  • Scale: 500 COOL folders, 200 channels/folder, 100 bytes/channel, written every 5 minutes
    • A 3 GB/day DCS-type load (which would come from the PVSS Oracle archive via PVSS2COOL)
  • Working - will add Oracle Streams replication to the 'offline' server soon
  • Working towards the correct network configuration to bring the online Oracle server into production (it will be on the private ATCN network, not visible on the CERN GPN)
• Replication for the HLT - a challenge
  • The online HLT farm (level 2 and event filter) needs to read 10-100 MB of COOL conditions data at the start of each fill into each of O(10000) processes, as fast as possible
  • Possible solutions under consideration (work getting underway):
    • Replication of the data for the required run to an SQLite file which is distributed to all hosts
    • Replication into MySQL slave database servers on each HLT rack fileserver
    • Running squid proxies on each fileserver and using Frontier (same data for each)
    • Using a more specialised DBProxy that understands e.g. the MySQL protocol and can even do multicast to a set of nodes (worries about local network bandwidth to the HLT nodes)
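To put the quoted numbers in perspective: 500 folders × 200 channels × 100 bytes is 10 MB of payload every 5 minutes, i.e. 288 × 10 MB ≈ 2.9 GB/day, consistent with the 3 GB/day figure. The sketch below shows one write cycle for a single folder in the spirit of such a load; it is not David Front's verification client, the connection string, folder path and payload fields are hypothetical, and the API follows later PyCool releases.

```python
# One write cycle of a DCS-type load for a single folder (hypothetical names):
# 200 channels of ~100 bytes each, inserted through COOL's storage buffer so
# that the whole cycle is committed as one bulk operation.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.openDatabase('sqlite://;schema=onltest.db;dbname=CONDDB', False)
folder = db.getFolder('/DCS/TESTFOLDER')        # assumed to exist already

spec = folder.payloadSpecification()            # reuse the folder's payload layout
since = 1152000000 * 1000000000                 # IOV start in ns (example value)
until = cool.ValidityKeyMax                     # open-ended until the next update

folder.setupStorageBuffer()                     # buffer the 200 inserts ...
for channel in range(200):
    data = cool.Record(spec)
    data['status'] = 1                          # hypothetical payload fields,
    data['value'] = 0.5 * channel               # ~100 bytes per channel in total
    folder.storeObject(since, until, data, channel)
folder.flushStorageBuffer()                     # ... and commit them in one go

db.closeDatabase()
```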
Conditions database testing, continued

• Replication for Tier-0
  • Tier-0 does prompt reconstruction, run-by-run, with jobs starting incoherently
    • Several hours per job, data spanning O(1 minute), 1000s of jobs in parallel
  • The easiest solution is to extract all COOL data needed for each run (O(10-100 MB?)) once into an SQLite file, and distribute that to the worker nodes (sketched below)
  • A solution with SQLite files (and POOL payload data) on replicated AFS is being tested now
• Replication outside CERN
  • Reprocessing of RAW data will be done at Tier-1s (Tier-0 busy with new data)
    • Need replication of all COOL data needed for offline reconstruction
    • Use Oracle Streams replication - being tested in the 3D project 'throughput phase'
    • Once done, do some dedicated ATLAS tests (as in online -> offline), then production
  • Tier-2s and beyond need subsets (some folders) of the conditions data
    • Analysis, Monte Carlo simulation, calibration tasks, …
    • Either use COOL API-based dynamic replication to MySQL servers in the Tier-2s, or Frontier web-cache-based replication from Tier-1 Oracle
      • With squids at Tier-1s and Tier-2s, need to solve stale-cache problems (by policy?)
    • First plans for testing this are being made - David Front, Argonne
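For the per-run Tier-0 extraction, the copy itself would follow the same browseObjects/storeObject pattern as the earlier sketch; the point is only how the validity window and replica file are keyed to the run. A minimal sketch, assuming the usual COOL convention that run/lumi-block folders encode the run number in the upper 32 bits of the 63-bit validity key (the run number and file name are examples only):

```python
# Compute the validity window and replica file name for one run (example
# values); the actual folder copy would proceed as in the earlier sketch.
run = 12345
since = run << 32                                   # first lumi block of this run
until = (run + 1) << 32                             # start of the following run
replica = 'sqlite://;schema=run%07d.db;dbname=CONDDB' % run
print('extract IOVs in [%d,%d) into %s' % (since, until, replica))
```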
Some ATLAS feedback on COOL

• COOL is (at last) being heavily used, both online and offline
  • It seems to work well, so far so good…
  • The online applications are stressing the performance, offline less so
  • The ability to switch between Oracle, (MySQL) and SQLite is very useful
• Commonly-heard comments from subdetector users
  • Why can't we have payload queries?
    • This causes people to think about reinventing COOL, or accessing the COOL data tables directly, e.g. via CORAL
  • We would like to have COOL tables holding foreign keys to other tables
    • Want to do COOL queries that include the 'join' to the payload data
    • Can emulate this with a 2-step COOL+CORAL query (illustrated below), but it is not efficient for bulk access
    • A headache for replication …
  • COOL is too slow - e.g. in multichannel bulk insert
  • We need a better browser (one that can handle large amounts of data)
  • Why can't COOL 1.3 read the COOL 1.2 schema?
• I know the 'COOL-team' answers to these questions, but it is still useful to give them here - feedback from the end-users!
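The '2-step COOL+CORAL' emulation of a payload join mentioned above can be illustrated roughly as follows. For simplicity the sketch queries an SQLite replica with Python's sqlite3 module in place of CORAL; the folder, table and column names are hypothetical, and the PyCool method names follow later releases.

```python
# Step 1: read a foreign-key value from a COOL payload.
# Step 2: query the referenced payload table directly with SQL
#         (here via sqlite3 instead of CORAL, hypothetical table/columns).
import sqlite3
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.openDatabase('sqlite://;schema=mycond.db;dbname=CONDDB', True)
folder = db.getFolder('/MYDET/CALIBKEYS')           # hypothetical folder
obj = folder.findObject(1234567890, 0)              # step 1: COOL IOV lookup
calibKey = obj.payload()['calibKey']                # foreign key stored inline
db.closeDatabase()

conn = sqlite3.connect('mycond.db')                 # step 2: direct SQL against the
cur = conn.execute(                                 # referenced payload table
    'SELECT parameter, value FROM MYDET_CALIB WHERE calibKey = ?', (calibKey,))
for parameter, value in cur:
    print(parameter, value)
conn.close()
```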