An Overview of ATLAS Databases and Database Access (Geometry/Conditions) in Athena
Elizabeth Gallas - Oxford
ATLAS-UK Distributed Computing Tutorial
Edinburgh, UK – March 21-22, 2011
Outline
• Motivation: Databases
• Overview of ATLAS Databases
• Databases of Athena-based analysis interest
  • Geometry Database
  • Conditions Database
  • And how they are made accessible on the grid
• Some tips for users
• Summary and Conclusions
Motivation: Database use in ATLAS
• ATLAS "data" falls into 2 broad categories:
  • Event-wise data: stored in files (RAW, ESD, AOD, TAG, ...)
    • Know something about themselves, but also carry 'metadata' pointers to the bigger picture
  • Non-event-wise data: stored in databases
    • Enable construction of the 'bigger picture'
    • Important information needed at our fingertips, usually by diverse clients
• Database Management Systems (DBMS) provide:
  • Persistent storage for large and small collections of data of varied complexity, in data structures that provide access flexibility
  • A powerful query language for data entry, modification, and retrieval
  • Transaction management: the appearance of isolation, while allowing multi-user simultaneous access
Overview – Oracle usage in ATLAS
Oracle is used extensively at every stage of data taking, processing, and analysis. Some of the more common applications:
• Configuration
  • PVSS – Detector Control System (DCS) configuration and monitoring
  • Trigger – trigger configuration (online and simulation data)
  • OKS – configuration databases for the TDAQ
  • Geometry – detector description
• File and job management
  • T0 – Tier-0 processing
  • DQ2/DDM – distributed file and dataset management
  • Dashboard – monitoring of jobs and data movement on the ATLAS grid
  • PanDA – workload management: production and distributed analysis
• Conditions data (non-event data for offline analysis)
  • Conditions Database
  • [POOL files in DDM (referenced from the Conditions DB)]
• "Metadata" == data about data
  • AMI (ATLAS Metadata Interface) – dataset metadata
  • COMA (COnditions MetadatA) – configuration/conditions metadata
  • TAGs (not an acronym) – event-level metadata
What does your Athena job need?
Every Athena job needs:
• Data (events)
• Database (geometry, conditions)
• Efficient I/O (sometimes across a network), CPU
• (A purpose and) a place for output
Much like any of us, whose needs are food, water, love, and a place for output.
The next slides give more details about Geometry and Conditions:
• What they contain
• How Athena accesses them
• How they are distributed for access on the grid
• User interfaces, documentation, and help
Geometry Database
• Relational DB: primary numbers for the ATLAS detector description
  • All data for building the GeoModel description in a single place
• Primary numbers stored in data tables (leaves), organized by subsystem (branches)
• Tagging (versioning) at various levels
  • Locked tags define a distinct detector description
  • Globally tagged/locked at higher levels, and associated with software releases
  • The evolution of geometry tags is set up such that each new tag is compatible with older releases
• Location and distribution:
  • Master copy in an Oracle server at CERN
  • Up to now: a copy of the entire database is dumped into an SQLite file and delivered to sites using DB Release technology with each software release
  • Future: a more diverse distribution model is being tested (Frontier)
  • Update (Vakho Tsulaia) in the upcoming Software/Computing workshop
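As a hedged illustration, selecting a specific geometry tag in Athena job options looks roughly like the sketch below; the tag name is a made-up example (real jobs usually autoconfigure it, see the tips slides), and the configuration modules reflect release-era usage.

    # Job-option sketch: pin the detector description to a geometry tag.
    # The tag name below is illustrative, not a recommendation.
    from AthenaCommon.GlobalFlags import globalflags
    globalflags.DetDescrVersion = 'ATLAS-GEO-16-00-00'  # example tag

    from AtlasGeoModel import SetGeometryVersion  # passes the tag to GeoModelSvc
    from AtlasGeoModel import GeoModelInit        # instantiates the GeoModel description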
Geometry DB Browser
http://atlas.web.cern.ch/Atlas/GROUPS/OPERATIONS/dataBases/DDDB
"Conditions"
"Conditions" is a general term for information that is not event-wise, reflecting the conditions or state of a system. Conditions are valid for an 'interval of validity' (IOV) ranging from very short to infinity. IOVs can be expressed as a range in timestamps or in Run/LumiBlock numbers.
Any conditions data needed for offline processing and/or analysis must be stored in the ATLAS Conditions Database (aka COOL) or in its referenced POOL files (DDM).
[Diagram: many sources (LHC, ZDC, TDAQ, DCS, OKS, DQ) feed the ATLAS Conditions Database.]
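To make the IOV idea concrete, here is a purely conceptual Python sketch (not the ATLAS API): conditions are stored with [since, until) validity intervals and looked up by event time.

    import bisect

    # Conceptual IOV lookup sketch; values are illustrative.
    INFINITY = float('inf')

    conditions = [  # (since, until, payload)
        (0,    100,      {'hv': 1500.0}),
        (100,  250,      {'hv': 1480.0}),
        (250,  INFINITY, {'hv': 1510.0}),
    ]

    def lookup(event_time):
        """Return the payload whose IOV contains event_time, else None."""
        starts = [since for since, _, _ in conditions]
        i = bisect.bisect_right(starts, event_time) - 1
        if i >= 0 and event_time < conditions[i][1]:
            return conditions[i][2]
        return None

    print(lookup(120))  # -> {'hv': 1480.0}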
Conditions DB infrastructure in ATLAS
• Relies on considerable infrastructure: COOL, CORAL, Athena (developed by ATLAS and CERN IT): a generic schema design that can store, accommodate, and deliver a large amount of data for a diverse set of subsystems
• IOV ('interval of validity') DB in relational DB tables
  • Data organized into folders and foldersets
    • By schema (subdetector)
    • By instance (for real data and MC)
  • Stores data 'inline', but can hold references to external POOL files (managed by DDM)
• Athena: Conditions DB data maps to transient C++ objects, which are accessible to Athena at run time through the Transient Store
• COOL tag (version): distinct sets of conditions, making specific computations reproducible
• Used at many stages of data taking and analysis: from online calibrations, alignment, and monitoring, to offline processing, more calibrations, further alignment, reprocessing, and analysis, through to luminosity and data quality
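In Athena, a job typically requests the folders it needs via IOVDbSvc job options; a minimal sketch is below, where the schema alias, folder path, and tag name are illustrative assumptions.

    # Job-option sketch: request a conditions folder so IOVDbSvc
    # serves its payload to the job through the transient store.
    from IOVDbSvc.CondDB import conddb
    conddb.addFolder('DCS_OFL', '/LHC/DCS/FILLSTATE')          # illustrative folder
    conddb.addOverride('/LHC/DCS/FILLSTATE', 'MyCoolTag-00')   # hypothetical tag override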
Conditions: User interfaces
• Command-line interface (AtlCoolConsole):
  • https://twiki.cern.ch/twiki/bin/view/Atlas/AtlCoolConsole
• Conditions TAG Browser:
  • https://atlas-coolbrowser.web.cern.ch/atlas-coolbrowser/
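For scripted access there is also the COOL Python binding (PyCool); a minimal read-back sketch follows, in which the connection string, folder path, and channel number are assumptions, not a prescription.

    from PyCool import cool

    # Minimal PyCool read sketch (names are illustrative).
    dbSvc = cool.DatabaseSvcFactory.databaseService()
    db = dbSvc.openDatabase('sqlite://;schema=mycool.db;dbname=COMP200', True)  # read-only
    folder = db.getFolder('/MYDET/MYFOLDER')
    obj = folder.findObject(42, 0)   # (point inside the IOV, channel 0)
    print(obj.payload())             # the inline payload record
    db.closeDatabase()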
Oracle Distribution of Conditions data
[Simplified diagram: an online CondDB and the offline master CondDB in the computer centre, separated by an isolation cut, with calibration updates flowing in, the Tier-0 farm reading the master, and Tier-1 replicas serving the outside world.]
• Oracle stores a huge amount of essential data 'at our fingertips'
  • But ATLAS has many, many, many fingers, which may be looking for anything from the oldest to the newest data
• Conditions in Oracle: master copy at Tier-0
  • Replicated to many Tier-1 sites
• Jobs running at Oracle sites (direct access) perform well
• But direct Oracle access on the grid from remote sites is problematic:
  • Even after tuning, direct access requires many back-and-forth network transactions, so the RTT (Round Trip Time) multiplies: SLOW (e.g., a few thousand queries at ~100 ms RTT already cost minutes of wall time in network latency alone)
  • Cascade effect: jobs hold connections longer, preventing new jobs from starting
• So we use alternative technologies, especially over the WAN (Wide Area Network): 'caching' Conditions from Oracle when possible
Technologies for Conditions "caching"
• "DB Release": a system of files containing all the data 'needed'
  • Used in reprocessing campaigns and for MC processing/analysis
  • Includes:
    • SQLite replicas: a "mini" Conditions DB with specific folders, an IOV range, and a COOL tag (a 'slice': a small subset of all the rows in the Oracle tables); see the job-option sketch after this slide
    • The associated POOL files and a PFC (file catalog)
• "Frontier": store query results in a web cache
  • Developed by Fermilab (used by CDF, further refined for CMS)
  • Basic idea: Frontier/Squid servers located at or near the Oracle RAC
    • Negotiate transactions between grid jobs and the Oracle DB
    • Reduce the load on Oracle by caching the results of repeated queries
    • Reduce the latency observed when connecting to Oracle over the WAN
  • Additional Squid servers at remote sites help even more
  • Used by default for user analysis jobs (picture on the next slide)
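A hedged job-option sketch of pointing one folder at a local SQLite slice instead of the default source; the file name, dbname, and folder path are illustrative assumptions.

    # Job-option sketch: read one folder from a local SQLite 'slice'.
    from IOVDbSvc.CondDB import conddb
    conddb.blockFolder('/MYDET/MYFOLDER')  # ignore the default source for this folder
    conddb.addFolder('', '<db>sqlite://;schema=myslice.db;dbname=COMP200</db> /MYDET/MYFOLDER')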
Conditions DB access via Frontier
Frontier for distributed database access: used by default for user analysis jobs.
Main components:
• Frontier server
  • Communicates directly with the Oracle server
  • Includes data caching
  • Provides data to Squids
• Squid
  • Communicates with the Frontier server over HTTP
  • Caches retrieved data locally for its clients
ATLAS: Frontier in operation since late 2009
• Frontier servers at the T1 sites participating in replication
• ~60 Squids all over the world: mostly T2, some T3 too
[Diagram: Tier-1 (Oracle plus Frontier server) serving Squids at Tier-2 sites.]
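As a hedged illustration of how a job learns where its Frontier server and Squid are: sites typically publish this via an environment variable that the database client picks up. The variable layout below follows the common frontier-client convention, and the host names are made up.

    import os

    # Illustrative only: fake hosts, conventional (serverurl=...)(proxyurl=...) layout.
    os.environ['FRONTIER_SERVER'] = (
        '(serverurl=http://frontier.example.org:8000/atlr)'
        '(proxyurl=http://squid.example.org:3128)'
    )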
DB Access in Athena
• Athena applications access the conditions and geometry DBs using the LCG software libraries POOL, COOL, and CORAL
• This allows for transparent usage of the various technologies (Oracle, SQLite, Frontier/Squid)
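"Transparent" here means the same data can be served through different technologies just by changing the connection string. The examples below are illustrative COOL-style connection strings; the server, alias, and schema names are assumptions.

    # Illustrative connection strings for the same conditions schema:
    oracle_replica   = 'oracle://ATLAS_COOLPROD;schema=ATLAS_COOLOFL_DCS;dbname=COMP200'
    frontier_replica = 'frontier://ATLF/();schema=ATLAS_COOLOFL_DCS;dbname=COMP200'
    sqlite_slice     = 'sqlite://;schema=myslice.db;dbname=COMP200'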
Tips for Users (1)
• What global Conditions and Geometry tags should you use?
  • Autoconfigure your job: have it read the global tags from its input file (ESD, AOD)
• In job options:

    from RecExConfig.RecFlags import rec
    rec.AutoConfiguration = ['everything']

• In job transforms: command-line parameter autoConfiguration=everything
• https://twiki.cern.ch/twiki/bin/view/Atlas/RecExCommonAutoConfiguration
(Slide: V. Tsulaia)
Tips for Users (2)
• How do I configure my environment to access:
  • Frontier/Squid?
  • Conditions payload POOL files?
  • The DB Release for geometry (and MC conditions, if needed)?
• All of that is done for you automatically... just sit back and enjoy the ride!
(Slide: V. Tsulaia)
Tips for Users (3)
If things go wrong and the problem seems related to database access, there is useful information on the TWiki:
• Athena DB Access: https://twiki.cern.ch/twiki/bin/view/Atlas/AthenaDBAccess
• COOL Troubles: https://twiki.cern.ch/twiki/bin/viewauth/Atlas/CoolTroubles
• ATLAS DB Release: https://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasDBRelease
These TWiki pages should help you narrow down the problem; then you will be in a position to:
• Either ask your site admin
• Or send email to Database Operations <hn-atlas-DBOps@cern.ch>
(Slide: V. Tsulaia)
Conclusions: Databases and DB Access from Athena
• Databases are used extensively in ATLAS, at every stage of data taking, processing, and analysis
  • Scratch the surface of almost any interactive user application and you will find a database!
• I have attempted to give an overview of the issues and considerations in DB access from Athena
  • The need to provide database information in a variety of access patterns, with potentially widely varying data volumes, to diverse clients makes Athena access to the ATLAS non-event-wise databases (Conditions and Geometry) complex
  • Supporting different technologies allows us to optimally meet the various needs
• A lot of effort has gone into making DB access for user analysis as transparent as possible
• More details can be found:
  • In V. Tsulaia's slides from the Software Workshop in Tbilisi, Oct 26, 2010
  • On the various TWiki pages