Explore the context, structure, and impact of the CORAL framework in the CERN experiments. Learn about its components and backends, and the security issues it addresses.
EGI Forum, Vilnius 2001
CORAL: A Relational Abstraction Layer for C++ and Python applications
Raffaello Trentadue
Outline
• The context in which CORAL was designed and implemented
• CORAL introduction
  • What is CORAL?
  • Why CORAL?
• CORAL structure
  • Organization of the packages
  • CORAL back-ends
• CORAL impact
• CORAL work plan
  • Maintenance
  • New features
• Conclusions
LHC challenge
• An enormous amount of data has to be processed and analyzed (hundreds of petabytes over the whole LHC lifetime).
• A single CERN analysis facility could not be implemented.
• This boosted the development of grid computing infrastructure and technology.
• A distributed analysis model implies inhomogeneity in the data storage infrastructure across the different institutes and across the long LHC lifetime.
• The computing infrastructure has to be easily maintainable and adaptable.
Persistency framework
In three of the experiments (ATLAS, CMS and LHCb), some types of data are stored and accessed using the software developed by the Persistency Framework (PF) project, set up within the Application Area (AA) of the LHC Computing Grid (LCG) to find common solutions for the LHC experiments.
• POOL is a generic hybrid store for C++ objects, metadata catalogues and collections, using streaming and relational technologies.
• COOL provides specific software to handle the time variation and versioning of conditions data.
CORAL: introduction
What is CORAL?
The Common Relational Abstraction Layer (CORAL) is a C++ framework for accessing data in relational databases. The C++ API of CORAL consists of a set of SQL-free abstract interfaces that isolate the user code from the database implementation technology. Python bindings of the API are also available.
Why CORAL?
In the distributed data analysis model, the data storage (database) infrastructure and the security policies are inhomogeneous across the different institutes.
• CORAL provides a set of C++ libraries for several database backends:
  • local access to SQLite files;
  • direct client access to Oracle and MySQL servers;
  • read-only access to Oracle through the FroNTier/Squid web server/cache system.
• Users write the same code for all backends:
  • detailed knowledge of the many SQL flavors is not required;
  • the SQL commands specific to each backend are executed by the relevant CORAL libraries, which are loaded at run time by a dedicated plugin infrastructure.
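The idea of writing the same user code for every backend can be illustrated with a minimal Python sketch. It is not the CORAL or PyCoral API: the `Backend` interface, the `PLUGINS` registry, the `open_backend` function and the `technology:location` connection format are all hypothetical names chosen for this example, and only an SQLite backend is implemented (using the standard `sqlite3` module).

```python
import sqlite3
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical SQL-free interface (illustrative, not the real CORAL API)."""

    @abstractmethod
    def create_table(self, table, columns):
        """Create a table from a {column name: type name} mapping."""

    @abstractmethod
    def insert_row(self, table, row):
        """Insert one row given as a {column name: value} mapping."""

    @abstractmethod
    def select_rows(self, table, columns):
        """Return the requested columns of every row as a list of tuples."""


def _check(name):
    # Accept only plain identifiers, since names are spliced into SQL text.
    if not name.isidentifier():
        raise ValueError(f"invalid identifier: {name!r}")
    return name


class SQLiteBackend(Backend):
    """Backend that generates SQLite-flavoured SQL on the user's behalf."""

    def __init__(self, location):
        self._conn = sqlite3.connect(location)

    def create_table(self, table, columns):
        cols = ", ".join(f"{_check(n)} {_check(t)}" for n, t in columns.items())
        self._conn.execute(f"CREATE TABLE {_check(table)} ({cols})")

    def insert_row(self, table, row):
        names = ", ".join(_check(n) for n in row)
        marks = ", ".join("?" for _ in row)
        self._conn.execute(
            f"INSERT INTO {_check(table)} ({names}) VALUES ({marks})",
            tuple(row.values()),
        )

    def select_rows(self, table, columns):
        names = ", ".join(_check(c) for c in columns)
        return self._conn.execute(f"SELECT {names} FROM {_check(table)}").fetchall()


# Registry standing in for the run-time plugin loader; an Oracle or MySQL
# backend would register here under its own (hypothetical) technology name.
PLUGINS = {"sqlite": SQLiteBackend}


def open_backend(connection):
    """Pick a backend from the technology prefix of 'technology:location'."""
    technology, _, location = connection.partition(":")
    return PLUGINS[technology](location)


# User code: no SQL appears here, only calls on the abstract interface.
db = open_backend("sqlite::memory:")
db.create_table("runs", {"run": "INTEGER", "tag": "TEXT"})
db.insert_row("runs", {"run": 1, "tag": "cosmics"})
rows = db.select_rows("runs", ["run", "tag"])
```

In this sketch, switching technology only changes the connection string passed to `open_backend`; the three calls that follow stay the same, which is the property the slide describes.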
CORAL packages
• Relational Service, RelationalAccess, CoralBase, CoralCommon, CoralKernel, ConnectionService, Environment Authentication Service
• The access points to the CORAL API are implemented as abstract classes.
• XML Authentication Service, XML Lookup Service, LFC Replica Service
• CORAL backends: Oracle Access, MySQL Access, SQLite Access, Coral Server, Frontier Access
• PyCoral
• Tests
A CORAL application
A CORAL application includes the following main components:
• the data analysis logic;
• the appropriate CORAL database backend access plugin, loaded at run time;
• the third-party database backend client libraries linked to the plugin.
The third-party libraries connect directly to the database servers, through the institute firewall. The database servers must therefore be visible from the Internet: a security issue!
ConnectionService
The input is an identification string that identifies a database schema and contains all the information needed to start a database session. The system goes through the following steps:
1. Loads a database lookup plugin that interprets the string;
2. Loads the corresponding database access plugin;
3. Loads the plugin that handles the security schema;
4. Establishes a connection to the database server, by calling the interface methods of the security schema handling plugin;
5. Starts a database session in update mode, by using the access plugin interface;
6. Returns a handle to the open database session.
The handle completely hides the details above.
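The sequence of steps performed by the connection service can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CORAL code: the `LOOKUP` and `CREDENTIALS` tables, the server and schema names, and every function name are hypothetical, and the access plugin is a stub that opens no real connection.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Handle returned to the user; hides how the connection was made."""
    technology: str
    server: str
    schema: str
    user: str
    mode: str


# Hypothetical lookup table: logical name -> (technology, server, schema).
LOOKUP = {"conditions": ("oracle", "db.example.org", "COND_SCHEMA")}

# Hypothetical credential store keyed by (server, schema).
CREDENTIALS = {("db.example.org", "COND_SCHEMA"): ("reader", "secret")}


def lookup(name):
    """Step 1: the lookup plugin interprets the identification string."""
    return LOOKUP[name]


def load_access_plugin(technology):
    """Step 2: select the access plugin for the technology (stubbed here)."""
    def open_connection(server, user, password):
        # A real plugin would open a network connection; this stub only
        # records who connected where.
        return {"technology": technology, "server": server, "user": user}
    return open_connection


def authenticate(server, schema):
    """Step 3: the authentication plugin supplies the credentials."""
    return CREDENTIALS[(server, schema)]


def connect(name, mode="update"):
    """Run all the steps and return only the session handle."""
    technology, server, schema = lookup(name)
    open_connection = load_access_plugin(technology)
    user, password = authenticate(server, schema)
    connection = open_connection(server, user, password)   # step 4
    return Session(connection["technology"], connection["server"],
                   schema, connection["user"], mode)        # steps 5 and 6


# The caller supplies one string and receives one handle.
session = connect("conditions")
```

The caller never sees the lookup result, the plugin or the password, which mirrors the statement that the handle hides these details.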
Oracle OCI Access
The Oracle Call Interface (OCI) is an application programming interface (API) that allows developers to build applications using low-level C and C++ function calls to access an Oracle database server. OCI not only allows users to control all aspects of SQL statement execution, but also fully supports the datatypes, calling conventions, syntax, and semantics of C and C++.
OracleAccess
OracleAccess limitations
• Password vulnerability and exposure of database ports on the public network for remote users, because the only authentication mechanism currently available for experiment jobs consists in providing user names and passwords at connection time.
• Difficult maintenance of the authentication infrastructure needed to give all collaboration members access to the database.
• Very inefficient use of server resources, with performance bottlenecks at high connection rates, for example when many jobs launched at the same time access the same schema simultaneously.
• Dependency of user applications on the Oracle client software, as this is linked to the CORAL plugin loaded at run time. Any update to a new version of the Oracle client implies the redistribution of this software to all Grid nodes, and in most cases also of new CORAL libraries rebuilt against it and packaged as a new software release.
CORAL Server
A middle-tier server is introduced between CORAL client applications and the Oracle database servers. Client applications connect to the “CORAL server” through a new dedicated CoralAccess plugin, while the CORAL server, itself a CORAL application, connects to the Oracle database server inside the site firewall.
CORAL Server advantages
• The Oracle credentials could be stored on the CORAL server, which would use them for clients authenticated and authorized using their proxy certificates.
• Access from several concurrent clients to the same Oracle schemas could be multiplexed through a single connection from the CORAL server to the database, reducing the load on the latter.
• Client applications would also no longer depend on the Oracle client software, which is only needed in the CORAL server.
The model above has then been extended by introducing an additional “CORAL server proxy” tier between the clients and the CORAL server, to provide data caching and further multiplexing for read-only use cases.
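The multiplexing idea, several clients sharing one database connection held by a middle tier, can be sketched as follows. This is only an illustration, not CORAL server code: the `MiddleTier` class is hypothetical, an in-memory SQLite database stands in for Oracle, plain strings stand in for proxy certificates, and threads in one process stand in for remote client jobs.

```python
import sqlite3
import threading


class MiddleTier:
    """Hypothetical stand-in for a middle-tier server (not CORAL server code)."""

    def __init__(self, authorized_clients):
        # The only database connection. In a real deployment this is where
        # the database credentials would be used; clients never hold them.
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._lock = threading.Lock()
        self._authorized = set(authorized_clients)
        self._conn.execute("CREATE TABLE cond (k TEXT, v INTEGER)")
        self._conn.execute("INSERT INTO cond VALUES ('gain', 42)")

    def read(self, client_id, key):
        # Clients present an identity (a proxy certificate in the slides,
        # a plain string here) instead of a database password.
        if client_id not in self._authorized:
            raise PermissionError(client_id)
        with self._lock:  # serialize requests over the single connection
            cur = self._conn.execute("SELECT v FROM cond WHERE k = ?", (key,))
            return cur.fetchone()[0]


server = MiddleTier(authorized_clients={"job-1", "job-2", "job-3"})
results = {}


def job(name):
    # Each client job asks the middle tier, never the database itself.
    results[name] = server.read(name, "gain")


threads = [threading.Thread(target=job, args=(n,))
           for n in ("job-1", "job-2", "job-3")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Three jobs are served here while the database sees a single connection, and a client that is not in the authorized set is rejected before any query is issued.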
Frontier Access
The FroNTier client converts the SQL into an HTTP GET request and sends it over the network to the FroNTier server. The FroNTier server unpacks the SQL request, sends it to the DB server, and retrieves the needed data. The data is optionally compressed, then packed into an HTTP-formatted stream and sent back to the client. Squid proxy/caching servers between the FroNTier server and the client cache the requested objects, significantly improving performance and greatly reducing the load on the central database.
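The request path described above can be mimicked with a short Python sketch: a query is packed into a GET URL, and a cache keyed by that URL answers repeated requests without contacting the server again. The encoding used here (zlib plus URL-safe base64 in a parameter named `q`), the server address and the `CachingProxy` class are assumptions made for illustration; they are not the actual FroNTier wire format or Squid behaviour.

```python
import base64
import zlib
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://frontier.example.org/query"  # hypothetical server address


def sql_to_url(sql):
    """Client side: pack the SQL text into a GET URL (encoding is assumed)."""
    packed = base64.urlsafe_b64encode(zlib.compress(sql.encode())).decode()
    return BASE_URL + "?" + urlencode({"q": packed})


def url_to_sql(url):
    """Server side: recover the SQL text from the URL."""
    packed = parse_qs(urlparse(url).query)["q"][0]
    return zlib.decompress(base64.urlsafe_b64decode(packed)).decode()


class CachingProxy:
    """Stand-in for a caching proxy: identical URLs are served from memory."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}
        self.origin_hits = 0

    def get(self, url):
        if url not in self._cache:
            # Cache miss: forward the request to the origin server once.
            self.origin_hits += 1
            self._cache[url] = self._origin(url)
        return self._cache[url]


def origin_server(url):
    # A real server would run the query on the database; this stub just
    # echoes the decoded SQL so the round trip can be checked.
    return "result of: " + url_to_sql(url)


proxy = CachingProxy(origin_server)
url = sql_to_url("SELECT v FROM cond WHERE k = 'gain'")
first = proxy.get(url)
second = proxy.get(url)  # same URL, answered from the cache
```

Because the same query always maps to the same URL, ordinary HTTP caching applies, which is why many jobs reading identical data put little load on the central database.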
CORAL Impact
The CORAL framework is at the moment crucial for reading the conditions data of almost all the LHC experiments. The reasons for its success are:
1. its capacity to be integrated in a grid distributed analysis model;
2. its capacity to shield user applications from the SQL dialects used to access any relational database;
3. the user does not need to be an expert in all the possible database optimization techniques, as they are already implemented in the backend access plugins;
4. its flexibility with respect to different back-ends.
These features make CORAL able to fit the needs of any environment outside High Energy Physics where safe access to a relational database is required, especially in the context of a distributed analysis.
CORAL work plan
The software is by now mature in its development cycle, but a large development and support effort is still required for user support, service operation and maintenance tasks. Regular production releases are prepared whenever one of the LHC experiments demands it, leading to one release per month on average. This is generally motivated either by urgent bug fixes and functionality enhancements in the Persistency software, or by upgrades in the versions of the ‘external’ dependencies (ROOT, Boost, Python, Oracle...). These external versions, which are different from those installed on a predefined O/S version, vary quite frequently because they must match those chosen by the three experiments for their frameworks (Gaudi for LHCb, Athena for ATLAS and CMSSW for CMS), into which the Persistency packages are linked to build data-processing client applications.