230 likes | 322 Views
Databases in the Grid. A New Data Source Oriented CE for GRID Taffoni Giuliano INAF - OATS. Overview. What is a G-DSE An overview of the GDSE Some practice. People: Edgardo Amborsi Giuliano Taffoni Andrea Barisani Claudio Vuerli Antonia Ghiselli. The Database crisis.
E N D
Databases in the Grid A New Data Source Oriented CE for GRID Taffoni Giuliano INAF - OATS
Overview • What is a G-DSE • An overview of the GDSE • Some practice • People: • Edgardo Amborsi • Giuliano Taffoni • Andrea Barisani • Claudio Vuerli • Antonia Ghiselli
The Database crisis • I have a DB and I want to USE it from my GRID. • I have a number of DBs and I want to USE all of them. • Move the execution to the data and not data to the code. • Fully compliant with gLite
Grid resource definition • The Grid limit: it is able to execute binary code or shell scripts and stores files; • DB in the Grid? Extension of the existing Resource Manager of Globus for providing transparent access to heterogeneous DS and DSE
Blueprint for a Query Element • The Grid Resource Framework Layer, Information System and Data Model is extended so that a software virtual machine as a Data Source Engine becomes a valid instance for a Grid computing model. • A new Grid component(G- DSE) that enables the access to a Data Source Engine and Data Source, totally integrated with the Grid Monitoring and Discovery System and Resource Broker is defined • A new Grid Element, the Query Element, can be built on top of the G-DSE component.
Blueprint for a Query Element • Modify the Job Management component to access new kind of resources • Integrate the Information system with the “description” of the new resource; • Use the Grid Security Infrastructure • No modification on the client and server side: if I can submit a job I can also submit a query! • No modification on the Brokering/Workflow systems: if I can direct the CE I can direct also a QE.
Extending the Grid capabilities • Provide a proper extension of the Grid to care a new resource • Security GSI: no need to extend but to use! • First theory (Grid ASM) then…application. “A Formal Framework for Defining Grid Systems” Zsolt N. Nemeth & Vaidy Sunderam 2nd IEEE/ACM (CCGRID'02)
Globus G-DSE integration GIS GRAM gatekeeper MDS GRIS ldif JobManger QueryManger Ldap query plug-in Scheduler p-in Grid Providers (snmp) Query DB specific driver Pbs/LFS JobProcess QueryProcess RDBMS RDBMS
GRAM services Globus4 Integration Local Db control GRAM Adapter query “plug-in” Delegation QueryProcess GridFTP RFT Remote SE GridFTP
G-DSE Grid formalization • New Grid component: • Integrated within the Grid Information System • May be integrated in the WMS • New Grid Element on top of the G-DSE component the Query Element
query The Query Element CE code QE
QE implementation • Runs on any linux/unix flavor: GT>=2.4.3 • Backbends: any DB vendor (MySQL, Oracle, PostgreSQL etc…) + flat files • Two protocols: GRAM or WS • API: C, C++, python, Java, perl • If it works with Globus it works with G-DSE ora GRAM GDSE psql SOAP file
QE Authorization • Access control using GSI and VOMS • The certificate + roles identify the user permissions on DB Super user: crate, modify, admin, grant and revoke users…. ANYTHING!!! Standard user: select+ insert Simple user: select
QE Authorization VOMS roles and groups mapping with db user: Attribute:/vo/dbuser/ROLE=astrouser/CAPABILITY=select
More than one statement QE language • UI/QE interactions trough a STANDARD LANGUAGE • RSL(SQL) > globus-job-run g.dse.host/dbmanager-ODBC -queue PSQL1 “select a,b from table;” -------------- | a | b | -------------- | Uno | 001 | | Due | 002 | | Tre | 003 | --------------
QE language Off line access > globus-job-submit g.dse.host/dbmanager-ODBC -queue PSQL1 “select a,b from table;” https://g.dse.host/20001/23297/113699980234 >globus-job-status https://g.dse.host/20001/23297/113699980234 DONE >globus-job-get-output https://g.dse.host/20001/23297/113699... -------------- | a | b | -------------- | Uno | 001 | | Due | 002 | | Tre | 003 | --------------
The Information System • QE publishes its presence to the GRID • Software computing machine load and memory space etc.. • We use MIB rdms information: • More than 250 parameters … we are not using all of them!!! • rdbmsSrvInfoFinishedTransactions 1.3.6.1.2.1.39.1.6.1.2 • rdbmsSrvInfoDiskReads 1.3.6.1.2.1.39.1.6.1.3 • rdbmsSrvInfoLogicalReads 1.3.6.1.2.1.39.1.6.1.4 • rdbmsSrvInfoDiskWrites 1.3.6.1.2.1.39.1.6.1.5 • Based on snmp or direct access.
gLite implementation • GRAM + site bdii + top BDII • Based on information provides • Static information • Dynamic information Dynamic providers Static providers snmpquery ODBCquery ldif snmp snmp snmp odbc odbc odbc ORACLE MYSQL POSTGRESQL
QE BDII > ldapsearch -LLL -x -H g.dse.host -b "mds-vo-name=site,o=grid” dn:GlueDSEUniqueID=g.dse.host:2119/dbmanager-ODBC, mds-vo-name=local,o=grid objectClass: GlueCETop objectClass: GlueCE objectClass: GlueDSE objectClass: GlueDSETop objectClass: GlueKey GlueDSEName: TESTDB GlueDSEStateStatus: Production GlueDSEInfoLRMSType: Postgresql GlueDSEInfoLRMSVersion:7.3
QE and the WMS • New job wrapper for dbmanager QueryManger gatekeeper query plug-in Query DB specific driver RB QueryProcess QueryWrapper RDBMS
An Example Type = "Job"; JobType = "Normal"; Executable = ”select A from table;"; StdOutput = "hostname.out"; StdError = "hostname.err"; OutputSandbox = {"hostname.err","hostname.out"}; Arguments = "-xml"; RetryCount = 1; $ glite-job-submit -r gdse.oats.inaf.it:2119/dbmanager-odbc-test1 sqltest.jdl Selected Virtual Organisation name (from proxy certificate extension): inaf Connecting to host arquimedes.rediris.es, port 7772 Logging to host arquimedes.rediris.es, port 9002 ================================ edg-job-submit Success =========== The job has been successfully submitted to the Network Server. Use edg-job-status command to check job current status. Your job - https://arquimedes.rediris.es:9000/75hD3nNHxbYRDAL3GmiIug The edg_jobId has been saved in the following file: /home/madrid01/jlvpjobid ========================================================================
Summing up • G-DSE supports Data Source (DS) and DSE indexing, monitoring, management and recovery through a rich set of Meta-Data bound to standard GIS. • DS have their core engine into G-DSE, that provides a framework for activity and task management. • A RSL/JDL Transaction/Query permits a number of tasks to be specified, together with their parameters, inputs, outputs and control flow. • The response to a request is generated by the GDSE within a JobQueryManager Session. The GDSE analyses incoming Task and conducts authentication and authorisation • The standard Grid WorkLoad Manager constructs an optimised execution graph. • GIS will monitor a DS’s and DSE’s status digest produced by its internal monitor. • The GDSE has been designed to support dynamic configuration, sessions, transactions, recovery and concurrency.
End of Presentation Thank you for your attention