A. Lyon, Fermilab (for the SAMGrid Team) SAMGrid Database Server
Outline • Introduction • Issues Addressed with Redesign • Redesign Goals • New DB Server Design/Features • Outstanding Issues • Integration with SBIR II • Concluding Remarks
Introduction: The SAMGrid System • SAMGrid: general data-handling system designed for experiments with petabyte-sized datasets and widely distributed production/analysis facilities • Offers a wide variety of services, including those for: • data transfer, storage and management • process bookkeeping on distributed systems • Used by D0 and CDF, being tested for use by MINOS and CMS
Introduction: DB Server Role/Usage • SAMGrid uses a central Oracle RDBMS • Most of the communication with the DB is handled by the CORBA-based DB Server • Services provided: • Cataloguing services (file metadata, event catalog, replica catalog) • Dataset services • Process accounting • Runtime support for the SAMGrid station services • Usage: About 250 million DB queries over the most recent three-month period
Issues Addressed with Redesign • Large code base: more than 27,000 lines of Python code and 350 CORBA IDL methods implemented, of which more than 60% are obsolete • Single-threaded code => performance issues • Removing/modifying old code is very difficult => maintenance problems, hard to adapt to DB schema changes (the latest change, which resulted from CDF's adoption of the SAMGrid system, was very complex)
Redesign Goals • Update treatment of file metadata, align it with the latest DB schema • Improve code maintainability • Easier new development • Improve server performance
New DB Server Design/Features • DB Server Generator • taken from the old infrastructure • handles automatic generation of the core (DB-derived) classes, each of which corresponds to one table in the DB • CORBA wrapper classes: a layer of code on top of the ORB-generated structs and exceptions, with the purpose of shielding developers from having to manipulate those structs/exceptions directly • promote code maintainability/re-use (e.g., the SAMGrid Python API uses the same code as the DB Server) • easier development
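The wrapper-class idea on this slide can be illustrated with a minimal Python sketch. It is not the SAMGrid code: `FileInfoStruct`, `FileNotFoundStruct`, `FileInfo`, `SamFileNotFound`, and `unwrap_call` are hypothetical names, and the "ORB-generated" types are plain stand-ins so the example runs without a CORBA ORB.

```python
from dataclasses import dataclass


# Stand-in for an ORB-generated struct (hypothetical; real ones
# would be produced by the IDL compiler).
class FileInfoStruct:
    def __init__(self, fileName, fileSize):
        self.fileName = fileName
        self.fileSize = fileSize


# Stand-in for an ORB-generated exception (hypothetical).
class FileNotFoundStruct(Exception):
    def __init__(self, fileName):
        super().__init__(fileName)
        self.fileName = fileName


class SamFileNotFound(Exception):
    """Plain Python exception shown to developers instead of the ORB one."""


@dataclass
class FileInfo:
    """Wrapper that hides the struct layout from calling code."""
    name: str
    size: int

    @classmethod
    def from_struct(cls, struct):
        # Translate struct field names into the wrapper's attributes.
        return cls(name=struct.fileName, size=struct.fileSize)

    def to_struct(self):
        # Build the struct again when the value has to cross the wire.
        return FileInfoStruct(self.name, self.size)


def unwrap_call(remote_call, *args):
    """Invoke a remote method and translate its struct and exception types."""
    try:
        return FileInfo.from_struct(remote_call(*args))
    except FileNotFoundStruct as exc:
        raise SamFileNotFound(f"no such file: {exc.fileName}") from exc
```

With a layer like this, both a server and a client API could share `FileInfo` and never touch the generated types directly, which is the re-use benefit the slide describes.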
New DB Server Design/Features • CORBA interfaces • redesigned and reorganized so that they closely match services which the DB server provides • File metadata • described as dictionaries • each file type has a certain set of required parameters • flexible/configurable system • Multithreading • should minimize performance problems
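As a rough illustration of dictionary-based file metadata with per-type required parameters, the following Python sketch checks a metadata dictionary against a configurable table. The file types and parameter names (`rawData`, `simulated`, `runNumber`, and so on) are invented for the example and are not taken from the SAMGrid schema.

```python
# Required parameters per file type. Both the type names and the
# parameter names are hypothetical; a real table would be driven
# by the DB schema or a configuration file.
REQUIRED_PARAMS = {
    "rawData": {"fileName", "runNumber", "fileSize"},
    "simulated": {"fileName", "generator", "eventCount"},
}


def missing_params(metadata):
    """Return the sorted list of required parameters absent from metadata."""
    file_type = metadata.get("fileType")
    if file_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown file type: {file_type!r}")
    return sorted(REQUIRED_PARAMS[file_type] - set(metadata))


def validate(metadata):
    """Raise ValueError if any required parameter is missing."""
    missing = missing_params(metadata)
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    return metadata
```

Adding a new file type in this sketch means adding one entry to `REQUIRED_PARAMS`, which is one way a "flexible/configurable" design could look.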
Outstanding Issues • Impact of the new CORBA infrastructure on server performance (an issue for large lists) • Not all functionality of the existing code has been transferred to the new server yet
Deployment Path • Major changes in a core software component => deployment into production is not easy • Upgrade will be incremental, so that its impact on both users and the DH system itself should be minimal • Plan for deployment in three phases • Upgrade both experiments' DBs to the latest schema (completed in June ’04, required patching of the old code) • Deploy the new DB server in parallel with the old one, install new clients, start testing (ongoing now) • Start gradually upgrading the main production stations
Integration with SBIR II • SBIR II strives to provide access to distributed databases with a single query • This would remove the SAMGrid dependence on the centralized DB • We are working on interfaces which will allow us to plug different query mechanisms into our code
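One way to picture "plugging different query mechanisms into our code" is an abstract backend interface with interchangeable implementations. The Python sketch below is an assumption-laden illustration, not the SAMGrid or SBIR II interface: `QueryBackend`, `CentralDbBackend`, `DistributedBackend`, and `list_dataset` are hypothetical names, and the backends use in-memory data in place of real databases.

```python
from abc import ABC, abstractmethod


class QueryBackend(ABC):
    """Hypothetical interface that every query mechanism implements."""

    @abstractmethod
    def find_files(self, dataset):
        """Return a sorted list of file names in the dataset."""


class CentralDbBackend(QueryBackend):
    """Stand-in for the central DB, backed by an in-memory dict."""

    def __init__(self, rows):
        self._rows = rows

    def find_files(self, dataset):
        return sorted(self._rows.get(dataset, []))


class DistributedBackend(QueryBackend):
    """Fans a single query out to several backends and merges the results."""

    def __init__(self, backends):
        self._backends = backends

    def find_files(self, dataset):
        found = set()
        for backend in self._backends:
            found.update(backend.find_files(dataset))
        return sorted(found)


def list_dataset(backend, dataset):
    # Calling code depends only on the interface, not on where the data lives.
    return backend.find_files(dataset)
```

Because `list_dataset` only sees `QueryBackend`, swapping the centralized backend for a distributed one would not require changes in the calling code.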
Concluding Remarks • SAMGrid DB Server, one of the most critical components of the system, was completely re-designed • New architecture promotes code maintainability, easier development, and better performance • New treatment of the file metadata: flexible and configurable • Deployment into production and necessary system upgrades will be done incrementally to minimize impact on users/DH system