200 likes | 338 Views
NOVA N etworked O bject-based En V ironment for A nalysis. P. Nevski, A. Vaniachine, T. Wenaus
E N D
NOVANetworked Object-based EnVironment for Analysis P. Nevski, A. Vaniachine, T. Wenaus NOVA is a project to develop distributed object oriented physics analysis components, adaptable to different experiments. A job configuration manager uses a scripting interface to provide web-based editing, submission and cataloguing of analysis jobs, both user-level and experiment-wide, centrally managed in a database. A client/server system distributed over compute nodes provides job submission and monitoring across facilities, which may span several sites. A file catalog records production relationship of data files generated by an experiment. NOVA provides database tools for geometry and parameter object storage. A NOVA web-based browser navigates a relational database storing hierarchically structured dataObjects. Clients may access database information from the code or through a CORBA-specified interface. NOVA components have been tested and deployed in the STAR and ATLAS environments. February 7, 2000
Outline • Goals • Requirements • Architecture • Components • Details • Summary CHEP in Padova
Motivations • Unprecedented data volume and software complexity in new large High Energy and Nuclear Physics experiments at RHIC (BNL) and LHC (CERN) • New approaches to analysis and data handling software • Distributed computing environment (DCE) is vital and increasingly powerful • Experience in developing DCE solutions for STAR • Build on experience to develop DCE tools for use in similarly challenging environments CHEP in Padova
Goals • Develop software tools for • coordination and control of widely distributed analysis development and physics analysis activity • distributed management and analysis of very large datasets • enhanced robustness, reusability and maintainability of analysis software • For application in many global computing environments (ATLAS, STAR, …) • generic tools not tied to specific implementation choices • select, templatable implementations provided such that NOVA components can be used in a baseline framework CHEP in Padova
Requirements • Support wide area data intensive analysis • Define middleware services are required to permit analysis applications to effectively run over wide area networks • Provide a rich set of features that applications can select and use to obtain the level of service they need to operate • Define the features and the API's necessary to allow the application and middleware to communicate • Integrate the middleware API's with the applications CHEP in Padova
Design Approach • Small, modular components; application-neutral interfaces • Can be used as a coherent framework or in isolation to extend existing analysis systems • Focused on support for C++ based analysis • Used for all RHIC, LHC, other large experiments • Emphasis on user participation in iterative development; real-world prototyping and testing (STAR, ATLAS) • Extensive use of existing tools and technologies • Must be readily available, true or de facto standards, well supported, widely used or showing good growth CHEP in Padova
Component-based Architecture CHEP in Padova
Tools and Technologies • Third party tools and technologies used in NOVA: • MySQL: relational database for catalogs, state information and simple objects: C-structs • Perl: Unix scripting and web development tool • Apache: customizable (Perl & PHP) web server for communication and monitoring • CORBA: low-volume interprocess data exchange • ROOT: visualization and analysis tools CHEP in Padova
Components NOVA components fall into four domains • Regional Center • Central management and execution of analysis • Remote Client • Mobile Analysis • Middleware Components • Data exchange and navigation tools • Client/Server object request brokerage • Data Management • Data repository, catalogue, and interface • Data model for simple objects (C-structs) CHEP in Padova
Dynamic Binding • Problem: • A user has a new idea that was not foreseen at the beginning. User modifies the structure of one object in his application. Application stores new objects in the database. • Remote applications unaware of a new functionality may request objects in old format. • Solution: • Application: provides metadata request (name, time, selectors...) and the application dataObject dictionary • Database server: provides dataObject and the dictionary • Object Request Broker module: converts dataObject according to the application dictionary CHEP in Padova
Application DataObject Application Dictionary Database DataObject Database Dictionary Dynamic Object Broker Object Request Broker Middleware Services Remote Application Clients Parameters Repository Central Database Server CHEP in Padova
Forward Compatibility • Benefits: • Separation of database and analysis applications • Robust interface (via built-in type checking) • Dictionary built from C-header files or IDL-files • Database access is independent of application code version: user can read new dataObjects with an old executable • Usage: • Parameters data management (versioned geometry and reconstruction constants support) CHEP in Padova
Static Binding • Problem: • Remote application (web browser) navigates current database hierarchy. • Solution: • Object Request Broker at the Regional Center serves dynamic HTML dataObjects in format tailored according to application ID: Netscape or MS Internet Explorer CHEP in Padova
Static Object Broker NOVA Browser Application DataObject Apache Web Server Application ID Database API Module Database DataObject Middleware Services Database API Call Parameters Repository Regional Center Database Server Remote Application Client CHEP in Padova
Layered Interface CHEP in Padova
Data Model CHEP in Padova
Job configuration manager Cataloguing Analysis Workflow Job monitoring system fileCatalog CHEP in Padova
GCA Interface gcaClient FileCatalog IndexFeeder Grand Challenge Interface GC System STAR Components Query Estimator StIOMaker Query Monitor database fileCatalog Cache Manager tagDB Index Builder CHEP in Padova
& GCA-dependent • gcaClient interface • Experiment sends queries and get back filenames through the gcaClient library calls Limiting Dependencies Experiment-specific • IndexFeeder server • IndexFeeder read the “tag database” so that GCA “index builder” can create index • FileCatalog server • FileCatalog queries the “file catalog” database of the experiment to translate fileID to HPSS & disk path CHEP in Padova
Summary What is NOVA? • Framework components for distributed computing What are NOVA components? • Configuration manager for analysis jobs • Distributed job submission and monitoring system • Analysis workflow catalog • Database for versioned dataObjects • Brokered extraction of dataObjects • Web-based database navigation tool CHEP in Padova