Distributed Data Access and Resource Management in the D0 SAM System
Terekhov, Fermi National Accelerator Laboratory, for the SAM project: L. Carpenter, L. Lueking, C. Moore, J. Trumbo, S. Veseli, M. Vranicar, S. White, V. White
Plan of Attack
• The domain
  • D0 overview and applications
• SAM as a Data Grid
  • Metadata
  • File replication
  • Initial resource management
• SAM and generic Grid technologies
• Comprehensive resource management
D0: A Virtual Organization
• High Energy Physics (HEP) collider experiment, multi-institutional
• Collaboration of 500+ scientists, 72+ institutions, 18+ countries
• Physicists generate and analyze data
• Coordinated resource sharing (networks, MSS, etc.) for solving a common problem (physics analysis)
Applications and Data Intensity
• Real data taking from the detector
• Monte Carlo data simulation
• Reconstruction
• Analysis: the gist of experimental HEP
• Extremely I/O intensive
• Recurrent processing of datasets: caching is highly beneficial
Data Handling as the Core of D0 Meta-Computing
• HEP applications are data-intensive (see below)
• Computational economy is extremely data-centric because costs are driven by DH resources
• SAM is primarily, and historically, a DH system: a working Data Grid prototype
• Job control is being included in the Grid context (the D0-PPDG project)
SAM as a Data Grid
[Architecture diagram: high-level services (Replication Cost Estimation, Replica Selection, Data Replication, Comprehensive Resource Management) layered over core services (Resource Management, Metadata, Generic Grid Services) and Mass Storage Systems (external to SAM)]
Based on: A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets", to appear in Journal of Network and Computer Applications.
Standard Grid Metadata
• Application metadata
  • creation info and processing history
  • data types (tiers, streams, etc.; D0-specific)
  • files are self-describing
• Replica metadata (sketched below)
  • each file has zero or more locations
  • volume IDs and location details for RM: part of the interface with the Mass Storage System
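As a rough illustration of the replica metadata just listed, a catalog entry might be modeled as follows. This is a minimal sketch, not the actual SAM schema; all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ReplicaLocation:
    """One physical copy of a file: a station disk node or an MSS volume."""
    node: str                          # host or MSS name holding the copy
    path: str                          # directory (disk) or position on the volume (tape)
    volume_id: Optional[str] = None    # set only for MSS/tape copies

@dataclass
class FileEntry:
    """Replica metadata: a logical file with zero or more locations."""
    file_name: str
    size_bytes: int
    data_tier: str                     # e.g. raw or reconstructed (D0-specific)
    locations: List[ReplicaLocation] = field(default_factory=list)

    def is_cached_at(self, node: str) -> bool:
        return any(loc.node == node for loc in self.locations)
```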
Standard Grid Metadata, cont'd
• System configuration metadata (illustrated below)
  • HW configuration: locations and capacities of disks and tapes (network and disk bandwidths)
  • resource ownership and allocation:
    • partition of disk, MSS bandwidth, etc. by group
    • fair-share parameters for resource allocation and job scheduling (FSAS)
    • cost criteria (weight factors) for FSAS
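A minimal sketch of what such system-configuration metadata could look like; the station name, group names, capacities, shares, and weight factors are hypothetical values chosen purely for illustration.

```python
# Hypothetical system-configuration metadata for one station:
# per-group disk partitions, fair shares, and FSAS cost weight factors.
STATION_CONFIG = {
    "station": "central-analysis",      # hypothetical station name
    "cache_partitions_gb": {            # disk partition per research group
        "higgs": 500,
        "top": 300,
    },
    "fair_shares": {                    # fraction of "work" each group is owed
        "higgs": 0.6,
        "top": 0.4,
    },
    "cost_weights": {                   # weight factors for the FSAS cost metric
        "tape_mounts": 5.0,
        "gb_transferred": 1.0,
        "cpu_hours": 0.5,
    },
}
```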
Advanced Metadata
• Dataset management (to the great advantage of the user)
• Job history (crash recovery mechanisms)
• File replica access history (used by RM)
• Resource utilization history (persistency in RM and accountability)
• See our complete data model for more details
Data Replica Management
• A Processing Station is a (locally distributed, semi-autonomous) collection of HW resources (disk, CPU, etc.), and the SW component that manages them
• Local data replication: for parallel processing in a single batch system, within a Station
• Global data replication: worldwide data exchange among Stations and MSSs
Local Data Replication
• Consider a cluster with a physically distributed disk cache
• Logical partitioning by research groups
• Each group executes an independent cache-replacement algorithm (FIFO, LRU, many flavors), as sketched below
• The replica catalog is updated in the course of cache replacement
• Access history of each local replica is maintained persistently in the metadata
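A minimal sketch of per-group cache replacement under an LRU policy, with the catalog update folded in. The class, method names, and the replica-catalog interface are assumptions made for illustration; they do not reproduce the SAM implementation.

```python
from collections import OrderedDict

class GroupCache:
    """LRU cache for one research group's disk partition (hypothetical sketch)."""

    def __init__(self, capacity_bytes, catalog):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()      # file_name -> size, least recently used first
        self.catalog = catalog          # assumed replica-catalog interface

    def access(self, file_name, size):
        """Record an access; add the file, evicting LRU replicas if needed."""
        if file_name in self.files:
            self.files.move_to_end(file_name)        # refresh recency
            return
        # Evict least-recently-used replicas until the new file fits.
        while self.used + size > self.capacity and self.files:
            victim, victim_size = self.files.popitem(last=False)
            self.used -= victim_size
            self.catalog.remove_location(victim)     # keep the catalog consistent
        self.files[file_name] = size
        self.used += size
        self.catalog.add_location(file_name)
```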
Local Data Replication, cont'd
• While the Resource Managers strive to keep jobs and their data in proximity (see below), the Batch System does not always dispatch jobs to where the data lies
• The Station then performs intra-cluster data replication on demand, fully transparently to the user (see the sketch below)
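A sketch of that on-demand step, assuming the hypothetical FileEntry record from the earlier sketch and a caller-supplied copy primitive; the actual SAM transfer mechanics are not shown.

```python
def ensure_local_replica(file_entry, worker_node, copy_file):
    """Replicate a file onto the job's node if it is not already cached there.

    file_entry  : FileEntry-like catalog record (see earlier sketch)
    worker_node : node where the batch system dispatched the job
    copy_file   : callable(src_node, dst_node, file_name) performing the transfer
    """
    if file_entry.is_cached_at(worker_node):
        return False                            # data already in proximity
    if not file_entry.locations:
        raise LookupError(f"{file_entry.file_name} has no cached replica")
    source = file_entry.locations[0].node       # any peer node holding a copy
    copy_file(source, worker_node, file_entry.file_name)
    # In the real system the replica catalog and access history would be
    # updated here to record the new local copy.
    return True
```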
Routing + Caching = Global Replication
[Diagram: data flow over the WAN between a user (producer), Stations holding site replicas, and the Mass Storage System]
Principles of Resource Management
• Implement experiment policies on prioritization and fair sharing in resource usage, by user category (access mode, research group, etc.)
• Maximize throughput in terms of real work done (i.e., user jobs, not internal system jobs such as data transfers)
Fair Sharing
• Governs allocation of resources and scheduling of jobs
• The goal is to ensure that, in a busy environment, each abstract user gets a fixed share of "resources" or gets a fixed share of "work" done
FS and Computational Economy
• Jobs, when executed, incur costs (through resource utilization) and realize benefits (through work done)
• Maintain a tuple (vector) of cumulative costs/benefits for each abstract user and compare it to the user's allocated fair share to raise or lower priority (sketched below)
• Incorporates all known resource types and benefit metrics; fully flexible
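A minimal sketch of the priority idea under stated assumptions: the cumulative cost vector is collapsed to one number with the weight factors from the hypothetical STATION_CONFIG above, and a user who has consumed more than the allocated fair share is deprioritized. This illustrates the principle only; it is not the SAM scheduling formula.

```python
def weighted_cost(usage, cost_weights):
    """Collapse a per-user cost vector into one number via weight factors."""
    return sum(cost_weights[k] * v for k, v in usage.items())

def priority(user, usage_by_user, fair_shares, cost_weights):
    """Higher value = schedule sooner.

    A user who has consumed less than their fair share of the total weighted
    cost gets a positive priority; an over-consumer gets a negative one.
    """
    total = sum(weighted_cost(u, cost_weights) for u in usage_by_user.values())
    if total == 0:
        return 0.0
    consumed_fraction = weighted_cost(usage_by_user[user], cost_weights) / total
    return fair_shares[user] - consumed_fraction

# Example with hypothetical numbers:
usage = {
    "higgs": {"tape_mounts": 10, "gb_transferred": 200, "cpu_hours": 50},
    "top":   {"tape_mounts": 2,  "gb_transferred": 40,  "cpu_hours": 10},
}
weights = {"tape_mounts": 5.0, "gb_transferred": 1.0, "cpu_hours": 0.5}
shares = {"higgs": 0.6, "top": 0.4}
print(priority("top", usage, shares, weights))   # positive: under its fair share
```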
The Hierarchy of Resource Managers
• Global RM: sites connected by the WAN; experiment policies, fair-share allocations, cost metrics
• Site RM: Stations and MSSs connected by LANs
• Local RM (Station): batch queues and disks
Job Control: Station Integration with the Abstract Batch System
[Diagram: the Client issues "sam submit"; components include the Job Manager (Project Master), the Local RM (Station Master), the Process Manager (SAM wrapper script), the Batch System, and the User Task; messages include submit, invoke, dispatch, resubmit, setJobCount/stop, jobEnd, and "SAM condition satisfied"; see the sketch below]
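To make the division of labor concrete, here is a schematic sketch of how a wrapper script might report back to a local resource manager that throttles the batch system. The message flow, class names, and the job-count cap are assumptions drawn only from the labels in the figure, not from the SAM code.

```python
class BatchSystemStub:
    """Stand-in for the abstract batch system interface (hypothetical)."""
    def set_job_count(self, project, n):
        print(f"{project}: allowed {n} concurrent batch jobs")

class StationMaster:
    """Local RM: decides how many batch jobs a project may run (sketch)."""
    def __init__(self, batch_system):
        self.batch = batch_system

    def submit(self, project, requested_jobs):
        # A real Station Master would apply fair-share policy here before
        # telling the batch system how many job slots the project may use.
        allowed = min(requested_jobs, 10)          # hypothetical cap
        self.batch.set_job_count(project, allowed)

    def job_end(self, project, job_id):
        # Invoked when a wrapped user task finishes, so the Station Master
        # can account for the completed work and free the slot.
        print(f"{project}: job {job_id} ended")

class ProcessManager:
    """SAM wrapper script around one user task (sketch)."""
    def __init__(self, station_master, project, job_id):
        self.rm, self.project, self.job_id = station_master, project, job_id

    def run(self, user_task):
        try:
            user_task()                            # the actual physics job
        finally:
            self.rm.job_end(self.project, self.job_id)

# Hypothetical end-to-end use:
rm = StationMaster(BatchSystemStub())
rm.submit("analysis-project", requested_jobs=25)
ProcessManager(rm, "analysis-project", job_id=1).run(lambda: None)
```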
SAM as a Data Grid (annotated)
[Architecture diagram as before, annotated with the SAM mechanisms behind each service:
• Replication Cost Estimation: cached data, file transfer queues, Site RM "weather" conditions
• Replica Selection: preferred locations
• Data Replication: caching, forwarding, pinning
• Comprehensive Resource Management: DH-batch system integration, fair-share allocation, MSS access control, network access control
• Metadata: replica catalog, system configuration, cost/benefit metrics
• Mass Storage Systems, Batch System internal RM, MSS internal RM: external to SAM]
SAM Grid Work (D0-PPDG)
• Enhance the system by adding Grid services (Grid authentication, replica selection, etc.)
• Adapt the system to generic Grid services
• Replace proprietary tools and internal protocols with those standard to the Grid
• Collaborate with computer scientists to develop new Grid technologies, using SAM as a testbed for testing/validating them
Initial PPDG Work: Condor/SAM Job Scheduling, Preliminary Architecture
[Diagram: Job Management (Condor MMS, Condor, Condor-G) and SAM Data Management (data and DH resources, "sam submit", the SAM abstract batch system) connected through Condor/SAM-Grid and SAM/Condor-Grid adapters over standard Grid protocols; Condor queries SAM for the costs of job placements and schedules jobs accordingly]
Conclusions
• D0 SAM is not only a production meta-computing system but also a functioning Data Grid prototype, with data replication and resource management at an advanced/mature stage
• Work continues to fully Grid-enable the system
• Some of our components/services will hopefully be of interest to the Grid community