This paper discusses the design and implementation of SAM, the fully distributed data access system for the D0 experiment at FNAL. SAM manages access to 500 TB/year of data, serving 550 scientists in 65 institutions. It uses a multi-tiered architecture with distributed caching and global file routing, and aims for reliability, scalability, and flexibility in support of data-intensive applications. SAM is a real system serving over 100 registered users and enabling efficient data movement and analysis. The paper highlights SAM's key features, challenges, and future directions.
SAM for D0 - a Fully Distributed Data Access System
I. Terekhov, FNAL
For the SAM Project: L. Lueking, V. White, L. Carpenter, H. Schellman, I. Terekhov, J. Trumbo, M. Vranicar, S. Veseli, S. White
Introduction
• SAM: Sequential Access Model
• Data access for the D0 Run II experiment at FNAL
• 500 TB/year, about 1 PB in total
• Raw detector data at 250 KB/event in 1 GB files, plus processed data
• 550 scientists in 65 institutions, and growing
• Data (I/O) intensive applications
The Distributed Nature of SAM
• All the data access entities (files, events, …, resource usage rules) are described as metadata in a relational database.
• The metadata is served via CORBA IDL interfaces:
  • the Database Server is implemented in Python
  • universally defined structures and exceptions
  • possibility of alternate implementations (online system, remote installations)
• Multi-tiered architecture:
  • a hierarchical collection of servers exposing IDL interfaces
  • pure clients at the user end (a minimal client-side sketch follows below)
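The following is a hypothetical Python sketch of the pure-client idea: the client holds no state of its own and obtains IDL-style structures and exceptions from a metadata server. The names (FileMetadata, DbServerStub, get_file_metadata) are invented for illustration and are not the actual SAM IDL; a real client would talk to the Database Server through a CORBA object reference rather than an in-process stub.

```python
# Hypothetical sketch, not the actual SAM interfaces: a pure client asking a
# metadata "Database Server" for file attributes through an IDL-like API.
from dataclasses import dataclass

@dataclass
class FileMetadata:            # stands in for an IDL-defined structure
    name: str
    size_bytes: int
    event_count: int
    locations: list

class MetadataNotFound(Exception):   # stands in for an IDL-defined exception
    pass

class DbServerStub:
    """Plays the role of the Database Server; here it is just a local dict."""
    def __init__(self):
        self._files = {}

    def declare_file(self, meta: FileMetadata):
        self._files[meta.name] = meta

    def get_file_metadata(self, name: str) -> FileMetadata:
        try:
            return self._files[name]
        except KeyError:
            raise MetadataNotFound(name)

# Pure-client usage: no local state, everything comes from the server.
server = DbServerStub()
server.declare_file(FileMetadata("raw_run42.dat", 10**9, 4000, ["fnal-enstore"]))
print(server.get_file_metadata("raw_run42.dat").locations)
```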
Distributed Caching
• The user application always reads from and writes to local disk; SAM takes care of the rest.
• A user pushes a file into (or pulls it from) SAM and does not know how or when the system relocates the file.
• Disk is allocated en route to and from the mass storage system (MSS).
• Every transfer requires authorization from the resource manager: network contention, MSS bandwidth, etc. (a minimal authorization sketch follows below).
• SAM cache managers (rather than physical machines) form a network.
• Global file routing and replication.
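As a rough illustration of the "authorize before transfer" step, here is a hedged Python sketch in which a station's cache manager asks a resource manager for disk space and a transfer slot before staging a file. The names, policies, and numbers are assumptions for illustration, not SAM's actual interfaces.

```python
# Illustrative only: a cache manager that must get authorization from a
# resource manager before it moves a file into the local cache.
class ResourceManager:
    def __init__(self, disk_bytes, transfer_slots):
        self.free_disk = disk_bytes
        self.transfer_slots = transfer_slots

    def authorize(self, size_bytes):
        """Grant the transfer only if disk and a transfer slot are available."""
        if size_bytes <= self.free_disk and self.transfer_slots > 0:
            self.free_disk -= size_bytes
            self.transfer_slots -= 1
            return True
        return False

    def release_slot(self):
        self.transfer_slots += 1

class CacheManager:
    def __init__(self, resource_manager):
        self.rm = resource_manager
        self.cache = set()

    def fetch(self, filename, size_bytes):
        if filename in self.cache:
            return "already cached"
        if not self.rm.authorize(size_bytes):
            return "queued: no resources"      # retried later in a real system
        # ... copy the file from the MSS or a peer station here ...
        self.cache.add(filename)
        self.rm.release_slot()
        return "cached"

station = CacheManager(ResourceManager(disk_bytes=30 * 10**12, transfer_slots=4))
print(station.fetch("raw_run42.dat", 10**9))   # -> cached
```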
[Diagram: global SAM station topology. Fermilab SAM station (6-30 TB disk plus tape store), NIKHEF SAM station (~300 GB disk), Lyon computer-center SAM station (~5 TB disk), central analysis servers, analysis tapes, Monte Carlo production, and analysis desktops.]
Distributed Caching: File Retrieval in SAM
• A Station is a collection of resources (CPU, disk, network connections), possibly a cluster.
• The Station Master (SM) is a distributed cache manager; the SMs form a global network.
• The SM also runs Projects: the activity of processing a dataset, related to, but not the same as, a user job in the batch system.
• Projects coordinate multiple consumers, each having multiple processes (threads of execution), over the local network or farm (see the sketch below).
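A minimal sketch of the Project idea, assuming nothing about SAM's real interfaces: a Project hands out the next file of a dataset to whichever consumer process asks, so several consumers share one dataset. All names here are illustrative.

```python
# Illustrative only: a Project serving files of a dataset to many consumers.
from collections import deque

class Project:
    def __init__(self, dataset_files):
        self._todo = deque(dataset_files)
        self._assigned = {}                  # file -> consumer id

    def get_next_file(self, consumer_id):
        """Each consumer process calls this until the dataset is exhausted."""
        if not self._todo:
            return None
        filename = self._todo.popleft()
        self._assigned[filename] = consumer_id
        return filename

project = Project(["f1.raw", "f2.raw", "f3.raw"])
for consumer in ("worker-0", "worker-1", "worker-0"):
    print(consumer, "->", project.get_next_file(consumer))
```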
Distributed Caching: File Storage in SAM
• A File Storage Server (FSS) is the part of the station responsible for importing data into SAM (online, Monte Carlo, processed data).
• The FSS accepts a user request and finds a route to the final destination; the FSSs form a global network (a minimal routing sketch follows below).
• Intermediate locations are in general used because:
  • the final destination is not directly accessible, or
  • it is desirable to keep a copy on nearby disk for subsequent retrieval.
• Each interim location is part of the cache (allocation is subject to local use policy, etc.)!
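The routing decision can be pictured as a path search over the station network. The sketch below is purely illustrative: the station names, link table, and breadth-first search are assumptions, not SAM's actual routing algorithm.

```python
# Illustrative only: find a chain of stations from a source to a destination
# when the destination is not directly reachable.
from collections import deque

LINKS = {                                 # which stations can transfer to which
    "desktop-lyon": ["lyon-station"],
    "lyon-station": ["fnal-central"],
    "fnal-central": ["fnal-enstore"],     # the mass storage system
}

def find_route(src, dst):
    """Breadth-first search for a path of stations from src to dst."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in LINKS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(find_route("desktop-lyon", "fnal-enstore"))
# ['desktop-lyon', 'lyon-station', 'fnal-central', 'fnal-enstore']
```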
Example of Global Data Movement
• IN2P3 collaborators in France import a file into the SAM MSS.
• The file was produced on a desktop PC.
• They also want to keep a copy locally at Lyon for analysis.
• They also want a copy at D0's central analysis station on the FNAL site.
• Two to three transfers are involved; robustness requires retrying failed steps (see the sketch below).
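A hedged sketch of the retry behaviour mentioned above: each hop of a multi-step transfer is attempted a few times with a back-off before the import is declared failed. The transfer() function is a stand-in for the real copy mechanism, and all names are illustrative.

```python
# Illustrative only: retry each hop of a multi-step file movement.
import random
import time

random.seed(0)                     # deterministic for this example

def transfer(src, dst):
    """Fake copy step that sometimes fails, to exercise the retry loop."""
    return random.random() > 0.3

def robust_route(path, max_attempts=3, backoff_seconds=1):
    for src, dst in zip(path, path[1:]):
        for attempt in range(1, max_attempts + 1):
            if transfer(src, dst):
                break
            time.sleep(backoff_seconds * attempt)   # simple linear back-off
        else:
            raise RuntimeError(f"giving up on {src} -> {dst}")
    return "file delivered"

print(robust_route(["desktop-lyon", "lyon-station", "fnal-central", "fnal-enstore"]))
```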
SAM is a Real System
• Over 100 registered D0 users.
• The online system stores calibration data.
• Terabytes of Monte Carlo data have been imported, and imports continue.
• Data has been reconstructed several times on the farms (with different code versions).
• Data is being analyzed; the cache contains hundreds of GB of files.
• By the time of detector commissioning, Spring 2001, (nearly) all the components must be functional.
Summary
• SAM is being developed as a fully distributed system for:
  • scalability
  • robustness
  • flexibility: the implementation order is driven by D0 priorities, but the design is general, Grid-compatible, and applicable beyond D0.
• Other actively worked issues, not covered here:
  • support for other MSSs (at remote institutions)
  • uniform error reporting in a distributed, heterogeneous system.