
SAM for D0 - a Fully Distributed Data Access System

This presentation describes the design and implementation of SAM (Sequential Access Model), a fully distributed data access system for the D0 experiment at FNAL. SAM manages access to 500 Tbytes/year of data for 550 scientists in 65 institutions. It uses a multi-tiered architecture with distributed caching and global file routing, and is designed for scalability, robustness, and flexibility in support of data-intensive applications. SAM is already in use, with over 100 registered D0 users. The presentation covers SAM's key features and the issues still being worked on.



Presentation Transcript


  1. SAM for D0 - a Fully Distributed Data Access System. I. Terekhov, FNAL. For the SAM Project: L. Lueking, V. White, L. Carpenter, H. Schellman, I. Terekhov, J. Trumbo, M. Vranicar, S. Veseli, S. White

  2. Introduction • SAM: Sequential Access Model • The data access system for the D0 Run II experiment at FNAL • 500 Tbytes/year, 1 PB in total • Raw detector data: 250 KB/event, 1 GB files, plus processed data • 550 scientists in 65 institutions, ++ • Data (I/O) intensive applications

  3. The Distributed Nature of SAM • All the data access entities (files, events, … resource usage rules) are described in a relational database: the metadata. • The metadata is served via CORBA IDL interfaces: • The Database Server is written in Python • Universally defined structures and exceptions • Possibility of alternate implementations (online system, remote installations) • Multi-tiered architecture • Hierarchical collection of servers, communicating via IDL • Pure clients at the user end

  4. Distributed Caching • The user application always reads from and writes to local disk; SAM takes care of the rest. • A user pushes a file into (or pulls one from) SAM and does not know how or when the system relocates the file. • Disk is allocated en route to/from the MSS. • Every transfer requires authorization from the resource manager: network contention, MSS bandwidth, etc. • SAM cache managers (rather than physical machines) form a network. • Global file routing/replicating
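The caching behaviour on slide 4 can be illustrated with a minimal sketch. It is not SAM code: `ResourceManager`, `CacheManager`, the slot-counting policy, and the dictionary that stands in for the MSS are all hypothetical names chosen for illustration. The only ideas taken from the slide are that the application sees a local copy and that every transfer first needs authorization.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical stand-in: grants a transfer only while slots remain."""
    max_transfers: int
    active: int = 0

    def authorize(self) -> bool:
        if self.active >= self.max_transfers:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        self.active -= 1


@dataclass
class CacheManager:
    """Hypothetical cache manager: serves local hits, fetches misses."""
    resources: ResourceManager
    remote: dict  # stands in for the MSS or another station
    local: dict = field(default_factory=dict)

    def get(self, name: str):
        if name in self.local:  # cache hit: no transfer needed
            return self.local[name]
        if not self.resources.authorize():
            return None  # denied; the caller may ask again later
        try:
            self.local[name] = self.remote[name]  # the "transfer"
        finally:
            self.resources.release()
        return self.local[name]
```

A real resource manager would weigh network contention and MSS bandwidth, as the slide notes; the fixed slot count here is only a placeholder for such a policy.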

  5. [Figure: SAM stations. Fermilab SAM Station (6-30 Tbytes disk + tape store); Nikhef SAM Station (~300 Gbytes disk); Lyon (Computer Center) SAM Station (~5 Tbytes disk). Other labels in the figure: Analysis Tapes, Central Analysis Servers, Monte Carlo, Data Analysis, Desktop.]

  6. Distributed Caching: file retrieval in SAM • A Station is a collection of resources (CPU, disk, network connections), possibly a cluster. • The Station Master (SM) is a distributed cache manager. SMs form a global network. • The SM also runs Projects (a Project is the activity of processing a dataset; it is related to, but not the same as, a user job in the batch system). • Projects coordinate multiple consumers, each having multiple processes (threads of execution). Local network, Farm.
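One way to picture a Project coordinating several consumers is the sketch below. The `Project` class and its `next_file` method are hypothetical, and the policy that each file of the dataset goes to exactly one consumer is an assumption made for illustration; the slide does not state how files are shared out.

```python
from collections import deque


class Project:
    """Hypothetical project: hands each file of a dataset to one consumer."""

    def __init__(self, dataset):
        self.pending = deque(dataset)
        self.delivered = {}  # file name -> id of the consumer that got it

    def next_file(self, consumer_id):
        """Return the next undelivered file, or None when the dataset is done."""
        if not self.pending:
            return None
        name = self.pending.popleft()
        self.delivered[name] = consumer_id
        return name
```

Each consumer would call `next_file` in a loop until it returns `None`, so the dataset is processed once regardless of how many consumers take part.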

  7. Distributed Caching: file storage in SAM • A File Storage Server (FSS) is the part of the station responsible for importing data into SAM (online, MC, processed). • The FSS accepts a user request and finds a route to the final destination. Global FSS network. • Intermediate locations are in general used because: • The “final” destination is not directly accessible • It is desirable to keep the file on a “nearby” disk for subsequent retrieval • Each interim location is part of the cache (allocation is subject to local use policy, etc.)!
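Finding a route through intermediate locations can be sketched as a graph search. The slide does not say which algorithm SAM uses; breadth-first search is swapped in here as the simplest choice, and the function name `find_route`, the location names, and the link table are all hypothetical.

```python
from collections import deque


def find_route(links, source, destination):
    """Return the shortest hop list from source to destination, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for nxt in links.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical topology: the desktop cannot reach the tape store directly.
LINKS = {
    "desktop": ["lyon_station"],
    "lyon_station": ["fnal_station"],
    "fnal_station": ["fnal_mss"],
}

route = find_route(LINKS, "desktop", "fnal_mss")
# route == ["desktop", "lyon_station", "fnal_station", "fnal_mss"]
```

Every location on the returned route other than the source and the final destination is an interim location in the sense of the slide, and would be subject to that station's cache policy.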

  8. Example of Global Data Movement • IN2P3 collaborators from France import a file into the SAM MSS • The file was produced on a desktop PC • They also want to keep a copy locally at Lyon for analysis • They also want a copy at D0’s central analysis station at the FNAL site • Two to three transfers. Robustness, retries.
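The "robustness, retries" point can be sketched as a hop-by-hop copy in which each hop is retried a bounded number of times. This is an illustration under stated assumptions, not SAM's mechanism: `move_file`, the `transfer` callback, the use of `OSError` to signal a failed hop, and the location names are all hypothetical.

```python
def move_file(route, transfer, max_attempts=3):
    """Copy a file hop by hop, retrying each hop up to max_attempts times.

    Returns the list of locations that hold a copy afterwards.
    Re-raises the last OSError if a hop fails on every attempt.
    """
    copies = [route[0]]
    for src, dst in zip(route, route[1:]):
        for attempt in range(1, max_attempts + 1):
            try:
                transfer(src, dst)
                break  # this hop succeeded; move on to the next one
            except OSError:
                if attempt == max_attempts:
                    raise
        copies.append(dst)
    return copies
```

With a route such as desktop, Lyon station, FNAL station, FNAL MSS, the file ends up with copies at Lyon and at the FNAL station as a side effect of reaching the MSS, which matches the three transfers in the example.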

  9. SAM and the Grid Spirit

  10. SAM is a Real System • Over 100 registered D0 users • The online system stores calibration data • Terabytes of MC data have been imported, and imports continue • Data has been reconstructed several times on the farms (different versions) • Data is being analyzed. The cache contains hundreds of GBs of files. • By the time of detector commissioning, Spring 2001, (nearly) all the components must be functional.

  11. Summary • SAM is being developed as a fully distributed system: • Scalability • Robustness • Flexibility: implementation order is driven by D0 priorities, but the general, Grid-compatible design is applicable beyond D0 • Other issues being actively worked on, not covered here: • Other MSSs (remote institutions) • Uniform error reporting in a distributed, heterogeneous system
