The Sequential Access Model for Run II Data Management and Delivery

Lee Lueking, Frank Nagy, Heidi Schellman, Igor Terekhov, Julie Trumbo, Matt Vranicar, Rich Wellner, Vicky White. URL: www-d0.fnal.gov/~lueking/sam/sequential.html. CHEP98, Sept. 3, 1998.


Presentation Transcript


  1. The Sequential Access Model for Run II Data Management and Delivery Lee Lueking, Frank Nagy, Heidi Schellman, Igor Terekhov, Julie Trumbo, Matt Vranicar, Rich Wellner, Vicky White. URL: www-d0.fnal.gov/~lueking/sam/sequential.html. CHEP98 Sept. 3, 1998

  2. What is The Sequential Access Model: SAM?
     • Sequential events: Data is stored in files as sequential events.
     • Data Tiers: Each event is stored in each of several data tiers.
     • The Event Data Unit (EDU) is the unit of data stored in each tier.
     • Physical event size: EDU5 = 5 kB/event, EDU50 = 50 kB/event, et cetera.
     • Physical streaming (clustering): Data categories based on trigger or reconstruction information.
     • Database catalog: File, Event and Processing Database. Holds information about the data at event level, file level and run level, plus processing information, both static and dynamic.

  3. Data Organization (diagram; labels: User and physics group (derived) data, File & Event Database, Event Information, Tiers, Warm Cache, Physical Clustering)

  4. How Do I Access Data?
     • Pipelines: Data access channels tailored for particular processing and analysis patterns.
     • Pipeline segments: Tapes, drives + Automated Tape Library + Storage Management System, network, group-shared and/or user-private analysis disk.
     • Example access modes:
        • Database: Access to event, trigger & other FEDB info.
        • Thumbnail: Disk-resident sketch of each event.
        • Freight Train: Large data stream file server.
        • Event Picking: Random event selection from any data tier.
        • Small Data-set: One or a few files from any data tier.

  5. Data Access (diagram; labels: Mass Storage, Pipeline, Consumers, File&EventDB, Thumbnail, Freight Train, Pick Event, User File; legend: Group of Users, Single User, Data flow, File, Event, Disk Storage, Tape Storage, Pipeline Name)

  6. D0 Specifications
     • Data sizes
     • Further details
        • 10-15 exclusive streams preferred, based on L3 and/or reconstruction information.
        • 10% warm (tape or disk) caches of Raw and Medium EDU data.
        • Possible on-demand reconstruction.

  7. Will SAM Scale to Run II?

  8. Exclusive Streaming See Talk #182: Heidi Schellman, “Assurance of Data Integrity in a Petabyte Data Sample”

  9. Data Handling System Buffer and Cache

  10. SAM Design Details
     • Network distributed.
     • Easily scalable.
     • Works for all access modes.
     • Uses CORBA interfaces between modules.
     • Modules being written in JAVA, Python and C++.
     • File, Event and Processing Database uses ORACLE 8.
     • Not tightly coupled to:
        • Tape Mass Storage System.
        • CPU availability or Batch processing facilities on Farm or Analysis machines.
        • The D0 event data model.

  11. Main Components
     • File and Event Database: Info about data location and processing details. (See poster session #127: Vicky White, “Use of ORACLE in Run II for D0”.)
     • Global Optimizer: Optimizes tape access and regulates bandwidth to various stations and activities.
     • Station: Management for a set of processing resources, including buffer and data I/O.
     • Project Master: Responsible for managing projects, which are lists of files to process.
     • Consumer/producer: Actual data processing.
     • GUI and API user interfaces: Allow users to access data and administrators to control the system.

  12. Components of SAM (diagram; labels: User & Admin. Interface (API and GUI), Project Master, DB and Information Servers, Mass Storage System, Global Optimizer, Stations A through F, each with a Consumer/Producer)

  13. File and Event Database (schema diagram; labels: Run, Volume, Data Tier, Events (ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail), Files (ID, Name, Format, Size, # Events), Physical Data Stream, Trigger Configuration, Project, Event-File Catalog, Processing Info)
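The event-file catalog named on this slide is what makes the "Event Picking" access mode possible: given a set of events and a data tier, it has to say which files hold them. The following is a minimal Python sketch of that lookup, not the actual ORACLE schema. The class names `FileRecord` and `EventFileCatalog`, the method names, and the choice of field subset are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical subset of the per-file fields listed on the slide."""
    file_id: int
    name: str
    data_tier: str
    size_bytes: int
    n_events: int


class EventFileCatalog:
    """Toy in-memory stand-in for the event-file catalog (assumed design)."""

    def __init__(self) -> None:
        self._files: Dict[int, FileRecord] = {}
        # (event_id, data_tier) -> file_id
        self._location: Dict[Tuple[int, str], int] = {}

    def add_file(self, record: FileRecord, event_ids: Iterable[int]) -> None:
        """Register a file and the events it contains."""
        self._files[record.file_id] = record
        for event_id in event_ids:
            self._location[(event_id, record.data_tier)] = record.file_id

    def files_for_events(self, event_ids: Iterable[int], tier: str) -> List[FileRecord]:
        """Return the distinct files, ordered by file ID, holding the events in one tier."""
        wanted = {
            self._location[(event_id, tier)]
            for event_id in event_ids
            if (event_id, tier) in self._location
        }
        return [self._files[file_id] for file_id in sorted(wanted)]
```

Events missing from the requested tier are skipped silently here; a real catalog would more likely report them.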

  14. (Mass Storage System Needs)
     • Provide access to data through file-level semantics.
     • Manage all tape activity within the ATL(S) and to/from shelf.
     • Allow data to be physically clustered in tape groupings or “file families”.
     • A mechanism for sending priorities with file requests to allow control over allocation of resources for various activities.
     • System must optimize the use of resources such as arm time and tape mounts.
     • Retry and fail-over features for failed tape read/write activities.
     • Open tape format to allow removal of tapes and exchange of data with other sites.
     • Reliable and unattended operation.
     See ENSTORE presentation #126: Don Patravic, “ENSTORE - An Alternative Data Storage System”
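Two of the requirements above interact: requests carry priorities, yet tape mounts should be kept to a minimum. One simple way to reconcile them is to batch pending requests by tape volume, so each tape is mounted once, and to serve the volume holding the most urgent request first. The Python sketch below illustrates that idea only. It is not the ENSTORE or Global Optimizer algorithm, and `FileRequest`, `plan_mounts`, and the convention that a larger number means higher priority are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class FileRequest(NamedTuple):
    file_name: str
    volume: str    # tape volume label holding the file
    priority: int  # assumption: larger value means more urgent


def plan_mounts(requests: List[FileRequest]) -> List[Tuple[str, List[str]]]:
    """Group requests by tape so each volume is mounted once.

    Volumes are ordered by their most urgent request; within a volume,
    files are ordered by descending priority. Ties break alphabetically
    so the plan is deterministic.
    """
    by_volume: Dict[str, List[FileRequest]] = defaultdict(list)
    for request in requests:
        by_volume[request.volume].append(request)

    ordered = sorted(
        by_volume.items(),
        key=lambda item: (-max(r.priority for r in item[1]), item[0]),
    )
    return [
        (volume, [r.file_name for r in sorted(reqs, key=lambda r: (-r.priority, r.file_name))])
        for volume, reqs in ordered
    ]
```

A production scheduler would also have to weigh file position on tape, drive availability and starvation of low-priority requests, none of which this sketch models.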

  15. Access to Data through SAM
     • User or group defines a “project” by sending a list of constraints or file list to the Database Server.
     • DB Server returns a summary of the project (number of files, size and availability).
     • User is provided a list of possible “stations” where the project might run. He chooses one.
     • User registers with the station for a given (new or existing) project. He is given a unique “key” to use.
     • User’s client “consumer/producer” sends the “project master” on the chosen station the “key”, and is given the next available file in the “project”.
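The steps above amount to a small protocol: summarize a project, register for a key, then repeatedly ask for the next file. A minimal in-process Python sketch of that flow follows. The real system uses CORBA interfaces between distributed modules; the names `Project`, `summarize_project`, `ProjectMaster`, `register` and `next_file`, and the catalog layout, are hypothetical and chosen for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Project:
    name: str
    files: List[str]
    next_index: int = 0  # position of the next file to hand out


def summarize_project(catalog: Dict[str, Tuple[int, bool]], files: List[str]) -> dict:
    """Summary the DB server might return: file count, total size, availability.

    `catalog` maps file name -> (size in bytes, already cached on disk?).
    This layout is an assumption, not the SAM database schema.
    """
    return {
        "files": len(files),
        "bytes": sum(catalog[name][0] for name in files),
        "cached": sum(1 for name in files if catalog[name][1]),
    }


class ProjectMaster:
    """Hands out each project's files one at a time to registered consumers."""

    def __init__(self) -> None:
        self._projects: Dict[str, Project] = {}
        self._keys: Dict[str, str] = {}  # key -> project name

    def register(self, project: Project) -> str:
        """Join a new or existing project and receive a unique key."""
        shared = self._projects.setdefault(project.name, project)
        key = uuid.uuid4().hex
        self._keys[key] = shared.name
        return key

    def next_file(self, key: str) -> Optional[str]:
        """Return the next undelivered file, or None when the project is exhausted."""
        project = self._projects[self._keys[key]]
        if project.next_index >= len(project.files):
            return None
        name = project.files[project.next_index]
        project.next_index += 1
        return name
```

Because consumers registered to the same project share one cursor, each file is delivered to exactly one of them, which is one reading of "the next available file in the project"; the slides do not say whether every consumer should instead see every file.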

  16. Consumer- Read from Storage

  17. Producer - Write to Storage

  18. SAM Prototype
     • Status: Being built, ready early October.
     • Goals:
        • Populate and exercise the SAM database.
        • Specify projects - data to be accessed for processing or analysis.
        • Attach to a ‘Station’ which makes files for that Project accessible.
        • Interface to ENSTORE - get/put files - using SAM “Global Optimizer”.
        • Build Analysis programs using D0 framework.
        • Demonstrate multiple Stations, Projects, Analysis consumers.
     • Testing: Further testing in fall with SAM PC test-bed.
     • Beta version: Plan to make MC data available through SAM late ‘98.

  19. SAM Prototype PC test-bed (diagram of an example configuration; labels: Enstore Warehouse, Network HUB, SAM Station Servers, Consumers/Producers, Main Backbone, To Database Server)

  20. Summary
     • Dzero plans to use a file-based Sequential Access Model for Run II data access.
     • The design is network distributed with CORBA communication between modules written in JAVA, Python and C++. ORACLE 8 is used for the DB.
     • A SAM prototype is being built now and will be ready in early October.
     • Hardware to construct a SAM test-bed will be assembled this fall to more fully test and understand the system.
     • We plan to employ the system for MC data by the end of '98, and perform large-scale testing with Run II hardware in the first part of next year.
