IFAE-Atlas Workshop, Dec 21st 2005 • Data Access Models: Can remote groups compete? • ATLAS raw event size will be 1.6 MB • Raw events will flow from the Event Filter at a rate of 200 Hz • ATLAS will therefore produce 320 MB/s = 27.6 TB/day • Andreu Pacheco / IFAE • Release 6
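The data-rate arithmetic above can be checked directly. The minimal sketch below uses only the numbers quoted on the slide (1.6 MB per event, 200 Hz out of the Event Filter), plus an assumed canonical 1e7 seconds of LHC running per year, which is not stated on the slide.

# Reproduce the data-rate arithmetic from the title slide.
event_size_mb = 1.6   # ATLAS raw event size in MB (from the slide)
ef_rate_hz = 200      # Event Filter output rate in Hz (from the slide)

rate_mb_per_s = event_size_mb * ef_rate_hz   # 320 MB/s
tb_per_day = rate_mb_per_s * 86400 / 1e6     # about 27.6 TB/day

# Assumption: roughly 1e7 seconds of LHC running per year (not on the slide).
pb_per_year = rate_mb_per_s * 1e7 / 1e9      # about 3.2 PB/year of raw data

print(f"{rate_mb_per_s:.0f} MB/s, {tb_per_day:.1f} TB/day, ~{pb_per_year:.1f} PB/year")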
Some questions from Ilya • Will the grid work? Really? When? • Will PIC be of any help to the IFAE-Atlas group? • Should the IFAE-Atlas group deploy additional storage and processing? • Should we all go to CERN instead? • What data access model will we use?
Minimal introduction to the grid • The grid we plan to use today is a galaxy of Linux clusters. • In order for a Linux cluster to join the grid, a set of software packages and configuration files must be installed. • In our case, as of December 2005, this set of software packages is known as “LCG version 2.6”.
Components of a Linux cluster • User Interfaces – Desktops and public login machines. • Computing Element – Server holding the batch queue manager. • Worker Nodes – Servers executing the batch jobs. • Storage Elements – Disk servers.
Common grid components for an experiment • VOMS Server – Authorizes users to become members of the project or experiment and to use its resources. • Resource Broker – Manages the generic batch queues pointing to several computing elements. • Catalog Server – Maintains a common directory catalog for all data files, tracking data locations and replicas.
Common components for a country • Certification Authority Server – Needed to obtain a digital certificate and to maintain the list of valid certificates. • Grid Monitoring Server – Needed to check that all clusters are working correctly. • User Support Server – Needed to follow up on user problems.
What can Atlas users do with the grid now? • Get a digital certificate. • Register to an experiment (Atlas). • Store, retrieve and replicate data files. • Register data files in the Atlas catalog. • Submit jobs to a specific site or to all free worker nodes of the sites. • Use a tool for submitting and tracing a massive number of jobs.
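As a concrete illustration of the storage, replication and catalog bullets above, here is a minimal sketch driven from Python using the LCG 2.6 data-management commands mentioned in this context (voms-proxy-init, lcg-cr, lcg-rep). The storage-element hosts and the logical file name are hypothetical, and the exact command options should be checked against the LCG user guide for the release in use.

# Hedged sketch of LCG-2 data management; the commands exist in LCG 2.6 but
# options may vary between releases, and all names below are hypothetical.
import subprocess

def run(cmd):
    # Print and execute one grid command, stopping on the first failure.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1) Obtain a VOMS proxy as a member of the ATLAS virtual organisation.
run(["voms-proxy-init", "--voms", "atlas"])

# 2) Copy a local file to a storage element and register it in the catalog
#    under a logical file name (SE host and LFN are hypothetical).
run(["lcg-cr", "--vo", "atlas",
     "-d", "se01.pic.es",
     "-l", "lfn:/grid/atlas/users/ifae/ntuple_001.root",
     "file:/data/ntuple_001.root"])

# 3) Replicate the registered file to a second (hypothetical) storage element.
run(["lcg-rep", "--vo", "atlas",
     "-d", "se01.ifae.es",
     "lfn:/grid/atlas/users/ifae/ntuple_001.root"])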
Current issues with the grid • The commands, and the instructions on how to use the grid, will keep changing. • The stability of grid services is difficult to achieve with young code and a geographically very distributed set of computing resources. • Procedures for adding users, security, operations, ... are still incipient. • Training and support.
LCG-Atlas Computing Model • Several years ago it became clear that the computing infrastructure required for the LHC experiments did not exist and had to be created. • There are four roles for computing facilities: Tier-0, Tier-1, Tier-2 and Tier-3. • At IFAE, all the computing effort of the last 3 years was focused on creating PIC as the Spanish Tier-1.
Simplest Tier definition • Tier-0 is CERN. The big one. • The Tier-1’s are the distributed storage facilities for experimental and Monte Carlo data with computing resources to reprocess and calibrate data. • The Tier-2’s are the distributed analysis and Monte Carlo generation facilities of the experiments open to Atlas users. • The Tier-3’s are the private computing resources of the IFAE-Atlas group.
Clarification on Tier-1 and Tier-2 • Tier-1: Hosting raw data, ESD, AOD and TAG datasets, and hosting the Monte Carlo data produced by the Tier-2s. These facilities will provide the reprocessing of the data and will run the calibration jobs. • Tier-2: Hosting some of the AOD data and the full TAG samples. These facilities will provide simulation and analysis capacity for the physics working groups.
[Diagram: the LCG tier hierarchy, from Tier-0 (CERN) through the Tier-1 centres (RAL, IN2P3, FNAL, CNAF, FZK, PIC, ICEPP, BNL, TRIUMF, Taipei, NIKHEF) and Tier-2 centres (MSU, CIEMAT, IFCA, UB, Cambridge, Budapest, Prague, UAM, IFIC, IFAE, Legnaro, USC, Krakow) down to small centres, desktops and portables.] • Tier-0 – CERN • Tier-1 – PIC (~33% for ATLAS) • Tier-2 – We have a federated one in Spain (IFIC 50% + UAM 25% + IFAE 25%) • Tier-3 – There is one at each Spanish Atlas group.
Atlas computing resources in Bellaterra (from 1st Jan 2008) • Note: one Pentium IV CPU as used in the desktops is around 1.5 kSI2000; one rack server can hold 6 kSI2000 (dual-core, dual-CPU).
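To put the conversion factors in the note into perspective, here is a tiny sketch using only the numbers quoted above; the 60 kSI2000 capacity target is a purely hypothetical figure for illustration, not the approved procurement plan.

# Convert a capacity target into equipment counts, using the factors from the
# note above; the 60 kSI2000 target is a hypothetical example, not the plan.
desktop_ksi2k = 1.5       # one Pentium IV desktop CPU (from the note)
rack_server_ksi2k = 6.0   # one dual-core, dual-CPU rack server (from the note)

target_ksi2k = 60.0       # hypothetical Tier-3 capacity target

print(f"{target_ksi2k / rack_server_ksi2k:.0f} rack servers "
      f"or {target_ksi2k / desktop_ksi2k:.0f} desktop CPUs "
      f"would give about {target_ksi2k:.0f} kSI2000")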
What will be our data access model? • Data (raw and Monte Carlo) will be in the Tier-0 and Tier-1s. • We will keep replicas of some of the needed data in the Tier-2 and Tier-3. • The MC data we generate and reconstruct will be copied to the Tier-1s. • We will analyze the data by sending jobs to the Tier-2 and Tier-3 from our desktops or public login machines.
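Sending analysis jobs from our desktops towards a chosen Tier-2 or Tier-3 cluster, as in the last bullet, could look roughly as follows with the LCG 2.6 workload-management commands (edg-job-submit, edg-job-status). The JDL attributes shown are standard, but the computing-element address and the script and file names are hypothetical, and the options should be checked against the current user guide.

# Hedged sketch: describe an analysis job in JDL and steer it towards a chosen
# computing element; the CE address and script names are hypothetical.
import subprocess

# Minimal JDL; the Requirements line restricts matchmaking to one (hypothetical)
# PIC computing element, otherwise the Resource Broker picks any free site.
jdl = """\
Executable    = "analysis.sh";
StdOutput     = "stdout.log";
StdError      = "stderr.log";
InputSandbox  = {"analysis.sh"};
OutputSandbox = {"stdout.log", "stderr.log"};
Requirements  = other.GlueCEUniqueID == "ce01.pic.es:2119/jobmanager-lcgpbs-atlas";
"""
with open("analysis.jdl", "w") as f:
    f.write(jdl)

# Submit through the Resource Broker, keeping the job identifier in a file,
# then poll its status later with the stored identifier.
subprocess.run(["edg-job-submit", "--vo", "atlas", "-o", "jobids", "analysis.jdl"],
               check=True)
subprocess.run(["edg-job-status", "-i", "jobids"], check=True)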
What will happen with the Tier-1 computing power? • In the long term, the Tier-1 batch queues will be used for large data-processing jobs: re-reconstruction of the data, with all jobs under the control of a production manager. • In the short term, the Tier-1s are being used for distributed analysis and Monte Carlo production at low priority. This will probably change during 2006/2007.
What do we have to do at IFAE? • In order to fund the Tier-2 and Tier-3 computing facilities, a 2-year project (2006-2007) has been approved by the Spanish HEP program. • We need to deploy the grid on the desktops at IFAE, install a local cluster in the IFAE building (Tier-3) and install a grid cluster in the PIC computer room (Tier-2).
What else do we have to do at IFAE? • The IFAE Tier-2 and Tier-3 facilities must be reusable for Event Filter tests. • The IFAE Tier-2 and Tier-3 facilities must be optimised for use by the IFAE group working on Tilecal and on the specific IFAE activities. • We must identify and train a good Data Manager; this is key to success...
Conclusions • Atlas users can use the grid now. • However, the grid will not be stable until Sep 2006 at the earliest. • The IFAE Atlas group must deploy its Tier-2 and Tier-3 in cooperation with PIC as the Tier-1. • We should start to have talks or meetings on how to use grid computing for Atlas at IFAE.