The ALICE Grid: The beat of a different drum
L. Betev, P. Buncic, A. Peters, P. Saiz, S. Bagnasco, P. Mendez-Lorenzo, C. Cistoiu, C. Grigoras
Presented by F. Carminati, April 23, 2007, ACAT, Amsterdam
ALICE Collaboration
• ~1/2 the size of ATLAS and CMS, ~2x the size of LHCb
• ~1000 people, 30 countries, ~80 institutes
[Figure: ALICE detector. Total weight 10,000 t; overall diameter 16.00 m; overall length 25 m; magnetic field 0.4 T]
Trigger and data-acquisition chain:
• Level 0 (special hardware): 8 kHz (160 GB/s)
• Level 1 (embedded processors): 200 Hz (4 GB/s)
• Level 2 (PCs): 30 Hz (2.5 GB/s)
• Data recording & offline analysis: 30 Hz (1.25 GB/s)
fca @ ACAT07
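A quick back-of-the-envelope check using only the rates and bandwidths quoted on this slide: dividing bandwidth by rate gives the implied average event size at each stage, and comparing the first and last stages gives the overall reduction factors. This is plain arithmetic on the slide's numbers (assuming decimal units, 1 GB = 1000 MB), not a model of the ALICE trigger.

```python
"""Arithmetic on the trigger-chain figures quoted on the slide (illustration only)."""

# (stage, rate in Hz, bandwidth in GB/s), as listed on the slide
STAGES = [
    ("level 0 - special hardware", 8_000, 160.0),
    ("level 1 - embedded processors", 200, 4.0),
    ("level 2 - PCs", 30, 2.5),
    ("data recording & offline analysis", 30, 1.25),
]


def implied_event_size_mb(rate_hz: float, bandwidth_gb_s: float) -> float:
    """Average event size implied by bandwidth / rate, in MB (1 GB = 1000 MB)."""
    return bandwidth_gb_s * 1000.0 / rate_hz


def overall_reduction(stages):
    """Rate and bandwidth reduction factors between the first and the last stage."""
    (_, rate_in, bw_in), (_, rate_out, bw_out) = stages[0], stages[-1]
    return rate_in / rate_out, bw_in / bw_out


if __name__ == "__main__":
    for name, rate, bw in STAGES:
        print(f"{name:36s} {implied_event_size_mb(rate, bw):6.1f} MB/event")
    rate_factor, bw_factor = overall_reduction(STAGES)
    print(f"rate reduced ~{rate_factor:.0f}x, bandwidth reduced ~{bw_factor:.0f}x")
```

With the slide's numbers this gives 20 MB per event at levels 0 and 1, and a rate reduction of roughly 267x against a bandwidth reduction of 128x, so the implied average recorded event is larger than at level 0.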
The ALICE Grid (AliEn)
• There are millions of lines of open-source (OS) code dealing with Grid issues
• Why not use them to build the minimal Grid that does the job?
• Fast development of a prototype; we can restart from scratch, etc.
• Hundreds of users and developers
• Immediate adoption of emerging standards
• AliEn by ALICE (5% of code developed, 95% imported)
[Timeline figure, 2001 to 2007, beginning at "Start". Milestones: First production (distributed simulation); Physics Performance Report (mixing & reconstruction); 10% Data Challenge; 20% Data Challenge; WLCG integration. Development phases: Functionality + Simulation; Interoperability + Reconstruction; Performance, Scalability, Standards + Analysis]
Middleware services in AliEn
[Figure: the AliEn middleware services (API, GAS, WM, DM, TQ, FTQ, ACE, PM, FC, JW/JA, CE, SE, LJC, LRC, CR, SRM) drawn in layers, with experiment-specific services (AliEn for ALICE) on top of the middleware, deployed over EGEE, ARC, OSG…; the gLite middleware services are annotated "AliEn arch + LCG code" (EGEE, LCG, EDG)]
• API: Application Programming Interface
• GAS: Grid Access Service
• WM: Workload Management
• DM: Data Management
• RB: Resource Broker
• TQ: Task Queue
• FPS: File Placement Service
• FTQ: File Transfer Queue
• PM: Package Manager
• ACE: AliEn CE (pull)
• FC: File Catalogue
• JW: Job Wrapper
• JA: Job Agent
• LRC: Local Replica Catalogue
• LJC: Local Job Catalogue
• SE: Storage Element
• CE: Computing Element
• SRM: Storage Resource Manager
• CR: Computing Resource (LSF, PBS,…)
Design criteria
• Minimise intrusiveness
  • Limit the impact on the host computer centres
• Use delegation
  • Where possible, acquire the “capability” to perform an operation, so there is no need to verify the operation mode at each step
• Centralise information
  • Minimise the need to “synchronise” information sources
• Decentralise decisions
  • Minimise interactions and avoid bottlenecks
• Virtualise resources
• Automate operations
• Provide extensive monitoring
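Job submission in LCG
[Figure: ALICE central services (Task Queue, ALICE Job Catalogue, ALICE File Catalogue, Optimizer) and an LCG site (VO-Box running the Computing Agent, LCG RB, CE, worker node). The flow shown:]
• The user submits a job to the ALICE central services; the Optimizer updates the TQ
• The Computing Agent on the site VO-Box asks for workload and submits a job agent, which the LCG RB sends to the site CE
• On the worker node (WN) the job agent checks its environment (Env OK?); if the check fails, it dies with grace
• If the environment is OK, the agent runs and asks for workload; matchmaking is done on close SEs & software
• The agent receives the workload, retrieves the software via packman, runs the user job, sends the job result and registers the output in the ALICE catalogues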
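The pull model on this slide can be sketched as a small simulation: the job agent first checks its environment, exits quietly if it is not usable, and otherwise keeps asking the central Task Queue for work that matches its close SEs and available software. All class and function names (`Job`, `WorkerSlot`, `matches`, `request_workload`, `run_job_agent`) and the exact matching rule are hypothetical, chosen for illustration; this is not the AliEn API.

```python
"""Toy simulation of the pull model: a job agent checks its environment, then pulls work.

All names and the matching rule are illustrative assumptions, not the AliEn API.
"""
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    packages: frozenset      # software the job needs
    input_ses: frozenset     # storage elements holding the job's input data


@dataclass
class WorkerSlot:
    close_ses: frozenset     # SEs considered "close" to this site
    packages: frozenset      # software the package manager can provide here
    env_ok: bool = True      # result of the agent's environment check


def matches(job: Job, slot: WorkerSlot) -> bool:
    """Matchmaking on close SEs and software (assumed rule: any close SE, all packages)."""
    return bool(job.input_ses & slot.close_ses) and job.packages <= slot.packages


def request_workload(task_queue: list, slot: WorkerSlot):
    """Central-service side: hand out the first queued job matching this slot, if any."""
    for i, job in enumerate(task_queue):
        if matches(job, slot):
            return task_queue.pop(i)
    return None


def run_job_agent(slot: WorkerSlot, task_queue: list, execute) -> list:
    """Agent side: die with grace if the environment is bad, else pull until no match."""
    if not slot.env_ok:
        return []                                    # exit without taking any work
    results = []
    while (job := request_workload(task_queue, slot)) is not None:
        results.append((job.job_id, execute(job)))   # run it and report the result
    return results
```

Because the agent only asks for work after its own checks pass, a broken node never takes a job out of the central queue, which is one practical argument for pulling rather than pushing.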
VO-Box monitoring
• The status of the VO-Box, ALICE and WLCG services is monitored through MonALISA (ML)
• Sites are encouraged to check the status through these pages
• Alarm system established
• Standard SAM tests checking the availability of LCG services are incorporated in the VO-Box
• Available to Grid Support and ALICE (via ML)
Job submission
• Minimise intrusiveness
  • Jobs are submitted through the existing Grid MW where possible, or directly to the CE otherwise
• Centralise information
  • Jobs are held in a single central queue handling priorities and quotas
• Decentralise decisions
  • Sites decide which jobs to “pull”
• Virtualise resources
  • Job agents provide a standard environment (job wrapper) across different systems
• Automate operations
• Provide extensive monitoring
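A single central queue with priorities and quotas, from which sites pull, can be sketched with a heap. The semantics are assumptions for illustration, not AliEn's actual policy: a lower number means higher priority, and a quota is the maximum number of concurrently running jobs per user. `TaskQueue`, `submit`, `pull` and `finish` are hypothetical names.

```python
"""Sketch of a single central task queue with priorities and per-user quotas.

Assumptions (not from AliEn): lower number = higher priority, and a quota is the
maximum number of concurrently running jobs per user. Sites call pull().
"""
import heapq
from collections import Counter


class TaskQueue:
    def __init__(self, quotas: dict):
        self._heap = []              # entries: (priority, sequence, user, job_id)
        self._seq = 0                # tie-breaker: FIFO order within one priority
        self.quotas = dict(quotas)   # users without a quota entry get 0 slots
        self.running = Counter()     # currently running jobs per user

    def submit(self, user: str, priority: int, job_id: str) -> None:
        heapq.heappush(self._heap, (priority, self._seq, user, job_id))
        self._seq += 1

    def pull(self):
        """Give the calling site the best job whose owner is under quota, else None."""
        skipped, chosen = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            _, _, user, job_id = entry
            if self.running[user] < self.quotas.get(user, 0):
                self.running[user] += 1
                chosen = (user, job_id)
                break
            skipped.append(entry)    # owner over quota: leave the job queued
        for entry in skipped:
            heapq.heappush(self._heap, entry)
        return chosen

    def finish(self, user: str) -> None:
        """Called when one of the user's jobs completes, freeing a quota slot."""
        self.running[user] -= 1
```

Keeping priorities and quotas in one place is what lets the decision of *which* job to run stay central while the decision of *when* to run stays with the site.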
The AliEn FC
• Hierarchical structure (like a UNIX file system)
• Designed in 2001
• Provides mapping from LFN to PFN
• Built on top of several distributed databases
  • Possible to add another database
  • Possible to move directories to another table
  • Transparent for the end user
• Metadata catalogue on the LFN
• Triggers
• GUID to PFN mapping in the central catalogue
  • No “local catalogue”
• Possibility of automatic PFN construction
  • Store only the GUID and Storage Index, and let the SE build the PFN from the GUID
• Two independent catalogues: LFN->GUID and GUID->PFN
  • Possible to add databases to one or the other
  • We could drop the LFN->GUID mapping if it is no longer used
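The split into two independent catalogues, and the option of storing only the GUID plus an SE index, can be illustrated with two dictionaries. This is a minimal sketch under stated assumptions: `FileCatalogue`, its methods, the SE prefixes and especially the PFN layout in `build_pfn` are hypothetical, not the real AliEn schema.

```python
"""Sketch of the two independent catalogues: LFN -> GUID and GUID -> replicas.

A replica is stored either with an explicit PFN or as (SE index, None), in which case
the PFN is built from the GUID. The PFN layout below is a hypothetical example only.
"""
import uuid


class FileCatalogue:
    def __init__(self, se_prefixes: dict):
        self.lfn_to_guid = {}            # LFN catalogue
        self.guid_to_replicas = {}       # GUID catalogue: guid -> [(se_index, pfn or None)]
        self.se_prefixes = se_prefixes   # SE index -> access prefix (hypothetical)

    def register(self, lfn, se_index, pfn=None, guid=None):
        """Register a replica of `lfn` on SE `se_index`; reuse the GUID if the LFN exists."""
        guid = self.lfn_to_guid.get(lfn) or guid or str(uuid.uuid4())
        self.lfn_to_guid[lfn] = guid
        self.guid_to_replicas.setdefault(guid, []).append((se_index, pfn))
        return guid

    def build_pfn(self, se_index, guid):
        # Hypothetical layout: SE prefix, then a sub-directory from the first GUID characters.
        return f"{self.se_prefixes[se_index]}/{guid[:2]}/{guid}"

    def resolve(self, lfn):
        """LFN -> GUID -> list of PFNs, constructing PFNs that were not stored."""
        guid = self.lfn_to_guid[lfn]
        return [pfn if pfn is not None else self.build_pfn(se, guid)
                for se, pfn in self.guid_to_replicas[guid]]
```

Since the GUID catalogue never refers back to LFNs, either side can be moved to other databases independently, and dropping the LFN->GUID map would leave GUID resolution untouched.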
Benchmarks
• Users:
  • Register their files in their home directories
• PackMan:
  • Definition of the packages (VO & user)
• Production user:
  • Registers data
• AliEn TaskQueue:
  • Registers the output of the jobs
• Tests done on:
  • Dual Pentium CPU, 3.4 GHz
  • 3.2 GB RAM
  • DB, writers, readers and SOAP servers running on the same machine
[Figure: insertion benchmark]
Other features
• Size
  • LFN tables: 130 bytes/entry
  • GUID tables: 300 bytes/entry (InnoDB), 210 (MyISAM), 120 (no PFN)
  • Binary log files: 1000 bytes/entry!
    • Needed for database replication
    • Automatically cleaned by MySQL
  • The current database could contain 7.5 billion entries!
• Two QoS levels for SEs
  • Custodial: the file has a low probability of disappearing
  • Replica: the file has a high probability of disappearing
  • The user specifies the QoS when registering a file
  • Still to do: quotas
• Entries in the LFN catalogue can have an expiration time
  • The entry disappears regardless of the QoS of the SE, and is removed from storage
  • A GUID not referenced by any LFN also disappears
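The expiration rule amounts to a small garbage collector: expired LFNs are dropped first, and only GUIDs that no remaining LFN references lose their replicas. A minimal sketch, assuming plain dictionaries for the catalogues and a caller-supplied `delete_from_storage` callback (both hypothetical, for illustration):

```python
"""Sketch of LFN expiration followed by removal of unreferenced GUIDs.

Data structures are plain dicts chosen for illustration; `delete_from_storage` is a
caller-supplied callback standing in for the real storage deletion.
"""


def expire(lfn_to_guid, lfn_expiry, guid_to_replicas, now, delete_from_storage):
    """Drop LFNs whose expiry time has passed, then any GUID no LFN references.

    Returns the list of GUIDs that were removed.
    """
    for lfn, expiry in list(lfn_expiry.items()):
        if expiry <= now:
            del lfn_expiry[lfn]
            lfn_to_guid.pop(lfn, None)
    referenced = set(lfn_to_guid.values())
    orphans = [guid for guid in guid_to_replicas if guid not in referenced]
    for guid in orphans:
        for replica in guid_to_replicas.pop(guid):
            delete_from_storage(replica)   # done regardless of the SE's QoS
    return orphans
```

Checking references before deleting means that when two LFNs share one GUID, expiring one of them leaves the data in place for the other.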
File Catalogue v2-13
[Figure: the LFN Catalogue (LFN->GUID) and the GUID Catalogue (GUID->PFN), each with its own index]
Storage strategy
[Figure: an xrootd manager on the VO-Box (VOBOX::SA) in front of xrootd workers on each storage back-end, accessed from the worker nodes (WN):
• Disk: xrootd (worker), available
• DPM + SRM + xrootd (worker): being deployed
• CASTOR + SRM + xrootd (worker), with MSS: prototype being validated
• dCache + SRM + xrootd emulation (worker), with MSS: being deployed]
• DPM, CASTOR and dCache are LCG-developed SEs
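Xrootd architecture
[Figure: a client, a redirector (head node) and data servers A, B and C forming a cluster. The sequence shown:]
• The client asks the redirector to open file X
• The redirector asks the data servers “Who has file X?”; server C answers “I have”
• The redirector tells the client “go to C”, and the client opens file X on C
• The redirector caches the file location, so a 2nd open of X is sent straight to C
• The client sees all servers as xrootd data servers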
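The redirection sequence with its location cache can be modelled in a few lines. This is a toy under stated assumptions: the "Who has file X?" query is modelled as a simple loop over servers, and `Redirector`, `open` and `lookups` are hypothetical names, not the xrootd API or protocol.

```python
"""Toy model of xrootd-style redirection with a location cache.

The "Who has file X?" query is modelled as a simple loop over the data servers.
"""


class Redirector:
    def __init__(self, servers: dict):
        self.servers = servers        # server name -> set of file names it holds
        self.location_cache = {}      # file name -> server name
        self.lookups = 0              # how often the data servers had to be asked

    def open(self, filename: str) -> str:
        """Return the data server the client should be redirected to ("go to C")."""
        if filename in self.location_cache:      # 2nd open: answered from the cache
            return self.location_cache[filename]
        self.lookups += 1                        # "Who has file X?"
        for name, files in self.servers.items():
            if filename in files:                # "I have"
                self.location_cache[filename] = name
                return name
        raise FileNotFoundError(filename)
```

For example, with servers A, B and C where only C holds X, two calls to `open("X")` both return `"C"` while the servers are asked only once.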
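xrootd serving several VOs
[Figure: a client with a proxy and security environment connects to an xrootd server (GSI authentication, own security environment); access is authorised by the ALICE catalogue, with a private key on the catalogue side and the matching public key on the xrootd side]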
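One reading of the private/public key labels is that the catalogue signs an access authorisation and the xrootd server verifies it without ever holding the catalogue's private key. Under that assumption, here is a toy sketch using textbook RSA with tiny primes. It is trivially breakable and for illustration only; the envelope string and all function names are hypothetical, not the real AliEn/xrootd token format.

```python
"""Toy signed access envelope: the catalogue signs, the storage server verifies.

Textbook RSA with tiny primes, for illustration ONLY (trivially breakable). The
envelope format is a hypothetical example, not the real AliEn/xrootd token.
"""
import hashlib

P, Q, E = 61, 53, 17                     # toy key material
N = P * Q                                # public modulus
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent (Python 3.8+)


def digest(envelope: str) -> int:
    """Hash the envelope and reduce it modulo N so it can be signed."""
    return int.from_bytes(hashlib.sha256(envelope.encode()).digest(), "big") % N


def catalogue_sign(envelope: str) -> int:
    """Catalogue side: uses the private key (N, D)."""
    return pow(digest(envelope), D, N)


def server_verify(envelope: str, signature: int) -> bool:
    """Storage side: needs only the public key (N, E)."""
    return pow(signature, E, N) == digest(envelope)
```

The point of the asymmetric split is that a compromised storage server can check authorisations but cannot issue new ones.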
Tag architecture
[Figure: reconstruction feeds an index builder that produces a bitmap index (when available). A selection on the index returns a list of ev#guid's, regrouped as guid#{ev1…evn} and dispatched to GRID/PROOF#1 … GRID/PROOF#N, where the analysis runs as interactive or batch jobs (Job#1 … Job#N)]
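The bitmap-index selection can be illustrated with Python integers as bitsets: one bitmap per predicate, ANDed together, with the surviving events regrouped as guid -> [event numbers]. This is a minimal sketch, assuming hypothetical tag attributes (`mult`, `vtx`) and function names; it is not the ALICE tag format.

```python
"""Sketch of tag-based selection with bitmap indices (Python ints used as bitsets).

Tag attributes and predicate names are hypothetical examples.
"""
from collections import defaultdict


def build_bitmap_index(tags, predicates):
    """tags: list of (guid, event_number, attributes). Returns {predicate_name: bitmap}."""
    index = {name: 0 for name in predicates}
    for row, (_, _, attrs) in enumerate(tags):
        for name, pred in predicates.items():
            if pred(attrs):
                index[name] |= 1 << row        # set bit `row` for matching events
    return index


def select(tags, index, names):
    """AND the chosen bitmaps and regroup surviving events as {guid: [event numbers]}."""
    mask = (1 << len(tags)) - 1                # start with every event selected
    for name in names:
        mask &= index[name]
    grouped = defaultdict(list)
    for row, (guid, event, _) in enumerate(tags):
        if (mask >> row) & 1:
            grouped[guid].append(event)
    return dict(grouped)
```

Grouping by GUID at the end is what allows each file to be handed to a single GRID/PROOF worker together with just the event numbers it needs.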
How to select data
• A dataset list is created via queries to the metadata
  • Key/value pairs
  • Run, file and tag metadata (MD)
• Run metadata
  • Stored as (directory) metadata in the File Catalogue
  • Contains parameters describing conditions during the run
• File metadata
  • No physics information
  • Sanity, permissions and location of files
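Building a dataset list from key/value run metadata, then filtering with file metadata, can be sketched as follows. The keys (`beam`), the `sane` flag and the paths are hypothetical examples, and `make_dataset` is an illustrative name, not an AliEn command.

```python
"""Sketch: build a dataset list from key/value run metadata plus file metadata.

Run metadata is attached to run directories, as on the slide; the keys, the
'sane' flag and the paths are hypothetical examples.
"""


def make_dataset(run_metadata: dict, files: dict, **conditions) -> list:
    """Return the LFNs of sane files in every run directory matching all conditions.

    run_metadata: {run_directory: {key: value}}
    files:        {run_directory: [(lfn, file_metadata)]}
    """
    dataset = []
    for run_dir, md in sorted(run_metadata.items()):
        if all(md.get(key) == value for key, value in conditions.items()):
            dataset.extend(lfn for lfn, fmd in files.get(run_dir, [])
                           if fmd.get("sane", False))   # skip files failing sanity
    return dataset
```

A call such as `make_dataset(run_md, files, beam="PbPb")` keeps only the sane files in run directories whose metadata has `beam == "PbPb"`; with no conditions, every run matches.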
Distributed analysis
[Figure: a user job (many events) queries the File Catalogue for a data set (ESDs, AODs). The Job Optimizer groups the files by SE location into sub-jobs 1 … n; the Job Broker submits each sub-job to the CE with the closest SE; each CE/SE pair processes its sub-job and writes output files 1 … n, which a file-merging job combines into the job output]
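The split/merge pattern in the figure (group by SE, one sub-job per group, merge the outputs) reduces to a few lines. This is a sketch with hypothetical SE names and caller-supplied `process` and `merge` callbacks; brokering each sub-job to the CE closest to its SE is only indicated by a comment.

```python
"""Sketch of the split/merge pattern: group a dataset by SE, one sub-job per SE, then merge.

SE names and the process/merge callbacks are hypothetical; brokering to the CE
closest to each SE is only indicated by a comment.
"""
from collections import defaultdict


def split_by_se(dataset):
    """dataset: iterable of (lfn, se). Returns {se: [lfn, ...]}, one entry per sub-job."""
    groups = defaultdict(list)
    for lfn, se in dataset:
        groups[se].append(lfn)
    return dict(groups)


def run_analysis(dataset, process, merge):
    """Run one sub-job per SE group and combine the partial outputs."""
    outputs = []
    for se, lfns in sorted(split_by_se(dataset).items()):
        # A job broker would submit this sub-job to a CE close to `se`.
        outputs.append(process(se, lfns))
    return merge(outputs)     # the final file-merging step
```

For instance, counting files per sub-job with `process=lambda se, lfns: len(lfns)` and `merge=sum` returns the dataset size, which is a handy sanity check that no file was lost in the split.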
Grid data challenge: PDC’06
• The longest-running Data Challenge in ALICE
• A comprehensive test of the ALICE Computing model
• Running for 9 months non-stop so far: approaching the data-taking regime of operation
• Participating: 55 computing centres on 4 continents: 6 Tier 1s, 49 T2s
• 7 MSI2k hours; 1500 CPUs running continuously
• 685K Grid jobs in total
  • 530K production
  • 53K DAQ
  • 102K user !!!
• 40M events, 0.5 PB generated, reconstructed and stored
• User analysis ongoing
• FTS tests T0->T1, Sep-Dec
  • Design goal of 300 MB/s reached but not maintained
  • 0.7 PB of DAQ data registered
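A few figures follow directly from the PDC’06 numbers above: the job breakdown adds up to the total, the stored volume per event, and what the 300 MB/s FTS design goal means per day. This is arithmetic on the slide's numbers only, assuming decimal units (1 PB = 10^9 MB, 1 TB = 10^6 MB); the function names are illustrative.

```python
"""Consistency checks and derived figures from the PDC'06 numbers on the slide."""

JOBS = {"production": 530_000, "DAQ": 53_000, "user": 102_000}
TOTAL_JOBS = 685_000


def mb_per_event(volume_pb: float, events: float) -> float:
    """Average stored volume per event in MB (1 PB = 1e9 MB)."""
    return volume_pb * 1e9 / events


def tb_per_day(rate_mb_s: float) -> float:
    """Volume moved in one day at a sustained rate, in TB (1 TB = 1e6 MB)."""
    return rate_mb_s * 86_400 / 1e6


if __name__ == "__main__":
    assert sum(JOBS.values()) == TOTAL_JOBS           # the breakdown adds up
    print(f"user share: {JOBS['user'] / TOTAL_JOBS:.1%}")
    print(f"{mb_per_event(0.5, 40e6):.1f} MB/event")   # 0.5 PB over 40M events
    print(f"{tb_per_day(300):.2f} TB/day at 300 MB/s")
```

This yields about 14.9% user jobs, 12.5 MB per event on average, and roughly 25.9 TB per day if 300 MB/s were sustained.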
Monitoring, monitoring, monitoring…
[Figure: every AliEn component (CEs, Cluster Monitors, IS, Optimizers, Brokers, TQ, Job Agents, SEs, MySQL servers, API services, CastorGrid scripts) and the LCG tools report through ApMon to MonALISA at each site (MonALISA @Site, LCG Site); MonALISA @CERN aggregates the data into the MonALISA Repository, backed by a long-history DB]
• http://pcalimonitor.cern.ch:8889/
• Monitored quantities include: job slots, net in/out, run time, CPU time, CPU ksi2k, free space, disk used, processes, load, jobs status, job status, vsz, rss, sockets, migrated mbytes, active sessions, nr. of files, open files, queued JobAgents, MyProxy status
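The "Aggregated Data" step can be illustrated by rolling per-node samples up into per-site summaries. This sketch does not use the real ApMon or MonALISA APIs; the sample tuple layout, the parameter names and the choice of summing additive quantities while averaging the rest are assumptions for illustration.

```python
"""Sketch: aggregate per-node monitoring samples into per-site summaries.

This does not use the real ApMon/MonALISA APIs; parameter names and the choice
of summing vs averaging are assumptions for illustration.
"""
from collections import defaultdict
from statistics import mean

SUMMED = {"jobs_running", "free_space_gb"}   # additive quantities: sum over nodes


def aggregate(samples):
    """samples: iterable of (site, node, parameter, value) -> {site: {parameter: value}}."""
    by_site = defaultdict(lambda: defaultdict(list))
    for site, _node, param, value in samples:
        by_site[site][param].append(value)
    return {
        site: {p: (sum(v) if p in SUMMED else mean(v)) for p, v in params.items()}
        for site, params in by_site.items()
    }
```

Summing makes sense for counts such as running jobs, whereas averaging suits per-node gauges such as load.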
Back to the future…
• Once upon a time… [Photo: IBM-VM 360 mainframe, 1988]
  • Static linking, running in a VM
  • Perfect isolation!
• Then, things changed…
  • Unix, PC, commodity computing, shared libraries, dynamic linking, plugins
  • Fuzzy application boundary!
• But now…
  • Memory and disk space are cheap
  • Virtual Machines running on commodity hardware and open-source OSes promise to deliver what we lost some time ago
• Why?
  • The infrastructure can evolve independently from the application
  • Now we can start, stop, pause and migrate VMs
  • Software running inside a VM cannot affect the environment
  • Perfect process and file sandboxing
  • (Re)use a lot of code that was previously in the system/kernel domain
Virtual Appliances
• Virtual Software Appliance = Application + Virtual Machine + Simple UI, combining:
  • A minimal operating environment
  • Specialized application functionality
• Designed to run under various virtualization technologies
  • VMware, Xen, Parallels, Microsoft Virtual PC, QEMU, User Mode Linux, coLinux, Virtual Iron…
• Alleviate the problems of deployment in a traditional server environment
  • Complex configuration
  • Maintenance
• Example: rPath, a software appliance company
Practical exercise: AliEn Appliance
[Figure: AliEn + external dependencies + a minimal system (busybox system tools, ggbox, system devices, kernel) = Grid Appliance]
AliEnX
• AliEn Linux: a minimal guest OS capable of running AliEn services and hosting Grid applications
  • http://alien.cern.ch/twiki/bin/view/AliEnX
  • http://alien.rpath.org
• Built using rPath tools (rBuilder and the Conary package manager)
• AliEn Appliance version 0.4
  • x86 Mountable Filesystem (Xen Virtual Appliance)
  • x86_64 Mountable Filesystem (Xen Virtual Appliance)
  • x86 VMware (R) ESX Server Virtual Appliance
  • x86 Installable CD/DVD
  • x86_64 Parallels, QEMU (Raw Hard Disk)
  • x86 Parallels, QEMU (Raw Hard Disk)
• Already usable as a User Interface
• Generic; can be customized for other purposes
• To do: run Grid jobs in a VM
[Screenshot caption: 3 GHz Pentium D, 1 GB RAM, AliRoot]
Use cases for Virtual Machines?
• Grid
  • Sandbox environment for job execution on WN
  • Enhanced site security
• VO box
  • Enhanced scalability
• User Interfaces
  • Separation of Grid and system environment
  • Lowering the threshold for getting started with the Grid
• Specialized environments
  • PROOF/CAF
  • Process migration
  • Kernel modules to enable fancy user-space file systems
  • P2P-like object sharing and caching
• Training setups
  • Make sure that everyone has the same environment when they walk into the training room
• Testing environments
  • Easy to set up, saving time and money
A cloud over the Grid? http://www.rpath.com/corp/amazon.html
Conclusions
• AliEn has allowed ALICE to exploit its distributed computing resources while achieving different, potentially contradictory, objectives:
  • Make maximum use of the existing Grid MW
  • Provide a stable and uniform environment for processing and analysing ALICE data
  • Provide a lean environment for developing and testing new technologies
• The AliEn MW has been tested in production, and we are confident that it provides a solid framework for ALICE computing
• A promising area that we are now exploring with AliEn is virtual machines (VMs)
  • Coming back as a viable technology
  • Potential benefits for users and resource providers
  • Technology and business model are catching up fast
  • They may not solve all our problems, but they can make solutions faster and easier