The High Energy Physics Community Grid Project Inside D-Grid
ACAT 07
Torsten Harenberg - University of Wuppertal
harenberg@physik.uni-wuppertal.de
D-Grid organisational structure
[Architecture diagram: communities and users access D-Grid through a user API layer consisting of a GridSphere-based portal, UNICORE and the GAT API; the Grid services / D-Grid services layer provides LCG/gLite, Globus Toolkit V4 core services, scheduling and workflow management, monitoring, accounting and billing, security and VO management, data management and I/O; the D-Grid resources layer comprises distributed computing resources, distributed data services, data/software, and the network and technical infrastructure.]
HEP Grid efforts since 2001
[Timeline 2000-2010, "today" = 2007: LCG R&D followed by the WLCG ramp-up, with the 2008 Mar-Sep pp run and Oct. heavy-ion run; EU projects EDG, EGEE, EGEE 2, EGEE 3 (?); in Germany GridKa / GGUS and the D-Grid Initiative with DGI, DGI 2, ??? and HEP CG.]
LHC Groups in Germany • ALICE: Darmstadt, Frankfurt, Heidelberg, Münster • ATLAS: Berlin, Bonn, Dortmund, Dresden, Freiburg, Gießen, Heidelberg, Mainz, Mannheim, München, Siegen, Wuppertal • CMS: Aachen, Hamburg, Karlsruhe • LHCb: Heidelberg, Dortmund
German HEP institutes participating in WLCG • WLCG:Karlsruhe (GridKa & Uni), DESY, GSI, München, Aachen, Wuppertal, Münster, Dortmund, Freiburg
HEP CG participants: • Participants: Uni Dortmund, TU Dresden, LMU München, Uni Siegen, Uni Wuppertal, DESY (Hamburg & Zeuthen), GSI • Associated partners: Uni Mainz, HU Berlin, MPI f. Physik München, LRZ München, Uni Karlsruhe, MPI Heidelberg, RZ Garching, John von Neumann Institut für Computing, FZ Karlsruhe, Uni Freiburg, Konrad-Zuse-Zentrum Berlin
HEP Community Grid • WP 1: Data management (dCache) • WP 2: Job monitoring and user support • WP 3: Distributed data analysis (Ganga) • ==> Joint venture between physics and computer science
WP 1: Data management (coordination: Patrick Fuhrmann) • An extensible metadata catalogue for semantic data access: • Central service for gauge theory • DESY, Humboldt Uni, NIC, ZIB • A scalable storage element: • Using dCache on multi-scale installations. • DESY, Uni Dortmund E5, FZK, Uni Freiburg • Optimized job scheduling in data-intensive applications: • Co-scheduling of data and CPU • Uni Dortmund CEI & E5
WP 1: Highlights • Established a metadata catalogue for gauge theory • Production service of a metadata catalogue with > 80,000 documents. • Tools to be used in conjunction with the LCG data grid • Well established in an international collaboration • http://www-zeuthen.desy.de/latfor/ldg/ • Advancements in data management with new functionality • dCache could become the quasi-standard in WLCG • Good documentation and an automatic installation procedure make it usable from small Tier-3 installations up to Tier-1 sites. • High throughput for large data streams, optimization of the quality and load of disk storage systems, and high-performance access to tape systems
dCache-based scalable storage element (dCache.org) • Scales from a single host (~ 10 TeraBytes, zero maintenance) up to thousands of pools (>> 1 PB disk storage, >> 100 file transfers/sec, < 2 FTEs to operate) • dCache project well established • New since HEP CG: professional product management, i.e. code versioning, packaging, user support and test suites.
dCache: principle
[Architecture diagram (dCache.org): streaming data enters through protocol engines for (gsi)FTP, http(g) and POSIX-like I/O via dCap and xRoot; storage control goes through SRM and EIS; the dCache controller with its information protocol backend coordinates managed disk storage and an HSM adapter to the tape storage backend.]
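The dCap and xRoot doors give analysis jobs POSIX-like read access to files held in dCache. As a hedged illustration only (the door host, port and pnfs path below are placeholders, not taken from the slides), a ROOT-based analysis can open such a file directly:

```python
import ROOT

# Hypothetical dCap door and pnfs path, for illustration only
# (22125 is the usual dCap port; adjust to the site's configuration).
f = ROOT.TFile.Open(
    "dcap://dcap-door.example.org:22125/pnfs/example.org/data/atlas/user/test.root")

# The same file could be opened through an xrootd door instead:
# f = ROOT.TFile.Open("root://xrootd-door.example.org//pnfs/example.org/data/atlas/user/test.root")

if f and not f.IsZombie():
    f.ls()       # list the objects stored in the file
    f.Close()
```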
dCache: connection to the Grid world
[Diagram: the storage element sits behind the site firewall; out-site access goes through the Storage Resource Manager protocol (SRM), File Transfer Service (FTS) channels and gsiFtp, with the site registered in the information system; in-site, the compute element reads data via dCap/rfio/root, with gsiFtp also used for transfers.]
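From outside the site, a file on the storage element is addressed through its SRM endpoint. Below is a hedged sketch of pulling such a file to local disk with the gLite lcg-cp data management client; the VO, endpoint and paths are placeholders, and the exact client options depend on the middleware release in use.

```python
import subprocess

# Placeholder SRM source and local destination; not taken from the slides.
SRC = "srm://dcache-se.example.org/pnfs/example.org/data/atlas/user/test.root"
DST = "file:///tmp/test.root"

# lcg-cp performs the SRM negotiation and the underlying gsiFTP transfer.
subprocess.check_call(["lcg-cp", "--vo", "atlas", SRC, DST])
```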
dCache: achieved goals • xRoot protocol support for distributed analysis • Small sites: automatic installation and configuration (dCache in 10 minutes) • Large sites (> 1 Petabyte): • Partitioning of large systems. • Transfer optimization from/to tape systems • Automatic file replication (freely configurable)
dCache: Outlook • Current usage • 7 Tier I centres with up to 900 TBytes of disk per centre, plus tape systems (Karlsruhe, Lyon, RAL, Amsterdam, FermiLab, Brookhaven, NorduGrid) • ~ 30 Tier II centres, including all US CMS sites; planned for US ATLAS. • Planned usage • dCache is going to be included in the Virtual Data Toolkit (VDT) of the Open Science Grid: the proposed storage element in the USA. • The planned US Tier I will break the 2 PB boundary by the end of the year.
HEP Community Grid • WP 1: Data management (dCache) • WP 2: Job monitoring and user support • WP 3: Distributed data analysis (Ganga) • ==> Joint venture between physics and computer science
WP 2: Job monitoring and user support (coordination: Peter Mättig, Wuppertal) • Job monitoring and resource usage visualizer • TU Dresden • Expert system classifying job failures: • Uni Wuppertal, FZK, FH Köln, FH Niederrhein • Online job steering: • Uni Siegen
Job Execution Monitor in LCG • Motivation: • 1000s of jobs run in LCG each day • job status is unknown while a job is running • manual error detection is slow and difficult • GridICE and similar tools provide only service/hardware-based monitoring • Conclusion: • monitor the job while it is running ==> JEM • automatic error detection is needed ==> expert system
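To make the expert-system idea concrete, here is a toy, purely illustrative sketch of rule-based classification of job failures from log output; it is not the HEP CG expert system or JEM code, and the patterns and categories are invented for the example.

```python
import re

# Invented example rules: map patterns in job output to suspected failure causes.
RULES = [
    (re.compile(r"No space left on device"),                "worker node disk full"),
    (re.compile(r"proxy expired|permission denied", re.I),  "authentication / proxy problem"),
    (re.compile(r"command not found"),                      "missing software on the worker node"),
    (re.compile(r"connection (refused|timed out)", re.I),   "storage or network problem"),
]

def classify(log_text):
    """Return the suspected failure causes found in a job log."""
    hits = [label for pattern, label in RULES if pattern.search(log_text)]
    return hits or ["unknown - needs manual inspection"]

print(classify("sh: athena.py: command not found"))
# ['missing software on the worker node']
```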
JEM: Job Execution Monitor
[Diagram: on the gLite/LCG worker node, a pre-execution test plus Bash script and Python monitoring; information exchange via R-GMA; visualization e.g. in GridSphere; an expert system for classification; integration into ATLAS; integration into GGUS (?); post D-Grid I: ...]
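As a toy illustration of the script-monitoring idea (not the actual JEM implementation; JEM publishes its events via R-GMA rather than printing them), the sketch below runs a Bash job script under `bash -x` and reports each executed command while the job is running:

```python
#!/usr/bin/env python
import subprocess
import sys
import time

def run_monitored(script):
    """Run a Bash script with tracing enabled and report each executed command."""
    proc = subprocess.Popen(["bash", "-x", script],
                            stderr=subprocess.PIPE, text=True)
    for line in proc.stderr:          # 'bash -x' writes "+ command" trace lines to stderr
        if line.startswith("+"):
            print("%d MONITOR %s" % (time.time(), line.strip()))
        else:
            sys.stderr.write(line)    # pass the script's real stderr output through unchanged
    return proc.wait()

if __name__ == "__main__":
    sys.exit(run_monitored(sys.argv[1]))
```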
JEM - status • Monitoring part ready for use • Integration into Ganga (the ATLAS/LHCb distributed analysis tool) ongoing • Connection to GGUS planned • http://www.grid.uni-wuppertal.de/jem/
HEP Community Grid • WP 1: Data management (dCache) • WP 2: Job monitoring and user support • WP 3: Distributed data analysis (Ganga) • ==> Joint venture between physics and computer science
WP 3: Distributed data analysis (coordination: Peter Malzacher, GSI Darmstadt) • Ganga: distributed analysis for ATLAS and LHCb • Ganga is an easy-to-use frontend for job definition and management • Python, IPython or GUI interface • Analysis jobs are automatically split into subjobs which are sent to multiple sites in the Grid • Data management for input and output; distributed output is collected. • Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid) • Developed in the context of ATLAS and LHCb • Implemented in Python
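A hedged sketch of this workflow is shown below; it is meant to be run inside a Ganga session, where Job, Executable, the splitters and mergers are predefined, and the executable, arguments and backend choice are placeholders rather than anything taken from the slides.

```python
# Minimal Ganga job sketch: run a toy executable, split it into subjobs for the Grid,
# and merge the collected stdout of the subjobs.
j = Job(name="toy-analysis")
j.application = Executable(exe="/bin/echo")                   # placeholder for a real analysis executable
j.backend = LCG()                                             # send subjobs to LCG/gLite resources
j.splitter = ArgSplitter(args=[[str(i)] for i in range(5)])   # five subjobs, one argument each
j.merger = TextMerger(files=["stdout"])                       # collect and merge the subjob output
j.submit()

# Switching to a local test run only means changing the backend, e.g.:
# j.backend = Local()
```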
Ganga schema
[Workflow diagram: the manager queries the file catalog, performs data file splitting for the user analysis (myAna.C), and submits jobs to the Grid queues; the jobs read their data from storage, and their outputs are merged into the final analysis.]
PROOF schema
[Workflow diagram: a PROOF query (data file list plus myAna.C) is sent to the master; the scheduler distributes work to workers, which query the catalog and read the files from storage; feedback arrives while the query runs, and the final outputs are returned merged.]
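A hedged PyROOT sketch of this model follows; the master host, tree name and file URLs are placeholders. The selector myAna.C is processed on the workers and the master returns the merged result.

```python
import ROOT

# Placeholder PROOF master and input files; illustrates the PROOF model only.
proof = ROOT.TProof.Open("proofmaster.example.org")

chain = ROOT.TChain("myTree")                        # placeholder tree name
chain.Add("root://se.example.org//data/file1.root")  # placeholder file URLs
chain.Add("root://se.example.org//data/file2.root")

chain.SetProof()              # route the query through the PROOF master
chain.Process("myAna.C+")     # workers run the selector; the master merges the outputs
```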
HEP CG: summary • Physics departments: DESY, Dortmund, Dresden, Freiburg, GSI, München, Siegen, Wuppertal • Computer science: Dortmund, Dresden, Siegen, Wuppertal, ZIB, FH Köln, FH Niederrhein • D-Grid: Germany's contribution to HEP computing: dCache, monitoring, distributed analysis • The effort will continue; 2008 brings the start of LHC data taking, a challenge for the Grid concept ==> new tools and developments are needed