Jefferson Lab: Experimental and Theoretical Physics Grids
Andy Kowalski
SURA Cyberinfrastructure Workshop, Georgia State University, January 5–7, 2005
Jefferson Lab
• Who are we?
  • Thomas Jefferson National Accelerator Facility
  • A Department of Energy research laboratory
  • Operated by the Southeastern Universities Research Association (SURA)
• What do we do?
  • High-energy nuclear physics: quarks and gluons
  • Operate a 6.07 GeV continuous electron beam accelerator
  • Operate a 10 kW Free-Electron Laser
Data and Storage
• Three experimental halls
  • Hall A and Hall C: hundreds of GB/day each
  • Hall B (CLAS): 1–2.5 TB/day (currently up to 30 MB/s)
• Currently store and manage 1 PB of data on tape
• Users around the world want access to the data
Computing
• Batch farm
  • 200 dual-CPU nodes (~358,060 SPECint2000 total)
  • Moves 4–7 TB/day
  • Workloads: reconstruction, analysis, simulations (large for CLAS)
• Lattice QCD machine
  • 3 clusters: 128, 256, and 384 nodes
Need for Grids
• 12 GeV Upgrade
  • Hall B (CLAS) data rates increase to 40–60 MB/s
  • Will export 50% or more of the data
  • Will import simulation data produced at universities, which can be a rather large amount
• Hall D (GlueX)
  • Same scale as the LHC experiments: 100 MB/s, about 3 PB of data per year
    • 1 PB of raw data at JLab
    • 1 PB for analysis (JLab and offsite)
    • 1 PB for simulations (offsite)
• Lattice QCD
  • 10 TF machine producing a significant amount of data
• Users around the world want access to the data
JLab: Theory and Experimental Grid Efforts
• Similarities
  • Both focus on data grids
  • Both want interface definitions for interoperability
  • Both chose web services for the implementation, with WSDL defining the interface
• Theory
  • ILDG and PPDG
  • SRM
  • Replica catalog
• Experimental
  • PPDG, and pursuing OSG
  • SRM
  • Job submission interface
ILDG: Data Grid Services Web Services Architecture
[Diagram: within a single site, a file client talks to a metadata catalog, replica catalog, replication service, and SRM service, all exposed as web services; the SRM service fronts one or more file servers (with a consistency agent) and the underlying storage (disk, silo). The same stack is replicated at each participating site.]
* Slide from Chip Watson, ILDG Middleware Project Status
ILDG: A Three-Tier Web Services Architecture
[Diagram: an application or web browser makes authenticated connections to a web server (portal) running an XML-to-HTML servlet; the portal and applications call web services on remote web servers, which front local backend services (batch, file, etc.), storage systems, and catalogs.]
• Web services provide a standard API for clients, and intermediary servlets allow use from a browser (as in a portal); the middle tier is sketched below.
* Slide from Chip Watson, ILDG Middleware Project Status
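The middle tier is easiest to see in code. Below is a minimal sketch of the XML-to-HTML servlet role, assuming a hypothetical CatalogPortalServlet class and a stubbed-out backend call; it is not the actual portal code.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical middle-tier servlet: accepts a browser query, calls a backend
// catalog web service, and renders the result as HTML (the "XML to HTML
// servlet" box in the diagram). The backend call is stubbed out here.
public class CatalogPortalServlet extends HttpServlet {

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String query = req.getParameter("q");      // metadata query typed into the portal form
        String[] gfns = lookupGfns(query);         // matching Global File Names from the backend

        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body><h2>Matches</h2><ul>");
        for (int i = 0; i < gfns.length; i++) {
            out.println("<li>" + gfns[i] + "</li>");
        }
        out.println("</ul></body></html>");
    }

    // Placeholder for the real web-service call (e.g. via an Axis-generated
    // stub, as sketched later for the SRM service).
    private String[] lookupGfns(String query) {
        return new String[] { "example/gfn/for/" + query };
    }
}
```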
Components: Meta Data Catalog
• Holds metadata for files and for sets of files (data sets)
• Processes lookup queries
• Queries return (sets of) GFNs (Global File Name = key) and, optionally, the full metadata for each match (interface sketched below)
* Slide from Chip Watson, ILDG Middleware Project Status
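The operations above are defined in WSDL; purely as an illustration, they might look like this hypothetical Java interface (the names and signatures are assumptions, not the ILDG definition).

```java
// Hypothetical Java rendering of the metadata-catalog operations described
// above; the real ILDG interface is a WSDL definition, not this code.
public interface MetaDataCatalog {

    /** Look up files or data sets whose metadata matches the query;
     *  returns the matching Global File Names (GFNs). */
    String[] query(String metadataQuery);

    /** Optionally return the full metadata document for one match. */
    String getMetadata(String gfn);
}
```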
Components: Replica Catalog
• Tracks all copies of a file / data set
• Operations: get replicas, create replica, remove replica (sketched below)
• Prototypes exist at Jefferson Lab and Fermilab
* Slide from Chip Watson, ILDG Middleware Project Status
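Again purely illustrative, a hypothetical Java rendering of these replica-catalog operations, keyed by Global File Name (the actual prototypes at JLab and Fermilab expose them as web-service operations):

```java
// Hypothetical sketch of the replica-catalog operations listed above;
// method names and signatures are assumptions, not the deployed interface.
public interface ReplicaCatalog {

    /** All known physical locations (storage URLs) holding a copy of the file. */
    String[] getReplicas(String gfn);

    /** Register a new copy of the file at the given storage URL. */
    void createReplica(String gfn, String storageUrl);

    /** Remove the record of the copy at the given storage URL. */
    void removeReplica(String gfn, String storageUrl);
}
```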
Components: Storage Resource Manager
• Manages a storage system: disk only, or disk plus tape
• Third-party file transfers
• Negotiates protocols for file retrieval (selects a file server)
• Auto-stages a file on get (asynchronous operation)
• Version 2.1 defined (collaboration)
• (A hypothetical interface sketch follows below.)
* Slide from Chip Watson, ILDG Middleware Project Status
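A hypothetical Java sketch of the SRM behavior described above; the method names and signatures are illustrative only and do not reproduce the SRM v2.1 WSDL.

```java
// Illustrative sketch of SRM-style operations, not the SRM v2.1 interface.
public interface StorageResourceManager {

    /** Negotiate a transfer protocol and return a transfer URL for the file;
     *  for tape-backed storage this may trigger an asynchronous stage to disk. */
    String srmGet(String siteFileName, String[] supportedProtocols);

    /** Reserve space and return a transfer URL to which the client may write. */
    String srmPut(String siteFileName, long fileSizeBytes);

    /** Poll the status of an asynchronous request (e.g. a pending tape stage). */
    String getRequestStatus(String requestToken);
}
```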
ILDG Components
• MetaData Catalog (MDC)
  • Each collaboration deploys one
  • A mechanism (not yet defined, under discussion) exists for searching all of them (a virtual MDC)
• Replica Catalog (RC)
  • (same comments as for the MDC)
• Storage Resource Manager (SRM)
  • Each collaboration deploys one or more
  • At each SRM site there are one or more file servers: http, ftp, gridftp, jparss, bbftp, …
* Slide from Chip Watson, ILDG Middleware Project Status
JLab: Experimental Effort
• PPDG (Particle Physics Data Grid)
  • A collaboration of computer scientists and physicists
  • Developing and deploying production grid systems for experiment-specific applications
  • Now supporting OSG (Open Science Grid)
• SRM (Storage Resource Manager)
  • A common/standard interface to mass storage systems
  • In 2003, FSU used SRM v1 to process Monte Carlo for 30 million events
  • In 2004, deployed a v2 implementation for testing; required for production in February 2005
  • Already working with LBL, Fermilab, and CERN to define v3
• Job submission
  • PKI-based authentication to Auger (the JLab job submission system)
  • Investigated uJDL (a user-level job description language); BNL is leading this effort
SRM v2
• Implemented SRM version 2.1.1
  • Interfaces to Jasmine via the HPC disk/cache manager
• The JLab SRM is a Java web service (client sketch below)
  • Uses Apache Axis as the SOAP engine and Apache Tomcat as the servlet engine
  • Uses GridFTP for file movement
• Testing with CMU
• Production service required by February 2005
• Had a hard time using GT3: one cannot take just the components one wants (it is all or nothing)
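Because the JLab SRM is an Axis web service running under Tomcat, a client can reach it with the Axis 1.x dynamic invocation API. The endpoint URL, namespace, and operation name below are placeholders, not the deployed service.

```java
import java.net.URL;
import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

// Minimal Axis 1.x dynamic-invocation client; the endpoint URL, namespace,
// and operation name are placeholders, not the actual JLab SRM deployment.
public class SrmPingClient {
    public static void main(String[] args) throws Exception {
        String endpoint = "https://srm.example.org:8443/srm/services/srm";   // hypothetical

        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(new URL(endpoint));
        call.setOperationName(new QName("http://example.org/srm", "srmPing")); // hypothetical

        Object result = call.invoke(new Object[] {});   // SOAP request/response over HTTPS
        System.out.println("SRM replied: " + result);
    }
}
```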
SRM v2 Server Deployment
• Requires Tomcat, MySQL, and the SRM worker daemon
• Firewall configuration
  • SRM: port 8443
  • GridFTP: ports 2811 and 40000–45000
• Currently only installed at JLab
  • Testing client access with CMU
  • Next step: install an SRM server at CMU
SRM v2 Client Deployment
• Installed at JLab and CMU
  • Implements only srmGet and srmPut (a permission problem remains to be fixed)
• Requires specific ant and Java versions
• Proper grid certificate request and installation is a challenge
  • Use OpenSSL for the certificate request instead
  • Globus requires a full installation simply to request a certificate and run the client; all that is really needed is grid-proxy-init
• Note: Curtis' notes are at http://www.curtismeyer.com/grid_notes
• Currently the only SRM v2 server and client
Long-Term SRM Work
• Considering how the next SRM version could become the primary interface to Jasmine and the primary farm file mover
  • Use for both local and remote access
  • Goal: 25 TB/day from tape through SRM
• Balancing classes of requests and prioritizing types of data transfers becomes essential (a toy sketch follows below)
• Farm interaction use cases must be modeled: farm input, farm output, scheduling
• Already looking at what SRM v3 will look like
  • SRM core features and feature sets (ideas from the last SRM meeting)
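As a toy illustration of prioritizing classes of transfers (not the actual Jasmine/SRM scheduler design), one could tag each request with a class-based priority and drain a priority queue; the file paths and priority values below are invented.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Toy illustration of balancing transfer classes: farm input is served
// before user staging, which is served before bulk offsite export.
public class TransferQueue {

    static class Request {
        final String file;
        final int priority;   // lower value = served first
        Request(String file, int priority) { this.file = file; this.priority = priority; }
    }

    public static void main(String[] args) {
        PriorityQueue<Request> queue = new PriorityQueue<Request>(16,
            new Comparator<Request>() {
                public int compare(Request a, Request b) { return a.priority - b.priority; }
            });

        queue.add(new Request("/mss/clas/raw/run1234.dat", 0));    // farm input (hypothetical path)
        queue.add(new Request("/mss/halla/user/ntuple.root", 1));  // user stage request
        queue.add(new Request("/mss/clas/export/skim.evt", 2));    // offsite export

        while (!queue.isEmpty()) {
            Request next = queue.poll();
            System.out.println("schedule transfer: " + next.file);
        }
    }
}
```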
Job Submission
• uJDL
  • Is this really needed?
  • Is a standard job submission interface what is really needed? Is that Condor-G?
• Auger interface
  • Uses Java web services
  • Uses PKI for authentication, not GSI (sketched below)
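A minimal sketch of what plain PKI authentication (an X.509 client certificate over SSL, rather than a GSI proxy) looks like from a Java client; the host name, keystore paths, and service URL are placeholders, not the real Auger interface.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

// Sketch of plain PKI (X.509 client certificate over SSL) authentication to a
// job-submission web service; host name and keystore paths are placeholders.
public class AugerSubmitSketch {
    public static void main(String[] args) throws Exception {
        // The client certificate lives in an ordinary Java keystore; no GSI
        // proxy (grid-proxy-init) is involved.
        System.setProperty("javax.net.ssl.keyStore", "/home/user/.auger/client.jks");
        System.setProperty("javax.net.ssl.keyStorePassword", "changeit");
        System.setProperty("javax.net.ssl.trustStore", "/home/user/.auger/truststore.jks");

        URL url = new URL("https://auger.example.org:8443/auger/services/submit"); // hypothetical
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);   // echo the service response
        }
        in.close();
    }
}
```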
Grid3dev - OSG
• The JLab development effort is limited; we cannot develop everything we need
  • VO management tools, monitoring, etc.
• Grid3 proved successful; ATLAS and CMS were the major users
• JLab plans to join Grid3dev as a step toward OSG-INT/OSG
  • Testing and evaluation
  • Integration with facility infrastructure
  • Determine what we need and what we can use from others
References
• http://www.ppdg.net
• http://www.opensciencegrid.org
• http://www.lqcd.org/ildg
• http://sdm.lbl.gov/srm
• http://www.globus.org