STAR Grid Activities, OSG and Beyond
D. Olson (a), for the STAR Collaboration
The STAR Grid Team: W. Betts (b), L. Didenko (b), T. Freeman (c), P. Jakl (b), L. Hajdu (b), E. Hjort (a), K. Keahey (c), J. Lauret (b), D. Olson (a), A. Rose (a), I. Sakrejda (a), A. Sim (a)
(a) LBNL, (b) BNL, (c) ANL
Abstract
We present the ongoing grid efforts of the STAR experiment within the Open Science Grid (OSG) and beyond, including the integration of resources in Europe, Asia and South America. STAR is a founding member of the OSG Consortium and operates several functioning resources on OSG: its main facilities at BNL/RCF and LBNL/NERSC, as well as university sites at Wayne State and Birmingham. Additional resources are in the process of connecting to OSG. Many of the distributed resources used by STAR collaborators employ grid or grid-inspired technologies; common examples are grid job submission through SUMS, the STAR standard workload service, and the use of data handling and transfer tools across grids. To make the most of heterogeneous resources while minimizing in-house platform support, the dynamic deployment of a reliable data analysis framework, packaging the validated STAR software stack in Xen virtual machines, is being thoroughly investigated, leveraging advanced VM technologies and research from the CEDPS project.
Contents
• Background/History
• Open Science Grid Deployments and Usage
• Other Distributed Computing Usage
• Asian Activities
• Workload Scheduling (SUMS)
• Virtualization & Cloud Computing
• Conclusion
Background/History
• STAR has participated in U.S. grid activities since the early days of the Particle Physics Data Grid (1999) and is a founding member of the Open Science Grid.
• Starting with involvement of LBNL and BNL, activities now also include collaborators at Wayne State, MIT, Univ. Chicago, Birmingham, São Paulo, Prague and ANL.
• Additionally:
  • SUN Grid, 2007
  • MIT Xgrid, 2006+
  • Xen, Amazon EC2, 2007+
[Map of STAR Grid sites: Brookhaven National Lab, PDSF/Berkeley Lab, Fermilab, Wayne State University, University of Birmingham.]
STAR Grid: about 90% of STAR's grid resources are part of the Open Science Grid.
[Diagram: STAR is also reaching out to other grid resources and projects (Amazon.com, NPI in the Czech Republic, MIT Xgrid, SunGrid), covering interoperability/outreach, virtualization, VDT extension, and SRM/DPM/EGEE integration.]
Resources used by STAR
6 main dedicated sites (STAR software fully installed):
• BNL (Tier0)
• NERSC/PDSF (Tier1)
• WSU, Wayne State University (Tier2)
• BHAM, Birmingham, England (Tier2)
• UIC, University of Illinois at Chicago (Tier2)
• Incoming: Prague (Tier2)
Other resources:
• FermiGrid: not STAR-dedicated; simulation production at the 10% level
• SunGrid: commercial (free for STAR); event generation at the 1-2% level
• MIT Xgrid cluster: mainly analysis; working on a Globus gatekeeper for Mac OS X
• Amazon EC2 cluster (Elastic Compute Cloud): event generation for now; an exercise in Xen-based virtualization at the 1-2% level
BeStMan SRM (Berkeley Storage Manager)
• An SRM interface with caching for data transfer
• Used for bulk data transfer as well as asynchronous data placement in the job workflow (see the sketch below)
• Expect to deploy the BeStMan-Xrootd interface
http://datagrid.lbl.gov/bestman/
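As a rough illustration of asynchronous data placement in a job workflow, the sketch below stages an input file from an SRM endpoint before the analysis step runs. It is a minimal sketch only: the endpoint, paths and analysis step are hypothetical, and it simply shells out to the generic srmcp SRM client rather than showing any STAR-specific tooling.

```python
# Minimal sketch of asynchronous data placement before an analysis step.
# The SRM endpoint, paths and analysis command are hypothetical placeholders.
import subprocess
import sys

SRM_URL = "srm://bestman.example.org:8443/data/star/run7/MuDst_0001.root"  # hypothetical endpoint/path
LOCAL_PATH = "/scratch/star/MuDst_0001.root"                               # hypothetical scratch area

def stage_input(src_url, dest_path):
    """Fetch one file with the generic srmcp SRM client; raises on failure."""
    subprocess.run(["srmcp", src_url, "file:///" + dest_path], check=True)

def run_analysis(path):
    """Stand-in for the real analysis step (e.g. a root4star macro)."""
    print(f"input staged, would now analyze {path}")

if __name__ == "__main__":
    try:
        stage_input(SRM_URL, LOCAL_PATH)
    except (subprocess.CalledProcessError, FileNotFoundError) as err:
        sys.exit(f"staging failed: {err}")
    run_analysis(LOCAL_PATH)
```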
OSG usage
[Plot: STAR usage of OSG, in process hours per week.]
Proof of Principle
Initial successes and benefits from OSG
• Year 1 OSG milestone for STAR: migration of 80% or more of the simulation production to OSG-based operation
• Simulation production: 97% efficiency achieved
  • Exceeds expectations (we had targeted a satisfactory success level between 75% and 85%)
  • Sites used are not necessarily STAR-dedicated (FermiGrid)
• In particular, STAR received help from Fermi resources and the FNAL team in June 2007
  • several thousand CPU hours loaned on an emergency request
  • as small as it seems, this help made the difference
• This resource loan worked and is an important proof of principle of the OSG benefit
[Plot: efficiency of job execution via the OSG infrastructure, before and after resubmission.]
Other grid/distributed activities
• Xgrid at MIT
  • Adam Kocoloski, Michael Miller, Levente Hajdu
  • Mac OS X, 50 desktops
  • Scavenging spare cycles
  • Doing STAR data analysis via SUMS, so the user interface for analysis is the same
  • Xgrid/Globus job manager in test
• Prague, EGEE Tier2 site
  • Michal Zerola, Pavel Jakl
  • High-performance data transfer using multiple srmcp streams to DPM in Prague (next slide)
• SUN Grid
  • Production of STAR Geant simulations on SUN utility computing resources
Data transfer to Prague
[Plot: parallel srmcp to the DPM storage element in Prague, reaching 700 Mbps with 20 threads.]
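A minimal sketch of what such a multi-stream transfer can look like on the client side, assuming a list of SRM source URLs and a DPM destination. The endpoints and paths are hypothetical placeholders, and the actual Prague setup may have wrapped srmcp differently.

```python
# Minimal sketch: run several srmcp transfers in parallel (here 20 workers),
# in the spirit of the multi-threaded transfers to the Prague DPM.
# Source and destination URLs are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor
import subprocess

SOURCES = [
    f"srm://bestman.example.org:8443/data/star/run7/MuDst_{i:04d}.root" for i in range(100)
]  # hypothetical source URLs
DEST_BASE = "srm://dpm.example.cz:8446/dpm/example.cz/home/star"  # hypothetical DPM endpoint
N_STREAMS = 20

def copy_one(src):
    """Transfer a single file with srmcp; return (source URL, success flag)."""
    dest = DEST_BASE + "/" + src.rsplit("/", 1)[-1]
    result = subprocess.run(["srmcp", src, dest])
    return src, result.returncode == 0

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=N_STREAMS) as pool:
        results = list(pool.map(copy_one, SOURCES))
    failed = [src for src, ok in results if not ok]
    print(f"{len(SOURCES) - len(failed)} transfers succeeded, {len(failed)} failed")
```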
STAR Asian institutions
• China
  • IHEP, Beijing (2)
  • Institute of Modern Physics, Lanzhou (6)
  • USTC, Beijing (14)
  • Shanghai Institute of Applied Physics (11)
  • Tsinghua University (9)
  • Institute of Particle Physics, Wuhan (12)
• India
  • Institute of Physics, Bhubaneswar (4)
  • Indian Institute of Technology, Mumbai (5)
  • University of Jammu (15)
  • Panjab University (5)
  • University of Rajasthan (3)
  • Variable Energy Cyclotron Centre, Kolkata (14)
• Korea
  • Pusan National University (4)
  • KISTI (in progress, as a CS collaborator)
Asian Activities
• Many collaborators in Asia
• Planning for a Tier2-like facility at PNU
• Discussions with KISTI on a possible Tier1-like facility for the Asian region
• Eager to see how we can better interface and integrate with our Asian collaborators on computational aspects
Gloriad
• 10 Gb/s all the way through to NY
• Would allow immediate transfer of the full current dataset
• Would allow transfer of roughly half of a later-year dataset
• Possibly more, depending on Gloriad expansion
SUMS: STAR Unified Meta Scheduler
• A single user interface and framework for submitting to all STAR resources, local and grid flavors
• Optimizes resource utilization
• Handles about 25K jobs/day (a submission sketch follows below)
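For illustration, the sketch below shows the general shape of submitting work through SUMS: a small XML job description handed to the star-submit command. This is a hedged sketch only; the element and attribute names follow typical SUMS job descriptions but may differ between SUMS versions, and the catalog query, macro name and paths are hypothetical.

```python
# Hedged sketch of a SUMS submission: write a job-description XML file and hand
# it to star-submit. The attribute names, catalog query, macro and paths are
# illustrative placeholders, not an authoritative SUMS schema.
import subprocess
from pathlib import Path

JOB_XML = """<?xml version="1.0" encoding="utf-8"?>
<job maxFilesPerProcess="100">
  <command>root4star -q -b myAnalysis.C</command>
  <stdout URL="file:/star/u/someuser/logs/$JOBID.log"/>
  <input URL="catalog:star.bnl.gov?production=P07id,filetype=daq_reco_MuDst" nFiles="1000"/>
  <output fromScratch="*.root" toURL="file:/star/u/someuser/output/"/>
</job>
"""

if __name__ == "__main__":
    desc = Path("analysis_job.xml")
    desc.write_text(JOB_XML)                                  # write the job description
    subprocess.run(["star-submit", str(desc)], check=True)    # let SUMS fan the jobs out
```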
Why Xen? Why virtualization?
• SIMULATION, i.e. EVENT GENERATION, IS EASY... we can all do it.
• BEYOND THAT, the reality:
  • Complex experimental application codes
    • Developed over more than 10 years by more than 100 scientists; comprise ~2 M lines of C++ and Fortran code
    • Require complex, customized environments
    • Rely on the right combination of compiler versions and available libraries
    • Dynamically load external libraries depending on the task to be performed
  • Environment validation
    • Needed to ensure reproducibility and result uniformity across environments
    • Regression tests cannot be run on all OS flavors, due to simple manpower considerations
Why Xen? Why virtualization?
• Solution? Use virtual machines (Xen)
  • Bring your environment with you
  • Fast to deploy; enables short-term leasing
  • Excellent enforcement and performance isolation
  • Very good security isolation
  • Minimizes the experiment team's efforts
• Activity ↔ development effort leveraged through the CEDPS SciDAC partner project
Deploying an OSG cluster as workspaces
• OSG CE image serves as the gatekeeper; worker node images carry the application environment
• The VWS service lets a cluster manager deploy the gatekeeper and worker nodes in ~30 min
• Application workload is submitted to the cluster as to any other OSG CE
• The cluster can be retired after the workload finishes, freeing resources for other applications
[Diagram: VWS service deploying a gatekeeper image and a pool of worker-node images.]
Virtual machine activities
• "Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers."
• Work so far:
  • Xen image with OSG 0.6.0 CE on SL 4.4
  • Xen image with OSG 0.6.0 WN on SL 4.4
  • Use Globus Workspaces to deploy the gatekeeper and worker nodes on EC2
  • Can launch a 100-node cluster in ~30 min (a launch sketch follows below)
  • Have run Hijing event-generator simulations on EC2
  • Have prepared a Xen image with the full STAR software environment on SL 4.4, currently being validated
• Next steps:
  • Run event reconstruction of simulations on EC2 and the Teraport cloud
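To give a concrete flavor of requesting a batch of EC2 worker instances programmatically, here is a minimal sketch using the modern boto3 client; the original work used Globus Workspaces on top of the 2008-era EC2 API, so this illustrates the idea rather than the actual tooling. The AMI ID, instance type and key name are hypothetical placeholders.

```python
# Minimal illustration (not the original Globus Workspaces tooling): request a
# batch of EC2 instances from a prepared machine image using boto3.
# The AMI ID, instance type and key name are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical Xen/OSG worker-node image
    InstanceType="m5.large",           # hypothetical instance type
    MinCount=100,                      # ask for the whole 100-node pool at once
    MaxCount=100,
    KeyName="star-worker-key",         # hypothetical SSH key pair
)

ids = [inst["InstanceId"] for inst in response["Instances"]]
print(f"requested {len(ids)} worker instances")

# Wait until the instances report 'running' before pointing the gatekeeper at them.
ec2.get_waiter("instance_running").wait(InstanceIds=ids)
print("worker pool is up")
```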
[Plot: accelerated display of a workflow's job states across NERSC PDSF, Amazon EC2 and WSU; Y = job number, X = job state.]
VM image build/maintenance
• We are working with rPath, Inc. in an SBIR project to use rBuilder to efficiently build and maintain OS and application images.
• From the inventors of RPM: rBuilder, http://www.rpath.com/rbuilder
• "rBuilder is the first and only development tool that simplifies and automates the creation of software appliances and virtual appliances. rBuilder combines powerful features with innovative packaging techniques to yield a repeatable appliance creation process."
Near-term plans
• We MUST prepare for real-data production on OSG
  • and take ANY shortcut necessary to accomplish it BY 2009
  • the onset of DAQ1000, with a data acquisition rate one order of magnitude higher than today, will require additional resources for real-data processing
• Virtualization appears to us to be one development that helps easily deploy and run a 2-million-line framework (software) for data mining
• UCM job tracking (SBIR with Tech-X) is maturing
  • Essential to engage discussion on integration: we MUST monitor our application
• We have to consolidate our sites
  • More resources are available in STAR but not fully used (BHAM and UIC, for example)
  • We will ramp up infrastructure support to achieve this
  • We hope to leverage OSG efforts in the US (UIC, for example)
• We have efforts integrating Mac OS X resources from MIT
  • Initial work was uniquely started within STAR
  • Is there a path forward? Depends on priorities...
Longer-term needs
• Requirements driven by demanding data processing
  • https://twiki.grid.iu.edu/twiki/bin/view/UserGroup/VOApplicationsRequirements#STAR
• We will need to share resources efficiently
  • Concerned about what happens when the LHC has ramped up data taking: will there be any cycles left to be had?
• Additionally
  • STAR is expanding its pool of sites
  • Interest in sites possibly shared via EGEE-OSG interoperability (especially China)
  • Hoping for help from OSG to understand policy as well as technology issues
• We believe virtualization is "a" path forward to
  • simple deployment of experimental software
  • allowing the experiment's software development team to concentrate on science and a minimal set of supported OS versions
  • Globus workload management is needed
Conclusion
• STAR grid usage is expanding geographically and functionally.
• Upgrades at STAR and RHIC are driving a significant increase in computational needs beginning next year, which means we MUST push more workload onto the grid.
• The emergence (and convergence?) of VMs, cloud computing and grids makes for a very powerful paradigm for scientific computing.
• We want (and need) greater involvement with our Asia-Pacific colleagues, enabled by new trans-Pacific networks.