FutureGrid: Design and Implementation of a National Grid Test-Bed. David Hancock (dyhancoc@indiana.edu), HPC Manager, Indiana University; Hardware & Network Lead, FutureGrid
IU in a nutshell • $1.7B annual budget, >$100M annual IT budget • Recent credit upgrade to AAA • One university with • 8 campuses • 107,000 students • 3,900 faculty • Nation’s 2nd largest school of medicine • Serious HPC since the 1990s • Research staff increased from 30 to 120 since 1995 • 50/50 split between base and grant funding • Large-scale projects: TeraGrid, Open Science Grid (ATLAS Tier2 center), PolarGrid, Data Capacitor • New Data Center opened in 2009 Research Technologies
NSF Track Overview • Track 1 – NCSA Blue Waters • Track 2a – TACC Ranger • Track 2b – NICS Kraken • Track 2d • Data Intensive High Performance System (SDSC) • Experimental High Performance System (GaTech) • Experimental High Performance Test-Bed (IU)
FutureGrid • The goal of FutureGrid is to support research on the future of distributed, grid, and cloud computing. • FutureGrid will build a robustly managed simulation environment and test-bed to support the development and early scientific use of new technologies at all levels of the software stack: from networking to middleware to scientific applications. • The environment will mimic TeraGrid and/or general parallel and distributed systems. FutureGrid is part of TeraGrid and is one of two experimental TeraGrid systems (the other is a GPU system). • This test-bed will succeed if it enables major advances in science and engineering through collaborative development of science applications and related software. • FutureGrid is a small (5,400-core) science/computer science cloud, but it is more accurately described as a virtual-machine-based simulation environment.
FutureGrid Partners • Indiana University (Architecture, core software, Support) • Purdue University (HTC Hardware) • San Diego Supercomputer Center at University of California San Diego (Inca, Performance Monitoring) • University of Chicago/Argonne National Labs (Nimbus) • University of Florida (ViNe, Education and Outreach) • University of Southern California Information Sciences Institute (Pegasus to manage experiments) • University of Tennessee Knoxville (Benchmarking) • University of Texas at Austin/Texas Advanced Computing Center (Portal) • University of Virginia (OGF, User Advisory Board) • Center for Information Services and GWT-TUD from Technische Universität Dresden (VAMPIR) • Blue institutions host FutureGrid hardware
Other Important Collaborators • Early users from an application and computer science perspective and from both research and education • Grid5000 and D-Grid in Europe • Commercial partners such as • Eucalyptus … • Microsoft (Dryad + Azure) • Application partners • NSF • TeraGrid – Tutorial at TG10 • Open Grid Forum – Possible BoF • Possibly OpenNebula, Open Cirrus Testbed, Open Cloud Consortium, Cloud Computing Interoperability Forum, IBM-Google-NSF Cloud, and other DoE/NSF/… clouds
FutureGrid Timeline • October 2009 – Project Starts • November 2009 – SC09 Demo • January 2010 – Significant Hardware installed • April 2010 – First Early Users • May 2010 – FutureGrid network complete • August 2010 – FutureGrid Annual Meeting • September 2010 – All hardware, except shared memory system, available • October 2011 – FutureGrid allocatable via the TeraGrid process (for the first two years, allocation is handled by the user/science board)
FutureGrid Usage Scenarios • Developers of end-user applications who want to create new applications in cloud or grid environments, including analogs of commercial cloud environments such as Amazon or Google. • Is a Science Cloud for me? Is my application secure? • Developers of end-user applications who want to experiment with multiple hardware environments. • Grid/Cloud middleware developers who want to evaluate new versions of middleware or new systems. • Networking researchers who want to test and compare different networking solutions in support of grid and cloud applications and middleware. • Education as well as research • Interest in performance testing makes bare-metal images important
Storage Hardware • FutureGrid has a dedicated network (except to TACC) and a network fault and delay generator • Experiments can be isolated by request • Additional partner machines may run FutureGrid software and be supported (but allocated in specialized ways)
System Milestones • New Cray System (xray) • Delivery: January 2010 • Acceptance: February 2010 • Available for Use: April 2010 • New IBM Systems (india) • Delivery: January 2010 • Acceptance: March 2010 • Available for Use: May 2010 • Dell System (tango) • Delivery: April 2010 • Acceptance: June 2010 • Available for Use: July 2010 • Existing IU iDataPlex (sierra) • Move to SDSC: January 2010 • Available for Use: April 2010 • Storage Systems (Sun & DDN) • Delivery: December 2009 • Acceptance: January 2010
Network Impairments Device • Spirent XGEM Network Impairments Simulator for jitter, errors, delay, etc. • Full bidirectional 10G with 64-byte packets • Up to 15 seconds of introduced delay (in 16 ns increments) • 0-100% introduced packet loss in 0.0001% increments • Packet manipulation in the first 2000 bytes • Frame sizes up to 16k • Tcl for scripting, HTML for manual configuration
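The headline figures on this slide translate into concrete numbers. The Python sketch below assumes the standard 20 bytes of per-frame Ethernet overhead (8 bytes of preamble/start-of-frame delimiter plus a 12-byte inter-frame gap); the helper functions are hypothetical conveniences for sanity-checking settings and are not part of the XGEM's Tcl or HTML configuration interface.

```python
# Back-of-the-envelope arithmetic for the impairment settings on the slide.
# Assumption: standard Ethernet overhead of 8 bytes (preamble + SFD) and a
# 12-byte inter-frame gap per frame. Helper names are hypothetical and do not
# correspond to any Spirent XGEM command.

LINE_RATE_BPS = 10_000_000_000       # 10 Gb/s in each direction
FRAME_BYTES = 64                     # minimum-size Ethernet frame
OVERHEAD_BYTES = 8 + 12              # preamble/SFD + inter-frame gap

DELAY_STEP_NS = 16                   # delay granularity from the slide
MAX_DELAY_NS = 15 * 1_000_000_000    # "up to 15 seconds"
LOSS_STEP_PCT = 0.0001               # loss granularity from the slide


def line_rate_fps(rate_bps: int, frame_bytes: int) -> int:
    """Whole frames per second at full line rate for a given frame size."""
    bits_per_frame = (frame_bytes + OVERHEAD_BYTES) * 8
    return rate_bps // bits_per_frame


def delay_steps(delay_ns: int) -> int:
    """Round a requested delay to the nearest 16 ns step (0 to 15 s)."""
    if not 0 <= delay_ns <= MAX_DELAY_NS:
        raise ValueError("delay must be between 0 and 15 seconds")
    return round(delay_ns / DELAY_STEP_NS)


def loss_steps(loss_pct: float) -> int:
    """Express a packet-loss percentage as a count of 0.0001% steps."""
    if not 0.0 <= loss_pct <= 100.0:
        raise ValueError("loss must be between 0 and 100 percent")
    return round(loss_pct / LOSS_STEP_PCT)


print(line_rate_fps(LINE_RATE_BPS, FRAME_BYTES))  # 14880952 frames/s per direction
print(delay_steps(1_000_000))                     # 1 ms -> 62500 steps
print(loss_steps(0.5))                            # 0.5% -> 5000 steps
```

With 64-byte frames, full 10G line rate works out to roughly 14.88 million frames per second in each direction, which is the worst-case packet rate implied by "full bidirectional 10G with 64-byte packets".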
Network Milestones • December 2009 • Setup and configuration of core equipment at IU • Juniper EX 8208 • Spirent XGEM • January 2010 • Core equipment relocated to Chicago • IP addressing & AS # • February 2010 • Coordination with local networks • First Circuits to Chicago Active • March 2010 • Peering with TeraGrid & Internet2 • April 2010 • NLR Circuit to UFL (via FLR) Active • May 2010 • NLR Circuit to SDSC (via CENIC) Active
Global NOC Background • ~65 total staff • Service Desk: proactive & reactive monitoring 24x7x365, coordination of support • Engineering: all operational troubleshooting • Planning/Senior Engineering: senior network engineers dedicated to single projects • Tool Developers: developers of the GlobalNOC tool suite
Supported Projects • REN-ISAC • OmniPoP
FutureGrid Architecture • Open architecture allows resources to be configured based on images • Managed images allow similar experiment environments to be created • Experiment management allows reproducible activities • Through our modular design we allow different clouds and images to be “rained” upon hardware. • Will support deployment of preconfigured middleware including the TeraGrid stack, Condor, BOINC, gLite, Unicore, Genesis II
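Reproducibility depends on being able to tell whether two experiments really ran on the same image. One way to do that, sketched here in Python, is to derive a stable fingerprint from a canonical description of the image. The `ImageSpec` fields, the example values, and the hashing scheme are hypothetical illustrations; the slides do not describe FutureGrid's actual image format.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ImageSpec:
    """Hypothetical description of a managed image."""
    base_os: str                                   # e.g. a stateless Linux image
    middleware: str                                # e.g. "Condor", "gLite", "Unicore"
    packages: dict = field(default_factory=dict)   # package name -> pinned version

    def fingerprint(self) -> str:
        """SHA-256 over a canonical JSON encoding, so dict key order does not matter."""
        canonical = json.dumps(
            {"base_os": self.base_os, "middleware": self.middleware,
             "packages": self.packages},
            sort_keys=True, separators=(",", ":"),
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only.
a = ImageSpec("linux-stateless", "Condor", {"openmpi": "1.4", "python": "2.6"})
b = ImageSpec("linux-stateless", "Condor", {"python": "2.6", "openmpi": "1.4"})
c = ImageSpec("linux-stateless", "Condor", {"openmpi": "1.3", "python": "2.6"})

print(a.fingerprint() == b.fingerprint())  # True: package order is irrelevant
print(a.fingerprint() == c.fingerprint())  # False: one pinned version differs
```

Pinning exact versions is what makes the fingerprint meaningful: two images that resolve to different package versions get different fingerprints, so an experiment can be rerun only on an environment that matches.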
Software Goals • Open-source, integrated suite of software to • instantiate and execute grid and cloud experiments. • perform an experiment • collect the results • tools for instantiating a test environment • TORQUE, Moab, xCAT, bcfg, and Pegasus, Inca, ViNE, a number of other tools from our partners and the open source community • Portal to interact • Benchmarking http://futuregrid.org
Command line • fg-deploy-image: deploys an image on a host • host name • image name • start time • end time • label name • fg-add: adds a feature to a deployed image • label name • framework hadoop • version 1.0
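The two commands are linked by the label: fg-deploy-image names a deployment, and fg-add later refers to it by that name. A minimal Python model of that relationship, using only the parameters listed on the slide, might look like this. The class and method names, host name, and image name are hypothetical and are not the real tools' interface or argument syntax.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Deployment:
    host: str
    image: str
    start: datetime
    end: datetime
    features: dict[str, str] = field(default_factory=dict)  # framework -> version


class DeploymentRegistry:
    """Hypothetical state behind fg-deploy-image and fg-add, keyed by label."""

    def __init__(self) -> None:
        self._by_label: dict[str, Deployment] = {}

    def deploy_image(self, host: str, image: str, start: datetime,
                     end: datetime, label: str) -> Deployment:
        # Parameters mirror fg-deploy-image: host, image, start/end time, label.
        if end <= start:
            raise ValueError("end time must be after start time")
        if label in self._by_label:
            raise ValueError(f"label {label!r} is already in use")
        deployment = Deployment(host, image, start, end)
        self._by_label[label] = deployment
        return deployment

    def add_feature(self, label: str, framework: str, version: str) -> Deployment:
        # Parameters mirror fg-add: label, framework, version.
        deployment = self._by_label.get(label)
        if deployment is None:
            raise KeyError(f"no deployed image with label {label!r}")
        deployment.features[framework] = version
        return deployment


registry = DeploymentRegistry()
registry.deploy_image("node01", "base-image", datetime(2010, 9, 1, 9),
                      datetime(2010, 9, 1, 17), label="exp1")
deployed = registry.add_feature("exp1", framework="hadoop", version="1.0")
print(deployed.features)  # {'hadoop': '1.0'}
```

Keeping labels unique is the design choice that lets a later command find the right deployment without repeating the host, image, and time window.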
FG Stratosphere • Objective • Higher than a particular cloud • Provides all mechanisms to provision a cloud on given FG hardware • Allows the management of reproducible experiments • Allows monitoring of the environment and the results • Risks • Lots of software • Possibly multiple paths to do the same thing • Good news • We worked as a team, know about different solutions, and have identified a very good plan • We can componentize Stratosphere
Dynamic Provisioning • Change underlying system to support current user demands • Linux, Windows, Xen/KVM, Nimbus, Eucalyptus • Stateless images • Shorter boot times • Easier to maintain • Stateful installs • Windows • Use Moab to trigger changes and xCAT to manage installs
xCAT and Moab • xCAT • uses installation infrastructure to perform installs • creates stateless Linux images • changes the boot configuration of the nodes • remote power control and console • Moab • meta-schedules over resource managers • TORQUE and Windows HPC • control nodes through xCAT • changing the OS
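The division of labor on these two slides (Moab decides that a node should change its OS, xCAT carries out the change) can be sketched as a small planning function. This Python sketch assumes a simple greedy policy: idle nodes whose current image has more nodes than queued demand are reassigned to images with unmet demand, busy nodes are never touched, and Windows targets need a stateful install while other targets boot a stateless image. The policy, function names, and image labels are hypothetical; the code does not call Moab or xCAT.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# From the slide: Windows uses stateful installs; other images boot stateless.
STATEFUL_IMAGES = {"windows"}


@dataclass
class Node:
    name: str
    image: str          # image the node currently boots
    busy: bool = False  # nodes running jobs are never reprovisioned


def method_for(image: str) -> str:
    return "stateful install" if image in STATEFUL_IMAGES else "stateless boot"


def plan_reprovisioning(nodes: list[Node],
                        demand: dict[str, int]) -> list[tuple[str, str, str, str]]:
    """Greedy plan: move surplus idle nodes to images with unmet demand.

    demand maps image name -> number of nodes the queued jobs need.
    Returns (node, current image, target image, method) tuples; applying each
    one would be the xCAT step (change boot configuration, then power-cycle).
    """
    idle = [n for n in nodes if not n.busy]
    have = Counter(n.image for n in idle)
    # Positive surplus = idle nodes on this image beyond what is queued for it.
    surplus = {img: have[img] - demand.get(img, 0) for img in have}
    moved: set[str] = set()
    plan = []
    for target in sorted(demand):
        missing = demand[target] - have[target]
        for node in idle:
            if missing <= 0:
                break
            if node.name not in moved and surplus[node.image] > 0:
                surplus[node.image] -= 1
                moved.add(node.name)
                plan.append((node.name, node.image, target, method_for(target)))
                missing -= 1
    return plan


nodes = [
    Node("n1", "linux"),
    Node("n2", "linux"),
    Node("n3", "linux", busy=True),
    Node("n4", "windows"),
]
plan = plan_reprovisioning(nodes, {"linux": 1, "windows": 2})
print(plan)  # [('n1', 'linux', 'windows', 'stateful install')]
```

A greedy pass is enough to show the idea; a production meta-scheduler would also weigh how long queued jobs have waited and the longer turnaround of a stateful install compared with a stateless boot.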
Experiment Manager • Objective • Manage the provisioning for reproducible experiments • Coordinate workflow of experiments • Share workflow and experiment images • Minimize space through reuse • Risk • Images are large • Users have different requirements and need different images
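One way to address the "images are large" risk while reusing shared content is chunk-level deduplication: split each image into fixed-size chunks, store each distinct chunk once under its SHA-256 digest, and represent an image as a list of digests. This Python sketch is an assumption about how reuse could work, not a description of FutureGrid's Experiment Manager; the chunk size and image names are illustrative.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4096  # bytes; illustrative value


class ImageStore:
    """Stores each distinct chunk once; images are ordered lists of chunk digests."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.images: dict[str, list[str]] = {}

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # identical chunks are stored once
            digests.append(digest)
        self.images[name] = digests

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.images[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# A base image of four distinct chunks, and a derived image that appends one more.
base = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
derived = base + bytes([9]) * CHUNK_SIZE

store = ImageStore()
store.put("base", base)
store.put("base+hadoop", derived)
print(len(base) + len(derived))             # 36864 bytes if stored separately
print(store.stored_bytes())                 # 20480 bytes with shared chunks stored once
print(store.get("base+hadoop") == derived)  # True
```

Fixed-size chunks only line up when shared content sits at the same offsets, as with an image that appends to a base; content-defined chunking is the usual refinement when edits shift data around.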
Acknowledgements • FutureGrid - http://www.futuregrid.org/ • NSF Award OCI-0910812 • NSF Solicitation 08-573 • http://www.nsf.gov/pubs/2008/nsf08573/nsf08573.htm • ViNe - http://vine.acis.ufl.edu/ • Nimbus - http://www.nimbusproject.org/ • Eucalyptus - http://www.eucalyptus.com/ • VAMPIR - http://www.vampir.eu/ • Pegasus - http://pegasus.isi.edu/