TeraGrid Introduction • John Towns • Chair, TeraGrid Forum • Director, Persistent Infrastructure • National Center for Supercomputing Applications • University of Illinois
Our Vision of TeraGrid • Three-part mission: • support the most advanced computational science in multiple domains • empower new communities of users • provide resources and services that can be extended to a broader cyberinfrastructure • TeraGrid is… • an advanced, nationally distributed, open cyberinfrastructure comprising supercomputing, storage, and visualization systems, data collections, and science gateways, integrated by software services and high-bandwidth networks, coordinated through common policies and operations, and supported by computing and technology experts, that enables and supports leading-edge scientific discovery and promotes science and technology education • a complex collaboration of over a dozen organizations and NSF awards working together to provide collective services that go beyond what can be provided by individual institutions
TeraGrid: greater than the sum of its parts… • Single unified allocations process • Single point of contact for problem reporting and tracking • especially useful for problems between systems • Simplified access to high-end resources for science and engineering • single sign-on • coordinated software environments • uniform access to heterogeneous resources to solve a single scientific problem • simplified data movement (see the workflow sketch after this list) • Expertise in building national computing and data resources • Leveraging extensive resources, expertise, R&D, and EOT • leveraging other activities at participant sites • learning from each other improves the expertise of all TG staff • Leadership in cyberinfrastructure development, deployment, and support • demonstrating enablement of science not possible without the TeraGrid-coordinated human and technological resources
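To make the single sign-on and data-movement bullets concrete, here is a minimal sketch of a TeraGrid-style workflow driven from Python. It assumes the Globus-era command-line tools (myproxy-logon, gsissh, globus-url-copy) are installed; the hostnames, username, and paths are hypothetical placeholders, not actual TeraGrid endpoints.

```python
import subprocess

# Hypothetical endpoints; real TeraGrid hostnames and paths differed per site.
MYPROXY_SERVER = "myproxy.example.teragrid.org"
SRC = "gsiftp://gridftp.site-a.example.org/scratch/user/run42/output.dat"
DST = "gsiftp://gridftp.site-b.example.org/data/user/output.dat"

def run(cmd):
    """Echo a command and run it, failing loudly on a non-zero exit."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Single sign-on: obtain one short-lived grid credential from MyProxy.
run(["myproxy-logon", "-s", MYPROXY_SERVER, "-l", "username", "-t", "12"])

# 2. Uniform access: the same credential authenticates GSI-SSH logins on any
#    participating resource, with no per-site password.
run(["gsissh", "login.site-a.example.org", "ls", "/scratch/user/run42"])

# 3. Simplified data movement: a third-party GridFTP transfer between two
#    sites, using 4 parallel streams.
run(["globus-url-copy", "-p", "4", SRC, DST])
```

The point of the sketch is the pattern: one credential obtained once, then reused for both interactive access and site-to-site transfers, rather than separate accounts and passwords at each resource provider.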
Diversity of Resources (not exhaustive) • Very Powerful Tightly Coupled Distributed Memory • Ranger (TACC): Sun Constellation, 62,976 cores, 579 Teraflops, 123 TB memory • Kraken (NICS): Cray XT5, 66,048 cores, 608 Teraflops, over 1 Petaflop later in 2009 • Shared Memory • Cobalt (NCSA): Altix, 8 TF, 3 TB shared memory • Pople (PSC): Altix, 5 TF, 1.5 TB shared memory • Clusters with InfiniBand • Abe (NCSA): 90 Tflops • Lonestar (TACC): 61 Tflops • QueenBee (LONI): 51 Tflops • Condor Pool (Loosely Coupled) • Purdue: up to 22,000 CPUs • Visualization Resources • TeraDRE (Purdue): 48 nodes with nVIDIA GPUs • Spur (TACC): 32 nVIDIA GPUs • Various Storage Resources
Resources to come… • Track 2c @ PSC • large shared-memory system in 2010 • Track 2d being competed • data-intensive HPC system • experimental HPC system • pool of loosely coupled, high-throughput resources • experimental, high-performance grid test bed • Blue Waters (Track 1) @ NCSA: • 1 Pflop sustained on serious applications in 2011
How is TeraGrid Organized? • TG is set up like a large cooperative research group • evolved from many years of collaborative arrangements between the centers • still evolving! • Federation of 12 awards • Resource Providers (RPs) • Grid Infrastructure Group (GIG) • Centrally coordinated by the TeraGrid Forum • made up of the PIs from each RP and the GIG • led by the TG Forum Chair, who is responsible for coordinating the group (elected position) • John Towns – TG Forum Chair • responsible for the strategic decision making that affects the collaboration
TeraGrid Participants
What is the GIG? • The Grid Infrastructure Group: • manages local Area Directors (ADs) who direct project activities across multiple RPs • manages a rich sub-award structure involving most RPs and some additional sites • is responsible for coordinating and maintaining TeraGrid central services that benefit the user community • helps facilitate any joint RP activities that could benefit the entire collaboration and the user community • GIG Management • GIG Director: Ian Foster (interim) • Deputy Director: Matthew Heinzel • Area Directors • Software Integration: Lee Liming/J.P. Navarro • Gateways: Nancy Wilkins-Diehr • User Services: Sergiu Sanielevici • Advanced User Support: Amit Majumdar • Data, Vis, Scheduling: Kelly Gaither • Network, Ops, Security: Von Welch • EOT: Scott Lathrop
Strategic Objectives • Objectives determined from considering numerous inputs • user input via various mechanisms • surveys, user contacts, advisory bodies, review panels, etc. • technical input from TG staff • Planning for PY5 started by identifying 5 high-level project strategic objectives • Enable science that could not be done without TeraGrid • Broaden the user base • Simplify users' lives • Improve operations • Enable connections to external resources
Allocations Process • National peer-review process • allocates computational, data, and visualization resources • makes recommendations on allocation of advanced direct support services • Managed by TeraGrid • GIG and RP participants take part in reviews • CORE Services award to manage shared responsibilities • TACC: meeting coordination • SDSC: TG Central DB • NCSA: POPS, TG Allocations group • Currently awarding >1B Normalized Units (NUs) of resources (see the conversion sketch below)
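As a rough illustration of how awards on very different machines can be compared and totaled in Normalized Units, the sketch below converts per-resource CPU-hours into NUs. The conversion factors and awarded hours here are hypothetical placeholders; the real factors were benchmark-derived and published per resource by TeraGrid.

```python
# Hypothetical NU conversion factors (NUs per CPU-hour); placeholders only.
NU_FACTORS = {"Ranger": 3.0, "Kraken": 3.0, "Abe": 2.5}

# Hypothetical CPU-hours awarded on each resource across all projects.
awarded_cpu_hours = {"Ranger": 200_000_000, "Kraken": 150_000_000, "Abe": 30_000_000}

# An award of X CPU-hours on resource R counts as X * factor(R) NUs, so awards
# across heterogeneous systems can be summed into a single allocation total.
total_nus = sum(NU_FACTORS[r] * hours for r, hours in awarded_cpu_hours.items())
print(f"Total awarded: {total_nus / 1e9:.2f}B NUs")  # ~1.1B NUs with these placeholders
```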
TG Compute Resources: ~1.5 PF (tallied in the sketch below) • Clusters: ~204 TF • NCSA: 100 TF (Abe: Dell PE1950 Blades, 90 TF; Mercury: IBM IA-64, 10 TF) • TACC: 62 TF (Lonestar: Dell PE1955, 62 TF) • Purdue: 20.6 TF (Lear: Dell EM64T, 6.6 TF; Condor Pool, 14 TF) • LONI: 10.7 TF (Queen Bee: Dell PE1950 Blades, 10.7 TF) • Indiana: 7 TF (Quarry: IBM HS21, 7 TF) • SDSC: 3.1 TF (TG Cluster: IBM IA-64, 3.1 TF) • ANL: 0.61 TF (TG Cluster: IBM IA-64, 0.61 TF) • ORNL: 0.34 TF (NSTG: IBM IA-32, 0.34 TF) • MPPs: ~1,245 TF • UTK/NICS: 608 TF (Kraken: Cray XT5) • TACC: 579 TF (Ranger: Sun Constellation) • Indiana: 30.7 TF (Big Red: IBM JS21) • PSC: 21.3 TF (Big Ben: Cray XT3) • NCAR: 5.7 TF (Frost: IBM BlueGene/L) • SMPs: ~12 TF • NCSA: 6.6 TF (Cobalt: SGI Altix, 8 TF) • PSC: 5 TF (Pople: SGI Altix) • Other: ~47 TF • NCSA: 47 TF (Lincoln: Dell PE1955 + NVIDIA GPUs)
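A quick tally of the per-category subtotals listed above shows where the ~1.5 PF aggregate comes from; the values in the sketch are simply the TF subtotals from this slide.

```python
# Per-category subtotals (TF) as listed on this slide.
categories = {"Clusters": 204, "MPPs": 1245, "SMPs": 12, "Other": 47}

total_tf = sum(categories.values())                          # 1508 TF
print(f"Total: ~{total_tf} TF = ~{total_tf / 1000:.1f} PF")  # ~1.5 PF
```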
TG Data Resources • Data Collections, Database and Local Disk Space • Indiana: 100 TB • SDSC: 400 TB • NCSA: 220 TB (Projects Filesystems) • Remote/Wide Area Disk Space • Indiana: 535 TB (Data Capacitor) • SDSC: 150 TB (GPFS-WAN)
TG Visualization Resources • TACC: • Spur • Sun E25K, Nvidia Quadro FX 3000G (32) • serial and parallel, remote visualization applications • ANL: • IA-32 Viz Cluster • IBM IA-32, Nvidia GeForce 6600GT (96) • Purdue: • TeraDRE • Condor Pools, Nvidia GeForce 6600 GT (48) • graphics rendering
HPC User Community is Growing • [Chart: growth in the number of TeraGrid HPC users over time] • Source: TeraGrid Central Database
TeraGrid HPC Usage, 2008 • In 2008: • aggregate HPC power increased by 3.5x • NUs requested and awarded quadrupled • NUs delivered increased by 2.5x (rough arithmetic in the sketch below) • [Usage chart annotations: 3.9B NUs delivered in 2007; 3.8B NUs delivered in Q4 2008 alone; Ranger entered service Feb. 2008 and Kraken Aug. 2008]
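The growth figures above imply rough totals that the sketch below makes explicit. These are back-of-the-envelope estimates derived only from the numbers on this slide, not officially reported totals.

```python
# Figures taken directly from this slide.
nus_2007 = 3.9e9        # NUs delivered in CY2007
delivery_growth = 2.5   # NUs delivered increased by 2.5x in 2008
nus_q4_2008 = 3.8e9     # NUs delivered in Q4 2008 alone

# Implied 2008 total (an estimate, not an officially reported figure).
est_2008_total = nus_2007 * delivery_growth
print(f"Estimated 2008 delivery: ~{est_2008_total / 1e9:.1f}B NUs")  # ~9.8B NUs

# Q4 alone carries a large share of the year, consistent with Ranger (Feb. 2008)
# and Kraken (Aug. 2008) ramping up during the year.
print(f"Q4 2008 share of the estimated total: ~{nus_q4_2008 / est_2008_total:.0%}")  # ~39%
```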
CY2007 Usage by Discipline • [Pie chart: Molecular Biosciences 31%, Chemistry 17%, Physics 17%, Astronomical Sciences 12%, Materials Research 6%, Chemical/Thermal Systems 5%, Advanced Scientific Computing 4%, Earth Sciences 3%, with Atmospheric Sciences and all 19 other disciplines accounting for the remaining ~5%] • 3.95B NUs delivered in CY2007