
NCAR’s Response to upcoming OCI Solicitations



  1. NCAR’s Response to upcoming OCI Solicitations Richard Loft SCD Deputy Director for R&D

  2. Outline • NSF Cyberinfrastructure Strategy (Track-1 & Track-2) • NCAR generic strategy for NSFXX-625’s (Track-2) • NCAR response to NSF05-625 • NSF Petascale Initiative Strategy • NCAR response to NSF Petascale Initiative

  3. NSF’s Cyberinfrastructure Strategy • The NSF’s HPC acquisition strategy (through FY10) comprises three Tracks: • Track 1: High End O(1 PFLOPS sustained) • Track 2: Mid-level system O(100 TFLOPS) NSFXX-625 • First instance (NSF05-625) submitted Feb 10, 2006 • Next instances due: • November 30, 2006 • November 30, 2007 • November 30, 2008 • Track 3: Typical University HPC O(1-10 TFLOPS) • The purpose of the Track-1 system will be to achieve revolutionary advancement and breakthroughs in science and engineering.

  4. Solicitation NSF05-625: Towards a Petascale Computing Environment for Science and Engineering • Award: September 2006 • System in production by May 31, 2007 • $30,000,000 or $15,000,000. • Operating costs funded under separate action. • RP serves the broad science community - open access. • Allocations by LRAC/MRAC or “their successors” • Two 10 Gb/s TeraGrid links

  5. NCAR’s Overall NSFXX-625 Strategy • Leverage NCAR/SCD expertise in production HPC. • Get a production system - • No white-box Linux solutions. • Stay on the path to usable petascale systems. • NCAR is a TeraGrid outsider - must address two areas: • Leverage experience with general scientific users • Lack of Grid consulting experience • Emphasize, but don’t over-emphasize, geosciences. • In proposing, NCAR has a facility problem: • Minimize costs - power, administrative staff, level of support. • Creative plan for remote user support and education.

  6. NSF05-625 Partners • Facility Partner • End-to-End System Supplier • User Support Network - • NCAR Consulting Service Group • University partners

  7. NSF05-625 Facility Partner • NCAR ML Facility after ICESS is FULL. • Key Points: • A new datacenter is needed whether NCAR wins the NSF05-625 solicitation or not. • Because of the short timeline, a new datacenter never factors into the strategy for NSFXX-625. • Identified a colocation facility. • Facility features: • Local (Denver-Boulder area) • State-of-the-art, high-availability center • Currently 4 x 2 MW generators available • Familiar with large-scale deployments • Dark fibre readily available (good connectivity)

  8. NSF05-625 Supercomputer System Details • Two systems: capability + capacity • ~80 TFLOPS combined • Robotic tape storage system ~12 PB

  9. NCAR NSF05-625 User Support Plan • Largest potential differentiator in the proposal - let’s do something unique! • System will be used by the generic scientist - support plan must: • Be extensible to domains other than geoscience • Address grid user support • Strategy leverages the OSCER-led IGERT proposal: • Combine teaching of computational science with user support • Embed application support expertise in key institutions • Build education and training materials through university partnerships.

  10. Track-1 System Background • Source of funds: Presidential Innovation Initiative announced in the State of the Union address. • Performance goal: 1 PFLOPS sustained on “interesting problems”. • Science goal: breakthroughs • Use model: 12 research teams per year using the whole system for days or weeks at a time. • Capability system - large everything & fault tolerant. • Single system in one location. • Not a requirement that the machine be upgradable.

  11. Track-1 Project Parameters • Funds: $200M over 4 years, starting FY07 • Single award • Money is for an end-to-end system (as in 625) • Not intended to fund a facility. • Release of funds tied to meeting hardware and software milestones. • Deployment stages: • Simulator • Prototype • Petascale system operates: FY10-FY15 • Operations for FY10-15 funded separately.

  12. Two Stage Award Process Timeline • Solicitation out: May, 2006 (???) • [ HPCS down-select: June, 2006 ] • Preliminary Proposal due: August, 2006 • Down selection (invitation to 3-4 to write Full Proposal) • Full Proposal due: January, 2007 • Site visits: Spring, 2007 • Award: Sep, 2007

  13. NSF’s view of the problem • NSF recognizes the facility (power, cooling, space) challenge of this system. • Therefore NSF welcomes collaborative approaches: • University & Federal Lab • University & commercial data center • University & State Government • University consortium • NSF recognizes that applications will need significant modification to run on this system. • User support plan • Expects proposer to discuss needs in this area with experts in key applications areas.

  14. The Cards in NCAR’s Hand • NCAR … • Is a leader in making the case that geoscience grand challenge problems need petascale computing. • Has many grand challenge problems of its own to offer. • Has experience at large processor counts. • Has recently connected to the TeraGrid, and is moving towards becoming a full-fledged Resource Provider.

  15. NCAR Response Options • Do Nothing • Focus on Petascale Geoscience Applications • Partner with a lead institution or consortium • Lead a Track-1 proposal

  16. NCAR Response Options • Do Nothing • Focus on Petascale Geoscience Applications • Partner with a lead institution or consortium • Lead a Track-1 proposal

  17. Questions, Comments?

  18. The Relationship Between OCI’s Roadmap and NCAR’s Datacenter project Richard Loft SCD Deputy Director for R&D

  19. Projected CCSM Computing Requirements Exceed Moore’s Law (thanks to Jeff Kiehl/Bill Collins)

  20. NSF’s Cyberinfrastructure Strategy • The NSF’s HPC acquisition strategy (through FY10) comprises three Tracks: • Track 1: High End O(1 PFLOPS sustained) • Track 2: Mid-level system O(100 TFLOPS) NSFXX-625 • First instance (NSF05-625) submitted Feb 10, 2006 • Next instances due: • November 30, 2006 • November 30, 2007 • November 30, 2008 • Track 3: Typical University HPC O(1-10 TFLOPS) • The purpose of the Track-1 system will be to achieve revolutionary advancement and breakthroughs in science and engineering.

  21. NCAR strategic goals: • NCAR will stay in the top echelon of geoscience computing centers. • NCAR’s immediate strategic goal is to be a Track-2 center. • To do this, NCAR must be integrated with NSF’s cyberinfrastructure plans. • This means both connecting to and ultimately operating within the TeraGrid framework. • The TeraGrid is evolving, so this is a moving target.

  22. NCAR new facility • NCAR ML Facility after ICESS is FULL. • Key Points: • A new datacenter is needed whether NCAR wins the NSF05-625 solicitation or not. • Because of the short timeline, a new datacenter never factors into the strategy for NSFXX-625. • Right now, we can’t handle even a modest budget augmentation for computing with the current facility.

  23. Mesa Lab is full after the ICESS procurement • ICESS = Integrated Computing Environment for Scientific Simulation • We’re sitting at 980 kW right now. • Deinstall of bluesky will give us back 450 kW. • This leaves about 600 kW of head-room. • The ICESS procurement is expected to deliver a system with a maximum power requirement of 500-600 kW. • This is not enough to house $15M-$30M of equipment from NSF05-625, for example.

  24. We’re fast running out of power… Max power at the Mesa Lab is 1.2 MW!
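A minimal sketch of the arithmetic behind the head-room figures above, using only the numbers quoted on these two slides (1.2 MW site cap, 980 kW current load, 450 kW recovered from bluesky, 500-600 kW expected for ICESS); the rounding to "about 600 kW" is the slide's:

      # Back-of-the-envelope check of the Mesa Lab power budget, in kW,
      # using only the figures quoted on the two slides above.
      site_cap     = 1200   # maximum power available at the Mesa Lab
      current_load = 980    # load at the time of the talk
      bluesky_back = 450    # recovered when bluesky is deinstalled
      icess_max    = 600    # upper end of the expected ICESS requirement

      headroom = site_cap - current_load + bluesky_back
      print(f"head-room after bluesky deinstall: {headroom} kW")              # ~670 kW
      print(f"head-room after ICESS installs:    {headroom - icess_max} kW")  # ~70 kW
      # Roughly 70 kW left over -- far short of what a $15M-$30M NSF05-625
      # system would draw, which is why a new datacenter is needed regardless.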

  25. Preparing for the Petascale Richard Loft SCD Deputy Director for R&D

  26. What to expect in HEC? • Much more parallelism. • A good deal of uncertainty regarding node architectures. • Many threads per node. • Continued ubiquity of Linux/Intel systems. • There will be vector systems. • Emergence of exotic architectures. • Largest (petascale) systems likely to have special features: • Power-aware design (small memory?) • Fault-tolerant design features • Light-weight compute node kernels • Custom networks

  27. Top 500: Speed of Supercomputers vs. Time

  28. Top 500: Number of Processors vs. Time

  29. HEC in 2010 • Based on history, we should expect 4K-8K CPU systems to be commonplace by the end of the decade. • The largest systems on the Top500 list should be 1-10 PFLOPS. • Parallelism in the largest systems - estimate (2010): • Assuming a 5 GHz clock, a dual-FMA CPU delivers 20 GFLOPS peak. • 1 PFLOPS peak = 50K CPUs. • 10 PFLOPS peak = 500K CPUs. • Large vector systems (if they exist) will still be highly parallel. • To justify using the largest systems, applications must use a sizable fraction of the resource.
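A minimal sketch of the arithmetic behind the CPU-count estimate above, under the 2010 assumptions stated on the slide (5 GHz clock, two FMA units per CPU, 2 flops per fused multiply-add):

      # Parallelism estimate for 2010-era systems under the assumptions on the slide.
      clock_hz        = 5e9                          # assumed 5 GHz clock
      flops_per_cycle = 2 * 2                        # two FMA units x 2 flops per FMA
      peak_per_cpu    = clock_hz * flops_per_cycle   # 20 GFLOPS peak per CPU

      for target_pflops in (1, 10):
          cpus = target_pflops * 1e15 / peak_per_cpu
          print(f"{target_pflops:>2} PFLOPS peak -> {cpus:>9,.0f} CPUs")
      #  1 PFLOPS peak ->    50,000 CPUs
      # 10 PFLOPS peak ->   500,000 CPUs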

  30. Range of Plausible Architectures: 2010 • Power issues will slow the rate of increase in clock frequency. • This will drive the trend towards massive parallelism. • All scalar systems will have multiple CPUs per socket (chip). • Currently 2 CPUs per socket; by 2008, 4 CPUs per socket will be commonplace. • 2010 scalar architectures will likely continue this trend; 8 CPUs are possible - the Cell chip already has 8 synergistic processors. • The key unknown is which cluster-on-a-chip architecture will be most effective. • Vector systems will be around, but at what price? • Wildcards: • Impact of the DARPA HPCS program • Exotics: FPGAs, PIMs, GPUs.

  31. How to make science staff aware of coming changes? • NCAR must develop a science-driven plan for exploiting petascale systems at the end of the decade. • Briefed the NCAR Director, DD, and the CISL and ESSL Directors • Meetings (SEWG at CCSM Breckenridge) • Organizing NSF workshops on petascale geoscience benchmarking, scheduled in DC (June 1-2) and at NCAR (TBD) • Have initiated internal petascale discussions • CGD-SCD joint meetings • Peta_ccsm mail list. • Peta_ccsm Swiki site. • Through activities like this, NSA should take a leadership role.

  32. What must be done to secure resources to improve scalability? • We must help ourselves. • Invest judiciously in computational science where possible. • Leverage application development partnerships (SciDAC, etc.) • Write proposals. • Support for applications development for the Track-1 system can be built into an NCAR partnership deal. • NSF has indicated an independent funding track for applications; NCAR should aggressively pursue those funding sources. • New ideas can help - e.g., POP

  33. POP Space-Filling Curves: partition for 8 processors (credit: John Dennis, SCD)
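For illustration only (a hypothetical sketch, not John Dennis's POP code): the idea is to order the ocean blocks of the 2D grid along a space-filling curve (Morton/Z-order here for simplicity; Hilbert-style curves give even more compact partitions), drop land-only blocks, and hand contiguous runs of the curve to processors so each rank gets a compact, load-balanced patch.

      # Hypothetical sketch of space-filling-curve partitioning for an ocean model.
      def morton_index(i, j, bits=8):
          """Interleave the bits of (i, j): the block's position on a Z-order curve."""
          z = 0
          for b in range(bits):
              z |= ((i >> b) & 1) << (2 * b)
              z |= ((j >> b) & 1) << (2 * b + 1)
          return z

      def partition(nx, ny, nprocs, is_ocean):
          """Map each ocean block (i, j) to a processor rank.

          Land-only blocks are dropped before balancing, which is one of the
          main wins of SFC partitioning for ocean models like POP.
          """
          blocks = [(i, j) for j in range(ny) for i in range(nx) if is_ocean(i, j)]
          blocks.sort(key=lambda ij: morton_index(*ij))   # order along the curve
          per_rank = -(-len(blocks) // nprocs)            # ceiling division
          return {ij: k // per_rank for k, ij in enumerate(blocks)}

      if __name__ == "__main__":
          # Toy 8x8 block grid with a 2x2 "continent" masked out, 8 processors.
          land = {(i, j) for i in range(3, 5) for j in range(3, 5)}
          ranks = partition(8, 8, 8, lambda i, j: (i, j) not in land)
          for j in range(7, -1, -1):                      # print the rank map row by row
              print(" ".join(str(ranks[(i, j)]) if (i, j) in ranks else "."
                             for i in range(8)))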

  34. POP 1/10 Degree BG/L Improvements

  35. POP 1/10 Degree performance BG/L SFC improvement

  36. Questions, Comments?

  37. Top 500 Processor Types: Intel taking over. Today Intel is inside 2/3 of the Top 500 machines.

  38. The commodity onslaught … • The Linux/Intel cluster is taking over the Top 500. • Linux has not penetrated the major weather, ocean, and climate centers - yet. Reasons: • System maturity (SCD experience) • Scalability of dominant commodity interconnects • Combinatorics (Linux flavor, processor, interconnect, compiler) • But it affects NCAR indirectly because… • Ubiquity = opportunity • Universities are deploying them. • NCAR must rethink the services provided to the universities. • Puts strain on all community software development activities.
