Data-Driven Computational Science and Future Architectures at the Pittsburgh Supercomputing Center
Ralph Roskies, Scientific Director, Pittsburgh Supercomputing Center, Jan 30, 2009

Presentation Transcript


  1. Data-Driven Computational Science and Future Architectures at the Pittsburgh Supercomputing Center Ralph Roskies, Scientific Director, Pittsburgh Supercomputing Center, Jan 30, 2009

  2. NSF TeraGrid Cyberinfrastructure • Mission: Advancing scientific research capability through advanced IT • Resources: Computational, Data Storage, Instruments, Network

  3. Now is a Resource-Rich Time • NSF has funded two very large distributed-memory machines available to the national research community • Track 2a (Texas): Ranger (62,976 cores, 579 teraflops, 123 TB memory) • Track 2b (Tennessee): Kraken (18,048 cores, 166 teraflops, 18 TB memory), growing to close to a petaflop • Track 2d: data centric; experimental architecture; … proposals in review • All part of TeraGrid. The largest single allocation this past September was 46M processor hours. • In 2011, NCSA will field a 10 PF machine.

  4. Increasing Importance of Data in Scientific Discovery • Large amounts from instruments and sensors: • Genomics • Large Hadron Collider • Huge astronomy databases: Sloan Digital Sky Survey, Pan-STARRS, Large Synoptic Survey Telescope • Results of large simulations (CFD, MD, cosmology, …)

  5. Insight by Volume: NIST Machine Translation Contest • In 2005, Google beat all the experts by exploiting 200 billion words of documents (Arabic-to-English, high-quality UN translations), looking at all 1-word, 2-word, …, 5-word phrases and estimating their best translations, then applying those statistics to the test text. • No one on the Google team spoke Arabic or understood its syntax! • Results depend critically on the volume of text analyzed; 1 billion words would not have sufficed. (A minimal counting sketch follows.)
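The heart of that phrase-based approach is counting short word sequences at enormous scale. Below is a minimal sketch of just the counting step in Python; the toy corpus and the 1-to-5-word phrase lengths are illustrative assumptions rather than details from the talk, and a real system would also keep aligned source/target phrase pairs.

```python
# Minimal sketch of the counting step behind phrase-based statistical
# translation: tally every 1- to 5-word phrase in a corpus. The toy corpus
# is illustrative; a real system counts hundreds of billions of words,
# which is why corpus volume matters so much.
from collections import Counter

def count_phrases(tokens, max_n=5):
    """Count all contiguous 1..max_n word phrases in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

if __name__ == "__main__":
    corpus = "the security council met and the security council adjourned"
    counts = count_phrases(corpus.split())
    # The most frequent phrases are the ones whose translations can be
    # estimated most reliably.
    for phrase, c in counts.most_common(5):
        print(" ".join(phrase), c)
```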

  6. What computer architecture is best for data-intensive work? • Based on discussions with many communities, we believe a complementary architecture embodying large shared memory will be invaluable for: • Large graph algorithms (many fields, including web analysis, bioinformatics, …) • Rapid assessment of data-analysis ideas, using OpenMP rather than MPI, with access to large data (see the sketch below)
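A sketch of the "rapid assessment" workflow that second bullet describes: when the whole dataset fits in one coherent memory, an analysis idea can be tried directly on the full array, with no MPI domain decomposition. On the proposed system this would typically be OpenMP loops in C or Fortran over multi-TB arrays; the small Python/NumPy stand-in below, with an invented synthetic dataset and array size, only illustrates the style.

```python
# Whole-dataset-in-memory prototyping: no halo exchange, no redistribution,
# just index into the array and try the idea. Sizes and the synthetic data
# are illustrative assumptions; on a large shared-memory machine the same
# pattern applies to far bigger arrays.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 2000))   # stand-in for a large in-memory dataset

# Exploratory idea: per-row statistics, then flag anomalous rows.
row_std = data.std(axis=1)
anomalous = np.where(row_std > row_std.mean() + 3 * row_std.std())[0]
print("rows flagged:", anomalous[:10])
```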

  7. History of first or early systems

  8. PSC Facilities • XT3 (BigBen): 4,136 processors, 22 TFlop/s • Altix (Pople): 768 processors, 1.5 TB shared memory • Visualization nodes: NVIDIA Quadro4 980 XGL • Storage cache nodes: 100 TB • Storage silos: 2 PB, DMF archive server

  9. PSC Shared Memory Systems • Pople, introduced in March 2008 • SGI Altix 4700: 768 Intel cores, 1.5 TB coherent shared memory, NUMAlink interconnect • Highly oversubscribed • Has already stimulated work in new areas because of the perceived ease of programming in shared memory: • Game theory (poker) • Epidemiological modeling • Social network analysis • Economics of Internet connectivity • fMRI study of cognition

  10. Desiderata for New System • Powerful Performance • Programmability • Support for current applications • Support for a host of new applications and science communities.

  11. Proposed Track 2 System at PSC • Combines next-generation Intel processors (Nehalem-EX) with SGI's next-generation interconnect technology (NUMAlink 5) • ~100,000 cores, ~100 TB memory, ~1 PF peak • Coherent shared-memory components of at least 4 TB, with fully globally addressable memory • Superb MPI and I/O performance

  12. Accelerated Performance • MPI Offload Engine (MOE) • Frees the CPU from MPI activity • Faster reductions (2-3× compared to competitive clusters/MPPs) • Order-of-magnitude faster barriers and random access • NUMAlink 5 advantage • 2-3× MPI latency improvement • 3× the bandwidth of InfiniBand QDR • Special support for block transfers and global operations • Massively memory-mapped I/O • Under user control • Big speedup for I/O-bound applications

  13. Enhanced Productivity from Shared Memory • Easier shared-memory programming for rapid development and prototyping • Will allow large-scale generation of data and analysis on the same platform, without moving the data (data movement is a major problem for current Track 2 systems) • Mixed shared-memory/MPI programming between much larger blocks (e.g., Woodward's PPM code; see the sketch below)
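A minimal sketch of the hybrid layout that last bullet describes: message passing only between a few large shared-memory blocks, with all the fine-grained work done inside each block. This is not Woodward's PPM code; it uses mpi4py and NumPy as stand-ins (an assumption on my part, since the production codes would be C or Fortran with MPI plus OpenMP), and the array size is arbitrary.

```python
# Hybrid pattern sketch: each MPI rank stands in for one large shared-memory
# block; MPI carries only the coarse, per-block exchange.
# Run with e.g.: mpirun -np 4 python hybrid_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, nblocks = comm.Get_rank(), comm.Get_size()

# Each rank owns one large in-memory array (its "shared-memory block");
# on the real system this part would be OpenMP-threaded.
n_local = 1_000_000
local = np.random.default_rng(rank).normal(size=n_local)

# Work inside the block touches only local memory ...
local_sum = local.sum()

# ... and only small per-block results cross the MPI layer.
total = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print("global mean:", total / (n_local * nblocks))
```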

  14. High-Productivity, High-Performance Programming Models • The T2c system will support programming models for: extreme capability, algorithm expression, user productivity, and workflows • Model families: MPI and SHMEM; hybrid MPI/OpenMP and MPI/threaded; Charm++; PGAS (UPC, CAF); coherent shared memory (OpenMP, pthreads); high productivity (Star-P: parallel MATLAB, Python, R)

  15. Programming Models: Petascale Capability Applications • Full-system applications will run in any of 4 programming models • Dual emphasis on performance and productivity • Existing codes • Optimization for multicore • New and rewritten applications • (Programming-model matrix as on slide 14)

  16. Programming Models: High-Productivity Supercomputing • Algorithm development • Rapid prototyping • Interactive simulation • Also: analysis and visualization, computational steering, workflows • (Programming-model matrix as on slide 14)

  17. Programming Models: New Research Communities • Multi-TB coherent shared memory • Global address space • Express algorithms not served by distributed systems • Complex, dynamic connectivity • Simplify load balancing • (Programming-model matrix as on slide 14)

  18. Enhanced Service for Current Power Users: Analyze Massive Data Where You Produce It • Combines superb MPI performance with shared memory and higher-level languages for rapid analysis prototyping.

  19. Analysis of Seismology Simulation Results • Validation across models (Quake: CMU; AWM: SCEC). 4D waveform output at 2 Hz (to address civil-engineering structures) for 200 s earthquake simulations will generate hundreds of TB of output. • Voxel-by-voxel comparison is not an appropriate technique. PSC developed data-intensive statistical analysis tools to understand subtle differences in these vast spatiotemporal datasets. • This required holding substantial windows of both datasets in memory for comparison (see the sketch below).
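A toy sketch of the window-based idea: hold corresponding spatiotemporal windows of the two outputs in memory and compare a summary statistic per window (here a correlation) rather than differencing voxel by voxel. The synthetic fields, window size, and choice of statistic are illustrative assumptions; this is not PSC's actual seismology analysis code.

```python
# Window-based comparison of two large 3D fields: compute a per-window
# correlation instead of voxel-by-voxel differences. Fields and window
# size are illustrative assumptions.
import numpy as np

def windowed_correlation(a, b, w):
    """Correlation of a and b over non-overlapping w x w x w windows."""
    out = []
    for i in range(0, a.shape[0] - w + 1, w):
        for j in range(0, a.shape[1] - w + 1, w):
            for k in range(0, a.shape[2] - w + 1, w):
                wa = a[i:i+w, j:j+w, k:k+w].ravel()
                wb = b[i:i+w, j:j+w, k:k+w].ravel()
                out.append(np.corrcoef(wa, wb)[0, 1])
    return np.array(out)

rng = np.random.default_rng(1)
model_a = rng.normal(size=(64, 64, 64))                    # stand-in for one simulation
model_b = model_a + 0.1 * rng.normal(size=model_a.shape)   # a slightly different run

corr = windowed_correlation(model_a, model_b, w=16)
print("window correlations: min %.3f, median %.3f" % (corr.min(), np.median(corr)))
```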

  20. Design of LSST Detectors • Gravitational lensing can map the distribution of dark matter in the Universe and make estimates of the dark energy content more accurate. • The measurements are very subtle. • High-quality modeling, with robust statistics, is needed for LSST detector design. • Must calculate ~10,000 light cones through each simulated universe. • Each universe is 30 TB. • Each light-cone calculation requires analyzing large chunks of the entire dataset (a toy selection sketch follows).
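For intuition only, here is a toy light-cone selection: pick the simulation particles that fall inside a narrow cone from the observer. The particle positions, cone axis, and opening angle are invented for illustration; the point is that each cone cuts across a large fraction of a 30 TB snapshot, which is why big chunks of it must be resident in memory.

```python
# Toy light-cone extraction: select particles inside a narrow cone from the
# observer at the origin. Positions, axis, and opening angle are illustrative
# assumptions, not the actual LSST simulation pipeline.
import numpy as np

rng = np.random.default_rng(2)
pos = rng.uniform(-1.0, 1.0, size=(1_000_000, 3))   # stand-in particle positions

axis = np.array([0.0, 0.0, 1.0])                     # viewing direction
half_angle = np.radians(1.0)                         # 1-degree half-opening angle

r = np.linalg.norm(pos, axis=1)
cos_theta = (pos @ axis) / np.maximum(r, 1e-12)
in_cone = cos_theta > np.cos(half_angle)

print("particles in this cone:", int(in_cone.sum()))
```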

  21. Understanding the Processes that Drive Stress-Corrosion Cracking (SCC) • Stress-corrosion cracking affects the safe, reliable performance of buildings, dams, bridges, and vehicles. • Corrosion costs the U.S. economy about 3% of GDP annually. • Predicting the lifetime beyond which SCC may cause failure requires multiscale simulations that couple quantum, atomistic, and structural scales. • 100-300 nm, 1-10 million atoms, over 1-5 μs with a 1 fs timestep. • Efficient execution requires large SMP nodes to minimize surface-to-volume communication, large cache capacity, and high-bandwidth, low-latency communications. • The system is expected to achieve the ~1,000 timesteps per second needed for realistic simulation of stress-corrosion cracking. Courtesy of Priya Vashishta, USC. A crack in the surface of a piece of metal grows from the activity of atoms at the point of cracking; quantum-level simulation (right panel) leads to modeling the consequences (left panel). From http://viterbi.usc.edu/news/news/2004/2004_10_08_corrosion.htm

  22. Analyzing the Spread of Pandemics • Understanding the spread of infectious diseases is critical for effective response to disease outbreaks (e.g., avian flu). • EpiFast: a fast, reliable method for simulating pandemics, based on a combinatorial interpretation of percolation on directed networks (see the sketch below). • Madhav Marathe, Keith Bisset, et al., Network Dynamics and Simulation Science Laboratory (NDSSL) at Virginia Tech. • Large shared memory is needed for efficient implementation of the graph-theoretic algorithms that simulate transmission networks modeling how disease spreads from one individual to the next. • 4 TB of shared memory will allow study of world-wide pandemics. From Karla Atkins et al., An Interaction Based Composable Architecture for Building Scalable Models of Large Social, Biological, Information and Technical Systems, CTWatch Quarterly, March 2008, http://www.ctwatch.org
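A minimal sketch of the percolation idea EpiFast builds on (not the NDSSL algorithm itself): each directed contact edge transmits independently with some probability, and one outbreak realization is the set of people reachable from the seed cases through the transmitting edges. The random contact network, transmission probability, and seed are assumptions for illustration; the irregular graph traversal is exactly the access pattern that benefits from a large shared memory.

```python
# Percolation-style outbreak sketch: each directed contact edge transmits
# with probability p; the outbreak is the set reachable from the seeds via
# transmitting edges (a BFS). Network, p, and seeds are illustrative.
import random
from collections import deque

random.seed(0)
n_people, n_contacts, p_transmit = 10_000, 50_000, 0.3

# Directed contact network as an adjacency list (who can infect whom).
contacts = [[] for _ in range(n_people)]
for _ in range(n_contacts):
    contacts[random.randrange(n_people)].append(random.randrange(n_people))

def outbreak_size(seeds):
    """BFS over the edges that happen to transmit in this realization."""
    infected = set(seeds)
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        for neighbor in contacts[person]:
            if neighbor not in infected and random.random() < p_transmit:
                infected.add(neighbor)
                queue.append(neighbor)
    return len(infected)

print("outbreak size from one seed:", outbreak_size([0]))
```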

  23. Engaging New Communities: Memory-Intensive Graph Algorithms • Web analytics • Applications: fight spam, rank importance, cluster information, determine communities • The algorithms are notoriously hard to implement on distributed-memory machines. • Example web graph (nodes: web pages, edges: links): ~10^10 pages, ~10^11 links, 40 bytes/link → 4 TB (courtesy Guy Blelloch, CMU; see the sketch below).
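The arithmetic behind the 4 TB figure, plus a toy link-analysis pass to show the access pattern that makes such algorithms awkward on distributed memory. The small random graph and the damping factor are illustrative assumptions; only the 10^11 links × 40 bytes estimate comes from the slide.

```python
# Back-of-envelope for the slide's numbers, plus a toy PageRank-style power
# iteration. The random toy graph and damping factor are illustrative; the
# point is the scattered, graph-wide memory access.
import random

links, bytes_per_link = 10**11, 40
print("web graph size: %.0f TB" % (links * bytes_per_link / 1e12))   # ~4 TB

random.seed(0)
n = 1000
out_edges = [[random.randrange(n) for _ in range(5)] for _ in range(n)]

rank, damping = [1.0 / n] * n, 0.85
for _ in range(20):
    new = [(1.0 - damping) / n] * n
    for u in range(n):
        share = damping * rank[u] / len(out_edges[u])
        for v in out_edges[u]:          # scattered writes across the whole rank array
            new[v] += share
    rank = new

print("highest-ranked node:", max(range(n), key=lambda v: rank[v]))
```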

  24. More Memory-Intensive Graph Algorithms • Biological pathways: proteins connected by interactions • Computer security: IP packets connected by sessions • Analyzing buying habits: items connected by common receipts • Machine translation: words connected by adjacency • Also: epidemiology, social networks, … (courtesy Guy Blelloch, CMU)

  25. PSC T2c: Summary • PSC’s T2c system, when awarded, will leverage architectural innovations in the processor (Intel Nehalem-EX) and the platform (SGI Project Ultraviolet) to enable groundbreaking science and engineering simulations using both “traditional HPC” and emerging paradigms • Complement and dramatically extend existing NSF program capabilities • Usability features will be transformative • Unprecedented range of target communities • perennial computational scientists • algorithm developers, especially those tackling irregular problems • data-intensive and memory-intensive fields • highly dynamic workflows (modify code, run, modify code again, run again, …) • Reduced concept-to-results time transforming NSF user productivity

  26. Integrated in the National Cyberinfrastructure • Enabled and supported by PSC's advanced user support, application and system optimization, and middleware and infrastructure, leveraging the national cyberinfrastructure

  27. Questions?

  28. Predicting Mesoscale Atmospheric Phenomena • Accurate prediction of atmospheric phenomena at the 1-100 km scale is needed to reduce economic losses and injuries due to strong storms. • To achieve this, we require 20-member ensemble runs at 1 km resolution, covering the continental US, with dynamic data assimilation in quasi-real time. • Ming Xue, University of Oklahoma. • Reaching 1.0-1.5 km resolution is critical (in certain weather situations, fewer ensemble members may suffice). • The system is expected to sustain 200 Tf/s for WRF, enabling prediction of atmospheric phenomena at the mesoscale. Fanyou Kong et al., Real-Time Storm-Scale Ensemble Forecast Experiment - Analysis of 2008 Spring Experiment Data, Preprints, 24th Conf. on Severe Local Storms, Amer. Meteor. Soc., 27-31 October 2008, http://twister.ou.edu/papers/Kong_24thSLS_extendedabs-2008.pdf

  29. Reliability • Hardware-enabled fault detection, prevention, and containment • Enhanced monitoring and serviceability • NUMAlink automatic retry and various error-correcting mechanisms
