TeraGrid Forum Meeting June 16, 2010
TeraGrid Coordination Meeting June 10, 2010
The Gordon Sweet Spot
• Data Mining
  • De novo genome assembly from sequencer reads & analysis of galaxies from cosmological simulations and observations.
  • Federations of databases and interaction network analysis for drug discovery, social science, biology, epidemiology, etc.
• Predictive Science
  • Solution of inverse problems in oceanography, atmospheric science, & seismology.
  • Modestly scalable codes in quantum chemistry & structural engineering.
Large Shared Memory; Low Latency, Fast Interconnect; Fast I/O System
Typical HPC I/O has very little random I/O, and random I/O is the sweet spot for SSDs and data-intensive computing
• For example, a NERSC study* of 50 applications found:
  • Random access is rare for HPC applications; I/O access is dominated by sequential operations.
  • Application I/O is dominated by append-only writes.
  • The majority of applications have adopted a one-file-per-processor approach to disk I/O, where each process of a parallel application writes to its own separate file rather than using parallel/shared I/O APIs to write from all of the processors into a single file.
* Source: Characterizing and Predicting the I/O Performance of HPC Applications Using a Parameterized Synthetic Benchmark (Shalf et al., SC ’08)
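The access patterns named in the NERSC findings can be illustrated with a small sketch. This is not code from the study or from SDSC; the file naming scheme, block size, and function names are hypothetical and chosen only for illustration. It writes one append-only file for a single "rank" (the one-file-per-process pattern), then reads the same blocks back in sequential order and in shuffled (random) order.

```python
import os
import random
import tempfile

BLOCK_SIZE = 4096   # hypothetical block size, for illustration only
NUM_BLOCKS = 256    # hypothetical file length (1 MiB in total)


def append_rank_file(directory, rank, records):
    """One-file-per-process, append-only output: each rank owns its own file."""
    path = os.path.join(directory, f"out.{rank:05d}.dat")  # hypothetical name
    with open(path, "ab") as f:  # "ab" opens for append-only binary writes
        for record in records:
            f.write(record)
    return path


def read_blocks(path, block_order):
    """Read fixed-size blocks in the given order, keyed by block index."""
    blocks = {}
    with open(path, "rb") as f:
        for index in block_order:
            f.seek(index * BLOCK_SIZE)
            blocks[index] = f.read(BLOCK_SIZE)
    return blocks


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Fill each block with a single byte value so reads are easy to verify.
        records = [bytes([i % 256]) * BLOCK_SIZE for i in range(NUM_BLOCKS)]
        path = append_rank_file(tmp, rank=0, records=records)

        sequential = list(range(NUM_BLOCKS))
        shuffled = sequential[:]
        random.Random(0).shuffle(shuffled)  # fixed seed for repeatability

        seq_data = read_blocks(path, sequential)  # sequential access
        rnd_data = read_blocks(path, shuffled)    # random access
        size = os.path.getsize(path)
    return seq_data, rnd_data, size
```

Both read orders return identical data; only the order of the seeks differs. On a small file served from the page cache the two will take about the same time, so any timing comparison (for example with `time.perf_counter`) would need a large, uncached file to show the gap between sequential and random access on a given device.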
Data Intensive Workshop, October 26-29, 2010
• Identify "Grand Challenges" in data-intensive science across a broad range of topics
• Identify applications and disciplines that will benefit from Gordon's unique architecture and capabilities
• Invite potential users of Gordon to speak and participate
• Make leaders in data-intensive science aware of what SDSC is doing in this space
• Raise awareness among disciplines poorly served by current HPC offerings
• Better understand Gordon's niche in the data-intensive cosmos and potential usage modes
• Logistics:
  • ~100 attendees; at SDSC; incl. 1-day hands-on; plenary speakers; astronomy, geoscience, neuroscience, physics, engineering, social science, and data-related technologies
Gordon Highlights
• 245TF; 1024 Nodes; 64GB/node (64TB)
• Sandy Bridge processor
  • Dual socket
  • Core count TBD
  • 8 flops/clock/core via AVX instruction set
• 256TB Enterprise Intel SSD via 64 Nehalem/Westmere I/O Nodes (4TB per node)
• Dual rail, QDR 3D torus IB Interconnect
• Shared memory supernodes via ScaleMP vSMP Foundation
  • 32 Compute nodes/supernode
  • 128 node version launching in fall
  • Message passing between supernodes coming
• 4PB Data Oasis Disk
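The headline totals can be cross-checked with simple arithmetic. The sketch below uses only figures quoted on the slides; the 240 GFLOPS per-node peak comes from the supernode architecture slide. The variable names are illustrative, and the unit conventions (binary for capacity, decimal for flops) are an assumption that happens to reproduce the quoted numbers.

```python
# Figures quoted on the slides (per-node peak is from the supernode slide).
NODES = 1024
RAM_PER_NODE_GB = 64
PEAK_PER_NODE_GFLOPS = 240
IO_NODES = 64
SSD_PER_IO_NODE_TB = 4

# Assumption: capacity in binary units (1 TB = 1024 GB), flops in decimal units.
total_ram_tb = NODES * RAM_PER_NODE_GB / 1024
total_ssd_tb = IO_NODES * SSD_PER_IO_NODE_TB
peak_tflops = NODES * PEAK_PER_NODE_GFLOPS / 1000

print(f"Aggregate RAM:   {total_ram_tb:.0f} TB")
print(f"Aggregate flash: {total_ssd_tb} TB")
print(f"Peak:            {peak_tflops:.2f} TFLOPS (quoted as 245TF)")
```

The core count and clock rate are listed as TBD, so the per-node peak is taken as given here rather than derived from 8 flops/clock/core.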
Gordon Supernode Architecture
• 32 Appro GreenBlade
  • Dual processor Intel Sandy Bridge
  • 240 GFLOPS
  • 64 GB/node
  • # Cores TBD
• 2 Appro IO nodes/32 SN
  • Intel SSD drives
  • 4 TB ea.
  • 560,000 IOPS
• ScaleMP vSMP virtual shared memory
  • 2 TB RAM aggregate (64GBx32)
  • 8 TB SSD aggregate (256GBx32)
[Diagram: 4 TB SSD I/O node; 240 GF compute nodes, each with 64 GB RAM; vSMP memory virtualization]
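The supernode aggregates follow from the per-node figures, as this short sketch shows. The inputs are the numbers on the slides; the per-supernode peak and the supernode count are derived values that do not appear on the slides, and the function name is hypothetical.

```python
COMPUTE_NODES_PER_SUPERNODE = 32
IO_NODES_PER_SUPERNODE = 2
RAM_PER_NODE_GB = 64
SSD_PER_IO_NODE_TB = 4
PEAK_PER_NODE_GFLOPS = 240
SYSTEM_COMPUTE_NODES = 1024  # from the Gordon Highlights slide


def supernode_totals():
    """Aggregate one supernode's resources (assumes 1 TB = 1024 GB)."""
    ram_tb = COMPUTE_NODES_PER_SUPERNODE * RAM_PER_NODE_GB / 1024
    ssd_tb = IO_NODES_PER_SUPERNODE * SSD_PER_IO_NODE_TB
    # Flash share per compute node, matching the "256GBx32" note on the slide.
    ssd_per_node_gb = ssd_tb * 1024 / COMPUTE_NODES_PER_SUPERNODE
    # Derived, not quoted on the slides:
    peak_tflops = COMPUTE_NODES_PER_SUPERNODE * PEAK_PER_NODE_GFLOPS / 1000
    supernodes = SYSTEM_COMPUTE_NODES // COMPUTE_NODES_PER_SUPERNODE
    return {
        "ram_tb": ram_tb,
        "ssd_tb": ssd_tb,
        "ssd_per_node_gb": ssd_per_node_gb,
        "peak_tflops": peak_tflops,
        "supernodes": supernodes,
    }


totals = supernode_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

The 560,000 IOPS figure is left out because the slide does not say whether it applies per I/O node or per supernode.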
Project Milestones
• Dash is now a TeraGrid resource
  • Allocation processes
  • Allocated users
  • Account setup
  • Application Environment
• 16-Way vSMP Acceptance Approved
• SDSC is becoming a flash center of excellence in HPC, working closely with Dr. Steve Swanson in UCSD’s Center for Magnetic Recording Research (CMRR)
• Education, Outreach and Training
  • Data Intensive Workshop set for October 26-29 at SDSC
  • NVM Workshop at UCSD in April
  • SC ’10 papers submitted
  • TeraGrid 2010 papers, tutorial, BOF submitted
• Data intensive use cases being developed
Production Dash as of April 1
• Two 16 node virtual clusters
  • SSD-only
    • 16 nodes; Nehalem, dual socket, 8 core; 48GB; 1 TB SSD (16)
    • SSDs are local to the nodes
    • Standard queues available
  • vSMP + SSD
    • 16 nodes; Nehalem, dual socket, 8 core; 48GB; 960GB SSD (15)
    • SSDs are local to the nodes
    • Treated as a single shared resource
• GPFS-WAN
• Additional 32 nodes will be brought online after the vSMP 32-way acceptance testing in July
The Road Ahead
• Understanding data intensive applications and how they can benefit from Gordon’s unique architecture
• Identifying new user communities
• Education, Outreach and Training
• Managing to the schedule and milestones
• Track and assess flash technology developments
• I/O performance
• Parallel file systems
• InfiniBand/3D torus routing
• Individual roles and responsibilities
• Systems management processes
• Staffing ramp-up in October
• Have fun doing this!
TeraGrid Support has been Instrumental • Diane Baxter • Jeff Bennett • Leo Carson • Larry Diegel • Jerry Greenberg • Dave Hart • Jiahua He • Eva Hocks • Tom Hutton • Arun Jagatheesen • Adam Jundt • Richard Moore • Mike Norman • Wayne Pfeiffer • Susan Rathbun • Scott Sakai • Allan Snavely • Mark Sheddon • Shawn Strande • Mahidhar Tatineni • And many others…
SDSC’s Summer Education Program
• TeacherTech summer workshops http://education.sdsc.edu/teachertech
  • Conference of New Teachers in Genomics
  • Modeling Instruction in High School Physics: An Introduction
  • Introduction to Adobe Photoshop and the World of Digital Art
  • TeacherTECH Begins a Collaboration with UCSD-TV – Tune In!
  • Newton’s Laws of Gravity: From the Celestial to the Terrestrial
  • Earthquake Science: Beyond Static Images and Flat Maps
• Student summer workshops http://education.sdsc.edu/teachertech/index.php?module=ContentExpress&func=display&ceid=18
  • Exploring the World of Digital Art and Design
  • Introduction to Matlab: An Interactive Visual Math Experience
  • UCSD Biotechnology Academy
  • "Full Color Heroes" in Digital Art & Design: Comic Book Coloring!
  • 2D – 3D Insani-D!
  • 3D Photography: Experience It!
  • Photography + Photoshop = Fun!
  • Exploring Digital Photography and the Wonders of Photoshop
  • Introduction to Maya and 3D Modeling
SDSC’s Summer Education Program (cont.)
• Research Experience For High School Students (REHS) (21 students) http://education.sdsc.edu/teachertech/index.php?module=ContentExpress&func=display&ceid=37
  • Supercomputer-based Workflow for Managing Large Biomedical Images
  • Refinement of Data Mining Software and Application to Space Plasmas for Data Analysis and Visualization
  • Sonification of UCSD Campus Energy Consumption
  • Visualization and 3D Content Creation
  • The Cooperative Association for Internet Data Analysis Web Development Intern
  • Documentation Assistant – Health Info Databases Project