SDSC RP Update
• Trestles
• Recent Dash results
• Gordon schedule
• SDSC’s broader HPC environment
• Recent EOT activity
March 24, 2011
Trestles - Configuring for productivity for modest-scale and gateway users
• Allocation Plans
  • Target users that need <=1K cores
  • Plan to allocate ~70% of the theoretically available SUs
  • Cap allocations at 1.5M SUs/year per project (~2.5% of the annual total)
  • Allow new users to request up to 50,000 SUs in startup allocations, and front-load the SUs offered during the first few allocation cycles
  • Configure the job queues and resource schedulers for lower expansion factors and generally faster turnaround
    • The challenge will be to maintain fast turnaround as utilization increases
• Services
  • Shared nodes
  • Long-running queue
  • Advance reservations
  • On-demand queue
    • ~20 nodes set aside in the on-demand queue; users can run there at a 25% (TBR) discount
    • On-demand jobs may be preempted (killed) at any time (the initial pathfinder is SCEC, for real-time earthquake analyses)
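The allocation caps above can be sketched as a simple policy check. This is a hypothetical illustration only: the SU numbers come from the slide, but the function and parameter names are assumptions, not SDSC's actual allocation software.

```python
# Hypothetical sketch of the Trestles allocation caps described above.
# Numbers are from the slide; the 60M annual pool is inferred from
# "1.5M SUs ~= 2.5% of the annual total".

ANNUAL_SU_POOL = 60_000_000   # inferred annual total
PROJECT_CAP_SU = 1_500_000    # per-project cap per year
STARTUP_CAP_SU = 50_000       # maximum startup allocation for new users

def award_su(requested_su: int, already_awarded_su: int = 0,
             startup: bool = False) -> int:
    """Return the number of SUs grantable under the per-project caps."""
    cap = STARTUP_CAP_SU if startup else PROJECT_CAP_SU
    remaining = max(0, cap - already_awarded_su)
    return min(requested_su, remaining)
```

For example, a 2M SU request would be trimmed to the 1.5M cap, and a 60K SU startup request to 50K.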
Results from the CIPRES gateway
• Identifies evolutionary relationships by comparing DNA
• To date, >2,000 scientists have run more than 35,000 analyses for 100 completed studies, spanning a broad spectrum of biological and medical research. The following discoveries were made by scientists using the gateway over the past year:
  • Hepatitis C virus evolves quickly to defeat the natural human immune response, altering the responsiveness of the infection to interferon therapy.
  • Humans are much more likely to infect apes with malaria than the reverse.
  • Toxic elements in local soils influence the geographical distribution of related plants.
  • Red rice, a major crop weed in the US, did not arise from domestic rice stock.
  • Beetles and flowering plants adapt to each other over time, to the benefit of both species.
  • Viruses can introduce new functions into baker's yeast in the wild.
  • A microbe called Naegleria gruberi, which can live with or without oxygen, provided new insights into the evolutionary transition from oxygen-free to oxygen-breathing life forms.
Recent Dash Results: Flash Application Benchmarks
• LiDAR topographical database: representative query of a 100 GB topographical database
  • Test configuration: Gordon I/O nodes, 1 with 16 SSDs and 1 with 16 spinning disks, running single and concurrent instances of DB2 on the node.
• EM_BFS: solution of a 300M-node problem using flash for out-of-core storage
  • Test configuration: Gordon I/O nodes, 1 with 16 SSDs and 1 with 16 spinning disks.
• Abaqus: S4B - Cylinder Head Bolt Up, a static analysis that simulates bolting a cylinder head onto an engine block
  • Test configuration: single Dash compute node, comparing local I/O to spinning disk and flash drive.
• Reverse Time Migration: acoustic imaging/seismic application
  • Test configuration: Dash compute nodes with local SSD and local spinning disk.
• Protein Data Bank: repository of 3D structures of molecules
  • Test configuration: Gordon I/O nodes, 1 with 16 SSDs and 1 with 16 spinning disks.
Flash Provides 2-4x Improvement in Run Times for LiDAR Query, MR-BFS, and Abaqus
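A minimal way to reproduce this kind of flash-vs-disk comparison is to time identical writes against the two mount points. This is a hedged sketch, not the actual benchmark harness; the mount-point paths are hypothetical.

```python
import os
import time

def sequential_write_time(path: str, size_mb: int = 256) -> float:
    """Time a sequential write of size_mb MiB to a file under the given path."""
    block = b"\0" * (1 << 20)  # 1 MiB buffer
    target = os.path.join(path, "io_probe.tmp")
    start = time.perf_counter()
    with open(target, "wb") as f:
        for _ in range(size_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force data to the device, not just the page cache
    elapsed = time.perf_counter() - start
    os.remove(target)
    return elapsed

# Hypothetical usage, with "/spinning" and "/ssd" standing in for the two mounts:
# speedup = sequential_write_time("/spinning") / sequential_write_time("/ssd")
```

The `os.fsync` call matters: without it, the write lands in the page cache and the timing measures memory bandwidth rather than the storage device.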
Gordon Schedule (Approximate)
• Sixteen production-level flash I/O nodes are already in-house for testing
  • Early results for a single I/O node, random I/O (4K blocks): read 420K IOPS, write 165K IOPS
• Sandy Bridge availability early summer
• System delivery to SDSC late summer
• Friendly-user period late fall
• Production before the end of CY11
• First allocation meeting: “Sept” cycle
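As a sanity check, the IOPS figures above translate into approximate bandwidth at the stated 4K block size. This is simple arithmetic on the slide's numbers, not an additional measurement:

```python
BLOCK_BYTES = 4 * 1024  # 4 KiB random I/O blocks, per the test above

def iops_to_gbps(iops: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Convert an IOPS figure to approximate bandwidth in GB/s (decimal GB)."""
    return iops * block_bytes / 1e9

read_bw = iops_to_gbps(420_000)   # ~1.72 GB/s aggregate random-read bandwidth
write_bw = iops_to_gbps(165_000)  # ~0.68 GB/s aggregate random-write bandwidth
```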
SDSC’s Broader HPC Environment
• In addition to TeraGrid systems, SDSC operates:
  • Triton (Appro 256-node cluster + 28 Sun large-memory nodes with 256/512 GB)
    • SDSC system supporting staff, industrial partners, and UCSD/UC users
  • Thresher (IBM 256-node cluster)
    • UC-wide system, operated along with Mako at LBNL for systemwide users as part of a UC-wide Shared Research Computing Services (ShaRCS) pilot for condo computing
  • Data Oasis - Lustre parallel file system
    • Shared by Triton, Trestles (and Gordon)
    • Phase 0 - 140 TB
    • Phase 1 - currently in procurement, ~2 PB (raw), ~50 GB/s bandwidth
    • Phase 2 - summer 2012, expansion for Gordon to 4 PB, ~100 GB/s
Recent EOT Activity
• Planning
  • Spring vSMP training workshop
  • Track 2D Early User Symposium (in conjunction with TG-11)
  • SDSC Summer Institute on HPC and Data-Intensive Discovery in Environmental and Ecological Sciences, featuring TG resources and TG Science Gateways
• Presenting a poster and hosting the TG booth at the Tapia Conference in April; will promote TG-11, internship and job opportunities, and encourage new TG/XD users
• Computational Research Experience for Undergraduates (CREU) program this spring, and Research Experiences for High School Students (REHS) in summer 2011
  • Last year's program was very successful; applications this year are very strong
• TeacherTECH and StudentTECH programs are continuing, 2-3 per week
• Portal development continues for the Campus Champions and MSI-CIEC communities
• Partnership with the San Diego County chapter of the Computer Science Teachers Association will hold its second joint meeting in May (the first was in February)
• Engaging with a statewide effort to bring CS education to all high schools