Research Computing, University of South Florida: Providing Advanced Computing Resources for Research and Instruction through Collaboration
Mission • Provide the advanced computing resources required by a major research university • Software • Hardware • Training • Support
User Base • 40 research groups • 6 colleges • 100 faculty • 300 students
Hardware • System was built on the condominium model and consists of 300 nodes (2,400 processors) • University provides infrastructure and some computational resources • Faculty funding provides the bulk of computational resources
Software • Over 50 scientific codes • Installation • Integration • Upgrades • Licensing
Support Personnel • Provide all systems administration • Software support • One-on-one consulting • System efficiency improvements • Users are no longer just the traditional “number crunchers”
Current Projects • Consolidating the last standalone cluster (of appreciable size) • Advanced Visualization Center • Group of 19 faculty applied for funding • Personnel • Training • Large-resolution 3D display
Current Projects • New computational resources • Approximately 100 nodes • GPU resources • Upgrade parallel file system • Virtual clusters • HPC for the other 90% • FACC
Florida State University's Shared HPC Building and Maintaining Sustainable Research Computing at FSU
Shared-FSU HPC Mission • Support multidisciplinary research • Provide a general access computing platform • Encourage cost sharing by departments with dedicated computing needs • Provide a broad base of support and training opportunities
Turn-key Research Solution: Participation is Voluntary • University provides staffing • University provides general infrastructure • Network fabrics • Racks • Power/Cooling • Additional buy-in incentives • Leverage better pricing as a group • Matching funds • Offer highly flexible buy-in options • Hardware purchase only • Short-term Service Level Agreements • Long-term Service Level Agreements • Shoot for 50% of hardware costs covered by buy-in
Research Support @ FSU • 500 plus users • 33 Academic Units • 5 Colleges
HPC Owner Groups • 2007 • Department of Scientific Computing • Center for Ocean-Atmosphere Prediction Studies • Department of Meteorology • 2008 • Gunzburger Group (Applied Mathematics) • Taylor Group (Structural Biology) • Department of Scientific Computing • Kostov Group (Chemical & Biomedical Engineering) • 2009 • Department of Physics (HEP, Nuclear, etc.) • Institute of Molecular Biophysics • Bruschweiler Group (National High Magnetic Field Laboratory) • Center for Ocean-Atmosphere Prediction Studies (with the Department of Oceanography) • Torrey Pines Institute of Molecular Studies • 2010 • Chella Group (Chemical Engineering) • Torrey Pines Institute of Molecular Studies • Yang Group (Institute of Molecular Biophysics) • Meteorology Department • Bruschweiler Group • Fajer Group (Institute of Molecular Biophysics) • Bass Group (Biology)
Research Support @ FSU • Publications • Macromolecules • Bioinformatics • Systematic Biology • Journal of Biogeography • Journal of Applied Remote Sensing • Journal of Chemical Theory and Computation • Physical Review Letters • Journal of Physical Chemistry • Proceedings of the National Academy of Sciences • Biophysical Journal • PLoS Pathogens • Journal of Virology • Journal of the American Chemical Society • The Journal of Chemical Physics • PLoS Biology • Ocean Modelling • Journal of Computer-Aided Molecular Design
FSU’s Shared-HPC, Stage 1: InfiniBand-Connected Cluster [diagram: the Sliger Data Center housing the Shared-HPC cluster and its parallel file system (pfs)]
FSU’s Shared-HPC, Stage 2: Alternative Backfilling [diagram: a Condor pool in the DSL Building added alongside the Sliger Data Center’s Shared-HPC cluster and parallel file system]
Backfilling Single Proc Jobs on Non-HPC Resources Using Condor
Condor Usage • ~1,000 processor cores available for single-processor computations • 2,573,490 processor hours used since Condor was made available to all HPC users in September • Seven users have been using Condor from HPC • Dominant users are Evolutionary Biology, Molecular Dynamics, and Statistics (the same users that were submitting numerous single-proc. jobs) • Two workshops introducing it to HPC users
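For illustration, a minimal sketch of how a user might hand a batch of independent single-processor jobs to Condor for backfilling. The submit keywords and condor_submit are standard HTCondor; the script name, file names, and job count here are hypothetical.

```python
# Minimal sketch of backfilling single-processor work through Condor.
# Assumptions: HTCondor's command-line tools are on PATH and the pool accepts
# vanilla-universe jobs; "run_analysis.sh" and the job count are made up.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    # vanilla universe: ordinary single-processor batch jobs
    universe     = vanilla
    executable   = run_analysis.sh
    arguments    = $(Process)
    output       = job_$(Process).out
    error        = job_$(Process).err
    log          = backfill.log
    request_cpus = 1
    queue 100
""")

with open("backfill.sub", "w") as handle:
    handle.write(submit_description)

# condor_submit hands the 100 independent jobs to the local scheduler, which
# backfills them onto idle non-HPC machines in the Condor pool.
subprocess.run(["condor_submit", "backfill.sub"], check=True)
```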
FSU’s Shared-HPC, Stage 3: Scalable SMP [diagram: an SMP resource added to the Sliger Data Center’s Shared-HPC cluster and parallel file system, with the Condor pool in the DSL Building]
FSU’s Shared-HPC, Stage 3: Scalable SMP • One Moab queue for SMP or very large memory jobs • Three “nodes” • M905 blade with 16 cores and 64 GB memory • M905 blade with 24 cores and 64 GB memory • 3Leaf system with up to 132 cores and 528 GB memory
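As a rough illustration of how such a queue might be used, here is a hedged sketch of a Moab submission; the queue name, resource limits, and application name are assumptions for the example, not documented FSU settings.

```python
# Hypothetical sketch of submitting a large-memory job to the SMP queue via Moab.
# Assumptions: msub is available and honors PBS-style directives; the queue name
# "smp", the 24-core/60 GB request, and "./large_memory_app" are made up.
import subprocess
import textwrap

job_script = textwrap.dedent("""\
    #!/bin/bash
    #PBS -q smp
    #PBS -l nodes=1:ppn=24
    #PBS -l mem=60gb
    #PBS -l walltime=24:00:00
    cd $PBS_O_WORKDIR
    ./large_memory_app
""")

with open("smp_job.sh", "w") as handle:
    handle.write(job_script)

# msub places the job on the Moab queue that routes to the SMP / large-memory nodes.
subprocess.run(["msub", "smp_job.sh"], check=True)
```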
[diagram: the Condor pool in the DSL Building; the Sliger Data Center with the Shared-HPC cluster, parallel file system, and SMP resource; and the DSL Data Center with secondary file system and visualization resources]
Interactive Cluster: Functions • Facilitates data exploration • Provides a venue for software not well suited to a batch-scheduled environment (e.g., some MATLAB, VMD, R, Python, etc.) • Provides access to hardware not typically found on standard desktops/laptops/mobile devices (e.g., lots of memory, high-end GPUs) • Provides licensing and configuration support for software applications and libraries
Interactive Cluster: Hardware Layout • 8 high-end CPU-based host nodes • Multi-core Intel or AMD processors • 4 to 8 GB of memory per core • 16x PCIe connectivity • QDR IB connectivity to Lustre storage • IP (read-only) connectivity to Panasas • 10 Gbps connectivity to campus network backbone • One C410x external PCIe chassis • Compact • IPMI management • Supports up to 16 NVIDIA Tesla M2050s • Up to 16.48 teraflops
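The 16.48-teraflop figure is consistent with 16 M2050 cards at roughly 1.03 TFLOPS single-precision peak each; a quick check, assuming the quoted number is the aggregate single-precision peak:

```python
# Sanity check of the slide's aggregate peak figure (assumption: the quoted
# 16.48 teraflops is single-precision peak across 16 Tesla M2050 cards).
cards = 16
tflops_per_card = 448 * 2 * 1.15e9 / 1e12   # 448 CUDA cores x 2 FLOPs x 1.15 GHz ~= 1.03
print(f"{cards * tflops_per_card:.2f} TFLOPS")  # ~16.5, consistent with the quoted 16.48
```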
[diagram: the Condor pool in the DSL Building; the Sliger Data Center with the Shared-HPC cluster, parallel file system, and SMP resource; and the DSL Data Center with secondary file system, visualization resources, and database/web servers]
Web/Database Hardware: Function • Facilitates creation of data analysis pipelines/workflows • Favored by external funding agencies • Demonstrates a cohesive cyberinfrastructure • Fits well into required Data Management Plans (NSF) • Intended to facilitate access to data on secondary storage or cycles on an owner’s share of HPC • Basic software install, no development support • Bare metal or VM
FSU Research CI [diagram: HTC, HPC, and SMP resources, database and web servers, primary and secondary storage, and visualization/interactive resources]
Florida State University's Shared HPC • Universities are by design multifaceted and lack a singular focus of support • Local HPC resources should also be multifaceted and have a broad basis of support
University of Florida HPC Center
Short history • Started in 2003 • 2004 Phase I: CLAS – Avery – OIT • 2005 Phase IIb: COE – 9 investors • 2007 Phase IIb: COE – 3 investors • 2009 Phase III: DSR – 17 investors – ICBR – IFAS • 2011 Phase IV: 22 investors
Budget • Total budget • 2003-2004: $0.7 M • 2004-2005: $1.8 M • 2005-2006: $0.3 M • 2006-2007: $1.2 M • 2007-2008: $1.6 M • 2008-2009: $0.4 M • 2009-2010: $0.9 M
Hardware • 4,500 cores • 500 TB storage • InfiniBand connected • In three machine rooms • Connected by 20 Gbit/sec Campus Research Network
System software • RedHat Enterprise Linux through the free CentOS distribution • Upgrade once per year • Lustre file system mounted on all nodes • Scratch only • Backup provided through CNS service • Requires a separate agreement between researcher and CNS
Other software • Moab scheduler (commercial license) • Intel compilers (commercial license) • Numerous applications • Open and commercial
Operation • Shared cluster • Some hosted systems • 300 users • 90%–95% utilization
Investor Model • Normalized Computing Unit (NCU) • $400 per NCU • Is one core • In a fully functional system (RAM, disk, shared file system) • For 5 years
Investor Model • Optional Storage Unit (OSU) • $140 per OSU • 1 TB of file storage (RAID) on one of a few global parallel file systems (Lustre) • For 1 year
Other options • Hosted system • Buy all hardware, we operate • No sharing • Pay as you go • Agree to pay a monthly bill • Equivalent (almost) to the $400 NCU prorated on a monthly basis • Or roughly $0.009 per core-hour • Cheaper than Amazon Elastic Compute Cloud
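A quick check of the prorated rate quoted above, assuming round-the-clock availability over the full five-year NCU term:

```python
# Prorating one $400 NCU (one core for 5 years) to an hourly rate.
# Assumption: a 365-day year and 24x7 availability.
ncu_price_dollars = 400.0
hours_in_term = 5 * 365 * 24          # 43,800 hours
rate = ncu_price_dollars / hours_in_term
print(f"${rate:.4f} per core-hour")   # ~$0.0091, i.e. roughly 0.9 cents per hour
```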
Mission Statement • UM CCS is establishing nationally and internationally recognized research programs, focusing on those of an interdisciplinary nature, and actively engaging in computational research to solve the complex technological problems of modern society. We provide a framework for promoting collaborative and multidisciplinary activities across the University and beyond.
CCS overview • Started in June 2007 • Faculty Senate approval in 2008 • Four founding schools: A&S, CoE, RSMAS, Medical • Offices on all campuses • ~30 FTEs • Data center at the NAP of the Americas
UM CCS Research Programs and Cores • Physical Science & Engineering • Data Mining • Computational Biology & Bioinformatics • Visualization • Computational Chemistry • Social Systems Informatics • High Performance Computing • Software Engineering
Quick Facts • Over 1,000 UM users • 5,200 cores of Linux-based cluster • 1,500 cores of Power-based cluster • ~2.0 PB of storage • 4.0 PB of backup • More at: • http://www.youtube.com/watch?v=JgUNBRJHrC4 • www.ccs.miami.edu
High Performance Computing • UM-Wide Resource Provides the Academic Community & Research Partners with Comprehensive HPC Resources: • Hardware & Scientific Software Infrastructure • Expertise in Designing & Implementing HPC Solutions • Designing & Porting Algorithms & Programs to Parallel Computing Models • Open access to compute processing (first come, first served) • Peer Review for large projects – Allocation Committee • Cost Center for priority access • HPC services • Storage Cloud • Visualization and Data Analysis Cloud • Processing Cloud