UC Cloud Summit 2011 – LBL Campus Update
Krishna Muriki (kmuriki@lbl.gov)
High Performance Computing Services (HPCS), IT Division
HPC activities – Condo Cluster Computing:
• Model
  • A new cluster support model.
  • Aims for flexibility, sharing and better utilization of hardware.
  • PIs purchase cluster hardware (nodes, leaf switches and cables).
  • PIs can purchase storage in addition to what is provided.
  • Hardware in the condo must be refreshed within 4 years.
  • PIs get free compute time equivalent to their contribution.
• Making a Condo
  • Hardware is connected to and shared with the institutional cluster, Lawrencium.
  • PI-purchased storage is accessible on all the condo nodes.
  • Scheduling policies are tuned to give faster turnaround time for PI jobs.
• Advantages
  • Monthly cluster support charges are waived.
  • Flexibility for PIs to use more resources (than purchased) when needed.
  • Easy mechanism to share idle resources with other users in the Lab.
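As a rough illustration of "free compute time equivalent to their contribution", the sketch below converts a PI's purchased nodes into a monthly core-hour entitlement and reports how much usage went beyond it. The accounting unit (core-hours), the 30-day period, and all names are assumptions made for this example; the slides do not describe how Lawrencium actually accounts for condo usage.

```python
from dataclasses import dataclass

# Assumed accounting period: a 30-day month (hypothetical, not from the slides).
HOURS_PER_MONTH = 30 * 24


@dataclass
class Contribution:
    """Hardware a PI has added to the condo (hypothetical record)."""
    pi: str
    nodes: int
    cores_per_node: int


def monthly_core_hours(c: Contribution) -> int:
    """Core-hours per month equivalent to the contributed hardware."""
    return c.nodes * c.cores_per_node * HOURS_PER_MONTH


def overflow_core_hours(c: Contribution, used_core_hours: int) -> int:
    """Core-hours used beyond the contribution (borrowed idle capacity)."""
    return max(0, used_core_hours - monthly_core_hours(c))


if __name__ == "__main__":
    group = Contribution(pi="example_pi", nodes=4, cores_per_node=8)
    print(monthly_core_hours(group))
    print(overflow_core_hours(group, used_core_hours=25_000))
```

A nonzero overflow corresponds to the "use more resources than purchased" case listed under Advantages; how such usage is prioritized would depend on the scheduler configuration.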
IT Cloud Developments:
• Evaluating services for the past 3 years
  • Google Apps including Google Docs and Sites
  • Google Calendar
  • Collaborative services like Manymoon and Smartsheet.
• Business Systems
  • Point and Ship (for managing shipping)
  • Daptiv (Ops project management)
• All systems leverage IT’s Identity Management infrastructure (SAML/Shibboleth).
• Future Cloud Developments:
  • Additional Google apps like Google Code, Reader and Picasa
  • Taleo, a SaaS talent management application.
  • Carbonite, a SaaS service for user-managed desktop backups.
IT service - Virtual Machine Hosting:
• VMware-based virtual machine environment
• Over 100 virtual machines running
IT service - Cloud Hosting (Amazon EC2 server):
• Provides computing resources on Amazon’s AWS platform.
• CentOS AMIs with standard IT monitoring tools
• Option to create a VPN connection to LBL.
• IT manages the OS and Amazon layers
Support from Network Infrastructure - ESnet
• ESnet peers with multiple cloud providers (including Amazon, Google, Microsoft).
• When possible, we peer in multiple locations (Bay Area, Chicago, etc.) and we're eager to peer with other providers as well.
• We're interested in pushing advanced network services (including virtual circuits and performance monitoring) into cloud contexts.
• Multiple DOE-funded scientists are actively researching clouds for computation, services, storage.
• Several DOE sites are sourcing cloud services.
• Questions about ESnet and cloud? Please send email to routing@es.net
Experiments with Amazon EC2 services:
• Seeking Supernovae in the Clouds: A Performance Study
  – K. Jackson, L. Ramakrishnan, K. Runge, R. Thomas.
  • AWS can be very useful for scientific computing.
  • Porting today requires significant effort.
  • Failures occur frequently and applications must be able to handle them gracefully.
• Performance Analysis of HPC applications on the AWS Cloud
  – K. Jackson, L. Ramakrishnan, K. Muriki, S. Cannon, S. Cholia, J. Shalf, H. Wasserman, N. Wright.
  • The data show that the more an application communicates, the worse its EC2 performance becomes.
  • The shared nature of the virtualized environment causes significant variability in EC2 performance.
• Berkeley Lab Contributes Expertise to New Amazon Web Services Offering
  • “When we applied these tests to the new Cluster Compute Instances for Amazon EC2, we found that the new offering performed 8.5 times faster than the previous Amazon instance types.” -- K. Jackson.
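The observation that failures occur frequently and must be handled gracefully can be sketched as a retry wrapper with exponential backoff. This is a generic pattern and not the approach used in the studies above; `run_with_retries`, its parameters, and the injectable `sleep` argument are hypothetical names chosen for this example.

```python
import time


def run_with_retries(task, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**attempt seconds between attempts. Catching
    Exception broadly is a simplification for this sketch; real code
    would usually retry only on errors known to be transient.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception as exc:
            last_error = exc
            # Do not sleep after the final failed attempt.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_step():
        # Simulated transient failure: fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise IOError("simulated instance failure")
        return "done"

    print(run_with_retries(flaky_step, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays.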
Cloud computing for Science.
– G. Bell, K. Jackson, G. Kurtzer, J. Li, K. Muriki, L. Ramakrishnan, J. White.
• Large scale MPI has a high overhead on EC2.
• Enables data-intensive science.
HPC Cloud Applied to Lattice Optimization
– C. Sun, H. Nishimura, S. James, K. Song, K. Muriki, Y. Qin
• “Increased performance of the recently introduced AWS CCI instances better meet the needs of scientific community, however EC2 may work less well for large-scale parallel applications that depend heavily on memory and interconnect performance. It remains important for researchers to benchmark their particular application and review the local costs when making a decision to use the cloud.”
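The advice to benchmark a particular application and review costs can be sketched as a small timing harness that reports the mean runtime, the run-to-run variability, and an estimated cost per run. All function names are hypothetical, and the hourly rate is an input the reader must supply from their own local or cloud pricing; no figures from the studies above are reproduced here.

```python
import statistics
import time


def time_runs(workload, repeats=5):
    """Return wall-clock seconds for each of `repeats` calls to workload()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Mean runtime and coefficient of variation (population std dev / mean)."""
    mean = statistics.mean(timings)
    spread = statistics.pstdev(timings)
    return {"mean_s": mean, "cv": spread / mean if mean else 0.0}


def cost_per_run(mean_seconds, nodes, rate_per_node_hour):
    """Estimated cost of one run, given a caller-supplied hourly node rate."""
    return mean_seconds / 3600.0 * nodes * rate_per_node_hour


if __name__ == "__main__":
    # Stand-in workload; replace with a call that launches the real application.
    stats = summarize(time_runs(lambda: sum(range(100_000)), repeats=5))
    print(stats)
```

Running the same harness on local hardware and on a cloud instance gives comparable numbers; a high coefficient of variation on one platform is the kind of variability the EC2 performance study reports.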
NERSC Initiatives:
• Magellan Project Mission:
  • Determine the appropriate role for commercial and/or private cloud computing for DOE/SC midrange workloads.
  • Deploy a test bed cloud to serve the needs of mid-range scientific computing.
  • Evaluate the effectiveness of this system for a wide spectrum of DOE/SC applications in comparison with other platform models.