Mid-Range Computing Working Group Report
Assessing and Defining the Future of Institutional Scientific Computing Resources at Berkeley Lab
CSAC and ITSD are working in partnership to determine the value of a Lab-wide Scientific Computing Resource for the future of LBNL scientific research
• Members
• Past and Current Activities
• New Approach and Plan
August 10, 2001 - CSAC meeting

Members of the MRC WG
ITSD and CSAC
• Paul Adams, PBS
• Ali Belkacem, CSD
• Alessandra Ciocio, Physics (chair)
• Ken Downing, LSD
• Doug Olson, NSD
• John Staples, AFRD
• Shaheen Tonse, EETD
• Michel Van Hove, MSD
• Tammy Welcome, NERSC
• Sandy Merola
• Jim Leighton
• Gary Jung
• Jon Bashor (CS)
• Yeen Mankin (CS)
• Erik Richman (TEID)

Past and Current Activities
• Plan for Lecture Series (Ali)
  - To emphasize the use of large-scale computing in furthering scientific computations
  - To raise awareness of mid-range computing among LBNL scientists
• Web-based survey (P. Adams and S. Tonse)
  - To determine the interest and needs of LBNL scientists in the area of MRC and identify key users
• Survey of MRC capability of other Labs (D. Olson and M. Van Hove)
  - To uncover successes and failures of different MRC models
• Study of current usage of computing resources at the Lab (G. Jung)
  - To find out where MRC would fit into the current range of LBNL computers
• Cost estimate for different scenarios (3 years) (T. Welcome and G. Jung)
  - alvarez, alvarez+, new cluster, SMP
• Financial Model (Ali)
• Formal Document (Jon Bashor)
  - To unify all documents and information

Study of current usage of computing resources at the Lab
• To find out where an MRC would fit into the current range of LBNL computers, from Group Server Workstations to NERSC
• By estimating the scale of computation at the Lab and comparing the relative power of the various computing platforms
• MRC may be the right replacement for the ended T3E program
[Figure: Performance versus Users-divisions, with labels NERSC, Alvarez Cluster, PDSF, Small Clusters, Group Server, WS]

LBNL Use of Computing Resources

LBNL Clusters and SMP Systems

Cost Estimate

Financial Model
• The financial model must take into account the fiscal realities of Berkeley Lab
• The Lab has been reducing overhead and this trend is not likely to be reversed
• Relying to a large degree on recharge to fund the operation and upgrades of a facility is not viable
• Some scientific divisions within the Lab already spend a substantial portion of their budget on scientific computing (hardware, software and support) every year
• To provide an attractive alternative, a mid-range computing resource would have to be significantly more powerful than a system that could be procured at the division level, and the associated support costs would have to be shown to be reasonable
A Viable Financial Model
• Strong commitment (and funding) up front from scientific programs and divisions, in conjunction with a contribution from Lab overhead funds
• A facility that essentially belongs to the scientific divisions and is configured with input from the users
• Operation and system management should be funded through overhead and would be provided by the computing support component of ITSD
• Having the system centrally managed would benefit the supporting divisions by relieving them of responsibility for operation and management, software, maintenance costs and cyber security
• The option of leveraging NERSC resources could also be explored
• Divisions supporting the system with funding would receive use of the resource in proportion to their financial support
• Divisions that don’t buy in could still have access to the resource, but on a recharge basis
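The proportional-use rule in the financial model above can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name `allocate_shares`, the division names, the dollar amounts, and the choice of CPU hours as the unit of use do not come from the working group's material.

```python
from fractions import Fraction


def allocate_shares(contributions, total_cpu_hours):
    """Split total_cpu_hours among divisions in proportion to funding.

    contributions maps a division name to an integer funding amount.
    Exact fractions are used so the shares always sum to the total.
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {
        division: total_cpu_hours * Fraction(amount, total)
        for division, amount in contributions.items()
    }


# Hypothetical figures, for illustration only.
contributions = {"Division A": 50_000, "Division B": 30_000, "Division C": 20_000}
shares = allocate_shares(contributions, total_cpu_hours=1_000_000)

for division, hours in shares.items():
    print(f"{division}: {float(hours):,.0f} CPU hours")
```

Divisions that do not buy in are left out of this sketch, since the slides say only that they would be served on a recharge basis and give no rate.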

Formal Document
Assessing and Defining the Future of Institutional Scientific Computing Resources at Berkeley Lab
• A proposal compiled by the Mid-Range Computing Working Group of the Computing and Communications Services Advisory Committee and the Information Technologies and Services Division
• Sections:
  - Mid-Range Computing Working Group Members
  - Executive Summary
  - Is an Institutional Mid-Range Computing Resource Appropriate for Berkeley Lab?
  - How the Working Group is Proceeding
  - Two Critical Components for Success
  - History and Current Status of High-Performance Computing at Berkeley Lab
  - What are Berkeley Lab’s MRC Options?
  - A Financial Model for Institutional Mid-Range Computing
  - Where do we go from here?
• Appendices:
  - Survey of key users of scientific computing at LBNL
  - Information on MRC at other labs
  - Cost estimate

New Approach
• Original process:
  - Lecture series/Publicity
  - Identify key users through web-based survey
  - Division Directors buy in
  - Retreat with key users and technical experts to define architecture
  - Recommendations to upper-level Lab management
Preliminary positive feedback from upper-level Lab management suggests following a different (top-down) approach and defining a more concrete proposal.
• Therefore:
  - We recognize that the target computer will be alvarez
  - We are going to redirect the effort to acquiring alvarez once it becomes available
  - We are going to work on defining more clearly the costs and schedules associated with promoting alvarez to an MRC facility at LBNL
• Since alvarez is presently not really a user facility, converting it into one will add some costs to those of the present operation. The first year of MRC would probably see a substantial contribution from LBNL, but by the third year the facility should be at least 80% self-sufficient

Direct impressions of alvarez
• The MRC WG met with Rob Ryne to discuss his alvarez computing experience
• The programming environment is good
• The compilers are of good quality
• Alvarez is not production-ready yet
  - The one support person is very competent, but not 100% devoted to this one machine
  - Needs a dedicated support person
• The software support does not yet include any large math packages, although it has been adequate so far
• Users should be guaranteed a minimum number of CPU cycles and access
• The ideal alvarez environment includes support for consulting, software help and installation, and maintenance. One person may be enough to do this for a 160-node machine
• General comments:
  - Level of support: an MRC machine would receive more support attention than an individual small cluster
  - Operating money for alvarez must be a mixture of specific user contributions and Lab overhead
  - To jump-start the process, the Lab subsidy would be larger in the startup years
  - We can't use NERSC people to administer MRC long-term, but we can leverage their expertise at startup

Plan
• Find out which divisions are interested and about to acquire or expand their own cluster
• Need to find 5-10 potential users who would not upgrade their own computing facilities, but would support alvarez instead, as an initial critical mass for MRC
  - CSAC members to help
• Demonstrate that alvarez may provide a more cost-effective platform for scientific computing
• If alvarez grew (to 1000 nodes):
  - Economically more viable
  - The total number of administrators would be reduced
  - Reliability of MRC would be better than that of private machines, as it would be administered professionally
• Users should be guaranteed a minimum number of CPU cycles and access
• Consulting and support, as well as HPSS and large peak capability, are the attractions for users moving from private machines to NERSC
• Get potential future users involved in early discussions to define the requirements
• Further discussion with main alvarez users to evaluate the advantages and usefulness of the cluster and to gain a preliminary understanding of requirements for possible upgrades
• Executive summary meeting of MRC WG with McCurdy
• Workshop with users and technical experts to further develop an MRC facility based on alvarez