
HPC University Training Meeting

HPC University Training Meeting. Welcome!! March 26-27, 2008 http://www.teragridforum.org/mediawiki/index.php?title=Training_Implementation. Goals, Objectives and Outcomes. Understand community needs in relation to available resources Identify competencies and gaps in offerings

harry

Presentation Transcript


  1. HPC University Training Meeting Welcome!! March 26-27, 2008 http://www.teragridforum.org/mediawiki/index.php?title=Training_Implementation

  2. Goals, Objectives and Outcomes • Understand community needs in relation to available resources • Identify competencies and gaps in offerings • Establish mechanisms to disseminate and promote quality resources • Expand breadth and depth of training resources to address community needs • Foster continued information sharing and collaborations • Other?

  3. Perspectives from the Field • Defining HPC University - Scott • HPC RAT recommendations - Laura • Computational Science Competencies - Steve • On-line instruction methodologies - Sandie • Collaborative efforts - Leslie • Petascale training gaps - Shawn • Community engagement strategies - Kathy • Dissemination and Quality assurance - Joiner • Survey feedback - Julia

  4. Defining HPC University • Establishing competencies for skilled HPC educators, researchers, and practitioners • Defining roadmap for acquiring these competencies - from K-12 to researchers • Providing access to high-quality resources • Broadly disseminating information about events, activities, and resources • Cross-cutting among all disciplines • Requires collaboration among multiple agencies and organizations for broad impact • Certificate and degree-granting opportunities

  5. Proposed Discussion Topics • On-line instruction methodologies • Quality assurance – VV&A • Promotion, scaling and dissemination • Petascale training gaps • HPC Roadmap • Collaboration and coordination strategies

  6. Survey Summary • Responses from 12 sites • Audience: current and potential users, undergrads, grads, postdocs, industry, senior researchers, non-traditional communities, professionals, sysadmins

  7. Short-term Goals • Train users in advanced parallel programming (MPI, OpenMP and hybrid) through hands-on workshops, in-depth consulting, and knowledgeable online content to move them into tera- and petascale computing • Help users learn performance tuning, optimization and scaling to petascale systems • Educate users to best use HPC facilities and services • Beginning to advanced courses in parallel programming • Facilitate effective use of high performance computing resources • Disseminate knowledge of tools and application software • Familiarize users with introductory grid computing strategies

  8. Long-term Goals • On-line self-paced training; record all live training sessions • Enhance all training events • Develop next generation of top HPC researchers • Prepare users for effective use of future resources • Prepare people to be effective grid users • Broaden participation in supercomputing amongst a variety of scientific disciplines and user communities • Provide more proactive personalized help, supplemented with online resources and infrastructure more capable of responding quickly to user needs

  9. Selecting Training Topics • Provide both ‘getting started’ courses and advanced courses on scaling and optimization • Help users to effectively use facilities: access machines, use batch systems, program to best use the available hardware, transfer data, use mass storage, use performance and debugging tools, compile for performance, etc. • We select topics using our ticketing system, suggestions from training surveys and email, suggestions from researchers, and topics our HPC support staff are interested in • Interacting with consulting and applications support staff to identify what users need based on their interactions. Depends on subject matter experts being available to develop content • Feedback from workshop evaluations, new architectures and tools, and their fit with Ralph Regula School of Computational Science curricula and competencies

  10. Selecting Topics (cont.) • Topic selection and level customized for the individual • Job management, data management, security, workflow management systems, storage resource managers • Site admin training • Perceived need, user requests, and of course instructor availability • We always have an abundance of introductory courses because we continually have new students on campus. Our clientele is mostly graduate students, so the need is there.

  11. Evaluating Impact • Workshop feedback forms • Annual User Survey • Suggestion period at the end of each full or multi-day event • Post-event evaluation forms for live events • Optional online form for online tutorials • Number of projects ported to the OSG grid, number of jobs run, number of papers published in which our infrastructure was used to produce results, number of new students/faculty joining our efforts, number of grid computing courses introduced at different institutions as a result of our training • Assessment responses about participants’ new knowledge & skills that they can apply to their research after the training class

  12. Live vs. Asynch • Formal training has been done as f2f events • Local presentations are provided as WebEx meetings and teleconferences • We have done a few remote-only events (Access Grid), but they were poorly attended • We try to make all presentation and lab materials available online for reference • Present intermediate to advanced topics at live events and cover introductory topics and how-to programming topics on-line • We hold small-group meetings with discipline-specific groups to gain a greater understanding of their computational and scientific needs • Developing simulation-based modules for use as curriculum in the classroom • Propose to capture as many workshops, seminars, and presentations as possible and deliver them asynchronously

  13. New Development • Additional sources of info available via the Web • Asynch training via web and NCAST videos of current training • New topics as our users become more sophisticated • Multi-core capabilities • Revising User Information website • Tutorials associated with conferences and workshops put online • A textbook lab training text (with exercises) • More asynchronous through web technology • Meeting with members of non-traditional HPC disciplines to identify requirements to bring them to the resources available, looking for ways to help them transform their science • “Introduction to Parallel Programming and MPI” and “Scientific Visualization” using ParaView • Asynchronous webcasts of training classes to broaden participation • Synchronous training class via videoconference (AccessGrid, Polycom)

  14. Major Gaps • Specialized training for computational scientists running on machines vs. computer scientists • Debugging, performance measurement, IO strategies, memory management, project management • Taking a new user from the introductory training sessions to someone who can actually parallelize their thinking and thus their code • Multi-core parallelization • Getting word of HPC libraries to users • Starting a new project, including training on what tools/code/methods are easily available, what resource providers are accessible and how to pick one versus another, scaling • Competencies • Application-specific guidance, e.g. help available for applications in biology, chemistry, mathematics, etc. • Basic and advanced parallel programming and software design • Discipline-specific parallel programming • Real coursework at the university level

  15. Gaps (cont.) • Workshops are too few and far between. Workshop content is not delivered with a synchronous remote capability for interested participants who cannot physically attend. Workshop content is not captured for post-workshop asynchronous delivery. • Inconsistency from one system to the next (compiler commands, for example). Each system should be pre-installed with sample code guaranteed to run on that system as well as supplemental training resources specific to that system. • Online tutorials lack specificity to a particular system; sample code does not run on most. • Lack of proactive personalized support • Scaling up code from tens/hundreds of cores to thousands • Scaling up code to petascale levels, cores >> 2048 • Methodology: synchronous and asynchronous training • Quality Assurance: accurate, verified, and validated training via synchronous and asynchronous methods • Coherent and guided set of online training tutorials/modules

  16. Gaps (cont.) • C programming • Fortran 90/95/2003 programming • Unix and Linux as applied in HPC • Parallel computing/programming • Distributed & grid • Data analysis & visualization

  17. What do you want to learn? • Ideas for training programs and roadmaps • What training topics are offered? Who coordinates and presents? Audiences? Effective methods? Is effective remote training possible and, if so, what technologies are used? Are there opportunities for collaborative training events? • Ideas to improve our training so that users are best served • Interest in joint development of online tutorials and in collaborating on live training • Petascale computing techniques • To identify new and better ways to share materials or develop materials • New and different ways of making training available • Understand the training priorities for other sites, and be sensitive to any political issues

  18. RAT Report Focus • Mentoring, training the trainers, becoming source and editors for CSERD • Provide more details to the training map and the gaps, e.g. identify multiple training paths • Training the trainers • Petascale computing aspects • Identification of a good parallel computing course • VV&A of collected training materials • Targeting underrepresented populations • Capture expertise for asynchronous delivery

  19. Whew! • The scope and need are much broader than anything that can ever be accomplished via the limited funding for training • Setting priorities • Fostering collaboration • Avoid duplication of effort • Share best practices, resources, materials, etc.
