
EUFORIA Grid and HPC access for Fusion Marcin Plociennik (PSNC, Poland)

The EUFORIA project provides Grid and HPC infrastructure and support for fusion research, focusing on code adaptation, workflows, visualization, and training. It advances the fusion computing paradigm through middleware developments and scientific workflow visualization.


Presentation Transcript


  1. EUFORIA Grid and HPC access for Fusion. Marcin Plociennik (PSNC, Poland), on behalf of the EUFORIA project. 15 September 2010, Lisbon

  2. Chalmers University of Technology (Coordinator) from Sweden • Max Planck Institute for Plasma Physics (IPP) from Germany • Consejo Superior de Investigaciones Científicas (CSIC) from Spain • Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT) from Spain • Forschungszentrum Karlsruhe (FZK) from Germany • Finnish IT Center for Science (CSC) from Finland • Abo Akademi University (ABO) from Finland • University of Edinburgh (UEDIN) from the United Kingdom • Barcelona Supercomputing Center (BSC) from Spain • French Atomic Energy Commission (CEA) from France • University of Strasbourg from France • University of Ljubljana (UOL) from Slovenia • Poznan Supercomputing and Networking Center (PSNC) from Poland • Italian National Agency for New Technologies, Energy and the Environment (ENEA) from Italy

  3. EUFORIA • 14 member institutes • 522 person-months covering: • Management (NA1) • Training (NA2) • Dissemination (NA3) • Grid and HPC infrastructure & support (SA1, SA2, SA3) • Code adaptation & optimization (Grid: JRA1, HPC: JRA2) • Workflows (JRA3) • Visualization (JRA4)

  4. Supporting fusion users • Providing infrastructure • Grid (parallel and serial) and HPC infrastructures and support • EUFORIA Grid infrastructure (2,500 CPUs, 50 TB) • HPC infrastructure available for application development and proof-of-principle runs (BSC, CSC, EPCC) • Provide application porting of selected codes to both Grid and HPC • EFDA decision: focus on edge and core turbulence and transport • Provide training • Use of and adaptation for Grid and HPC technologies • Direct code adaptation for selected codes and tools • Help users to “self-help”; EUFORIA provides much of the training for this meeting • Provide extended toolkits for existing infrastructure • Visualization, workflow extensions, middleware developments

  5. Developing a new paradigm for fusion computing (scientific workflow, visualization, Grid, HPC) • Building on e-infrastructure tools, middleware and installations • Integrating tools and physics models together with a “fusion simulation ontology” • At least initially, building on fusion de facto standards for data access and communication

  6. EUFORIA Grid Infrastructure (diagram) • Core services @ IFCA, integrated in the ES-NGI infrastructure: BDII (gridbdii01.ifca.es), LFC (gridlfc01.ifca.es), VOMS (voms01.ifca.es), WMS (gridwms01.ifca.es), CrossBroker (gridxb01.ifca.es) • Migrating Desktop and RAS (iwrb2.fzk.de) • i2g MPI-Start enabled worker nodes • Local sites: CE, SE, local storage

  7. EUFORIA Grid Infrastructure • All central services running at IFCA • Updated to the latest gLite versions • Maintained and integrated in the ES-NGI • Local services: • IFCA in Santander, Spain • FZK in Karlsruhe, Germany • Chalmers University, Sweden • 2 sites from CIEMAT (Trujillo and Madrid), Spain • Currently available resources for EUFORIA users: ~3,100 CPUs, ~2 TB online storage

  8. Monitoring Grid Infrastructure • Monitoring ensures proper functioning of the Grid infrastructure • Central monitoring with AGREST, http://devel.ifca.es/agrest/ • Web interface for monitoring results available at http://monitor.ifca.es:8080/show/latests/EUFORIA

  9. The Grid Codes • Different code domains and different parallel strategies: • GEM: linear & non-linear turbulence gyrofluid code (core transport). MPI. • BIT1: divertor code (SOL transport). Parameter scan. • EIRENE: neutral transport for tokamaks & stellarators (neutral transport). Monte Carlo code. • DAB (Distributed Asynchronous Bees): tool for the optimization of any concept in fusion devices. Asynchronous algorithm. • Plus the previously gridified codes (taken from the EGEE code platform), suitable for workflows: ISDEP (MC transport), Mishka & Helena (equilibrium and MHD), VMEC (3D equilibrium, suitable for tokamaks and stellarators).

  10. JRA2 – HPC Porting and Optimisation • Work with a group of ITM codes • Target improving performance and scaling • Aiming at ITER-level simulations • Generally one or two orders of magnitude in computational requirements (i.e. run time, number of processors, etc.) • Wide range of codes • From highly parallelised to still serial • Targeting MPI scaling, serial performance, I/O, OpenMP parallelisation, etc. (each code presented different challenges and required different approaches) • Aim to work closely with developers • Integrate changes back into the “main” source

  11. EUFORIA HPC – available resources • HECToR (EPCC, UK), Cray XT6: 44,544 cores, 59.4 TB RAM, 360 Tflops peak, No. 16 in the Top500 (274 Tflops LINPACK) • Louhi (CSC, Finland), Cray XT4/XT5: 10,860 cores, 4.5 TB RAM, 102 Tflops peak, No. 74 in the Top500 (76 Tflops LINPACK) • MareNostrum (BSC, Spain), IBM cluster: 10,240 cores, 20 TB RAM, 94 Tflops peak, No. 87 in the Top500 (63 Tflops LINPACK)

  12. Example work • MPI optimisation • Replaced MPI_Sendrecv exchanges with MPI_Alltoall • Provides much better scaling for large all-to-all communications • I/O optimisation • Reduced I/O runtime by an order of magnitude by correctly tuning the I/O functionality on the HPC system
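
The slide only names the change, so here is a minimal sketch of what it means in practice. It is not taken from any EUFORIA code: it uses mpi4py (a Python MPI binding) and an arbitrary block size n purely for illustration, contrasting a loop of pairwise MPI_Sendrecv exchanges with the single MPI_Alltoall collective that replaces them.

# Illustrative only: contrast pairwise Sendrecv with a single Alltoall.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 4                                          # elements exchanged per rank pair
sendbuf = np.full(size * n, rank, dtype='i')   # data destined for every rank
recv_p2p = np.empty(size * n, dtype='i')
recv_coll = np.empty(size * n, dtype='i')

# Variant 1: one pairwise MPI_Sendrecv per partner rank.
for peer in range(size):
    if peer == rank:                           # local block: plain copy
        recv_p2p[peer * n:(peer + 1) * n] = sendbuf[peer * n:(peer + 1) * n]
        continue
    comm.Sendrecv(sendbuf[peer * n:(peer + 1) * n], dest=peer, sendtag=0,
                  recvbuf=recv_p2p[peer * n:(peer + 1) * n], source=peer, recvtag=0)

# Variant 2: the equivalent single collective; the MPI library can schedule
# the whole exchange internally, which typically scales much better.
comm.Alltoall(sendbuf, recv_coll)

assert np.array_equal(recv_p2p, recv_coll)

Run with, e.g., mpirun -n 4 python alltoall_sketch.py.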

  13. Example work (before/after comparison plots) • Serial optimisations • Reduced the runtime of the Arakawa algorithm by as much as half by optimising with knowledge of the particular processor and instruction set • Parallel scaling • Improved the scaling of a parallel code significantly, enabling it to scale from 64 to 512 processors

  14. Work on workflows • Easy use of supercomputers: make HPC feel like a PC • No manual JDL (Job Description Language) • No manual file transfer and storage • Hide the various architectures: • Grid: UI + middleware (Globus, gLite, i2g, …) • HPC: IBM, Cray, … + middleware (UNICORE, GT4, …) • Cloud: Amazon, Google, … • Integration in the ITM framework • Orchestration tool: Kepler • Fusion-oriented data structure, but applicable to any XML data structure: ITM schemas & UAL

  15. Principles • One layer (the RAS) in charge of job execution on the various architectures, sitting between the ITM framework (Kepler) and the backends: Grid gLite, Grid i2g, HPC UNICORE, Cloud • Extend the UAL to the supercomputer architectures (a sketch of such a dispatch layer follows below)
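
The following Python sketch is purely illustrative and is not the RAS code; every class and method name in it is hypothetical. It only shows the general idea of hiding several submission backends (gLite, UNICORE, a cloud) behind one interface, so the workflow layer never deals with JDL or architecture-specific details.

# Illustrative only: hypothetical names, not the actual RAS interface.
from abc import ABC, abstractmethod


class JobBackend(ABC):
    """Common interface the workflow layer sees for every architecture."""

    @abstractmethod
    def submit(self, executable, arguments):
        """Submit a job and return an opaque job identifier."""


class GliteBackend(JobBackend):
    def submit(self, executable, arguments):
        # A real implementation would build a JDL description and hand it
        # to the gLite WMS; here we only return a fake identifier.
        return "glite://" + executable


class UnicoreBackend(JobBackend):
    def submit(self, executable, arguments):
        # A real implementation would build a UNICORE job description.
        return "unicore://" + executable


BACKENDS = {"grid": GliteBackend(), "hpc": UnicoreBackend()}


def run_job(architecture, executable, arguments):
    """Single entry point: the caller never sees the backend details."""
    return BACKENDS[architecture].submit(executable, arguments)


if __name__ == "__main__":
    print(run_job("hpc", "gem_simulation", ["--input", "case.xml"]))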

  16. Achievements • RAS server: extension towards new architectures (HPC, DEISA) • Integration in the ITM framework: creation of Kepler actors and composite actors

  17. Achievements (cont’d) • Job launching/orchestration can be simplified • Extension of the UAL to Grid & HPC: • Based on an MDSplus server • Example: workflow + UAL

  18. Workflow using the RAS actors • A large number of inputs has to be specified by the user. • The user has to integrate access to the ITM database themselves if needed.

  19. Users need more: HPC2K • One-click tool: • The user provides a subroutine • The tool creates the job • The tool reads and writes the data • The tool creates a component (Kepler actor) • The user copies and pastes it into a workflow • (Screenshot: HPC2K dialog with project, actor name & other parameters, arguments, Grid or HPC infrastructure, and Kepler & UAL settings)

  20. Kepler: launches the HPC/GRID actor generated by HPC2K, specifying the current simulation time. • HPC/GRID actor: generates the input file from the data on the input port; uploads the input file, the executable and the libraries; launches the script on the Grid/HPC; downloads the output file; sends the results to the workflow. • Shell script: sets the environment and runs the wrapper. • I/O wrapper: gets the ITM data using the UAL (CPO time slice); generates the output file; writes the output CPO. • Simulation code: receives the CPO, gets the internal data, does the physics computation and updates the variables inside the CPO.
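
To make the chain above concrete, here is a rough Python sketch of what such an I/O wrapper does; the user_routine function, the file format and every name in it are hypothetical stand-ins. The real wrapper reads and writes CPOs through the UAL, which is not reproduced here.

# Hypothetical sketch of an HPC2K-style I/O wrapper; names and file formats
# are placeholders, not the real generated code or the real UAL API.
import json


def user_routine(cpo_slice):
    """Stands in for the physics subroutine supplied by the user."""
    cpo_slice["ip"] = 23  # e.g. derive an integer quantity from the CPO
    return cpo_slice


def run_wrapper(input_path, output_path):
    # 1. The actor has staged an input file describing which CPO time
    #    slice to read; here we mimic it with a plain JSON file.
    with open(input_path) as f:
        request = json.load(f)

    # 2. A real wrapper would open the database through the UAL here and
    #    fetch the CPO at request["time"]; we fake the slice instead.
    cpo_slice = {"time": request["time"], "ip": None}

    # 3. Run the simulation code on the slice.
    result = user_routine(cpo_slice)

    # 4. Write the output file for the actor to download and feed back
    #    into the workflow.
    with open(output_path, "w") as f:
        json.dump(result, f)


if __name__ == "__main__":
    with open("request.json", "w") as f:
        json.dump({"time": 0.1}, f)
    run_wrapper("request.json", "result.json")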

  21. Fusion VO HPC-Grid workflow • Kepler launches the different actors and organizes the workflow. • Kepler runs on the fusion Gateway (Frascati, Italy) for managing the data. • One actor (ASTRA) runs on HPC: HPC-FF (Juelich, Germany), MareNostrum (BSC, Spain) or Altamira (IFCA, Spain). • The other actor (TRUBA, a ray-tracing code) runs on the Grid as thousands of jobs; a single ray is a job. • Data exchanged: n and T files (kB), power deposition profile (kB), equilibrium (50 MB).

  22. Achievements: meta-workflow • Launching a workflow on the Grid from a workflow on the Gateway, via the RAS on euforia.efda-itm.eu

  23. Visualization framework • matplotlib: 1D and 2D plots, from simple (1 line of Python) to complex (full script with numpy transformations), highly customizable • VisIt: 1D, 2D, 3D (and higher) plots in a few clicks • (Architecture diagram: fusion data accessed through the UAL library; the Python interface feeds stand-alone tools and a Python actor, the C++ interface feeds a VisIt plug-in and a VisIt actor as integrated visualization tools)

  24. Python / matplotlib • Simple plots in a few seconds • Use the Python UAL interface to get data into numpy arrays • Call matplotlib functions to plot the numpy arrays
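
A minimal sketch of such a plot. Since the slide does not show the actual UAL calls, the data-retrieval step is replaced by a synthetic numpy profile; the quantity, shape and values are invented for illustration.

# Minimal matplotlib sketch: the real version would fill `te` from the
# Python UAL interface (e.g. a CPO field); here we use synthetic data.
import numpy as np
import matplotlib.pyplot as plt

rho = np.linspace(0.0, 1.0, 101)          # normalised radius
te = 5.0 * (1.0 - rho**2) ** 1.5          # fake electron temperature profile [keV]

plt.plot(rho, te)
plt.xlabel("normalised radius")
plt.ylabel("Te [keV]")
plt.title("Electron temperature profile (synthetic data)")
plt.show()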

  25. Examples

  26. VisIt • Open a .ual file containing the CPO info (shot, run, name), select one of the available plots for a field of this CPO (requires CPODef.xml enrichment), then click Draw!

  27. Examples

  28. Python and VisIt in Kepler

  29. Thanks • Chalmers University of Technology (Coordinator) from Sweden • Max Planck Institute for Plasma Physics (IPP) from Germany • Consejo Superior de Investigaciones Científicas (CSIC) from Spain • Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT) from Spain • Forschungszentrum Karlsruhe (FZK) from Germany • Finnish IT Center for Science (CSC) from Finland • Abo Akademi University (ABO) from Finland • University of Edinburgh (UEDIN) from the United Kingdom • Barcelona Supercomputing Center (BSC) from Spain • French Atomic Energy Commission (CEA) from France • University of Strasbourg from France • University of Ljubljana (UOL) from Slovenia • Poznan Supercomputing and Networking Center (PSNC) from Poland • Italian National Agency for New Technologies, Energy and the Environment (ENEA) from Italy

  30. Additional material

  31. Monitoring Grid Infrastructure • Tests every hour: • WMS and CrossBrokers: test service responsiveness, correct BDII information • CEs + SEs: • test correct submission of jobs • test file operations (creation, removal) with LFC for all the available SEs

  32. Accounting

  33. HPC • Provide access to large-scale HPC resources • Major and world-leading calculations • One million CPU hours on the UK supercomputer HECToR (2009) • Two million (standardized) CPU hours from DEISA (2009) • An additional two million+ CPU hours to distribute in 2010

  34. HPC • Evaluation of resource provision • Questionnaire sent to resource recipients • Questionnaire sent to resource providers • Limited user response: 2 out of 8 users • Better response from the centres

  35. HPC • Integrate with DEISA • Pass on feedback from users when received • Ask users for feedback • Working with: • JRA2 – porting of fusion codes, optimisation, etc. • JRA3/SA1 – supporting mixed workflows between Grid and HPC • Workflow • Work with JRA3 • Porting and testing codes • Providing expertise on HPC systems • Simplifying HPC access for users • Provide easier access to resources and a larger pool of resources • Easier to obtain resources

  36. From GEM to gGEM • GEM: gyrofluid turbulence code, modelling instability behaviour • Versions: serial (small cases) and MPI (high scalability, up to hundreds of processors) • Gaining experience in porting MPI codes • Status: the code is running on the Grid • Close contact with the code owner

  37. BIT1 Porting • BIT1: PIC + MC code for plasma edge simulations. Simplified plasma model: 1D in real space. • Strong need for communication between nodes. • A wide range of parameters must be scanned => a parameter scan problem. • Solved problem: 8 scrape-off-layer widths x 8 impurity concentrations x two types of bulk ions. Average CPU time per job: 7 days. • Run by the code owner.

  38. BIT1 code • BIT1 is an electrostatic Particle-in-Cell + Monte Carlo (PIC + MC) code for plasma edge simulations. Simplified plasma model: 1D in real space. • Dimensionality: 1D+3DV for plasma, 2D+3DV for neutrals and impurities. • High complexity: about 30,000 lines, CPU time >1000 h • Resolution: down to the electron gyro-motion • The electric field is calculated self-consistently, the magnetic field is fixed • Serial and parallel versions (average scalability 70-80% for 512 processors) • D. Tskhakaya, NIFS, Toki-shi, 24.09.2009

  39. Results of BIT1 • Parameter scan problem. Solved problem: 8 scrape-off-layer widths x 8 impurity concentrations x two types of bulk ions (128 jobs in total), with an average CPU time per job of 7 days. Total: ~2 years of CPU time. • Particle & energy fluxes in a single simulation • Strong in-out asymmetry

  40. EIRENE • A Monte Carlo code to simulate neutral particle transport effects in plasmas (plasma-wall interactions), based on a discretization using a finite element mesh. • It requires many inputs: • A formatted input file including simulation parameters • Modelling data • Plasma background • Geometry descriptions • Mesh data • (Figure: plasma flow field in the ITER divertor; image source: Detlef Reiter, FZJ) • The EMC3-EIRENE run used Tapas4grid.

  41. Metaheuristics: Artificial Bee Colony algorithm and VMEC (Variational Moment Equilibrium Code) • VMEC, a 3D equilibrium code ported to the grid, capable of modelling 3D tokamaks and stellarators. A configuration, given by a Fourier representation of the magnetic field and a pressure profile, is evaluated on a single node. • Target functions to optimise: • 0) the equilibrium itself (must exist) • 1) NC transport • 2) Mercier criterion stability (VMEC 8.46) • 3) ballooning criterion (COBRA code on the grid) • Distributed Asynchronous Bees (DAB). Example: stellarator optimization. (A generic bee-colony sketch follows below.)
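
For orientation only, here is a simplified, generic artificial bee colony loop (employed-bee and scout phases only) on a toy objective. It is a textbook-style sketch, not the DAB implementation; in DAB each fitness evaluation would instead be an asynchronous VMEC/COBRA run submitted to the grid.

# Generic ABC sketch on a toy objective; in DAB each call to `fitness`
# would launch a VMEC/COBRA evaluation of a stellarator configuration.
import random

DIM, FOOD_SOURCES, LIMIT, CYCLES = 4, 10, 20, 200
LOW, HIGH = -5.0, 5.0


def fitness(x):
    """Toy objective: minimise a simple sphere function."""
    return sum(v * v for v in x)


def neighbour(x, others):
    """Perturb one random dimension towards a randomly chosen other source."""
    partner = random.choice(others)
    j = random.randrange(DIM)
    y = list(x)
    y[j] += random.uniform(-1, 1) * (x[j] - partner[j])
    return y


sources = [[random.uniform(LOW, HIGH) for _ in range(DIM)] for _ in range(FOOD_SOURCES)]
trials = [0] * FOOD_SOURCES

for _ in range(CYCLES):
    for i in range(FOOD_SOURCES):            # employed bees: local search
        candidate = neighbour(sources[i], sources[:i] + sources[i + 1:])
        if fitness(candidate) < fitness(sources[i]):
            sources[i], trials[i] = candidate, 0
        else:
            trials[i] += 1
    for i in range(FOOD_SOURCES):            # scouts: abandon exhausted sources
        if trials[i] > LIMIT:
            sources[i] = [random.uniform(LOW, HIGH) for _ in range(DIM)]
            trials[i] = 0

best = min(sources, key=fitness)
print("best configuration:", best, "fitness:", fitness(best))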

  42. Example of input file

  43. Example

subroutine cpo2ip(equi_in, ip)
!-----------------------------------------------------------------------
  use euitm_schemas
  use euITM_routines
  implicit none
  integer, parameter :: DP = kind(1.0D0)
  type (type_equilibrium), pointer :: equi_in(:)
  integer :: ip
  integer :: i

  write(*,*) ' cpo2ip: in the computation routine '
  write(*,*) 'time deb', equi_in(:)%time, size(equi_in)
  call flush(6)
  ip = 23
  return
end subroutine cpo2ip

• Gets a CPO and writes an integer • Compilation by: >g95 … • Include files depend on the Fortran compiler

COPTS = -r8 -ftrace=full -fno-second-underscore -fPIC
INCLUDES = -I/afs/efda-itm.eu/project/switm/ual/4.07b/include/amd64_g95

all: cpo2ip.o libcpo2ip.a

cpo2ip.o: cpo2ip.f90
	g95 $(COPTS) -c -o $@ $^ ${INCLUDES}

libcpo2ip.a: cpo2ip.o
	ar -rv libcpo2ip.a cpo2ip.o

  44. GRID/HPC parameters • Enable inexperienced users to keep the default GRID/HPC parameters, and… • …enable experienced users to change some GRID/HPC execution parameters from the Kepler engine without using the HPC2K tool.
