The EUFORIA project provides Grid and HPC infrastructure and support for fusion research, focusing on code adaptation, workflows, visualization, and training. It advances the fusion computing paradigm through middleware developments and scientific workflow visualization.
EUFORIA Grid and HPC access for Fusion. Marcin Plociennik (PSNC, Poland), on behalf of the EUFORIA project. Lisbon, 15 September 2010
Chalmers University of Technology (Coordinator) from Sweden • Max Planck Institute for Plasma Physics (IPP) from Germany • Consejo Superior de Investigaciones Científicas (CSIC) from Spain • Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT) from Spain • Forschungszentrum Karlsruhe (FZK) from Germany • Finnish IT Center for Science (CSC) from Finland • Åbo Akademi University (ABO) from Finland • University of Edinburgh (UEDIN) from the United Kingdom • Barcelona Supercomputing Center (BSC) from Spain • French Atomic Energy Commission (CEA) from France • University of Strasbourg from France • University of Ljubljana (UOL) from Slovenia • Poznan Supercomputing and Networking Center (PSNC) from Poland • Italian National Agency for New Technologies, Energy and the Environment (ENEA) from Italy
EUFORIA • 14 member institutes • 522 person-months covering: • Management (NA1) • Training (NA2) • Dissemination (NA3) • Grid and HPC infrastructure & support (SA1, SA2, SA3) • Code adaptation & optimization (Grid: JRA1, HPC: JRA2) • Workflows (JRA3) • Visualization (JRA4)
Supporting fusion users • Providing infrastructure • Grid (parallel and serial) and HPC infrastructures and support • EUFORIA Grid infrastructure (2500 CPUs, 50 TB) • HPC infrastructure available for application development and proof-of-principle runs (BSC, CSC, EPCC) • Providing application porting for selected codes to both Grid and HPC • EFDA decision: focus on edge and core turbulence and transport • Providing training • Use of and adaptation for Grid and HPC technologies • Direct code adaptation for selected codes and tools • Help to "self-help": EUFORIA provides much of the training for this meeting • Providing extended toolkits for the existing infrastructure • Visualization, workflow extensions, middleware developments
Developing a new paradigm for fusion computing (scientific workflows, visualization, Grid and HPC): • Building on e-infrastructure tools, middleware and installations • Integrating tools and physics models together with a "fusion simulation ontology" • At least initially, building on fusion de facto standards for data access and communication
EUFORIA Grid Infrastructure (overview diagram) • Core services hosted at IFCA and integrated in the ES-NGI infrastructure: BDII (gridbdii01.ifca.es), LFC (gridlfc01.ifca.es), VOMS (voms01.ifca.es), WMS (gridwms01.ifca.es), CrossBroker (gridxb01.ifca.es), RAS; additional resource broker iwrb2.fzk.de • Access via the Migrating Desktop • Local sites provide CE, SE and local storage, with i2g MPI-Start enabled worker nodes
EUFORIA Grid Infrastructure • All central services running at IFCA • Updated to the latest gLite versions • Maintained and integrated in the ES-NGI • Local services: • IFCA in Santander, Spain • FZK in Karlsruhe, Germany • Chalmers University, Sweden • 2 sites from CIEMAT (Trujillo and Madrid), Spain • Currently available resources for EUFORIA users: ~3100 CPUs, ~2 TB online storage
Monitoring the Grid Infrastructure • Monitoring ensures the proper functioning of the grid infrastructure • Central monitoring with agrest: http://devel.ifca.es/agrest/ • Web interface for the monitoring results available at http://monitor.ifca.es:8080/show/latests/EUFORIA
The Grid Codes • Different code domains and different parallel strategies: • GEM: linear & non-linear gyrofluid turbulence code (core transport). MPI. • BIT1: divertor code (SOL transport). Parameter scan. • EIRENE: neutral transport for tokamaks & stellarators (neutral transport). Monte Carlo code. • DAB (Distributed Asynchronous Bees): tool for the optimization of any concept in fusion devices. Asynchronous algorithm. • Plus the previously gridified codes (taken from the EGEE code platform), suitable for workflows: ISDEP (MC transport), MISHKA & HELENA (equilibrium and MHD), VMEC (3D equilibrium, suitable for tokamaks and stellarators).
JRA2 – HPC Porting and Optimisation • Work with a group of ITM codes • Target improving performance and scaling • Aiming at ITER-level simulations • Generally one or two orders of magnitude in computational requirements (run time, number of processors, etc.) • Wide range of codes • From highly parallelised to still serial • Targeting MPI scaling, serial performance, I/O, OpenMP parallelisation, etc. (each code presented different challenges and required different approaches) • Aim to work closely with developers • Integrate changes back into the "main" source
EUFORIA – HPC – Available resources • HECToR (EPCC, UK), Cray XT6: 44,544 cores, 59.4 TB RAM, 360 Tflops peak, No. 16 in Top500 (274 Tflops LINPACK) • Louhi (CSC, Finland), Cray XT4/XT5: 10,860 cores, 4.5 TB RAM, 102 Tflops peak, No. 74 in Top500 (76 Tflops LINPACK) • MareNostrum (BSC, Spain), IBM cluster: 10,240 cores, 20 TB RAM, 94 Tflops peak, No. 87 in Top500 (63 Tflops LINPACK)
Example work • MPI optimisation • Replaced MPI_Sendrecv exchanges with MPI_Alltoall (see the sketch below) • Provides much better scaling for large all-to-all communications • I/O optimisation • Reduced I/O runtime by an order of magnitude by correctly tuning the I/O functionality on the HPC system
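The optimisation itself was done in the codes' own (Fortran) MPI layer; the snippet below is only a minimal illustration of the pattern, written in Python with mpi4py and assuming an exchange of equal-sized blocks between all ranks.

# Illustration only (assumed setup, not the project's code): replacing a loop of
# pairwise Sendrecv exchanges with a single collective Alltoall, using mpi4py.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
block = 1024  # number of values exchanged with each partner (placeholder)

send = np.full((size, block), rank, dtype=np.float64)
recv = np.empty((size, block), dtype=np.float64)

# Point-to-point pattern: one exchange per partner, scheduled by the application.
for partner in range(size):
    comm.Sendrecv(send[partner], dest=partner, sendtag=0,
                  recvbuf=recv[partner], source=partner, recvtag=0)

# Collective pattern: the MPI library schedules the whole exchange at once,
# which typically scales much better for large all-to-all communications.
comm.Alltoall(send, recv)

Run with e.g. mpirun -n 8 python alltoall_sketch.py; both patterns fill the receive buffer with the same values, only the communication schedule differs.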
Example work (before/after scaling plots) • Serial optimisations • Reduced the runtime of the Arakawa algorithm by as much as half by optimising for the particular processor and instruction set • Parallel scaling • Improved the scaling of a parallel code significantly, enabling it to scale from 64 to 512 processors
Work on workflows • Easy use of supercomputers: HPC => PC • No manual JDL (job description language) • No manual file transfer and storage • Hide the various architectures • Grid: UI + middleware + … (Globus, gLite, i2g, …) • HPC: IBM, Cray, … + middleware (UNICORE, GT4, …) • Cloud: Amazon, Google, … • Integration in the ITM framework • Orchestration tool: Kepler • Fusion-oriented data structures, but applicable to any XML data structure: ITM schemas & UAL
Principles (layer diagram) • One layer (the RAS) is in charge of job execution on the various architectures: Grid (gLite), Grid (i2g), HPC (UNICORE), Cloud • The ITM framework (Kepler) sits on top of this layer • Extend the UAL to the supercomputer architectures • (A minimal sketch of the layering follows.)
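To make the layering concrete, here is a minimal Python sketch of such a job-execution abstraction; the class and method names are invented for illustration and are not the actual RAS API.

# Minimal sketch of a job-submission layer in the spirit of the RAS: the
# workflow only ever calls submit(), and the back-end specifics stay hidden.
# All names are hypothetical; the placeholders mark where the real gLite,
# i2g, UNICORE or cloud client calls would go.
from abc import ABC, abstractmethod

class Backend(ABC):
    @abstractmethod
    def submit(self, executable: str, inputs: list) -> str:
        """Stage the inputs, submit the job, return a backend-specific job id."""

class GliteBackend(Backend):
    def submit(self, executable, inputs):
        # A real implementation would build a JDL file and call the gLite WMS.
        return "glite-job-placeholder"

class UnicoreBackend(Backend):
    def submit(self, executable, inputs):
        # A real implementation would go through a UNICORE client.
        return "unicore-job-placeholder"

def run_job(backend: Backend, executable: str, inputs: list) -> str:
    # The Kepler-side call is identical whichever infrastructure is selected.
    return backend.submit(executable, inputs)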
Achievements • RAS server: extension towards new architectures (HPC, DEISA) • Integration in the ITM framework: creation of Kepler actors and composite actors
Achievements (cont'd) • Job launching/orchestration can be simplified • Extension of the UAL to Grid & HPC, based on an MDSplus server • Example: workflow + UAL
Workflow using the RAS actors • A large number of inputs has to be specified by the user • The user has to integrate the access to the ITM database by hand, if needed
Users need more … HPC2K • A one-click tool: • The user provides a subroutine • The tool creates the job • The tool reads and writes the data • The tool creates a component (Kepler actor) • The user has to copy and paste it into a workflow • (HPC2K GUI panels: project, actor name & other parameters; arguments; Grid or HPC infrastructure; Kepler & UAL settings)
Kepler: launches the HPC/GRID actor generated by HPC2K, specifying the current simulation time • HPC/GRID actor: generates the input file from the data on the input port; uploads the input file, the executable and the libraries; launches the script on the Grid/HPC; downloads the output file; sends the results to the workflow • Shell script: sets the environment and runs the wrapper • I/O wrapper: gets the ITM data (a CPO time slice) using the UAL; generates the output file; writes the output CPO • Simulation code: receives the CPO, gets the internal data, does the physics computation and updates the variables inside the CPO • (A rough sketch of the wrapper logic follows.)
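The following Python sketch mirrors the wrapper steps listed above. The real wrapper is generated code built on the UAL bindings; the module ual with open_pulse / get_slice / put_slice, and physics_step, are hypothetical stand-ins for illustration only.

# Sketch of the generated I/O wrapper logic (hypothetical names throughout).
import ual                       # hypothetical stand-in for the UAL Python binding

def physics_step(cpo_in):        # stand-in for the user's simulation routine
    return cpo_in                # the real code would update the CPO contents

def run_wrapper(shot, run, time):
    pulse = ual.open_pulse(shot, run)               # connect to the ITM database
    cpo_in = pulse.get_slice("equilibrium", time)   # read the input CPO time slice
    cpo_out = physics_step(cpo_in)                  # the actual physics computation
    pulse.put_slice("equilibrium", cpo_out)         # write the output CPO back
    pulse.close()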
Fusion VO HPC-Grid Workflow • Kepler launches the different actors and organizes the workflow • Kepler runs on the fusion Gateway (Frascati, Italy) for managing the data • One actor (ASTRA) runs on HPC: HPC-FF in Jülich, Germany, MareNostrum at BSC, Spain, or Altamira at IFCA, Spain • The other actor (TRUBA, a ray-tracing code) runs on the grid as thousands of jobs; a single ray is a job • Exchanged data: n, T files (kB), power deposition profile (kB), equilibrium (50 MB)
Achievements: meta-workflow • Launching a workflow on the Grid from a workflow on the Gateway (via the RAS on euforia.efda-itm.eu)
Visualization framework • matplotlib: 1D and 2D plots, from simple (1 line of Python) to complex (full script with numpy transformations), highly customizable • VisIt: 1D, 2D, 3D (and higher) plots in a few clicks • Architecture: fusion data are accessed through the UAL library; its Python interface feeds the stand-alone matplotlib tools and a Python actor, while its C++ interface feeds a VisIt plug-in and a VisIt actor for integrated visualization
Python / matplotlib • Simple plots in a few seconds • Use the Python UAL interface to get the data into numpy arrays • Call matplotlib functions to plot the numpy arrays (see the sketch below)
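A minimal example in this spirit is shown below. The plotting part is ordinary numpy + matplotlib; the data access is sketched with a hypothetical ual module and field name, since the real UAL Python interface may differ.

# "One line of Python" style plot: read a profile, plot it with matplotlib.
# The ual calls, shot/run numbers and the field name are assumptions.
import numpy as np
import matplotlib.pyplot as plt
import ual                                   # hypothetical UAL Python binding

pulse = ual.open_pulse(shot=4242, run=1)     # placeholder shot/run numbers
te = np.asarray(pulse.get_slice("coreprof", 0.0).te)  # hypothetical Te profile
rho = np.linspace(0.0, 1.0, te.size)

plt.plot(rho, te)                            # the actual "one line" plot
plt.xlabel("normalised radius")
plt.ylabel("Te [eV]")
plt.show()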
VisIt • Open a .ual file containing the CPO info (shot, run, name), select one of the available plots for a field of this CPO (requires CPODef.xml enrichment), then click Draw!
Thanks (from all 14 EUFORIA partner institutes)
Monitoring the Grid Infrastructure • Tests run every hour: • WMS and CrossBrokers: test service responsiveness and correct BDII information • CEs + SEs: test correct submission of jobs; test file operations (creation, removal) with the LFC for all the available SEs (a sketch of such a check follows)
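A sketch of one such hourly SE check is given below. It uses the standard gLite data-management commands (lcg-cr, lcg-del), but the exact options and LFN layout are assumptions, so treat it as a sketch rather than the actual agrest test.

# Sketch of an SE/LFC file-operation check: copy-and-register a small test
# file on a storage element, then remove it again.  CLI options and the LFN
# path are assumptions; the real agrest tests may differ.
import datetime
import os
import subprocess
import tempfile

def check_se(se_host, vo="euforia"):
    stamp = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
    lfn = "lfn:/grid/%s/monitor/test-%s-%s" % (vo, se_host, stamp)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"euforia monitoring test\n")
    try:
        subprocess.run(["lcg-cr", "--vo", vo, "-d", se_host, "-l", lfn,
                        "file://" + f.name], check=True, timeout=300)
        subprocess.run(["lcg-del", "--vo", vo, "-a", lfn],
                       check=True, timeout=300)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False
    finally:
        os.unlink(f.name)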
HPC • Provide access to large-scale HPC resources • Major, world-leading calculations • One million CPU hours on the UK supercomputer HECToR (2009) • Two million (standardized) CPU hours from DEISA (2009) • An additional two million+ CPU hours to distribute in 2010
HPC • Evaluation of resource provision • Questionnaire sent to resource recipients • Questionnaire sent to resource providers • Limited user response (2 out of 8 users) • Better response from the centres
HPC • Integrate with DEISA • Relay feedback from users once collected • Ask for feedback on users • Working with: • JRA2 – porting of fusion codes, optimisation, etc. • JRA3/SA1 – supporting mixed workflows between Grid and HPC • Workflows • Work with JRA3 • Porting and testing codes • Providing expertise on HPC systems • Simplifying HPC access for users • Provide easier access to resources and a larger pool of resources • Easier to obtain resources
From GEM to gGEM • GEM: gyrofluid turbulence code, modelling the behaviour of instabilities • Versions: serial (small cases) and MPI (highly scalable, to hundreds of processors) • Gaining experience in porting MPI codes • Status: the code is running on the Grid, in close contact with the code owner
BIT1 Porting • BIT1: PIC + MC code for plasma edge simulations; simplified plasma model, 1D in real space • Strong need for communication between nodes • A wide range of parameters must be scanned: a parameter-scan problem (see the sketch below) • Solved problem: 8 scrape-off-layer widths x 8 impurity concentrations x two types of bulk ions • Average CPU time per job: 7 days • Run by the code owner
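To make the size of the scan explicit, the sketch below enumerates the 8 x 8 x 2 = 128 parameter combinations, each of which becomes one independent grid job. The numerical values are placeholders, not the physical values used in the study.

# Enumerate the BIT1 parameter scan: each tuple corresponds to one ~7-day job.
# All parameter values below are placeholders for illustration.
import itertools

sol_widths = [1.0 + 0.5 * i for i in range(8)]        # 8 placeholder SOL widths
impurity_fracs = [0.001 * (i + 1) for i in range(8)]  # 8 placeholder concentrations
bulk_ions = ["D", "T"]                                # 2 bulk-ion species (assumed)

jobs = list(itertools.product(sol_widths, impurity_fracs, bulk_ions))
assert len(jobs) == 128

for i, (width, frac, ion) in enumerate(jobs):
    # In the real scan these values would be written into one JDL/job script each.
    print("job %03d: sol_width=%.2f impurity=%.4f ion=%s" % (i, width, frac, ion))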
BIT1 code • BIT1 is an electrostatic Particle-in-Cell + Monte Carlo (PIC + MC) code for plasma edge simulations, with a simplified plasma model, 1D in real space • Dimensionality: 1D+3DV for the plasma, 2D+3DV for neutrals and impurities • High complexity: about 30,000 lines, CPU time > 1000 h • Resolution: down to the electron gyro-motion • The electric field is calculated self-consistently, the magnetic field is fixed • Serial and parallel versions (average scalability 70-80% for 512 processors) • D. Tskhakaya, NIFS, Toki-shi, 24.09.2009
Results of BIT1 • Parameter-scan problem. Solved problem: 8 scrape-off-layer widths x 8 impurity concentrations x two types of bulk ions = 128 jobs, at an average CPU time of 7 days per job, i.e. roughly 2 years of CPU time in total • Particle & energy fluxes in a single simulation • Strong in-out asymmetry
EIRENE • A Monte Carlo code to simulate neutral-particle transport effects in plasmas (plasma-wall interactions), based on a discretization using a finite element mesh • It requires many inputs: • A formatted input file including the simulation parameters • Modelling data • Plasma background • Geometry descriptions • Mesh data • (Figure: plasma flow field in the ITER divertor; image source: Detlef Reiter, FZJ) • The EMC3-EIRENE run used Tapas4grid.
Metaheuristics: Artificial Bee Colony Algorithm and VMEC (Variational Moment Equilibrium Code) • VMEC, a 3D equilibrium code, ported to the grid: capable of modelling 3D tokamaks and stellarators. A configuration, given by the Fourier representation of the magnetic field and the pressure profile, is evaluated on a single node. • Target functions to optimise: • 0) The equilibrium itself (must exist) • 1) Neoclassical transport • 2) Mercier stability criterion (VMEC 8.46) • 3) Ballooning criterion (COBRA code on the grid) • Distributed Asynchronous Bees (DAB) • Example: stellarator optimization (a sketch of the asynchronous scheme follows)
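The sketch below illustrates the asynchronous flavour of such a bee-colony search: candidate configurations are evaluated independently (grid jobs in the real DAB), and a new candidate is spawned around the current best as soon as any evaluation returns. The objective function is a dummy placeholder standing in for the VMEC equilibrium plus transport/stability evaluation; this is not the project's DAB code.

# Asynchronous bee-colony-style loop (illustrative only; dummy objective).
import random
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def objective(x):                # placeholder for a VMEC-based target function
    return sum((xi - 0.5) ** 2 for xi in x)

def neighbour(x, scale=0.1):     # "bee" move: small random step around a candidate
    return [xi + random.uniform(-scale, scale) for xi in x]

def optimise(dim=4, workers=8, budget=200):
    best_x, best_f = None, float("inf")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {}
        for _ in range(workers):                       # initial scout candidates
            x = [random.random() for _ in range(dim)]
            pending[pool.submit(objective, x)] = x
        evaluated = 0
        while evaluated < budget:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                x = pending.pop(fut)
                f = fut.result()
                evaluated += 1
                if f < best_f:
                    best_x, best_f = x, f
                # launch a new evaluation immediately, without waiting for a
                # whole "generation" to finish
                new_x = neighbour(best_x)
                pending[pool.submit(objective, new_x)] = new_x
    return best_x, best_f

if __name__ == "__main__":
    print(optimise())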
Example

subroutine cpo2ip(equi_in, ip)
!-----------------------------------------------------------------------
  use euitm_schemas
  use euITM_routines
  implicit none
  integer, parameter :: DP = kind(1.0D0)
  type (type_equilibrium), pointer :: equi_in(:)
  integer :: ip
  integer :: i
  write(*,*) ' cpo2ip: in the computation routine '
  write(*,*) 'time deb', equi_in(:)%time, size(equi_in)
  call flush(6)
  ip = 23
  return
end subroutine cpo2ip

• Gets a CPO • Writes an integer • Compilation by: > g95 … • The include files depend on the Fortran compiler:

COPTS = -r8 -ftrace=full -fno-second-underscore -fPIC
INCLUDES = -I/afs/efda-itm.eu/project/switm/ual/4.07b/include/amd64_g95

all: cpo2ip.o libcpo2ip.a

cpo2ip.o: cpo2ip.f90
	g95 $(COPTS) -c -o $@ $^ ${INCLUDES}

libcpo2ip.a: cpo2ip.o
	ar -rv libcpo2ip.a cpo2ip.o
GRID/HPC parameters • Enable inexperienced users to keep the default GRID/HPC parameters, and… • …enable experienced users to change some GRID/HPC execution parameters from the Kepler engine without using the HPC2K tool.