Integration of GPU Technologies in EPICS for Real Time Data Preprocessing Applications
J. Nieto1, D. Sanz1, G. de Arcas1, R. Castro2, J.M. López1, J. Vega2
1 Universidad Politécnica de Madrid (UPM), Spain
2 Asociación EURATOM/CIEMAT para Fusión, Spain
Index
• Scope of the project
• Project goals
• Sample algorithm
• Test system
• Subtask 1: GPU benchmarking
• Subtask 2: EPICS integration (DPD)
• Results
• Conclusions
FPSC Project
• FPSC Project objective: to develop an FPSC prototype focused on data acquisition for ITER IO
• The functional requirements of the FPSC prototype:
  • To provide high-rate data acquisition, pre-processing, archiving and efficient data distribution among the different FPSC software modules
  • To interface with CODAC and to provide archiving
  • FPSC software compatible with RHEL and EPICS
  • To use COTS solutions
FPSC HW architecture (diagram: development host and GPUs)
GPU subtasks
• Goals:
  • To provide benchmarking of Fermi GPUs (subtask 1):
    • Analyze the GPU development cycle (methodology)
    • Compare execution times on GPU and CPU for a similar development effort
  • To provide a methodology to integrate GPU processing units into EPICS (subtask 2)
• Requirements:
  • Use an algorithm representative of the type of operations that would be needed in plasma pre-processing
GPU Test System
• Hardware: Xeon X5550 Quad-Core host with an NVIDIA GTX580
• Software: Linux Red Hat Enterprise 5.5 (64-bit), CODAC Core System 2.0, EPICS IOC with the DPD subsystem (CPU Asyn and GPU Asyn drivers), IPP v7.0, CULA R11, CUBLAS v3.2
Sample algorithm
• Best-fit code for detecting the position and amplitude of a spectrum composed of a set of Gaussians, based on the Levenberg-Marquardt method (a sketch of the model-evaluation step follows)
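For illustration, here is a minimal sketch of the model-evaluation step such a fit requires: a hypothetical CUDA kernel that evaluates a sum of Gaussians at every spectral sample and computes the residual against the measured data. The Levenberg-Marquardt driver (Jacobian, damping, parameter update) is omitted, and all names are assumptions rather than the project's actual code.

```cuda
// Hypothetical sketch: residuals of a sum-of-Gaussians model, one thread per sample.
// params holds (amplitude, centre, width) triplets for each of the nGauss peaks.
__global__ void gaussianResiduals(const float *x, const float *y, int nSamples,
                                  const float *params, int nGauss, float *residual)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nSamples) return;

    float model = 0.0f;
    for (int g = 0; g < nGauss; ++g) {
        float a = params[3 * g];      // amplitude of peak g
        float c = params[3 * g + 1];  // centre (position) of peak g
        float w = params[3 * g + 2];  // width (sigma) of peak g
        float d = (x[i] - c) / w;
        model += a * expf(-0.5f * d * d);
    }
    residual[i] = y[i] - model;       // consumed by the LM solver to update params
}
```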
Subtask 1
• Goal: benchmarking of a Fermi GPU
• Standard GPU programming methodology (see the sketch below):
  • The GPU is operated from the host as a coprocessor
  • Host threads sequence GPU operations:
    • Responsible for moving data (Host ↔ Device)
  • Operations are coded by:
    • Programming kernels: CUDA
    • Using library primitives: CULA, CUBLAS…
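As a hedged illustration of this host-as-coprocessor pattern, the sketch below allocates device memory, moves data Host → Device, calls a library primitive (CUBLAS SAXPY, y = alpha·x + y) instead of a hand-written kernel, and copies the result back. The wrapper function and the choice of SAXPY are ours; error checking is omitted for brevity.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Illustrative only: the host thread sequences every GPU operation.
void saxpyOnGpu(std::vector<float> &y, const std::vector<float> &x, float alpha)
{
    int n = static_cast<int>(x.size());
    float *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dX, n * sizeof(float));
    cudaMalloc(&dY, n * sizeof(float));

    // Host -> Device transfers
    cudaMemcpy(dX, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dY, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Library primitive (CUBLAS) instead of programming a kernel by hand
    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSaxpy(handle, n, &alpha, dX, 1, dY, 1);
    cublasDestroy(handle);

    // Device -> Host: bring the result back for monitoring/archiving
    cudaMemcpy(y.data(), dY, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dX);
    cudaFree(dY);
}
```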
Subtask 2
• Goal: to provide EPICS support for GPU processing
• (Diagram: two EPICS IOC architectures built on the asyn layer are compared: a single-process approach combining acquisition and processing, and the DPD approach with data generation (FPGA) decoupled from the processing units (CPU, GPU) and other elements such as archiving)
Proposed methodology
• The core of the FPSC software is the DPD (Data Processing and Distribution) subsystem, which allows:
  • Moving data with very good performance
  • Integrating all the functional elements (EPICS monitoring, data processing, data acquisition, remote archiving, etc.)
  • Having code completely based on the standard asynDriver
  • Full compatibility with any type of required data
• (Diagram: EPICS IOC with an asyn layer on top of the DPD subsystem, interconnecting CODAC, the configuration state machine, hardware monitoring, timing hardware/cubicle signals, TCN/1588, SDN, GPU processing, CPU processing, FPGA, archiving and monitoring)
DPD features (I)
• The DPD makes it possible to configure both the different functional elements of the FPSC (FPGA acquisition, GPU processing, SDN, EPICS monitoring, data processing, data archiving) and the connections (links) between them
• Functional elements can:
  • read data blocks from input links
  • process received data
  • generate new signals
  • route data blocks to output links
• The DPD allows new types of functional elements to be integrated to extend the FPSC functionality; this implies creating the corresponding asynDrivers, which can be done in a simple way
• It also enables a very easy integration of any existing asynDriver (a hypothetical sketch of a functional element interface follows)
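To make the notion of a functional element concrete, here is a purely hypothetical C++ interface sketch with the read/process/route responsibilities listed above. The class and member names are invented for illustration and are not the actual DPD or asynDriver API.

```cuda
#include <cstdint>
#include <vector>

// Hypothetical data unit exchanged over DPD links (not the real DPD type)
struct DataBlock {
    uint64_t timestamp;           // acquisition time of the block
    std::vector<float> samples;   // raw or processed signal data
};

// Hypothetical functional-element interface: read, process, generate, route
class FunctionalElement {
public:
    virtual ~FunctionalElement() = default;

    // Read the next data block from an input link (e.g. FPGA acquisition, SDN)
    virtual bool readInput(int link, DataBlock &block) = 0;

    // Process received data, possibly generating new signals
    virtual void process(const DataBlock &in, DataBlock &out) = 0;

    // Route a data block to an output link (e.g. GPU processing, archiving, EPICS monitoring)
    virtual void routeOutput(int link, const DataBlock &block) = 0;
};
```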
DPD features (II)
• The DPD allows the data routing to be configured at configuration time or even at run time (to implement fault-tolerant solutions)
• The DPD provides a common set of EPICS PVs for the several functional elements and their respective links
• The DPD provides on-line measurements of both throughput and buffer occupancy in the links
• The DPD implements an optional multi-level (memory, disk) backup buffering solution for any link of the system (see the sketch below)
• (Diagram: backup block link with buffering levels 0, 1 and 2)
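The following sketch only illustrates the multi-level backup idea (keep blocks in memory while there is room, spill to disk otherwise); the class, threshold handling and file format are assumptions, not the DPD implementation. It reuses the hypothetical DataBlock struct from the previous sketch.

```cuda
#include <cstddef>
#include <deque>
#include <fstream>
#include <string>

// Hypothetical multi-level link buffer: memory first, disk as the backup level
class MultiLevelLinkBuffer {
public:
    MultiLevelLinkBuffer(std::size_t memLimit, const std::string &spillFile)
        : memLimit_(memLimit), spill_(spillFile, std::ios::binary | std::ios::app) {}

    void push(const DataBlock &block) {
        if (memory_.size() < memLimit_) {
            memory_.push_back(block);   // in-memory level: fast path
        } else {
            // disk level: spill samples so no data is lost when the consumer lags
            spill_.write(reinterpret_cast<const char *>(block.samples.data()),
                         block.samples.size() * sizeof(float));
        }
    }

    // Occupancy figure of the kind the DPD exposes through EPICS PVs
    std::size_t occupancy() const { return memory_.size(); }

private:
    std::size_t memLimit_;
    std::deque<DataBlock> memory_;
    std::ofstream spill_;
};
```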
Test scenario
• (Diagram: a data generation module feeds a GPU processing module through the DPD subsystem; timestamps T0–T4 are taken along the path, and the GPU processing phase is split into Host → Device transfer, processing and Device → Host transfer)
• Timing definitions:
  • T4 − T0: Total Service Time (TTS)
  • T4 − T1: Module Service Time (TMS)
  • T3 − T2: Processing Time (TP), which includes the Internal Process Time (TP0)
Timing (II)
• T0: a new data block is generated by the TiCamera DataGenerator
• T1: the data block is received in the module
• T2: the data block is ready to be processed
• T3: the DataFit processing is finished
• T4: the new DataFit result is packed and routed
• TTS spans T0–T4, TMS spans T1–T4 and TP spans T2–T3 (see the sketch below)
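A small sketch of how these metrics could be computed from the T0–T4 timestamps; the metric definitions come from the slides, while the struct and function names are assumptions for illustration.

```cuda
#include <chrono>

using Clock = std::chrono::steady_clock;

// One set of timestamps per data block, following the T0..T4 definitions above
struct BlockTimestamps {
    Clock::time_point t0;  // new data block generated
    Clock::time_point t1;  // block received in the module
    Clock::time_point t2;  // block ready to be processed
    Clock::time_point t3;  // DataFit processing finished
    Clock::time_point t4;  // result packed and routed
};

struct TimingMetrics {
    std::chrono::microseconds tts;  // Total Service Time  = T4 - T0
    std::chrono::microseconds tms;  // Module Service Time = T4 - T1
    std::chrono::microseconds tp;   // Processing Time     = T3 - T2
};

TimingMetrics computeMetrics(const BlockTimestamps &t) {
    using std::chrono::duration_cast;
    using std::chrono::microseconds;
    return { duration_cast<microseconds>(t.t4 - t.t0),
             duration_cast<microseconds>(t.t4 - t.t1),
             duration_cast<microseconds>(t.t3 - t.t2) };
}
```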
Test scenario 1 (diagram): TiCamera DataGenerator feeding a single GPU processing module (TiCameraFit), with EPICS waveform monitoring
Test scenario 2 (diagram): TiCamera DataGenerator feeding two TiCameraFit processing modules, both running on GPU#0, with EPICS waveform monitoring
Test scenario 3 (diagram): TiCamera DataGenerator feeding two TiCameraFit processing modules running on different GPUs (GPU#0 and GPU#1), with EPICS waveform monitoring
Results
• The tests were used to determine the DPD overhead with respect to a "hard-coded" approach and to evaluate DPD scalability (multi-module, multiple-hardware support)
• Using the third scenario, we have been able to process 3 MB/s running 2 modules on 2 different GPUs
Conclusions
• The development methodology for using GPUs is being standardized, providing increasing levels of abstraction from the hardware implementation details (see the sketch below)
• "Hard-coded" implementations seriously compromise scalability and maintainability, without guaranteeing a relevant increase in performance
• Specific frameworks are being developed for different scenarios (Thrust, DPD…):
  • To simplify development
  • To promote reusability
  • To provide scalability and maintainability
  • To include first-level parallelism (internal load balancing based on multithreading)
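As a small illustration of those higher abstraction levels, the sketch below uses Thrust to compute a signal's sum of squares entirely on the device, without writing an explicit kernel or managing transfers by hand. The example is ours, not part of the FPSC code.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/inner_product.h>

// Illustrative only: Thrust hides kernels and memory management behind STL-like calls
float sumOfSquares(const thrust::host_vector<float> &signal)
{
    thrust::device_vector<float> d = signal;   // implicit Host -> Device copy
    // inner_product(d, d, 0) = sum over i of d[i] * d[i], executed on the GPU
    return thrust::inner_product(d.begin(), d.end(), d.begin(), 0.0f);
}
```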