Integration of GPU Technologies in EPICs for Real Time Data Preprocessing Applications

Integration of GPU Technologies in EPICs for Real Time Data Preprocessing Applications. J. Nieto 1, D. Sanz 1, G. de Arcas 1, R. Castro 2, J.M. López 1, J. Vega 2. 1 Universidad Politécnica de Madrid (UPM), Spain. 2 Asociación EURATOM/CIEMAT para Fusión, Spain.

Presentation Transcript


  1. Integration of GPU Technologies in EPICs for Real Time Data Preprocessing Applications • J. Nieto 1, D. Sanz 1, G. de Arcas 1, R. Castro 2, J.M. López 1, J. Vega 2 • 1 Universidad Politécnica de Madrid (UPM), Spain • 2 Asociación EURATOM/CIEMAT para Fusión, Spain

  2. Index • Scope of the project • Project goals • Sample algorithm • Test system • Subtask 1: GPU benchmarking • Subtask 2: EPICS integration (DPD) • Results • Conclusions

  3. FPSC Project • FPSC Project Objective: To develop an FPSC prototype focused on Data Acquisition for ITER IO • The “functional requirements” of the FPSC prototype: • To provide high-rate data acquisition, pre-processing, archiving and efficient data distribution among the different FPSC software modules • To interface with CODAC and to provide archiving • FPSC software compatible with RHEL and EPICS • To use COTS solutions

  4. FPSC HW architecture • [Diagram labels: Development host, GPUs]

  5. GPU subtasks • Goals: • To benchmark Fermi GPUs (subtask 1) • Analyze the GPU development cycle (methodology) • Compare execution times on GPU and CPU for a similar development effort • To provide a methodology to integrate GPU processing units into EPICS (subtask 2) • Requirements: • Use an algorithm representative of the type of operations that would be needed in plasma pre-processing

  6. GPU Test System • Hardware: Xeon X5550 QuadCore, NVIDIA GTX580 • Operating system: Linux RedHat Enterprise v5.5 64 bits • CODAC CORE SYSTEM 2.0 • EPICS IOC with DPD Subsystem (CPU Asyn, GPU Asyn) • Libraries: IPP v7.0, CULA R11, CUBLAS v3.2

  7. Sample algorithm • Best-fit code for detecting the position and amplitude of a spectrum composed of a set of Gaussians, based on the Levenberg-Marquardt method
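The slides do not show the fitting code itself. As a rough illustration of the technique named above, the following is a minimal pure-Python sketch of a Levenberg-Marquardt fit, simplified to a single Gaussian (the slide describes a set of Gaussians). The function names, the damping schedule and the stopping rule are assumptions made for this example, not the authors' implementation.

```python
import math

def gaussian(x, amp, mu, sigma):
    """Single Gaussian peak: amplitude amp, position mu, width sigma."""
    return amp * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def solve_linear(a, b):
    """Solve a small dense system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def sum_sq(xs, ys, p):
    """Sum of squared residuals for parameters p = (amp, mu, sigma)."""
    return sum((y - gaussian(x, *p)) ** 2 for x, y in zip(xs, ys))

def fit_gaussian(xs, ys, p0, iters=100, lam=1e-3):
    """Levenberg-Marquardt iteration (illustrative damping schedule)."""
    p = list(p0)
    cost = sum_sq(xs, ys, p)
    for _ in range(iters):
        amp, mu, sigma = p
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for x, y in zip(xs, ys):
            e = math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            # Partial derivatives of the model w.r.t. amp, mu, sigma
            j = [e,
                 amp * e * (x - mu) / sigma ** 2,
                 amp * e * (x - mu) ** 2 / sigma ** 3]
            r = y - amp * e
            for a in range(3):
                jtr[a] += j[a] * r
                for b in range(3):
                    jtj[a][b] += j[a] * j[b]
        # Damped normal equations: (JtJ + lam * diag(JtJ)) * step = Jt r
        damped = [[jtj[a][b] + (lam * jtj[a][a] if a == b else 0.0)
                   for b in range(3)] for a in range(3)]
        step = solve_linear(damped, jtr)
        trial = [p[i] + step[i] for i in range(3)]
        trial_cost = sum_sq(xs, ys, trial)
        if trial_cost < cost:
            # Accept the step and move towards Gauss-Newton behaviour
            p, cost, lam = trial, trial_cost, lam * 0.1
        else:
            # Reject the step and move towards gradient descent
            lam *= 10.0
        if max(abs(s) for s in step) < 1e-12:
            break
    return p
```

Calling `fit_gaussian(xs, ys, (amp0, mu0, sigma0))` with a reasonable starting guess returns the fitted `[amp, mu, sigma]`. Each iteration is dominated by sums over all samples, which is the kind of work that maps naturally onto GPU library primitives.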

  8. Subtask 1 • Goal: benchmarking of a Fermi GPU • Standard GPU programming methodology: • The GPU is operated from the host as a coprocessor • Host threads sequence GPU operations: • Responsible for moving data (Host↔Device) • Operations are coded by: • Programming kernels: CUDA • Using library primitives: CULA, CUBLAS…

  9. Results S1 (I)

  10. Results S1 (II)

  11. Subtask 2 • Goal: to provide EPICS support for GPU processing • [Diagram comparing a single process approach with a DPD approach. Labels: EPICS IOC, Asyn Layer, Acquisition & Processing, DPD, Processing units, FPGA, Data Generation, CPU, GPU, Others: archiving…]

  12. Proposed methodology • The core of the FPSC software is the DPD, which allows for: • Moving data with very good performance • Integrating all the functional elements (EPICS monitoring, Data processing, Data Acquisition, Remote archiving, etc.) • Having code based entirely on the standard asynDriver • Full compatibility with any type of required data • [Diagram labels: Configuration, State Machine, CODAC, EPICS IOC, Asyn Layer, Hardware Monitoring, DPD (Data Processing and Distribution) Subsystem, Timing, Hardware/Cubicle Signals, TCN/1588, SDN, GPU Proc., CPU Proc., FPGA, Archiving, Monitoring]

  13. DPD features (I) • DPD allows configuring both the functional elements of the FPSC (FPGA acquisition, GPU processing, SDN, EPICS monitoring, data processing, data archiving) and the connections (links) between them • Functional elements allow: • reading data blocks from input links • processing received data • generating new signals • routing data blocks to output links • DPD allows new types of functional elements to be integrated to extend the FPSC functionality. This requires creating the corresponding asynDrivers, which can be done in a simple way • DPD makes it very easy to integrate any existing asynDriver • [Diagram labels: EPICS IOC, Input Links, Output Links]
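To make the element and link model on this slide more concrete, here is a small Python sketch in which links are bounded queues and each functional element is a thread that reads blocks from its input link, processes them, and routes the result to its output links. The real DPD is built on asynDriver inside an EPICS IOC; every name below (`Element`, `STOP`, `run_pipeline`) is hypothetical and only illustrates the data flow.

```python
import queue
import threading

STOP = object()  # end-of-stream sentinel (an assumption of this sketch)

class Element(threading.Thread):
    """Hypothetical functional element: read a block from the input link,
    process it, and route the result to every output link."""

    def __init__(self, name, process, in_link, out_links=()):
        super().__init__(name=name)
        self.process = process
        self.in_link = in_link
        self.out_links = list(out_links)

    def run(self):
        while True:
            block = self.in_link.get()
            if block is STOP:
                # Propagate shutdown downstream, then finish
                for link in self.out_links:
                    link.put(STOP)
                return
            result = self.process(block)
            for link in self.out_links:
                link.put(result)

def run_pipeline(blocks):
    """Wire acquisition -> processing -> (monitoring, archiving)."""
    acq_to_proc = queue.Queue(maxsize=8)
    proc_to_mon = queue.Queue(maxsize=8)
    proc_to_arch = queue.Queue(maxsize=8)
    monitored, archived = [], []
    elements = [
        # Stand-in for a CPU/GPU processing element: doubles each sample
        Element("processing", lambda b: [2 * v for v in b],
                acq_to_proc, [proc_to_mon, proc_to_arch]),
        Element("monitoring", monitored.append, proc_to_mon),
        Element("archiving", archived.append, proc_to_arch),
    ]
    for element in elements:
        element.start()
    for block in blocks:  # stand-in for the acquisition element
        acq_to_proc.put(block)
    acq_to_proc.put(STOP)
    for element in elements:
        element.join()
    return monitored, archived
```

Because the wiring lives in `run_pipeline` rather than inside the elements, adding a second processing element or rerouting a link only changes the configuration, which is the property the slide attributes to DPD.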

  14. DPD features (II) • DPD allows data routing to be set at configuration time or even changed at run time (to implement fault-tolerant solutions) • DPD provides a common set of EPICS PVs for the functional elements and their respective links • DPD provides on-line measurements of both throughput and buffer occupancy in the links • DPD implements an optional multi-level (memory, disk) buffering backup solution for any link of the system • [Diagram labels: Backup Block Link, Level 0, Level 1, Level 2]
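The slide gives no detail on how the multi-level backup works. One plausible reading, sketched below in Python, is a link that keeps blocks in a bounded memory buffer and spills to disk when that buffer is full, while exposing its occupancy for monitoring. The class name `BackupLink`, the two-level layout and the pickle file format are all assumptions for illustration.

```python
import collections
import os
import pickle
import tempfile

class BackupLink:
    """Hypothetical FIFO link with a bounded memory level and a disk level."""

    def __init__(self, memory_slots, spill_dir):
        self.memory = collections.deque()
        self.memory_slots = memory_slots   # must be >= 1
        self.spill_dir = spill_dir
        self.spilled = collections.deque() # paths of blocks on disk, oldest first
        self.counter = 0

    def put(self, block):
        # Preserve FIFO order: once anything is on disk, new blocks go to disk too
        if len(self.memory) < self.memory_slots and not self.spilled:
            self.memory.append(block)
            return
        path = os.path.join(self.spill_dir, f"block_{self.counter:08d}.pkl")
        self.counter += 1
        with open(path, "wb") as fh:
            pickle.dump(block, fh)
        self.spilled.append(path)

    def get(self):
        """Return the oldest block; raises IndexError if the link is empty."""
        block = self.memory.popleft()
        if self.spilled:
            # Refill the freed memory slot with the oldest block on disk
            path = self.spilled.popleft()
            with open(path, "rb") as fh:
                self.memory.append(pickle.load(fh))
            os.remove(path)
        return block

    def occupancy(self):
        """Buffer occupancy, the kind of value that could be published as a PV."""
        return {"memory": len(self.memory) / self.memory_slots,
                "disk_blocks": len(self.spilled)}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spill_dir:
        link = BackupLink(memory_slots=2, spill_dir=spill_dir)
        for i in range(5):
            link.put(i)
        print(link.occupancy())
```

A slow consumer then degrades gradually: blocks queue on disk instead of being dropped, and the occupancy figures show where the bottleneck is.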

  15. Test scenario • T4-T0: Total Service Time (TTS) • T4-T1: Module Service Time (TMS) • T3-T2: Processing Time (TP) • TP0: Internal Process Time • [Diagram labels: DPD (Data Processing and Distribution) Subsystem, Data Generation, GPU Proc., GPU, timestamps T0 to T4; GPU stages: Host → Dev, Processing, Dev → Host]

  16. Timing (II) • T0: New data block is generated (TiCamera DataGenerator) • T1: Data block is received in the module • T2: Data block is ready to be processed • T3: DataFit processing is finished • T4: New DataFit processed data is packed and sent (DataFit result packing and routing) • [Diagram intervals: TTS, TMS, TP]
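The three service times follow directly from the five timestamps: TTS = T4 - T0, TMS = T4 - T1 and TP = T3 - T2. A short Python sketch of that arithmetic is given below; the class name `BlockTimestamps` and the sample values are made up for illustration and are not measurements from the presentation.

```python
from dataclasses import dataclass

@dataclass
class BlockTimestamps:
    """Timestamps recorded for one data block (any consistent time unit)."""
    t0: float  # new data block is generated
    t1: float  # data block is received in the module
    t2: float  # data block is ready to be processed
    t3: float  # DataFit processing is finished
    t4: float  # processed data is packed and sent

    def total_service_time(self):
        """TTS = T4 - T0"""
        return self.t4 - self.t0

    def module_service_time(self):
        """TMS = T4 - T1"""
        return self.t4 - self.t1

    def processing_time(self):
        """TP = T3 - T2"""
        return self.t3 - self.t2

if __name__ == "__main__":
    # Illustrative values only
    stamps = BlockTimestamps(t0=0, t1=2, t2=3, t3=8, t4=10)
    print(stamps.total_service_time(),
          stamps.module_service_time(),
          stamps.processing_time())
```

Comparing TMS with TP for the same block separates the time spent in the fit itself from the time spent receiving, packing and routing the data inside the module.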

  17. Test scenario 1 • TiCamera DataGenerator • GPU processing: TiCameraFit • EPICS waveform Monitoring

  18. Test scenario 2 • TiCamera DataGenerator • GPU#0 processing: TiCameraFit • GPU#0 processing: TiCameraFit • EPICS waveform Monitoring

  19. Test scenario 3 • TiCamera DataGenerator • GPU#0 processing: TiCameraFit • GPU#1 processing: TiCameraFit • EPICS waveform Monitoring

  20. Results S2 • To determine the DPD overhead with respect to a “hard coded” approach • To test DPD scalability (multi-module, multiple-hardware support) • Using the 3rd solution, we were able to process 3 MB/s running 2 modules on 2 different GPUs

  21. Conclusions • The development methodology for using GPUs is being standardized, providing increasing levels of abstraction from hardware implementation details • “Hard coded” implementations seriously compromise scalability and maintainability, without guaranteeing a relevant increase in performance • Specific frameworks are being developed for different scenarios (Thrust, DPD…): • To simplify development • To promote reusability • To provide scalability and maintainability • To include first-level parallelism (internal load balancing based on multithreading)