HANDBOOK ON GREEN INFORMATION AND COMMUNICATION SYSTEMS
Chapter 9: Green Computing Platforms for Biomedical Systems
Vinay Vijendra Kumar Lakshmi, Ashish Panday, Arindam Mukherjee, and Bharat S Joshi, University of North Carolina at Charlotte
Overview
• Green Computing in Biomedical Field
• Survey of Green Computing Platforms
• Analysis of Popular Biomedical Applications
• Design Framework for Biomedical Embedded Processors
• Survey of Simulation Tools for Design Space Exploration
• Development and Characterization of Benchmark Suite
• Design Space Exploration and Optimization Techniques of Embedded Microarchitectures
• Conclusion
• Future Research Areas
Green Computing in Biomedical Field
Computing in biomedical systems can be classified into three categories:
• Implantable device level
• Portable/embedded platform level
• Server level
Characteristics of Biomedical Systems
• Power consumption
• Renewable energy resources (energy harvesting)
• Heat dissipation
• Minimizing area
• Cost
• Performance
Survey of Green Computing Platforms
Implantable Devices
• Monitor the physiological parameters of the human body
• Examples: pacemakers, cardioverter-defibrillators, cochlear implants
• Most implantable devices are inactive most of the time and activate in response to a stimulus from the body
Figure: Configuration of a brain implant or brain-machine interface (BMI)
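Because most implantable devices sleep until a stimulus arrives, their average power depends heavily on how rarely they wake. The sketch below computes a time-weighted average under a simple two-state (active/sleep) model; the model, the function name, and all power values are hypothetical and serve only to illustrate the effect.

```python
# Illustrative sketch: average power of a duty-cycled implantable device.
# Two-state (active/sleep) model; all power figures are hypothetical.


def average_power_uw(active_uw: float, sleep_uw: float, duty_cycle: float) -> float:
    """Time-weighted average power (uW) for a device that is active for a
    fraction `duty_cycle` of the time and asleep for the rest."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in [0, 1]")
    return duty_cycle * active_uw + (1.0 - duty_cycle) * sleep_uw


if __name__ == "__main__":
    # Hypothetical device: 500 uW while handling a stimulus, 2 uW asleep,
    # active 1% of the time.
    p = average_power_uw(active_uw=500.0, sleep_uw=2.0, duty_cycle=0.01)
    print(f"average power: {p:.2f} uW")  # prints "average power: 6.98 uW"
```

With these numbers the sleep current contributes almost as much as the active bursts, which is why low quiescent current matters as much as active efficiency.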
Embedded Platforms
• Physiological monitoring systems
• Recognition systems
Figure: Wearable ultra-low-power biomedical signal processor, CoolBio™
Power Management in Intel ATOM
ATOM includes a power management control block, a power management block, a clock synthesizer, and several programmable registers. Together they reduce noise, achieve low quiescent current, and switch dynamically in real time between multiple performance modes by varying the core operating voltage and processor frequency, which reduces ATOM's power consumption and improves its performance.
Figure: Power management in Intel ATOM
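The real-time voltage and frequency switching described above works because dynamic switching power scales as P = a·C·V²·f. The sketch below picks the lowest-power performance mode that still meets a required clock rate. The three modes, the 1 nF switched capacitance, and the selection policy are hypothetical and do not describe ATOM's actual operating points or firmware.

```python
# Illustrative DVFS sketch: choose the lowest-power mode that meets a
# required clock rate. Modes and capacitance are hypothetical values.
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    voltage_v: float
    freq_mhz: float


def dynamic_power_mw(mode: Mode, switched_cap_nf: float = 1.0, activity: float = 1.0) -> float:
    """Dynamic switching power P = a * C * V^2 * f.
    Units: nF * V^2 * MHz = 1e-9 * 1e6 W = 1 mW."""
    return activity * switched_cap_nf * mode.voltage_v ** 2 * mode.freq_mhz


def pick_mode(modes, required_mhz: float) -> Mode:
    """Return the feasible mode with the lowest dynamic power."""
    feasible = [m for m in modes if m.freq_mhz >= required_mhz]
    if not feasible:
        raise ValueError("no mode meets the required frequency")
    return min(feasible, key=dynamic_power_mw)


MODES = [
    Mode("low", 0.8, 600.0),    # 0.64 * 600  = 384 mW
    Mode("mid", 1.0, 1200.0),   # 1.00 * 1200 = 1200 mW
    Mode("high", 1.2, 1600.0),  # 1.44 * 1600 = 2304 mW
]

if __name__ == "__main__":
    for demand in (400.0, 1000.0, 1500.0):
        m = pick_mode(MODES, demand)
        print(f"{demand:.0f} MHz needed -> {m.name} ({dynamic_power_mw(m):.0f} mW)")
```

Because voltage enters squared, dropping to the low mode for light workloads cuts power far more than the frequency reduction alone would suggest.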
Servers
Oracle WebLogic Server 11g was used to demonstrate the performance of the Avitek Medical Records sample application. A configuration using Oracle's SPARC T3-1B and SPARC Enterprise M5000 servers scaled well across different configurations and doubled the performance of the previous-generation SPARC blade.
Analysis of Biomedical Applications
Figure: Flowchart for choosing the algorithm-architecture combination best suited for an application
Pairwise Correlation
Pairwise correlation (PWC) is based on the Pearson product-moment correlation coefficient (PPMCC), which is defined in terms of the following quantities:
X = {x1, x2, x3, ..., xn}, Y = {y1, y2, y3, ..., yn} : the two sample sets
r : coefficient of correlation
Cov(X,Y) : covariance of X and Y
SX, SY : standard deviations of X and Y
µX, µY : expectations of X and Y
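Written with the symbols above, the standard PPMCC definition is:

```latex
r = \frac{\operatorname{Cov}(X,Y)}{S_X \, S_Y},
\qquad
\operatorname{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big]
```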
For m channels with n samples each:
i, j : ith and jth channel, where 1 ≤ i, j ≤ m
x(i,k), x(j,k) : kth sample from the ith and jth channel, where 1 ≤ i, j ≤ m, i ≠ j and 1 ≤ k ≤ n
r(i,j) : correlation coefficient between the ith and jth channel, where 1 ≤ i, j ≤ m
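A minimal serial sketch of PWC follows. It assumes the channels arrive as a list of m equal-length lists (x[i][k] is the kth sample of channel i) and uses the sample form r(i,j) = Σk (x(i,k) − x̄i)(x(j,k) − x̄j) / sqrt(Σk (x(i,k) − x̄i)² · Σk (x(j,k) − x̄j)²). The function name pairwise_correlation is hypothetical; this is not the chapter's benchmark code.

```python
# Minimal serial sketch of pairwise correlation (PWC) over m channels with
# n samples each. Layout and function name are illustrative assumptions.
import math
from typing import List


def pairwise_correlation(x: List[List[float]]) -> List[List[float]]:
    """Return the m x m matrix r where r[i][j] is the Pearson correlation
    between channel i and channel j (x[i][k] is the k-th sample of channel i)."""
    m = len(x)
    n = len(x[0])
    centred: List[List[float]] = []
    norms: List[float] = []
    for ch in x:
        if len(ch) != n:
            raise ValueError("all channels must have the same number of samples")
        mean = sum(ch) / n
        c = [v - mean for v in ch]  # x(i,k) - mean of channel i
        norm = math.sqrt(sum(v * v for v in c))
        if norm == 0.0:
            raise ValueError("a constant channel has undefined correlation")
        centred.append(c)
        norms.append(norm)
    # Diagonal: every channel is perfectly correlated with itself.
    r = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    # r(i,j) == r(j,i), so only pairs with i < j are computed.
    for i in range(m):
        for j in range(i + 1, m):
            cov = sum(a * b for a, b in zip(centred[i], centred[j]))
            r[i][j] = r[j][i] = cov / (norms[i] * norms[j])
    return r


if __name__ == "__main__":
    channels = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
    for row in pairwise_correlation(channels):
        print([round(v, 3) for v in row])
```

Centring each channel once and exploiting the symmetry of r(i,j) halves the inner-loop work compared with recomputing means for every pair.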
Choosing the Initial Algorithm and Architecture
PWC is first written as serial code for the Intel Xeon dual-core processor. Profiling it with Intel VTune yields the statistics in Table 1.
Table 1: Performance of serial code on Intel Xeon dual-core processor
The code is then parallelized with OpenMP and profiled again, yielding the improved values in Table 4.3.
Table 4.3: Performance of OpenMP code on Intel Xeon dual-core processor
An implementation on the Cell processor using the ring algorithm gives a speed-up of approximately 56 over the serial version on the Intel Xeon.
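The ring algorithm is not detailed here. As an assumption, the sketch below uses one common block-cyclic formulation: the m channels are split into p blocks, one per processing element (for example a Cell SPE), and blocks are passed around a ring so that every pair of blocks, including each block with itself, is visited exactly once in ⌊p/2⌋ + 1 steps. Within each visited block pair, a worker would then compute r(i,j) for all channel pairs across the two blocks. The function name ring_schedule is hypothetical.

```python
# Sketch of a ring-style work schedule for distributing PWC over p
# processing elements. One common formulation, assumed for illustration.
from typing import List, Tuple


def ring_schedule(p: int) -> List[List[Tuple[int, int]]]:
    """For each ring step s, list the (own_block, visiting_block) pairs that
    workers process. At step s, worker w holds block (w + s) % p, which has
    travelled s hops around the ring."""
    steps = []
    for s in range(p // 2 + 1):
        work = []
        for w in range(p):
            # For even p, the half-way step would visit each block pair
            # twice, so only the first half of the workers take it.
            if p % 2 == 0 and s == p // 2 and w >= p // 2:
                continue
            work.append((w, (w + s) % p))
        steps.append(work)
    return steps


if __name__ == "__main__":
    for s, work in enumerate(ring_schedule(4)):
        print(f"step {s}: {work}")
```

Every worker does the same amount of block-pair work in each full step, which keeps the processing elements balanced while each one only ever needs its own block plus one visiting block in local memory.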
Design Framework for Biomedical Embedded Processors
Figure: Design flow for biomedical embedded processors
Development and Characterization of Benchmark Suite
A good multicore benchmark identifies bottlenecks in the multicore system design, including memory and I/O bottlenecks, computational bottlenecks, and real-time bottlenecks*. In addition, it identifies synchronization problems that arise when code and data blocks are split, distributed to various compute engines for processing, and the results are then reassembled.
*S. Gal-On, M. Levy, S. Leibson, "How to Survive the Quest for a Useful Multicore Benchmark", ECN Magazine, Dec. 2009
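The split, distribute, and reassemble pattern described above can be sketched as follows, together with the check a benchmark would make: the reassembled result must match a serial reference. The per-channel energy kernel, block size, and worker count are hypothetical stand-ins. Because of CPython's global interpreter lock, these threads illustrate correct reassembly rather than speed-up.

```python
# Sketch of split / distribute / reassemble with a serial reference check.
# The kernel and parameters are hypothetical stand-ins for a real benchmark.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def channel_energy(samples: List[float]) -> float:
    """Kernel: sum of squared samples of one channel."""
    return sum(v * v for v in samples)


def process_block(start: int, block: List[List[float]]) -> Tuple[int, List[float]]:
    # Return the block's start index so results land in the right place
    # regardless of the order in which workers finish.
    return start, [channel_energy(ch) for ch in block]


def parallel_energy(channels: List[List[float]], block_size: int, workers: int) -> List[float]:
    result = [0.0] * len(channels)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_block, s, channels[s:s + block_size])
            for s in range(0, len(channels), block_size)
        ]
        for fut in futures:
            start, values = fut.result()
            result[start:start + len(values)] = values  # reassemble
    return result


if __name__ == "__main__":
    data = [[float(i + k) for k in range(100)] for i in range(37)]
    serial = [channel_energy(ch) for ch in data]
    parallel = parallel_energy(data, block_size=8, workers=4)
    # Same kernel, same summation order per channel: results must be identical.
    print("reassembled result matches serial reference:", parallel == serial)
```

Tagging each partial result with its position, rather than relying on completion order, is the design choice that removes the reassembly race a benchmark of this kind is meant to expose.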
Performance Analysis of the Benchmark
Analysis of PWC on various simulator tools: CASPER
Figure: CPI per core on CASPER versus D$ size (in bytes)
Figure: Average power per core on CASPER (µW) versus D$ size (in bytes)
MV5 Simulation
Figure: Analysis of the parallel version of the code on MV5 with various configurations (per-CPU results)
Design Space Exploration and Optimization Techniques of Embedded Microarchitectures
Different approaches used for design space exploration of multicore processor architectures, and the optimization algorithms used with them:
• Artificial neural networks (ANN)
• Fast genetic algorithms (used in CASPER)
• Genetically programmed response surfaces (GPRS, used on MV5)
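To make the genetic-algorithm approach concrete, the sketch below runs a small GA (tournament selection, uniform crossover, mutation, elitism) over a toy design space of core count, data-cache size, and in-order versus out-of-order cores. The cost function, an energy-delay product, is a hypothetical analytic model standing in for simulator output; it is not CASPER's fast genetic algorithm or its power/performance model.

```python
# Toy genetic-algorithm design space exploration. The design space and
# cost model are hypothetical stand-ins for simulator measurements.
import itertools
import math
import random
from typing import List, Tuple

CORES = (1, 2, 4, 8)
DCACHE_KB = (4, 8, 16, 32, 64)
OOO = (False, True)  # False = in-order core, True = out-of-order core
Design = Tuple[int, int, bool]


def cost(d: Design) -> float:
    """Hypothetical energy-delay product (lower is better)."""
    cores, dcache_kb, ooo = d
    speedup = cores / (1 + 0.1 * (cores - 1))  # toy contention penalty
    ipc = 1.5 if ooo else 1.0
    miss_rate = 0.3 / math.sqrt(dcache_kb / 4)
    delay = (1 + 10 * miss_rate) / (speedup * ipc)
    power = cores * ((2.5 if ooo else 1.0) + 0.02 * dcache_kb)
    return power * delay * delay  # energy * delay = P * D^2


def random_design(rng: random.Random) -> Design:
    return (rng.choice(CORES), rng.choice(DCACHE_KB), rng.choice(OOO))


def crossover(a: Design, b: Design, rng: random.Random) -> Design:
    # Uniform crossover: each parameter comes from either parent.
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def mutate(d: Design, rng: random.Random, rate: float = 0.2) -> Design:
    options = (CORES, DCACHE_KB, OOO)
    return tuple(rng.choice(o) if rng.random() < rate else g for g, o in zip(d, options))


def tournament(pop: List[Design], rng: random.Random, k: int = 3) -> Design:
    return min(rng.sample(pop, k), key=cost)


def genetic_search(pop_size: int = 16, generations: int = 25, seed: int = 0):
    rng = random.Random(seed)
    pop = [random_design(rng) for _ in range(pop_size)]
    best = min(pop, key=cost)
    history = [cost(best)]
    for _ in range(generations):
        children = [best]  # elitism: the best design so far always survives
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
        best = min(pop, key=cost)
        history.append(cost(best))
    return best, history


if __name__ == "__main__":
    best, history = genetic_search()
    exhaustive = min(itertools.product(CORES, DCACHE_KB, OOO), key=cost)
    print("GA best:", best, round(cost(best), 4))
    print("exhaustive best:", exhaustive, round(cost(exhaustive), 4))
```

This toy space has only 40 points, so exhaustive search is cheap; in a real flow each cost() call would be a simulator run, which is where a GA that evaluates only a fraction of the space pays off.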
Conclusion
• Methodologies for characterizing biomedical applications, both for ultra-low-energy, low-heat embedded implantable devices and for low-power but high-performance embedded computing platforms. For the PWC benchmark, whose computational complexity is O(mn²), the Intel Xeon dual-core processor gives a CPI of 0.67 and an L2 cache miss rate of 25.67%.
• An outline of the procedure for design space exploration of processor microarchitectures using existing simulation tools and optimizers. In MV5's Alpha architecture simulation, a heterogeneous configuration with two in-order (IO) and two out-of-order (OOO) cores consumes less energy per CPU (29.918 mJ) than a homogeneous configuration.
Future Research Areas
• Development of new and improved instruction set architectures (ISAs)
• Corresponding cross-compilers that generate optimized executables for the simulators
• Upgrading existing simulation platforms to support full-system mode with real-time kernel libraries, to account for the latency and throughput of real-life applications
• Development of advanced real-time operating systems and scheduling algorithms that schedule the various applications on different heterogeneous cores to meet hard real-time constraints