Hardware-Based Nonlinear Filtering and Segmentation using High-Level Shading Languages I. Viola, A. Kanitsar, M.E. Gröller Institute of Computer Graphics and Algorithms Vienna University of Technology Vienna, Austria
Volume Visualization Pipeline [figure: data acquisition → data enhancement → visualization mapping → rendering, with each stage labeled CPU or GPU]
Data Enhancement: GPU-based Algorithms [figure: liver dataset, segmented vessels] • high performance • high flexibility • easy implementation: HLSL • necessary features, available on the latest commodity graphics hardware: • floating-point precision • long shader programs
Talk Outline • processing pipeline • GPU-based filtering • per-vertex stage • per-fragment stage • median filter • bilateral filter • rotated mask filter • GPU-based segmentation
Liver Vessel Tree Visualization • pre-filtering • improving thresholding segmentation • edge-preserving filters • interactive threshold adjustment • mask generation • volumetric clipping • volume rendering [stage labels in figure: GPU, CPU, GPU]
Filtering in Graphics Hardware: Issues • data representation: textures • 3D texture • stack of 2D textures • access to value: texture fetch • neighborhood addressing: texture offset (we use a 5×5×5 neighborhood) • filter implementation: per-fragment stage • results: rendered into off-screen buffer
Data Representation [figure: texture stack, off-screen buffer stack]
Neighborhood Addressing Two alternatives: • directly in the fragment program (requires additional computation) • pre-compute in the per-vertex stage • store in vertex attributes • interpolation “for free” • swizzle operator
Address Pre-computation • per-vertex stage: OUT.TEXCOORD0.xyzw = IN.TEXCOORD0.xyxy + float4(-2, 2, -1, 1) • the output attribute holds (x-2, y+2, x-1, y+1) • swizzles address four filter kernel positions: TEXCOORD0.xy, TEXCOORD0.zw, TEXCOORD0.xw, TEXCOORD0.zy
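The packing on this slide can be emulated on the CPU to see which kernel positions each swizzle selects. This is a minimal Python sketch, not shader code; the function names `precompute_addresses` and `swizzle` are hypothetical helpers introduced only for illustration.

```python
def precompute_addresses(x, y):
    """Emulate OUT.TEXCOORD0.xyzw = IN.TEXCOORD0.xyxy + float4(-2, 2, -1, 1)."""
    return (x - 2, y + 2, x - 1, y + 1)


def swizzle(vec4, pattern):
    """Pick components of a 4-tuple by name, e.g. 'xw' -> (vec4.x, vec4.w)."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(vec4[index[c]] for c in pattern)


if __name__ == "__main__":
    packed = precompute_addresses(10, 20)
    # One interpolated attribute yields four neighbor addresses.
    for pattern in ("xy", "zw", "xw", "zy"):
        print(pattern, swizzle(packed, pattern))
```

A single four-component attribute therefore covers four neighbors, which is why the slide pairs the per-vertex pre-computation with the swizzle operator.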
Per-fragment Stage • medical data: 12-bit precision, fixed-point 12-bit arithmetic • use cache coherence • exploit 4D instructions • reduce conditionals • reduce number of registers • push computation to per-vertex stage
Median Filter • central value of ordered set • implementation • CPU-based: sorting • GPU-based: similar to quickselect()
GPU-based Median Filter • input data 12 bit [0..4095] • multi-pass approach • not efficient on CPU • exploiting GPU 4D arithmetic [figure: number line 0 to 7]
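One selection scheme that fits these bullets (no sorting, a fixed number of passes, a 12-bit value range) is bisection on the value range: each pass counts how many neighborhood samples lie at or below a pivot and halves the search interval. The slides do not spell out the exact scheme, so the Python sketch below is an assumption, and `median_by_bisection` is a hypothetical name.

```python
def median_by_bisection(values, bits=12):
    """Return the (lower) median of integers in [0, 2**bits - 1] without sorting.

    Assumed scheme: bisect the value range; each pass only counts samples
    that are <= the pivot, so it needs comparisons and additions but no
    data-dependent reordering.
    """
    rank = (len(values) - 1) // 2 + 1   # how many samples must be <= median
    lo, hi = 0, (1 << bits) - 1
    while lo < hi:                      # at most `bits` passes
        pivot = (lo + hi) // 2
        count = sum(1 for v in values if v <= pivot)
        if count >= rank:
            hi = pivot                  # median is at or below the pivot
        else:
            lo = pivot + 1              # median is above the pivot
    return lo


if __name__ == "__main__":
    # 125 samples stand in for a 5x5x5 neighborhood of 12-bit values.
    neighborhood = [(i * 37) % 4096 for i in range(125)]
    print(median_by_bisection(neighborhood))
```

The counting step is a plain compare-and-accumulate over the neighborhood, the kind of work where four samples could be handled per 4D instruction; on a CPU, sorting the small neighborhood is usually the simpler choice.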
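Bilateral Filter • edge preservation: anisotropic filter kernel • product of two weights: • geometric: depends on the distance from the kernel center • photometric: depends on the difference in value from the center sample [figure: plot of f(x) over x with samples labeled "high geometric weight, low photometric weight", "high geometric weight", and "low geometric weight"]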
GPU-based Bilateral Filter • weights are precomputed • geometric weight stored in unused vertex attributes (COLOR0) • photometric weight stored in 1D mirror LUT • weight product • sum-up contributions & weights • normalize
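The steps above (precomputed weights, weight product, accumulation, normalization) can be sketched on the CPU as follows. This is a minimal 1D Python illustration rather than the 5×5×5 shader version; the Gaussian falloff, the sigma parameters, the border clamping, and all function names are assumptions not stated on the slides.

```python
import math


def make_geometric_weights(radius, sigma_g):
    """Precompute one weight per kernel offset (assumed Gaussian falloff)."""
    return [math.exp(-(d * d) / (2.0 * sigma_g ** 2))
            for d in range(-radius, radius + 1)]


def make_photometric_lut(max_value, sigma_p):
    """Precompute a 1D table indexed by the absolute value difference."""
    return [math.exp(-(d * d) / (2.0 * sigma_p ** 2))
            for d in range(max_value + 1)]


def bilateral_1d(signal, radius, geo, lut):
    """Filter a list of integer samples; borders are clamped."""
    n = len(signal)
    out = []
    for i in range(n):
        center = signal[i]
        acc = 0.0    # sum of weighted contributions
        wsum = 0.0   # sum of weights, used for normalization
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)
            w = geo[k + radius] * lut[abs(signal[j] - center)]  # weight product
            acc += w * signal[j]
            wsum += w
        out.append(acc / wsum)
    return out


if __name__ == "__main__":
    geo = make_geometric_weights(2, 1.0)
    lut = make_photometric_lut(4095, 10.0)
    step = [0] * 5 + [1000] * 5
    print(bilateral_1d(step, 2, geo, lut))
```

Indexing the table by the absolute difference means only one half of the symmetric photometric function has to be stored. Samples across a strong edge receive a near-zero photometric weight, so the edge is not blurred.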
Rotated Mask Filter • anisotropic noise removal with edge preservation • splits filter mask into sub-regions • mean and variance value for each sub-region • result: mean value of the sub-region with minimal variance • GPU implementation • single pass: slow • multiple passes: reduce temporary registers
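The selection rule on this slide (compute mean and variance per sub-region, output the mean of the sub-region with minimal variance) can be sketched in Python as below. The slides do not specify how the mask is split, so the layout used here, four overlapping quadrants of a 5×5 mask that each contain the center pixel, is an assumption, as is the function name `rotated_mask_filter`. The sketch works on a single 2D slice and leaves border pixels unchanged.

```python
def rotated_mask_filter(img, r=2):
    """Apply a minimal-variance sub-region filter to a 2D list of numbers.

    Assumed layout: the (2r+1) x (2r+1) mask is split into four
    (r+1) x (r+1) quadrants that all include the center pixel.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    quadrants = [(-r, 0, -r, 0), (-r, 0, 0, r), (0, r, -r, 0), (0, r, 0, r)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            best_var = None
            best_mean = None
            for y0, y1, x0, x1 in quadrants:
                vals = [img[y + dy][x + dx]
                        for dy in range(y0, y1 + 1)
                        for dx in range(x0, x1 + 1)]
                mean = sum(vals) / len(vals)
                var = sum((v - mean) ** 2 for v in vals) / len(vals)
                if best_var is None or var < best_var:
                    best_var, best_mean = var, mean
            out[y][x] = best_mean  # mean of the most homogeneous sub-region
    return out


if __name__ == "__main__":
    # Vertical edge: columns 0..2 are 0, columns 3..6 are 100.
    image = [[0, 0, 0, 100, 100, 100, 100] for _ in range(7)]
    for row in rotated_mask_filter(image):
        print(row)
```

Because each pixel takes its value from the sub-region that does not straddle the edge, pixels on both sides of the edge keep their original level.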
Segmentation • input: pre-filtered data after noise removal • thresholding segmentation • 0 outside interval • 1 within interval • interactive threshold adjustment • output in compressed form: 32 binary slices packed into one 32-bit slice
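Thresholding and the compressed output can be illustrated with a short Python sketch. The inclusive interval bounds and the bit order (slice k stored in bit k) are assumptions, and the function names are hypothetical; the slides only state that 32 binary slices end up in one 32-bit slice.

```python
def threshold_mask(slice_2d, lo, hi):
    """Return 1 where lo <= value <= hi and 0 elsewhere (bounds assumed inclusive)."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in slice_2d]


def pack_slices(masks):
    """Pack up to 32 binary slices into one slice of 32-bit integers.

    Assumed bit order: slice k occupies bit k of each packed value.
    """
    if len(masks) > 32:
        raise ValueError("at most 32 slices fit into one 32-bit slice")
    h, w = len(masks[0]), len(masks[0][0])
    packed = [[0] * w for _ in range(h)]
    for k, mask in enumerate(masks):
        for y in range(h):
            for x in range(w):
                packed[y][x] |= mask[y][x] << k
    return packed


def unpack_bit(packed, k):
    """Recover binary slice k from the packed slice."""
    return [[(v >> k) & 1 for v in row] for row in packed]


if __name__ == "__main__":
    volume = [[[10, 200]], [[150, 50]], [[120, 130]]]  # three 1x2 slices
    masks = [threshold_mask(s, 100, 180) for s in volume]
    packed = pack_slices(masks)
    print(packed)
    print(unpack_bit(packed, 1))
```

Under this packing, the 72 slices of the liver dataset would need three packed slices.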
Results • GPU: NVIDIA GeForceFX 5900 Ultra • CPU: AMD AthlonXP 2.4 GHz, 1GB DDR RAM • liver dataset: 512×512×72
Conclusions • data enhancement step on GPU! • simple tasks: better speedup • optimization is hardware-specific • high-level programming • friendly • many implementation possibilities • compiler efficiency