Wavelet Spectral Dimension Reduction of Hyperspectral Imagery on a Reconfigurable Computer

Tarek El-Ghazawi1, Esam El-Araby1, Abhishek Agarwal1, Jacqueline Le Moigne2, and Kris Gaj3
1The George Washington University, 2NASA/Goddard Space Flight Center, 3George Mason University
{tarek, esam, agarwala}@gwu.edu, lemoigne@backserv.gsfc.nasa.gov, kgaj@gmu.edu
Objectives and Introduction

Investigate the use of reconfigurable computing for on-board automatic processing of remote sensing data
• Remote sensing image classification
• Applications: land classification, mining, geology, forestry, agriculture, environmental management, global atmospheric profiling (e.g., water vapor and temperature profiles), and planetary space missions
• Types of carriers: airborne and spaceborne
Types of Sensing: Multispectral / Hyperspectral Imagery Comparison
• Mono-spectral imagery: 1 band (SPOT ≡ panchromatic)
• Multi-spectral imagery: tens of bands (MODIS ≡ 36 bands, SeaWiFS ≡ 8 bands, IKONOS ≡ 5 bands)
• Hyperspectral imagery: hundreds to thousands of bands (AVIRIS ≡ 224 bands, AIRS ≡ 2378 bands)
Different Airborne Hyperspectral Systems [figure: AISA, AVIRIS, AURORA, GER]
Why On-Board Processing?

Problems
• Complex pre-processing steps: image registration / fusion
• Large data volumes
• Large cost and complexity of the on-the-ground / Earth processing systems
• Large latency for critical decisions
• Large data downlink bandwidth requirements

Solutions
• Automatic on-board processing
  – Reduces the cost and complexity of the on-the-ground / Earth processing system → larger utilization by a broader community, including educational institutions
  – Enables autonomous decisions to be taken on-board → faster critical decisions
  – Applications: future reconfigurable web sensor missions; future Mars and planetary exploration missions
• Dimension reduction*
  – Reduces communication bandwidth
  – Simpler and faster subsequent computations

* Investigated pre-processing step
Why Reconfigurable Computers?

On-Board Processing Problems
• High computational complexity
• Low performance of traditional processing platforms
• High form / wrap factors (size and weight) of parallel computing systems
• Low flexibility of traditional ASIC-based solutions
• High costs and long design cycles of traditional ASIC-based solutions

Solution: Reconfigurable Computers (RCs)
• Higher performance (throughput and processing power) compared to conventional processors
• Lower form / wrap factors compared to parallel computers
• Higher flexibility (reconfigurability) compared to ASICs
• Lower cost and shorter time-to-solution compared to ASICs
Hyperspectral Image Data Arrangement (Matrix Form) [figure: 512 × 512 pixel, 224-band image cube]
• Pixels ≡ (Rows × Columns)
• Reconfigurable computing 1st scope: along the bands of each pixel
• Reconfigurable computing 2nd scope / parallel computing scope: across the rows and columns of pixels
Data Arrangement (cont'd): Hyperspectral Image Array Form [figure]
• The cube is linearized into a (Pixels × Bands) array, where Pixels = Rows × Columns
• Each array row holds the full spectral signature (bands 0 … Bands−1) of one pixel, ordered (0,0), (0,1), …, (0,cols−1), …, (rows−1,cols−1)
• Each band sample is 8 bits
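The linearization above can be sketched in a few lines of NumPy; the cube dimensions here are hypothetical, chosen small for illustration.

```python
import numpy as np

# Hypothetical small cube: 4 rows x 3 cols x 8 bands of 8-bit samples
rows, cols, bands = 4, 3, 8
cube = np.arange(rows * cols * bands, dtype=np.uint8).reshape(rows, cols, bands)

# Linearize to the (Pixels x Bands) array form: each array row is the full
# spectral signature of one pixel, ordered (0,0), (0,1), ..., (rows-1, cols-1)
pixels = cube.reshape(rows * cols, bands)

# Pixel (r, c) maps to array row r * cols + c
r, c = 2, 1
assert np.array_equal(pixels[r * cols + c], cube[r, c])
```

This row-major order is what lets the hardware stream one spectral signature at a time through the DWT pipeline.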
Examples of Hyperspectral Datasets
• AVIRIS: Indian Pines '92 (400 × 400 pixels, 192 bands)
• AVIRIS: Salinas '98 (217 × 512 pixels, 192 bands)
Dimension Reduction Techniques
• Principal Component Analysis (PCA):
  – Most common dimension reduction method
  – Does not preserve spectral signatures
  – Complex, global computations: difficult for parallel processing and hardware implementation
• Wavelet-based dimension reduction:
  – Multi-resolution wavelet decomposition of each pixel's 1-D spectral signature (preserves spectral locality)
  – Preserves spectral signatures
  – Simple, local operations → high-performance implementation
1-D DWT and 2-D DWT (1-Level Decomposition) [diagram]
• 1-D DWT: the signal passes through a low-pass filter (L) and a high-pass filter (H), each followed by downsampling by 2
• 2-D DWT: row filtering followed by column filtering yields the LL, LH, HL, and HH subbands
2-D DWT (2-Level Decomposition) [diagram]
• The second level repeats the L/H filtering and downsampling by 2 on the first level's LL subband
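A minimal sketch of one 1-D DWT level as drawn above, using Haar filters as an illustrative choice (the slides do not name the filter family): low-pass (L) and high-pass (H) filtering followed by downsampling by 2, plus the inverse transform.

```python
import numpy as np

def dwt1d_haar(signal):
    """One-level 1-D DWT: low-pass (L) and high-pass (H) filtering
    followed by downsampling by 2, using Haar filters."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)   # L branch, downsampled
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)   # H branch, downsampled
    return approx, detail

def idwt1d_haar(approx, detail):
    """Inverse 1-D DWT: upsample and recombine to recover the signal."""
    s = np.empty(2 * len(approx))
    s[0::2] = (approx + detail) / np.sqrt(2)
    s[1::2] = (approx - detail) / np.sqrt(2)
    return s

sig = np.array([4.0, 2.0, 6.0, 8.0])
a, d = dwt1d_haar(sig)
assert np.allclose(idwt1d_haar(a, d), sig)  # perfect reconstruction
```

Cascading `dwt1d_haar` on the approximation output gives the multi-level decomposition of the next slide; the 2-D case simply applies the same pair of filters along rows and then columns.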
Wavelet-Based vs. PCA (Execution Time, 500 MHz P3) [chart]
Complexity: wavelet-based = O(MN); PCA = O(MN2 + N3), for M pixels and N bands
Wavelet-Based vs. PCA (cont'd) (Execution Time, 500 MHz P3) [chart: wavelet vs. PCA execution times]
Complexity: wavelet-based = O(MN); PCA = O(MN2 + N3)
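Plugging the Salinas '98 dimensions into the stated complexities gives a feel for the gap; this is a rough operation count under the assumption that M is the pixel count and N the band count (constant factors ignored).

```python
# Rough operation-count comparison for the stated complexities,
# assuming M = number of pixels and N = number of bands (illustrative only)
M = 217 * 512   # Salinas'98 pixels
N = 192         # bands

wavelet_ops = M * N              # O(MN)
pca_ops = M * N**2 + N**3        # O(MN^2 + N^3)

print(f"wavelet ~{wavelet_ops:.2e} ops, PCA ~{pca_ops:.2e} ops, "
      f"ratio ~{pca_ops / wavelet_ops:.0f}x")
```

On this dataset the ratio is roughly N, i.e. on the order of a couple of hundred, which is consistent with the execution-time charts above.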
Wavelet-Based vs. PCA (cont'd): Classification Accuracy
• Implemented on the HIVE (8-node Pentium Xeon / Beowulf-type system): 6.5 times faster than the sequential implementation
• Classification accuracy similar to or better than PCA
• Faster than PCA
The Algorithm [flowchart]

PIXEL LEVEL (per-pixel decomposition level)
1. Decompose the spectral pixel (DWT); keep the approximation coefficients
2. Reconstruct the individual pixel back to the original stage from the current level [a] of wavelet coefficients
3. Compute the correlation (Corr) between the original and the reconstructed approximation
4. If Corr ≥ Th, save the current level [a] of wavelet coefficients and continue decomposing; if Corr < Th, stop and report the last saved level

OVERALL
1. Read data and read the threshold (Th)
2. Compute the level for each individual pixel (PIXEL LEVEL) and add its contribution to a global histogram
3. Remove outlier pixels
4. Get the lowest level (L) from the global histogram
5. Decompose each pixel to level L
6. Write data
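The PIXEL LEVEL / OVERALL flow above can be sketched as follows. This is a minimal illustration, not the flight implementation: it assumes a Haar-style averaging low-pass filter, zero-order-hold reconstruction, and Pearson correlation, and it omits the outlier-removal step; all function names are hypothetical.

```python
import numpy as np

def dwt_level(sig):
    """One approximation level: low-pass filter (pairwise average) + downsample by 2."""
    return (sig[0::2] + sig[1::2]) / 2.0

def reconstruct(approx, target_len):
    """Reconstruct to the original length by sample repetition (zero-order hold)."""
    return np.repeat(approx, target_len // len(approx))[:target_len]

def pixel_level(sig, th, max_level=5):
    """PIXEL LEVEL: deepest level whose reconstruction still correlates
    with the original signature at or above threshold th."""
    sig = np.asarray(sig, float)
    level, approx = 0, sig
    while level < max_level and len(approx) >= 2:
        cand = dwt_level(approx)
        corr = np.corrcoef(sig, reconstruct(cand, len(sig)))[0, 1]
        if corr < th:
            break                      # Corr < Th: stop at last saved level
        approx, level = cand, level + 1
    return level

def reduce_image(pixels, th=0.99):
    """OVERALL (outlier removal omitted): histogram the per-pixel levels,
    pick the lowest level L, then decompose every pixel to level L."""
    levels = [pixel_level(p, th) for p in pixels]
    L = min(levels)                    # lowest level from the global histogram
    out = []
    for p in pixels:
        a = np.asarray(p, float)
        for _ in range(L):
            a = dwt_level(a)
        out.append(a)
    return np.array(out), L
```

Taking the *lowest* level across pixels is the conservative choice: every pixel is reduced only as far as the least-compressible pixel allows, so no spectral signature falls below the correlation threshold.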
Prototyping: Wavelet-Based Dimension Reduction of Hyperspectral Imagery on a Reconfigurable Computer, the SRC-6E
Top Hierarchy Module [block diagram]
• Inputs: pixel stream X, threshold TH, sample count N
• The DWT_IDWT block produces decomposition levels L1:L5 and reconstructions Y1:Y5
• The Correlator compares each reconstruction Yi against the original; comparator outputs GTE_1:GTE_5 drive the Histogram module, whose MUX/level selector outputs the chosen level
Decomposition and Reconstruction Levels of Dimension Reduction (DWT_IDWT) [block diagram]
• Five cascaded stages (Level_1 … Level_5): each stage applies the low-pass filter L with downsampling by 2, producing approximations L1 … L5 from input X
• Each stage's reconstruction path (filters L′ with upsampling by 2 and delay elements D) produces the reconstructed outputs Y1 … Y5
FIR Filters (L, L′) Implementation [diagram]
• Each filter is a register chain (tap delay line) feeding coefficient multipliers C(1) … C(n) and an adder tree: output image F(i) = Σk C(k) · D(i−k) over the input image D(i)
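The register-chain structure above is a standard FIR multiply-accumulate pipeline; a behavioral sketch (software model of the hardware, with illustrative coefficients) looks like this:

```python
def fir_filter(d, coeffs):
    """Tap-delay-line FIR filter: F(i) = sum_k C(k) * D(i - k).
    The register list models the hardware register chain; each new sample
    is shifted in, multiplied by the coefficients, and summed (adder tree)."""
    regs = [0.0] * len(coeffs)   # the register chain, initially empty
    out = []
    for sample in d:
        regs = [sample] + regs[:-1]                   # shift in the new sample
        out.append(sum(c * r for c, r in zip(coeffs, regs)))
    return out

# Two-tap moving-average low-pass example (illustrative coefficients)
y = fir_filter([1, 2, 3, 4], [0.5, 0.5])  # [0.5, 1.5, 2.5, 3.5]
```

In hardware, the n multipliers and the adder tree all operate in parallel, so the filter accepts one sample per clock regardless of filter length.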
Correlator Module [block diagram]
• Multipliers (MULT) accumulate the terms termxx = Σx2, termyy = Σy2, and termxy = Σxy for each reconstruction Yi
• The test Corr ≥ TH is evaluated without division or square root: N · termxy2 (with a 32-bit left shift for fixed-point scaling) is compared against termxx · termyy · TH2
• Each comparator GTE_i increments Histogrami when its level passes the test
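The design choice worth noting is the squared comparison: by testing termxy2 against TH2 · termxx · termyy, the hardware avoids both a division and a square root. A simplified software model of that test (uncentered correlation, without the N term and fixed-point shift, which are assumptions about the scaling in the diagram) is:

```python
def passes_threshold(x, y, th):
    """Division/sqrt-free correlation test: corr(x, y) >= th  iff
    (sum xy)^2 >= th^2 * (sum x^2) * (sum y^2), for non-negative sum xy.
    Mirrors the hardware comparison of termxy^2 against termxx*termyy*TH^2."""
    txx = sum(a * a for a in x)              # termxx
    tyy = sum(b * b for b in y)              # termyy
    txy = sum(a * b for a, b in zip(x, y))   # termxy
    return txy >= 0 and txy * txy >= th * th * txx * tyy
```

Because both sides of the inequality are products of accumulated sums, the whole test maps onto the MULT blocks and a single wide comparator in the diagram.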
Histogram Module [block diagram]
• Comparator outputs GTE_1 … GTE_5 increment histogram counters cnt_1 … cnt_5
• A level selector scans the updated counters and outputs the selected level
Measurement Scenarios [timeline diagram]
• µP functions: MAP allocation → read data → MAP function (transfer-in: CM to OBM → compute → transfer-out: OBM to CM, repeated nstreams times) → write data → MAP free
• End-to-end time (HW) = configuration + transfers + computations; compared against end-to-end time (SW)
• End-to-end time with I/O additionally includes the allocation and release times
SRC Experiment Setup and Results [timeline diagrams: non-overlapped vs. overlapped streams]
• Salinas '98: 217 × 512 pixels, 192 bands = 162.75 MB
• Number of streams = 41; stream size = 2730 pixels ≈ 4 MB
• Non-overlapped streams: TDMA-IN = 13.040 msec, TCOMP = 0.62428 msec, TDMA-OUT = 22.712 msec; TTotal = 1.49 sec; throughput = 109.23 MB/sec
• Overlapped streams: TDMA = 35.752 msec, TCOMP = 0.62428 msec, Xc = 0.0175; throughput = 111.14 MB/sec
• Speedup of overlapped over non-overlapped = (1 + Xc) = 1.0175 (insignificant, since I/O dominates)
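The overlapped-stream speedup follows directly from the per-stream timings above: when computation is hidden behind DMA, the best possible gain is the computation-to-communication ratio Xc. A quick check of the stated numbers:

```python
# Reproducing the stated overlapped-stream speedup from the per-stream
# timings of the Salinas'98 experiment (times in msec per stream)
t_dma_in, t_comp, t_dma_out = 13.040, 0.62428, 22.712

t_dma = t_dma_in + t_dma_out    # total DMA time per stream = 35.752 msec
xc = t_comp / t_dma             # computation/communication ratio
speedup = 1 + xc                # overlap can hide at most t_comp per stream

print(f"Xc = {xc:.4f}, speedup = {speedup:.4f}")  # Xc = 0.0175, speedup = 1.0175
```

Since Xc is under 2%, overlapping buys almost nothing here, which is exactly the point of the concluding remarks: the pipeline is I/O-bound, so faster DMA, not more compute overlap, is what would raise throughput.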
Concluding Remarks
• We prototyped the automatic wavelet-based dimension reduction algorithm on a reconfigurable architecture, exploiting both coarse-grain and fine-grain parallelism
• We observed a 10x speedup using the P3 version of the SRC-6E; from our previous experience, we expect this speedup to double on the P4 version of the SRC machine
• These speedup figures were obtained while I/O still dominates; the speedup can be increased further by improving the I/O bandwidth of reconfigurable platforms