SLAAC Team Meeting 3-99

LANL Challenge Problems
Ultra-Wideband Coherent RF: Mark Dunham
Multi-Dimensional Imaging: Jeff Bloch
Kevin McCabe, kmccabe@lanl.gov, 505-667-0728

[Organization chart; labels listed in the order they appear on the slide]
University Collaborations: BYU, UT, ...
Multi-Dimensional Image Processing; Ultra-Wide Band Radio Frequency; Kurt Moore
Real-time processing of multi/hyperspectral or time-domain data cubes; DAPS; real-time processing of wideband RF data
Reconfigurable Computing Hardware: IP51 / RCA-2 / DARPA / (RCA-3) / Commercial
CALIOPE; Kurt Moore; HIRIS/MTI; John Szymanski; James Theiler; SHS; RULLI
Hyperspectral Demonstrations; Mike Caffrey, Phil Blain, Noor Khalsa, Tony Rose, Tony Nelson
RCC Architecture Development/Deployment; Hardware/Software Environment
ALDEBARAN, Mark Dunham; Bellatrix, Scott Robinson; Capella, R. Dingler; Cibola, Mike Caffrey
Mike Caffrey, Tony Salazar, John Layne, Jan Friego
Classification/Compression/Recognition Algorithms for fixed-point RCC Hardware; John Szymanski, James Theiler, Jeff Bloch, Kurt Moore, Chris Brislawn, Steven Brumby, Reid Porter, Simon Perkins
FORTE’; V-SENSOR
DII: “Rapid Feature Identification Using RCC Technology and Genetic Algorithms”; Jeff Bloch, John Szymanski
DARPA Collaboration: RCC HW/SW Tool Evaluation; Kevin McCabe

Collaboration Phase 1
• Represent Challenge Problems
  • Ultra-Wide Band RF (UWBRF) Signal Processing
  • Multi-Dimensional Image Processing (MDIP)
• Provide specific challenge problem descriptions to ACS investigators
  • MDIP: http://nis-www.lanl.gov/nis-projects/daps/
  • UWBRF: http://www.lanl.gov/rcc/
• Seek out and collaborate with ACS investigators whose work matches our need
  • ISI - DRP and RRP
  • Northwestern - Matlab compiler
  • Ptolemy - System level analysis tool
• Validate Hardware or Software Strategies

Collaboration Phase 2: Technology Insertion
• MDIP Rapid Feature Identification Project (RFIP)
  • Multi-dimensional image processing via algorithms derived in real time for rapid searching of archival information and high-bandwidth sensor data streams
• Bellatrix UWBRF Signal Compression
  • Wideband Signal Compressor for ID-1 Compatible Tape Recorders
  • Airborne environment
• Capella UWBRF Accelerated Analysis Tool
  • Acceleration of government algorithms to make real-time analysis feasible
Additional Potential Challenge Area
• Plume Detection
  • Airborne LIDAR-based sensor to detect and analyze plumes

RFIP
• Jeff Bloch, 505-665-2568, jbloch@lanl.gov
• John Szymanski, 505-665-9371, szymanski@lanl.gov
• James Theiler, 505-665-5682, jtheiler@lanl.gov

RFIP
• Objective
  • Manipulate image processing steps carried out on RCC hardware to develop remote sensing algorithms for classifying and identifying features of interest to an image analyst.
  • Provide software suite and hardware to work in host workstation(s)
  • Search speeds comparable to archive retrieval times
  • Platform and RCC hardware independence
  • Scalable within a platform and via networked platforms
• Approach
  • Rapid evolution of feature recognition procedure via:
    • Hardware accelerator containing tunable image processing operators
    • Software engine that parallelizes a search and manipulates accelerated operators to maximize performance against truth data

RFIP
• Current Plan
  • Develop non-real-time demonstration (proof of concept) of an algorithm to evolve image classification procedures for identifying features of interest (fully funded by RFIP)
    • Select image operators amenable to RCC acceleration
    • Select algorithm framework and software
    • Select simulation environment (IDL, Perl, etc.)
    • Select test datasets
    • Develop, write, and execute an all-software demonstration
  • Demonstrate ability of RCC to dramatically speed up image processing steps (RFIP - SLAAC partnership)
    • Select demonstration RCC hardware (SLAAC-1 insertion opportunity)
    • Develop tunable operator architecture in VHDL
    • Select some image operators and code them in VHDL
    • Develop software engine to drive a single RCC
    • Benchmark accelerated operators against software operators

RFIP
• Future Plan (2 DII proposals being submitted this month)
  • Fully develop RCC accelerated workstation
    • Add to the selection of image operators amenable to RCC acceleration and code them in VHDL
    • Refine algorithm framework and software
    • Define and procure RCC computer suitable for analyst's workstation
    • Broaden test datasets
    • Benchmark against all-software solution
  • Demonstrate parallelizability and scalability of approach on multiple workstations
    • Implement prior all-software solution across multiple workstations
    • Develop a parallel execution scheme
      • Accelerated algorithm evolution against one truth data set
      • Accelerated processing of search data
    • Develop advanced user interface
    • Benchmark against single-workstation all-software solution

RFIP
• RFIP Project Status
  • Non-real-time demonstration functional
    • Ability to demonstrate a limited number of operators
    • Further refinement of tunable operators approach ongoing
    • Rapid evolution of a Water Finding Procedure demonstrated
    • Benchmark of all-software solution to be done
  • Current RFIP funding ends 12/99
• RFIP SLAAC collaboration effort
  • Demonstration of RCC
    • Candidate image operators selected
    • Tunable operator architecture concept under development
    • Targeting of RCC hardware to be done
    • Develop software engine to drive a single RCC to be done
    • Benchmark against software operators to be done
  • Limited demonstration to prove principle by 12/99

The Tunable Operator Architecture
[Figure labels: Multi-Spectral Image Channel Inputs; Control Line Inputs; Spectral operators (7 blocks); Spatial operators (3 blocks); Threshold (3 blocks); Feature 1, Feature 2, Feature 3; Ground Truth Training Data; Fitness Function; Fitness Output]
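The tunable-operator idea can be illustrated with a small software-only sketch. It assumes, purely for illustration, that a feature is a weighted sum of spectral channels followed by a threshold, that fitness is the fraction of pixels agreeing with ground truth, and that the search is a simple mutation-only evolutionary loop with elitism. The names `classify`, `fitness`, `mutate`, and `evolve` are hypothetical, and none of this is the RFIP team's actual operator set or search algorithm.

```python
import random

def classify(pixel, weights, threshold):
    """One 'tunable operator': weighted sum of spectral channels, then a threshold."""
    return sum(w * c for w, c in zip(weights, pixel)) > threshold

def fitness(params, pixels, truth):
    """Fraction of pixels whose predicted label matches the ground-truth label."""
    weights, threshold = params
    hits = sum(classify(p, weights, threshold) == t for p, t in zip(pixels, truth))
    return hits / len(pixels)

def mutate(params, rng, scale=0.2):
    """Perturb every control value (weights and threshold) with Gaussian noise."""
    weights, threshold = params
    return ([w + rng.gauss(0, scale) for w in weights],
            threshold + rng.gauss(0, scale))

def evolve(pixels, truth, n_channels, generations=200, pop_size=20, seed=0):
    """Keep the fitter half of the population each generation, refill by mutation."""
    rng = random.Random(seed)
    pop = [([rng.uniform(-1, 1) for _ in range(n_channels)], rng.uniform(-1, 1))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, pixels, truth), reverse=True)
        survivors = pop[:pop_size // 2]  # elitism: the best candidate always survives
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda p: fitness(p, pixels, truth))
```

In this sketch the inner `fitness` evaluation is the part that would be offloaded to RCC hardware, while `evolve` plays the role of the host software engine.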

RFIP - SLAAC collaboration: SLAAC technology
• Hardware
  • SLAAC-1 RRP with Linux driver
  • LANL has had a VXI-based RCC in use for just the last few months, BUT:
    • VXI form factor not suitable for RFIP workstations
    • Have only a primitive board support package
  • SLAAC-1 advantages
    • PCI form factor with Linux driver matches RFIP workstations
    • Runtime library is key to research into scalability proposed by RFIP team
    • Still under discussion, but Xilinx architecture in SLAAC-1 may have advantages over Altera architecture for the tunable operator concept under development

RFIP - SLAAC collaboration: SLAAC technology
• Software
  • Runtime library is key to research into scalability proposed by RFIP team
• Long-term vision: clusters of workstations employing accelerated hardware to allow:
  1) Rapid development of new tools, and constant refinement of existing tools, for analysts mining large databases for timely information.
  2) Greater acceleration by distributing inherently parallelizable processing

RFIP - SLAAC collaboration: Goals
• Long Term Goals (beyond 12/99)
  • Determine best architecture for MDIP class of problems
    • Single RRP in a workstation
    • Operators accelerated at least 10x
  • Demonstrate scalability
    • Multiple RRPs in a single workstation
    • Multiple workstations with 1 RRP each
• 9 month Insertion Plan
  • Map tunable operator architecture into SLAAC-1
  • Target VHDL operators to Xilinx 40150
  • Develop interface to software engine on host

Bellatrix
• Scott Robinson, 505-665-1954, shr@lanl.gov
• John Layne, 505-667-5137, jpl@lanl.gov
• Mark Dunham, 505-667-0045, mdunham@lanl.gov

Bellatrix
• Objective
  • To demonstrate the ability to continuously record wideband data for the COMBAT SENT program.
  • Apply lossy compression while still preserving the signal characteristics required by the analyst.
  • Demonstrate a novel algorithm for lossy compression of wideband signals so that 40 MHz @ 12 bits can be recorded at 50 MB/s, with an upgrade path to 70 MHz.
  • Devise hardware solution for higher resolution and up to 200 MHz bandwidth under light signal conditions.
  • Provide a scalable platform that can be used for R&D on new WB processing tasks after delivery of the compression system, in anticipation of NextGen architecture.

Bellatrix
• Approach
  • Develop three signal processing algorithms for RCC acceleration
    • Sub-Band Coding Compression
    • Homomorphic Compression
    • Burst Digitization Compression
  • Initially target a 100 Msps, 12-bit channel recorded onto an ID-1 tape.
  • Apply lossy compression techniques to convert 150 Mbytes/second of incoming data to 50 Mbytes/second outgoing to tape.
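The 150 Mbytes/second figure follows directly from the digitizer parameters, and the tape rate fixes the compression ratio the algorithms must reach. A short worked check, assuming samples are packed with no overhead (the function name `raw_rate_mbytes_per_s` is illustrative only):

```python
def raw_rate_mbytes_per_s(sample_rate_msps, bits_per_sample):
    """Raw data rate in Mbytes/s, assuming tightly packed samples (8 bits per byte)."""
    return sample_rate_msps * bits_per_sample / 8

raw = raw_rate_mbytes_per_s(100, 12)  # 100 Msps x 12 bits = 1200 Mbit/s = 150 Mbytes/s
tape = 50                             # Mbytes/s outgoing to tape, from the slide
required_ratio = raw / tape           # 3.0, i.e. 3:1 compression
print(raw, required_ratio)
```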

BELLATRIX 1.0 WB Compression: Sub-Band Coding (V. 2)
[Block diagram labels: 160 MHz IF; ApCom 1610 IF/IF Converter; 40 MHz analog BW; 100 Msps, 12-bit Digitizer Mezzanine (under development); LANL RCA-2 FPGA Computer (I/O 1, I/O 2, I/O 3); FPDP; Celerity A256 ID-1 Tape Interface; ID-1; Sony DIR-1000H Tape; VXI PPC Controller; VXI Chassis; Ethernet 10baseT Control]

BELLATRIX 1.0 WB Compression: Sub-Band Coding with SLAAC-1
[Block diagram labels: 160 MHz IF; ApCom 1610 IF/IF Converter; 40 MHz analog BW; 100 Msps, 12-bit Digitizer Mezzanine (under development); SLAAC-1 FPGA Computer (I/O 1, I/O 2); Pentium PCI Slot Computer; FPDP; Celerity A256 ID-1 Tape Interface; ID-1; Sony DIR-1000H Tape; VXI PPC Controller; VXI Chassis; Ethernet 10baseT Control]

BELLATRIX 1.0 WB Compression: Homomorphic or Burst Digitization (V. 2)
[Block diagram labels: 160 MHz IF; ApCom 1610 IF/IF Converter; 40 MHz analog BW; 100 Msps, 12-bit Digitizer Mezzanine (under development); two LANL RCA-2 FPGA Computers (each with I/O 1, I/O 2, I/O 3); two QC-64; two CRI Peg-80 FFT CPUs; FPDP; Pentium PC, WinNT OS; Industrial PCI Chassis & Backplane; Celerity A256 ID-1 Tape Interface; ID-1; Sony DIR-1000H Tape; VXI PPC Controller; VXI Chassis; Ethernet 10baseT Control]

Bellatrix
• Plan and Status
  • Software models of lossy compression techniques have been developed.
    • Accomplishments: demonstration of experimental algorithms on Blackbeard and FORTE signals; analysis of rate-distortion characteristics and effects of data quantization on exploitability
  • Validate lossy compression models against actual data (in process).
  • Develop a 12-bit, 100 Msps A/D converter input card for the RCA-2 using the new Analog Devices AD9432.
  • Implement Sub-Band Coding compression technique on RCC for initial flight demonstration, Sept. 1999.
  • Follow-on demos dependent on success of initial demo
  • Eventually add demodulation, cross-correlation delay estimation, parameterization, set-on, and SNOI removal.

Bellatrix - SLAAC collaboration: Goals
• Evaluate suitability of RRP architecture for UWB problem
  • High-rate systolic streaming data
• Collaborate with developers of NextGen architecture

Wideband Compression via Burst Digitization
[Figure labels: Input Signal; FFT(N+K); Spectrum Memory; Freqs; Adaptive Thresholds; Activity Rules; SAVE?; Compressed Output; signal segments Si, Sj, Sk, Sl at times i, j, k, l along t; W; X]
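A simplified sketch of the save/discard decision behind burst digitization follows. The slide's version applies adaptive thresholds and activity rules to FFT spectra. This sketch substitutes a plain time-domain block-power detector with a running noise-floor estimate, which is an assumption made to keep the example short. The function name and the parameters `block`, `factor`, and `alpha` are hypothetical.

```python
def burst_digitize(samples, block=64, factor=4.0, alpha=0.05):
    """Keep only blocks whose mean power exceeds factor x the running noise floor.

    Returns a list of (start_index, block_samples) for the blocks that are saved.
    Assumes the first block contains no signal, so it can seed the noise estimate.
    """
    kept = []
    noise = None
    for start in range(0, len(samples) - block + 1, block):
        chunk = samples[start:start + block]
        power = sum(x * x for x in chunk) / block
        if noise is None:
            noise = power                      # seed the noise floor
        if power > factor * noise:
            kept.append((start, chunk))        # activity detected: save this block
        else:
            # quiet block: fold it into the adaptive noise-floor estimate
            noise = (1 - alpha) * noise + alpha * power
    return kept
```

Only the saved blocks and their start indices need to be recorded, so the achievable compression depends on the signal duty factor.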

Homomorphic Compression Algorithm
[Figure labels: Positive frequencies only (analytic signal format); frequency axis from 0 to N/2 (fnyq); Baseline Threshold T0; Discrete Components Dm; 2i+j]

Joint Time-Frequency Compression of Wideband RF Signals
• Developers: Chris Brislawn (CIC-3), Shane Crockett (student, USNA).
• Example: spectrogram of FORTE data (left); after 4:1 compression (right).

Lossy Compression of Wideband RF
• Time Based Compression (Burst Digitization)
  • Well suited for pulse-like signals with low duty factor
  • Performance depends on detection of pulse presence
  • Weak SNR cases need sophisticated triggering
• Frequency Domain Compression (Homomorphic + Thresholding)
  • Well suited to long duration signals and complex signal mixtures
  • Simple versions of algorithm can provide 5X compression & high fidelity
  • Higher compression ratios tend to round fast rise/fall times on pulses
• Compression Through Sub-Band Coding
  • Joint localization of signal in time and frequency
  • Adaptive bit allocation and scalar quantizer design
  • Compression generally removes signal “noise”
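The frequency-domain approach can be sketched as spectral thresholding on the positive-frequency half of the spectrum. This sketch is illustrative only: it omits the homomorphic (log-domain) step, uses a fixed threshold rather than a baseline-tracking one, and uses a direct O(N²) DFT for self-containment. The names `dft`, `compress`, and `reconstruct` are hypothetical.

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform (O(N^2)); fine for short illustrative blocks."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def compress(x, threshold):
    """Keep positive-frequency bins (0..N/2) whose magnitude exceeds the threshold."""
    spectrum = dft(x)
    half = len(x) // 2
    return {k: spectrum[k] for k in range(half + 1) if abs(spectrum[k]) > threshold}

def reconstruct(bins, n):
    """Rebuild a real signal of even length n from the kept positive-frequency bins."""
    out = []
    for t in range(n):
        acc = 0.0
        for k, c in bins.items():
            term = (c * cmath.exp(2j * cmath.pi * k * t / n)).real
            # DC and Nyquist bins occur once; every other bin has a conjugate
            # mirror at negative frequency, which doubles its real contribution.
            acc += term if k in (0, n // 2) else 2 * term
        out.append(acc / n)
    return out
```

Raising the threshold discards more bins and increases compression, at the cost of the pulse-edge rounding noted above, since sharp edges depend on many small high-frequency components.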

Capella-2
• Scott Robinson, 505-665-1954, shr@lanl.gov
• Robert Dingler, 505-665-3483, rdingler@lanl.gov
• Steve White, 505-667-4623, swhite@lanl.gov
• Tony Salazar, 505-667-2508, aasalazar@lanl.gov
• Mark Dunham, 505-667-0045, mdunham@lanl.gov

Capella-2 Objective
• Provide 1000 lines/sec minimum, 10,000 lines/sec goal, of Government Spectrum & A Raster displays for quick-look data searches.
• Implement selected routines for 40 MHz real-time analysis.
• Allow key concept demonstrations of a Modular Coherent UWB Processor, including SNOI removal, set-on, demod, and cross-correlation.
• Demonstrate that pre-D processing can yield superior Pd, de-interleave, and metrics in real time, with respect to PDW methods.

Dataflow Block Diagram
[Block diagram labels: FFT; Time-Frequency Filters; IFFT; ALPHA 4100; RCC Pulse Parameterizer; RCC Synchronous Video Integration]

Capella-2 Software Environment
[Block diagram labels: National Inst. Pentium PC running NT 4.0; C++ MFC Routines; Control and Status; Control - text commands sent to socket via TCP/IP; Ethernet; DEC Alpha 4100 4-Processor Host; PCI Bus 1; PCI Bus 2; DLL HW Library Routines; MXI Control SW (two instances); Calculex SW Model; FPDP In; FPDP Out; 60 MB/s RAID; VXI Crate; CRI FFT Board; RCC Boards; Daughter Cards; “Black Box” Tape/GigaFlash System]
“Black Box” Software Functions: Initialize, Load Flex File, Set Registers, Start Processing, Stop Processing, Report Status
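The slide states only that control is exercised through text commands sent to a socket over TCP/IP, and lists six “Black Box” functions. A minimal sketch of such a client follows. The wire format (one ASCII line per command, upper-case names with underscores, space-separated arguments) and the helper names `format_command` and `send_command` are assumptions made for illustration, not the actual Capella-2 protocol.

```python
import socket

# Command names derived from the slide's function list; the spelling is assumed.
COMMANDS = {
    "INITIALIZE", "LOAD_FLEX_FILE", "SET_REGISTERS",
    "START_PROCESSING", "STOP_PROCESSING", "REPORT_STATUS",
}

def format_command(name, *args):
    """Encode one command as a newline-terminated ASCII line (hypothetical format)."""
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    return (" ".join([name, *map(str, args)]) + "\n").encode("ascii")

def send_command(sock, name, *args):
    """Send one text command over an already-connected socket."""
    sock.sendall(format_command(name, *args))
```

A host program would open the connection with `socket.create_connection((host, port))`, using an address supplied by the operator, and then call `send_command` for each step of the sequence.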

Basic Workstation Accelerator System
[Block diagram labels: DEC Alpha 4100 (four Alpha CPUs; SYSRAM; PCI A; PCI B); 70 MB/s RAID; U-SCSI (two links); Peritek Display; Waterfall Video; SVGA; Ether 10bT; Ether to external network; FPDP Alta (two links); 50 MB/s Tape; ID-1; VXI Crate; PPC604 Controller; LANL RCA-2 (two boards); CRI FFT (two boards); Set-on Rcvr; SLAAC-3; UWB Digitizer]

Review

Highest Priority MDIP Algorithm Needs:
• Spectral and Spatial Classification in real time
  • Spectral matched filter algorithms
  • “K-Means” style classification algorithm
• Plume detection
• Rare signal or signature detection
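For reference, a “K-Means” style classifier over per-pixel spectra can be sketched in a few lines. This is a plain floating-point software version with a deterministic, evenly spaced choice of initial centroids. Both choices are assumptions for illustration, and a fixed-point RCC implementation would differ. The names `dist2` and `kmeans` are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(pixels, k, iterations=10):
    """Cluster pixel spectra into k classes; returns (labels, centroids).

    Labels reflect the assignment step of the final iteration.
    """
    # Deterministic initialization: take k evenly spaced pixels as starting centroids.
    centroids = [list(pixels[i * len(pixels) // k]) for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in pixels]
        # Update step: each centroid moves to the mean spectrum of its members.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centroids[c] = [sum(ch) / len(members) for ch in zip(*members)]
    return labels, centroids
```

The assignment step is independent per pixel, which is the property that makes this class of algorithm a candidate for parallel hardware.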

Highest Priority UWB Algorithm Needs:
• Find a coding algorithm/process to compress information bandwidth through an FFT
• Decompose a non-linear RF chirp into an efficient wavelet or multi-resolution expansion
• Apply image processing/recognition algorithms to streaming time-frequency images to find objects of interest.
• Identify fast methods of classification and correlation suitable for FPGA implementation

Myrinet-2560/SAN Compatible I/O Interface Development Status
March 3, 1999
Douglas E. Patrick
NIS-4 Space Instrumentation and Systems Engineering
Mail Stop D448, Los Alamos National Laboratory
(505)-665-1203, patrick@lanl.gov

A Few Current Myrinet/SAN Interface Design Efforts by Others
• Lockheed/Sanders: LANAI processor based Common Node Adapter (CNA)
• Lockheed/Martin Astronautics: FPGA-driven I/O design using FI32 SAN/FIFO interface Version 1.3 (currently not using any Packet or Header info)
• Air Force Research Laboratory: FPGA-driven I/O similar to LMCO but uses AFRL Packets

Myrinet Interface Design Goals
• Leverage the LMCO and AFRL designs
• Maintain as much Myrinet-2560/SAN compatibility as possible (within reason)
• Maintain protocol and packet compatibility with those we will be interfacing with (AFRL, LMCO, etc.)
• At a minimum, be able to easily reconfigure (via FPGA reprogramming) for mission-specific protocol(s)/packet(s).

SLAAC-3

Desirable architectural elements: Overall Architecture
• Distinct from current COTS RCCs
• Careful trade study of PE-PE connectivity vs. PE-Memory
  • Bus widths
  • Data broadcast between one input and all PEs
  • Independent addressability
• Anticipate direction of reconfigurability features

Desirable architectural elements: Input/Output
• High-speed IO of flexible type
  • Mezzanine card with standard interface to RCC
  • Directly connected to PEs
  • Capable of wide 64-bit data
• Ability to split and combine data streams for parallelizability and scalability
  • 3 IO ports, 2 in and 1 out or vice versa
  • 1 IO connected to all PEs
• IO dataflow decoupled from PE by FIFOs

Desirable architectural elements: Memory
• Multiple parallel memory banks local to each PE
  • Independently addressable
  • Parallel access
  • 18-bit data
• Ability to split and merge data streams for parallelizability and scalability
  • 3 IO ports, 2 in and 1 out or vice versa
  • 1 IO connected to all PEs
• IO dataflow decoupled from PE by FIFOs

Desirable architectural elements: Memory
• Shared memory banks between adjacent PEs
  • Connected via crossbar switches
• Ability to split and merge data streams for parallelizability and scalability
  • 3 IO ports, 2 in and 1 out or vice versa
  • 1 IO connected to all PEs
• IO dataflow decoupled from PE by FIFOs
Datapaths
• Broadcast bus between one input and all PEs

Desirable architectural elements
• 6U VME64 board (+3.3 V included in this standard)
• Mezzanine cards for flexibility
• Simple, fast interface from PE to backplane
• Simple VME interface controller
• Configuration manager and local configuration memory
• +2.5 V or +1.8 V supplies (needed for future FPGAs)
• Independent clock with skew control
