Making a Case for Distributed Adaptive Computing in Remote Sensing Science Data Processing • Clay Gloster, Department of Electrical & Computer Engineering, NC State University, Raleigh, NC (email: gloster@eos.ncsu.edu) • Marco Figueiredo, SGT Inc./NASA Goddard Space Flight Center, Greenbelt, MD, USA (marco@fpga.gsfc.nasa.gov)
Outline • Problem Statement • Potential Solutions • Application: Image Classification • An Overview of Adaptive Computing • Research Goals/ Vision • Summer 97/98 Accomplishments • Experimental Results • Conclusions & Future Work
Generic Challenge: Scientific Data Processing • Problem Statement: • Given an application that requires an excessive number of scientific computations: • Design a system that can perform these computations with the following constraints. • The system must provide improved performance over current state of the art. • System cost cannot be excessive!! • The system must be flexible and easy to adapt to new applications. • The system development time must be small. • Technology developed during system development must be easily transferred to many potential users
NASA Challenge: Remote Sensing Science Data Processing • Large data input/output requirements (Data Intensive) • The MODIS instrument, to be launched on the EOS-AM1 satellite, has average daily data volumes of 530MB. • Large data processing requirements (Compute Intensive) • The MODIS instrument, to be launched on the EOS-AM1 satellite, has a data processing requirement of 5.7 GFLOPs. • Algorithms can change even after the instrument is in orbit • Enhancements made to algorithms • Errors found in algorithms • Correction for errors introduced by instrument fatigue
Potential Solutions • General-purpose processors (Software) • Personal computers (400 MHz), computer workstations, supercomputers, etc. • Application-specific processors (Hardware) • ICs designed to solve a particular problem • Special-purpose processors (Hybrid) • Digital Signal Processors, Math Coprocessors, etc. • Adaptive Computers (A New Paradigm) • Consist of a low-cost, high-performance, software-programmable general-purpose processor with one or more low-cost, high-performance, hardware-programmable coprocessors.
Cost Analysis (ratings: 5 = Best, 1 = Worst)

                    GP    SP    ASIC  ADP
Price               5     3     1     4
Development Time    4     2     1     3
Performance         2     3     5     4
Adaptability        3     2     1     5

$\mathrm{Cost}(X) = \sum_{i=1}^{4} W_i X_i$, with $W_i = 0.25$

Cost(GP) = 3.50, Cost(SP) = 2.50, Cost(ASIC) = 2.00, Cost(ADP) = 4.00
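The weighted score on this slide can be checked with a short sketch. The ratings and the equal weight of 0.25 are the ones given on the slide; the dictionary layout and the function name `cost` are illustrative assumptions.

```python
# Ratings from the cost-analysis slide (5 = best, 1 = worst).
# Criterion order: Price, Development Time, Performance, Adaptability.
RATINGS = {
    "GP": [5, 4, 2, 3],
    "SP": [3, 2, 3, 2],
    "ASIC": [1, 1, 5, 1],
    "ADP": [4, 3, 4, 5],
}

# Equal weights, W_i = 0.25 for i = 1, ..., 4.
WEIGHTS = [0.25, 0.25, 0.25, 0.25]


def cost(ratings, weights=WEIGHTS):
    """Weighted sum of the per-criterion ratings (hypothetical helper name)."""
    return sum(w * x for w, x in zip(weights, ratings))


scores = {name: cost(r) for name, r in RATINGS.items()}
for name, score in scores.items():
    print(f"Cost({name}) = {score:.2f}")
```

With equal weights the score is simply the mean rating, so changing `WEIGHTS` is the natural way to explore how the ranking shifts when, say, performance matters more than price.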
Previous Results Adaptive Computing has been shown to provide several orders of magnitude speedup for select applications.
Adaptive Computing • With the advent of programmable hardware devices called Field Programmable Gate Arrays (FPGAs), we can now reprogram hardware. • With today’s technology, we can download a new function into an FPGA in time on the order of 200 microseconds. • Current devices also allow partial configuration of the device while other portions of the device continue to function. • [Figure labels: Adder, Divider; Time A, Time B]
Adaptive Computing for Space Applications • [Figure: Satellite with Adaptive Computer]
Relevant Features of Java Release 1.1 • Object-Oriented Programming Language • The notion of hardware objects is easily implemented. • Threads • Applets • Native Methods (interface Java to existing code) • Remote Method Invocation (RMI) • The notion of relocatable hardware objects is easily implemented. • Java Security (Digital Signatures, secure RMI) • Java is a good programming language for this project!
Proof of Concept (Summer 1997) • Develop an understanding of the ASDP project and identify where NC State could make a contribution • Use the Java language to implement a simple addition program that can be executed remotely. • Accomplishments: • Implementation of a simple addition program controlled from the local host in Java. • Implementation of a simple addition program controlled from a remote host in Java.
Proof of Concept (Summer 1998) • Use the Java language to implement a new version of the PNN algorithm that can be executed remotely. • Accomplishments: • Implementation of the PNN algorithm controlled from the local host in Java. • Implementation of the PNN algorithm controlled from a remote host in Java. • New implementation of the PNN algorithm that processes a block of data rather than a single pixel or a row.
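The slides do not spell out the PNN formulation, so the following is only a minimal sketch, assuming the standard Gaussian-kernel probabilistic neural network: each class is scored by the mean kernel response of its training vectors and the highest score wins. The function names, the `sigma` smoothing parameter, and the training data layout are all hypothetical. The second function illustrates the pixel versus block distinction: one call classifies a whole block of rows rather than a single pixel.

```python
import math


def pnn_classify_pixel(pixel, training, sigma=1.0):
    """Return the label whose mean Gaussian kernel response is largest.

    `training` maps a class label to a list of training vectors
    (hypothetical layout; one value per spectral band).
    """
    best_label, best_score = None, -1.0
    for label, samples in training.items():
        total = 0.0
        for weight in samples:
            # Squared Euclidean distance between the pixel and a training vector.
            dist2 = sum((p - q) ** 2 for p, q in zip(pixel, weight))
            total += math.exp(-dist2 / (2.0 * sigma ** 2))
        score = total / len(samples)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def pnn_classify_block(block, training, sigma=1.0):
    """Classify a block (a list of rows, each a list of pixels) in one call."""
    return [[pnn_classify_pixel(px, training, sigma) for px in row]
            for row in block]
```

In a remote setting the block form matters because one request carries many pixels, so per-call overhead is paid once per block instead of once per pixel.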
Current Research Projects • A generic design methodology for the implementation of high performance remote sensing scientific data processing applications that can drastically improve development time. • A partial implementation of the PNN algorithm for image classification using floating-point units to evaluate the feasibility of using floating point as opposed to fixed point units in state-of-the-art reconfigurable computing environments. • A distributed library of generic floating point arithmetic design modules that are fast, modular, and can easily scale. • A distributed reconfigurable computing implementation of the PNN algorithm for image classification that improves performance by an order of magnitude over the software implementation. • A prototype implementation of the hardware resource allocation system that manages reconfigurable computing hardware.
A Generic Design Methodology • The von Neumann Paradigm • Given a typical microprocessor/CPU with a fixed architecture • Given an application • Today’s scientists are trained to use an existing design methodology to map the application onto the given processor. • The New Paradigm • Given a typical reconfigurable computer with an adaptive architecture • Given an application • Develop a new design methodology in which future scientists can be trained.
PNN: A Case Study • Use PNN and other applications to develop a generic design methodology • Use PNN and other applications to reveal the limitations of current reconfigurable computing architectures/systems • The result of this study will be an environment in which: • Scientists can enter the concept in enough detail for an engineer to implement. • Tasks developed as a part of the methodology will be evaluated for potential automation. • Parallelism/distributed processing should be exploited in this environment whenever possible.
A Feasibility Study: Floating Point Units and Reconfigurable Computing • Develop Parameterized VHDL Models for addition, subtraction, multiplication, and division • Investigate the feasibility of implementing modular floating point units • Assess the current state of the art to identify the capacity of current FPGAs for various floating point units. • Make recommendations toward migrating from fixed point FPGA implementations to floating point. • Estimate the time frame when FPGAs will be able to support application development using floating point units.
Fixed Point Versus Floating Point • [Figure: Area versus Pipeline Depth/Number of Units, with curves for Floating Point and Fixed Point]
Hardware Resource Allocation System • Single Adaptive Computing Resource Allocation System • Multiple Adaptive Computing Resource Allocation System • Preemptive Hardware Resource Allocation System
Single RC Resource Allocation System • N sites distributed over the Internet • 1 adaptive computer resource • $M_1$ hardware programmable modules • [Figure labels: Site 1, Site 2, ..., Site N; Internet; µp 1; CP 1, CP 2, ..., CP $M_1$]
Multiple RC Resource Allocation System • N sites distributed over the Internet • m adaptive computer resources • $\sum_{i=1}^{m} M_i$ hardware programmable modules • [Figure labels: Site 1, Site 2, ..., Site N; Internet; µp 1, µp 2, ..., µp m, each with CP 1, CP 2, ..., CP $M_i$]
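The slides do not describe how modules are handed out, so the following is only a sketch of one possible bookkeeping scheme for the multiple-resource case. The class name, method names, and the first-fit policy are all assumptions made for illustration; only the counts (m adaptive computers, $M_i$ modules on computer i) come from the slide.

```python
class ResourceAllocator:
    """Tracks free hardware-programmable modules on each adaptive computer.

    Hypothetical sketch: first-fit allocation, no preemption, no networking.
    """

    def __init__(self, modules_per_computer):
        # modules_per_computer[i] plays the role of M_i for adaptive computer i.
        self.free = {i: list(range(m))
                     for i, m in enumerate(modules_per_computer)}
        self.owner = {}  # (computer, module) -> requesting site

    def total_modules(self):
        """Sum of M_i over all adaptive computers (free plus allocated)."""
        return sum(len(f) for f in self.free.values()) + len(self.owner)

    def allocate(self, site):
        """Give the site the first free module, or None if all are busy."""
        for computer, modules in self.free.items():
            if modules:
                module = modules.pop(0)
                self.owner[(computer, module)] = site
                return computer, module
        return None

    def release(self, computer, module):
        """Return a module to the free pool; unknown modules are ignored."""
        if self.owner.pop((computer, module), None) is not None:
            self.free[computer].append(module)
```

The single-resource system is the special case `ResourceAllocator([M1])`. A preemptive variant would need a priority per request and a way to evict an owner, which this sketch leaves out.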
Experimental Results: PNN • PNN image classification run on a 512x512 image with 4 bands and 5 classes. • Experiments run on: • A 166 MHz Pentium with 64 MB of memory, running Windows NT • Pixel-based and block-based versions of the algorithm were evaluated. • One block = 6 rows x 512 pixels/row x 4 bytes/pixel = 12,288 bytes • Local and remote versions of the algorithm were evaluated. • In the remote experiments, client and server were the same machine.
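The block-size arithmetic can be checked directly. The constants are the ones on the slide; the variable names are illustrative, and reading 4 bytes/pixel as one byte per band is an assumption.

```python
# Experiment parameters from the slide.
ROWS_PER_BLOCK = 6
PIXELS_PER_ROW = 512
BYTES_PER_PIXEL = 4   # assumed: one byte for each of the 4 bands
IMAGE_ROWS = 512

# Bytes transferred per block.
block_bytes = ROWS_PER_BLOCK * PIXELS_PER_ROW * BYTES_PER_PIXEL

# Number of full 6-row blocks in the image, and the rows left over.
full_blocks, leftover_rows = divmod(IMAGE_ROWS, ROWS_PER_BLOCK)

print(block_bytes)                 # 12288
print(full_blocks, leftover_rows)  # 85 2
```

Because 512 is not a multiple of 6, a block-based run over the whole image has to handle a short final block of 2 rows.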
Local/Remote Image Classification • Both the client and the server contain adaptive computing resources. • Both the client and the server contain software (Java & C) versions of the classification algorithm. • Local Classification is executed on the client. • Remote Classification is executed on the server via a request from the client. • [Figure: a local CPU (client) and a remote CPU (server) connected over the Internet; site labels: NC State Univ. gloster.cacc..ncsu.edu, NASA GSFC classic.gsfc.nasa.gov]
Remote Image Classification (Pixel Based)

Description            Avg Row    Total
Software (Java)        14.83      7597.79
Software (C)           15.79      8087.24
Hardware (Single)      4.16       2128.38
Hardware (Multiple)    4.25       2156.02

Times reported in CPU seconds
Remote Image Classification (Block Based)

Description            Avg Row    Total
Software (Java)        2.65       1358.91
Software (C)           3.61       1847.33
Hardware (Single)      0.35       178.217
Hardware (Multiple)    0.35       180.15

Times reported in CPU seconds
Local Image Classification (Pixel Based)

Description            Avg Row    Total
Software (Java)        2.57       1317.31
Software (C)           3.65       1871.36
Hardware (Single)      0.27       141.10
Hardware (Multiple)    0.14       77.45

Times reported in CPU seconds
Local Image Classification (Block Based)

Description            Avg Row    Total
Software (Java)        2.56       1309.65
Software (C)           3.69       1889.63
Hardware (Single)      0.28       143.05
Hardware (Multiple)    0.28       142.57

Times reported in CPU seconds
An Interesting Result • Using REMOTE HARDWARE is faster than LOCAL SOFTWARE (block-based runs) • Remote Single Module Classification: 178.22 s • Local Java Classification: 1309.65 s • What does this imply? • One site should develop many applications using the proposed resource allocation system. • Applications should be served from centers of excellence, such as Distributed Active Archive Centers (DAACs). • Alternatively, applications should be given/licensed to users that have adaptive computing resources in-house.
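The comparison can be quantified with a small sketch that uses the block-based totals reported in the results tables. The dictionary names and the `speedup` helper are illustrative, not part of the original work.

```python
# Block-based totals in CPU seconds, copied from the results tables.
REMOTE_BLOCK = {
    "Software (Java)": 1358.91,
    "Software (C)": 1847.33,
    "Hardware (Single)": 178.217,
    "Hardware (Multiple)": 180.15,
}
LOCAL_BLOCK = {
    "Software (Java)": 1309.65,
    "Software (C)": 1889.63,
    "Hardware (Single)": 143.05,
    "Hardware (Multiple)": 142.57,
}


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Remote single-module hardware versus local Java software.
remote_hw_vs_local_java = speedup(LOCAL_BLOCK["Software (Java)"],
                                  REMOTE_BLOCK["Hardware (Single)"])

# Local single-module hardware versus local Java software, for reference.
local_hw_vs_local_java = speedup(LOCAL_BLOCK["Software (Java)"],
                                 LOCAL_BLOCK["Hardware (Single)"])

print(f"remote hardware vs local Java: {remote_hw_vs_local_java:.1f}x")
print(f"local hardware vs local Java:  {local_hw_vs_local_java:.1f}x")
```

Note that in the remote experiments the client and server were the same machine, so these ratios do not include real wide-area network latency.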
Conclusions/Future Work • The Java programming language is a good language for this project. • Distributed adaptive computing implementations can provide better performance than local software implementations. • Floating-point implementations may be feasible for remote sensing science data processing applications. • An adaptive computing hardware resource allocation system can be beneficial. • Portions of a generic design methodology can be automated, reducing development time for reconfigurable computing implementations.