Software and Hardware Implementation of Cellular Automata for Structural Analysis and Design. Zafer Gürdal * & Mark T. Jones ** Virginia Tech * Depts. of Aerospace and Ocean Eng., & Engineering Science and Mechanics ** The Bradley Department of Electrical and Computer Engineering
Software and Hardware Implementation of Cellular Automata for Structural Analysis and Design Zafer Gürdal* & Mark T. Jones** Virginia Tech * Depts. of Aerospace and Ocean Eng., & Engineering Science and Mechanics ** The Bradley Department of Electrical and Computer Engineering 06/17/03 National Institute of Aerospace, Hampton VA Support • NASA LaRC, NRA 98, Innovative Algorithms for Aerospace Engineering Analysis and Optimization, PM: Jarek Sobieski • NASA LaRC, Mechanics and Durability Branch, PM: Damodar Ambur • Virginia Tech, ASPIRES Program
Outline • Introduction • Evolutionary Design • Elements of Cellular Automata • CA applied to Engineering Design • Truss Domain • Composite Laminate Design • Hardware Implementation • Configurable Computing – FPGAs • CA Implementation Results • Multigrid Acceleration
Evolutionary Design • Mimic natural evolution of biological systems for structural design • Evolutionary design often relies on local optimality/decision making of independent parts • Examples: reaction wood, bone growth • Cellular Automata: Decomposition of a seemingly complex macro behavior into basic small local problems
Evolutionary Design • Evolutionary Design of Structures [classification diagram] • Species: Genetic Algorithms • Individual Designs: ESO, MMD, CA • Local Rules for Design, Global Analysis: ESO, MMD • Local Evolution of Analysis and Design: Cellular Automata
Cellular Automata • Wiener (1946), Ulam (1952), von Neumann (1966) • Automata Networks • Cell Dynamic Scheme • Idealizations of complex natural systems • Flock behavior • Diffusion of gaseous systems • Solidification and crystal growth • Hydrodynamic flow and turbulence • General characteristics • Locality • Vast Parallelism • Simplicity
Elements of Cellular Automata • Cell Definitions • Lattice Configurations • Neighborhoods • Boundaries • Update rules • Iteration Schemes
Elements of Cellular Automata • Definition for state of a cell and update rule [equation labels: time step, cell ID, center cell, neighborhood cells] • Two-dimensional Lattice Configurations: Rectangular, Triangular, Hexagonal
Neighborhood Definition • Rectangular Neighborhoods • von Neumann: N, E, S, W • Moore: N, NE, E, SE, S, SW, W, NW • MvonN: N, NE, E, SE, S, SW, W, NW, NN, EE, SS, WW • Boundaries • Periodic • Location Specific
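The neighborhoods and boundary types listed on this slide can be sketched as offset tables. This is a minimal illustration, assuming a rectangular lattice indexed by (row, col) with north at row - 1, and reading the figure labels as 4, 8, and 12 neighbors for the von Neumann, Moore, and MvonN cases. The names `VON_NEUMANN`, `MOORE`, `MVONN`, and `neighbors` are hypothetical, and dropping out-of-lattice neighbors is only one simple stand-in for a location-specific boundary.

```python
# Offsets (d_row, d_col) relative to the center cell; north is row - 1.
VON_NEUMANN = [(-1, 0), (0, 1), (1, 0), (0, -1)]            # N, E, S, W
MOORE = VON_NEUMANN + [(-1, 1), (1, 1), (1, -1), (-1, -1)]  # + NE, SE, SW, NW
MVONN = MOORE + [(-2, 0), (0, 2), (2, 0), (0, -2)]          # + NN, EE, SS, WW


def neighbors(row, col, n_rows, n_cols, offsets, periodic=True):
    """Return neighbor coordinates of (row, col) on an n_rows x n_cols lattice.

    With periodic=True indices wrap around the lattice. Otherwise neighbors
    that fall outside the lattice are dropped (an assumed, very simple
    location-specific boundary treatment).
    """
    result = []
    for d_row, d_col in offsets:
        r, c = row + d_row, col + d_col
        if periodic:
            result.append((r % n_rows, c % n_cols))
        elif 0 <= r < n_rows and 0 <= c < n_cols:
            result.append((r, c))
    return result
```

For a corner cell, `neighbors(0, 0, 4, 4, MOORE, periodic=False)` keeps only the three neighbors that lie inside the lattice, while the periodic variant always returns one coordinate per offset.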
Update Rule – 2D Truss Domain Analysis • Ground Structure • Single Cell [figure: center cell C with neighbors N, NE, E, SE, S, SW, W, NW; labels uC, vC, uSE, vSE, d] • Displacement Update:
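The displacement update equation itself is not reproduced in this transcript. As a rough 1D analogue (an assumption, not the 2D truss rule from the slide), the sketch below updates each cell of a chain of springs from the displacements of its two neighbors, so that the cell is in equilibrium with its neighbors held fixed. The function name `ca_bar_analysis` and its parameters are hypothetical.

```python
def ca_bar_analysis(stiffness, loads, tol=1e-12, max_iter=100_000):
    """Local equilibrium updates on a 1D chain of cells joined by springs.

    stiffness[i] is the spring between cell i and cell i + 1, loads[i] is the
    force applied at cell i, and cell 0 is held fixed (u = 0).
    Returns the displacements and the number of lattice sweeps used.
    """
    n = len(loads)
    u = [0.0] * n
    for iteration in range(1, max_iter + 1):
        new_u = u[:]
        for c in range(1, n):
            k_w = stiffness[c - 1]                       # spring to the west
            k_e = stiffness[c] if c < n - 1 else 0.0     # spring to the east
            u_e = u[c + 1] if c < n - 1 else 0.0
            # Equilibrium of cell c with neighbor displacements frozen.
            new_u[c] = (k_w * u[c - 1] + k_e * u_e + loads[c]) / (k_w + k_e)
        if max(abs(a - b) for a, b in zip(new_u, u)) < tol:
            return new_u, iteration
        u = new_u
    return u, max_iter
```

All cells are updated from the previous sweep's values, which makes this a Jacobi-style iteration; the iteration counts it produces are not comparable to those reported in the slides.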
Sample Truss Analysis Results • Linear Analysis • Nonlinear Analysis [figure legend: Undeformed, CA Analysis, FEM Analysis; applied force or displacement]
Linear vs. Nonlinear Analysis [figure: total reaction vs. # of iterations for linear analysis and nonlinear analysis; annotated iteration counts: 2985 and 1641]
Sizing/Design Rules • Local Optimization Formulation • Sequential Move and Size • Fully Stressed Design [figures: Geometry & Basic Ground Structure (CDF = 1), with dimensions 40 m and 60 m and loads 75 kN and 100 kN; Dense Truss Solution (CDF = 40)]
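The slide names Fully Stressed Design as a sizing rule without giving a formula. A minimal sketch of the textbook stress-ratio form is shown here, under the assumption that each member is resized independently from its own stress. The function name, the `min_area` floor, and the single allowable stress are illustrative choices, not details from the talk.

```python
def fully_stressed_resize(areas, stresses, allowable_stress, min_area=1e-6):
    """One stress-ratio resizing step for a list of truss members.

    Each area is scaled by |stress| / allowable_stress, so an over-stressed
    member grows and an under-stressed member shrinks. min_area keeps
    members from vanishing entirely (an assumed safeguard).
    """
    return [
        max(area * abs(stress) / allowable_stress, min_area)
        for area, stress in zip(areas, stresses)
    ]
```

Because the rule uses only a member's own area and stress, it is a local update of the kind a cell can apply without global information. The analysis has to be repeated after each resizing, since the stresses change with the new areas.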
Design of Fiber Reinforced Panels • Minimum Compliance Design where (x,y): fiber angle distribution • Minimum Strain Energy Density (Pedersen 1990) • Principal Strain Direction [figure: panel domain Ω with boundary ∂Ω in global axes X, Y; local axes x, y at angle α(x,y)]
Panel with a Circular Hole in Shear Quarter Panel Model Optimality Criteria (OC) Design
Panel with a Circular Hole in Shear Pattern Matching + OC Design Pattern Matching + Discrete Design
Panel with a Circular Hole in Shear Topology + Orientation Design Topology + Discrete Fiber Orientation
Hardware Integration • Current parallel architectures are limited • Specialized CA machines mimicking CA domains • Domain Modeled === Hardware Domain
Configurable Computing and Field Programmable Gate Arrays (FPGAs)
Definitions and Potential • Configurable computers are a relatively new class of computer architecture in which hardware circuits are (re-)configured for a specific algorithm • Offer “ASIC-like” speeds without the cost of designing and fabricating a chip • ASIC cost can run into many millions • General-purpose CPUs are slow • Configurable computers are often built using FPGAs because of their widespread availability (>>$1B market)
Field Programmable Gate Array (FPGA) Layout • An FPGA consists of a large array of Configurable Logic Blocks (CLBs), typically 1,000 to 8,000 CLBs per chip • Each CLB contains registers and LUTs, where each LUT can implement a 4-input logic operation • By programming the CLBs and interconnections, large circuits can be represented in the FPGA • One Xilinx XC2V4000 FPGA can represent a circuit of up to 1M gates
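To make the 4-input LUT concrete: any Boolean function of four inputs has 16 possible input combinations, so it can be stored as a 16-bit truth table and evaluated by indexing. The sketch below is a software model only, with hypothetical names `make_lut4` and `eval_lut4`, and does not reflect any vendor's configuration format.

```python
def make_lut4(function):
    """Build a 16-bit truth table for a Boolean function of four inputs.

    Bit `index` of the table holds the output for the input combination
    whose bits (a, b, c, d) are bits 0..3 of `index`.
    """
    table = 0
    for index in range(16):
        bits = [(index >> position) & 1 for position in range(4)]
        if function(*bits):
            table |= 1 << index
    return table


def eval_lut4(table, a, b, c, d):
    """Look up the output for inputs a, b, c, d (each 0 or 1)."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> index) & 1
```

For example, `make_lut4(lambda a, b, c, d: a & b & c & d)` sets only the bit for index 15, and a 4-input XOR built the same way evaluates to 1 exactly when an odd number of inputs are 1.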
DINI DN3000k10 Board • DINI DN3000k10 is an FPGA-based PCI card • Contains five Xilinx XC2V4000 FPGAs connected by a 226-bit-wide bus • One of the FPGAs has a separate connection for communicating with a PC via the PCI bus • FPGAs can be configured through the PCI bus, or configurations can be stored on board
Algorithms for FPGAs • Target FPGA strengths: parallel, pipelined, customized • Goal is to have every part of the chip actively computing at the highest possible clock speed • Do: re-think the algorithm to • Expose the natural parallelism • Pipeline time-consuming operations • Examine the precision that is really necessary • Do not: Implement algorithms as you would in software on a traditional computer
Multiplier Options • Usage (% CLBs)* • *Percentage of CLBs used in an XC2V4000; the XC2V4000 contains 5760 CLBs
Application Performance • HokieGene – Genome Matching Project (2003) • Matching engine executes on one FPGA (XC2V1000) • Performs 200 billion cell updates per second • 1,200 billion operations per second (1.2 TOPS) • BYU - Network Intrusion Detection Systems (2002) • Hardware implementation uses one FPGA (XC2V1000) • Outperformed software version running on P3 – 750MHz: • Up to 400 times more throughput than software version • Up to 1000 times less latency than software version • Xilinx – High Performance DES Encryption (2000) • Implemented on one small FPGA (XCV150) • Maximum throughput 10.75 Gbit/sec • Outperformed best ASIC implementation • University of Texas at Austin – Target Recognition System (2000) • System built using one FPGA (ORCA 40k) and Myrinet interfacing • Capable of processing 900 templates per second • 2,800 billion operations per second (2.8 TOPS)
Iterative Methods for Linear Systems • Consider Jacobi’s method • $D x_{i+1} = (D - A) x_i + b$ • In software, we would select either single or double precision floating point • On a configurable computer we can select any format in which to store/compute values • Choose the desired precision of the solution • Reconstruct the method for fast computation
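As a reference point for the Jacobi iteration on this slide, here is a minimal software sketch in ordinary double precision, where D is taken to be the diagonal of A. The function name, the dense list-of-lists storage, and the stopping test on the change between iterates are assumptions made for illustration.

```python
def jacobi(A, b, x0=None, tol=1e-10, max_iter=10_000):
    """Solve A x = b with Jacobi's method.

    Each sweep computes x_new[i] = (b[i] - sum_{j != i} A[i][j] * x[j]) / A[i][i],
    which is D x_new = (D - A) x + b written row by row.
    Returns the approximate solution and the number of sweeps used.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for iteration in range(1, max_iter + 1):
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, iteration
        x = x_new
    return x, max_iter
```

Every entry of `x_new` depends only on the previous iterate, so all rows can be updated at the same time. Convergence is guaranteed for diagonally dominant matrices but not in general.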
Iterative Methods Continued • Re-cast as iterative improvement scheme • $r_i = b - A x_i$ (compute in n bits) • $\Delta x_i = A^{-1} r_i$ (compute in k bits) • $x_{i+1} = x_i + \Delta x_i$ (compute in n bits) • Use Jacobi to solve for $\Delta x_i$ in compact, fast k-bit hardware (cost ~ bits$^2$) • Thm: Convergence rate is independent of k • Thm: Optimal choice of k ~ n/(# iterations)$^{1/3}$
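The iterative improvement scheme on this slide can be imitated in software. In the sketch below the residual and the solution update are computed in double precision, while the correction is obtained from a few Jacobi sweeps whose results are rounded to a small number of significant decimal digits. That rounding is only a stand-in for k-bit hardware arithmetic, and the function names, the `sweeps` count, and the stopping tolerance are all assumptions. The sketch illustrates the structure of the method and does not demonstrate the theorems quoted above.

```python
import math


def round_sig(value, digits):
    """Round value to the given number of significant decimal digits."""
    if value == 0.0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


def low_precision_jacobi(A, r, digits, sweeps):
    """Approximate the correction A^{-1} r with rounded Jacobi sweeps."""
    n = len(r)
    dx = [0.0] * n
    for _ in range(sweeps):
        dx = [
            round_sig(
                (r[i] - sum(A[i][j] * dx[j] for j in range(n) if j != i)) / A[i][i],
                digits,
            )
            for i in range(n)
        ]
    return dx


def iterative_improvement(A, b, digits=3, sweeps=5, tol=1e-12, max_outer=200):
    """Outer loop in full precision, inner correction in low precision."""
    n = len(b)
    x = [0.0] * n
    for outer in range(max_outer):
        # Residual in full (double) precision: r = b - A x.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) < tol:
            return x, outer
        dx = low_precision_jacobi(A, r, digits, sweeps)
        x = [x[i] + dx[i] for i in range(n)]
    return x, max_outer
```

Because the correction is always computed from the current residual, rounding errors made in one outer step are removed by later steps, which is why a coarse inner precision can still lead to an accurate final answer.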
Convergence • Solution Error vs. Number of Iterations • k = 3, 6, 9 decimal digits • No difference in convergence rate
Performance Advantage • Execution Cost (number of bit operations) vs. the size of the matrix • Compares cost of normal vs. modified algorithm • Convergence for each algorithm is identical
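Euler Beam Formulation • Cell Neighborhood [figure: left, center, and right cells at spacing h with states (wL, θL), (wC, θC), (wR, θR); axes x, y; applied force F; design variable d(x)] • Control Volume [figure labels: forces FL, FC, FR; moments ML, MC, MR] • Cell Equilibrium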
Cellular Automata Model: Multiple Cells per Processing Element
Beam Design [flowchart] • Equilibrium Update • residual • error • correction • Equilibrium Update Converged? NO: repeat Equilibrium Update; YES: Design Update • Design Update Converged? NO: return to Equilibrium Update; YES: End
Algorithm Strategy • The limited precision algorithm illustrated for Jacobi’s method earlier is applied to CA • Much smaller, faster circuits for applying CA rule updates in k-bit operations • Built-in 18x18 multipliers compute residual • Built-in high-speed memories provide • Storage for intermediate and permanent quantities • Many customizable word-lengths • Extremely high memory bandwidth
FPGA Performance Cell Updates Per Second (Millions)
Multigrid Acceleration [figure: beam lattice with axes x, y and load F; V-cycle and W-cycle schedules over lattices h, 2h, 4h, 8h, with steps marked S and E] • Legend: Equilibrium update to convergence; Restriction (on r); Prolongation (on e); Equilibrium updated α times
Prolongation [figure: lattice h and lattice 2h]
Prolongation/Restriction • Correction Prolongation • Residual Restriction • Prolongation Operator [figure: lattice h and lattice 2h; operator equations not reproduced]
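The operator definitions on this slide did not survive extraction. For orientation, the sketch below uses the most common 1D choices, linear interpolation for prolongation and full weighting for restriction, on lattices whose cell counts are related by n_fine = 2 * n_coarse - 1. These choices and the function names are assumptions, and the operators used in the talk for the beam problem (which carries both deflection and rotation at each cell) may differ.

```python
def prolong(coarse):
    """Linear interpolation from a lattice with spacing 2h to spacing h."""
    fine = [0.0] * (2 * len(coarse) - 1)
    for i, value in enumerate(coarse):
        fine[2 * i] = value  # coincident cells are copied directly
    for i in range(len(coarse) - 1):
        fine[2 * i + 1] = 0.5 * (coarse[i] + coarse[i + 1])  # midpoints average
    return fine


def restrict(fine):
    """Full-weighting restriction from spacing h to spacing 2h."""
    n_coarse = (len(fine) + 1) // 2
    coarse = [0.0] * n_coarse
    coarse[0], coarse[-1] = fine[0], fine[-1]  # boundary cells are injected
    for i in range(1, n_coarse - 1):
        coarse[i] = (
            0.25 * fine[2 * i - 1] + 0.5 * fine[2 * i] + 0.25 * fine[2 * i + 1]
        )
    return coarse
```

In a multigrid cycle, `restrict` would carry the residual r to the coarser lattice and `prolong` would carry the computed correction e back to the finer one.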
Nested Iteration for MG accelerated CA • Design with 3 Cells • Design with 5 Cells • Design with 17 Cells • Design with 65 Cells • Design with 257 Cells [figures: design distribution d(x) on each lattice]
CA Design Performance with Full MG [figure: total number of cell updates (log scale, 10^0 to 10^8) vs. number of cells (log scale, 1 to 1000)]
Concluding Remarks • Summary • CA paradigm has been demonstrated for various structural systems • CA paradigm matches well with Configurable Computing acceleration • Full Multigrid acceleration for CA improves design convergence • Future Work • Expand the design capabilities in terms of structural details and the types of field problems that can be solved • Tools that will enable engineers to effortlessly use configurable computers for CA applications • Continue to investigate algorithms to improve CA performance