ME964 High Performance Computing for Engineering Applications. Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011. “The real problem is not whether machines think but whether men do.” B. F. Skinner.
ME964 High Performance Computing for Engineering Applications. Outlining Midterm Projects. Topic 3: GPU-based FEA. Topic 4: GPU Direct Solver for Sparse Linear Algebra. March 01, 2011. “The real problem is not whether machines think but whether men do.” B. F. Skinner. © Dan Negrut, 2011, ME964, UW-Madison
Before We Get Started…
• Last time
  • Midterm Project topics 1 and 2
    • Discrete Element Method on the GPU. Area coordinator: Toby Heyn
    • Collision Detection on the GPU. Area coordinator: Arman Pazouki
• Today
  • Midterm Project topics 3 and 4
    • Finite Element Method on the GPU. Area coordinators: Prof. Suresh and Naresh Khude
    • Sparse direct solver on the GPU (Cholesky). Area coordinator: Dan Negrut
  • Midterm Project Related Issues
    • Midterm Project is due on 04/13 at 11:59 PM (use Learn@UW drop-box)
    • Intermediate report due on 03/22 at 11:59 PM (use the same Learn@UW drop-box)
    • Each area coordinator
      • Will provide a test problem for you to test your GPU implementation
      • Will also assist you with questions related to the non-programming aspects (the “theory”) behind the topic you chose
    • You can continue your Midterm Project (MP) and have it become your Final Project (FP)
      • In this case you will be expected to show how the FP implementation is superior to your MP implementation
• Other issues
  • HW5 due tonight at 11:59 PM
  • Use Learn@UW drop-box to submit homework
Finite Element Analysis on the GPU? Krishnan Suresh, Associate Professor, suresh@engr.wisc.edu
Finite Element Analysis • Computer simulation of engineering models • Physics: • Structural, thermal, fluid, … • Mode: • Static, modal, transient • Linear, non-linear, multi-physics
[Gordon; JPL] Why GPU? Hours or even days of CPU time.
Question: Can one exploit graphics processing units (GPUs) to speed up Finite Element analysis?
Structural Static FEA: Model → Discretize → Element Stiffness → Assemble/Solve → Post-process
FEA: Variations • Pipeline: Model → Discretize → Element Stiffness → Assemble/Solve → Post-process • Variations: Tet/Hex/…, Order/Hybrid, Direct/Iterative, Nonlinear, Optimization
FEA: Challenges • Pipeline: Model → Discretize → Element Stiffness → Assemble/Solve → Post-process • Variations: Tet/Hex/…, Order/Hybrid, Direct/Iterative, Nonlinear, Optimization • Challenges: Accuracy, Automation, Speed
Model → Discretize → Element Stiffness → Assemble/Solve (typical bottleneck) → Post-process
GPU & Engineering Analysis: Discretization (Model → Discretize) • CPU or GPU? • Data: Small b-rep (+) • Logic: Complex (-) • Threads: Few (-) • Not a good candidate for GPU!?
Element Stiffness (examples: Hex 2nd Order, Hex Hybrid) • Model: CPU • Discretize: CPU • Element Stiffness: GPU? • Data: O(N) (+/-) • Logic: Simple (+) • Threads: N (+)
Stiffness: Hex 2nd Order (8 corners, 27 nodes) • 8 corners ~ 100 bytes of data (x, y, z) • 27 nodes ~ M = 81 DOF (u, v, w) • k_ij ~ Gaussian integration • 30 flops
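The counts and the Gaussian-integration step on this slide can be illustrated with a much smaller element. The sketch below is not the 27-node hex from the slide; it is a 1-D analogue (a 3-node quadratic bar, a hypothetical stand-in chosen for brevity) whose stiffness entries k_ij are computed by 2-point Gauss quadrature. The 4-byte coordinate size and the names `bar3_stiffness`, `EA`, and `element_lengths` are assumptions made for illustration.

```python
import math

# Slide arithmetic, assuming single-precision (4-byte) coordinates.
corner_bytes = 8 * 3 * 4        # 8 corners x (x, y, z) x 4 bytes = 96, i.e. ~100 bytes
dofs_per_element = 27 * 3       # 27 nodes x (u, v, w) = 81 DOF


def bar3_stiffness(EA, L):
    """Stiffness matrix of a 3-node quadratic bar via 2-point Gauss quadrature."""
    gauss_points = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))
    gauss_weights = (1.0, 1.0)
    jac = L / 2.0  # dx/dxi for equally spaced nodes at xi = -1, 0, +1
    k = [[0.0] * 3 for _ in range(3)]
    for xi, w in zip(gauss_points, gauss_weights):
        # Derivatives of the quadratic shape functions with respect to xi.
        dN = (xi - 0.5, -2.0 * xi, xi + 0.5)
        for i in range(3):
            for j in range(3):
                # k_ij += w * EA * (dN_i/dx) * (dN_j/dx) * |J|
                k[i][j] += w * EA * (dN[i] / jac) * (dN[j] / jac) * jac
    return k


# Each element is independent of the others, so on a GPU each of these
# calls could be handled by its own thread (Threads: N on the slide).
element_lengths = [1.0, 0.5, 2.0]
stiffness = [bar3_stiffness(1.0, L) for L in element_lengths]
```

For `EA = 1` and `L = 1` the result matches the closed form (1/3)·[[7, -8, 1], [-8, 16, -8], [1, -8, 7]], because the integrand is quadratic and 2-point Gauss quadrature is exact up to cubic polynomials.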
Model → Discretize → Element Stiffness → Assemble/Solve (typical bottleneck)
Direct vs. Iterative • K is sparse & usually symmetric positive definite (SPD) • Iterative • Direct • GPU variation: assembly-free • Note: NVIDIA offers CUBLAS, a dense-matrix BLAS library (including Level 3 routines)
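As a concrete picture of the direct route for a symmetric positive definite K, here is a minimal dense Cholesky sketch: factor K = L·Lᵀ, then solve by forward and backward substitution. It ignores sparsity entirely, so it is only an illustration of the factor-and-solve idea, not a sparse GPU solver; the function names and the 3×3 test matrix are assumptions.

```python
import math


def cholesky(K):
    """Return the lower-triangular factor Lf with K = Lf * Lf^T (K must be SPD)."""
    n = len(K)
    Lf = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = K[j][j] - sum(Lf[j][p] ** 2 for p in range(j))
        Lf[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            off = K[i][j] - sum(Lf[i][p] * Lf[j][p] for p in range(j))
            Lf[i][j] = off / Lf[j][j]
    return Lf


def solve_cholesky(K, f):
    """Solve K u = f using the Cholesky factor of K."""
    Lf = cholesky(K)
    n = len(f)
    y = [0.0] * n
    for i in range(n):  # forward substitution: Lf y = f
        y[i] = (f[i] - sum(Lf[i][p] * y[p] for p in range(i))) / Lf[i][i]
    u = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: Lf^T u = y
        u[i] = (y[i] - sum(Lf[p][i] * u[p] for p in range(i + 1, n))) / Lf[i][i]
    return u


# Small SPD example (a 1-D stiffness-like tridiagonal matrix).
K = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
f = [1.0, 0.0, 1.0]
u = solve_cholesky(K, f)  # exact solution is [1, 1, 1]
```

The factorization is done once and can be reused for many right-hand sides, which is one reason direct solvers stay attractive despite their memory cost.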
Direct Sparse on GPU (1) (2006)
Direct Sparse on GPU (2) (2008)
Iterative Sparse on GPU (1) (2008) • Jacobi preconditioned conjugate gradient • ATI GPU • Speed-up 3.5.
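For reference, a Jacobi-preconditioned conjugate gradient loop can be sketched as follows. This is a plain CPU sketch with dense Python lists, not the ATI GPU implementation cited on the slide; the function names, tolerance, and test matrix are assumptions. The parts that would map to GPU kernels are the matrix-vector product, the dot products, and the vector updates.

```python
import math


def matvec(K, x):
    """Dense matrix-vector product K * x."""
    return [sum(kij * xj for kij, xj in zip(row, x)) for row in K]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jacobi_pcg(K, f, tol=1e-10, max_iter=100):
    """Solve K u = f for SPD K with CG, preconditioned by diag(K)^-1."""
    n = len(f)
    inv_diag = [1.0 / K[i][i] for i in range(n)]  # Jacobi preconditioner
    u = [0.0] * n
    r = list(f)                                   # residual for u = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    f_norm = math.sqrt(dot(f, f))
    for iteration in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * f_norm:
            return u, iteration
        Kp = matvec(K, p)
        alpha = rz / dot(p, Kp)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u, max_iter


K = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
f = [6.0, 10.0, 8.0]          # equals K * [1, 2, 3]
u, iterations = jacobi_pcg(K, f)
```

The preconditioner application is a single elementwise multiply, which is why Jacobi is cheap and trivially parallel, even when it is not a strong preconditioner.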
Iterative Sparse on GPU (2) • Double-precision, real-world SpMV (sparse matrix-vector product) • CPU (2.3 GHz Dual Xeon): 1 GFLOPS • GPU (GTX 280): 16 GFLOPS • Speedup ~ 16
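The SpMV operation benchmarked here can be sketched for a matrix in compressed sparse row (CSR) storage. CSR is an assumption; the slide does not say which format was measured. Each row's dot product is independent, which is what makes a one-thread-per-row (or one-warp-per-row) GPU mapping natural. The array names are illustrative.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A * x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # independent per row: one GPU thread each
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# CSR storage of [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])

# Speedup quoted on the slide: GPU throughput divided by CPU throughput.
speedup = 16.0 / 1.0
```

SpMV performs only two flops per stored nonzero, so it is usually limited by memory bandwidth rather than arithmetic, which is consistent with the modest GFLOPS figures on the slide.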
FEA/GPU Class Projects? • Complete < 6 weeks • Important (publishable) • Pilot code
FEA/GPU Class Projects? • GPU Friendly Preconditioners for Thin Structures • Research papers • OpenCL and ViennaCL Pilot Code • Topology Optimization • Research papers • CUDA code • Others • Can discuss …
Thin Structure? Large K
Preconditioners? • Iterative Methods: • GPU methods available for K*u • Typical preconditioners: simple Jacobi, … • Poor preconditioner … slow convergence • Objective: • GPU friendly preconditioner for thin structures
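Why a poor preconditioner means slow convergence: the number of CG iterations grows with the condition number of the (preconditioned) matrix. The toy 2×2 example below, with made-up numbers, compares the condition number before and after symmetric Jacobi scaling D^(-1/2)·K·D^(-1/2). Jacobi works well here only because the ill-conditioning comes purely from diagonal scaling; that is an assumption of the example and is not the thin-structure situation the project targets.

```python
import math


def sym2x2_cond(a, b, c):
    """Condition number of the SPD matrix [[a, b], [b, c]]."""
    lam_max = 0.5 * (a + c) + math.hypot(0.5 * (a - c), b)
    det = a * c - b * b
    lam_min = det / lam_max  # avoids cancellation in (mean - radius)
    return lam_max / lam_min


# Badly scaled SPD matrix (illustrative numbers).
a, b, c = 10000.0, 10.0, 1.0
cond_original = sym2x2_cond(a, b, c)

# Symmetric Jacobi scaling: unit diagonal, off-diagonal b / sqrt(a * c).
b_scaled = b / math.sqrt(a * c)
cond_scaled = sym2x2_cond(1.0, b_scaled, 1.0)
```

Here the condition number drops from roughly 10^4 to about 1.22. For a thin structure the scaled matrix would typically remain ill-conditioned, which motivates looking for a better GPU-friendly preconditioner.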
GPU Friendly Preconditioner [Figure: speed-up with preconditioner vs. speed-up without preconditioner]
Topology Optimization • Stiffest topology for a given volume (e.g., V = 50% of the design domain D)? • Where to remove material? [Sigmund 2001] • Multi Objective + Topology Optimization = MOTO
Demo Matlab code www.ersl.wisc.edu
Pareto Optimal Designs • Purely Pareto optimal
SIMP Pareto-Method 3-D
3-D GPU Implementation Multi-grid Topology Optimization on the GPU (IDETC conf. 2011)
The Schur Complement Problem in Multi-Body Dynamics Applications
Formulation Framework • Position: • Orientation: Euler parameters, • Translational Velocity: • Angular velocities
Numerical Solution of the Newton-Euler Constrained Equations of Motion • One has to solve a set of Differential Algebraic Equations (DAEs) to find the time evolution of a mechanical system • Most often the numerical solution of the DAEs requires the solution of a linear system of the form:
Approach Followed • First solve the “Reduced System” for : • Then recover accelerations
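The equations on these slides did not survive extraction. As a sketch only, and assuming the usual saddle-point form for constrained multibody dynamics, the "solve the reduced system, then recover accelerations" approach can be written as below. All symbols are assumptions rather than the slide's own notation: M is the mass matrix, Φ_q the constraint Jacobian, q̈ the accelerations, λ the Lagrange multipliers, Q the applied forces, and γ the right-hand side of the acceleration-level constraint equations.

```latex
% Assumed saddle-point system (notation not taken from the slides)
\begin{bmatrix} M & \Phi_q^T \\ \Phi_q & 0 \end{bmatrix}
\begin{bmatrix} \ddot{q} \\ \lambda \end{bmatrix}
=
\begin{bmatrix} Q \\ \gamma \end{bmatrix}

% Step 1: eliminate \ddot{q} using the first block row,
% \ddot{q} = M^{-1} (Q - \Phi_q^T \lambda), and substitute into the second:
\left( \Phi_q M^{-1} \Phi_q^T \right) \lambda = \Phi_q M^{-1} Q - \gamma

% Step 2: recover the accelerations
\ddot{q} = M^{-1} \left( Q - \Phi_q^T \lambda \right)
```

Under these assumptions the matrix Φ_q M⁻¹ Φ_qᵀ is the Schur complement of M in the saddle-point system (up to sign). It is symmetric positive definite when M is SPD and Φ_q has full row rank, which is what makes a conjugate gradient solver applicable.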
Iterative Solution of the Reduced System • Define positive definite Reduced Matrix • Preconditioned Conjugate Gradient • requires computation at time of • requires preconditioning:
Computing Time step n, iteration (k): • A thread is associated with each body • We’ll look at how thread 9 does its share of work to compute