
Collision Detection Design & Final Project Topic

Brandon Smith, November 5, 2008, ME 964.


Presentation Transcript


  1. Collision Detection Design & Final Project Topic Brandon Smith November 5, 2008 ME 964

  2. contact_data Allocation • Possible ways to allocate the contact_data array: • Allocate contact_data[ N(N-1)/2 ] • Allocate contact_data[ n_contacts ] • To avoid creating a huge array, I chose the second method: • 1st Kernel Call • Find the number of contacts. • 2nd Kernel Call • Calculate the contact_data for each contact.
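A sequential sketch of the two-pass scheme, assuming spheres stored as center plus radius and a hypothetical ContactData record (body indices and penetration depth); the real layout of contact_data is not shown in the slides. On the GPU each pass would be a kernel launch; here they are plain loops so the exact-size allocation is easy to see.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sphere and contact records; the slides do not show the real layout.
struct Sphere { double x, y, z, r; };
struct ContactData { int i, j; double depth; };

// Center-to-center distance.
static double center_distance(const Sphere& a, const Sphere& b) {
    double dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Pass 1 ("1st kernel call"): only count contacts.
std::size_t count_contacts(const std::vector<Sphere>& s) {
    std::size_t n = 0;
    for (std::size_t j = 1; j < s.size(); ++j)
        for (std::size_t i = 0; i < j; ++i)
            if (center_distance(s[i], s[j]) < s[i].r + s[j].r) ++n;
    return n;
}

// Pass 2 ("2nd kernel call"): allocate exactly n_contacts entries, not
// N(N-1)/2, and fill them.
std::vector<ContactData> compute_contact_data(const std::vector<Sphere>& s) {
    std::vector<ContactData> contact_data(count_contacts(s));
    std::size_t next = 0;  // next free slot (an atomic counter on the GPU)
    for (std::size_t j = 1; j < s.size(); ++j)
        for (std::size_t i = 0; i < j; ++i) {
            double d = center_distance(s[i], s[j]);
            double rsum = s[i].r + s[j].r;
            if (d < rsum)
                contact_data[next++] = {static_cast<int>(i), static_cast<int>(j), rsum - d};
        }
    return contact_data;
}
```

The price of the smaller array is running the contact test twice; for large N the N(N-1)/2 allocation would usually be the bigger cost.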

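  3. Kernel Call Setup • The total number of contact tests is: n_tests = N(N-1)/2 • The total number of concurrent threads is: n_concurrent_threads = N_SMs * BLOCKS_PER_SM * THREADS_PER_BLOCK • Each thread will perform several tests: n_test_per_thread = n_tests / n_concurrent_threads + 1 • Integer division rounds down, so the +1 guarantees every test is covered; the last threads therefore receive test numbers beyond n_tests and must skip them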
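The work partition can be checked on the host before launching. This sketch takes the SM count, blocks per SM and threads per block as ordinary parameters (the slides use compile-time constants), and the values in the usage test are only example numbers, not the author's configuration.

```cpp
#include <cassert>
#include <cstdint>

// Work partition as defined on the slide.
struct Partition {
    std::uint64_t n_tests;               // N(N-1)/2
    std::uint64_t n_concurrent_threads;  // N_SMs * BLOCKS_PER_SM * THREADS_PER_BLOCK
    std::uint64_t tests_per_thread;      // n_tests / n_concurrent_threads + 1
};

Partition make_partition(std::uint64_t n_bodies, std::uint64_t n_sms,
                         std::uint64_t blocks_per_sm, std::uint64_t threads_per_block) {
    assert(n_bodies >= 2);
    Partition p;
    p.n_tests = n_bodies * (n_bodies - 1) / 2;
    p.n_concurrent_threads = n_sms * blocks_per_sm * threads_per_block;
    // +1 compensates for integer division rounding down, so the product
    // tests_per_thread * n_concurrent_threads is always >= n_tests.
    p.tests_per_thread = p.n_tests / p.n_concurrent_threads + 1;
    return p;
}
```

With 1000 bodies and 16 x 8 x 64 = 8192 threads, each thread gets 61 tests, which covers 499712 test numbers for 499500 actual tests; the surplus 212 must be skipped inside the kernel.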

  4. Collide Kernel: Indexing • Given the block number and thread number, a range of test numbers (ki,kf) is generated: thread_id = bx*THREADS_PER_BLOCK + tx; ki = tests_per_thread*thread_id + 1; kf = ki + tests_per_thread - 1; • Given a test number k, the indices (i,j) can be calculated from: • $k = \frac{(j-1)^2 - (j-1)}{2} + i$ • where j is the smallest integer with $k \le \frac{j^2 - j}{2}$
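A host-side sketch of the index arithmetic, assuming 1-based test numbers and pairs 1 <= i < j <= N as in the formulas above. Column j holds j-1 tests, so the first j columns hold (j^2 - j)/2 of them; j is found from the closed-form root of j(j-1)/2 = k and then corrected with integer checks, since a double square root can be off by one for large k. The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Number of tests in columns 1..j: (j^2 - j)/2.
static std::uint64_t tri(std::uint64_t j) { return (j * j - j) / 2; }

// Map test number k (1-based) to (i, j) with k = tri(j-1) + i.
std::pair<std::uint64_t, std::uint64_t> test_to_pair(std::uint64_t k) {
    assert(k >= 1);
    // Estimate from j(j-1)/2 >= k, then fix any floating-point error.
    auto j = static_cast<std::uint64_t>(
        std::ceil((1.0 + std::sqrt(1.0 + 8.0 * static_cast<double>(k))) / 2.0));
    while (j > 2 && tri(j - 1) >= k) --j;  // ensure j is the smallest valid value
    while (tri(j) < k) ++j;                // ensure k <= (j^2 - j)/2
    return {k - tri(j - 1), j};
}

// Test range (ki, kf) for one thread, as on the slide.
std::pair<std::uint64_t, std::uint64_t> thread_range(std::uint64_t bx, std::uint64_t tx,
                                                     std::uint64_t threads_per_block,
                                                     std::uint64_t tests_per_thread) {
    std::uint64_t thread_id = bx * threads_per_block + tx;
    std::uint64_t ki = tests_per_thread * thread_id + 1;
    return {ki, ki + tests_per_thread - 1};
}
```

Inside a kernel, each thread would loop k from ki to min(kf, n_tests) and call test_to_pair for each k.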

  5. Collide Kernel: Contact Testing • The __global__ function calls a __device__ test function to actually perform the contact test • In the first pass it simply tests for contact • In the second pass it calculates contact_data • atomicAdd is used to count the number of contacts • Keeps one contact tally for all concurrent threads • No need for condensation of results from each thread • Global-memory atomics require compute capability 1.1, hence -arch sm_11 • Hassle to compile: nvcc.exe -ccbin "C:\Program Files\Microsoft Visual Studio 8\VC\bin" -c -arch sm_11 -D_CONSOLE -Xcompiler "/EHsc /W3 /nologo /Wp64 /O2 /Zi /MT " -I"C:\CUDA\include" -I"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\inc" -o Release\collide.obj collide.cu
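A CPU analogue of the single shared tally, using std::atomic::fetch_add in place of CUDA's atomicAdd and std::thread workers in place of GPU threads. The precomputed is_contact flags stand in for the __device__ test. The second function shows one common way (assumed here, not stated in the slides) to use the returned old value as a unique output slot when filling contact_data.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// First pass: all workers increment one shared counter, so no per-thread
// partial counts need to be condensed afterwards.
std::size_t count_parallel(const std::vector<int>& is_contact, unsigned n_workers) {
    std::atomic<std::size_t> tally{0};
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < n_workers; ++w)
        workers.emplace_back([&, w] {
            for (std::size_t k = w; k < is_contact.size(); k += n_workers)  // strided tests
                if (is_contact[k]) tally.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : workers) t.join();
    return tally.load();
}

// Second pass: fetch_add returns the previous value, which gives each
// contact its own slot in the exactly sized output array.
std::vector<std::size_t> fill_parallel(const std::vector<int>& is_contact,
                                       std::size_t n_contacts, unsigned n_workers) {
    std::vector<std::size_t> slots(n_contacts);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < n_workers; ++w)
        workers.emplace_back([&, w] {
            for (std::size_t k = w; k < is_contact.size(); k += n_workers)
                if (is_contact[k]) slots[next.fetch_add(1)] = k;
        });
    for (auto& t : workers) t.join();
    return slots;
}
```

The order of entries in the filled array depends on thread scheduling, so any consumer that needs a fixed order has to sort afterwards.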

  6. Final Project: Monte Carlo Radiation Transport • Objective: • Compute radiation flux or derived quantities over a spatial/temporal domain. • Method: • Follow the life of individual particles through the domain. • Quality of Results: • Statistical error is proportional to 1/sqrt(n_particles) • Difficult to get an even particle distribution across the domain • Many particles are required to achieve low statistical error
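The 1/sqrt(n_particles) behaviour can be made concrete with the usual relative-error estimate of a tally mean, R = S_mean / mean with S_mean = sqrt(S^2 / n). This sketch assumes one score per particle history and a nonzero mean; it is not taken from the existing Fortran code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative statistical error of a tally from per-history scores.
double relative_error(const std::vector<double>& scores) {
    const std::size_t n = scores.size();
    assert(n >= 2);
    double mean = 0.0;
    for (double x : scores) mean += x;
    mean /= n;
    double var = 0.0;                        // sample variance S^2
    for (double x : scores) var += (x - mean) * (x - mean);
    var /= (n - 1);
    return std::sqrt(var / n) / mean;        // S_mean / mean (mean assumed nonzero)
}

// Because R ~ 1/sqrt(n), a pilot run of n histories with error r_current
// needs about n * (r_current / r_target)^2 histories to reach r_target.
double histories_needed(double n, double r_current, double r_target) {
    double ratio = r_current / r_target;
    return n * ratio * ratio;
}
```

For example, cutting the error from 5% to 1% takes 25 times as many particles, which is why raising the particle count on the GPU matters.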

  7. Example: Fusion Reactor Shielding • The GPU Advantage: • Increase the number of simulated particles • Decrease statistical error

  8. Tasks during a Particle’s Life • Birth: particles are created at a source • Ray-cast: the distance to the next surface is calculated • Collision: the particle interacts with matter • Next volume: the particle crosses a boundary into another material • Death: if the particle is absorbed, it is killed.
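A minimal analog Monte Carlo loop that walks through the same life cycle, assuming a single homogeneous slab in a 1-D "rod" model (particles move only in +x or -x), with hypothetical cross sections sigma_t and sigma_a. Leaving the slab plays the role of "next volume"; real 3-D geometry, energy dependence and multiple materials are left out.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct SlabTally { long transmitted = 0, reflected = 0, absorbed = 0; };

SlabTally run_slab(long n_particles, double thickness, double sigma_t,
                   double sigma_a, unsigned seed) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    SlabTally tally;
    for (long p = 0; p < n_particles; ++p) {
        double x = 0.0, dir = 1.0;                          // Birth: source at x = 0, moving +x
        while (true) {
            double s = -std::log(1.0 - u(rng)) / sigma_t;   // Ray-cast: distance to collision
            double boundary = dir > 0 ? thickness - x : x;  // distance to slab surface
            if (s >= boundary) {                            // Next volume: particle leaves slab
                if (dir > 0) ++tally.transmitted; else ++tally.reflected;
                break;
            }
            x += dir * s;                                   // move to collision site
            if (u(rng) < sigma_a / sigma_t) {               // Collision: absorbed?
                ++tally.absorbed;                           // Death
                break;
            }
            dir = u(rng) < 0.5 ? 1.0 : -1.0;                // otherwise scatter (rod model)
        }
    }
    return tally;
}
```

For a purely absorbing slab (sigma_a = sigma_t) the transmitted fraction should approach exp(-sigma_t * thickness), which gives a quick correctness check.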

  9. Existing Fortran Code • Geometry: • 3-D geometry supporting boxes and spheres • Physics: • Only neutral particles (neutrons, photons) • No energy dependence • No time dependence • Materials: • Simple materials (only a few isotopes) • Sources: • point, line, area, volume • Results: • mesh tallies and volume tallies

  10. Potential for Parallelism • Usually we can assume each particle is independent, unless: • criticality, weight windows, etc… • Each thread could calculate independent particle trajectories • embarrassingly parallel • When enough particles are simulated, condense the results from each thread

  11. Implementation Challenges • Current code is in Fortran 90 • ~1700 lines • Has anyone tried F2C? • Designed for Fortran 77 • Particles are tracked on a large mesh • ~1 M mesh elements, accessed once per particle • Mesh will need to be in global memory • Mesh will be accessed with an atomic function for data sharing? • Ensure that random numbers are not repeated • Use a pseudo-random number generator for each thread • Each thread will need a different random seed • Check to ensure sufficiently large stride • Could schedule rendezvous to check for solution convergence • Stop simulation once statistical error falls below a set value ( 5% )
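One way to give every thread or particle history its own non-overlapping random numbers is to split a single linear congruential sequence into blocks of a fixed stride and jump to the start of each block. The sketch below assumes a 64-bit LCG with Knuth's MMIX constants, chosen only for illustration; the existing code's generator and stride are not given in the slides.

```cpp
#include <cassert>
#include <cstdint>

// x_{n+1} = a * x_n + c (mod 2^64); unsigned overflow provides the modulus.
constexpr std::uint64_t kA = 6364136223846793005ULL;
constexpr std::uint64_t kC = 1442695040888963407ULL;

std::uint64_t lcg_next(std::uint64_t x) { return kA * x + kC; }

// Jump ahead `delta` steps in O(log delta) by repeatedly squaring the
// affine map x -> a*x + c and composing the pieces selected by delta's bits.
std::uint64_t lcg_skip(std::uint64_t x, std::uint64_t delta) {
    std::uint64_t cur_a = kA, cur_c = kC;  // map for 2^bit steps
    std::uint64_t acc_a = 1, acc_c = 0;    // accumulated map, starts as identity
    while (delta > 0) {
        if (delta & 1) {
            acc_a *= cur_a;
            acc_c = acc_c * cur_a + cur_c;
        }
        cur_c = (cur_a + 1) * cur_c;       // (a, c) composed with itself
        cur_a *= cur_a;
        delta >>= 1;
    }
    return acc_a * x + acc_c;
}

// Starting state for a particle history: the beginning of its own block
// of `stride` numbers, so histories never share random numbers as long as
// each uses fewer than `stride` values.
std::uint64_t particle_seed(std::uint64_t base_seed, std::uint64_t particle,
                            std::uint64_t stride) {
    return lcg_skip(base_seed, particle * stride);
}
```

The stride check on the slide then amounts to verifying that no history consumes more than `stride` numbers.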

  12. ME 964: Project Proposal Vikalp Mishra

  13. Collision Detection • Aim • Solve collision detection problem given N rigid spheres in 3D space • Approach • Brute Force • Compare each sphere with every other sphere • O(n^2) • If distance between centers is • more than sum of radii → No collision • less than sum of radii → Collision • When collision detected • compute normal and object IDs
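A sketch of the per-pair test in C++17. The normal convention (unit vector from sphere a toward sphere b), the penetration-depth field and the fallback normal for coincident centers are assumptions made here; the slide only asks for a normal and the object IDs. Exactly touching spheres (distance equal to the sum of radii) are treated as not colliding.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };
struct Sphere { Vec3 c; double r; int id; };
struct Contact { int id_a, id_b; Vec3 normal; double depth; };

// Brute-force narrow phase for one pair of spheres.
std::optional<Contact> test_pair(const Sphere& a, const Sphere& b) {
    Vec3 d{b.c.x - a.c.x, b.c.y - a.c.y, b.c.z - a.c.z};
    double dist = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    double rsum = a.r + b.r;
    if (dist >= rsum) return std::nullopt;          // no collision
    Vec3 n{1.0, 0.0, 0.0};                          // arbitrary if centers coincide
    if (dist > 0.0) n = {d.x / dist, d.y / dist, d.z / dist};
    return Contact{a.id, b.id, n, rsum - dist};
}
```

Calling this for every i < j gives the O(n^2) brute-force method; comparing squared distances would avoid the square root for the rejection case.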

  14. Final Project: Bone FEA • Title: • GPU based Finite Element Analysis of Femur • Femur • Thigh bone: Bone between hip and knee joint • Longest/ strongest bone in the body

  15. Why study the femur? • To better understand bone mechanics/ properties • Across species • To understand the impact & extent of injury under various loading • Use in sports medicine & surgery • To study impact of DNA change on bone formation/ growth • Improve the process of cloning to develop better species • To study effect of nutrition cycle on bone development

  16. Background • In past • Experiments were done to study bone behavior / material properties • Test performed • Fracture test • Bending test • Torsion test • Experiments on mouse / pig • Costly and time consuming • Only one experiment per sample possible • Alternative • Capture bone geometry and material properties • Use computational tools for various analysis • Saves time/ money

  17. Typical approach • Given: • CT scan data of bone (geometry) • Material property distribution • Loading scheme • 3 or 4 point loading / Torsion test / Bending test

  18. Use of FEA • Use Finite Element Method • To capture geometry • Physical properties • Hexahedral elements • Tetrahedral elements • Formulate FE problem • Use boundary conditions to define element level • stiffness matrix (Ke) • load vector (Fe) • Assemble elements in global matrix (Kg, Fg) • Solve FE problem • Obtain deflection ($u = K_g^{-1} F_g$) • Compare with experimental results • Verify model
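The same pipeline (element matrices, assembly, boundary conditions, solve) on the smallest possible model: a 1-D bar of two-node elements, fixed at node 0 with a point load at the free end. This is an illustration of the steps, not a bone model; E, A, the length and the load are hypothetical inputs. The system is solved by Gaussian elimination rather than by forming the inverse of Kg explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<double> solve_bar(std::size_t n_elem, double length, double E, double A, double P) {
    const std::size_t n = n_elem + 1;                   // number of nodes
    const double k = E * A / (length / n_elem);         // element stiffness E*A/Le
    std::vector<std::vector<double>> Kg(n, std::vector<double>(n, 0.0));
    std::vector<double> Fg(n, 0.0);
    for (std::size_t e = 0; e < n_elem; ++e) {          // element level Ke, then assembly
        const double Ke[2][2] = {{k, -k}, {-k, k}};
        const std::size_t dof[2] = {e, e + 1};
        for (int a = 0; a < 2; ++a)
            for (int b = 0; b < 2; ++b) Kg[dof[a]][dof[b]] += Ke[a][b];
    }
    Fg[n - 1] = P;                                       // load at the free end
    // Boundary condition u0 = 0: replace row and column 0 by the identity.
    for (std::size_t i = 0; i < n; ++i) { Kg[0][i] = 0.0; Kg[i][0] = 0.0; }
    Kg[0][0] = 1.0;
    Fg[0] = 0.0;
    // Forward elimination (no pivoting needed: Kg is SPD after the BC).
    for (std::size_t p = 0; p < n; ++p)
        for (std::size_t r = p + 1; r < n; ++r) {
            double f = Kg[r][p] / Kg[p][p];
            for (std::size_t c = p; c < n; ++c) Kg[r][c] -= f * Kg[p][c];
            Fg[r] -= f * Fg[p];
        }
    std::vector<double> u(n);
    for (std::size_t i = n; i-- > 0;) {                  // back substitution
        double s = Fg[i];
        for (std::size_t c = i + 1; c < n; ++c) s -= Kg[i][c] * u[c];
        u[i] = s / Kg[i][i];
    }
    return u;
}
```

The tip deflection should match the closed form P*L/(E*A). The element loop is the part proposed for the GPU, since every element's Ke and Fe can be computed independently.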

  19. Bottleneck • Bone geometry is complex • Large number of elements required • For pig bone ~ 0.5 – 1 million elements (coarse mesh)

  20. GPU based approach • Potential for GPU based computation • Same set of computation for each element • Stiffness matrix computation (Ke) • Load vector computation (Fe) • Different data sets for each element • SIMD • Approach • Use GPU for element level computation • Account for 67% of total time • Use CPU for global matrix inversion • Compare results with MATLAB based model

  21. ME 964 – Midterm and Final Projects Saigopal Nelaturi

  22. CUDA Collision detection • Problem – Given n spheres in 3d space, compute all pair-wise collisions • Approach – Brute force algorithm with quadratic complexity • Idea – every pair of spheres can be tested independently, and in parallel

  23. Task Parallelism – pseudo code

  24. Final Project • Constructive operators in SE(3) • SE(3) is the group of 4x4 rigid transformation matrices • Point in SE(3) = matrix • Set in SE(3) = set of matrices • Can devise operators using Boolean algebra and matrix multiplication (group operation)

  25. Example • How to compute the workspace? • Position + orientation of the coordinate frame on the coupler • Use set formulation in SE(3) – intersection of sets • Embarrassingly parallel process! • Many other applications in design/geometric modeling/motion planning …

  26. Goals • For very large sets of 4x4 transformation matrices, implement • Intersection – pairwise comparison between matrices • Convolution – pairwise multiplication between matrices • Show some workspace computations (hopefully in 3d) • If possible, implement • Deconvolution – combination of pairwise intersection/multiplication
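A sketch of the two pairwise set operations on SE(3) elements stored as row-major 4x4 matrices. Comparing matrices within a tolerance is an assumption added here, since floating-point products rarely match exactly; rot_z is a hypothetical helper for building test transforms.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat4 = std::array<double, 16>;  // row-major 4x4 rigid transform

// Group operation in SE(3): ordinary matrix multiplication.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r[4 * i + j] += a[4 * i + k] * b[4 * k + j];
    return r;
}

// Convolution: all pairwise products. Each product is independent.
std::vector<Mat4> convolve(const std::vector<Mat4>& A, const std::vector<Mat4>& B) {
    std::vector<Mat4> out;
    out.reserve(A.size() * B.size());
    for (const auto& a : A)
        for (const auto& b : B) out.push_back(mul(a, b));
    return out;
}

bool approx_equal(const Mat4& a, const Mat4& b, double tol) {
    for (std::size_t i = 0; i < 16; ++i)
        if (std::fabs(a[i] - b[i]) > tol) return false;
    return true;
}

// Intersection: pairwise comparison; keep elements of A that appear in B.
std::vector<Mat4> intersect(const std::vector<Mat4>& A, const std::vector<Mat4>& B, double tol) {
    std::vector<Mat4> out;
    for (const auto& a : A)
        for (const auto& b : B)
            if (approx_equal(a, b, tol)) { out.push_back(a); break; }
    return out;
}

// Hypothetical helper: rotation by theta about z plus translation (tx, ty, tz).
Mat4 rot_z(double theta, double tx, double ty, double tz) {
    double c = std::cos(theta), s = std::sin(theta);
    return Mat4{{c, -s, 0, tx, s, c, 0, ty, 0, 0, 1, tz, 0, 0, 0, 1}};
}
```

Both operations cost |A| x |B| independent matrix operations, which maps naturally onto one GPU thread per pair.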

  27. Midterm Project Ram Subramanian

  28. The Task To solve a collision detection problem: Given an arbitrary number of rigid spheres with known radii, distributed in the 3D space, To find out which spheres are in contact/penetration with which other spheres.

  29. The Algorithm • One pass over array to determine collisions. • One pass over all the collided bodies to compute the values of collision required. • Two Kernel Calls. • O(n(n-1)/2) tests, i.e. O(n^2)

  30. Indexing • Every Thread gets a Reference body (Body A) and a Comparison body (Body B). • Each block has 512 threads (assumption 1). • Each row in a grid has 512 blocks (assumption 2). • Total number of threads is n(n-1)/2. • Compute the index value with the thread ID and block ID. • Using this index value and the number of bodies (using the div and mod) the index of the Body A and Body B, respectively, can be determined.
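The linear thread index implied by the two assumptions (512 threads per block, 512 blocks per grid row) can be written out directly. Here bx, by and tx stand in for blockIdx.x, blockIdx.y and threadIdx.x. The exact div/mod mapping from this index to Body A and Body B is not given on the slide, so the sketch stops at the index itself and the grid size.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kThreadsPerBlock = 512;  // assumption 1
constexpr std::uint64_t kBlocksPerRow = 512;     // assumption 2

// Linear thread index from a 2-D grid of 1-D blocks.
std::uint64_t global_index(std::uint64_t bx, std::uint64_t by, std::uint64_t tx) {
    return (by * kBlocksPerRow + bx) * kThreadsPerBlock + tx;
}

// Grid rows needed so that there is one thread per test, n(n-1)/2 in total
// (rounded up; threads with index >= n(n-1)/2 do nothing).
std::uint64_t grid_rows(std::uint64_t n) {
    std::uint64_t n_tests = n * (n - 1) / 2;
    std::uint64_t per_row = kBlocksPerRow * kThreadsPerBlock;
    return (n_tests + per_row - 1) / per_row;
}
```

Each grid row holds 262144 threads, so 1000 bodies (499500 tests) need two rows.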

  31. Final Project - Image Processing on the GPU Goal – Implement Image Processing Algorithms for the GPU. Eventually have an image processing library for the GPUs using CUDA Motivation – Most image processing tasks involve operating on individual pixels or a region of the image. Many of these tasks are embarrassingly parallel.

  32. Proposed Implementations • Harris Corner Detector Motivation – This is an algorithm used in the first stage processing of many other Image Processing and Computer Vision algorithms (e.g. : 3D reconstruction, Scene Stitching, Object Tracking, Visual Servoing, etc… ) Ambitious Goal Implement an image stitching algorithm or 3D reconstruction algorithm that will stitch two images together using the Harris Corner detector.

  33. Harris Corner Detector • At every pixel in the image place a window (larger the better, e.g. 5x5); call it W • Assume either the 4- or 8-neighborhood of the current pixel position • Slide the window to each neighboring pixel, giving W1, W2 …Wi (where i = 4 or 8)

  34. Harris Corner Detector Contd. • Compute the sum of squared differences (SSD) between W and each Wi • A corner is detected when all SSD values are above a given threshold set by the user (equivalently, when the smallest value is above the threshold), because shifting a window centered on a corner changes its contents in every direction
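A direct sketch of the shifted-window test above, using the 8-neighborhood and a square window of half-width `half` on a row-major grayscale image. This discrete-shift formulation is Moravec's operator; the Harris and Stephens detector refines it by replacing the shifts with image gradients. The Image struct is a hypothetical minimal container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

struct Image {
    int w, h;
    std::vector<int> px;  // row-major gray values
    int at(int x, int y) const { return px[y * w + x]; }
};

// Smallest SSD between the window at (x, y) and its 8 shifted copies.
// (x, y) must be at least half + 1 pixels away from every border.
std::int64_t corner_response(const Image& img, int x, int y, int half) {
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;
            std::int64_t ssd = 0;
            for (int v = -half; v <= half; ++v)
                for (int u = -half; u <= half; ++u) {
                    std::int64_t d = img.at(x + u + dx, y + v + dy) - img.at(x + u, y + v);
                    ssd += d * d;
                }
            best = std::min(best, ssd);
        }
    return best;
}

// Corner if every shift changes the window by more than the threshold.
bool is_corner(const Image& img, int x, int y, int half, std::int64_t threshold) {
    return corner_response(img, x, y, half) > threshold;
}
```

On a bright square, flat pixels and straight-edge pixels give a minimum SSD of 0 (some shift leaves the window unchanged), while the square's corner gives a positive minimum. Every pixel is independent, which is what makes the detector a good fit for one GPU thread per pixel.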

  35. Midterm and Final Projects Toby Heyn ME 964 11/06/08

  36. Midterm Project • Spatial Subdivision • Partition space into uniform grid (cells) • For each object, determine which cells the object overlaps • Objects can only collide if they occupy the same cell or adjacent cells

  37. Midterm Project • Construct Cell ID Array • Each thread determines the cell IDs of the cells its sphere occupies, loads into Cell ID Array • Sort Cell ID Array • Radix Sort Algorithm • Create Collision Cell List • Scan sorted Cell ID Array, look for changes in cell ID • Write Collision Cell List with Cell ID Array indices, number of objects in the cell • Traverse Collision Cell List • One thread per Collision Cell • Each thread checks all collision pairs in the Collision Cell • Collisions are written to output
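A sequential sketch of the same pipeline, assuming a cubic grid of g x g x g cells of size `cell` that contains every sphere's bounding box. Each sphere is entered into every cell its bounding box overlaps, so two overlapping spheres always share at least one cell. std::sort stands in for the radix sort, and pairs found in several shared cells are de-duplicated with std::set, a simplification a GPU version would replace with its own rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct Sphere { double x, y, z, r; };

std::set<std::pair<int, int>> grid_collide(const std::vector<Sphere>& s, double cell, int g) {
    // 1. Construct Cell ID array: (cell id, sphere index) per overlapped cell.
    std::vector<std::pair<std::uint32_t, int>> ids;
    for (int k = 0; k < static_cast<int>(s.size()); ++k) {
        auto lo = [&](double c) { return static_cast<int>(std::floor((c - s[k].r) / cell)); };
        auto hi = [&](double c) { return static_cast<int>(std::floor((c + s[k].r) / cell)); };
        for (int cz = lo(s[k].z); cz <= hi(s[k].z); ++cz)
            for (int cy = lo(s[k].y); cy <= hi(s[k].y); ++cy)
                for (int cx = lo(s[k].x); cx <= hi(s[k].x); ++cx)
                    ids.push_back({static_cast<std::uint32_t>((cz * g + cy) * g + cx), k});
    }
    // 2. Sort by cell id.
    std::sort(ids.begin(), ids.end());
    // 3. Scan for runs of equal cell id (collision cells);
    // 4. test every pair inside each run.
    std::set<std::pair<int, int>> hits;
    for (std::size_t b = 0; b < ids.size();) {
        std::size_t e = b;
        while (e < ids.size() && ids[e].first == ids[b].first) ++e;
        for (std::size_t i = b; i < e; ++i)
            for (std::size_t j = i + 1; j < e; ++j) {
                const Sphere& p = s[ids[i].second];
                const Sphere& q = s[ids[j].second];
                double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z, rs = p.r + q.r;
                if (dx * dx + dy * dy + dz * dz < rs * rs)
                    hits.insert(std::minmax(ids[i].second, ids[j].second));
            }
        b = e;
    }
    return hits;
}
```

Only spheres that land in the same cell are ever compared, so for well-spread spheres the pair tests drop far below n(n-1)/2.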

  38. Midterm Project • Radix Sort • Sorts cell IDs in several passes • Sorts low order bits before higher order bits, retaining order of IDs with same cell ID • This helps in a later step • Takes 4 passes to sort the 32 bit (4 byte) integers • Makes use of parallel scan operation
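A sequential sketch of a least-significant-digit radix sort on 32-bit cell IDs with attached object indices: four stable passes of 8 bits each. The exclusive prefix sum over each pass's digit histogram is the scan step; on a GPU the histogram, scan and scatter would each run in parallel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

void radix_sort(std::vector<std::uint32_t>& keys, std::vector<std::uint32_t>& vals) {
    const std::size_t n = keys.size();
    std::vector<std::uint32_t> k2(n), v2(n);
    for (int shift = 0; shift < 32; shift += 8) {        // 4 passes, low byte first
        std::array<std::size_t, 256> count{};
        for (std::uint32_t k : keys) ++count[(k >> shift) & 0xFF];
        // Exclusive scan: count[d] becomes the first output slot for digit d.
        std::size_t sum = 0;
        for (auto& c : count) { std::size_t t = c; c = sum; sum += t; }
        // Scatter in input order, so equal digits keep their order (stable).
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t dst = count[(keys[i] >> shift) & 0xFF]++;
            k2[dst] = keys[i];
            v2[dst] = vals[i];
        }
        keys.swap(k2);
        vals.swap(v2);
    }
}
```

Stability is what lets later passes keep the order established by earlier ones, and it also keeps objects with the same cell ID in their original relative order.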

  39. Final Project • Default final project – granular dynamics using collision detection from midterm • Incorporate midterm collision detection into Chrono::Engine multibody dynamics engine • Simulate Mars Rover with many (millions of) bodies

  40. Final Project • Chrono::Engine • C++ API • Commands for creating simulation environment, populating with bodies, creating constraints, etc • Uses Bullet for collision detection • Has been used to solve systems with ~100,000 bodies • Has a CUDA parallelized dynamics solver (based on LCP formulation)

  41. Final Project • Each wheel is a union of primitives • Terrain consists of ~5000 spheres (much too coarse) • Obstacles: • Non spherical bodies in wheels • Large mass difference between small grain and large rover

  42. Final Project • Handling non-spherical bodies • Represent the surface of the body as a composite of smaller spheres • New representation has more bodies, but only spheres • Maintain same dimensions, mass, inertia properties

  43. Final Project • Parallelism • Collision detection • Many bodies/collision pairs to check • Spatial sub-division: geometric decomposition, task decomposition • Dynamics • Many equations of motion to solve • Geometric decomposition • Potentially many non-spherical bodies to process in parallel

  44. Final Project • Remaining Issues • Re-use of data • After solving the collision detection problem once, can data be reused to reduce the size of the problem to be solved in subsequent steps? • Automate handling of non-spherical geometry • Can an automated method be created to represent arbitrary geometry with spheres?

  45. ME 964 Midterm & Final Project Justin Madsen

  46. Outline • Midterm & final are the same project • “default scheme” • Collision detection method • Baraff • Brief overview of 2 phase algorithm • Ideas for CUDA implementation • Ideas for final project • Integrating CUDA collision detection with other dynamics programs

  47. Efficient collision detection • Baraff method • Axis-aligned bounding boxes (AABB) • Simple yet efficient • Only dealing with spheres • Can be extended to convex polyhedra • (bounding boxes are not actually needed for spheres; it’s a special case) Figure 1. AABB size and orientation depend on the local coordinate system

  48. Overview of method • One dimensional case (x-axis) • Sort & Sweep • Each object has a length along the axis according to the AABB • Data: beginning and end values (b and e) of each box • Sorted lowest to highest according to these values Figure 2. Six objects and their AABB axes [1]

  49. Determine possible contacts • After sorting, collision detection happens in two phases • Phase 1: broad phase • Traverse the axis; add objects to the “possible contact list” when b_i is encountered, and remove them when e_i is encountered • For the one-dimensional case, when b_i is added to the list, its interval overlaps those of all other objects already in the list, so contact occurs with all of them

  50. Three dimensional case • Phase 1 for 3-D: • Extend one dimensional contact check by checking b and e for values along the y and z axes of the other objects in the list • If contact check comes back positive for all 3 axes, add the object to the “possible contact list” • Possible because…
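A sketch of the broad phase described on the last three slides: sort the x-axis endpoints, keep the possible contact list between b_i and e_i, and confirm each new x-overlap by checking the y and z intervals. Boxes are passed in directly (for a sphere, center plus or minus radius). Placing begin points before end points at equal values, so that touching boxes count as overlapping, is a choice made here.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

struct AABB { double lo[3], hi[3]; };

static bool overlap_axis(const AABB& a, const AABB& b, int ax) {
    return a.lo[ax] <= b.hi[ax] && b.lo[ax] <= a.hi[ax];
}

std::set<std::pair<int, int>> sweep_and_prune(const std::vector<AABB>& boxes) {
    struct Ep { double v; bool is_end; int id; };       // endpoint b_i or e_i on x
    std::vector<Ep> eps;
    for (int i = 0; i < static_cast<int>(boxes.size()); ++i) {
        eps.push_back({boxes[i].lo[0], false, i});
        eps.push_back({boxes[i].hi[0], true, i});
    }
    std::sort(eps.begin(), eps.end(), [](const Ep& a, const Ep& b) {
        return a.v < b.v || (a.v == b.v && !a.is_end && b.is_end);
    });
    std::vector<int> active;                             // "possible contact list"
    std::set<std::pair<int, int>> pairs;
    for (const Ep& ep : eps) {
        if (ep.is_end) {                                 // e_i: leave the list
            active.erase(std::find(active.begin(), active.end(), ep.id));
            continue;
        }
        for (int other : active)                         // b_i: overlaps every active box on x
            if (overlap_axis(boxes[ep.id], boxes[other], 1) &&
                overlap_axis(boxes[ep.id], boxes[other], 2))
                pairs.insert(std::minmax(ep.id, other));
        active.push_back(ep.id);
    }
    return pairs;
}
```

The resulting pairs are only candidates; an exact sphere-sphere test still decides actual contact.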
