
Development of a Ray Casting Application for the Cell Broadband Engine Architecture

This project explores programming challenges and communication methods in creating a ray casting application for the Cell processor, focusing on SPE optimization and data transfer with DMA.


Presentation Transcript


  1. Development of a Ray Casting Application for the Cell Broadband Engine Architecture • Shuo Wang, University of Minnesota Twin Cities • Matthew Broten, Institute of Technology, University of Minnesota Twin Cities • Professor David A. Yuen

  2. Overview • General Overview • Programming for the Cell Architecture • Ray Casting Theory and Mathematics • Ray Casting Application Development

  3. Overview: Why the Cell? • Novel • 2005 was when Linux support became available • Affordable • A PlayStation 3 costs $650 • Fast • 25.6 GFLOPs

  4. Overview: What To Do • Computationally challenging even if it’s mathematically simple • Accuracy is less crucial than speed • Easy to visualize

  5. Overview – Results of Internship • IEEE 2007 in Sacramento, CA • SuperComputing 2007 in Reno, NV

  6. Programming for the Cell Architecture

  7. Programming for the Cell Architecture: Challenges • Cooperation between PPE and SPEs • SPE memory limitations • SPE side code vectorization

  8. Programming for the Cell Architecture: Introductory Knowledge • Application Organization • Division of Computational Labor • SPE Program Initialization • Communication Between PPE and SPEs • Data Transfer with DMA • Manual Optimization • Automatic Optimization

  9. Programming for the Cell Architecture: Application Organization [Figure: application directory layout, with a top-level directory and a ./spu subdirectory]

  10. Programming for the Cell Architecture: Parallelism Models • Task Parallelism • Data Parallelism

  11. Programming for the Cell Architecture: SPE Thread Creation • The PPE program uses the interface provided by libspe, an SPE runtime management library: extern spe_program_handle_t MyProgram_spu; int main(int argc, char **argv) { ... speid_t spe_id; spe_id = spe_create_thread(threadGroup, &MyProgram_spu, &controlBlock, ... ); ... }

  12. Programming for the Cell Architecture: Communication Using Mailboxes Mailboxes provide a method of communication between the PPE and the SPEs. Three mailbox queues are provided in the Memory Flow Controller of each SPE: • PPE Mailbox Queue • PPE Interrupt Mailbox Queue • SPU Mailbox Queue

  13. Programming for the Cell Architecture: Communication Using Mailboxes PPE Mailbox Queue: SPE writes message, PPE reads message. SPE side code to write a message: unsigned int value; spu_write_out_mbox(value); PPE side code to receive a message: unsigned int value; value = spe_read_out_mbox(spe_id); • PPE Interrupt Mailbox Queue works similarly

  14. Programming for the Cell Architecture: Communication Using Mailboxes SPU Mailbox Queue: PPE writes message, SPE reads message. PPE side code to write a message: unsigned int value; spe_write_in_mbox(spe_id, value); SPE side code to receive a message: unsigned int value; value = spu_read_in_mbox();

  15. Programming for the Cell Architecture: Data Transfer with DMA • The SPU tells the DMA engine that data is needed from main memory • The DMA engine requests the data from main memory • The DMA engine copies the data from main memory to the local store

  16. Programming for the Cell Architecture: Data Transfer with DMA • Each MFC can process a queue of 24 DMA commands • Each transfer must be a multiple of 16 bytes • Maximum of 16 KB per transfer
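The two size limits on this slide imply that a larger buffer has to be split into several transfers. The following is a minimal sketch of that bookkeeping in portable C; it does not call the MFC at all, and the helper names (`dma_round_up`, `dma_transfer_count`, `dma_transfer_size`) are hypothetical, not part of libspe or the Cell SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Limits quoted on the slide: at most 16 KB per transfer,
   and a size that is a multiple of 16 bytes. */
#define DMA_MAX_BYTES 16384u
#define DMA_ALIGN     16u

/* Round a byte count up to the next multiple of 16 (hypothetical helper). */
size_t dma_round_up(size_t bytes)
{
    return (bytes + (DMA_ALIGN - 1u)) & ~(size_t)(DMA_ALIGN - 1u);
}

/* Number of transfers needed to move `bytes` bytes once padded to 16. */
size_t dma_transfer_count(size_t bytes)
{
    size_t padded = dma_round_up(bytes);
    return (padded + DMA_MAX_BYTES - 1u) / DMA_MAX_BYTES;
}

/* Size in bytes of the transfer with the given 0-based index. */
size_t dma_transfer_size(size_t bytes, size_t index)
{
    size_t padded = dma_round_up(bytes);
    size_t offset = index * DMA_MAX_BYTES;
    assert(offset < padded);               /* index must name a real transfer */
    size_t remaining = padded - offset;
    return remaining < DMA_MAX_BYTES ? remaining : DMA_MAX_BYTES;
}
```

For example, a 40000-byte buffer would need three transfers under these rules: two full 16 KB transfers followed by one of 7232 bytes.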

  17. Programming for the Cell Architecture: Data Transfer with DMA Primary Operations: The GET command copies data from main memory to local store. The PUT command copies data from local store to main memory. SPE side: mfc_get (GET), mfc_put (PUT). PPE side: spe_mfc_get (GET), spe_mfc_put (PUT).

  18. Programming for the Cell Architecture: Control Blocks What is a control block? Example control block: typedef struct _control_block { uintptr32_t arrayAddress1; unsigned int value1; uintptr32_t arrayAddress2; unsigned int value2; } control_block;

  19. Programming for the Cell Architecture: Data Transfer with DMA General Approach (main memory to local store): • PPE: define and initialize control block in main memory • PPE: pass reference to control block when creating SPE thread • SPE: allocate memory in local store for control block and other data to be transferred • SPE: copy control block from main memory to local store • SPE: use address in control block to copy other data from main memory to local store
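The general approach on this slide is a two-step fetch: first the control block, then the data it points to. The sketch below imitates that flow in portable C so it can run anywhere. Everything in it is an assumption made for illustration: main memory is a plain byte array, addresses are offsets into it, `sim_get` is a hypothetical `memcpy`-based stand-in for a GET (it is not `mfc_get`), `uint32_t` stands in for the slide's `uintptr32_t`, and `value1` is assumed to hold an element count.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Control block modeled on the slide's example: four 4-byte fields,
   16 bytes in total, so it satisfies the 16-byte transfer rule. */
typedef struct {
    uint32_t arrayAddress1;   /* simulated main-memory address of array 1 */
    uint32_t value1;          /* ASSUMPTION: number of floats in array 1  */
    uint32_t arrayAddress2;
    uint32_t value2;
} control_block;

#define MAIN_MEMORY_BYTES 1024u
#define MAX_ELEMS         4u

static unsigned char main_memory[MAIN_MEMORY_BYTES];  /* simulated main memory */

/* Hypothetical stand-in for a GET: main memory -> local store. */
void sim_get(void *local_store, uint32_t main_addr, size_t size)
{
    assert(size % 16u == 0 && size <= 16384u);             /* slide's DMA limits */
    assert((size_t)main_addr + size <= MAIN_MEMORY_BYTES);
    memcpy(local_store, main_memory + main_addr, size);
}

/* "PPE" side: initialize data and control block; return the block's address. */
uint32_t ppe_setup(void)
{
    float data[MAX_ELEMS] = {1.0f, 2.0f, 3.0f, 4.0f};
    control_block cb = {64u, MAX_ELEMS, 0u, 0u};
    memcpy(main_memory + 64, data, sizeof data);
    memcpy(main_memory, &cb, sizeof cb);
    return 0u;
}

/* "SPE" side: fetch the control block, then the array it describes. */
float spe_sum(uint32_t cb_addr)
{
    control_block cb;
    float local[MAX_ELEMS];                 /* buffer in the "local store" */
    float sum = 0.0f;
    sim_get(&cb, cb_addr, sizeof cb);
    assert(cb.value1 <= MAX_ELEMS);
    sim_get(local, cb.arrayAddress1, cb.value1 * sizeof(float));
    for (uint32_t i = 0; i < cb.value1; ++i)
        sum += local[i];
    return sum;
}
```

On real hardware the control block address would arrive as the argument passed at SPE thread creation, and each copy would be an asynchronous DMA that must complete before the data is read.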

  20. Programming for the Cell Architecture: Pipelining [Figure: execution timelines without pipeline optimization and with pipeline optimization]

  21. Programming for the Cell Architecture: Compilers GCC vs IBM XLC [Graph: data from Eric Rollins' “Ray Tracing” application; graph provided by Eric Rollins: http://eric_rollins.home.mindspring.com/ray/ray.html]

  22. Ray Casting Theory and Mathematics - Overview

  23. Ray Casting Theory and Mathematics: Math • Triangles are defined by three vertex points A, B, and C in R^3 • If there is an intersection between the ray and the triangle, then P = E + tV, where: • P = point of intersection between ray and triangle • E = location of eye • V = directional ray from the eye to the pixel of interest • t = distance from E to the point of intersection along V

  24. Ray Casting Theory and Mathematics: Math • If <N,V> = 0, where N is the normal of the triangle, the ray is parallel to the plane of the triangle and there is no intersection; try the next pixel. • Else, compute P: D = -<N,A> (A is a point of the triangle); t = -(<N,E> + D) / <N,V>; P = E + tV • Check that P lies in the triangle defined by A, B, C. If P is in the triangle ABC, the signs of these three quantities will be the same: inA = <N, (B - A) X (P - A)>; inB = <N, (C - B) X (P - B)>; inC = <N, (A - C) X (P - C)> • Calculate diffusions
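The intersection steps on this slide can be sketched in C as follows. This is a minimal illustration, not the code from the authors' trace.h: it assumes the plane is written as <N,X> + D = 0, computes N from the vertices as (B - A) X (C - A), and uses the standard edge-based same-side test. The `vec3` type and all function names are hypothetical.

```c
#include <assert.h>

typedef struct { double x, y, z; } vec3;

static vec3 sub(vec3 a, vec3 b) { vec3 r = {a.x - b.x, a.y - b.y, a.z - b.z}; return r; }
static double dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static vec3 cross(vec3 a, vec3 b)
{
    vec3 r = {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    return r;
}

/* Returns 1 and fills *t_out and *p_out if the ray E + tV (t > 0) hits
   triangle ABC; returns 0 otherwise. */
int ray_triangle(vec3 e, vec3 v, vec3 a, vec3 b, vec3 c, double *t_out, vec3 *p_out)
{
    const double eps = 1e-12;
    vec3 n = cross(sub(b, a), sub(c, a));        /* triangle normal N */
    double nv = dot(n, v);
    if (nv > -eps && nv < eps)
        return 0;                                /* <N,V> = 0: ray parallel to plane */

    double d = -dot(n, a);                       /* plane: <N,X> + D = 0 */
    double t = -(dot(n, e) + d) / nv;
    if (t <= 0.0)
        return 0;                                /* plane is behind the eye */

    vec3 p = {e.x + t * v.x, e.y + t * v.y, e.z + t * v.z};

    /* Because N was built from the same vertex order, P is inside
       exactly when all three values are non-negative. */
    double in_a = dot(n, cross(sub(b, a), sub(p, a)));
    double in_b = dot(n, cross(sub(c, b), sub(p, b)));
    double in_c = dot(n, cross(sub(a, c), sub(p, c)));
    if (in_a < 0.0 || in_b < 0.0 || in_c < 0.0)
        return 0;

    assert(t_out != 0 && p_out != 0);
    *t_out = t;
    *p_out = p;
    return 1;
}
```

With the triangle A = (0,0,1), B = (1,0,1), C = (0,1,1), a ray from E = (0.25, 0.25, 0) along V = (0,0,1) hits at t = 1, while the same ray started from E = (2, 2, 0) reaches the plane outside the triangle and is rejected.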

  25. Ray Casting Theory and Mathematics – Pseudo Code • http://lilli.msi.umn.edu/ps3svn/ray/branches/esevre/spu/trace.h
      for (width of screen) {
        for (height of screen) {
          for (all objects in screen) {
            find edges of objects
            if (ray crosses object) {
              calculate reflections
            }
          }
        }
      }

  26. Ray Casting Application Development

  27. Ray Casting Application Development • Overview • Development Roadmap • Current Capabilities • Implementation Details • Future Goals

  28. Ray Casting Application: Overview • Created an enhanced version of Eric Rollins' open source “Real-Time Ray Tracing” application [Figure: three images labeled (1), (2), (3)]

  29. Ray Casting Application: Development Roadmap (1) Learning and exploration of Eric Rollins' “Ray Tracing” package (2) Enhancement of “trace algorithm” for rendering of triangles (3) Implementation of translation and rotation functionality (4) Implementation of triangle initialization and transfer mechanism

  30. Ray Casting Application: Development Roadmap (1) Exploration of Eric Rollins' open source application

  31. Ray Casting Application: Development Roadmap (2) Enhancement of “trace algorithm” for rendering of triangles

  32. Ray Casting Application: Development Roadmap (3) Implementation of translation and rotation functionality

  33. Ray Casting Application: Development Roadmap (4) Implementation of triangle initialization and transfer mechanisms • Each triangle structure: - contains 3 float vectors; each float vector contains three coordinates (X, Y, Z) and represents a point of the triangle - consumes 48 bytes of memory since each float vector requires 16 bytes • DMA transfers must be 16 KB or less and a size that’s a multiple of 16 bytes - This amounts to a max of 336 triangles per transfer • About 189 KB free in local store - Enough room for 11 transfers of 336 triangles, which is a total of 3696 triangles
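The memory budget on this slide can be checked with a few lines of C. The sketch below assumes a triangle is laid out as three 16-byte vectors (three floats plus one float of padding); the type names `float4` and `triangle` and the helper functions are hypothetical, not taken from the authors' code. Note that 16384 / 48 gives 341 as the arithmetic upper bound per transfer, so the slide's figure of 336 triangles (16128 bytes) sits within the limit rather than exactly at it.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED layout: three coordinates plus one float of padding = 16 bytes. */
typedef struct { float x, y, z, pad; } float4;

/* A triangle is three such vectors = 48 bytes, as stated on the slide. */
typedef struct { float4 a, b, c; } triangle;

#define DMA_MAX_BYTES 16384u   /* 16 KB per transfer */

/* Bytes occupied by a batch of `count` triangles. */
size_t batch_bytes(size_t count)
{
    return count * sizeof(triangle);
}

/* Largest whole number of triangles that fits in one 16 KB transfer. */
size_t max_triangles_per_transfer(void)
{
    return DMA_MAX_BYTES / sizeof(triangle);
}

/* 1 if a batch respects both DMA rules (<= 16 KB, multiple of 16 bytes). */
int batch_is_dma_legal(size_t count)
{
    size_t bytes = batch_bytes(count);
    return bytes <= DMA_MAX_BYTES && bytes % 16u == 0;
}

/* Total triangles held after `transfers` batches of `per_transfer` each. */
size_t total_triangles(size_t transfers, size_t per_transfer)
{
    return transfers * per_transfer;
}
```

Eleven batches of 336 triangles occupy 11 × 16128 = 177408 bytes, which fits in the roughly 189 KB of free local store and yields 3696 triangles.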

  34. Ray Casting Application: Current Capabilities

  35. Ray Casting Application: Implementation Details Application Organization: two programs: - one executes on the PPU - one runs on each SPU. Division of Labor: task parallelism where each SPE: - holds identical data in its local store - is responsible for doing computations for 1/6 of the lines rendered to the screen
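One way to give each SPE 1/6 of the rendered lines is to hand out contiguous bands of rows. The slide does not say whether the application uses contiguous bands or interleaved lines, so the sketch below is only one possible scheme; `spe_row_range` is a hypothetical helper, and the count of six SPEs is taken from the slide's "1/6".

```c
#include <assert.h>

#define NUM_SPES 6   /* from the slide: each SPE renders 1/6 of the lines */

/* Computes the contiguous band of rows assigned to one SPE.
   When the height is not divisible by six, the first (height % 6)
   SPEs each take one extra row so that every row is covered once. */
void spe_row_range(int spe_index, int height, int *first, int *count)
{
    assert(spe_index >= 0 && spe_index < NUM_SPES && height >= 0);
    int base  = height / NUM_SPES;
    int extra = height % NUM_SPES;
    *count = base + (spe_index < extra ? 1 : 0);
    *first = spe_index * base + (spe_index < extra ? spe_index : extra);
}
```

For a 480-line image every SPE gets 80 rows; for a 100-line image the first four SPEs get 17 rows and the last two get 16.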

  36. Ray Casting Application: PPE Program Life Cycle [Figure: five stages labeled (1) through (5)]

  37. Ray Casting Application: SPE Program Life Cycle [Figure: five stages labeled (1) through (5)]

  38. Ray Casting Application: Future Goals • Visualize larger datasets - Now: limited to the rendering of about 4000 triangles - Goal: develop mechanisms to render hundreds of thousands of triangles • Distribute computation over several PS3s - Now: all computation performed on a single PS3 - Goal: build a cluster of PS3s and increase application performance by dividing the workload among the PS3s in the cluster

  39. For more information, see the PS3 Wiki: http://marina.geo.umn.edu/ps3-wiki
