
GPU Functional Simulator

Yi Yang (yangyi@eecs.ucf.edu), CDA 6938 term project, Orlando, April 20, 2008.

Presentation Transcript


  1. GPU Functional Simulator • Yi Yang (yangyi@eecs.ucf.edu) • CDA 6938 term project • Orlando, April 20, 2008

  2. Outline • Motivation and background • Software design • Implementation • Test cases • Future work

  3. Motivation and background • Motivation • Better understanding of GPUs • Improving the GPU architecture • Background • Two GPU manufacturers: NVIDIA and ATI • Similar programming models: • block (NVIDIA) vs. group (ATI) • shared memory (NVIDIA) vs. LDS, local data share (ATI) • ATI uses VLIW • We want to support both.

  4. Software design • Programming Model Layer (PML) • Platform independent • Defines the abstract parts, mostly ISA-related: ISA, registers • Implements the resources that are similar on both platforms: group, wavefront, … • Hardware Implementation Layer (HIL) • Implements the abstract parts of the PML for each platform • ATI • NVIDIA
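The two-layer split can be illustrated with a minimal Python sketch. The class names `PlatformISA` and `AtiISA` and the three-entry opcode table are hypothetical, chosen only to show how platform-independent code (the PML) can depend on an abstract interface while each platform (the HIL) fills it in; they are not taken from the simulator's source.

```python
from abc import ABC, abstractmethod


class PlatformISA(ABC):
    """Programming Model Layer: platform-independent view of an ISA."""

    @abstractmethod
    def opcodes(self):
        """Return the set of opcode names this platform implements."""

    @abstractmethod
    def semantics(self, opcode):
        """Return a function that computes the opcode's result from source values."""


class AtiISA(PlatformISA):
    """Hardware Implementation Layer: a tiny, hypothetical subset of ATI opcodes."""

    _TABLE = {
        "ADD_INT": lambda a, b: (a + b) & 0xFFFFFFFF,  # 32-bit wrap-around
        "AND_INT": lambda a, b: a & b,
        "ADD": lambda a, b: a + b,                      # floating-point add
    }

    def opcodes(self):
        return set(self._TABLE)

    def semantics(self, opcode):
        return self._TABLE[opcode]
```

An NVIDIA backend would be another subclass of `PlatformISA`; code written against the abstract class would not need to change.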

  5. Programming Model Layer • A code parser produces the instruction list • Resources are allocated according to the configuration file: groups, threads, shared memory, memory, wavefront scheduler • The input stream is loaded from a text file • The wavefront scheduler executes the instruction list on the wavefronts • When an instruction executes on a thread, it updates the resources: the thread's registers, the group's shared memory, and the GPUProgram's texture (global) memory • The output memory is saved to a text file
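The slides do not give the configuration file's format. Assuming, purely for illustration, a plain-text file of `key = value` lines, the allocation step might begin by deriving how much of each resource to create. The key names (`num_groups`, `threads_per_group`, `wavefront_size`, `shared_memory_bytes`) are hypothetical.

```python
def parse_config(text):
    """Parse hypothetical 'key = value' lines into ints; '#' starts a comment."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            key, value = (part.strip() for part in line.split("=", 1))
            cfg[key] = int(value)
    return cfg


def resource_plan(cfg):
    """Derive how many threads, wavefronts and bytes of shared memory to allocate."""
    per_group = cfg["threads_per_group"]
    wf_size = cfg["wavefront_size"]
    return {
        "total_threads": cfg["num_groups"] * per_group,
        "wavefronts_per_group": -(-per_group // wf_size),  # ceiling division
        "shared_memory_bytes": cfg["num_groups"] * cfg["shared_memory_bytes"],
    }
```

A partially filled last wavefront still counts as a whole wavefront, hence the ceiling division.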

  6. Code Parser (HIL) • Reads the assembly and parses it into instructions • INST LABEL NO: unique number of the instruction • Stream core label: one of x, y, z, w, t • INST: • Operand
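As a rough sketch of the parsing step, the Python below splits one ALU line of the disassembly (label number, stream core slot, opcode, operands) into fields. It only handles the `N slot: OPCODE operands` form seen in these slides; other forms, such as `SAMPLE` or `LOCAL_DS_WRITE` lines, are left unparsed and the function returns `None`. The pattern and function names are assumptions, not the simulator's code.

```python
import re

# label number, stream core slot (x/y/z/w/t), opcode, then the operand text
ALU_LINE = re.compile(
    r"^\s*(?P<label>\d+)\s+(?P<slot>[xyzwt]):\s+(?P<opcode>\w+)\s+(?P<operands>.*)$")


def split_operands(text):
    """Split on commas that are not inside parentheses, e.g. literal constants."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_alu_line(line):
    """Return a dict of fields for an ALU line, or None for other line forms."""
    m = ALU_LINE.match(line)
    if m is None:
        return None
    return {
        "label": int(m["label"]),
        "slot": m["slot"],
        "opcode": m["opcode"],
        "operands": split_operands(m["operands"]),
    }
```

For example, `parse_alu_line("4 t: MULLO_UINT R1.z, 1, PS3")` yields label 4, slot `t`, opcode `MULLO_UINT` and operands `["R1.z", "1", "PS3"]`.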

  7. Operand (HIL) • General Purpose Register: 0 y: ADD ____, R0.x, -0.5 • Previous Vector (x, y, z, w) and Previous Scalar (t): 3 t: F_TO_I ____, R0.x, then 4 t: MULLO_UINT R1.z, 1, PS3 • Temporary Register: 3 t: RCP_UINT T0.x, R1.x • Constant Register: 1 z: AND_INT ____, R0.x, (0x0000003F, 8.828180325e-44f).x
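The operand kinds listed above can be told apart by their spelling. The sketch below classifies the operand strings from these examples. The returned tuple layout is hypothetical, the `PV<n>.<c>` spelling for a previous-vector operand is assumed by analogy with `PS3`, and the parenthesized constant is treated as an inline literal whose hex word is the value (the float beside it is the same 32 bits reinterpreted as a float).

```python
import re


def classify_operand(op):
    """Return (kind, value, negated) for one ATI-style ALU operand string."""
    negated = op.startswith("-")
    body = op[1:] if negated else op
    if body == "____":
        # No GPR is written; the result is read back later through PV/PS.
        return ("none", None, negated)
    m = re.fullmatch(r"R(\d+)\.([xyzw])", body)
    if m:
        return ("gpr", (int(m[1]), m[2]), negated)
    m = re.fullmatch(r"T(\d+)\.([xyzw])", body)
    if m:
        return ("temp", (int(m[1]), m[2]), negated)
    m = re.fullmatch(r"PV(\d+)\.([xyzw])", body)  # assumed spelling
    if m:
        return ("prev_vector", (int(m[1]), m[2]), negated)
    m = re.fullmatch(r"PS(\d+)", body)
    if m:
        return ("prev_scalar", int(m[1]), negated)
    # Inline constant: hex word, its float reading, then a component selector.
    m = re.fullmatch(r"\((0x[0-9A-Fa-f]+),[^)]*\)\.([xyzw])", body)
    if m:
        return ("literal", int(m[1], 16), negated)
    return ("immediate", float(body), negated)
```

Keeping the sign separate lets the executing instruction apply negation after fetching the value, whatever the operand kind.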

  8. Instruction (HIL) • Format: Opcode dst, src1, src2, … • e.g. ADD_INT R0.x, R1.x, R2.x • dst, src1, src2 are operands • GPUProgram holds the instruction lists • Each instruction implements its own execution • It receives a thread as a parameter and executes on that thread • For example, for ADD_INT R0.x, R1.x, R2.x: • the instruction gets the values of R1.x and R2.x from the thread • it saves R1.x + R2.x to R0.x of the thread
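A minimal sketch of the ADD_INT walk-through above, with a thread modeled as a dictionary of 32-bit register values. The class and method names (`Thread.read`, `Thread.write`, `AddInt.execute`) are hypothetical.

```python
class Thread:
    """Minimal per-thread register file: 'R1.x' -> unsigned 32-bit value."""

    def __init__(self, tid):
        self.tid = tid
        self.regs = {}

    def read(self, name):
        return self.regs.get(name, 0)

    def write(self, name, value):
        self.regs[name] = value & 0xFFFFFFFF  # keep 32 bits


class AddInt:
    """ADD_INT dst, src1, src2: dst = src1 + src2 with 32-bit wrap-around."""

    def __init__(self, dst, src1, src2):
        self.dst, self.src1, self.src2 = dst, src1, src2

    def execute(self, thread):
        a = thread.read(self.src1)      # get R1.x from the thread
        b = thread.read(self.src2)      # get R2.x from the thread
        thread.write(self.dst, a + b)   # save the sum as R0.x in the thread
```

Passing the thread into `execute` keeps the instruction itself stateless, so one instruction object can be applied to every thread of every wavefront.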

  9. Memory Handling (HIL) • Texture memory • 0 SAMPLE R1, R0.xyxx, t0, s0 UNNORM(XYZW) • EXP_DONE: PIX0, R0 • Cache support (future work) • Global memory • 6 RD_SCATTER R3, DWORD_PTR[0+R2.x], ELEM_SIZE(3) UNCACHED BURST_CNT(0) • 03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R1.x].x___, R0, ELEM_SIZE(3) • Coalescing support: handled by the first thread (future work) • Text files are used as input and output
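The global-memory path can be sketched with memory as a flat list of 32-bit words loaded from text. Several details are assumptions rather than facts from the slides: the text format (whitespace-separated integers), treating `DWORD_PTR[0+R2.x]` as a dword index, and reading `ELEM_SIZE(3)` as four dwords per element (element size minus one). The function names are hypothetical.

```python
def load_memory(text):
    """Assumed input format: whitespace-separated decimal or 0x-prefixed words."""
    return [int(tok, 0) & 0xFFFFFFFF for tok in text.split()]


def save_memory(words):
    """Write one word per line, mirroring the assumed input format."""
    return "\n".join(str(w) for w in words) + "\n"


def rd_scatter(memory, index, elem_size=3):
    """Read one element of elem_size + 1 dwords starting at dword `index`."""
    return [memory[index + i] for i in range(elem_size + 1)]


def mem_export_write(memory, index, values, mask="xyzw"):
    """Write the components whose mask letter is not '_' (e.g. 'x___')."""
    for i, (value, lane) in enumerate(zip(values, mask)):
        if lane != "_":
            memory[index + i] = value & 0xFFFFFFFF
```

With the mask `x___` from the MEM_EXPORT_WRITE_IND example, only the first dword of the element is written.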

  10. Thread (PML) • Belongs to a group • Holds data units (HIL) • 128 bits (x, y, z, w) + 32 bits (t) • Most resources, e.g. registers, have 4 components • Each thread processor is five-way and has 5 outputs (x, y, z, w, t) • Holds the mapping table from registers (GPR, CR, TR) to data units
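One way to model the data units and the register mapping table is sketched below. The layout is hypothetical: four-lane data units for GPRs (`R`) and temporaries (`T`), plus one five-lane unit named `PV` standing for the outputs of the five-way thread processor, whose `t` lane plays the role of PS. Constant registers (CR) could be mapped the same way.

```python
class DataUnit:
    """128-bit x/y/z/w part, plus an optional 32-bit scalar t lane."""

    def __init__(self, with_t=False):
        names = ("x", "y", "z", "w", "t") if with_t else ("x", "y", "z", "w")
        self.lanes = dict.fromkeys(names, 0)


class Thread:
    """Holds a mapping table from register names to data units."""

    def __init__(self, tid, num_gprs=4, num_temps=2):
        self.tid = tid
        self.table = {f"R{i}": DataUnit() for i in range(num_gprs)}
        self.table.update({f"T{i}": DataUnit() for i in range(num_temps)})
        # Five outputs (x, y, z, w, t) of the previous ALU instruction group.
        self.table["PV"] = DataUnit(with_t=True)

    def read(self, reg, comp):
        return self.table[reg].lanes[comp]

    def write(self, reg, comp, value):
        self.table[reg].lanes[comp] = value & 0xFFFFFFFF
```

Routing every access through the table means an instruction only needs a register name and component, not knowledge of how each register kind is stored.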

  11. Wavefront (PML) • Holds the program counter • Holds the thread ID list • Belongs to a group

  12. Group (PML) • Holds threads • Holds wavefronts • Belongs to a GPUProgram • Holds shared memory (PML) • Instructions access shared memory through the group • Instructions (HIL) • 12 LOCAL_DS_WRITE (8) R0, STRIDE(16) SIMD_REL • 17 LOCAL_DS_READ R2, R2.xy WATERFALL
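The wavefront and group bookkeeping from slides 11 and 12 might look like this. `make_group`, the field names, and the default shared-memory size of 64 words are hypothetical; the point is only that a wavefront holds a program counter and thread IDs, and that instructions reach shared memory through the group.

```python
from dataclasses import dataclass, field


@dataclass
class Wavefront:
    thread_ids: list   # threads that execute in lockstep
    pc: int = 0        # program counter: index into the instruction list


@dataclass
class Group:
    thread_ids: list
    wavefronts: list
    shared_memory: list = field(default_factory=list)

    def lds_write(self, addr, value):
        """Instructions touch shared memory only through their group."""
        self.shared_memory[addr] = value & 0xFFFFFFFF

    def lds_read(self, addr):
        return self.shared_memory[addr]


def make_group(first_tid, num_threads, wavefront_size, lds_words=64):
    """Build a group and cut its threads into wavefronts of wavefront_size."""
    tids = list(range(first_tid, first_tid + num_threads))
    wavefronts = [Wavefront(tids[i:i + wavefront_size])
                  for i in range(0, num_threads, wavefront_size)]
    return Group(tids, wavefronts, [0] * lds_words)
```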

  13. Wavefront Scheduler (PML) • Current version (functional simulator) • Picks one instruction and lets all wavefronts execute it • For a timing simulator, scheduling will be • decided by hardware capacity and software requests • decided by the static instruction list • decided by execution results
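The current functional schedule, picking one instruction and letting every wavefront execute it, reduces to nested loops. In this sketch, instructions are plain callables on per-thread state, an assumption made to keep it short, and `run_lockstep` is a hypothetical name. There is no branching, consistent with branch and loop support still being listed as to be done.

```python
def run_lockstep(instructions, wavefronts, threads):
    """Functional (untimed) schedule.

    instructions: list of callables, each taking one thread's state dict
    wavefronts:   list of lists of thread ids
    threads:      dict mapping thread id -> mutable state dict
    """
    for inst in instructions:      # pick up one instruction...
        for wf in wavefronts:      # ...and let every wavefront execute it
            for tid in wf:         # every thread of that wavefront runs it
                inst(threads[tid])
```

A timing simulator would replace the fixed loop order with a policy driven by hardware capacity, the static instruction list, and execution results.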

  14. GPUProgram (PML) • The code parser parses the instruction list • The input stream is loaded from a text file into memory • Resources are allocated according to the configuration file: groups, threads, shared memory, memory, wavefront scheduler • The wavefront scheduler executes the instruction list on the wavefronts • When an instruction executes on a thread, it updates the resources: the thread's registers, the group's shared memory, and the GPUProgram's texture (global) memory • The output memory is saved to a text file
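Putting the steps together, a highly simplified end-to-end run could look like the sketch below. It collapses parsing and scheduling into a single Python callable per thread (`pairwise_sum`, a made-up kernel) and assumes a one-integer-per-line text format for the input and output streams; it illustrates the data flow only.

```python
def run_program(input_text, kernel, num_threads):
    """Hypothetical GPUProgram flow: load input, run every thread, save output."""
    memory_in = [int(tok) for tok in input_text.split()]   # load input stream
    memory_out = [0] * num_threads                         # allocate output memory
    for tid in range(num_threads):                         # execute on each thread
        kernel(tid, memory_in, memory_out)
    return "\n".join(map(str, memory_out)) + "\n"          # save output as text


def pairwise_sum(tid, mem_in, mem_out):
    """Example kernel: each thread adds two adjacent input words."""
    mem_out[tid] = mem_in[2 * tid] + mem_in[2 * tid + 1]
```

For example, `run_program("1 2 3 4 5 6", pairwise_sum, 3)` returns the text `3`, `7`, `11` on separate lines.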

  15. Test cases • Sum, division, subtraction, multiplication: • supports texture memory • supports different data types (int, float, uint, int1, int4…) • supports fundamental ALU operations (+, -, *, /, shift, and, compare, cast) • domain_sum: • supports global memory read and write • Sum_share_memory: • supports shared memory read and write • supports groups and wavefronts • Branch and Loop (to be done): • supports the constant buffer • loop operations

  16. Future work • Currently 30 of 200 ATI instructions are supported • Support NVIDIA and optimize the two-layer design • Timing simulator
