
EXAMPLE: Adding Prefetch Inst.

Presentation Transcript


  1. EXAMPLE: Adding Prefetch Inst. HPArch Research Group

  2. Study the Impact of S/W Prefetching in GPUs
     • PTX has supported non-binding prefetch instructions since version 2.0
       • prefetch{.space}.level: prefetch to L1/L2 from global/local memory
       • prefetchu.L1: prefetch to the uniform cache
     • For the study
       • Ocelot should parse, decode, and emulate prefetch instructions
       • The trace generator should emit prefetch instructions
         • New opcodes should be added
       • MacSim should detect the new opcodes, generate memory requests, and fill the appropriate cache with the returned cache block
     MacSim Tutorial (In ICPADS 2013)

  3. Changes to Trace Generator

  4. Changes to Trace Generator – I
         typedef enum TR_OPCODE_ENUM_ {
           XED_CATEGORY_INVALID = 0,
           XED_CATEGORY_3DNOW,
           …
           PREF_GM_L1,
           PREF_GM_L2,
           PREF_LM_L1,
           PREF_LM_L2,
           PREF_UNIFORM,
           TR_OPCODE_LAST
         } TR_OPCODE_ENUM;
     • Add new opcodes for the new instructions
     • This enum is the list of opcodes supported by MacSim
     • The new opcodes are added to the same enum in trace_read.h in MacSim as well

  5. Changes to Trace Generator – II
         …
         if (ptx_inst->opcode == ir::PTXInstruction::Prefetch) {
           switch (ptx_inst->addressSpace) {
             case ir::PTXInstruction::Global:
               if (ptx_inst->cacheLevel == ir::PTXInstruction::L1)
                 inst_info->opcode = PREF_GM_L1;
               else
                 inst_info->opcode = PREF_GM_L2;
               break;
             case ir::PTXInstruction::Local:
               if (ptx_inst->cacheLevel == ir::PTXInstruction::L1)
                 inst_info->opcode = PREF_LM_L1;
               else
                 inst_info->opcode = PREF_LM_L2;
               break;
           }
         } else if (ptx_inst->opcode == ir::PTXInstruction::Prefetchu) {
           inst_info->opcode = PREF_UNIFORM;
         }
         …
     • Ocelot uses its own set of opcodes for PTX instructions
     • During trace generation, the opcodes used by Ocelot are translated into the opcodes used by MacSim
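The translation step on this slide can be tried out in isolation. The following is a minimal sketch, not the trace generator's actual code: `PtxOpcode`, `AddressSpace`, `CacheLevel`, `PtxInst`, `TR_OTHER`, and `translate_prefetch` are hypothetical stand-ins for Ocelot's `ir::PTXInstruction` fields and the MacSim opcode enum, and only the five `PREF_*` names come from the slides.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Ocelot instruction fields used on the slide.
enum class PtxOpcode { Prefetch, Prefetchu, Other };
enum class AddressSpace { Global, Local, Other };
enum class CacheLevel { L1, L2 };

// The five prefetch opcodes from the slides, plus a hypothetical fallback.
enum TraceOpcode {
  TR_OTHER,  // hypothetical: any instruction that is not a prefetch
  PREF_GM_L1,
  PREF_GM_L2,
  PREF_LM_L1,
  PREF_LM_L2,
  PREF_UNIFORM
};

struct PtxInst {
  PtxOpcode opcode;
  AddressSpace addressSpace;
  CacheLevel cacheLevel;
};

// Map a PTX prefetch instruction to the trace opcode the simulator expects.
TraceOpcode translate_prefetch(const PtxInst& inst) {
  if (inst.opcode == PtxOpcode::Prefetchu) return PREF_UNIFORM;
  if (inst.opcode != PtxOpcode::Prefetch) return TR_OTHER;

  const bool to_l1 = (inst.cacheLevel == CacheLevel::L1);
  switch (inst.addressSpace) {
    case AddressSpace::Global: return to_l1 ? PREF_GM_L1 : PREF_GM_L2;
    case AddressSpace::Local:  return to_l1 ? PREF_LM_L1 : PREF_LM_L2;
    default:                   return TR_OTHER;  // no prefetch opcode for other spaces
  }
}
```

Returning the opcode from a function, rather than assigning into `inst_info->opcode` in place, is only a convenience for testing the mapping on its own.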

  6. Changes to MacSim

  7. Changes to MacSim – I
         typedef enum Mem_Type_enum {
           NOT_MEM,
           MEM_LD,
           …
           MEM_SWPREF_GM_L1,
           MEM_SWPREF_GM_L2,
           MEM_SWPREF_LM_L1,
           MEM_SWPREF_LM_L2,
           MEM_SWPREF_UNIFORM,
           NUM_MEM_TYPES
         } Mem_Type;

         typedef enum Mem_Req_Type_enum {
           MRT_IFETCH,
           MRT_DFETCH,
           …
           MRT_SW_DPRF_L1,
           MRT_SW_DPRF_L2,
           MRT_SW_DPRF_UNIFORM,
           MAX_MEM_REQ_TYPE
         } Mem_Req_Type;
     • Add the new opcodes to the enum in trace_read.h
     • Add the new types to the Mem_Type enum in uop.h and the new request types to Mem_Req_Type in memreq_info.h

  8. Changes to MacSim – II
         …
         switch (pi->m_opcode) {
           …
           case PREF_GM_L1:
             trace_uop[0]->m_mem_type = MEM_SWPREF_GM_L1;
             break;
           case PREF_GM_L2:
             trace_uop[0]->m_mem_type = MEM_SWPREF_GM_L2;
             break;
           case PREF_LM_L1:
             trace_uop[0]->m_mem_type = MEM_SWPREF_LM_L1;
             break;
           case PREF_LM_L2:
             trace_uop[0]->m_mem_type = MEM_SWPREF_LM_L2;
             break;
           case PREF_UNIFORM:
             trace_uop[0]->m_mem_type = MEM_SWPREF_UNIFORM;
             break;
           …
         }
         …
     • During translation of instructions into uops in trace_read.cc, set the memory access type of each prefetch instruction to the appropriate type

  9. Changes to MacSim – III
         if (uop->m_mem_type == MEM_LD_CM || uop->m_mem_type == MEM_SWPREF_UNIFORM) {
           uop_latency = core->get_const_cache()->load(uop);
         }
         …
         // MEM_LD, MEM_SWPREF_GM_L1, MEM_SWPREF_GM_L2, MEM_SWPREF_LM_L1, MEM_SWPREF_LM_L2
         else {
           uop_latency = m_simBase->m_memory->access(uop);
         }
     • Modify exec_c::exec() to issue requests to the appropriate cache
     • On a cache miss, the request is forwarded to L2
     • Define the priority value of prefetch requests in g_mem_priority[] in memory.cc
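The two steps on slides 8 and 9 (opcode to memory type, then memory type to the cache that services it) can be sketched as a pair of small functions. This is an illustrative sketch only: `Target`, `mem_type_for`, and `route` are hypothetical names, the enums are cut down to the values shown on the slides, and the real simulator calls into its cache objects rather than returning a tag.

```cpp
#include <cassert>

// Trace opcodes and memory types named on the slides (subset only).
enum TraceOpcode { PREF_GM_L1, PREF_GM_L2, PREF_LM_L1, PREF_LM_L2, PREF_UNIFORM };

enum MemType {
  NOT_MEM,
  MEM_LD,
  MEM_LD_CM,
  MEM_SWPREF_GM_L1,
  MEM_SWPREF_GM_L2,
  MEM_SWPREF_LM_L1,
  MEM_SWPREF_LM_L2,
  MEM_SWPREF_UNIFORM
};

// Hypothetical tag for which structure services the access.
enum class Target { ConstCache, DataMemory };

// Slide 8: choose the memory access type for a prefetch opcode.
MemType mem_type_for(TraceOpcode op) {
  switch (op) {
    case PREF_GM_L1:   return MEM_SWPREF_GM_L1;
    case PREF_GM_L2:   return MEM_SWPREF_GM_L2;
    case PREF_LM_L1:   return MEM_SWPREF_LM_L1;
    case PREF_LM_L2:   return MEM_SWPREF_LM_L2;
    case PREF_UNIFORM: return MEM_SWPREF_UNIFORM;
  }
  return NOT_MEM;  // unreachable for the opcodes listed above
}

// Slide 9: uniform prefetches share the constant-cache path with MEM_LD_CM;
// every other memory type goes through the regular data memory path.
Target route(MemType type) {
  assert(type != NOT_MEM);  // only memory uops are routed
  return (type == MEM_LD_CM || type == MEM_SWPREF_UNIFORM) ? Target::ConstCache
                                                           : Target::DataMemory;
}
```

Note that the comparison against `MEM_SWPREF_UNIFORM` must use `==`; a single `=` would assign the value and make the condition always true.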

  10. Changes to MacSim – IV
          // L1 cache access
          dcu_c::access() {
            …
            // on cache miss
            req_type = <appropriate_type>;
            function<bool(mem_req_s*)> done_func = dcache_fill_line_wrapper;
            new_mem_req(req_type, …, done_func, …);
            …
          }

          new_mem_req(req_type, …, done_func, …) {
            …
            init_new_req(…, req_type, …, done_func, …);
            …
          }

          init_new_req(…, req_type, …, done_func, …) {
            …
            req->m_type = req_type;
            req->m_done_func = done_func;
            …
          }
      • For each request sent to L2, the request type and the function pointer done_func are set

  11. Changes to MacSim – V
          // process_fill_queue() for L2
          dcu_c::process_fill_queue() {
            …
            switch (req->m_state) {
              …
              case MEM_FILL_NEW:
                …
                if (req->m_type != MRT_SW_DPRF_L1) {
                  …
                  m_cache->insert_cache() …
                }
                …
            }
            …
          }
      • All returned cache blocks are inserted into the L2 fill queue
      • process_fill_queue() processes the entries in the fill queue
      • Here, insertion into L2 is skipped for prefetches into the L1 cache

  12. Changes to MacSim – V (continued)
          // process_fill_queue() for L2
          dcu_c::process_fill_queue() {
            …
            if (req->m_done_func && !req->m_done_func(req)) {
              …
            }
            …
          }
      • process_fill_queue() also calls the function pointed to by done_func

  13. Changes to MacSim – VI
          // process_in_queue() for L2 instance
          bool dcache_fill_line_wrapper(mem_req_s* req) {
            …
            if (req->m_type != MRT_SW_DPRF_L2) {
              if (m_simBase->m_memory->done(req)) {
                …
              }
            }
            …
          }
      • The default done_func is dcache_fill_line_wrapper(), which calls done(), which performs the insertion into L1
      • For prefetches into L2, the call to done() can be skipped
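The fill path described on slides 10 through 13 can be exercised end to end with a toy model. The sketch below is an assumption-laden simplification: `Req`, `ToyHierarchy`, `l1_fill`, `l2_fill`, and `fetch` are hypothetical names, the caches are plain address sets, and only the request type names and the `done_func` callback idea come from the slides.

```cpp
#include <cassert>
#include <functional>
#include <set>

// Request types named on the slides (subset only).
enum ReqType { MRT_DFETCH, MRT_SW_DPRF_L1, MRT_SW_DPRF_L2 };

// Hypothetical, cut-down memory request carrying its type and completion callback.
struct Req {
  ReqType type;
  unsigned long addr;
  std::function<bool(Req*)> done_func;
};

// Hypothetical two-level hierarchy; each cache is just a set of block addresses.
struct ToyHierarchy {
  std::set<unsigned long> l1;
  std::set<unsigned long> l2;

  // Stand-in for the default done_func: fill L1 unless the prefetch targets L2 only.
  bool l1_fill(Req* req) {
    if (req->type != MRT_SW_DPRF_L2) l1.insert(req->addr);
    return true;
  }

  // Stand-in for the L2 fill step: skip L2 insertion for L1-only prefetches,
  // then invoke the completion callback if one is set.
  bool l2_fill(Req* req) {
    if (req->type != MRT_SW_DPRF_L1) l2.insert(req->addr);
    return !req->done_func || req->done_func(req);
  }

  // Model an L1 miss: build the request with its type and done_func, then
  // pretend the block has already returned from memory.
  bool fetch(ReqType type, unsigned long addr) {
    Req req{type, addr, [this](Req* r) { return l1_fill(r); }};
    return l2_fill(&req);
  }
};
```

With this model, a demand fetch lands in both levels, a prefetch to L1 lands only in `l1`, and a prefetch to L2 lands only in `l2`, which is the behavior the two type checks on the slides are meant to produce.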

  14. Changes to MacSim – VII
          bool dc_frfcfs_c::sort_func::operator()(const drb_entry_s* req_a, const drb_entry_s* req_b) {
            bool is_prf_a = <>;
            bool is_prf_b = <>;
            …
            if (!is_prf_a && is_prf_b) return true;
            if (is_prf_a && !is_prf_b) return false;
            …
          }
      • Assign lower priority to prefetches in DRAM
      • This is done in the sort() function of your DRAM controller class
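The comparator idea on this slide can be checked with a standalone sort. This is a sketch under stated assumptions: `DramReq`, `DemandFirst`, and `schedule_order` are hypothetical names, and the oldest-first tie-break is an assumption standing in for the elided parts of the real FR-FCFS comparator.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, cut-down DRAM request buffer entry.
struct DramReq {
  int id;
  bool is_prefetch;
  unsigned long arrival;  // assumed arrival timestamp
};

// Demand requests sort ahead of prefetches; within a class, older requests first.
struct DemandFirst {
  bool operator()(const DramReq& a, const DramReq& b) const {
    if (!a.is_prefetch && b.is_prefetch) return true;
    if (a.is_prefetch && !b.is_prefetch) return false;
    return a.arrival < b.arrival;  // assumed tie-break, not taken from the slides
  }
};

// Return the request ids in the order the controller would service them.
std::vector<int> schedule_order(std::vector<DramReq> queue) {
  std::stable_sort(queue.begin(), queue.end(), DemandFirst());
  std::vector<int> ids;
  for (const DramReq& r : queue) ids.push_back(r.id);
  return ids;
}
```

The two early returns keep the comparator a strict weak ordering, which `std::sort` and `std::stable_sort` require: when both requests are in the same class, the decision falls through to the tie-break instead of returning `true` in both directions.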

  15. Questions?
