An Analytical Model to Exploit Memory Task Scheduling
Hsiang-Yun Cheng, Jian Li, and Chia-Lin Yang
Dept. of Computer Science & Information Engineering, National Taiwan University
IBM Austin Research Laboratory
Motivation
• Off-chip bandwidth on CMPs is a precious resource
• If too many cores execute memory operations simultaneously:
  • Bandwidth contention ↑ → memory access latency ↑
Objective
• Software task scheduling to reduce bandwidth contention and improve system performance
  • Utilize stream programming properties to decouple threads into memory and compute tasks
  • Avoid too many concurrent memory tasks
• Challenge: how many concurrent memory tasks should be allowed to maximize system performance?
Stream Programming Style
• Decouple computation and memory access
• Gather → Compute → Scatter
• Example: [figure not recovered]
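The gather, compute, scatter pattern named on this slide can be illustrated with a minimal Python sketch. It is not the example from the original slide; the function names, the data, and the arithmetic in `compute` are all hypothetical and chosen only to show the three phases.

```python
def gather(memory, indices):
    """Gather: copy the (possibly scattered) inputs into a dense local buffer."""
    return [memory[i] for i in indices]


def compute(buffer):
    """Compute: work only on the local buffer, with no further memory traffic."""
    return [x * x + 1 for x in buffer]


def scatter(memory, indices, results):
    """Scatter: write each result back to its destination in memory."""
    for i, r in zip(indices, results):
        memory[i] = r


# Hypothetical data: a flat "off-chip" array and the elements one task touches.
memory = list(range(10))
indices = [7, 2, 5]

local = gather(memory, indices)      # memory phase
results = compute(local)             # compute phase
scatter(memory, indices, results)    # memory phase
print(memory)
```

Because the compute phase never touches `memory`, the memory phases can be scheduled separately from it, which is the property the rest of the deck relies on.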
Exploiting Stream Programming Properties
• Task division according to stream programming properties
  • Memory tasks
    • Fetch data from off-chip memory to on-chip caches
  • Compute tasks
    • Directly access data in on-chip caches without cache misses
Memory Task Scheduling
• Main idea
  • Restrict the Memory Task Limit (MTL) to reduce memory bandwidth contention
  • MTL: number of memory tasks that can be scheduled simultaneously
• MTL ↓ → bandwidth contention ↓ → memory access latency ↓
• MTL ↓ → scheduling constraint ↑ → CPU may unnecessarily stay idle
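One simple way to enforce a memory task limit in software is a counting semaphore. The sketch below is an illustration under that assumption, not the scheduler used by the authors: the names `MTL`, `NUM_TASKS`, `memory_task`, and `compute_task` are hypothetical, and `time.sleep` stands in for a real data fetch. Note that a blocked thread here simply waits, whereas a real scheduler could give the core a ready compute task instead.

```python
import threading
import time

MTL = 2          # hypothetical memory task limit
NUM_TASKS = 8    # hypothetical number of memory/compute task pairs

mem_slots = threading.Semaphore(MTL)   # one permit per allowed memory task
lock = threading.Lock()
active = 0       # memory tasks currently in flight
peak = 0         # highest value of `active` observed


def memory_task():
    """Fetch phase: at most MTL of these run at the same time."""
    global active, peak
    with mem_slots:                    # blocks while MTL memory tasks are running
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)               # stand-in for an off-chip fetch
        with lock:
            active -= 1


def compute_task():
    """Compute phase: no limit applies, since data is assumed to be on chip."""
    return sum(i * i for i in range(1000))


def worker():
    memory_task()
    compute_task()


threads = [threading.Thread(target=worker) for _ in range(NUM_TASKS)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print("peak concurrent memory tasks:", peak)
```

Lowering `MTL` caps the observed peak, which mirrors the trade-off on the slide: less contention among memory tasks, but more threads waiting for a permit.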
Memory Task Scheduling
• Applications with different characteristics (memory-to-compute ratios) may perform best under different MTLs
• Example: MTL=1 performs best
Memory Task Scheduling
• Applications with different characteristics (memory-to-compute ratios) may perform best under different MTLs
• Example: MTL=2 performs best
Performance Modeling for Different MTLs
• Develop an analytical model to analyze performance speedup under different MTL constraints
• Given Tmk, Tc, MTL=k, n, t
  • Tmk: average execution time of memory tasks under MTL=k
  • Tc: average execution time of compute tasks
  • n: number of processor cores
  • t: number of memory tasks
• Estimate performance speedup under MTL=k
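Would CPU Idle under MTL=k?
• If [condition not recovered], then CPU always busy
• If [condition not recovered], then CPU sometimes idle
• Example: n=4
  • MTL=1: CPU won’t idle if [condition not recovered]
  • MTL=2: CPU won’t idle if [condition not recovered]
  • MTL=3: CPU won’t idle if [condition not recovered]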
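The conditions on this slide were equations that did not survive extraction. As a rough guide only, a steady-state throughput argument gives one plausible form. The assumptions are mine, not the slide's: every task is a memory phase of length $T_{mk}$ (Tmk) followed by a compute phase of length $T_c$ (Tc), any of the $n$ cores can run either phase, and $k < n$. The result may differ from the authors' exact expressions.

```latex
% At most k memory tasks run at once, so memory phases complete at a rate of at most k / T_{mk}.
% If all n cores are always busy, task pairs complete at a rate of n / (T_{mk} + T_c).
% The cores can stay busy only if the memory side keeps up:
\[
  \frac{k}{T_{mk}} \;\ge\; \frac{n}{T_{mk} + T_c}
  \quad\Longleftrightarrow\quad
  k\,T_c \;\ge\; (n - k)\,T_{mk}
  \quad\Longleftrightarrow\quad
  \frac{T_{mk}}{T_c} \;\le\; \frac{k}{n - k}.
\]
% For n = 4 this reads:
\[
  k = 1:\ \frac{T_{m1}}{T_c} \le \frac{1}{3}, \qquad
  k = 2:\ \frac{T_{m2}}{T_c} \le 1, \qquad
  k = 3:\ \frac{T_{m3}}{T_c} \le 3.
\]
```

Under these assumptions, a smaller MTL demands a proportionally more compute-heavy workload before it can keep every core busy.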
Performance Model
• If CPU always busy: [equation not recovered]
• If CPU sometimes idle: [equation not recovered]
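Since the closed-form expressions are missing here, the effect of MTL on execution time can instead be explored numerically. The following is a small event-driven simulation, not the authors' analytical model. It assumes a greedy scheduler that starts a memory task whenever a core and a memory slot are free and otherwise runs a ready compute task. The function `makespan`, the values in `tm_by_k`, and the choice of the unrestricted case (MTL=n) as the speedup baseline are all hypothetical.

```python
import heapq


def makespan(n, t, k, tm, tc):
    """Finish time of t memory+compute task pairs on n cores with MTL=k."""
    now = 0.0
    free_cores = n
    mem_left = t        # memory tasks not yet started
    mem_running = 0     # memory tasks currently in flight
    ready = 0           # compute tasks whose data is already on chip
    done = 0            # finished compute tasks
    events = []         # min-heap of (finish_time, kind)
    while done < t:
        # Dispatch as much work as the free cores and the MTL allow.
        while free_cores > 0:
            if mem_left > 0 and mem_running < k:
                heapq.heappush(events, (now + tm, "mem"))
                mem_left -= 1
                mem_running += 1
            elif ready > 0:
                heapq.heappush(events, (now + tc, "cmp"))
                ready -= 1
            else:
                break   # nothing runnable: remaining cores idle
            free_cores -= 1
        # Advance to the next completion.
        now, kind = heapq.heappop(events)
        free_cores += 1
        if kind == "mem":
            mem_running -= 1
            ready += 1
        else:
            done += 1
    return now


# Hypothetical inputs: Tmk grows with k to mimic bandwidth contention.
n, t, tc = 4, 64, 1.0
tm_by_k = {1: 1.0, 2: 1.2, 3: 2.0, 4: 3.0}

baseline = makespan(n, t, n, tm_by_k[n], tc)   # MTL=n, i.e. no restriction
speedups = {k: baseline / makespan(n, t, k, tm_by_k[k], tc) for k in tm_by_k}
best_mtl = max(speedups, key=speedups.get)

for k in sorted(speedups):
    print(f"MTL={k}: speedup {speedups[k]:.2f}")
print("best MTL in this toy setup:", best_mtl)
```

Changing `tc` while keeping `tm_by_k` fixed gives a quick way to see how the memory-to-compute ratio shifts which MTL comes out ahead in this toy setting.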
Performance Trend
• Comparing workloads with the same Tmk and the same optimal MTL, but different Tc
• Optimal MTL: the MTL that achieves the best speedup
Experimental Setup
• Workloads
• Experimental environment
Thank You
Hsiang-Yun Cheng
r96027@csie.ntu.edu.tw