Priority Based Fair Scheduling: A Memory Scheduler Design for Chip-Multiprocessor Systems

Presentation Transcript


  1. Priority Based Fair Scheduling: A Memory Scheduler Design for Chip-Multiprocessor Systems • Tsinghua University • Tsinghua National Laboratory for Information Science and Technology

  2. Background • “Memory wall” • High memory access latency • DRAM structure • Channel, Rank, Bank, Row, Column … • Various timing constraints • Challenges of multi-core • High parallelism • More data contention • Solution • More memory channels • Efficient memory scheduler

  3. Motivation • Thread classification [TCM:Kim:2008] • Latency-sensitive threads • Bandwidth-sensitive threads • A memory scheduler should • Improve system throughput • Avoid starvation • Maintain fairness among threads

  4. Goals • Requests of latency-sensitive threads • To be issued ASAP • Requests of bandwidth-sensitive threads • Avoid unfairness • Our proposal: PBFS • Prioritize latency-sensitive threads • Avoid starvation of bandwidth-sensitive threads

  5. Basic Idea • Each thread gets a priority • Ranges from -1 to n • Top priority (n) • latency-sensitive threads • Bottom priority (0) • intermediate threads • Medium priority (1 to n-1) • latency-sensitive threads • Idle (-1) • finished threads or compute-intensive threads

  6. Priority Updating Rules • Priorities are updated dynamically • Once a request is issued • The corresponding thread’s priority is decremented by 1 • When no thread has top priority • All threads’ priorities are incremented by 1 • When a time threshold is reached • Identify idle threads • Adjust the top priority • Extremely unbalanced: increase top priority • Extremely balanced: decrease top priority • Other cases: unchanged • Upper/lower boundaries are adjusted according to the number of active threads
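The first two rules on slide 6 can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the `PriorityTable` class, its method names, the clamp at priority 0, and the tie-break by thread id are all assumptions. The periodic top-priority adjustment is left out because the slides do not define when the load counts as "extremely" balanced or unbalanced.

```python
from dataclasses import dataclass, field

IDLE = -1  # value the slides assign to finished or compute-intensive threads


@dataclass
class PriorityTable:
    """Hypothetical per-thread priority state (names are illustrative)."""
    top: int                                   # current top-priority value n
    prio: dict = field(default_factory=dict)   # thread id -> priority

    def add_thread(self, tid):
        # Assumption: threads start at top priority, as in the 2-core examples.
        self.prio[tid] = self.top

    def on_issue(self, tid):
        # Rule 1: issuing a request lowers the issuing thread's priority by 1.
        # Assumption: the value is clamped at the bottom priority 0.
        self.prio[tid] = max(0, self.prio[tid] - 1)
        # Rule 2: if no active thread holds top priority, all gain 1.
        active = [t for t, p in self.prio.items() if p != IDLE]
        if active and all(self.prio[t] < self.top for t in active):
            for t in active:
                self.prio[t] += 1

    def pick(self, pending):
        # Assumption: serve the pending thread with the highest priority;
        # ties go to the lowest thread id.
        candidates = sorted(t for t in pending if self.prio.get(t, IDLE) != IDLE)
        if not candidates:
            return None
        return max(candidates, key=lambda t: self.prio[t])


# Two threads that always have a request pending, top priority 2.
table = PriorityTable(top=2)
for tid in ("A", "B"):
    table.add_thread(tid)

order = []
for _ in range(4):
    tid = table.pick(["A", "B"])
    table.on_issue(tid)
    order.append(tid)
```

Under these assumptions, two threads that always have pending requests end up alternating, while a thread that issues several requests in a row sinks to priority 0 and yields to any thread still at top priority.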

  7. System Throughput • Latency-sensitive threads • Easily reach top priority • Issued as soon as possible • Example • 2-core CMP • Thread A, latency-sensitive • Thread B, bandwidth-sensitive • Top priority = 2 • Initially, both threads have priority 2

  8. Example • Thread A: Rq 0, Rq 1 • Thread B: Rq 0, Rq 1, Rq 2, Rq 3, Rq 4, Rq 5, Rq 6, Rq 7, Rq 8, Rq 9 • Execution: Rq 0, Rq 1, Rq 2, Rq 0, Rq 3, Rq 4, Rq 5, Rq 6, Rq 1, Rq 7, Rq 8, Rq 9

  9. Starvation Avoidance • When a thread issues too many requests in a row • It is classified as a bandwidth-sensitive thread • Other threads get more chances to raise their priorities • Example • 2-core CMP • Thread A, less bandwidth-sensitive • Thread B, bandwidth-sensitive • Top priority = 2 • Initially, both threads have priority 2

  10. Example • Thread A: Rq 0, Rq 1, Rq 2, Rq 3, Rq 4, Rq 5 • Thread B: Rq 0, Rq 1, Rq 2, Rq 3, Rq 4, Rq 5, Rq 6, Rq 7, Rq 8, Rq 9 • Execution: Rq 0, Rq 1, Rq 2, Rq 0, Rq 1, Rq 3, Rq 2, Rq 4, Rq 3, Rq 5, Rq 4, Rq 6, Rq 5, Rq 7, Rq 8, Rq 9

  11. Hardware Overhead • Hardware support is needed to • record the priority of each thread • monitor the threads’ behavior (read counts within a time interval) • maintain flags indicating whether a row buffer can be closed • The storage overhead is small, and the logic is easy to implement

  12. Evaluation • Usimm-1.3 • Memory configuration • 1 channel • 4 channels • Benchmarks • Metrics • Execution time • Maximum slowdown • Energy-delay product (EDP)

  13. Execution Time • Overall • CLOSE: 4.2% reduction • PBFS: 7.5% reduction

  14. Maximum Slowdown • Overall • CLOSE: 4.7% reduction • PBFS: 7.0% reduction

  15. EDP • Overall • CLOSE: 9.1% reduction • PBFS: 13.8% reduction
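The slides report the three metrics without defining them. The sketch below uses commonly used definitions, which are assumptions here rather than the authors' stated formulas: maximum slowdown as the largest ratio of shared-run time to stand-alone time across threads, EDP as energy multiplied by execution time, and "reduction" as a percentage relative to a baseline scheduler that the slides do not name. All function and parameter names are hypothetical.

```python
def max_slowdown(shared_times, alone_times):
    """Largest per-thread ratio of shared-run time to stand-alone time (assumed definition)."""
    return max(s / a for s, a in zip(shared_times, alone_times))


def edp(energy_joules, exec_time_seconds):
    """Energy-delay product: energy multiplied by execution time."""
    return energy_joules * exec_time_seconds


def reduction_pct(baseline, value):
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline


# Illustrative numbers only, not measurements from the slides:
# a run that takes 185 time units against a 200-unit baseline.
example_reduction = reduction_pct(200.0, 185.0)
```

With these definitions, lower values are better for all three metrics, which matches the slides reporting every result as a reduction.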

  16. Summary • We proposed PBFS • Classify threads by priority • Dynamically update threads’ priorities • Guarantee system throughput • Avoid starvation of bandwidth-sensitive threads • Low hardware overhead

  17. Thanks
