The Journal of Instruction-Level Parallelism • 1st JILP Workshop on Computer Architecture Competitions (JWAC-1): Cache Replacement Championship • International Symposium on Computer Architecture (ISCA 2010)
Submission Requirements • Cache replacement algorithm • Code that fits into the provided framework • Maximum of 3 code versions allowed • 4-page paper
Statistics • Submissions • 26 total papers • 35 distinct code submissions • Distribution • Asia – 12 • North America – 11 • Europe – 3
Metrics • Performance Ranking • Overall Paper Quality • Adherence to Competition Rules • Qualitative Assessment of Logic Complexity • Intuition provided
Process • Reviews • 26 papers • 3 reviews per paper -> 78 reviews • 6 reviewers -> ~13 reviews per reviewer • 8 reviewers -> ~10 reviews per reviewer • Program committee meeting held by phone • Shared Google Docs used to manage the process • Outcome: 10 papers accepted
Types of Policies • Cache Replacement Strategies: • Insertion Policies • Reuse Distance Prediction • Dead Block Prediction • Memory Region Based Prediction • Counter-based Prediction • Frequency-based Prediction
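To make the "Insertion Policies" category concrete, here is a minimal sketch of how the insertion position alone changes behavior in a single cache set. It is illustrative only and is not taken from any submission: the function name `simulate`, the 4-way set, and the cyclic trace are assumptions chosen for the example. It contrasts classic LRU (new blocks inserted at the MRU position) with LRU-position insertion (often called LIP) on a working set slightly larger than the set.

```python
def simulate(trace, ways, insert_at_mru):
    """Count hits for one cache set managed as a recency stack.

    stack[0] is the MRU position, stack[-1] is the LRU position.
    insert_at_mru=True  -> classic LRU insertion
    insert_at_mru=False -> insert new blocks at the LRU position (LIP-style)
    """
    stack = []
    hits = 0
    for addr in trace:
        if addr in stack:
            hits += 1
            stack.remove(addr)
            stack.insert(0, addr)      # promote to MRU on every hit
        else:
            if len(stack) == ways:
                stack.pop()            # evict the block in the LRU position
            if insert_at_mru:
                stack.insert(0, addr)
            else:
                stack.append(addr)     # new block starts in the LRU position
    return hits


# Hypothetical trace: 5 blocks accessed cyclically in a 4-way set.
trace = [0, 1, 2, 3, 4] * 4
lru_hits = simulate(trace, ways=4, insert_at_mru=True)
lip_hits = simulate(trace, ways=4, insert_at_mru=False)
print(lru_hits, lip_hits)
```

On this thrashing pattern, MRU insertion evicts every block just before it is reused, while LRU-position insertion keeps part of the working set resident. Real submissions typically choose between such behaviors adaptively rather than fixing one.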
Thanks • Organizing Committee • Aamer Jaleel, Intel (Chair) • Alaa Alameldeen, Intel • Moin Qureshi, IBM • Sponsorship/Web • Eric Rotenberg • Program Committee • Doug Burger, Microsoft • Mainak Chaudhuri, IITK • Aamer Jaleel, Intel • Gabriel Loh, Georgia Tech • Moinuddin Qureshi, IBM • Yan Solihin, NC State
Experimental Framework • Common framework • Allows for comparison of competing algorithms • Trace driven performance model • 4-way OoO core • 3-level Cache Hierarchy • 32KB L1, 256KB L2 • Competition Focus: Replacement Policies for LLC (L3) • Private Cache: 1MB LLC (single core) • Shared Cache: 4MB LLC (4-core CMP)
Workloads • Workload Classes • SPEC CPU2006 – Reference Inputs (29) • PC Games and Multimedia (22) • Enterprise Server (14) • Tracing Methodology: • SPEC workload traces captured with Pin (using SimPoint) • Non-SPEC workloads captured on a HW tracing system • Simulation Methodology: • Warm-up: 100M instructions • Detailed Simulation: 100M instructions • Shorter traces were split 50/50 between warm-up and detailed simulation
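The simulation methodology above can be sketched as a small helper that decides how many instructions go to warm-up and how many to detailed simulation. This is a sketch under stated assumptions: the slides do not say what counts as a "shorter" trace, so the code assumes it means any trace with fewer than 200M instructions, and the function name `split_trace` is hypothetical.

```python
WARMUP_INSTRUCTIONS = 100_000_000    # from the slide: 100M warm-up
DETAILED_INSTRUCTIONS = 100_000_000  # from the slide: 100M detailed


def split_trace(n_instructions):
    """Return (warmup, detailed) instruction counts for one trace.

    Assumption: a trace too short for 100M + 100M is split 50/50,
    with any odd instruction going to the detailed phase.
    """
    if n_instructions >= WARMUP_INSTRUCTIONS + DETAILED_INSTRUCTIONS:
        return WARMUP_INSTRUCTIONS, DETAILED_INSTRUCTIONS
    warmup = n_instructions // 2
    return warmup, n_instructions - warmup


print(split_trace(250_000_000))
print(split_trace(150_000_000))
```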
Experiments • Single-Threaded Workloads • All 65 traces • Heterogeneous Multi-Programmed Workloads • 7 workloads selected from each of the three workload classes (21 total) • All 4-core combinations created for each class (7 choose 4 = 35) • 35 random selections created from all 21 workloads • Total # of Workloads for Shared Caches: 140 • Metrics: • ST Workloads: Throughput • Multi-Core Workloads: Weighted Speedup • All workloads kept secret from ALL contestants
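The workload counts and the multi-core metric can be checked with a short sketch. Two things here are assumptions rather than statements from the slides: that 7 workloads were drawn from each of the three classes (the reading under which 3 × 35 + 35 = 140), and that weighted speedup follows its common definition, the sum over cores of IPC when sharing the cache divided by IPC when running alone. The slides do not specify the baseline used for the standalone IPC, and the IPC values below are made up for illustration.

```python
from math import comb

CLASSES = 3        # SPEC CPU2006, PC games/multimedia, enterprise server
PER_CLASS = 7      # workloads selected per class (assumed reading)
CORES = 4

per_class_mixes = comb(PER_CLASS, CORES)       # 7 choose 4
class_mixes = CLASSES * per_class_mixes        # one set of mixes per class
random_mixes = 35                              # drawn from all 21 workloads
total_shared_workloads = class_mixes + random_mixes


def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-core IPC ratios (shared-cache run / standalone run)."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one standalone IPC per core")
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


print(per_class_mixes, total_shared_workloads)
# Hypothetical IPC values for a 4-core mix:
print(weighted_speedup([1.0, 0.5, 2.0, 1.5], [2.0, 1.0, 2.0, 3.0]))
```

A weighted speedup equal to the core count would mean that sharing the cache cost no program any performance; lower values indicate interference.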
Private Cache Championship Awards • 3rd Place: • D. Jimenez. Dead Block Replacement and Bypass with a Sampling Predictor • 2nd Place: • P. Michaud. The 3P and 4P cache replacement policies • Champion: • H. Gao and C. Wilkerson. A Dueling Segmented LRU Replacement Algorithm with Adaptive Bypassing
Shared Cache Championship Awards • 3rd Place: • P. Michaud. The 3P and 4P cache replacement policies • 2nd Place: • Y. Ishii, M. Inaba, and K. Hiraki. Map-based Adaptive Insertion Policy • Champion: • H. Gao and C. Wilkerson. A Dueling Segmented LRU Replacement Algorithm with Adaptive Bypassing