Workload Design: Selecting Representative Program-Input Pairs
Lieven Eeckhout, Hans Vandierendonck, Koen De Bosschere
Ghent University, Belgium
PACT 2002, September 23, 2002

Introduction
• Microprocessor design: simulation of workload = set of programs + inputs
  • constrained in size due to time limitation
  • taken from suites, e.g., SPEC, TPC, MediaBench
• Workload design:
  • which programs?
  • which inputs?
  • representative: large variation in behavior
  • benchmark-input pairs should be “different”

Main idea
• Workload design space is p-D space
  • with p = # relevant program characteristics
  • p is too large for understandable visualization
  • correlation between p characteristics
• Idea: reduce p-D space to q-D space
  • with q small (typically 2 to 4)
  • without losing important information
  • no correlation
  • achieved by multivariate data analysis techniques: PCA and cluster analysis

Goal
• Measuring impact of input data sets on program behavior
  • “far away” or weak clustering: different behavior
  • “close” or strong clustering: similar behavior
• Applications:
  • selecting representative program-input pairs
    • e.g., one program-input pair per cluster
    • e.g., take program-input pair with smallest dynamic instruction count
  • gaining insight into the influence of input data sets
  • profile-guided optimization

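The selection rule named on this slide (one program-input pair per cluster, preferring the smallest dynamic instruction count) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name `pick_representatives`, the pair names, the cluster labels and the instruction counts are all made up for the example.

```python
from collections import defaultdict

def pick_representatives(cluster_of, insn_count):
    """For each cluster, return the program-input pair with the fewest instructions."""
    members = defaultdict(list)
    for pair, cluster in cluster_of.items():
        members[cluster].append(pair)
    # Within a cluster behavior is assumed similar, so the shortest run is preferred.
    return {cluster: min(pairs, key=lambda p: insn_count[p])
            for cluster, pairs in members.items()}

# Made-up cluster labels and counts (billions of instructions), for illustration only.
cluster_of = {"progA/train": 0, "progA/ref": 0, "progB/ref": 1}
insn_count = {"progA/train": 3.0, "progA/ref": 90.0, "progB/ref": 40.0}
representatives = pick_representatives(cluster_of, insn_count)
```

Here the short-running input stands in for its whole cluster, which is how the approach trades simulation time against representativeness.
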
Overview
• Introduction
• Workload characterization
• Data analysis
  • Principal components analysis (PCA)
  • Cluster analysis
• Evaluation
• Discussion
• Conclusion

Workload characterization (1)
• Instruction mix
  • int, logic, shift&byte, load/store, control
• Branch prediction accuracy
  • bimodal (8K*2 bits), gshare (8K*2 bits) and hybrid (meta: 8K*2 bits) branch predictor
• Data and instruction cache miss rates
  • Five caches with varying size and associativity

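To make the "8K*2 bits" notation concrete, here is a rough sketch of how accuracy could be measured for a bimodal predictor: a table of 8K two-bit saturating counters indexed by branch address. The trace format, the modulo indexing and the initial counter value are assumptions for illustration; the slides do not specify them.

```python
def bimodal_accuracy(trace, entries=8192):
    """trace: iterable of (branch_address, taken) pairs.
    Returns the fraction of branches predicted correctly."""
    counters = [1] * entries  # 2-bit counters; starting at "weakly not-taken" is an assumption
    correct = 0
    total = 0
    for addr, taken in trace:
        idx = addr % entries              # simple modulo indexing (assumption)
        predicted_taken = counters[idx] >= 2
        correct += int(predicted_taken == taken)
        total += 1
        # Saturating update: move toward 3 when taken, toward 0 when not taken.
        if taken:
            counters[idx] = min(3, counters[idx] + 1)
        else:
            counters[idx] = max(0, counters[idx] - 1)
    return correct / total if total else 0.0

# Hypothetical trace: a loop branch taken 9 times, then falling through, repeated 10 times.
loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
accuracy = bimodal_accuracy(loop_trace)
```

A gshare predictor would differ only in the index, which combines the address with a global branch-history register.
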
Workload characterization (2)
• Number of instructions between two taken branches
• Instruction-Level Parallelism
  • IPC of an infinite-resource machine with only read-after-write dependencies
• In total: p = 20 variables

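One way to read the ILP metric is as a dataflow limit: every instruction issues one cycle after the last of its producers, and IPC is the instruction count divided by the length of the longest dependence chain. The sketch below assumes unit latency, dependencies through registers only, and a hypothetical `(dest, sources)` trace format; none of these details come from the slides.

```python
def ideal_ipc(instructions):
    """instructions: list of (dest_reg, [src_regs]) in program order.
    Unit latency; only read-after-write register dependencies constrain issue."""
    ready = {}      # register -> cycle in which its latest value is produced
    last_cycle = 0
    for dest, srcs in instructions:
        # Issue one cycle after the latest producer; cycle 1 if there is none.
        cycle = 1 + max((ready.get(r, 0) for r in srcs), default=0)
        if dest is not None:
            ready[dest] = cycle
        last_cycle = max(last_cycle, cycle)
    return len(instructions) / last_cycle if last_cycle else 0.0

# Hypothetical four-instruction trace: two independent loads feed an add, which feeds a shift.
example = [("r1", []), ("r2", []), ("r3", ["r1", "r2"]), ("r4", ["r3"])]
ipc = ideal_ipc(example)
```

Because write-after-read and write-after-write hazards are ignored, reusing a register name never delays an instruction in this model.
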
PCA
• Many program characteristics (variables) are correlated
• PCA computes new variables
  • p principal components PCi
  • linear combination of original characteristics
  • uncorrelated
  • contain same total variance over all benchmarks
  • Var[PC1] > Var[PC2] > Var[PC3] > …
  • most have near-to-zero variance (constant)
• reduce dimension of workload space to q = 2 to 4

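The properties listed on this slide can be checked numerically with a small PCA sketch. It assumes NumPy is available and standardizes each characteristic before the decomposition, which is an assumption here, not something the slides state; the function names `pca` and `retained_fraction` are illustrative.

```python
import numpy as np

def pca(X):
    """X: (n_pairs, p) matrix of program characteristics, no constant columns.
    Returns (scores, variances) with components sorted by decreasing variance."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize (assumption)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                              # each PC is a linear combination
    return scores, eigvals

def retained_fraction(variances, q):
    """Fraction of the total variance explained by the first q components."""
    return variances[:q].sum() / variances.sum()

# Synthetic data: two strongly correlated characteristics plus one independent one.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
X = np.column_stack([a, 2 * a + 0.1 * rng.normal(size=50), rng.normal(size=50)])
scores, variances = pca(X)
```

With correlated inputs, the trailing components carry almost no variance, which is what justifies keeping only q of them.
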
PCA: Interpretation
• Principal Components (PC) along main axes of ellipse
• Var(PC1) > Var(PC2) > ...
• PC2 is less important to explain variation over program-input pairs
• Reduce the number of PCs: throw out PCs with negligible variance
[Figure: ellipse of data points with axes Variable 1 and Variable 2; PC 1 and PC 2 drawn along the main axes of the ellipse]

Cluster analysis
• Hierarchic clustering
• Based on distance between program-input pairs
• Can be represented by a dendrogram

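As an illustration of distance-based hierarchic clustering, the sketch below repeatedly merges the two closest clusters and records the linkage distance of every merge, which is the information a dendrogram displays. Euclidean distance and complete linkage are assumptions made for the example, and the point names are made up; the slides do not say which distance or linkage rule was used.

```python
import math

def hierarchical_clustering(points):
    """points: dict mapping a pair name to its coordinates in the reduced space.
    Returns the merges as (cluster_a, cluster_b, linkage_distance), in merge order."""
    clusters = [frozenset([name]) for name in points]

    def linkage(a, b):
        # Complete linkage (assumption): the largest distance between any two members.
        return max(math.dist(points[x], points[y]) for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Made-up coordinates: "a" and "b" lie close together, "c" is far away.
merges = hierarchical_clustering({"a": (0, 0), "b": (0, 1), "c": (5, 5)})
```

Pairs that merge at a small linkage distance behave similarly; a pair that only joins at a large distance is isolated in the workload space.
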
Methodology
• Benchmarks
  • SPECint95
    • Inputs from SPEC: train and ref
    • Inputs from the web (ijpeg)
    • Reduced inputs (compress)
  • TPC-D on postgres v6.3
  • Compiled with -O4 on Alpha
  • 79 program-input pairs
• ATOM
  • Instrumentation
  • Measuring characteristics
• STATISTICA
  • Statistical analysis

GCC: principal components
• 2 PCs: 96.9% of total variance

GCC
[Figure: GCC inputs plotted in the space of the principal components. Plot annotations: 7 inputs; High I-cache miss rates; High branch prediction accuracy; High D-cache miss rates; Many control & shift insn; Many LD/STs and ILP. Labeled inputs: explow, recog, toplev, emit-rtl, protoize, cp-decl, expr, insn-emit, varasm, insn-recog, reload1, dbxout, print-tree]

Workload space: 4 PCs -> 93.1%
• ijpeg, compress and go are isolated
  • Go: low branch prediction accuracy
  • Compress: high data cache miss rate
  • Ijpeg: high LD/STs rate, low ctrl ops rate

Workload space
[Figure: workload space plot, annotated "strong clustering"]

Small versus large inputs
• Vortex:
  • Train: 3.2B insn
  • Ref: 92.5B insn
  • Similar behavior: linkage distance ~ 1.4
• Not for m88ksim
  • Linkage distance ~ 4
• Reference input for compress can be reduced without significantly impacting behavior: 2B vs. 60B instructions

Impact of input on behavior
• For TPC-D queries:
  • Weak clustering
  • Large impact
  • I-cache behavior
• In general: variation between programs is larger than the variation between input sets for the same program
• However: there are exceptions where input has large impact on behavior, e.g., TPC-D and perl

Conclusion
• Workload design
  • representative
  • not long running
• Principal Components Analysis (PCA) and cluster analysis help in detecting input data sets resulting in similar or different behavior of a program
• Applications:
  • workload design: representativeness while taking into account simulation time
  • impact of input data sets on program behavior
  • profile-guided optimizations