Core-Selectability in Chip-Multiprocessors. Hashem H. Najaf-abadi Niket K. Choudhary Eric Rotenberg. Dividing the Design A definition. Processing Cores. All levels of cache Interconnect Ports to Memory and IO. What this Talk is About. How to improve performance of a CMP.
Core-Selectability in Chip-Multiprocessors. Hashem H. Najaf-abadi, Niket K. Choudhary, Eric Rotenberg
Dividing the Design: A definition. Processing Cores. All levels of cache. Interconnect. Ports to Memory and I/O.
What this Talk is About: How to improve the performance of a CMP by improving the processing, so that the full potential of the interconnect can be exploited. The interconnect is not fully utilized by all workloads; where it is, there is nothing to gain here.
The Provisioning Factor: Balance in provisioned resources. Additional cores need ports to the interconnect. If the same interconnect is enough for a quad-core, then it was over-provisioned for a dual-core.
The Provisioning Factor: Balance in provisioned resources. Consider some technique that boosts general performance. If the design is still well provisioned with the same interconnect, then it must have been over-provisioned in the baseline.
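The provisioning argument on the two slides above can be sketched with a toy linear model. The capacity and per-core demand values below are made-up assumptions, not figures from the talk, and real interconnect demand need not scale linearly with core count.

```python
# Toy model (assumption): interconnect demand grows linearly with core count.
# CAPACITY and DEMAND_PER_CORE are hypothetical values in arbitrary units.

def utilization(num_cores, demand_per_core, capacity):
    """Fraction of the interconnect capacity that the cores consume."""
    return num_cores * demand_per_core / capacity

CAPACITY = 8.0         # hypothetical interconnect bandwidth
DEMAND_PER_CORE = 2.0  # hypothetical demand of one core

dual = utilization(2, DEMAND_PER_CORE, CAPACITY)
quad = utilization(4, DEMAND_PER_CORE, CAPACITY)

# If the quad-core just fits, the dual-core leaves half the interconnect idle.
dual_headroom = 1.0 - dual

print(f"dual-core utilization: {dual:.2f}, headroom: {dual_headroom:.2f}")
print(f"quad-core utilization: {quad:.2f}")
```

Under these assumed numbers the same interconnect that exactly suffices for four cores is only half used by two, which is the sense in which the dual-core baseline was over-provisioned.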
The Underutilization Factor: Interconnect not fully utilized by all applications. Workloads that depend the most on the interconnect have a louder say in what constitutes a well-provisioned design.
The One-size-fits-all Factor: A single solution has limited performance. Trade-offs: RISC vs. CISC, wide vs. narrow issue, deep vs. shallow pipelining, large vs. small issue queue. Changing these trade-offs will improve performance for some workloads and degrade it for others. He’s not much for a conversation. But if he was, it would be a conversation about saving you execution time.
The Shrinking Factor: Progressively less die area for the cores. Better return on increasing the interconnect resources.
The Shrinking Factor: Progressively less die area for the cores. [Figure: percentage (axis ticks 10% to 100%) plotted against year (axis ticks 1990 to 2010) for the Intel 8086, 8088, 80286, Intel386, 486DX, Pentium, Pentium III, Pentium IV and Core Duo, the IBM Power3, Power4, Power5 and Power6, and Niagara-1 and Niagara-2.]
The Diversity Factor: Can provide diversity in the core designs. Single Core Design: optimized for all workloads. [Figure labels: Program 1, Program 2.]
The Diversity Factor: Can provide diversity in the core designs. Heterogeneous Cores: optimized for workload. [Figure labels: Code 1, Code 2.]
Core-Selectability: Optimized for workload. [Figure labels: Program 1, Program 2.]
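The contrast between a single core design tuned for all workloads and a core selected per workload can be illustrated with a small sketch. The core names, program names, and normalized execution times below are invented for illustration only; they are not results from the talk.

```python
# Hypothetical normalized execution times (lower is better).
# All names and numbers are illustrative assumptions.
exec_time = {
    "core_A": {"program_1": 1.00, "program_2": 0.80},
    "core_B": {"program_1": 0.70, "program_2": 1.20},
}
programs = ["program_1", "program_2"]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# One-size-fits-all: pick the single core with the best average time.
single_core = min(exec_time, key=lambda c: mean(exec_time[c][p] for p in programs))
single_avg = mean(exec_time[single_core][p] for p in programs)

# Selectable: each program runs on whichever core suits it best.
selection = {p: min(exec_time, key=lambda c: exec_time[c][p]) for p in programs}
selectable_avg = mean(exec_time[selection[p]][p] for p in programs)

print(f"single core {single_core}: average {single_avg:.2f}")
print(f"per-program selection {selection}: average {selectable_avg:.2f}")
```

With these made-up numbers the best single core averages 0.90, while choosing a core per program averages 0.75. The selected average can never be worse than the single-core average, because each program's minimum is at most its time on the single chosen core.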
Core-Selectability Selectability
Recap: One-size-fits-all Factor, Provisioning Factor, Shrinking Factor, Underutilization Factor, Diversity Factor. Core-Selectability. Port Sharing. Can improve performance without increasing power density. Results in a homogeneous design. Can reduce verification effort by splitting up the workload space.
Empirical Evaluation: Based on FabScalar • A library of synthesized implementations of different configurations of the microarchitectural units of a contemporary superscalar processor.
The selection of cores. [Figure: normalized execution time.]
On Individual Benchmarks. [Figure: normalized execution time.]
The Effect of Selectability. [Figure: normalized execution time.]
Under Different Task Arrival Patterns: Average task turnaround time for (a) normal traffic and (b) bursty traffic.
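As a rough illustration of the metric on this slide, the sketch below computes average task turnaround time (completion time minus arrival time) for evenly spaced and bursty arrivals. It assumes a single core serving tasks first-come-first-served with a fixed service time; the arrival times and service times are hypothetical, and this is not the evaluation setup used in the talk.

```python
# Assumptions: one core, first-come-first-served, fixed service time per task.
# Arrival patterns and service times are hypothetical.

def average_turnaround(arrivals, service_time):
    """Mean of (completion time - arrival time) over all tasks."""
    free_at = 0.0  # time at which the core next becomes idle
    total = 0.0
    for arrival in sorted(arrivals):
        start = max(arrival, free_at)   # wait if the core is busy
        free_at = start + service_time
        total += free_at - arrival
    return total / len(arrivals)

evenly_spaced = [0.0, 2.0, 4.0, 6.0]
bursty = [0.0, 0.0, 0.0, 6.0]

even_avg = average_turnaround(evenly_spaced, service_time=1.0)
bursty_avg = average_turnaround(bursty, service_time=1.0)
bursty_fast_avg = average_turnaround(bursty, service_time=0.5)

print(even_avg, bursty_avg, bursty_fast_avg)
```

In this toy setting, bursts make tasks queue behind one another, so a shorter execution time per task reduces turnaround by more than the execution-time saving alone.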
Implementation of Port Sharing: 26 ps added propagation delay. [Figure labels: Core A, Core B, L1 Data Cache, core-selection, extra switching, extra wire (100 fF).]
Overhead of Reconfigurability: • With reconfigurability, change is implemented within a core, with complex coupling between pipeline stages. • With Core-Selectability, change is implemented at the core level, with less complex coupling between core and interconnect.
Thank you. It’s as if he knows you like to save execution time.