Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman, Nov 2007
Overview • Previous Architectures • New Hybrid Architecture • Possible Benefits • Scrutiny • Experimental Results • Relation to Project
Something Old • CMP (Chip Multiprocessors) • Two or more independent cores on a single chip • Single-ISA heterogeneous multiprocessors • Cores of varying size and performance • All cores share the same ISA • Improve throughput for multi-threaded workloads • What about single-threaded performance?
Superscalar • Increases performance without recompiling • Efficiently handles runtime events: • Branch direction • Branch target address • Load latency • Memory dependences • Limitation: exploitable ILP is bounded by the hardware instruction window
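The instruction-window limit can be illustrated with a toy timing model. This is a minimal sketch under assumptions that are not taken from [1]: unlimited functional units, perfect branch prediction, in-order retirement, and a hypothetical function name `window_limited_cycles`. An instruction may enter the window only after the instruction `window` positions older has retired, and it may start only after its producers finish.

```python
from typing import List, Sequence


def window_limited_cycles(
    latency: Sequence[int], deps: Sequence[Sequence[int]], window: int
) -> int:
    """Cycles to run a trace on an idealized out-of-order core (toy model).

    latency[i]: execution latency of instruction i (program order)
    deps[i]:    indices of earlier instructions whose results i consumes
    window:     number of instruction-window entries
    """
    finish: List[int] = []  # completion cycle of each instruction
    retire: List[int] = []  # in-order retirement cycle of each instruction
    for i, lat in enumerate(latency):
        # Data dependences: wait for every producer to finish.
        start = max((finish[d] for d in deps[i]), default=0)
        # Window limit: a slot frees only when instruction i - window retires.
        if i >= window:
            start = max(start, retire[i - window])
        finish.append(start + lat)
        # Retirement happens in program order.
        retire.append(max(finish[-1], retire[-1] if retire else 0))
    return max(finish, default=0)


if __name__ == "__main__":
    independent = [4] * 8  # eight independent 4-cycle operations
    no_deps = [[] for _ in independent]
    for w in (2, 4, 8):
        print(w, window_limited_cycles(independent, no_deps, w))
```

For eight independent operations the model gives 16, 8 and 4 cycles with windows of 2, 4 and 8 entries: the available ILP is only exposed as the window grows, while a serial dependence chain takes the same time regardless of window size.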
VLIW • Very Long Instruction Word • Shifts hardware complexity to the compiler • High clock frequency • Energy-efficient • No hardware needed to analyze data dependences • No hardware scheduling of independent instructions (the compiler does it)
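Because the VLIW compiler, not the hardware, packs independent operations into each wide instruction, a greedy list scheduler gives a feel for what that entails. The sketch below is a simplified illustration with hypothetical names (`schedule_bundles`, `width`); it is not the Trimaran scheduler. Empty bundles stand for cycles of NOPs that the compiler must insert to cover a pending result's latency.

```python
from typing import List, Optional, Sequence


def schedule_bundles(
    latency: Sequence[int], deps: Sequence[Sequence[int]], width: int
) -> List[List[int]]:
    """Greedy compile-time list scheduling into VLIW bundles (toy model).

    Each bundle holds up to `width` operations that issue in the same cycle.
    An empty bundle is a cycle of NOPs. `deps` must form an acyclic graph and
    every latency must be at least 1.
    """
    n = len(latency)
    finish: List[Optional[int]] = [None] * n  # cycle when each result is ready
    bundles: List[List[int]] = []
    cycle = 0
    while any(f is None for f in finish):
        bundle: List[int] = []
        for i in range(n):
            if finish[i] is not None or len(bundle) == width:
                continue
            # An op is ready once all producers' results are available.
            if all(finish[d] is not None and finish[d] <= cycle for d in deps[i]):
                bundle.append(i)
        for i in bundle:
            finish[i] = cycle + latency[i]
        bundles.append(bundle)
        cycle += 1
    return bundles


if __name__ == "__main__":
    # op 0: 2-cycle load; op 1: independent add; op 2 uses op 0; op 3 uses ops 1 and 2
    print(schedule_bundles([2, 1, 1, 1], [[], [], [0], [1, 2]], width=2))
    # [[0, 1], [], [2], [3]]
```

The empty second bundle shows the trade-off from the slide: the hardware never checks dependences, so the schedule itself must encode the load latency.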
Something New • Dual-core architecture [1]: a VLIW core and a superscalar core • Bus-based snooping for cache coherence • Cores communicate through the L2 cache • Future work: • Dedicated interconnections • A small operand transfer buffer
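The slides say only that the two cores use bus-based snooping and communicate through L2; the coherence protocol is not given here. The sketch below assumes a textbook MSI invalidation protocol, with hypothetical names (`SnoopyBus`, `State`), to show how a value written by one core reaches the other: the owner's dirty line is written back to L2 when the other core's read is snooped.

```python
from enum import Enum
from typing import Dict, List


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class SnoopyBus:
    """Toy MSI protocol: private caches on a shared bus in front of a shared L2."""

    def __init__(self, n_cores: int) -> None:
        self.caches: List[Dict[int, State]] = [{} for _ in range(n_cores)]
        self.writebacks: List[int] = []  # line addresses flushed to L2

    def state(self, core: int, addr: int) -> State:
        return self.caches[core].get(addr, State.INVALID)

    def read(self, core: int, addr: int) -> None:
        if self.state(core, addr) is not State.INVALID:
            return  # read hit: no bus traffic
        # Read miss: other caches snoop the request; a Modified owner
        # writes back to L2 and downgrades to Shared.
        for other, cache in enumerate(self.caches):
            if other != core and cache.get(addr) is State.MODIFIED:
                self.writebacks.append(addr)
                cache[addr] = State.SHARED
        self.caches[core][addr] = State.SHARED

    def write(self, core: int, addr: int) -> None:
        if self.state(core, addr) is State.MODIFIED:
            return  # write hit on an exclusive dirty copy
        # Other caches snoop the invalidation; a Modified owner writes back first.
        for other, cache in enumerate(self.caches):
            if other == core:
                continue
            if cache.get(addr) is State.MODIFIED:
                self.writebacks.append(addr)
            if addr in cache:
                cache[addr] = State.INVALID
        self.caches[core][addr] = State.MODIFIED


if __name__ == "__main__":
    bus = SnoopyBus(2)  # core 0 = superscalar, core 1 = VLIW (illustrative labels)
    bus.write(0, 0x40)  # superscalar core produces a value
    bus.read(1, 0x40)   # VLIW core reads it; the dirty line goes through L2
    print(bus.state(0, 0x40), bus.state(1, 0x40), bus.writebacks)
```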
Potential Benefits • The VLIW core can run at a high clock rate • The superscalar core can be kept simple • More aggressive compiler optimization, enabled by the superscalar core's speculative operations • Simple hardware • Energy-efficient • Scalable
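Hybrid Compiler • At the thread level (TLP), aware of: • Execution bandwidth of each core • Clock frequency of each core • At the instruction level (ILP), aware of: • Architectural details of the superscalar core? • Number of functional units and their latencies in the VLIW core • Helper threads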
Optimization Phases • Phase 1 • Exploit speculative threads (helper threads) • Phase 2 • Extract non-speculative multi-grain parallelism • Partition the source code into: • Predictable regions (identified by static analysis or profiling) • Unpredictable regions (suited to the superscalar core) • And much more …
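The predictable/unpredictable split in Phase 2 can be pictured as a profile-driven classification. The sketch below is a hypothetical heuristic, not the partitioning algorithm of [1]: the metrics (branch misprediction rate, load miss rate), the thresholds, the names `RegionProfile` and `partition`, and the choice to send predictable regions to the VLIW core are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class RegionProfile:
    name: str
    branch_mispredict_rate: float  # fraction of dynamic branches mispredicted
    load_miss_rate: float          # fraction of dynamic loads that miss in cache


def partition(
    regions: Sequence[RegionProfile],
    max_mispredict: float = 0.05,  # hypothetical threshold
    max_miss: float = 0.02,        # hypothetical threshold
) -> Dict[str, List[str]]:
    """Assign profiled code regions to a core type (toy heuristic)."""
    plan: Dict[str, List[str]] = {"vliw": [], "superscalar": []}
    for r in regions:
        predictable = (
            r.branch_mispredict_rate <= max_mispredict
            and r.load_miss_rate <= max_miss
        )
        # Assumption: predictable regions suit static VLIW scheduling; the rest
        # go to the superscalar core, which handles runtime events in hardware.
        plan["vliw" if predictable else "superscalar"].append(r.name)
    return plan


if __name__ == "__main__":
    profile = [
        RegionProfile("dense_loop", 0.01, 0.005),
        RegionProfile("pointer_chase", 0.12, 0.30),
    ]
    print(partition(profile))
    # {'vliw': ['dense_loop'], 'superscalar': ['pointer_chase']}
```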
Did that sound right? • Will the data be in the L2 cache when the VLIW core needs it?
Pre-Execution • Not a new idea • Use the superscalar core to minimize L2 miss stalls • An L2 miss stalls the VLIW pipeline • Are load latencies predictable? • Cache profiling
Definitions • Delinquent loads • The small number of static load instructions responsible for the majority of data cache misses • Delinquent load threshold • A preset limit on the number of stall cycles a static load instruction may cause; loads exceeding it are treated as delinquent
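Applying the threshold definition to a cache profile is straightforward. In this minimal sketch the profile format (static load PC mapped to total stall cycles), the function names `delinquent_loads` and `stall_coverage`, and the sample numbers are all hypothetical; the point is only that a threshold picks out the few loads behind most of the stalls.

```python
from typing import Dict, List


def delinquent_loads(stall_cycles: Dict[int, int], threshold: int) -> List[int]:
    """PCs of static loads whose profiled stall cycles exceed `threshold`, worst first."""
    hot = [pc for pc, cycles in stall_cycles.items() if cycles > threshold]
    return sorted(hot, key=lambda pc: stall_cycles[pc], reverse=True)


def stall_coverage(stall_cycles: Dict[int, int], selected: List[int]) -> float:
    """Fraction of all profiled load stall cycles caused by the selected loads."""
    total = sum(stall_cycles.values())
    return sum(stall_cycles[pc] for pc in selected) / total if total else 0.0


if __name__ == "__main__":
    # Made-up profile: static load PC -> total stall cycles.
    profile = {0x400: 9000, 0x408: 50, 0x410: 700, 0x418: 20, 0x420: 30}
    hot = delinquent_loads(profile, threshold=500)
    print([hex(pc) for pc in hot], round(stall_coverage(profile, hot), 3))
```

With this made-up profile, two of the five static loads exceed the threshold and account for about 99% of the stall cycles, which is the pattern the definition describes.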
Pre-Execution Thread • Make load operations non-faulting, so a bad speculative address raises no exception • Remove all store operations, so the thread cannot modify memory
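The two transformations on this slide can be sketched over a simple instruction list. The `Instr` record and `make_pre_execution_thread` are hypothetical names for illustration; a real compiler would typically also prune the thread to the instructions that feed the delinquent loads, which this sketch omits.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                     # e.g. "load", "store", "add"
    dst: Optional[str]          # destination register, None for stores
    srcs: Tuple[str, ...]       # source registers
    non_faulting: bool = False  # loads only: suppress exceptions on bad addresses


def make_pre_execution_thread(code: List[Instr]) -> List[Instr]:
    """Derive a side-effect-free helper thread from a code region (toy sketch)."""
    thread: List[Instr] = []
    for ins in code:
        if ins.op == "store":
            continue  # drop stores so the helper never changes memory state
        if ins.op == "load":
            # A wrong speculative address must not raise an exception.
            ins = replace(ins, non_faulting=True)
        thread.append(ins)
    return thread


if __name__ == "__main__":
    region = [
        Instr("load", "r1", ("r0",)),
        Instr("add", "r2", ("r1", "r1")),
        Instr("store", None, ("r2", "r0")),
        Instr("load", "r3", ("r2",)),
    ]
    for ins in make_pre_execution_thread(region):
        print(ins)
```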
Evaluation • Simulated Cores [1]
Evaluation (2) • Hybrid compiler built on the Trimaran compiler • A cycle-accurate model, based on integrating: • The Trimaran VLIW simulator • The SimpleScalar superscalar simulator
Evaluation (3) • Seven single-threaded applications from • SPEC 2000 INT • SPEC 92 FP
Relation? • Relation to the course project • The project focuses on the scalability of optimization techniques • Relation to the course • How multi-core processors can help single-threaded applications
Reference • [1] Yan J., Zhang W., "Hybrid multi-core architecture for boosting single-threaded performance", ACM SIGARCH Computer Architecture News 35(1): 141-148, 2007