Toward a Sustainable Architecture at Extreme Scale Zhimin Tang, CTO tangzhm@sugon.com
Outline • Sustainable (Cost-Effective) HPC • Counterexamples in history • Current and Future Challenges • New computing forms from sensor to cloud • Silicon-based IC process approaching its physical limit • Strategy • Abandon HPC-only acceleration features • Design a sustainable architecture for HPC and other applications
Considerations of Cost Effectiveness or Sustainability • Application (Algorithm) Requirements • High performance • Technology Constraints • CMOS vs. bipolar, Moore’s Law • Commercial MPU vs. custom ASIP • Economic Feasibility • Good ecosystem • Mass production • Low energy consumption
HPCs in History • Vector Supercomputers • CMOS dominated, SIMD weakness
Connection Machine • SIMD PE Array • Optimal only for some algorithms • Custom chips, tiny processors
MIMD with Custom CPUs • Chip Level Integration (SoC) • nCube/2, KSR-1 (COMA), … • High NRE cost due to custom design without mass production • Low node processor performance
Why Not Cost-Effective • HPC Is a Small Market • Architectures Designed Only for HPC • Lower volume, higher cost (NRE) • Not enough resources to implement a top-level (in performance) solution • Longer time-to-market, falling behind Moore’s Law • Result: COTS Solutions in the Last 20 Years • Commercial off-the-shelf • Co-design with the IT Ecosystem • From cloud computers to sensors
Ecosystem Requirements • High Performance and Low Cost • Low cost remains a must • New cost factors: energy/power, large NRE • Performance is no longer the bottleneck • for most applications • like cars, trains, and airplanes in transportation • New measures of performance • Computing: MIPS/MFLOPS • Transaction processing: TPM • Cloud applications: requests serviced per unit time
Energy Efficiency • Two Ends of the Computing System • Cloud: large-scale power dissipation • Terminal: limited battery life • Energy: compute < memory < communication • For each FLOP in Linpack • FPU spends 10 pJ, memory access 475 pJ • Wireless Sensor Network • RF radio consumes most of the power • What Do We Need Besides Locality?
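The Linpack figures above can be turned into a quick energy budget. The following is a minimal sketch, not a measurement: the two constants come from the slide, while the `accesses_per_flop` parameter (how many memory accesses each FLOP triggers on average) and both function names are assumptions introduced here to show how locality changes the total.

```python
# Figures quoted on the slide (per FLOP in Linpack).
FPU_PJ_PER_FLOP = 10.0      # energy spent in the FPU
MEM_PJ_PER_ACCESS = 475.0   # energy spent on the memory access

def energy_per_flop_pj(accesses_per_flop: float = 1.0) -> float:
    """Total energy per FLOP in pJ.

    accesses_per_flop is a hypothetical knob: 1.0 reproduces the slide's
    case, smaller values model better locality (fewer memory accesses).
    """
    return FPU_PJ_PER_FLOP + accesses_per_flop * MEM_PJ_PER_ACCESS

def memory_share(accesses_per_flop: float = 1.0) -> float:
    """Fraction of the per-FLOP energy that goes to memory access."""
    total = energy_per_flop_pj(accesses_per_flop)
    return accesses_per_flop * MEM_PJ_PER_ACCESS / total

if __name__ == "__main__":
    for a in (1.0, 0.5, 0.1):
        print(f"accesses/FLOP={a}: {energy_per_flop_pj(a):.1f} pJ, "
              f"memory share {memory_share(a):.1%}")
```

Under this simple model memory access dominates the budget even when only one FLOP in ten touches memory, which is why the slide asks what is needed beyond locality.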
Need for New Architecture • Architecture Consuming Less Energy • Many-core, custom designed for applications • Flattened software stack • Architecture for New Performance Metrics • High-volume throughput computers • New Algorithms and Methodology • Complexity of computation • Complexity of memory access and communication
Constraints on Innovation • Existing Software Ecosystem • standard or de facto interfaces • e.g., ISA: Instruction Set Architecture • Pro: software compatibility • Con: obstacle to innovation, legacy • Huge Development Expenses • a new architecture needs new processors • NRE of chip development is increasing rapidly as the CMOS process approaches its limit • NRE: Non-Recurring Engineering
CMOS Technology • Approaching Its Limit, and No Replacement! • Moore’s law: 7 nm @ 2024, ~30 atoms • Different from the Transition in the 1990s • Bipolar (ECL/TTL) is faster, but consumes much more power • CMOS had been developed for 20 years: not too slow, low cost, and low power • But Now, Liquid Cooling for CMOS • In the foreseeable future, still CMOS
More and More than Moore • [Figure: 2011 ITRS Executive Summary, Fig. 4]
Dark Silicon • At 8 nm, more than half of the transistors must be turned off • Speedup of 4-8x over 5 process generations • Sources: ISCA’11, IEEE Micro’12, CACM’13
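The 4-8x figure is easier to judge in per-generation terms. A minimal sketch, assuming the speedup compounds evenly across the five generations; the function name and the comparison against a doubling per generation are illustrative assumptions, not taken from the slides.

```python
def per_generation_factor(total_speedup: float, generations: int) -> float:
    """Even per-generation growth factor that compounds to total_speedup."""
    return total_speedup ** (1.0 / generations)

if __name__ == "__main__":
    generations = 5
    for total in (4.0, 8.0):
        factor = per_generation_factor(total, generations)
        print(f"{total:.0f}x over {generations} generations "
              f"is about {factor:.2f}x per generation")
    # Assumed reference point: doubling every generation would compound
    # to 2**5 = 32x over the same five generations.
    print(f"doubling each generation: {2 ** generations}x")
```

Under these assumptions the projected gain is roughly 1.3x to 1.5x per generation, well short of a doubling.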
Economic Feasibility • Moore’s Law Provides More Transistors • But switching speed is no longer increasing • Process development at the nanometer scale increases NRE tremendously • Mass Production Is Essential • Otherwise, the chip business is not sustainable • Advantages of general-purpose processors • How about Many-core Processors? • GPU, Tilera, MIC, …
Pros and Cons of MPU • Most Advanced Process, Mass Production • Stable, reliable, low cost • Mature ecosystem and solutions • Not Optimal for Many Applications • Aim: not too bad for most applications • Over-allocation of resources • Wasted resources, higher energy consumption
MPU Not Good for Cloud • High L1-I Cache Miss Rate • Processor idle (instruction starvation) • Small ILP and MLP • Wide issue not effective • Low Efficiency of Memory Access • Large L3 takes ½ of the chip area but does not improve performance • Useless High On-chip Bandwidth • Little data sharing among cores
Low Utilization of Resources • Only 1/3 are frequently used • [Figure: chip floorplan; block labels: GPU, L2 Cache (x4), OOOFPU (x4), L3 Cache]
Pros and Cons of ASIP • Optimally Designed for Some Applications • high efficiency, low resource usage, low power • But There Is No Free Lunch • Much design/verification work • Stability/Reliability? • May affect the time to market • How to amortize the huge NRE • Small market means high cost
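The amortization argument can be written as simple arithmetic: the one-time NRE is spread over every chip sold. A minimal sketch under that assumption; the function name and all dollar figures and volumes below are hypothetical placeholders chosen for illustration, not data from the talk.

```python
def amortized_cost_per_chip(nre: float, volume: int, unit_cost: float) -> float:
    """Per-chip cost when one-time NRE is spread evenly over `volume` chips."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre / volume + unit_cost

if __name__ == "__main__":
    NRE = 100e6        # hypothetical one-time design cost, in dollars
    UNIT_COST = 50.0   # hypothetical manufacturing cost per chip, in dollars
    # Hypothetical volumes: a small HPC-only run vs. a mass-market run.
    for volume in (10_000, 10_000_000):
        cost = amortized_cost_per_chip(NRE, volume, UNIT_COST)
        print(f"volume={volume:>10,}: ${cost:,.2f} per chip")
```

With these placeholder numbers the NRE term dominates at low volume and nearly vanishes at high volume, which is the sense in which a small market means high cost.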
MPU + Accelerator • GPU • Pro: mass production • Con: PCIe overhead, small memory size • MIC PHI • Mass production possible? • FPGA • Resource utilization • Ease of programming • MPU interface, e.g., QPI or PCIe
Design of New Processors • Crossing the Gap between General and Special • Many Simple Cores • Reduce power consumption • Multiple Hardware Threads in Each Core • Massive threads on chip • Exploit concurrency, tolerate latency • Dynamic Scheduling of On-chip Threads • Improve performance for general apps
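How multiple hardware threads tolerate latency can be illustrated with a textbook-style model. This is a sketch under stated assumptions, not the design described in the talk: each thread is assumed to compute for `run_cycles` and then stall for `stall_cycles` (for example on a memory access), thread switching is assumed to be free, and all names and numbers are hypothetical.

```python
def core_utilization(threads: int, run_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles the core does useful work in the assumed model.

    Each thread occupies the pipeline for run_cycles out of every
    (run_cycles + stall_cycles); other threads fill the stall time.
    """
    if threads < 1 or run_cycles < 1 or stall_cycles < 0:
        raise ValueError("invalid parameters")
    return min(1.0, threads * run_cycles / (run_cycles + stall_cycles))

def threads_to_hide_latency(run_cycles: int, stall_cycles: int) -> int:
    """Smallest thread count that keeps the core fully busy in this model."""
    period = run_cycles + stall_cycles
    return -(-period // run_cycles)  # integer ceiling division

if __name__ == "__main__":
    RUN, STALL = 10, 90  # hypothetical cycle counts
    for n in (1, 4, threads_to_hide_latency(RUN, STALL)):
        print(f"{n:>2} threads: utilization {core_utilization(n, RUN, STALL):.0%}")
```

In this model a single thread leaves the core mostly idle, and utilization grows linearly with thread count until the stall time is fully covered.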
Combining Multithreading and Vector Pipelining • [Figure: pipelined vector processing engine; block labels: Vector Registers, IR, RF, ID, I$, D$/SPM; annotations: Switch to single thread, Deep scalar pipeline, Switch to vector pipeline]
Thread Parallelism and Data Parallelism in Two Dimensions • [Figure: two pipelines, each labeled IR, RF, ID, I$, D$/SPM, sharing a Vector Register File; annotations: Deep thread parallelism and data parallelism, Wide data parallelism, Wide thread parallelism]
In Conclusion • A Universal Architecture • Scalable and reconfigurable processor array • Supports thread-level and data-level parallelism • Fulfills All Requirements from Terminal to Cloud Data Center • High performance computers • Cloud computing servers • Core network equipment • Terminals for cloud and mobile Internet