Architectural Musings: Rethinking Computer Systems Architecture • Christopher Vick • cvick@qualcomm.com • June 3, 2012
Introduction • Vision Talk • Mobile computing and current technologies fundamentally change key parameters and constraints for computer system architecture • Vast new research opportunities of great interest and relevance to industry
Outline • Computer System Architecture • Then (Circa 1970) • Scarce Resources & Bottlenecks • Optimizations • Now (Mobile Computing Platforms) • Scarce Resources & Bottlenecks • Optimizations? • Qualcomm Research • Questions?
Computer System Architecture • Hardware • The 5 classic components (Patterson & Hennessy) • Input, Output, Memory, Datapath, Control • Software • System Virtual Machine (Hypervisor, VM, or VMM) • Operating System • Compilers & Tools • Definitions • The way components fit together • The arrangement of the various devices in a complete computer system or network • The instruction set plus a model of the execution of the instruction set (Amdahl et al.) • Computer System Architecture • The selection and combination of hardware and software components to assemble an effective computer system
Effective • An optimization problem • Many variables • Selection of hardware/software components • Selection of interfaces/interconnects • Many constraints • Physical, sociological, technical & cost constraints • Scarce Resources and Bottlenecks • Maximize utilization of scarce resources • Minimize impact of bottlenecks
Scarce Resources (Then, circa 1970) • CPU Cycles • CPUs expensive • Slow clock rates • Memory Locations • Random Access Memory expensive • Address/Data paths into CPU expensive • Skilled Programmers • Relatively new discipline • Poor language and tools support
Bottlenecks • Programmer Productivity • Software development slow and expensive • Low level programming paradigms • Memory Latency • RAM latency gated overall speed (~2-3 MHz) • Small RAM backed by vastly slower storage • I/O Bandwidth • Limited CPU connectivity • Crude communication mechanisms
Optimizations • Time Sharing • Effective sharing of a limited resource • Virtual Memory • Effective sharing, backed by a cheaper storage alternative • Hardware Improvements • Smaller features provide more resources and faster clocks • Large Scale Integration • Better signaling to improve bandwidth • High Level Programming Languages • Broadens the productive programmer community • Abstracts away some hardware complexity
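To make the virtual-memory bullet concrete, here is a minimal Python sketch of demand paging: a small physical memory backed by a larger, slower store, with page-level address translation. The class name `ToyVirtualMemory`, the 512-byte page size, and FIFO replacement are illustrative assumptions, not details of any particular 1970s machine.

```python
from collections import OrderedDict

PAGE_SIZE = 512  # bytes per page; an illustrative value, not a historical parameter


class ToyVirtualMemory:
    """Hypothetical toy model: few physical frames, many virtual pages."""

    def __init__(self, num_frames: int):
        self.page_table = OrderedDict()  # resident virtual page -> physical frame
        self.free_frames = list(range(num_frames))
        self.page_faults = 0

    def translate(self, vaddr: int) -> int:
        """Map a virtual address to a physical one, paging in on a miss."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage not in self.page_table:
            self.page_faults += 1
            if self.free_frames:
                frame = self.free_frames.pop(0)
            else:
                # Evict the oldest resident page (FIFO). A real OS would first
                # write it back to the backing store if it had been modified.
                _, frame = self.page_table.popitem(last=False)
            self.page_table[vpage] = frame
        return self.page_table[vpage] * PAGE_SIZE + offset


vm = ToyVirtualMemory(num_frames=2)
for vaddr in (0, 600, 1100, 10):
    print(vaddr, "->", vm.translate(vaddr))
print("page faults:", vm.page_faults)
```

With two frames, touching a third page evicts the oldest resident page, so a later access to that page faults again. Real systems add dirty bits, protection, and better replacement policies; FIFO is used here only because it is the simplest to follow.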
Examples • Digital PDP-11 • 16-bit address space • Orthogonal instruction set • Memory-mapped I/O • Unix, DOS, many others • IBM System/370 • 24-bit address space • Virtual Memory • MVS, VM/370, DOS/VS • Backward compatibility with System/360
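The address widths above translate directly into addressable memory. A short sketch of the arithmetic, assuming flat byte addressing and counting only what one address space can reach (it ignores any physical-memory extensions); the helper names are hypothetical:

```python
def address_space_bytes(address_bits: int) -> int:
    """Bytes reachable through a flat, byte-addressed space of the given width."""
    return 2 ** address_bits


def bits_needed(num_bytes: int) -> int:
    """Smallest address width that can byte-address num_bytes."""
    return (num_bytes - 1).bit_length()


for machine, bits in [("PDP-11", 16), ("System/370", 24)]:
    size = address_space_bytes(bits)
    print(f"{machine}: {bits}-bit addresses -> {size:,} bytes ({size // 1024:,} KiB)")

# For comparison, 1 GiB (2**30 bytes) of RAM needs 30 address bits.
print(bits_needed(2 ** 30))
```

So the PDP-11 programmer worked within 64 KiB per address space and the System/370 within 16 MiB, against the 1 GB of RAM in the 2012 handset discussed later in the talk.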
Scarce Resources (Now: Mobile Computing Platforms) • Energy • Fixed Energy Budget for mobile devices • Thermal issues at all scales • Tradeoff between performance and energy • Process shrinks no longer significantly reduce energy consumption • Memory Bandwidth • Providing bandwidth is expensive • Memory interconnect consumes significant energy
Bottlenecks • Memory Latency • Increasing gap between CPU speed and DRAM latency • Physical distance to DRAM devices a factor • Concurrency • Shortage of programmers who can handle concurrency • Inadequate language/tools support • I/O Bandwidth/Latency • Wireless bandwidth lower than wired • Consumes large amounts of energy
Example • HTC One • Processor: 1.5 GHz dual-core Qualcomm MSM8960 • OS: Android™ 4.0 (ICS) • RAM: 1 GB DDR2 • Storage: 16 GB onboard • Display: 4.7" HD Super LCD, 1280 x 720 • Network: LTE Cat 3 (DL 100 Mbps / UL 50 Mbps); LTE 700/AWS; WCDMA 2100/1900/AWS/850; EDGE 850/900/1800/1900 • Battery: 1800 mAh • Camera: main 8 MP, f/2.0, BSI, 1080p HD video; front 1.3 MP, 720p video • Dimensions: 134.8 x 69.9 x 8.9 mm • This is a General Purpose Computer!
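The 1800 mAh battery figure is a charge rating; turning it into the fixed energy budget the talk emphasizes needs a voltage. The sketch below assumes a nominal 3.7 V, typical for a single Li-ion cell but not stated on the slide, and spreads the result over a 24-hour day. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def battery_energy_joules(capacity_mah: float, nominal_volts: float) -> float:
    """Stored energy: charge (Ah) x voltage (V) gives Wh, converted to joules."""
    watt_hours = (capacity_mah / 1000.0) * nominal_volts
    return watt_hours * SECONDS_PER_HOUR


def average_power_watts(energy_j: float, hours: float) -> float:
    """Average power that drains energy_j exactly over the given number of hours."""
    return energy_j / (hours * SECONDS_PER_HOUR)


# 3.7 V is an assumed nominal Li-ion cell voltage, not a figure from the slide.
energy = battery_energy_joules(1800, 3.7)
print(f"{energy:,.0f} J in the battery")
print(f"{average_power_watts(energy, 24) * 1000:.1f} mW average if it must last 24 h")
```

Under that assumption the whole battery holds roughly 24 kJ, an average of under 300 mW if it has to last a day, and that budget is shared by the CPU, radios, display, and memory.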
Optimizations? • Multi-core • Aggressive addition of cores and threads • Hardware concurrency outstripping software • New Concurrent Programming Models/Tools? • Memory Subsystem • Significant contributor to total energy consumption • Adding bandwidth is expensive • New technologies addressing some energy issues • Wireless bandwidth enhancements (LTE Advanced, etc.) • Solutions from desktop/server or embedded worlds may not directly apply in mobile space!
Memory System Energy • Retaining data for one second • DRAM: ~1-10 pJ/bit self-refresh • SRAM: 1200+ pJ/bit, and rising over time [ITRS 2009] • 4 pJ/bit (45nm LP, standby) [Barasinski et al., ESSCIRC '08] • Flash, PCM, STT-RAM, …: zero! • Moving Data • 32-bit value: • Recompute: 60 pJ (Razor) • Send 1 mm: 10 pJ • Retain in cache for 1 ms: 38 pJ • Retain in DRAM for 1 second: 32+ pJ
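The per-bit retention figures and the 32-bit-value figures on this slide agree with each other, and combining them gives a useful break-even point. A sketch of the arithmetic using the slide's numbers (1200 pJ/bit per second for SRAM, the 1 pJ/bit low end for DRAM, 60 pJ to recompute), ignoring the energy of the accesses themselves; the function names are hypothetical:

```python
BITS = 32  # the 32-bit value used on the slide


def retention_energy_pj(pj_per_bit_second: float, seconds: float, bits: int = BITS) -> float:
    """Energy (pJ) to hold `bits` for `seconds` at a per-bit, per-second retention cost."""
    return pj_per_bit_second * bits * seconds


def break_even_seconds(recompute_pj: float, pj_per_bit_second: float, bits: int = BITS) -> float:
    """Holding time after which retaining the value costs more than recomputing it."""
    return recompute_pj / (pj_per_bit_second * bits)


SRAM_PJ_PER_BIT_S = 1200.0  # slide figure for SRAM
DRAM_PJ_PER_BIT_S = 1.0     # low end of the slide's ~1-10 pJ/bit DRAM self-refresh range
RECOMPUTE_PJ = 60.0         # slide figure (Razor)

print(f"SRAM, 1 ms: {retention_energy_pj(SRAM_PJ_PER_BIT_S, 1e-3):.1f} pJ")  # ~38 pJ, as on the slide
print(f"DRAM, 1 s:  {retention_energy_pj(DRAM_PJ_PER_BIT_S, 1.0):.1f} pJ")   # 32 pJ, as on the slide
print(f"SRAM break-even: {break_even_seconds(RECOMPUTE_PJ, SRAM_PJ_PER_BIT_S) * 1e3:.2f} ms")
print(f"DRAM break-even: {break_even_seconds(RECOMPUTE_PJ, DRAM_PJ_PER_BIT_S):.2f} s")
```

At these rates, a value that will not be reused for more than about 1.6 ms is cheaper to recompute than to keep in SRAM, while DRAM retention stays cheaper than recomputation for almost two seconds.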
Reducing Memory System Energy • Move less! • Caches physically close to CPU • Locality, locality, locality (the first rule of chip real estate) • Retain less! • Power off unused cache lines [Kaxiras et al., ISCA '01] • “Drowsy” caches [Flautner et al., ISCA '02] • … with compiler analysis [Zhang et al., Trans. Emb. Comp. Sys. 4(3) 2005] • Don’t refresh unused DRAM • … e.g. with garbage collection [Chen et al., CODES+ISSS '03]
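The idea of powering off unused cache lines can be illustrated with a toy leakage model in the spirit of cache decay: a line stays powered for a fixed decay interval after its last access and is then switched off, losing its contents, so a later access becomes an extra miss. This is a simplification for illustration; the function name, trace format, and cycle-level accounting are assumptions, not the cited paper's model. Drowsy caches differ in that lines keep their data at low voltage and pay a wake-up delay rather than a miss.

```python
def simulate_cache_decay(accesses, num_lines, total_cycles, decay_interval):
    """Toy cache-decay leakage model (hypothetical, for illustration only).

    accesses: list of (cycle, line_index) pairs sorted by cycle.
    A line is powered from an access until its next access or until
    decay_interval cycles have passed, whichever comes first. Lines start
    empty and powered off. Returns (powered_line_cycles, decay_misses).
    """
    last_access = [None] * num_lines
    powered_line_cycles = 0
    decay_misses = 0
    for cycle, line in accesses:
        prev = last_access[line]
        if prev is not None:
            idle = cycle - prev
            if idle > decay_interval:
                powered_line_cycles += decay_interval  # on until it decayed, then off
                decay_misses += 1                      # contents lost, refetch needed
            else:
                powered_line_cycles += idle
        last_access[line] = cycle
    # Account for each line from its final access to the end of the run.
    for prev in last_access:
        if prev is not None:
            powered_line_cycles += min(total_cycles - prev, decay_interval)
    return powered_line_cycles, decay_misses


trace = [(0, 0), (2, 1), (5, 0), (50, 0)]
powered, misses = simulate_cache_decay(trace, num_lines=2, total_cycles=100, decay_interval=10)
always_on = 2 * 100  # line-cycles if every line leaked for the whole run
print(powered, misses, always_on)
```

For this sample trace the lines are powered for 35 of 200 possible line-cycles at the cost of one decay-induced miss; choosing the decay interval trades leakage saved against extra misses.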
Extending the Memory Model • Maintaining the illusion of a single flat memory address space is too expensive • On-chip caches can be major consumers of area and energy • Coherence protocols are expensive and difficult to scale • Alternative: software-managed memory hierarchies • Tightly-coupled memory (TCM), scratchpads • Do not require tag memory, address comparison logic • More area- and energy-efficient • Help bridge the gap between bandwidth and throughput
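With a software-managed hierarchy, the program rather than cache hardware decides what is on chip. A minimal sketch of that style: a fixed-size scratchpad with no tags or automatic fills, and a loop that explicitly stages one tile of off-chip data at a time before computing on it. The `Scratchpad` class and its `dma_in` method are hypothetical stand-ins for a DMA engine, modelled with Python lists.

```python
class Scratchpad:
    """Hypothetical fixed-size on-chip buffer: no tags, no comparators, no automatic fills."""

    def __init__(self, capacity_words: int):
        self.capacity = capacity_words
        self.buf = [0] * capacity_words
        self.words_transferred = 0  # off-chip traffic, tracked explicitly

    def dma_in(self, dram: list, src: int, count: int, dst: int = 0) -> None:
        """Copy `count` words from dram[src:] into buf[dst:] (stand-in for a DMA transfer)."""
        if dst + count > self.capacity or src + count > len(dram):
            raise ValueError("transfer out of bounds")
        self.buf[dst:dst + count] = dram[src:src + count]
        self.words_transferred += count


def sum_of_squares(dram: list, spm: Scratchpad) -> int:
    """Process off-chip data tile by tile, computing only on scratchpad contents."""
    total = 0
    for start in range(0, len(dram), spm.capacity):
        count = min(spm.capacity, len(dram) - start)
        spm.dma_in(dram, start, count)  # software chooses what moves, and when
        total += sum(x * x for x in spm.buf[:count])
    return total


dram = list(range(10))
spm = Scratchpad(capacity_words=4)
print(sum_of_squares(dram, spm), spm.words_transferred)
```

A common refinement is double buffering: issuing the transfer for the next tile while computing on the current one. That needs asynchronous DMA, which this sequential sketch does not model.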
New Challenges and Opportunities • Different programming paradigm: software explicitly orchestrates all transfers between on-chip and off-chip memory areas • Major implications for memory management • Scratchpad allocation strategies • Data partitioning strategies • Dynamic relocation between scratchpad and DRAM to track the program’s locality characteristics • Opportunities for compile-time and runtime optimization • Challenges in both Hardware and Software!
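One common way to frame static scratchpad allocation is as a 0/1 knapsack problem. Each data object has a size and an expected access count, and the goal is to pick the set that fits in the scratchpad while serving the most accesses on chip (and so avoiding the most off-chip energy). A sketch of that formulation with dynamic programming follows. The object names, sizes, and access counts are made-up profile data; real allocators also weigh per-access energy, transfer costs, and dynamic relocation.

```python
def allocate_scratchpad(objects, capacity):
    """0/1 knapsack over data objects (a sketch of static allocation).

    objects: list of (name, size_bytes, access_count), sizes positive.
    Returns (accesses_served_on_chip, sorted names of chosen objects).
    """
    # best[c] = (accesses, chosen names) achievable using at most c bytes
    best = [(0, [])] * (capacity + 1)
    for name, size, accesses in objects:
        # Walk capacities downward so each object is placed at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + accesses
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    served, chosen = best[capacity]
    return served, sorted(chosen)


# Hypothetical profile: (object, size in bytes, profiled access count)
profile = [
    ("coeffs", 256, 9000),
    ("lookup", 512, 12000),
    ("frame_buf", 1024, 15000),
    ("stack", 128, 6000),
]
print(allocate_scratchpad(profile, capacity=1024))
```

Here the single most-accessed object (`frame_buf`) is not chosen: three smaller objects together serve more accesses in the same 1024 bytes, which is exactly the trade-off a greedy "hottest object first" rule can miss.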
Qualcomm Research: Excellence in Wireless • May 2012 • www.qualcomm.com/research
State-of-the-Art Capabilities Fostering Innovation • Human Resources • 30% of engineers with PhD, 50% with Masters • Systems, HW, SW, Standards, Test Engineering • Ventures, Bus Dev, Technical Marketing, Program Mgmt. • Complete Development Labs • Prototype Development Facilities • CPU Simulation Clusters • Antenna Ranges • Outdoor Field Systems
Qualcomm Research & University Relations • Academic Collaboration to Foster Advanced Research • Ongoing relations with more than 30 US and 25 international universities • Current funding includes MIT, UC Berkeley, Stanford, UCSD, UT Austin, ASU, UIUC, Univ. of Michigan, EPFL, IISc Bangalore, KAIST, Tsinghua • Research collaboration spans a variety of technical areas • Computer vision, multicore processing, context-aware computing, machine learning, low power devices, wireless networks and signal processing, etc. • Qualcomm Innovation Fellowship (QInF) invests in innovative ideas • Close interactions between Qualcomm Research engineers, graduate students and professors
Qualcomm Research for the Wireless Future • Improving WWAN Technology: Take WWAN to the next level (3G/4G) • Excelling in All Forms of Wireless: Innovate beyond WAN (Wireless Local Area) • Transforming the Mobile User Experience: Enable smart applications (Application Enablers) • Re-architecting Next-Gen Mobile Devices: Breakthrough performance (Processors & Devices)
Innovate Beyond WAN • Wireless Local Area
Enable Smart Applications • Elevate the wireless user experience
Breakthrough Device Performance • Re-architecting next-gen devices