The Limits of Semiconductor Technology & Coming Challenges in Microarchitecture and Architecture
Mile Stojčev, Teufik Tokić, Ivan Milentijević
Faculty of Electronic Engineering, Niš
Outline • Technology Trends • Process Technology Challenges – Low Power Design • Microprocessors’ Generations • Challenges in Education
Outline – Technology Trends • Moore’s Law 1 • Moore’s Law 2 • Performance and New Technology Generation • Technology Trends – Example • Trends in Future • Processor Technology • Memory Technology
Moore's Law 1 In 1965, Gordon Moore, director of research and development at Fairchild Semiconductor and later a co-founder of Intel Corp., wrote a paper for Electronics entitled “Cramming more components onto integrated circuits”. In the paper Moore observed that “the complexity for minimum component cost has increased at a rate of roughly a factor of two per year”. This observation became known as Moore's law. In fact, by 1975 the leading chips had roughly one-tenth as many components as Moore had predicted. The doubling period had stretched out to an average of 17 months in the decade ending in 1975, then slowed to 22 months through 1985 and 32 months through 1995. It has revived to a relatively brisk 22 to 24 months in recent years.
Moore's Law 1 (continued) Similar exponential growth rates have occurred for other aspects of computer technology – disk capacities, memory chip capacities, and processor performance. These remarkable growth rates have been the major driving forces of the computer revolution.

          Capacity        Speed (latency)
Logic     2x in 3 years   2x in 3 years
DRAM      4x in 3 years   2x in 10 years
Disk      4x in 3 years   2x in 10 years
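To make the table concrete, here is a minimal sketch (in Python, not part of the original slides) that converts each "N-fold in Y years" figure into an equivalent annual rate and projects it over a decade; the factors and periods are taken directly from the table above.

```python
# Sketch: convert "N-fold growth in Y years" into an annual rate and project it.
# The (factor, period) pairs below come from the trends table in the slide above.
trends = {
    "Logic capacity": (2, 3),   # 2x in 3 years
    "DRAM capacity":  (4, 3),   # 4x in 3 years
    "Disk capacity":  (4, 3),   # 4x in 3 years
    "DRAM speed":     (2, 10),  # 2x in 10 years
}

for name, (factor, years) in trends.items():
    annual = factor ** (1.0 / years)   # equivalent per-year multiplier
    decade = annual ** 10              # growth over ten years
    print(f"{name:15s}: {annual:.2f}x per year, {decade:.1f}x per decade")
```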
Moore's Law 1 - Linewidths One of the key drivers behind the industry's ability to double transistor counts every 18 to 24 months is the continuous reduction in linewidths. Shrinking linewidths not only enables more components to fit onto an IC (typically 2x per linewidth generation) but also lowers costs (typically 30% per linewidth generation).
Moore's Law 1 - Die size Shrinking linewidths have slowed the rate of growth in die size to 1.14x per year, versus 1.38x to 1.58x per year for transistor counts, and since the mid-1990s accelerating linewidth shrinks have halted and even reversed the growth in die sizes.
Moore's Law in Action The number of transistors per chip doubles annually
Improving frequency via pipelining Process technology and microarchitecture innovations enable the clock frequency to double every process generation. The figure presents the contribution of both: as the process improves, the frequency increases and the average amount of work done per pipeline stage decreases.
Process Complexity Shrinking linewidths is not free. Linewidth shrinks require process modifications to deal with a variety of issues that arise from shrinking the devices, leading to increasing complexity in the processes being used.
Moore's Law 2 (Rock's Law) In 1996 Intel augmented Moore's law (the number of transistors on a processor doubles approximately every 18 months) with Moore's law 2. Law 2 says that as the sophistication of chips increases, the cost of fabrication rises exponentially: the cost of semiconductor tools doubles every four years. By this logic, chip fabrication plants, or fabs, were supposed to cost $5 billion each by the late 1990s and $10 billion by now.
Moore's Law 2 (Rock's Law) - continued For example: in 1986 Intel manufactured the 386, which contained 250,000 transistors, in fabs costing $200 million. In 1996, producing the Pentium processor, with 6 million transistors, required a $2 billion facility.
Moore's Law 2 (Rock's Law) – The Cost of Semiconductor Tools Doubles Every Four Years
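As a rough check of how these numbers line up with the "doubles every four years" rule, the following sketch (not from the slides) projects fab cost forward from the 1986 figure quoted above; the fact that it falls short of the $2 billion quoted for 1996 simply means fab costs grew somewhat faster than Rock's law predicts.

```python
# Sketch: project fab cost under Rock's law (cost doubles every 4 years),
# starting from the $200 million figure quoted for 1986.
base_year, base_cost = 1986, 0.2   # cost in billions of dollars

for year in (1990, 1996, 2000, 2004):
    cost = base_cost * 2 ** ((year - base_year) / 4)
    print(f"{year}: ~${cost:.1f} billion")
```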
Metcalfe's Law A network's value grows in proportion to the square of the number of its users
Wirth's Law Software is slowing down faster than hardware is speeding up
Performance and new technology generation According to Moore's law, each new generation has approximately doubled logic circuit density and increased performance by about 40% while quadrupling memory capacity. The increase in components per chip comes from the following key factors: • The factor of two in component density comes from a √2 shrink in each lithography dimension (√2 per x and √2 per y) • An additional factor of √2 comes from an increase in chip area • A final factor of √2 comes from device and circuit cleverness
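Multiplying these factors together reproduces the quadrupling of components per chip mentioned above; a one-line check of the arithmetic:

```latex
\[
\underbrace{\sqrt{2}\cdot\sqrt{2}}_{\text{lithography shrink in }x\text{ and }y}
\times
\underbrace{\sqrt{2}}_{\text{larger chip area}}
\times
\underbrace{\sqrt{2}}_{\text{device/circuit cleverness}}
\;=\; 4\ \text{per generation}
\]
```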
Semiconductor Industry Association Roadmap Summary for high-end Processors
Clock Frequency Versus Year for Various Representative Machines
Limits in Clocking Traditional clocking techniques will reach their limit when the clock frequency reaches the 5-10 GHz range. For higher frequency clocking (>10 GHz), new ideas and new ways of designing digital systems are needed.
Technology Trends - Example As an illustration of just how quickly computer technology is improving, let's consider what would have happened if automobiles had improved equally quickly. Assume that an average car in 1977 had a top speed of 150 km/h and an average fuel economy of 10 km/l. If both top speed and efficiency had improved at 35% per year from 1977 to 1987, and by 50% per year from 1987 to 2000, tracking computer performance, what would the average top speed and fuel economy of a car have been in 1987? In 2000?
Solution In 1987: The span 1977 to 1987 is 10 years, so both traits would have improved by a factor of (1.35)^10 = 20.1, giving a top speed of 3015 km/h and fuel economy of 201 km/l. In 2000: Thirteen more years elapse, this time at a 50% per year improvement rate, for a total factor of (1.5)^13 = 194.6 over the 1987 values. This gives a top speed of 586,719 km/h and fuel economy of 39,114.6 km/l. That is fast enough to cover the distance from the earth to the moon in under 39 minutes, and to make the round trip on less than 10 liters of gasoline.
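The same arithmetic as a small runnable sketch (Python, not part of the original slides), using only the 1977 baseline and growth rates given above:

```python
# Sketch: the car analogy worked out numerically.
speed, economy = 150.0, 10.0       # 1977 baseline: km/h and km/l

speed *= 1.35 ** 10                # 35% per year, 1977-1987
economy *= 1.35 ** 10
print(f"1987: {speed:,.0f} km/h, {economy:,.0f} km/l")

speed *= 1.5 ** 13                 # 50% per year, 1987-2000
economy *= 1.5 ** 13
print(f"2000: {speed:,.0f} km/h, {economy:,.1f} km/l")

# Time to cover the earth-moon distance (~384,400 km) at the 2000 "top speed".
print(f"Earth to moon: {384_400 / speed * 60:.0f} minutes")
```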
Feature size versus time in silicon ICs The semiconductor industry itself has developed a “roadmap” based on the idea of Moore's law. The National Technology Roadmap for Semiconductors (NTRS), and most recently the International Technology Roadmap for Semiconductors (ITRS), now extend the device scaling and increased functionality scenario to the year 2014, at which point minimum feature sizes are projected to be 35 nm and chips with > 10^11 components are expected to be available.
Processor technology today The most advanced process technology today (year 2003) is 0.10 µm = 100 nm. Ideally, each process generation scales all physical dimensions of devices (transistors and wires) by a factor of ~0.7. • With such scaling, typical improvement figures are the following: • 1.4 – 1.5 times faster transistors • two times smaller transistors • 1.35 times lower operating voltages • three times lower switching power
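A rough way to see where the "three times lower switching power" figure comes from, assuming the usual dynamic-power relation P ∝ C·V²·f, with capacitance following the ~0.7 dimensional shrink and voltage dropping by the 1.35x factor quoted above (this derivation is not in the original slides):

```latex
\[
P_{\text{switch}} \propto C\,V^{2}f
\quad\Rightarrow\quad
\frac{P'}{P} \approx \underbrace{0.7}_{C' = 0.7\,C}\times\underbrace{\left(\tfrac{1}{1.35}\right)^{2}}_{V' = V/1.35}
\approx 0.38
\qquad(\text{about }2.6\text{--}3\times\text{ lower, at fixed }f)
\]
```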
Processor Technology and Microprocessors Process technology is the most important technology driving the microprocessor industry. Over 25 years it has grown roughly 1000 times in frequency (from 1 MHz to 1 GHz) and in integration (from ~10K to ~1M devices). Microarchitecture attempts to increase both IPC and frequency.
Process technology and microarchitecture Microarchitecture techniques such as caches, branch prediction, and out-of-order execution can increase instructions per cycle (IPC). Pipelining, as a microarchitecture idea, helps to increase frequency. A modern architecture (ISA) and a good optimizing compiler can reduce the number of dynamic instructions executed for a given program.
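These three levers – dynamic instruction count, IPC, and clock frequency – combine in the classic execution-time relation; this formulation is standard and is added here for reference rather than taken from the slides:

```latex
\[
T_{\text{exec}}
\;=\;
\frac{\text{instructions executed}}{\text{IPC}\times f_{\text{clock}}}
\;=\;
\text{instructions executed}\times\text{CPI}\times T_{\text{clock}}
\]
```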
Frequency and performance improvements While early in-order microprocessors used four to five pipe stages, modern out-of-order microprocessors use over ten pipe stages. With frequencies higher than 1 GHz, more than 20 pipeline stages are used.
Performance of memory and CPU Memory in a computer system is organized hierarchically. In 1980 microprocessors were often designed without caches; nowadays, microprocessors often come with two levels of caches.
Processor-DRAM Gap Microprocessor performance improved 35% per year until 1986 and 55% per year since 1987. Memory technology improvements aim primarily at increasing DRAM capacity, not DRAM speed.
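A minimal sketch of how this gap compounds, using the 55% per year processor improvement above and assuming DRAM speed improves roughly 2x per decade, as in the earlier trends table:

```python
# Sketch: processor vs. DRAM-speed improvement compounding over time.
cpu_rate = 1.55          # 55% per year (from the slide above)
dram_rate = 2 ** 0.1     # ~2x per decade, i.e. ~7% per year (assumed from the trends table)

cpu = dram = 1.0
for year in range(1, 21):
    cpu *= cpu_rate
    dram *= dram_rate
    if year % 5 == 0:
        print(f"after {year:2d} years: processor {cpu:8.0f}x, DRAM speed {dram:4.1f}x, gap {cpu/dram:7.0f}x")
```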
An anecdote In a recent database benchmark study using TPC-C, both 200 MHz Pentium Pro and 21164 Alpha systems were measured at 4.2 – 4.5 CPU cycles per instruction retired. In other words, three out of every four CPU cycles retired zero instructions: most were spent waiting for memory... Processor speed has seriously outstripped memory speed. Increasing the width of instruction issue and increasing the number of simultaneous instruction streams only makes the memory bottleneck worse.
An anecdote - continued If a CPU chip today needs to move 2 GBytes/s (say, 16 bytes every 8 ns) across the pins to keep itself busy, imagine a chip in the foreseeable future with twice the clock rate, twice the issue width, and two instruction streams. All these factors multiply together to require about 16 GBytes/s of pin bandwidth to keep this chip busy. It is not clear whether pin bandwidth can keep up – 32 bytes every 2 ns?
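The bandwidth arithmetic behind those figures, as a small sketch; the byte and cycle counts are the ones quoted above:

```python
# Sketch: pin-bandwidth arithmetic from the anecdote above.
def bandwidth_gb_per_s(bytes_per_transfer: int, interval_ns: float) -> float:
    # bytes per nanosecond equals gigabytes per second
    return bytes_per_transfer / interval_ns

today = bandwidth_gb_per_s(16, 8.0)   # 16 bytes every 8 ns -> 2 GB/s
# Twice the clock rate, twice the issue width, two instruction streams: 2*2*2 = 8x.
future = today * 2 * 2 * 2            # -> 16 GB/s
print(f"today:  {today:.0f} GB/s")
print(f"future: {future:.0f} GB/s (e.g. 32 bytes every 2 ns)")
```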
Memory system In a 1 GHz microprocessor, accessing main memory can take about 100 cycles. Such an access may stall a pipelined microprocessor for many cycles and seriously impact overall performance. To reduce memory stalls at a reasonable cost, modern microprocessors take advantage of the locality of references in the program and use a hierarchy of memory components.
Expensive Memory Called a Cache A small, fast, and expensive (in $/bit) memory called a cache is located on-die and holds frequently used data. A somewhat bigger, but slower and cheaper cache may be located between the microprocessor and the system bus, which connects the microprocessor to the main memory.
Two Levels of Caches Most advanced microprocessors today employ two levels of caches on chip. The first level is ~32 – 128 kB – it takes two to three cycles to access and typically catches about 95% of all accesses. The second level is 256 kB to over 1 MB – it typically takes six to ten cycles to access and catches over 50% of the misses of the first level.
Memory Hierarchy Impact on Performance Off-chip memory access may take about 100 cycles. A cache miss that eventually has to go to main memory can take about the same amount of time as executing 100 arithmetic and logic unit (ALU) instructions, so the structure of the memory hierarchy has a major impact on performance. Caches are made bigger, and heuristics are used to make sure the cache contains the portions of memory that are most likely to be used in the near future of program execution.
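Putting the cache figures from the previous slide together with the ~100-cycle off-chip access gives a rough average memory access time; this is a standard back-of-the-envelope model, with the specific latencies and hit rates taken from the slides above:

```python
# Sketch: average memory access time (AMAT) with the two-level cache figures above.
l1_latency, l1_hit = 3, 0.95      # ~2-3 cycles, catches ~95% of accesses
l2_latency, l2_hit = 10, 0.50     # ~6-10 cycles, catches ~50% of L1 misses
mem_latency = 100                 # off-chip access, ~100 cycles

amat = l1_latency + (1 - l1_hit) * (l2_latency + (1 - l2_hit) * mem_latency)
print(f"AMAT = {amat:.1f} cycles per access")
# Without any caches, every access would cost ~100 cycles.
```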
Conclusions concerning memory - problems Today's chips are largely able to execute code faster than we can feed them with instructions and data. The performance bottlenecks are no longer in the floating-point multiplier or in having only a single integer unit. The real design action is in memory subsystems – caches, buses, bandwidth, and latency.
Conclusions concerning memory - problems, continued If the memory research community follows the microprocessor community's lead by leaning more heavily on architecture- and system-level solutions, in addition to technology-level solutions, to achieve higher performance, the gap might begin to close. One can expect that over the coming decade memory subsystem design will be the only important design issue for microprocessors.
Memory Hierarchy Solutions Organization choices (CPU architecture, L1/L2 cache organizations, DRAM architecture, DRAM speed) can affect total execution time by a factor of two. • System-level parameters that most affect performance: • The number of independent channels and banks connecting the CPU to the DRAMs can produce a 25% performance change • Burst width – the data access granularity – can produce a 15% performance change • Magnetic RAM (MRAM) – a new type of memory