10/27: Lecture Topics • Survey results • Current Architectural Trends • Operating Systems Intro • What is an OS? • Issues in operating systems
Superscalar Pipelines • Superscalar pipelines can execute multiple instructions at once • 2+ instructions in any stage of the pipeline • Some processors allow 8 instructions to be issued at once • Most programs can only take advantage of 1 or 2 issue slots
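A minimal sketch of why extra issue slots often go unused. The function `cycles_to_issue`, the one-cycle latency, and the two toy programs are illustrative assumptions, not part of the lecture. The model issues instructions in program order, up to `width` per cycle, and stalls whenever an operand is not ready yet.

```python
def cycles_to_issue(deps, width):
    """Cycles needed by an in-order machine that issues up to `width` instructions per cycle.

    deps[i] lists the earlier instructions whose results instruction i needs.
    Assumption: every instruction takes one cycle, so a result is usable
    in the cycle after its producer issues.
    """
    if not deps:
        return 0
    issue_cycle = []
    cycle, used = 0, 0
    for producers in deps:
        # Earliest cycle at which all operands are available.
        ready = max((issue_cycle[p] + 1 for p in producers), default=0)
        if ready > cycle:
            cycle, used = ready, 0          # stall until operands arrive
        elif used == width:
            cycle, used = cycle + 1, 0      # all issue slots taken this cycle
        issue_cycle.append(cycle)
        used += 1
    return cycle + 1


# Eight instructions with no dependencies at all.
independent = [[] for _ in range(8)]
# Eight instructions where each one needs the result of the previous one.
chain = [[]] + [[i] for i in range(7)]

for width in (1, 2, 8):
    print(width, cycles_to_issue(independent, width), cycles_to_issue(chain, width))
```

In this model the independent program speeds up with every added slot, while the dependent chain takes eight cycles no matter how wide the machine is.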
Out-of-Order Execution • Allows the processor to execute any instruction whose operands are ready, regardless of program order • Enables more issue slots to be filled • Often out-of-order execution, but in-order commit • that is, results are written back in the original program order • Note: IA-64 is in-order
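The distinction between out-of-order execution and in-order commit can be sketched with a toy scheduler. Everything here is an assumption made for illustration: the function name `out_of_order_schedule`, the latencies, unlimited functional units, and the four-instruction program.

```python
def out_of_order_schedule(program):
    """Toy model of out-of-order execution with in-order commit.

    program: list of (text, latency, deps), where deps holds indices of
    earlier instructions whose results are needed.
    Returns (finish, commit): the cycle each instruction finishes executing
    and the cycle its result is committed.
    Assumption: unlimited functional units, so an instruction starts as soon
    as its operands are ready.
    """
    finish, commit = [], []
    for _text, latency, deps in program:
        start = max((finish[d] for d in deps), default=0)
        finish.append(start + latency)
        # In-order commit: never commit before the previous instruction has.
        previous_commit = commit[-1] if commit else 0
        commit.append(max(finish[-1], previous_commit))
    return finish, commit


program = [
    ("lw  r1, 0(r9)",  3, []),   # slow load
    ("add r2, r1, r1", 1, [0]),  # must wait for the load
    ("sub r3, r4, r5", 1, []),   # independent, can run early
    ("or  r6, r3, r3", 1, [2]),  # depends only on the sub
]
finish, commit = out_of_order_schedule(program)
for (text, _, _), f, c in zip(program, finish, commit):
    print(f"{text:16s} finishes at cycle {f}, commits at cycle {c}")
```

The `sub` and `or` finish before the load does, but their results are held back until the older instructions have committed.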
Longer Pipelines • Pipelines are getting longer • original RISC pipelines had 5 stages • pipelines now have up to 20 stages • Allows the clock cycle to be very fast • Okay as long as you can accurately predict branches (or get rid of them)
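A back-of-the-envelope model of why long pipelines depend on accurate branch prediction. It is not from the slides: the function `effective_cpi`, the 20% branch frequency, and the assumption that a misprediction costs about one cycle per pipeline stage behind the fetch stage are all illustrative guesses.

```python
def effective_cpi(branch_fraction, predictor_accuracy, flush_penalty, base_cpi=1.0):
    """Average cycles per instruction once misprediction flushes are included."""
    mispredicts_per_instruction = branch_fraction * (1.0 - predictor_accuracy)
    return base_cpi + mispredicts_per_instruction * flush_penalty


for stages in (5, 20):
    for accuracy in (0.80, 0.95):
        # Assumption: a flush throws away roughly (stages - 1) cycles of work.
        cpi = effective_cpi(0.20, accuracy, stages - 1)
        print(f"{stages:2d} stages, {accuracy:.0%} accurate predictor: CPI ~ {cpi:.2f}")
```

Under these assumptions the 20-stage pipeline loses far more to a weak predictor than the 5-stage one, which is why the faster clock only pays off when branches are predicted well or removed.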
Speculation • Prediction • better branch predictors (95% accurate) • predict many levels of branches • predict variable values • predict load addresses • Simultaneously execute both paths of a branch • Execute instructions even if there could be a dependency • sw after lw could be the same address, but probably not • let the sw execute and then fix it if you were wrong
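As one concrete example of branch prediction, here is a sketch of a two-bit saturating counter, a classic textbook scheme. The slides do not say which predictors they have in mind, so the class `TwoBitPredictor`, the helper `accuracy`, and the loop-branch pattern below are assumptions chosen for illustration.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 2  # start in the "weakly taken" state

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch: taken nine times, then falls through once, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
print(accuracy(loop_branch))
```

The single not-taken outcome at each loop exit only nudges the counter down one step, so the predictor is still right on the next loop entry and mispredicts once per ten branches in this pattern.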
Predicated Execution • Predicated execution allows conditional moves and conditional adds instead of only conditional branches • Avoids branches, which are bad because pipelines are so long • IA-64: almost everything is predicated (many 1-bit predicate registers) • The homework problem with movn and movz was an example of this
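A small sketch of how conditional moves replace a branch, modeling the MIPS `movn` and `movz` semantics in Python. The helper names and the max-of-two example are illustrative assumptions. Python's conditional expression stands in for what the hardware does in a single instruction with no change of control flow.

```python
def movn(rd, rs, rt):
    """Model of MIPS movn: the new value of rd is rs if rt != 0, otherwise rd is unchanged."""
    return rs if rt != 0 else rd


def movz(rd, rs, rt):
    """Model of MIPS movz: the new value of rd is rs if rt == 0, otherwise rd is unchanged."""
    return rs if rt == 0 else rd


def max_with_branch(a, b):
    # Branching version: the pipeline must predict which way this goes.
    if a < b:
        return b
    return a


def max_predicated(a, b):
    # Straight-line version: every instruction always executes.
    result = a                      # move result, a
    cond = int(a < b)               # slt  cond, a, b
    result = movn(result, b, cond)  # movn result, b, cond
    return result


print(max_predicated(3, 7), max_predicated(7, 3))
```

Both versions compute the same value, but the predicated one gives the branch predictor nothing to get wrong.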
VLIW • Long Instruction Words (LIW) and Very Long Instruction Words (VLIW) • each instruction contains multiple smaller instructions that execute in parallel • (V)LIW instructions can be 128 to 1024 bits long and contain 3 to 16 instructions • It's the compiler's job to find independent instructions to execute
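The compiler's job of finding independent instructions can be sketched as a greedy packing pass. This is a simplified illustration, not a real VLIW scheduler: the function `pack_bundles` is hypothetical, every register is assumed to be written only once (so only read-after-write dependencies matter), and every instruction is assumed to take one bundle time.

```python
def pack_bundles(instrs, slots):
    """Greedily pack instructions into long instruction words of up to `slots` operations.

    instrs: list of (dest, sources) in program order.
    An instruction is placed in the earliest bundle that comes strictly after
    the bundles producing its sources and that still has a free slot.
    Returns a list of bundles, each a list of instruction indices.
    """
    bundles = []
    bundle_of_dest = {}  # register name -> index of the bundle that writes it
    for i, (dest, sources) in enumerate(instrs):
        earliest = max(
            (bundle_of_dest[s] + 1 for s in sources if s in bundle_of_dest),
            default=0,
        )
        b = earliest
        while b < len(bundles) and len(bundles[b]) >= slots:
            b += 1
        if b == len(bundles):
            bundles.append([])
        bundles[b].append(i)
        bundle_of_dest[dest] = b
    return bundles


# Registers a, b, c, d are inputs that no instruction here writes.
instrs = [
    ("r1", ["a"]),
    ("r2", ["b"]),
    ("r3", ["r1", "r2"]),
    ("r4", ["c"]),
    ("r5", ["r3", "r4"]),
    ("r6", ["d"]),
]
print(pack_bundles(instrs, slots=3))
```

With three slots, the six instructions fit in three bundles, and the last bundle is mostly empty because `r5` has to wait for both of its inputs.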
Register Windows • Saving registers on the stack during procedure call hurts performance • Register windows use a stack of registers that are allocated to a procedure as it needs them • [Figure: register windows labeled Baz(), Bar(), Foo()]
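A rough sketch of the idea, counting how often a call chain touches memory. The class `RegisterWindows` and its spill policy are simplifying assumptions (real designs such as SPARC overlap adjacent windows to pass arguments, which is not modeled here), and calls and returns are assumed to be balanced.

```python
class RegisterWindows:
    """Toy register file holding a fixed number of per-procedure windows."""

    def __init__(self, num_windows):
        self.num_windows = num_windows
        self.resident = 0        # windows currently in the register file
        self.spilled = 0         # windows currently saved in memory
        self.memory_traffic = 0  # number of window spills plus fills

    def call(self):
        if self.resident == self.num_windows:
            # Overflow: the oldest window is spilled to memory to make room.
            self.resident -= 1
            self.spilled += 1
            self.memory_traffic += 1
        self.resident += 1

    def ret(self):
        self.resident -= 1
        if self.resident == 0 and self.spilled > 0:
            # Underflow: the caller's window has to be filled back from memory.
            self.spilled -= 1
            self.resident += 1
            self.memory_traffic += 1


def traffic_for_call_chain(depth, num_windows):
    """Memory traffic for a chain of `depth` nested calls followed by their returns."""
    rw = RegisterWindows(num_windows)
    for _ in range(depth):
        rw.call()
    for _ in range(depth):
        rw.ret()
    return rw.memory_traffic


# Foo() calls Bar(), which calls Baz(): a call chain three deep.
print(traffic_for_call_chain(3, num_windows=4), traffic_for_call_chain(3, num_windows=2))
```

When the chain fits in the available windows, calls and returns never touch memory; only deeper chains pay for spills and fills.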
Smarter Compilers • VLIW requires good compilers • Predicated execution and speculation need help from the compiler • Old architectures had instructions to emulate high-level constructs (bad) • New architectures provide many general instructions and instruction options • IA-64 will keep compiler writers busy for a decade
Multiple CPUs on a Chip • Chip multiprocessors • multiple simple CPUs, but share a cache • can run multiple programs simultaneously • single programs are no faster • like a multiprocessor machine but cheaper • Simultaneous Multithreading (SMT) • more complex CPUs • like chip multiprocessors + superscalar + out-of-order • also improves single program performance • developed at UW • memory bandwidth is an issue for both
Funky Hardware on a Chip • We can squeeze more and more transistors on a chip • What do we do with them? • Bigger caches (boring) • Put programmable hardware on the CPU • FPGAs can be (re)programmed quickly • hardware runs 1000X faster than software • Graphics-specific hardware • Instruction co-processors • Simultaneously run two copies of all programs to avoid hardware glitches
Low Power • CPUs are being put in everything, even devices that have very small batteries (tiny sensors) • Need to make CPUs that use very little power (only as much as they need) • reduce the CPU clock frequency • allow the OS to turn off part of the chip • Transmeta is building chips that emulate Intel x86, but with less power
Time to Market • It used to be solely about being the fastest • Now being adequate is enough • Being the first technology to fill a need is the most important