
OR682/Math685/CSI700



  1. OR682/Math685/CSI700 Lecture 12 Fall 2000

  2. High Performance Computing • Computer architectures • Computer memory • Floating-point operations • Compilers • Profiling • Optimization of programs

  3. My Goals • Provide you with resources for dealing with large computational problems • Explain the basic workings of high-performance computers • Talk about compilers and their capabilities • Discuss debugging (this week) and profiling (12/13) tools in Matlab

  4. Changes in Architectures • Then (1980s): • supercomputers (cost: $10M and up) • only a few in existence (often at government laboratories); custom made • (peak) speed: several hundred “megaflops” (millions of floating-point operations per second) • Now: • (clusters of) microprocessors (inexpensive) • can be easily assembled by almost anyone • commercial, off-the-shelf components • (peak) speed: gigaflops and higher
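
  For scale (an illustrative calculation, not a figure from the slides): a processor clocked at 500 MHz that completes two floating-point operations per cycle has a peak speed of 500 x 10^6 x 2 = 10^9 operations per second, i.e. one gigaflop; sustained speeds on real programs are usually well below peak.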

  5. Modern “Supercomputers” • Multiprocessor • Based on commercial RISC (reduced instruction set computer) processors • Linked by high-speed interconnect or network • Communication by message passing (perhaps disguised from the user) • Hierarchy of local/non-local memory
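
  The slide does not name a particular message-passing library, but MPI is the common choice on such machines. The following is a minimal, hedged sketch in C (two processes, one message), not anything taken from the lecture:

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char **argv)
      {
          int rank;
          double x = 3.14;
          MPI_Status status;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          if (rank == 0) {
              /* process 0 sends one double to process 1 */
              MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
          } else if (rank == 1) {
              /* process 1 waits for the message from process 0 */
              MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
              printf("rank 1 received %g\n", x);
          }

          MPI_Finalize();
          return 0;
      }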

  6. Why Learn This? • Compilers have limited ability to match your algorithm/calculation to the computer • You will be better able to write software that will execute efficiently, by playing to the strengths of the compiler and the machine

  7. Some Basics • Memory • main memory • cache • registers • Languages • machine • assembly • high-level (Fortran, C/C++) • Matlab?
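
  To make the memory hierarchy concrete, here is a small C sketch (sizes and names are illustrative, not from the lecture). C stores two-dimensional arrays row by row, so the first loop nest walks consecutive addresses and reuses each cache line, while the second strides across rows and misses the cache far more often:

      #define N 1000
      double a[N][N];

      void row_order(void)        /* cache-friendly: consecutive addresses */
      {
          int i, j;
          for (i = 0; i < N; i++)
              for (j = 0; j < N; j++)
                  a[i][j] = 0.0;
      }

      void column_order(void)     /* cache-unfriendly: stride of N doubles */
      {
          int i, j;
          for (j = 0; j < N; j++)
              for (i = 0; i < N; i++)
                  a[i][j] = 0.0;
      }

  (In Fortran and Matlab, arrays are stored by columns, so the favorable loop order is the reverse.)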

  8. Microprocessors • Old Technology: CISC (complex instruction set computer) • assembly language instructions that resembled high-level language instructions • many tasks could be performed in hardware • reduced (slow) memory fetches for instructions • reduced (precious) memory requirements

  9. Weaknesses of CISC? • None until relatively recently • Harder for compilers to exploit • Complicated processor design • hard to fit on a single chip • Hard to pipeline • pipeline: processing multiple instructions simultaneously in small stages

  10. RISC Processors • Reduce # of instructions, and fit processor on a single chip (faster, cheaper, more reliable) • Other operations must be performed in software (slower) • All instructions the same length (32 bits); pipelining is possible • More instructions must be fetched from memory • Programs take up more space in memory

  11. Early Examples • First became prominent in (Unix-based) scientific workstations: • Sun • Silicon Graphics • Apollo • IBM RS/6000

  12. Characteristics of RISC • Instruction pipelining • Pipelining of floating-point operations • Uniform instruction length • Delayed branching • Load/Store architecture • Simple addressing modes • Note: modern RISC processors are no longer “simple” architectures

  13. Pipelines • Clock & clock speed (cycles) • Goal: 1 instruction per clock cycle • Divide instruction into stages, & overlap: • instruction fetch (from memory) • instruction decode • operand fetch (from register or memory) • execute • write back (of results to register or memory)
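
  A rough way to see the payoff (an idealized calculation, assuming no stalls): with the 5 stages above, N independent instructions finish in about 5 + (N - 1) cycles instead of 5N; for N = 1000 that is 1004 cycles rather than 5000, which is essentially the goal of one instruction per clock cycle.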

  14. Complications • Complicated memory fetch • stalls pipeline • Branch • may be a “no op” [harmless] • otherwise, need to flush pipeline (wasteful) • Branches occur every 5-10 instructions in many programs
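
  A hedged C illustration of the problem (the function, names, and threshold are made up, not from the lecture): the IF below depends on the data, so the hardware cannot always guess its outcome, and each wrong guess forces a pipeline flush. Slide 18 shows the branch-free alternative.

      /* data-dependent branch inside an inner loop */
      int count_above(const double *x, int n, double threshold)
      {
          int i, count = 0;
          for (i = 0; i < n; i++) {
              if (x[i] > threshold)   /* taken or not, depending on the data */
                  count++;
          }
          return count;
      }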

  15. Pipelined Floating-Point • Execution of a floating-point instruction can take many clock cycles (especially for multiplication and division) • These operations can also be pipelined • Modern hardware has reduced the time for a floating-point operation to 1-3 cycles
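
  One way to see what pipelined floating-point buys you, as a hedged C sketch (not an example from the lecture; n is assumed even): the first sum is a dependence chain, so each add must wait for the previous one, while the second keeps two independent partial sums and so can keep two additions in the pipeline at once.

      double sum_chained(const double *x, int n)
      {
          int i;
          double s = 0.0;
          for (i = 0; i < n; i++)
              s += x[i];               /* each add waits for the one before it */
          return s;
      }

      double sum_two_streams(const double *x, int n)  /* assumes n is even */
      {
          int i;
          double s0 = 0.0, s1 = 0.0;
          for (i = 0; i < n; i += 2) {
              s0 += x[i];              /* two independent additions per pass */
              s1 += x[i + 1];
          }
          return s0 + s1;
      }

  Good compilers can make this transformation themselves, but only when they can prove it safe (floating-point addition is not exactly associative).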

  16. Uniform Instruction Length • CISC instructions came in varying lengths • length not known until the instruction was decoded • this could stall the pipeline • For RISC processors, instructions are uniform length (32 bits) • no additional memory access required to decode instruction • Pipeline flows more smoothly

  17. Delayed Branches • Branches lead to pipeline inefficiencies • Three possible approaches: • branch delay slot • potentially useful instruction inserted (by compiler) after the branch instruction • branch prediction • based on previous result of branch during execution of program • conditional execution (next slide)

  18. Conditional Execution • Replace a branch with a conditional instruction:

      IF (B<C) THEN
        A=D
      ELSE
        A=E
      END

  becomes

      COMPARE B<C
      IF TRUE   A=D
      IF FALSE  A=E

  • Pipeline operates effectively.
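
  The same idea in C, as a minimal sketch (variable names follow the slide; whether a conditional-move instruction is actually emitted depends on the compiler and the processor):

      double select_value(double b, double c, double d, double e)
      {
          /* no branch in the source: the compiler may use a conditional move */
          return (b < c) ? d : e;
      }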

  19. Load/Store Architectures • Instructions limit memory references: • only explicit load and store instructions (no implicit or cascaded memory references) • only one memory reference per instruction • Keeps instructions the same length • Keeps pipeline simple (only one execution stage) • Memory load/store requests are already “slower” (complications would further stall the pipeline) • by the time the result is needed, the load/store is complete (you hope)
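
  As an illustration (written out by hand, not compiler output): on a load/store machine a single source statement such as a[i] = a[i] + b[i] is carried out as explicit loads, a register-to-register add, and a store, which the C below spells out with temporaries.

      void add_element(double *a, const double *b, int i)
      {
          double t1 = a[i];      /* load a[i] into a register   */
          double t2 = b[i];      /* load b[i] into a register   */
          double t3 = t1 + t2;   /* add, registers only         */
          a[i] = t3;             /* one store back to memory    */
      }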

  20. Simple Addressing Modes • Avoid: • complicated address calculations • multiple memory references per instruction • Simulate complicated requests with a sequence of simple instructions
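
  A small sketch of what "a sequence of simple instructions" means for a two-dimensional reference (ncols and the other names are illustrative): the address of x[i][j] is computed with a multiply and an add in registers, leaving only one plain memory reference.

      double get_element(const double *x, int ncols, int i, int j)
      {
          int offset = i * ncols + j;   /* address arithmetic in registers */
          return x[offset];             /* single, simple memory reference */
      }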

  21. 2nd Generation RISC Processors • Faster clock rate (smaller processor) • “Superscalar” processors: Duplicate compute elements (execute two instructions at once) • hard for compiler writers, hardware designers • “Superpipelining”: double the number of stages in the pipeline (each one twice as fast) • Speculative computation

  22. For Next Class • Homework: see web site • Reading: • Dowd: chapters 3, 4, and 5
