Chapter 1 Uniprocessor Architecture Overview
1.1 A uniprocessor model Structure of a typical uniprocessor computer system. • Memory • ALU • CU(control unit) • I/O unit
Memory Unit • A single port device • MAR • MBR (or MDR) • Word the unit of data that can be read or written.
CPU • ALU • ACC • CU • PC • IR • A set of registers
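The registers listed above (MAR, MBR, PC, IR, ACC) can be tied together in a small simulation. The following is a minimal sketch, assuming a hypothetical four-instruction accumulator machine; the opcodes, the tuple encoding of instructions, and the `Machine` class are illustrative and do not come from the slides.

```python
# Hypothetical accumulator machine; opcodes and encoding are illustrative only.
LOAD, ADD, STORE, HALT = 0, 1, 2, 3


class Machine:
    def __init__(self, memory):
        self.mem = list(memory)  # one memory holds both instructions and data
        self.pc = 0              # program counter
        self.ir = (HALT, 0)      # instruction register
        self.acc = 0             # accumulator
        self.mar = 0             # memory address register
        self.mbr = 0             # memory buffer register

    def read(self, addr):
        # Every memory access goes through MAR and MBR (single-port memory).
        self.mar = addr
        self.mbr = self.mem[self.mar]
        return self.mbr

    def write(self, addr, value):
        self.mar = addr
        self.mbr = value
        self.mem[self.mar] = self.mbr

    def run(self):
        while True:
            self.ir = self.read(self.pc)  # fetch
            self.pc += 1
            op, operand = self.ir         # decode
            if op == LOAD:                # execute
                self.acc = self.read(operand)
            elif op == ADD:
                self.acc += self.read(operand)
            elif op == STORE:
                self.write(operand, self.acc)
            elif op == HALT:
                return self.acc
            else:
                raise ValueError(f"unknown opcode {op}")


# Words 0-3 are instructions, words 4-6 are data, all in the same memory.
program = [(LOAD, 4), (ADD, 5), (STORE, 6), (HALT, 0), 2, 3, 0]

if __name__ == "__main__":
    m = Machine(program)
    print(m.run(), m.mem[6])
```

Instructions and operands share one address space here, which is the stored-program property the slides attribute to the von Neumann model.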
General Computer Structure • It shows a more generalized computer system structure. • Address bus • Data bus • Control bus • Device interfaces • Multiple bus structure vs. single bus structure • To allow simultaneous operations on the buses • Higher throughput • Complexity of structure • Speed-for-cost tradeoff
1.1 A uniprocessor model (continued) • The characteristics of the von Neumann model • Programs and data are stored in a single sequential memory. • There is no explicit distinction between data and instruction representation in the memory. • The memory is a one-dimensional array, so data structures such as multidimensional arrays must be linearized for representation. • The data representation does not retain any information on the type of data. • Semantic gap: the redundant operations requiring excessive mapping by the compiler.
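The linearization of multidimensional arrays mentioned above can be sketched as follows. This assumes row-major order (the last index varies fastest); the slides do not say which order is meant, and the function name `linear_index` is hypothetical.

```python
def linear_index(indices, shape):
    """Map a multidimensional index to an offset in one-dimensional memory.

    Assumes row-major order: the last index varies fastest.
    """
    if len(indices) != len(shape):
        raise ValueError("indices and shape must have the same length")
    offset = 0
    for i, n in zip(indices, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        offset = offset * n + i  # Horner-style accumulation
    return offset


if __name__ == "__main__":
    # Element [1][2] of a 3 x 4 array sits at offset 1*4 + 2.
    print(linear_index((1, 2), (3, 4)))
```

A compiler has to emit this index arithmetic for every array access, which is one concrete instance of the mapping work behind the semantic gap.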
1.2 Enhancements to the uniprocessor model • The Harvard architecture • Separate storage for instructions and data • Current Harvard architecture • Does not use separate storage for instructions and data • Has separate paths and buffers to access data and instructions
Major performance parameters • Arithmetic Logic Unit • functionality • the speed of operations • Memory • the access speed • the capacity • the cost • Control unit • speed • complexity • flexibility
1.2.1 ALU • Enhancements of ALU • Faster algorithm for ALU operations • Use of large number of general purpose registers • Stack-based ALUs • Pipelining • Multiple functional units • Multiple ALUs
1.2.2 Memory • Enhancements of Memory subsystem • wider word fetch • blocking (interleaved and banked organization) • Low-order interleaving • High-order interleaving(banking) • instruction/data buffers • cache memories • virtual memories • multiport memories
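The difference between low-order and high-order interleaving comes down to which address bits select the bank. A minimal sketch, assuming word addresses and equally sized banks; the function names and the example sizes are hypothetical.

```python
def low_order_bank(addr, num_banks):
    """Low-order interleaving: the low address bits select the bank."""
    return addr % num_banks, addr // num_banks  # (bank, word within bank)


def high_order_bank(addr, words_per_bank):
    """High-order interleaving (banking): the high address bits select the bank."""
    return addr // words_per_bank, addr % words_per_bank  # (bank, word within bank)


if __name__ == "__main__":
    # Assumed configuration: 4 banks of 16 words each.
    for addr in range(4):
        print(addr, low_order_bank(addr, 4), high_order_bank(addr, 16))
```

With low-order interleaving, consecutive addresses fall in different banks and can be accessed in an overlapped fashion; with high-order interleaving, consecutive addresses stay in one bank, so different banks can serve independent requesters.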
1.2.3 Control unit • Two popular implementations of the control unit • hardwired: for speed • microprogrammed: for flexibility
1.2.4 I/O subsystem • The popular I/O structures • programmed I/O • interrupt mode I/O • DMA(direct memory access) • Channels • Selector channel • Multiplexed channel • I/O processors • Front-end processors
1.2.5 Interconnection structures • Bandwidth: major performance measure of the bus structure • bus width • speed of interface hardware • bus protocols
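How the three factors above combine can be shown with a rough estimate. This is a simplified sketch, assuming the protocol cost can be expressed as a fixed number of bus cycles per transfer; the function name and the example figures are hypothetical and do not describe any particular bus.

```python
def bus_bandwidth_bytes_per_s(width_bits, clock_hz, cycles_per_transfer=1):
    """Estimate peak bus bandwidth in bytes per second.

    width_bits:          bus width
    clock_hz:            speed of the interface hardware
    cycles_per_transfer: protocol cost, assumed constant per transfer
    """
    if cycles_per_transfer < 1:
        raise ValueError("a transfer takes at least one cycle")
    transfers_per_s = clock_hz / cycles_per_transfer
    return (width_bits / 8) * transfers_per_s


if __name__ == "__main__":
    # Illustrative numbers: 32-bit bus, 10 MHz clock, 2 cycles per transfer.
    print(bus_bandwidth_bytes_per_s(32, 10_000_000, 2))
```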
1.2.6 System considerations • Large instruction sets, large numbers of general purpose registers, large memories • The availability of low-cost processors • Multiple processors • Two multiple processor structures • Each processor is dedicated to a specialized function. • All processors in the system can operate simultaneously: parallel processing
1.3 Two architecture styles • Two processor architecture styles that try to reduce the semantic gap • RISCs • HLL architectures
1.3.1 HLL Architectures • Figure 1.4 shows the evolution starting from the compilation. • Compilation • Interpretation • Two-level Interpretation • Direct execution
Direct Execution Language • SYMBOL machine • Directly executes the Symbol Programming Language • Iowa State University in 1971. • Advantage: • Very high translate-load speed • It has not been proved that they offer execution speeds higher than conventional architectures. • Disadvantage: • Only one source language can be used in programming the machine. • HLL architectures have never been successful commercially. • Texas Instruments' Explorer and Symbolics' 3600 series: LISP, for symbolic processing applications.
1.3.2 RISC • The characteristics of RISC architectures • a relatively low number of instructions • a small number of addressing modes • a small number of instruction formats • fast execution of all instructions • minimized memory access • support for most frequently used operations and optimizing compilers • Examples: the Berkeley RISC, the Stanford University MIPS, IBM 801, SPARC, ...
1.4 Performance Evaluation • Performance is measured by the bandwidth provided by the memory, processor, and I/O subsystems. • The most common measures: MIPS, MOPS, MFLOPS, MLIPS. • TFLOPS machines are being built. • The performance rating: peak rate, average rate, comparative rate • Factors other than performance for evaluating architectures: generality, ease of use, expandability (or scalability), openness, cost, etc. • A practical method for estimating performance: using benchmarks.
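Ratings such as MIPS and MFLOPS are counts of events per second, scaled to millions. The sketch below computes an average rate from a measured run; the function name and the counts are hypothetical, and real measurements would come from a benchmark run rather than fixed numbers.

```python
def rate_in_millions(event_count, elapsed_seconds):
    """Average rate in millions of events per second (MIPS, MFLOPS, ...)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return event_count / (elapsed_seconds * 1e6)


if __name__ == "__main__":
    # Illustrative run: 50 million instructions, 12 million of them
    # floating-point operations, completed in 2.0 seconds.
    print("MIPS:  ", rate_in_millions(50_000_000, 2.0))
    print("MFLOPS:", rate_in_millions(12_000_000, 2.0))
```

An average rate measured this way depends on the workload, which is why it usually falls below the peak rate quoted for the hardware.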
1.4.1 Benchmarks • Benchmarks are useful in evaluating hardware as well as software and single processor as well as multiprocessor systems. • Common benchmarks • Kernel Benchmarks: Linpack, Lawrence Livermore loops • Local Benchmarks • Partial Benchmarks • Recursive Benchmarks • Unix Utility and Application Benchmarks: SPECmarks • Synthetic Benchmarks: Dhrystone, Whetstone • Parallel Benchmarks: NIST recommended several suites. • Stanford Small Programs • PERFECT: PERFormance Evaluation for Cost-Effective Transformations • SLALOM: for measuring parallel computer performance
1.5 Cost Factors • The cost of a computer system: a composite of its software and hardware cost. • The cost of hardware has fallen rapidly as the hardware technology progressed. • The software costs are steadily rising as the software complexity grows. • The cost is dependent on two factors: an upfront development cost and a per unit manufacturing cost • The life spans of systems are getting shorter.
1.6 Example systems • DEC Alpha • Table 1.1 • Figure 1.5 • Figure 1.6 • Figure 1.7
DEC Alpha • The DEC Alpha, also known as the AXP, is a RISC microprocessor originally developed and fabricated by DEC. DEC used it in their own line of workstations and servers. Designed as a successor to the VAX line of computers, it supported the VMS operating system, as well as the DEC flavor of UNIX. Later, open source operating systems also ran on the Alpha, notably certain BSD systems. Microsoft supported the processor in earlier versions of Windows NT. • The 64-bit processor was introduced in 1992 running at 200 MHz. It was designed as a 64-bit architecture with super-pipelining and superscalar design. At the time, DEC touted it as the world's fastest processor. In July 1996 it was clocked at 500 MHz (the 21164PC), in March 1998 at 666 MHz and in May 2000 at 731 MHz (the 21264PC). Parts running at 1 GHz and faster were announced in 2001 (the 21364PC or EV-7), and have been available since 2003 at 1.1 GHz and upwards. Around 500,000 Alpha-based systems had been sold by the end of 2000.
DEC Alpha (continued) • The production of Alpha chips was licensed to Samsung Electronics Company. Following the purchase of Digital by Compaq, many of the Alpha products were placed with API NetWorks, Inc. (previously Alpha Processor Inc.), a private company funded by Samsung and Compaq. In October 2001 Microway became the exclusive sales and service provider of API NetWorks' Alpha-based product line. • Compaq announced that computers using Alpha would be phased out by 2004 in favour of Intel's Itanium. Windows NT support was halted with NT4 SP6 following the Compaq takeover. HP, the new owner of Compaq, announced that it would support the Alpha series for a few more years, including a new EV79 chip, but that this would be the end of its lifetime. The IA-64 is supposed to be the replacement for this series. • Ironically, in mid-2003, with the Alpha about to be phased out, the fastest computer in the U.S., and second fastest in the world, is a cluster of 4096 Alpha processors.
1.6 Example systems (continued) • Intel i860 • Figure 1.8 • Figure 1.9 • Figure 1.10 • Table 1.2
Intel i860 64-Bit Microprocessor • The Intel i860 (N10) microprocessor delivers supercomputer performance in a single VLSI component. The 64-bit design of the i860 balances integer, floating point, and graphics performance. The Intel i860 has features of both a digital signal processor and a data processor. However, because of its speed in doing typical DSP operations, it has been extensively used in the DSP role. Its architecture also makes it suitable for other applications including engineering workstations, scientific computing, 3-D graphics workstations, and multi-user systems. The i860 is used as the data processor in Intel's massively-parallel Touchstone and Paragon supercomputers.
Intel i860 64-Bit Microprocessor (continued) Features • Parallel architecture that supports up to three operations per clock • One integer or control instruction per clock • Up to two floating-point results per clock • High performance design • 33.3/40 MHz clock rates • 64-bit external data bus • 64-bit internal instruction cache bus • 128-bit internal data cache bus • High level of integration on one chip • 32-bit integer and control unit • 32/64-bit pipelined floating point adder and multiplier units • 64-bit 3-D graphics unit
Intel i860 64-Bit Microprocessor (continued) Performance • 80 peak single precision MFLOPS (40 MHz i860) • 60 peak double precision MFLOPS (40 MHz i860) • 80 peak double precision MFLOPS (40 MHz i860XR) • 42 SPECmark (40 MHz i860XR) • The i860XP (N11) is an extension to the i860, with MP support (enabling physical snooping), a new process, and better performance.
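The 80 MFLOPS single-precision peak is consistent with the earlier feature list: up to two floating-point results per clock at a 40 MHz clock rate. A minimal sketch of that arithmetic, assuming peak rate is simply results per clock times clock rate; the function name is hypothetical, and the sketch does not model why the quoted double-precision peak of the 40 MHz i860 is lower.

```python
def peak_mflops(results_per_clock, clock_mhz):
    """Peak floating-point rate, assuming every clock delivers the maximum."""
    return results_per_clock * clock_mhz


if __name__ == "__main__":
    # Two results per clock at 40 MHz, as listed for the i860.
    print(peak_mflops(2, 40))
```

A peak rate like this is an upper bound that assumes both floating-point units produce a result on every clock, so sustained rates on real programs are lower.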
Intel i860 64-Bit Microprocessor (continued) Functional Description The i860 microprocessor consists of 9 units: • Core Execution Unit • Floating-Point Control Unit • Floating-Point Adder Unit • Floating-Point Multiplier Unit • Graphics Unit • Paging Unit • Instruction Cache • Data Cache • Bus and Cache Control Unit
Intel i860 64-Bit Microprocessor (continued) Functional Description • The core execution unit controls overall operation of the i860 microprocessor. A set of 32 x 32-bit general-purpose registers is provided for the manipulation of integer data. • The floating-point hardware is connected to a separate set of floating-point registers, which can be accessed as 16 x 64-bit registers or as 32 x 32-bit registers. • The floating-point control unit controls both the floating-point adder and the floating-point multiplier, issuing instructions, handling all source and result exceptions, and updating status bits in the floating-point status register. • The floating-point adder performs addition, subtraction, comparison, and conversions on 64- and 32-bit floating-point values. • The floating-point multiplier performs floating-point and integer multiply and floating-point reciprocal operations on 64- and 32-bit floating-point values.
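One way to picture a register file that can be viewed as 16 x 64-bit or 32 x 32-bit registers is as a single block of storage with two access widths. The sketch below is illustrative only: the `FPRegisterFile` class, the little-endian byte layout, and the pairing of 64-bit register n with 32-bit registers 2n and 2n+1 are assumptions, and it does not model any special-purpose or hardwired registers of the real chip.

```python
import struct


class FPRegisterFile:
    """128 bytes of storage, viewed as 16 doubles or 32 single-precision floats."""

    def __init__(self):
        self.raw = bytearray(16 * 8)

    def write64(self, n, value):
        struct.pack_into("<d", self.raw, 8 * n, value)

    def read64(self, n):
        return struct.unpack_from("<d", self.raw, 8 * n)[0]

    def write32(self, n, value):
        struct.pack_into("<f", self.raw, 4 * n, value)

    def read32(self, n):
        return struct.unpack_from("<f", self.raw, 4 * n)[0]


if __name__ == "__main__":
    regs = FPRegisterFile()
    regs.write32(2, 1.5)
    regs.write32(3, 2.5)
    print(regs.read32(2), regs.read32(3))
    # Writing 64-bit register 1 overwrites the storage behind
    # 32-bit registers 2 and 3 in this layout.
    regs.write64(1, 1.0)
    print(regs.read64(1), regs.read32(2), regs.read32(3))
```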
Intel i860 64-Bit Microprocessor (continued) Features • Paging unit with translation lookaside buffer • 32 x 32-bit integer register file • 16 x 64-bit FPU register file • 4 Kbyte instruction cache • 8 Kbyte data cache • Compatible with industry standards • On-chip debug register • Assembler, Linker, Simulator, Debugger, C and FORTRAN Compilers, FORTRAN Vectorizer, Scalar and Vector Math Libraries for both OS/2 and UNIX environments
1.6 Example systems(continued) • MIPS R4000 • Table 1.3 • Figure 1.11-1.15 • Table 1.4
MIPS R4000 • MIPS Technologies, Inc. is a company that designs, develops, and licenses reduced instruction set computer (RISC) microprocessors and compilers. It is a wholly-owned subsidiary of Silicon Graphics, Inc. and operates as an independent unit. MIPS is the successor to the processor business of MIPS Computer Systems, which was founded in 1984 and merged with Silicon Graphics on 29 June 1992. • MIPS Technologies developed the world's first RISC VLSI microprocessors (1985) (or was it the ARM?) and the first commercial 64-bit microprocessor (MIPS R4000, 1992), and announced the MIPS R4300i, the first 64-bit RISC processor designed for interactive consumer applications (April 1995). They announced the MIPS R10000, the next-generation general-purpose MIPS microprocessor and the most powerful processor in the world (October 1994).
MIPS R4000 (continued) • MIPS' semiconductor company partners participate in the design and development of MIPS processors and software and then produce, market, and support the processors. MIPS itself does not fabricate or sell products. MIPS' semiconductor partners are: Integrated Device Technology, LSI Logic Corporation, NEC Corporation, NKK Corporation, Philips Semiconductors, Siemens AG, and Toshiba Corporation.
MIPS R4000 (continued) MIPS' products • R4000 - 100 MHz, 1.35M transistors, primary i/d cache 8KB/8KB, SPECint92 58.3, SPECfp92 61.4. • R4300i - 133 MHz, 1.35M transistors, primary i/d cache 16KB/8KB, SPECint92 80, SPECfp92 60. • R4400 - 250 MHz, 2.3M transistors, primary i/d cache 16KB/16KB, SPECint92 175.8, SPECfp92 164.4. • R4600 - 133 MHz, 1.9M transistors, primary i/d cache 16KB/16KB, SPECint92 85, SPECfp92 75. • R8000/R8010 - 90 MHz, 2.6M/0.83M transistors, primary i/d cache 16KB/16KB, SPECint92 132, SPECfp92 396. • R10000 - 200 MHz, 6.7M transistors, primary i/d cache 32KB/32KB, SPECint92 >300, SPECfp92 >600. • MIPS' processor chips were used in the DEC 3100 series of workstations.
Intel Research - Microprocessor • We research advanced microarchitecture and system architecture concepts and techniques for future generation IA-32 and IA-64 designs. We are located at Intel's centers for microprocessor development including Santa Clara (California), Hillsboro (Oregon), Haifa (Israel) and Barcelona (Spain). We work side by side with engineers developing current and next generation microprocessors.
Intel Research – Microprocessor (continued) Research areas • Multithread Microarchitecture: Research into various flavors of multithreading from CMP (chip multiprocessor), SMT (simultaneous multithreading) to DMT (dynamic multithread). • Memory Hierarchy: Research into multilevel caches, prefetching, multiprocessor cache behavior, and external memory bandwidth and latency bottlenecks. • Improving Instruction Level Parallelism: Research areas include improving ILP through novel instruction supply and prediction techniques, techniques for bypassing memory latency and improving memory hierarchy organization. • Low Power Architecture and Microarchitectures: In the low-power area, we investigate techniques for cutting local and global power and novel architecture design for low power. • Simulators: We are also investigating IA-32 and IA-64 based simulation frameworks to evaluate design and performance characteristics.