Computer Structure (מבנה מחשבים), 0368-2159, Lecture 1: Introduction. Lecturers: Nathan Intrator and Yehuda Afek. TAs: Hillel Avni, Noa Ben-Amos
What is Computer Structure? Hardware: transistors, logic circuits, computer architecture
What we will cover today: • Introduction: Computer Architecture • Administrative Matters • History • From conductors and electricity to the computer's basic binary operations: • Electrical voltage • Conductors • Silicon: a semiconductor • The transistor • Binary operations in electronic components
Computing Devices Then… EDSAC, University of Cambridge, UK, 1949
Computing Devices Now: sensor nets, cameras, games, set-top boxes, media players, laptops, servers, robots, routers, smart phones, automobiles, supercomputers
Computer Structure: what is it?
The paradigm (Patterson) Every Computer Scientist should master the “AAA” • Architecture • Algorithms • Applications
Computer Architecture: GOAL Fast, Effective and Cheap The goal of Computer Architecture • To build “cost effective systems” • How do we calculate the cost of a system? • How do we evaluate the effectiveness of the system? • To optimize the system • What are the optimization points? Fact: most computer systems still use the Von Neumann principle of operation, even though, internally, they are very different from the computers of that time.
Anatomy: 5 components of any Computer (since 1946) [diagram of a personal computer]: • Input (keyboard, mouse) • Output (display, printer) • Memory (where programs and data live when running; the disk holds them when not running) • Control (“brain”) • Datapath (“brawn”)
Computer System Structure [diagram]: the CPU and its cache sit on the CPU bus; a bridge connects the CPU bus to the memory bus (memory, video buffer) and to the I/O bus, which hosts a SCSI/IDE adapter (hard disk and scanner on a SCSI bus), a LAN adapter, a USB hub (keyboard, mouse), and a graphics adapter
The Instruction Set: a Critical Interface [diagram: software on top, the instruction set in the middle, hardware below]
What is “Computer Architecture”? Computer Architecture = • Instruction Set Architecture + • Machine Organization + … • = engineering + architecture
What are “Machine Structures”? (מבנה מחשבים) • Coordination of many levels (layers) of abstraction [diagram, top to bottom]: Application (ex: browser) → Operating System (Linux, Win, ..) / Compiler → Assembler → Instruction Set Architecture → Processor / Memory / I/O system → Datapath & Control → Digital Design → Circuit Design → transistors → Physics; the layers above the ISA are Software, the layers below it Hardware
Levels of Representation [diagram, top to bottom]: • High Level Language Program: temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; • (Compiler) • Assembly Language Program: lw $15, 0($2) / lw $16, 4($2) / sw $16, 0($2) / sw $15, 4($2) • (Assembler) • Machine Language Program: 0000 1001 1100 0110 1010 1111 0101 1000 / 1010 1111 0101 1000 0000 1001 1100 0110 / 1100 0110 1010 1111 0101 1000 0000 1001 / 0101 1000 0000 1001 1100 0110 1010 1111 • (Machine Interpretation) • Control Signal Specification: ALUOP[0:3] <= InstReg[9:11] & MASK
Computer Architecture’s Changing Definition • 1950s to 1960s Computer Architecture Course • Computer Arithmetic • 1970s to mid 1980s Computer Architecture Course • Instruction Set Design, especially ISA appropriate for compilers • 1990s Computer Architecture Course • Design of CPU, memory system, I/O system, Multi-processors, Networks • 2000s Computer Architecture Course • Special purpose architectures, Functionally reconfigurable, Special considerations for low power/mobile processing • 2005 – future (?) Multi-processors, Parallelism • Synchronization, Speed-up, How to Program??? !!!
Forces on Computer Architecture [diagram]: Technology, Applications, Programming Languages, Operating Systems, History, and Cleverness all act on Computer Architecture
Computers in the News: Sony Playstation 2000 • As reported in Microprocessor Report, Vol 13, No. 5: Emotion Engine: 6.2 GFLOPS, 75 million polygons per second; Graphics Synthesizer: 2.4 billion pixels per second • Claim: Toy Story realism brought to games! • “The Playstation 3 will deliver nearly 2 teraflops overall performance,” said Ken Kutaragi, president and group CEO of Sony Computer Entertainment
Ray Kurzweil: By 2029 reverse engineer the Human Brain http://singularityhub.com/2010/01/25/kurzweil-discusses-the-future-of-brain-computer-interfac-x-prize-lab-video/
Where are We Going?? • Arithmetic • Single/multicycle Datapaths • Pipelining (IFetch Dcd Exec Mem WB) • Memory Systems • I/O [chart: performance vs. year, 1980-2000, “Moore’s Law”: µProc 60%/yr (2X/1.5yr), DRAM 9%/yr (2X/10 yrs); the Processor-Memory Performance Gap grows 50%/year]
Course Administration • Instructor: Nathan Intrator (nin@post.tau.ac.il) • TA: Kiril Solovey (kirilsolo@gmail.com) http://cs.tau.ac.il/~nin/Courses/CompStruct/CompStruct.htm http://virtual.tau.ac.il Books: • V. C. Hamacher, Z. G. Vranesic, S. G. Zaky, Computer Organization. McGraw-Hill, 1982 • H. Taub, Digital Circuits and Microprocessors. McGraw-Hill, 1982 • Digital Systems (in Hebrew), The Open University of Israel press • Hennessy and Patterson, Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, 1998
Grading • Final exam: 80% • Homework exercises: 20% (6-7 exercises)
Architecture & Microarchitecture Elements • Architecture: • Registers data width (8/16/32/64) • Instruction set • Addressing modes • Addressing methods (Segmentation, Paging, etc...) • Microarchitecture: • Physical memory size • Caches size and structure • Number of execution units, number of execution pipelines • Branch prediction • TLB • Timing is considered µarch (though it is user visible!) • Processors with the same architecture may have different microarchitectures
Compatibility • Backward compatibility • New hardware can run existing software • Example: Pentium 4 can run software originally written for Pentium III, Pentium II, Pentium, 486, 386, 286 • Forward compatibility • New software can run on existing (old) hardware • Example: new software written with MMX™ must still run on older Pentium processors which do not support MMX™ • Less important than backward compatibility • New ideas: architecture independent • JIT – just in time compiler: Java and .NET • Binary translation
Benchmarks – Programs for Evaluating Processor Performance • Toy Benchmarks • 10-100 line programs • e.g.: sieve, puzzle, quicksort • Synthetic Benchmarks • Attempt to match average frequencies of real workloads • e.g., Whetstone, Dhrystone • Real programs • e.g., gcc, spice • SPEC: System Performance Evaluation Cooperative • SPECint (8 integer programs) • and SPECfp (10 floating point)
CPI – to compare systems with the same instruction set architecture (ISA) CPI = (#cycles required to execute the program) / (#instructions executed in the program) • The CPU is synchronous - it works according to a clock signal. • Clock cycle is measured in nsec (10^-9 of a second). • Clock rate (= 1/clock cycle) is measured in MHz (10^6 cycles/second). • CPI - cycles per instruction • Average #cycles per instruction (in a given program) • IPC (= 1/CPI): instructions per cycle • Clock rate is mainly affected by technology, CPI by the architecture • CPI breakdown: how many cycles (on average) the program spends on different causes, e.g., executing, memory, I/O, etc.
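The CPI and IPC definitions above can be sketched in a few lines of code. The cycle and instruction counts below are made-up illustration numbers, not data from the lecture:

```python
# CPI = total cycles / total instructions executed (both dynamic counts).
# The counts below are hypothetical, chosen only to illustrate the formula.

def cpi(cycles, instructions):
    """Average cycles per instruction over a program run."""
    return cycles / instructions

def ipc(cycles, instructions):
    """Instructions per cycle: the reciprocal of CPI."""
    return instructions / cycles

# A run that executed 1,000,000 instructions in 1,400,000 clock cycles:
c = cpi(1_400_000, 1_000_000)   # 1.4 cycles per instruction
i = ipc(1_400_000, 1_000_000)   # about 0.71 instructions per cycle
print(c, i)
```

Note that CPI and IPC are properties of a particular program run on a particular machine, not of the machine alone.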
CPU Time • The time required by the CPU to execute a given program: CPU Time = clock cycle × #cycles = clock cycle × CPI × IC • Our goal: minimize CPU Time • Minimize clock cycle: more MHz (process, circuit, µarch) • Minimize CPI: µarch (e.g.: more execution units) • Minimize IC: architecture (e.g.: MMX™ technology) • Speedup due to enhancement E
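The CPU-time formula can be written out directly. The machine parameters below (1 GHz clock, CPI of 1.4, one billion instructions) are hypothetical, chosen only to show the units working out:

```python
def cpu_time(clock_cycle_sec, cpi, instruction_count):
    # CPU Time = clock cycle * CPI * IC (the product of the three factors)
    return clock_cycle_sec * cpi * instruction_count

# Hypothetical machine: 1 GHz clock means a 1 ns (1e-9 s) cycle time.
t = cpu_time(1e-9, 1.4, 1_000_000_000)   # about 1.4 seconds
print(t)
```

The three factors map directly to the three minimization targets on the slide: cycle time (technology and circuits), CPI (microarchitecture), and IC (architecture and compiler).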
Amdahl’s Law Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then: ExTime_new = ExTime_old × ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced) and Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
Amdahl’s Law: Example 1 • Floating point instructions improved to run 2X, but only 10% of actual instructions are FP: ExTime_new = ExTime_old × (0.9 + 0.1/2) = 0.95 × ExTime_old, so Speedup_overall = 1 / 0.95 = 1.053 Corollary: Make The Common Case Fast
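Amdahl's Law is a one-line function; plugging in the FP example (enhanced fraction 0.1, enhancement speedup 2) reproduces the 1.053 figure:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Speedup_overall = 1 / ((1 - F) + F / S)
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

s = amdahl_speedup(0.1, 2.0)   # FP is 10% of instructions, made 2x faster
print(round(s, 3))             # 1.053, as on the slide
```

Even an infinite FP speedup would only give 1/0.9 ≈ 1.11 here, which is the point of the corollary: effort spent on a rare case has a small overall payoff.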
Instruction Set Design [diagram: software / instruction set / hardware] • The ISA is what the user and the compiler see • The ISA is what the hardware needs to implement
Why is the ISA important? • Code size • Long instructions may take more time to be fetched • Requires large memory (important in small devices, e.g., cell phones) • Number of instructions (IC) • Reducing IC reduces execution time (assuming the same CPI and frequency) • Code “simplicity” • Simple HW implementation, which leads to higher frequency and lower power • Code optimization can be better applied to “simple code”
The impact of the ISA: RISC vs. CISC
CISC Processors • CISC - Complex Instruction Set Computer • The idea: a high level machine language • Characteristics: • Many instruction types, with many addressing modes • Some of the instructions are complex: • Perform complex tasks • Require many cycles • ALU operations directly on memory • Usually uses a limited number of registers • Variable length instructions • Common instructions get short codes, which saves code length • Example: x86
CISC Drawbacks • Compilers do not take advantage of the complex instructions and the complex indexing methods • Implementing complex instructions and complex addressing modes complicates the processor and slows down the simple, common instructions, contradicting Amdahl’s law corollary: Make The Common Case Fast • Variable length instructions are a real pain in the neck: • It is difficult to decode several instructions in parallel • As long as an instruction is not decoded, its length is unknown, so it is unknown where the instruction ends and where the next instruction starts • An instruction may not fit into the “right behavior” of the memory hierarchy (will be discussed in the next lectures) • Examples: VAX, x86 (!?!)
RISC Processors • RISC - Reduced Instruction Set Computer • The idea: simple instructions enable fast hardware • Characteristics: • A small instruction set, with only a few instruction formats • Simple instructions • Execute simple tasks • Require a single cycle (with pipelining) • A few indexing methods • ALU operations on registers only • Memory is accessed using Load and Store instructions only • Many orthogonal registers • Three-address machine: Add dst, src1, src2 • Fixed length instructions • Examples: MIPS™, Sparc™, Alpha™, PowerPC™
RISC Processors (Cont.) • Simple architecture ⇒ simple micro-architecture • Simple, small and fast control logic • Simpler to design and validate • Room for on-die caches: instruction cache + data cache • Parallelize data and instruction access • Shorter time-to-market • Using a smart compiler: • Better pipeline usage • Better register allocation • Existing RISC processors are not “pure” RISC • e.g., they support division, which takes many cycles
RISC and Amdahl’s Law (Example) • In comparison to the CISC architecture: • 10% of the static code, which executes 90% of the dynamic instructions, has the same CPI • 90% of the static code, which is only 10% of the dynamic instructions, increases its CPI by 60% • The number of instructions being executed increases by 50% • The speed of the processor is doubled • This was true at the time the RISC processors were invented • We get • And then
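The arithmetic elided at the end of this slide can be filled in under one reading of its numbers. This reconstruction is my own, not from the lecture: I assume the factors combine multiplicatively, as in CPU Time = clock cycle × CPI × IC, and that "the same CPI" means a relative factor of 1.0:

```python
# Reconstruction (assumption, not the lecturer's worked numbers):
# new average CPI = 0.9 * 1.0 + 0.1 * 1.6, weighted by dynamic frequency.
cpi_factor   = 0.9 * 1.0 + 0.1 * 1.6   # 90% of dynamic code unchanged, 10% costs 60% more
ic_factor    = 1.5                     # 50% more instructions executed
cycle_factor = 0.5                     # processor speed doubled, so cycle time halves

# Speedup = old CPU time / new CPU time, with old time normalized to 1.
speedup = 1.0 / (cycle_factor * cpi_factor * ic_factor)
print(round(speedup, 2))               # about 1.26 in favor of the RISC machine
```

Under these assumptions the doubled clock more than pays for the 50% larger instruction count, which matches the slide's conclusion that the trade-off favored RISC at the time.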
So, what is better, RISC or CISC? • Today CISC architectures (x86) run as fast as RISC (or even faster) • The main reasons are: • The hardware translates CISC instructions into RISC-like micro-instructions (ucode) • CISC architectures use a “RISC-like engine” internally • We will discuss this kind of solution later on in this course.
Technology Trends: Microprocessor Complexity • Moore’s Law: 2X transistors/chip every 1.5 years • Transistor counts: Itanium 2: 410 million; Athlon (K7): 22 million; Alpha 21264: 15 million; Alpha 21164: 9.3 million; PowerPC 620: 6.9 million; PentiumPro: 5.5 million; Sparc Ultra: 5.2 million
Technology Trends: Processor Performance [chart: performance vs. year, growing about 1.54X/yr; e.g., Intel P4 2000 MHz (Fall 2001)]
Technology Trends: Memory Capacity (Single-Chip DRAM)
year  size (Mbit)
1980  0.0625
1983  0.25
1986  1
1989  4
1992  16
1996  64
1998  128
2000  256
2002  512
• Now 1.4X/yr, or 2X every 2 years. • 8000X since 1980!
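The "8000X since 1980!" claim can be checked against the endpoints of the table above:

```python
import math

start_mbit, end_mbit = 0.0625, 512     # the 1980 and 2002 rows of the table
growth = end_mbit / start_mbit
print(growth)                          # 8192.0, i.e. the slide's "8000X" rounded

# Average doubling time over the whole 1980-2002 span:
years = 2002 - 1980
doubling_time = years / math.log2(growth)
print(round(doubling_time, 2))         # about 1.69 years per doubling
```

So the historical average (one doubling every ~1.7 years) was faster than the "2X every 2 years" rate the slide quotes for the then-current trend.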
Technology Trends Imply Dramatic Change • Processor • Logic capacity: about 30% per year • Clock rate: about 20% per year • Memory • DRAM capacity: about 60% per year (4x every 3 years) • Memory speed: about 10% per year • Cost per bit: improves about 25% per year • Disk • Capacity: about 60% per year • Total data use: grows 100% every 9 months! • Network Bandwidth • Bandwidth increasing more than 100% per year!