Computer Organization and Architecture (3 Credits/SKS) Prof. Dr. Bagio Budiardjo, Even Semester (Semester Genap) 2010/2011
About the Course :
Course Objectives: After completing this course, students are expected to understand and be able to analyze computer architecture, in particular instruction-set design (e.g. addressing modes) and its influence on performance. Students are also expected to understand the meaning of computer organization, that is, the interconnection of the sub-systems of a computing system: CPU, memory, bus and I/O. Finally, students are expected to understand a more advanced technique in processor design: pipelining.
Key words : architecture, instruction-set design, computer organization, performance, processor design, and pipelining techniques
About the grading scheme :
• The scheme is not rigid; whenever possible, the grade will combine homework, quizzes, exercises, a mid-test and a final test.
• One possible scheme is : Homework : 15% (4), Mid test : 40%, Final test : 45%
• Grading the homework : maximum 5 points each. Three levels of grading : Good (5), OK (3), and Bad (2).
The books and supporting materials :
• William Stallings's book Computer Organization and Architecture, Seventh Edition, Prentice Hall, 2006, will be used as the main reference for this lecture. A new edition was issued in 2010 but is still unavailable in Jakarta.
• The classic book Logic and Computer Design Fundamentals, by M. Morris Mano and Charles R. Kime (Pearson Asia, 2004), is good but places too much emphasis on digital logic. We use material from this book to explain the hardware design of computer components, whenever possible.
• Chapters covered will be : Chapters 1, 2, 3, 4, 5, 10, 11 and 13 (Stallings). Additional material about pipelining is taken from another book.
Books and supporting materials - continued
• There will be no handouts (unless they are very important).
• Lecture notes are distributed on memory stick/CD; the SAP (course syllabus) can be downloaded from SIAK-NG.
• Students are encouraged to read books/papers in this field of study.
Schedule of class :
• At the scheduled time and place (K-102), for about 120 minutes
• Lectures will be given mainly using an LCD projector
About the “course direction”
Why do we study Computer Architecture ?
History : A course under this name was taught in many universities long before microprocessors existed. Years ago, people studied mainframe architectures : IBM S/370, CDC Cyber, CRAY, Amdahl, etc. Since microprocessors emerged, the course has changed slightly to cover more advanced topics : computer design and performance issues.
About the “course direction”
[Figure: course map placing Computer Organization & Architecture (OAK) among related subjects, with branches labeled “Micro & Embedded” and “Processors Architecture & Design”.]
• Computer Organization & Architecture (OAK) : analyzing processor design, with emphasis on how to obtain better processing speed (cost effectiveness)
• Processors Architecture & Design : analyzing and implementing computer systems to achieve the best processing speed (cost effectiveness)
• Microprocessors : application of µproc
• Parallel & Distributed Computing Systems : organizing processors/computing systems to obtain better speed-up with a different processing paradigm
• Embedded Systems : embedding µproc-based intelligence into a new system/device
About the “course direction” - continued
This course is aimed at :
1. Explaining the phenomena of computer architecture and computer design, including the basic instruction cycle and its implications for processing speed
2. Studying the “key” problems : a. the CPU-memory bottleneck b. CPU-I/O device problems
3. Studying how “performance” can be improved. Example : CPU-memory : cache memory
4. Asking how execution speed can be improved with other techniques. Example : pipelining
Reasons for studying Computer Architecture (Stallings's arguments)
• Able to select a “proper” computer system for a particular environment (cost and effectiveness)
• Able to analyze a processor “embedded” in an environment, for example the use of a processor in an automobile, and able to use the proper tools for the analysis
• Able to choose proper software for a particular computer system
CPU : Central Processing Unit – Processor Organization : Another view
[Figure: CPU block diagram showing the Control Unit, MMU (Memory Management Unit), IR, PC, MAR, MBR, registers R1, R2, R3, ALU1, ALU2, ADDER, ALU3, FPU (Floating Point Unit), cache memory and the internal BUS, with a path to/from memory.]
Issues : clock speed, gating signals
Frequently Asked Question
What is the role of the CPU clock ? What is the difference between a P IV/2.4 G and a P IV/3.0 G ? (CPU clock speeds of 2.4 GHz and 3.0 GHz)
Consider an instruction of a CPU : AR R1, R2 (add register : add the content of R1 and the content of R2, place the result in R1)
Execution steps of AR R1,R2
The “possible” micro-execution steps are :
a. ALU1 ← [R1] {content of R1 is moved to ALU1}
b. ALU2 ← [R2] {content of R2 is moved to ALU2}
c. ADD {ALU3 ← ALU1 + ALU2}
d. R1 ← [ALU3] {result of the addition is moved to R1}
If each micro-step is executed in “one” clock cycle, then this AR instruction needs 4 clock cycles. For the time being, we ignore the fetch cycle.
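The four micro-steps above can be traced with a small simulation. This is only a sketch: the dictionary-based register model and the function name `run_ar_single_bus` are assumptions made for illustration, not part of the lecture. It counts one clock per micro-step and then converts the clock count to time for the two clock rates mentioned in the FAQ.

```python
# Sketch: AR R1,R2 on a single internal bus, one micro-step per clock.
# The register names follow the slide; the dict model is an assumption.

def run_ar_single_bus(regs):
    """Execute AR R1,R2 and return the number of clock cycles used."""
    micro_steps = [
        lambda r: r.update(ALU1=r["R1"]),                # a. ALU1 <- [R1]
        lambda r: r.update(ALU2=r["R2"]),                # b. ALU2 <- [R2]
        lambda r: r.update(ALU3=r["ALU1"] + r["ALU2"]),  # c. ADD
        lambda r: r.update(R1=r["ALU3"]),                # d. R1 <- [ALU3]
    ]
    clocks = 0
    for step in micro_steps:
        step(regs)   # only one bus transfer or operation per clock
        clocks += 1
    return clocks

regs = {"R1": 7, "R2": 5, "ALU1": 0, "ALU2": 0, "ALU3": 0}
clocks = run_ar_single_bus(regs)
print("R1 =", regs["R1"], "after", clocks, "clocks")

# Same clock count, different clock period: cycles / GHz gives nanoseconds.
for ghz in (2.4, 3.0):
    print(f"{ghz} GHz: {clocks / ghz:.2f} ns (fetch cycle ignored)")
```

The instruction needs the same number of clocks at either frequency; a faster clock only shortens each cycle.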
Processor Organization – continued.1 ADD R1, R2, step shown : a. ALU1 ← [R1] [Figure: single-bus processor (Control Unit, IR, PC, MAR, MBR, R1, R2, R3, ALU1, ALU2, ADDER, ALU3, BUS, path to/from memory); legend : path/unit not active.]
Processor Organization – continued.2 ADD R1, R2, step shown : b. ALU2 ← [R2] [Figure: same single-bus processor; legend : path/component not active.]
Processor Organization – continued.3 ADD R1, R2, step shown : c. ADD [Figure: same single-bus processor; legend : path/component not active.]
Processor Organization – continued.4 ADD R1, R2, step shown : d. R1 ← [ALU3] [Figure: same single-bus processor; legend : path/component not active.]
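Processor Organization – Microprogram
ADD R1, R2 : a. ALU1 ← [R1] b. ALU2 ← [R2] c. ADD d. R1 ← [ALU3]
[Figure: single-bus processor (Control Unit, IR, PC, MAR, MBR, R1, R2, R3, ALU1, ALU2, ADDER, ALU3, BUS) with gates A1, A2, A3 and B1, B2, B3 and an ADD control signal.]
Microprogram : one 7-bit control word per micro-step over the signals A1, A2, A3, B1, B2, B3 and ADD (1 = open; 0 = closed). Control words as listed on the slide : a = 1 0 0 1 0 0 0; b = 0 1 0 0 1 0 0; c = 0 0 0 1 0 0 0; d = 1 0 0 0 0 0 1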
Analysis of Instruction Cycle
• A single bus is slow, since only one transfer can be executed in each “clock”
• Is there any other way to “improve” the speed?
• A dual-bus processor may be faster
• It comes with additional processor cost
Dual processor-bus : A way to improve speed
1. ALU1 ← [R1] (bus1), ALU2 ← [R2] (bus2)
2. ADD
3. R1 ← [ALU3] (bus1)
Only 3 clock cycles needed : 25% fewer than the 4 cycles of the single-bus design.
How about this :
1. ALU1 ← [R1] (bus1), ALU2 ← [R2] (bus2), ADD
2. R1 ← [ALU3] (bus1)
Only 2 clock cycles needed : 50% fewer than the single-bus design.
[Figure: dual-bus processor with buses 1 and 2, registers R1, R2, R3, ALU1, ALU2, ADDER, ALU3 and the other components (Control Unit, IR, PC, MAR, MBR).]
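The cycle counts for the single-bus and dual-bus schedules can be compared with a few lines of arithmetic. In this sketch each schedule is a list of clock cycles, and each cycle lists the micro-steps that run in it; the `SCHEDULES` table and the `compare` function are illustrative names, not from the lecture. The slide's "25% faster" and "50% faster" correspond to the reduction in clock cycles; the speedup ratio (old cycles divided by new cycles) is reported alongside it.

```python
# Sketch: compare clock-cycle counts of the ADD R1,R2 schedules.
# Each inner list holds the micro-steps executed in one clock cycle.
SCHEDULES = {
    "single bus": [["ALU1 <- [R1]"], ["ALU2 <- [R2]"], ["ADD"], ["R1 <- [ALU3]"]],
    "dual bus": [["ALU1 <- [R1]", "ALU2 <- [R2]"], ["ADD"], ["R1 <- [ALU3]"]],
    "dual bus, ADD in the transfer clock": [
        ["ALU1 <- [R1]", "ALU2 <- [R2]", "ADD"],
        ["R1 <- [ALU3]"],
    ],
}

def compare(schedules, baseline="single bus"):
    """Return {name: (cycles, fraction of cycles saved, speedup ratio)}."""
    base = len(schedules[baseline])
    result = {}
    for name, cycles in schedules.items():
        n = len(cycles)
        result[name] = (n, 1 - n / base, base / n)
    return result

for name, (n, saved, speedup) in compare(SCHEDULES).items():
    print(f"{name}: {n} cycles, {saved:.0%} fewer cycles, {speedup:.2f}x speedup")
```

Whether ADD can really share a clock with the two transfers depends on how fast the adder settles, which is an assumption the second schedule makes.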
Dual processor-bus : Microprogram level representation
How do we create the microprogram for the instruction SUB R3, R2 ?
[Figure: dual-bus processor with buses 1 and 2, registers R1, R2, R3, ALU1, ALU2, ADDER, ALU3 and the other components (Control Unit, IR, PC, MAR, MBR), with gates A1 to A6 and B1 to B6 and a SUB control signal.]
Microprogram for SUB R3, R2 on the dual-bus processor
1. Assume that the subtraction and the transfer of the result back to the register are done in separate clocks
2. Assume that the subtraction and the transfer of the result back to the register are done in the same clock
Triple processor-bus : Can the processing speed be improved?
[Figure: triple-bus processor with buses 1, 2 and 3, registers R1, R2, R3, ALU1, ALU2, ADDER, ALU3 and the other components (Control Unit, IR, PC, MAR, MBR). Please notice the direction of the arrows.]
If all the CPU components (registers, ALUs and adder) can do their work (transferring bits, adding numbers) in one third (1/3) of a clock cycle, how many clocks are needed to complete an addition operation (ADD R1,R2) ? Write down the “register transfer” steps and the microprogram for your register transfer language.
Program Execution
• A scientific program written in assembly language is run on a microprocessor with a 1 GHz clock. To complete the program, it needs to execute :
a. 150,000 arithmetic instructions (e.g. ADD R1,R2; MUL R1,R3; etc.)
b. 250,000 register transfer instructions (e.g. MOV R1,R2; etc.)
c. 100,000 memory access instructions (e.g. LOAD R1,X; STORE R2,Y; etc.)
If arithmetic instructions need 2 clocks on average, register transfer instructions need 1 clock on average, and memory access instructions need 10 clocks on average, calculate the average CPI (clocks per instruction) of this program. How long does the program take to complete (in seconds)?
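One way to work this exercise is to weight each instruction class by its clock count. The sketch below uses the instruction mix and clock rate given above; the names `MIX`, `CLOCK_HZ` and `average_cpi` are illustrative only. Average CPI is total clocks divided by total instructions, and execution time is total clocks divided by the clock frequency.

```python
# Sketch: average CPI and run time for the instruction mix in the exercise.
CLOCK_HZ = 1_000_000_000  # 1 GHz clock

# class name -> (instruction count, average clocks per instruction)
MIX = {
    "arithmetic": (150_000, 2),
    "register transfer": (250_000, 1),
    "memory access": (100_000, 10),
}

def average_cpi(mix):
    """Return (average CPI, total clock cycles) for an instruction mix."""
    total_instructions = sum(count for count, _ in mix.values())
    total_cycles = sum(count * clocks for count, clocks in mix.values())
    return total_cycles / total_instructions, total_cycles

cpi, cycles = average_cpi(MIX)
seconds = cycles / CLOCK_HZ
print(f"total cycles = {cycles}")
print(f"average CPI  = {cpi:.2f}")
print(f"run time     = {seconds * 1e3:.2f} ms")
```

Note that the memory access instructions are only 20% of the instruction count but contribute most of the clock cycles, which is the CPU-memory bottleneck in miniature.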
Can it be “one clock”? – Yes it can ! Views of Other Books on “Micro Operations”
• The bus is called the “data path”
• It consists not only of the bus (a bunch of wires) but also of other digital devices
• Enable signals are asserted to speed up execution
• Additional (processor) cost
Datapath Example : Taken from Morris Mano's book
• Four parallel-load registers
• Two mux-based register selectors
• Register destination decoder
• Mux B for external constant input
• Buses A and B with external address and data outputs
• ALU and Shifter with Mux F for output select
• Mux D for external data input
• Logic for generating status bits V, C, N, Z
[Figure: datapath block diagram. Register file with R0 to R3, Load enable, Write, D data, D address, Destination select and decoder, A select/A address and B select/B address multiplexers producing A data and B data; MUX B (MB select, Constant in); Bus A (Address Out) and Bus B (Data Out); function unit with ALU (G select) and Shifter (H select), Zero Detect, status bits V, C, N, Z and MUX F (MF select); MUX D (MD select, Data In) driving Bus D.]
Datapath Example: Performing a Microoperation
Microoperation: R0 ← R1 + R2
• Apply 01 to A select to place the contents of R1 onto Bus A
• Apply 10 to B select to place the contents of R2 onto B data, and apply 0 to MB select to place B data on Bus B
• Apply 0010 to G select to perform the addition G = Bus A + Bus B
• Apply 0 to MF select and 0 to MD select to place the value of G onto Bus D
• Apply 00 to Destination select to enable the Load input to R0
• Apply 1 to Load Enable to force the Load input to R0 to 1, so that R0 is loaded on the clock pulse (not shown)
• The overall microoperation requires 1 clock cycle (!)
[Figure: the same datapath block diagram as in the previous slide.]
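The control settings listed for R0 ← R1 + R2 can be mimicked in a few lines. This is a simplified sketch and not Mano's design: the function name `datapath_step`, the 16-bit width standing in for n, and the decision to model only the addition code (G select = 0010) are all assumptions for illustration. One call corresponds to one clock cycle.

```python
# Sketch of one clock cycle of the datapath. Only G select = 0010 (add)
# is modeled; the width n_bits = 16 is an assumed stand-in for "n".

def datapath_step(regs, a_select, b_select, mb_select, g_select,
                  mf_select, md_select, dest_select, load_enable,
                  constant_in=0, data_in=0, n_bits=16):
    bus_a = regs[a_select]                                  # mux A -> Bus A
    bus_b = constant_in if mb_select else regs[b_select]    # MUX B -> Bus B
    if g_select != 0b0010:
        raise NotImplementedError("only G select = 0010 (add) is modeled")
    g = (bus_a + bus_b) & ((1 << n_bits) - 1)               # ALU output G
    if mf_select != 0:
        raise NotImplementedError("shifter path (MF select = 1) not modeled")
    bus_d = data_in if md_select else g                     # MUX D -> Bus D
    if load_enable:
        regs[dest_select] = bus_d                           # decoder picks register
    return regs

# R0 <- R1 + R2 with the control values from the slide.
regs = [0, 3, 4, 9]   # R0, R1, R2, R3
datapath_step(regs, a_select=0b01, b_select=0b10, mb_select=0,
              g_select=0b0010, mf_select=0, md_select=0,
              dest_select=0b00, load_enable=1)
print(regs)
```

Because the selectors, the ALU and the destination decoder are all combinational, the read, the add and the write-back happen within the same clock, unlike the four-step single-bus sequence.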
Lessons Learned
• We could improve instruction execution speed by increasing the processor clock speed (can we?)
• We could improve instruction execution speed by implementing a dual bus (can we?)
• We can (partly) overcome the CPU-memory bottleneck by inserting cache memory between the CPU and main memory (can we?)
• Is there any other way to improve instruction execution speed (increase performance)? - pipelining
• Do these improvements incur extra cost? (cost vs performance issue)
What do we get after studying Computer Architecture ?
• It is always a complicated question to answer.
• Basically we learn about processor design issues, namely the hardware of a computer, although it is taught through “software” logic.
• At the least, we know the basic building blocks of a computer
• We know the trends in design development
Question : How do we fetch the instruction? (from memory)
• The procedure that brings an instruction from memory to the CPU (IR) is called the instruction fetch
• The PC always holds the address of the (next) instruction in memory
• The PC transfers the address to the MAR, and memory is READ
• The PC is usually incremented by 1 (to point to the next instruction)
• The instruction is placed by memory in the MBR
• The content of the MBR is transferred to the IR (the instruction is fetched, ready to be executed)
Question : How do we fetch the instruction? (from memory) - continued
• In register transfer language, we can express the fetch cycle as
1. MAR ← [PC]
2. READ (memory) and wait for completion
3. IR ← [MBR]
In terms of the CPU clock, these steps may take up to 50 CPU clocks, depending on the memory clock speed.
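The fetch cycle can be sketched as a small function. The dictionary-based CPU state, the list used as memory and the function name `fetch` are assumptions for illustration; the memory wait is modeled as instantaneous, so the sketch shows only the order of the register transfers and not their timing.

```python
# Sketch of the instruction fetch cycle in register-transfer style.
# Memory is a plain list; the memory wait is not modeled.

def fetch(cpu, memory):
    """Bring the instruction addressed by PC into IR and advance PC."""
    cpu["MAR"] = cpu["PC"]            # 1. MAR <- [PC]
    cpu["MBR"] = memory[cpu["MAR"]]   # 2. READ (memory); result lands in MBR
    cpu["PC"] += 1                    #    PC now points to the next instruction
    cpu["IR"] = cpu["MBR"]            # 3. IR <- [MBR]
    return cpu["IR"]

memory = ["LOAD R1,X", "AR R1,R2", "STORE R1,Y"]
cpu = {"PC": 0, "MAR": 0, "MBR": None, "IR": None}

for _ in range(2):
    print("fetched:", fetch(cpu, memory), "| next PC =", cpu["PC"])
```

In a real processor the increment of the PC can overlap with the memory read, since it does not use the MAR or MBR.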
What is our topic ? Instruction Set Architecture (ISA)
[Figure: layers of a computing system, from top to bottom : Application Program, Compiler, OS, ISA, CPU Design, Circuit Design, Chip Layout.]
1.1. Introduction : Organization & Architecture
• Organization and Architecture : two terms that are often confused
• Computer organization refers to the operational units and their interconnections that realize the architectural specifications (!)
• Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical execution of a program (!)
• The latter definition (architecture) is more concerned with performance than the first one (organization)
1.1. Introduction - continued
• Architecture is more concerned with the basic instruction design, which may lead to better performance of the system
• Organization is the implementation of a computer system in terms of the interconnection of its functional units : CPU, memory, bus and I/O devices.
• Example : the IBM S/370 family architecture. There are plenty of IBM products with the same architecture (S/370) but different organizations, depending on their price/performance targets. Cost and performance differentiate the organizations.
• So, the organization of a computer is the implementation of its architecture, tailored to fit the intended price and performance targets.
Chapter 2 : Computer Evolution and Performance
ENIAC - background • Electronic Numerical Integrator And Computer • Eckert and Mauchly • University of Pennsylvania • Trajectory tables for weapons • Started 1943 • Finished 1946 • Too late for war effort • Used until 1955
ENIAC - details • Decimal (not binary) • 20 accumulators of 10 digits • Programmed manually by switches • 18,000 vacuum tubes • 30 tons • 1,500 square feet • 140 kW power consumption • 5,000 additions per second
IAS - details • 1000 x 40 bit words • Binary number • 2 x 20 bit instructions • Set of registers (storage in CPU) • Memory Buffer Register • Memory Address Register • Instruction Register • Instruction Buffer Register • Program Counter • Accumulator • Multiplier Quotient
2.1. Evolution and Performance - history
• In 1946, von Neumann and his colleagues proposed the IAS computer (named after the Institute for Advanced Study)
• The design included : • main memory • ALU • Control Unit • I/O
• The first stored-program design, able to perform : +, -, ×, ÷
• The “father” of all modern computers/processors
2.1. Evolution and Performance - history
IAS components are :
• MBR (memory buffer register), MAR (memory address register), IR (instruction register), IBR (instruction buffer register), PC (program counter), AC (accumulator) and MQ (multiplier quotient), memory (1000 locations)
• 20-bit instruction : 8-bit opcode, 12-bit address (addressing one of 1000 memory locations, 0 to 999)
• 40-bit data word : 1 sign bit and a 39-bit value
• Operations : data transfer between registers and ALU, unconditional branch, conditional branch, arithmetic, address modify
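The instruction format above (two 20-bit instructions per 40-bit word, each with an 8-bit opcode and a 12-bit address) can be illustrated with bit masks. This is a sketch under stated assumptions: the left instruction is taken to occupy the high-order 20 bits, the opcode values 0x01 and 0x05 are arbitrary example numbers, and the helper names are invented for illustration.

```python
# Sketch: packing and decoding IAS-style instruction words.
# Assumption: the left instruction sits in the high-order 20 bits.

def encode(opcode, address):
    """Build a 20-bit instruction from an 8-bit opcode and 12-bit address."""
    return ((opcode & 0xFF) << 12) | (address & 0xFFF)

def pack_word(left, right):
    """Place two 20-bit instructions into one 40-bit memory word."""
    return ((left & 0xFFFFF) << 20) | (right & 0xFFFFF)

def split_word(word):
    """Return the (left, right) 20-bit instructions of a 40-bit word."""
    return (word >> 20) & 0xFFFFF, word & 0xFFFFF

def decode(instruction):
    """Return (opcode, address) of a 20-bit instruction."""
    return (instruction >> 12) & 0xFF, instruction & 0xFFF

# Example opcodes 0x01 and 0x05 are placeholders, not a claim about IAS.
word = pack_word(encode(0x01, 500), encode(0x05, 501))
for name, instruction in zip(("left", "right"), split_word(word)):
    opcode, address = decode(instruction)
    print(f"{name}: opcode={opcode:#04x} address={address}")
```

A 12-bit address field can name 4096 locations, so the 1000-word memory uses only part of the addressable range.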
2.1. Evolution - History of Commercial computers
• First Generation : in 1950 Mauchly & Eckert developed UNIVAC I, used by the Census Bureau
• Then UNIVAC II appeared, which later grew into the UNIVAC 1100 series (1103, 1104, 1105, 1106, 1108) - vacuum tubes and later transistors
• Second Generation : transistors, IBM 7094 (NCR, RCA and others tried to develop their own versions, but these were not commercially successful)
• Third Generation : Integrated Circuits (IC) - SSI. The IBM S/360 was the successful example
• Later generations (possibly fourth and fifth) : LSI and VLSI technology