CMPT 250 Computer Architecture Instructor: Yuzhuang Hu yhu1@cs.sfu.ca
Another Design Example: PIG (Chapter 7-10) • PIG is a single-die game. Two players take turns rolling the die. When a 1 is rolled, the current turn total becomes 0. The first player to reach or exceed 100 wins. • [Figure: game panel labeled Turn Total, Player 1 and Player 2, with buttons ROLL, HOLD, NEW GAME and RESET.]
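The rules on this slide can be sketched in software before turning to the hardware design. This is a minimal sketch, not the course's implementation: the function name `play_pig`, the `holds` policy callback, and the choice to test for a win only when a player holds are assumptions made for illustration.

```python
def play_pig(rolls, holds, target=100):
    """Play PIG from a fixed sequence of die values.

    rolls:  iterable of die values (1..6), one consumed per roll.
    holds:  hypothetical policy, holds(turn_total, banked) -> bool.
    Returns the winner (0 or 1), or None if the rolls run out first.
    """
    totals = [0, 0]      # banked totals for player 0 and player 1
    player = 0           # current player
    turn_total = 0       # points accumulated in the current turn
    for die in rolls:
        if die == 1:
            # Rolling a 1 wipes the turn total and ends the turn.
            turn_total = 0
            player = 1 - player
            continue
        turn_total += die
        if holds(turn_total, totals[player]):
            # Holding banks the turn total; the win test follows the hold.
            totals[player] += turn_total
            if totals[player] >= target:
                return player
            turn_total = 0
            player = 1 - player
    return None
```

Passing the rolls in as a list keeps the sketch deterministic, so the same sequence always produces the same winner.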
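State-Machine Diagram for PIG • [Figure: state-machine diagram with states INIT, BEGIN, ROL, ONE, ROH, TEST and WIN. Complement overbars were lost in extraction; complements are written with ' below, and their placement is inferred from the two-player structure.] • Default outputs: P1=CP', P2=CP. • RESET: DIE<-000, FP<-0. • INIT: TR1<-0, TR2<-0, CP<-FP. • BEGIN: SUR<-0; ROLL leads to ROL. • ROL: while ROLL, if (DIE=110) DIE<-001 else DIE<-(DIE+1); when ROLL is released, DIE=1 leads to ONE, and DIE<>1 gives SUR<-SUR+DIE and leads to ROH. • ONE: CP<-CP'. • ROH: ROLL leads back to ROL; ROLL'·HOLD gives CP'·(TR1<-TR1+SUR), CP·(TR2<-TR2+SUR) and leads to TEST. • TEST: if CP'·(TR1<1100100)+CP·(TR2<1100100), then CP<-CP'; if CP'·(TR1>=1100100)+CP·(TR2>=1100100), go to WIN (1100100 is 100 in binary). • WIN: CP'/P1=BLINK, CP/P2=BLINK; NEWGAME gives FP<-FP'.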
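The DIE register transfer in the diagram, "if (DIE=110) DIE<-001 else DIE<-(DIE+1)", is a counter that cycles through the six die faces. A minimal sketch follows; the function name `next_die` is hypothetical, and reading the die as a free-running counter sampled when ROLL is released is an assumption about how the diagram is meant to be used.

```python
def next_die(die):
    """Advance the 3-bit DIE register by one step.

    The value cycles 001, 010, ..., 110 and then wraps back to 001.
    After RESET the register holds 000, so the first step gives 001.
    """
    return 0b001 if die == 0b110 else die + 1
```

Starting from the reset value 000, repeated calls visit 1 through 6 and then repeat, so 000 is never reached again after the first step.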
Algorithmic State Machine (ASM) • An ASM chart is like a state diagram but less formal and thus easier to understand. An ASM chart consists of a set of blocks. Each block can be viewed as a directed graph with three types of nodes: • State box (node). • Binary decision box (node). • Conditional action box (node).
ASM contd. • State box: represented by a labeled rectangle. It may contain several register transfer statements or variables. • Binary decision box: represented by a hexagon. It indicates that a condition needs to be tested. It is similar to the input condition defined for State-Machine Diagrams. • Conditional output action box: represented by an oval box. It contains several register transfer statements or variables. It is similar to the output condition defined for State-Machine Diagrams.
Boxes in ASM Charts • (a) State box: labeled with the state name and containing register transfer statements (Moore type). • (b) Binary decision box: contains a condition expression, with one exit for 0 (False) and one for 1 (True). • (c) Conditional output action box: contains conditional outputs or actions (Mealy type).
A Design Example using ASM • Problem: find the sum of N numbers.
algorithm sum_n(S)
input: a list S consisting of N numbers.
output: the sum of the N numbers in S.
    sum = 0;
    N = get_input();
    while ( N > 0 )
        sum = sum + get_input();
        N = N - 1;
    endwhile;
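The algorithm translates almost line for line into Python. In this sketch the input stream is assumed to be any iterable whose first value is N, followed by the N data values; `get_input` is modeled as pulling the next item from that stream.

```python
def sum_n(inputs):
    """Sum N numbers read from `inputs`.

    The first value read is N; the next N values are the data.
    """
    get_input = iter(inputs).__next__   # stand-in for the slide's get_input()
    total = 0
    n = get_input()
    while n > 0:
        total = total + get_input()
        n = n - 1
    return total
```

For example, `sum_n([3, 10, 20, 30])` reads N=3 and then adds the three data values.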
ASM Diagram for the Sum Machine • [Figure: ASM chart with states S0, S1 and S2; handshake signals rdy, data and ack; a decision on N=0; and the register transfers Sum <- 0, N <- in_bus, N <- N-1, Sum <- Sum+in_bus.]
Digital System Design • In most digital system designs, we partition the system into two types of modules: a datapath and a control unit. • [Figure: the control unit takes control inputs and status signals and produces control signals and control outputs; the datapath takes data inputs and control signals and produces data outputs and status signals.]
Design from ASM • [Figure: block diagram with the labels processor, data in, data out, status, control, pts, clock, CTRL, PTS, SELECTOR, external control inputs and SEQ.]
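Datapath of the Sum Machine • [Figure: datapath with registers SUM and N, an adder (FA), in_bus and out_bus, the signals ls, ln, cs and dn, and the status signals eq0 (N=0) and overflow.]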
ASM Design Guidelines • Write an algorithm for the problem. • Translate the algorithm to a sequence of register transfer statements. • Group adjacent independent register transfer statements. • Draw the ASM diagram, and introduce control signals.
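To illustrate the guidelines, the sum machine can be sketched as a datapath driven by control signals. Everything here is an assumption made for illustration: the signal meanings (cs clears SUM, ls loads SUM from the adder, ln loads N from in_bus, dn decrements N, eq0 reports N=0), the 8-bit width, and the omission of the rdy/data/ack handshake. Grouping independent transfers means that both registers update on the same clock edge.

```python
class SumDatapath:
    """Hypothetical datapath for the sum machine: registers SUM and N."""

    def __init__(self, width=8):
        self.mask = (1 << width) - 1
        self.SUM = 0
        self.N = 0

    def eq0(self):
        """Status signal: 1 when N is zero."""
        return self.N == 0

    def clock(self, in_bus, cs=0, ls=0, ln=0, dn=0):
        """Apply one clock edge; all asserted transfers happen together."""
        new_sum, new_n = self.SUM, self.N
        if cs:
            new_sum = 0                               # Sum <- 0
        if ls:
            new_sum = (self.SUM + in_bus) & self.mask  # Sum <- Sum + in_bus
        if ln:
            new_n = in_bus & self.mask                # N <- in_bus
        if dn:
            new_n = (self.N - 1) & self.mask          # N <- N - 1
        self.SUM, self.N = new_sum, new_n


def run_sum_machine(inputs, width=8):
    """Control unit: issue the grouped register transfers state by state."""
    dp = SumDatapath(width)
    stream = iter(inputs)
    dp.clock(next(stream), cs=1, ln=1)       # Sum <- 0, N <- in_bus
    while not dp.eq0():
        dp.clock(next(stream), ls=1, dn=1)   # Sum <- Sum + in_bus, N <- N - 1
    return dp.SUM
```

Because the registers are finite, the sum wraps around modulo 2^width, which is what the overflow status signal in the datapath figure would flag.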
Datapath Definition (Chapter 9) • The datapath is defined by three basic components: • A set of registers. • The micro-operations performed on data stored in the registers. • The control interface.
A Generic Datapath • Four parallel-load registers • Two mux-based register selectors • Register destination decoder • Mux B for external constant input • Buses A and B with external address and data outputs • ALU and Shifter with Mux F for output select • Mux D for external data input • Logic for generating status bits V, C, N, Z
Datapath Examples • What to do for R1 <- R2 + R3? • A select, choose R2. • B select, choose R3. • G select, choose A+B. • MF select, choose the ALU output. • MD select, choose the Mux F output. • Destination select, choose R1. • Load enable, to enable R1.
Other Micro-operation Alternatives • MF=1: shift operation. • MB=1: using a constant. • Load enable=0: no register loading, e.g. when providing an address out or data out. • MD=1: read from memory.
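The control fields above can be modeled as arguments to a single function that performs one micro-operation. This is a sketch under stated assumptions, not the textbook's control-word encoding: the function name `datapath_step`, the operation names in `g_op` and `h_op`, the 8-bit width, and feeding the shifter from Bus B are all choices made for illustration.

```python
def datapath_step(regs, a_sel, b_sel, dest, load, mb=0, mf=0, md=0,
                  g_op="add", h_op="shl", const_in=0, data_in=0, width=8):
    """Perform one micro-operation on a list of registers.

    Returns (new_regs, bus_a, bus_b); the buses double as the external
    address-out and data-out values.
    """
    mask = (1 << width) - 1
    bus_a = regs[a_sel]                          # A select
    bus_b = const_in if mb else regs[b_sel]      # Mux B: constant or register
    alu = {"add": bus_a + bus_b,                 # G select (assumed op names)
           "sub": bus_a - bus_b,
           "transfer": bus_a}[g_op] & mask
    shifted = {"shl": bus_b << 1,                # H select (assumed op names)
               "shr": bus_b >> 1}[h_op] & mask
    mux_f = shifted if mf else alu               # Mux F: shifter or ALU
    bus_d = data_in if md else mux_f             # Mux D: data in or Mux F
    new_regs = list(regs)
    if load:                                     # load enable
        new_regs[dest] = bus_d & mask
    return new_regs, bus_a, bus_b
```

With registers `[0, 0, 5, 7]`, the call `datapath_step(regs, a_sel=2, b_sel=3, dest=1, load=1)` corresponds to R1 <- R2 + R3, while `load=0` leaves every register unchanged and only drives the buses.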
The Arithmetic/Logic Unit • ALU performs arithmetic/logic micro-operations.
The Arithmetic Circuit • The arithmetic circuit consists of a parallel n-bit adder and a selection logic.
Function Table for Arithmetic Circuit • It is easy to see that Yi = Bi·S0 + Bi'·S1, where Bi' denotes the complement of Bi.
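The function table itself did not survive on this slide, so the sketch below assumes the usual textbook arrangement: the adder computes G = A + Y + Cin, and the selection logic sets each Y bit to Bi·S0 + Bi'·S1. The function name `arithmetic_circuit` and the default width of 4 bits are hypothetical.

```python
def arithmetic_circuit(a, b, s1, s0, cin, n=4):
    """Return G = A + Y + Cin (mod 2**n), with Yi = Bi*S0 + Bi'*S1."""
    mask = (1 << n) - 1
    y = 0
    for i in range(n):
        bi = (b >> i) & 1
        yi = (bi & s0) | ((1 - bi) & s1)   # selection logic for bit i
        y |= yi << i
    return (a + y + cin) & mask
```

Under this assumption, S1 S0 = 00 makes Y all zeros (transfer or increment), 01 makes Y = B (add), 10 makes Y the complement of B (subtract when Cin = 1), and 11 makes Y all ones (decrement when Cin = 0).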
More on ALU • The ALU has a fairly large number of logic levels and contributes significantly to the propagation delay in the circuit. In particular, a simple ripple-carry adder can incur a large propagation delay, because the carry must pass through every bit position in turn.
Carry Look-Ahead • Carry look-ahead is designed to reduce the carry propagation delay in the ALU. • For a single-bit full adder: • Generate a carry out when x=y=1: g=x·y. • Propagate the carry in to the carry out when exactly one of x and y is 1: p=x xor y. • In terms of p and g, the carry out is co=g+p·ci.
A Slight Optimization • Redefine p to be x+y. • We can do this because x xor y and x+y differ only when x=y=1. In that case g=1, so co=g+p·ci is equal to 1 whether p is 0 or 1.
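The claim that either definition of p gives the same carry out can be checked exhaustively over the eight input combinations. The function name `carry_out` and its `p_is_or` flag are hypothetical names used only for this check.

```python
from itertools import product


def carry_out(x, y, ci, p_is_or=False):
    """Carry out of a full adder, computed as co = g + p*ci."""
    g = x & y                               # generate
    p = (x | y) if p_is_or else (x ^ y)     # propagate: OR or XOR form
    return g | (p & ci)


# Both forms agree with the arithmetic carry on every input combination.
for x, y, ci in product((0, 1), repeat=3):
    assert carry_out(x, y, ci) == carry_out(x, y, ci, p_is_or=True)
    assert carry_out(x, y, ci) == (x + y + ci) >> 1
```

The OR form is cheaper to build than XOR, which is the point of the optimization; the sum bit still needs x xor y xor ci.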
Computing the Carry In for Each Bit • ci(1)=co(0)=g(0)+p(0)·ci(0). • ci(2)=co(1)=g(1)+p(1)·g(0)+p(1)·p(0)·ci(0). • ci(3)=co(2)=g(2)+p(2)·g(1)+p(2)·p(1)·g(0)+p(2)·p(1)·p(0)·ci(0). • ci(4)=co(3)=g(3)+p(3)·g(2)+p(3)·p(2)·g(1)+p(3)·p(2)·p(1)·g(0)+p(3)·p(2)·p(1)·p(0)·ci(0).
Faster Four-bit Addition • p(3:0) and g(3:0) are available after 1 gate delay. • co(3:0) are available after 2 more gate delays. • s(3:0) are available after 1 more gate delay. • In total 1+2+1=4 gate delays.
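The three stages counted above (p and g, then the carries, then the sum bits) map directly onto a small software model of the 4-bit adder. This is a sketch for checking the carry equations, with the hypothetical name `cla4`; it uses p = x+y as in the optimization slide and writes each carry as the flattened two-level expression.

```python
def cla4(x, y, c0=0):
    """4-bit carry look-ahead addition; returns (sum, carry_out)."""
    xb = [(x >> i) & 1 for i in range(4)]
    yb = [(y >> i) & 1 for i in range(4)]
    # Stage 1: generate and propagate signals, one gate delay.
    g = [a & b for a, b in zip(xb, yb)]
    p = [a | b for a, b in zip(xb, yb)]
    # Stage 2: every carry as a two-level AND-OR function of p, g and c0.
    c1 = g[0] | p[0] & c0
    c2 = g[1] | p[1] & g[0] | p[1] & p[0] & c0
    c3 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & c0
    c4 = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
          | p[3] & p[2] & p[1] & g[0] | p[3] & p[2] & p[1] & p[0] & c0)
    carries = [c0, c1, c2, c3]
    # Stage 3: sum bits s(i) = x(i) xor y(i) xor ci(i).
    s = 0
    for i in range(4):
        s |= (xb[i] ^ yb[i] ^ carries[i]) << i
    return s, c4
```

None of the carry expressions depends on another carry, which is why all of them are ready at the same time in hardware.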
Explanation of P and G • Consider the msb position of a bit vector (3:0). Under what condition will a carry be generated out of that position? Under what condition will a carry be propagated through that position? • Define • G=g(3)+p(3)·g(2)+p(3)·p(2)·g(1)+p(3)·p(2)·p(1)·g(0) • P=p(3)·p(2)·p(1)·p(0) • With these definitions, the carry out of the 4-bit block is G+P·ci(0).
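The group signals can be checked against ordinary addition: for every pair of 4-bit operands, G + P·ci(0) should equal the carry out of the block. The helper names `group_pg` and `block_carry_out` are hypothetical, and p is taken as x+y.

```python
def group_pg(p, g):
    """Group propagate P and group generate G for one 4-bit block."""
    G = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
         | p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return P, G


def block_carry_out(x, y, ci):
    """Carry out of a 4-bit block, computed as G + P*ci."""
    xb = [(x >> i) & 1 for i in range(4)]
    yb = [(y >> i) & 1 for i in range(4)]
    g = [a & b for a, b in zip(xb, yb)]
    p = [a | b for a, b in zip(xb, yb)]
    P, G = group_pg(p, g)
    return G | (P & ci)
```

P and G depend only on the operand bits of the block, not on any carry, so every block can compute them in parallel.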
A 16-bit CLA Adder • Use the 4-bit CLA adder as a building block and design second-level CLA logic to build a 16-bit CLA adder.
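The two-level structure can be sketched as follows. The names `lookahead4` and `cla16` are hypothetical, and the carries inside `lookahead4` are computed with a short loop for brevity; the loop is logically equivalent to the flattened two-level expressions that the hardware would use.

```python
def lookahead4(p, g, c0):
    """4-input look-ahead unit: returns (carries c[0..4], P, G)."""
    c = [c0]
    for i in range(4):
        c.append(g[i] | (p[i] & c[i]))   # same logic as the flattened form
    P = p[0] & p[1] & p[2] & p[3]
    G = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
         | p[3] & p[2] & p[1] & g[0])
    return c, P, G


def cla16(x, y, c0=0):
    """16-bit addition from four 4-bit blocks; returns (sum, carry_out)."""
    xb = [(x >> i) & 1 for i in range(16)]
    yb = [(y >> i) & 1 for i in range(16)]
    g = [a & b for a, b in zip(xb, yb)]
    p = [a | b for a, b in zip(xb, yb)]
    # First level: group P and G for each block (no carries needed).
    group_p, group_g = [], []
    for k in range(4):
        _, P, G = lookahead4(p[4 * k:4 * k + 4], g[4 * k:4 * k + 4], 0)
        group_p.append(P)
        group_g.append(G)
    # Second level: carries into bits 4, 8, 12 and out of bit 15.
    block_c, _, _ = lookahead4(group_p, group_g, c0)
    # First level again: carries inside each block, then the sum bits.
    s = 0
    for k in range(4):
        c, _, _ = lookahead4(p[4 * k:4 * k + 4], g[4 * k:4 * k + 4], block_c[k])
        for i in range(4):
            j = 4 * k + i
            s |= (xb[j] ^ yb[j] ^ c[i]) << j
    return s, block_c[4]
```

The second-level unit is the same look-ahead logic as the first level, applied to group signals instead of bit signals, which is what lets the scheme extend to wider adders.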
Delay of the 16-bit CLA Adder • p(15:0) and g(15:0) are available after one gate delay. • It takes 2 more gate delays to produce the P and G signals for each 4-bit block. • It takes 2 more gate delays for the second layer to produce ci(12), ci(8) and ci(4). • It takes 2 more gate delays for the first layer to produce the remaining carry-in values. • It takes one more gate delay for the sum. • In total 1+2+2+2+1=8 gate delays. • In general, for n a power of 4, the total delay is 1+2+4((log2 n)/2 - 1)+1 = 2·log2 n gate delays, which gives 4 for n=4 and 8 for n=16.
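The same counting can be written as a small function and checked against the two totals worked out on the slides, 4 gate delays for 4 bits and 8 for 16 bits. This is a sketch under the slides' assumptions (4-bit blocks, 2 gate delays per look-ahead stage, n a power of 4); the function name `cla_delay` is hypothetical.

```python
def cla_delay(n):
    """Gate delays of an n-bit CLA adder built from 4-bit blocks."""
    levels, m = 0, 1
    while m < n:
        m *= 4
        levels += 1
    if m != n or levels < 1:
        raise ValueError("n must be a power of 4, at least 4")
    pg = 1                        # bit-level p and g
    up = 2 * (levels - 1)         # group P and G, one stage per extra level
    down = 2 * levels             # carries, from the top level to the bits
    total = pg + up + down + 1    # plus one gate delay for the sum bits
    return total
```

The result simplifies to 4 gate delays per level, so the delay grows with the logarithm of the word length rather than linearly as in a ripple-carry adder.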