Topics • Basics of register-transfer design: • data paths and controllers. • High-level synthesis.
Register-transfer design • A register-transfer system is a sequential machine. • Register-transfer design is structural—complex combinations of state machines may not be easily described solely by a large state transition graph. • Register-transfer design concentrates on functionality, not details of logic design.
Register-transfer system example A register-transfer machine has combinational logic connecting registers: [Figure: registers (D/Q) connected by blocks of combinational logic]
Block diagrams Block diagrams specify structure: wire bundle of width 5
Data path-controller systems • One good way to structure a system is as a data path and a controller: • data path executes regular operations (arithmetic, etc.), holds registers with data-oriented state; • controller evaluates irregular functions, sets control signals for data path.
Data and control are equivalent • We can rewrite control into data and vice versa: • control: if i1 = '0' then o1 <= a; else o1 <= b; end if; • data: o1 <= ((i1 = '0') and a) or ((i1 = '1') and b); • Data/control distinction is useful but not fundamental.
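The equivalence can be checked exhaustively for single-bit signals. The sketch below uses Python as a stand-in for the HDL; the function names `select_control` and `select_data` are hypothetical, and all signals are assumed to be one bit wide.

```python
from itertools import product

def select_control(i1: int, a: int, b: int) -> int:
    """Control form: a conditional chooses which value drives o1."""
    if i1 == 0:
        return a
    return b

def select_data(i1: int, a: int, b: int) -> int:
    """Data form: the same choice written as AND/OR logic on 1-bit values."""
    return ((1 - i1) & a) | (i1 & b)

# Compare the two forms over every combination of 1-bit inputs.
equivalent = all(
    select_control(i1, a, b) == select_data(i1, a, b)
    for i1, a, b in product((0, 1), repeat=3)
)
```

Both forms describe the same two-input multiplexer; which one is written down is a matter of convenience.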
Data and control [Figure: adder (+) with ctrl, carry, and select signals]
Data operators • Arithmetic operations are easy to spot in hardware description languages: • x <= a + b; • Multiplexers are implied by conditionals. Must evaluate entire program to determine which sources of data for registers. • Multiplexers also come from sharing adders, etc.
Conditionals and multiplexers if x = '0' then reg1 <= a; else reg1 <= b; end if; [Figure: the code above and its register-transfer equivalent]
Alternate data path-controller systems [Figure: one controller with one data path; two communicating data path-controller systems]
Pipelines • Provide higher utilization of logic: Combinational logic
Pipeline metrics • Throughput: rate at which new values enter the system. • Initiation interval: time between successive inputs. • Latency: delay from input to output.
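As a rough numerical illustration of the three metrics, the sketch below assumes an idealized pipeline in which every stage takes exactly one clock cycle and nothing stalls. The function name `pipeline_metrics` and the example numbers are hypothetical, not taken from the slides.

```python
def pipeline_metrics(stages: int, clock_period_ns: float,
                     initiation_interval_cycles: int = 1):
    """Return (latency, initiation interval, throughput) for an ideal pipeline.

    Assumes one clock cycle per stage and no stalls.
    """
    latency_ns = stages * clock_period_ns
    initiation_interval_ns = initiation_interval_cycles * clock_period_ns
    throughput_per_ns = 1.0 / initiation_interval_ns
    return latency_ns, initiation_interval_ns, throughput_per_ns

# Hypothetical design point: 4 stages, 2 ns clock, a new input every cycle.
latency, interval, throughput = pipeline_metrics(stages=4, clock_period_ns=2.0)
```

Under these assumptions, adding stages raises latency but leaves throughput unchanged as long as the clock period and initiation interval stay the same.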
Simple pipelines • Pure pipelines have no control. • Choose latency, throughput. • Choose register locations with retiming. • Overhead: • Setup, hold times. • Power.
Complex pipelines • Actions in pipeline depend on data or external events. • Actions on pipe: • Stall values. • Abort operation. • Bypass values.
High-level synthesis • Sequential operation is not the most abstract description of behavior. • We can describe behavior without assigning operations to particular clock cycles. • High-level synthesis (behavioral synthesis) transforms an unscheduled behavior into a register-transfer behavior.
Tasks in high-level synthesis • Scheduling: determines clock cycle on which each operation will occur. • Allocation: chooses which function units will execute which operations.
Functional modeling code in Verilog assign o1 = i1 | i2; if (!i3) begin o1 = 1'b1; o2 = a + b; end else begin o1 = 1'b0; end [Figure note: the clock cycle boundary can be moved to design different register transfers]
Data dependencies • Data dependencies describe relationships between operations: • x <= a + b; value of x depends on a, b • High-level synthesis must preserve data dependencies.
Data flow graph • Data flow graph (DFG) models data dependencies. • Does not require that operations be performed in a particular order. • Models operations in a basic block of a functional model—no conditionals. • Requires single-assignment form.
Data flow graph construction
original code: x <= a + b; y <= a * c; z <= x + d; x <= y - d; x <= x + c;
single-assignment form: x1 <= a + b; y <= a * c; z <= x1 + d; x2 <= y - d; x3 <= x2 + c;
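The renaming from the original code to single-assignment form can be sketched mechanically. This is a minimal illustration, assuming statements of the form `target <= expression;` in a single basic block; the function name `to_single_assignment` is hypothetical. As on the slide, only variables that are written more than once receive numbered names.

```python
import re
from collections import Counter

def to_single_assignment(statements):
    """Rename repeatedly assigned variables so each name is written once."""
    parsed = []
    for stmt in statements:
        target, expr = stmt.rstrip(";").split("<=", 1)
        parsed.append((target.strip(), expr.strip()))

    writes = Counter(target for target, _ in parsed)
    version = Counter()   # number of writes seen so far for each target
    current = {}          # variable -> name that currently holds its value

    result = []
    for target, expr in parsed:
        # Reads refer to the most recent name of each variable.
        expr = re.sub(r"[A-Za-z_]\w*",
                      lambda m: current.get(m.group(0), m.group(0)), expr)
        if writes[target] > 1:
            version[target] += 1
            new_name = f"{target}{version[target]}"
        else:
            new_name = target
        current[target] = new_name
        result.append(f"{new_name} <= {expr};")
    return result

original = ["x <= a + b;", "y <= a * c;", "z <= x + d;",
            "x <= y - d;", "x <= x + c;"]
single = to_single_assignment(original)
```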
Data flow graph construction, cont’d Data flow forms directed acyclic graph (DAG):
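One way to represent the data flow graph in software is a mapping from each computed value to the values it depends on. The sketch below builds that mapping from the single-assignment code and uses Python's standard `graphlib` module (Python 3.9 or later) to confirm that an ordering respecting all dependencies exists, which is only possible when the graph is acyclic. The function name `build_dfg` is hypothetical.

```python
import re
from graphlib import TopologicalSorter

def build_dfg(statements):
    """Map each assigned value to the list of values it reads."""
    dfg = {}
    for stmt in statements:
        target, expr = stmt.rstrip(";").split("<=", 1)
        dfg[target.strip()] = re.findall(r"[A-Za-z_]\w*", expr)
    return dfg

code = ["x1 <= a + b;", "y <= a * c;", "z <= x1 + d;",
        "x2 <= y - d;", "x3 <= x2 + c;"]
dfg = build_dfg(code)

# static_order() raises graphlib.CycleError if the graph has a cycle.
order = list(TopologicalSorter(dfg).static_order())
```

The topological order is only one of many valid orders; the graph itself does not fix when each operation runs.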
Goals of scheduling and allocation • Preserve behavior—at end of execution, should have received all outputs, be in proper state (ignoring exact times of events). • Utilize hardware efficiently. • Obtain acceptable performance.
Data flow to data path-controller One feasible schedule for last DFG:
Binding values to registers registers fall on clock cycle boundaries
Allocation creates multiplexers • Same unit used for different values at different times. • Function units. • Registers. • Multiplexer controls which value has access to the unit.
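A small sketch of how sharing produces multiplexers: once each operation is bound to a function unit, any unit input that must accept more than one source needs a multiplexer. The binding below is hypothetical (one shared adder `add1`, one multiplier `mul1`, one subtractor `sub1`) and is applied to the single-assignment code from the data flow graph example; it is not necessarily the binding shown in the original figures.

```python
from collections import defaultdict

# operation result -> (function unit, left operand, right operand)
# Hypothetical binding: all three additions share one adder.
ops = {
    "x1": ("add1", "a", "b"),
    "y":  ("mul1", "a", "c"),
    "z":  ("add1", "x1", "d"),
    "x2": ("sub1", "y", "d"),
    "x3": ("add1", "x2", "c"),
}

def mux_sources(ops):
    """Return the unit inputs that need a multiplexer, with their sources."""
    sources = defaultdict(set)
    for unit, left, right in ops.values():
        sources[(unit, "left")].add(left)
        sources[(unit, "right")].add(right)
    # Only inputs fed from more than one source need a multiplexer.
    return {port: sorted(s) for port, s in sources.items() if len(s) > 1}

muxes = mux_sources(ops)
```

With this binding the shared adder needs a three-input multiplexer on each operand, while the multiplier and subtractor need none.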
Choosing function units muxes allow function units to be shared for several operations
Building the sequencer sequencer requires three states, even with no conditionals
Verilog for data path
module dp(reset,clock,a,b,c,d,muxctrl1,muxctrl2,muxctrl3,
          muxctrl4,loadr1,loadr2,loadr3,loadr4,x3,z);
  parameter n=7;
  input reset;
  input clock;
  input [n:0] a, b, c, d; // data primary inputs
  input muxctrl1, muxctrl2, muxctrl4; // mux control
  input [1:0] muxctrl3; // 2-bit mux control
  input loadr1, loadr2, loadr3, loadr4; // register control
  output [n:0] x3, z;
  reg [n:0] r1, r2, r3, r4; // registers
  wire [n:0] mux1out, mux2out, mux3out, mux3bout, mux4out, mult1out, mult2out;
  // multiplexers select the operands of the shared function units
  assign mux1out = (muxctrl1 == 0) ? a : r1;
  assign mux2out = (muxctrl2 == 0) ? b : r4;
  assign mux3out = (muxctrl3 == 0) ? a : (muxctrl3 == 1 ? r4 : r3);
  assign mux4out = (muxctrl4 == 0) ? c : r2;
  // function units
  assign mult1out = mux1out * mux2out;
  assign mult2out = mux3out * mux4out;
  assign x3 = mult2out;
  assign z = mult1out;
  always @(posedge clock) begin
    if (reset) begin
      r1 <= 0; r2 <= 0; r3 <= 0; r4 <= 0;
    end
    // a load asserted in the same cycle as reset takes precedence
    if (loadr1) r1 <= mult1out;
    if (loadr2) r2 <= mult2out;
    if (loadr3) r3 <= c;
    if (loadr4) r4 <= d;
  end
endmodule
Choices during high-level synthesis • Scheduling determines number of clock cycles required; binding determines area, cycle time. • Area tradeoffs must consider shared function units vs. multiplexers, control. • Delay tradeoffs must consider cycle time vs. number of cycles.
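The delay tradeoff comes down to a product: total execution time is the number of cycles times the cycle time. The sketch below compares two hypothetical design points; the names and numbers are illustrative assumptions, not measurements from the slides.

```python
def execution_time_ns(num_cycles: int, cycle_time_ns: float) -> float:
    """Total time for one pass through the schedule."""
    return num_cycles * cycle_time_ns

# Hypothetical design points: (number of cycles, cycle time in ns).
designs = {
    "three short cycles": (3, 4.0),
    "two long cycles": (2, 7.0),
}
times = {name: execution_time_ns(*point) for name, point in designs.items()}
fastest = min(times, key=times.get)
```

In this made-up case the schedule with more cycles finishes sooner, because the shorter schedule stretches the cycle time by more than it saves in cycle count.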
Finding schedules • Two simple schedules: • As-soon-as-possible (ASAP) schedule puts every operation as early in time as possible. • As-late-as-possible (ALAP) schedule puts every operation as late in schedule as possible. • Many schedules exist between ALAP and ASAP extremes.
ASAP and ALAP schedules [Figure: an ASAP schedule and an ALAP schedule]
Verilog model of ASAP schedule
reg [n-1:0] w1reg, w2reg, w6reg1, w6reg2, w6reg3, w6reg4, w3reg1, w3reg2, w4reg, w5reg;
always @(posedge clock) begin
  // nonblocking assignments: each stage reads values registered on the previous clock edge
  // cycle 1
  w1reg <= i1 + i2;
  w3reg1 <= i4 + i5;
  w6reg1 <= i7 + i8;
  // cycle 2
  w2reg <= w1reg + i3;
  w3reg2 <= w3reg1;
  w6reg2 <= w6reg1;
  // cycle 3
  w4reg <= w3reg2 + w2reg;
  w6reg3 <= w6reg2;
  // cycle 4
  w5reg <= i6 + w4reg;
  w6reg4 <= w6reg3;
  // cycle 5
  o1 <= w6reg4 + w5reg;
end
Verilog of ALAP schedule
reg [n-1:0] w1reg, w2reg, w6reg, w3reg, w4reg, w5reg;
always @(posedge clock) begin
  // nonblocking assignments: each stage reads values registered on the previous clock edge
  // cycle 1
  w1reg <= i1 + i2;
  // cycle 2
  w2reg <= w1reg + i3;
  w3reg <= i4 + i5;
  // cycle 3
  w4reg <= w3reg + w2reg;
  // cycle 4
  w5reg <= i6 + w4reg;
  w6reg <= i7 + i8;
  // cycle 5
  o1 <= w6reg + w5reg;
end
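The cycle assignments in the two Verilog models can be derived from the dependencies alone. The sketch below is a minimal illustration, assuming every operation takes exactly one cycle and that hardware resources are unlimited; the function names `asap` and `alap` and the dictionary layout are hypothetical. Operation names (`w1` through `w6`, `o1`) follow the Verilog models, and primary inputs are omitted because they impose no ordering.

```python
def asap(deps):
    """Earliest cycle for each operation: one after its latest predecessor."""
    cycle = {}
    def visit(op):
        if op not in cycle:
            cycle[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return cycle[op]
    for op in deps:
        visit(op)
    return cycle

def alap(deps, length):
    """Latest cycle for each operation: one before its earliest successor."""
    succs = {op: [] for op in deps}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)
    cycle = {}
    def visit(op):
        if op not in cycle:
            cycle[op] = min((visit(s) for s in succs[op]),
                            default=length + 1) - 1
        return cycle[op]
    for op in deps:
        visit(op)
    return cycle

# operation -> operations whose results it consumes
deps = {
    "w1": [], "w2": ["w1"], "w3": [], "w4": ["w3", "w2"],
    "w5": ["w4"], "w6": [], "o1": ["w6", "w5"],
}
asap_schedule = asap(deps)
alap_schedule = alap(deps, max(asap_schedule.values()))
# Mobility: how far each operation can slide between the two extremes.
mobility = {op: alap_schedule[op] - asap_schedule[op] for op in deps}
```

Operations with zero mobility have the same position in both schedules; only `w3` and `w6` can move.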
Critical path of schedule Longest path through data flow determines minimum schedule length:
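A sketch of finding the critical path by longest-path search over the data flow graph, assuming one cycle per operation and no operator chaining. The dependency table reuses the operation names from the ASAP and ALAP Verilog models; the function name `critical_path` is hypothetical.

```python
from functools import lru_cache

# operation -> operations whose results it consumes
deps = {
    "w1": [], "w2": ["w1"], "w3": [], "w4": ["w3", "w2"],
    "w5": ["w4"], "w6": [], "o1": ["w6", "w5"],
}

def critical_path(deps):
    """Return the longest chain of dependent operations as a tuple."""
    @lru_cache(maxsize=None)
    def longest(op):
        best = ()
        for pred in deps[op]:
            candidate = longest(pred)
            if len(candidate) > len(best):
                best = candidate
        return best + (op,)
    return max((longest(op) for op in deps), key=len)

path = critical_path(deps)
min_schedule_length = len(path)
```

The path length matches the five cycles used by both the ASAP and ALAP models, which is why neither schedule can be shorter.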
Operator chaining • May execute several operations in sequence in one cycle—operator chaining. • Delay through function units may not be additive, such as through several adders.
Control implementation • Clock cycles are also known as control steps. • Longer schedule means more states in controller. • Cost of controller may be hard to judge from casual inspection of state transition graph.
Controllers and scheduling functional model: x <= a + b; y <= c + d; [Figure: the same model implemented with one controller state and with two controller states]
Distributed control two distributed controllers one centralized controller
Synchronized communication between FSMs To pass values between two machines, must schedule output of one machine to coincide with input expected by the other:
Hardwired vs. microcoded control • Hardwired control has a state register and “random logic.” • A microcoded machine has a state register which points into a microcode memory. • Styles are equivalent; choice depends on implementation considerations.
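A microcoded controller can be pictured as a table lookup. In the sketch below the state register is an index into a small microcode memory, and each word holds the control signals to assert plus the next address. The signal names are borrowed from the data path module earlier in the deck, but the microprogram contents are hypothetical and chosen only for illustration.

```python
# Each microcode word: (control signals asserted this cycle, next address).
# Hypothetical three-word microprogram that loops forever.
MICROCODE = [
    ({"loadr1", "loadr2"}, 1),
    ({"loadr3"}, 2),
    ({"loadr4"}, 0),
]

def run(microcode, cycles, start=0):
    """Step the state register through the microcode memory."""
    state = start
    trace = []
    for _ in range(cycles):
        signals, next_state = microcode[state]
        trace.append((state, sorted(signals)))
        state = next_state
    return trace

trace = run(MICROCODE, cycles=4)
```

A hardwired controller would compute the same signals and next state with logic gates instead of a memory lookup, which is why the two styles are equivalent in behavior.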
Data path-controller delay Watch out for long delay paths created by combination of data path and controller: