RGB_YUV Hardware Design
林鼎原
Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
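Program code

  int main(void)
  {
      int a, b, c;
      /* ... */
      RGB_2_Y(I_Frame, O_Frame);
      /* ... */
  }

  void RGB_2_Y(I_Frame, O_Frame)
  {
      int i, y;
      for (i = 0; i < 64; i++) {      /* 64 pixels, indexed 0..63 */
          /* a, b, c: the R, G, B components of pixel i from I_Frame */
          y = 0.257*a + 0.504*b + 0.098*c + 16;
          O_Frame[i] = y;             /* write y to O_Frame */
      }
  }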
Pipeline Scheduling for a Pipeline Latency of 6
Resource constraint: 1 adder and 2 multipliers per cycle.
[Figure: scheduled data-flow graph over control steps s1 to s18, showing the loop body repeated for successive iterations. Inputs a, b, c; constants 0.257, 0.504, 0.098, 16 and 64; operations c1 to c8 (multiplications, additions and the >= comparison that produces status); intermediate values V1 to V5 and V7; output y.]
Lifetimes of Values
The left-edge algorithm is used to allocate values to registers.
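The left-edge idea can be sketched in C as follows. This is a minimal illustration, not the author's tool: the `Lifetime` struct, the function names and the half-open interval convention are assumptions, and the lifetimes used in the usage note are hypothetical values chosen only to resemble the loop body (the original lifetime chart is not recoverable from the slide).

```c
#include <assert.h>
#include <limits.h>

/* A value's lifetime as a half-open interval [start, end) of control steps. */
typedef struct {
    const char *name;
    int start;
    int end;
    int reg;            /* assigned register index, -1 while unassigned */
} Lifetime;

/* Stable insertion sort by start step (the "left edge"). */
static void sort_by_start(Lifetime *v, int n)
{
    for (int i = 1; i < n; i++) {
        Lifetime key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j].start > key.start) {
            v[j + 1] = v[j];
            j--;
        }
        v[j + 1] = key;
    }
}

/* Left-edge allocation: fill one register at a time with values whose
   lifetimes do not overlap. Returns the number of registers used. */
int left_edge(Lifetime *v, int n)
{
    int assigned = 0, regs = 0;

    sort_by_start(v, n);
    for (int i = 0; i < n; i++) {
        assert(v[i].start < v[i].end);   /* lifetimes must be non-empty */
        v[i].reg = -1;
    }
    while (assigned < n) {
        int free_at = INT_MIN;           /* step at which this register is free */
        for (int i = 0; i < n; i++) {
            if (v[i].reg < 0 && v[i].start >= free_at) {
                v[i].reg = regs;
                free_at = v[i].end;
                assigned++;
            }
        }
        regs++;
    }
    return regs;
}
```

With the hypothetical lifetimes V1 = [3,4), V2 = [3,4), V3 = [4,5), V4 = [5,6) and V5 = [5,6), the sketch needs two registers: one holds a chain of three values and the other holds two, which is the same shape as the grouping R1 = {V1, V5}, R2 = {V2, V3, V4} shown on the data path slide.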
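Lifetimes of Operations
[Figure: lifetime chart of the operations. The multiply (*) operations are bound to the multipliers and the add (+) operations are bound to the adder.]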
IP Data Path
R1 = {V1, V5}, R2 = {V2, V3, V4}
C1, C4 → multiplier 1; C2 → multiplier 2
[Figure: data path with registers R1, R2 and R3 (write enables R1.ena, R2.ena, R3.ena), multiplexers M1 to M4, two multipliers and one add/subtract unit controlled by ALU_op. Inputs a, b, c; constants 0.257, 0.504, 0.098, 16, 64 and 1. The controller (clk, rst, status) drives the select and enable signals and produces valid and busy; the result appears on out.]
IP Controller
[Figure: state diagram with states S0 to S6. rst enters S0; the transitions are labeled status = 0 and status = 1.]
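As a rough software model of this controller, the sketch below mirrors the next-state and control-signal case statements of the pre-allocation Verilog listing: S0 through S6 advance one state per clock, and S6 returns to S1. The type and function names are assumptions made for illustration, and the status-based exit is not modeled.

```c
#include <assert.h>

typedef enum { S0, S1, S2, S3, S4, S5, S6 } State;

typedef struct {
    int valid;   /* 1 when Y holds a valid result */
    int busy;    /* 1 while new input must be held off */
} Outputs;

/* Next-state function: a straight chain S0 -> S1 -> ... -> S6 -> S1. */
State next_state(State s)
{
    switch (s) {
    case S0: return S1;
    case S1: return S2;
    case S2: return S3;
    case S3: return S4;
    case S4: return S5;
    case S5: return S6;
    default: return S1;   /* S6 starts the next pixel */
    }
}

/* Moore outputs: busy from S2 onward, valid only in S6. */
Outputs outputs(State s)
{
    Outputs o = {0, 0};
    assert(s <= S6);
    if (s >= S2) o.busy = 1;
    if (s == S6) o.valid = 1;
    return o;
}
```

Stepping this model from S1 takes five transitions to reach S6, where valid is raised, so each pixel occupies six states of the loop.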
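Pre-allocation: Design Method
• The hardware is derived directly from the loop body: 7 registers in total (R1 to R6 and counter), 4 adders, and 3 multipliers.
• Multiplication: each fractional coefficient is first multiplied by 256 (2 to the 8th power), that is, shifted left by 8 bits. It is then multiplied by the 8-bit input data, giving a 16-bit result. Discarding the low 8 bits leaves the integer part.
• The control unit has 7 states (S0 to S6):
• S0: reset.
• S1: receive the input data R, G, B and check whether counter is greater than or equal to 0. If so, continue; otherwise exit.
• S2: read R and G and start computing RGB_R*0.257 and RGB_G*0.504; decrement counter by 1.
• S3: store RGB_R*0.257 in V1 and RGB_G*0.504 in V2.
• S4: read input data c (B) and start computing RGB_B*0.098; V3 = V1 + V2.
• S5: compute V4 = V3 + 16; store RGB_B*0.098 in V5.
• S6: Y = V4 + V5.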
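The scaled-coefficient multiplication described above can be sketched in C. The constants 66, 129 and 25 are the ones used in the Verilog listing (0.257, 0.504 and 0.098 multiplied by 256 and rounded); the function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Coefficients scaled by 256 (2^8) and rounded to integers. */
enum { COEF_R = 66, COEF_G = 129, COEF_B = 25 };   /* 0.257, 0.504, 0.098 */

/* 8-bit input times 8-bit coefficient gives a 16-bit product;
   dropping the low 8 bits keeps the integer part. */
static uint8_t mul_q8(uint8_t x, uint8_t coef)
{
    uint16_t product = (uint16_t)(x * coef);
    return (uint8_t)(product >> 8);
}

/* Y = 0.257*R + 0.504*G + 0.098*B + 16, with each product truncated
   separately, as the three multipliers do in hardware. */
uint8_t rgb_to_y_fixed(uint8_t r, uint8_t g, uint8_t b)
{
    unsigned y = mul_q8(r, COEF_R) + mul_q8(g, COEF_G) + mul_q8(b, COEF_B) + 16;
    assert(y <= 255);   /* the result fits the 8-bit Y output */
    return (uint8_t)y;
}
```

For a white pixel (255, 255, 255) this sketch returns 233, while the floating-point formula gives about 235. The gap comes from truncating each of the three products before they are added.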
Verilog Code for Pre-allocation Design (1/5)

  `timescale 1ns / 1ps
  module rgb_to_yuv(clk, reset, rgb_in, Y, busy, valid);
    // Input and output port declarations
    input         clk, reset;
    input  [23:0] rgb_in;
    output [7:0]  Y;
    output        busy;
    output        valid;

    reg        busy;
    reg        valid;
    reg [6:0]  counter;
    reg [7:0]  RGB_R, RGB_G, RGB_B;
    reg [2:0]  present_state, next_state;
    reg [7:0]  R3_tmp, R4_tmp, R6_tmp;
    wire [7:0] R1_tmp, R2_tmp, R5_tmp;
    reg [15:0] m1, m2, m3;              // for the 3 multipliers
    reg [7:0]  R1, R2, R3, R4, R5, R6;

    // state parameters
    parameter [2:0] s0=3'd0, s1=3'd1, s2=3'd2, s3=3'd3, s4=3'd4, s5=3'd5, s6=3'd6;

• Input and output ports: while busy is high, input on rgb_in pauses until busy goes low. The output value is valid only while valid is high.
• counter counts the number of iterations performed and is used to decide when to stop.
Verilog Code for Pre-allocation Design (2/5)

    // counter: counts the iterations performed and is used to decide when to stop
    always @(posedge clk) begin
      if (reset)                    counter <= 7'd0;
      else if (present_state == s6) counter <= counter + 7'd1;
      else                          counter <= counter;
    end

    // data and state registers
    always @(posedge clk or posedge reset) begin
      if (reset) begin                   // initialization
        present_state <= s0;
        RGB_R <= 8'd0; RGB_G <= 8'd0; RGB_B <= 8'd0;
        R1 <= 8'd0; R2 <= 8'd0; R3 <= 8'd0;
        R4 <= 8'd0; R5 <= 8'd0; R6 <= 8'd0;
      end
      else begin
        present_state <= next_state;
        if (present_state == s1) begin   // state 1: latch the input pixel
          RGB_R <= rgb_in[23:16];
          RGB_G <= rgb_in[15:8];
          RGB_B <= rgb_in[7:0];
        end
        R1 <= R1_tmp; R2 <= R2_tmp; R3 <= R3_tmp;
        R4 <= R4_tmp; R5 <= R5_tmp; R6 <= R6_tmp;
      end
    end
Verilog Code for Pre-allocation Design (3/5)

    // next-state logic
    always @(present_state) begin
      case (present_state)
        s0: next_state = s1;
        s1: next_state = s2;
        s2: next_state = s3;
        s3: next_state = s4;
        s4: next_state = s5;
        s5: next_state = s6;
        default: next_state = s1;        // s6 returns to s1
      endcase
    end

    // control signals
    always @(present_state or busy or counter) begin
      case (present_state)
        s0: begin valid = 1'b0; busy = 1'b0; end
        s1: begin valid = 1'b0; busy = 1'b0; end
        s2: begin valid = 1'b0; busy = 1'b1; end
        s3: begin valid = 1'b0; busy = 1'b1; end
        s4: begin valid = 1'b0; busy = 1'b1; end
        s5: begin valid = 1'b0; busy = 1'b1; end
        s6: begin valid = 1'b1; busy = 1'b1; end
        default:
          if (counter == 7'd0) begin valid = 1'b0; busy = 1'bx; end
          else                 begin valid = 1'b1; busy = 1'b0; end
      endcase
    end

    assign R1_tmp = m1[15:8];            // discard the low 8 bits
    assign R2_tmp = m2[15:8];
    assign R5_tmp = m3[15:8];
    assign Y = (present_state == s6) ? R6 : 8'd0;   // Y is output in state s6
Verilog Code for Pre-allocation Design (4/5)

    // RGB-to-Y execution
    always @(*) begin
      case (present_state)
        s0: begin
          m1 = 16'd0; m2 = 16'd0; m3 = 16'd0;
          R3_tmp = 8'd0; R4_tmp = 8'd0; R6_tmp = 8'd0;
        end
        s1: begin
          m1 = {R1, 8'd0};               // read data
          m2 = {R2, 8'd0};               // read data
          m3 = {R5, 8'd0};               // read data
          R3_tmp = R3; R4_tmp = R4; R6_tmp = R6;
        end
        s2: begin
          m1 = RGB_R * 8'd66;            // action: 0.257 * 256 is about 66
          m2 = RGB_G * 8'd129;           // action: 0.504 * 256 is about 129
          m3 = {R5, 8'd0};
          R3_tmp = R3; R4_tmp = R4; R6_tmp = R6;
        end
        s3: begin
          m1 = {R1, 8'd0};
          m2 = {R2, 8'd0};
          m3 = RGB_B * 8'd25;            // action: 0.098 * 256 is about 25
          R3_tmp = R1 + R2;              // action
          R4_tmp = R4; R6_tmp = R6;
        end
Verilog Code for Pre-allocation Design (5/5)

        s4: begin
          m1 = {R1, 8'd0}; m2 = {R2, 8'd0}; m3 = {R5, 8'd0};
          R3_tmp = R3;
          R4_tmp = R3 + 8'd16;           // action
          R6_tmp = R6;
        end
        s5: begin
          m1 = {R1, 8'd0}; m2 = {R2, 8'd0}; m3 = {R5, 8'd0};
          R3_tmp = R3; R4_tmp = R4;
          R6_tmp = R4 + R5;
        end
        s6: begin
          m1 = {R1, 8'd0}; m2 = {R2, 8'd0}; m3 = {R5, 8'd0};
          R3_tmp = R3; R4_tmp = R4; R6_tmp = R6;
        end
        default: begin
          m1 = 16'd0; m2 = 16'd0; m3 = 16'd0;
          R3_tmp = 8'd0; R4_tmp = 8'd0; R6_tmp = 8'd0;
        end
      endcase
    end
  endmodule
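Post-allocation: Design Method
• Lifetime analysis identifies the following sharing opportunities:
• Multipliers: only 2 are needed after sharing.
• Adders: only 1 is needed after sharing.
• Registers: R1 and R5 can be shared and are renamed R1; R2, R3 and R4 can be shared and are renamed R2; counter is renamed R3.
• The control circuit provides the select signals for the four multiplexers, the add/subtract control signal for the adder, and the register write-enable signals (reg_ena).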
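To make the sharing concrete, the following C sketch walks one pixel through a datapath with two multipliers, one adder and the two shared data registers. The grouping of operations into steps is an assumption chosen to respect those resource limits and the register sets R1 = {V1, V5}, R2 = {V2, V3, V4}; it is not a cycle-accurate model of the Verilog, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

uint8_t rgb_to_y_shared(uint8_t r, uint8_t g, uint8_t b)
{
    uint8_t R1, R2;      /* shared registers */
    unsigned y;

    /* Step 1: both multipliers are busy. */
    R1 = (uint8_t)((r * 66) >> 8);     /* V1 on multiplier 1 */
    R2 = (uint8_t)((g * 129) >> 8);    /* V2 on multiplier 2 */

    /* Step 2: the adder forms V3 while multiplier 1 is reused for B.
       R2 is updated first because it still needs the old R1 (V1). */
    R2 = (uint8_t)(R1 + R2);           /* V3 overwrites V2 */
    R1 = (uint8_t)((b * 25) >> 8);     /* V5 overwrites V1 */

    /* Step 3: the same adder adds the offset. */
    R2 = (uint8_t)(R2 + 16);           /* V4 overwrites V3 */

    /* Step 4: the same adder produces the result. */
    y = (unsigned)R1 + R2;             /* Y = V4 + V5 */
    assert(y <= 255);
    return (uint8_t)y;
}
```

No step uses more than two multiplications or more than one addition, and no value is overwritten before its last use, which is the condition the lifetime analysis checks.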
Verilog Code for Post-allocation Design (1/6)

  `timescale 1ns / 1ps
  module rgb_to_yuv(clk, reset, rgb_in, Y, busy, valid);
    // Input and output port declarations
    input         clk, reset;
    input  [23:0] rgb_in;
    output [7:0]  Y;
    output        busy;
    output        valid;

    reg        busy;
    reg        valid;
    reg [7:0]  RGB_R, RGB_G, RGB_B;
    reg [2:0]  present_state, next_state;
    reg [7:0]  R1, R2, R3;              // shared registers
    reg [15:0] mux1, mux2;
    reg [7:0]  mux3, mux4;
    reg [7:0]  mul_reg1, mul_reg2;
    reg [15:0] mul1, mul2;              // two multipliers
    reg [7:0]  add;                     // one adder
    wire       status;
    // select lines
    reg        R1_ena, sel_12, R2_ena, R3_ena, alu_op;
    reg [1:0]  sel_3, sel_4;

    // state parameters
    parameter [2:0] s0=3'd0, s1=3'd1, s2=3'd2, s3=3'd3, s4=3'd4, s5=3'd5, s6=3'd6;

• Input and output ports: while busy is high, input on rgb_in pauses until busy goes low. The output value is valid only while valid is high.
Verilog Code for Post-allocation Design (2/6)

    // data and state registers
    always @(posedge clk or posedge reset) begin
      if (reset) begin                   // initialization
        present_state <= s0;
        RGB_R <= 8'd0; RGB_G <= 8'd0; RGB_B <= 8'd0;
        mul_reg1 <= 8'd0;
        mul_reg2 <= 8'd0;
        R1 <= 8'd0; R2 <= 8'd0; R3 <= 8'd0;
      end
      else begin
        present_state <= next_state;
        if (present_state == s1 && status == 1'd0) begin   // state 1: latch the input pixel
          RGB_R <= rgb_in[23:16];
          RGB_G <= rgb_in[15:8];
          RGB_B <= rgb_in[7:0];
        end
        mul_reg1 <= mul1[15:8];
        mul_reg2 <= mul2[15:8];
        R1 <= mul_reg1;
        if (R2_ena == 1'b0)                        R2 <= mul_reg2;
        else if (R3_ena == 1'b1 && alu_op == 1'b1) R3 <= mux3 - mux4;
        else                                       R2 <= add;
      end
    end

    // R3 is an unsigned reg, so R3 >= 0 is always true as written
    assign status = (R3 >= 0) ? 1'b0 : 1'b1;
    assign Y = (present_state == s6) ? add : 8'd0;   // Y is output in state s6
Verilog Code for Post-allocation Design (3/6)

    // next-state logic
    always @(present_state) begin
      case (present_state)
        s0: next_state = s1;
        s1: next_state = s2;
        s2: next_state = s3;
        s3: next_state = s4;
        s4: next_state = s5;
        s5: next_state = s6;
        default: next_state = s1;        // s6 returns to s1
      endcase
    end

    // control signals (this module declares no counter, so only
    // present_state is in the sensitivity list)
    always @(present_state) begin
      case (present_state)
        s0: begin
          valid = 1'b0; busy = 1'b0;
          R2_ena = 1'b0; R3_ena = 1'b1;
          sel_12 = 1'b0;
          sel_3  = 2'b10;
          sel_4  = 2'b01;                // 64
          alu_op = 1'b0;                 // add
        end
        s1: begin
          valid = 1'b0; busy = 1'b0;
          R2_ena = 1'b1; R3_ena = 1'b1;
          sel_12 = 1'b0;
          sel_3  = 2'b10;                // R3
          sel_4  = 2'b10;                // 1
          alu_op = 1'b1;                 // sub
        end
Verilog Code for Post-allocation Design (4/6)

        s2: begin
          valid = 1'b0; busy = 1'b1;
          R2_ena = 1'b0; R3_ena = 1'b1;
          sel_12 = 1'b0;
          sel_3  = 2'b00;
          sel_4  = 2'b00;
          alu_op = 1'b0;                 // add
        end
        s3: begin
          valid = 1'b0; busy = 1'b1;
          R2_ena = 1'b0; R3_ena = 1'b0;
          sel_12 = 1'b0;
          sel_3  = 2'b00;
          sel_4  = 2'b00;
          alu_op = 1'b0;
        end
        s4: begin
          valid = 1'b0; busy = 1'b1;
          sel_12 = 1'b1;
          R2_ena = 1'b1; R3_ena = 1'b0;
          sel_3  = 2'b00;
          sel_4  = 2'b00;
          alu_op = 1'b0;
        end
        s5: begin
          valid = 1'b0; busy = 1'b1;
          R2_ena = 1'b1; R3_ena = 1'b0;
          sel_12 = 1'b1;
          sel_3  = 2'b01;
          sel_4  = 2'b00;
          alu_op = 1'b0;
        end
Verilog Code for Post-allocation Design (5/6)

        s6: begin
          valid = 1'b1; busy = 1'b1;
          sel_12 = 1'b0;
          sel_3  = 2'b00;
          sel_4  = 2'b00;
          R2_ena = 1'b1; R3_ena = 1'b0;
          alu_op = 1'b0;
        end
        default: begin
          valid = 1'b0; busy = 1'b0;
          sel_12 = 1'b0;
          sel_3  = 2'b00;
          sel_4  = 2'b00;
          R2_ena = 1'b0; R3_ena = 1'b0;
          alu_op = 1'b0;
        end
      endcase
    end

    // Mux1 and Mux2
    always @(sel_12 or RGB_R or RGB_B) begin
      case (sel_12)
        1'b0: begin
          mux1 = RGB_R;
          mux2 = 16'd66;                 // 0.257
        end
        default: begin
          mux1 = RGB_B;
          mux2 = 16'd25;                 // 0.098
        end
      endcase
    end
Verilog Code for Post-allocation Design (6/6)

    // Mux3
    always @(sel_3 or R1 or R3) begin
      case (sel_3)
        2'b00:   mux3 = R1;
        2'b01:   mux3 = 8'd16;
        2'b10:   mux3 = R3;
        default: mux3 = 8'd0;
      endcase
    end

    // Mux4
    always @(sel_4 or R2) begin
      case (sel_4)
        2'b00:   mux4 = R2;
        2'b01:   mux4 = 8'd64;
        2'b10:   mux4 = 8'd1;
        default: mux4 = 8'd0;
      endcase
    end

    // ALU: two multipliers and one adder/subtractor
    always @(mux1 or mux2 or mux3 or mux4 or RGB_R or RGB_G or alu_op or R1 or R2 or R3) begin
      mul1 = mux1 * mux2;
      mul2 = RGB_G * 16'd129;            // 0.504
      if (alu_op == 1'b1) add = mux3 - mux4;
      else                add = mux4 + mux3;
    end
  endmodule
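Waveform
• busy: while high, data input is paused.
• RGB input (hex).
• valid: while high, the output is valid.
• Control signal alu_op: while high, the adder performs subtraction.
• status: while high, no further data is accepted.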
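Pattern Verification Results
The computed results are compared with the expected results to check correctness. There are 64 data items in total (0 to 63).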
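A check of this kind can be sketched in C by comparing a fixed-point model against the floating-point formula over 64 patterns. The pattern values below are hypothetical (the actual test data is not given on the slide), and the function names are assumptions; the coefficients 66, 129 and 25 are those of the Verilog listing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fixed-point model: each product is truncated to its upper 8 bits. */
static uint8_t y_fixed(uint8_t r, uint8_t g, uint8_t b)
{
    unsigned y = ((r * 66) >> 8) + ((g * 129) >> 8) + ((b * 25) >> 8) + 16;
    assert(y <= 255);
    return (uint8_t)y;
}

/* Reference from the program code, truncated to an integer. */
static int y_reference(uint8_t r, uint8_t g, uint8_t b)
{
    return (int)(0.257 * r + 0.504 * g + 0.098 * b + 16);
}

/* Runs 64 hypothetical patterns (index 0..63) and returns the largest
   absolute difference between the model and the reference. */
int max_abs_error(void)
{
    int worst = 0;
    for (int i = 0; i < 64; i++) {
        uint8_t r = (uint8_t)(4 * i);
        uint8_t g = (uint8_t)(255 - 4 * i);
        uint8_t b = (uint8_t)(2 * i);
        int diff = abs((int)y_fixed(r, g, b) - y_reference(r, g, b));
        if (diff > worst) worst = diff;
    }
    return worst;
}
```

Because three products are truncated independently, the fixed-point result can fall a few counts below the reference, so a comparison of this kind would normally allow a small tolerance rather than demand exact equality.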
Data Analysis
Pre_allocation vs. Post_allocation: the results show that after register sharing, total logic elements drop from 125 to 91, a reduction of 34 (about 27%).
Pre_allocation Synthesis Analysis
• The Xilinx synthesis result uses 3 multipliers and 4 adder/subtractors.
[Figure labels: multipliers, adders, state machine]
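Post_allocation Synthesis Analysis
• The Xilinx synthesis result uses 2 multipliers and 1 adder/subtractor.
[Figure labels: multipliers, Mux1, Mux2, Mux3, Mux4, adder/subtractor, state machine]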
Post sim
• The post-simulation results also match expectations.