
RGB_YUV Hardware Design



Presentation Transcript


  1. RGB_YUV Hardware Design. 林鼎原, Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.

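  2. Program code

         void main(void) {
             int a, b, c;
             …….
             RGB_2_Y(I_Frame, O_Frame);
             …….
         }

         void RGB_2_Y(I_Frame, O_Frame) {
             int i, y;
             /* a, b, c: the R, G, B components of the current pixel */
             for (i = 0; i < 64; i++) {   /* 64 data items, 0..63 */
                 y = 0.257*a + 0.504*b + 0.098*c + 16;
                 write(y) to O_Frame;     /* pseudocode */
             }
         }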

  3. Pipelining: Scheduling for 6 Pipeline Latency. Resource constraint: 1 adder and 2 multipliers per cycle. [Figure: scheduled data-flow graph over control steps s1 to s18, showing the loop body overlapped across iterations. Inputs a, b, c; constants 0.257, 0.504, 0.098, 16 and 64; operations c1 to c8 (multiplications, additions, and a >= comparison that produces status); intermediate values V1 to V5 and V7; output y.]

  4. Lifetimes of Values: the left-edge algorithm is used to allocate values into registers.

  5. Lifetimes of Operations. [Figure: operation lifetime chart with one group of rows for the multipliers (*) and one for the adders (+).]

  6. IP Data Path. R1 = {V1, V5}, R2 = {V2, V3, V4}. C1, C4 → multiplier 1; C2 → multiplier 2. [Figure: data path with registers R1, R2, R3 and their enables (R1.ena, R2.ena, R3.ena), multiplexers M1 to M4 with select lines s1 to s4, multiplier 1 and multiplier 2, one adder/subtractor controlled by ALU_op, constants 0.257, 0.504, 0.098, 16, 64 and 1, inputs a, b, c, clk and rst, a Controller block driven by status, and outputs out, valid and busy.]

  7. IP Controller. [Figure: state diagram with states S0 to S6; rst enters the diagram, and transitions are labeled status = 0 and status = 1.]

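  8. Pre-allocation: Design Method
     • The loop body is implemented directly in hardware, using 7 registers in total (R1~R6 and counter), 4 adders, and 3 multipliers.
     • For multiplication, each fractional coefficient is first multiplied by 256 (2 to the 8th power), i.e. shifted left by 8 bits, and then multiplied by the 8-bit input data. The product is 16 bits wide; the low 8 bits are discarded, and what remains is the integer part.
     • The control unit has 7 states (s0~s6):
       • S0: reset.
       • S1: receive input data R, G, B and check whether counter is greater than or equal to 0; if so, continue, otherwise exit.
       • S2: read R and G and start computing RGB_R*0.257 and RGB_G*0.504; decrement counter by 1.
       • S3: store RGB_R*0.257 into V1 and RGB_G*0.504 into V2.
       • S4: read input data c and start computing RGB_B*0.098; V3=V1+V2.
       • S5: compute V4=V3+16; store RGB_B*0.098 into V5.
       • S6: Y=V4+V5.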

  9. Verilog Code for Pre-allocation Design (1/5)

         `timescale 1ns / 1ps
         module rgb_to_yuv(clk, reset, rgb_in, Y, busy, valid);
         // Input and output port declarations
         input         clk, reset;
         input  [23:0] rgb_in;
         output [7:0]  Y;
         output        busy;   // while busy is high, rgb_in input pauses until busy goes low
         output        valid;  // the output value is valid only while valid is high
         reg busy;
         reg valid;
         reg [6:0] counter;    // counts the iterations done and decides when to stop
         reg [7:0] RGB_R, RGB_G, RGB_B;
         reg [2:0] present_state, next_state;
         reg  [7:0] R3_tmp, R4_tmp, R6_tmp;
         wire [7:0] R1_tmp, R2_tmp, R5_tmp;
         reg [15:0] m1, m2, m3;  // for 3 multipliers
         reg [7:0] R1, R2, R3, R4, R5, R6;
         // state parameters
         parameter [2:0] s0=3'd0, s1=3'd1, s2=3'd2, s3=3'd3,
                         s4=3'd4, s5=3'd5, s6=3'd6;

  10. Verilog Code for Pre-allocation Design (2/5)

         // counter: counts the iterations done and decides when execution should end
         always @(posedge clk) begin
             if (reset)
                 counter <= 7'd0;
             else if (present_state == s6)
                 counter <= counter + 7'd1;
             else
                 counter <= counter;
         end

         // data and state registers
         always @(posedge clk or posedge reset) begin
             if (reset) begin  // initialization
                 present_state <= s0;
                 RGB_R <= 8'd0; RGB_G <= 8'd0; RGB_B <= 8'd0;
                 R1 <= 8'd0; R2 <= 8'd0; R3 <= 8'd0;
                 R4 <= 8'd0; R5 <= 8'd0; R6 <= 8'd0;
             end
             else begin
                 present_state <= next_state;
                 if (present_state == s1) begin  // state 1: read the input value
                     RGB_R <= rgb_in[23:16];
                     RGB_G <= rgb_in[15:8];
                     RGB_B <= rgb_in[7:0];
                 end
                 R1 <= R1_tmp; R2 <= R2_tmp; R3 <= R3_tmp;
                 R4 <= R4_tmp; R5 <= R5_tmp; R6 <= R6_tmp;
             end
         end

  11. Verilog Code for Pre-allocation Design (3/5)

         // next state logic
         always @(present_state) begin
             case (present_state)
                 s0: next_state = s1;
                 s1: next_state = s2;
                 s2: next_state = s3;
                 s3: next_state = s4;
                 s4: next_state = s5;
                 s5: next_state = s6;
                 default: next_state = s1;
             endcase
         end

         // control signals
         always @(present_state or busy or counter) begin
             case (present_state)
                 s0: begin valid = 1'b0; busy = 1'b0; end
                 s1: begin valid = 1'b0; busy = 1'b0; end
                 s2: begin valid = 1'b0; busy = 1'b1; end
                 s3: begin valid = 1'b0; busy = 1'b1; end
                 s4: begin valid = 1'b0; busy = 1'b1; end
                 s5: begin valid = 1'b0; busy = 1'b1; end
                 s6: begin valid = 1'b1; busy = 1'b1; end
                 default:
                     if (counter == 7'd0) begin valid = 1'b0; busy = 1'bx; end
                     else begin valid = 1'b1; busy = 1'b0; end
             endcase
         end

         // discard the low 8 bits of each product
         assign R1_tmp = m1[15:8];
         assign R2_tmp = m2[15:8];
         assign R5_tmp = m3[15:8];
         // Y is output in state S6
         assign Y = (present_state == s6) ? R6 : 8'd0;

  12. Verilog Code for Pre-allocation Design (4/5)

         // rgb to y execution
         always @(*) begin
             case (present_state)
                 s0: begin
                     m1 = 16'd0; m2 = 16'd0; m3 = 16'd0;
                     R3_tmp = 8'd0; R4_tmp = 8'd0; R6_tmp = 8'd0;
                 end
                 s1: begin
                     m1 = {R1, 8'd0};  // read data
                     m2 = {R2, 8'd0};  // read data
                     m3 = {R5, 8'd0};  // read data
                     R3_tmp = R3; R4_tmp = R4; R6_tmp = R6;
                 end
                 s2: begin
                     m1 = RGB_R * 8'd66;   // action: 0.257
                     m2 = RGB_G * 8'd129;  // action: 0.504
                     m3 = {R5, 8'd0};
                     R3_tmp = R3; R4_tmp = R4; R6_tmp = R6;
                 end
                 s3: begin
                     m1 = {R1, 8'd0};
                     m2 = {R2, 8'd0};
                     m3 = RGB_B * 8'd25;   // action: 0.098
                     R3_tmp = R1 + R2;     // action
                     R4_tmp = R4; R6_tmp = R6;
                 end

  13. Verilog Code for Pre-allocation Design (5/5)

                 s4: begin
                     m1 = {R1, 8'd0}; m2 = {R2, 8'd0}; m3 = {R5, 8'd0};
                     R3_tmp = R3;
                     R4_tmp = R3 + 8'd16;  // action
                     R6_tmp = R6;
                 end
                 s5: begin
                     m1 = {R1, 8'd0}; m2 = {R2, 8'd0}; m3 = {R5, 8'd0};
                     R3_tmp = R3; R4_tmp = R4;
                     R6_tmp = R4 + R5;     // action
                 end
                 s6: begin
                     m1 = {R1, 8'd0}; m2 = {R2, 8'd0}; m3 = {R5, 8'd0};
                     R3_tmp = R3; R4_tmp = R4; R6_tmp = R6;
                 end
                 default: begin
                     m1 = 16'd0; m2 = 16'd0; m3 = 16'd0;
                     R3_tmp = 8'd0; R4_tmp = 8'd0; R6_tmp = 8'd0;
                 end
             endcase
         end
         endmodule

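  14. Post-allocation: Design Method
     • Based on the lifetime analysis, the following resources can be shared:
       • Multipliers: only 2 are needed after sharing.
       • Adders: only 1 is needed after sharing.
       • Registers: R1 and R5 can be shared and are renamed R1; R2, R3 and R4 can be shared and are renamed R2; counter is renamed R3.
     • The control circuit provides the select signals for the four multiplexers, the add/subtract control signal for the adder, and the register write-enable signals (reg_ena).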

  15. Verilog Code for Post-allocation Design (1/6)

         `timescale 1ns / 1ps
         module rgb_to_yuv(clk, reset, rgb_in, Y, busy, valid);
         // Input and output port declarations
         input         clk, reset;
         input  [23:0] rgb_in;
         output [7:0]  Y;
         output        busy;   // while busy is high, rgb_in input pauses until busy goes low
         output        valid;  // the output value is valid only while valid is high
         reg busy;
         reg valid;
         reg [7:0] RGB_R, RGB_G, RGB_B;
         reg [2:0] present_state, next_state;
         reg [7:0] R1, R2, R3;        // shared registers
         reg [15:0] mux1, mux2;
         reg [7:0] mux3, mux4;
         reg [7:0] mul_reg1, mul_reg2;
         reg [15:0] mul1, mul2;       // two multipliers
         reg [7:0] add;               // one adder
         wire status;
         // select lines
         reg R1_ena, sel_12, R2_ena, R3_ena, alu_op;
         reg [1:0] sel_3, sel_4;
         // state parameters
         parameter [2:0] s0=3'd0, s1=3'd1, s2=3'd2, s3=3'd3,
                         s4=3'd4, s5=3'd5, s6=3'd6;

  16. Verilog Code for Post-allocation Design (2/6)

         // data and state registers
         always @(posedge clk or posedge reset) begin
             if (reset) begin  // initialization
                 present_state <= s0;
                 RGB_R <= 8'd0; RGB_G <= 8'd0; RGB_B <= 8'd0;
                 mul_reg1 <= 8'd0;
                 mul_reg2 <= 8'd0;
                 R1 <= 8'd0; R2 <= 8'd0; R3 <= 8'd0;
             end
             else begin
                 present_state <= next_state;
                 if (present_state == s1 && status == 1'd0) begin  // state 1: read the input value
                     RGB_R <= rgb_in[23:16];
                     RGB_G <= rgb_in[15:8];
                     RGB_B <= rgb_in[7:0];
                 end
                 mul_reg1 <= mul1[15:8];
                 mul_reg2 <= mul2[15:8];
                 R1 <= mul_reg1;
                 if (R2_ena == 1'b0)
                     R2 <= mul_reg2;
                 else if (R3_ena == 1'b1 && alu_op == 1'b1)
                     R3 <= mux3 - mux4;
                 else
                     R2 <= add;
             end
         end

         // Note: R3 is an unsigned reg, so R3>=0 holds for every value
         // and status stays 0 as written.
         assign status = (R3 >= 0) ? 1'b0 : 1'b1;
         // Y is output in state S6
         assign Y = (present_state == s6) ? add : 8'd0;

  17. Verilog Code for Post-allocation Design (3/6)

         // next state logic
         always @(present_state) begin
             case (present_state)
                 s0: next_state = s1;
                 s1: next_state = s2;
                 s2: next_state = s3;
                 s3: next_state = s4;
                 s4: next_state = s5;
                 s5: next_state = s6;
                 default: next_state = s1;
             endcase
         end

         // control signals (this module declares no counter, so the
         // sensitivity list is written as @(*))
         always @(*) begin
             case (present_state)
                 s0: begin
                     valid = 1'b0; busy = 1'b0;
                     R2_ena = 1'b0; R3_ena = 1'b1;
                     sel_12 = 1'b0;
                     sel_3 = 2'b10;
                     sel_4 = 2'b01;   // 64
                     alu_op = 1'b0;   // add
                 end
                 s1: begin
                     valid = 1'b0; busy = 1'b0;
                     R2_ena = 1'b1; R3_ena = 1'b1;
                     sel_12 = 1'b0;
                     sel_3 = 2'b10;   // R3
                     sel_4 = 2'b10;   // 1
                     alu_op = 1'b1;   // sub
                 end

  18. Verilog Code for Post-allocation Design (4/6)

                 s2: begin
                     valid = 1'b0; busy = 1'b1;
                     R2_ena = 1'b0; R3_ena = 1'b1;
                     sel_12 = 1'b0;
                     sel_3 = 2'b00; sel_4 = 2'b00;
                     alu_op = 1'b0;   // add
                 end
                 s3: begin
                     valid = 1'b0; busy = 1'b1;
                     R2_ena = 1'b0; R3_ena = 1'b0;
                     sel_12 = 1'b0;
                     sel_3 = 2'b00; sel_4 = 2'b00;
                     alu_op = 1'b0;
                 end
                 s4: begin
                     valid = 1'b0; busy = 1'b1;
                     sel_12 = 1'b1;
                     R2_ena = 1'b1; R3_ena = 1'b0;
                     sel_3 = 2'b00; sel_4 = 2'b00;
                     alu_op = 1'b0;
                 end
                 s5: begin
                     valid = 1'b0; busy = 1'b1;
                     R2_ena = 1'b1; R3_ena = 1'b0;
                     sel_12 = 1'b1;
                     sel_3 = 2'b01; sel_4 = 2'b00;
                     alu_op = 1'b0;
                 end

  19. Verilog Code for Post-allocation Design (5/6)

                 s6: begin
                     valid = 1'b1; busy = 1'b1;
                     sel_12 = 1'b0;
                     sel_3 = 2'b00; sel_4 = 2'b00;
                     R2_ena = 1'b1; R3_ena = 1'b0;
                     alu_op = 1'b0;
                 end
                 default: begin
                     valid = 1'b0; busy = 1'b0;
                     sel_12 = 1'b0;
                     sel_3 = 2'b00; sel_4 = 2'b00;
                     R2_ena = 1'b0; R3_ena = 1'b0;
                     alu_op = 1'b0;
                 end
             endcase
         end

         // Mux1 and Mux2
         always @(sel_12 or RGB_R or RGB_B) begin
             case (sel_12)
                 1'b0: begin
                     mux1 = RGB_R;
                     mux2 = 16'd66;   // 0.257
                 end
                 default: begin
                     mux1 = RGB_B;
                     mux2 = 16'd25;   // 0.098
                 end
             endcase
         end

  20. Verilog Code for Post-allocation Design (6/6)

         // Mux3
         always @(sel_3 or R1 or R3) begin
             case (sel_3)
                 2'b00: mux3 = R1;
                 2'b01: mux3 = 8'd16;
                 2'b10: mux3 = R3;
                 default: mux3 = 8'd0;
             endcase
         end

         // Mux4
         always @(sel_4 or R2) begin
             case (sel_4)
                 2'b00: mux4 = R2;
                 2'b01: mux4 = 8'd64;
                 2'b10: mux4 = 8'd1;
                 default: mux4 = 8'd0;
             endcase
         end

         // ALU
         always @(mux1 or mux2 or mux3 or mux4 or RGB_R or RGB_G
                  or alu_op or R1 or R2 or R3) begin
             mul1 = mux1 * mux2;
             mul2 = RGB_G * 16'd129;  // 0.504
             if (alu_op == 1'b1)
                 add = mux3 - mux4;
             else
                 add = mux4 + mux3;
         end
         endmodule

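  21. Waveform. [Figure annotations: when busy is high, data input is paused; RGB input (hex); when valid is high, the output is valid; control signals; when alu_op is high, the adder performs subtraction; when status is high, no more data is accepted.]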

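  22. Pattern verification results: the computed results are compared against the expected results to check correctness. There are 64 data items in total (0~63).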

  23. Quartus parameter settings

  24. Data analysis: Pre_allocation vs. Post_allocation. The results show that after register sharing, total logic elements drop from 125 to 91 (about 27% fewer).

  25. Pre_allocation synthesis analysis • The Xilinx synthesis result uses 3 multipliers and 4 adder/subtractors. [Figure labels: multipliers, adders, State Machine.]

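  26. Post_allocation synthesis analysis • The Xilinx synthesis result uses 2 multipliers and 1 adder/subtractor. [Figure labels: multipliers, Mux1, Mux2, Mux3, Mux4, adder/subtractor, State Machine.]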

  27. Post sim • The post-simulation results also match expectations.
