Customizable Soft Vector Processors. Peter Yiannacouras, PhD Candidate. Connections 2009. Soft processors in FPGA systems make FPGA technology more easily accessible: software plus a compiler (weeks) instead of HDL plus CAD (months). Goal: optimize the soft processor to application properties.
Customizable Soft Vector Processors Peter Yiannacouras, PhD Candidate Connections 2009
Soft Processors in FPGA Systems
• Soft Processor: Software + Compiler, weeks of design effort. Easier. Configurable. Used in 25% of designs [source: Altera, 2009]
• Custom HW: HDL + CAD, months of design effort. Faster, smaller, less power.
• Soft processors make FPGA technology more easily accessible
• To COMPETE with custom HW: optimize the soft processor to application properties
Data Level Parallelism
• Same operation applied to independent data
• Commonly found in embedded systems
• Exploit using a Vector Processor

// C code
for(i=0;i<16; i++)
    c[i]=a[i]+b[i];

// Processor instructions (shown for element 1)
load r0,a[1]
load r1,b[1]
add r2,r0,r1
store r2,c[1]

The 16 iterations, c[0]=a[0]+b[0] through c[15]=a[15]+b[15], are independent of one another, so the whole loop amounts to the single vector operation c=a+b.
Vector Processing Primer (1 Vector Lane)

// C code
for(i=0;i<16; i++)
    c[i]=a[i]+b[i];

// Vectorized code
set vl,16
vload vr0,a
vload vr1,b
vadd vr2,vr0,vr1
vstore vr2,c

Each vector instruction holds many units of independent operations: vadd performs vr2[i]=vr0[i]+vr1[i] for i = 0 to 15. With 1 vector lane, these 16 element operations are processed one at a time.
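To make the semantics of the vectorized sequence concrete, here is a minimal software model in C. It is only a sketch: the `vreg_t` type, the `MAX_VL` limit, the 32-bit `int` element type, and the helper names `set_vl`, `vload`, `vadd`, `vstore`, and `vector_add16` are assumptions chosen to mirror the slide's mnemonics, not the actual soft vector processor's interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VL 16  /* assumed maximum vector length (the slide uses 16 elements) */

/* Hypothetical model of one vector register holding MAX_VL elements. */
typedef struct { int elem[MAX_VL]; } vreg_t;

/* Vector length register: how many elements each vector instruction touches. */
static size_t vl = MAX_VL;

void set_vl(size_t n) {
    assert(n <= MAX_VL);
    vl = n;
}

/* vload: copy vl consecutive elements from memory into a vector register. */
void vload(vreg_t *vr, const int *mem) {
    for (size_t i = 0; i < vl; i++)
        vr->elem[i] = mem[i];
}

/* vadd: vl independent element-wise additions. */
void vadd(vreg_t *vd, const vreg_t *va, const vreg_t *vb) {
    for (size_t i = 0; i < vl; i++)
        vd->elem[i] = va->elem[i] + vb->elem[i];
}

/* vstore: copy vl elements from a vector register back to memory. */
void vstore(const vreg_t *vr, int *mem) {
    for (size_t i = 0; i < vl; i++)
        mem[i] = vr->elem[i];
}

/* c = a + b for 16 elements, following the slide's instruction sequence. */
void vector_add16(const int *a, const int *b, int *c) {
    vreg_t vr0, vr1, vr2;
    set_vl(16);
    vload(&vr0, a);
    vload(&vr1, b);
    vadd(&vr2, &vr0, &vr1);
    vstore(&vr2, c);
}
```

Calling `vector_add16(a, b, c)` on three 16-element `int` arrays gives the same result as the scalar loop. The loops inside each helper correspond to the work a single vector lane would step through element by element.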
Vector Processing Primer (16 Vector Lanes)

// C code
for(i=0;i<16; i++)
    c[i]=a[i]+b[i];

// Vectorized code
set vl,16
vload vr0,a
vload vr1,b
vadd vr2,vr0,vr1
vstore vr2,c

Each vector instruction holds many units of independent operations: vadd performs vr2[i]=vr0[i]+vr1[i] for i = 0 to 15. With 16 vector lanes, all 16 element operations execute in parallel: 16x speedup.
Implemented on an FPGA (Soft Vector Processor). Is it scalable?
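The 16x figure follows from a simple idealized model, sketched below in C. The assumption, which is mine and not a measurement from the talk, is that every lane completes one element operation per cycle and that memory stalls and pipeline startup are ignored. The function names `vector_cycles` and `lane_speedup` are hypothetical.

```c
#include <assert.h>

/* Idealized cycles for one vector instruction: vl element operations spread
   over `lanes` lanes, one element per lane per cycle, i.e. ceil(vl / lanes). */
unsigned vector_cycles(unsigned vl, unsigned lanes) {
    assert(lanes > 0);
    return (vl + lanes - 1) / lanes;
}

/* Idealized speedup of a multi-lane configuration over a single lane. */
double lane_speedup(unsigned vl, unsigned lanes) {
    assert(vl > 0);
    return (double)vector_cycles(vl, 1) / (double)vector_cycles(vl, lanes);
}
```

Under this model, `vector_cycles(16, 1)` is 16 and `vector_cycles(16, 16)` is 1, which gives the 16x speedup on the slide. A real implementation would likely fall short of this bound, which is one reason the scalability question matters.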
Soft Vector Processor Scalability: 7 configurations, 14x speed, 9x area => coarse-grained!
More Architectural Parameters: Processor Architecture, Instruction Set Architecture, Memory System
Fine-Grained Trade-Off Space. Memory System: Weak, Moderate, Good