Inverse Matrix Accelerator
Final Presentation
Project no. D0623, Spring 2004
Written by: Haim Natan, Benny Pano
Supervisor: Gregory Mironov
Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab
Project Background
Complex computations are usually done on a general-purpose processor or a DSP, and neither is optimized for matrix inversion. To cut the time spent on matrix inversion, we move the task to dedicated hardware: the faster hardware performs the inversion while the CPU stays free for other tasks.
Project Goal
Design and implement an FPGA circuit that inverts a 625x625 matrix.
Project Requirements
• A standalone system
• Matrix size: 625x625
• Matrix elements: 64-bit double-precision floating point
• Calculation time: < 20 ms
Suggested Solutions
• Two algorithms were considered (n is the matrix dimension):
  • A linear algorithm, O(n^3)
  • A Monte-Carlo algorithm, O(n^2)
• Selected hardware: Xilinx Virtex-II Pro
• Selected algorithm: Monte-Carlo
The Monte-Carlo Algorithm (simplified version)
N – number of Markov chains
T – length of each chain
b_{i,j} – the computed element of the inverse matrix
MP() – the chain generator: given the current state, draws the next state of the Markov chain

b_{i,j} := 0;
For c := 1 to N do {
    k_0 := i;
    w_0 := 1;
    For t := 1 to T do {
        k_t := MP( k_{t-1} );
        w_t := sign(d_{k_{t-1},k_t}) * w_{t-1} * E_{k_t};
        if k_t = j then b_{i,j} += w_t;
    }
}
b_{i,j} /= N;
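The MC Algorithm (continued)
• $D = I - A$, so that $A^{-1} = (I - D)^{-1} = \sum_{m \ge 0} D^m$ whenever the series converges (spectral radius of $D$ below 1)
• $E_i = \sum_j |d_{i,j}|$ – the weights vector
• $P$ is the transition probability matrix with $p_{i,j} = |d_{i,j}| / E_i$, used to generate the Markov chains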
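A minimal software sketch of the estimator follows, in the spirit of the C simulation the project used to check the method (that simulation is not shown, so everything here is an assumption). `mc_inverse_element`, `mp_step`, the xorshift generator and the toy size `DIM = 3` are hypothetical. Two details deliberately differ from the simplified slide: the weight is multiplied by E of the state being left, $E_{k_{t-1}}$, which is what makes the expected value equal $\sum_m (D^m)_{i,j}$ when $p_{i,j} = |d_{i,j}|/E_i$ (the slide writes $E_{k_t}$); and the $m = 0$ identity term is counted for diagonal elements.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define DIM 3 /* toy size for illustration; the project targets 625 */

/* Hypothetical PRNG (xorshift64) so runs are reproducible; the slides do not name one. */
static uint64_t rng_state = 88172645463325252ULL;

static double uniform01(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return (double)(rng_state >> 11) / 9007199254740992.0; /* 53 bits -> [0, 1) */
}

/* MP(): draw the next state l from row k of P, where p[k][l] = |d[k][l]| / e[k]. */
static int mp_step(double d[DIM][DIM], const double e[DIM], int k) {
    double r = uniform01() * e[k];
    double acc = 0.0;
    int last = k;
    for (int l = 0; l < DIM; l++) {
        if (d[k][l] == 0.0) continue; /* zero-probability states are never chosen */
        acc += fabs(d[k][l]);
        last = l;
        if (r < acc) return l;
    }
    return last; /* reached only through rounding at the top of the range */
}

/* Estimate (A^{-1})[i][j] from n_chains chains of length t_len,
 * using A^{-1} = sum_m D^m with D = I - A. */
static double mc_inverse_element(double a[DIM][DIM], int i, int j,
                                 int n_chains, int t_len) {
    assert(i >= 0 && i < DIM && j >= 0 && j < DIM && n_chains > 0 && t_len >= 0);
    double d[DIM][DIM], e[DIM];
    for (int r = 0; r < DIM; r++) {
        e[r] = 0.0;
        for (int c = 0; c < DIM; c++) {
            d[r][c] = (r == c ? 1.0 : 0.0) - a[r][c]; /* D = I - A */
            e[r] += fabs(d[r][c]);                    /* E_r = sum_c |d[r][c]| */
        }
    }
    double sum = 0.0;
    for (int chain = 0; chain < n_chains; chain++) {
        int k = i;            /* k_0 = i */
        double w = 1.0;       /* w_0 = 1 */
        if (k == j) sum += w; /* m = 0 term of the series (identity) */
        for (int t = 1; t <= t_len; t++) {
            if (e[k] == 0.0) break; /* zero row of D: every later term vanishes */
            int next = mp_step(d, e, k);
            /* d/p = sign(d) * E of the state being left, i.e. E_{k_{t-1}} */
            w *= (d[k][next] < 0.0 ? -1.0 : 1.0) * e[k];
            k = next;
            if (k == j) sum += w;
        }
    }
    return sum / n_chains;
}
```

For example, with A = {{0.9, -0.2, 0}, {0, 0.9, -0.3}, {-0.2, 0, 0.9}} (every E_i is at most 0.4), 100,000 chains of length 30 should land within 0.02 of the exact entries, such as (A^{-1})[0][0] = 0.81/0.717 ≈ 1.1297 (0-based indices in the code). Each chain's contribution stays bounded when every E_i is below 1; with E_1 = 8, as in the small demonstration, the weights grow geometrically, which is one reason the method needs careful tuning.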
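A Small Demonstration
[Figure: the example matrices A, D, E and P]
One chain computing b_{1,2} (i = 1, j = 2):
t | rand# | k_t | w_t | b_{1,2}
0 | none | 1 | 1 | 0
1 | 0.2 | 1 | -8 | 0
2 | 0.9 | 2 | -48 | -48
3 | 0.49 | 1 | -384 | -48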
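Replaying the table with the update rule from the algorithm slide makes the arithmetic explicit. Because the example matrices are not reproduced, the signs of $d_{1,1}$, $d_{1,2}$ and $d_{2,1}$ are inferred from the table, and $E_1 = 8$, $E_2 = 6$ are taken from the architecture demonstration; treat both as assumptions.

```latex
\begin{aligned}
w_1 &= \operatorname{sign}(d_{1,1})\, w_0\, E_{k_1} = (-1)(1)(8) = -8, && k_1 = 1 \ne j \\
w_2 &= \operatorname{sign}(d_{1,2})\, w_1\, E_{k_2} = (+1)(-8)(6) = -48, && k_2 = 2 = j \Rightarrow b_{1,2} = -48 \\
w_3 &= \operatorname{sign}(d_{2,1})\, w_2\, E_{k_3} = (+1)(-48)(8) = -384, && k_3 = 1 \ne j \Rightarrow b_{1,2} \text{ unchanged}
\end{aligned}
```

With a single chain ($N = 1$), the final division by $N$ leaves $b_{1,2} = -48$.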
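Algorithm’s Architecture
[Figure: T stages of MP blocks starting from k = i; a grid of SW blocks with one row per weight E_1 … E_n; a row of A blocks accumulating from 0 into b_{i,j}]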
Switch & Accumulator
[Figure: SW and A blocks with ports Kin/Kout, Rin/Rout, Ein/Eout, Tin/Tout, Cin/Cout, Win/Wout, Wint, Vin/Vout and a multiplier]
Switch (SW):
Eout = Ein
Rout = Rin
Kout = Kin
If Rin = Kin Then Tout = Ein Else Tout = Tin
Accumulator (A):
Cout = Cin
Wout = Win * Tin
Wint = Wout
If Cin = Kin Then Vout = Vin + Wint Else Vout = Vin
Architecture Demonstration (computing b_{1,2} from k = 1, with E_1 = 8, E_2 = 6 and b_{1,2} starting at 0)
Stage | 1 | 2 | 3
MP: Kout | 1 | 2 | 1
SW: Tout | 8 (row E_1) | 6 (row E_2) | 8 (row E_1)
A: Wout | -8 | -48 | -384
A: Vout | 0 | -48 | -48
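To check the switch and accumulator equations against the demonstration, they can be modelled in software. The sketch below is a hypothetical C behavioural model, not the project's VHDL: `sw_block`, `acc_block`, `model_stage` and the struct names are invented for illustration. It assumes one stage is a column of SW blocks (rows 1 … n) followed by one A block, and that T enters each column as 0. The listed equations do not show where sign(d_{k_{t-1},k_t}) is applied, so the model tracks magnitudes only.

```c
#include <assert.h>

/* Hypothetical behavioural model of the slide's Switch (SW) block. */
typedef struct {
    int r;    /* Rin/Rout: row index the block is bound to */
    double e; /* Ein/Eout: E value for that row */
    int k;    /* Kin/Kout: current chain state */
    double t; /* Tin/Tout: weight selected so far in this column */
} sw_signals;

static sw_signals sw_block(sw_signals in) {
    sw_signals out = in;                  /* Eout = Ein, Rout = Rin, Kout = Kin */
    out.t = (in.r == in.k) ? in.e : in.t; /* If Rin = Kin Then Tout = Ein Else Tout = Tin */
    return out;
}

/* Hypothetical behavioural model of the slide's Accumulator (A) block. */
typedef struct {
    int c;    /* Cin/Cout: target column j */
    int k;    /* Kin: current chain state */
    double w; /* Win/Wout: running chain weight */
    double v; /* Vin/Vout: accumulated value of b_{i,j} */
} acc_signals;

static acc_signals acc_block(acc_signals in, double t_in) {
    acc_signals out = in;                   /* Cout = Cin */
    out.w = in.w * t_in;                    /* Wout = Win * Tin; Wint = Wout */
    if (in.c == in.k) out.v = in.v + out.w; /* If Cin = Kin Then Vout = Vin + Wint */
    return out;                             /* otherwise Vout = Vin */
}

/* One stage: SW blocks for rows 1..n look up E_k, then one A block updates W and V. */
static acc_signals model_stage(const double *e, int n, int k, acc_signals acc) {
    assert(n > 0 && k >= 1 && k <= n);
    sw_signals s = { .r = 0, .e = 0.0, .k = k, .t = 0.0 }; /* assumption: T enters as 0 */
    for (int r = 1; r <= n; r++) {
        s.r = r;
        s.e = e[r - 1];
        s = sw_block(s);
    }
    acc.k = k;
    return acc_block(acc, s.t);
}
```

Feeding the stage sequence k = 1, 2, 1 with E_1 = 8, E_2 = 6, target column j = 2 and w_0 = 1 yields W = 8, 48, 384 and V = 0, 48, 48, the magnitudes shown in the demonstration. Modelling each block as a pure function from input signals to output signals mirrors the combinational equations on the slide.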
Basic Block Diagram
[Figure: inside the FPGA, the Algorithm block sends element requests to the Memory Controller, which reads/writes matrices A and B in RAM and transfers the elements]
Some scales
• One matrix: 64 bit × 625 × 625 = 8 bytes × 390,625 elements ≈ 3 MB
• Two matrices are needed: ≈ 6 MB
• 20 ms / 625^2 = 51.2 ns per matrix element, i.e. about 20 MHz
• With an O(n^3) algorithm: 625^3 operations / 20 ms ≈ 12.2 GHz
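The same budgets can be recomputed for any matrix size and time limit with a small helper. `compute_budget` and the `budget` struct are hypothetical names used only for this sketch, which assumes 8 bytes per element and one operation per element for the n^2 rate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: storage and throughput budgets for inverting an
 * n x n matrix of 64-bit doubles within budget_s seconds. */
typedef struct {
    uint64_t bytes_per_matrix; /* 8 bytes per element */
    double ns_per_element;     /* time budget spread over the n^2 output elements */
    double rate_n2_hz;         /* rate needed for one step per element (n^2 work) */
    double rate_n3_hz;         /* rate needed if the work grows as n^3 */
} budget;

static budget compute_budget(uint64_t n, double budget_s) {
    assert(n > 0 && budget_s > 0.0);
    const double n2 = (double)n * (double)n;
    budget b;
    b.bytes_per_matrix = 8u * n * n;
    b.ns_per_element = budget_s * 1e9 / n2;
    b.rate_n2_hz = n2 / budget_s;
    b.rate_n3_hz = n2 * (double)n / budget_s;
    return b;
}
```

For n = 625 and 20 ms this gives 3,125,000 bytes per matrix, 51.2 ns per element, about 19.5 MHz for the n^2 rate (rounded to 20 MHz on the slide) and about 1.22e10 operations per second for the n^3 case.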
Encountered obstacles
• Studying the Monte-Carlo algorithm and its mathematical foundations.
• The architecture requires a large number of FPGA cells.
• Finding a floating-point library and adapting it to our needs.
• Learning the software tools used in FPGA development.
Encountered obstacles (cont.)
• The floating-point units have long delays (130 ns for the division unit alone, more than twice the 51.2 ns per-element budget).
• The Monte-Carlo algorithm needs careful tuning and many iterations to reach reasonable accuracy.
• A very wide bus is needed to transfer the matrix elements.
Project achievements
• Studied the Monte-Carlo algorithm and its architecture.
• Wrote a C simulation to check the Monte-Carlo method.
• Learned VHDL.
• Found a floating-point library and adapted it to the project's needs.
• Ran a simulation of the floating-point unit.
Project achievements (cont.)
• Implemented the switch and accumulator blocks in VHDL.
• Implemented a basic chain using the switch and accumulator blocks.
• Implemented a circuit that uses the floating-point library and loaded it onto the V2P.
Things to do
• Implement the MP block, the memory controller and the computation control circuit.
• Reduce the floating-point unit delays.
• Design a communication interface for loading and sending the matrices.