Explore the implementation of new algorithms for the Calorimeter Trigger utilizing modern FPGAs, Xilinx Virtex-5 devices, and a unified design platform. Collaborate with physicists and engineers to evaluate tradeoffs and methodologies.
Calorimeter Algorithm Firmware Calorimeter Trigger Upgrade Firmware Michael Schulte, Katherine Compton, Tony Gregerson, Ben Buchli, and Amin Farmahini-Farahani U. Wisconsin - Madison February 19, 2009 In collaboration with Wesley Smith, Sridhara Dasu, Michail Bachtis, Kevin Flood, Tom Gorski, David Hinkemeyer, Shuvra Bhattacharyya, William Plishker, George Zaki, Nimish Sane, and Soujanya Kedilaya
Introduction • Motivation and Goals • Design Platform and Methodology • Preliminary Designs and Results • Input RocketIO and Input Buffering • Particle Cluster Finder • Cluster Overlap Filter • Planned Implementation on the Calorimeter Trigger Prototype • Planned Tools and Techniques
Motivation and Goals • The upgraded Calorimeter Trigger will require new algorithms • Modern FPGAs provide efficient platforms for these algorithms • Implement Calorimeter Trigger using • A unified design platform • Unified design and test methodologies • Techniques that facilitate future upgrades • Start by implementing a baseline design for the new algorithms
Initial Design Platform • Xilinx Virtex-5 devices contain • Virtex-5 Slices (4 LUTs and 4 flip-flops) • DSP48E Slices (multiplier, adder, and accumulator) • Block RAM (36 Kbits) • RocketIO Transceivers • GTP transfers up to 3.75 Gbps • GTX transfers up to 6.50 Gbps • Initial designs synthesized for • Xilinx Virtex-5 LX110T and TX240T FPGAs
Initial Design Methodology • Designs start with the algorithms • Physicists and engineers collaborate • Evaluate algorithm/implementation tradeoffs • Designs specified using • VHDL, Verilog, and Xilinx Core Generator • Designs implemented and tested using • Xilinx ISE v10.1 • ModelSim Xilinx Edition v6.3 • Gather results for • Input RocketIO, input buffering, particle cluster finder, and cluster overlap filter
Rocket IO and Buffering • Our initial design on TX240T FPGAs uses Xilinx’s Aurora protocol for RocketIO inputs • Each GTX Dual Tile de-serializes 2x8x16 = 256 bits every 25 ns • 16 16-bit registers store data for 15 towers for 25 ns [Figure: a GTX Dual Tile with two serial RocketIO tower inputs, each feeding 8 16-bit registers; the input buffers are 16 16-bit registers holding ECAL/HCAL Et [0] to [14] plus 15 ECAL finegrain bits, and drive the Particle Cluster Finder inputs. Clock labels: Ref. Clock (640 MHz); RocketIO, Ref. Clock/16 (40 MHz); Cluster Input, Ref. Clock/2 (320 MHz)]
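The buffering arithmetic on this slide can be illustrated with a small software model. This is only a sketch: the register layout (one 16-bit ECAL/HCAL word per tower, with the 15 finegrain bits gathered in the sixteenth register) is an assumption read off the figure labels, the byte order within each word is a guess, and the names `pack_towers` and `unpack_towers` are hypothetical.

```python
# Software model of the input buffers: 15 towers x 17 bits held in
# sixteen 16-bit registers. The layout is an ASSUMPTION, not the firmware's.

NUM_TOWERS = 15
BITS_PER_DUAL_TILE = 2 * 8 * 16  # 2 links x 8 words x 16 bits per 25 ns


def pack_towers(towers):
    """Pack 15 (ecal_et, hcal_et, finegrain) tuples into 16 register words."""
    if len(towers) != NUM_TOWERS:
        raise ValueError("expected exactly 15 towers")
    regs = []
    finegrain_word = 0
    for i, (ecal_et, hcal_et, finegrain) in enumerate(towers):
        if not (0 <= ecal_et < 256 and 0 <= hcal_et < 256 and finegrain in (0, 1)):
            raise ValueError("tower field out of range")
        # Assumed byte order: ECAL Et in the high byte, HCAL Et in the low byte.
        regs.append((ecal_et << 8) | hcal_et)
        finegrain_word |= finegrain << i
    regs.append(finegrain_word)  # sixteenth register: 15 finegrain bits
    return regs


def unpack_towers(regs):
    """Inverse of pack_towers: recover the 15 tower tuples."""
    finegrain_word = regs[NUM_TOWERS]
    return [((word >> 8) & 0xFF, word & 0xFF, (finegrain_word >> i) & 1)
            for i, word in enumerate(regs[:NUM_TOWERS])]
```

Under this layout the 15 towers use 255 of the 256 bits delivered by one GTX Dual Tile per 25 ns, which leaves one spare bit in the last register.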
Rocket IO and Buffering • Each pair of RocketIO links provides 17-bit input data for 15 towers every 25 ns • A 10 x 10 grid requires 14 RocketIO links • A 17 x 17 grid requires 40 RocketIO links [Table: Virtex-5 Resource Utilization for RocketIO and Input Buffering on TX240T FPGA]
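The link counts quoted above follow from rounding the tower count up to whole link pairs. A minimal check, where `links_required` is a hypothetical helper name and links are assumed to be allocated only in pairs:

```python
import math

TOWERS_PER_LINK_PAIR = 15  # from the slide: one pair of links carries 15 towers


def links_required(rows, cols):
    """Number of RocketIO links for a rows x cols tower grid (whole pairs only)."""
    pairs = math.ceil(rows * cols / TOWERS_PER_LINK_PAIR)
    return 2 * pairs


print(links_required(10, 10))  # 100 towers -> 7 pairs -> 14 links
print(links_required(17, 17))  # 289 towers -> 20 pairs -> 40 links
```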
Particle Cluster Finder • Process data in 2x2 clusters of towers • Inputs: 17 bits per tower (4x17 bits) • 8 ECAL Et bits • 8 HCAL Et bits • 1 ECAL finegrain bit • Algorithm is applied on overlapping clusters • Step of one tower • Identify if cluster contains “useful” particle energy • Eliminate some noise • Detect particle type [Figure: a 2x2 tower cluster (4 x 17 bits); each 17-bit tower passes through a threshold that yields 1 bit; the four bits feed a pattern comparator and a pattern decision check; on no match the output is zero (38 bits); on a match the output is the finegrain OR (1 bit), the EPIM bit (1 bit), and the tower energy sums (4x9 = 36 bits)]
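A step of one tower means that neighboring 2x2 clusters share towers. The sketch below enumerates the clusters of a grid. It assumes that each cluster is identified by its top-left tower and that clusters do not wrap around the grid edges; neither detail is stated on the slide, and both function names are hypothetical.

```python
def cluster_origins(rows, cols):
    """Origins (assumed: top-left tower) of all 2x2 clusters, step of one tower."""
    return [(r, c) for r in range(rows - 1) for c in range(cols - 1)]


def cluster_towers(origin):
    """The four tower coordinates covered by the cluster at `origin`."""
    r, c = origin
    return [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
```

Under these assumptions an N x N grid yields (N - 1) x (N - 1) clusters, so a 10 x 10 grid gives 81, and an interior tower belongs to four different clusters.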
Algorithm • Input tower data • Apply threshold • Boolean result, a single bit per tower • Compare Boolean tower pattern to stored patterns • No match: output 38 zeros • Match: output 38 bits • OR of the finegrain bits • e/γ compatibility bit • Energy sums • 4 Towers (4x9 bits, E+H) [Figure: the same block diagram as on the previous slide]
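The steps above can be sketched as a small software model. Several details are assumptions rather than facts from the slides: the threshold is applied to the E+H sum of each tower, the 38-bit word is laid out with the finegrain OR in bit 37, the e/γ bit in bit 36, and the four 9-bit sums below them, and the EPIM is passed in as a placeholder callable because its condition is not given in this transcript. The names `Tower` and `find_cluster` are hypothetical.

```python
from typing import NamedTuple


class Tower(NamedTuple):
    ecal_et: int    # 8 bits
    hcal_et: int    # 8 bits
    finegrain: int  # 1 bit


def find_cluster(towers, threshold, patterns, epim):
    """Return the 38-bit output word for one 2x2 cluster (0 if no pattern match).

    towers:    four Tower tuples
    threshold: ASSUMED to apply to each tower's E+H sum
    patterns:  set of accepted 4-tuples of 0/1 hit bits (stored patterns)
    epim:      placeholder callable returning the e/gamma compatibility bit
    """
    sums = [t.ecal_et + t.hcal_et for t in towers]    # each fits in 9 bits
    hits = tuple(int(s > threshold) for s in sums)    # one Boolean per tower
    if hits not in patterns:
        return 0                                      # no match: 38 zeros
    finegrain_or = int(any(t.finegrain for t in towers))
    egamma_bit = int(bool(epim(towers)))
    word = (finegrain_or << 37) | (egamma_bit << 36)  # assumed bit layout
    for i, s in enumerate(sums):
        word |= s << (9 * i)                          # 4 x 9 = 36 energy bits
    return word
```

The field widths add up as on the slide: 1 + 1 + 36 = 38 bits.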
Electron/Photon Identification • The electron/photon identification module (EPIM) • Is the most complex module in the particle cluster finder • Currently sets the e/γ compatibility bit if • Various implementations were investigated • Multiplier based – can easily change Egamma_Threshold • Static tables – reconfigure FPGA to change EPIM algorithm • Dynamic tables – change EPIM algorithm by reloading table
Particle Cluster Finder [Table: Resource Usage for a Single EPIM on TX240T FPGA]
Particle Cluster Finder [Table: Frequencies and maximum grid sizes for Particle Cluster Finder on TX240T FPGA]
Particle Cluster Finder [Table: Resource utilization for Particle Cluster Finder with Partial Dynamic Tables on TX240T FPGA] • Particle Cluster Finder • Synthesized for a 200 MHz clock (5 ns cycle time) • Latency of nine cycles (45 ns @ 200 MHz)
Cluster Overlap Filter • Applied on clusters produced by the Particle Cluster Finder • Ensure that a tower only “belongs” to a single cluster • Input: 9 clusters • A central cluster • The 8 neighboring clusters • Determine to which cluster each tower should belong • Keep towers in clusters with the most energy • Prune towers from other clusters [Figure: a central cluster surrounded by its eight neighbor clusters (N, NE, E, SE, S, SW, W, NW), 38 bits per input; labels mark the cluster origin (holds all cluster info), the central cluster, a neighbor cluster, and a pruned tower]
Algorithm • For each “central” cluster, • Consider each neighbor • If central Et < neighbor Et, neighbor cluster is “stronger” • Remove overlapping towers from central cluster • Otherwise central cluster is “stronger” • Remove overlapping towers from neighbor • If no towers removed from central cluster, set its “central” bit • Next apply threshold to cluster energy • Output: 14 bits • 11 bits of cluster energy, 1 finegrain bit, 1 e/γ bit, 1 central bit [Figure: the same cluster neighborhood diagram as on the previous slide]
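A software sketch of the pruning step for one central cluster follows. It rests on several assumptions: a cluster origin is its top-left tower with rows growing southward; every 2x2 window is treated as a cluster (the pattern-match zeroing of the Particle Cluster Finder is ignored); ties go to the E, SE, S, and SW neighbors, as the comparator labels on the design slide suggest; the reported energy is the sum of the towers that survive pruning; and a cluster at or below the threshold reports zero energy. All names are hypothetical.

```python
# name: (d_row, d_col, neighbor_wins_ties); tie handling is an ASSUMPTION
# taken from the "<" versus "<=" comparator labels on the design slide.
NEIGHBORS = {
    "N": (-1, 0, False), "NE": (-1, 1, False), "E": (0, 1, True),
    "SE": (1, 1, True), "S": (1, 0, True), "SW": (1, -1, True),
    "W": (0, -1, False), "NW": (-1, -1, False),
}


def cluster_towers(origin):
    """Coordinates of the four towers of the 2x2 cluster at `origin`."""
    r, c = origin
    return {(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)}


def cluster_et(origin, tower_et):
    """Unpruned cluster energy: sum of the four 9-bit tower sums (fits 11 bits)."""
    return sum(tower_et.get(t, 0) for t in cluster_towers(origin))


def overlap_filter(origin, tower_et, threshold):
    """Return (energy, central_bit) for the cluster at `origin`.

    tower_et maps (row, col) -> E+H tower sum; missing towers count as 0.
    """
    central_et = cluster_et(origin, tower_et)
    kept = cluster_towers(origin)
    for d_row, d_col, neighbor_wins_ties in NEIGHBORS.values():
        n_origin = (origin[0] + d_row, origin[1] + d_col)
        n_et = cluster_et(n_origin, tower_et)
        if neighbor_wins_ties:
            neighbor_stronger = central_et <= n_et
        else:
            neighbor_stronger = central_et < n_et
        if neighbor_stronger:
            kept -= cluster_towers(n_origin)  # prune the shared towers
    central_bit = int(len(kept) == 4)         # set only if nothing was pruned
    energy = sum(tower_et.get(t, 0) for t in kept)
    if energy <= threshold:                   # cluster threshold, "E > X?"
        energy = 0
    return energy, central_bit
```

Each cluster is evaluated independently as the central one, so pruning of a weaker neighbor happens when that neighbor takes its own turn as the central cluster. The asymmetric tie rule keeps two equal-energy clusters from both claiming, or both dropping, a shared tower.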
Cluster Overlap Filter Design [Figure: block diagram. For the central cluster and each of the eight neighbors (NE, E, SE, S, SW, W, NW, N), an energy adder sums the four 9-bit tower energies E1+E2+E3+E4, taken from the 4x9-bit tower bit sequence, into an 11-bit cluster energy. Eight comparators test Central < NE, Central <= E, Central <= SE, Central <= S, Central <= SW, Central < W, Central < NW, and Central < N, each producing 1 bit. A cluster threshold tests E > X. Outputs: Energy (11 bits), Central (1 bit), and the finegrain and e/γ bits (2 bits)]
Cluster Overlap Filter Results • Cluster Overlap Filter • Synthesized for a 200 MHz clock (cycle time of 5 ns) • Latency of five cycles (25 ns @ 200 MHz) • Operates in parallel with EPIM • No DSP48E or Block RAM resources needed [Table: Virtex-5 Slice Utilization for Cluster Overlap Filter]
Latency Estimates • Estimated latencies are given in the table below • Clock rate of 200 MHz (cycle time of 5 ns) • Cluster Overlap Filter operates in parallel with part of the Particle Cluster Finder [Table: Estimated Latencies on TX240T FPGAs]
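The cycle counts quoted on the earlier slides convert to time as follows. This is only a unit-conversion sketch with a hypothetical helper name. It does not reproduce the latency table, and because the two blocks partly overlap in time, their latencies should not simply be added.

```python
CLOCK_MHZ = 200  # synthesis target from the slides (5 ns cycle time)


def cycles_to_ns(cycles, clock_mhz=CLOCK_MHZ):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles * 1000 / clock_mhz


print(cycles_to_ns(9))  # Particle Cluster Finder: 45.0 ns
print(cycles_to_ns(5))  # Cluster Overlap Filter: 25.0 ns
```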
Overall Resource Estimates • Estimated resources are given in the table below • Includes input RocketIO, input buffers, particle finder, and overlap filter • Additional grid sizes and FPGA devices should be considered [Table: Overall Resource Utilization on TX240T FPGA]
Calorimeter Trigger Prototype • Implement the rest of the Calorimeter Trigger • Particle Isolation and Particle ID • Jet Reconstruction • Particle Sorter • MET,HT,MHT Calculation • Perform more in-depth testing and analysis of the designs • Enhance the initial designs • Prototype the Calorimeter Trigger designs
New Tools and Techniques • We are working with U. of Maryland researchers to investigate new tools and techniques to design, test, and upgrade the CMS firmware • Dataflow languages • DIF and OpenDF • Tools and techniques for • Unit testing and automated testing • Efficient designs with multiple FPGAs • Generating FPGA firmware and simulator code from a single high-level specification • Web-based repositories and version tracking • Consistent (automated) documentation practices
Conclusions • The preliminary firmware for the Calorimeter Trigger Upgrade has been developed • Initial results look promising • Additional designs are planned for this spring and summer • Still need to work on • Making the designs more easily upgradable • Experimenting with new algorithms • Helping to establish a unified platform plus unified design and test methodologies • New tools and techniques to facilitate future firmware development and upgrades