The ATLAS Liquid Argon Calorimeters ReadOut Drivers: A 600 MHz TMS320C6414 DSP-based design. Julie PRAST, LAPP CNRS, FRANCE
The LHC • The LHC is an accelerator ring in which the proton beams are accelerated to an energy of 7 TeV. • The LHC goal will be to have protons from one beam collide with protons from the other. • 4 experiments. LHC: Large Hadron Collider (27 km circumference)
The ATLAS experiment • Goal: explore the fundamental nature of matter and the basic forces that shape our universe. • About the size of a five-story building. • Collaboration of 2000 physicists. • 150 universities and laboratories in 34 countries.
The electromagnetic calorimeter • ATLAS: several sub-detectors. • Electromagnetic calorimeter: • Identifies electrons and photons. • Measures the energy carried by these particles. • 200 000 cells to be read out at 40 MHz. [Figure: Electromagnetic calorimeter]
The calorimeter electronic chain [Figure: DETECTOR → front-end electronics (FEB: amplifier, shaping, analog memory (SCA), 12-bit ADC) → 1600 G-Link optical links → back-end electronics (ROD) → 800 S-Link optical links → ROB; with the Timing Trigger Control (TTC)]
The ROD modules • Calculate the precise energy and timing of calorimeter signals from discrete time samples (Δt = 25 ns). • Perform monitoring. • Format data for the next element in the electronics chain.
The ROD module goals • 200 modules, each receiving data from 1024 calorimeter cells. • Calculate the energy from these data using optimal filtering weights: $E = \sum_i a_i (S_i - \mathrm{PED})$ • If E > threshold (< 10% of cells), calculate the timing τ and the pulse quality factor χ²: $E\tau = \sum_i b_i (S_i - \mathrm{PED})$, $\chi^2 = \sum_i (S_i - \mathrm{PED} - E g_i)^2$ (S_i: samples, PED: pedestal, a_i, b_i: optimal filtering weights, g_i: normalized pulse shape). • Fill histograms of E, τ, χ², … • During calibration runs, perform signal averaging to calculate calibration constants for each channel.
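A minimal C sketch of the per-cell computation above, assuming floating-point arithmetic, five samples per cell (as in the linear-assembly example later in the deck) and a non-negative threshold. The names `optimal_filter`, `CellResult` and `NSAMP` are hypothetical; the actual ROD code runs these loops in linear assembly, with weights coming from calibration.

```c
#include <stddef.h>

/* Assumption: 5 samples per cell, spaced by 25 ns. */
#define NSAMP 5

/* Result of the optimal filtering for one cell (hypothetical layout). */
typedef struct {
    double energy;   /* E = sum_i a_i (S_i - PED) */
    double tau;      /* timing: (sum_i b_i (S_i - PED)) / E */
    double chi2;     /* quality: sum_i (S_i - PED - E g_i)^2 */
    int has_timing;  /* 1 if E > threshold and tau, chi2 were computed */
} CellResult;

/* Optimal filtering for one cell.
   s: samples, ped: pedestal, a/b: optimal filtering weights,
   g: normalized pulse shape. Assumes threshold >= 0, so E > 0
   whenever the timing branch divides by E. */
static CellResult optimal_filter(const double s[NSAMP], double ped,
                                 const double a[NSAMP], const double b[NSAMP],
                                 const double g[NSAMP], double threshold)
{
    CellResult r = {0.0, 0.0, 0.0, 0};

    /* Energy is computed for every cell. */
    for (size_t i = 0; i < NSAMP; ++i)
        r.energy += a[i] * (s[i] - ped);

    /* Timing and quality factor only for cells above threshold. */
    if (r.energy > threshold) {
        double etau = 0.0;
        for (size_t i = 0; i < NSAMP; ++i) {
            double d = s[i] - ped;
            double residual = d - r.energy * g[i];
            etau += b[i] * d;
            r.chi2 += residual * residual;
        }
        r.tau = etau / r.energy;
        r.has_timing = 1;
    }
    return r;
}
```

Since only about 10% of cells pass the threshold, the second loop runs rarely; the first loop is the one worth hand-optimizing.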
Requirements • The ROD module must be able to process an event in less than 10 µs, including histograms. • Use of a commercial programmable processor. • A natural choice is a Digital Signal Processor (DSP): • Efficient computing power for this kind of algorithm. • High I/O bandwidth. • Modular design: basic components should be easy to change or upgrade. • Low power consumption.
The ROD: a 9U VME board
The ROD Motherboard
The Staging Mode • At LHC start-up, the ROD is equipped with half of the PUs. • Level 1 trigger rate < 50 kHz. • Data from 4 FEBs are routed to one PU. • 1 DSP processes 256 channels instead of 128.
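The staging figures follow from the channel count per ROD. A small C sketch of that arithmetic, assuming 1024 channels per ROD, two DSPs per PU, four PUs on a fully equipped ROD, and 128 channels per FEB (the FEB figure is an assumption consistent with the slides, not stated in them). The macro and function names are hypothetical.

```c
#define CHANNELS_PER_ROD 1024u  /* 1024 calorimeter cells per ROD module */
#define DSPS_PER_PU      2u     /* 1 PU = two TMS320C6414 DSPs */
#define CHANNELS_PER_FEB 128u   /* assumption: 128 channels per FEB */

/* Channels handled by each DSP when the ROD carries n_pu Processing Units. */
static unsigned channels_per_dsp(unsigned n_pu)
{
    return CHANNELS_PER_ROD / (n_pu * DSPS_PER_PU);
}

/* Number of FEBs whose data are routed to one PU. */
static unsigned febs_per_pu(unsigned n_pu)
{
    return CHANNELS_PER_ROD / CHANNELS_PER_FEB / n_pu;
}
```

With `n_pu = 2` (half of the PUs) each DSP handles 256 channels and each PU receives 4 FEBs, matching the staging mode; with `n_pu = 4` the nominal 128 channels per DSP are recovered.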
The DSP Processing Unit [Figure: PU block diagram. Components: two TMS320C6414 DSPs, each with an Input FPGA (Apex 20k160), a 4k×16 FIFO and EMIFA (64-bit), EMIFB (16-bit), HPI, McBSP0–2 and EXT_INT connections; inputs FEB1–FEB4; an Acex 1k30 FPGA for the TTC interface (BCID, TType) and the VME interface; JTAG and configuration paths. Legend: data stream, TTC, VME.]
The DSP Processing Unit [Photo with labels: Input FPGA, DSP, FIFO, Output FPGA]
PU Software Summary (ROD in → out) • Input data: serial data in FEB format. • Input FPGA: parallelizes the data into DSP format. • DSP: for 128 channels per event, calculation of E, or of E, τ, χ². • Output data: 16-bit integer E; or 16-bit integer E with 32-bit τ, χ² and gain; or 32-bit E with 32-bit τ, χ² and gain. • Output FPGA; TTC data; VME interface; histograms. [Figure legend: « programmable » part, fixed part]
The TMS320C6414: a latest-generation DSP from TI [Figure: C64x CPU core (instruction decoding, 64 registers, 8 calculation units), 16 kB program cache, 16 kB data cache, 1 MB central memory, DMA controller, External Memory Interface, peripherals]
The DSP code structure
DSP Software • Developed with Code Composer Studio (CCS). • Whole code written in C, except: • Physics loops, written in linear assembly and then optimized using CCS. • Limited code complexity. • Good legibility and maintainability.
Example of Linear Assembly • Calculation of the cell energy: $E = \sum_i a_i (s_i - p) = \sum_i a_i s_i - p \sum_i a_i$ (i = 1..5, p = pedestal). • Let the compiler do all the laborious work of parallelizing, pipelining and register allocation.
    mpy   s1,a1,sa1        ; a1*s1
    dotp2 a23,s23,sa23     ; a2*s2 + a3*s3
    dotp2 s45,a45,sa45     ; a4*s4 + a5*s5
    add   sa23,sa45,sa25   ; sum of ai*si, i=2..5
    add   sa1,sa25,sa15    ; sum of ai*si, i=1..5
    sub   sa15,px,e        ; E = sum of ai*si - p*sum of ai
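To check the decomposition used by the linear assembly, here is a portable C sketch that reproduces the same steps. It assumes 16-bit integer weights and samples, with the pairs (a2, a3), (s2, s3), (a4, a5), (s4, s5) packed two per 32-bit register, and that `px` holds p·Σaᵢ precomputed per channel. `pack2`, `dotp2_model`, `cell_energy` and `cell_energy_direct` are hypothetical names; `dotp2_model` only models the DOTP2 instruction (on the DSP the TI compiler exposes it as the `_dotp2` intrinsic).

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word, low half first. */
static uint32_t pack2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Model of C64x DOTP2: multiply the signed 16-bit halves pairwise and add
   the two products. Relies on two's-complement narrowing (common compilers). */
static int32_t dotp2_model(uint32_t x, uint32_t y)
{
    int32_t x_lo = (int16_t)(x & 0xFFFFu), x_hi = (int16_t)(x >> 16);
    int32_t y_lo = (int16_t)(y & 0xFFFFu), y_hi = (int16_t)(y >> 16);
    return x_lo * y_lo + x_hi * y_hi;
}

/* E = sum a_i s_i - px, following the linear-assembly steps.
   C arrays are 0-based: a[0] is a1, ..., a[4] is a5. */
static int32_t cell_energy(const int16_t a[5], const int16_t s[5], int32_t px)
{
    int32_t sa1  = (int32_t)s[0] * a[0];                              /* mpy   */
    int32_t sa23 = dotp2_model(pack2(a[1], a[2]), pack2(s[1], s[2])); /* dotp2 */
    int32_t sa45 = dotp2_model(pack2(s[3], s[4]), pack2(a[3], a[4])); /* dotp2 */
    int32_t sa25 = sa23 + sa45;                                       /* add   */
    int32_t sa15 = sa1 + sa25;                                        /* add   */
    return sa15 - px;                                                 /* sub   */
}

/* Direct form sum a_i (s_i - p), for comparison. */
static int32_t cell_energy_direct(const int16_t a[5], const int16_t s[5], int32_t p)
{
    int32_t e = 0;
    for (int i = 0; i < 5; ++i)
        e += (int32_t)a[i] * (s[i] - p);
    return e;
}
```

Both functions return the same value for integer inputs; the decomposed form needs only one subtraction per cell because p·Σaᵢ does not depend on the samples.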
DSP software results • Physics calculation of 128 channels: 3.5 µs. • Includes all the necessary histograms. • Includes τ, χ² for the 10% fraction of high-energy cells. • 30 to 40% of the time is spent in stall cycles. • Cycles are lost because data are not in the cache.
The Cache Memory [Figure: C64x memory architecture] • When a datum or an instruction is not in the cache memory, the CPU stalls for 6 cycles while it is copied from the central memory to the cache. • For the E calculation: 6 data items to read => up to 6 × 6 = 36 wait cycles if every read misses. • The cache behavior must be understood to improve these numbers.
Which improvements? • L1D mapping: • Take care of which data are loaded, from which address and in what order. • L1D pipelining: • Use consecutive loads so that misses overlap. • 1 miss: 6 wait cycles • 2 misses: 8 wait cycles • 4 misses: 12 wait cycles • L1D access optimization: • Sample preloading • Interleaved histograms
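The miss figures above fit a simple linear model: isolated misses cost 6 cycles each, while n consecutive misses cost 4 + 2n cycles (6, 8 and 12 cycles for 1, 2 and 4 misses). A C sketch of that model follows; it is fitted to the slide's three data points, not taken from a TI specification, and the function names are hypothetical.

```c
/* Stall-cycle model for L1D read misses, fitted to the slide figures
   (assumption): an isolated miss costs 6 wait cycles. */
static unsigned stalls_isolated(unsigned misses)
{
    return 6u * misses;
}

/* Consecutive (pipelined) misses: 1 -> 6, 2 -> 8, 4 -> 12 cycles,
   i.e. 4 + 2n wait cycles for n misses under the fitted model. */
static unsigned stalls_pipelined(unsigned misses)
{
    return misses == 0u ? 0u : 4u + 2u * misses;
}
```

Under this model, the 6 reads of the E calculation would drop from 36 to about 16 wait cycles if their misses were issued back to back; the 6-miss value is an extrapolation beyond the measured points.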
DSP software results • Physics calculation of 128 channels: 3.5 µs. • Includes all the necessary histograms. • Includes τ, χ² for the 10% fraction of high-energy cells. • 30 to 40% of the time is spent in stall cycles. • Cycles are lost because data are not in the cache. • The complete code takes about 7 µs (600 MHz DSP). • Includes the RTX kernel, synchronization and send tasks, … • 30% margin for further improvements (7 µs used out of the 10 µs budget).
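The timing figures translate directly into clock cycles. A C sketch of this arithmetic, assuming the 10 µs per-event requirement and the 600 MHz clock stated in the slides; the helper names are hypothetical.

```c
/* Convert a processing time in microseconds to DSP clock cycles. */
static double us_to_cycles(double time_us, double clock_mhz)
{
    return time_us * clock_mhz;  /* 1 us at 1 MHz = 1 cycle */
}

/* Fraction of the per-event budget left unused. */
static double margin(double used_us, double budget_us)
{
    return (budget_us - used_us) / budget_us;
}
```

At 600 MHz the 10 µs budget is 6000 cycles and the complete code uses about 4200 of them, which gives the 30% margin; the 3.5 µs physics part is 2100 cycles, about 16.4 cycles per channel for 128 channels.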
Agenda • Mid-March: motherboard and PU assembled. • May 2003: validation in standalone mode. • Fall 2003: system test in the experiment environment. • Spring 2004: production launch. • Summer 2004: board installation at LHC.
Conclusion: the ROD • Calculates the precise energy and timing of the calorimeter signals. • 1 motherboard and 4 Processing Units (PUs). • 1 PU = two 600 MHz TMS320C6414 DSPs. • 30% margin for future improvements. • 200 RODs to be produced in 2004.
Thank You