Study on buffer usage and data packing at the FE. LHCb Electronics Upgrade Meeting 11 April 2013. Federico Alessio, with inputs from Richard, Ken, Guillaume. Scope. Attempt to study: Impact of TFC commands on behaviour of FE buffer in upgraded readout architecture
Study on buffer usage and data packing at the FE • LHCb Electronics Upgrade Meeting, 11 April 2013 • Federico Alessio, with inputs from Richard, Ken, Guillaume
Scope • Attempt to study: • Impact of TFC commands on the behaviour of the FE buffer in the upgraded readout architecture • Feasibility of the packing algorithm across the GBT link as specified in the readout architecture specifications • This presentation is NOT intended to show you how to pack across the GBT link or how to use the buffer. • However, it IS intended to stimulate discussion, using a practical example, on possible solutions/implications at the FE in the global readout architecture. • There is a published internal note which contains what I’m presenting here: LHCb-INT-2013-015. • It is not final; it is meant only to stimulate discussion and feedback! 2
TFC simulation testbench • First simulation testbench (est. 2009) developed in Visual Elite from Mentor Graphics. Includes: • S-ODIN • SOL40 (only TFC) • LHC clock • LHC filling scheme • LLT emulation (based on current L0) • Custom-made FE emulation block • Generic FE emulation • OT-like • CALO-like • No TELL40 emulation, throttle is faked • Everything is an HDL entity, portable to other simulation platforms • Basically, the aim is to simulate a (very small) slice of the readout system • === Mini-DAQ including FE emulation • Could add a few FE channels with different occupancies • *only problem is simulation time 3
FE emulation, why? • Needed to develop an FE emulation block to simulate the generation of detector data • Used to • study impact of TFC commands on FE buffer behaviour • demonstrate feasibility of packing mechanism at FE as written in specs • emulate FE data generator to spy on sub-detectors for FE reviews… • Proposed to use it as a practical example of a generic FE data generator for the readout architecture simulation framework until sub-detectors’ codes become available • Description of the code here • Simulation results • Considerations on packing mechanism • Considerations on buffer usage • Synthesis results • Practical proof of how important simulating code is… 5
Generic FE channel as in specs • FE channel contains a buffer: • No trigger at FE, so buffer is actually a derandomizer. • Used to pipe data @ 40 MHz to be packed and sent over GBT link. • If no TFC command and occupancy too high, buffer will fill up very, very quickly • We are running at 40 MHz! It’s 40 times faster than now… • Mechanism to empty buffer • TFC commands come in handy • DATA coming out on GBT link: • No empty spaces, no unexpected 0s • Fully dynamic packing algorithm across GBT frame-width • Ideally, data should be in order… 6
The code: GBT dynamic packing • Very important to analyze simulation output bit-by-bit and clock-by-clock! 9
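As an illustration of what "dynamic packing across the GBT frame-width" could mean in practice, here is a minimal Python sketch. It assumes events are variable-length bit strings that are concatenated back to back and then cut into 80-bit frames, so an event may straddle a frame boundary. The function name `pack_frames` and the zero padding of the very last frame are assumptions made for this sketch; it is not the HDL described in the note.

```python
from typing import Iterable, List

FRAME_BITS = 80  # GBT payload width quoted in the slides (without the 4-bit GBT header)

def pack_frames(events: Iterable[str], frame_bits: int = FRAME_BITS) -> List[str]:
    """Concatenate variable-length events (bit strings) and cut them into frames.

    Events may straddle frame boundaries, so no gaps appear between them.
    Only the final frame is padded with zeros (an assumption of this sketch).
    """
    stream = "".join(events)
    frames = [stream[i:i + frame_bits] for i in range(0, len(stream), frame_bits)]
    if frames and len(frames[-1]) < frame_bits:
        frames[-1] = frames[-1].ljust(frame_bits, "0")
    return frames

# Three hypothetical events of 49, 34 and 77 bits (160 bits in total)
events = ["1" * 49, "0" * 34, "1" * 77]
frames = pack_frames(events)  # two full 80-bit frames, second event split across them
```

Looking at `frames` bit by bit shows the second event starting in frame 0 and ending in frame 1, which is exactly the kind of boundary case worth checking clock by clock in a real simulation.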
The code: configuration • FE generic data generator is fully programmable: • Number of channels associated to GBT link • Width of each channel • Derandomizer depth • Mean occupancy of the channels associated to GBT link • Size of GBT frame (80 bits or WideBus + GBT header 4 bits) • Extremely flexible and easy to configure with parameters • Covers almost all possibilities (almost…) • Including flexible transmission of NZS and ZS • Including TFC commands as defined in specs • Study dependency of FE buffer behaviour with TFC commands • Study effect of packing algorithm on TELL40 • Study synchronization mechanism at beginning of run • Study re-synchronization mechanism when de-synchronized • Etc… etc… etc… • And it is fully synthesizable… 10
Simulation results • Simulated 11 different scenarios: • fixed GBT size to 80 bits + 4 bits GBT header • fixed width of data header to 24 bits in three fields (12 for BXID, 8 for data size, 4 for info) • fixed width of data channel to 5 bits as practical example • Numbers scale accordingly: lower occupancy allows a larger number of channels 11
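The 24-bit data header with its three fields can be sketched as a simple bit-field packer. This is only an illustration: the field order (BXID in the most significant bits, then data size, then info) and the helper names `pack_header` / `unpack_header` are assumptions, since the slides give the field widths but not the layout.

```python
BXID_BITS, SIZE_BITS, INFO_BITS = 12, 8, 4  # 24-bit header widths from the slides

def pack_header(bxid: int, size: int, info: int) -> int:
    """Pack the three fields into one 24-bit word (assumed order: BXID | size | info)."""
    for name, value, width in (("bxid", bxid, BXID_BITS),
                               ("size", size, SIZE_BITS),
                               ("info", info, INFO_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (bxid << (SIZE_BITS + INFO_BITS)) | (size << INFO_BITS) | info

def unpack_header(word: int):
    """Inverse of pack_header: return (bxid, size, info)."""
    info = word & ((1 << INFO_BITS) - 1)
    size = (word >> INFO_BITS) & ((1 << SIZE_BITS) - 1)
    bxid = (word >> (SIZE_BITS + INFO_BITS)) & ((1 << BXID_BITS) - 1)
    return bxid, size, info

# Hypothetical example values
header = pack_header(bxid=3563, size=25, info=0)
```

With this layout a 12-bit BXID covers values 0 to 4095 and an 8-bit size field covers values 0 to 255.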
Simulation results • Scenario 1: 10% occupancy, 50 x 5-bit channels, derandomizer depth 75 • Scenario 2: 25% occupancy, 50 x 5-bit channels, derandomizer depth 75 12
Simulation results • Scenario 8: 40% occupancy, 32 x 5-bit channels, derandomizer depth 165 • Scenario 9: 40% occupancy, 32 x 5-bit channels, derandomizer depth 165 + NO BX VETO sent from TFC 13
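To get a feeling for why occupancy and the BX VETO matter so much for the derandomizer, a toy model in Python can help. It is a rough sketch under loud assumptions, not the HDL testbench: an event is taken to be the 24-bit header plus 5 bits per hit channel, the buffer backlog is counted in bits rather than in derandomizer words, the GBT drains up to 80 bits per crossing, and vetoed crossings are drawn at random with a hypothetical `veto_fraction` instead of following the real LHC filling scheme.

```python
import random

def mean_event_bits(n_channels, occupancy, ch_bits=5, header_bits=24):
    """Average event size under the assumed model: header + hit channels."""
    return header_bits + n_channels * occupancy * ch_bits

def simulate(n_channels, occupancy, n_crossings, veto_fraction=0.0,
             ch_bits=5, header_bits=24, frame_bits=80, seed=1):
    """Return (peak_backlog_bits, final_backlog_bits) of the toy derandomizer."""
    rng = random.Random(seed)
    backlog = 0
    peak = 0
    for _ in range(n_crossings):
        if rng.random() >= veto_fraction:  # crossing not vetoed: generate an event
            hits = sum(rng.random() < occupancy for _ in range(n_channels))
            backlog += header_bits + hits * ch_bits
        peak = max(peak, backlog)
        backlog -= min(backlog, frame_bits)  # one GBT frame leaves per crossing
    return peak, backlog

# Mean event sizes for the scenario parameters quoted above (model assumption only)
size_s1 = mean_event_bits(50, 0.10)
size_s2 = mean_event_bits(50, 0.25)
size_s8 = mean_event_bits(32, 0.40)
```

In this simplified model, whenever the mean event size exceeds the 80-bit frame the backlog can only shrink during vetoed crossings, which is one way to see why the buffer behaviour depends so strongly on the TFC commands. Calling `simulate(32, 0.40, 10000, veto_fraction=0.3)` and then again with `veto_fraction=0.0` gives a quick qualitative comparison; the numbers themselves should not be read as results from the note.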
Simulation results • [Waveform figure; traces: filling scheme, TFC commands, FE data generated, derandomizer occupancy, GBT output] • For a bit-by-bit zoom-in please come to my office 14
Simulation results • For a bit-by-bit zoom-in please come to my office or ask for the code 15
Synthesis results • Using Altera Quartus 12.1 SP1 • No synthesis optimization done, fitter left free, no pinout defined, no timing constraints • No memory cells used • Doable, can be further improved though. 16
FYI, simulation outlook • Simulation should be a coordinated effort • Personal drive in order to be able to produce a (complex) code for TFC on time • FE generic code + TFC code should be merged with TELL40 effort • To test both FE packing algorithm and FE buffer management • To test decoding at TELL40 and investigate consequences/solutions • To analyze effects of TFC commands on global system (including TELL40) • Effort already ongoing between me and Guillaume to do so • We would very much appreciate having the (emulation) code of each sub-detector • a generic FE code is useful to study things on paper, but real code is something different • Proposal is to use this simulation effort to validate FE code • simulation performed by me and Guillaume to investigate solutions and issues in FE 17
Conclusions • Packing mechanism as specified in our document is feasible. • Will be used temporarily to emulate FE generated data in global readout and TFC simulation. • However, very big open questions: • Is your FE compatible with such a scheme? What about such code in an ASIC? • Behaviour of FE derandomizer will strongly depend on your compression or suppression mechanism. • If dynamic, it could create big latencies • If your data does not come out in order, it can become quite complicated… • Behaviour of FE derandomizer will strongly depend on TFC commands • FE buffer depth should not rely on having a BX VETO! Aim at a bandwidth for full 40 MHz readout; BX VETO is solely to discard events synchronously. • What about the SYNCH command? When do you think you can apply it? Ideally after derandomizer and after suppression/compression, but… • How many clock cycles do you need to recover from an NZS event? • Can you handle consecutive NZS events? 18
Qs & As? 19
System and functional requirements • Bidirectional communication network • Clock jitter, and phase and latency control • At the FE, but also at TELL40 and between S-TFC boards • Partitioning to allow running with any ensemble and parallel partitions • LHC interfaces • Event rate control • Low-Level-Trigger input • Support for old TTC-based distribution system • Destination control for the event packets • Sub-detector calibration triggers • S-ODIN data bank • Information about transmitted events • Test-bench support 20
The S-TFC system at a glance • S-ODIN responsible for controlling upgraded readout system • Distributing timing and synchronous commands • Manages the dispatching of events to the EFF • Rate regulates the system • Support old TTC system: hybrid system! • SOL40 responsible for interfacing FE + TELL40 slice to S-ODIN • Fan-out TFC information to TELL40 • Fan-in THROTTLE information from TELL40 • Distributes TFC information to FE • Distributes ECS configuration data to FE • Receives ECS monitoring data from FE 21
The upgraded physical readout slice • Common electronics board for upgraded readout system: Marseille’s ATCA board with 4 AMC cards • S-ODIN AMC card • LLT AMC card • TELL40 AMC card • LHC Interfaces specific AMC card 23
Latest S-TFC protocol to TELL40 • We will provide the TFC decoding block for the TELL40: VHDL entity with inputs/outputs • «Extended» TFC word to TELL40 via SOL40: • 64 bits sent at 40 MHz = 2.56 Gb/s (on backplane) • packed with 8b/10b protocol (i.e. total of 80 bits) • no dedicated GBT buffer, use ALTERA GX simple 8b/10b encoder/decoder • MEP accept command when MEP ready: • Take MEP address and pack to FARM • No need for special address, dynamic • Constant latency after BXID • THROTTLE information from each TELL40 to SOL40: • no change: 1 bit for each AMC board + BXID for which the throttle was set • 16 bits in 8b/10b encoder • same GX buffer as before (and same decoder!) 24
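The link-rate figures on this slide follow from simple arithmetic, sketched here in Python. The helper name `throughput_gbps` is made up for this example; the only inputs are the 64-bit word, the 40 MHz word rate and the 8b/10b expansion to 80 bits quoted above.

```python
def throughput_gbps(bits_per_word: int, word_rate_mhz: float = 40.0) -> float:
    """Throughput in Gb/s for one word of the given width per clock cycle."""
    return bits_per_word * word_rate_mhz / 1000.0

payload_bits = 64
encoded_bits = payload_bits * 10 // 8  # 8b/10b turns every 8 bits into 10

payload_rate = throughput_gbps(payload_bits)  # useful TFC payload
line_rate = throughput_gbps(encoded_bits)     # serial line rate after encoding
```

So the 2.56 Gb/s quoted is the payload rate, while the serial link itself has to run at 3.2 Gb/s once the 8b/10b overhead is included.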
S-TFC protocol to FE, no change • TFC word on downlink to FE via SOL40 embedded in GBT word: • 24 bits in each GBT frame at 40 MHz = 0.96 Gb/s • all commands associated to BXID in TFC word • Put local configurable delays for each TFC command • GBT does not support individual delays for each line • Need for «local» pipelining: detector delays + cables + operational logic (e.g. laser pulse?) • DATA SHOULD BE TAGGED WITH THE CROSSING TO WHICH IT BELONGS! • TFC word will arrive before the actual event takes place • To allow use of commands/resets for a particular BXID • Accounting of delays in S-ODIN: for now, 16 clock cycles earlier + time to receive • Aligned to the furthest FE (simulation, then in situ calibration!) • TFC protocol to FE has implications on GBT configuration and ECS to/from FE • see specs document! 25
Timing distribution • From TFC point of view, we ensure constant: • LATENCY: Alignment with BXID • FINE PHASE: Alignment with best sampling point • Some resynchronization mechanisms envisaged: • Within TFC boards • With GBT • No impact on FE itself • Loopback mechanism: • re-transmit TFC word back • allows for latency measurement + monitoring of TFC commands and synchronization 27
How to decode TFC in FE chips? • [Figure: FE electronic block] • Use of TFC+ECS GBTs in FE is 100% common to everybody!! • dashed lines indicate the detector-specific interface parts • please take particular care with the clock transmission: the TFC clock must be used by FE to transmit data, i.e. low jitter! • Kapton cable, crate, copper between FE ASICs and GBTX 28
The TFC+ECS GBT • Clock[7:0]: these clocks should be the main clocks for the FE • 8 programmable phases • 4 programmable frequencies (40, 80, 160, 320 MHz) • Used to: • sample TFC bits • drive Data GBTs • drive FE processes • [Figure: GBTX block diagram; labels: external clock reference, CLK reference/xPLL, CLK manager, phase shifter, e-ports and e-links to FE modules (80, 160 and 320 Mb/s ports), phase aligners + Ser/Des for e-ports, ePLLRx, ePLLTx, GBTIA, CDR, DEC/DSCR, SCR/ENC, SER, GBLD, control logic, configuration (e-fuses + reg-bank), I2C slave, I2C master, JTAG, and GBT-SCA on one 80 Mb/s port with JTAG and I2C ports] 29
The TFC+ECS GBT protocol to FE • TFC protocol has direct implications on the way in which GBT should be used everywhere • 24 e-links @ 80 Mb/s dedicated to TFC word: • use 80 MHz phase shifter clock to sample TFC parallel word • TFC bits are packed in GBT frame so that they all come out on the same clock edge • We can repeat the TFC bits also on consecutive 80 MHz clock edges if needed • Leftover 17 e-links dedicated to GBT-SCAs for ECS configuring and monitoring (see later) 30
Words come out from GBT at 80 Mb/s • In simple words: • Odd bits of GBT protocol on rising edge of 40 MHz clock (first, msb), • Even bits of GBT protocol on falling edge of 40 MHz clock (second, lsb) 31
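The odd/even rule can be made concrete with a small Python sketch that splits one frame into the bit pairs seen on each 80 Mb/s e-link. The bit numbering is an assumption made for illustration (e-link k is taken to carry frame bits 2k+1 and 2k); the actual mapping should be taken from the GBT documentation and the specs document. The function name `elink_bits` is hypothetical.

```python
def elink_bits(frame: int, n_elinks: int = 40):
    """Return one (first, second) bit pair per e-link for a single 40 MHz frame.

    Assumed numbering: e-link k carries frame bit 2k+1 first (rising edge of the
    40 MHz clock) and frame bit 2k second (falling edge).
    """
    return [((frame >> (2 * k + 1)) & 1, (frame >> (2 * k)) & 1)
            for k in range(n_elinks)]

# Tiny example with only two e-links: frame bits 3..0 = 0 1 1 0
pairs = elink_bits(0b0110, n_elinks=2)

# Bits that appear together on the rising edge, one per e-link
rising_edge_bits = [first for first, _ in pairs]
```

Under this numbering, a word whose bits all sit at odd positions comes out entirely on the rising edge, which is the property the TFC packing relies on.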
TFC decoding at FE after GBT • This is crucial!! • we can already specify where each TFC bit will come out on the GBT chip • this is the only way in which FE designers still have minimal freedom with GBT chip • if TFC info was packed to come out on only 12 e-links (first odd then even), then decoding in FE ASIC would be mandatory! • which would mean that the GBT bus would have to go to each FE ASIC for decoding of TFC command • there is also the idea to repeat the TFC bits on even and odd bits in TFC protocol • would that help? • FE could tie logical blocks directly on GBT pins… 32
Now, what about the ECS part? • Each pair of bits from ECS field inside GBT can go to a GBT-SCA • One GBT-SCA is needed to configure the Data GBTs (EC one for example?) • The rest can go to either FE ASICs or DCS objects (temperature, pressure) via other GBT-SCAs • GBT-SCA chip already has everything for us: interfaces, e-link ports… • No reason to go for something different! • However, «silicon for SCA will come later than silicon for GBTX»… • We need something while we wait for it! 33
SOL40 encoding block to FE! • Protocol drivers build GBT-SCA packets with addressing scheme and bus type for associated GBT-SCA user busses to selected FE chip • Basically each block will build one of the GBT-SCA supported protocols • Memory map with internal addressing scheme for GBT-SCA chips + FE chips addressing, e-link addressing and bus type: content of memory loaded from ECS 34
Usual considerations… • TFC+ECS Interface has the ECS load of an entire FE cluster for configuring and monitoring • 34 bits @ 40 MHz = 1.36 Gb/s on single GBT link • ~180 Gb/s for full TFC+ECS Interface (132 links) • Single CCPC might become bottleneck… • Clara & us, December 2011 • How long to configure FE cluster? • how many bits / FE? • how many FEs / GBT link? • how many FEs / TFC+ECS Interface? • Numbers to be pinned down soon + GBT-SCA interfaces and protocols. 35
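The bandwidth numbers above, and the open question of how long it takes to configure an FE cluster, can be put into a small Python calculator. The 34 bits per frame, the 40 MHz frame rate and the 132 links come from the slide; the configuration sizes passed to `config_time_s` are purely hypothetical placeholders, and protocol overhead (GBT-SCA packets, bus transactions) is ignored.

```python
ECS_BITS_PER_FRAME = 34   # ECS field per GBT frame, from the slide
FRAME_RATE_HZ = 40e6      # one GBT frame per 40 MHz clock cycle

def ecs_bandwidth_gbps(n_links: int = 1) -> float:
    """Raw ECS bandwidth in Gb/s over n_links GBT links."""
    return ECS_BITS_PER_FRAME * FRAME_RATE_HZ * n_links / 1e9

def config_time_s(bits_per_fe: int, fes_per_link: int) -> float:
    """Lower bound on the time to push a configuration over one link (no overhead)."""
    return bits_per_fe * fes_per_link / (ECS_BITS_PER_FRAME * FRAME_RATE_HZ)

single_link = ecs_bandwidth_gbps(1)
full_interface = ecs_bandwidth_gbps(132)

# Hypothetical example: 10 000 configuration bits per FE, 10 FEs on one link
t_example = config_time_s(bits_per_fe=10_000, fes_per_link=10)
```

Once the real bits per FE and FEs per link are pinned down, the same two functions give a first-order estimate of the configuration time and of the load a single CCPC would have to sustain.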
Old TTC system support and running two systems in parallel • We already suggested the idea of a hybrid system: • reminder: L0 electronics relying on TTC protocol • part of the system runs with old TTC system • part of the system runs with the new architecture • How? • Need connection between S-ODIN and ODIN (bidirectional) • use dedicated RTM board on S-ODIN ATCA card • In an early commissioning phase ODIN is the master, S-ODIN is the slave • S-ODIN task would be to distribute new commands to new FE, to new TELL40s, and run processes in parallel to ODIN • ODIN tasks are the ones today + S-ODIN controls the upgraded part • In this configuration, upgraded slice will run at 40 MHz, but positive triggers will come only at maximum 1.1 MHz… • Great testbench for development + tests + apprenticeship… • By-product: improve LHCb physics programme in 2015-2018… • In the final system, S-ODIN is the master, ODIN is the slave • ODIN task is only to interface the L0 electronics path to S-ODIN and to provide clock resets on old TTC protocol 36