TF-DDU Activities
J. Gilmore
2 June 2009

TF-DDU Firmware Updated
Wednesday: Loaded latest DDU firmware
• solved the few-per-million S-Link CRC problem
• no errors found in ~2 billion events taken at 60-95 kHz
Late that night: TF-DDU "FMM Error" stops 2 Global runs
• "Error" may have been Lost Sync or Busy (no details)
• occurred while running Random L1A 60-70 kHz
• never before had Global run go so long at this rate
• first error after 100 minutes running
• second after 65 minutes running
• possible causes: DDU detected an error or a buffer overflow
• Note: do not confuse this with other "TF" errors early Thursday, as those were related to a CCB clock problem on VME-4/6
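
To put the two run durations on the same scale as the event counts quoted elsewhere in these slides, the running times can be converted to approximate L1A counts. This is only a rough sketch: it assumes the Random L1A rate stayed constant within the quoted 60-70 kHz range for the whole run, and the function name `events_seen` is made up for illustration.

```python
def events_seen(rate_khz: float, minutes: float) -> float:
    """L1As delivered at a constant rate (assumed) over the given running time."""
    return rate_khz * 1e3 * minutes * 60


# Running times before each "FMM Error", taken from the slide.
for label, minutes in [("first error", 100), ("second error", 65)]:
    low = events_seen(60, minutes)   # lower edge of the quoted rate range
    high = events_seen(70, minutes)  # upper edge of the quoted rate range
    print(f"{label}: {low / 1e6:.0f}-{high / 1e6:.0f} million L1As")
```

Under that assumption, each run delivered a few hundred million L1As before stopping.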

Why this is hard to Diagnose
We don't have any data for these cases
• Global DAQ discards all data from random L1As
• local DAQ can't keep up at such high rate
• at least 10 times faster than local DAQ
• must run diagnostic via VME when it occurs
Watched & waited all day at Point 5 for a recurrence (~1 billion events)
• But it never happened again!
• Even with rates over 90 kHz
• not very likely to be a buffer overflow…

A Possible Clue...
TF-DDU VME diagnostics
• monitoring all day did reveal a warning
• Minor indicator, not FMM level
• GT-Rx error on fiber 2, observed 2 times (hours apart)
• shows fiber transmission issue from SP4 to DDU
• usually innocuous, but can cause loss of data

Another issue revealed
CMS may run at ~100 kHz L1As next year
…and TFs send a lot of data
• Current minimum is 1.8 kB total per L1A
• sum for 12 TFs, even for empty events
• In 95 kHz Global runs, DAQ saw 2.0 kB average
• TF-DDU sent 170 MB/s rate, close to S-Link "limit"
• not "190" due to backpressure dead-time from other systems
May require future event size reduction
• Lev suggests 2 options for consideration
• reduce from 7 time bins to 5
• change data format to "skip" empty bins
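
The bandwidth figures above follow from rate times event size, and a short sketch makes the arithmetic explicit. Several things here are assumptions rather than statements from the slides: decimal units (1 kB = 1000 B, 1 MB = 10^6 B), that the whole gap between 190 and 170 MB/s is dead-time, and, most importantly, that event size scales linearly with the number of time bins (headers and trailers are ignored). The function names are illustrative only.

```python
def throughput_mb_s(rate_khz: float, event_kb: float) -> float:
    """Data rate in MB/s: kHz * kB = 1e3/s * 1e3 B = 1e6 B/s (decimal units assumed)."""
    return rate_khz * event_kb


def scaled_size_kb(event_kb: float, bins_now: int, bins_new: int) -> float:
    """Event size after changing the time-bin count, ASSUMING size is proportional to bins."""
    return event_kb * bins_new / bins_now


nominal = throughput_mb_s(95, 2.0)   # the "190" quoted on the slide
observed = 170.0                     # MB/s actually sent by the TF-DDU
print(f"nominal at 95 kHz: {nominal:.0f} MB/s")
print(f"implied live fraction: {observed / nominal:.1%}")

# Projection to ~100 kHz with the current 1.8 kB minimum event size.
print(f"minimum at 100 kHz: {throughput_mb_s(100, 1.8):.0f} MB/s")

# Hypothetical effect of going from 7 time bins to 5 on the 2.0 kB average.
reduced = scaled_size_kb(2.0, 7, 5)
print(f"5-bin estimate: {reduced:.2f} kB -> {throughput_mb_s(100, reduced):.0f} MB/s at 100 kHz")
```

The 5-bin figure is best read as a lower bound on the event size, since any fixed per-event overhead would not shrink with the bin count. The "skip empty bins" option is not modeled here because its saving depends on occupancy, which the slides do not give.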