Overview of Level 2. James T. Linnemann Michigan State University Level 2 Review Feb 6, 1999. Requirements. 10 KHz input rate 100 μs decision nominal time budget Reject 90% at acceptable efficiency read out at 1KHz Deadtime < 5% 16 buffers for events awaiting decision
Overview of Level 2 James T. Linnemann Michigan State University Level 2 Review Feb 6, 1999
Requirements • 10 KHz input rate • 100 μs decision nominal time budget • Reject 90% at acceptable efficiency • read out at 1KHz • Deadtime < 5% • 16 buffers for events awaiting decision • Flexibility in trigger configuration
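L2 Trigger [Figure: L1/L2 trigger block diagram connecting the detectors (CAL, FPS/CPS, CFT/CPS, SC, MDT, PDT) through L1 to the L2 preprocessors (Cal e / j / Et, PS, Track (w/o STT), Muon) and Global L2.] • L1: ET towers, tracks consistent with e, μ, j • L2: Combines objects into e, μ, j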
Architecture • 2 to 3 stage stochastic pipeline 100 μs/stage • Preprocessors for individual detectors • Global processor to combine detectors • 128 trigger conditions (1 to 1 with L1) • each programmable • series of conditions (e, j, μ) and cuts (ET > 20) • 16 buffers in Front end (real events) • 16 buffers in front of preprocessors, global • Busy raised by Front Ends • Hardware frame drives readout
Queuing Simulations: Effective Time Budget • 50-75 μs/stage: tails of processing time • RESQ + standalone to check simple cases • Preprocessors: not event synchronous • avoid worst of n distribution • Need the buffers in front of each element • Avoid long tails in processing time • Farms feasible only if surrender event order • our front ends required order preservation
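As a rough cross-check of why the effective budget sits below the nominal figure, the sketch below evaluates the textbook M/M/1/K blocking probability for a single stage with 16 buffers. It assumes Poisson arrivals at 10 kHz and exponentially distributed processing times, which is an illustrative simplification and not the RESQ model used for the review; the name `blocking_probability` is hypothetical.

```python
import math

def blocking_probability(rho: float, capacity: int) -> float:
    """Fraction of arrivals that find all `capacity` slots occupied (M/M/1/K).

    `capacity` counts every event held by the stage, including the one
    being processed (assumption: it occupies one of the 16 buffers).
    """
    if math.isclose(rho, 1.0):
        return 1.0 / (capacity + 1)
    return (1.0 - rho) * rho**capacity / (1.0 - rho**(capacity + 1))

INPUT_RATE_HZ = 10_000   # L2 input rate from the requirements
N_BUFFERS = 16           # buffers for events awaiting decision

for mean_us in (50, 75, 100):
    # Offered load = arrival rate x mean processing time.
    rho = INPUT_RATE_HZ * mean_us / 1_000_000
    loss = blocking_probability(rho, N_BUFFERS)
    print(f"{mean_us:3d} us/stage: load {rho:.2f}, buffers-full fraction {loss:.2e}")
```

Under these assumptions a mean of 100 μs per stage fills the buffers about 1/17 of the time, above the 5% deadtime limit, while 75 μs stays well under 1%.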
Standard Crate [Figure: block diagram of the standard crate. Labels: Inputs, MBT, SCL, Worker, Admin, MBus, VME, VBD, MPM, TCC, Dec Alpha (Unix), L3, L2 HWFW (Global only; 128 L2 Answer), Outputs to Global (preprocessors only).] • 7 VME slots minimum • up to 5 workers per crate • short (non-CDF) MBus (JTL, MSU 12/18/97)
Standard Crate VME Slot Assignments • 1: Bit3 (Crate Controller) no J3 (1 slot) • 2: VBD (2 signals from J3 to Admin) • through hole in blank MBus • 3-6 J3 connector for VTM • up to 4 FIC’s, or any non-MBus cards (SLIC/SFO) • 7-21 J3 Magic Bus: • 20-21 Administrator (all Alphas 2 slots) • 19 Pilot MBT (preproc.: 1 MBT for 2 Workers) • 18 and down: [Assistant MBT as needed] • [need 1 MBT per 2 Workers for output] • 7-8 and up: up to 5 Workers (or non-MBus cards)
Alphas • Up to 1 GIP Alpha 21164 on VME card • small local disk for bootup • Enet to Dec Unix Alpha for user .EXE, debugging • Most MBus I/O via MBT card • MBus DMA input 80-100 MB/s (Input “for free”) • MBus bi-directional programmed I/O 20 MB/s? • preprocessor output to Global • but interprocessor communication w/o MBT • 2 per crate • Worker formatting, Output to Global • Administrator housekeeping, L3 R/O
Alphas, continued • VME for L3 readout, monitoring, downloading • 32 bits ECL output • scaler gates for monitoring in L1 Scalers • available even if alpha crashed to tell states • J2 Inputs • miscellaneous communication • e.g. “you have a message from MBT”
MBT: Magic Bus Transceiver • VME slave; MBus master and slave • Administrator controls card(s) • 7 Cypress Hotlink inputs • 16 MB/s each (Gigabit Ethernet UTP) • broadcast to Alphas (Workers & Admin) on MBus • normal data Input path • 3 Cypress Outputs • 2 Preprocessor outputs to L2 Global • 1 Echo of L1 SCL info
MBT, continued • Serial Command Link (SCL) Receiver • broadcast L1 to Alphas on MBus • synchronization check • L1 Qualifiers (basic info on handling events) • echo’d on Cypress output for SLIC • Queue L2 accept/rej for Administrator MBus reads • Parallel Output (16-128 b) • Global uses to send L2 decision to L2 HWFW • handy for monitoring/debugging
Other Cards (Not unique to L2) • Bit3 is commercial VME interface • multiport for indirect communication with TCC • parameter download, monitoring, error logging • VBD is standard DØ VME Readout to L3 • tolerable constraints on how Alphas read out • Forces interprocessor communication to MBus
Bit3 MPM • Commercial; fiber optic connection • To PCI of a PC; VME master, crate controller • Add Multiport Memory Module • Perform general VME I/O, generate interrupts • Download parameters for run • Run begin/end commands • Collect Monitoring information • preferably, already placed in MPM by Administrator Alpha • If necessary, can collect from other modules
VBD • Standard DØ card • VME Master to read out to L3 (standard card) • Not interruptible during Readout • Probably 10-20 MB/s effective (more?) • Must read from SAME set of VME addresses every event • intent is readout from Worker Alpha • move data, or map to actual location • some wordcounts may be zero • faster if fewer addresses
Standard Crate Uses • Global JUST Standard Crate described so far • Cal: more workers • Standard Crate can also be used with non-Alpha, non-MBus pre-preprocessor • Cypress inputs to Worker via MBT • format, massage data for Global • handle L2, L3 buffering & I/O, most of monitoring • Completely standard data movement software • User code testable once data structure fixed • Penalty: extra latency (lose a buffer) • 3-stage pipeline as in L2Mu, L2STT
L2 Inputs • Cypress Hot Links 160 Mbit/s UTP • well-defined protocol • begin, end event special characters • compatible with muon (except cable: CIC) • Standard L2 header/trailer defined • some header info repeated in trailer • allows more error detection/correction • Hardware Longitudinal Parity Check in trailer
L2 Header
B0     # objects (NOT IN HEADER) [note 255 max!]
B1     Header Length in 4B words (1B) [=3 for default]
B2     Object Length in 4B words (1B) [ALL same size!]
B3     Header/Trailer Format # (hi 3 bits) [ONLY changes if new format]
       Object Format # (lo 5 bits) [ONLY changes if new format]
B4     Data Type # (1B) [unique in all L2 MBT inputs]
B5     Bunch # (1B)
B6-7   Rotation # (2B) [B6 is LSB of rotation]
B8     Algorithm Major Version (1B) [e.g. 7 from Version 7.1]
B9     Algorithm Minor Version (1B) [e.g. 1 from 7.1]
       or Processor Specific Bits (1B) [esp. if hardware data source]
B10    Processor Specific Bits (1B)
B11    Status Bits [b7 on means some error] [some standard for L2 Proc]
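The 12-byte header layout can be sketched with Python's `struct` module. This is a minimal illustration, not the L2 software: the class and field names (`L2Header`, `n_objects`, and so on) are hypothetical, and B9 is treated here as the minor version only.

```python
import struct
from typing import NamedTuple

# Little-endian, no padding: B0-B5 as single bytes, B6-7 as a 2-byte
# rotation number (B6 is the LSB), then B8-B11 as single bytes.
HEADER_STRUCT = struct.Struct("<6BH4B")   # 12 bytes = 3 four-byte words

class L2Header(NamedTuple):
    n_objects: int       # B0
    header_words: int    # B1, in 4B words
    object_words: int    # B2, in 4B words
    header_format: int   # B3, high 3 bits
    object_format: int   # B3, low 5 bits
    data_type: int       # B4
    bunch: int           # B5
    rotation: int        # B6-7
    major: int           # B8
    minor: int           # B9
    proc_bits: int       # B10
    status: int          # B11

def pack_header(h: L2Header) -> bytes:
    b3 = ((h.header_format & 0x07) << 5) | (h.object_format & 0x1F)
    return HEADER_STRUCT.pack(h.n_objects, h.header_words, h.object_words, b3,
                              h.data_type, h.bunch, h.rotation,
                              h.major, h.minor, h.proc_bits, h.status)

def unpack_header(raw: bytes) -> L2Header:
    b0, b1, b2, b3, b4, b5, rot, b8, b9, b10, b11 = HEADER_STRUCT.unpack(raw[:12])
    return L2Header(b0, b1, b2, b3 >> 5, b3 & 0x1F, b4, b5, rot, b8, b9, b10, b11)

# Illustrative values only (version 7.1, default header length of 3 words).
example = L2Header(2, 3, 2, 1, 4, 0x21, 17, 0x1234, 7, 1, 0, 0)
raw = pack_header(example)
```

Packing the format numbers into one byte keeps the header at exactly three 4B words, matching the default header length.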
Standard Status Bits (b7, b0 for all; others if L2proc)
7        error on event (any kind): use at own risk
6        no processing attempted (none required)
5        object list truncated (any reason)
4        Receiver error on some input physical trailer
3, 2, 1  more data-type info (processor-specific): other test modes; unbiased-sample data...
0        0 for real data, 1 for MC data
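A status byte with this layout could be decoded as in the sketch below. The function name `decode_status` and the choice to read bits 3 to 1 as a single 3-bit field are assumptions made for illustration.

```python
# Bit number -> meaning, for the bits with a fixed standard meaning.
STATUS_FLAGS = {
    7: "error on event",
    6: "no processing attempted",
    5: "object list truncated",
    4: "receiver error on some input physical trailer",
}

def decode_status(status: int) -> dict:
    """Split a header status byte (B11) into named pieces."""
    return {
        "flags": [name for bit, name in STATUS_FLAGS.items() if status & (1 << bit)],
        "type_info": (status >> 1) & 0x07,   # bits 3..1, processor-specific
        "is_mc": bool(status & 0x01),        # bit 0: 0 = real data, 1 = MC
    }

print(decode_status(0x81))
```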
L2 Trailer
B0     Bunch # (1B) = B5 of Header
B1     Data Type # (1B) = B4 of Header
       (Swapped even/odd from Header)
B2     Longitudinal Parity of even Bytes
B3     Longitudinal Parity of odd Bytes
       or, if parity too slow to calculate, Turn # (B6-7 of Header)
• MBT Out, SLIC, FIC will append physical trailer with 8-bit hardware-generated longitudinal parity
• Zero padding to 16 B group FOLLOWS trailer, before End of Event
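The trailer and padding rules can be sketched as follows. Two assumptions are made that the slide does not state: longitudinal parity is taken to be the byte-wise XOR, and it is taken to cover the header, the objects, and the first two trailer bytes. All function names are hypothetical.

```python
from functools import reduce
from operator import xor

def longitudinal_parity(data: bytes) -> tuple:
    """XOR of the even-offset bytes and of the odd-offset bytes."""
    even = reduce(xor, data[0::2], 0)
    odd = reduce(xor, data[1::2], 0)
    return even, odd

def build_trailer(header: bytes, payload: bytes) -> bytes:
    """4-byte logical trailer: bunch, data type, even parity, odd parity."""
    bunch, data_type = header[5], header[4]   # repeated from header B5, B4
    covered = header + payload + bytes([bunch, data_type])
    even, odd = longitudinal_parity(covered)
    return bytes([bunch, data_type, even, odd])

def pad_to_16(event: bytes) -> bytes:
    """Zero padding after the trailer, up to the next 16-byte boundary."""
    return event + bytes(-len(event) % 16)

# Illustrative 12-byte header and 8-byte payload.
header = bytes([1, 3, 2, 0x24, 0x21, 17, 0x34, 0x12, 7, 1, 0, 0])
payload = bytes(range(8))
event = header + payload + build_trailer(header, payload)
padded = pad_to_16(event)
```

With this choice a receiver can recompute both parities over the whole event, trailer included, and expect zero for each; repeating the bunch and data type at swapped even/odd positions means a single corrupted byte lane cannot damage both copies.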
L2 Physical Trailer • FIC, SLIC, MBT Out: add a physical 2B trailer • after logical trailer, before End Event • This BREAKS 16B boundary, but handled by MBT • B0 8 bit longitudinal parity of received data • B1 Status Bits [b7 on if any receive error] • not included in longitudinal parity! • b0, b1 are type ID: 0 = FIC, 1 = SLIC, 2 = MBT • MBT inputs place this in B0, B1 of 16B physical trailer • adds B14, its own longitudinal parity of everything received • B15 its own Error Bits [b7 on if any receive error] • reserves 4B for incoming, may give error locations in B4-13 • MBT Outs produce 2B physical trailer like FIC
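A receiver-side check of the 2B physical trailer might look like the sketch below. It assumes the 8-bit longitudinal parity is the XOR of all received data bytes; the function names are hypothetical.

```python
SOURCE_TYPES = {0: "FIC", 1: "SLIC", 2: "MBT"}   # type ID carried in b0, b1

def decode_physical_trailer(trailer: bytes) -> dict:
    """Split the 2-byte physical trailer into parity, error flag, source."""
    parity, status = trailer[0], trailer[1]
    return {
        "parity": parity,
        "receive_error": bool(status & 0x80),            # b7
        "source": SOURCE_TYPES.get(status & 0x03, "unknown"),
    }

def received_ok(data: bytes, trailer: bytes) -> bool:
    """True if the recomputed parity matches B0 and b7 of B1 is clear.

    The status byte itself is not part of the parity calculation.
    """
    computed = 0
    for byte in data:
        computed ^= byte
    return computed == trailer[0] and not (trailer[1] & 0x80)

print(decode_physical_trailer(bytes([0x07, 0x01])))
```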
SLIC: Serial Link Input Card • 16 Cypress serial inputs • 1-slot VME slave card • 4 TI DSP’s, up to 2 GIPs each • more inputs, CPU / slot than Alpha • output via Hotlink to MBT (avoids VBD R/O) • Readout via Worker Alpha via MBT • Acts as pre-preprocessor • test registers on all inputs (e.g. SCL) • NO MBus! (big simplification)
SFO: SCL Fanout (Really: Cypress Fanout) • Receives L1 SCL information • from MBT as Cypress Hotlink • Fans out as Cypress output to 12 SLIC cards • event synchronization • L1 Qualifiers • purely analog fanout • can be used to fan out any Cypress signal • L1HWFW messages to L2 • potential use in L2STT?
Standard Crate with SLIC [Figure: standard crate block diagram with SLIC and SFO added. Labels: Inputs, SLIC, SFO, MBT, SCL, Worker, Admin, MBus, VME, VBD, MPM, TCC, Dec Alpha (Unix), L3, Outputs to Global.] • 10 VME slots minimum (JTL, MSU 12/18/97)
Fiber Input Converter (FIC) • Convert Fiber Input to Cu Cypress 160 Mb/s • G-link input 16b data in 20b data frame (24b total) • input thru J3 by standard VTM (hard G-link engineering done) • implement g-link input via VRB card • allows passive split for fanout to L3 or STT • adds physical trailer with longitudinal parity • Front end to either SLIC or MBT • avoids variants of complex card • used in L2PS, L2Cal, L2CTT • 4 independent channels per card • VME control, monitoring
FIC: L2CFT from L1 CFT trigger (& L1 Cal) • g-link 1.3Gb/s = 106MB/s • 16b=2B data in 24b frame, frames at 53MHz • L1CFT: 100B (50 tracks)/fiber to STT in 1 μs • standard L2 header • trailer includes 2B longitudinal parity • pad w/ trailing zeros • L1Cal: • similar format, fixed-length data • optical split from data for L3 readout
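The g-link figures on this slide follow from the frame rate and frame size; the short calculation below reproduces them. Variable names are illustrative.

```python
FRAME_RATE_HZ = 53e6   # frames at 53 MHz
DATA_BITS = 16         # 2 B of data per frame
FRAME_BITS = 24        # total bits per frame on the link

line_rate_gbps = FRAME_RATE_HZ * FRAME_BITS / 1e9        # raw link rate
payload_mb_per_s = FRAME_RATE_HZ * DATA_BITS / 8 / 1e6   # usable data rate

# 1 MB/s is 1 B/us, so bytes divided by MB/s gives microseconds.
transfer_us = 100 / payload_mb_per_s                     # 100 B per fiber

print(f"line rate {line_rate_gbps:.3f} Gb/s, payload {payload_mb_per_s:.0f} MB/s, "
      f"100 B in {transfer_us:.2f} us")
```

The raw rate of about 1.27 Gb/s rounds to the quoted 1.3 Gb/s, and 100 B per fiber takes just under a microsecond.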
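Standard Crate with FIC to MBT [Figure: standard crate block diagram with FIC feeding the MBT. Labels: Inputs, FIC, MBT, SCL, Worker, Admin, MBus, VME, VBD, MPM, TCC, Dec Alpha (Unix), L3, Outputs to Global.] • 9 VME slots minimum (JTL, MSU 12/18/97)
Standard Crate with FIC to SLIC [Figure: standard crate block diagram with FIC feeding the SLIC. Labels: Inputs, FIC, SLIC, SFO, MBT, SCL, Worker, Admin, MBus, VME, VBD, MPM, TCC, Dec Alpha (Unix), L3, Outputs to Global.] • 11 VME slots minimum (JTL, MSU 12/18/97)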
Trigger Connections [Figure: map of links from the front ends and L1 triggers into the L2 preprocessors and on to L2 Global. Nodes: Si Tracker, L1 CFT, L1 FPS, L1 μ, L1 μ MGR, MUON front ends, CIC, L1 CAL; L2STT (In Design), L2CTT (FIC/MBT), L2PS (FIC/MBT), L2μ (SLIC/MBT), L2CAL (FIC/MBT), L2G (MBT). Link types shown: Fi-Glink 1.3 Gb/s 20-bit, Cu-AMCC 1.4 Gb/s, Cu-Cypress 160 Mb/s, Fi-Cypress 160 Mb/s; one link is marked Undetermined. Per-link counts are not recoverable from the extracted text.]
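Loading of Paths [Figure: the same link map, annotated with path loadings. Nodes: Si Tracker, L1 CFT, L1 FPS, L1 μ, L1 μ MGR, MUON front ends, CIC, L1 CAL; L2STT (STC/MBT), L2CTT (FIC/MBT), L2PS (FIC/MBT), L2μ (SLIC/MBT), L2CAL (FIC/MBT), L2G (MBT). Loading values legible in the extraction, in order of appearance: 1248/208 (max), 600/300, 1632/272 (max), 1088/272 (max), 5000/100, 3200/350, 240/40. Units and the assignment of values to links are not recoverable from the extracted text.]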
L2 Overview II, and Summary James T. Linnemann Michigan State University Level 2 Review Feb 6, 1999
L2 Maximum Event Sizes (FIFO size choice) • Length = 16B (min) … 4KB (max) X 16 events (includes 12B header and 4B trailer; source pads to multiples of 16B with zeros after trailer) • VRB: 32KB or 64KB, but currently no raw data to L2! • 5 KHz max (Cypress) is 16B/μs X 200 μs = 3.2KB • clearly issue of max, not mean!
Source        Actual Max Event   FIFO “event”   total
FIC/CFT/PS    272B               .5KB           8KB
Cal/MBT       304B               4KB            64KB
Mu/SLIC       .3 to 3KB          .5KB           8KB
Global/MBT    2.3KB              4KB            64KB
Global/MBT: =255 tracks*8B (255*16B = 4KB = STT?)
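The sizing arithmetic on this slide can be checked with a few lines. The per-event FIFO sizes are taken from the table; `padded_size` and the dictionary names are hypothetical helpers.

```python
CYPRESS_BYTES_PER_US = 16   # 160 Mbit/s Hotlink carries 16 MB/s = 16 B/us
MAX_RATE_HZ = 5_000         # maximum rate quoted for a Cypress link
EVENTS_BUFFERED = 16        # FIFO depth in events

period_us = 1_000_000 / MAX_RATE_HZ                        # time between events
max_bytes_per_event = CYPRESS_BYTES_PER_US * period_us     # link-limited size

def padded_size(n_bytes: int) -> int:
    """Round up to the next multiple of 16 B, as the source pads events."""
    return -(-n_bytes // 16) * 16

# FIFO "event" allocation per source, in bytes (.5KB = 512, 4KB = 4096).
fifo_event_bytes = {"FIC/CFT/PS": 512, "Cal/MBT": 4096,
                    "Mu/SLIC": 512, "Global/MBT": 4096}
fifo_total_bytes = {name: size * EVENTS_BUFFERED
                    for name, size in fifo_event_bytes.items()}

print(period_us, max_bytes_per_event, fifo_total_bytes)
```

The totals come out to 8 KB and 64 KB, matching the last column of the table.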
SCL INITIALIZE: why we avoid it • Needed if event fragments don’t match • must clear all buffers EVERYWHERE and restart • violent: touches EVERY front end crate • Avoidance: • redundancy header to trailer (protect 1-bit errors) • try to preserve event format (to find trailer) • try to preserve event boundary (else must re-init) • detect missed event boundary (end or begin) • send pads before End Event to reframe if needed
Monitoring (online, data flow) • Every 5 seconds, via TCC/administrators • Some by L1 Scalers, some by VME • L1 Scalers available even if alphas crash • buffer occupancy for data flow diagnosis • lots of buffers, need to be able to look at them • in all cards owning buffers: FIC, SLIC, MBT, Alpha • but DMA: most events into alpha’s buffers • time in state (like L3 in Run I) in all alphas • idle, processing, waiting, interrupt... • Global’s pass fraction by bit #; events vs node
L1 Scalers for Alpha States • ECL Gates: sampled every beam crossing • Worker States (5) • wait/event, process, wait/admin, interrupt, collecting_status • Admin States (6) • wait, reply/worker, manage_L3, interrupt, L2_Acc/Rej, collecting_status • multi-workers: wait more complex: • wait/event, wait/worker, wait/L2_Acc/Rej + Processing • Shows time fraction in state • thus <time> in state
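The step from time fraction to mean time in state can be sketched as below. The scaler counts are made-up illustrative numbers, not measurements, and the function names are hypothetical; the sketch assumes a Worker is in exactly one state at every sampled crossing.

```python
def state_fractions(counts: dict) -> dict:
    """Fraction of sampled crossings spent in each state."""
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

def mean_time_per_event_us(fraction: float, elapsed_s: float, n_events: int) -> float:
    """Average time per event spent in a state, in microseconds."""
    return fraction * elapsed_s / n_events * 1e6

# Illustrative scaler counts for the five Worker states.
worker_counts = {"wait/event": 400, "process": 500, "wait/admin": 50,
                 "interrupt": 25, "collecting_status": 25}

fractions = state_fractions(worker_counts)
# 5 s monitoring interval at a 10 kHz input rate gives 50,000 events.
process_us = mean_time_per_event_us(fractions["process"], 5.0, 50_000)
print(f"processing fraction {fractions['process']:.2f}, "
      f"about {process_us:.0f} us per event")
```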
Alpha Buffer Monitoring (L1 Scalers) • Alpha should be where events stack up • Binned histo: # events in each buffer state • buffer states monitored in Administrator • allocated, processing, wait/L2_Acc/Rej, wait/L3 • to be allocated to worker n • free • Each L2 crate: 22 scalers/admin + 1/worker
VME (“Slow”) Monitoring (Via Bit3, TCC) • Exact event accounting from ALL cards • evaluated after marked event (CollectStatus) • All non-Alpha Cards in L2 Crate • MBT, SLIC, FIC • State and Buffer Occupancy Sampled on board • lower statistics, perhaps 500 Hz • Alphas can monitor distribution of event times • Circular buffer of start/stop times (Histogram on host) • 1 CPU cycle (2ns) resolution possible • same mechanism for any state duration • or counts sampled by event (e.g. # tracks)
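The circular buffer of start/stop times, histogrammed on the host, could be sketched as follows. The class `TimingRing`, the 10 μs bin width, and the cycle counts are hypothetical; only the 2 ns cycle time comes from the slide.

```python
from collections import Counter, deque

CYCLE_NS = 2   # one CPU cycle, per the slide

class TimingRing:
    """Fixed-size ring of (start_cycle, stop_cycle) pairs; oldest entries drop out."""

    def __init__(self, size: int):
        self.records = deque(maxlen=size)

    def record(self, start_cycle: int, stop_cycle: int) -> None:
        self.records.append((start_cycle, stop_cycle))

    def durations_us(self) -> list:
        return [(stop - start) * CYCLE_NS / 1000 for start, stop in self.records]

def histogram(durations_us: list, bin_us: int = 10) -> Counter:
    """Host-side binning: lower bin edge in microseconds -> number of events."""
    return Counter(int(d // bin_us) * bin_us for d in durations_us)

ring = TimingRing(size=3)
for start, stop in [(0, 25_000), (100, 30_100), (200, 27_700), (300, 25_300)]:
    ring.record(start, stop)   # the first pair is overwritten by the fourth

hist = histogram(ring.durations_us())
```

Keeping only raw cycle counts on the processor and binning on the host keeps the per-event cost to two counter reads and one store.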
Monitoring: Event Samples • Histograms of objects found (Examine) • few (.2 Hz) Unbiased Sample (No L2 cuts) • mostly fails • few more events before L3 cuts (after L2) • passed L2 • mostly after passed L3 • Verification: run simulator on these samples • check data arrives intact • bit by bit compare with online L2 results • detects hardware, history bugs (nearly only way)
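The bit-by-bit comparison between online L2 results and the simulator output can be sketched as below. The function name and the flat byte-string representation of a result are assumptions for illustration.

```python
def bit_differences(online: bytes, simulated: bytes) -> list:
    """Return (byte_offset, bit_number) for every bit that differs."""
    if len(online) != len(simulated):
        raise ValueError("results differ in length; cannot compare bit by bit")
    diffs = []
    for offset, (a, b) in enumerate(zip(online, simulated)):
        changed = a ^ b                      # set bits mark disagreements
        for bit in range(8):
            if changed & (1 << bit):
                diffs.append((offset, bit))
    return diffs

# An empty list means the simulator reproduces the online result exactly.
print(bit_differences(b"\x01\x80", b"\x00\x00"))
```

Reporting the exact offsets, instead of a plain pass/fail, helps separate a stuck hardware bit from an algorithm that depends on event history.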
Test Stand at FNAL • 4 crates: • Global simulator (Admin + Worker) • 2 preprocessor simulators (A+2W, A+W+SLIC) • 1 data source (2 Alphas, MBTs; own MBus) • Incomplete system: • no L1, L2 • not enough parts for full code of any/all crates • except maybe full playback for Global • could reconfigure if need be (painful!) • Copy of some real-time inputs? (grounding!?)
Test Stand: What can it do? • (Pre-)Commissioning/debugging • alpha-alpha and alpha-MBT issues • Timing, verification of download • run in real environment; count clock cycles • how good is offline simulator? • Playback • drop data into memory • testing pre-release after running in simulator • Debugging • event dump and restart (else debug = deadtime) • hard to write event dump/reload!?
Zvtx? • Zvtx in 6cm bins from L0? • actual resolution varies with luminosity • IF felt to be worthwhile at L2 resolution • better to know Z better than Z=0 • or avoid making mistakes and possible L dependence • studies in progress • L2STT also considering mechanisms • candidate vertex Z’s, then algorithm to report one • better intrinsic accuracy than L0 • different luminosity dependence than L0
How does L2STT fit in? • Well defined protocol allows it to feed into a preprocessor via MBT • like SLIC’s do in muon crate • Send data to L2CTT crate • pt-ordered list and impact-ordered list to Global • just pt-ordered from L1CFT, lower resolution • extra input already reserved • add 1 MBT and alpha to L2CTT crate • allows simultaneous input of L1CFT and L2STT • can build two kinds of lists in parallel • modest cost; $8K (or use spares / cannibalize test stand) • can run in parallel until shaken down
L2 STT: impact on L2 performance • Heavier loading on new CTT inputs • 16 B per track? Duplicates? • Heavier loading of pt ordered list? • More Bytes/track? But can reject tracks, too! • New output: impact parameter ordered list • already included in bandwidth estimates • More work for Global? • Yes, but controlled by scripts • Must limit # tracks used in matches! • STT can only give higher quality
Budget • Detailed Cost Estimate Exists • Latest update is down about 80K$ (SLIC) • nearly back to original estimate • L2STT loaner crates, L2CTT upgrades • included in above estimates • engineering costs not over till it’s over • how many extra alphas as insurance? • may be hard to do a 2nd run
How Many Alphas? • Roughly X 2 design safety factor (10KHz) • more alphas are the only lifeboat if too slow • but gain is not linear, maybe square root • cannibalize test stand (IF it becomes unimportant!) • Other uses for more alphas: • Shadow nodes (online test at high statistics) • where? Real crate or test stand? • potential use in STT? • Production Alpha order in next few months • have requested 2 X for some obsolescent parts
Schedule • Detailed schedule exists • SLIC now on critical path • Rule: L2STT can’t compromise schedule • Prototypes due March-May 1999 • Production May-Dec 1999
Issues • L2STT design decisions (and money…) • Prototype to production to installation • Transition to software • simulation needed (decisions, studies, development) • Low level software (“drivers”) • finish download path, L3 output path • Manpower crunch coming • Monitoring, verification, releases just starting • infrastructure, definitions needed soon, then people... • who writes global algorithms? (MSU, probably?) • studies of trigger scripts, global algorithms
Conclusions • Solid, modular design for L2 trigger • connectivity understood; prototypes in progress • clear method for L2STT to integrate • Time budgets understood from simulation • Hardware supports appropriate algorithms • Have the manpower for the hardware • Have excellent core group for the software • Request TDR approval: • go-ahead for production