
Overview of Level 2






Presentation Transcript


  1. Overview of Level 2 James T. Linnemann Michigan State University Level 2 Review Feb 6, 1999

  2. Requirements • 10 kHz input rate • 100 µs nominal decision time budget • Reject 90% at acceptable efficiency • read out at 1 kHz • Deadtime < 5% • 16 buffers for events awaiting decision • Flexibility in trigger configuration

  3. L2 Trigger [Block diagram: detector inputs (CAL, FPS/CPS, CFT/CPS, Muon: MDT, PDT, SC) feed L1 and the L2 preprocessors (Cal e/j/ET, PS, CFT Track, Muon), which feed Global L2] • L1: ET towers, tracks consistent with e, μ, j • L2: Combines objects into e, μ, j

  4. Architecture • 2- to 3-stage stochastic pipeline, 100 µs/stage • Preprocessors for individual detectors • Global processor to combine detectors • 128 trigger conditions (1 to 1 with L1) • each programmable • series of conditions (e, j, μ) and cuts (ET > 20) • 16 buffers in front ends (real events) • 16 buffers in front of preprocessors, Global • Busy raised by front ends • Hardware framework drives readout
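To make "each condition is a programmable series of object requirements and cuts" concrete, here is a minimal Python sketch. The class names, the "e"/"mu"/"j" encoding and the AND-of-requirements semantics are illustrative assumptions; the slide only states that there are 128 programmable conditions, one per L1 bit, built from conditions such as e, j, μ and cuts such as ET > 20.

```python
from dataclasses import dataclass, field

N_L2_CONDITIONS = 128  # one L2 condition per L1 trigger bit (from the slide)


@dataclass
class L2Object:
    kind: str    # hypothetical encoding: "e", "mu" or "j"
    et: float    # transverse energy, same units as the cuts


@dataclass
class Requirement:
    kind: str
    min_et: float
    min_count: int = 1


@dataclass
class TriggerCondition:
    l1_bit: int
    requirements: list = field(default_factory=list)

    def __post_init__(self):
        if not 0 <= self.l1_bit < N_L2_CONDITIONS:
            raise ValueError("L1 bit must be in 0..127")

    def passes(self, objects):
        # Assumed semantics: the series of requirements is ANDed; each one
        # needs at least min_count objects of its kind above its ET cut.
        for req in self.requirements:
            n = sum(1 for o in objects if o.kind == req.kind and o.et > req.min_et)
            if n < req.min_count:
                return False
        return True


def l2_decision(conditions, fired_l1_bits, objects):
    """Set of L1 bits whose matching L2 condition passes (others are not evaluated)."""
    return {c.l1_bit for c in conditions
            if c.l1_bit in fired_l1_bits and c.passes(objects)}


conds = [TriggerCondition(5, [Requirement("e", 20.0)]),
         TriggerCondition(9, [Requirement("j", 20.0, min_count=2)])]
event = [L2Object("e", 25.0), L2Object("j", 30.0), L2Object("j", 15.0)]
print(l2_decision(conds, {5, 9}, event))  # {5}
```

Only conditions whose L1 bit fired are evaluated, which mirrors the 1-to-1 mapping with L1; bit 9 fails here because only one jet passes its 20 cut.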

  5. Queuing Simulations: Effective Time Budget • 50-75 µs/stage, because of tails in processing time • RESQ + standalone simulations to check simple cases • Preprocessors: not event-synchronous • avoids the worst-of-n distribution • Need the buffers in front of each element • Avoid long tails in processing time • Farms feasible only if we surrender event order • our front ends require order preservation
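The reasoning behind the reduced effective budget can be illustrated with a toy single-stage queue. This is a sketch under stated assumptions, not the RESQ model: Poisson arrivals at 10 kHz, exponential processing times, one order-preserving server, and 16 buffers; an event that finds all buffers full is counted as lost (deadtime). Setting the hypothetical `n_parallel` above 1 models an event-synchronous stage that must wait for the slowest of n processors.

```python
import random
from collections import deque


def lost_fraction(rate_hz, mean_service_us, n_buffers=16, n_events=100_000,
                  n_parallel=1, seed=1):
    """Fraction of arriving events refused because all buffers are occupied.

    Assumptions: Poisson arrivals, exponential processing times, one FIFO
    server (event order preserved). With n_parallel > 1 the stage waits for
    the slowest of n_parallel processors (event-synchronous, worst of n).
    """
    rng = random.Random(seed)
    mean_gap_us = 1e6 / rate_hz
    in_stage = deque()      # completion times of events still buffered
    t = 0.0
    last_done = 0.0
    lost = 0
    for _ in range(n_events):
        t += rng.expovariate(1.0 / mean_gap_us)
        while in_stage and in_stage[0] <= t:
            in_stage.popleft()          # this event has left the stage
        if len(in_stage) >= n_buffers:
            lost += 1                   # no free buffer: counts as deadtime
            continue
        service = max(rng.expovariate(1.0 / mean_service_us)
                      for _ in range(n_parallel))
        last_done = max(t, last_done) + service
        in_stage.append(last_done)
    return lost / n_events


for mean_us in (50, 75, 100):
    print(mean_us, "us/stage:", lost_fraction(10_000, mean_us))
print("worst of 4 at 50 us:", lost_fraction(10_000, 50, n_parallel=4))
```

For this idealized M/M/1/K queue the analytic loss at a mean of exactly 100 µs (load 1) is 1/(K+1), about 6% for K = 16, already above the 5% deadtime requirement, while 50 µs loses essentially nothing. Waiting for the worst of n exponential times multiplies the mean by the harmonic number H_n (about 2.08 for n = 4), which is why event-synchronous preprocessors are avoided.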

  6. Standard Crate [Diagram: VME crate with Bit3 MPM (to TCC), VBD (readout to L3), Administrator and Worker Alphas on a short (non-CDF) MBus, and an MBT receiving Inputs and the SCL; outputs to Global (preprocessors only); 128-bit L2 answer to L2 HWFW (Global only); Dec Alpha (Unix) host] • 7 VME slots minimum • up to 5 Workers per crate (JTL, MSU 12/18/97)

  7. Standard Crate VME Slot Assignments • 1: Bit3 (crate controller), no J3 (1 slot) • 2: VBD (2 signals from J3 to Admin, through hole in blank MBus) • 3-6: J3 connector for VTM • up to 4 FICs, or any non-MBus cards (SLIC/SFO) • 7-21: J3 Magic Bus • 20-21: Administrator (all Alphas take 2 slots) • 19: Pilot MBT (preprocessors: 1 MBT per 2 Workers) • 18 and down: [Assistant MBTs as needed] • [need 1 MBT per 2 Workers for output] • 7-8 and up: up to 5 Workers (or non-MBus cards)
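The slot rules can be checked mechanically. The helper below is a hypothetical sketch that applies the preprocessor rules as listed (Bit3 in 1, VBD in 2, FICs in 3-6, Administrator in 20-21, Pilot MBT in 19 with Assistant MBTs counting down, two-slot Workers counting up from 7, one MBT per 2 Workers); it is not part of any L2 software.

```python
import math


def plan_crate(n_workers, n_fics=0):
    """Return {slot: card} for a preprocessor Standard Crate (hypothetical helper)."""
    if not 1 <= n_workers <= 5:
        raise ValueError("a Standard Crate holds 1 to 5 Workers")
    if not 0 <= n_fics <= 4:
        raise ValueError("slots 3-6 hold at most 4 FICs")
    slots = {1: "Bit3", 2: "VBD"}
    for i in range(n_fics):
        slots[3 + i] = f"FIC{i}"
    slots[20] = slots[21] = "Admin"          # every Alpha takes 2 slots
    # One MBT per 2 Workers: Pilot in 19, Assistants from 18 downward.
    for i in range(math.ceil(n_workers / 2)):
        slots[19 - i] = "MBT-pilot" if i == 0 else f"MBT-assist{i}"
    # Workers from 7-8 upward. With at most 5 Workers (slots 7-16) and at
    # most 3 MBTs (slots 17-19) the two ranges never overlap.
    for w in range(n_workers):
        slots[7 + 2 * w] = slots[8 + 2 * w] = f"Worker{w}"
    return slots


print(sorted(plan_crate(n_workers=1)))   # [1, 2, 7, 8, 19, 20, 21]
```

With one Worker the sketch occupies seven slots, which agrees with the "7 VME slots minimum" of the Standard Crate; five Workers fill slots 7-16 and leave 17-19 for three MBTs.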

  8. Alphas • Up to 1 GIP Alpha 21164 on VME card • small local disk for bootup • Enet to Dec Unix Alpha for user .EXE, debugging • Most MBus I/O via MBT card • MBus DMA input 80-100 MB/s (Input “for free”) • MBus bi-directional programmed I/O 20 MB/s? • preprocessor output to Global • but interprocessor communication w/o MBT • 2 per crate • Worker formatting, Output to Global • Administrator housekeeping, L3 R/O

  9. Alphas, continued • VME for L3 readout, monitoring, downloading • 32 bits ECL output • scaler gates for monitoring in L1 Scalers • available even if the Alpha has crashed, to show its state • J2 Inputs • miscellaneous communication • e.g. “you have a message from MBT”

  10. MBT: Magic Bus Transceiver • VME slave; MBus master and slave • Administrator controls card(s) • 7 Cypress Hotlink inputs • 16 MB/s each (Gigabit Ethernet UTP) • broadcast to Alphas (Workers & Admin) on MBus • normal data input path • 3 Cypress outputs • 2 preprocessor outputs to L2 Global • 1 echo of L1 SCL info

  11. MBT, continued • Serial Command Link (SCL) Receiver • broadcast L1 to Alphas on MBus • synchronization check • L1 Qualifiers (basic info on handling events) • echoed on Cypress output for SLIC • Queue L2 accept/reject for Administrator MBus reads • Parallel Output (16-128 bits) • Global uses it to send L2 decision to L2 HWFW • handy for monitoring/debugging

  12. Other Cards (Not unique to L2) • Bit3 is commercial VME interface • multiport for indirect communication with TCC • parameter download, monitoring, error logging • VBD is standard DØ VME Readout to L3 • tolerable constraints on how Alphas read out • Forces interprocessor communication to MBus

  13. Bit3 MPM • Commercial; fiber optic connection • To PCI of a PC; VME master, crate controller • Add Multiport Memory Module • Perform general VME I/O, generate interrupts • Download parameters for run • Run begin/end commands • Collect Monitoring information • preferably, already placed in MPM by Administrator Alpha • If necessary, can collect from other modules

  14. VBD • Standard DØ card • VME Master to read out to L3 (standard card) • Not interruptible during Readout • Probably 10-20 MB/s effective (more?) • Must read from SAME set of VME addresses every event • intent is readout from Worker Alpha • move data, or map to actual location • some wordcounts may be zero • faster if fewer addresses

  15. Standard Crate Uses • Global: JUST the Standard Crate described so far • Cal: more Workers • Standard Crate can also be used with a non-Alpha, non-MBus pre-preprocessor • Cypress inputs to Worker via MBT • format, massage data for Global • handle L2, L3 buffering & I/O, most of monitoring • Completely standard data movement software • User code testable once data structure fixed • Penalty: extra latency (lose a buffer) • 3-stage pipeline as in L2Mu, L2STT

  16. L2 Inputs • Cypress Hotlinks 160 Mbit/s UTP • well-defined protocol • begin, end event special characters • compatible with muon (except cable: CIC) • Standard L2 header/trailer defined • some header info repeated in trailer • allows more error detection/correction • Hardware Longitudinal Parity Check in trailer

  17. L2 Header • B0: # objects (NOT IN HEADER) [note 255 max!] • B1: Header Length in 4B words (1B) [= 3 for default] • B2: Object Length in 4B words (1B) [ALL same size!] • B3: Header/Trailer Format # (hi 3 bits) [ONLY changes if new format]; Object Format # (lo 5 bits) [ONLY changes if new format] • B4: Data Type # (1B) [unique in all L2 MBT inputs] • B5: Bunch # (1B) • B6-7: Rotation # (2B) [B6 is LSB of rotation] • B8: Algorithm Major Version (1B) [e.g. 7 from Version 7.1] • B9: Algorithm Minor Version (1B) [e.g. 1 from 7.1], or Processor-Specific Bits (1B) [esp. if hardware data source] • B10: Processor-Specific Bits (1B) • B11: Status Bits [b7 on means some error] [some standard for L2 Proc]
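The 12-byte layout above maps directly onto a fixed binary format. The sketch below unpacks it with Python's `struct` module; the field names are illustrative, B0 is read as the object count as listed on the slide, and the `payload_bytes` helper assumes the equal-size objects follow the header directly.

```python
import struct
from typing import NamedTuple


class L2Header(NamedTuple):
    n_objects: int       # B0 (255 max)
    header_words: int    # B1: header length in 4B words, 3 by default
    object_words: int    # B2: length of every object in 4B words
    header_format: int   # B3 high 3 bits: header/trailer format #
    object_format: int   # B3 low 5 bits: object format #
    data_type: int       # B4
    bunch: int           # B5
    rotation: int        # B6-7, B6 is the least significant byte
    alg_major: int       # B8
    alg_minor: int       # B9 (or processor-specific bits)
    proc_bits: int       # B10
    status: int          # B11, b7 set means some error


def parse_header(buf: bytes) -> L2Header:
    if len(buf) < 12:
        raise ValueError("an L2 header is 12 bytes (3 x 4B words)")
    # "<" = little-endian, no padding: 6 single bytes, one 16-bit word, 4 bytes.
    b0, b1, b2, b3, b4, b5, rot, b8, b9, b10, b11 = struct.unpack_from("<6BH4B", buf)
    return L2Header(b0, b1, b2, b3 >> 5, b3 & 0x1F, b4, b5, rot, b8, b9, b10, b11)


def payload_bytes(h: L2Header) -> int:
    """Size of the object list, assuming it directly follows the header."""
    return h.n_objects * h.object_words * 4


raw = bytes([3, 3, 2, (1 << 5) | 4, 7, 42, 0x34, 0x12, 7, 1, 0, 0x00])
h = parse_header(raw)
print(h.rotation, h.header_format, h.object_format)   # 4660 1 4
```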

  18. Standard Status Bits (b7, b0 for all; others if L2 processor) • b7: error on event (any kind): use at own risk • b6: no processing attempted (none required) • b5: object list truncated (any reason) • b4: receiver error on some input physical trailer • b3-b1: more data-type info (processor-specific); other test modes, unbiased-sample data... • b0: 0 for real data, 1 for MC data
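A consumer of the status byte (B11 of the header) might decode it as below. This is a sketch: the flag texts come from the slide, while grouping b3-b1 into one 3-bit processor-specific field is an assumption.

```python
STATUS_FLAGS = {
    7: "error on event (use at own risk)",
    6: "no processing attempted",
    5: "object list truncated",
    4: "receiver error on some input physical trailer",
}


def decode_status(status: int) -> dict:
    """Split an L2 status byte into named flags, processor bits and the MC bit."""
    flags = [text for bit, text in STATUS_FLAGS.items() if status & (1 << bit)]
    return {
        "flags": flags,
        "processor_bits": (status >> 1) & 0b111,  # b3-b1 (assumed grouping)
        "monte_carlo": bool(status & 1),          # b0: 0 real data, 1 MC
    }


print(decode_status(0b1010_0001))
```

Checking only b7 is enough to decide whether an event's objects should be trusted; the other bits explain why.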

  19. L2 Trailer • B0: Bunch # (1B) = B5 of Header • B1: Data Type # (1B) = B4 of Header (swapped even/odd from Header) • B2: Longitudinal Parity of even Bytes • B3: Longitudinal Parity of odd Bytes • or, if parity is too slow to calculate, Turn # (B6-7 of Header) • MBT Out, SLIC, FIC will append a physical trailer with 8-bit hardware-generated longitudinal parity • Zero padding to a 16B group FOLLOWS the trailer, before End of Event
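A data source could build and check the logical trailer as sketched below. Assumptions, not taken from the slide: longitudinal parity is the bytewise XOR, "even" and "odd" bytes are counted from the first header byte, and the parity covers header plus objects. Only the parity variant of B2-B3 is shown, not the Turn # alternative.

```python
from functools import reduce
from operator import xor


def lpar(data: bytes) -> int:
    """Longitudinal parity, assumed here to be the XOR of all bytes."""
    return reduce(xor, data, 0)


def make_trailer(header_and_objects: bytes) -> bytes:
    bunch = header_and_objects[5]        # trailer B0 = header B5
    data_type = header_and_objects[4]    # trailer B1 = header B4 (even/odd swapped)
    even = lpar(header_and_objects[0::2])
    odd = lpar(header_and_objects[1::2])
    return bytes([bunch, data_type, even, odd])


def frame_event(header_and_objects: bytes) -> bytes:
    """Append the trailer, then zero-pad to a multiple of 16 bytes."""
    body = header_and_objects + make_trailer(header_and_objects)
    return body + bytes(-len(body) % 16)


def check_trailer(header_and_objects: bytes, trailer: bytes) -> bool:
    """Receiver-side check: redundancy and parity must all agree."""
    return trailer == make_trailer(header_and_objects)


body = bytes(range(12))                      # a bare 12-byte header, no objects
event = frame_event(body)
print(len(event), list(event[12:16]))        # 16 [5, 4, 2, 2]
bad = bytes([body[0] ^ 0x01]) + body[1:]     # single-bit error in byte 0
print(check_trailer(bad, event[12:16]))      # False
```

Repeating the bunch and data-type numbers in the trailer lets a receiver detect a lost or merged event boundary, while the two parity bytes catch single-bit errors in the data.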

  20. L2 Physical Trailer • FIC, SLIC, MBT Out: add a 2B physical trailer • after the logical trailer, before End Event • this BREAKS the 16B boundary, but it is handled by the MBT • B0: 8-bit longitudinal parity of received data • B1: Status Bits [b7 on if any receive error] • (not included in the longitudinal parity!) • b0, b1 are type ID: 0 = FIC, 1 = SLIC, 2 = MBT • MBT inputs place this in B0, B1 of a 16B physical trailer • adds B14, its own longitudinal parity of everything received • B15, its own Error Bits [b7 on if any receive error] • reserves 4B for incoming, may give error locations in B4-13 • MBT Outs produce a 2B physical trailer like the FIC

  21. SLIC: Serial Link Input Card • 16 Cypress serial inputs • 1-slot VME slave card • 4 TI DSPs, up to 2 GIPs each • more inputs and CPU per slot than an Alpha • output via Hotlink to MBT (avoids VBD R/O) • readout by Worker Alpha via MBT • acts as pre-preprocessor • test registers on all inputs (e.g. SCL) • NO MBus! (big simplification)

  22. SFO: SCL Fanout (Really: Cypress Fanout) • Receives L1 SCL information • from MBT as Cypress Hotlink • Fans out as Cypress output to 12 SLIC cards • event synchronization • L1 Qualifiers • purely analog fanout • can be used to fan out any Cypress signal • L1HWFW messages to L2 • potential use in L2STT?

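  23. Standard Crate with SLIC [Diagram: Standard Crate plus SLIC cards receiving Inputs and an SFO fanning out the SCL from the MBT; Bit3 MPM to TCC, VBD to L3, Admin and Worker Alphas on MBus, outputs to Global; Dec Alpha (Unix) host] • 10 VME slots minimum (JTL, MSU 12/18/97)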

  24. Fiber Input Converter (FIC) • Convert Fiber Input to Cu Cypress 160 Mb/s • G-link input 16b data in 20b data frame (24b total) • input thru J3 by standard VTM (hard G-link engineering done) • implement g-link input via VRB card • allows passive split for fanout to L3 or STT • adds physical trailer with longitudinal parity • Front end to either SLIC or MBT • avoids variants of complex card • used in L2PS, L2Cal, L2CTT • 4 independent channels per card • VME control, monitoring

  25. FIC: L2CFT from L1 CFT trigger (& L1 Cal) • G-link 1.3 Gb/s = 106 MB/s • 16b = 2B data in 24b frame, frames at 53 MHz • L1CFT: 100B (50 tracks)/fiber to STT in 1 µs • standard L2 header • trailer includes 2B longitudinal parity • pad with trailing zeros • L1Cal: • similar format, fixed-length data • optical split from data for L3 readout
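The link numbers on this slide are consistent with each other, as a few lines of arithmetic show. The 53 MHz frame rate, 24-bit frame and 16-bit payload are taken from the slide; the variable names are illustrative.

```python
FRAME_BITS = 24          # G-link frame on the fiber
DATA_BITS = 16           # payload per frame (2 bytes)
FRAME_RATE_HZ = 53e6     # one frame per 53 MHz tick

line_rate_gbps = FRAME_BITS * FRAME_RATE_HZ / 1e9        # quoted as 1.3 Gb/s
payload_mb_per_s = DATA_BITS / 8 * FRAME_RATE_HZ / 1e6   # 106 MB/s
# 1 MB/s is 1 byte per microsecond, so 100 B of L1CFT data takes:
transfer_us = 100 / payload_mb_per_s
bytes_per_track = 100 / 50                               # 50 tracks in 100 B

print(round(line_rate_gbps, 3), payload_mb_per_s, round(transfer_us, 3), bytes_per_track)
```

So 100 B of L1CFT tracks (2 B each) cross one fiber in about 0.94 µs, matching the "in 1 µs" budget.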

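  26. Standard Crate with FIC to MBT [Diagram: Standard Crate with FIC cards converting fiber Inputs for the MBT; Bit3 MPM to TCC, VBD to L3, Admin and Worker Alphas on MBus, SCL into MBT, outputs to Global; Dec Alpha (Unix) host] • 9 VME slots minimum (JTL, MSU 12/18/97)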

  27. Standard Crate with FIC to SLIC [Diagram: FIC cards feed SLIC cards, which feed the MBT; SFO fans out the SCL; Bit3 MPM to TCC, VBD to L3, Admin and Worker Alphas on MBus, outputs to Global; Dec Alpha (Unix) host] • 11 VME slots minimum (JTL, MSU 12/18/97)

  28. Trigger Connections [Diagram: links from detector/L1 sources into the L2 preprocessors and L2 Global (L2G, MBT). L2STT (in design) from the Si Tracker; L2CTT (FIC/MBT) from L1 CFT; L2PS (FIC/MBT) from L1 FPS; L2μ (SLIC/MBT) from MUON front ends and the L1 μ manager; L2CAL (FIC/MBT) from L1 CAL. Link types: fiber G-link 1.3 Gb/s 20-bit, copper AMCC 1.4 Gb/s, copper and fiber Cypress 160 Mb/s, CIC cable for muon.]

  29. L2 Bandwidth and budget/event @ 10 kHz

  30. Loading of Paths [Diagram: the trigger connection map annotated with path loadings; legible values: L2STT (STC/MBT) 1248/208 (max), 600/300; L2CTT (FIC/MBT) 1632/272 (max); L2PS (FIC/MBT) 1088/272 (max); L2μ (SLIC/MBT) 5000/100; L2CAL (FIC/MBT) 3200/350, 240/40]

  31. % Capacity used

  32. L2 Overview II, and Summary James T. Linnemann Michigan State University Level 2 Review Feb 6, 1999

  33. L2 Maximum Event Sizes (FIFO size choice) • Length = 16B (min) … 4KB (max), X 16 events • includes 12B header and 4B trailer • source pads to multiples of 16B with zeros after the trailer • VRB: 32KB or 64KB, but currently no raw data to L2! • 5 kHz max (Cypress) is 16 B/µs X 200 µs = 3.2KB • clearly an issue of max, not mean! • Path: actual max / FIFO “event” / total (16 events) • FIC/CFT/PS: 272B / 0.5KB / 8KB • Cal/MBT: 304B / 4KB / 64KB • Mu/SLIC: 0.3 to 3KB / 0.5KB / 8KB • Global/MBT: 2.3KB / 4KB / 64KB [= 255 tracks × 8B (255 × 16B = 4KB = STT?)]
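The FIFO sizing follows from a few framing rules: a 12B header, a 4B trailer, zero padding to 16B, and 16 events of buffering. The sketch reproduces those numbers; treating 1KB as 1024B is an assumption, and the names are illustrative.

```python
HEADER_B, TRAILER_B, GROUP_B, DEPTH = 12, 4, 16, 16


def padded_len(payload_b):
    """Bytes on the link for one event: header + objects + trailer, zero-padded to 16B."""
    n = HEADER_B + payload_b + TRAILER_B
    return n + (-n % GROUP_B)


# A 160 Mbit/s Cypress link carries 16 MB/s, i.e. 16 bytes per microsecond.
CYPRESS_B_PER_US = 16
max_rate_khz = 5
max_link_event_b = CYPRESS_B_PER_US * (1000 / max_rate_khz)   # 16 B/us x 200 us

# FIFO "event" slot sizes per path (assuming 1KB = 1024B), 16 events deep.
fifo_event_b = {"FIC/CFT/PS": 512, "Cal/MBT": 4096, "Mu/SLIC": 512, "Global/MBT": 4096}
fifo_total_b = {path: DEPTH * size for path, size in fifo_event_b.items()}

print(padded_len(0))            # 16: an empty event is the 16B minimum
print(padded_len(255 * 16))     # 4096: 255 objects of 16B exactly fill 4KB
print(max_link_event_b)         # 3200.0
print(fifo_total_b["Cal/MBT"])  # 65536
```

The second line shows why 4KB is a natural maximum: with a 12B header and 4B trailer, 255 objects of 16B (the B0 limit) come to exactly 4096 bytes with no padding.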

  34. SCL INITIALIZE: why we avoid it • Needed if event fragments don’t match • must clear all buffers EVERYWHERE and restart • violent: touches EVERY front-end crate • Avoidance: • redundancy from header to trailer (protects against 1-bit errors) • try to preserve event format (to find trailer) • try to preserve event boundary (else must re-init) • detect missed event boundary (end or begin) • send pads before End Event to reframe if needed

  35. Monitoring (online, data flow) • Every 5 seconds, via TCC/Administrators • Some by L1 Scalers, some by VME • L1 Scalers available even if Alphas crash • buffer occupancy for data flow diagnosis • lots of buffers, need to be able to look at them • in all cards owning buffers: FIC, SLIC, MBT, Alpha • but with DMA, most events go into the Alphas’ buffers • time in state (like L3 in Run I) in all Alphas • idle, processing, waiting, interrupt... • Global’s pass fraction by bit #; events vs. node

  36. L1 Scalers for Alpha States • ECL Gates: sampled every beam crossing • Worker States (5) • wait/event, process, wait/admin, interrupt, collecting_status • Admin States (6) • wait, reply/worker, manage_L3, interrupt, L2_Acc/Rej, collecting_status • multi-workers: wait more complex: • wait/event, wait/worker, wait/L2_Acc/Rej + Processing • Shows time fraction in state • thus <time> in state
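Turning gate counts into "time fraction in state" and "⟨time⟩ in state" takes one division each. This sketch assumes exactly one state gate is on at each sampled crossing and that the number of events handled in the interval is known; the crossing period and all counts below are hypothetical values, not DØ numbers.

```python
def state_fractions(scaler_counts):
    """Fraction of sampled beam crossings spent in each state.

    Assumes exactly one state gate is on at every crossing, so the counts
    sum to the number of crossings in the monitoring interval.
    """
    total = sum(scaler_counts.values())
    return {state: n / total for state, n in scaler_counts.items()}


def mean_time_in_state_us(count, crossing_ns, n_events):
    """Average time per event in a state: crossings-in-state x period / events."""
    return count * crossing_ns / 1000 / n_events


# Hypothetical Worker scaler counts for one monitoring interval.
counts = {"wait/event": 600_000, "process": 350_000, "wait/admin": 40_000,
          "interrupt": 10_000, "collecting_status": 0}
print(state_fractions(counts)["process"])                                # 0.35
print(mean_time_in_state_us(350_000, crossing_ns=100.0, n_events=1000))  # 35.0
```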

  37. Alpha Buffer Monitoring(L1 Scalers) • Alpha should be where events stack up • Binned histo: # events in each buffer state • buffer states monitored in Administrator • allocated, processing, wait/L2_Acc/Rej, wait/L3 • to be allocated to worker n • free • Each L2 crate: 22 scalers/admin + 1/worker

  38. VME (“Slow”) Monitoring (via Bit3, TCC) • Exact event accounting from ALL cards • evaluated after marked event (CollectStatus) • All non-Alpha Cards in L2 Crate • MBT, SLIC, FIC • State and Buffer Occupancy Sampled on board • lower statistics, perhaps 500 Hz • Alphas can monitor distribution of event times • Circular buffer of start/stop times (Histogram on host) • 1 CPU cycle (2ns) resolution possible • same mechanism for any state duration • or counts sampled by event (e.g. # tracks)
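The "circular buffer of start/stop times, histogram on host" scheme can be sketched with a bounded `collections.deque`. The class and function names are hypothetical, the 2 ns cycle is from the slide, and counter wraparound is ignored for brevity.

```python
from collections import Counter, deque


class StateTimer:
    """Ring buffer of (start, stop) CPU-cycle stamps kept on the Alpha."""

    def __init__(self, depth=1024):
        self.ring = deque(maxlen=depth)   # oldest entries are overwritten

    def record(self, start_cycle, stop_cycle):
        self.ring.append((start_cycle, stop_cycle))


def histogram_us(entries, cycle_ns=2.0, bin_us=10.0):
    """Host side: histogram of durations in fixed-width microsecond bins."""
    h = Counter()
    for start, stop in entries:
        dur_us = (stop - start) * cycle_ns / 1000
        h[int(dur_us // bin_us)] += 1
    return h


t = StateTimer(depth=3)
for stop in (5_000, 10_000, 20_000, 26_000):
    t.record(0, stop)
print(sorted(histogram_us(t.ring).items()))   # [(2, 1), (4, 1), (5, 1)]
```

Recording raw stamps keeps the per-event cost on the Alpha to a couple of stores, leaving binning to the host; the first entry here was overwritten because the ring holds only three.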

  39. Monitoring: Event Samples • Histograms of objects found (Examine) • few (0.2 Hz) Unbiased Sample (No L2 cuts) • mostly fails • few more events before L3 cuts (after L2) • passed L2 • mostly after passed L3 • Verification: run simulator on these samples • check data arrives intact • bit by bit compare with online L2 results • detects hardware, history bugs (nearly only way)

  40. Test Stand at FNAL • 4 crates: • Global simulator (Admin + Worker) • 2 preprocessor simulators (A+2W, A+W+SLIC) • 1 data source (2 Alphas, MBTs; own MBus) • Incomplete system: • no L1, L2 • not enough parts for full code of any/all crates • except maybe full playback for Global • could reconfigure if need be (painful!) • Copy of some real-time inputs? (grounding!?)

  41. Test Stand: What can it do? • (Pre-)Commissioning/debugging • alpha-alpha and alpha-MBT issues • Timing, verification of download • run in real environment; count clock cycles • how good is offline simulator? • Playback • drop data into memory • testing pre-release after running in simulator • Debugging • event dump and restart (else debug = deadtime) • hard to write event dump/reload!?

  42. Zvtx? • Zvtx in 6cm bins from L0? • actual resolution varies with luminosity • IF felt to be worthwhile at L2 resolution • better to know Z better than Z=0 • or avoid making mistakes and possible L dependence • studies in progress • L2STT also considering mechanisms • candidate vertex Z’s, then algorithm to report one • better intrinsic accuracy than L0 • different luminosity dependence than L0

  43. How does L2STT fit in? • Well-defined protocol allows it to feed into a preprocessor via MBT • like SLICs do in muon crate • Send data to L2CTT crate • pt-ordered list and impact-ordered list to Global • just pt-ordered from L1CFT, lower resolution • extra input already reserved • add 1 MBT and Alpha to L2CTT crate • allows simultaneous input of L1CFT and L2STT • can build two kinds of lists in parallel • modest cost: $8K (or use spares / cannibalize test stand) • can run in parallel until shaken down

  44. L2STT: impact on L2 performance • Heavier loading on new CTT inputs • 16 B per track? Duplicates? • Heavier loading of pt-ordered list? • More Bytes/track? But can reject tracks, too! • New output: impact-parameter-ordered list • already included in bandwidth estimates • More work for Global? • Yes, but controlled by scripts • Must limit # tracks used in matches! • STT can only give higher quality

  45. Budget • Detailed Cost Estimate Exists • Latest update is down about $80K (SLIC) • nearly back to original estimate • L2STT loaner crates, L2CTT upgrades • included in above estimates • engineering costs not over till it’s over • how many extra Alphas as insurance? • may be hard to do a 2nd run

  46. How Many Alphas? • Roughly X 2 design safety factor (10 kHz) • more Alphas are the only lifeboat if too slow • but the gain is not linear, maybe square root • cannibalize test stand (IF it becomes unimportant!) • Other uses for more Alphas: • Shadow nodes (online test at high statistics) • where? Real crate or test stand? • potential use in STT? • Production Alpha order in next few months • have requested 2 X for some obsolescent parts

  47. Schedule • Detailed schedule exists • SLIC now on critical path • Rule: L2STT can’t compromise schedule • Prototypes due March-May 1999 • Production May-Dec 1999

  48. Issues • L2STT design decisions (and money…) • Prototype to production to installation • Transition to software • simulation needed (decisions, studies, development) • Low level software (“drivers”) • finish download path, L3 output path • Manpower crunch coming • Monitoring, verification, releases just starting • infrastructure, definitions needed soon, then people... • who writes global algorithms? (MSU, probably?) • studies of trigger scripts, global algorithms

  49. Conclusions • Solid, modular design for L2 trigger • connectivity understood; prototypes in progress • clear method for L2STT to integrate • Time budgets understood from simulation • Hardware supports appropriate algorithms • Have the manpower for the hardware • Have excellent core group for the software • Request TDR approval: • go-ahead for production
