Run Coordinator Report, on behalf of everybody involved in Pit Operation. First, 2010 is a major achievement! THANKS EVERYBODY!
Seasonal Vacation!? • A lot of work in a relatively short Winter Stop! Not just a pit stop for a tire exchange but rather an engine overhaul… Was it just a near miss to disaster? • NO! Far from it, but we are not out of the woods for 2011
2010 Challenges – Extreme conditions • Operational objectives in retrospect: • Explore the LHCb physics potential • Explore and tune detector, trigger and readout performance • The June MD to go to nominal bunch intensity and THEN increase the number of bunches was very beneficial, but left a lot of uncertainty in the luminosity (evolution) per bunch • 80% of design luminosity reached with 344 colliding bunches instead of 2622… • Faced with preparations without knowledge of the ultimate parameters • We cannot formulate running conditions and operate this way next year [Chart: average number of visible interactions per crossing vs LHCb design specs, July – October]
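The pile-up arithmetic behind these numbers can be sketched in a few lines; the ~60 mb visible cross-section is an assumed, illustrative value, not a measured LHCb number:

```python
def mu_visible(lumi_cm2s, n_colliding, sigma_vis_mb=60.0, f_rev=11245.0):
    """Average number of visible pp interactions per crossing:
    mu = L * sigma_vis / (n_colliding * f_rev).
    sigma_vis_mb is an assumed illustrative visible cross-section."""
    return lumi_cm2s * (sigma_vis_mb * 1e-27) / (n_colliding * f_rev)

# 80% of the 2x10^32 design luminosity shared by only 344 colliding
# bunches (instead of 2622) pushes mu well above the design value:
print(f"mu ~ {mu_visible(0.8 * 2e32, 344):.1f}")   # -> mu ~ 2.5
```

The same luminosity spread over 2622 bunches would give mu well below 0.5, which is the core of the "extreme conditions" point above.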
2010 Challenges – Commissioning • 89 physics fills • Very limited daytime to properly commission the trigger and tune it with non-CERN-based experts/developers, with a continued increase of pile-up/bunches
Global Operational Performance • Main source of operational difficulty: changing 'surprise' conditions rather than extreme conditions • CMS: 43.2 pb-1 / 47.0 pb-1 : 91.2% • 84% usable by any analysis • >92% for muons only • ATLAS: 45.0 pb-1 / 48.2 pb-1 : 93.6% • 93 – 98 % efficiency, except a few one-off problems and "shock-m" • Most luminosity delivered with the largest geometrical reduction factor • We only got 42 pb-1 delivered out of the promised 50 pb-1 [Chart annotation: EFF Upgrade!]
Luminosity Discrepancy • Systematic luminosity difference between IP1/5 and IP8 – not understood • Geometrical factor: • July – August: LHCb 2 x 270 µrad, 8-9% reduction compared to ATLAS/CMS with 0 µrad • B up + external angle: LHCb 2 x (270 – 100) µrad, 3% compared to ATLAS/CMS with 200 µrad • B down + external angle: LHCb 2 x (270 + 100) µrad, 9% compared to ATLAS/CMS with 200 µrad • Normalization – work starting up to normalize via ALICE • β* / waist effect? Observations of strange geometrical effects during scans • Will not be an issue in 2011 as soon as we reach our maximum total luminosity of 2x10^32 – 5x10^32 cm-2s-1 [Chart: B down, July – October]
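The geometrical reduction factors quoted above follow from the standard crossing-angle (Piwinski) formula for Gaussian bunches; a minimal sketch, assuming purely illustrative bunch length and transverse beam size (not the actual 2010 optics), with angles taken in microradians:

```python
import math

def geometric_factor(half_angle_urad, sigma_z_mm=60.0, sigma_x_um=60.0):
    """Crossing-angle luminosity reduction factor for Gaussian bunches:
    F = 1 / sqrt(1 + ((theta_c/2) * sigma_z / sigma_x)^2).
    sigma_z_mm and sigma_x_um are assumed illustrative values."""
    phi = (half_angle_urad * 1e-6) * (sigma_z_mm * 1e-3) / (sigma_x_um * 1e-6)
    return 1.0 / math.sqrt(1.0 + phi * phi)

# Larger net crossing angle -> larger reduction, which is the sense of
# the IP8 vs IP1/5 differences listed above.
for angle in (170, 270, 370):
    print(angle, round(geometric_factor(angle), 3))
```

The percent-level differences between the polarities come out of this formula once the actual beam sizes and net angles per period are plugged in.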
Trigger Compromise • We received individual requests, complaints, and praise for our struggle • It wasn't always easy; many PPG-OPG evening and weekend email exchanges to understand and find the best solutions • Conditions (t0): HLT rate ~ 2.5 kHz, luminosity ~ 1.5E32, mu ~ 2.5, trigger deadtime ~ 1%, L0 rate ~ 350 kHz [Chart: TCK changes; 2.2 pb-1, 19.1 pb-1, 12.7 pb-1]
End of Fill Procedure – Beam Dump Handshake • Lost almost 0.8 pb-1 in total between beam dump warning and actual dump! • Modification: • the Movable Device Allowed flag will become "TRUE" also in BEAM DUMP mode • Dump handshake remains the same • But we no longer "protect" the VELO by dumping the beam if the VELO is not in the garage position when the LHC intends to dump the beam… • May still retract the VELO, but more room for flexibility in software • INJECTION and ADJUST logic obviously remains the same • In total 29 handshakes out of 58 fills; sum of differences (Total – LHCb) = 100 min • Luckily most fills were lost! ;-) [Chart: DT(WARNING – READY) [min], LHCb]
Normalized Fill Efficiencies • All fills normalized to 1 pb-1 [Chart: fill efficiencies, July – October; annotations: 94%, 233 bunches, high luminosity with high mu, commissioning trigger with nominal bunches, SD DAQ problem during 1.5 h fill, EFF upgrade, Detector Safety System, commercial hardware fault]
Event Filter Farm 'Real-Time' Upgrade • 50 subfarms of 19 nodes • Before: 100 servers x 4 farm nodes • Configuration/Start Run: >20 min, reduced to 6 min by a custom-made NFS • Installed and commissioned in three days (5-8 October), fully ready for fill 1408 • Two new servers (= 2 x 4 farm nodes) installed in each of the 50 subfarms • 19 farm nodes per subfarm: 4 low-end with 8 trigger tasks, 7 middle-end with 12 tasks, 8 high-end with 20 tasks • In total: 950 farm nodes with triple the farm capacity • Another 100 x 4 farm nodes to come during the winter stop
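The capacity numbers above can be cross-checked with trivial arithmetic:

```python
# Back-of-the-envelope check of the subfarm numbers quoted above.
n_subfarms = 50
nodes_per_subfarm = 4 + 7 + 8                  # low-, middle-, high-end nodes
tasks_per_subfarm = 4 * 8 + 7 * 12 + 8 * 20    # trigger tasks per subfarm

print(n_subfarms * nodes_per_subfarm)   # -> 950 farm nodes in total
print(tasks_per_subfarm)                # -> 276 trigger tasks per subfarm
```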
Operational Difficulties 2010 • Main sources of operational inefficiency, in short: • Changing conditions rather than extreme conditions • Lack of real knowledge about the luminosity evolution • Shifter experience and instructions (not the fault of the shifters!) • Operational parameters and system limits: • Trigger rate • CPU consumption • Trigger optimized for mu ~ 1.6 and 350 bunches AND THE OLD FARM (luckily…) • Event size • One bottleneck was hiding another (HLT CPU hiding the L0 bandwidth) • Detector stability: • HV trips • HPD disabling • Wrong configurations • Desynchronization • OPC servers • … • Diagnostics tools, diagnostics tools, diagnostics tools!
System Performance – A Textbook Example • Impressive! • Many things to analyze, understand and tune • In particular with the complete farm • We "lose" some nodes during running, i.e. for some reason ODIN all of a sudden stops receiving their event requests • Lost nodes: O(%) [Chart, fill 1453: TCK change, trigger livetime, luminosity, event request rate, system latency, available farm nodes, destination search time]
Global Operation Observations – Readout 2011 • 1 MHz L0 readout • Only proven on "paper" up to now (partially and momentarily, with an idle system) • A loaded system behaves VERY differently, as already observed in 2010 • L0 bandwidth per TELL1/UKL1: work in progress, to test • Readout network and storage bandwidth: should be OK, but recabling and an additional switch needed • CPU capacity: extensive testing with the 2011 trigger • Challenging and work-intensive for the next 6 months • "Trigger boundaries" reached within 6 months • Load balancing: monitoring and diagnostics (System Performance Overview panel) • We need time to test all of this extensively! • Running at the limits: rate and event size (= potential deadtime) are influenced by beam orbit variations with displaced beams, background (e.g. beam-gas, vacuum), de-bunching, … • Careful about running at the margin
Global Operation Observations – Farm Management and Controls • Configuration speed consolidation • More dynamic farm control needed: majority logic in the FSM on CONFIGURE and START RUN to go to READY/RUNNING • 10-20% of the farm is sufficient to start; prepare the rest on the fly, similar to the recovery mechanism • On CONFIGURE, de-centralize the FSM logic to allow nodes to continue from OFFLINE to READY independently of the state of the other nodes • The Reconstruction and Monitoring Farm is not needed to (start to) take data, only to take GOOD data; if in trouble, get them going once data taking has already started • Monitoring of incomplete events by counters (e.g. in the Node Status panel) • Farm system performance overview • Farm log messages: global limit per message for the entire farm, not per node… • More (proper) use of Message Levels
Global Operational Observations – Trigger • Operational (functional) diagnostics • Performance monitoring • We really don't have many knobs; in the HLT we have to do more in a shorter time than before • Support for detector performance monitoring: • All sub-detector scans with beam • Some should be performed regularly (every n pb-1) • Devise a proper scheme for each and for combinations • Permanent trigger configurations (with downscaling) • Express needs in terms of integrated luminosity • Regular scans must be supported by automated recipes • Work and testing during the shutdown for the recommissioning in March is high priority! • Data quality: • Too good in 2010? • Ad-hoc treatment of trips and other problems • "Should" become an issue in 2011… • Need to be attentive, have the tools, and improve feedback • Watch experimental conditions and detector effects closely • RMS – the Radiation Monitoring System should become important in 2011 • Proposal for a new back-end readout to replace the VME scaler • The TFC HUGIN (throttle-OR) is a very flexible high-speed multi-channel board!
Approach to Running Conditions 2011 • Note on the "decision" about mu and L: • mu has mainly hard limits – rate x event size, CPU time, reconstruction etc. • L has soft limits – detector stability • Unknown domain of detector operation and unknown domain of accelerator operation • Optimize for physics output: d/dmu [ Sum_i h_i x eps_OP^(1/2) x S/B^(1/2) ] = 0, where h_i is the importance factor for a specific physics analysis • Operational stability: eps_OP = eps_DAQ x eps_deadtime > 95% • Of course we should also be able to store and process the events in a reasonable time • Ageing – no problem in 2010 • Not necessarily a problem if we assume a linear relation with particle flux and we collect more usable luminosity in a shorter time • LHCb's lifetime is integrated luminosity, not years; focus on understanding the ageing mechanism and prognosis • Technical ambition 2011: • Luminosity increase 2-3x (in 2010: 500x between July and November) • Operationally aim for mu ~ 2.0 – 2.5 • Total luminosity 2 – 5 x 10^32 cm-2s-1; 3 – 4 x 10^32 cm-2s-1 realistic (my feeling from last year) • Main consequences: • Careful running at the limit of capacity • Manpower to monitor and follow up on experiment conditions and detector effects • Regular scans to understand ageing/detector effects and the associated luminosity penalty • Pre-prepared extreme and liberal alternative trigger configurations allowing for flexibility
Luminosity Leveling by Collision Offset • Luminosity leveling applied several times during 2010 • First time on July 17 and July 18 • In the steps between trigger configurations • Followed the bunch behaviour with VELO/BLS; no sign of problems • Two beam stability tests done: • 152 bunches x 1E11 @ 150 ns, up to more than 1 sigma offset • 100 bunches x 0.9E11 @ 50 ns, up to 6 sigma • Tests with several hundred bunches and high intensity not done • Last but most important consequence: luminosity leveling is crucial to run LHCb at optimum luminosity in 2011 [Diagram: X (IP, t=0)]
L0 Rate Variation • The L0 rate is sensitive to many effects: • Collision offset: orbit variations of 20% - 25% of a beam sigma give up to 10% variation in rate • Background, such as beam-gas • Luminosity control: communication, application and information latency [Charts: L0 rate vs mu, luminosity reduction vs sigma, L0 rate vs sigma]
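The quoted rate sensitivity to orbit drifts can be illustrated with the standard Gaussian beam-overlap formula; a sketch with purely illustrative numbers (the leveling offset and drift size below are assumptions, not measured 2010 values):

```python
import math

def lumi_ratio(sep_sigma):
    """Head-on-normalized luminosity of two round Gaussian beams with a
    full transverse separation of sep_sigma (in units of the single-beam
    sigma): L/L0 = exp(-sep^2 / 4)."""
    return math.exp(-sep_sigma ** 2 / 4.0)

# Illustrative case: level the luminosity at 1 sigma separation, then
# let the orbit drift by 0.25 sigma further apart.
base = lumi_ratio(1.0)
drift = lumi_ratio(1.0 + 0.25)
print(f"relative rate change: {(drift - base) / base:+.1%}")  # -> about -13%
```

So with displaced (leveled) beams, an orbit drift of a quarter of a beam sigma is enough to move the rate by ten percent or more, consistent with the bullet above.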
Beam-Gas and Vacuum • No visible effect of any vacuum increase in LSS8 during 2010 • Sensitivity at the L0 trigger: • Expect a rate of potentially visible (one track in the cavern) beam-gas in LHCb at normal pressure of 1E11/1.6E14 * 20% * 11.245 kHz = 1.4 Hz/bunch • L0 selection efficiency 3.1%, i.e. 16 Hz @ 368 bunches • Also an increased probability to accept a single MB event when accompanied by beam-gas • For MB events with no pileup, the L0 selection goes from 3.8% to 6%; with high pileup the effect is less visible; estimated O(10 Hz) • At nominal pressure: a few 10 Hz of beam-gas at 368 bunches • Even increasing the pressure by 100x is no worry • Increasing the vacuum pressure locally will only have a partial effect • BUT it adds to the particle flux (detector stability and occupancy)
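The beam-gas estimate above is straightforward to reproduce; a quick check using only the numbers quoted on the slide:

```python
# Reproducing the beam-gas rate estimate quoted above.
bunch_pop = 1e11        # protons per bunch
gas_equiv = 1.6e14      # equivalent target at normal pressure (from the slide)
visible   = 0.20        # fraction with at least one track in the cavern
f_rev     = 11.245e3    # LHC revolution frequency [Hz]

rate_per_bunch = bunch_pop / gas_equiv * visible * f_rev
l0_rate = rate_per_bunch * 368 * 0.031   # 368 bunches, 3.1% L0 selection
print(f"{rate_per_bunch:.2f} Hz/bunch, {l0_rate:.0f} Hz at L0")
# -> 1.41 Hz/bunch, 16 Hz at L0
```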
L0 Rate Impact on Deadtime • The pure L0 rate is limited by the "L0 derandomizer" readout scheme: • 1 clock cycle to put an event in • 36 clock cycles to read an event out: (36 x 25 ns)^-1 = 1.11 MHz • 16 deep • Common specs emulated by ODIN to regulate L0: upper watermark 16 events, lower watermark 15 events; however, the write/read controllers are more complicated • Exceptions to the global specifications: • OTIS chip of the OT – proper emulation in ODIN throughout 2010 • Beetle of the VELO and ST – work in progress • Consequence of no Beetle emulation: upper watermark 8 events, lower watermark 3 events [Diagram: from L0 pipeline on L0 accept, via write/read controller, to TELL1/UKL1]
Derandomizer & L0 Rate & Filling Schemes • Deadtime effect of running at high rate with few bunches • Deadtime is worse with fewer bunches!! [Charts: physics trigger deadtime for 50 ns / 600 colliding bunches, 50 ns / 800 colliding bunches, 75 ns / 670 colliding bunches, 25 ns / 2440 colliding bunches]
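The bunch-count effect can be reproduced with a toy model of the derandomizer; a sketch under strong assumptions (one contiguous bunch train, independent random triggers at a fixed total L0 rate, and the simple 36-cycle drain quoted above; the real write/read controllers are more complicated):

```python
import random

REV_HZ, SLOTS = 11245.0, 3564   # LHC orbit: 3564 x 25 ns slots

def l0_deadtime(n_bunches, spacing_slots, rate_hz=1.0e6,
                depth=16, readout_slots=36, turns=400, seed=7):
    """Toy model of the 16-deep L0 derandomizer: at most one event per
    colliding crossing is written in, one event is read out every 36
    clock cycles, and a trigger arriving on a full buffer is lost
    (deadtime). All bunches sit in one contiguous train (worst case)."""
    rng = random.Random(seed)
    p = rate_hz / (n_bunches * REV_HZ)   # accept probability per crossing
    colliding = {i * spacing_slots for i in range(n_bunches)}
    occ = since_read = accepted = lost = 0
    for _ in range(turns):
        for slot in range(SLOTS):
            since_read += 1
            if occ and since_read >= readout_slots:
                occ -= 1                 # one event drained to TELL1/UKL1
                since_read = 0
            if slot in colliding and rng.random() < p:
                if occ < depth:
                    occ += 1             # event fits in the buffer
                    accepted += 1
                else:
                    lost += 1            # buffer full: deadtime
    return lost / (accepted + lost)

# Same 1 MHz total L0 rate; fewer bunches means denser bursts during the
# train and therefore more deadtime, as the plots above show:
print(l0_deadtime(600, 2))     # 50 ns train, 600 colliding bunches
print(l0_deadtime(2440, 1))    # 25 ns train, 2440 colliding bunches
```

The intuition: at a fixed total rate, fewer colliding bunches means a higher accept probability per crossing, so during the train the local input rate far exceeds the 1.11 MHz drain rate and the 16-deep buffer overflows.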
Injection – LHCb a Sitting Duck • Injection losses from un-captured beam • Already difficult in 2010 with 0.3% un-captured beam • Expected to get worse in 2011, with up to 1% • Culmination on October 30 with 8-bunch injections • One shot blew a fuse in the CALO HV distribution! • 30% BCM levels agree with 30% BLM levels • We almost became the show stopper • Immediate actions: • LHC: SPS 800 MHz cavity problem and SPS scraping • LHCb: disable the 40 ms logic during the injection phase and raise thresholds (2x-3x) • Done in a few hours • LHC: investigate using shifted Abort Gap Cleaning during injection • Check timing and origin of splashes with the Beam Loss Scintillators • This improved the situation significantly and took us through the year [Diagram: Beam 2 from SPS (TI8)]
Injection – Actions 2011 • Switch off/lower HV AND LV of sensitive detectors in LHCb during injection • Complicated, since we need to configure and run LHCb WELL BEFORE the next data taking • Requires quite a lot of work on DAQ and controls • Check timing and origin of splashes with the Beam Loss Scintillators and BCM • Injection quality information from BLS+BCM fed back to the LHC on each injection • Shielding being investigated together with the machine • Blind the BCM during the injection shot using an Injection Pulse on a direct fibre from the RF • No relaxed attitude… [Diagram: SPS satellites, LHC uncaptured beam]
Injection Schemes – Just an Idea • For 75 ns: • ~100 (8) + 4 x (24) • ~200 (8) + 8 x (24) • ~300 (8) + 12 x (24) • ~400 (8) + 8 x (48) • ~500 (8) + 8 x (48) + 4 x (24) • ~600 (8) + 12 x (48) • ~700 (8) + 8 x (72) + 4 x (24) • ~800 (8) + 8 x (72) + 4 x (48) • ~900 (8) + 12 x (72) • For 50 ns it will be a similar progression up to a maximum of 1400 bunches (!!), maybe something like: • ~100 (12) + 8 x (12) • ~200 (12) + 16 x (12) • ~300 (12) + 8 x (36) • ~400 (12) + 12 x (36) • ~5/600 (12) + 8 x (72) • ~700 (12) + 8 x (72) + 4 x (36) • ~800 (12) + 12 x (72) • ~9/1000 (12) + 8 x (108) + 4 x (36) • ~1200 (12) + 12 x (108) • ~1400 (12) + 12 x (108) + 4 x (36)
LHCb Re-commissioning Plan 2011 – PRELIMINARY • Luminosity ramp: back up to 300 bunches in 50-bunch steps with 75 ns • ~3 weeks, 2-3 days per step
Bunch Ramp-Up • From Mike Lamont: • 2 to 3 weeks of re-commissioning • Virgin set-up followed by full validation (loss maps, asynchronous dumps etc.) • 2011 – back up to 300 bunches in 50-bunch steps • Would imagine starting with 75 ns • In 2010: around 4 days (minimum) per 50-bunch step • 50 – 100 – 150 – 200 – 250 – 300 • Around 3 weeks to get back to 300 bunches • 100-bunch steps thereafter: • 400 – 500 – 600 – 700 – 800 – 900 • 3 weeks minimum • Ultimate parameters for 2011: Qb: 1.6E11 x eN: 2E6 x Nb: 1400
Annual Shift Summary • The summary covers 2008 – 2009 – 2010 because the individual function counters were not reset • Total: 7660 shifts, equivalent to 13.4 months of running [Charts: number of shifts per function; number of equivalent months]
Annual Shift Summary • Each author (507) should have contributed 15.1 shift slots in this period • Total number of shifters: 297, so each shifter contributed 25.8 shift slots [Charts: number of shifters compared to authors; number of functions]
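The per-person numbers above follow directly from the totals (a trivial check):

```python
total_shifts = 7660
authors, shifters = 507, 297

print(f"slots per author:  {total_shifts / authors:.1f}")   # -> 15.1
print(f"slots per shifter: {total_shifts / shifters:.1f}")  # -> 25.8
```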
Annual Shift Summary [Charts: number of pit shifts, offline shifts, and piquet shifts]
Rise or Sugar [Chart: normalized shift contribution]
Shifts 2011 • Current shift situation (number of shifters we have had): • Shift Leader: 53 • Data Manager: 109 • Production: 36 • Data Quality: 58 • How many are still active and how many are available in 2011? • Poll: • Answer to my mail about availability for 2011 if you are already a shifter • Answer to my call for shifters at the beginning of next year • Refresher courses and training in February – March • HV training (awareness) and VELO closure • Improve the training of SL and DM together with the sub-detectors • Shifter online running instructions, help and troubleshooting
Conclusion • A huge thanks to everybody who baby-sat, operated and nursed LHCb! • I don't think we can repeat this enough! • Stop meeting and reporting, and go back to the office to take care of our New Year promises • Since I only got 20 minutes for this talk, I'll stop the conclusion here! • MERRY CHRISTMAS, A HAPPY END OF 2010 AND A HAPPY START OF 2011 • 1 fb-1!
Workshops on Operation 2010 • 2010 Running (Autopsy) Postmortem Workshop scope: • Collect (recall!) flaws and drawbacks from 2010 operation • Hopefully with some associated solutions • If not, what is needed and how do we address it? • Work and improvements during the shutdown • Planning and manpower • Needs for re-commissioning and special runs in 2011 • Magnet OFF data, preferably at 3.5 TeV • Etc. • Main worries for 2011: • Sub-detector guesstimates of luminosity tolerance • Manpower for next year • Will not summarize the 2010 operational performance and the whole workshop here (obviously…) • A veeery long to-do list – just the main points • http://indico.cern.ch/conferenceDisplay.py?confId=113227 • Revisit the situation end of January – beginning of February • Also reported yesterday on all aspects of operation with beam to the LHC in the LPC meeting • Andreas reported on desiderata for 2011 • Input to the LHC workshops in Evian in December and in Chamonix • http://indico.cern.ch/conferenceDisplay.py?confId=111076 • See Andreas' talk next
Detector Operation 2011 • Purely in terms of operation, everything depends on detector stability • Operating at 50 ns: • Experiment conditions • Beam-beam effects from bunch behaviour • Background (electron cloud + IBS) • VELO foil temperature and HV trips • Displacing beams (up to several sigmas) • Spill-over/signal pileup • Spill-over effects in all detectors but the RICH • Event size at L0 • Reconstruction performance • The short 50 ns run (1 fill @ 100 bunches) allowed us to address these only partially • Stay at 75 ns as long as possible and beneficial
Luminosity • Two online sources, with several cross-checks: • The LHCb detector • The Beam Loss Scintillator (BLS) • Independent of the LHCb DAQ • Auto-calibrated with the LHCb detector while running • Very reliable and versatile • The combination is sent to the LHC as the delivered luminosity • Applications: • Injection quality • Background with high time resolution • Beam-gas rate monitoring and veto in the trigger • Luminosity • Debunched beam • Upgrade of the BLS during the shutdown • Faster PMT (quartz) + cable, so no spill-over, and additional scintillators
Longitudinal Scan • Shifting the timing of beam 2 by +/- 1 ns shifts the collision point in z [Diagram: X (IP, t=0), t ~ +dT/2; Z ~30 mm (20 mm with 90 mrad), ~10 mm (5 mm with 25 mrad), ~200 mm]
Longitudinal Scan • Several questions about the results: • Does it indicate something fundamental? • T0 good for the VELO (z ~ 0) • Bad transversal optimization? Did we lose the optimization? (ratio ~9%) • Lumi region z-size? It seems to decrease strongly for z < 0? • Should have done a mini-scan afterwards • Repeat next year! [Charts: specific luminosity and luminosity vs z, nominal physics]
VDM Scan – Lumi Region Movements • [Charts: horizontal, 2 beams, 6 sigma; vertical, 2 beams, 6 sigma] • 5 mm effect from an XY-rotation of 13 mrad • ~90 mm (100 mm with 90 mrad)? • ~40 mm (30 mm with 25 mrad) • 1200 mm (6 sigma @ 170 mrad) • Courtesy C. Barschel