Offline shifter training tutorial L. Betev February 19, 2009
Outline • Offline shifter basic responsibilities • The shifter check list • Systems and tools (separate talks) • The Dashboard • The Shuttle • Offline Shifter Information System • Event display
Basic responsibilities – RAW data • The RAW data path: DAQ online buffer @P2 -> (Step A: fast optical link to CERN CC, maximum rates 500 MB/sec (p+p), 1.25 GB/sec (Pb+Pb)) -> CASTOR2 disk buffer -> (Step B: reduced rate, 100 MB/sec (p+p)) -> CASTOR2 tape buffer
Step A – Online buffer -> CASTOR buffer • Automatic and well-exercised (it almost never goes wrong) • At this step, the files are also registered in the AliEn catalogue through a gateway • DAQ is nominally responsible for the transfers • Offline provides the registration gateway • If this step is not working, DAQ/SL notifies the shifter and/or the alice-shift-alarms@cern.ch expert list
Step A – Shifter responsibilities • Monitor the fill level of the CASTOR buffer (dashboard) • Notify the run coordinator/shift leader (SL) if it is more than 80% full • Clear disk space following the instructions received from the SL • Follow the registration of RAW (dashboard) • All runs in the PHYSICS partition are typically written to CASTOR • Follow the run screen and grow suspicious if none of the runs are being registered • In that case, contact the SL and ask what is going on
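The 80% rule above can be written down as a small check. This is only a sketch: the threshold is taken from the slide, while the function names and the idea of reading used/total capacity as two numbers are assumptions for illustration, not part of the dashboard.

```python
# Hypothetical helper names; only the 80% threshold comes from the slide.
FILL_ALERT_THRESHOLD = 0.80


def buffer_fill_fraction(used_tb: float, total_tb: float) -> float:
    """Return the fraction of the disk buffer that is in use."""
    if total_tb <= 0:
        raise ValueError("total capacity must be positive")
    return used_tb / total_tb


def should_notify_shift_leader(used_tb: float, total_tb: float,
                               threshold: float = FILL_ALERT_THRESHOLD) -> bool:
    """True when the buffer is MORE than `threshold` full (strict comparison)."""
    return buffer_fill_fraction(used_tb, total_tb) > threshold
```

The comparison is strict because the slide says "more than 80% full"; a buffer at exactly 80% does not yet trigger the notification in this sketch.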
Step B – CASTOR buffer -> Tape storage • Selective copying of runs to tape • Part of the p+p data stream (depends on the acquisition rate, max 100MB/sec) • Full data stream in Pb+Pb (1.25GB/sec) • The selection of runs to be copied/removed is provided by the SL • Offline shifter is responsible for the copy procedure (dashboard) • And for the deletion of data from the CASTOR buffer
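To get a feeling for the volumes behind these rates, the quoted maxima can be converted to terabytes per hour. The sketch below uses only the three rates from the slides and assumes decimal units (1 GB = 1000 MB, 1 TB = 10^6 MB); the dictionary keys and function name are illustrative.

```python
# Maximum rates quoted in the slides, in MB/sec.
RATES_MB_PER_SEC = {
    "p+p, online buffer -> CASTOR": 500,
    "Pb+Pb, online buffer -> CASTOR and tape": 1250,  # 1.25 GB/sec
    "p+p, CASTOR -> tape": 100,
}

SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000  # assumption: decimal units


def terabytes_per_hour(rate_mb_per_sec: float) -> float:
    """Convert a sustained rate in MB/sec into TB written per hour."""
    return rate_mb_per_sec * SECONDS_PER_HOUR / MB_PER_TB


if __name__ == "__main__":
    for label, rate in RATES_MB_PER_SEC.items():
        print(f"{label}: {terabytes_per_hour(rate):.2f} TB/hour")
```

At these maxima, the p+p tape rate is one fifth of the p+p rate into the disk buffer, which is consistent with only part of the p+p stream being copied to tape.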
Basic responsibilities – Shuttle • Covered in Shuttle presentation • Here just to put it in the context of the basic responsibilities
Basic responsibilities – event display • Covered in Event Display presentation • Here just to put it in the context of the basic responsibilities
Basic responsibilities – data replication • After RAW is recorded to tape in CASTOR • A copy is made to a remote T1 centre (out of 6 possible) for custodial storage and processing • The replication is an automatic process, triggered at end of run (EoR) • Progress is displayed on the dashboard; the shifter follows the transfers and reports problems • Presently (muon/calibration runs) – automatic replication is disabled
Basic responsibilities – offline processing pass 1 (at CERN T0) • After RAW is recorded to tape in CASTOR + Shuttle is done • Processing is launched automatically • Progress is displayed on the dashboard • Automatic processing – only for PHYSICS runs • Detector calibration runs are processed on request • The Offline shifter (if asked by detector groups/run coordination) collects the run numbers and writes them in the shifter report
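The pass 1 rules above amount to a small decision table, sketched below. The function name, the returned labels and the boolean flags are hypothetical; the sketch also assumes that the "RAW on tape + Shuttle done" precondition applies to on-request runs as well, which the slide does not state explicitly.

```python
def pass1_action(run_type: str, raw_on_tape: bool, shuttle_done: bool,
                 requested: bool = False) -> str:
    """Return what happens to a run for pass 1 processing at the CERN T0.

    Labels are illustrative:
      "wait"       - preconditions not met yet
      "automatic"  - processing is launched automatically (PHYSICS runs)
      "on request" - processed because a detector group/run coordination asked
      "none"       - no processing foreseen
    """
    if not (raw_on_tape and shuttle_done):
        return "wait"
    if run_type == "PHYSICS":
        return "automatic"
    if requested:
        # The shifter also notes these run numbers in the shifter report.
        return "on request"
    return "none"
```

For example, a calibration run with both preconditions met and no request yields "none", while the same run with `requested=True` yields "on request".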
Offline shifter check list • Registration of RAW (dashboard) • Periodic check of status • Follow PHYSICS runs (start/stop in DAQ logbook) and their registration to CASTOR • Ask the shift leader in case of doubt • Report registration errors to the on-call expert (list of experts in aloshi) • Run the copy and removal procedure (dashboard) • Shuttle (dashboard) • Follow the processing of all runs + global Shuttle messages • In case of preprocessor failures, escalate to the (concerned) detector shifters and note it in the shifter report (aloshi) • In case of Shuttle failures, first follow the restart/debug procedures, then report to the on-call expert
Offline shifter check list (2) • Data replication (dashboard) • Periodic check of replication status • Note ‘stuck’ runs – not replicated 12 hours after registration – in the shifter report pages and send the list to alice-shift-alarms@cern.ch • Data processing pass 1 (dashboard) • Periodic check of processing status • Note ‘stuck’ runs – not processed 12 hours after registration – in the shifter report pages and send the list to alice-shift-alarms@cern.ch • Shift report (aloshi) • At end of shift – summary of the operation and noteworthy events
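The 'stuck' run criterion can be made concrete with a short sketch. The 12-hour limit is from the check list; the record layout (`run`, `registered`, `done`) and the function name are assumptions for illustration and do not describe the dashboard's actual data format.

```python
from datetime import datetime, timedelta

STUCK_AFTER = timedelta(hours=12)  # limit quoted in the check list


def find_stuck_runs(runs, now, limit=STUCK_AFTER):
    """Return run numbers that are still not done `limit` after registration.

    `runs` is a list of dicts with hypothetical keys:
      "run"        - run number
      "registered" - datetime of RAW registration
      "done"       - True once replicated (or processed)
    """
    return [
        r["run"]
        for r in runs
        if not r["done"] and now - r["registered"] > limit
    ]


if __name__ == "__main__":
    now = datetime(2009, 2, 19, 18, 0)
    runs = [
        {"run": 1001, "registered": datetime(2009, 2, 19, 2, 0), "done": False},
        {"run": 1002, "registered": datetime(2009, 2, 19, 10, 0), "done": False},
        {"run": 1003, "registered": datetime(2009, 2, 18, 20, 0), "done": True},
    ]
    print(find_stuck_runs(runs, now))
```

The same function serves both checks (replication and pass 1 processing); only the meaning of the `done` flag changes.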
System Run Coordination meeting • Evening shifter only • Attend the daily System Run Coordination (SRC) meeting at 17:30 • Prepare and present a 24-hour Offline status report • A template for the report is given in aloshi
General shifter rules • Before pressing the • Read the procedures and rules defined for each error type • aloshi has a search feature; use it to look for similar problems and solutions • Try out the remedies • If all else fails, inform the on-call expert
Information sources for the shifter • The shifter manual – instructions • Shifter interface (http://aloshi.cern.ch) • Monitoring – MonALISA (http://alimonitor.cern.ch/) • Dashboard • Shuttle • Processing and data management