Offline shifter training tutorial Y. Schutz for L. Betev August 26, 2009
Outline • Offline shifter basic responsibilities • The shifter check list • Systems and tools • The dashboard (see Costin’s talk) • The Shuttle (see Chiara’s talk) • The reconstruction and visualization package (see Marco’s talk)
Basic responsibilities – RAW data • The RAW data path • Step A: DAQ online buffer @P2 -> CASTOR2 disk buffer, over the fast optical link to CERN CC at 500 MB/sec (p+p), 1.25 GB/sec (Pb+Pb) • Step B: CASTOR2 disk buffer -> CASTOR2 tape buffer, reduced to 100 MB/sec (p+p)
Step A – Online buffer -> CASTOR buffer • Automatic and well-exercised (it almost never goes wrong) • At this step, the files are also registered in the AliEn catalogue • DAQ is nominally responsible for the transfers • Offline provides the registration gateway • If the transfer is not working, DAQ notifies the shifter and/or the alice-shift-alarms@cern.ch expert list
Step A – Shifter responsibilities • Monitor the fill level of the CASTOR buffer (through the dashboard) • Notify the run coordinator/shift leader if it is more than 80% full • Clear disk space following instructions received from the SL (on a regular and selective basis) • Follow the registration of RAW (through the dashboard) • All files in the PHYSICS partition typically go to CASTOR • Follow the run screen and grow suspicious if none of the runs are being registered • In that case, contact the DAQ shifter and ask what is going on
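The 80% rule above can be written as a one-line check. This is a minimal sketch, assuming the buffer occupancy is available as used and total bytes; the function name and its inputs are hypothetical, since in practice the fill level is read off the dashboard.

```python
def castor_buffer_needs_alert(used_bytes: int, capacity_bytes: int,
                              threshold: float = 0.80) -> bool:
    """Return True when the buffer is more than `threshold` full.

    Hypothetical helper: the 80% default comes from the slide above;
    how the used/capacity numbers are obtained is not specified there.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return used_bytes / capacity_bytes > threshold
```

A True result corresponds to the point at which the run coordinator/shift leader should be notified.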
Step B – CASTOR buffer -> Tape storage • Selective copying of runs to tape • 1/5 of the RAW data stream in p+p (100 MB/sec) • Full data stream in Pb+Pb (1.25 GB/sec) • The selection of runs to be copied is provided by the SL • The Offline shifter will be responsible for the copy procedure (through dashboard tools) • Also for the deletion of data from the CASTOR buffer • It will involve some automatic copying (calibration data, for example)
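The rates quoted for Step B follow from simple arithmetic. The sketch below reproduces the 100 MB/sec figure for p+p and, as an illustration only, converts both rates into a daily volume. It assumes uninterrupted data taking for 24 hours and decimal units (1 TB = 10^6 MB), so the daily numbers are upper bounds and not figures from the slides; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tape_rate_mb_s(raw_rate_mb_s: float, reduction_factor: int) -> float:
    """Rate to tape when only 1/reduction_factor of the RAW stream is kept."""
    return raw_rate_mb_s / reduction_factor


def daily_volume_tb(rate_mb_s: float) -> float:
    """Volume for 24 h of continuous running, in decimal TB (assumption)."""
    return rate_mb_s * SECONDS_PER_DAY / 1_000_000


pp_rate = tape_rate_mb_s(500, 5)      # p+p: 1/5 of 500 MB/sec
pbpb_rate = tape_rate_mb_s(1250, 1)   # Pb+Pb: full 1.25 GB/sec stream

print(f"p+p:   {pp_rate:.0f} MB/sec, {daily_volume_tb(pp_rate):.2f} TB/day")
print(f"Pb+Pb: {pbpb_rate:.0f} MB/sec, {daily_volume_tb(pbpb_rate):.2f} TB/day")
```

Under these assumptions the p+p stream to tape would be at most about 8.64 TB per day and the Pb+Pb stream at most about 108 TB per day.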
Basic responsibilities – Shuttle • Covered in Chiara’s presentation • Here just to put it in the context of the basic responsibilities
Basic responsibilities – fast reco and event display • A quick method to check the reconstruction of data and display a couple of events from recent runs • NOT a tool to do analysis • Covered in Marco’s presentation • Here just to put it in the context of the basic responsibilities
Basic responsibilities – data replication • After RAW is recorded to tape in CASTOR • A copy is made to a remote T1 centre (out of 6 possible) for custodial storage (and processing) • The replication is an automatic process, triggered at EoR • Progress is displayed on the dashboard • At the beginning of data taking, automatic replication is disabled • In general, the Offline shifter should follow the replication and raise an alarm in case of failures
Basic responsibilities – prompt offline processing • After RAW is recorded to tape in CASTOR and the Shuttle is done • Processing is launched • The processing is an automatic process • Progress is displayed on the dashboard • At the beginning of data taking, automatic processing is disabled • The list of runs to be processed is compiled by the run coordinator / shift leader
Basic responsibilities – prompt offline processing (2) • The experiment logbook contains ‘hints’: run quality flags • Per detector and global • The run quality flags are filled manually by the SL based on the detector QA and the reconstruction/analysis QA collected by the QA shifter • The Offline shifter’s responsibility is to check the quality flags for all PHYSICS runs and prompt the shift leader to fill them in.
Offline shifter check list • Registration of RAW (dashboard) • Periodic check of status • Follow PHYSICS runs • Ask the shift leader in case of doubt • Report registration errors to the on-call expert • The run copy and removal procedure • Shuttle (dashboard) • Follow the processing of all runs + global Shuttle messages • In case of preprocessor failures, escalate to the (concerned) detector shifters • In case of Shuttle failures, first follow the restart/debug procedures, then report to the on-call expert
Offline shifter check list (2) • Fast reconstruction and event display (processing scripts on shifter console) • Periodic check of PHYSICS runs (not the entire run!) • Run the reconstruction and analyse the AliRoot log files for errors/crashes • Note any errors/crashes in the shifter report pages and send them to alice-shift-alarms@cern.ch • Periodically visualize events in PHYSICS runs • Note ‘strange’ event characteristics in the shifter report pages and send them to alice-shift-alarms@cern.ch
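Checking a reconstruction log for errors/crashes can be partly automated with a keyword scan. The following is a rough sketch, not the procedure from the shifter manual: the keyword list is an assumption and should be adjusted to what the real AliRoot log files contain, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed markers of trouble; not taken from the shifter manual.
SUSPICIOUS = re.compile(r"error|fatal|segmentation violation|core dumped",
                        re.IGNORECASE)


def find_suspicious_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every line matching a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Because the function accepts any iterable of strings, an open log file can be passed to it directly, and the returned lines can then be copied into the shifter report pages.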
Run Coordination meeting • Day shifter only • Attend the daily (2:00 PM) Run Coordination meeting • Report about issues related to the OS tasks encountered since the last meeting (see offline log book) • Send a mail to the AIP group summarizing offline issues discussed during the meeting and important notices
General shifter rules • Before pressing the • Read the procedures and rules, defined for each error type • Try out the remedies • If all fails, inform the on-call expert
Offline shifter check list (3) • Data replication (dashboard) • Periodic check of replication status • Note ‘stuck’ runs – not replicated 12 hours after registration – in the shifter report pages and send the list to alice-shift-alarms@cern.ch • Prompt data processing (dashboard) • Periodic check of processing status • Note ‘stuck’ runs – not processed 12 hours after registration – in the shifter report pages and send the list to alice-shift-alarms@cern.ch • Shift report (shifter system) • At end of shift – summary of the operation and noteworthy events
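The 12-hour ‘stuck’ criterion applies in the same way to replication and to processing. Below is a minimal sketch of that rule, assuming each run is described by its registration time and a flag saying whether the step has completed; the `RunStatus` record and `stuck_runs` function are hypothetical names and do not correspond to a dashboard API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

STUCK_AFTER = timedelta(hours=12)  # threshold quoted in the check list


@dataclass
class RunStatus:
    run_number: int
    registered_at: datetime
    done: bool  # replicated or processed, depending on the check


def stuck_runs(runs: Iterable[RunStatus], now: datetime) -> List[int]:
    """Run numbers still not done more than 12 hours after registration."""
    return [
        run.run_number
        for run in runs
        if not run.done and now - run.registered_at > STUCK_AFTER
    ]
```

The resulting list of run numbers is what would go into the shifter report pages and the mail to alice-shift-alarms@cern.ch.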
Information sources for the shifter • The shifter manual – instructions • Shifter interface (http://aloshi.cern.ch) • Monitoring – MonALISA (http://alimonitor.cern.ch/) • Dashboard • Shuttle • Processing and data management