Analysis Framework - status
Andrei Gheata, Mihaela Gheata, Andreas Morsch
ALICE Offline Week – 16 Nov 2010
Few words of introduction
• The AF provides handy tools and the freedom to do analysis at large scale
• … but the resources that make this possible are stretched to their limits while being operated in an emergency, chaotic mode.
• Tolerance limits had to be introduced, but these cannot prevent all kinds of misuse or abuse
• … so everybody is kindly asked to follow a few basic rules:
  • Read the documentation: it is better to understand than to copy code with bugs
  • Develop tasks locally, not on CAF/GRID
  • Check not only whether your task works, but also how many resources it needs (memory, CPU time)
  • Do NOT process entire productions for a single task. If you need to do that, it may be time to move the code into AliRoot so that it can be run in a central train; talk to your PWG software coordinators
  • Use par files only if you have to, and DO NOT use the framework par files (ESD, AOD, ANALYSIS*, CORRFW). The framework supports compiling your task against the core libraries.
AliEn handler – staged merging
• Previously, merging was done by accessing remote files from AliEn
  • In groups defined via SetMaxMergeFiles, with resuming supported, but scaling as Nfiles
• New SetMergeViaJDL option, supporting merging in stages in AliEn
  • The number of files per chunk is set using the same SetMaxMergeFiles
  • Sends Nfiles/Nper_chunk jobs for each merging stage, scaling as log(Nfiles)
  • When the plugin is run locally in "terminate" mode, the output of the merging jobs for the current stage is checked and the merging jobs for the missing outputs are resubmitted
  • NOTE: check in aliensh that all merging jobs are in a final status! It is preferable to resubmit failed jobs from aliensh.
  • If using a list of runs or run ranges, the final merging step will merge the outputs per run on the local client and run Terminate for the connected tasks
• Possible improvement: create a collection of the output files, then merge according to the splitting
  • Requires the splitting to be uniform
  • Could be made dynamic if we had:
    • InputCollection = "FIND: basedir wildcard"
  • Would be very useful for automatic merging of AODs
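The log(Nfiles) scaling of the staged scheme can be illustrated with a small stand-alone C++ sketch. It is a simplified model, not the plugin's code: the function name mergeJobsPerStage is hypothetical, and it assumes every stage merges the previous stage's outputs in chunks of at most filesPerChunk files until a single file remains.

```cpp
#include <cassert>
#include <vector>

// Simplified model of staged merging (an assumption, not the AliEn plugin
// implementation): each stage groups the remaining files into chunks of at
// most filesPerChunk and submits one merging job per chunk.
// Returns the number of merging jobs submitted at each stage.
std::vector<int> mergeJobsPerStage(int nFiles, int filesPerChunk)
{
    assert(nFiles >= 1 && filesPerChunk >= 2);
    std::vector<int> jobs;
    int remaining = nFiles;
    while (remaining > 1) {
        // One job per chunk; the last chunk may be incomplete (ceiling division).
        remaining = (remaining + filesPerChunk - 1) / filesPerChunk;
        jobs.push_back(remaining);
    }
    return jobs;
}
```

In this model, 9 output files merged 3 per chunk need two stages (3 jobs, then 1), and 1000 files merged 10 per chunk need three stages (100, 10 and 1 jobs), so the number of stages grows logarithmically with the number of files.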
[Diagram: staged merging workflow with plugin->SetMergeViaJDL() and plugin->SetMaxMergeFiles(3)]
Client-side sequence shown:
• plugin->SetRunMode("full"); StartAnalysis("grid")
• plugin->SetRunMode("terminate"); StartAnalysis("grid")
• plugin->SetRunMode("terminate"); StartAnalysis("grid")
• mgr->Terminate() (per run)
Files shown at each level (jobs run MyAnalysis.root on 123456.xml or wn.xml):
• Analysis jobs: Output/001/AnalysisResults.root … Output/00n/AnalysisResults.root
• MyAnalysis_merge("…/Output", stage=1, chunk): Output/AnalysisResults_Stage01_000.root, Output/AnalysisResults_Stage01_001.root, Output/AnalysisResults_Stage01_002.root
• MyAnalysis_merge("…/Output", stage=2, chunk): Output/AnalysisResults.root
Running again with SetMergeViaJDL(kFALSE) will merge all runs on the client.
Single task optimization
• Manual loading of only the requested branches at task level (C. Loizides)
  • Calls branch->GetEntry() only for the selected events, instead of tree->GetEntry()
• Can greatly reduce I/O for single-task (or few-task) analysis
  • Especially when looking for rare events without tags and/or needing well-localized information
• This is an expert setting: when misused, it will silently make data belonging to branches that were not loaded unavailable!

STEERING MACRO:
  analysisManager->SetAutoBranchLoading(kFALSE);

TASK:
  //______________________________________________________________
  void AliAnalysisTaskPt::UserExec(Option_t *)
  {
    ...
    AliAnalysisManager *am = AliAnalysisManager::GetAnalysisManager();
    // Load only the small branches needed for the event selection
    am->LoadBranch("AliESDHeader.");
    am->LoadBranch("AliESDRun.");
    /* should have a meaningful check here; a dummy is used just to illustrate the example */
    if (some_condition_on_event_header_or_ESDRun_object) { return; }
    // We can load the interesting branches:
    am->LoadBranch("Tracks");
    // Track loop to fill a pT spectrum
    printf("There are %d tracks in this event\n", fESD->GetNumberOfTracks());
    ... // track loop
  }
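Why on-demand branch loading saves I/O can be seen in a toy cost model. This is only a sketch in plain C++, independent of ROOT and of the framework: ToyEvent, IoCost and compareIo are hypothetical names, and the byte counts are made-up inputs rather than measured branch sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy event: a small header that is always read, and a large track block.
struct ToyEvent {
    bool selected;           // outcome of the cheap header-level selection
    std::size_t headerBytes; // size of the header branches
    std::size_t trackBytes;  // size of the track branch
};

struct IoCost {
    std::size_t fullRead; // bytes read when every branch is loaded per event
    std::size_t onDemand; // bytes read when tracks are loaded only if selected
};

// Compares reading everything for every event with reading the header first
// and loading the tracks only for events that pass the selection.
IoCost compareIo(const std::vector<ToyEvent>& events)
{
    IoCost cost{0, 0};
    for (const ToyEvent& ev : events) {
        cost.fullRead += ev.headerBytes + ev.trackBytes;
        cost.onDemand += ev.headerBytes;
        if (ev.selected) {
            cost.onDemand += ev.trackBytes;
        }
    }
    return cost;
}
```

The rarer the selected events, the closer the on-demand cost gets to the header size alone, which is the situation the slide describes for rare-event searches.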
PROOF analysis via the AliEn handler
• New API to add configuration related to the PROOF cluster
• Completely transparent for the user task and for the steering macro
  • mgr->StartAnalysis("proof")
• Most AF features available
• Plugin "test" mode will run the analysis in PROOF Lite mode on a local chain described by the file used in SetFileForTestMode

  /*********************************************************
   *** PROOF MODE SPECIFIC SETTINGS ************
   *********************************************************/
  // Proof cluster
  plugin->SetProofCluster("alice-caf");
  // plugin->SetProofCluster("skaf.saske.sk");
  // Dataset to be used
  // plugin->SetProofDataSet("/alice/data/LHC10e_000128175_p1#esdTree");
  plugin->SetProofDataSet("/alice/data/LHC10e_000128452_p1#esdTree");
  // May need to reset proof. Supported modes: 0-no reset, 1-soft, 2-hard
  plugin->SetProofReset(0);
  // May limit number of workers
  plugin->SetNproofWorkers(0);
  // May limit the number of workers per slave
  plugin->SetNproofWorkersPerSlave(1);
  // May use a specific version of root installed in proof
  plugin->SetRootVersionForProof("current");
  // May set the aliroot mode. Check http://aaf.cern.ch/node/83
  plugin->SetAliRootMode("default"); // Loads AF libs by default
  // May request ClearPackages (individual ClearPackage not supported)
  plugin->SetClearPackages(kFALSE);
  // Plugin test mode works only providing a file containing test file locations
  plugin->SetFileForTestMode("files.txt");
  // Request connection to alien upon connection to grid
  plugin->SetProofConnectGrid(kFALSE);
  return plugin;
Analysis statistics information
• EventStat_temp.root available in AOD analysis
• Added method AliInputEventHandler::GetStatistics(Option_t *option) to retrieve the physics selection histograms in ESD and AOD analysis.
  • This method returns the statistics TH2F histograms filled by AliPhysicsSelection when the task AliPhysicsSelectionTask is used in the ESD train (or was used during AOD production).
  • To use it, the user task must call this method during FinishTaskOutput (executed on the worker after all events are processed):

  AliAnalysisManager *am = AliAnalysisManager::GetAnalysisManager();
  AliInputEventHandler *inputH = dynamic_cast<AliInputEventHandler*>(am->GetInputEventHandler());
  if (!inputH) return;
  TH2F *histStat = dynamic_cast<TH2F*>(inputH->GetStatistics());
  TH2F *histBin0 = dynamic_cast<TH2F*>(inputH->GetStatistics("BIN0"));

• AliAnalysisManager::AddStatisticsMsg() adds user messages related to the processed statistics.
  • The analysis manager dumps all messages in a file called <nevents>.stat (nevents in format %09d) that is written after processing on the slave, and also during Terminate (client)
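The <nevents>.stat naming convention with the %09d format can be illustrated as follows. This is a minimal stand-alone sketch, not the analysis manager's code; the helper name statFileName is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds a file name of the form <nevents>.stat, with the event count
// zero-padded to 9 digits as in the %09d format quoted on the slide.
std::string statFileName(int nEvents)
{
    assert(nEvents >= 0);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%09d.stat", nEvents);
    return std::string(buf);
}
```

For example, 12345 processed events would give the name 000012345.stat, so the files sort by event count when listed alphabetically.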
CDB access and run number
• New task created for CDB access in the QA train
  • PWG1/AliTaskCDBconnect.h/.cxx
  • To be used by tasks in central trains
• Generally, the run number is not accessible in UserCreateOutputObjects
  • Added a new static method AliAnalysisManager::GetRunFromAlienPath() that extracts the run number from the path to the data (must be an AliEn path to data or MC). The plugin uses it to set the new data member AliAnalysisManager::fRunFromPath, which is available in UserCreateOutputObjects of any task when running via the plugin in grid mode.
  • Use mgr->GetRunFromPath() in your task to get this value
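The idea of recovering a run number from a data path can be sketched in plain C++. This is not the implementation of AliAnalysisManager::GetRunFromAlienPath(): the function runFromPath is hypothetical, and it assumes the run appears as a path component made only of digits, either 9 characters long (zero-padded, as in the dataset names above) or 6 characters long. The example paths in the usage note are illustrative, not taken from the slides.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: returns the first path component that consists only of
// digits and is 6 or 9 characters long, converted to an integer.
// Returns 0 when no such component is found.
int runFromPath(const std::string& path)
{
    std::istringstream parts(path);
    std::string part;
    while (std::getline(parts, part, '/')) {
        if (part.size() != 6 && part.size() != 9) {
            continue; // skips e.g. a 4-digit year or a 3-digit job directory
        }
        bool allDigits = true;
        for (unsigned char c : part) {
            if (!std::isdigit(c)) {
                allDigits = false;
                break;
            }
        }
        if (allDigits) {
            return std::stoi(part); // leading zeros are ignored by stoi
        }
    }
    return 0;
}
```

Under these assumptions, a path such as /alice/data/2010/LHC10e/000128175/ESDs/pass1/AliESDs.root yields 128175, and a path with no matching component yields 0, which a task could treat as "run number unknown".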