Development of Farm Monitoring & Remote Concatenation for CDFII Production Project By: Solomon Mikael (UMBC) Advisors: Elena Vataga (UNM) & Pavel Murat (FNAL)
Outline • CDF Experiment • CDF Production Farm Goals & Structure • Issues with Concatenation • My Contributions 1 • Control & Monitoring • My Contributions 2 • Summary • Acknowledgments
Goal of CDF Production Farm • The main goals of the Production Farm are to reconstruct available data for physics analysis as soon as possible, to reprocess data when necessary, and to generate Monte Carlo events
CDF Experiment • CDF (the Collider Detector at Fermilab) is an international collaboration involving many universities and national laboratories • Two intense beams of protons and antiprotons meet head-on in the middle of the 100 ton solenoidal CDF detector • To observe the particles, the CDF detector contains layers of subdetectors, each layer responsible for detecting a different particle property • Information from 1,000,000 electronic channels is recorded
Production Farm - Hardware • The Production Farm consists of 150 dual-CPU PCs with a total computing power of 800 GHz • The high-throughput Linux clusters are used for event reconstruction and analysis • The Production Farm PCs have a total of 25 TB of disk space
Production Farm - Software • The CDF Production Farm performs computing- and network-intensive tasks in a cost-effective manner • SAM (Sequential data Access via Metadata) is a data handling system organized as a set of servers working together to store and retrieve files. SAM mitigates the problem of one person hogging the tape drives and/or flooding the tape system, and it provides tools for database bookkeeping. • CAF (CDF Analysis Farm) is the software and control system for batch job submission on top of the Condor batch system.
Structure of CDF Farm • [Figure: diagram of the farm structure, with the concatenation stage labeled]
Issues with Concatenation • In the present scheme the concatenator and the tape uploader run at the same time, which limits the I/O available from the stager. • The disk access rate depends on the number of simultaneous I/O operations on the RAID 5 disk array. • My project at CDF entailed moving the concatenation load from the stagers to CAF to achieve higher data flow rates. • [Figure: histograms of entries vs. tape transfer rate (MB/s)]
Structure • Stager: mergeSubmit.py analyzes the input directory, creates the .tcl file, and sends the CAF job • Worker: a script copies the input files, runs the binary code for the concatenator, and copies the output file to the stager • Monitoring
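The "creates the .tcl file" step can be sketched in Python as below. This is only an illustration under stated assumptions, not the actual mergeSubmit.py: the function names, the input layout (a mapping from dataset name to a list of path and size pairs), and the rule of one segment per dataset are all hypothetical. The block layout follows the example .tcl file shown in these slides.

```python
def render_segment(segment_number, dataset, files, output_dir, total_size):
    """Render one 'if { $env(SEGMENT_NUMBER) == N } { ... }' block as text."""
    rule = "#" + "-" * 54  # cosmetic comment rule, as in the example file
    lines = [
        "if { $env(SEGMENT_NUMBER) == %d } {" % segment_number,
        rule,
        '# OutputDir = "%s" ;' % output_dir,
        "# total size: %d" % total_size,
        rule,
        "  set DATASET %s" % dataset,
    ]
    lines += ["  include file %s" % path for path in files]
    lines.append("}")
    return "\n".join(lines)


def render_tcl(datasets, output_dir):
    """Render all segments.

    datasets: dict mapping dataset name -> list of (path, size) pairs.
    Assumption: one segment per dataset, numbered in sorted-name order.
    """
    blocks = []
    for number, (dataset, entries) in enumerate(sorted(datasets.items()), start=1):
        paths = [path for path, _size in entries]
        total = sum(size for _path, size in entries)
        blocks.append(render_segment(number, dataset, paths, output_dir, total))
    return "\n".join(blocks) + "\n"
```

Each worker job would then pick out its own block through the SEGMENT_NUMBER environment variable, which is what the `$env(SEGMENT_NUMBER)` test in the generated file checks.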
Required Skills • Before I could make any changes to the CDF farm, I had to learn how the individual parts of the farm operate and how they are interrelated: • Effective use of bash scripts and the awk text-processing language • Learning Python in order to modify the concatenation script mergeSubmit.py • Modifying Tikiwiki pages and web pages using the online Tikiwiki editor
What Are Bash & Awk? • A shell is a program that interprets commands, either typed in directly by the user or contained in a file called a shell script; Bash is one such shell. • Awk, named after its developers (Aho, Weinberger, and Kernighan), is a pattern scanning and processing language that permits manipulation of structured data and generation of formatted reports.
#!/bin/bash
name=`basename $0`
# Load shared helpers and parse the command-line parameters
. ./cdfopr/scripts/common_procedures
. ./cdfopr/scripts/parse_parameters $* > temp_parse_log
echo $TCL_FILE
echo $PARAM_USER
echo $PARAM_HOST
echo $PARAM_PATH
# Copy the .tcl file from the remote host
cmd="fcp -c ${RCP} ${FCP_USER}@${FCP_HOST}:${PARAM_PATH}/${TCL_FILE} ."
$cmd
STATUS=$?
if [ $STATUS -ne 0 ]; then
    echo "$TCL_FILE was not able to be copied"
    exit 1
fi
# Copy every input file listed in the .tcl file
for file_loc in `grep "include file /" ${TCL_FILE} | awk '{print $3}'`; do
    cmd="fcp -c ${RCP} ${FCP_USER}@${FCP_HOST}:${file_loc} ."
    $cmd
    STATUS=$?
    if [ $STATUS -ne 0 ]; then
        echo "$file_loc was not able to be copied"
    fi
done
the_num=2
export JOB_NUM=3
echo $SEGMENT_NUMBER
echo $JOB_NUM
temp.awk -v seg_num=$SEGMENT_NUMBER
---------------------------------------------------------------
#!/bin/awk -f
# Print the include lines that belong to segment seg_num
BEGIN { flag = 0 }
/SEGMENT_NUMBER/ { if ($5 == seg_num) { flag = 1 } }
/include/ { if (flag == 1) { print $0 } }
{
    if ($1 == "}") { flag = 0 }
#   if (($4 == "==") && (ENVIRON["SEGMENT_NUMBER"] == $5)) {print $0}
}
Example .tcl File
if { $env(SEGMENT_NUMBER) == 1 } {
#------------------------------------------------------
# OutputDir = "/export/data1/cdfmc/concatTest/Monte_Carlo_Test1/mergeLogs/hphysr_0y_01/tmp" ;
# total size: 27780
#------------------------------------------------------
  set DATASET xbck0y
  include file /export/data1/cdfmc/concatTest/Monte_Carlo_Test1/xbck0y/reco.xy0339c5.0284bck0
  include file /export/data1/cdfmc/concatTest/Monte_Carlo_Test1/xbck0y/reco.xy0339c5.028ebck0
  include file /export/data1/cdfmc/concatTest/Monte_Carlo_Test1/xbck0y/reco.xy0339c5.0298bck0
  include file /export/data1/cdfmc/concatTest/Monte_Carlo_Test1/xbck0y/reco.xy0339c5.02a2bck0
  include file /export/data1/cdfmc/concatTest/Monte_Carlo_Test1/xbck0y/reco.xy0339c5.02acbck0
}
if { $env(SEGMENT_NUMBER) == 2 } {
#------------------------------------------------------
# OutputDir = "/export/data1/cdfmc/concatTest/Monte_Carlo_Test1/mergeLogs/hphysr_0y_01/tmp" ;
# total size: 1380315
#------------------------------------------------------
  set DATASET xbhd0y
  include file /export/data1/cdfmc/concatTest/Monte_Carlo_Test1/xbhd0y/reco.xy0339c5.027abhd0
  include file /export/data1/cdfmc/concatTest/Monte_Carlo_Test1/xbhd0y/reco.xy0339c5.0284bhd0
  include file /export/data1/cdfmc/concatTest/Monte_Carlo_Test1/xbhd0y/reco.xy0339c5.028ebhd0
  include file /export/data1/cdfmc/concatTest/Monte_Carlo_Test1/xbhd0y/reco.xy0339c5.0298bhd0
}
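The per-segment structure of this .tcl file can also be read back in Python. The sketch below mirrors the flag-based awk filter from the previous slide: it returns the paths on the "include file" lines that fall inside the block for one segment number. The function name and the regular expression are illustrative assumptions, not part of the farm code.

```python
import re

# Matches the segment test on a block header such as
#   if { $env(SEGMENT_NUMBER) == 2 } {
SEGMENT_RE = re.compile(r"\$env\(SEGMENT_NUMBER\)\s*==\s*(\d+)")


def include_files_for_segment(tcl_text, segment_number):
    """Return the paths listed on 'include file' lines of one segment."""
    selected = []
    in_segment = False
    for raw_line in tcl_text.splitlines():
        line = raw_line.strip()
        match = SEGMENT_RE.search(line)
        if match:
            # A block header: switch the flag on only for the wanted segment
            in_segment = int(match.group(1)) == segment_number
        elif line.startswith("}"):
            # A closing brace ends the current block
            in_segment = False
        elif in_segment and line.startswith("include file "):
            # Third whitespace-separated field is the path
            selected.append(line.split()[2])
    return selected
```

Called with the text of the example file and segment number 2, it would return the four xbhd0y paths in file order.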
Control & Monitoring • Tikiwiki software is used for web-based documentation • The Tiki database keeps a history of all changes to the Farm Projects • Tiki pages enable users to: • Keep track of all existing projects • Start or stop a project • Change resource sharing between projects • Redirect output to another stager • Forward execution to CAF without having to connect to the main server • Python's extensive support for XML, email, RSS feeds, and many other Internet protocols makes it effective for developing custom web solutions
Monitoring - Web Page Interface • To ensure the CDF Production Farm runs smoothly, the hardware performance, including status reporting, must be monitored • This is done using the Production Farm Web Interface (PFWI) • PFWI parses, calculates, and displays all major characteristics of the farm with online results
Tikiwiki Contributions • This page shows the disk space on the 32 partitions across the different servers • Edited Python script df_disk.py:
p = string.find(output, '/export/data4')
usage = string.strip(output[p-24:p])
fp.write(""" fncdfsrv5 %20s /export/data4 """ % usage)
fp.write("\n")
percentage = string.strip(output[p-5:p-2])
if string.atoi(percentage) > 90: IsFull = 1
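The snippet above finds the usage figures by fixed character offsets before the mount point. A whitespace-splitting variant is sketched below in current Python as an illustration only, not as the df_disk.py implementation. The function names are hypothetical, the input is assumed to be `df`-style text with a "Use%" column followed by the mount point, and mount points are assumed to contain no spaces. The 90% threshold is the one used on the slide.

```python
def parse_df(output):
    """Return {mount_point: percent_used} from df-style text.

    Lines whose second-to-last field is not a percentage (the header,
    or a wrapped line holding only a long device name) are skipped.
    """
    usage = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[-2].endswith("%"):
            continue
        digits = fields[-2].rstrip("%")
        if digits.isdigit():
            usage[fields[-1]] = int(digits)
    return usage


def full_partitions(usage, threshold=90):
    """Return the mount points whose usage exceeds the threshold (percent)."""
    return sorted(mount for mount, percent in usage.items() if percent > threshold)
```

Splitting on whitespace avoids depending on column widths, which shift when device names or sizes change length.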
Tiki Editor • Using the online Tiki editor, I made modifications to improve the functionality of the ProjectConfiguration page
Summary • In these weeks I: • Implemented improvements to Production Farm monitoring • Participated in the development of remote concatenation • Acquired new skills • Learned about the physics done at Fermilab
Acknowledgements • SIST committee for giving me this opportunity • Elena Vataga & Pavel Murat • Ms. Engram & Dr. Elliott McCrory • Dr. Davenport & Jamieson Olsen
BACKUP
History of Production Farm • Fermilab has used clusters of processors to provide large computing power with dedicated processors like the Motorola 68030 • CDF Run 2 data was processed using the first developed Farm Processing System (FPS), which used the FBSNG batch system (1) • The Farm Processing System was the software that managed, controlled, and monitored the CDF production farm from 1999 to 2005
Monitoring • PARSING - this layer accesses MySQL or CAF output files; after processing the text and performing calculations, it feeds the data to the cache layer • CACHE - this layer does statistical preprocessing and has an interface to easily visualize the data. The data is then stored. (1) • WEB - displays all the information collected by the parsers and gathers data that does not need pre-processing. Uses PHP4 to generate the web pages. • Python tiki