
Data Management






Presentation Transcript


  1. Data Management D0 Monte Carlo needs The NIKHEF D0 farm The data we produce The SAM data base The network Conclusions Kors Bos, NIKHEF, Amsterdam Fermilab, May 23 2001

  2. D0 Monte Carlo needs • D0 trigger rate is 100 Hz, 10^7 seconds/yr → 10^9 events/yr • We want at least 10% of that to be simulated → 10^8 events/yr • Simulating 1 QCD event takes ~3 minutes (size ~2 MByte) on an 800 MHz PIII • So 1 CPU can produce ~10^5 events/yr (~200 GByte), assuming a 60% overall efficiency • So our 100-CPU farm can produce ~10^7 events/yr (~20 TByte) • And this is only 10% of the goal we set ourselves, not counting the Nijmegen D0 farm yet • So we need at least an order of magnitude more • UTA (50), Lyon (200), Prague (10), BU (64), Nijmegen (50), Lancaster (200), Rio (25),
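The arithmetic on this slide can be verified in a few lines; every input (trigger rate, 3 min/event on an 800 MHz PIII, 2 MB/event, 60% efficiency, 100 CPUs) comes from the slide itself:

```python
# Back-of-envelope check of the MC production numbers above.
TRIGGER_RATE_HZ = 100
SECONDS_PER_YEAR = 1e7
events_per_year = TRIGGER_RATE_HZ * SECONDS_PER_YEAR   # 1e9 events/yr
mc_target = 0.10 * events_per_year                     # 1e8 events/yr to simulate

MINUTES_PER_EVENT = 3      # one QCD event on an 800 MHz PIII
EFFICIENCY = 0.60          # overall farm efficiency
minutes_per_year = 365 * 24 * 60
events_per_cpu = EFFICIENCY * minutes_per_year / MINUTES_PER_EVENT  # ~1e5

farm_events = 100 * events_per_cpu                     # ~1e7 events/yr
farm_tbytes = farm_events * 2e6 / 1e12                 # ~20 TB at 2 MB/event

print(f"{farm_events:.1e} events/yr, {farm_tbytes:.0f} TB/yr")
print(f"fraction of the 10% target: {farm_events / mc_target:.0%}")
```

The 100-CPU farm indeed lands at roughly a tenth of the simulation target, which is why the other sites are listed.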

  3. Example: min. bias • Did a run with 1000 events on all CPUs • Took ~2 min./event • So ~1.5 days for the whole run • Output file size ~575 MByte • We left those files on the nodes • A reason for enough local disk space! • Intend to repeat that “sometimes”
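A quick check of the min. bias numbers; the farm-wide total in the last line is an extrapolation to 100 nodes, not a figure stated on the slide:

```python
# Check of the min. bias test run above.
events = 1000              # events per CPU in the test run
min_per_event = 2          # ~2 minutes per min. bias event
run_days = events * min_per_event / 60 / 24        # ~1.4 days

output_mb_per_node = 575                           # file left on each node
farm_total_gb = 100 * output_mb_per_node / 1000    # ~57 GB across the farm

print(f"run length: {run_days:.1f} days")
print(f"left on local disks: {farm_total_gb:.1f} GB")
```

Roughly 57 GB of output parked on local disks after a single test run is exactly the "reason for enough local disk space" the slide points at.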

  4. Output data
  -rw-r--r-- 1 a03 computer        298 Nov 5 19:25 RunJob_farm_qcdJob308161443.params
  -rw-r--r-- 1 a03 computer 1583995325 Nov 5 10:35 d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
  -rw-r--r-- 1 a03 computer        791 Nov 5 19:25 d0gstar_qcdJob308161443.params
  -rw-r--r-- 1 a03 computer        809 Nov 5 19:25 d0sim_qcdJob308161443.params
  -rw-r--r-- 1 a03 computer   47505408 Nov 3 16:15 gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
  -rw-r--r-- 1 a03 computer       1003 Nov 5 19:25 import_d0g_qcdJob308161443.py
  -rw-r--r-- 1 a03 computer        912 Nov 5 19:25 import_gen_qcdJob308161443.py
  -rw-r--r-- 1 a03 computer       1054 Nov 5 19:26 import_sim_qcdJob308161443.py
  -rw-r--r-- 1 a03 computer        752 Nov 5 19:25 isajet_qcdJob308161443.params
  -rw-r--r-- 1 a03 computer        636 Nov 5 19:25 samglobal_qcdJob308161443.params
  -rw-r--r-- 1 a03 computer  777098777 Nov 5 19:24 sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000
  -rw-r--r-- 1 a03 computer       2132 Nov 5 19:26 summary.conf

  5. Output data translated • 12 files per job for generator + d0gstar + psim: the small ones are isajet_*.params, RunJob_Farm_*.params, d0gstar_*.params, d0sim_*.params, samglobal_*.params, Summary.conf and the import scripts import_gen_*.py, import_d0g_*.py, import_sim_*.py • But of course only 3 big ones: gen_* ~0.047 GByte, d0g_* ~1.5 GByte, sim_* ~0.7 GByte • Total ~2 GByte per day per CPU • On 100 CPUs: total 200 GByte/day!
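The daily volume follows directly from the three big files; the byte counts below are taken from the listing on slide 4 (the slide rounds the per-job total down to ~2 GB):

```python
# The three big output files per job, sizes from the slide-4 listing.
big_files_gb = {
    "gen_*": 47505408 / 1e9,     # ~0.047 GB  (generator output)
    "d0g_*": 1583995325 / 1e9,   # ~1.58 GB   (d0gstar output)
    "sim_*": 777098777 / 1e9,    # ~0.78 GB   (d0sim output)
}

per_job_gb = sum(big_files_gb.values())   # ~2.4 GB per job/CPU per day
per_day_gb = 100 * per_job_gb             # ~200+ GB/day on 100 CPUs

print(f"per job: {per_job_gb:.2f} GB, farm per day: {per_day_gb:.0f} GB")
```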

  6. Automation • mc_runjob (modified) prepares MC jobs (gen+sim+reco+anal), e.g. 300 events per job/CPU, repeated e.g. 500 times • Submits them into the batch system (FBSNG); they run on the nodes • Moves the executable plus some files to the nodes • Copies output to the fileserver after completion, as a separate batch job on the fileserver • Data moves between nodes and server • Submits the files into SAM • SAM does the file transfers to Fermilab and SARA • Runs for a week …
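The loop on this slide can be sketched roughly as below. This is only an illustration of the three-step structure (prepare, copy, store); the command names and options are hypothetical placeholders, not the real mc_runjob, FBSNG, or SAM interfaces:

```python
import subprocess

EVENTS_PER_JOB = 300   # e.g. 300 events per job/CPU (slide example)
N_JOBS = 500           # repeated e.g. 500 times

DRY_RUN = True         # only print the commands; set False on a real farm

def submit(cmd):
    """Run one command, or just print it in a dry run."""
    if DRY_RUN:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)

# demo only two iterations in a dry run to keep the output short
for job in range(2 if DRY_RUN else N_JOBS):
    # 1. prepare the MC job (gen + d0gstar + d0sim) for one node
    submit(["mc_runjob", f"--events={EVENTS_PER_JOB}", f"--job={job}"])
    # 2. a separate batch job copies the output to the fileserver
    submit(["fbs", "submit", f"rcp_job_{job}"])
    # 3. the files are declared to SAM, which ships them to FNAL and SARA
    submit(["fbs", "submit", f"sam_job_{job}"])
```

At 300 events per job and 500 jobs, one pass through the loop produces 150,000 events, which is why a full production cycle "runs for a week".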

  7. [Diagram: MC data flow. An mcc request goes to the farm server, which submits an fbs job in three steps: 1 mcc (production on the ~50 nodes, 40 GB disk each, plus a control node), 2 rcp (copy to the 1.2 TB file server), 3 sam (store). Data goes to the datastores at SARA and FNAL; metadata goes to the SAM DB.]

  8. Network bandwidth • NIKHEF → SURFnet: 1 Gbit • SURFnet: Amsterdam → Chicago 622 Mbit • ESnet: Chicago → Fermilab 155 Mbit ATM • But ftp gives us ~4 Mbit/s • bbftp gives us ~25 Mbit/s • bbftp processes in parallel: ~45 Mbit/s • For 2002: • NIKHEF → SURFnet: 2.5 Gbit • SURFnet: Amsterdam → Chicago 622 Mbit • SURFnet: Amsterdam → Chicago 2.5 Gbit optical • Chicago → Fermilab? More than 155
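The measured rates matter because they determine whether a day of farm output (the ~200 GB/day from slide 5) can be shipped in under a day; a small check:

```python
def transfer_hours(gbytes, mbit_per_s):
    """Hours needed to move `gbytes` at a sustained rate of `mbit_per_s`."""
    return gbytes * 8 * 1000 / mbit_per_s / 3600

DAILY_GB = 200  # one day of farm output
for tool, rate in [("ftp", 4), ("bbftp", 25), ("parallel bbftp", 45)]:
    h = transfer_hours(DAILY_GB, rate)
    print(f"{tool:>14} at {rate:2d} Mbit/s: {h:6.1f} h per day of data")
```

Plain ftp would need well over 100 hours per day of data, so the backlog grows without bound; only the bbftp rates keep up with production.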

  9. [Chart: SURFnet access capacity, 1999–2002, rising from 10 Mbit/s and 100 Mbit/s through 155 Mbit/s, 1 Gbit/s and 2.5 Gbit/s to 10 Gbit/s; internal network capacity grows from SURFnet4 (20 Gbit/s) to SURFnet5 (100 Gbit/s).]

  10. [Diagram: transatlantic (TA) network capacity. European research networks (UK SuperJANET4, NL SURFnet, IT GARR-B, FR Renater) interconnect via GEANT; Geneva → New York at 2.5 Gb; on the US side Abilene, ESnet, STAR-LIGHT, STAR-TAP and MREN (622 Mb) in Chicago.]

  11. Network load last week • Needed for 100 MC CPUs: ~10 Mbit/s (200 GB/day) • Available to Chicago: 622 Mbit/s • Available to FNAL: 155 Mbit/s • Needed next year (double capacity): ~25 Mbit/s • Available to Chicago: 2.5 Gbit/s, a factor 100 more!! • Available to FNAL: ??

  12. Conclusions • Producing a lot of data is easy • Storing a lot of data: less easy, but still easy • Moving a lot of data: even less easy, but still easy • So what is the problem? • Managing a lot of data is difficult → metadata database • The network around Fermilab/CERN is getting tight • Otherwise there is enough bandwidth! • Conclusion: do the easiest thing: don't store or move — recalculate!!
