
UTA MC Production Farm & Grid Computing Activities

Explore UTA's MC farming activities, control software, expansion plans, and grid computing efforts. Learn about reliable error recovery, SAM interfacing, job packaging, and future upgrades.


Presentation Transcript


  1. UTA MC Production Farm & Grid Computing Activities
Jae Yu, UT Arlington, DØRACE Workshop, Feb. 12, 2002
• UTA DØ MC Farm
• MCFARM job control and packaging software
• What has been happening?
• Conclusion

  2. UTA DØ Monte Carlo Farm
• UTA operates two Linux MC farms, HEP and CSE:
– HEP farm: 6×566 MHz and 36×866 MHz processors, 3 file servers (250 GB), one job server, 8 mm tape drive
– CSE farm: 10×866 MHz processors, 1 file server (20 GB), 1 job server
• Exploring the option of adding a third farm (ACS, 36×866 MHz)
• Control software (job submission, load balancing, archiving, bookkeeping, job execution control, etc.) developed entirely at UTA by Drew Meyer
• Scalable: started with 7 processors, at 52 at present
http://wwwhep.uta.edu/~mcfarm/mcfarm/main.html

  3. MCFARM – UTA Farm Control Software
• MCFARM is a specialized batch system for: Pythia, Isajet, D0g, D0sim, D0reco, recoanalyze
• Can be adapted for other sites and experiments with relatively minor changes
• Reliable error recovery and checkpoint system: it knows how to handle and recover from typical error conditions (sketched after this slide)
• Robust: even if several nodes crash, production can continue
• Interfaced to SAM and a bookkeeping package; easily exports production status to a WWW page
http://www-hep.uta.edu/~mcfarm/mcfarm/main.html
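The checkpoint-and-recovery idea can be illustrated with a minimal Python sketch. MCFARM's actual implementation is not shown in the slides; the checkpoint file name, stage handling, and retry limit below are assumptions for illustration only.

```python
import json
import os
import subprocess

MAX_RETRIES = 3  # illustrative retry limit, not MCFARM's actual setting

def run_stage(job_dir, stage, command):
    """Run one production stage, skipping it if a checkpoint records it as done."""
    ckpt = os.path.join(job_dir, "checkpoint.json")  # hypothetical checkpoint file
    done = json.load(open(ckpt)) if os.path.exists(ckpt) else []
    if stage in done:
        return  # stage finished in an earlier attempt: recovery means skipping it
    for _ in range(MAX_RETRIES):
        if subprocess.call(command) == 0:  # stage executable succeeded
            done.append(stage)
            with open(ckpt, "w") as f:
                json.dump(done, f)  # persist progress so a crashed job can resume here
            return
    raise RuntimeError(f"stage {stage} failed {MAX_RETRIES} times; job goes to the error queue")
```

Persisting the list of completed stages after each success is what lets production continue after a node crash: the job is simply resubmitted and re-runs only the unfinished stages.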

  4. UTA cluster of Linux farms and current expansion plans
[Diagram: the HEP farm at UTA (1 supervisor, 3 file servers, 21 workers, tape drive; 25 dual 866 MHz nodes) and the CSE farm at UTA (1 supervisor, 1 file server, 10 workers; 12 866 MHz nodes), a planned ACS farm at UTA (32 866 MHz nodes), a UTA analysis server (300 GB; 8 mm tape planned), and the UTA www server, connected by bb_ftp to SAM mass storage in FNAL.]

  5. • Main server (Job Manager): can read and write to all other nodes; contains executables and the job archive
• Execution node (the worker): mounts its home directory on the main server; can read and write to the file server disk
• File server: mounts /home on the main server; its disk stores min-bias and generator files and is readable and writable by everybody
• HEP and CSE farms share the same layout, differing only by the number of nodes involved and by the export software
• The flexible layout allows for a simple expansion process
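A worker in this layout depends on its NFS cross-mounts being in place before it can run jobs. A minimal Python sketch of a startup check; the mount points below are hypothetical, since the slides do not give MCFARM's actual paths.

```python
import os
import sys

# Hypothetical mount points; the real MCFARM paths are not given in the slides.
EXPECTED_MOUNTS = [
    "/home",        # worker home directories, served by the main server
    "/fileserver",  # file server disk holding min-bias and generator files
]

def check_worker_mounts():
    """Refuse to start a worker whose cross-mounts are missing."""
    missing = [m for m in EXPECTED_MOUNTS if not os.path.ismount(m)]
    if missing:
        sys.exit(f"worker unusable, missing mounts: {missing}")
```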

  6. DØ Monte Carlo Production Chain
[Diagram: Generator job (Pythia, Isajet, …) → DØgstar (DØ GEANT) → DØsim (detector response, merging in underlying events prepared in advance) → DØreco (reconstruction) → RecoA (root tuple), with inputs drawn from and outputs shipped to SAM storage in FNAL.]
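Because the chain is a straight pipeline, it can be sketched as an ordered list of stages in Python. The executable and file names below are placeholders; the actual DØ commands and options are not part of the presentation.

```python
import subprocess

# Placeholder commands for the five-stage chain; real DØ executables,
# options, and file formats are not shown in the slides.
CHAIN = [
    ("generator", ["run_generator", "events.gen"]),  # Pythia/Isajet event generation
    ("d0gstar",   ["d0gstar", "events.gen"]),        # DØ GEANT detector simulation
    ("d0sim",     ["d0sim", "events.d0g"]),          # detector response + underlying events
    ("d0reco",    ["d0reco", "events.sim"]),         # event reconstruction
    ("recoA",     ["recoanalyze", "events.reco"]),   # root-tuple production
]

def run_chain():
    """Run the stages in order; each stage consumes the previous stage's output."""
    for name, cmd in CHAIN:
        if subprocess.call(cmd) != 0:
            raise RuntimeError(f"stage {name} failed")
```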

  7. UTA MC farm software daemons and their control
[Diagram: a root daemon oversees the distribute, execute, gather, and monitor daemons, the bookkeeper, and the lock manager; the daemons work on the cache disk and job archive, exchange data with SAM on a remote machine, and publish status to the WWW.]
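One way to picture the root daemon's role is as a supervisor loop that keeps the other daemons alive. This Python sketch uses stub daemon bodies; nothing here is MCFARM's actual code.

```python
import time
from multiprocessing import Process

# Stub daemon bodies; the real daemons' logic is not shown in the slides.
def distribute(): time.sleep(3600)   # assign queued jobs to idle workers
def execute():    time.sleep(3600)   # run job stages on worker nodes
def gather():     time.sleep(3600)   # collect finished output to the cache disk
def monitor():    time.sleep(3600)   # publish farm status to the WWW page
def bookkeep():   time.sleep(3600)   # record job history in the archive

DAEMONS = {"distribute": distribute, "execute": execute,
           "gather": gather, "monitor": monitor, "bookkeeper": bookkeep}

def root_daemon():
    """Start every daemon and restart any that dies."""
    procs = {name: Process(target=fn) for name, fn in DAEMONS.items()}
    for p in procs.values():
        p.start()
    while True:
        for name, p in procs.items():
            if not p.is_alive():                     # a daemon crashed
                procs[name] = Process(target=DAEMONS[name])
                procs[name].start()                  # replace it
        time.sleep(10)
```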

  8. Job Life Cycle
[Diagram: the distributer, executer, and gatherer daemons move jobs through the distribute, execute, and gatherer queues and finally into the cache, SAM, and the archive; failed jobs land in the error queue.]
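The life cycle amounts to a small state machine over the four queues. A minimal Python sketch; the state names are invented, since the slide shows only the queues themselves.

```python
from collections import deque

# The four queues from the diagram; the job state names are invented for the sketch.
distribute_q, execute_q, gather_q, error_q = deque(), deque(), deque(), deque()

def advance(job, ok=True):
    """Move a job one step along its life cycle, or park it in the error queue."""
    if not ok:
        error_q.append(job)           # failed jobs await error recovery
    elif job["state"] == "queued":
        job["state"] = "executing"
        execute_q.append(job)         # distributer daemon handed the job to a worker
    elif job["state"] == "executing":
        job["state"] = "gathering"
        gather_q.append(job)          # executer daemon finished all stages
    elif job["state"] == "gathering":
        job["state"] = "done"         # gatherer daemon ships output to cache, SAM, archive
```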

  9. Mcp10 production (Oct 2001-now)
[Plots: jobs done, reconstructed events in SAM, and recoA files in SAM.]

  10. What’s been happening for Grid?
• Investigating network bandwidth capacity at UTA
– Conducting tests using normal FTP and bbftp (a timing sketch follows this slide)
– The UTA farm will be put on a gigabit bandwidth link
• Would like to leverage our extensive experience with job packaging and control
– Would like to interface farm control to more generic Grid tools
– A design document for such a higher-level interface has been submitted to the DØGrid group for review
• Expand to include the ACS farm
• Exploit the SAM station setup and exercise remote reconstruction
– Proposed to the displaced vertex group to reconstruct their special data set → more complex than originally anticipated due to DB transport
• Upgrade the HEP farm server
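A bandwidth comparison of this kind reduces to timing the same transfer under each tool. A minimal Python sketch; the commented-out command lines are hypothetical and would need to be checked against the ftp and bbftp documentation before use.

```python
import subprocess
import time

def measure_mbps(cmd, nbytes):
    """Run one transfer command and report the achieved bandwidth in Mbit/s."""
    t0 = time.time()
    subprocess.check_call(cmd)            # raises if the transfer fails
    return nbytes * 8 / (time.time() - t0) / 1e6

# Hypothetical invocations; verify the options against the actual tools' manuals.
# ftp_mbps   = measure_mbps(["my_ftp_transfer_script"], 2_000_000_000)
# bbftp_mbps = measure_mbps(["bbftp", "-e", "put big.file", "remote.host"], 2_000_000_000)
```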

  11. Conclusions
• The UTA farm has been very successful
• The internally developed UTA MCFARM software is solid and robust
• MC production is very efficient
• We plan to use our farms for data reprocessing, not only MC production
• We would like to leverage our extensive experience in running an MC production farm
• We believe we can contribute significantly to higher-level user interfaces and job packaging
