
SkimSlimService


Presentation Transcript


  1. SkimSlimService enabling new ways Ilija Vukotic, Rob Gardner, Lincoln Bryant Computation and Enrico Fermi Institutes, University of Chicago Software & Computing Workshop March 13, 2013

  2. Overview • A new paradigm for the HEP community: data as a service. • Efficiently use the available resources (over-pledge capacity, OSG, ANALY queues, EC2). • Free physicists from dealing with big data. • Free IT professionals from dealing with physicists; let them deal with what they do best: big data. Ilija Vukotic <ivukotic@uchicago.edu>

  3. Problems of the Current Analysis Model Unsustainable in the long run (higher luminosity, no faster CPUs). Physicists get no feedback on the resources they use. Long running times. Only a very small percentage of people want to, or know how to, optimize their code. IT people are not happy when someone submits 10k jobs that run at 1% efficiency for days and produce 10k files of 100 MB each. Huge load on the people doing DPD production; frequent errors, slow turnaround. Nobody wants to care about dataset sizes, registrations, DDM transfers, approvals. This is the moment to make changes. Ilija Vukotic <ivukotic@uchicago.edu>

  4. (R)evolution of ATLAS data formats • Original plan (6 y. ago): ESD < 500 kB/ev, 1k branches; AOD < 100 kB/ev, 500 branches; Athena used for everything. • 4 y. ago: ESD < 1500 kB/ev, 8k branches; AOD < 500 kB/ev, 4k branches; Athena + ARA. • 3 y. ago: ESD < 1800 kB/ev, 10k branches; AOD < 1000 kB/ev, 7k branches; D3PD < 20 kB/ev, 500-7k branches; Athena + ARA + ROOT. • Today: ESD < 1800 kB/ev; AOD < 1000 kB/ev; D3PD < 200 kB/ev; TAG ?!; Athena + ARA + ROOT + Mana + RootCore + Event… • Proposals for the future: ESD < 1800 kB/ev; AOD < 1000 kB/ev; Godzilla D3PDs, structured D3PDs; Athena + ARA + ROOT + Mana + RootCore + Event… Ilija Vukotic <ivukotic@uchicago.edu>

  5. Problems with ATLAS data formats Ilija Vukotic <ivukotic@uchicago.edu>

  6. What does a physicist want? Full freedom to do analysis, in whatever language they want. Not to be forced to use complex frameworks with hundreds of libraries, 20-minute compilations, etc. Not to be forced to think about computing farms, queues, data transfers, job efficiency, … To get results in no time. Ilija Vukotic <ivukotic@uchicago.edu>

  7. Idea Let a small number of highly experienced physicists, together with IT staff, handle the big data. They can do it efficiently. Move the majority of physicists away from 100 TB-scale data to ~100 GB data. That is sufficiently small for transport; you can analyze it anywhere, even on your laptop. However inefficient your code is, you won't consume too many resources and will still get results back in a reasonable time. Ilija Vukotic <ivukotic@uchicago.edu>

  8. How would it work Use FAX to access all the data without the overhead of staging, using optimally situated replicas (a possible optimization: production D3PDs pre-placed at just several sites, maybe even just one). Physicists request a skim/slim through a web service; a few variables could even be added in flight. The produced datasets are registered in the name of the requester and delivered to the requested site. As all of the data is available in FAX, one can skim not only production D3PDs but any flat ntuple, and even do multi-pass SkimSlims. A sketch of FAX access follows below. Ilija Vukotic <ivukotic@uchicago.edu>
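
A minimal PyROOT sketch of what "access without staging" looks like from the user side, assuming a valid ATLAS grid proxy. The redirector host, file path, and tree name are placeholders, not values from the presentation.

```python
# Sketch: open a D3PD directly through a FAX redirector with PyROOT.
# Hostname, path and tree name are placeholders; "physics" is the usual D3PD tree.
import ROOT

redirector = "root://fax-redirector.example:1094//"        # placeholder redirector
path = "atlas/some_dataset/NTUP_SMWZ.placeholder.root"     # placeholder file
f = ROOT.TFile.Open(redirector + path)                     # read remotely, no staging
tree = f.Get("physics")
print("entries:", tree.GetEntries())
```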

  9. How would it work Timely results are paramount! Several levels of service depending on the size of the input and output data and on importance. Example: • <1 TB (input+output): 2-hour service. This one is essential; only in this case will people skim/slim to only the variables they need, without thinking "what if I forget something I'll need". • 1-10 TB: 6 hours. • 10-100 TB: 24 hours. • Extra-fast delivery: at EC2, but it comes with a price tag. Ilija Vukotic <ivukotic@uchicago.edu>
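
As a toy illustration only (not the actual service logic), the tiering above could be expressed as:

```python
# Toy mapping of (input + output) request size in TB to the promised turnaround in hours.
def turnaround_hours(total_tb):
    if total_tb < 1:
        return 2      # essential tier: encourages slimming to only the needed variables
    if total_tb <= 10:
        return 6
    if total_tb <= 100:
        return 24
    return None       # larger requests: no standard tier (e.g. negotiate extra-fast EC2 delivery)

print(turnaround_hours(0.5))   # -> 2
```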

  10. Would it work? A couple of hundred dedicated cores, freed up from all the personal, inefficient skims/slims people run with prun. Highly optimized code (sketched below). As we know which branches (variables) people are using, we know what is useless in the original D3PDs, so we can produce them much smaller. If a bug is found in D3PD production, no new global redistribution is needed; some problems can even be fixed in place without a new production. If we find it useful we can split/merge/reorganize D3PDs without anyone noticing. We could later even go for a completely different underlying big-data format: Godzilla D3PDs, merged AOD/D3PD, Hadoop! Ilija Vukotic <ivukotic@uchicago.edu>
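
The real jobs use a modified filter-and-merge.py; purely for illustration, the core slimming idea in PyROOT might look like the sketch below. File, tree, and branch names are examples, and the event-level skim cut is omitted.

```python
# Sketch of branch slimming: keep only the requested branches and copy the tree.
import ROOT

fin = ROOT.TFile.Open("input_D3PD.root")              # placeholder input file
tree = fin.Get("physics")                             # usual D3PD tree name

tree.SetBranchStatus("*", 0)                          # deactivate all branches
for name in ("el_pt", "el_eta", "mu_pt", "mu_eta"):   # example subset (~300 kept in real use)
    tree.SetBranchStatus(name, 1)                     # keep only what the user asked for

fout = ROOT.TFile("slimmed_D3PD.root", "RECREATE")
slim = tree.CloneTree(0)                              # empty clone with active branches only
for i in range(int(tree.GetEntries())):
    tree.GetEntry(i)
    # an event-level skim selection would go here
    slim.Fill()
slim.Write()
fout.Close()
```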

  11. SkimSlimService • Handmade server [1]: receives web queries, collects info on datasets, files, trees, branches. • Web site at CERN: gets requests, shows their status. • Oracle DB at CERN: stores requests, splits them into tasks, serves as the backend for the web site. • Executor at UC3 [1]: gets tasks from the DB, creates and submits Condor SkimSlim jobs [2], makes and registers the resulting datasets [3]. A rough sketch of the executor loop follows below. • [1] We have no dedicated resources for this; I used UC3, but any queue that has CVMFS will suffice. • [2] A modified version of filter-and-merge.py is used. • [3] Currently registered under my name, as I don't have a production role. Ilija Vukotic <ivukotic@uchicago.edu>
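
A rough, hypothetical sketch of what the executor's main loop might look like: poll the Oracle DB for pending tasks and submit one Condor job per task. The table and column names, the DSN, and the submit-file contents are invented for illustration; this is not the real service code.

```python
# Hypothetical executor loop: Oracle DB -> Condor submission.
import subprocess
import time

import cx_Oracle  # assumed Oracle client binding

def fetch_pending_tasks(conn):
    cur = conn.cursor()
    cur.execute("SELECT task_id, input_ds, branches FROM sss_tasks WHERE status = 'PENDING'")
    return cur.fetchall()

def submit_condor_job(task_id, input_ds, branches):
    # write a minimal Condor submit file that wraps the skim/slim script
    submit_file = "task_%s.sub" % task_id
    with open(submit_file, "w") as sub:
        sub.write("executable = skimslim.sh\n")
        sub.write("arguments  = %s %s\n" % (input_ds, branches))
        sub.write("queue\n")
    subprocess.check_call(["condor_submit", submit_file])

conn = cx_Oracle.connect("user/password@cern-oracle-placeholder")  # placeholder DSN
while True:
    for task_id, input_ds, branches in fetch_pending_tasks(conn):
        submit_condor_job(task_id, input_ds, branches)
        # a real executor would also mark the task as submitted in the DB
    time.sleep(60)  # poll the request database once a minute
```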

  12. http://ivukotic.web.cern.ch/ivukotic/SSS/index.asp Ilija Vukotic <ivukotic@uchicago.edu>

  13. Test run results Used the datasets and the skim/slim code of our largest user: a worst-case scenario. All of the 2012 SMWZ data and MC: 185 TB -> 10 TB (300 branches). 24 datasets (~3.5%) missing in FAX. Ilija Vukotic <ivukotic@uchicago.edu>

  14. Test run results CPU efficiency: with local data ~75%; with remote data between 10% and 50% (6.25 MB/s gives 100% efficiency). All of SMWZ requires 8600 CPU hours; it can be done in 2 hours (roughly 4300 cores) by pooling unused resources. We could have one service in the EU and one in the US to avoid trans-Atlantic traffic. It is easy to deploy the service on anything that mounts CVMFS (UC3, UCT3, UCT2, OSG, EC2). On EC2, assuming small instances, ~$500; with micro instances and spot pricing, ~$100. But result delivery is ~$1k (10 TB * $0.12/GB). A back-of-the-envelope check follows below. Ilija Vukotic <ivukotic@uchicago.edu>
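
A quick back-of-the-envelope check of the numbers quoted above:

```python
# Cores needed for the 2-hour turnaround and the EC2 result-delivery cost.
cpu_hours = 8600                  # all of SMWZ
turnaround_hours = 2
cores_needed = cpu_hours / turnaround_hours
print("cores needed for a 2-hour turnaround:", cores_needed)   # 4300

output_tb = 10
egress_usd_per_gb = 0.12          # EC2 data-transfer-out price assumed in the slide
delivery_cost = output_tb * 1000 * egress_usd_per_gb
print("EC2 result-delivery cost: $%.0f" % delivery_cost)       # ~$1200, i.e. ~$1k
```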

  15. Conclusion Produced a fully functional system you may use now. To be done • Polish it • Market it • Push it politically (essential) Ilija Vukotic <ivukotic@uchicago.edu>

  16. Reserve Ilija Vukotic <ivukotic@uchicago.edu>

  17. What is FAX? • A number of ATLAS sites have made their storage accessible from outside using the xrootd protocol [1]. • It has a mechanism that gets you a file if it exists anywhere in the federation: a redirector routes your request to an endpoint that holds the file. • All kinds of sites: xrootd, dCache, DPM, Lustre, GPFS. • Read-only. • You need a grid proxy to use it; a usage sketch follows below. • Instructions: https://twiki.cern.ch/twiki/bin/view/Atlas/UsingFAXforEndUsers • [1] CMS has a very similar system, which they call AAA. Ilija Vukotic <ivukotic@uchicago.edu>
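
For a concrete feel, fetching a single file out of the federation might look like the sketch below. The redirector host and file path are placeholders; consult the twiki above for the real endpoints.

```python
# Sketch: get a grid proxy, then copy one file out of the federation with xrdcp.
import subprocess

subprocess.check_call(["voms-proxy-init", "-voms", "atlas"])   # FAX is read-only and proxy-protected

redirector = "root://fax-redirector.example:1094//"            # placeholder redirector
path = "atlas/some_dataset/NTUP_SMWZ.placeholder.root"         # placeholder file path
subprocess.check_call(["xrdcp", redirector + path, "/tmp/local_copy.root"])
```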

  18. FAX today • We want all the T1s and T2s included. • Adding new sites weekly. • Currently 31: AGLT2 BNL-ATLAS BU_ATLAS_TIER2 CERN-PROD DESY-HH INFN-FRASCATI INFN-NAPOLI-ATLAS INFN-ROMA1 JINR-LCG2 LRZ-LMU MPPMU MWT2 OU_OCHEP_SWT2 PRAGUELCG2 RAL-LCG2 RU-PROTVINO-IHEP SWT2_CPB UKI-LT2-QMUL UKI-NORTHGRID-LANCS-HEP UKI-NORTHGRID-LIV-HEP UKI-NORTHGRID-MAN-HEP UKI-SCOTGRID-ECDF UKI-SCOTGRID-GLASGOW UKI-SOUTHGRID-CAM-HEP UKI-SOUTHGRID-OX-HEP WT2 WUPPERTALPROD GRIF-LAL GRIF-IRFU GRIF-LPNHE IN2P3-LAPP Ilija Vukotic <ivukotic@uchicago.edu>

  19. Does it work? • YES! For the most part, at least. But there is a lot of redundancy in the system: we have ~2.5 copies of the popular datasets. Ilija Vukotic <ivukotic@uchicago.edu>

  20. What is it good for? Ilija Vukotic <ivukotic@uchicago.edu>

  21. How does it work? • Quite a complex system • A lot of people involved • A lot of development • Takes time to deploy • Takes time to work out the kinks Ilija Vukotic <ivukotic@uchicago.edu>

  22. What can I do today? • Access data on T2 disks (localgroupdisk, userdisk, …). • If a file is not there, the job won't fail; the data will come from elsewhere. • I can run jobs at UCT2/UCT3 and access data anywhere in FAX. • Use frun when: you have data processed at 10 sites all over the world, you want to merge them, or you want to submit jobs where the queues are short. Ilija Vukotic <ivukotic@uchicago.edu>

  23. Full Dress Rehearsal A week of stress-testing all of the FAX endpoints. While we have continuous monitoring of standard user accesses (ROOT, xrdcp), to really stress the system one has to submit jobs to the grid. We submitted realistic jobs both manually and automatically. Had more problems with the tests than with FAX: • Late distribution of the test datasets to the endpoints (TB-sized datasets). • The high load due to the winter conferences did not help. • Jobs running on a grid node are an entirely different game because of the limited proxy they use. Found and addressed a number of issues: • New VOMS libraries developed. • Settings corrected at several sites. • New pilot version. Conclusion: we broke nothing (storage, LFCs, links, servers, monitoring). As soon as all the observed problems are fixed, we'll hit harder. Ilija Vukotic <ivukotic@uchicago.edu>

  24. FAX: remaining to be done Near future: • Further expansion: next in line are the French and Spanish clouds. • Improving the robustness of all the elements of the system. • Improving documentation, giving tutorials, user support. Months: • Move to Rucio. • Optimization: making the network smart so it provides the fastest transfers. • Integration with other network services. Ilija Vukotic <ivukotic@uchicago.edu>

  25. Say NO to IE, Firefox, Chrome! Foogle.com, from the inventors of the WWW! A new internet search engine! Terminal based! • Simple to use: • Learn a few simple things (shell scripts, pbs/condor macros, python, root and c++, LaTeX, …) • Write a few hundred pages of code • Process the crawler data and rewrite it in a new way. Move it. • Rewrite the original format to a new, different one. • Rewrite again. Move it. • Rewrite again. Move it. • Rewrite again. Move it. • Code to find the page • Compile your page to ps/pdf • Show! The analogous ATLAS chain: • RAW -> ESD • ESD -> AOD • AOD -> D3PD • D3PD -> slimmed D3PD • slimmed D3PD -> ntuple for final analysis • Final analysis Ilija Vukotic <ivukotic@uchicago.edu>
