
Presentation Transcript


  1. Dynamically Creating Big Data Centers for the LHC — Frank Würthwein, Professor of Physics, University of California San Diego, September 25th, 2013

  2. Outline • The Science • Software & Computing Challenges • Present Solutions • Future Solutions Frank Wurthwein - ISC Big Data

  3. The Science

  4. The Universe is a strange place! ~67% of energy is “dark energy” — we have no clue what this is. ~29% of matter is “dark matter” — we have some ideas, but no proof of what this is! All of what we know makes up only about 4% of the universe.

  5. To study Dark Matter we need to create it in the laboratory. [Aerial view of the LHC ring between Lake Geneva and Mont Blanc, with the four detectors: ALICE, ATLAS, CMS, LHCb.]

  6. “Big bang” in the laboratory • We gain insight by colliding particles at the highest energies possible to measure: • Production rates • Masses & lifetimes • Decay rates • From this we derive the “spectroscopy” as well as the “dynamics” of elementary particles. • Progress is made by going to higher energies and brighter beams.

  7. Explore Nature over 15 orders of magnitude — perfect agreement between theory & experiment. Dark Matter expected somewhere below this line.

  8. And for the Sci-Fi Buffs … Imagine our 3D world to be confined to a 3D surface in a 4D universe. Imagine this surface to be curved such that the 4th D distance is short for locations light years away in 3D. Imagine space travel by tunneling through the 4th D. The LHC is searching for evidence of a 4th dimension of space.

  9. Recap so far … • The beams cross in the ATLAS and CMS detectors at a rate of 20 MHz • Each crossing contains ~10 collisions • We are looking for rare events that are expected to occur in roughly 1 in 10,000,000,000,000 (10¹³) collisions, or less.

  10. Software & Computing Challenges

  11. The CMS Experiment

  12. The CMS Experiment • 80 million electronic channels × 4 bytes × 40 MHz ≈ 10 Petabytes/sec of information → × 1/1000 zero-suppression → × 1/100,000 online event filtering → ~100-1000 Megabytes/sec raw data to tape • 1 to 10 Petabytes of raw data per year written to tape, not counting simulations • 2000 Scientists (1200 Ph.D. in physics) • ~180 Institutions • ~40 countries • 12,500 tons, 21m long, 16m diameter
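The reduction chain above can be verified with simple arithmetic (a sketch; note the slide rounds the ~12.8 PB/s raw figure down to ~10 PB/s):

```python
# Reproduce the slide's data-reduction arithmetic (illustrative only).
channels = 80e6            # electronic channels
bytes_per_channel = 4
crossing_rate_hz = 40e6    # 40 MHz

raw_bytes_per_sec = channels * bytes_per_channel * crossing_rate_hz
after_zero_suppression = raw_bytes_per_sec / 1_000      # x 1/1000
after_online_filter = after_zero_suppression / 100_000  # x 1/100,000

print(f"raw:  {raw_bytes_per_sec / 1e15:.1f} PB/s")   # 12.8 PB/s
print(f"tape: {after_online_filter / 1e6:.0f} MB/s")  # 128 MB/s
```

128 MB/s sits inside the slide's 100-1000 MB/s band; the eight-orders-of-magnitude reduction happens before any file is ever written.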

  13. Active Scientists in CMS ~1/4 of the collaboration, scientists and engineers, contributed to the common source code of ~3.6M C++ SLOC. 5-40% of the scientific members are actively doing large scale data analysis in any given week.

  14. Evolution of LHC Science Program [Chart: event rate written to tape across successive LHC runs — 150Hz, 1000Hz, 10000Hz.]

  15. The Challenge How do we organize the processing of 10s to 1000s of Petabytes of data by a globally distributed community of scientists, and do so with manageable “change costs” for the next 20 years? Guiding Principles for Solutions: Choose technical solutions that allow computing resources to be as distributed as human resources. Support distributed ownership and control, within a global single sign-on security context. Design for heterogeneity and adaptability.

  16. Present Solutions

  17. Federation of National Infrastructures. In the U.S.A.: Open Science Grid

  18. Among the top 500 supercomputers there are only two that are bigger when measured by power consumption.

  19. Tier-3 Centers • Locally controlled resources not pledged to any of the 4 collaborations. • Large clusters at major research Universities that are time shared. • Small clusters inside departments and individual research groups. • Requires global sign-on system to be open for dynamically adding resources. • Easy to support APIs • Easy to work around unsupported APIs

  20. Me -- My friends -- The grid/cloud • “Me”: thin client, O(10⁴) users — domain-science specific • “My friends”: thick VO middleware & support, O(10¹⁻²) VOs • The anonymous Grid or Cloud: thin “Grid API”, O(10²⁻³) sites — common to all sciences and industry

  21. “My Friends” Services • Dynamic Resource provisioning • Workload management • schedule resource, establish runtime environment, execute workload, handle results, clean up • Data distribution and access • Input, output, and relevant metadata • File catalogue
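The workload-management steps listed above can be sketched as a single job lifecycle. This is a minimal illustration; the function and directory names are ours, not any real middleware API:

```python
# Sketch of one workload's lifecycle: set up a runtime environment,
# execute, collect results, clean up. Illustrative only.
import shutil
import subprocess
import tempfile

def run_workload(command):
    workdir = tempfile.mkdtemp(prefix="job_")   # establish runtime environment
    try:
        result = subprocess.run(command, cwd=workdir,
                                capture_output=True, text=True)  # execute workload
        return result.returncode, result.stdout                  # handle results
    finally:
        shutil.rmtree(workdir)                  # clean up

rc, out = run_workload(["echo", "payload done"])
print(rc, out.strip())
```

A real pilot additionally reports results back to the VO's queue and fetches the next workload; provisioning the resource in the first place is the job of the layer above.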

  22. Optimize Data Structure for Partial Reads

  23. [Histogram: fraction of each file that is read, per file. Average 20-35%, median 3-7%, depending on the type of file; overflow bin at 100%.] For the vast majority of files, less than 20% of the file is read.
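One way to read these numbers: analyses touch only a few attributes per event, so a column-wise layout lets a job read a small slice of the file. A toy calculation — the column names and per-event sizes below are invented for illustration, not the actual CMS data format:

```python
# Toy model: per-event bytes for each stored attribute ("column").
columns = {"pt": 8, "eta": 8, "phi": 8, "raw_hits": 400}  # bytes, invented
n_events = 1_000

file_size = n_events * sum(columns.values())
needed = ["pt", "eta"]                      # analysis only needs kinematics
bytes_read = n_events * sum(columns[c] for c in needed)

print(f"fraction of file read: {bytes_read / file_size:.1%}")  # 3.8%
```

With invented sizes like these, reading just the kinematic columns lands near the slide's 3-7% median, which is the point of optimizing the data structure for partial reads.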

  24. Future Solutions

  25. From present to future • Initially, we operated a largely static system: • Data was placed quasi-statically before it could be analyzed. • Analysis centers have contractual agreements with the collaboration. • All reconstruction is done at centers with custodial archives. • Increasingly, we have too much data to afford this. • Dynamic data placement: data is placed at T2s based on job backlog in global queues. • WAN access: “Any Data, Anytime, Anywhere” — jobs are started on the same continent as the data instead of on the same cluster attached to the data. • Dynamic creation of data processing centers: Tier-1 hardware bought to satisfy steady-state needs instead of peak needs. Primary processing as data comes off the detector => steady state; annual reprocessing of accumulated data => peak needs.
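A hedged sketch of the dynamic-placement idea above: replicate a dataset to whichever Tier-2 has the deepest backlog of jobs waiting for it. The one-line policy and the numbers are illustrative, not the actual CMS placement algorithm:

```python
# Illustrative policy: place a dataset replica at the site whose
# global-queue backlog for that dataset is largest.
def choose_placement_site(backlog_by_site):
    return max(backlog_by_site, key=backlog_by_site.get)

backlog = {"T2_US_UCSD": 1200, "T2_DE_DESY": 300, "T2_UK_London": 50}
print(choose_placement_site(backlog))  # T2_US_UCSD
```

The essential shift is that placement reacts to demand observed in the global queues rather than being negotiated up front.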

  26. Any Data, Anytime, Anywhere Global redirection system to unify all CMS data into one globally accessible namespace. This is made possible by paying careful attention to the IO layer, to avoid inefficiencies due to IO-related latencies.
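The single global namespace can be pictured as a catalogue that redirects a logical file name to a site hosting a copy. This toy class is our illustration only; it compresses what is in reality a hierarchical federation of redirectors:

```python
# Toy global redirector: one namespace, many hosting sites (illustrative).
class Redirector:
    def __init__(self):
        self.catalog = {}   # logical file name -> list of hosting sites

    def register(self, site, logical_name):
        self.catalog.setdefault(logical_name, []).append(site)

    def locate(self, logical_name):
        sites = self.catalog.get(logical_name)
        if not sites:
            raise FileNotFoundError(logical_name)
        return sites[0]     # a real system chooses by proximity/load

r = Redirector()
r.register("T2_US_UCSD", "/store/data/run1/events.root")
print(r.locate("/store/data/run1/events.root"))  # T2_US_UCSD
```

A job can then open the same logical name from anywhere; the careful IO-layer work mentioned above is what keeps the resulting wide-area reads efficient.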

  27. Vision going forward Implemented this vision for the first time in Spring 2013, using the Gordon Supercomputer at SDSC.


  29. CMS “My Friends” Stack • Job environment: • CMSSW release environment — NFS exported from Gordon IO nodes; future: CernVM-FS via Squid caches (J. Blomer et al.; 2012 J. Phys.: Conf. Ser. 396 052013) • Security Context (CA certs, CRLs) via OSG worker node client • CMS calibration data access via FroNTier (B. Blumenfeld et al.; 2008 J. Phys.: Conf. Ser. 119 072007) — Squid caches installed on Gordon IO nodes • Data and Job handling: • glideinWMS (I. Sfiligoi et al.; doi:10.1109/CSIE.2009.950) — implements “late binding” provisioning of CPU and job scheduling; submits pilots to Gordon via BOSCO (GSI-SSH) • WMAgent to manage CMS workloads • PhEDEx data transfer management — uses SRM and gridftp

  30. CMS “My Friends” Stack This is clearly mighty complex !!! So let’s focus only on the parts that are specific to incorporating Gordon as a dynamic data processing center.

  31. Items in red were deployed/modified to incorporate Gordon: minor mod of PhEDEx config file; BOSCO; deploy Squid; export CMSSW & WN client.

  32. Gordon Results • Work completed in February/March 2013 as a result of a “lunch conversation” between SDSC & US-CMS management • Dynamically responding to an opportunity • 400 Million RAW events processed • 125 TB in and ~150 TB out • ~2 Million core hours of processing • Extremely useful for both science results as well as proof of principle in software & computing.

  33. Summary & Conclusions • Guided by the principles: • Support distributed ownership and control in a global single sign-on security context. • Design for heterogeneity and adaptability. • The LHC experiments very successfully developed and implemented a set of new concepts to deal with BigData.

  34. Outlook • The LHC experiments had to largely invent an island of BigData technologies with limited interactions with industry and other domain sciences. • Is it worth building bridges to other islands? • IO stack and HDF5? • MapReduce? • What else? • Is there a mainland emerging that is not just another island?
