LHC Scale Physics in 2008: Grids, Networks and Petabytes Shawn McKee (smckee@umich.edu) May 18th, 2005 Pan-American Advanced Studies Institute (PASI) Mendoza, Argentina
Acknowledgements • Much of this talk was constructed from various sources. I would like to acknowledge: • Rob Gardner (U Chicago) • Harvey Newman (Caltech) • Paul Avery (U Florida) • Ian Foster (U Chicago/ANL) • Alan Wilson (Michigan) • The Globus Team • The ATLAS Collaboration • Trillium Shawn McKee - PASI - Mendoza, Argentina
Outline • Large Datasets in High Energy Physics • Overview of High Energy Physics and the LHC • The ATLAS Experiment’s Data Model • Managing LHC Scale Data • Grids and Networks Computing Model • Current Planning, Tools, Middleware and Projects • LHC Scale Physics in 2008 • Grids and Networks at Michigan • Virtual Data • The Future of Data Intensive Science Shawn McKee - PASI - Mendoza, Argentina
Introduction to High-Energy Physics • Before I can talk in detail about large datasets I want to provide a quick context for you to understand where all this data comes from. • High Energy physics explores the very small constituents of nature by colliding “high energy” particles and reconstructing the zoo of particles which result. • One of the most intriguing issues in High Energy physics we are trying to address is the origin of mass… Shawn McKee - PASI - Mendoza, Argentina
Physics with ATLAS: The Higgs Particle • The Riddle of Mass • One of the main goals of the ATLAS program is to discover and study the Higgs particle. The Higgs particle is of critical importance in particle theories and is directly related to the concept of particle mass and therefore to all masses. Shawn McKee - PASI - Mendoza, Argentina
High-Energy: From an Electron-Volt to Trillions of Electron-Volts • Energies are often expressed in units of "electron-volts". An electron-volt (eV) is the energy acquired by an electron (or any particle with the same charge) when it is accelerated through a potential difference of 1 volt. • Typical energies involved in atomic processes (such as chemical reactions or the emission of light) are of order a few eV. That is why batteries typically produce about 1 volt, and have to be connected in series to get much larger potentials. • Energies in nuclear processes (like nuclear fission or radioactive decay) are typically of order one million electron-volts (1 MeV). • The highest energy accelerator now operating (at Fermilab) accelerates protons to 1 million million electron volts (1 TeV = 10¹² eV). • The Large Hadron Collider (LHC) at CERN will accelerate each of two counter-rotating beams of protons to 7 TeV per proton. Shawn McKee - PASI - Mendoza, Argentina
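As a quick sanity check on these scales, here is a small Python sketch (using the approximate conversion 1 eV ≈ 1.602 × 10⁻¹⁹ J) that expresses the quoted energies in joules:

```python
# Rough unit conversions for the energy scales quoted above
# (a back-of-the-envelope sketch; values are approximate).
EV_TO_JOULE = 1.602e-19  # one electron-volt in joules

def to_joules(value, unit):
    """Convert an energy given in eV, keV, MeV, GeV or TeV to joules."""
    scale = {"eV": 1.0, "keV": 1e3, "MeV": 1e6, "GeV": 1e9, "TeV": 1e12}[unit]
    return value * scale * EV_TO_JOULE

print(to_joules(1, "eV"))   # ~1.6e-19 J: typical atomic/chemical energy scale
print(to_joules(1, "MeV"))  # ~1.6e-13 J: typical nuclear energy scale
print(to_joules(7, "TeV"))  # ~1.1e-6 J: kinetic energy of a single LHC proton
```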
What is an Event? • ATLAS will measure the collisions of 7 TeV protons. Each proton-proton collision or single-particle decay is called an "event". • In the ATLAS detector there will be about a billion collision events per second, a data rate equivalent to twenty simultaneous telephone conversations by every person on the earth. Shawn McKee - PASI - Mendoza, Argentina
How Many Collisions? • If two bunches of protons meet head on, the number of collisions can range from zero upwards. How often are there actually collisions? • For a fixed bunch size, this depends on how many protons there are in each bunch, and how large each proton is. • A proton can be roughly thought of as being about 10⁻¹⁵ meter in radius. If you had bunches 10⁻⁶ meters in radius, and only, say, 10 protons in each bunch, the chance of even one proton-proton collision when two bunches met would be extremely small. • If each bunch had a billion-billion (10¹⁸) protons so that its entire cross section were just filled with protons, every proton from one bunch would collide with one from the other bunch, and you would have a billion-billion collisions per bunch crossing. • The LHC situation is in between these two extremes, with a few collisions (up to 20) per bunch crossing, which requires about a billion protons in each bunch. As you will see, this leads to a lot of data to sift through. Shawn McKee - PASI - Mendoza, Argentina
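The reasoning above can be made concrete with a toy geometric estimate. The sketch below uses the rough proton and bunch radii quoted in the slide; a real calculation uses the measured proton-proton cross-section and the beam optics, so this is only an order-of-magnitude illustration:

```python
import math

# Toy geometric estimate of proton-proton collisions per bunch crossing,
# following the reasoning above (order-of-magnitude only).
R_PROTON = 1e-15   # m, rough proton radius quoted above
R_BUNCH = 1e-6     # m, transverse bunch radius assumed above

def expected_collisions(n_protons_per_bunch):
    sigma = math.pi * R_PROTON**2   # effective target area of one proton
    area = math.pi * R_BUNCH**2     # transverse area of the bunch
    # Each of the N protons in one bunch "sees" N potential targets in the other.
    return n_protons_per_bunch**2 * sigma / area

print(expected_collisions(10))    # ~1e-16: essentially no collisions
print(expected_collisions(1e9))   # ~1: of order a few collisions per crossing
```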
The Large Hadron Collider (LHC) CERN, Geneva: 2007 Start • 27 km tunnel in Switzerland & France • First beams: April 2007; physics runs: from Summer 2007 • ATLAS: pp, general purpose; heavy ions (HI) • CMS (with TOTEM): pp, general purpose; heavy ions (HI) • ALICE: heavy ions (HI) • LHCb: B-physics Shawn McKee - PASI - Mendoza, Argentina
Data Comparison: LHC vs Prior Experiments • [Chart: Level-1 trigger rate (Hz) vs. event size (bytes) for LHCb, ATLAS, CMS, HERA-B, KLOE, Tevatron Run II, CDF/D0, H1, ZEUS, ALICE, NA49, UA1 and LEP] • The LHC experiments sit at the demanding corner of the plot: high Level-1 trigger rates (up to ~1 MHz), high channel counts, high bandwidth (~500 Gbit/s) and large data archives (petabytes) • Hans Hoffman, DOE/NSF Review, Nov 00 Shawn McKee - PASI - Mendoza, Argentina
The ATLAS Experiment Shawn McKee - PASI - Mendoza, Argentina
ATLAS • A Toroidal LHC ApparatuS • Collaboration • 150 institutes • 1850 physicists • Detector • Inner tracker • Calorimeter • Magnet • Muon • United States ATLAS • 29 universities, 3 national labs • 20% of ATLAS Shawn McKee - PASI - Mendoza, Argentina
Data Flow from ATLAS • Collision rate: 40 MHz (~PB/sec of raw detector data) • Level 1 trigger (special hardware): 75 kHz (75 GB/sec) • Level 2 trigger (embedded processors): 5 kHz (5 GB/sec) • Level 3 trigger (PC farm): 200 Hz (100-400 MB/sec) to data recording & offline analysis • ATLAS total: ~10 PB/year (simulated + raw + summary) Shawn McKee - PASI - Mendoza, Argentina
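A back-of-the-envelope check on these rates, assuming an illustrative raw event size of ~1.6 MB and ~10⁷ seconds of data taking per year (both assumptions for this sketch, not figures from the slide):

```python
# Rough check on the offline data volume implied by the 200 Hz level-3 output.
EVENT_RATE_HZ = 200
EVENT_SIZE_MB = 1.6        # assumed raw event size
LIVE_SECONDS_PER_YEAR = 1e7  # assumed annual live time

rate_mb_s = EVENT_RATE_HZ * EVENT_SIZE_MB              # ~320 MB/s, within the 100-400 MB/s band
raw_pb_per_year = rate_mb_s * LIVE_SECONDS_PER_YEAR / 1e9  # ~3 PB/year of raw data
print(rate_mb_s, raw_pb_per_year)
# Adding simulated and summary data brings the total toward the ~10 PB/year quoted above.
```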
LHC Timeline for Service Challenges We are here … not much time to get things ready! Shawn McKee - PASI - Mendoza, Argentina
The Data Challenge for LHC • There is a very real challenge in managing tens of petabytes of data yearly for a globally distributed collaboration of 2000 physicists! • While much of the interesting data we seek is small in volume, we must understand and sort through a huge volume of relatively uninteresting "events" to discover new physics. • The primary (only!) plan for LHC is to utilize Grid middleware and high performance networks to harness the complete global resources of our collaborations to manage this data analysis challenge Shawn McKee - PASI - Mendoza, Argentina
Managing LHC Scale Data Grids and Networks Computing Model
The Problem: Petabytes… Shawn McKee - PASI - Mendoza, Argentina
The Solution Shawn McKee - PASI - Mendoza, Argentina
What is "The Grid"? • There are many answers and interpretations • The term was originally coined in the mid-1990s (in analogy with the power grid) and can be described as follows: "The grid provides flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources (virtual organizations: VOs)" Shawn McKee - PASI - Mendoza, Argentina
Grid Perspectives • User's viewpoint: • A virtual computer which minimizes time to completion for my application while transparently managing access to inputs and resources • Programmer's viewpoint: • A toolkit of applications and APIs which provide transparent access to distributed resources • Administrator's viewpoint: • An environment to monitor, manage and secure access to geographically distributed computers, storage and networks. Shawn McKee - PASI - Mendoza, Argentina
Data Grids for High Energy Physics (ATLAS version of Harvey Newman's original) • Online System → Offline Farm / CERN Computer Center (~25 TIPS), Tier 0+1: ~PByte/sec produced online, ~100-400 MBytes/sec recorded • CERN/outside resource ratio ~1:4; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~1:2:2 • Tier 0+1 (CERN, HPSS mass storage) → Tier 1 national centers (France, Italy, UK, BNL Center; each with HPSS) over 10-40 Gbits/sec • Tier 1 → Tier 2 regional centers at ~10+ Gbps • Tier 2 → Tier 3 institutes (~0.25 TIPS each) at 100-10000 Mbits/sec; physicists work on analysis "channels", with each institute having ~10 physicists working on one or more channels • Tier 4: physics data caches and workstations Shawn McKee - PASI - Mendoza, Argentina
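To get a feel for what these wide-area links mean in practice, here is a rough, illustrative calculation of how long a petabyte-scale replication from Tier 0 to a Tier 1 would take at the quoted link speeds (ignoring protocol overhead and competing traffic):

```python
# Illustrative Tier0 -> Tier1 replication times over the WAN links in the diagram.
def transfer_days(dataset_tb, link_gbps):
    seconds = dataset_tb * 8e12 / (link_gbps * 1e9)  # TB -> bits, divided by bits/s
    return seconds / 86400

print(transfer_days(1000, 10))   # 1 PB over 10 Gb/s -> ~9 days
print(transfer_days(1000, 40))   # 1 PB over 40 Gb/s -> ~2.3 days
```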
Managing LHC Scale Data Current Planning, Tools, Middleware and Testbeds
Grids and Networks: Why Now? • Moore's law improvements in computing produce highly functional end systems • The Internet and burgeoning wired and wireless networks provide near-universal connectivity • Changing modes of working and problem solving emphasize teamwork and computation • Network exponentials produce dramatic changes in geometry and geography Shawn McKee - PASI - Mendoza, Argentina
Living in an Exponential World (1): Computing & Sensors • Moore's Law: transistor count doubles every ~18 months • [Figure: example compute-driven simulations, magnetohydrodynamics and star formation] Shawn McKee - PASI - Mendoza, Argentina
Living in an Exponential World: (2) Storage • Storage density doubles every ~12 months • This led to a dramatic growth in HEP online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes) • 2000 ~0.5 petabyte • 2005 ~10 petabytes • 2010 ~100 petabytes • 2015 ~1000 petabytes • It's transforming entire disciplines in physical and, increasingly, biological sciences; humanities next? Shawn McKee - PASI - Mendoza, Argentina
Network Exponentials • Network vs. computer performance • Computer speed doubles every 18 months • Network speed doubles every 9 months • Difference = order of magnitude per 5 years • 1986 to 2000 • Computers: x 500 • Networks: x 340,000 • 2001 to 2010 • Computers: x 60 • Networks: x 4000 • Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner, Caufield and Perkins. Shawn McKee - PASI - Mendoza, Argentina
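A simple compounding check shows how these doubling times produce the multipliers quoted above:

```python
# Compounding check on the doubling times quoted above.
def growth(doubling_months, years):
    return 2 ** (12 * years / doubling_months)

print(growth(18, 14), growth(9, 14))  # 1986-2000: ~650x vs ~430,000x
                                      # (compare the x500 / x340,000 quoted)
print(growth(18, 9), growth(9, 9))    # 2001-2010: ~64x vs ~4096x
                                      # (compare the x60 / x4000 quoted)
```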
The Network • As can be seen in the previous transparency, it can be argued that the evolution of the network has been the primary motivator for the Grid. • Ubiquitous, dependable worldwide networks have opened up the possibility of tying together geographically distributed resources • The success of the WWW for sharing information has spawned a push for a system to share resources • The network has become the "virtual bus" of a virtual computer. • More on this later… Shawn McKee - PASI - Mendoza, Argentina
What Is Needed for LHC-HEP? • We require a number of high level capabilities to do High-Energy Physics: • Data Processing: All data needs to be reconstructed, first into fundamental components like tracks and energy deposition and then into "physics" objects like electrons, muons, hadrons, neutrinos, etc. • Raw -> Reconstructed -> Summarized • Simulation follows the same path, and is critical to understanding our detectors and the underlying physics • Data Discovery: We must be able to locate events of interest • Data Movement: We must be able to move discovered data as needed for analysis or reprocessing • Data Analysis: We must be able to apply our analysis to the data to determine whether it contains evidence of new physics • Collaborative Tools: Vital to maintain our global collaborations • Policy and Resource Management: Allow resource owners to specify conditions under which they will share and allow them to manage those resources as they evolve Shawn McKee - PASI - Mendoza, Argentina
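To make the Raw -> Reconstructed -> Summarized chain above concrete, here is a purely illustrative Python sketch; the names and structures are invented for this example and bear no relation to the actual ATLAS software framework:

```python
# Illustrative shape of the HEP processing chain (toy example only).
def reconstruct(raw_event):
    # raw detector hits -> tracks and energy deposits -> "physics" objects
    return {"electrons": [], "muons": [], "jets": [], "missing_et": 0.0}

def summarize(reco_event):
    # keep only the compact quantities most analyses need
    return {"n_leptons": len(reco_event["electrons"]) + len(reco_event["muons"]),
            "missing_et": reco_event["missing_et"]}

def analyze(summary):
    # an analysis "channel" applies its selection to the summarized data
    return summary["n_leptons"] >= 2

raw_events = [object() for _ in range(5)]              # stand-in for raw data
summaries = [summarize(reconstruct(e)) for e in raw_events]
selected = [s for s in summaries if analyze(s)]
```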
Monitoring Example on OSG-ITB Shawn McKee - PASI - Mendoza, Argentina
Managing LHC Scale Data HEP Related Grid/Network Projects
The Evolution of Data Movement • The recent history of data movement capabilities exemplifies the evolution of network capacity. • NSFNet started with a 56 Kbit/s modem link as the US network backbone • Current networks are so fast that end systems are only able to fully drive them when storage clusters are used at each end Shawn McKee - PASI - Mendoza, Argentina
Site Architecture, NSFNET era: bandwidth in terms of burst data transfer and user wait time for a 1024 MB dataset • 4 MB/s (across the room, e.g. a local VAX): 256 s (4 min) • 1 MB/s: 1024 s (17 min) • .007 MB/s (across the country over the NSFNET 56 Kb/s Fuzzball backbone): 150,000 s (41 hrs) Shawn McKee - PASI - Mendoza, Argentina
2002 Cluster-WAN Architecture: moving a 1 TB dataset • Across the room over n x GbE (small n): 0.5 GB/s → 2000 s (33 min) • Across the country over OC-12 into an OC-48 cloud: 78 MB/s → 13k s (3.6 h) Shawn McKee - PASI - Mendoza, Argentina
Distributed Terascale Cluster: 10 TB moved between two clusters, each with a big fast interconnect, linked by OC-192 and n x GbE (large n) • 5 GB/s (wire-speed limit… not yet achieved): 2000 s (33 min) Shawn McKee - PASI - Mendoza, Argentina
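Two quick calculations behind these figures: how many gigabit links a 5 GB/s wire-speed target implies, and the transfer times for the dataset sizes in the previous two slides:

```python
# Arithmetic behind the cluster-to-cluster transfer figures above.
def links_needed(target_gbytes_per_s, link_gbits_per_s=1.0):
    return target_gbytes_per_s * 8 / link_gbits_per_s

def transfer_seconds(dataset_tb, rate_gbytes_per_s):
    return dataset_tb * 1000 / rate_gbytes_per_s

print(links_needed(5))              # ~40 GbE links: "n x GbE (large n)"
print(transfer_seconds(10, 5))      # 10 TB at 5 GB/s   -> 2000 s (~33 min)
print(transfer_seconds(1, 0.078))   # 1 TB at 78 MB/s   -> ~12,800 s (~3.6 h)
```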
UltraLight Goal (Near Future) • A more modest goal in terms of bandwidth achieved is being targeted by the UltraLight collaboration. • Build, tune and deploy moderately priced servers capable of delivering 1 GB/s between 2 such servers over the WAN • Provides the ability to utilize the full capability of lambdas, as available, without requiring 10s-100s of nodes at each end. • Easier to manage, coordinate and deploy a smaller number of performant servers than a much larger number of less capable ones • Easier to scale up as needed to match the available bandwidth Shawn McKee - PASI - Mendoza, Argentina
What is UltraLight? • UltraLight is a program to explore the integration of cutting-edge network technology with the grid computing and data infrastructure of HEP/Astronomy • The program intends to explore network configurations from common shared infrastructure (current IP networks) through dedicated point-to-point optical paths. • A critical aspect of UltraLight is its integration with two driving application domains in support of their national and international eScience collaborations: LHC-HEP and eVLBI-Astronomy • The Collaboration includes: • Caltech • Florida Int. Univ. • MIT • Univ. of Florida • Univ. of Michigan • UC Riverside • BNL • FNAL • SLAC • UCAID/Internet2 Shawn McKee - PASI - Mendoza, Argentina
UltraLight Network: PHASE I • Implementation via “sharing” with HOPI/NLR • MIT not yet “optically” coupled Shawn McKee - PASI - Mendoza, Argentina
UltraLight Network: PHASE III By 2008 • Move into production – Terabyte datasets in 10 minutes • Optical switching fully enabled amongst primary sites • Integrated international infrastructure Shawn McKee - PASI - Mendoza, Argentina
ATLAS Discovery Potential for SM Higgs Boson • Good sensitivity over the full mass range from ~100 GeV to ~1 TeV • For most of the mass range at least two channels available • Detector performance is crucial: b-tag, leptons, γ, E resolution, γ/jet separation, ... Shawn McKee - PASI - Mendoza, Argentina
ATLAS Shawn McKee - PASI - Mendoza, Argentina
Data Intensive Computing and Grids • The term "Data Grid" is often used • Unfortunate, as it implies a distinct infrastructure, which it isn't; but easy to say • Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, … • Security, resource mgt, info services, etc. • Important to exploit commonalities, as it is very unlikely that multiple infrastructures can be maintained • Fortunately this seems easy to do! Shawn McKee - PASI - Mendoza, Argentina
A Model Architecture for Data Grids • The application presents an attribute specification to the Metadata Catalog, which resolves it to a logical collection and logical file names • The Replica Catalog maps each logical file name to multiple physical locations (Replica Location 1, 2, 3: disk caches, disk arrays, tape libraries) • Replica Selection picks the "best" replica using performance information and predictions from the information service (MDS) and the Network Weather Service (NWS) • The selected replica is retrieved via GridFTP (control channel plus data channel) into a local disk cache Shawn McKee - PASI - Mendoza, Argentina
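The flow through this architecture can be summarized in a short sketch. The catalog and monitoring interfaces used here (metadata_catalog, replica_catalog, predicted_bandwidth) are hypothetical stand-ins for illustration, not the real Globus, MDS, or NWS APIs:

```python
# Minimal sketch of the replica-selection flow in the architecture above,
# with hypothetical catalog/monitoring objects passed in by the caller.
def select_replica(attribute_query, metadata_catalog, replica_catalog, predicted_bandwidth):
    # 1. Attribute specification -> logical collection / logical file names
    logical_files = metadata_catalog.lookup(attribute_query)
    # 2. Logical file names -> candidate physical replica locations
    candidates = [loc for lfn in logical_files for loc in replica_catalog.locations(lfn)]
    # 3. Pick the replica with the best predicted network performance
    best = max(candidates, key=predicted_bandwidth)
    # 4. The file would then be fetched from `best` via GridFTP
    return best
```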
Examples of Desired Data Grid Functionality • High-speed, reliable access to remote data • Automated discovery of "best" copy of data • Manage replication to improve performance • Co-schedule compute, storage, network • "Transparency" wrt delivered performance • Enforce access control on data • Allow representation of "global" resource allocation policies • Not there yet! Back to the physics… Shawn McKee - PASI - Mendoza, Argentina
Needles in LARGE Haystacks • When protons collide, some events are "interesting" and may tell us about exciting new particles or forces, whereas many others are "ordinary" collisions (often called "background"). The ratio of their relative rates is about 1 interesting event for 10 million background events. One of our key needs is to separate the interesting events from the ordinary ones. • Furthermore, the information must be sufficiently detailed and precise to allow eventual recognition of certain "events" that may occur at a rate of only one in a million-million collisions (10⁻¹²), a very small fraction of the recorded events, which are themselves a very small fraction of all events. • I will outline the steps ATLAS takes in getting to these interesting particles Shawn McKee - PASI - Mendoza, Argentina
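Making the selection arithmetic explicit (illustrative numbers taken from the rates quoted above):

```python
# The "needle in a haystack" arithmetic above, made explicit.
collision_rate = 1e9          # ~a billion collisions per second in ATLAS
interesting_fraction = 1e-7   # ~1 "interesting" event per 10 million
rare_fraction = 1e-12         # processes occurring once per million-million collisions

interesting_per_second = collision_rate * interesting_fraction   # ~100 per second
seconds_per_rare_event = 1 / (collision_rate * rare_fraction)    # ~1000 s per rare event
print(interesting_per_second, seconds_per_rare_event)
```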