Massive Data Transfers
George McLaughlin, Mark Prior
AARNet
“Big Science” projects driving networks
• Large Hadron Collider
    • Coming on-stream in 2007
    • Particle collisions generating terabytes/second of “raw” data at a single, central, well-connected site
    • Need to transfer data to global “tier 1” sites. A tier 1 site must have a 10 Gbps path to CERN
    • Tier 1 sites need to ensure gigabit capacity to the Tier 2 sites they serve
• Square Kilometre Array
    • Coming on-stream in 2010?
    • Greater data generator than LHC
    • Up to 125 sites at remote locations; data need to be brought together for correlation
    • Can’t determine “noise” prior to correlation
    • Many logistic issues to be addressed
Billion dollar globally funded projects
Massive data transfer needs
Scientists and Network Engineers coming together
• HEP community and R&E network community have figured out mechanisms for interaction – probably because HEP is pushing network boundaries
• e.g. the ICFA workshops on HEP, Grid and the Global Digital Divide bring together scientists, network engineers and decision makers – and achieve results
• http://agenda.cern.ch/List.php
What’s been achieved so far
• A new generation of real-time Grid systems is emerging to support worldwide data analysis by the physics community
• Leading role of HEP in developing new systems and paradigms for data intensive science
• Transformed view and theoretical understanding of TCP as an efficient, scalable protocol with a wide field of use
• Efficient standalone and shared use of 10 Gbps paths of virtually unlimited length; progress towards 100 Gbps networking
• Emergence of a new generation of “hybrid” packet- and circuit-switched networks
LHC data (simplified)
Per experiment (CMS, LHCb, ATLAS, ALICE):
• 40 million collisions per second
• After filtering, 100 collisions of interest per second
• A Megabyte of digitised information for each collision = recording rate of 100 Megabytes/sec
• 1 billion collisions recorded = 1 Petabyte/year
Scale:
1 Megabyte (1 MB)               A digital photo
1 Gigabyte (1 GB) = 1000 MB     A DVD movie
1 Terabyte (1 TB) = 1000 GB     World annual book production
1 Petabyte (1 PB) = 1000 TB     10% of the annual production by LHC experiments
1 Exabyte (1 EB) = 1000 PB      World annual information production
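The per-experiment figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide and decimal units (1 MB = 10^6 bytes); the variable and function names are illustrative. It also derives the running time per year that the "1 billion collisions" figure implies, which the slide does not state explicitly.

```python
# Decimal units, as used on the slide (1 GB = 1000 MB, and so on).
MB = 10**6
PB = 10**15

collisions_per_second = 100      # collisions of interest after filtering
bytes_per_collision = 1 * MB     # "a Megabyte of digitised information"
collisions_per_year = 10**9      # "1 billion collisions recorded"

# Recording rate in bytes per second.
recording_rate = collisions_per_second * bytes_per_collision

# Data volume recorded per year, in bytes.
yearly_volume = collisions_per_year * bytes_per_collision

# Implied seconds of data taking per year (derived, not stated on the slide).
live_seconds = collisions_per_year / collisions_per_second


def summary():
    """Return the derived figures in the units used on the slide."""
    return {
        "recording_rate_MB_per_s": recording_rate / MB,
        "yearly_volume_PB": yearly_volume / PB,
        "live_seconds_per_year": live_seconds,
        "live_days_per_year": live_seconds / 86400,
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value:.2f}")
```

The numbers are consistent with each other: 100 MB/s sustained over roughly ten million seconds of running gives one petabyte.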
LHC Computing Hierarchy (diagram)
• Experiment to Online System: ~PByte/sec
• Online System to Tier 0+1 (CERN Center; PBs of disk; tape robot): ~100-1500 MBytes/sec
• Tier 0+1 to Tier 1 (FNAL Center, IN2P3 Center, INFN Center, RAL Center): ~2.5-10 Gbps
• Tier 1 to Tier 2 (Tier2 Centers): 2.5-10 Gbps
• Tier 2 to Tier 3 (Institutes; physics data cache): ~2.5-10 Gbps
• Tier 3 to Tier 4 (Workstations): 0.1 to 10 Gbps
• CERN/Outside resource ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1
• Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later
Lightpaths for massive data transfers
• From CANARIE: a small number of users with large data transfer needs can use more bandwidth than all other users
Why?
• Cees de Laat classifies network users into 3 broad groups:
    1. Lightweight users: browsing, mailing, home use. They need full Internet routing, one to many.
    2. Business applications: multicast, streaming, VPNs, mostly LAN. They need VPN services and full Internet routing, several to several + uplink.
    3. Scientific applications: distributed data processing, all sorts of grids. They need very fat pipes, limited multiple Virtual Organizations, few to few, peer to peer.
• Type 3 users:
    • High Energy Physics
    • Astronomers, eVLBI
    • High Definition multimedia over IP
    • Massive data transfers from experiments running 24x7
What is the GLIF?
• Global Lambda Integrated Facility - www.glif.is
• International virtual organization that supports persistent data-intensive scientific research and middleware development
• Provides the ability to create dedicated international point-to-point Gigabit Ethernet circuits for “fixed term” experiments
Huygens Space Probe – a practical example
• Very Long Baseline Interferometry (VLBI) is a technique where widely separated radio telescopes observe the same region of the sky simultaneously to generate images of cosmic radio sources
• Cassini spacecraft left Earth in October 1997 to travel to Saturn
• On Christmas Day 2004, the Huygens probe separated from Cassini
• Started its descent through the dense atmosphere of Titan on 14 Jan 2005
• Using VLBI, 17 telescopes in Australia, China, Japan and the US were able to accurately position the probe to within a kilometre (Titan is ~1.5 billion kilometres from Earth)
• Need to transfer terabytes of data between Australia and the Netherlands
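To put "within a kilometre" at ~1.5 billion kilometres in perspective, the positional accuracy can be expressed as an angle on the sky. This is a minimal sketch using the small-angle approximation and the two figures from the slide; the function name and the choice of milliarcseconds as the output unit are illustrative.

```python
import math

# Conversion factor: radians to milliarcseconds (1 degree = 3600 arcseconds).
RAD_TO_MAS = math.degrees(1) * 3600 * 1000


def angular_size_mas(extent_km, distance_km):
    """Angle subtended by extent_km seen from distance_km, in milliarcseconds.

    Uses the small-angle approximation (angle in radians = extent / distance),
    which is valid here because the extent is tiny compared to the distance.
    """
    return (extent_km / distance_km) * RAD_TO_MAS


if __name__ == "__main__":
    # Figures from the slide: 1 km accuracy at ~1.5 billion km.
    angle = angular_size_mas(1.0, 1.5e9)
    print(f"Positional accuracy: about {angle:.3f} milliarcseconds")
```

The result is a small fraction of a milliarcsecond, which is why telescopes spread across several continents had to be combined.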
AARNet - CSIRO ATNF contribution
• Created “dedicated” circuit
• The data from two of the Australian telescopes (Parkes [The Dish] & Mopra) was transferred via light plane to CSIRO Marsfield (Sydney)
• CeNTIE based fibre from CSIRO Marsfield to AARNet3 GigaPOP
• SXTransPORT 10G to Seattle
• “Lightpath” to Joint Institute for VLBI in Europe (JIVE) across CA*net4 and SURFnet optical infrastructure
But... (although time from concept to undertaking the scientific experiment was only 3 weeks)
• 9 organisations in 4 countries involved in “making it happen”
• Required extensive human-human interaction (mainly emails... lots of them)
• Although a 1 Gbps path was available, maximum throughput was around 400 Mbps
• Issues with protocols, stack tuning, disk-to-disk transfer, firewalls, different formats, etc.
• Currently scientists and engineers need to test thoroughly before important experiments; not yet “turn up and use”
• Ultimate goal is for the control plane issues to be transparent to the end user, who simply presses the “make it happen” icon
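One common reason a long path delivers less than its line rate is TCP window sizing: a sender can have at most one window of data in flight per round trip. The sketch below illustrates that relationship and the effect of throughput on transfer time. The 300 ms round-trip time and the 64 KiB window are assumptions chosen for illustration and are not figures from the experiment; the slide does not say which of the listed issues actually limited throughput.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return rate_bps * rtt_s / 8


def window_limited_bps(window_bytes, rtt_s):
    """Upper bound on single-stream TCP throughput for a given window size."""
    return window_bytes * 8 / rtt_s


def transfer_hours(volume_bytes, rate_bps):
    """Hours needed to move volume_bytes at a sustained rate_bps."""
    return volume_bytes * 8 / rate_bps / 3600


# ASSUMPTION: round-trip time between Australia and the Netherlands.
RTT_S = 0.3
# ASSUMPTION: an untuned 64 KiB TCP window (no window scaling).
DEFAULT_WINDOW = 64 * 1024

LINE_RATE = 1e9      # 1 Gbps path
ACHIEVED = 400e6     # 400 Mbps, taken here as the achieved rate
ONE_TB = 10**12      # decimal terabyte

if __name__ == "__main__":
    print(f"Window needed to fill 1 Gbps: {bdp_bytes(LINE_RATE, RTT_S) / 1e6:.1f} MB")
    print(f"64 KiB window limit: {window_limited_bps(DEFAULT_WINDOW, RTT_S) / 1e6:.2f} Mbps")
    print(f"1 TB at 1 Gbps:   {transfer_hours(ONE_TB, LINE_RATE):.2f} h")
    print(f"1 TB at 400 Mbps: {transfer_hours(ONE_TB, ACHIEVED):.2f} h")
```

Under these assumptions the path needs tens of megabytes in flight to run at full rate, which is why stack tuning is usually needed before long-distance transfers.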
EXPReS and Square Kilometre Array
• SKA is a bigger data generator than LHC
• But in a remote location
• Australia is one of the countries bidding for SKA: significant infrastructure challenges
• Also, the EU Commission funded the EXPReS project to link 16 radio telescopes around the world at gigabit speeds
In Conclusion
• Scientists and network engineers working together can exploit the new opportunities that high capacity networking opens up for “big science”
• Need to solve issues associated with scalability, control plane, ease of use
• QUESTIONS?