Grid Computing (Special Topics in Computer Engineering) — Veera Muangsin, 23 January 2004
Outline • High-Performance Computing • Grid Computing • Grid Applications • Grid Architecture • Grid Middleware • Grid Services
World’s Fastest Computers: The Top 5 — mega = 10^6 (million), giga = 10^9 (billion), tera = 10^12 (trillion), peta = 10^15 (quadrillion)
#1 Japan’s Earth Simulator — Specifications: • Total number of processors: 5,120 • Peak performance / processor: 8 Gflops • Total number of nodes: 640 • Peak performance / node: 64 Gflops • Total peak performance: 40 Tflops • Shared memory / node: 16 GB • Total main memory: 10 TB
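As a quick arithmetic check (my own sketch, not part of the original slide), the totals in the specification follow from the per-node and per-processor figures:

```python
# Sanity check of the Earth Simulator specs: totals vs. per-node figures.
nodes = 640
procs_per_node = 8            # 5,120 processors / 640 nodes
gflops_per_proc = 8

gflops_per_node = procs_per_node * gflops_per_proc    # 64 Gflops per node
total_procs = nodes * procs_per_node                  # 5,120 processors
total_tflops = nodes * gflops_per_node / 1000         # ~40 Tflops peak

print(total_procs, gflops_per_node, total_tflops)     # 5120 64 40.96
```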
Earth Simulator does climate modeling — Parallel decomposition: • Grid points: 3840 × 1920 × 96 (I = 3840, J = 1920, K = 96) • FFT / inverse FFT between spectral space and grid space • Domain distributed across 320 processor nodes (PN01 … PN320)
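The decomposition above can be sketched as a transpose-based parallel FFT: each node owns a band of rows, transforms along the dimension that is local, and the data are then redistributed so the other dimension becomes local. A minimal single-process sketch (the node count and toy grid size are my own illustration, not the Earth Simulator’s actual scheme):

```python
import numpy as np

P = 4                      # hypothetical number of processor nodes
I, J = 16, 8               # toy grid (the real one is 3840 x 1920 x 96)

grid = np.random.rand(J, I)

# Phase 1: each node owns a band of J rows, so it can FFT its own
# rows along I independently of the other nodes.
bands = np.array_split(grid, P, axis=0)
after_i = np.vstack([np.fft.fft(b, axis=1) for b in bands])

# Phase 2: redistribute ("transpose") so each node owns full columns,
# then FFT along J.
cols = np.array_split(after_i, P, axis=1)
after_ij = np.hstack([np.fft.fft(c, axis=0) for c in cols])

# The distributed result matches a single-node 2-D FFT.
assert np.allclose(after_ij, np.fft.fft2(grid))
```

The same idea extends to the third (K) dimension; the redistribution step is what makes the Earth Simulator’s fast inter-node network so important.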
Being constructed by IBM • To be completed in 2006 • Expected performance: 1 PetaFLOPS, to be no. 1 in the TOP500 list (as of 2003, the aggregate performance of all TOP500 machines was 528 TFlops) • Applications: molecular dynamics, protein folding, drug–protein interaction (docking)
Clusters — the most common architecture in the TOP500 • 7 of the top 10 • 208 of the 500 systems
#2 LANL’s ASCI Q • 13.88 TFlops • 8,192-processor HP AlphaServer cluster, 1.25 GHz • LANL (Los Alamos National Laboratory) • Analyzes and predicts the performance, safety, and reliability of nuclear weapons
#3 Virginia Tech’s System X • 10.28 TFlops • 1,100-node cluster, Apple G5, dual PowerPC 970 2 GHz, 4 GB memory, 160 GB disk per node (176 TB total), Mac OS X (FreeBSD-based UNIX) • $5.2 million
System X’s Applications • Nanoscale Electronics • Quantum Chemistry • Computational Chemistry/Biochemistry • Computational Fluid Dynamics • Computational Acoustics • Computational Electromagnetics • Wireless Systems Modeling • Large-scale Network Emulation
#4 NCSA’s Tungsten • 9.81 TFlops • 1,450-node cluster, dual-processor Dell PowerEdge 1750, Intel Xeon 3.06 GHz • NCSA (National Center for Supercomputing Applications)
#5 PNNL’s MPP2 • 8.63 TFlops • 980-node cluster, HP Longs Peak, dual Intel Itanium-2 1.5 GHz • PNNL (Pacific Northwest National Laboratory) • Application: Molecular Science
Evaluate AIDS drugs at home • 9,020 users (12 Jan 2004) • AutoDock: predicts how drug candidates might bind to a receptor of an HIV protein
Scientific Applications • Always push computer technology to its limit • Grand Challenge applications • Those applications that cannot be completed with sufficient accuracy and timeliness to be of interest, due to limitations such as speed and memory in current computing systems • Next challenge: large scale collaborative problems
E-Science: a new way to do science • Pre-electronic science • Theorize and/or experiment, in small teams • Post-electronic science • Construct and mine very large databases • Develop computer simulations & analyses • Access specialized devices remotely • Exchange information within distributed multidisciplinary teams
Data Intensive Science: 2000-2015 • Scientific discovery increasingly driven by IT • Computationally intensive analyses • Massive data collections • Data distributed across networks of varying capability • Geographically distributed collaboration • Dominant factor: data growth • 2000 ~0.5 Petabyte • 2005 ~10 Petabytes • 2010 ~100 Petabytes • 2015 ~1000 Petabytes? • Storage density doubles every 12 months • Transforming entire disciplines in physical and biological sciences
Network • Network vs. computer performance • Computer speed doubles every 18 months • Network speed doubles every 9 months • Difference = order of magnitude per 5 years • 1986 to 2000 • Computers: x 500 • Networks: x 340,000 • 2001 to 2010 • Computers: x 60 • Networks: x 4000
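The "order of magnitude per 5 years" figure follows directly from the two doubling periods; a quick back-of-the-envelope check (the helper function is my own illustration):

```python
# If computer speed doubles every 18 months and network speed every
# 9 months, the gap between them grows ~10x every 5 years.

def growth(doubling_months: float, years: float) -> float:
    """Multiplicative speedup after `years` at the given doubling period."""
    return 2 ** (years * 12 / doubling_months)

computer = growth(18, 5)   # ~10x in 5 years
network = growth(9, 5)     # ~100x in 5 years

print(f"computers: {computer:.0f}x, networks: {network:.0f}x, "
      f"gap: {network / computer:.0f}x")
```

This widening gap is one of the core premises of grid computing: moving data to remote computation gets relatively cheaper every year.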
E-Science Infrastructure: software, computers, sensor nets, instruments, colleagues, data archives
Online Access to Scientific Instruments — Advanced Photon Source: real-time collection → wide-area dissemination → desktop & VR clients with shared controls; tomographic reconstruction; archival storage. DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
Data Intensive Physical Sciences • High energy & nuclear physics • Including new experiments at CERN • Astronomy: Digital sky surveys • Time-dependent 3-D systems (simulation, data) • Earth Observation, climate modeling • Geophysics, earthquake modeling • Fluids, aerodynamic design • Pollutant dispersal scenarios
Data Intensive Biology and Medicine • Medical data • X-Ray • Digitizing patient records • X-ray crystallography • Molecular genomics and related disciplines • Human Genome, other genome databases • Proteomics (protein structure, activities, …) • Protein interactions, drug delivery • 3-D Brain scans
What is Grid? Google search (Jan 2004): • “grid computing” — >600,000 hits • “grid computing” AND hype — >20,000 hits (hype = exaggerated publicity)
From Web to Grid • 1989: Tim Berners-Lee invented the web so physicists around the world could share documents • 1999: Grids add computing power, data management, and instruments to the web • E-Science • Commerce is not far behind
The Grid Opportunity:e-Science and e-Business • Physicists worldwide pool resources for peta-op analyses of petabytes of data • Engineers collaborate to design buildings, cars • An insurance company mines data from partner hospitals for fraud detection • An enterprise configures internal & external resources to support e-Business workload
Grid • “We will give you access to some of our computers and instruments if you give us access to some of yours.” • “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
Grid • Grid provides the infrastructure • to dynamically manage: compute resources; data sources (static and live); scientific instruments (wind tunnels, telescopes, microscopes, simulators, etc.) • to build large-scale collaborative problem-solving environments that are cost-effective and secure
Life Sciences: imaging instruments and large databases feed data acquisition → processing, analysis → advanced visualization, over a network of computational resources
Biomedical Applications • Data mining on genomic databases (exponential growth) • Indexing of medical databases (TB/hospital/year) • Collaborative framework for large-scale experiments • Parallel processing for database analysis • Complex 3-D modelling
Digital Radiology on the Grid • 28 petabytes/year for 2000 hospitals • must satisfy privacy laws University of Pennsylvania
Brain Imaging • Biomedical Informatics Research Network [BIRN] Reference set of brains provides essential data for developing therapies for neurological disorders (Multiple Sclerosis, Alzheimer’s disease). • Pre-BIRN: • One lab, small patient base • 4 TB collection • With TeraGrid • Tens of collaborating labs • Larger population sample • 400 TB data collection: more brains, higher resolution • Multiple-scale data integration, analysis
Earth Observations • ESA missions: • about 100 Gbytes of data per day (ERS 1/2) • 500 Gbytes for the next ENVISAT mission
Particle Physics • Simulate and reconstruct complex physics phenomena millions of times
Whole-system Simulations — NASA Information Power Grid: coupling all sub-system simulations
• wing models: lift capabilities, drag capabilities, responsiveness
• stabilizer models
• airframe models: deflection capabilities, responsiveness
• engine models: thrust performance, reverse thrust performance, responsiveness, fuel consumption
• landing gear models: braking performance, steering capabilities, traction, dampening capabilities
• human/crew models: accuracy, perception, stamina, reaction times, SOPs
National Airspace Simulation Environment — NASA Information Power Grid: aircraft, flight paths, airport operations and the environment are combined into a Virtual National Air Space (VNAS)
• 22,000 commercial US flights a day
• Simulation drivers: FAA ops data, weather data, airline schedule data, digital flight data, radar tracks, terrain data, surface data
• Runs: 44,000 wing, 50,000 engine, 66,000 stabilizer, 22,000 airframe impact, 48,000 human crew, 132,000 landing/take-off gear
• Sites: ARC, GRC, LaRC
Global In-flight Engine Diagnostics — Distributed Aircraft Maintenance Environment (Universities of Leeds, Oxford, Sheffield & York): in-flight data → global network (e.g. SITA) → ground station → airline → DS&S Engine Health Center → internet, e-mail, pager → data centre / maintenance centre
Emergency Response Teams • Bring sensors, data, simulations and experts together • Wildfire: predict movement of fire & direct fire-fighters • Also earthquakes, peacekeeping forces, battlefields, … (National Earthquake Simulation Grid; Los Alamos National Laboratory: wildfire)
Grid Computing Today: DISCOM, SinRG, APGrid, IPG, …
Selected Major Grid Projects
• Access Grid — www.mcs.anl.gov/FL/accessgrid (DOE, NSF): create & deploy group collaboration systems using commodity technologies
• BlueGrid — IBM: Grid testbed linking IBM laboratories
• DISCOM — www.cs.sandia.gov/discom (DOE Defense Programs): create an operational Grid providing access to resources at three U.S. DOE weapons laboratories
• DOE Science Grid — sciencegrid.org (DOE Office of Science): create an operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
• Earth System Grid (ESG) — earthsystemgrid.org (DOE Office of Science): delivery and analysis of large climate model datasets for the climate research community
• European Union (EU) DataGrid — eu-datagrid.org (European Union): create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics
Selected Major Grid Projects
• EuroGrid, Grid Interoperability (GRIP) — eurogrid.org (European Union): create technology for remote access to supercomputing resources & simulation codes; in GRIP, integrate with Globus Toolkit™
• Fusion Collaboratory — fusiongrid.org (DOE Office of Science): create a national computational collaboratory for fusion research
• Globus Project™ — globus.org (DARPA, DOE, NSF, NASA, Microsoft): research on Grid technologies; development and support of Globus Toolkit™; application and deployment
• GridLab — gridlab.org (European Union): Grid technologies and applications
• GridPP — gridpp.ac.uk (U.K. eScience): create & apply an operational grid within the U.K. for particle physics research
• Grid Research Integration Development & Support Center — grids-center.org (NSF): integration, deployment, support of the NSF Middleware Infrastructure for research & education
Selected Major Grid Projects
• Grid Application Development Software — hipersoft.rice.edu/grads (NSF): research into program development technologies for Grid applications
• Grid Physics Network — griphyn.org (NSF): technology R&D for data analysis in physics experiments: ATLAS, CMS, LIGO, SDSS
• Information Power Grid — ipg.nasa.gov (NASA): create and apply a production Grid for aerosciences and other NASA missions
• International Virtual Data Grid Laboratory — ivdgl.org (NSF): create an international Data Grid to enable large-scale experimentation on Grid technologies & applications
• Network for Earthquake Engineering Simulation Grid — neesgrid.org (NSF): create and apply a production Grid for earthquake engineering
• Particle Physics Data Grid — ppdg.net (DOE Science): create and apply production Grids for data analysis in high energy and nuclear physics experiments
Selected Major Grid Projects
• TeraGrid — teragrid.org (NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
• UK Grid Support Center — grid-support.ac.uk (U.K. eScience): support center for Grid projects within the U.K.
• Unicore — BMBFT: technologies for remote access to supercomputers
Also many technology R&D projects, e.g., Condor, NetSolve, Ninf, NWS. See also www.gridforum.org
TeraGrid • 13.6 trillion calculations per second • Over 600 trillion bytes of immediately accessible data • 40 gigabit per second network speed