Grid Computing
• Basic concepts of GRID • Background of GRID technology • Key elements of GRID technology • Current status of GRID development • GRID technology in the HEP field
Basic concepts of GRID • GRID technology refers to a mechanism for integrating and sharing geographically distributed physical and logical resources (including CPUs of all kinds, storage systems, I/O devices, communication systems, files, databases, programs, and so on), combining them into a coherent whole that can jointly carry out the required tasks. Its main applications include distributed computing, high-performance computing, collaborative engineering, and compute-intensive and data-intensive scientific computing.
Background of GRID technology • Distributed computing • High-performance computing • Large-scale resource sharing • Collaborative work • Compute-intensive and data-intensive scientific computing
Basic requirements of a GRID computing environment • Dynamic adaptability • Security • Heterogeneity • Scalability
Key elements of GRID technology: basic services • Communication services • Information services • Security and authentication • Naming services • Monitoring • Resource management and scheduling • Resource trading mechanisms • Programming tools • Graphical user interfaces
Key elements of GRID technology: layered protocol architecture
Grid protocol architecture (top to bottom): Application, Collective, Resource, Connectivity, Fabric
Internet protocol architecture (top to bottom): Application, Transport, Internet, Link
Key elements of GRID technology: structural components • Grid nodes • Middleware • Development environment and tools layer • Application layer
Comparison of GRID technology with related technologies • WWW • ASP or SSP (IDC) • Enterprise Computing Systems (CORBA) • Internet and P2P Computing
Current status of GRID development • Because GRID computing is seen as an advanced stage in the evolution of the Internet, it has attracted strong attention from governments and organizations around the world, and many forums, testbeds, and research projects have already been launched.
Current status of GRID development • European DataGrid • Grid Physics Network (GriPhyN) • Network for Earthquake Engineering and Simulation (NEESgrid) • US Department of Energy (DOE): DOE Science Grid • US National Aeronautics and Space Administration (NASA): Information Power Grid • US National Science Foundation (NSF): National Computational Science Alliance (NCSA) National Technology Grid • German Federal Ministry for Education and Research (BMBF): UNICORE • Globus • Legion • Condor • SinRG • EcoGrid
GRID technology in the HEP field High-energy physics has always been at the forefront of demand for computing technology, and research on GRID technology is no exception. • The Particle Physics Data Grid (PPDG) • High Energy Physics Data Grid (HEPDG) • MONARC
PPDG --- Participants • California Institute of Technology • Argonne National Laboratory • Berkeley Laboratory • Brookhaven National Laboratory • Fermi National Laboratory • San Diego Supercomputer Center • Stanford Linear Accelerator Center • University of Wisconsin
PPDG --- Main goals • Delivery of an infrastructure for the widely distributed analysis of particle-physics data at multi-petabyte scales by thousands of physicists • Acceleration of the development of network and middleware infrastructure aimed broadly at data-intensive collaborative science
PPDG --- Technical roadmap Step 1 (1999): • Delivery of a "High-Speed Site-to-Site File Replication Service" • Delivery of a "Multi-Site Cached File Access Service"
PPDG --- Technical roadmap Step 2 (2000-2001): • Development of a generalized file-mover framework (supporting QoS) • Implementation/generalization of the cataloging, resource broker and matchmaking services needed as foundations for both transparent write access and agent technology • Implementation of transparent write access for files • Implementation of limited support for 'agents' • Implementation of distributed resource management for the Data Grid • Major efforts on robustness and rapid problem diagnosis, both at the component level and at the architectural level
PPDG --- Technical roadmap Long-term goals: • The system should use static and mobile autonomous agents to carry out well-defined tasks • The system should be resilient and predictive/adaptive • Prioritization of tasks should be based both on policies and on marginal utility • Co-scheduling algorithms ("matchmaking") should be used to match requests to resources within a time quantum, and the outcomes of matchmaking will affect indices used to measure marginal utility (see the sketch below) • Transaction management should utilize the cost estimators mentioned above, as well as checkpoint/rollback mechanisms
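As a concrete illustration of the matchmaking idea in the list above, the Python sketch below matches requests to resources within a single scheduling time quantum. It is only an illustrative toy: the attribute names (free_cpu_si95, free_disk_tb) and the rank function are hypothetical and are not PPDG's or Condor's actual matchmaking implementation.

```python
# Minimal matchmaking sketch (illustrative only, not the PPDG/Condor implementation).
# A request is matched to the resource that satisfies its requirements and
# maximizes a simple rank function; attribute names here are hypothetical.

def matches(request, resource):
    """A resource is acceptable if it meets the request's minimum requirements."""
    return (resource["free_cpu_si95"] >= request["cpu_si95"]
            and resource["free_disk_tb"] >= request["disk_tb"])

def rank(request, resource):
    """Toy 'marginal utility' index: prefer the least-loaded acceptable resource."""
    return resource["free_cpu_si95"] - request["cpu_si95"]

def matchmake(requests, resources):
    """Greedily match each request to the best acceptable resource in this time quantum."""
    schedule = []
    for req in requests:
        candidates = [r for r in resources if matches(req, r)]
        if not candidates:
            continue  # request waits for the next time quantum
        best = max(candidates, key=lambda r: rank(req, r))
        best["free_cpu_si95"] -= req["cpu_si95"]
        best["free_disk_tb"] -= req["disk_tb"]
        schedule.append((req["name"], best["name"]))
    return schedule

if __name__ == "__main__":
    resources = [{"name": "site-A", "free_cpu_si95": 500, "free_disk_tb": 10},
                 {"name": "site-B", "free_cpu_si95": 200, "free_disk_tb": 40}]
    requests = [{"name": "reco-job", "cpu_si95": 300, "disk_tb": 5},
                {"name": "analysis-job", "cpu_si95": 100, "disk_tb": 20}]
    print(matchmake(requests, resources))
```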
PPDG --- Existing technical foundations • ANL: Globus Grid Middleware Services • SLAC: The Objectivity Open File System (OOFS) • Caltech: The Globally Interconnected Object Databases (GIOD) project • FNAL: A Data Access framework (SAM) • LBNL: The Storage Access Coordination System (STACS) • ANL: Scalable Object Storage and Access • U. Wisconsin: Condor Distributed Resource Management System • SDSC: The Storage Resource Broker (SRB)
HEPDG --- Participants • Some institutes, funding agencies, and industrial companies, with CERN as the lead partner and coordinator
HEPDG --- Goals • Exploiting the Grid concept and technology in the field of HEP • Deploying a large-scale implementation of a computational and data Grid using technology developed by existing projects, complemented by the middleware and tools necessary for the data-intensive applications of HEP
HEPDG --- Technical approach • The overall topology of the Grid would follow that of the MONARC Regional Centre model, with a number of national grids (equivalent to the Tier 1 and Tier 2 Regional Centres) interconnected by a central node at CERN.
MONARC --- Background • Each LHC experiment foresees a recorded raw data rate of 1 PetaByte/year (or 100 MBytes/sec during running) at the start of LHC operation • The combined raw and processed data of the experiments will approach 100 PetaBytes by approximately 2010
MONARC --- Goals • Identifying baseline Computing Models that could provide viable (and cost-effective) solutions to meet the data-analysis needs of the LHC experiments • Providing a simulation toolset that enables further Model studies • Providing guidelines for the configuration and services of Regional Centres
MONARC --- Tiered structure of the RC-based computing model • Tier-0: CERN, which also acts as a Tier-1 centre • Tier-1: a large Regional Centre serving one or more nations and providing large capacity and many services, including substantial production capabilities, with excellent support • Tier-2: smaller centres, serving a single nation, a physical region within a nation, or a logical grouping of tasks within a nation or physical region, providing less expensive facilities, mostly dedicated to final physics analysis • Tier-3: institute workgroup servers, satellites of Tier-2 and/or Tier-1 centres • Tier-4: individual desktops
MONARC --- Advantages of the RC-based computing model • Maximises the intellectual contribution of physicists all over the world without requiring their physical presence at CERN • Acknowledges the facts of life about network bandwidths and costs • RCs provide a way to utilise the expertise and resources residing in computing centres throughout the world
MONARC --- RC computing capacity requirements analysis Basic assumptions: • For a typical large LHC experiment, the data-taking estimate is: • 1 PB of raw data per year per experiment • 10^9 events (1 MB each) per year per experiment • 100 days of data taking (i.e. 10^7 events per day per experiment); the consistency of these figures is checked in the sketch below
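A quick cross-check, using only the figures quoted in the slide above (1 PB/yr, 10^9 events of 1 MB each, 100 days of running), shows that the assumptions are mutually consistent and reproduce the roughly 100 MB/s rate mentioned earlier:

```python
# Quick consistency check of the MONARC data-taking assumptions quoted above
# (1 PB/yr, 10^9 events of 1 MB, 100 days of running); figures are from the slides.

events_per_year = 1e9          # events per experiment per year
event_size_mb = 1.0            # MB per raw event
running_days = 100             # days of data taking per year

raw_data_pb = events_per_year * event_size_mb / 1e9          # MB -> PB (10^9 MB = 1 PB)
events_per_day = events_per_year / running_days
rate_mb_per_s = events_per_year * event_size_mb / (running_days * 86400)

print(f"raw data per year : {raw_data_pb:.1f} PB")           # 1.0 PB
print(f"events per day    : {events_per_day:.0e}")           # 1e+07
print(f"rate while running: {rate_mb_per_s:.0f} MB/s")       # ~116 MB/s, i.e. the ~100 MB/s quoted
```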
MONARC --- CERN computing capacity requirements analysis CERN will hold the original or master copy of the following data: • the raw data; • the master copy of the calibration data; and • a complete copy of all ESD (reconstructed), AOD (DST), and TAG (thumbnail, nanoDST) data.
MONARC --- CERN computing capacity requirements analysis: projected capacity needed to support a single LHC experiment
Table 3: Summary of required installed capacity
  year                     2004     2005     2006     2007
  total CPU (SI95)       70'000  350'000  520'000  700'000
  disks (TB)                 40      340      540      740
  LAN throughput (GB/s)       6       31       46       61
  tapes (PB)                0.2        1        3        5
  tape I/O (GB/s)           0.2      0.3      0.5      0.5
  approx. box count         250      900     1400     1900
MONARC --- Tier-1 RC overall structure (figure)
MONARC --- Tier-1 RC workflow diagram (figure)
MONARC --- Tier-1 RC data flow diagram (figure)
MONARC --- Tier-1 RC computing capacity requirements Data Import: From CERN: • [5%] of raw data × <1 MB>/raw event × <10^9> events/yr = 50 TB/yr • [50%] of ESD data × <100 KB>/event × <10^9> events = 50 TB/yr • [100%] of AOD data × <10 KB>/event × <10^9> events = 10 TB/yr • [20%] of recalculated ESD data × <100 KB>/event × <10^9> events = 20 TB/yr From Tier 2 centres: • All revisions of ESD and AOD data, assumed [10%] of events = 10 TB + 1 TB/yr From simulation centres: • All simulated data, assumed [50] samples of [10^6] events × <2 MB>/simulated event = 100 TB/yr (these budgets are re-derived in the sketch below)
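Each budget line above is simply fraction × event size × number of events. A minimal Python re-derivation, using only the slide's own figures and decimal units (1 TB = 10^6 MB), is:

```python
# Re-derivation of the Tier-1 data-import budget listed above (numbers from the slides;
# decimal units: 1 TB = 10^6 MB, 1 MB = 10^3 KB).

EVENTS_PER_YEAR = 1e9

def volume_tb(fraction, event_size_mb, events=EVENTS_PER_YEAR):
    """Yearly volume in TB for a given fraction of events at a given event size."""
    return fraction * event_size_mb * events / 1e6

imports = {
    "raw (5% of 1 MB events)":            volume_tb(0.05, 1.0),    # 50 TB/yr
    "ESD (50% of 100 KB events)":         volume_tb(0.50, 0.1),    # 50 TB/yr
    "AOD (100% of 10 KB events)":         volume_tb(1.00, 0.01),   # 10 TB/yr
    "recalc. ESD (20% of 100 KB events)": volume_tb(0.20, 0.1),    # 20 TB/yr
    "simulation (50 x 10^6 x 2 MB)":      50 * 1e6 * 2.0 / 1e6,    # 100 TB/yr
}

for name, tb in imports.items():
    print(f"{name:38s} {tb:6.0f} TB/yr")
```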
MONARC --- Tier-1 RC computing capacity requirements Data Export: To CERN: • All recalculated ESD data: = 10 TB/yr • All simulation ESD data: = 10 TB/yr • All locally generated AOD data: = 8 TB/yr To Tier 2 centres: • Selections of ESD, AOD and DPD data: [15] TB/yr To local institutes: • Selections of ESD, AOD and DPD data: [20] TB/yr (totalled in the sketch below)
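Summing the export streams above gives the total outbound volume; the implied average wide-area rate is an illustrative back-of-envelope figure, not a number taken from MONARC:

```python
# Back-of-envelope total for the Tier-1 export budget above (figures from the slide);
# the implied average WAN rate is an illustrative estimate, not a MONARC number.

exports_tb_per_yr = {
    "to CERN: recalculated ESD":  10,
    "to CERN: simulation ESD":    10,
    "to CERN: local AOD":          8,
    "to Tier 2: ESD/AOD/DPD":     15,
    "to institutes: ESD/AOD/DPD": 20,
}

total_tb = sum(exports_tb_per_yr.values())                    # 63 TB/yr
avg_mb_per_s = total_tb * 1e6 / (365 * 86400)                 # spread over a full year

print(f"total export volume : {total_tb} TB/yr")
print(f"average export rate : {avg_mb_per_s:.1f} MB/s")       # ~2 MB/s sustained
```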
Tier-1 RC computing capacity requirements Data Storage: Mass storage • Raw data: [5%] of 1 year's data (5×10^7 events) = 50 TB • Raw (simulated) data: all regional data (10^8 events) = 200 TB • ESD data: 50% of 2 years' data = 10^9 events = 100 TB • AOD data: all of 2 years' data = 2×10^9 events = 20 TB • TAG data: all = 2 TB • Calibration/conditions database (latest version) = 10 TB • Central disk cache: [100] TB (totalled in the sketch below)
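Adding up the per-item figures quoted above gives a rough total for the Tier-1 mass store; the individual values are from the slide, only the sum is computed here:

```python
# Totals for the Tier-1 mass-storage budget above (per-item figures from the slide).

mass_storage_tb = {
    "raw (5% of 1 yr, 5e7 events x 1 MB)":       50,
    "simulated raw (1e8 events x 2 MB)":        200,
    "ESD (50% of 2 yrs, 1e9 events x 100 KB)":  100,
    "AOD (2 yrs, 2e9 events x 10 KB)":           20,
    "TAG data":                                    2,
    "calibration/conditions DB (latest)":         10,
}
disk_cache_tb = 100

print(f"mass storage total: {sum(mass_storage_tb.values())} TB")  # 382 TB
print(f"central disk cache: {disk_cache_tb} TB")
```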
Tier-1 RC computing capacity requirements • Total processing power: the total CPU power at the Regional Centre in this example is roughly 10K SI95 for reconstruction, 20K SI95 for production analysis, and 30K SI95 for individual analysis (about 0.4, 0.8, and 1.2 TIPS respectively), i.e. roughly 60K SI95 or 2.4×10^6 MIPS in total (see the check below)
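The TIPS figures quoted above imply a conversion of roughly 40 MIPS per SI95 unit (0.4 TIPS for 10K SI95); under that assumption the quoted total can be reproduced as follows:

```python
# Check of the Tier-1 CPU figures above; the slide's TIPS/MIPS values imply a
# conversion of roughly 40 MIPS per SI95 unit (assumption: 0.4 TIPS for 10K SI95).

MIPS_PER_SI95 = 40  # implied by the figures quoted in the slide

cpu_si95 = {
    "reconstruction":      10_000,
    "production analysis": 20_000,
    "individual analysis": 30_000,
}

total_si95 = sum(cpu_si95.values())                  # 60,000 SI95
total_mips = total_si95 * MIPS_PER_SI95              # 2.4e6 MIPS = 2.4 TIPS

print(f"total CPU: {total_si95} SI95 = {total_mips:.1e} MIPS ({total_mips/1e6:.1f} TIPS)")
```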