Enabling Platforms for high-performance computational Grids oriented to scalable virtual organization (GRID.IT). P. Castoldi, F. Baroncelli, F. Cugini, B. Martini, V. Martini, F. Paolucci, L. Valcarenghi. TERENA Workshop on “Service Oriented Optical Networks”, Catania, May 14th 2006.
Facts about the GRID.IT project
• 15 workpackages:
  • WP1 - Grid Oriented Optical Switching Paradigms
  • WP2 - High Performance Photonic Testbed
  • WP3 - Grid Deployment
  • WP4 - Security
  • WP5 - Data Intensive Core Services
  • WP6 - Knowledge Services for Intensive Data Analysis
  • WP7 - Grid Portals
  • WP8 - High-performance Component-based Programming Environments
  • WP9 - Grid-enabled Scientific Libraries
  • WP10 - Grid Applications for Astrophysics
  • WP11 - Grid Applications for Earth Observation Systems Application
  • WP12 - Grid Applications for Biology
  • WP13 - Grid Applications for Molecular Virtual Reality
  • WP14 - Grid Applications for Geophysics
  • WP15 - Management
• National project funded by the Ministry of University and Research under the FIRB (Fundamental Research Incentive Fund) line
• Duration: 3+1 years (Nov. '02 - Oct. '06)
• 4 clusters of partners:
  • CNIT* (5 universities)
  • UTDallas, CNIT subcontractor
  • CNR, National Research Council (3 institutes)
  • INFN, National Institute for Nuclear Physics (3 institutes)
  • ASI, Italian Space Agency
*CNIT (National Inter-university Consortium for Telecommunications) is a non-profit consortium of 34 Italian universities operating in the telecom area; it coordinates large research initiatives with its own researchers and with staff from affiliated universities.
Global Grid Computing
Global Grid computing expands the resource horizon from the LAN to the WAN (not limited to optical networks).
• Bottlenecks
  • Computational, storage, etc. resources (CPU): same as before
  • Network resources become scarce and difficult to reserve
• Requirements
  • QoS-enabled network connectivity
  • Network resource monitoring, adaptation and availability
  • Application task staging should be network-aware
  • Grid users should have some ability to trigger network connectivity
• A solution (main approach)
  • A new functional layer, consisting of network middleware, is introduced to meet the above requirements (general concept)
A network-centric view of a Grid
A Grid network is an overlay L7 network on top of an independent L1/2/3 network.
[Figure: layered view with an Application Entity and a Service Entity: Application (L7); Grid Middleware (L7) with the Grid Resources Scheduler and GRAM; Network Interface; L1/2/3 Resource Management System with Network Configurator, Network Monitoring and OIF-UNI; L1/2/3 Resources; Transport Network. GRAM = Grid Resource Allocation Manager.]
• The Grid Network Interface should (*):
  • hide network details (e.g., topology, configuration) from the Grid middleware
  • be as simple as possible
  • allow end-to-end, on-demand, and real-time service requests
“Note that, the network service interfaces for Grid will have a higher level of abstraction (hiding details) than what is provided by a traditional Service Network or Element Management System” (*)
(*) Draft-ggf-masum-grid-network-services
Application services and network services
• Customers run application services that exploit (a stack of) network protocols for their connectivity needs
• Application services are abstract descriptions of application logic
• Network protocols (transport and ancillary functions such as routing, signaling, link management) are logically classified into a few categories of network services: connectionless IP, L1/L2/L3 VPN
How do we map the elements of these two sets?
[Figure: customers (End-user, Customer1 to Customer5) using application services (Internet Access, Hosting, VoIP/MoIP, Storage, Grid, PSTN) that run over network protocols (IP, Fast/Giga Ethernet, POS, ATM, MPLS, TDM, SDH, SONET, WDM services).]
From UNI to a service interface
• Via the User to Network Interface (UNI) in Control Plane-enabled ASTN/GMPLS networks, client networks can request some network services, but, for example, the UNI:
  • provides only point-to-point network services
  • does not coordinate services provided by an arbitrary set of edge nodes
  • is not designed to be used by applications
• Consistently with existing approaches (IMS and NGN) and with the efforts of other EU projects (MUPBED, NOBEL), applications (e.g., Grid) should be enabled to set up an “application platform”, i.e., a network service tailored to their needs
• To this purpose, a Service Plane is used that exports a new service interface towards applications, namely the user-to-service interface (USI).
The SO-ASTN
The Service Plane, i.e., a network middleware, implements distributed signaling for the coordination of Control Plane (CP) edge nodes and exports a service interface (USI) at a higher level of abstraction than the UNI.
[Figure: SO-ASTN architecture with Application Host; Service Plane, either centralized (CSE, SLA) or distributed (DSEs); USI and UPI interfaces; UNI towards the Control Plane edge CPEs; CCI interfaces; Management Plane (NMI-A, NMI-T); Client/Access Network; Transport Plane.]
The service interface
• The UPI is an interim human-to-human or machine-to-machine interface, mediated by the Management Plane (MP), that is currently used as a service interface.
• The USI is an evolved machine-to-machine interface that must enable the application entity to request services:
  • provided by different administrative network domains
  • without dealing with network technology details
  • without dealing with network topology details
• The USI must support:
  • both executive services on multiple administrative domains and informative services on an administrative domain
  • the transparency of applications across multiple domains
  • session-based services (e.g., high-definition video-telephony)
  • non-session-based services (e.g., e-Business transactions)
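The USI requirements above can be read as constraints on what a request may contain: sites and QoS, never routers or links. The sketch below is a hypothetical illustration of that idea, not the GRID.IT USI; the class `UsiRequest`, its fields and its validation rules are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceKind(Enum):
    EXECUTIVE = "executive"      # acts on the network (e.g. sets up a VPN)
    INFORMATIVE = "informative"  # only returns information (e.g. a topology)


@dataclass(frozen=True)
class UsiRequest:
    """Hypothetical USI request: names sites and QoS, never routers or links."""
    kind: ServiceKind
    sites: tuple[str, ...]          # application-level endpoints, e.g. ("B", "C")
    bandwidth_mbps: int = 0
    session_based: bool = False     # e.g. video-telephony vs. e-Business transaction
    domains: tuple[str, ...] = ()   # administrative domains involved, if known

    def validate(self) -> None:
        # Assumed rule: an executive connectivity service joins at least two sites.
        if self.kind is ServiceKind.EXECUTIVE and len(self.sites) < 2:
            raise ValueError("an executive connectivity service needs at least two sites")
        if self.bandwidth_mbps < 0:
            raise ValueError("bandwidth must be non-negative")
        # Assumed rule: informative services target a single administrative domain.
        if self.kind is ServiceKind.INFORMATIVE and len(self.domains) > 1:
            raise ValueError("informative services target a single domain")
```

For example, `UsiRequest(ServiceKind.EXECUTIVE, sites=("B", "C"), bandwidth_mbps=100).validate()` passes, because nothing in the request exposes technology or topology details; mapping sites to edge routers is left to the Service Plane.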
On-demand VPN via USI: experimental demonstration
[Figure: sites A, B and C with application entities AE-A, AE-B, AE-C and Distributed Service Elements DSE 1 to DSE 3; edge routers ER 1 (Router ID = 1.1.1.1), ER 2 (Router ID = 2.2.2.2), ER 3 (Router ID = 3.3.3.3) and core routers CR 1 to CR 3 running MPLS, OSPF and RSVP-TE; numbered arrows follow the message sequence below.]
1 - VPN Service Request (B, C, bandwidth)
2 - VPN Config (router ID, group name, VPN id, bandwidth)
3 - VPN Routing Configuration (local address, group name, routing instance)
4 - Tunnel LSP Set-up (egress router 1, bandwidth)
5 - DSE ACK
6 - VPN ACK
AE = Application Entity, ER = Edge Router, CR = Core Router, DSE = Distributed Service Element
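The six-message sequence above can be sketched as a small simulation in which one DSE coordinates the DSEs of the involved sites. This is a hypothetical model, not the demonstrated software: the classes `EdgeRouter` and `DSE`, the function `on_demand_vpn`, the group name `grid-vo`, the site-to-router mapping and the full-mesh LSP layout in step 4 are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class EdgeRouter:
    router_id: str
    routing_instances: set
    lsps: list


class DSE:
    """Hypothetical Distributed Service Element controlling one site's edge router."""

    def __init__(self, site: str, router: EdgeRouter):
        self.site = site
        self.router = router

    def vpn_config(self, vpn_id, group, peers, bandwidth, log):
        # Step 3: VPN routing configuration (one routing instance per VPN group).
        self.router.routing_instances.add(f"{group}-{vpn_id}")
        log.append(f"3 VPN routing config on {self.router.router_id}")
        # Step 4: one tunnel LSP towards each remote egress router (full mesh assumed).
        for peer in peers:
            self.router.lsps.append((peer, bandwidth))
            log.append(f"4 LSP {self.router.router_id} -> {peer} ({bandwidth} Mb/s)")
        return "DSE ACK"  # Step 5


def on_demand_vpn(dses, sites, bandwidth, vpn_id=1, group="grid-vo"):
    """Steps 1-6, seen from the coordinating DSE; returns the message log."""
    log = [f"1 VPN service request {sites} {bandwidth} Mb/s"]
    members = [dses[s] for s in sites]
    for dse in members:
        peers = [d.router.router_id for d in members if d is not dse]
        log.append(f"2 VPN config to DSE of site {dse.site}")
        ack = dse.vpn_config(vpn_id, group, peers, bandwidth, log)
        log.append(f"5 {ack} from site {dse.site}")
    log.append("6 VPN ACK")
    return log
```

With sites A, B and C mapped to ER 1, ER 2 and ER 3 (an assumption for illustration), `on_demand_vpn(dses, ["A", "B", "C"], 100)` leaves each edge router with two tunnel LSPs and ends with the VPN ACK returned to the application entity.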
Supporting functions for the SP
RESOURCE MONITORING
• Network topology: why?
  • Grid applications need the network topology to optimally allocate tasks among different sites.
  • A detailed topology detector is needed in order to satisfy QoS requirements.
  • So far, existing tools provide Grids with only end-to-end network parameters, which are not sufficient for guaranteed-bandwidth connection requests (LSP, VPN).
RESOURCE PROVISIONING
• Path Computation Element (PCE): why?
  • Definition: an entity capable of computing a network path or route based on a network graph and applying computational constraints.
  • Advantages:
    • Traffic Engineering (TE) route computation may be highly CPU-intensive; the PCE offloads it from the router CPU.
    • Optimal TE solutions, administrative policies and optimal management solutions
    • Useful in scenarios where the node has limited visibility of the network topology towards the destination (multi-area, multi-domain, multi-layer)
INTEGRATED FAULT TOLERANCE
• Combining network and application resilience mechanisms: why?
  • Grid fault-tolerance schemes alone may not be as efficient as network resilience schemes.
  • Application-layer schemes may not fully restore the previous QoS connectivity.
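Why end-to-end parameters are not enough for guaranteed-bandwidth requests can be shown with a toy case: two sites each measure enough end-to-end bandwidth towards the same destination, but both paths cross one shared link. The topology, numbers and function names below are hypothetical.

```python
def fits_end_to_end(requests, e2e_available):
    """Check each request alone against its measured end-to-end bandwidth."""
    return all(bw <= e2e_available[(src, dst)] for src, dst, bw in requests)


def fits_with_topology(requests, paths, link_available):
    """Accumulate load per link along each request's path, then check every link."""
    load = {}
    for src, dst, bw in requests:
        for link in paths[(src, dst)]:
            load[link] = load.get(link, 0) + bw
    return all(load[link] <= link_available[link] for link in load)


# Hypothetical topology: S1 and S2 reach D through the shared link X-D.
link_available = {("S1", "X"): 1000, ("S2", "X"): 1000, ("X", "D"): 1000}
paths = {("S1", "D"): [("S1", "X"), ("X", "D")],
         ("S2", "D"): [("S2", "X"), ("X", "D")]}
e2e_available = {("S1", "D"): 1000, ("S2", "D"): 1000}
requests = [("S1", "D", 600), ("S2", "D", 600)]
```

Here `fits_end_to_end(requests, e2e_available)` accepts both 600 Mb/s requests, while `fits_with_topology(requests, paths, link_available)` rejects them because the shared link X-D would carry 1200 Mb/s against 1000 available.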
Centralized TDS
• Based on a central resource broker.
• The broker holds the list of routers and has administrator privileges on them.
• The broker directly queries the routers with router-based requests.
• Three kinds of topologies can be discovered.
• The Grid topology is discovered or updated within a few seconds.
[Figure: 1. Topology request; 2. USI queries; 3. XML replies; 4. XML topology file.]
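A minimal sketch of the broker's steps 1-4, assuming each router reply has already been reduced to a list of (adjacent router, available bandwidth) pairs. The real router queries are not shown on the slide, so they are replaced here by a caller-supplied function; `centralized_tds` and `build_topology_xml` are hypothetical names. The output reuses the `<topology>` format that appears on the PCE slide.

```python
import xml.etree.ElementTree as ET


def build_topology_xml(replies):
    """replies: {router_id: [(adjacent_router_id, available_bw), ...]}."""
    topology = ET.Element("topology")
    for router_id in sorted(replies):
        links = replies[router_id]
        node = ET.SubElement(topology, "node")
        ET.SubElement(node, "node-id").text = router_id
        ET.SubElement(node, "num-links").text = str(len(links))
        for adj, bw in links:
            link = ET.SubElement(node, "link")
            ET.SubElement(link, "adj-node-id").text = adj
            ET.SubElement(link, "available-bw7").text = str(bw)
    return ET.tostring(topology, encoding="unicode")


def centralized_tds(router_list, query_router):
    """On a topology request (step 1), query every router (steps 2-3)
    and return the XML topology file (step 4)."""
    replies = {rid: query_router(rid) for rid in router_list}
    return build_topology_xml(replies)
```

In a test, `query_router` can simply be the `get` method of a dictionary of canned replies; in a deployment it would wrap whatever management interface the broker is privileged to use.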
TDS: XML topologies and retrieval strategies
TDS triggering mechanisms
• Event-driven
  • Network status changes: active network monitoring
  • SNMP traps sent by VO nodes
• Timeout-based
  • Periodic polling
  • Delivery time < timeout
  • No active monitoring
TDS update methods
• Global
  • Refreshes the entire topology at each invocation
  • Large number of messages exchanged
• Incremental
  • Updates the existing topology
  • Low network load
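The trade-off between the two update methods shows up directly in the number of router queries. The hypothetical `TopologyCache` below counts them: a global update queries every router, while an incremental update re-queries only the routers reported as changed. Pairing the incremental method with an event-driven trigger (for example, an SNMP trap naming the changed node) is an assumption of this sketch; the slide treats triggers and update methods as separate choices.

```python
class TopologyCache:
    """Hypothetical TDS cache supporting GLOBAL and INCREMENTAL update methods."""

    def __init__(self, query_router):
        self.query_router = query_router  # returns the adjacency list of one router
        self.topology = {}
        self.messages = 0                 # router queries issued so far

    def global_update(self, router_list):
        # Refresh the entire topology: one query per router at every invocation
        # (e.g. run on each polling timeout).
        self.topology = {}
        for rid in router_list:
            self.topology[rid] = self.query_router(rid)
            self.messages += 1

    def incremental_update(self, changed_routers):
        # Re-query only the routers reported as changed (e.g. by an SNMP trap).
        for rid in changed_routers:
            self.topology[rid] = self.query_router(rid)
            self.messages += 1
```

On a ten-router network, one global refresh costs ten queries, while reacting to a single trap incrementally costs one.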
OSPF-TE Path Computation Element (PCE)
[Figure: PCE workflow: TED download, XSLT elaboration, C elaboration, LP formulation (with the LSP traffic matrix), LP elaboration, router configuration (LSP strict routes).]
Excerpt of the TE Database (TED) downloaded from an OSPF-TE router (Junos XML output):
```xml
<ted-database junos:style="detail">
  <ted-database-id>10.10.14.1-1</ted-database-id>
  <ted-database-type>Net</ted-database-type>
  <ted-database-age>22648</ted-database-age>
  <ted-database-link-in>2</ted-database-link-in>
  <ted-database-link-out>2</ted-database-link-out>
  <ted-database-protocol>OSPF(0.0.0.0)</ted-database-protocol>
  <ted-link junos:style="database">
    <ted-link-to>10.10.13.2</ted-link-to>
    <ted-link-local-address>0.0.0.0</ted-link-local-address>
    <ted-link-remote-address>0.0.0.0</ted-link-remote-address>
    <ted-link-metric>0</ted-link-metric>
    <switching-capability-descriptor heading="ISCD(1):">
      <switching-type>Packet</switching-type>
      <encoding-type>Packet</encoding-type>
      <maximum-lsp-bw0>[0] 0bps</maximum-lsp-bw0>
      <maximum-lsp-bw1>[1] 0bps</maximum-lsp-bw1>
      <maximum-lsp-bw2>[2] 0bps</maximum-lsp-bw2>
      <maximum-lsp-bw3>[3] 0bps</maximum-lsp-bw3>
      <maximum-lsp-bw4>[4] 0bps</maximum-lsp-bw4>
      <maximum-lsp-bw5>[5] 0bps</maximum-lsp-bw5>
      <maximum-lsp-bw6>[6] 0bps</maximum-lsp-bw6>
      <maximum-lsp-bw7>[7] 0bps</maximum-lsp-bw7>
    </switching-capability-descriptor>
  </ted-link>
  ...
</ted-database>
```
Topology obtained after the XSLT elaboration:
```xml
<topology>
  <node>
    <node-id>10.10.1.1</node-id>
    <num-links>2</num-links>
    <link>
      <adj-node-id>10.10.2.1</adj-node-id>
      <available-bw7>1000</available-bw7>
    </link>
    <link>
      <adj-node-id>10.10.3.1</adj-node-id>
      <available-bw7>1000</available-bw7>
    </link>
  </node>
  ...
</topology>
```
PCE functions for an optimal TE solution:
1, 2, 3 - Download of the relevant information from the TE Database, XSLT elaboration, and C elaboration to produce the LP formulation
4 - The PCE solves the LP formulation to identify the Label Switched Path (LSP) traffic allocation that minimizes the maximum link bandwidth (least-fill policy)
5 - The PCE configures the LSPs on every ingress router (strict routes)
Results show that the PCE is fast and achieves optimal bandwidth utilization compared with the CSPF algorithm performed by the nodes.
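The LP itself is not reproduced on the slide. As a stand-in, the sketch below pursues the same least-fill objective (choose one strict route per LSP so that the largest bandwidth allocated on any link is minimal, without exceeding available bandwidth) by exhaustive search over loop-free paths. This swaps the LP solver for brute force, which is only practical on toy topologies; it parses the `<topology>` format shown above, reads `available-bw7` as the link capacity, and all function names are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET


def parse_topology(xml_text):
    """Return {(u, v): available_bw} from the <topology> format."""
    capacity = {}
    for node in ET.fromstring(xml_text).findall("node"):
        u = node.findtext("node-id")
        for link in node.findall("link"):
            v = link.findtext("adj-node-id")
            capacity[(u, v)] = float(link.findtext("available-bw7"))
    return capacity


def simple_paths(capacity, src, dst):
    """Yield every loop-free path from src to dst as a list of directed links."""
    adj = {}
    for u, v in capacity:
        adj.setdefault(u, []).append(v)
    stack = [(src, [src])]
    while stack:
        node, visited = stack.pop()
        if node == dst:
            yield list(zip(visited, visited[1:]))
            continue
        for nxt in adj.get(node, []):
            if nxt not in visited:
                stack.append((nxt, visited + [nxt]))


def least_fill_routes(capacity, demands):
    """demands: [(ingress, egress, bw), ...]. Exhaustively pick one strict
    route per LSP that minimizes the maximum allocated link bandwidth."""
    candidates = [list(simple_paths(capacity, s, d)) for s, d, _ in demands]
    best, best_cost = None, float("inf")
    for choice in itertools.product(*candidates):
        load = dict.fromkeys(capacity, 0.0)
        for path, (_, _, bw) in zip(choice, demands):
            for link in path:
                load[link] += bw
        if any(load[link] > capacity[link] for link in load):
            continue  # exceeds the available bandwidth on some link
        cost = max(load.values())
        if cost < best_cost:
            best, best_cost = list(choice), cost
    return best, best_cost
```

On a fully meshed three-node topology with 1000 available on every link, two 600 LSPs between the same pair cannot share the direct link, so the search routes one directly and the other over two hops, for a maximum link load of 600. If no feasible allocation exists, the function returns `(None, inf)`.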
Cooperative application-network QoS-aware fault tolerance
• Assumption
  • Qualified applications (e.g., visualization) require communication QoS guarantees
• QoS parameter
  • Minimum bandwidth
• Objective
  • Maximize the recovered connections and minimize the required network resources upon a network link failure
• Possible approach
  • Integrating the fault tolerance of the QoS-unaware layer (application) and of the QoS-capable layer (network): QoS-aware integrated fault tolerance
    • QoS-capable layer fault tolerance: (G)MPLS path restoration
    • Software layer fault tolerance: service replication (server migration)
Integrated fault tolerance advantages: path restoration + service replication
[Figure: client, primary video server and backup video server; primary LSP, backup LSP, another primary LSP, and an LSP to the backup video server.]
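One way to read the integrated scheme is as an ordered fallback for each connection hit by a link failure: first try (G)MPLS path restoration to the same server with at least the minimum bandwidth, and if no such path exists, migrate the service to a backup server reachable with that bandwidth. The sketch below is a hypothetical illustration of that ordering, not the authors' algorithm: fewest-hop search stands in for "minimize required resources", and, as a simplification, bandwidth held by failed LSPs on surviving links is not released.

```python
from collections import deque


def find_path(residual, src, dst, bw):
    """Fewest-hop path (BFS) whose links all have at least bw residual bandwidth."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while prev[u] is not None:
                path.append((prev[u], u))
                u = prev[u]
            return path[::-1]
        for (a, b), cap in residual.items():
            if a == u and b not in prev and cap >= bw:
                prev[b] = u
                queue.append(b)
    return None


def recover(residual, failed_link, connections, backup_servers):
    """connections: {name: (client, server, min_bw, path)}.
    Tries path restoration first, then service replication."""
    failed = {failed_link, failed_link[::-1]}  # a link failure hits both directions
    residual = {l: c for l, c in residual.items() if l not in failed}
    outcome = {}
    for name, (client, server, bw, path) in connections.items():
        if not any(link in failed for link in path):
            continue  # connection not affected by the failure
        options = [("path restoration", server),
                   ("service replication", backup_servers.get(server))]
        for mode, target in options:
            new_path = find_path(residual, client, target, bw) if target is not None else None
            if new_path is not None:
                for link in new_path:
                    residual[link] -= bw  # reserve bandwidth on the new LSP
                outcome[name] = (mode, target, new_path)
                break
        else:
            outcome[name] = ("lost", None, None)
    return outcome
```

If the only alternative path to the primary server lacks the required bandwidth, the connection is still recovered by moving it to the backup server, which is the case where combining both layers pays off.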
Conclusions
• The problem of providing a connection-oriented service in a WAN environment to individual qualified applications (e.g., Grid) has been addressed from an architectural point of view with regard to:
  • the Service Plane and service interface
  • a centralized Topology Discovery Service (TDS)
  • the Path Computation Element (PCE)
  • an integrated resilience scheme
• But ☺
  • people working on Grid computing are mainly computer scientists
  • people working on networks are telecommunication engineers
  • so it is not easy to create a common view on the topic.
E-mail: castoldi@sssup.it Sant’Anna School & CNIT, CNR research area, Via Moruzzi 1, 56124 Pisa, Italy