380 likes | 508 Views
Grid Network Monitoring in the European DataGRID project. Pascale VICAT-BLANC/PRIMET Pascale.Primet@ens-lyon.fr INRIA-RESO Université de Lyon1 - Ecole Normale Supérieure de Lyon France. Grid High Performance Networking. INRIA RESO & UREC teams at LIP/ ENS-Lyon
E N D
Grid Network Monitoring in the European DataGRID project Pascale VICAT-BLANC/PRIMET Pascale.Primet@ens-lyon.fr INRIA-RESO Université de Lyon1 - Ecole Normale Supérieure de Lyon France
Grid High Performance Networking INRIA RESO & UREC teams at LIP/ ENS-Lyon http://www.ens-lyon.fr/LIP/RESO Manage the Network workpackage of the EU DataGRID project : http://ccwp7.in2p3.fr http://eu-datagrid.web.cern.ch/eu-datagrid Contact : Pascale.Primet@ens-lyon.fr French National Grid High Performance Platform over the DWDM VTHD network (2.5Gb/s) thee-toile project Connected through the EU DataTAG 1Gb/s link to Chigago P.VICAT-BLANC/PRIMET
Starlight Chicago 1Gb/s experimental EDT link EDF/Clamart PRISM CERN SUN CEA/SACLAY ENS RESO/LIP New router New VTHD site VTHD site P.VICAT-BLANC/PRIMET
Grid High Performance Networking • Performances measurement and monitoring • for Resource Broker Services (optimisation). • to identify bottlenecks and points of unreliability • High Performance transport protocol • TCP optimizations • Non TCP High Performance Transport Protocol • Reliable multicast • IP Differentiated Services models • Alternative Differentiated Services Models (EDS, QBSS..) • Premium service experiments/evaluation with Bio apps • Active Grid concept • Convergence of Grid and Active Network Technology • Deployment of Active Nodes over the e-toile National Grid High Performance testbed P.VICAT-BLANC/PRIMET
Outline • Grid and Network issues. • The DataGRID testbed • Grid Network monitoring architecture • MapCenter tool dedicated to Grid status visualisation P.VICAT-BLANC/PRIMET
Grid concept • « Anatomical View »: complex collection of shared ressources to create an integrated virtual environment • « Physiological View» : complex collection of distributed services • Main objectives: • Functional : • study new and larger problems • Performances : • diminish the global processing time P.VICAT-BLANC/PRIMET
« Physical » view of a Grid Network Public Network No security No predictable performances No control on the traffic The flat INTERNET Resource = CE (computing element) or Resource = SE (storage element) P.VICAT-BLANC/PRIMET
Network Performance impact • The IP Technology • long distance connectivity • but • Time= f(Tcomputation,Ttransfer) • and • Ttransfer = Volume/Rate • Stability and predictability of network performances are key factors P.VICAT-BLANC/PRIMET
Data transfer : typical scenario Processing center Input Data Output Data Network Job execution User Storage center P.VICAT-BLANC/PRIMET
Where is the bottleneck? Network File Transfer Time (Ttrans) Wide Area Network Local System Local System Local Area Network Local Area Network Where is the bottleneck? P.VICAT-BLANC/PRIMET
European DataGRID project • 7 applications distributed among 6 virtual organisations • 11 organisations over 15 countries • 40 sites in Europe • Based on the European GEANT backbone, National NREN’s (SuperJanet, GARR, Renater, SurfNet…) and local area network (CERN, IN2P3…) • more informations :http://ccwp7.in2p3.fr P.VICAT-BLANC/PRIMET
EDG Network Infrastructure P.VICAT-BLANC/PRIMET
European DataGRID testbed P.VICAT-BLANC/PRIMET
Grid High Performance Networking • Collaboration with providers • Specific requirements studies • Available infrastructure and services studies • Enhanced Network services tests • Collaboration with users • End to end monitoring • E2E performances problems identification • Network cost functions for scheduling • Transport protocols studies and optimisation P.VICAT-BLANC/PRIMET
CERN TIER 0 TIER 1 INFN RAL … IN2P3 … TIER 2 Lab 1 Lab 2 Lab n … Departments a b c Desktops … Application requirements studies P.VICAT-BLANC/PRIMET
Network performances measurement For Provisioning: • To be available, via visualization to human observer (user, network/system administrators) • To provide tools for network performances measurement, problems identification and resolution (bottlenecks, point of unreliability, quality of service needs, topology…) • To achieve network performance forecast and optimization – Capacity planning P.VICAT-BLANC/PRIMET
Network performances measurement For scheduler/ resource manager…: • Network performance parameters are used for optimizing resource allocation (replication, MPI, Remote file access…) • Network performance metrics must • be published to the Grid Information System • Be accessible through aggregated functions called by Grid resource broker services (computing and data storage). P.VICAT-BLANC/PRIMET
Architectural design (GMA) • four functional units : • monitoring tools or sensors • a repository for collected data; • the means for data analysis to generate network metrics; • the means to access and to use the derived metrics. • See our D7.2 document on WP7 EDG site P.VICAT-BLANC/PRIMET
Network Monitoring Architecture Network managers Resource Broker P_RTPL MapCenter P_NWS Middleware Publication LDAP Forecaster Analysis Data processor Raw Repository Data Collector Sensors PingEr IPerf GridFTP SNMP RTPL … P.VICAT-BLANC/PRIMET
Schema and LDAP backend • Grid applications/mw are able to access network monitoring metrics via LDAP services according to a defined LDAP schema. • LDAP back end makes measurements visible through the Globus GIIS/GRIS . • Current metric information stored in local network monitoring stations. • R-GMA is tested as an alternative solution to Globus MDS http://ccwp7.in2p3.fr P.VICAT-BLANC/PRIMET
Measurement methods • Active methods • Injection of traffic inside the network for testing performances between two points • problem: may be intrusive (TCP/UDP throughput) • Passive methods • Collect traffic informations in one point of the network : router, switch, dedicated passive host, computing element or storage element (GRIDftp logs)… • Problem : give network usage, not capacity P.VICAT-BLANC/PRIMET
Two way metrics • One way metrics Tested system PingER GPS Time RIPE 2 RIPE 1 P.VICAT-BLANC/PRIMET
Passive measurement Passive measures at one point P.VICAT-BLANC/PRIMET
Metrics and tools • Round Trip Delay => PinGER • Packet Loss => PinGER • TCP throughput => IPerfER • UDP throughput => UDPMon • site connectivity => MapCenter • service availability => MapCenter • OneWay metrics => RIPEncc test boxes P.VICAT-BLANC/PRIMET
PingER Output: Daresbury and CERN P.VICAT-BLANC/PRIMET
RIPEncc Output: Daresbury and CERN P.VICAT-BLANC/PRIMET
Some results • from testbed sites to CERN • On 1 day or on 1 month • Pinger RTT: Average: 25ms • Pinger Loss Rate : 3% • OWD: average: 9ms • OWL: average 0 to 0,3% • TCP Throughput : from 0 to 350Mb/S P.VICAT-BLANC/PRIMET
Heterogeneous E2E performances P.VICAT-BLANC/PRIMET
InsufficientE2E performances Mesures sur Liaison CERN – Lyon à 155 Mb/s (06/ 2002) with UDP loss l =20% rate r =80Mb/s With TCP rate r <20Mb/s With ICMP Delay d = 18ms P.VICAT-BLANC/PRIMET
EDG Grid Network Element Concept • Network Element Concept : • Shared ressource • Set of virtual links P.VICAT-BLANC/PRIMET
Network Cost functions Network metrics published in LDAP repositories are used by resource brokers and replica managers through network cost functions : TransferTime = networkCost (SE1, SE2, filesize) Computed from • TCP throughput measurements (aggregated) • GridFTP logs • RTT Measurements (aggregated) P.VICAT-BLANC/PRIMET
MAPCenter et TopoGRID • Provide logical and up to date information on the actual accessibility of the Grid resources and services. • A flexible presentation layer of services and applications status of the grid. • MapCenter polls periodically computing elements and services with different methods and displays various views of the grid status on a Web site. • ccwp7.in2p3.fr/mapcenter P.VICAT-BLANC/PRIMET
Map Center features • Multi-threaded • HTML pages generation as fixed rate • Configurable • A configuration file on the server defines timeouts, numbers of threads, length of the history and the frequency of HTML pages generation • Flexibility • This model is very flexible, and any representation (geographical, virtual organization, graphical...) can be easily done by modifying the data file • Can be distributed (per country, per VO…) P.VICAT-BLANC/PRIMET
TopoGRID • To dynamically discover the topology of your GridNetwork…. P.VICAT-BLANC/PRIMET