Learn how to troubleshoot GridFTP flows using XSP and Periscope. This presentation covers the motivation, perfSONAR issues, and the new components that address them: UNIS, Periscope, XSP, and NLMI. An E2E example with GridFTP and visualizations from the SC10 demo are also discussed.
Troubleshooting GridFTP flows with XSP and Periscope
Dan Gunter, presenter
Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany
Outline
• Motivation
• Review of perfSONAR
• PerfSONAR issues
• New components to address them
  • UNIS
  • Periscope
  • XSP
  • NLMI
• E2E example with GridFTP
• Visualizations from SC10 demo
• Questions & rotten fruit
Internet2 Joint Techs 2011. Clemson, SC
Motivating Use-Cases
• Analyzing PBs of experimental data on an HPC cluster
• Offloading or disseminating PBs of simulation output
• Large data transfers
[Image source: http://xkcd.com/401/]
PerfSONAR Overview
• Infrastructure & software for network performance analysis
[Figure labels: Discovery; Data; User or Application]
Motivating questions
• How can we accurately forecast application performance?
• How can we detect performance anomalies in real-time?
• How can we troubleshoot poor application performance?
  • And improve it!
• ‘Shooting the gap between expectation and reality
PerfSONAR issues
• Data is hard to find
  • Cannot simply ask “which MPs have data for path”
• Slow
  • Lookups across multiple domains
  • Polling for data = RTT_net + Delay_DB + Delay_WS
  • XML serialization/deserialization
• E2E analysis is difficult
  • No integrated host, application monitoring
  • Analysis/visualization done client-side and not exported
• Measurement frequency is static
  • Always-on and lack of aggregation encourages large intervals
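To make the polling cost on the slide above concrete, here is a small arithmetic sketch. The function names and all delay values are assumptions chosen for illustration; none of the numbers come from the talk.

```python
# Hypothetical per-request delays in milliseconds (illustrative values only).

def poll_latency_ms(rtt_net, delay_db, delay_ws):
    """Cost of one poll: RTT_net + Delay_DB + Delay_WS, as on the slide."""
    return rtt_net + delay_db + delay_ws


def sequential_poll_ms(n_domains, rtt_net, delay_db, delay_ws):
    """Total cost when n_domains are polled one after another."""
    return n_domains * poll_latency_ms(rtt_net, delay_db, delay_ws)


one_poll = poll_latency_ms(rtt_net=80, delay_db=40, delay_ws=30)
five_domains = sequential_poll_ms(5, rtt_net=80, delay_db=40, delay_ws=30)
print(one_poll, five_domains)
```

Under these assumed values the per-poll cost multiplies with every extra domain on the path, which is the kind of overhead a cache close to the client could hide.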
Data is hard to find
Unified Network Information Service (UNIS)
• Merges TS & LS
• Topology model
  • Tree of nodes at different layers (Network/Node/Port)
  • Relations between arbitrary nodes
  • Node properties
• ‘GIS for networks’
• Relates MPs, MAs to topology
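The topology model above can be pictured with a minimal sketch: a tree of elements, each carrying properties, typed relations to arbitrary other elements, and the measurement services attached to it. The class, its field names, and the service names are hypothetical and are not the UNIS schema; only the URN style is borrowed from the example slides at the end of the deck.

```python
from dataclasses import dataclass, field


@dataclass
class TopoElement:
    """One element of a layered topology tree (hypothetical structure, not the UNIS schema)."""
    urn: str
    layer: str                                      # "network", "node", or "port"
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    relations: list = field(default_factory=list)   # (relation_type, target_urn) pairs
    services: list = field(default_factory=list)    # MPs/MAs attached to this element

    def add_child(self, child):
        self.children.append(child)
        return child

    def walk(self):
        """Yield this element and everything below it, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def services_under(root):
    """Answer 'which MPs/MAs have data for this part of the topology?'."""
    return [svc for elem in root.walk() for svc in elem.services]


domain = TopoElement("urn:ogf:network:domain=ps.es.net", "network")
node = domain.add_child(
    TopoElement("urn:ogf:network:domain=ps.es.net:node=albu-cr1", "node",
                properties={"description": "Juniper"}))
port = node.add_child(
    TopoElement("urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=134.55.40.186", "port",
                relations=[("over", "urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=ge-5/0/0")],
                services=["snmp-ma (hypothetical)"]))

print(services_under(domain))
```

Attaching services directly to topology elements is what lets a single tree walk replace separate topology and lookup queries in this sketch.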
Slow
Periscope: Topologically aware cache
• PerfSONAR requests have topological locality
• Pre-fetch and cache relevant perfSONAR information
• New protocols to indicate interesting sub-topologies
• Analysis functions
  • domain-specific transformations, e.g. forecasting
  • visualization (whee!)
• Preserve uniform perfSONAR interface
[Figure labels: User or Application; perfSONAR interface; Periscope; MP/MA; LS; ...]
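The pre-fetch idea above can be sketched as a cache that, on every request for one topology element, also loads that element's neighbors. This is only an illustration of topological locality, not Periscope's implementation; the class name, the `fetch` callable (standing in for a perfSONAR query), and the adjacency map are all assumptions.

```python
class TopoCache:
    """Toy topology-aware cache (hypothetical, not Periscope's code)."""

    def __init__(self, fetch, neighbors):
        self.fetch = fetch          # callable: element id -> data (stand-in for a remote query)
        self.neighbors = neighbors  # dict: element id -> list of adjacent element ids
        self.store = {}
        self.misses = 0             # requests that had to wait on a remote fetch

    def _load(self, elem):
        if elem not in self.store:
            self.store[elem] = self.fetch(elem)

    def get(self, elem):
        if elem not in self.store:
            self.misses += 1
            self._load(elem)
        # Pre-fetch adjacent elements, betting the next request is nearby.
        for adjacent in self.neighbors.get(elem, []):
            self._load(adjacent)
        return self.store[elem]


calls = []

def fake_fetch(elem):
    calls.append(elem)
    return {"element": elem}

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cache = TopoCache(fake_fetch, path)
for hop in ["a", "b", "c"]:
    cache.get(hop)
print(cache.misses, calls)
```

Walking the path hop by hop, only the first request waits on a fetch in this sketch; a real cache would pre-fetch in the background and expire stale entries.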
Periscope data representation
• Follow PerfSONAR data model
• But use a simpler, more efficient format
• Many good options:
  • JSON ✔
  • BSON ✔
  • Thrift
  • Avro
  • Protobuf
  • NetLogger
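As a rough illustration of the format choice above, the sketch below serializes one measurement record both as compact JSON and as a toy XML element tree. The record's field names and values are hypothetical, and the XML here is deliberately simplistic; it is not the perfSONAR schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical measurement record; field names and values are assumptions.
record = {
    "subject": "urn:ogf:network:domain=ps.es.net:node=albu-cr1",
    "eventType": "utilization",
    "ts": 1296000000,
    "value": 0.42,
}

# Compact JSON encoding of the record.
as_json = json.dumps(record, separators=(",", ":"))

# Toy XML encoding: one child element per field.
root = ET.Element("datum")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

# JSON round-trips to the same native structure without a schema-specific parser.
assert json.loads(as_json) == record
print(len(as_json), len(as_xml))
```

Even in this toy case the XML form repeats every field name in a closing tag, while JSON maps directly onto native dictionaries and numbers.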
E2E Analysis is Difficult
Missing metrics
[Figure labels: Network layers; End-to-end components]
NetLogger Machine Information (NLMI)
• Basic set of host probes, using /proc
  • Host interface statistics
  • TCP settings
  • CPU, memory
  • Disk I/O
• Export data in Periscope data model
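A host interface probe of the kind listed above can be sketched by parsing the Linux /proc/net/dev table. This is not NLMI's code; the function name and the output dictionary layout are assumptions, and the sample text merely mimics the file's layout so the sketch runs on any platform.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into per-interface byte and packet counters."""
    stats = {}
    for line in text.splitlines()[2:]:      # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Columns 0-7 are receive counters, columns 8-15 are transmit counters.
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return stats


# Made-up sample in the layout of /proc/net/dev (values are not real measurements).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 123456 789 0 0 0 0 0 0 654321 987 0 0 0 0 0 0
"""

print(parse_net_dev(SAMPLE)["eth0"])
```

On a Linux host the same function could be fed `open("/proc/net/dev").read()`; sampling it twice and differencing the counters would give a rate.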
Measurement frequency is static
eXtensible Session Protocol (XSP)
• Establishment, termination, and negotiation of a session between end-user application processes
• Session = stateful layer over multiple other NEs
• In-band or OOB signaling of control information
• Other metadata can also be forwarded
[Figure labels: Session; App, App; TCP, TCP; xspd, xspd, xspd; A, B, C; NE, NE, NE; Metadata]
Monitoring GridFTP
• GridFTP’s XIO allows interception of I/O
• New XIO layer can talk to a local xspd
  • Signaling: open/close
  • Performance: aggregated read/write
• NetLogger’s nlcalipers library aggregates reads/writes into periodic summaries
[Figure labels: GridFTP server; XIO layer; xspd; signaling; XIO/XSP operation; performance; XIO layer; Disk and Network]
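The aggregation step above, turning many individual reads and writes into periodic summaries, can be sketched as follows. This does not use or reproduce the nlcalipers API; the class, its method names, and the summary fields are hypothetical.

```python
class IntervalSummary:
    """Fold individual I/O events into fixed-interval summaries (hypothetical sketch)."""

    def __init__(self, interval):
        self.interval = interval   # summary period in seconds
        self.start = None          # start time of the current period
        self.count = 0
        self.total_bytes = 0
        self.summaries = []

    def record(self, ts, nbytes):
        """Account for one read or write of nbytes at time ts (timestamps must not go backwards)."""
        if self.start is None:
            self.start = ts
        while ts >= self.start + self.interval:
            self._flush()
        self.count += 1
        self.total_bytes += nbytes

    def _flush(self):
        if self.count:             # periods with no I/O produce no summary
            self.summaries.append(
                {"start": self.start, "count": self.count, "bytes": self.total_bytes})
        self.start += self.interval
        self.count = 0
        self.total_bytes = 0

    def close(self):
        """Emit the final partial period, e.g. when the transfer's close is signaled."""
        if self.start is not None:
            self._flush()
        return self.summaries


agg = IntervalSummary(interval=1.0)
for ts, nbytes in [(0.0, 100), (0.5, 200), (1.2, 50), (3.1, 10)]:
    agg.record(ts, nbytes)
print(agg.close())
```

Four events collapse into three summaries here; with real transfers issuing thousands of I/O calls per second, the reduction is what makes per-transfer monitoring cheap enough to leave on.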
Combining XSP, Periscope, NLMI
[Figure labels: GridFTP server (two instances), each with NLMI and an XIO layer / XSP layer / XIO layer stack; Clients; Host stats; Signaling; XIO performance; xspd; Periscope; perfSONAR protocols; perfSONAR services]
Visualization
Visualization cont.
Conclusions
• Periscope provides a platform for perfSONAR analysis
  • Caching to reduce latency, centralized correlation
• Integration with XSP provides transparent monitoring and awareness of application state
• Still polling perfSONAR, though – Publish/Subscribe?
Guilty parties
• Guilherme Fernandes, Grad student, UD
• Ahmed El-Hassany, Grad student, UD
• D. Martin Swany, Faculty, UD
• Ezra Kissel, Grad student, UD
Questions
Contact: dkgunter@lbl.gov
Extra slides
UNIS example
topology
  id : esnet
  domain
    id : urn:ogf:network:domain=ps.es.net
    node
      id : urn:ogf:network:domain=ps.es.net:node=albu-cr1
      name : albu-cr1
      description : Juniper
      address
        type : hostname
        value : albu-cr1
      location
        latitude : +35.08
        longitude : -106.64
UNIS Example, cont.
      <unis:port id="urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=134.55.40.186">
        <unis:address type="ipv4">134.55.40.186</unis:address>
        <unis:address type="hostname">albucr1-sdn-a-albusdn1.es.net</unis:address>
        <unis:relation type="over">
          <unis:portIdRef>urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=ge-5/0/0</unis:portIdRef>
        </unis:relation>
        <unis:portPropertiesBag>
          <nmtl3:portProperties>
            <nmtl3:netmask>255.255.255.252</nmtl3:netmask>
          </nmtl3:portProperties>
        </unis:portPropertiesBag>
      </unis:port>
    </unis:node>
  </unis:domain>
</unis:topology>