Lessons Learned from SC2004
Jin Tanaka, Satoshi Matsui (APAN-JP/JGN2 NOCs, KDDI)
Chris Robb (Global NOC, Indiana University)
Agenda
• Background
• Outline and requirements of demos conducted by Asian participants for SC2004
• System of APAN-JP/JGN2 NOCs for SC2004
• Preparation work
• Reports on demo traffic
• Analysis of problems and lessons
• Proposals for improvement
Background
• The APAN-JP/JGN2 NOCs and the Global NOC supported many demonstrations conducted by Asian participants at Supercomputing 2004 (SC2004) in Pittsburgh. In the Bandwidth Challenge, we were involved in supporting every challenge except that of the SDSC (San Diego Supercomputer Center) group.
• However, various problems occurred with network performance and inter-domain operation during SC2004.
• It is important to analyze these problems from the operator's viewpoint and to propose improvement plans for future high-performance demonstrations.
Outline and requirements of demos conducted by Asian participants for SC2004
System of APAN-JP/JGN2 NOCs for SC2004
• Most participants were expected to use the JGN2 domestic line, the TransPAC line, and the JGN2 international line.
• A concrete workflow was defined, involving both the APAN-JP and JGN2 NOCs (see the next slide).
• Information was shared between the NOCs by e-mail and by tools in order to coordinate operation.
  • A mailing list with all demo participants and operation staff registered: sc2004@jgn2.jp
  • Web pages provided by both NOCs
    • APAN-JP/JGN2 International NOC: a special public page about SC2004 http://www.jp.apan.net/NOC/sc2004/sc2004-requirements.htm
    • JGN2 domestic NOC: an operation information page for the people concerned (topology maps, traffic graphs, and a looking glass for routers/switches)
• A tool was adopted for task progress management.
  • Request Tracker (http://rt3.jp.apan.net), which had already been used successfully at the APAN-JP NOC.
  • Tickets created per project let the people concerned at both NOCs share, manage, and update task progress.
System of APAN-JP/JGN2 NOCs for SC2004
[Workflow diagram. Parties: Japanese researcher participants, local researcher participants, Asian researcher participants, project administration, APAN-JP/JGN2 International NOC, JGN2 Domestic NOC, 24-hour NOC, international NOCs. Steps: ① project application; ② awareness of project contents, network resource control; ③ approval; ④ indication; ⑤⑩ coordination, information-sharing, confirmation of operation coverage; ⑥ request for international cooperation and network management; ⑦ provisioning; ⑧⑨ demos, supporting end users; ⑨ acceptance of filing, troubleshooting, escalation; ⑪ project report]
• 24-hour NOC engineering support was provided (usually only NOC monitoring is covered 24 hours).
  • Shift work was adopted: day shift and night shift.
• International circuit carriers were asked to strengthen circuit monitoring and to avoid planned outages.
  • During SC2004, one planned outage was postponed.
• Coordination was reinforced with the international circuit carriers and the JGN2 domestic NOC.
Preparation work 1
• Each participant's requirements for network use were confirmed in advance:
  • Path to be used
  • Network addresses on the participant side and on the SC2004 venue side
  • Expected bandwidth
  • MTU size
  • Routing method
• Testing schedules were adjusted using information shared by participants at the venue.
• An SC2004 group was created in the task-progress management tool "Request Tracker".
  • http://rt3.jp.apan.net/ (see the next slide)
  • Tickets were generated per demo or per event.
  • For information-sharing, e-mail was sent to everyone registered whenever a ticket was updated.
• An information page for SC2004 was created.
  • It showed the requirement items for each demo, network designs, schedules, and relevant traffic graphs: http://www.jp.apan.net/NOC/sc2004/sc2004-requirements.htm
For reference on Request Tracker
• 99 tickets were generated during SC2004:
  • 59 for requests etc.
  • 40 for welcoming new users
• Tickets were updated on the web and by e-mail.
• Update e-mails were sent to multiple addresses of the people concerned.
Preparation work 2
• The configuration within APAN Tokyo XP was modified for SC2004.
  • MS6 (BigIron15K) was installed for the AIST cluster server, providing an uplink of 10(8) Gbit/s.
  • Connections to JGN2 domestic and T-LEX were newly set up.
  • 4 new VLANs were created in cooperation with the JGN2 domestic NOC: JGN2, JAXA, AIST, Osaka Univ.
  • To enable jumbo frames for the Tokyo Univ. Data Reservoir team, the 10G link between APAN Tokyo XP and T-LEX was modified to carry VLAN ID tags.
• The following routing controls were applied on the circuits between Japan and the US.
  • When using the TransPAC LA 2.4G circuit
    • The original route, Tokyo – Abilene LA – SC2004 venue, was set under the general routing policy.
  • When using the JGN2 10G circuit
    • Traffic going to the venue: static routes to the US were set after checking each user's network address for each booth at the venue.
    • Traffic coming from the venue: routes with longer subnet masks (more specific prefixes) than those advertised by BGP on the TransPAC LA circuit were advertised toward the TransPAC Chicago -> Abilene routers, so that they were preferred within Abilene. For this, the Global NOC (Mr. Chris Robb) was asked to accept prefixes of /27 or more at the Abilene routers.
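The return-traffic control on this slide relies on longest-prefix matching: when a router holds both an aggregate and a more specific route for the same destination, the more specific one is chosen. The following is a minimal sketch of that selection rule only, not the actual Abilene or TransPAC configuration. The prefixes are documentation addresses (RFC 5737) and the next-hop labels are hypothetical.

```python
import ipaddress

# Hypothetical routing table: an aggregate learned via one circuit and a
# more specific /27 learned via another. Prefixes are RFC 5737
# documentation ranges, not the prefixes used at SC2004.
ROUTES = [
    (ipaddress.ip_network("203.0.113.0/24"), "via LA circuit (aggregate)"),
    (ipaddress.ip_network("203.0.113.32/27"), "via Chicago circuit (more specific)"),
]


def best_route(destination, routes):
    """Return the (network, next_hop) entry with the longest matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    # The entry with the largest prefix length is the most specific match.
    return max(matches, key=lambda entry: entry[0].prefixlen)


for dest in ("203.0.113.40", "203.0.113.200"):
    net, hop = best_route(dest, ROUTES)
    print(f"{dest} -> {net} {hop}")
```

A host inside the /27 is reached over the circuit that advertises the /27, while the rest of the /24 keeps following the aggregate. This is why the more specific announcements had to be accepted by the receiving routers for the scheme to work.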
Preparation work 2
• The TransPAC Chicago router terminating the JGN2 10G circuit was set to allow remote login.
  • With the cooperation of the Global NOC, an account was created on TransPAC tpr-procket (Procket 8801), located in Chicago (StarLIGHT).
  • The status of this router could then be monitored from Tokyo as needed.
For Reference: Network Topology and Routing for SC2004
[Topology diagram. Link legend: 10 Gbps, 2.4 Gbps, less than 1 Gbps.
Networks and exchange points: T-LEX (AS23814), TsukubaWAN (AS18127), MAFFIN (AS18125), TransPAC Chicago (AS22388), APAN Tokyo XP (AS7660), JGN2 (AS17934), JGN2 Domestic L2 Network, Genkai XP (AS7660), Abilene Chicago/LA/WASH (AS11537), CERNET (AS4538), WIDE (AS2500), QGPOP (AS2523), KOREN (AS9270), Snet (AS7082).
Sites: SC2004 venue, U-Tokyo, AIST/GTRC, AIST servers, JAXA, Kitakyushu RC, Osaka-U, CJK, eVLBI, Haystack, Caltech.
Links: T-LEX 10 Gbps, JGN2 Domestic 10 Gbps, JGN2 10 Gbps, TransPAC LA 2.4 Gbps, MAFFIN 1 Gbps, Osaka-U eVLBI 1 Gbps, WIDE 1 Gbps, APII/Genkai Link 1 Gbps.
Prefixes shown (some labeled "For BWC" or "For others"): 192.31.116.0/24, 163.220.52.0/23, 203.181.194.128/27, 163.220.0.0/19, 163.220.60.0/24, 163.220.108.0/24, 140.221.202.0/25, 140.221.184.0/27, 140.221.192.0/27, 203.181.194.96/27, 140.221.186.0/27, 140.221.187.0/24, 202.180.40.0/28, 133.1.33.0/25, 133.1.69.0/24, 192.50.1.192/26, 140.221.218.64/27, 140.221.214.128/27, 203.181.194.0/28, 203.181.194.48/28, 140.221.199.128/25, 140.221.199.0/25, 140.221.198.128/25, 203.28.64.0/18, 203.255.252.192/29, 155.230.20.120/32, 140.173.174.0]
Traffic Reports during SC2004
[Traffic graphs for the JGN2 circuit and the TransPAC LA circuit, annotated with the demos of Illinois Univ. & JAXA, Caltech, AIST, CJK, eVLBI, Illinois Univ., and Tokyo Univ. Traffic sources: JGN2 Domestic: JAXA, Illinois Univ. (Kitakyushu RC), AIST, Osaka Univ.; Caltech (from Korea via Genkai XP); Tokyo Univ. via T-LEX]
Problem and Analysis (1)
• Utilization of Request Tracker
  • Of the 40 members newly registered for SC2004, 15 never logged in. Request Tracker does not seem to have been fully utilized.
  【Lessons Learned】
  • Quite a few members did not understand how to use Request Tracker.
  • The tool should be fully explained to the people concerned in the future.
• Need for schedule adjustment of preliminary tests
  • Unlike the actual Bandwidth Challenge runs, whose schedule was controlled at the venue, no schedule adjustment was made for preliminary tests.
  【Lessons Learned】
  • Schedules should be adjusted among participants, NOCs, and network resource managers.
  • Although the NOCs adjusted the schedules this time, it seems possible for parties other than the NOCs to do so.
  • Participants should be required to file their schedules.
  • The demo information page created by the APAN-JP NOC was effective for domestic participants, but rather confusing for staff at international NOCs.
Problem and Analysis (2)
• Time difference between Japan and the US
  • With a 14-hour time difference between Japan and Pittsburgh, some requests for configuration changes and troubleshooting could not be attended to until the following day, until a special shift system was adopted at the APAN-JP NOC.
  【Lessons Learned】
  • As this situation is common to any international network operation, it seems essential to build a system that can handle requests at any time of day.
• Faster selection of the route to the venue
  • With multiple networks in Japan, between Japan and the US, and within the US, there were multiple possible routes toward the SC2004 venue. In some cases setup was delayed by indecision over which route to select.
  【Lessons Learned】
  • Network managers, NOCs, and participants should consult early, well before the demo performance.
  • NOCs should provide criteria for selecting a suitable route (bandwidth, traffic, MTU size, RTT).
  • NOCs should know the network topology not only at home but also abroad (Asia and US domestic, end to end).
  • Cooperation between networks at home and abroad is important.
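The route-selection criteria named on this slide (bandwidth, traffic, MTU size, RTT) could be turned into a simple, repeatable rule so that the choice does not stall. The sketch below is one hypothetical way to do this: keep only routes with enough spare capacity and a large enough MTU, then prefer the lowest RTT. The route names are taken from the slides, but every number is invented for illustration and the rule itself is an assumption, not the procedure the NOCs used.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    bandwidth_mbps: float  # circuit capacity
    traffic_mbps: float    # current background load
    mtu: int               # largest MTU supported end to end
    rtt_ms: float          # round-trip time to the venue


def select_route(routes, need_mbps, need_mtu):
    """Pick the lowest-RTT route that has enough headroom and MTU."""
    usable = [
        r for r in routes
        if r.bandwidth_mbps - r.traffic_mbps >= need_mbps and r.mtu >= need_mtu
    ]
    return min(usable, key=lambda r: r.rtt_ms, default=None)


# Illustrative values only; not measurements from SC2004.
CANDIDATES = [
    Route("TransPAC LA 2.4G", 2400, 400, 9000, 180.0),
    Route("JGN2 10G", 10000, 1000, 9000, 190.0),
]

choice = select_route(CANDIDATES, need_mbps=5000, need_mtu=9000)
print(choice.name if choice else "no suitable route")
```

With criteria written down this way, a demo asking for 5 Gbit/s is steered to the 10G circuit, a 1 Gbit/s demo can stay on the shorter path, and a request that no circuit can satisfy is flagged before setup begins.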
Problem and Analysis (3)
• Reliability of the Procket router (Pro8812, version 2.4.4.1-P)
  • Separate BGP routes had to be advertised only toward Chicago in addition to the usual aggregated routes, but unsuppress-map could not be configured on the Procket. As a result, filtering was applied to the remaining BGP peers.
  • Because the interface counters were not precise enough, real-time monitoring was difficult. A problem with the SNMP MIB values for the POS/OC-192 interface also affected the traffic graphs.
  • With no SNMP MIB values for 1G/10G Ethernet sub-interfaces, traffic data could not be collected per project.
  • L3 discards were constantly observed on the 10GbE-LR interface connecting to the T-LEX BigIron, though with no effect on performance. Presumably the Procket was receiving packets it could not recognize.
  【Lessons Learned】
  • As throughput over 1G had not been experienced with the Procket router in actual operation, router performance was in question. For this type of important demo, coordination with vendors should be arranged on the assumption that hardware may need to be replaced.
  • Our domestic vendor suggested that the settings for collecting flow data might affect performance; the relevant configuration was removed accordingly.
  • Fortunately, information on QoS parameters was provided at SCinet by Mr. Clayton Wager, a developer of Procket (now Cisco) routers.
  • The interface counter and SNMP MIB problems were resolved by upgrading to version 2.5.0.173-B. We should have upgraded earlier for reliability.
Problem and Analysis (4)
• Utilization of high-speed, precise traffic graphs
  • Actual throughput could not be grasped with standard MRTG, which graphs 5-minute averages at 5-minute intervals.
  【Lessons Learned】
  • Near-real-time, precise traffic graphs proved very useful in this type of demo, both for participants and for operation staff.
  • The original traffic graph system developed by Mr. Hirabaru of NICT builds graphs from data collected every 10 seconds. It greatly helped operation staff grasp the situation during SC2004. http://mrtg.koganei.itrc.net/
  • Recently, Mr. Ikeda also completed a traffic graph system that collects data every 10 seconds using SNAPP, as a tool of the APAN/TransPAC observatory. http://nms2.jp.apan.net/cgi-bin/snapp/index.cgi
  • The Traffic Weather Map provided by SCinet helped grasp the overall network condition at the venue. We did not notice it until the Bandwidth Challenge was over; coordination with SCinet should have been established at an early stage. http://weathermap.sc04.org/ (disabled)
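To illustrate why 5-minute averages hide what happens during a short high-rate test, the sketch below derives bit rates from the same octet counter at 10-second and at 5-minute granularity. The counter readings are synthetic and the traffic pattern (a 30-second burst at 8 Gbit/s in an otherwise idle window) is an assumption chosen for the example. It does not reproduce the NICT or SNAPP tools.

```python
def rates_bps(octet_counters, interval_s):
    """Turn successive octet-counter readings into average bit rates."""
    pairs = zip(octet_counters, octet_counters[1:])
    return [(later - earlier) * 8 / interval_s for earlier, later in pairs]


# Synthetic counter readings every 10 s over 5 minutes (31 readings).
# Assumed traffic: idle except for a 30 s burst at 8 Gbit/s.
BYTES_PER_10S_AT_8G = 10_000_000_000  # 8e9 bit/s * 10 s / 8 bits per byte
increments = [0] * 30
increments[10:13] = [BYTES_PER_10S_AT_8G] * 3

counters = [0]
for step in increments:
    counters.append(counters[-1] + step)

fine = rates_bps(counters, 10)                        # 10-second view
coarse = rates_bps([counters[0], counters[-1]], 300)  # 5-minute view

print(f"peak in 10 s samples: {max(fine) / 1e9:.1f} Gbit/s")
print(f"5-minute average:     {coarse[0] / 1e9:.1f} Gbit/s")
```

In this synthetic case the 10-second samples show the 8 Gbit/s peak, while the single 5-minute data point reports 0.8 Gbit/s, a tenth of the rate the demo actually reached.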
Problem and Analysis (5)
• JGN2 Chicago circuit trouble
  • Link flapping occurred on the JGN2 Chicago circuit from Nov. 10 18:30 to Nov. 11 07:30 (JST).
  • Tokyo Univ. was unable to run its measurement test because of this trouble.
  • The cause of the flapping was a problem with transport equipment in Salt Lake City. The equipment was replaced after SC2004 ended, and the JGN2 circuit became stable.
  【Lessons Learned】
  • We should have measured high-speed throughput on this OC-192 circuit and provided the data.
  • The importance of securing a backup circuit was confirmed again.
  • If there is no protected circuit, it is desirable to be able to use other circuits or networks as backup as far as possible, e.g. TransPAC2, SINET, and IEEAF.
Problem and Analysis (6)
• Difficulties in measuring network performance
  • It was reported that the expected throughput was not achieved because of packet loss between Tokyo Univ.'s machine and the SC2004 venue.
    • With TCP, packet loss always occurred above 3 Gbit/s, which degraded throughput.
    • With UDP, the maximum throughput was only 7-8 Gbit/s.
  • As packet loss always occurred above a certain threshold, some shaping seemed to be applied somewhere in the backbone network.
  • Tokyo Univ., the APAN/JGN2 NOCs, the Global NOC in the US, and the vendor investigated jointly. However, the cause was not detected within the time frame in which the loss was occurring.
  • A similar issue found on the previous day had cleared, though its cause had not been found.
  • The interface counters of all equipment between Tokyo Univ. and the SC venue were checked, and no errors or loss were found.
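To give a feel for how sensitive long-distance TCP is to even very rare packet loss, the sketch below evaluates the widely cited Mathis et al. approximation for standard (Reno-style) TCP, throughput ≈ (MSS / RTT) × (C / √p) with C ≈ √1.5. The RTT of 200 ms and the jumbo-frame MSS of 8960 bytes are assumptions for illustration, not values reported for SC2004, and the demo teams may have used tuned stacks to which this bound does not directly apply.

```python
import math

MATHIS_C = math.sqrt(1.5)  # constant for periodic loss, Reno-style TCP


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound on single-stream TCP throughput."""
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


def max_loss_rate_for(target_bps, mss_bytes, rtt_s):
    """Loss rate at which the Mathis bound equals the target rate."""
    return (MATHIS_C * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


# Assumed path parameters (hypothetical, for illustration only).
RTT_S = 0.2       # 200 ms round trip between Japan and the US venue
MSS_JUMBO = 8960  # 9000-byte MTU minus 40 bytes of IP/TCP headers

p = max_loss_rate_for(3e9, MSS_JUMBO, RTT_S)
print(f"loss rate tolerable at 3 Gbit/s: {p:.2e}")
print(f"bound at that loss rate: {mathis_throughput_bps(MSS_JUMBO, RTT_S, p) / 1e9:.2f} Gbit/s")
```

Under these assumptions a single standard TCP stream can sustain 3 Gbit/s only if loss stays on the order of 10^-8, which is consistent with the slide's observation that any loss above a threshold made TCP throughput poor while interface counters showed nothing obvious.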
Problem and Analysis (7)
• Difficulties in measuring network performance (continued)
  • While L3 incomplete discards were detected on the Abilene Chicago router (Juniper T320) side connecting to the StarLIGHT switch (Force10), Chris tried modifying the setup to include a Catalyst, but the situation did not change. (This was only 1 hour before the whole of SC2004 ended, which made us realize the importance of coordination with SCinet.)
  • This issue has not been solved yet, and continued investigation is necessary.
  • After time-consuming cause-determination work, it was confirmed that no loss occurs even when generating over 3 Gbit/s of traffic, although only with multiple streams, between:
    • T-LEX <-> CERN
    • TPR4 OC-192 POS i/f <-> Abilene Chicago router
[Diagram: "No packet loss!" across StarLIGHT, T-LEX, APAN, SC2004, JGN2, and Abilene; flows of 1.5G and 2G, 3.5G in total]
Problem and Analysis (8)
• Difficulties in measuring network performance (continued)
  【Lessons Learned】
  • It is preferable to be able to check and view the status of as many machines in the topology as possible. Status-checking tools such as Router Proxy are therefore important, in addition to coordination between NOCs.
  • Substantial preliminary measurement and operational experience were needed for 10G networks such as OC-192 and 10GbE.
  • A system is needed that can precisely locate the cause of trouble (in the backbone portion or in the end-user portion?).
  • Troubleshooting cases with no errors or loss on any interface is a challenge. Emerging software failures should also be taken into account.
  • The network topology toward the SC2004 venue, especially the US domestic portion, should be understood precisely.
  • From the cause-determination work done this time, loss is unlikely to be occurring on the JGN2 Chicago circuit portion.
  • Information-sharing with Request Tracker was useful in troubleshooting, though the cause was not found.
Proposal for improvement (1)
• From 1G-based measurement to 10G-based measurement
  • Build 10G Iperf and BWCTL machines
  • Prepare a 10G router tester (e.g. Agilent Router Tester 900) http://advanced.comms.agilent.com/n2x/
  • Multiple use of 1G measurement machines
• Cooperative performance measurement at the users' ends
  • Heavy-traffic users might be asked to measure regularly and provide the measurement data to the NOCs.
  • Cooperation from heavy-traffic users would help troubleshooting.
• Stronger collaboration with SCinet and the Global NOC
  • Define a clear operating flow
  • Build a special system for collaboration
  • Enhance information-sharing among NOCs and participants in Japan and the US
• Substantial preliminary tests including US backbone networks
  • Provide NOCs with as much detailed information as possible
  • Perform preliminary tests as early as possible
• On-site presence of APAN NOC staff for quick response and accurate understanding of status
  • An APAN NOC member will go to the SC2005 venue and support the Asian demonstrations.
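One practical consequence of moving from 1G to 10G measurement machines is the amount of buffering a TCP test needs on a long path. As a rough sketch, the bandwidth-delay product (rate × RTT) gives the window a single stream must keep in flight to fill the pipe, and conversely a fixed window caps the achievable rate. The 200 ms RTT below is an assumed trans-Pacific value for illustration, not a figure from the slides.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bytes that must be in flight to fill a path of the given rate and RTT."""
    return rate_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes, rtt_s):
    """Highest rate a single stream can reach with a fixed window."""
    return window_bytes * 8 / rtt_s


RTT_S = 0.2  # assumed round-trip time, for illustration only

for rate in (1e9, 10e9):
    print(f"{rate / 1e9:>4.0f} Gbit/s needs a window of about {bdp_bytes(rate, RTT_S) / 1e6:.0f} MB")

print(f"a 64 KiB window allows about {window_limited_rate_bps(64 * 1024, RTT_S) / 1e6:.1f} Mbit/s")
```

Under this assumption a 10G test host needs socket buffers ten times larger than a 1G host, which is one reason dedicated, pre-tuned measurement machines are proposed rather than ad hoc ones.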
Proposal for improvement (2)
• Securing an alternative path and flexible routing changes
  • Investigate temporary use of TransPAC2/SINET/IEEAF JP-US circuits, comparison etc.
  • Prepare 10G connections with jumbo frames to SINET and T-LEX/IEEAF
  • Build a special cooperation system between networks, including international circuits
• Development of a tool for publishing the operational status of domestic equipment
  • Development of an APAN Router Proxy is under study
• Utilization of an advanced schedule management tool
  • A simple tool was used this time. A web-based tool that allows easy writing and browsing is preferable.
  • It should carry detailed information and documents, with the contact person for each project.
• Setting up measurement machines at each node and at the venue
  • Set up a measurement machine with the necessary software/parameters, such as IPv4/IPv6/jumbo frames/Iperf, at each node. * Utilization of Abilene Observatory
  • Set up a measurement machine at the venue
• Stronger collaboration with vendors for emergencies
  • Coordination with vendors should be strengthened to cope with any new issues.
Proposal for improvement (3)
• Merits to be maintained
  • Information-sharing:
    • Coordination between the international and domestic NOCs in Japan
    • Utilization of Request Tracker
    • Publication of an information page for the event
  • Flexible APAN-JP NOC system
    • Attendance during night time
    • Allocation of an operator per project for efficient management of multiple projects
  • Utilization of Abilene Observatory tools
  • 10G-based design of APAN Tokyo XP with consideration for jumbo frames
  • Monitoring with high-speed, high-performance traffic graphs
Proposal for improvement (4)
Difficulty of hop-by-hop 10G measurement
• Boxes capable of 10G tests are installed beneath every hop
• This measurement scheme can locate the bottleneck most accurately
• The NOC requires a 10G box capable of high-performance tests
Implementation might be difficult because
• 10G interfaces are still expensive
• NICs that can actually sustain 10G throughput are still rare on the market
For high-speed performance, researchers' cooperation is necessary!
The NOC hopes to set up a 10G box that NOC staff can handle!
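The hop-by-hop scheme on this slide can be reduced to a simple search: test from the source to the box at each successive hop and report the first segment at which the achieved rate falls below the target. The sketch below assumes such per-hop results are already available. The hop labels and throughput figures are hypothetical, and real measurements would come from tools such as Iperf or BWCTL rather than a hard-coded list.

```python
def first_bottleneck(measurements, target_gbps):
    """Locate the first segment where throughput drops below the target.

    measurements: ordered list of (hop_name, achieved_gbps), each measured
    from the source to the test box at that hop.
    Returns (last_good_hop, first_bad_hop), or None if every hop meets the target.
    """
    previous = "source"
    for hop, gbps in measurements:
        if gbps < target_gbps:
            return previous, hop
        previous = hop
    return None


# Hypothetical results: throughput from the source to each hop's test box.
SAMPLE = [
    ("hop1", 9.6),
    ("hop2", 9.5),
    ("hop3", 9.4),
    ("hop4", 3.0),
    ("hop5", 2.9),
]

segment = first_bottleneck(SAMPLE, target_gbps=9.0)
print(f"suspect segment: {segment[0]} -> {segment[1]}" if segment else "no bottleneck found")
```

The approach narrows the fault to a single segment only if every hop has a box that can itself source and sink the target rate, which is exactly the cost and availability problem the slide points out.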
Proposal for improvement (5)
Example of a central management system
• Scheduling of tests and real experiments
• Communication tool between NOCs and researchers
• Information sharing (provisioning/trouble/maintenance)
[Diagram: a central system (administration, ticketing, requirement, indication, coordination) linking SCinet, participants/researchers, and other NOCs (e.g. EU), in collaboration with TransPAC, Abilene, APAN, and JGN2]
• All NOCs and participants can view this system on the web
• A traffic-graph design (e.g. the animated traffic map on SCinet) is useful for grasping the network situation
• The possibility of using a scheduling tool is being examined by the APAN/TransPAC Observatory team
Proposal for improvement (6)
Establish know-how for lower-layer operation
• The importance of grasping circuit condition was confirmed again
  • SONET/SDH
    • For operation, it is very important to check circuit status in cooperation with the circuit carriers
• As a recent trend, backbone networks based on L2 or lambdas are becoming prominent
  • Layer 2
    • Difficulty in finding bottlenecks
    • Apply L3 monitoring techniques, e.g. ICMP ping, traceroute, other measurement tools
    • End-to-end VLAN ID management
  • Lambda
    • Operators cannot monitor or measure the performance of the circuit/link
    • The burden of operation falls on the end router/user