This report analyzes and presents the statistics and trouble ticket details issued by APAN JP NOC, focusing on trouble analysis, equipment evaluations, and proposing network service level enhancements. The study covers maintenance works, outage classifications, and trouble ticket distributions.
Analysis of Trouble Tickets Issued by APAN JP NOC Jin Tanaka tanaka@kddnet.ad.jp KDDI APAN NOC Session in Busan, Korea on 27 August 2003
Agenda • Introduction to APAN JP Site NOC • Statistics of Trouble Tickets • Trouble analysis • Equipment in TokyoXP • TransPAC • Characteristics of Our Trouble • Proposal for improving Network Service Level
APAN JP Site NOC: Location: • Physically located on the 12F of the KDDI Otemachi Bldg in Tokyo; APAN Tokyo XP equipment is installed on the 5F Staff: • Operators on standby 24×7 • Operators are also in charge of operations for other networks (scientific, academic, commercial) Duties: • Opening and closing of trouble tickets • Receiving problem reports • Troubleshooting • Development and maintenance of measurement and operation tools
APAN JP Site NOC: Monitoring Environment [Diagram labels: KDDI Circuit Division, Open View NNM, Mail & Web Client, Physical Layer Monitor, Operation Staff (KDDI 12F), APAN Equipment (KDDI 5F)] • HP Open View works independently in the NOC segment. • The NOC staff use Mail & Web clients to detect alerts. • KDDI's Physical Layer Monitor system observes the circuits. When an alert is detected, we can check the same status as the KDDI Circuit Division.
Statistics of Trouble Tickets: • Objects • All trouble tickets issued by APAN JP NOC over the last 12 months (2002/Aug to 2003/July) • About 200 tickets in total • Issue-selecting rules • Trouble • All outages on TransPAC are covered. For other links, outages of 15 minutes or more are covered. • Maintenance • All maintenance work is covered (including circuit switch-over hits within 1 msec)
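The issue-selecting rules above can be read as a simple predicate. The following is a minimal sketch of that reading, not the NOC's actual tooling; the `Event` record, its field names, and the `needs_ticket` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical record of one network event (not an APAN data format)."""
    kind: str                # "trouble" or "maintenance"
    on_transpac: bool        # True if the event affects a TransPAC link
    duration_minutes: float  # length of the outage in minutes


def needs_ticket(event: Event, threshold_minutes: float = 15.0) -> bool:
    """Apply the slide's rules: all maintenance, all TransPAC outages,
    and other outages lasting 15 minutes or more."""
    if event.kind == "maintenance":
        return True  # every maintenance work is covered, however short
    if event.on_transpac:
        return True  # every TransPAC outage is covered
    return event.duration_minutes >= threshold_minutes


# Illustrative events only; these are not real ticket data.
short_domestic = Event("trouble", on_transpac=False, duration_minutes=5)
short_transpac = Event("trouble", on_transpac=True, duration_minutes=5)
brief_switchover = Event("maintenance", on_transpac=False, duration_minutes=0.001)
```

Under these assumptions, a five-minute outage is ticketed only when it occurs on TransPAC, while even a sub-second maintenance switch-over is always ticketed.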
Statistics of Trouble Tickets: Trouble Tickets on Tokyo XP Fig1: Trouble Tickets on Tokyo XP
Statistics of Trouble Tickets: Number of Monthly Tickets for Trouble/Maintenance Fig2: Number of Monthly Tickets for Trouble/Maintenance
Statistics of Trouble Tickets: Number of Monthly Tickets for Trouble Fig2: Number of Monthly Tickets for Trouble on Circuit/Equipment/Others/Unknown
Statistics of Trouble Tickets: Number of Monthly Tickets for Maintenance Fig3: Number of Monthly Tickets for Maintenance on Circuit/Equipment
Statistics of Trouble Tickets: Total Length of Time of Trouble/Maintenance of APAN Tokyo XP Fig4: Time Volume of Trouble/Maintenance of APAN Tokyo XP
Statistics of Trouble Tickets: Total Availability of APAN Network Fig5: Total Availability of APAN Network
Results of Trouble Tickets Statistics • The total numbers of trouble tickets and maintenance tickets are almost equal • The number of tickets varies mainly with circuit trouble and maintenance, which is especially evident on TransPAC • Availability of the whole APAN network is 96.83% (97.45% when maintenance is excluded from outage time)
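To give a feel for what these percentages mean, the sketch below converts an availability figure into equivalent outage hours over the 12-month study window. It assumes the figure can be treated as a plain time fraction over a 365-day period; the slides do not say how the whole-network availability was aggregated, so the result is only indicative. The function name `downtime_hours` is an illustrative choice.

```python
HOURS_IN_PERIOD = 365 * 24  # 8760 h; Aug 2002 to Jul 2003 contains no leap day


def downtime_hours(availability_percent: float,
                   period_hours: float = HOURS_IN_PERIOD) -> float:
    """Equivalent outage hours implied by an availability percentage."""
    return (100.0 - availability_percent) / 100.0 * period_hours


with_maintenance = downtime_hours(96.83)     # trouble + maintenance
without_maintenance = downtime_hours(97.45)  # trouble only

print(f"including maintenance: {with_maintenance:.1f} h")
print(f"excluding maintenance: {without_maintenance:.1f} h")
```

Read this way, the gap between the two figures corresponds to roughly 54 hours attributable to maintenance over the year.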
Trouble Analysis: Trouble Tickets Classified by Area Fig6: Trouble Tickets Classified by Area
Trouble Analysis: Total Outage Time Classified by Area Fig7: Total Outage Time Classified by Area
Trouble Analysis: Average Outage Time Classified by Area Fig8: Average Outage Time Classified by Area
Trouble Analysis: Number of Trouble Tickets by Trouble-occurring Area Fig9: Number of Trouble Tickets by Trouble-occurring Area [Chart callouts: Routing trouble of TokyoXP; Int’l circuit to TransPAC; Equipment of TokyoXP; Local circuit in China; Equipment of PHnet]
Trouble Analysis: Distribution by Reason for Amount of Trouble Fig10: Distribution by Reason for Amount of Trouble
Trouble Analysis: Distribution by Reason for Outage Time Fig11: Distribution by Reason for Outage Time
Equipment Trouble Analysis in TokyoXP: Classification by Vendor for TokyoXP Fig12: Classification by Vendor for TokyoXP
Equipment Trouble Analysis in TokyoXP: Classification by Software/Hardware for TokyoXP Fig13: Classification by Software/Hardware for TokyoXP
Trouble Analysis on TransPAC: Fig14: Tickets Volume on Northern/Southern links Fig15: Total Outage Time on Northern/Southern links
Trouble Analysis on TransPAC: Fig16: Ticket Volume on TransPAC links Classified by Circuit/Equipment Fig17: Total Outage Time on TransPAC links Classified by Circuit/Equipment
Trouble Analysis on TransPAC: Fig18: Ticket Volume of Circuit Troubles on TransPAC links Classified by reason Fig19: Time Volume of Circuit Troubles on TransPAC links Classified by reason
Trouble Analysis on TransPAC: Fig20: Ticket Volume of Equipment Troubles on TransPAC links Classified by reason Fig21: Time Volume of Equipment Troubles on TransPAC links Classified by reason
Trouble Analysis on TransPAC: Availability of TransPAC • Northern link Availability = 99.819422% (including trouble and maintenance) • Southern link Availability = 99.807319% (including trouble and maintenance) • Total Availability = 100 - ( (100 - 99.819422) * (100 - 99.807319) / 100 ) = 99.999652% • Redundancy is achieved by the northern and southern links • Fortunately, the two links had no outage at the same time! Fig22: Availability of TransPAC Northern link Fig23: Availability of TransPAC Southern link
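The combined figure can be reproduced with a short calculation. This is a sketch under the usual assumption for parallel redundancy, namely that the two links fail independently, so the combined unavailability is the product of the individual unavailabilities expressed as fractions. The function name `parallel_availability` is an illustrative choice, not something from the slides.

```python
def parallel_availability(availabilities_percent):
    """Availability (%) of redundant links, assuming independent failures.

    The service is down only when every link is down at once, so the
    combined unavailability is the product of the per-link unavailabilities.
    """
    combined_unavailability = 1.0
    for a in availabilities_percent:
        combined_unavailability *= (100.0 - a) / 100.0  # percent -> fraction
    return 100.0 * (1.0 - combined_unavailability)


northern = 99.819422  # % from the slide
southern = 99.807319  # % from the slide
total = parallel_availability([northern, southern])
print(f"{total:.6f}%")  # prints 99.999652%
```

Because the unavailabilities are multiplied as fractions, the product of the two percentage gaps has to be divided by 100 before being subtracted from 100. The independence assumption is optimistic whenever both links share a failure cause, such as common equipment at one end.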
Characteristics of Our Trouble: Fig22: APAN Network Outages Table1: APAN Network Outages Minutes • Longest outage time per trouble 34:45:00 • Average outage time per trouble 2:32:09 Fig23: Distribution of APAN Network Outages by Length of Time
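The outage times above are written as h:mm:ss. The sketch below shows one way to convert such values to minutes and to compute the share of outages cleared within a given limit, which is the kind of figure a distribution by length of time summarizes. The helper names `to_minutes` and `share_within` are hypothetical, and the sample list is invented for illustration; it is not APAN ticket data.

```python
def to_minutes(hms: str) -> float:
    """Convert an 'h:mm:ss' duration string to minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


def share_within(durations_hms, limit_minutes: float) -> float:
    """Fraction of outages whose duration is at most limit_minutes."""
    durations = [to_minutes(d) for d in durations_hms]
    return sum(d <= limit_minutes for d in durations) / len(durations)


longest = to_minutes("34:45:00")  # longest outage from the slide
average = to_minutes("2:32:09")   # average outage from the slide

# Invented sample durations, for illustration only.
sample = ["0:10:00", "0:45:00", "1:30:00", "34:45:00"]
print(f"longest: {longest:.0f} min, average: {average:.2f} min")
print(f"cleared within 60 min: {share_within(sample, 60):.0%}")
```

The longest outage is about 14 times the average, which suggests that a small number of long outages accounts for much of the total outage time.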
Characteristics of Our Trouble: • 70% of all troubles are cleared within 60 minutes • Equipment troubles are noticeable and in many cases cause long outages • Using housing sites and cooperating with vendors are important • Domestic troubles are noticeable, but their average outage time is short • Sharing trouble information internationally is difficult (time zones, language) • Troubles on lower layers, such as Layer 1 (circuit) and Layer 2 (Ethernet switch), are noticeable • Having redundant circuits and equipment, as on the TransPAC network, is useful for shortening outage time
Proposal for Improving Network Service Level: • Shortening of trouble-handling time • Start trouble handling and announce the information quickly • Operation tools that enable us to issue trouble tickets automatically and announce information quickly • Shorten troubleshooting time • Remote troubleshooting from other areas (cf. Router Proxy on Global NOC) • These are under examination in TokyoXP • Worldwide information sharing • Installation of a shared information server providing the following information • Performance and operation status of the whole APAN network (cf. Animated Traffic Map on Global NOC) • Trouble and maintenance information • Syslog of routers in XPs and APs ※ It is desirable that such a server be hosted at a commercial ISP, separate from the APAN networks.
Proposal for Improving Network Service Level: • Redundant network configuration • The TransPAC links show that a redundant configuration is very effective in achieving high availability. It is desirable to establish redundant configurations wherever possible. • Monitoring of lower layers • For the operation of worldwide networks, it is very important to check the status of international circuits in cooperation with circuit carriers. • Possibility of using new Ethernet technologies, e.g. • BNDP – Bridge Neighbor Discovery Protocol • LFS - Link Fault Signaling (10GbE: 802.3ae)