Analysis of Trouble Tickets Issued by APAN JP NOC
Jin Tanaka (tanaka@kddnet.ad.jp), KDDI
APAN NOC Session in Busan, Korea, 27 August 2003
Agenda
• Introduction to APAN JP Site NOC
• Statistics of Trouble Tickets
• Trouble Analysis
• Equipment in TokyoXP
• TransPAC
• Characteristics of Our Trouble
• Proposal for Improving Network Service Level
APAN JP Site NOC
Location:
• The NOC is on the 12th floor of the KDDI Otemachi Building in Tokyo; the APAN Tokyo XP equipment is installed on the 5th floor.
Staff:
• Operators on standby 24×7
• Operators also handle operations for other networks (scientific, academic, and commercial)
Duties:
• Opening and closing trouble tickets
• Receiving problem reports
• Troubleshooting
• Developing and maintaining measurement and operation tools
APAN JP Site NOC: Monitoring Environment
[Diagram: KDDI Circuit Division; NOC operation staff on 12F with HP OpenView NNM, Mail & Web clients, and the Physical Layer Monitor; APAN equipment on 5F]
• HP OpenView works independently in the NOC segment.
• The NOC staff use Mail & Web clients to detect alerts.
• KDDI's Physical Layer Monitor system observes the circuits. When an alert is detected, we can check the same circuit status that the KDDI Circuit Division sees.
Statistics of Trouble Tickets
• Scope
  • All trouble tickets issued by APAN JP NOC over the last 12 months (August 2002 to July 2003)
  • About 200 tickets in total
• Issue-selecting rules
  • Trouble: all outages on TransPAC are covered; elsewhere, only outages of 15 minutes or more are covered.
  • Maintenance: all maintenance work is covered (including momentary circuit switchovers, or "hits", of 1 ms or less).
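The issue-selecting rules above amount to a simple filter over the ticket log. The following is a minimal Python sketch, assuming a hypothetical `Ticket` record with `kind`, `area`, and `outage_min` fields; the NOC's real ticket schema is not described in the slides, and the sample tickets are invented.

```python
from dataclasses import dataclass

# Hypothetical ticket record: the field names are illustrative, not the NOC's schema.
@dataclass
class Ticket:
    kind: str          # "trouble" or "maintenance"
    area: str          # e.g. "TransPAC" or "TokyoXP"
    outage_min: float  # outage length in minutes

MIN_OUTAGE_MIN = 15  # non-TransPAC troubles shorter than this are not counted

def is_covered(t: Ticket) -> bool:
    """Apply the issue-selecting rules to one ticket."""
    if t.kind == "maintenance":
        return True                        # all maintenance work is covered
    if t.area == "TransPAC":
        return True                        # every TransPAC outage is covered
    return t.outage_min >= MIN_OUTAGE_MIN  # elsewhere, 15 minutes or more

# Invented sample tickets for illustration only.
tickets = [
    Ticket("trouble", "TransPAC", 2),
    Ticket("trouble", "TokyoXP", 10),
    Ticket("trouble", "TokyoXP", 45),
    Ticket("maintenance", "TokyoXP", 0),
]
covered = [t for t in tickets if is_covered(t)]
```

Here the 10-minute TokyoXP trouble is the only ticket dropped. A 2-minute TransPAC outage and a zero-length maintenance switchover are both kept.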
Statistics of Trouble Tickets: Trouble Tickets on Tokyo XP
Fig1: Trouble Tickets on Tokyo XP
Statistics of Trouble Tickets: Number of Monthly Tickets for Trouble/Maintenance
Fig2: Number of Monthly Tickets for Trouble/Maintenance
Statistics of Trouble Tickets: Number of Monthly Tickets for Trouble
Fig2: Number of Monthly Tickets for Trouble on Circuit/Equipment/Others/Unknown
Statistics of Trouble Tickets: Number of Monthly Tickets for Maintenance
Fig3: Number of Monthly Tickets for Maintenance on Circuit/Equipment
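Monthly counts like those in the trouble and maintenance charts are a group-and-count over the ticket log. A minimal Python sketch, assuming each ticket is reduced to a hypothetical `(month, kind, cause)` tuple. The sample rows are invented for illustration and are not APAN data.

```python
from collections import Counter

# Hypothetical (month, kind, cause) rows; invented, not taken from the NOC's records.
tickets = [
    ("2002-08", "trouble", "circuit"),
    ("2002-08", "maintenance", "circuit"),
    ("2002-09", "trouble", "equipment"),
    ("2002-09", "trouble", "circuit"),
]

def monthly_counts(rows, kind):
    """Count tickets of one kind ("trouble" or "maintenance") per (month, cause)."""
    return Counter((month, cause) for month, k, cause in rows if k == kind)

trouble_by_month = monthly_counts(tickets, "trouble")
maintenance_by_month = monthly_counts(tickets, "maintenance")
```

Keying on `(month, cause)` keeps the circuit/equipment split that the charts show, and the same counter can be summed per month for the combined trouble/maintenance view.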
Statistics of Trouble Tickets: Total Length of Time of Trouble/Maintenance of APAN Tokyo XP
Fig4: Time Volume of Trouble/Maintenance of APAN Tokyo XP
Statistics of Trouble Tickets: Total Availability of APAN Network
Fig5: Total Availability of APAN Network
Results of Trouble Ticket Statistics
• The total numbers of trouble tickets and maintenance tickets are almost equal.
• The number of tickets varies mainly with circuit trouble and circuit maintenance; this is especially evident on TransPAC.
• Availability of the whole APAN network is 96.83% (97.45% when maintenance is excluded from outage time).
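For reference, availability is the share of the observation period with no outage, and excluding maintenance simply leaves maintenance minutes out of the outage total. The slides do not say how the APAN-wide figure was aggregated across links, so this Python sketch shows only the basic definition. The outage totals are hypothetical placeholders, and the 365-day window is an assumption based on August 2002 to July 2003.

```python
def availability(outage_min: float, period_min: float) -> float:
    """Availability in percent: share of the period with no outage."""
    if period_min <= 0:
        raise ValueError("period must be positive")
    return 100.0 * (1.0 - outage_min / period_min)

# Assumed window: August 2002 to July 2003 spans 365 days.
YEAR_MIN = 365 * 24 * 60

# Hypothetical outage totals in minutes; the raw figures are not given in the slides.
trouble_min, maintenance_min = 10_000.0, 5_000.0

with_maintenance = availability(trouble_min + maintenance_min, YEAR_MIN)
without_maintenance = availability(trouble_min, YEAR_MIN)
```

Leaving maintenance out of the outage total can only raise the result, which matches the direction of 96.83% versus 97.45% on the slide.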
Trouble Analysis: Trouble Tickets Classified by Area
Fig6: Trouble Tickets Classified by Area
Trouble Analysis: Total Outage Time Classified by Area
Fig7: Total Outage Time Classified by Area
Trouble Analysis: Average Outage Time Classified by Area
Fig8: Average Outage Time Classified by Area
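The three per-area views are related. The average outage time per area is that area's total outage time divided by its ticket count. A minimal Python sketch, assuming tickets are reduced to hypothetical `(area, outage_minutes)` pairs. The sample values are invented.

```python
from collections import defaultdict

def outage_by_area(tickets):
    """Return {area: (ticket_count, total_minutes, average_minutes)}.

    `tickets` is an iterable of (area, outage_minutes) pairs; this input
    shape is an assumption made for the sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for area, minutes in tickets:
        totals[area] += minutes
        counts[area] += 1
    return {a: (counts[a], totals[a], totals[a] / counts[a]) for a in totals}

# Invented sample data for illustration only.
sample = [("TokyoXP", 30), ("TokyoXP", 90), ("TransPAC", 20)]
stats = outage_by_area(sample)
```

Computing all three numbers in one pass keeps the ticket-count, total-time, and average-time views consistent with each other.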
Trouble Analysis: Number of Trouble Tickets by Trouble-occurring Area
[Chart labels: routing trouble of TokyoXP; international circuit to TransPAC; equipment of TokyoXP; local circuit in China; equipment of PHnet]
Fig9: Number of Trouble Tickets by Trouble-occurring Area
Trouble Analysis: Distribution of Trouble Count by Reason
Fig10: Distribution of Trouble Count by Reason
Trouble Analysis: Distribution of Outage Time by Reason
Fig11: Distribution of Outage Time by Reason
Equipment Trouble Analysis in TokyoXP: Classification by Vendor
Fig12: Classification by Vendor for TokyoXP
Equipment Trouble Analysis in TokyoXP: Classification by Software/Hardware
Fig13: Classification by Software/Hardware for TokyoXP
Trouble Analysis on TransPAC:
Fig14: Ticket Volume on Northern/Southern Links
Fig15: Total Outage Time on Northern/Southern Links
Trouble Analysis on TransPAC:
Fig16: Ticket Volume on TransPAC Links Classified by Circuit/Equipment
Fig17: Total Outage Time on TransPAC Links Classified by Circuit/Equipment
Trouble Analysis on TransPAC:
Fig18: Ticket Volume of Circuit Troubles on TransPAC Links Classified by Reason
Fig19: Time Volume of Circuit Troubles on TransPAC Links Classified by Reason
Trouble Analysis on TransPAC:
Fig20: Ticket Volume of Equipment Troubles on TransPAC Links Classified by Reason
Fig21: Time Volume of Equipment Troubles on TransPAC Links Classified by Reason
Trouble Analysis on TransPAC: Availability of TransPAC
• Northern link availability = 99.819422% (including trouble and maintenance)
• Southern link availability = 99.807319% (including trouble and maintenance)
• Total availability, assuming the two links fail independently = 100 - ( (100 - 99.819422) * (100 - 99.807319) ) / 100 = 99.999652%
• Redundancy is achieved by the northern and southern links.
• Fortunately, the two links have had no outages at the same time!
Fig22: Availability of TransPAC Northern link
Fig23: Availability of TransPAC Southern link
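The combined figure follows from the redundant pair being down only when both links are down at once. Under the assumption that link failures are independent, the combined unavailability is the product of the per-link unavailabilities, each taken as a fraction. A minimal Python sketch that reproduces the slide's number from the two link availabilities:

```python
def parallel_availability(*link_availabilities_pct: float) -> float:
    """Availability (%) of redundant links, assuming independent failures.

    The service is down only when every link is down simultaneously, so the
    combined unavailability is the product of the per-link unavailabilities.
    """
    unavailability = 1.0
    for a in link_availabilities_pct:
        unavailability *= (100.0 - a) / 100.0  # per-link unavailability as a fraction
    return 100.0 * (1.0 - unavailability)

northern, southern = 99.819422, 99.807319  # values from the slide
total = parallel_availability(northern, southern)
print(f"{total:.6f}%")  # 99.999652%
```

Converting each unavailability to a fraction before multiplying matters. Multiplying the raw percentage gaps (0.180578 × 0.192681) without dividing by 100 would give about 99.965206% instead.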
Characteristics of Our Trouble:
Fig22: APAN Network Outages
Table1: APAN Network Outages
• Longest outage time per trouble: 34:45:00 (h:mm:ss)
• Average outage time per trouble: 2:32:09 (h:mm:ss)
Fig23: Distribution of APAN Network Outages by Length of Time
Characteristics of Our Trouble:
• 70% of all troubles are cleared within 60 minutes.
• Equipment troubles stand out, often causing long outages.
• Making use of housing sites and cooperating with vendors are important.
• Domestic troubles stand out, but their average outage time is short.
• Sharing trouble information internationally is difficult (time zones, language).
• Troubles on lower layers, such as Layer 1 (circuits) and Layer 2 (Ethernet switches), stand out.
• Redundant circuits and equipment, as on the TransPAC network, will help shorten outage time.
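The Table1 figures and the "cleared within 60 minutes" share can all be derived from the per-ticket outage durations. A minimal Python sketch, assuming durations are recorded as `h:mm:ss` strings like those in Table1. The sample durations are invented, since the per-ticket data is not part of the slides.

```python
from statistics import mean

def to_minutes(hms: str) -> float:
    """Convert an 'h:mm:ss' duration string (as in Table1) to minutes."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 60 + m + s / 60

def summarize(durations_hms):
    """Return (longest minutes, average minutes, share cleared within 60 minutes)."""
    minutes = [to_minutes(d) for d in durations_hms]
    within_hour = sum(1 for m in minutes if m <= 60) / len(minutes)
    return max(minutes), mean(minutes), within_hour

# Invented sample durations for illustration only.
sample = ["0:20:00", "0:45:00", "1:30:00", "34:45:00"]
longest, average, share = summarize(sample)
```

The sample also shows why the mean alone can mislead. A single 34-hour outage pulls the average far above the typical case even when most troubles clear within an hour.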
Proposal for Improving Network Service Level:
• Shortening trouble-handling time
  • Start trouble handling and announce information quickly
    • Operation tools that let us issue trouble tickets automatically and announce information quickly
  • Shorten troubleshooting time
    • Remote troubleshooting from other areas (cf. Router Proxy on Global NOC)
  • These are under examination in TokyoXP.
• Worldwide information sharing
  • Installation of a shared information server providing:
    • Performance and operation status of the whole APAN network (cf. Animated Traffic Map on Global NOC)
    • Trouble and maintenance information
    • Syslog of routers in XPs and APs
※ Such a server should preferably be hosted at a commercial ISP, outside the APAN networks.
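As one illustration of the proposed tooling, automatic ticket issuance could open a ticket directly from a monitoring alert and format an announcement without waiting for an operator. This Python sketch is hypothetical throughout. The `Alert` and `TroubleTicket` types, the `APAN-JP` ticket prefix, sequential numbering, and the router name are all assumptions, not a description of any existing NOC tool. Timestamps are rendered in UTC as one possible way to ease the time-zone difficulty noted earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count

_ticket_ids = count(1)  # hypothetical sequential ticket numbering

@dataclass
class Alert:
    device: str        # hypothetical device name reported by the monitoring system
    area: str          # e.g. "TokyoXP" or "TransPAC"
    message: str
    at: datetime       # timezone-aware time the alert was raised

@dataclass
class TroubleTicket:
    ticket_id: int
    alert: Alert
    status: str = "open"

def open_ticket(alert: Alert) -> TroubleTicket:
    """Issue a ticket for an alert immediately, without operator action."""
    return TroubleTicket(next(_ticket_ids), alert)

def announcement(t: TroubleTicket) -> str:
    """Format a short notice; times are given in UTC for international readers."""
    ts = t.alert.at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (f"[APAN-JP #{t.ticket_id}] {t.status.upper()}: "
            f"{t.alert.area} {t.alert.device} - {t.alert.message} ({ts})")

# Usage with a made-up alert.
a = Alert("router-1", "TokyoXP", "link down",
          datetime(2003, 8, 27, 9, 0, tzinfo=timezone.utc))
t = open_ticket(a)
print(announcement(t))
```

In practice, the announcement string would be sent to the mail lists or posted on the shared information server. Those delivery channels are left out here because the slides do not specify them.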
Proposal for Improving Network Service Level:
• Redundant network configuration
  • The TransPAC links show that a redundant configuration is very effective for achieving high availability. We should build redundant configurations wherever possible.
• Monitoring of lower layers
  • For operating worldwide networks, it is very important to check the status of international circuits in cooperation with circuit carriers.
  • Possible use of new Ethernet technologies, e.g.:
    • BNDP: Bridge Neighbor Discovery Protocol
    • LFS: Link Fault Signaling (10GbE: IEEE 802.3ae)