The Research on Analyzing Time-Series Data and Anomaly Detection in Internet Flow
Yoshiaki HARADA
Graduate School of Information Science and Electrical Engineering (ISEE), Kyushu University
Contents
• Background
• Purpose
• Background Knowledge
  • AS and Internet routing
  • Property of Internet Flow
• Analysis method
• Progress of this research
• Conclusion and Future Work
Background
• The Internet is growing into a global information infrastructure
  • always-on connections from laptop PCs, cellular phones, etc.
  • many services such as music and video delivery
  • telemedicine and distance learning
• A reliable Internet system is required
We should grasp the tendencies of Internet flows in order to manage a reliable Internet infrastructure.
Background
• It is difficult to grasp the tendencies of Internet flows
  • The amount of flow is increasing with the development of the Internet
  • A lot of garbage traffic, such as DDoS attacks and illegal accesses, flows through the Internet
  • Physical hazards occur, such as electrical power failures and router failures
• Expert engineers are required to manage the Internet system
  • This takes a great deal of time and effort
Purpose
• A method is required that automatically detects anomalies and tendencies in Internet flow
  • There is much existing research on macro-level analysis of Internet flow
  • It is difficult to grasp detailed biases and anomalies because Internet flows are complicated
• I propose a micro-level analysis method that segments network flows by port number, AS number, area information, country, etc.
  • I can analyze Flow Data in detail
  • Fewer false alarms can reduce management cost
• I propose detecting anomalies in network traffic and visualizing them
Background knowledge
• AS (Autonomous System)
  • A collection of IP networks and routers under the control of one entity (or sometimes more) that presents a common routing policy to the Internet
  • e.g. an Internet Service Provider (ISP) or a very large organization
  • AS numbers are currently 16-bit integers, which allow for a maximum of 65536 assignments
[Figure: AS:1, AS:2, AS:3, AS:4, and a router]
BGP table
• BGP
  • BGP is the core routing protocol of the Internet
  • It works by maintaining a table of IP networks, or 'prefixes', which designate network reachability among autonomous systems (AS)
  • We find the destination AS number by looking up the prefix

   Network           Next Hop         Metric  LocPrf  Weight  Path
*>i3.0.0.0           210.138.15.145           300     0       2497 2497 701 703 80 i
*>i4.0.0.0           210.138.15.145           300     0       2497 2497 3356 i
*>i4.23.112.0/22     210.138.15.145           300     0       2497 2497 174 21889 i
*>i4.23.180.0/24     210.138.15.145           300     0       2497 2497 701 6128 30576 i

[Figure callouts: the Network column is the reachable prefix (IP address); the Path column gives the destination AS number]
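The prefix lookup described on this slide can be sketched as a longest-prefix match. This is a minimal illustration, not the author's implementation: the function name `destination_as` and the dictionary layout are hypothetical, the table holds only the four rows shown on the slide, the mask-less rows `3.0.0.0` and `4.0.0.0` are assumed to be classful /8 networks, and the destination AS is assumed to be the last AS number in the path.

```python
import ipaddress

# Toy table built from the four rows on the slide: prefix -> AS path.
# Assumption: rows without a mask ("3.0.0.0", "4.0.0.0") are classful /8s.
BGP_TABLE = {
    "3.0.0.0/8": [2497, 2497, 701, 703, 80],
    "4.0.0.0/8": [2497, 2497, 3356],
    "4.23.112.0/22": [2497, 2497, 174, 21889],
    "4.23.180.0/24": [2497, 2497, 701, 6128, 30576],
}


def destination_as(ip, table=BGP_TABLE):
    """Return the last AS in the path of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None  # (network, path) of the most specific match so far
    for prefix, path in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, path)
    return best[1][-1] if best else None
```

A linear scan is enough for a sketch; a full BGP table has far more prefixes, so a real tool would more likely use a trie or a similar index.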
Flow-Data
• Flow Data
  • is a collection of unidirectional packets that belong to the same application
  • is exported by routers
  • includes information such as source and destination IP address, port number, number of packets, etc.
  • is enormous in quantity, so we use sampled data
[Figure: an example of Flow Data (from Kyushu University)]
Analysis method
• We propose building the database hierarchically to enhance scalability
  • I export the Flow Data and the BGP routing information maintained on the server, and calculate the AS number for each Flow Data record
  • I build a database that includes the necessary data (AS number, port number, number of packets, etc.)
  • I categorize the database by country, area, and port number
  • I sort the database and calculate correlations for the data whose tendency we want to see
  • I refer to the categorized database and visualize it
  • I perform calculations on the database and detect anomalies
[Diagram stage labels: analyzing traffic, categorize, visualize, anomaly detection]
Analysis method – BGP table and Flow Data
• I use the BGP table exported from QGPOP and the Flow Data exported from Kyushu University
• Flow Data
  • For each sampled day, I analyze the data collected during minutes 0-5 of every hour
  • The sampling rate is 10%
[Figure: network diagram showing where the Flow Data (Kyushu University) and the BGP table (QGPOP) are collected. Labels: universities and research institutes; SINET (information communication network dedicated to academic research); KOREN (Korea Advanced Research Network); IIJ (Internet Initiative Japan)]
Analysis method 1
• Detailed analysis and categorization
  • I assign an AS number to each IP address by referring to the BGP table and the Flow Data
  • I categorize Flow Data by port number (purpose of communication), country, and area information (Asia, Europe, etc.)
  • I analyze the distribution of port numbers in each country
• The distribution of port numbers may be unbiased in countries that frequently make accesses with illegal port numbers
  • Illegal accesses use various (random) port numbers
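One way to put a number on how "unbiased" a country's port distribution is would be normalized Shannon entropy. The slide does not name a measure, so entropy is an assumption introduced here purely for illustration. The record layout `(country, port, number_of_flows)` and both function names are hypothetical as well.

```python
import math
from collections import Counter, defaultdict


def port_distribution(flows):
    """Group (country, port, n_flows) records into a per-country port histogram."""
    by_country = defaultdict(Counter)
    for country, port, n_flows in flows:
        by_country[country][port] += n_flows
    return by_country


def normalized_entropy(histogram):
    """Shannon entropy of a port histogram scaled to [0, 1].

    1.0 means flows are spread evenly over the observed ports (unbiased);
    values near 0 mean a few ports dominate.
    """
    total = sum(histogram.values())
    if total == 0 or len(histogram) < 2:
        return 0.0
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in histogram.values()
        if count > 0
    )
    return entropy / math.log2(len(histogram))
```

Under this assumption, a country whose traffic is spread over many random ports would score close to 1, while a country dominated by a few well-known ports would score much lower.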
Time change of number of flows in Asia
This figure shows the change over time in the number of flows for the top 5 countries, in decreasing order of volume.
Almost all of the traffic was exchanged with Japan, and the number of flows for Japan has been increasing over the year.
Time change of number of flows in Asia
This figure shows the change over time in the number of flows for the top 4 countries, excluding Japan, in decreasing order of volume.
The number of flows for China has been increasing over the year.
Analyzing distribution of port number
• I analyze the distribution of port numbers used together with port 53 flows
  • I analyze the destination port numbers accessed by hosts that accessed the DNS server
  • A host is identified by its IP address in the Flow Data database
[Figure: a host, the DNS server (port:53), and other ports labeled port:?? and port:XX]
The distribution of port 53 flows and port 25 flows
Flow Data for every Wednesday, 2007/01/04 ~ 02/22 (hourly)
Horizontal axis: number of port 25 flows
Vertical axis: number of port 53 flows
The number of port 53 flows increases with the number of port 25 flows (positive correlation)
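A positive correlation like the one read off this figure can be quantified with the Pearson correlation coefficient over the hourly flow counts. The sketch below is illustrative only: the slide does not say which correlation measure was used, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)
```

Here `xs` would be the hourly port 25 flow counts and `ys` the hourly port 53 flow counts; a value close to +1 corresponds to the positive correlation described above.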
Analysis method 2
• Anomaly detection
  • We work with the database compiled from Flow Data
  • We smooth the data with the exponential smoothing method to make visualization easier
  • Flow Data has periodicity (daily or weekly), so we use the Holt-Winters method
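Anomaly detection
• Data smoothing
  • For long-term analysis of Flow Data, I use the Exponentially Weighted Moving Average (EWMA) method
    • It applies weighting factors that decrease exponentially, so each older data point carries exponentially less weight
  • Flow Data is periodic, so we adopt the Holt-Winters method for short-term analysis
    • The Holt-Winters method extends EWMA to periodic data

EWMA:
$$\hat{Y}_i = \alpha Y_i + (1 - \alpha)\,\hat{Y}_{i-1}$$

Holt-Winters (period $m$):
$$\hat{Y}_{t+1} = a_t + b_t + c_{t+1-m}$$
$$a_t = \alpha\,(Y_t - c_{t-m}) + (1 - \alpha)(a_{t-1} + b_{t-1})$$
$$b_t = \beta\,(a_t - a_{t-1}) + (1 - \beta)\,b_{t-1}$$
$$c_t = \gamma\,(Y_t - a_t) + (1 - \gamma)\,c_{t-m}$$

Here $Y_t$ is the observed value, $\hat{Y}$ the smoothed or predicted value, $a_t$ the level, $b_t$ the trend, $c_t$ the seasonal component, and $\alpha$, $\beta$, $\gamma$ the smoothing parameters.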
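The two smoothing methods named on this slide can be sketched in a few lines of Python. This follows the standard additive form of Holt-Winters and is not the author's implementation. The initialisation (level = mean of the first cycle, trend = 0, seasonal terms = deviations from that mean) is an assumption chosen for simplicity, and the function names are hypothetical.

```python
def ewma(series, alpha):
    """Exponentially weighted moving average; the first value seeds the average."""
    smoothed = []
    for y in series:
        if not smoothed:
            smoothed.append(float(y))
        else:
            smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def holt_winters_additive(series, m, alpha, beta, gamma):
    """One-step-ahead additive Holt-Winters forecasts for a series with period m.

    forecasts[t] is the prediction for series[t] made from data up to t-1.
    The first m entries are None because the first cycle is used for
    initialisation (an assumption of this sketch).
    """
    if m < 1 or len(series) < m + 1:
        raise ValueError("need more than one full cycle of data")
    level = sum(series[:m]) / m                # a: mean of the first cycle
    trend = 0.0                                # b: assume no initial trend
    season = [y - level for y in series[:m]]   # c: deviation from the mean
    forecasts = [None] * m
    for t in range(m, len(series)):
        c_prev = season[t - m]                 # seasonal term one cycle back
        forecasts.append(level + trend + c_prev)
        y = series[t]
        new_level = alpha * (y - c_prev) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season.append(gamma * (y - new_level) + (1 - gamma) * c_prev)
        level = new_level
    return forecasts
```

For hourly flow counts with a daily cycle, `m` would be 24. On a perfectly periodic series the forecasts reproduce the series exactly, which is a convenient sanity check.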
Anomaly detection
• I smooth the Flow Data using the EWMA or Holt-Winters method and calculate thresholds
• When a value falls outside the thresholds, I consider that point an anomaly
[Figure: number of flows over time for one cycle (one day), with a threshold area bounded by a high threshold level and a low threshold level; a point outside the area is marked as an anomaly]
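The slide does not state how the thresholds are calculated. One common choice, assumed here only for illustration, is a band around the smoothed or predicted value whose half-width is `k` times an exponentially smoothed absolute error. The function name and the parameters `k`, `gamma`, and `floor` are all hypothetical.

```python
def detect_anomalies(actual, forecast, gamma=0.1, k=3.0, floor=1.0):
    """Return indices where actual lies outside forecast +/- k * deviation.

    deviation is an exponentially smoothed absolute forecast error.
    floor keeps the band from collapsing to zero width on very regular data.
    Entries whose forecast is None (for example a warm-up cycle) are skipped.
    """
    anomalies = []
    deviation = None
    for t, (y, f) in enumerate(zip(actual, forecast)):
        if f is None:
            continue
        error = abs(y - f)
        # Compare against the band built from errors seen before time t.
        if deviation is not None and error > k * max(deviation, floor):
            anomalies.append(t)
        if deviation is None:
            deviation = error
        else:
            deviation = gamma * error + (1 - gamma) * deviation
    return anomalies
```

In this sketch the high and low threshold levels at time `t` are `forecast[t] + k * deviation` and `forecast[t] - k * deviation`. Note that an anomalous point also widens the band for a while afterwards; whether that is desirable depends on the traffic being monitored.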
Visualization
• I am developing a tool that detects anomalies and visualizes them
• The tool should be able to analyze only the specific Flow Data selected by the user (port number, country, etc.)
  • Internet traffic includes communication with a large number of packets, such as port 8000 (DVTS)
  • We want to grasp the tendency not only of all Flow Data but also of Flow Data restricted to a certain country, AS, or port number
• It should be a versatile tool
Conclusion and future work
• Implementation of Flow Data analysis
  • The program that categorizes Flow Data by country, AS number, and port number is complete
  • I will develop a program to find the correlation between port numbers
• Anomaly detection and visualization
  • I smooth the database produced by the analysis program, calculate the thresholds, and detect anomalies in the Flow Data
  • I am developing a tool that visualizes not only all data and anomalies but also the data selected by the user
  • I will conduct a verification experiment on Flow Data that includes an electrical power failure