Measurement and Performance Analysis of Supercomputing Traffic by FlowScan+ 2.0
Kookhan Kim
Supercomputing Center of KISTI
August 28, 2003
Contents
• Introduction
• FlowScan
• FlowScan+ 2.0
• Traffic Measurement & Analysis
• Others
Introduction
• We operate various types of supercomputers
  - NEC, IBM, Compaq, PC cluster
• Supercomputing traffic
  - All traffic generated between the supercomputers and their users while computing many kinds of data
  - Users have authenticated and authorized IDs
• Until now, we have not tried to measure and analyze supercomputing traffic
• We want to know the characteristics of supercomputing traffic
  - Who uses it?
  - Which applications and protocols are used?
  - How much traffic is generated?
• To meet these demands, we improved FlowScan
What is FlowScan?
• FlowScan is a passive measurement tool that draws traffic graphs by analyzing network flows exported by routers and switches
  - NetFlow is exported by Cisco routers and switches
• It was developed by Dave Plonka and managed by CAIDA (http://www.caida.org)
• Main modules - Perl scripts
  - cflowd (a flow collection engine)
  - flowscan (the central process in the system)
    - Our improvement focuses on this module
  - RRDtool (a visualization tool)
• Definition: Flow
  - An IP flow is a unidirectional series of IP packets of a given protocol, travelling between a source and a destination, within a certain period of time.
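The flow definition above can be sketched in a few lines of Python. This is only an illustration of the concept, not how a router or FlowScan builds flows: the `Packet` record, its field names, and the 60-second idle timeout are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical packet record; the field names are illustrative only.
Packet = namedtuple("Packet", "ts src dst proto sport dport size")

def packets_to_flows(packets, idle_timeout=60.0):
    """Group packets into unidirectional flows keyed by the 5-tuple.

    A new flow starts when a key has been idle for more than
    idle_timeout seconds (an assumed value).
    """
    active = {}    # key -> flow currently being built
    finished = []  # flows closed by the idle timeout
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.src, p.dst, p.proto, p.sport, p.dport)
        flow = active.get(key)
        if flow is not None and p.ts - flow["last"] > idle_timeout:
            finished.append(flow)
            flow = None
        if flow is None:
            flow = {"key": key, "first": p.ts, "last": p.ts,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last"] = p.ts
        flow["packets"] += 1
        flow["bytes"] += p.size
    finished.extend(active.values())
    return finished

packets = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 6, 40000, 23, 60),
    Packet(1.0, "10.0.0.1", "10.0.0.2", 6, 40000, 23, 80),
    # Reverse direction: counted as a separate flow (unidirectional).
    Packet(1.5, "10.0.0.2", "10.0.0.1", 6, 23, 40000, 120),
    # Same key, but after the idle timeout: a new flow.
    Packet(200.0, "10.0.0.1", "10.0.0.2", 6, 40000, 23, 60),
]
flows = packets_to_flows(packets)
```

The reply packet lands in its own flow because the key is direction-sensitive, which is what "unidirectional" means in the definition.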
Enhanced FlowScan+
• The goal
  - Build a good passive measurement tool
• The motivations
  - Lack of a traffic measurement tool that supports real-time visualization and detailed traffic analyses on demand
  - A user-friendly tool that is easy for everyone to use
• Why FlowScan?
  - It is an open source program
  - It has a good web-based graphing function
  - But it does not yet support a query interface
• Who is involved?
  - Supercomputing Center of KISTI
  - System Architecture Lab., Dept. of Computer Science, KAIST
FlowScan+ 2.0
[Figure: system architecture. Labels shown: NetFlow v7; Flow-tools; flowscan; FlowScan original module (RRD, static graph); Analysis module, FlowScan+ 1.0 (link, query, parsed data, 15-minute aggregation, DB); Visualization module, FlowScan+ 2.0 (dynamic graph)]
FlowScan+ Main Point
• FlowScan+ 1.0
  - Uses MySQL
    - Stores NetFlow information in the DB
    - Raw flows
    - Aggregated data
  - Query interface
    - Access to the DB via the web
    - Easy to use
• FlowScan+ 2.0
  - Flow-tools
    - NetFlow version problem
  - User group editing
    - Small groups, large groups
    - Divided by IP class
  - Visualization of DB query results
    - Java Servlet, JFreeChart
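Storing raw flows and 15-minute aggregates side by side might look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere. The table names, column names, and sample rows are assumptions for illustration, not the actual FlowScan+ schema.

```python
import sqlite3

BUCKET = 15 * 60  # 15-minute aggregation interval, in seconds

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
# Hypothetical schema: one table for raw flows, one for aggregates.
conn.execute("CREATE TABLE rawflows "
             "(ts INTEGER, src TEXT, dst TEXT, dport INTEGER, bytes INTEGER)")
conn.execute("CREATE TABLE aggregated "
             "(bucket INTEGER, src TEXT, flows INTEGER, bytes INTEGER)")

raw = [
    (0,    "192.0.2.10", "198.51.100.1", 23, 500),
    (600,  "192.0.2.10", "198.51.100.1", 21, 1500),
    (900,  "192.0.2.10", "198.51.100.1", 23, 700),
    (1000, "192.0.2.77", "198.51.100.1", 21, 300),
]
conn.executemany("INSERT INTO rawflows VALUES (?, ?, ?, ?, ?)", raw)

# Integer division maps each timestamp to the start of its 15-minute bucket.
conn.execute(
    "INSERT INTO aggregated "
    "SELECT (ts / ?) * ?, src, COUNT(*), SUM(bytes) "
    "FROM rawflows GROUP BY (ts / ?), src",
    (BUCKET, BUCKET, BUCKET),
)
rows = conn.execute(
    "SELECT bucket, src, flows, bytes FROM aggregated ORDER BY bucket, src"
).fetchall()
```

Keeping the raw flows allows detailed queries on demand, while the aggregated table keeps routine queries small.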
FlowScan+ 2.0 : Flow-tools
• NetFlow v5 and v7 have different PDU formats, and the information they carry does not correspond field for field
• cflowd, the main NetFlow collection module in FlowScan, cannot collect NetFlow v7
• We therefore had to change the NetFlow capture module
• Flow-tools replaces cflowd as the NetFlow v7 collection module
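The version problem can be illustrated with a small sketch. NetFlow export datagrams start with a 16-bit version and a 16-bit record count in network byte order, so a collector can read the version before choosing a parser. The function names and the `parsers` table below are hypothetical and do not reflect cflowd or Flow-tools internals.

```python
import struct

def netflow_version_and_count(datagram: bytes):
    """Return (version, record_count) from the first 4 bytes of a datagram."""
    if len(datagram) < 4:
        raise ValueError("datagram too short for a NetFlow header")
    version, count = struct.unpack("!HH", datagram[:4])
    return version, count

def choose_parser(datagram: bytes, parsers: dict):
    """Pick a parser by version; parsers maps version number -> parser."""
    version, _count = netflow_version_and_count(datagram)
    if version not in parsers:
        raise ValueError(f"unsupported NetFlow version {version}")
    return parsers[version]

# Fake datagram: version 7, 3 records, remaining header bytes zeroed.
fake_v7 = struct.pack("!HH", 7, 3) + bytes(20)

# A collector that only knows v5 rejects the datagram.
try:
    choose_parser(fake_v7, {5: "v5 parser"})
    rejected = False
except ValueError:
    rejected = True

# A collector that also knows v7 accepts it.
chosen = choose_parser(fake_v7, {5: "v5 parser", 7: "v7 parser"})
```

Since the v5 and v7 record layouts differ, a v5 parser cannot simply be reused for v7 data. The collection module itself has to understand the newer format.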
FlowScan+ 2.0 : User Grouping
• There is no way to verify the user (ID) of the supercomputer from NetFlow
• The only user-related information in NetFlow is the IP address
• From this information, we can infer which user is generating traffic
• If users always connect to the supercomputer from the same system, they have the same source/destination IP: this is no problem
• But they can also log in from other systems in the same office or the same building
• So we adopt a user grouping concept
• If a user logs in from a completely different place, it is impossible to identify the user (ID) from NetFlow
• Except for this situation, we can identify supercomputing users by the network IP in NetFlow
FlowScan+ 2.0 : User Grouping
• Group fields: group name, group number, group ID, user ID or related information
• Groups are classified by class C IP network only
• Large grouping
  - If one user has many user IDs
  - When we compare the traffic of a number of institutes with each other
  - We should aggregate their total traffic
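The grouping idea can be sketched as a two-level lookup: a source IP is mapped to its class C (/24) network, the network to a small group, and small groups to a large group such as an institute. All group names and networks below are made up for the example, and the real group editor may store this differently.

```python
import ipaddress
from collections import defaultdict

# Hypothetical group tables (documentation address ranges, invented names).
SMALL_GROUPS = {
    ipaddress.ip_network("192.0.2.0/24"): "lab-a",
    ipaddress.ip_network("198.51.100.0/24"): "lab-b",
    ipaddress.ip_network("203.0.113.0/24"): "lab-c",
}
LARGE_GROUPS = {
    "lab-a": "institute-1",
    "lab-b": "institute-1",
    "lab-c": "institute-2",
}

def small_group(ip: str) -> str:
    """Map an IP address to the small group of its /24 network."""
    net = ipaddress.ip_network(f"{ip}/24", strict=False)
    return SMALL_GROUPS.get(net, "unknown")

def traffic_by_large_group(flows):
    """flows: iterable of (source_ip, bytes). Returns total bytes per large group."""
    totals = defaultdict(int)
    for ip, nbytes in flows:
        group = small_group(ip)
        totals[LARGE_GROUPS.get(group, "unknown")] += nbytes
    return dict(totals)

flows = [
    ("192.0.2.17", 1000),   # same office, different hosts
    ("192.0.2.200", 200),
    ("198.51.100.5", 300),
    ("203.0.113.9", 50),
    ("10.1.2.3", 7),        # not in any configured group
]
totals = traffic_by_large_group(flows)
```

Two hosts in the same /24 fall into the same small group, which covers the case of a user logging in from another machine in the same office.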
FlowScan+ 2.0 : Visualization
• FlowScan+, improved by adding MySQL, provides a query interface based on a free DBMS to get flow information
• But query results are text-based information
  - They are difficult to understand intuitively
  - They cannot be displayed as a time-series plot
• To support this, FlowScan+ 2.0 adds a visualization servlet
FlowScan+ 2.0 : Visualization
• Until now, text output was the only way to see the result of the query interface
• If we want to see the result as a graphical plot over time, FlowScan+ 2.0 makes one more query to the DB
[Figure: visualization process and graph]
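Before query rows can be plotted over time, they need to be arranged on a regular time axis. The sketch below shows that preparation step only. It is written in Python for illustration, whereas the actual module uses a Java servlet with JFreeChart, and the row format `(timestamp, bytes)` is an assumption.

```python
def to_time_series(rows, start, end, step=900):
    """Turn (timestamp_seconds, bytes) rows into one point per interval.

    Intervals with no rows get a value of 0, so the plot has no gaps.
    The 900-second step matches the 15-minute aggregation interval.
    """
    totals = {}
    for ts, nbytes in rows:
        bucket = ts - (ts % step)  # start of the interval containing ts
        totals[bucket] = totals.get(bucket, 0) + nbytes
    return [(t, totals.get(t, 0)) for t in range(start, end, step)]

# Hypothetical query result: (timestamp, bytes)
rows = [(0, 2000), (900, 700), (950, 300), (2700, 100)]
series = to_time_series(rows, start=0, end=3600)
```

Each `(t, bytes)` pair can then be handed to any charting library as one point of the series.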
Traffic Measurement Topology
[Figure: network topology. Labels shown: Lion, Kfddi2, Kordic, Tiger, Baram, Ruby-8/80, Catalyst 6506, Cisco 7513, H-Ruby, H-Opal, H-NFS; supercomputers NEC, Compaq, IBM, PC cluster; two C6506 switches with NetFlow v7 export to FlowScan+ 2.0]
• Our supercomputers are connected in a mesh through two Catalyst 6500 series switches
• NetFlow v7 export
• Graphs are drawn every 5 min
• Aggregated data and raw flows are stored in the DB every 15 min
FlowScan+ 2.0 : Traffic Analysis
Top users (by institute) (2003/July/21 14:00 ~ /28 14:00)
• One week of measured traffic
• Analyzed by large group
• The pie chart was redrawn in Excel
FlowScan+ 2.0 : Traffic Analysis
Application (2003/July/21 14:00 ~ /28 14:00)
• The result is strange and not what we expected
• We wanted to know the portion occupied by various applications
  - Applications in bio, physics, aerospace, chemistry and so on
• But those applications run inside the supercomputers
  - They are installed on the supercomputers
• Users log in to the supercomputers by telnet and ftp
  - They transfer their data and operate applications from remote sites
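A port-based classification, sketched below, shows why the breakdown looks this way. The scientific codes run inside the supercomputers, so the network only sees the login and transfer protocols. The port table is a small hand-picked subset of well-known ports, and the flow tuples are invented for the example.

```python
from collections import Counter

# Small subset of well-known ports; a real table would be much longer.
PORT_TO_APP = {
    20: "ftp-data",
    21: "ftp",
    22: "ssh",
    23: "telnet",
    25: "smtp",
    80: "http",
}

def classify(sport: int, dport: int) -> str:
    """Name the application by port, checking the lower port first."""
    for port in sorted((sport, dport)):
        if port in PORT_TO_APP:
            return PORT_TO_APP[port]
    return "unknown"

def bytes_by_application(flows):
    """flows: iterable of (source_port, dest_port, bytes)."""
    totals = Counter()
    for sport, dport, nbytes in flows:
        totals[classify(sport, dport)] += nbytes
    return totals

flows = [
    (40000, 23, 500),     # telnet session to the supercomputer
    (21, 40001, 1500),    # ftp control reply
    (40002, 20, 9000),    # ftp data transfer
    (5000, 6000, 200),    # no known port on either side
]
totals = bytes_by_application(flows)
```

Adding a newly identified application to `PORT_TO_APP` moves its traffic out of the "unknown" share.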
Other usage of FlowScan+ 2.0
• Detection of network abnormalities
  - Port scanning
  - Code Red virus
  - Nimda virus
  - Mass-mailing worm components
  - DDoS attacks
  - Characteristic relation between flow count and traffic volume
    - Bytes: normal traffic volume
    - Flows: explosive increase
• Detection of emerging new applications
  - Grid applications, P2P applications and so on
  - If we match each newly emerging application with its defined port number, the unknown traffic portion decreases
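The "bytes normal, flows explosive" signature can be turned into a simple check, sketched below. The thresholds and the flow tuple format are assumptions chosen for the example, not values used by FlowScan+.

```python
from collections import defaultdict

def suspicious_sources(flows, min_flows=100, max_bytes_per_flow=100):
    """Flag sources with many flows but very few bytes per flow.

    flows: iterable of (source_ip, dest_port, bytes).
    Returns a sorted list of (source_ip, flow_count, distinct_dest_ports).
    Both thresholds are illustrative and would need tuning.
    """
    stats = defaultdict(lambda: {"flows": 0, "bytes": 0, "ports": set()})
    for src, dport, nbytes in flows:
        s = stats[src]
        s["flows"] += 1
        s["bytes"] += nbytes
        s["ports"].add(dport)

    flagged = []
    for src, s in stats.items():
        mean_bytes = s["bytes"] / s["flows"]
        if s["flows"] >= min_flows and mean_bytes <= max_bytes_per_flow:
            flagged.append((src, s["flows"], len(s["ports"])))
    return sorted(flagged)

# A scan: 500 tiny flows to 500 different ports from one source.
scan = [("203.0.113.9", port, 40) for port in range(1, 501)]
# Normal use: a few large ftp transfers from another source.
normal = [("192.0.2.10", 21, 50_000)] * 20
flagged = suspicious_sources(scan + normal)
```

The scanning source adds little to the byte count but dominates the flow count, so only a per-flow view exposes it.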
Conclusions
• FlowScan+ was developed by KISTI & KAIST
• Characteristics of FlowScan+ 2.0
  - Flow-tools
    - Solves the NetFlow version problem
  - Group editing
    - Traffic can be measured and analyzed per user
  - Visualization of results
    - Produces graphical time-series plots
• Future work
  - DB optimization for speed
  - Installation packaging
  - Better stability of flowscan
  - Combine the merits of each version
Thank you for your attention
Questions?