MIRnet Administrative Data Analysis System (MADAS)
Greg Cole, Natasha Bulashova
Friends & Partners, NCSA
Description
• The system converts netflow data into structured data stored in a series of relational database tables
• The system provides a means of browsing summary statistics in graphic and table format
• A work in progress since 1998: first version in summer 1999, second in fall 2000 (for the HPIIS review), third in February 2001
For more info: http://www.friends-partners.org/madasd/
Description
Sample raw flow records (pipe-delimited, one per line):
141.142.121.5|193.233.46.21|3130|3130|UDP-Other|55|6349|2|979067306|979067523
193.233.46.21|141.142.121.5|3130|3130|UDP-Other|55|6569|2|979067306|979067523
198.32.1.116|193.233.82.3|53|3271|UDP-DNS|1|482|1|979067480|979067480
195.208.55.40|194.81.150.167|63499|80|TCP-WWW|2|96|1|979067547|979067550
194.226.45.8|193.0.72.16|53|35432|UDP-DNS|2|634|1|979067717|979067721
195.208.55.40|194.81.150.168|63500|80|TCP-WWW|2|96|1|979067547|979067550
195.208.55.40|194.81.158.128|61492|80|TCP-WWW|2|96|1|979067677|979067680
194.226.65.17|128.61.81.129|51270|21|TCP-FTP|6|360|3|979067720|979067781
194.226.65.17|128.61.81.129|51271|21|TCP-FTP|6|360|3|979067720|979067781
195.19.10.238|18.72.1.2|0|2048|ICMP|1|1500|1|979067753|979067753
195.208.55.40|194.81.150.169|63501|80|TCP-WWW|2|96|1|979067547|979067550
193.233.46.21|141.142.121.5|3143|3128|TCP-Other|5|1486|1|979067620|979067620
141.142.121.5|193.233.46.21|3128|3143|TCP-Other|5|1043|1|979067620|979067620
195.208.55.40|194.81.158.129|61493|80|TCP-WWW|2|96|1|979067677|979067680
195.208.55.40|194.81.150.170|63502|80|TCP-WWW|2|96|1|979067547|979067550
212.192.244.68|193.0.0.193|1024|53|UDP-DNS|1|71|1|979067714|979067714
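The ten fields in each record appear to follow the same order as the IPheaders columns shown further on (source and destination address, source and destination port, protocol, packets, octets, flows, then start and end times as Unix seconds). Assuming that order, the "convert netflow data into structured data" step can be sketched in Python as below. The `Flow` class and `parse_flow` function are hypothetical, not part of MADAS, and the timestamps are assumed to be UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Flow:
    """One structured flow record (field names borrowed from the IPheaders table)."""
    ip_source: str
    ip_destination: str
    port_source: int
    port_destination: int
    protocol: str
    packets: int
    octets: int
    flows: int
    timestart: datetime
    timeend: datetime


def _from_unix(seconds: str) -> datetime:
    # Assumption: timestamps are Unix epoch seconds, interpreted as UTC.
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


def parse_flow(line: str) -> Flow:
    """Split one pipe-delimited record into typed fields (assumed field order)."""
    f = line.strip().split("|")
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}: {line!r}")
    return Flow(f[0], f[1], int(f[2]), int(f[3]), f[4],
                int(f[5]), int(f[6]), int(f[7]),
                _from_unix(f[8]), _from_unix(f[9]))


flow = parse_flow("198.32.1.116|193.233.82.3|53|3271|UDP-DNS|1|482|1|979067480|979067480")
print(flow.protocol, flow.octets, flow.timestart.isoformat())
```

Rejecting records with the wrong field count, rather than padding them, keeps malformed router output out of the primary tables.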
Process
• Aggregate netflow data from the router
• Load into primary database tables
• Update summary tables
• Update “heap” tables
• Wait 10 minutes (and do it again)
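The update cycle above is a fixed sequence of steps repeated on a 10-minute timer. A minimal Python sketch of that loop follows; `run_cycles` and the step names are hypothetical stand-ins for whatever MADAS actually schedules, and the optional `cycles` limit exists only so the loop can be exercised without running forever.

```python
import time
from typing import Callable, Optional, Sequence


def run_cycles(steps: Sequence[Callable[[], None]],
               interval_s: float = 600.0,
               cycles: Optional[int] = None) -> int:
    """Run every step in order, wait interval_s seconds, and repeat.

    cycles=None loops forever (the live-service case); a finite value is
    handy for testing. Returns the number of completed cycles.
    """
    done = 0
    while cycles is None or done < cycles:
        for step in steps:
            step()  # an exception propagates and stops the loop
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval_s)
    return done


log = []
steps = [lambda name=n: log.append(name)
         for n in ("aggregate", "load_primary", "update_summaries", "update_heaps")]
run_cycles(steps, interval_s=0, cycles=2)
```

A production service would likely catch and log step failures instead of letting one bad batch stop the schedule.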
Primary IPheaders table
• All network flows must meet a minimum traffic threshold to be included in the live database (for MIRnet, this is set to 10K)
• The threshold drops only 3% of total traffic volume but cuts the number of records by 95%
• All data is kept in archives
• Currently maintains 17,000,000+ network flow records (June 1, 2001)
*************************** 1. row ***************************
ip_source: 193.233.46.3
ip_destination: 152.3.233.71
port_source: 40C-45C
port_destination: 25
protocol: TCP-SMTP
packets: 199
octets: 285413
flows: 1
timestart: 2000-08-28 22:50:21
timeend: 1999-09-08 06:18:09
channel: BE
periodbegin: 1999-09-08 06:11:49
periodduration: 600
keyid: 2
domain_source: 42
domain_dest: 28
*************************** 2. row ***************************
ip_source: 195.208.220.5
ip_destination: 128.148.55.233
port_source: 80
port_destination: 1K-2K
protocol: TCP-WWW
packets: 11
octets: 11128
flows: 1
timestart: 2000-08-29 18:39:41
timeend: 1999-09-08 06:20:52
channel: BE
periodbegin: 1999-09-08 06:11:49
periodduration: 600
keyid: 3
domain_source: 9
domain_dest: 125
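The volume-versus-record trade-off of the threshold can be measured directly. The sketch below assumes that "10K" means 10,000 octets per record (it could equally mean 10,240 bytes, so the threshold is a parameter); `apply_threshold` is a hypothetical helper, and the four sample octet counts come from the slides, not from the MIRnet data behind the 3% and 95% figures.

```python
from typing import Dict, List, Sequence, Tuple


def apply_threshold(records: Sequence[Dict], min_octets: int = 10_000
                    ) -> Tuple[List[Dict], float, float]:
    """Keep records carrying at least min_octets octets.

    Returns (kept, share_of_volume_dropped, share_of_records_dropped).
    """
    kept = [r for r in records if r["octets"] >= min_octets]
    total_octets = sum(r["octets"] for r in records)
    kept_octets = sum(r["octets"] for r in kept)
    volume_dropped = 1 - kept_octets / total_octets if total_octets else 0.0
    records_dropped = 1 - len(kept) / len(records) if records else 0.0
    return kept, volume_dropped, records_dropped


sample = [{"octets": n} for n in (285413, 11128, 482, 96)]
kept, vol, recs = apply_threshold(sample)
print(len(kept), f"{vol:.2%}", f"{recs:.0%}")
```

Even in this tiny sample, half the records carry well under 1% of the bytes, which is the same skew the slide reports at scale.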
Primary DNSdata table
+----------------+---------------------------+----------------+----------------+-----------+
| ip_address     | ip_name                   | createtime     | modifytime     | ip_domain |
+----------------+---------------------------+----------------+----------------+-----------+
| 128.178.16.37  | icpmac12.epfl.ch          | 20010110104036 | 00000000000000 |      6203 |
| 156.17.180.31  | budm31.ar.wroc.pl         | 20010110104036 | 00000000000000 |      3232 |
| 62.32.36.134   | ip134-tpas-1.ti.net.ge    | 20010110104032 | 00000000000000 |      6131 |
| 194.82.81.146  | dyn081-146.stanmore.ac.uk | 20010110104029 | 00000000000000 |      9760 |
| 194.83.11.34   | gosh-atm.ex.ac.uk         | 20010110104026 | 00000000000000 |      9488 |
| 194.81.127.202 | 194.81.127.202            | 20010110104025 | 00000000000000 |         2 |
| 194.81.174.83  | 194.81.174.83             | 20010110104025 | 00000000000000 |         2 |
| 195.25.253.130 | 195.25.253.130            | 20010110104024 | 00000000000000 |         2 |
| 194.80.105.9   | paul.cvcp.ac.uk           | 20010110104024 | 00000000000000 |      9456 |
| 194.81.127.113 | 194.81.127.113            | 20010110104023 | 00000000000000 |         2 |
| 131.114.187.5  | endo1.endoc.med.unipi.it  | 20010110104023 | 00000000000000 |      6214 |
| 193.99.163.9   | 193.99.163.9              | 20010110104022 | 00000000000000 |         2 |
| 194.80.104.23  | 194.80.104.23             | 20010110104022 | 00000000000000 |         2 |
| 194.80.104.3   | 194.80.104.3              | 20010110104022 | 00000000000000 |         2 |
| 194.81.33.48   | imb.hope.ac.uk            | 20010110104021 | 00000000000000 |      9526 |
+----------------+---------------------------+----------------+----------------+-----------+
• Currently maintains 806,431 DNSdata IP records (January 10, 2001)
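In the sample rows, addresses without a reverse-DNS name are stored with ip_name equal to the address itself. A minimal Python sketch of a lookup cache with that fallback is below; `DnsCache` and `reverse_lookup` are hypothetical names, the resolver is injectable so the cache can be tested offline, and `socket.gethostbyaddr` is the standard-library PTR lookup.

```python
import socket
from typing import Callable, Dict, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup via the system resolver; None when the address has no name."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


class DnsCache:
    """Memoizes reverse lookups; unresolvable addresses map to themselves."""

    def __init__(self, resolver: Callable[[str], Optional[str]] = reverse_lookup):
        self._resolver = resolver
        self._names: Dict[str, str] = {}
        self.lookups = 0  # number of resolver calls, i.e. cache misses

    def name_for(self, ip: str) -> str:
        if ip not in self._names:
            self.lookups += 1
            self._names[ip] = self._resolver(ip) or ip
        return self._names[ip]


offline = {"194.80.105.9": "paul.cvcp.ac.uk"}.get  # stand-in resolver for testing
cache = DnsCache(offline)
print(cache.name_for("194.80.105.9"), cache.name_for("194.81.127.202"))
```

Caching failures as well as successes matters here: without it, every flow from an unnamed host would trigger another slow, failing lookup.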
Primary Domains table
• Heart and soul of the MADAS system
• Adding new “intelligence” to this database enables entirely new classes of analysis
• Currently maintains 11,771 domain records (January 10, 2001)
*************************** 1. row ***************************
domainid: 715
domainname: anl.gov
latitude: 41.858
longitude: -88.017
domainlabel: Argonne Natl Lab
createtime: 20010103224037
modifytime: 20001227191828
origin: US
shortlabel: Argonne Natl Lab
location:
pdomainid: 715
rdomainid: 715
loccity: Chicago
locstate: IL
loccountry: United States
orgclass: US Government,US Govt DOE
worldclass: North America
regionclass: USA Great Lakes
*************************** 2. row ***************************
domainid: 948
domainname: doe.gov
latitude: 38.892
longitude: -77.017
domainlabel: US Department of Energy
createtime: 20001227170946
modifytime: 20001227170946
origin: US
shortlabel: US-DOE
location: Washington, DC
pdomainid: 948
rdomainid: 948
loccity: Washington
locstate: DC
loccountry: United States
orgclass: US Government,US Govt DOE
worldclass: North America
regionclass: USA Atlantic Central
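The slides do not say how a resolved host name is tied to a Domains record. One plausible approach, assumed here, is longest-suffix matching of ip_name against domainname, so that a host inherits the location and organization classes of the most specific known domain. `match_domain` is a hypothetical helper; the two entries in `known` are the rows above.

```python
from typing import Dict, Optional


def match_domain(hostname: str, domains: Dict[str, int]) -> Optional[int]:
    """Return the id of the most specific known domain that hostname falls under.

    Tries the full name first, then drops one leading label at a time,
    so "www.anl.gov" is checked as "www.anl.gov", "anl.gov", then "gov".
    """
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in domains:
            return domains[candidate]
    return None


known = {"anl.gov": 715, "doe.gov": 948}  # domainname -> domainid, from the rows above
print(match_domain("www.anl.gov", known))
```

Checking the most specific suffix first means a later, finer-grained record (for example a single laboratory inside a university) automatically takes precedence over its parent.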
Other Primary Tables
• IP Today (last 24 hours of ipheaders records)
• Country Codes
• Parent domains
• Color mappings
+------+--------------------------+---------------+
| code | country                  | worldclass    |
+------+--------------------------+---------------+
| ??   | Unknown                  | Unclassified  |
| AC   | Ascension Island         | Other         |
| AD   | Andorra                  | Europe        |
| AE   | United Arab Emirates     | Middle East   |
| AF   | Afghanistan(Islamic St.) | Middle East   |
| AG   | Antigua and Barbuda      | North America |
| AI   | Anguilla                 | Other         |
| AL   | Albania                  | Europe        |
| AM   | Armenia                  | Middle East   |
| AN   | Netherland Antilles      | Other         |
+------+--------------------------+---------------+
+----------+-------------+
| parentid | parentname  |
+----------+-------------+
|     1308 | ac.jp       |
|        3 | ac.ru       |
|      959 | ac.uk       |
|      986 | edu.tw      |
|        6 | free.net    |
|      735 | nasa.gov    |
|       41 | nlanr.net   |
|     4762 | ircache.net |
|      100 | ras.ru      |
+----------+-------------+
+-------+---------+
| code  | value   |
+-------+---------+
| ??    | pink    |
| CA    | lblue   |
| CH    | purple  |
| DE    | lbrown  |
| DK    | green   |
| EE    | dgray   |
| FI    | white   |
| FR    | cyan    |
| IL    | gold    |
| IT    | lred    |
| JP    | dpink   |
| NL    | lpurple |
| NO    | gray    |
| Other | lyellow |
| PL    | orange  |
| RU    | blue    |
| SE    | lgray   |
| TW    | yellow  |
| UK    | marine  |
| US    | lgreen  |
+-------+---------+
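The Domains rows carry a pdomainid column, and the parent-domains table names groupings such as ac.uk. Assuming pdomainid points at a domain's parent (and equals the domain's own id when it has none, as in both rows shown), rolling traffic up to parent domains is a simple aggregation. `roll_up` is hypothetical, and the child ids and volumes below are illustrative only.

```python
from collections import defaultdict
from typing import Dict


def roll_up(octets_by_domain: Dict[int, int],
            parent_of: Dict[int, int],
            names: Dict[int, str]) -> Dict[str, int]:
    """Sum traffic per parent domain; domains without a parent count as their own."""
    totals: Dict[str, int] = defaultdict(int)
    for domain_id, octets in octets_by_domain.items():
        parent_id = parent_of.get(domain_id, domain_id)
        totals[names.get(parent_id, str(parent_id))] += octets
    return dict(totals)


names = {959: "ac.uk", 715: "anl.gov"}
parent_of = {9456: 959, 9526: 959, 715: 715}  # hypothetical pdomainid values
print(roll_up({9456: 100, 9526: 50, 715: 30}, parent_of, names))
```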
Capabilities
• With these tables (updated every 10 minutes), we can provide live and historical traffic analysis between world regions, countries, country regions, cities, institutions, organizations, and network protocols, by year, month, day, hour, or minute . . .
But . . .
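Every combination listed above reduces to the same operation: truncate each flow's start time to the chosen granularity, then group and sum. The Python sketch below shows that pattern; `volume_by` and the `origin`/`dest` keys are hypothetical, and the three flows are illustrative values.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Iterable, Tuple

# strftime patterns that truncate a timestamp to each reporting granularity
BUCKETS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H:00",
    "minute": "%Y-%m-%d %H:%M",
}


def volume_by(flows: Iterable[dict], granularity: str,
              group: Callable[[dict], str]) -> Dict[Tuple[str, str], int]:
    """Sum octets per (time bucket, group) pair."""
    fmt = BUCKETS[granularity]
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for f in flows:
        totals[(f["timestart"].strftime(fmt), group(f))] += f["octets"]
    return dict(totals)


flows = [
    {"timestart": datetime(2001, 1, 9, 19, 11), "origin": "RU", "dest": "US", "octets": 482},
    {"timestart": datetime(2001, 1, 9, 19, 45), "origin": "RU", "dest": "US", "octets": 1500},
    {"timestart": datetime(2001, 1, 9, 20, 5), "origin": "US", "dest": "RU", "octets": 96},
]
by_hour = volume_by(flows, "hour", lambda f: f["origin"] + "->" + f["dest"])
```

Running this over millions of raw rows per request is exactly the cost the next slide addresses.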
Need to use Indexed Summary Tables
Database “mirsum”
• 8 tables updated live every 10 minutes
• 2 “heap” (RAM-based) tables used for most live queries
• A pre-query “optimizer” selects the best tables for the current query
Domain_date_proto
Domain_date_proto_mm
Domain_date
Domain_date_mm
Country_date_proto
Country_date_proto_mm
Country_date
Country_date_mm
Heap_domain_date_proto
Heap_domain_date_proto_mm
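The slide does not describe how the pre-query optimizer decides. One plausible rule, sketched here as an assumption, is to pick a table that covers every dimension the query needs, prefer the RAM-based heap tables, then prefer the smallest table, and skip heap tables for historical queries on the further assumption that they hold only recent data. `SummaryTable`, `choose_table`, the dimension sets, and the row counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SummaryTable:
    name: str
    dimensions: FrozenSet[str]  # columns a query can group or filter on
    rows: int                   # approximate table size
    in_memory: bool = False     # True for the HEAP (RAM-based) tables


def choose_table(tables: Iterable[SummaryTable], needed: FrozenSet[str],
                 historical: bool = False) -> SummaryTable:
    """Pick a covering table: heap tables first (live queries only), then fewest rows."""
    candidates = [t for t in tables
                  if needed <= t.dimensions and not (historical and t.in_memory)]
    if not candidates:
        raise LookupError(f"no summary table covers {sorted(needed)}")
    return min(candidates, key=lambda t: (not t.in_memory, t.rows))


catalog = [
    SummaryTable("country_date", frozenset({"country", "date"}), 50_000),
    SummaryTable("domain_date_proto", frozenset({"domain", "country", "date", "proto"}), 2_000_000),
    SummaryTable("heap_domain_date_proto", frozenset({"domain", "country", "date", "proto"}),
                 80_000, in_memory=True),
]
print(choose_table(catalog, frozenset({"country", "date"}), historical=True).name)
```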
A word about technologies
• No proprietary software
• MySQL for the database
• PHP for the query interface
• Web/CGI for the statistics interface
• Perl for the code/CGI base
• DBI for interaction with MySQL
• GD::Graph graphics libraries
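MADAS itself talks to MySQL through Perl DBI. To keep the examples in one self-contained language, the sketch below uses Python's built-in `sqlite3` module as a stand-in (not the MADAS stack) to show the core idea behind an indexed summary table: rebuild it from the primary flow table with `GROUP BY`, then index it for fast lookups. The simplified `ipheaders` schema with country columns and the `country_date` layout are hypothetical.

```python
import sqlite3


def refresh_country_date(conn: sqlite3.Connection) -> None:
    """Rebuild a per-day, per-country-pair summary table from the flow table."""
    conn.executescript("""
        DROP TABLE IF EXISTS country_date;
        CREATE TABLE country_date AS
            SELECT country_source, country_dest,
                   date(timestart) AS day,
                   SUM(octets) AS octets, SUM(packets) AS packets, SUM(flows) AS flows
            FROM ipheaders
            GROUP BY country_source, country_dest, date(timestart);
        CREATE INDEX idx_country_date ON country_date (day, country_source, country_dest);
    """)


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ipheaders (country_source TEXT, country_dest TEXT,
                timestart TEXT, octets INTEGER, packets INTEGER, flows INTEGER)""")
conn.executemany("INSERT INTO ipheaders VALUES (?, ?, ?, ?, ?, ?)", [
    ("RU", "US", "2001-01-09 19:11:20", 100, 2, 1),
    ("RU", "US", "2001-01-09 20:00:00", 50, 1, 1),
    ("US", "RU", "2001-01-10 01:00:00", 30, 1, 1),
])
refresh_country_date(conn)
rows = conn.execute("SELECT * FROM country_date ORDER BY day, country_source").fetchall()
```

A live system would more likely update only the newest 10-minute period incrementally rather than rebuild the whole table each cycle.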
An analysis that took 400-500 lines of Perl code in the original MADAS system now looks like this (object-oriented Perl):

    #### 2 ##########
    # Chart showing total volume with breakdown by top countries
    # (%in is assumed to hold the parsed CGI input parameters)
    my $self = MADAS::Country->new(
        database    => "mirsum",
        table       => "domain_date",
        variable    => "origin_dest",
        imagemapcgi => "/cgi-bin/madas/printtable.pl",
        imagemap    => 0,
        percent     => 1,
        graphtype   => "bars",
        title1      => "Total MIRnet Traffic Flow by Destination Country",
        rh_input    => \%in,
    );
    $self->set_title2("Period: <b>" . $self->get_timebegin
        . "</b> - <b>" . $self->get_timeend . "</b>");
    $self->doit();
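The call above asks for a bar chart of total volume broken down by top countries, with `percent => 1`. The internals of `MADAS::Country` are not shown, so the Python sketch below only illustrates the aggregation such a chart plausibly needs: rank the countries, keep the top N, lump the remainder into "Other", and convert to percentages. `top_n_with_other` and the sample volumes are hypothetical.

```python
from typing import Dict, List, Tuple


def top_n_with_other(volumes: Dict[str, int], n: int) -> List[Tuple[str, int, float]]:
    """Return (label, volume, percent of total) for the n largest entries plus an
    "Other" row for the remainder (omitted when the remainder is zero)."""
    total = sum(volumes.values())
    if total == 0:
        return []
    ranked = sorted(volumes.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = ranked[:n]
    rest = sum(v for _, v in ranked[n:])
    if rest:
        rows.append(("Other", rest))
    return [(label, v, 100.0 * v / total) for label, v in rows]


print(top_n_with_other({"US": 600, "RU": 300, "DE": 60, "UK": 40}, n=2))
```

Sorting ties by label keeps the chart order stable between refreshes.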
[Charts: US Regions; Russian Regions]
[Charts: DOE; NASA; US Government; DOD]
Advantages
• Higher-level analysis of network usage (“not just for engineers”)
• The system encourages “exploration”
• Better understanding of ‘users’ and their applications
• Immediate feedback on traffic problems/issues
Future Plans
• Evaluate shared use of the Domains and DNSdata tables (perhaps via LDAP)
• Standard monthly and quarterly reports of traffic utilization
• “Monster” query
• “Project”-level accounting/analysis
Future Plans (continued)
• Create an always-running “server” to maintain data, provide “instant stats”, and manage the web site/interface
• Provide statistical analysis routines
• Create a database to maintain all “global” settings
• Port-level analysis (looking for “napster”, etc.)
Future Plans (continued)
• Explore integration/sharing with HPIIS projects (others?)
• Develop data maintenance applications for the Domains database
• Develop ‘world-map’ graphics applications
Future Plans (continued)
• Develop “partnerships” analyses (looking at domain-domain and machine-machine partnerships)
• Add additional “organizational” classes (e.g., “US Govt DOE”, “University”)
• Add state-level analyses
• Clean up/refine the Domains database
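The slides do not define a “partnership”. One simple reading, assumed here, is total traffic between two domains regardless of direction, so that A→B and B→A are credited to the same pair. `partnerships` is a hypothetical helper and the sample volumes are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def partnerships(flows: Iterable[Tuple[str, str, int]]) -> List[Tuple[Tuple[str, str], int]]:
    """Total octets per unordered domain pair, largest first."""
    totals: Counter = Counter()
    for src, dst, octets in flows:
        a, b = sorted((src, dst))  # canonical order makes the pair direction-free
        totals[(a, b)] += octets
    return totals.most_common()


print(partnerships([("anl.gov", "ras.ru", 100), ("ras.ru", "anl.gov", 50),
                    ("doe.gov", "anl.gov", 20)]))
```

The same function works for machine-machine partnerships by passing IP addresses instead of domain names.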
Future Plans (continued)
• Add “science” classifiers and “project” identifiers to regular traffic flows
• Integrate this with a database describing high-performance network science applications
• Integrate back-end reporting with the front-end reservation system
Future Plans (continued)
• Authentication system for machine-level inquiry/analysis
• Device-independent display of usage (for text-only, email, and WAP devices)
• Handle the IP address cache expiration problem
• Etc. . . .
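The slide does not describe the cache expiration problem in detail. One common remedy, assumed here, is to give every cached entry (for example an IP-to-name mapping) a time-to-live, so stale names are re-resolved once they age out. `ExpiringCache` is hypothetical; the injectable clock lets the behavior be tested without waiting, and the one-day TTL in the example is arbitrary.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class ExpiringCache(Generic[V]):
    """Caches computed values and recomputes them once they are older than ttl_s."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_s
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[V, float]] = {}

    def get(self, key: Hashable, compute: Callable[[Hashable], V]) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # still fresh
        value = compute(key)  # missing or expired: recompute and restamp
        self._entries[key] = (value, now)
        return value


fake_now = [0.0]
calls = []


def lookup(ip):
    calls.append(ip)
    return f"name{len(calls)}"


cache = ExpiringCache(ttl_s=86_400, clock=lambda: fake_now[0])
first = cache.get("194.80.105.9", lookup)
```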