Shane Alcock The RIPE NCC Internet Measurement Data Repository
Introductions Research Programmer with WAND NOT affiliated with RIPE NCC, just speaking on their behalf Passive measurement Organise packet trace captures Maintainer of the WITS website Experienced in dealing with measurement data sets
Outline Sharing Internet datasets Challenges Case studies The RIPE NCC repository Available datasets Other RIPE datasets that may be added
Sharing Measurement Data Internet measurement research requires data Often it is difficult to collect suitable data Privacy Security Cost of infrastructure Selecting appropriate times and locations
Sharing Measurement Data Sharing data with the community is an awesome idea Saves time and effort Promotes collaboration Enables validation of previous results Encourages others to share their data as well
Sharing Measurement Data WITS (Waikato Internet Traffic Storage): http://www.wand.net.nz/wits; CAIDA: http://www.caida.org/data/; PREDICT: https://www.predict.org/; CRAWDAD: http://crawdad.cs.dartmouth.edu/data.php; NLANR: no longer exists :(
Challenges Community awareness Datasets are scattered amongst multiple hosts Lack of publicity and detailed information about datasets Meta-data DatCat (CAIDA) http://www.datcat.org Catalogue of publicly available datasets Not an actual repository – data is hosted externally Not a comprehensive resource
Challenges Repositories often maintained by research groups Limited funding, therefore limited resources People Expertise Disk space Bandwidth
Case Study: WITS Maintenance is intermittent Maintainer has many other responsibilities Disk space is a huge limitation No room on the FTP server to put new data sets Adding new disks costs both money and time Sanitizing datasets requires even more space as we must retain the original version as well Bandwidth Cost of commercial bandwidth hinders availability of data Enable access via KAREN (NZ national research network) only Fortunately, KAREN peers with many international NRENs
Challenges Permanence Research groups typically depend on competitive funding Funding runs out – repository vanishes Loss of data is a major issue No longer able to replicate and validate previous studies
Case Study: NLANR Large public archive of measurement data Auckland, Abilene traces (PMA) AMP US government ceased funding Repository no longer maintained Domain eventually expired CAIDA and WAND salvaged the data Traces now available on WITS Without intervention, the data could easily have been lost permanently
Challenges Avoiding inappropriate disclosure Anonymisation of sensitive information, e.g. IP addresses Developing policy to cover user access and agreements Many datasets have unique restrictions or policies Policy that is appropriate for one dataset is not for another Personal contact information IP addresses User payload in packet traces
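One simple way to anonymise IP addresses, sketched here purely for illustration, is to zero the host bits so that only the network prefix survives. This is an assumed scheme, not necessarily the one applied to any dataset mentioned in this talk; the function name `truncate_ipv4` and the default of 24 retained bits are hypothetical choices.

```python
import ipaddress


def truncate_ipv4(addr: str, keep_bits: int = 24) -> str:
    """Zero the host bits of an IPv4 address, keeping the top keep_bits.

    Illustrative only: truncation hides individual hosts but still
    reveals the network they belong to.
    """
    # strict=False lets us pass a host address rather than a network address.
    network = ipaddress.ip_network(f"{addr}/{keep_bits}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # 192.0.2.0/24 and 198.51.100.0/24 are documentation ranges.
    print(truncate_ipv4("192.0.2.77"))
    print(truncate_ipv4("198.51.100.200", keep_bits=16))
```

Truncation is easy to explain in a dataset policy, but it destroys the ability to distinguish hosts within a prefix, which is one reason a policy that suits one dataset may not suit another.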
Challenges Communication with users Data sharing is often not top priority for collectors Collection designed to suit their purposes Small changes to the collection process can often make the data more useful to a wider audience Encourage users to engage with collectors
Challenges Support Measurement data is complicated to deal with Steep learning curve Formats, e.g. PCAP vs ERF vs legacy DAG formats for traces Tools / Processing libraries Timezones Documentation of shared datasets is often poor User support is intermittent, due to lack of resources again
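As a small illustration of the format problem, the sketch below guesses a trace file's format from its first four bytes. It relies on the well-known pcap and pcapng magic numbers; ERF files carry no magic number, so they can only be reported as "unknown" here. The function name `sniff_trace_format` is hypothetical, and a real tool would inspect more than four bytes.

```python
# Magic numbers as they appear on disk (first four bytes of the file).
TRACE_MAGICS = {
    bytes.fromhex("a1b2c3d4"): "pcap (big-endian, microsecond timestamps)",
    bytes.fromhex("d4c3b2a1"): "pcap (little-endian, microsecond timestamps)",
    bytes.fromhex("a1b23c4d"): "pcap (big-endian, nanosecond timestamps)",
    bytes.fromhex("4d3cb2a1"): "pcap (little-endian, nanosecond timestamps)",
    bytes.fromhex("0a0d0d0a"): "pcapng",
}


def sniff_trace_format(header: bytes) -> str:
    """Return a best-effort format name for the start of a trace file."""
    # ERF has no file header or magic number, so it cannot be identified
    # by this check and falls through to the default.
    return TRACE_MAGICS.get(header[:4], "unknown (could be ERF or another format)")


if __name__ == "__main__":
    print(sniff_trace_format(bytes.fromhex("d4c3b2a1") + b"\x02\x00\x04\x00"))
```

In practice a processing library that understands several formats hides this step from the user, which is why pointing users at suitable tools matters as much as documenting the format.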
Challenges Size Internet measurement datasets are huge Push modern storage technologies to the limit Server hosting and maintenance
The RIPE NCC Repository RIPE NCC collects a lot of measurement data already They want to share this data with the community Most is already available through various repositories Develop a single common and consistent platform Hosting Browsing Accessing and downloading data Open to other collectors who wish to share data Still under development
Hardware 2 servers (master and back-up); Size: 9U; Disk: 48x 2TB on 2 controllers, 2 cold spares; CPU: 2x quad-core Xeon L5420 2.5GHz; Memory: 32GB; Chassis: Chenbro RM91250
Features of the RIPE NCC Repository Longevity RIPE NCC does not depend on competitive research funding Generating and keeping Internet measurement data for ~20 years Long time-series data Much less likely that the repository will disappear Emphasis on mirroring rather than replacing other repositories Host anonymized versions of data
Features of the RIPE NCC Repository Resources RIPE NCC manages servers, infrastructure Larger repository can justify a dedicated support staff Experience and expertise are important Diversity Variety of datasets from different collectors Increased awareness of new datasets One user account can access many different datasets Self sign-up for “basic access”
Features of the RIPE NCC Repository Communication Bridge the gap between data collectors and users Raise awareness of existing data Gather feedback from the user community Develop relationships with other data collectors Links to useful tools and libraries for processing data Share expertise as well as data
Available Datasets Data collected by RIPE NCC RIS routing database Reverse DNS delegations made by RIRs Data from external sources WITS Ex-NLANR data
Routing Information Server (RIS) 16 route collectors peering with 600 BGP routers Mostly within the RIPE region ~100 peers provide complete routing tables Routes are collected and published in MRT format Updates every 5 minutes Full table dump every 8 hours All data collected since 2000 has been retained
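To give a feel for the MRT format, here is a minimal sketch that decodes the 12-byte common header that starts every MRT record (timestamp, type, subtype, length, all in network byte order, as specified in RFC 6396). It does not decode the record body, and the function name `parse_mrt_header` is hypothetical; a real analysis would normally use an existing MRT library instead.

```python
import struct
from datetime import datetime, timezone

# MRT common header: 4-byte timestamp, 2-byte type, 2-byte subtype,
# 4-byte length of the message that follows (network byte order).
MRT_HEADER = struct.Struct("!IHHI")


def parse_mrt_header(data: bytes) -> dict:
    """Decode the MRT common header at the start of data."""
    if len(data) < MRT_HEADER.size:
        raise ValueError("need at least 12 bytes for an MRT common header")
    timestamp, mrt_type, subtype, length = MRT_HEADER.unpack_from(data)
    return {
        "time": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "type": mrt_type,
        "subtype": subtype,
        "length": length,
    }


if __name__ == "__main__":
    # A synthetic header, not real RIS data: type 16 is BGP4MP in RFC 6396.
    sample = MRT_HEADER.pack(946684800, 16, 4, 100)
    print(parse_mrt_header(sample))
```

Reading a whole file is then a loop: parse the header, read `length` further bytes as the record body, and repeat until the file is exhausted.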
Routing Information Server (RIS) Other methods of access Last 3 months of data exported to MySQL database Weekly statistical reports Looking Glass queries Tools to query and visualise RIS data
Reverse DNS Zones (Partial) Reverse DNS delegations made by RIRs Generated using RIPE DB reverse DNS objects ~410,000 reverse DNS objects
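For readers unfamiliar with reverse DNS, the sketch below shows how an address maps to the name that is looked up in the reverse tree (`in-addr.arpa` for IPv4, `ip6.arpa` for IPv6). It uses only Python's standard `ipaddress` module and documentation addresses; it illustrates the naming scheme, not how the RIPE NCC zone files themselves are generated.

```python
import ipaddress


def reverse_name(addr: str) -> str:
    """Return the reverse DNS name that a PTR lookup for addr would query."""
    return ipaddress.ip_address(addr).reverse_pointer


if __name__ == "__main__":
    print(reverse_name("192.0.2.1"))      # IPv4 documentation address
    print(reverse_name("2001:db8::1"))    # IPv6 documentation address
```

A reverse delegation hands responsibility for one branch of this tree, for example everything under `2.0.192.in-addr.arpa`, to the name servers of the address holder.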
Auckland Passive traces taken at the University of Auckland Auckland II – VII were previously available through NLANR Frequently feature in measurement literature Currently available from WITS archive
Waikato Passive traces taken at the University of Waikato Long duration continuous traces Waikato I is available Other Waikato sets will be included at a later date
NLANR Other NLANR datasets that were preserved by WAND IPLS (also known as Abilene) Leipzig Active Measurement Project (AMP) Much of this data is also currently available from WITS
Other Datasets Collected by RIPE NCC Not currently in the repository but may be added later K-root and reverse DNS server statistics and traces Hostcount TTM DNSMON AS112 Other parts of RIPE DB These are covered in more detail in the paper
K-root Internet root name service operated by RIPE NCC PCAP traces of incoming port 53 traffic (DNS queries) 50 hours of traces included in CAIDA's DITL project DNS Statistics Collector (DSC) Summarises DNS traffic into 1 minute bins Generate graphs shown on the K-root website Raw data exported to DNS-OARC SNMP statistics Originate from RIPE NCC in Amsterdam Summarised and exported to an RRD
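The idea of summarising traffic into 1 minute bins can be sketched as follows. This is not DSC's implementation, only an illustration of the binning step: each query timestamp (seconds since the Unix epoch) is rounded down to the start of its minute and counted. The function name `bin_per_minute` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def bin_per_minute(timestamps: Iterable[float]) -> Counter:
    """Count events per 1 minute bin, keyed by the bin's start time in seconds."""
    # Integer-divide by 60 and multiply back to get the start of the minute.
    return Counter(int(ts) // 60 * 60 for ts in timestamps)


if __name__ == "__main__":
    # Synthetic timestamps, not real K-root data.
    counts = bin_per_minute([0.0, 59.9, 60.0, 125.3])
    for start in sorted(counts):
        print(start, counts[start])
```

Storing only the per-bin counts is far cheaper than storing the packets themselves, which is why summaries can be kept continuously even where full traces cannot.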
Reverse DNS 4 reverse DNS servers operated by RIPE NCC 50,000 queries per second (3x load of K-root) High query rate means regular trace collection is infeasible DSC used on each of the rDNS servers Raw data and graphs only available within RIPE NCC Could be made available if there was a need
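A rough calculation shows why regular trace collection is infeasible at this query rate. The 50,000 queries per second figure is from the slide; the 100 bytes stored per captured query is purely an assumed value for illustration, and real capture sizes will differ.

```python
SECONDS_PER_DAY = 24 * 60 * 60

RDNS_QPS = 50_000               # query rate quoted on the slide
ASSUMED_BYTES_PER_QUERY = 100   # hypothetical capture size, not a measured value


def queries_per_day(qps: int) -> int:
    """Number of queries seen in one day at a constant rate."""
    return qps * SECONDS_PER_DAY


def trace_bytes_per_day(qps: int, bytes_per_query: int) -> int:
    """Bytes of trace data per day if every query were captured."""
    return queries_per_day(qps) * bytes_per_query


if __name__ == "__main__":
    print(queries_per_day(RDNS_QPS), "queries per day")
    size = trace_bytes_per_day(RDNS_QPS, ASSUMED_BYTES_PER_QUERY)
    print(size / 1e9, "GB per day under the assumed capture size")
```

Under these assumptions a single day of continuous capture would run to hundreds of gigabytes, before responses or any sanitised copies are counted.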
AS112 AS number for RFC 1918 private address space http://public.as112.net/ Dynamic DNS update and rDNS server for AS112 Hosted by RIPE NCC Goal is to measure and analyse DNS updates for invalid addresses PCAP trace collected annually and contributed to DITL More frequent captures could be scheduled if needed DSC data also collected Graphs publicly available from RIPE NCC AS112 site
Hostcount Monthly DNS scan of ~100 TLDs within the RIPE region Count A and PTR records for both forward and reverse IPv4 Also count forward AAAA for IPv6 addresses Not exhaustive, due to public zone transfers being disabled Statistics published via Hostcount website Raw data from 1990-2007 is archived off-line Current policy is to discard raw data after statistic extraction But this could be reversed if there is a need
Test Traffic Measurements (TTM) Active measurement system of ~100 probes Most probes located at ISPs and universities within Europe Not all are included in public measurements Regular series of active tests UDP one-way delay, traceroute, DNSMON, IPv6 PMTU Also supports ad-hoc measurements by authorised users Ping, HTTP page fetch Can also develop and run arbitrary tests Results not released outside of RIPE NCC
Test Traffic Measurements (TTM) Bulk data published using CERN ROOT Performance graphs on the TTM website
DNSMON Measures the reachability and latency of DNS Collected using 60 TTM probes Root domain, .com, .net, .org, e164.arpa, 24 CC-TLDs measured IPv4 and IPv6 performance measured Summary statistics and graphs are publicly available Only paying subscribers can access most recent graphs Raw data also available upon request
RIPE DB Internet number registration objects for the RIPE region IP addresses and AS numbers Reverse DNS objects Used to create zone files for the reverse DNS service Route registry objects Used to provide an Internet Routing Registry Conforms to RPSL and RFC 2650
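RPSL objects are plain text: one `attribute: value` pair per line, with continuation lines starting with whitespace or `+`. The sketch below parses a single object into a list of pairs. The function name `parse_rpsl_object` is hypothetical, the sample object uses documentation values rather than real registry data, and the parser ignores finer points of the specification such as end-of-line comments.

```python
def parse_rpsl_object(text: str) -> list:
    """Parse one RPSL object into a list of (attribute, value) pairs.

    A list is used rather than a dict because attributes may repeat.
    """
    attrs = []
    for line in text.splitlines():
        # Skip blank lines and server or user comments.
        if not line.strip() or line.startswith(("%", "#")):
            continue
        # Continuation lines extend the previous attribute's value.
        if line[0] in " \t+" and attrs:
            key, value = attrs[-1]
            attrs[-1] = (key, (value + " " + line[1:].strip()).strip())
            continue
        # Split on the first colon only, so IPv6 values stay intact.
        key, _, value = line.partition(":")
        attrs.append((key.strip().lower(), value.strip()))
    return attrs


if __name__ == "__main__":
    sample = (
        "route:   192.0.2.0/24\n"
        "descr:   Example network\n"
        "         second line of description\n"
        "origin:  AS64496\n"
    )
    for attribute, value in parse_rpsl_object(sample):
        print(attribute, "=", value)
```

Bulk dumps contain many such objects separated by blank lines, so a full reader would split the dump on blank lines first and then apply this function to each piece.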
RIPE DB Public queries supported via command-line and web Daily limit imposed on queries that include personal info Bulk data is available via FTP Personal details are not included Can subscribe to a near real-time mirror of the database Restrictions on personal data are very broad Can result in inappropriate limitations Better access policies and mechanisms should resolve this
Links RIS: http://www.ripe.net/ris; RIPE DB: http://www.ripe.net/db; K-root: http://k.root-servers.org; TTM: http://www.ripe.net/ttm; Hostcount: http://www.ripe.net/is/hostcount/stats; DNSMON: http://dnsmon.ripe.net/dns-servmon; AS112: http://www.ripe.net/as112; WITS: http://www.wand.net.nz/wits
Conclusion Repository is a 'beta' Server exists and some datasets are available for download Interested users can be given access Looking for feedback and ideas Development of policy, particularly for access Data collection Improving the RIPE datasets to be more useful to researchers Acquiring more external datasets Contributions of data, analysis tools
Contact http://data-repository.ripe.net data-repository-info@ripe.net salcock@cs.waikato.ac.nz