Of maps and costs: Aggregating large-scale broadband measurements for the Application Layer Traffic Optimization (ALTO) protocol • IIT RTC Conference, October 15-17, 2013 • David Goergen¹, Vijay K. Gurbani², Radu State¹
OUTLINE • Premise • ALTO: background • FCC dataset • Processing • Evaluation and discoveries IIT RTC conference
Premise • It is essential to study trends and derive network analytics • Two extremes exist: • Complete and highly detailed raw data • Users get lost in the details • Large amount of data • Highly aggregated and summarized reports • Human-readable format, e.g., charts, presentations, reports • Often cannot be investigated further • There is a need for an intermediate way • The ALTO protocol seems a good choice.
ALTO Introduction • ALTO solves the general rendezvous problem: given a choice of resources, which one is the best candidate? • Recurring pattern in many domains: • Peer-to-peer (BitTorrent): Which peers are close to me? Which peers have high upload bandwidth? • Content delivery networks (CDN): Rendezvous me with the nearest surrogate • Network routing and distance calculation: Shortest path computation • Data centers and cloud computing: Where is my nearest data center? Which server is lightly loaded? Which data center has the lowest network utilization?
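Stripped to its core, the rendezvous question reduces to picking the candidate with the lowest cost. The minimal sketch below assumes that a single numerical cost per candidate is already known. The function name `best_candidate`, the surrogate names, and the cost values are hypothetical.

```python
# Minimal sketch of the rendezvous question: given candidate resources and
# an (assumed, already known) cost of reaching each one, return the best one.
# Function name, candidate names and costs are hypothetical.
from typing import Dict


def best_candidate(costs: Dict[str, float]) -> str:
    """Return the lowest-cost candidate; on ties the first one seen wins."""
    if not costs:
        raise ValueError("no candidates to choose from")
    return min(costs, key=lambda name: costs[name])


if __name__ == "__main__":
    # Hypothetical costs from one client to three CDN surrogates.
    surrogate_costs = {"surrogate-a": 12.0, "surrogate-b": 4.5, "surrogate-c": 9.0}
    print(best_candidate(surrogate_costs))  # prints surrogate-b
```

The sketch covers only the selection step. Obtaining meaningful costs is the hard part, and that is the problem the ALTO maps address.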
ALTO Introduction • History • Circa 2008: Comcast and BitTorrent • P2P traffic dominates the Internet • Internet Service Providers wanted a well-behaved network • ISPs wanted to reduce transit costs • BitTorrent traffic exhibits greedy behaviour, optimizing local maxima at the expense of other, time-sensitive traffic • May 2008: IETF Workshop on P2P Infrastructure held at MIT to arrive at mitigating solutions • Outcome: 2 Working Groups • LEDBAT: Low Extra Delay Background Transport • ALTO: Application Layer Traffic Optimization
ALTO Introduction • ALTO is: • An Application Layer Traffic Optimization protocol • An IETF Working Group • An IETF (soon-to-be) standard RFC • A RESTful API that provides topology maps and cost maps to clients • A RESTful API that provides building blocks to construct: ranking service, endpoint cost service, endpoint property service, map filtering service • What is an endpoint? An IP address, a MAC address, an aggregation of IP addresses, ...
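A ranking or endpoint cost service can be assembled from the two maps. The client maps each endpoint IP to a PID, then orders candidates by PID-to-PID cost. The sketch below is a hedged illustration, not the ALTO wire protocol. It assumes that endpoints are IPv4 addresses and that PID membership is decided by longest-prefix match. The prefixes (documentation ranges), the PID names, the costs, and the helpers `pid_for` and `rank_endpoints` are all hypothetical.

```python
# Hedged sketch of a ranking / endpoint cost service built from ALTO-style
# maps. Assumption: an endpoint belongs to the PID whose prefix is the
# longest match for its IP. All prefixes, PIDs and costs are hypothetical.
import ipaddress
from typing import Dict, List

NETWORK_MAP: Dict[str, List[str]] = {
    "PID1": ["192.0.2.0/24"],
    "PID2": ["198.51.100.0/24"],
    "PID3": ["198.51.100.128/25"],  # more specific than PID2's prefix
}

COST_MAP: Dict[str, Dict[str, float]] = {
    "PID1": {"PID1": 1, "PID2": 10, "PID3": 20},
    "PID2": {"PID1": 10, "PID2": 1, "PID3": 5},
    "PID3": {"PID1": 20, "PID2": 5, "PID3": 1},
}


def pid_for(ip: str) -> str:
    """Return the PID whose prefix is the longest match for ip."""
    addr = ipaddress.ip_address(ip)
    best_pid, best_len = None, -1
    for pid, prefixes in NETWORK_MAP.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if addr in net and net.prefixlen > best_len:
                best_pid, best_len = pid, net.prefixlen
    if best_pid is None:
        raise KeyError(f"{ip} is not covered by the network map")
    return best_pid


def rank_endpoints(source: str, candidates: List[str]) -> List[str]:
    """Order candidate IPs from cheapest to most expensive, seen from source."""
    src_pid = pid_for(source)
    return sorted(candidates, key=lambda ip: COST_MAP[src_pid][pid_for(ip)])
```

Because the client only ever sees PIDs and their costs, the operator can reveal relative preferences without exposing its internal topology.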
ALTO Introduction • ALTO Architecture [Figure: architecture diagram with labels: ISP; provisioning policies; routing protocols; dynamic network information; ALTO server; ALTO client; ALTO service discovery; external interfaces (third parties, content providers, ...); standardized protocol vs. not subject to standardization]
ALTO Introduction • 2 main abstractions: Network Map and Cost Map • Network specified in terms of Partition/Provider IDs (PIDs): aggregations of endpoints identified by a provider-defined network location identifier • Costs are normalized and have two attributes: • Type: What does the cost represent? Air-miles, hop count, ... • Mode: How to interpret the cost: • Numerical (mathematical operations) • Ordinal (position-based preferences) • These abstractions help! IT, meet NOC. NOC, meet IT!
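The two cost modes differ in what a client is allowed to do with a value. Numerical costs support arithmetic, whereas ordinal costs carry only a preference order. The hedged sketch below turns one row of a numerical cost map into ordinal ranks, with 1 meaning most preferred. The choice that equal costs share a rank (dense ranking) is an assumption, and the function name `to_ordinal` and the PID values are hypothetical.

```python
# Hedged sketch: convert one row of a numerical cost map into ordinal ranks
# (1 = most preferred). Equal costs share a rank; this tie rule is an
# assumption. Function name and PIDs are hypothetical.
from typing import Dict


def to_ordinal(row: Dict[str, float]) -> Dict[str, int]:
    """Replace numerical costs with dense ranks."""
    distinct = sorted(set(row.values()))
    rank_of = {cost: i + 1 for i, cost in enumerate(distinct)}
    return {pid: rank_of[cost] for pid, cost in row.items()}


if __name__ == "__main__":
    print(to_ordinal({"PID1": 1, "PID2": 10, "PID3": 5}))
```

After the conversion, a client can still choose the preferred PID, but sums or ratios of the ranks no longer have any meaning.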
ALTO Introduction: Maps (Network and cost) • Network map [Figure: Datacenters 1, 2 and 3 with their internal network structure] • Problem: complexity and network structure are exposed. Graphics sources: http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg
ALTO Introduction: Maps (Network and cost) • Network map: hides complexity behind “partition IDs” [Figure: Datacenter 1 shown as PID 1, Datacenter 2 as PID 2, Datacenter 3 as PID 3] Graphics sources: http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg
ALTO Introduction: Maps (Network and cost) • Cost map: network cost of linking the partitions [Figure: PID 1, PID 2 and PID 3 (Datacenters 1-3) linked by edges annotated with the costs 20, 1, 10, 30, 22 and 5] Graphics sources: http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg
ALTO Introduction: Example ALTO maps [Figure: example cost map and network map]
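For readers who have not seen the ALTO documents, here is a hedged sketch of how a small network map and cost map might be serialized as JSON. The layout loosely follows the ALTO protocol specification (later published as RFC 7285). That layout uses a "network-map" object keyed by PID and a "cost-map" object keyed by source PID, then destination PID. The "meta" sections are simplified: version tags and dependent-vtags are omitted. The prefixes, PIDs, and costs are illustrative only.

```python
# Hedged sketch of JSON-serialized ALTO-style maps. The layout loosely
# follows the ALTO specification; "meta" is simplified (no version tags) and
# every prefix, PID and cost value is illustrative only.
import json

network_map = {
    "meta": {},
    "network-map": {
        "PID1": {"ipv4": ["192.0.2.0/24"]},
        "PID2": {"ipv4": ["198.51.100.0/24"]},
        "PID3": {"ipv4": ["203.0.113.0/24"]},
    },
}

cost_map = {
    "meta": {"cost-type": {"cost-mode": "numerical", "cost-metric": "routingcost"}},
    "cost-map": {
        # source PID -> destination PID -> cost
        "PID1": {"PID2": 10, "PID3": 20},
        "PID2": {"PID1": 10, "PID3": 5},
        "PID3": {"PID1": 20, "PID2": 5},
    },
}

if __name__ == "__main__":
    print(json.dumps(network_map, indent=2))
    print(json.dumps(cost_map, indent=2))
```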
FCC Dataset specification • One country • Time period: 01.01.2012 to 31.12.2012 • 7,782 anonymised volunteers spread across the country • Each unit hourly triggers measurements against a defined set of common web sites • e.g., Google, YouTube, CNN, … • 75-78 million records per month • 6-7 GB of data per month
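A quick back-of-the-envelope calculation helps put these figures in scale. It makes two idealizing assumptions: a 30-day month and every one of the 7,782 units reporting every hour. Real coverage will be lower, so the records-per-unit-hour figure is a lower bound. The 6-7 GB is read as decimal gigabytes, and pairing the size range with the record-count range is itself an assumption.

```python
# Back-of-the-envelope scale check from the dataset figures above.
# Assumptions: 30-day month, every unit reports every hour, decimal GB.
UNITS = 7_782
HOURS_PER_MONTH = 24 * 30          # assumption: a 30-day month
RECORDS_PER_MONTH = (75e6, 78e6)   # range from the dataset description
BYTES_PER_MONTH = (6e9, 7e9)       # 6-7 GB, read as decimal gigabytes

unit_hours = UNITS * HOURS_PER_MONTH
records_per_unit_hour = [r / unit_hours for r in RECORDS_PER_MONTH]
# Lowest estimate: smallest size over largest count, and vice versa.
bytes_per_record = (BYTES_PER_MONTH[0] / RECORDS_PER_MONTH[1],
                    BYTES_PER_MONTH[1] / RECORDS_PER_MONTH[0])

print(f"unit-hours per month: {unit_hours:,}")
print(f"records per unit-hour: {records_per_unit_hour[0]:.1f}-{records_per_unit_hour[1]:.1f}")
print(f"bytes per record: {bytes_per_record[0]:.0f}-{bytes_per_record[1]:.0f}")
```

Under these assumptions, the calculation gives roughly 13 to 14 records per unit and hour, at about 77 to 93 bytes per record. That is consistent with each hourly run producing on the order of a dozen records.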
FCC Dataset specification • Consists of several files, organized per month • Files are linked together through the unit_id field • For our first evaluation we use the curr_dns file • Extract the distinct unit_id values that remain consistent over a given period • Use these to create a topology map for the ALTO protocol
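The first extraction step can be pictured as two operations. First, collect the distinct unit_id values in each monthly curr_dns file. Then keep only the units present in every month. This is a minimal sketch that reads CSV text held in memory. Only the unit_id field comes from the dataset description. The "nameserver" column, the sample rows, and the helper names are hypothetical.

```python
# Minimal sketch: distinct unit_id values per monthly curr_dns file, then
# the units present in every month. Only "unit_id" comes from the dataset
# description; the "nameserver" column and sample rows are hypothetical.
import csv
import io
from typing import Dict, Set


def distinct_units(csv_text: str) -> Set[str]:
    """Return the distinct unit_id values in one month's CSV text."""
    return {row["unit_id"] for row in csv.DictReader(io.StringIO(csv_text))}


def units_in_every_month(months: Dict[str, str]) -> Set[str]:
    """months maps a month label to that month's CSV text."""
    sets = [distinct_units(text) for text in months.values()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    jan = "unit_id,nameserver\n101,192.0.2.53\n102,198.51.100.53\n101,192.0.2.53\n"
    feb = "unit_id,nameserver\n101,192.0.2.53\n103,203.0.113.53\n"
    print(units_in_every_month({"2012-01": jan, "2012-02": feb}))
```

At 6-7 GB per month this logic belongs in a distributed job rather than a single process, but the set semantics stay the same.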
FCC Dataset specification
Processing • Find a stable set of unit_id: • the unit's DNS resolver appears in every (monthly) file • the resolver's location is fixed • Location is resolved using a geo-IP database • Assume each unit_id is close to the location of its DNS resolver
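The stability rule can be sketched per unit. A unit is kept only if the same DNS resolver appears in every monthly file and that resolver always geolocates to the same place. The unit then inherits the resolver's location. This is a hedged sketch: the per-month dictionaries stand in for a geo-IP database and are not a real geo-IP API. Requiring the same resolver in every month is one reading of the rule. The IPs, coordinates, and the helper `stable_location` are hypothetical.

```python
# Hedged sketch of the stability rule. The per-month dicts stand in for a
# geo-IP database (not a real geo-IP API); IPs and coordinates are
# hypothetical. Requiring one resolver throughout is an assumption.
from typing import Dict, List, Optional, Tuple

Location = Tuple[float, float]  # (latitude, longitude)


def stable_location(
    resolver_by_month: List[Optional[str]],
    geoip_by_month: List[Dict[str, Location]],
) -> Optional[Location]:
    """Return the unit's inferred location, or None if the unit is not stable."""
    if not resolver_by_month or any(r is None for r in resolver_by_month):
        return None  # resolver missing from at least one monthly file
    if len(set(resolver_by_month)) != 1:
        return None  # the unit switched resolvers during the period
    resolver = resolver_by_month[0]
    locations = {geo.get(resolver) for geo in geoip_by_month}
    if len(locations) != 1 or None in locations:
        return None  # resolver location unknown or changed
    return locations.pop()  # unit assumed to sit near its resolver
```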
Hadoop cluster specs • Hadoop 2.0.0-cdh4.3.0 • 4 nodes • Hexa-core 2.4 GHz Xeon • 120 GB RAM • HDFS: 27.54 TB • 2 × 1 Gbit/s Ethernet (bonded)
Hadoop job process
Outcome • Output contains: • unit_id • DNS resolver IP • Occurrence • Geo. location • Post-processing: • Filter out all non-stable unit_id (occurrence < 12 months)
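The job's core logic can be pictured in MapReduce terms. The mapper emits a (unit_id, resolver) key with the month of each curr_dns record. The reducer counts distinct months per key, which gives the occurrence. Post-processing then drops keys seen in fewer than 12 months. The following is a hedged, single-process Python sketch of that logic, not actual Hadoop code. The record layout and the function names are hypothetical, and the geo location join is left out.

```python
# Hedged single-process sketch of the job in MapReduce style (not Hadoop
# code). Record layout and function names are hypothetical; the geo
# location column would be joined in afterwards.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (month, unit_id, resolver_ip)
Key = Tuple[str, str]          # (unit_id, resolver_ip)


def mapper(records: Iterable[Record]) -> Iterator[Tuple[Key, str]]:
    for month, unit_id, resolver in records:
        yield (unit_id, resolver), month


def shuffle(pairs: Iterable[Tuple[Key, str]]) -> Dict[Key, List[str]]:
    # Group all emitted months by key, as the framework's shuffle would.
    grouped: Dict[Key, List[str]] = defaultdict(list)
    for key, month in pairs:
        grouped[key].append(month)
    return grouped


def reducer(grouped: Dict[Key, List[str]]) -> Dict[Key, int]:
    # Occurrence = number of distinct months in which the pair was seen.
    return {key: len(set(months)) for key, months in grouped.items()}


def stable_pairs(occurrence: Dict[Key, int], months_required: int = 12) -> Dict[Key, int]:
    # Post-processing: drop pairs seen in fewer than months_required months.
    return {k: n for k, n in occurrence.items() if n >= months_required}
```

Counting distinct months rather than raw records matters here, because a unit produces many records per month.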
Interesting observations • Some unit_id are located outside the US • Assumption: the user has manually configured the DNS resolver • OpenDNS and Google DNS resolvers were ignored • Large convergence to a single point (Potwin, KS) • Potwin lies close to the geographical center of the US • ISPs generally locate their primary or secondary DNS name servers • We continue to investigate how to minimize this impact • Some unit_id change ISP and/or location
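The observations above suggest two simple checks. The first drops units whose resolver is a public OpenDNS or Google Public DNS address. The second counts how many of the remaining units geolocate to each point, so that a suspicious concentration stands out. The sketch lists only the well-known primary addresses of each public service, so the set is partial. The 0.5 share threshold and the helper names are hypothetical choices.

```python
# Hedged sketch: drop public resolvers, then flag locations that attract a
# disproportionate share of units. The resolver list is partial and the
# 0.5 threshold is a hypothetical choice.
from collections import Counter
from typing import Dict, List, Tuple

PUBLIC_RESOLVERS = {
    "8.8.8.8", "8.8.4.4",                # Google Public DNS
    "208.67.222.222", "208.67.220.220",  # OpenDNS
}

Location = Tuple[float, float]  # (latitude, longitude)


def drop_public(units: Dict[str, str]) -> Dict[str, str]:
    """units maps unit_id -> resolver IP; keep only non-public resolvers."""
    return {u: r for u, r in units.items() if r not in PUBLIC_RESOLVERS}


def dominant_locations(locations: Dict[str, Location], share: float = 0.5) -> List[Location]:
    """Return locations that hold at least `share` of all units."""
    counts = Counter(locations.values())
    total = sum(counts.values())
    return [loc for loc, n in counts.items() if total and n / total >= share]
```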
Stable unit_id
Next steps • Attempt to create a network map • Rough PID groupings accomplished by grouping unit IDs that belong to the same ISP • More formal PID groupings left for further study (e.g., group by bandwidth irrespective of ISP, lowest jitter, …) • Attempt to create a cost map • Different cost maps for different applications (e.g., use UDP latency or jitter as a cost metric for VoIP applications) • Cross-reference with other datasets (e.g., the US Census dataset)
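The planned maps might look roughly like this. Form one PID per ISP from the stable unit_ids, then fill a VoIP-oriented cost map with the median UDP latency between the units of two PIDs. This is a hedged sketch resting on one loudly stated assumption: latency samples between pairs of units are available. The slides do not say how such samples would be obtained. The ISP names, unit ids, latency values, the "PID-" naming, and the helper names are hypothetical.

```python
# Hedged sketch: one PID per ISP, and a cost map whose entries are the
# median UDP latency between units of two PIDs. ASSUMPTION: pairwise latency
# samples exist; names and values are hypothetical.
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def build_network_map(unit_isp: Dict[str, str]) -> Dict[str, List[str]]:
    """Group unit_ids into one PID per ISP (rough grouping)."""
    pids: Dict[str, List[str]] = defaultdict(list)
    for unit, isp in sorted(unit_isp.items()):
        pids[f"PID-{isp}"].append(unit)
    return dict(pids)


def build_cost_map(
    network_map: Dict[str, List[str]],
    latency_ms: Dict[Tuple[str, str], List[float]],
) -> Dict[str, Dict[str, float]]:
    """Cost between two PIDs = median of all latency samples between their units."""
    pid_of = {u: pid for pid, units in network_map.items() for u in units}
    samples: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for (src, dst), values in latency_ms.items():
        samples[(pid_of[src], pid_of[dst])].extend(values)
    cost_map: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (src_pid, dst_pid), values in samples.items():
        cost_map[src_pid][dst_pid] = median(values)
    return dict(cost_map)
```

The median is one reasonable choice because it resists outliers. A jitter-based cost map for the same PIDs would swap only the aggregation function.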
Next steps • Using stable unit IDs as landmarks in a virtual coordinate system.
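One way to use stable units as landmarks is the approach of landmark-based coordinate systems such as Global Network Positioning (GNP), named here as a swapped-in technique rather than the authors' stated plan. Each landmark has known coordinates. A host measures its latency to the landmarks and fits its own coordinates so that Euclidean distances match those latencies. The latency between two hosts is then predicted as the distance between their coordinates. The sketch below makes several arbitrary illustrative choices: 2-D coordinates, plain gradient descent, the learning rate, and the iteration count.

```python
# Hedged GNP-style sketch: fit a host's 2-D coordinates from measured
# latencies to landmarks by gradient descent on the squared error between
# Euclidean distance and latency. Dimensions and step sizes are arbitrary.
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_coordinates(landmarks: List[Point], measured: List[float],
                    lr: float = 0.05, iterations: int = 5000) -> Point:
    # Start from the centroid of the landmarks.
    x = sum(p[0] for p in landmarks) / len(landmarks)
    y = sum(p[1] for p in landmarks) / len(landmarks)
    for _ in range(iterations):
        gx = gy = 0.0
        for (lx, ly), d in zip(landmarks, measured):
            r = math.hypot(x - lx, y - ly)
            if r == 0.0:
                continue  # gradient undefined exactly at a landmark
            # d/dx of (r - d)^2 is 2 (r - d) (x - lx) / r
            gx += 2 * (r - d) * (x - lx) / r
            gy += 2 * (r - d) * (y - ly) / r
        x -= lr * gx
        y -= lr * gy
    return x, y


def predicted_latency(a: Point, b: Point) -> float:
    """Latency estimate between two hosts = distance between their coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])
```

The objective is not convex in general, so a real system would need more landmarks than dimensions and some care with starting points.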
Thank you for your attention. Questions?