What Data Do We Need and Why Do We Need It?
Jim Pepin, Chief Technology Officer, University of Southern California
Network Data: Research Depends on It
• Solutions depend on understanding the problem…
• Advances in many areas depend on analysis of real data
  • Network Management: traffic engineering, network design
  • Network Control: improving routing protocols
  • High Performance: better transport protocols
  • Security: tracking/stopping DoS and worm attacks
• Over 30% of papers in a top networking conference (SIGCOMM '04) depended on data collected by others
• Most common providers:
  • ISPs (e.g., ATT, Sprint, I2)
  • Service providers (e.g., Akamai)
  • Individual campuses (e.g., UNC, UOregon, USC – some campuses give data only to local researchers)
Network Data: More than Just Packet Traces
• Some data are more sensitive than others
  • Dynamic routing information: routing protocol advertisements
  • Static design information: router configuration files, peering arrangements, policies
  • Operational events: alarms, trouble tickets (very few sources of this important info!)
  • Traffic logs: netflow records, packet header traces
  • Application data: URLs, p2p filenames, DNS queries
• Tension: how much correlation to permit?
  • Data that can be correlated across multiple sites are the most valuable for measuring network-wide events, e.g., worms
  • Privacy techniques anonymize and blur identity (a sketch of one such technique follows below)
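The slide mentions anonymization only in passing, so here is a minimal sketch of the idea behind prefix-preserving address scrambling, in the spirit of tools such as Crypto-PAn. This is not LANDER's or PREDICT's actual code; the key and addresses are illustrative. Each output bit is derived from a keyed function of the real prefix, so addresses sharing a k-bit prefix still share an anonymized k-bit prefix, preserving topology for research while hiding identity.

```python
import hmac, hashlib, ipaddress

SECRET_KEY = b"provider-held secret"  # hypothetical; kept by the data provider

def scramble_ip(addr: str) -> str:
    """Prefix-preserving-style scramble: each anonymized bit depends only
    on the preceding real bits, so two addresses sharing a k-bit prefix
    still share a k-bit prefix after scrambling."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        prefix = bits[:i]
        digest = hmac.new(SECRET_KEY, prefix.encode(), hashlib.sha256).digest()
        flip = digest[0] & 1              # pseudorandom bit derived from the prefix
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))

if __name__ == "__main__":
    # Two addresses in the same /24 map into the same anonymized /24.
    print(scramble_ip("128.125.10.17"))
    print(scramble_ip("128.125.10.200"))
```

Because the mapping is keyed and one-way, researchers can still study subnet-level structure without learning real host identities, which is exactly the trade-off between utility and privacy the slide describes.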
Example of a Data Provider
• DHS PREDICT
  • DHS support for network research
  • Not for operational use by DHS
  • Major players
  • Peer-review ground rules
  • Generic sources for legitimate research
• LANDER Project
  • Example of a PREDICT supplier
  • Joint project of the USC-ISI networking division and the USC/ISD Center for High Performance Computing and Communications
  • USC-HPCC manages the WAN for USC/CIT/JPL
  • ISI provides networking research background
  • HPCC provides data storage and computational resources
  • We work together on ground rules and MOUs
  • LANDER funds collection systems, support staff, and disk/tape space
What Is Hard and Easy
• LANDER ground rules
  • Scrambled headers are the primary product today
  • Requires an MOU with the researcher
  • No collection of data payloads (see the sketch after this list)
  • Working on a very strict MOU for very limited use of non-scrambled header data, for very select uses, in a very controlled environment
  • Build a collection management system integrated with other PREDICT sites
• How we do this
  • Very close cooperation between ISI, ISD, and university legal
  • MOUs will be very clear and understandable for the researcher
  • USC can reject any application
  • USC will review any publication based on unscrambled headers, and all work processing those headers will be done inside HPCC
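As a concrete illustration of these ground rules (not LANDER's actual tooling), here is a minimal Python sketch, assuming the third-party scapy library is installed: it reads a raw trace, strips every payload byte, scrambles source and destination addresses with a provider-held key, and writes out headers-only data of the kind that could be released under an MOU. The key, file names, and scramble_ip helper are all hypothetical.

```python
import hmac, hashlib, ipaddress
from scapy.all import rdpcap, wrpcap, IP, TCP, UDP

SECRET_KEY = b"provider-held secret"  # hypothetical; never shipped with the data

def scramble_ip(addr: str) -> str:
    # Simple keyed one-way mapping (not prefix-preserving; see the
    # earlier sketch for a prefix-preserving variant).
    digest = hmac.new(SECRET_KEY, addr.encode(), hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(int.from_bytes(digest[:4], "big")))

def sanitize(in_pcap: str, out_pcap: str) -> None:
    packets = rdpcap(in_pcap)
    for pkt in packets:
        if not pkt.haslayer(IP):
            continue
        # Ground rule 1: no collection of data payloads -- headers only.
        if pkt.haslayer(TCP):
            pkt[TCP].remove_payload()
            del pkt[TCP].chksum                  # stale; recomputed on write
        elif pkt.haslayer(UDP):
            pkt[UDP].remove_payload()
            del pkt[UDP].chksum, pkt[UDP].len    # stale; recomputed on write
        # Ground rule 2: scrambled headers are the product.
        pkt[IP].src = scramble_ip(pkt[IP].src)
        pkt[IP].dst = scramble_ip(pkt[IP].dst)
        del pkt[IP].len, pkt[IP].chksum          # lengths changed; recompute
    wrpcap(out_pcap, packets)

if __name__ == "__main__":
    sanitize("raw_capture.pcap", "scrambled_headers.pcap")  # illustrative names
```

In a real deployment the unscrambled input would never leave the controlled environment (HPCC, in LANDER's case); only the sanitized output file would be shared with researchers under the MOU.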
Why Would We Do This?
• The Internet needs to be studied and engineered
  • What is the modern equivalent of Bell Labs for the phone system?
  • How did we get to where we are today? Cooperation between researchers and operators.
• We can't allow ourselves a complete bunker mentality
  • We need to be selective in what we provide, but where there is demonstrated need, provide what is needed, consistent with policy
  • If we don't do this, no one will
• The risks can be managed if we take the time and effort to work with campus management (legal, CIOs, etc.) to mitigate them
  • Researchers can be brought into these discussions if they are framed correctly
• If we don't study how the network works, our ability to manage it will degrade to zero over time