Project: Design, build, and operate a large-scale distributed application. • Issues to worry about: scalability, reliability, efficient use of resources, ease of operation, reuse • Large-scale deployment platform (PlanetLab) • Limited handholding • Groups of up to 3 students • TO DO: Start thinking about your group.
Gnutella network topology crawl • Collect topology information (e.g., each node's neighboring nodes) • Main idea: recursively crawl the entire network • Support provided: libraries, bootstrap node
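The crawl idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the course-provided library: `get_neighbors` is a hypothetical callback that asks one node for its neighbor list and raises `OSError` if the node is unreachable, and node identifiers are plain strings. The slide says "recursively"; the sketch uses an iterative breadth-first traversal with an explicit queue instead, which visits the same nodes without hitting Python's recursion limit on a large network.

```python
from collections import deque


def crawl(bootstrap, get_neighbors, max_nodes=None):
    """Breadth-first crawl starting from a bootstrap node.

    get_neighbors(node) is a hypothetical callback (an assumption of this
    sketch) that returns an iterable of neighbor ids and raises OSError
    when the node cannot be reached.
    """
    topology = {}                 # queried node -> list of its neighbors
    seen = {bootstrap}            # nodes already queued, to avoid revisits
    frontier = deque([bootstrap])
    while frontier:
        if max_nodes is not None and len(topology) >= max_nodes:
            break                 # optional cap on the number of queried nodes
        node = frontier.popleft()
        try:
            neighbors = list(get_neighbors(node))
        except OSError:
            neighbors = []        # unreachable peer: record it with no edges
        topology[node] = neighbors
        for peer in neighbors:
            if peer not in seen:
                seen.add(peer)
                frontier.append(peer)
    return topology


# Toy in-memory network standing in for real peers.
TOY = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
result = crawl("a", TOY.__getitem__)
```

Here the toy dictionary plays the role of the network and node "a" plays the role of the bootstrap node; a real crawler would replace `TOY.__getitem__` with a function that contacts the peer over the network.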
Project steps • P1. Warm-up: familiarize yourself with PlanetLab, set up the environment, develop a monitoring service • P2. Start crawling in a controlled environment: centralized / single-node crawler • P3. Large-scale crawler • Master-worker design • Deployed on PlanetLab (using at least 100 nodes) • Options: • Single node vs. distributed • Blocking vs. non-blocking I/O • Volume of data gathered
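One way to picture the master-worker design in P3 is the single-process sketch below. It is an illustration under assumptions, not the required architecture: threads stand in for worker nodes, in-memory queues stand in for network messages between master and workers, `get_neighbors` is a hypothetical callback that queries one peer (raising `OSError` if it is unreachable), and `None` is reserved as a stop signal, so it cannot be used as a node id. The master owns the set of seen nodes and hands out work; workers only query peers with blocking calls and report back.

```python
import queue
import threading


def worker(tasks, results, get_neighbors):
    """Pull node ids from the task queue, query them, report the result."""
    while True:
        node = tasks.get()
        if node is None:              # sentinel: the master asks us to stop
            break
        try:
            neighbors = list(get_neighbors(node))
        except OSError:
            neighbors = []            # unreachable peer: report no edges
        results.put((node, neighbors))


def master(bootstrap, get_neighbors, num_workers=4):
    """Coordinate the crawl: deduplicate nodes and distribute work."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, get_neighbors))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()

    topology, seen = {}, {bootstrap}
    tasks.put(bootstrap)
    outstanding = 1                   # tasks handed out but not yet reported
    while outstanding:
        node, neighbors = results.get()
        outstanding -= 1
        topology[node] = neighbors
        for peer in neighbors:
            if peer not in seen:
                seen.add(peer)
                tasks.put(peer)
                outstanding += 1

    for _ in threads:                 # one stop sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return topology


# Toy in-memory network standing in for real peers.
TOY_NET = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
crawled = master("a", TOY_NET.__getitem__, num_workers=3)
```

Keeping the seen set only at the master avoids duplicate queries but makes the master a potential bottleneck, which is one of the trade-offs behind the "single node vs. distributed" option. A non-blocking variant would replace the per-worker blocking call with an event loop that keeps many peer connections open at once.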
Alternatives: Your own project • Goal: design, build, and operate a large-scale distributed application • Some ideas: • Crawl and analyze data from other P2P or social networks, e.g., Twitter, Skype, YouTube • Hard: closed protocols (Skype) • Cool: no (or few) independent measurements • Explore Amazon service performance, e.g., S3 • Performance: latency, throughput, consistency • Multiple vantage points (migration?) • Hard: limited budget (!), black box • Cool: real, well-engineered service, huge scale • Others: • Location services • Network health monitoring