1 / 40

CS290F, Winter 2005 Large-scale Networked Systems

CS290F, Winter 2005 Large-scale Networked Systems. Instructor Ben Y. Zhao ( ravenben at cs.ucsb.edu , 1151 Engineering I) Office hour: Thur, 3-4pm Lecture time: TuTh, 1:00-3:00 pm Place: 1401 Phelps Teaching Assistant Ted Huffmire ( huffmire at cs.ucsb.edu ) Office hours: TBA. Overview.

Download Presentation

CS290F, Winter 2005 Large-scale Networked Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS290F, Winter 2005Large-scale Networked Systems • Instructor • Ben Y. Zhao (ravenben at cs.ucsb.edu, 1151 Engineering I) • Office hour: Thur, 3-4pm • Lecture time: TuTh, 1:00-3:00 pm • Place: 1401 Phelps • Teaching Assistant • Ted Huffmire (huffmire at cs.ucsb.edu) • Office hours: TBA

  2. Overview • Administrivia • What this class is, and is not • Enrollment, etc… • What you have to do for this class • Grading • Break • Course projects • organization • some initial ideas

  3. Administrivia • Course webpage • http://www.cs.ucsb.edu/~ravenben/classes/290F • check it periodically for announcements • Class mailing list • cs290f at lists.cs.ucsb.edu • go to: http://lists.cs.ucsb.edu/mailman/listinfo/cs290f • sign up today! • Deadlines • Unless otherwise specified, it means 10 minutes before the lecture • Special circumstances should be brought to my attention before deadlines

  4. Class Registration • Please send me (ravenben@cs.ucsb.edu ) an e-mail with the subject “cs290f registration" and the following: • Last and first name • Student ID • Your department • Preferred email address • URL of your home page • Please indicate explicitly if I can add you to the on-line web page that lists each student enrolled in the class (only your name and URL will be made public).

  5. Class Enrollment • Class size capped at 40 • currently enrollment is full • for good discussion, we should be at ~ 25 • For those unable to enroll • email me with your student ID (perm #) • I expect enrollment to open up in first 1-2 weeks

  6. Goals of this Course • Understand • How do peer-to-peer systems work? • What are the design issues in building large networked systems? • Where is networking research heading? • Get familiar with current network research • Understand solutions in context • Goals • Assumptions

  7. Goals of this Course (cont’d) • Appreciate what is good research • Problem selection • Solution and research methodology • Presentation • Apply what you learned in a class project

  8. Reasons to Not Take This Class • It’s not... • a survey class on p2p systems • a reading seminar • I might do a p2p reading group / seminar later • will be focused more on reading papers • Unreasonably high expectations • this is my first grad class • it’s in my research area • And… • smaller class will benefit everyone • also, this class will likely be offered next year

  9. What do you need to do? • A research-oriented class project • Paper reading / reviews • Lead class discussion on one research paper • Participate in discussions in class • this is meant to be an *interactive* class

  10. Research Project • Investigate new ideas and solutions in class project • Define the problem • Execute the research • Work with your partners • Write up and present your research • Ideally, best projects will become conference or workshop papers • e.g., SIGCOMM, MOBICOM, SOSP/OSDI, NSDI • IPTPS, WCW, HotOS, WMCSA, NOSSDAV

  11. Research Project: Steps • I’ll distribute a list of projects • You can either choose one of these projects or come up with your own • Pick your project, partner, and submit a one-page proposal describing: • The problem you are solving • Your plan of attack with milestones and dates • Any special resources you may need • A midterm presentation of your progress (5 minutes) • Poster session • Submit project papers

  12. Paper Reviews • Goal: synthesize main ideas and concepts in papers • Number: 2-3 per class • Length: no more than ½ page per paper • Content • Main points intended by the author • Points you particularly liked/disliked • Other comments (writing, conclusions…) • Submission • Submit each review via email before class on lecture day • See class web page for details

  13. What is a Presentation? • Presentations are in Powerpoint • or PDF if you hate MSFT • Target a 30 minute presentation • that’s about 15 slides • idea is to engage the class in discussion on your points • leave the audience with thoughtful questions • You’ll be graded on several things: • did you present the high-level points of the paper? • did you get a good discussion going • did you relate the paper to others in the course (context!)

  14. Choose Your Paper • Pick your paper from the lecture schedule online • the readings and dates will be “finalized” by Thursday • Email me to “claim” the paper • I’ll let you know if it’s already been taken • presenters names will be updated online asap

  15. About Grading • This is a graduate networking class • key is what you learn, not the grade • Evaluation of projects • 2 components: execution and impact • execution: • proposal, design, implementation, evaluation, presentation • impact: • how novel is your work, how does it advance the state of the art? • good execution  B+/A-, impact  A/A+

  16. Class Topics • Study existing p2p systems: • Gnutella, Freenet, Tapestry, OceanStore, PAST, RON, I3, Vivaldi • Study fundamental system issues: • security, fault-tolerance, system measurement and deployment • Study issues in alternative network environments: • sensor networks • ad-hoc and mobile networks

  17. Break • Take a break, get up, walk around, stretch • come back in 10 minutes

  18. Course Projects • Small project teams • ideally teams of 3 • 2 (likely too small to tackle significant project) • 4 (likely too large to coordinate well) • Choose your team members • use the class enrollment webpage (I’ll put that up shortly) • use the class mailing list • Send me a one-page project proposal by Jan 15: • what is the problem you’re solving (w/ related works) • motivation/challenges (why is this important and new) • plan of attack (w/ milestones and dates) • any resources you might need

  19. Background on Structured Overlays • Each data item and machine (node) in the system has associated a unique ID in a large ID space • ID space is partitioned among nodes • DHT: Hash table like interface • put(id, data) • data = get(id) • data items are stored at the node responsible for its ID • DOLR: directory service interface • publish (id) • routeToId (id, message) • data stored anywhere and located via directory

  20. Structured Peer-to-Peer Overlays • Assign random nodeIDs and keys from secure hash • incrementally route towards destination ID • each node has small set of outgoing routes, e.g. prefix routing ID: ABCE ABC0 To: ABCD AB5F A930

  21. What’s in a Protocol? • Definition of name-proximity • each hop gets you “closer” to destination ID • prefix routing, numerical closeness, hamming distance • Size of routing table • amount of state kept by each node as f (N), N = network size • # of overlay routing hops • worst case routing performance (in overlay hops, not IP) • Network locality • does choice of neighbor consider network distance • impact on “actual” performance of P2P routing • Application Interface

  22. Chord • NodeIDs are numbers on ring • Closeness defined by numerical proximity • Finger table • keep routes for next node 2i away in namespace • routing table size: log2 n • n = total # of nodes • Routing • iterative hops from source • at most log2 n hops Node 0/1024 0 128 896 256 768 640 384 512

  23. Chord II • Pros • simplicity • Cons • limited flexibility in routing • neighbor choices unrelated to network proximity* but can be optimized over time • Application Interface: • distributed hash table (DHash)

  24. Tapestry / Pastry • incremental prefix routing • 11110XXX00XX 000X0000 • routing table • keep nodes matching at least i digits to destination • table size: b * logb n • routing • recursive routing from source • at most logb n hops Node 0/1024 0 128 896 256 768 640 384 512

  25. 2175 0157 0154 0123 0880 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 4 4 4 4 4 5 5 5 5 5 7 7 7 7 7 3 3 3 3 3 6 6 6 6 6 Neighbor Map For “2175” (Octal) 0xxx 20xx 210x 2170 1xxx ---- 211x 2171 ---- 22xx 212x 2172 3xxx 23xx 213x 2173 4xxx 24xx 214x 2174 5xxx 25xx 215x ---- 6xxx 26xx 216x 2176 7xxx 27xx ---- 2177 4 3 2 1 Routing Levels Routing in Detail Example: Octal digits, 212 namespace, 2175  0157 5712 0880 3210 4510 7510

  26. Project Ideas • Discuss 10 project suggestions • some related to peer-to-peer protocols and systems • Legend: based on how well-defined projects are not necessary how difficult they are • Well-defined projects (5) • Less-defined project (3) • You need to define project’s goals (2) • You can also work on variants of these, or make up your own topics • Need to send me a one page proposal by Jan 15 • I’ll provide feedback/meetings to iterate on topic

  27. Project 1: Implement Chimera • Implement a structured peer-to-peer protocol that is a hybrid of Tapestry, Pastry, and Bamboo • implement in C/C++ • focus on usability and performance • Key features • simplicity (leafsets like Pastry) • stability (p2p exchange of routing tables like Bamboo) • performance (locality like Tapestry) • support “standard” p2p API and easy to build up on • might support multiple teams

  28. Project 2: the DIY P2P protocol • Study the related literature on existing p2p protocols, and design something different • must have something significantly novel/new • can be new contribution in performance, security, theoretical interest, specialized for a useful application… • The challenge • finding an area yet unexplored in the p2p space • The result • well-specified protocols on routing, node insertion/deletion, failure handling • Warning: this is not easy

  29. Project 3-5: P2P Applications/Systems • All applications built on common API • should work w/ existing protocols and new ones developed in class • Project 3: a lightweight p2p CDN/web-cache • use a DHT/DOLR to quickly search for desired object • use request tracking to determine optimal data placement • implement a client-side http proxy • The challenge • maintaining data stability as nodes come and go • providing fairness / load balancing across nodes

  30. Project 4: P2P EBay • Design and implement a scalable and secure auctioning / e-commerce system on a p2p infrastructure • support large # of transactions/time • support large # of users and items • be stable and resilient against attacks • The challenge • understanding the types of attacks • selecting a reasonable subset to resist • maintaining scalability despite security enhancements

  31. Project 5: Lightweight Data Synchronization • Design and implement Quartz • lightweight p2p data sharing system • store your most critical files (<100MB) online • use simple application-specific handlers to provide fast data synchronization (a la CVS) • synchronize your HTML bookmarks across machines • synchronize your papers, homework files, financial records • end to end encryption • The challenge • keeping data highly-persistent on a dynamic p2p network • optimizing per-node operational overhead • a simple user interface that people will *actually use*

  32. Project 6: P2P Security • Security is one of the biggest challenges in p2p • large population spread across networks and domains • high probability of node compromise • need to function in presence of malicious nodes • Existing work on security discouraging • Sybil attacks hard to prevent • “bad” users can create lots of identities online and generate collusion attacks • hard to avoid malicious nodes • you’re 1, they are many…

  33. Project 6: a P2P reputation system • Design and evaluate a highly-adaptive p2p reputation system • quickly recognize malicious (or compromised) nodes • use p2p collaboration to share reputation information • form “trusted circles” • use “third-party” anonymous verification to build reputations • The challenge • building reliable reputations that are highly adaptable • doing so without generating massive network probe traffic • minimizing impact if system is circumvented

  34. Project 7: Distributed Workload Logging • Modelnet and Emulab provide emulation environments for up to 1000’s of virtual nodes • good for reproducible experiments • bad: in a dedicated cluster, unlike real world conditions • PlanetLab provides real-world platform for distributed applications (~200+ nodes across Inet) • good for deploying / hardening your application/system • bad for reproducible results • What we need • a way to gather interesting (and unexpected) environmental conditions from distributed networks like Planetlab, and use it as trace to drive Modelnet/Emulab

  35. Project 7: cont. • The goal: • design, implement, and deploy a distributed sensor and logging interface • needs to be scalable, lightweight (minimize impact on environment), and accurate • gather a large data set, and build an environmental trace

  36. Project 8: P2P Benchmarking • Design and test and benchmark for P2P routing • needs to account for a wide-range set of application-level behaviors and working conditions • throughput • latency • key lookup versus data read/store • rate of node churn • The challenge • being comprehensive • being fair to goals of different protocols

  37. Project 9: Quantifying P2P Performance • Test a set of protocols against a number of network topologies • mostly simulations (might need to implement some protocols) • test using different types of network models, GT-ITM, BRITE, Waxman, etc… • vary protocol parameters to measure impact on basic performance • The challenge • making sense of large quantities of test data • getting a better understanding of inherent properties of network models and how they impact overlay networks

  38. Project 10: P2P + Ad-hoc • Take p2p routing algorithms, and apply to the ad-hoc or sensor net space • consider new constraints: power, processing, no underlying IP • design new routing protocol • validate via simulation • The challenge • there’s fairly little previous work, and it’s not clear the idea is feasible

  39. Support Infrastructure • CSIL machines • you should all have accounts • A new 32 node networking cluster • just assembled, being installed now • Dell Xeon servers, 2.4Ghz, 2G RAM • cluster building in progress • will have modelNet installed, and support large-scale network emulation tests (~300 nodes)

  40. Summary of TODOs • What you need to do: • send me registration email with subject: cs290f registration • sign up online for class mailing listhttp://lists.cs.ucsb.edu/mailman/listinfo/cs290f • pick a paper to present and email me to claim it(*after* Thur, first come first serve) • find other team members and pick a project topic(first draft of proposal due Jan 15)

More Related