Network Programming: Intro to Distributed Systems • Fall 2013 • Some material taken from publicly available lecture slides, including Srini Seshan’s and David Anderson’s • L1-Intro • Dongsu Han
Today’s Lecture • Administrivia • What is a distributed system and what does it do? • Walking through an example • Overview of the topics covered in this course
Instructors • Instructor • 한동수 (Dongsu Han) • dongsu_han@kaist.ac.kr, N1 814 • Office hours: Wednesday 1-3pm • Teaching Assistant • 정은영 • notav@ndsl.kaist.edu, N1 820
Course Goals • Become familiar with the principles and practice of distributed systems • Understand the challenges and common techniques in distributed systems design • Learn how to write distributed applications that use the network • How does Dropbox work? • How does a Content Distribution Network work?
Course Format • ~30 lectures • References (no single textbook): • Distributed Systems: Principles and Paradigms • Distributed Systems: Concepts and Design (CDK) • Computer Networks: A Systems Approach • Exams: Midterm and Final • Programming assignments • 5 to 6 assignments • Loosely tied to lecture materials (start early)
About Programming Assignments • Low-level systems programming (in C) • Must be robust; error handling must be rock solid • Handle concurrency • Understand the system’s failure modes • Interfaces specified by documented protocols • 1 or 2 TA-led hands-on sessions on programming/debugging
Grading • 10% late penalty per day • Can’t be more than 3 days late • Exceptions: documented medical/personal emergency • Two “late points” to use over the entire course (up to one point for each assignment) • Regrade requests must be made in writing within a week of the original grading.
Grading • Weighting • 20% for Midterm exam • 25% for Final exam • 45% for Homework/programming assignments • 10% for class participation • You MUST demonstrate competence in both projects and tests to pass the course
Collaboration • Working together is important • Discuss course material • Work on problem debugging • Programming assignments must be your own work • Partial credit (points) • Will run plagiarism detection on source code • “Copy and paste” code will be severely penalized • Implication: You will fail this course if you copy someone else’s code.
Why do I need this course? • “Everything” is distributed these days. • Web, Google, Dropbox, KakaoTalk, YouTube, calendar, email, Facebook, CAIS, the cloud,… • “Everything” relies on distributed systems • Learn how they really work. • Learn how to design systems that scale.
Why do I need this course? • Enables new things • Search engines • Facebook (Social Networking Systems) • Dropbox • Makes existing things more efficient • Scale Facebook for the next billion users • Scale CAIS to work well even when everyone tries to access it • Make computer graphics rendering faster using clusters
Examples of Scale • Updates/Posts • Twitter: The record is 25,088 tweets per second (when Castle in the Sky was broadcast in Japan) • Searches • Google: 5,134,000,000 searches per day. • Network operations • 10 Gbps == 14,880,952 packets per second (@ 64 bytes) • Fast key-value store: 50~70 Mops/sec
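The packet-rate figure above can be reproduced with a short calculation. This is a sketch that assumes the slide refers to minimum-size Ethernet frames, where each 64-byte frame also costs 8 bytes of preamble and a 12-byte inter-frame gap on the wire; those two constants are standard Ethernet values, not something stated on the slide. The same script converts the daily search count into a per-second rate.

```python
# Ethernet framing overhead in bytes (standard values, assumed here;
# the slide only gives the 64-byte packet size and the 10 Gbps link rate).
PREAMBLE_AND_SFD = 8   # preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12    # mandatory idle gap between frames, in byte times
MIN_FRAME = 64         # packet size used on the slide

LINK_BPS = 10_000_000_000  # 10 Gbps


def max_packets_per_second(link_bps: int, frame_bytes: int) -> int:
    """Upper bound on frames per second at line rate for one frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_SFD + INTERFRAME_GAP) * 8
    return link_bps // bits_on_wire


pps = max_packets_per_second(LINK_BPS, MIN_FRAME)

# 5,134,000,000 searches per day, averaged over the 86,400 seconds in a day.
searches_per_second = 5_134_000_000 / (24 * 60 * 60)

print(pps)                         # 14880952
print(round(searches_per_second))  # 59421
```

Each frame occupies 84 bytes (672 bits) of link time, so 10,000,000,000 / 672 gives the slide's 14,880,952 packets per second.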
Examples of Scale • Akamai runs 105,000 servers in more than 1,900 networks • Number of networks on the Internet: 45,000 (as of 2013/8/26) • Microsoft: 1 million servers • Google envisions 10 million servers.
What do they enable? means $
In-class Activity • Form a group of 3 • Questions to answer: • Name a few distributed systems you know of. • Pick one and draw its components as best as you can. • Think about how many servers there are or how many requests it handles per second. (Make some assumptions) • Describe each component in ~2 sentences. • No right or wrong answer. • 10 mins. • 1 or 2 teams will present.
Today’s Lecture • Administrivia • What is a distributed system and what does it do? • Walking through an example • Overview of the topics covered in this course
A Real Distributed System • Google search
Remember IP... • hosts.txt • www.google.com 66.233.169.103 • www.cmu.edu 128.2.185.33 • www.cs.cmu.edu 128.2.56.91 • www.areyouawake.com 66.93.60.192 • ... • Packet: From: 128.2.185.33 To: 66.233.169.103 <packet contents>
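A flat hosts.txt table like the one above amounts to a single dictionary lookup. The sketch below parses the slide's entries (name first, then address, as the slide lists them) and fills in the From/To fields of a packet. The `parse_hosts` and `make_packet` helpers and the dictionary-shaped packet are illustrative assumptions, not a real packet format.

```python
# The four entries shown on the slide, in the slide's "name address" order.
HOSTS_TXT = """\
www.google.com 66.233.169.103
www.cmu.edu 128.2.185.33
www.cs.cmu.edu 128.2.56.91
www.areyouawake.com 66.93.60.192
"""


def parse_hosts(text: str) -> dict[str, str]:
    """Turn 'name address' lines into a name -> address table."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, address = line.split()
        table[name] = address
    return table


hosts = parse_hosts(HOSTS_TXT)


def make_packet(src_name: str, dst_name: str, payload: bytes) -> dict:
    """Hypothetical packet: names are resolved to addresses before sending."""
    return {"from": hosts[src_name], "to": hosts[dst_name], "payload": payload}


packet = make_packet("www.cmu.edu", "www.google.com", b"<packet contents>")
print(packet["from"], "->", packet["to"])
```

A single file like this has to be copied to every host and edited centrally, which is the scaling problem DNS addresses.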
Domain Name System • Lookup walk-through (figure): • Client to local DNS server: who is www.google.com? • Local DNS server to root (.) DNS server: who is www.google.com? Reply: ask the .com guy... (here’s his IP) • Local DNS server to .com DNS server: who is www.google.com? Reply: ask the google.com guy... (IP) • Local DNS server to google.com DNS server: who is www.google.com? Reply: www.google.com is 66.233.169.103 • Local DNS server to client: 66.233.169.103 • Decentralized - admins update own domains without coordinating with other domains • Scalable - used for hundreds of millions of domains • Robust - handles load and failures well
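The chain of referrals on this slide can be sketched as a loop run by the local DNS server. This is an in-memory simulation under stated assumptions: the three server addresses are made-up documentation addresses, the `SERVERS` table and the `query`/`resolve` helpers are hypothetical, and no real DNS messages are sent. Only the final answer, 66.233.169.103, comes from the slide.

```python
# Hypothetical server addresses (documentation range), not real DNS servers.
ROOT = "198.51.100.1"
COM = "198.51.100.2"
GOOGLE = "198.51.100.3"

# Each server only knows its own zone: either a referral to the next
# server down the hierarchy, or a final answer.
SERVERS = {
    ROOT: {"com": ("referral", COM)},
    COM: {"google.com": ("referral", GOOGLE)},
    GOOGLE: {"www.google.com": ("answer", "66.233.169.103")},
}


def query(server_ip: str, name: str) -> tuple[str, str]:
    """Ask one server; return its record for the longest matching suffix."""
    zone = SERVERS[server_ip]
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in zone:
            return zone[suffix]
    raise LookupError(f"{server_ip} knows nothing about {name}")


def resolve(name: str, root: str = ROOT, max_hops: int = 10):
    """Local-resolver loop: start at the root and follow referrals."""
    server = root
    path = []
    for _ in range(max_hops):
        path.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, path
        server = value  # "ask that guy": follow the referral
    raise LookupError("too many referrals")


address, path = resolve("www.google.com")
print(address, "via", path)
```

Because each zone table is owned separately, an administrator can change the google.com entries without touching the root or .com tables, which is the decentralization point made on the slide.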
But there’s more... • Client 128.2.53.5 asks the google.com DNS server: who is www.google.com? • Which Google datacenter is 128.2.53.5 closest to? Is it too busy? • The DNS server answers 66.233.169.99, and the client sends its search there (Search!)
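One way to picture the two questions on this slide (closest? too busy?) is a selection function inside the DNS server. Everything in the sketch below except the address 66.233.169.99 is invented for illustration: the datacenter names, round-trip estimates, load figures, the 0.9 load threshold, and the policy itself are assumptions, not a description of how Google actually chooses.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    name: str
    address: str
    rtt_ms: float  # hypothetical estimated round-trip time to the client
    load: float    # hypothetical fraction of capacity in use, 0.0 to 1.0


def pick_datacenter(candidates: list[Datacenter], max_load: float = 0.9) -> Datacenter:
    """Prefer the closest datacenter that is not too busy."""
    usable = [dc for dc in candidates if dc.load < max_load]
    pool = usable or candidates  # if every site is busy, consider all of them
    return min(pool, key=lambda dc: dc.rtt_ms)


# Made-up candidates as seen from client 128.2.53.5.
CANDIDATES = [
    Datacenter("closest-but-busy", "203.0.113.10", rtt_ms=8.0, load=0.97),
    Datacenter("close", "66.233.169.99", rtt_ms=12.0, load=0.40),
    Datacenter("far", "203.0.113.30", rtt_ms=140.0, load=0.10),
]

chosen = pick_datacenter(CANDIDATES)
print(chosen.name, chosen.address)
```

The DNS reply then carries the chosen address, so the same name can map to different servers for different clients.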
Query • (Figure: a query arriving at one of many front-end servers)
How do you index the web? • Get a copy of the web. • Build an index. • Profit (insert advertisements) • There are over 60 trillion individual web pages • Hundreds of millions of websites
How do you index the web? • Crawling -- download those web pages • Indexing -- harness 10s of thousands of machines to do it • Profiting -- we leave that to you. • “Data-Intensive Computing”
Storing: Google File System • Data (doc1, 2, 3, ..., n) is split into chunks (i1, i2, i3, i4, ...) • Replicate: each chunk is stored on several machines (three copies of each chunk in the figure); this handles load • GFS distributed filesystem: Replicated, Consistent, Fast
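Splitting data into chunks and replicating each chunk can be sketched in a few lines. This is a toy model under loud assumptions: the 8-byte chunk size, the server names, and the round-robin placement rule are all invented for the demonstration, and real GFS placement takes disk usage and rack layout into account. The replica count of three matches the figure.

```python
CHUNK_SIZE = 8  # bytes; tiny on purpose so the example is easy to inspect
REPLICAS = 3    # three copies per chunk, as drawn in the figure


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, servers: list[str],
                   replicas: int = REPLICAS) -> dict[int, list[str]]:
    """Hypothetical round-robin placement: chunk k goes to servers k, k+1, k+2."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        chunk_id: [servers[(chunk_id + r) % len(servers)] for r in range(replicas)]
        for chunk_id in range(num_chunks)
    }


def pick_server(chunk_id: int, placement: dict[int, list[str]],
                failed: set[str]) -> str:
    """Read from the first replica whose server is still alive."""
    for server in placement[chunk_id]:
        if server not in failed:
            return server
    raise RuntimeError(f"all replicas of chunk {chunk_id} are unavailable")


data = b"doc1 doc2 doc3 doc4 doc5"
servers = ["s0", "s1", "s2", "s3", "s4"]

chunks = split_into_chunks(data)
placement = place_replicas(len(chunks), servers)

print(placement)
print(pick_server(0, placement, failed={"s0"}))  # falls back to another replica
```

With several copies of every chunk, reads can be spread across servers and a single failed machine does not make any chunk unreachable.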
Indexing • doc1: hello hadoop goodbye hadoop hello you hello me • doc2: hello world goodbye world • Inverted index: • goodbye: doc1=>1, doc2=>1 • hadoop: doc1=>2 • hello: doc1=>3, doc2=>1 • me: doc1=>1 • world: doc2=>2 • you: doc1=>1
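The inverted index on this slide can be built on a single machine in a few lines, which helps when checking the counts by hand. This is a minimal sketch using the slide's two documents; the `build_inverted_index` helper is an illustrative name, and whitespace splitting stands in for real tokenization.

```python
from collections import Counter, defaultdict

# The two documents from the slide.
DOCS = {
    "doc1": "hello hadoop goodbye hadoop hello you hello me",
    "doc2": "hello world goodbye world",
}


def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each word to {document id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(text.split()).items():
            index[word][doc_id] = count
    return dict(index)


index = build_inverted_index(DOCS)
for word in sorted(index):
    print(word, index[word])
```

A search for "hello" then becomes one dictionary lookup that returns both documents, instead of a scan over every document.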
MapReduce / Hadoop • (Figure: Storage holding Doc 1~n as data chunks → Data Transformation on many computers → Sort by key, e.g. hello, you → Data Aggregation → Storage)
MapReduce / Hadoop • Why? Hiding details of programming 10,000 machines! • Programmer writes two simple functions: • map (data item) -> list(tmp values) • reduce (list(tmp values)) -> list(out values) • MapReduce system balances load, handles failures, starts job, collects results, etc. • (Figure: Storage → Data Transformation → Sort → Data Aggregation → Storage)
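The two-function programming model above can be imitated in a single process to see the data flow. The sketch below is not Hadoop's API: `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names, the "cluster" is one Python loop, and the load balancing and failure handling that the real system provides are left out. It counts words across the two example documents from the indexing slide.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(doc_id: str, text: str) -> list[tuple[str, int]]:
    """map(data item) -> list of temporary (key, value) pairs."""
    return [(word, 1) for word in text.split()]


def reduce_fn(word: str, counts: list[int]) -> list[tuple[str, int]]:
    """reduce(list of temporary values for one key) -> list of output values."""
    return [(word, sum(counts))]


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy single-process driver: map every item, sort by key, then reduce."""
    # Data transformation: apply map_fn to every input item.
    intermediate = []
    for key, value in inputs:
        intermediate.extend(map_fn(key, value))

    # Sort by key so that all values for one key sit next to each other.
    intermediate.sort(key=itemgetter(0))

    # Data aggregation: call reduce_fn once per distinct key.
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reduce_fn(key, [value for _, value in group]))
    return output


CHUNKS = [
    ("doc1", "hello hadoop goodbye hadoop hello you hello me"),
    ("doc2", "hello world goodbye world"),
]

result = dict(run_mapreduce(CHUNKS, map_fn, reduce_fn))
print(result)
```

The programmer only writes `map_fn` and `reduce_fn`; in a real deployment the driver's three steps run across many machines, and that distribution is the part the framework hides.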
All that... • Hundreds of DNS servers • Protocols on protocols on protocols • Distributed network of Internet routers to get packets around the globe • Hundreds of thousands of servers • ... to find “ryuhyunjin” in under 0.2 seconds
Today’s Lecture • Administrivia • What is a distributed system and what does it do? • Walking through an example • Overview of the topics covered in this course
What is a Distributed System? • A distributed system is: “A collection of independent computers that appears to its users as a single coherent system” • “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” – Leslie Lamport
Distributed Systems • The middleware layer extends over multiple machines, and offers each application the same interface.
What does it do? • Hides complexity from programmers/users • Hides the fact that its processes and resources are physically distributed across multiple machines • Transparency in a distributed system
How? • The middleware layer extends over multiple machines, and offers each application the same interface.
What we will learn • Learn principles in distributed systems design. • Distributed systems differ from traditional software because components are dispersed. • Many assumptions break for this reason. • We will study important aspects that we must consider in dealing with distributed systems.
Challenges (1/2) • Heterogeneity • Networks, computer hardware, OS, programming languages, different implementations • Openness • Different implementations or extensions can be added (e.g., Firefox, Internet Explorer). Key interfaces are published. • Security • Confidentiality, integrity, and availability • Example: an e-doctor service (you don’t want someone else to see your record)
Challenges (2/2) • Scalability • Does the system remain effective with a significant increase in # of users? • Failure handling • Detection, masking, tolerating failure, recovery • Concurrency • Need synchronization when accessing shared resources. • Transparency • Quality of service • Video quality may suffer when the network is overloaded.
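The concurrency bullet above (synchronization when accessing shared resources) can be shown with threads inside one process. This is a small sketch, not a distributed protocol: the `SharedCounter` class and the thread and iteration counts are made up for the example, and a lock only coordinates threads on a single machine.

```python
import threading


class SharedCounter:
    """A counter that several threads can increment safely."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Read-modify-write must happen as one step, so hold the lock.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def worker(counter: SharedCounter, times: int) -> None:
    for _ in range(times):
        counter.increment()


NUM_THREADS = 8
INCREMENTS_PER_THREAD = 10_000

counter = SharedCounter()
threads = [
    threading.Thread(target=worker, args=(counter, INCREMENTS_PER_THREAD))
    for _ in range(NUM_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 80000
```

Without the lock, two threads could read the same old value and both write back the same result, losing an update; across machines the same problem needs coordination over the network rather than a shared lock.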
Next Lecture • How does the Internet work? – Intro to networking • Programming assignment • Socket programming • If you got an A+ in the Computer Networks course, please see the instructor after class (might let you skip this).