Distributed Systems and Security: An Introduction

Distributed Systems and Security:An Introduction Brad Karp UCL Computer Science CS GZ03 / M030 29th September, 2008

Today’s Lecture • Logistics • Course Communication • Overview of Distributed Systems • What are they? • Why build them? • Why are they hard to build well? • Detailed Syllabus: Distributed Systems • Assessment regime • Questionnaire

Logistics • Meeting Times/Locations • Monday: 1 PM – 2 PM, MPEB 1.20 • Tuesday: 1 PM – 2 PM, MPEB 1.20 • Thursday: 3 PM – 4 PM, Bedford Way G03 • Schedule • 29th September – 30th October: Distributed Systems • 3rd November – 7th November: Reading Week (no lecture) • 10th November – 11th December: Security

Course Communication • Course web page • http://www.cs.ucl.ac.uk/staff/B.Karp/gz03/f2008/ • Detailed calendar: readings, lecture topics, coursework, announcements/corrections • Your responsibility: check page daily! • Course mailing lists • {gz03,m030}@<department’s domain> • Subscribe by sending mail to {gz03-request,m030-request}@<department’s domain>with one-word subject join • Must subscribe from UCL CS email address • Used for course announcements • Your responsibility: check email daily!

What Is a Distributed System? • Multiple computers (“machines,” “hosts,” “boxes,” &c.) • Each with CPU, memory, disk, network interface • Interconnected by LAN or WAN (e.g., Internet) • Application runsacross this dispersed collection of networked hardware • Butuser sees single, unified system

What Is a Distributed System?(Alternate Take) “A distributed system is a system in which I can’t do my work because some computer that I’ve never even heard of has failed.” • Leslie Lamport, Microsoft Research (ex DEC)

What are shortcomings of this design? Start Simple: Centralized System • Suppose you run Gmail • Workload: • Inbound email arrives; store on disk • Users retrieve, delete their email • You run Gmail on one server with disk Email Sender Email Reader Email Sender Gmail Server (PC) Email Reader Email Sender Email Reader

Why Distribute? For Availability • Suppose Gmail server goes down, or network between client and it goes down • No incoming mail delivered, no users can read their inboxes • Fix: replicate the data on several servers • Increased chance some server will be reachable • Consistency? One server down when delete message, then comes back up; message returns in inbox • Latency? Replicas should be far apart, so they fail independently • Partition resilience? e.g., airline seat database splits, one seat remains, bought twice, once in each half!

Why Distribute?For Scalable Capacity • What if Gmail a huge success? • Workload exceeds capacity of one server • Fix: spread users across several servers • Best case: linear scaling—if U users per box, N boxes support NU users • Bottlenecks? If each user’s inbox on one server, how to route inbound mail to right server? • Scaling? How close to linear? • Load balance? Some users get more mail than others!

Performance Can Be Subtle • Goal: predictable performance under high load • 2 employees run a Starbucks • Employee 1: takes orders from customers, calls them out to Employee 2 • Employee 2: • writes down drink orders (5 seconds per order) • makes drinks (10 seconds per order) • What is throughput under increasing load?

What would preferable curve be? What design achieves that goal? Starbucks Throughput • Peak system performance: 4 drinks / min • What happens when load > 4 orders / min? • What happens to efficiency as load increases?

Why Are Distributed SystemsHard to Design? • Failure: of hosts, of network • Remember Lamport’s lament • Heterogeneity • Hosts may have different data representations • Need consistency (many specific definitions) • Users expect familiar “centralized” behavior • Need concurrency for performance • Avoid waiting synchronously, leaving resources idle • Overlap requests concurrently whenever possible

Security • Before Internet: • Encryption and authentication using cryptography • Between parties known to each other (e.g., diplomatic wire) • Today: • Entire Internet of potential attackers • Legitimate correspondents often have no prior relationship • Online shopping: how do you know you gave credit card number to amazon.com? How does amazon.com know you are authorized credit card user? • Software download: backdoor in your new browser? • Software vulnerabilities: remote infection by worms! • Crypto not enough alone to solve these problems!

Detailed Syllabus • No textbook • Readings: research papers on real, built distributed systems that illustrate concepts • You must read papers by day assigned! • Lectures will assume you have. • Lectures: (some) background for papers; review system from paper; discuss system • See schedule on course web page

How Will You Be Evaluated? • 2 courseworks, 15% total • One will involve significant programming: 10% • Out: 7th Oct, Due: 28th Oct • Other will be written problem set: 5% • Out: 27th Nov, Due: 11th Dec • 85% final exam • 2.5 hours; rubric: 3 of 5 questions • M030 (4th-years): must get >= 40% to pass • Overall: • M030 (4th-years): must get >= 40% mean to pass • GZ03 (DCNDS, SSE): must get >= 50% mean to pass

Coursework Policies • Courseworks for GZ03/M030 are individual • Assessed • No collaboration permitted • Every line of code you submit must be written by you alone • Do not show your code to other students • For non-programming coursework, do not discuss solutions with other students • Cite all material you used to produce solution • Lateness policy • <= 2 working days late: 10% reduction in marks • > 2 working days late: 0 marks, fail coursework • Always communicate emergencies to instructor early—well before deadline, if at all possible Courseworks are challenging! Start them on day handed out!

Distributed Systems and Security: An Introduction

Distributed Systems and Security: An Introduction

Presentation Transcript

Chapter 15: Security

Distributed Operating Systems Spring 2004

Secure Operating Systems

Midterm Review CS 230 – Distributed Systems (ics.uci/~cs230)

Determining Global States of Distributed Systems

Distributed Systems

CS60002 Distributed Systems

CS 333 Introduction to Operating Systems Class 20 - Security

15-446 Distributed Systems Spring 2009

Subject Revision Distributed Systems: Principles and Paradigms

Massively Parallel/Distributed Data Storage Systems

Access Control, Operating System Security, and Security System Design

Introduction Background Distributed DBMS Architecture Distributed Database Design

Distributed Databases

End to End Security and Privacy in Distributed Systems and Cloud

Outline

DISTRIBUTED COORDINATION BASED MODELS

Security of Shared Data in Large Systems

Distributed Systems

Chapter 23

Unit 7 Organisational Systems Security

Security