Invariant Boundaries • Dr. Eric A. Brewer • Professor, UC Berkeley • Co-Founder & Chief Scientist, Inktomi
Our Perspective • Inktomi builds two distributed systems: • Global Search Engines • Distributed Web Caches • Based on scalable cluster & parallel computing technology • But very little use of classic DS research...
“Distributed Systems” don’t work... • There exist working DS: • Simple protocols: DNS, WWW • Inktomi search, Content Delivery Networks • Napster, Verisign, AOL • But these are not classic DS: • Not distributed objects • No RPC • No modularity • Complex ones are single owner (except phones)
Concept: Invariant Boundaries • Claim: we don’t understand boundaries • Solution: make the “Invariant Boundary” explicit • Goal: a simpler, easier, faster path to correctness • [Diagram labels: Invariant may not hold; Invariant holds]
Three Basic Issues • Where is the state? • Consistency vs. Availability • Communication Boundaries
Santa Clara Cluster • Very uniform • No monitors • No people • No cables • Working power • Working A/C • Working BW
Boundaries for Kinds of State • Stateless • Front ends • Immutable state • Soft State (rebuild on restart) • Durable Single-writer (e.g. user data) • Emissaries, horizontal partitioning • Durable Multi-writer (MW) • Fiefdoms, DBMS
Persistent State is HARD • Classic DS focus on the computation, not the data • this is WRONG, computation is the easy part • Data centers exist for a reason • can’t have consistency or availability without them • Other locations are for caching only: • proxies, basestations, set-top boxes, desktops • phones, PDAs, … • Distributed systems can’t ignore location • Invariant Boundary is small
Berkeley Ninja Architecture • Base: Scalable, highly-available platform for persistent-state services • Active Proxy (AP): Bootstraps thin devices into infrastructure, runs mobile code • [Figure: Workstations & PCs; Internet; PDAs (e.g. IBM Workpad); Cellphones, Pagers, etc. State labels: Consistent; Shared; Single user; Soft-state, immutable state]
Three Basic Issues • Where is the state? • Consistency vs. Availability • Communication Boundaries
Data is only consistent *inside* • Data is not consistent (“reference” data) • [Diagram labels: Consistent A; Consistent B]
The CAP Theorem • Consistency • Availability • Tolerance to network Partitions • Theorem: You can have at most two of these invariants for any shared-data system • Corollary: consistency boundary must choose A or P
Forfeit Partitions (of Consistency, Availability, Tolerance to network Partitions) • Examples • Single-site databases • Cluster databases • LDAP • Fiefdoms • Traits • 2-phase commit • cache validation protocols • The “inside”
Forfeit Availability (of Consistency, Availability, Tolerance to network Partitions) • Examples • Distributed databases • Distributed locking • Majority protocols • Traits • Pessimistic locking • Make minority partitions unavailable
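The trait "make minority partitions unavailable" can be sketched as a simple quorum test. This is a minimal illustration, not a full majority protocol; the function name `can_serve` and the replica counts are assumptions made for the example.

```python
def can_serve(reachable_replicas: int, total_replicas: int) -> bool:
    """Return True only if this side of a partition holds a strict majority.

    Hypothetical helper: a real majority protocol also needs membership,
    leader election, and log agreement, none of which is shown here.
    """
    return reachable_replicas > total_replicas // 2


# A 5-replica system split 3/2 by a network partition.
majority_side = can_serve(3, 5)  # keeps serving requests
minority_side = can_serve(2, 5)  # refuses requests to preserve consistency
```

At most one side can hold a strict majority, so the system stays consistent, and the minority side pays for it by being unavailable.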
Forfeit Consistency (of Consistency, Availability, Tolerance to network Partitions) • Examples • Coda • Web caching • DNS • Emissaries • Traits • expirations/leases • conflict resolution • Optimistic • The “outside”
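The "expirations" trait can be sketched as a cache whose entries carry a time-to-live, in the spirit of DNS or web caching. The class `ExpiringCache` and its injectable `clock` parameter are assumptions made for illustration; they are not taken from any system named on the slide.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Hypothetical cache that serves possibly stale data until a TTL runs out."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, key: str, value: str) -> None:
        # Record the value together with the time at which it expires.
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller must refetch from the origin.
            del self._entries[key]
            return None
        return value
```

Between `put` and expiry the cache may return data the origin has already changed. That bounded staleness is the consistency being forfeited.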
ACID vs. BASE (but it’s a spectrum) • ACID • Strong consistency • Isolation • Focus on “commit” • Nested transactions • Availability? • Conservative (pessimistic) • Difficult evolution (e.g. schema) • “small” Invariant Boundary • The “inside” • BASE • Weak consistency (stale data OK) • Availability first • Best effort • Approximate answers OK • Aggressive (optimistic) • “Simpler” and faster • Easier evolution (XML) • “wide” Invariant Boundary • Outside consistency boundary
Consistency Boundary Summary • Can have consistency & availability within a cluster. No partitions within boundary! • OS/Networking better at A than C • Databases better at C than A • Wide-area databases can’t have both • Disconnected clients can’t have both
Three Basic Issues • Where is the state? • Consistency vs. Availability • Communication Boundaries
The Boundary • The interface between two modules • client/server, peers, libraries, etc… • Basic boundary = the procedure call • thread traverses the boundary • two sides are in the same address space • What invariants don’t hold across? • [Diagram labels: C; S]
Different Address Spaces • What if the two sides are NOT in the same address space? • IPC or LRPC • Can’t do pass-by-reference (pointers) • Most IPC screws this up: pass by value-result • There are TWO copies of args not one • What if they share some memory? • Can pass pointers, but… • Need synchronization between client/server • Not all pointers can be passed
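The "TWO copies of args" point can be sketched by comparing a local call with a simulated cross-address-space call. Here `call_across_boundary` is a hypothetical stand-in for IPC that copies arguments and results through `pickle`. Real IPC mechanisms differ in detail.

```python
import pickle
from typing import Callable, List


def append_item(items: List[str]) -> List[str]:
    """Callee that mutates its argument in place."""
    items.append("new")
    return items


# Same address space: the callee mutates the caller's own object.
local_args = ["a"]
append_item(local_args)


def call_across_boundary(func: Callable, arg):
    """Hypothetical IPC stand-in: args and result are copied by serialization."""
    callee_copy = pickle.loads(pickle.dumps(arg))  # second copy of the args
    result = func(callee_copy)
    return pickle.loads(pickle.dumps(result))      # result is copied back


# Different address spaces: the caller's list is untouched.
remote_args = ["a"]
returned = call_across_boundary(append_item, remote_args)
```

After the local call `local_args` contains the new item. After the simulated remote call `remote_args` is unchanged, and the mutation is visible only in `returned`.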
Partial Failure • Can the two sides fail independently? • RPC, IPC, LRPC • Can’t be transparent (like RPC) !! • New exceptions (other side gone) • Idempotent calls? • Use Transaction Ids (to solve replay problem) • Reclaim local resources • e.g. kernels leak sockets over time => reboot • RPC tries to hide these issues (but fails) • Use Level 4/7 switches to hide failures?
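Using transaction IDs to solve the replay problem can be sketched as a server that remembers completed requests. The `Account` class and its `deposit` method are hypothetical names chosen for the example.

```python
from typing import Dict


class Account:
    """Hypothetical server object that makes deposits safe to retry."""

    def __init__(self) -> None:
        self.balance = 0
        # Maps transaction ID to the reply originally sent for it.
        self._completed: Dict[str, int] = {}

    def deposit(self, txn_id: str, amount: int) -> int:
        if txn_id in self._completed:
            # Replay of a request we already applied: resend the old reply.
            return self._completed[txn_id]
        self.balance += amount
        self._completed[txn_id] = self.balance
        return self.balance
```

A client that never saw the first reply can resend the same `txn_id` without being credited twice. The table of completed IDs grows without bound in this sketch, which raises the question of how long the server should remember each client.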
Resource Allocation • How to reclaim resources allocated for client? • Usually timeout exception cleans up • Release locks? (must track them!) • How to avoid long delays while holding resources? • How long to remember client? • Delayed responses (past timeout) must be ignored • Problem with leases: • Great for servers, but… • Client’s lease may expire mid operation • Hard to make client updates atomic with multiple leases (2PC?) • Which things have leases? (can be hidden)
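The rule that delayed responses (past timeout) must be ignored can be sketched as a table of pending request IDs with deadlines. The class `PendingRequests` and its explicit `now` arguments are assumptions made to keep the example deterministic.

```python
from typing import Dict, Optional


class PendingRequests:
    """Hypothetical client-side table of outstanding requests."""

    def __init__(self, timeout: float) -> None:
        self._timeout = timeout
        self._deadlines: Dict[int, float] = {}

    def sent(self, request_id: int, now: float) -> None:
        # Remember when this request stops being valid.
        self._deadlines[request_id] = now + self._timeout

    def accept(self, request_id: int, payload: str, now: float) -> Optional[str]:
        # Forget the request either way, so the table does not leak entries.
        deadline = self._deadlines.pop(request_id, None)
        if deadline is None or now > deadline:
            return None  # unknown, duplicate, or late response: ignore it
        return payload
```

Because each entry is removed on first use, a duplicate or very late reply finds nothing to match and is dropped.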
Trust the other side? • What if we don’t trust the other side? • Or partial trust (legal contract), or malicious? • Have to check args, no pointer passing • Limited release of information (leaks) • Kernels get this right: • copy/check args • use opaque references (e.g. File Descriptors) • Most systems do not • TCP, Napster, web browsers • Security boundaries tend to be explicit • Holes come from services!
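The kernel-style approach of checking arguments and handing out opaque references can be sketched as a handle table. `HandleTable` is a hypothetical illustration loosely modeled on file descriptors, not an actual kernel interface.

```python
import itertools
from typing import Dict, List


class HandleTable:
    """Hypothetical service that never exposes its internal objects."""

    def __init__(self) -> None:
        self._next_handle = itertools.count(3)  # arbitrary starting number
        self._buffers: Dict[int, List[str]] = {}

    def open(self) -> int:
        handle = next(self._next_handle)
        self._buffers[handle] = []
        return handle  # the caller gets an integer, not a pointer

    def write(self, handle: int, data: str) -> int:
        # Check every argument: the caller is not trusted.
        if type(handle) is not int or handle not in self._buffers:
            raise ValueError("bad handle")
        if not isinstance(data, str):
            raise TypeError("data must be a string")
        self._buffers[handle].append(data)
        return len(data)

    def close(self, handle: int) -> None:
        if type(handle) is not int or handle not in self._buffers:
            raise ValueError("bad handle")
        del self._buffers[handle]
```

The caller can only name state through a handle the service issued, so a forged or stale handle is rejected instead of dereferenced.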
Multiplexing clients? • Does the server have to: • Deal with high concurrency? • Say “no” sometimes (graceful degradation) • Treat clients equally (fairness) • Bill for resources (and have audit trail) • Isolate clients’ performance, data, … • These all affect the boundary definition
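Saying "no" sometimes and treating clients fairly can be sketched as an admission controller with a global cap and a per-client cap. The class name and both limits are assumptions chosen for illustration.

```python
from collections import Counter


class AdmissionController:
    """Hypothetical gate that rejects work instead of degrading for everyone."""

    def __init__(self, max_in_flight: int, per_client_limit: int) -> None:
        self._max_in_flight = max_in_flight
        self._per_client_limit = per_client_limit
        self._in_flight: Counter = Counter()

    def try_admit(self, client_id: str) -> bool:
        if sum(self._in_flight.values()) >= self._max_in_flight:
            return False  # overloaded: say "no" to everyone
        if self._in_flight[client_id] >= self._per_client_limit:
            return False  # this client already holds its fair share
        self._in_flight[client_id] += 1
        return True

    def release(self, client_id: str) -> None:
        # Called when a request finishes, successfully or not.
        if self._in_flight[client_id] > 0:
            self._in_flight[client_id] -= 1
```

Both limits become part of the boundary definition: a caller must be prepared for a refusal, which a plain procedure call never produces.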
Boundary evolution? • Can the two sides be updated independently? (NO) • The DLL problem... • Boundaries need versions • Negotiation protocol for upgrade? • Promises of backward compatibility? • Affects naming too (version number)
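A negotiation protocol for upgrades can be sketched, under the assumption that each side advertises the integer versions it supports, as picking the highest version both share. The function name `negotiate_version` is hypothetical.

```python
from typing import Iterable, Optional


def negotiate_version(client_versions: Iterable[int],
                      server_versions: Iterable[int]) -> Optional[int]:
    """Return the highest version both sides support, or None if there is none."""
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None


# The server has dropped v1 and the client has not yet learned v4.
agreed = negotiate_version([1, 2, 3], [2, 3, 4])
# No overlap: the boundary cannot be crossed at all.
no_overlap = negotiate_version([1], [2])
```

Returning `None` makes the incompatibility explicit at connection time instead of surfacing later as a mysterious failure.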
Example: protocols vs. APIs • Protocols have been more successful than APIs • Some reasons: • protocols are pass by value • protocols designed for partial failure • not trying to look like local procedure calls • explicit state machine, rather than call/return (this exposes exceptions well) • Protocols still not good at trust, billing, evolution
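The contrast between an explicit state machine and call/return can be sketched with a small transition table for a request/reply exchange. The states and event names below are assumptions for illustration and do not come from any specific protocol.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    AWAITING_REPLY = auto()
    DONE = auto()
    FAILED = auto()


# Failure events are ordinary transitions, not hidden exceptions.
TRANSITIONS = {
    (State.IDLE, "send"): State.AWAITING_REPLY,
    (State.AWAITING_REPLY, "reply"): State.DONE,
    (State.AWAITING_REPLY, "timeout"): State.FAILED,
    (State.AWAITING_REPLY, "peer_gone"): State.FAILED,
}


def step(state: State, event: str) -> State:
    """Advance the machine, rejecting events that are illegal in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {state.name}") from None
```

Because `timeout` and `peer_gone` appear in the table next to `reply`, partial failure is part of the interface the designer has to fill in, not something a call/return signature leaves out.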
Example: XML • XML doesn’t solve any of these issues • It is RPC with an extensible type system • It makes evolution better? • two sides need to agree on schema • can ignore stuff you don’t understand • Must agree on meaning, not just tags • Can mislead us to ignore/postpone the real issues
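The point that a reader "can ignore stuff you don’t understand" can be sketched with Python's standard `xml.etree.ElementTree`. The `<item>` document and the `KNOWN_FIELDS` set are invented for the example.

```python
import xml.etree.ElementTree as ET

# Fields this (hypothetical) reader was written to understand.
KNOWN_FIELDS = {"name", "price"}


def parse_item(xml_text: str) -> dict:
    """Keep only known child elements and silently skip the rest."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "")
            for child in root if child.tag in KNOWN_FIELDS}


# A newer sender has added a <color> element the reader has never seen.
doc = "<item><name>widget</name><price>5</price><color>red</color></item>"
item = parse_item(doc)
```

The reader survives the new tag, but nothing here says whether `price` is in dollars or cents. The two sides still have to agree on meaning, not just tags.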
Example: services • Claim: you can’t magically convert a class to a service • Behavior depends on the boundaries that callers cross…. • Trusted? Multiplexed? Partial failure? Namespaces? • Shouldn’t TRY to be transparent • Instead: make it easier to state boundary assumptions (and check them)
Annotated Interfaces • IDL can annotate interfaces: • “Timeout” => client may not respond • “Trusted” => no malicious clients • “Malicious” => client may be hostile • “Multiplexed” => many simultaneous callers • “Idempotent” • Etc. • Could be checked statically in some cases, dynamically in others • Annotations + Boundaries => fewer bugs
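A dynamic check of such annotations can be sketched with Python decorators standing in for an IDL. The `boundary` decorator, the `boundary_annotations` attribute, and `call_with_retry` are all hypothetical names. The slide proposes the idea, not this mechanism.

```python
from typing import Callable


def boundary(*annotations: str) -> Callable:
    """Hypothetical decorator that records boundary assumptions on a function."""
    def decorate(func: Callable) -> Callable:
        func.boundary_annotations = frozenset(annotations)
        return func
    return decorate


@boundary("Idempotent", "Malicious")
def lookup(key: str) -> str:
    return key.upper()


@boundary("Trusted")
def reset_counters() -> str:
    return "reset"


def call_with_retry(func: Callable, *args, attempts: int = 3):
    """Dynamic check: refuse to retry a call not declared Idempotent."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    declared = getattr(func, "boundary_annotations", frozenset())
    if "Idempotent" not in declared:
        raise TypeError(f"{func.__name__} is not declared Idempotent")
    last_error = None
    for _ in range(attempts):
        try:
            return func(*args)
        except ConnectionError as error:  # simulated partial failure
            last_error = error
    raise last_error
```

Here the retry helper checks the annotation at call time. A static checker could in principle reject the same misuse before the program runs.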
Lessons for Applications • Make boundaries very explicit • Not just client/server • Independent systems • Third-party software • Third-party services (RPC to vendors/partners) • Have a few big modules… • Otherwise too many boundaries, and no invariants • Examples: Apache, Oracle, Inktomi search engine • Big Modules are well supported • Big Modules justify their cost (Lampson)
Partial checklist • What is shared? (namespace, schema?) • What kind of state in each boundary? • How would you evolve an API? • Lifetime of references? Expiration impact? • Graceful degradation as modules go down? • External persistent names? • Consistency semantics and boundary?
Conclusions • Most systems are fragile • Root causes: • False transparency: assuming locality, trust, privacy… • Implicitly changing boundaries: consistency, partial failure, … • Some of the causes: • focus on computation, not data • ignoring location distinctions • overestimating consistency boundary • degraded boundaries (RPC is not a PC) • Invariant Boundaries • Help understanding, documentation • Simplify detection • Simpler and easier than full specifications (but weaker)
ACID vs. BASE • DBMS research is about ACID (mostly) • But we forfeit “C” and “I” for availability, graceful degradation, and performance • This tradeoff is fundamental • BASE: • Basically Available • Soft-state • Eventual consistency