Ken Birman Professor, Dept. of Computer Science A High-Assurance Cloud Computing Agenda
Context • Today’s cloud computing platforms are best for building “apps” like YouTube, web search • Highly elastic, pipelined (“asynchronous”) services • But very weak guarantees and limited security • The cloud comes with its own mantra: • Don’t use ACID! BASE is better… • CAP theorem proves it (or does it?)
The Wisdom of the Sages Cornell Dept of Computer Science Colloquium
eBay’s Five Commandments • As described by Randy Shoup at LADIS 2008 Thou shalt… 1. Partition Everything 2. Use Asynchrony Everywhere 3. Automate Everything 4. Remember: Everything Fails 5. Embrace Inconsistency
Vogels at the Helm • Werner Vogels is CTO at Amazon.com… • His first act? • Introduced a series of weak consistency options • Replaced the older strongly consistent “pub/sub” infrastructure with a slower but more scalable one • In small systems, raw speed wins • In the cloud • Weaker forms of guarantees often scale far better than strong ones
James Hamilton’s advice • Key to scalability is decoupling, loosest possible synchronization • Any synchronized mechanism is a risk • His approach: create a committee • Anyone who wants to deploy a highly consistent mechanism needs committee approval … They don’t meet very often
Consistency • Consistency technologies just don’t scale! Sept 11, 2009 P2P 2009 Seattle, Washington
What’s consistency? A consistent distributed system will often have many components, but users observe behavior indistinguishable from that of a single-component reference system [Figure: Reference Model alongside Implementation]
Our Perspective? • We’re being too quick to give up on consistency and other assurance properties • CAP, BASE are really about database consistency • Other very strong forms of consistency can be the foundation for a new science of highly assured, high speed, scalable cloud computing • We have the science to back our vision • The new Isis2 system makes it real
Highly Assured Cloud: Isis2 • Named for an old Cornell story • In 1990 our first Isis Toolkit became the core of the NYSE, French Air Traffic Control System and US Navy AEGIS • Isis2 : A completely new system but same idea • Makes it easy to create high-assurance cloud apps • Offers consistency, fault-tolerance, security • FreeBSD code release later this spring
Virtual synchrony meets Paxos (and they live happily ever after…) • Virtual synchrony is a “consistency” model: • Synchronous runs: indistinguishable from non-replicated object that saw the same updates (like Paxos) • Virtually synchronous runs are indistinguishable from synchronous runs [Figure: three timelines labeled “Non-replicated reference execution”, “Synchronous execution” and “Virtually synchronous execution”; updates shown: A=A+1, A=3, B=7, B = B-A]
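The "indistinguishable from a non-replicated object" idea can be illustrated with a small Python sketch. It is not Isis2 code. The update values come from the slide's figure, but the order used here (A=3, B=7, A=A+1, B=B-A) is an assumption, since the flattened figure does not preserve it. `apply_updates` and the replica count are hypothetical, and ordered delivery is simply assumed by having every replica iterate the same list.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Update = Callable[[State], None]

def apply_updates(updates: List[Update]) -> State:
    """Apply updates in list order to a fresh state and return it."""
    state: State = {}
    for update in updates:
        update(state)
    return state

# Assumed order of the figure's updates (the slide text does not fix it).
UPDATES: List[Update] = [
    lambda s: s.update(A=3),
    lambda s: s.update(B=7),
    lambda s: s.update(A=s["A"] + 1),
    lambda s: s.update(B=s["B"] - s["A"]),
]

# Non-replicated reference execution: one object sees every update.
reference = apply_updates(UPDATES)

# Replicated execution: each replica sees the same updates in the same order.
replicas = [apply_updates(UPDATES) for _ in range(3)]

# A replica that swaps the last two updates ends in a different state.
reordered = apply_updates([UPDATES[0], UPDATES[1], UPDATES[3], UPDATES[2]])

print(reference, replicas, reordered)
```

Every replica ends in the same state as the reference, while the replica that saw the last two updates in the opposite order does not. That gap is what an ordering guarantee has to close.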
Example: Parallel search

Group g = new Group("/amazon/something");
g.register(LOOKUP, myLookup);
Replies = g.query(LOOKUP, "Name=*Smith");

public void myLookup(string who) {
    // divide work into viewSize() chunks
    // this replica will search chunk # getMyRank();
    …..
    reply(myAnswer);
}

g.callback(myReplyHndlr, Replies, typeof(double));

public void myReplyHndlr(double[] fnd) {
    foreach (double d in fnd)
        avg += d;
    …
}
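The "divide work into viewSize() chunks, search chunk # getMyRank()" step can be sketched in Python as a local simulation. None of this is the Isis2 API: `chunk_bounds`, `my_lookup`, `query` and the sample records are hypothetical, the multicast is replaced by a plain loop over ranks, and the slide's "Name=*Smith" query is approximated by a case-sensitive glob.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

def chunk_bounds(n_items: int, view_size: int, rank: int) -> Tuple[int, int]:
    """Half-open [start, end) range of items owned by the member with this rank."""
    start = rank * n_items // view_size
    end = (rank + 1) * n_items // view_size
    return start, end

def my_lookup(records: List[str], pattern: str, view_size: int, rank: int) -> List[str]:
    """Search only this member's chunk and return its matches."""
    start, end = chunk_bounds(len(records), view_size, rank)
    return [r for r in records[start:end] if fnmatchcase(r, pattern)]

def query(records: List[str], pattern: str, view_size: int) -> List[List[str]]:
    """Stand-in for the group query: one reply per member, in rank order."""
    return [my_lookup(records, pattern, view_size, rank) for rank in range(view_size)]

# Hypothetical data; every member is assumed to hold the same record list.
RECORDS = ["Ann Smith", "Bob Jones", "Cy Goldsmith", "Di Smithers", "Ed Smith"]

replies = query(RECORDS, "*Smith", view_size=3)
matches = [name for reply in replies for name in reply]
print(replies, matches)
```

Because the chunk bounds depend only on the record count, the view size and the rank, members that agree on the view can split the work without exchanging any extra messages.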
Scalable Aggregation • Used if group is really big • Request, updates: still via multicast • Response is aggregated within a tree • Example: nodes {a,b,c,d} collaborate to perform a query [Figure: aggregation tree for the query. Level 0: a, b, c, d hold va, vb, vc, vd. Level 1: a holds Agg(va, vb), c holds Agg(vc, vd). Level 2: a holds Agg(va, vb, vc, vd) and sends the reply.]
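The level-by-level reduction in the {a,b,c,d} example can be sketched in Python as follows. This is a single-process illustration, not the Isis2 mechanism: `aggregate_tree` is a hypothetical helper, the sample values are invented, and the aggregation function is assumed to be associative so that pairwise grouping does not change the result.

```python
import operator
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def aggregate_tree(values: Sequence[T], agg: Callable[[T, T], T]) -> List[List[T]]:
    """Combine neighbouring values pairwise, level by level.

    Returns every level, so levels[0] holds the per-member values and
    levels[-1] holds the single value the root would reply with.
    """
    levels: List[List[T]] = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt: List[T] = []
        for i in range(0, len(prev), 2):
            if i + 1 < len(prev):
                nxt.append(agg(prev[i], prev[i + 1]))
            else:
                nxt.append(prev[i])  # unpaired member passes its value up unchanged
        levels.append(nxt)
    return levels

# Hypothetical per-member values va, vb, vc, vd.
member_values = [4, 9, 2, 7]

sum_levels = aggregate_tree(member_values, operator.add)
max_levels = aggregate_tree(member_values, max)
print(sum_levels, max_levels)
```

With four members the tree has three levels, matching Level 0 through Level 2 on the slide. The number of levels grows with the logarithm of the group size, and only one aggregated value reaches the caller.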
Aggregated Parallel search

Group g = new Group("/amazon/something");
g.register(LOOKUP, myLookup);
Replies = g.query(LOOKUP, 27, "Name=*Smith");

public void myLookup(int rid, string who) {
    // divide work into viewSize() chunks
    // this replica will search chunk # getMyRank();
    …..
    SetAggregateValue(myAnswer);
}

Rval = GetAggregateResult(27);
Reply(Rval/DatabaseSize);

g.callback(myReplyHndlr, Replies, typeof(double));

public void myReplyHndlr(double[] fnd) {
    // The answer is in fnd[0]….
}
Our Early Users? • Partnering with Cisco to apply these ideas in core Internet routers (NEBULA/R3 projects) • Creating a continuously available CRS-1 story • Close dialogs with Microsoft, IBM, Intel • Funding from the National Science Foundation and the Air Force; talking to DARPA and ARPA-E • Government, military and smart power grid will all need highly assured cloud options
Challenge of the week • Debugging a system that targets thousands of nodes with tens of cores each is hard! • We benefit from our own strong model • But physical access to non-virtualized large-scale systems is “difficult” today • And many of them block IP multicast (IPMC) and UDP • Better tools will need to be part of a better assurance story • Else we know how it should work but not how it does work, or even whether it works correctly!
Summary? • The word on the street is that cloud computing will rule but that the cloud can’t do high assurance • At Cornell we just don’t believe that • Not long from now we’ll put a solution in your hands showing how it can be done