A Brief Introduction To High Assurance Cloud Computing With Isis2. Cornell University. Ken Birman. Isis 2 System. A prebuilt technology that automates many of the hard tasks involved in replicating services and the data on which they depend Targets cloud computing settings
A Brief Introduction To High Assurance Cloud Computing With Isis2 Cornell University Ken Birman
Isis2 System • A prebuilt technology that automates many of the hard tasks involved in replicating services and the data on which they depend • Targets cloud computing settings • Available in open-source from isis2.codeplex.com • Intended to be easy to use… • … but lacks commercial support, and also lacks some of the polish of a commercial product
Isis2 System • Elasticity (sudden scale changes) • Potentially heavy loads • High node failure rates • Concurrent (multithreaded) apps • Long scheduling delays, resource contention • Bursts of message loss • Need for very rapid response times • Community skeptical of “assurance properties” • C# library (but callable from any .NET language) offering replication techniques for cloud computing developers • Based on a model that fuses virtual synchrony and state machine replication models • Goal is to make it easy for users to deal with some very hard problems seen in distributed settings
Supported languages… • The main library is written in .NET / C# • …and so it is best used from C# • Evolved from Java • IronPython also works, and you can use a version of C++ called C++/CLI • .NET runs on Windows • Our developers use Visual Studio • … but also can be used on Linux • You work with Mono and use the Mono developer environment or compile using “mcs” at the command line
Other ways to access Isis2… • New “outboard” option • A recent addition that allows native C++ and C users to access a subset of the Isis2 functionality • Initial focus is on file replication for memory-mapped files and other big memory-mapped objects • This outboard feature can be accessed from an Isis2 command-line tool • To do so you run the Isis2 system as a server • Commands to it can then be issued from any language
Why Isis2? • The new version of Isis is a completely new system, but builds on a long history • First version was the Isis Toolkit (~1985-1996) • That system was used to create the French Air Traffic Control system, the NYSE trading platform, the US Navy AEGIS core communications library and many other mission-critical applications • After the Isis Toolkit, we created Horus and Ensemble (focus was on ties to formal methods) • Isis2 is our latest and best technology. Pronounced “Isis two”, not “Isis squared”
What does Isis2 do? • A library to help you create highly resilient, secure, scalable applications. The methods automate important tasks • You can run your program on a few machines of your own, or on a big cluster, or in an enterprise network, or even on a cloud platform like Amazon EC2 • Isis2 is a “specialist” in solving problems involving data replication within “groups” of programs
Key Concept: Replication Imagine that some collection of programs is running on some set of machines. Data is said to be replicated if each of them has a copy, and sees the updates to that copy
Replication in pictures • This is an example of a non-replicated server with a client who issues some form of request [Figure: timeline. The client sends “Update the monitoring and alarms criteria for Mrs. Marsh as follows…” to a single service instance, which answers “Confirmed”. Labels: service response delay; response delay seen by end-user would include Internet latencies]
Replication in pictures • Here we replicate data on multiple programs. The group of programs offers better reliability than a single program, and could perform better too [Figure: timeline. The client sends “Update the monitoring and alarms criteria for Mrs. Marsh as follows…” to a service group, which answers “Confirmed”. Labels: service response delay; response delay seen by end-user would include Internet latencies]
Things to notice • Although there is a group of programs, the client still talks to some representative. • Standard “web services” requests work in the usual way • The group members are just normal programs and could be running on the same machines, or different machines. They could be running the same code, or different code. • They cooperate to replicate data • No variables or memory are physically shared by the programs. • Instead, each has a private replica. When an update happens, a message is sent to tell all the group members to update their copies
Steps in using Isis2 The big picture…
First questions • Step back and draw a sketch of your application • Client systems, if it has remote access • The service you want to build that will use Isis2 • Data files the service may need, or databases • Other subsystems or services it will interact with
Our example from earlier Request Service group Updates that modify replicated data Computes Response
Where will the service run? • Will you run your replicated service… • Next to its clients, on the same machines? • On EC2, RackSpace, GooglePlex, MSFT Azure, ….? • In a cluster of computers up the hallway? • How will clients connect to it? • Web Services remote method invocation? • CORBA RPC? TCP connections?
Now build a non-replicated instance • Before adding Isis2 mechanisms, build a single instance of your server and test access to it from your client, or whatever else might access it • Debug this code. • Adding Isis2 functionality after the fact isn’t hard and may really be much easier than using it from the outset! • But planning ahead can help: using C#, C++/CLI or IronPython will be far easier than other options
… some people get stuck at step 1! • Building a cloud-hosted service involves steps you may never have tried before • For example, with Visual Studio, you create a new kind of service instance, and need to tell it which methods are remotely available, and then need to install and run it on your machine, which may involve firewall changes to allow clients to access it… • … Isis2 can’t really help with that. But it isn’t terribly hard. Just do a small step at a time following the instructions for the package you decide to use
Why not native C or native C++? • We do support a stripped-down way to use Isis2 from these and other languages, but you can only access part of the functionality • We’ll discuss these features later, but not today • If you know Java, you will find it easy to migrate to C#, and we would recommend that route • C# works on both Windows and Linux (via Mono)
Designing the replicated functions • Now you are ready to design the replication features of your service • Ask what data needs to be replicated? • How will it be loaded initially? How will it be updated? • Will you do load-balanced queries (if so they can just access any service member), or parallel queries? • What synchronization or coordination is needed? • Start with a good image of how you want it to work
State machine model • Think about a state machine • Some sort of object or class, with state in it (data) • The state is updated deterministically when events occur (updates…), and it can be checkpointed and later loaded back in from the checkpoint • With Isis2 we simply replicate a state machine! • You design an object or subsystem that has the replicated data in it • The updates become totally ordered multicasts, and group membership changes are ordered along with them
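The state machine idea can be sketched in plain C# without any Isis2 calls. This is only an illustration under stated assumptions: the class `PriceTable` and its methods `Apply`, `Checkpoint` and `Restore` are hypothetical names, not part of the Isis2 API, and the "multicast" is simulated by calling `Apply` on both copies in the same order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical replicated object (not an Isis2 type): all state lives in one
// place, and every change goes through the deterministic Apply method.
public class PriceTable
{
    private Dictionary<string, double> values = new Dictionary<string, double>();

    // Deterministic update: the same events in the same order give the same state.
    public void Apply(string key, double value) { values[key] = value; }

    public double Get(string key) { return values[key]; }

    // Checkpoint: a copy of the state that could be handed to a joining member.
    public Dictionary<string, double> Checkpoint()
    {
        return new Dictionary<string, double>(values);
    }

    // Load a previously taken checkpoint.
    public void Restore(Dictionary<string, double> snapshot)
    {
        values = new Dictionary<string, double>(snapshot);
    }
}

public static class StateMachineDemo
{
    public static double Run()
    {
        var a = new PriceTable();
        a.Apply("Harry", 20.75);

        var b = new PriceTable();
        b.Restore(a.Checkpoint());   // b joins late and starts from a's checkpoint

        a.Apply("Harry", 21.00);     // later updates reach both copies,
        b.Apply("Harry", 21.00);     // in the same order
        return b.Get("Harry");
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Because `Apply` is deterministic and the late joiner starts from a checkpoint, the two copies end in the same state. In a real deployment the ordered delivery and the checkpoint transfer would come from the replication library rather than from direct method calls.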
Replication in pictures • By the time you are ready to code, you’ll have a picture similar to this one, specific to your setting [Figure: timeline. The client sends “Update the monitoring and alarms criteria for Mrs. Marsh as follows…” to a service group hosted on EC2; an ordered multicast replicates the update, and the group answers “Confirmed”. Labels: service response delay; response delay seen by end-user would include Internet latencies]
From this picture… • Now your needs are more concrete • How will I represent the replicated state in my group? • How can I copy the current data to a joining member? • I don’t want to reply to the client until all the replicas have been updated. What’s the best way to do that? • I’m worried about speed. What will limit response delays? How fast can a service like this run? • This set of MOOC modules will help you answer such questions, but experiments and optimization of your solution are certainly needed too…
Let’s replicate! (some data…)
How would you solve this problem? • Assume you have access to a client-server technology, like the ones used to build web services • You would need a list of group members • Then when an update happens, the program receiving the request could update the others
Replication version 0 [Figure: four replicas each hold X=5. After “Set x=x+150” is sent to all of them, each holds X=155]
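A minimal in-process sketch of "version 0" follows, assuming the simple scheme described above: the member that receives the request applies it and then forwards it to everyone else on its member list. The names `Replica` and `ReplicationV0` are made up for this illustration, and the direct method call stands in for what would be a network message in a real system.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one service replica. Not part of Isis2.
public class Replica
{
    public int X = 5;                        // private copy of the replicated variable
    public void ApplyAdd(int delta) { X += delta; }
}

public static class ReplicationV0
{
    // The member that receives the client request applies the update,
    // then forwards it to every other member on its list.
    public static void Update(Replica receiver, List<Replica> members, int delta)
    {
        receiver.ApplyAdd(delta);
        foreach (Replica r in members)
        {
            if (!ReferenceEquals(r, receiver))
                r.ApplyAdd(delta);           // in a real system: a network message
        }
    }

    public static List<Replica> Demo()
    {
        var members = new List<Replica>
        {
            new Replica(), new Replica(), new Replica(), new Replica()
        };
        Update(members[0], members, 150);    // "Set x = x + 150"
        return members;
    }

    public static void Main()
    {
        foreach (Replica r in Demo())
            Console.WriteLine("X=" + r.X);   // prints X=155 four times
    }
}
```

This version works only while nothing goes wrong: the member list never changes, no message is lost, and no two updates race with each other.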
Things you’ll need to think about • Keep track of group membership. Can it change? • If new members join, how do they learn the initial value of x, or other “replicated state”? • When someone joins, how do we update the group membership list? • If a member fails, how are they dropped from the group? • Implement the 1-to-(N-1) “multicast” • Handle reliability issues: what if a request times out? • Handle security: should we use SSL? Something else? • What if two conflicting updates happen concurrently?
This sounds hard! • … in fact it is very hard • Any form of replication is equivalent to a famous problem called distributed consensus • There are many published papers on this topic; you’ll want to use a good solution • Once you solve the basic issues you’ll still face many challenges of performance, scale, portability, respecting rules for the particular runtime setting…
Key Concept: Consistency We say that replicated data is consistent if multiple users accessing it can’t detect whether or not it was replicated.
Consistent replicated data [Figure: four replicas each hold X=5. After “Set x=x+150” all four hold X=155; after “Set x=22” all four hold X=22]
Inconsistent replicated data [Figure: four replicas each hold X=5. “Set x=x+150” and “Set x=22” do not reach every replica in the same order, so one replica ends up thinking x=155 while the other three think x=22] Isis2 never lets this happen. It automatically enforces ordering for you
Possible sources of inconsistency? • Conflicting updates could reach members in different orders • Maybe someone dropped an update (network message loss), or applied one twice. • Update sender could have failed while sending the updates, so that some copies were sent, but others weren’t sent • Confusion about membership: perhaps some member was joining and the update initiator didn’t realize it, hence didn’t send the update to it, or the initial state was “wrong”
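The first source of inconsistency listed above is easy to reproduce with the two updates from the earlier slides. The sketch below uses made-up names (`OrderingDemo`, `Add150`, `Set22`) and simply applies the same two updates to two copies in opposite orders.

```csharp
using System;

public static class OrderingDemo
{
    // The two conflicting updates from the slides.
    public static readonly Func<int, int> Add150 = x => x + 150;
    public static readonly Func<int, int> Set22 = x => 22;

    // Apply a sequence of updates, in the order given, to a starting value.
    public static int Apply(int start, params Func<int, int>[] updates)
    {
        int x = start;
        foreach (Func<int, int> u in updates)
            x = u(x);
        return x;
    }

    public static void Main()
    {
        int replicaA = Apply(5, Add150, Set22);   // add first, then set
        int replicaB = Apply(5, Set22, Add150);   // set first, then add
        Console.WriteLine("A=" + replicaA + " B=" + replicaB);   // A=22 B=172
    }
}
```

Both replicas received both updates, yet they disagree. A reliable transport alone is not enough; the delivery order has to be the same everywhere.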
A subtle risk • Failure handling poses hard problems! • Suppose that process A is supposed to send an update reliably to processes B, C and D • A might use TCP, or some other reliable protocol. But if process C fails, A needs to give up. • What if the network temporarily fails, and A can’t reach C? This is indistinguishable from C failing, except that later, C will be accessible again!
Split brain syndrome • We say that the network “partitioning” issue causes a “split brain” problem • Some members of our group think that A and B are the healthy members. Others think that C and D are healthy and that A and B have failed. • Which “side” is right? • Who should be in charge if this group is doing something critical, like deciding which plane can land on a particular runway?
Split brain syndrome • Inconsistency can be very dangerous! Flight US 1827 clear to take off on runway 3-B Flight AA 27 clear to land on runway 3-B
Split brain syndrome Flight US 1827 clear to take off on runway 3-B A B C D • Consider this air traffic control “server”. It helps make sure runways are used safely.
Split brain syndrome Flight US 1827 clear to take off on runway 3-B Flight AA 27 clear to land on runway 3-B Isis2 never lets this happen. It automatically prevents split-brain behavior A B C D Temporary network failure: A and B can talk to each other but can’t reach C or D, and vice-versa
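The slides do not say how Isis2 prevents split-brain behavior. One widely used rule, shown here purely as an assumption for illustration, is to let a partition continue only if it can reach a strict majority of the last agreed membership. `QuorumDemo` and `MayContinue` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class QuorumDemo
{
    // True if the reachable members form a strict majority
    // of the last agreed membership list.
    public static bool MayContinue(ICollection<string> lastView,
                                   ICollection<string> reachable)
    {
        int count = 0;
        foreach (string m in reachable)
        {
            if (lastView.Contains(m)) count++;
        }
        return 2 * count > lastView.Count;
    }

    public static void Main()
    {
        var view = new List<string> { "A", "B", "C", "D" };

        // A and B can only see each other: 2 of 4 is not a majority.
        Console.WriteLine(MayContinue(view, new List<string> { "A", "B" }));        // False

        // A, B and C can see each other: 3 of 4 is a majority.
        Console.WriteLine(MayContinue(view, new List<string> { "A", "B", "C" }));   // True
    }
}
```

With the even split from the picture (A and B on one side, C and D on the other), neither side holds a majority under this rule, so neither issues a clearance until the network heals. That is safe, but the service is unavailable in the meantime.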
Other settings that need consistency • Medical systems that give information about patient status and current drug regimen • Smart power grid system that controls power infrastructure components such as transformers • Self-driving vehicles that coordinate while driving at high speed on highways • Control systems for chemical plants and refineries • Corporate accounting systems, and payroll • … you can make quite a long list.
Cloud computing: CAP theorem • The most common tools for building cloud computing solutions don’t help with consistency • They treat the property as a special need and assume you’ll find your own ways to do this • This reflects the so-called CAP theorem • “You can have at most two out of these three: Consistency, Availability and Partition Tolerance.” Eric Brewer, Berkeley • Cloud platforms assume you are not sophisticated enough to make tradeoffs. They weaken consistency to get the fastest possible response times under the widest possible range of operating conditions.
Isis2 to the rescue! By using a preexisting library like Isis2 you don’t need to solve these problems yourself. The system solves them for you. In Egyptian mythology, Isis rescued Osiris after he was torn to pieces in an epic battle with Set. She restored him to life and he went on to rule the underworld. Their child, Horus, later defeated Set and banished him. The story illustrates the value of fault-tolerance
Isis2 makes developer’s life easier • Benefits of using a formal model: the formal model permits us to achieve correctness; Isis2 is based on protocols that can be expressed mathematically and proved correct; think of Isis2 as a collection of modules, each with rigorously stated properties • Importance of sound engineering: the Isis2 implementation needs to be fast, lean, easy to use; the developer must see it as easier to use Isis2 than to build from scratch; seek great performance under “cloudy conditions”; forced to anticipate many styles of use
Isis2 makes developer’s life easier Group g = new Group("myGroup"); Dictionary<string,double> Values = new Dictionary<string,double>(); g.ViewHandlers += delegate(View v) { Console.Title = "myGroup members: " + v.members; }; g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v; }; g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]); }; g.Join(); g.OrderedSend(UPDATE, "Harry", 20.75); List<double> resultlist = new List<double>(); nr = g.Query(ALL, LOOKUP, "Harry", EOL, resultlist); • First sets up group • Join makes this entity a member. State transfer isn’t shown • Then can multicast, query. Runtime callbacks to the “delegates” as events arrive • Easy to request security (g.SetSecure), persistence • “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
Key Concept: Local Data Each replica is a completely standard program with its own private, local variables. Nothing is “physically shared”! So in particular, the variable “Values” is local: each program has its own copy of this variable The different copies get the same messages, in the same order. This allows them to apply the same updates.
C# Dictionary Generic Type • C# supports generic types • Objects in which some other type is a parameter: Dictionary<string,double> Values = new Dictionary<string,double>(); • A C# Dictionary maps a key to a value. This Dictionary maps strings to doubles. • The variable name is Values. Each program has its own private copy.
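For readers new to C#, here is a small standalone example of the same `Dictionary<string,double>` type, using only the standard .NET collection classes. The wrapper class `DictionaryDemo` and the key "Sally" are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDemo
{
    public static double Lookup()
    {
        Dictionary<string, double> Values = new Dictionary<string, double>();

        Values["Harry"] = 20.75;   // insert a new key
        Values["Harry"] = 21.50;   // assigning again overwrites the old value

        // Values[key] throws KeyNotFoundException for a missing key;
        // TryGetValue reports the miss without throwing.
        double v;
        if (Values.TryGetValue("Sally", out v))
            return v;

        return Values["Harry"];
    }

    public static void Main()
    {
        Console.WriteLine(Lookup());
    }
}
```

A handler that looks up a key which may be absent on some replica would want the `TryGetValue` form, since an unhandled exception inside a callback is harder to diagnose than a missing reply.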
Isis2 makes developer’s life easier Group g = new Group("myGroup"); Dictionary<string,double> Values = new Dictionary<string,double>(); g.ViewHandlers += delegate(View v) { Console.Title = "myGroup members: " + v.members; }; g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v; }; g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]); }; g.Join(); g.Send(UPDATE, "Harry", 20.75); List<double> resultlist = new List<double>(); nr = g.Query(ALL, LOOKUP, "Harry", EOL, resultlist); • First sets up group • Join makes this entity a member. State transfer isn’t shown • Then can multicast, query. Runtime callbacks to the “delegates” as events arrive • Easy to request security (g.SetSecure), persistence • “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
Concept: A “multi-query” • Our lookup is • Multicast to the group • All members respond • A chance for parallelism • Each can do part of the job: e.g. search 1/nth of a database • Reduces response delays Lookup “Harry” in the Ithaca phone directory Front end With n replicas... ... we get an n times speedup! Names with Harry in them: ....
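The work split behind a multi-query can be sketched without the library. The example assumes that each member knows its rank (0 to n-1) and the group size n, and that it scans every n-th entry starting at its rank; `MultiQueryDemo`, `SearchShare` and the sample directory are invented for the illustration. The loop over ranks runs sequentially here, whereas real group members would each do their share at the same time.

```csharp
using System;
using System.Collections.Generic;

public static class MultiQueryDemo
{
    // Work done by the member with the given rank: scan only its 1/n share.
    public static List<string> SearchShare(string[] directory, string pattern,
                                           int rank, int n)
    {
        var hits = new List<string>();
        for (int i = rank; i < directory.Length; i += n)
        {
            if (directory[i].Contains(pattern))
                hits.Add(directory[i]);
        }
        return hits;
    }

    // Front end: collect one reply per member and merge them.
    public static List<string> Query(string[] directory, string pattern, int n)
    {
        var all = new List<string>();
        for (int rank = 0; rank < n; rank++)
            all.AddRange(SearchShare(directory, pattern, rank, n));
        return all;
    }

    public static int Demo()
    {
        string[] directory =
        {
            "Harry Potter", "Sally Smith", "Harry Jones", "Ann Lee", "Bob Harryman"
        };
        return Query(directory, "Harry", 3).Count;
    }

    public static void Main()
    {
        Console.WriteLine("matches: " + Demo());   // matches: 3
    }
}
```

Every entry is scanned by exactly one member, so the merged answer is complete. The n times speedup is the ideal case; the cost of the multicast and of merging replies at the front end reduces it in practice.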