Reliability in cloud and mobile apps

Doh! Reliability in cloud and mobile apps http://www.flickr.com/photos/johanl/4934459020

Traditional client-server vs. cloud
• Traditional client-server
  • Usually a highly reliable server, available on demand
• Cloud
  • Garbage hardware that could fail at any time
• Challenge: ensure the reliability of apps nonetheless

BTW, cloud servers aren’t necessarily very well-configured, either
• Example: GAE servers
  • 128 MB-1 GB RAM; 600 MHz-4.8 GHz CPU
  • https://developers.google.com/appengine/docs/java/config/backends
• Amazon EC2 “medium” servers
  • 3.75 GB RAM; 2.0-2.4 GHz 2007 Opteron CPU
  • http://aws.amazon.com/ec2/instance-types/
• My cheap, busted-up, 4-year-old laptop
  • 3 GB RAM; 2 x 2.4 GHz Intel (Core Duo) CPU cores
May 14, 2012

Yet, from the current GAE Service Level Agreement (SLA): they really mean to be highly reliable! So how do they do it? How can you make the most of it? https://developers.google.com/appengine/sla

SLAs often quote reliability as “nines”
• Two nines: 99%, about 3.65 days of downtime per year
  • Easy to do with cheap hardware + backup
• Three nines: 99.9%, about 8.8 hours per year
  • Can be done with reasonably good hardware
• Four nines: 99.99%, about 53 minutes (< 1 hour) per year
  • Not all systems can do this
• Five nines: 99.999%, about 5 minutes per year
  • Very hard to achieve, and very expensive
• Each “nine” roughly doubles the cost
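The downtime figures come from multiplying the unavailable fraction (1 - availability) by the number of minutes in a year. A small sketch, assuming a 365-day year (leap years add about 0.3% more):

```java
// Converts an availability figure such as 0.999 into allowed downtime per year.
class Nines {
    // Minutes in a non-leap year: 365 days * 24 hours * 60 minutes.
    static final double MINUTES_PER_YEAR = 365 * 24 * 60;

    // Allowed downtime per year, in minutes, for the given availability (0..1).
    static double downtimeMinutes(double availability) {
        return (1.0 - availability) * MINUTES_PER_YEAR;
    }

    public static void main(String[] args) {
        for (int nines = 2; nines <= 5; nines++) {
            double availability = 1.0 - Math.pow(10, -nines);
            System.out.printf("%d nines (%.3f%%): %.1f minutes/year%n",
                    nines, availability * 100, downtimeMinutes(availability));
        }
    }
}
```

Dividing by 60 gives hours; for three nines that is 525.6 minutes, or about 8.8 hours.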

Key reliability principles
• Replication
• Provide a means for monitoring
• Consider using a hybrid cloud

Replication of computation
• GAE will automatically copy your code
  • Starting up multiple servers to handle requests
    • If your server generally responds quickly to requests
    • And if there is extra hardware available at the moment
  • Automatically balancing load
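GAE balances load across its copies of your code for you, so you never write this yourself. Purely to illustrate the idea, here is a hypothetical round-robin dispatcher that skips unhealthy replicas. It is not GAE's scheduler, and the health-check predicate is an assumption supplied by the caller:

```java
import java.util.List;
import java.util.function.Predicate;

// Illustration only: rotates requests across replicas, skipping any that fail a health check.
class RoundRobinBalancer<S> {
    private final List<S> replicas;
    private final Predicate<S> isHealthy; // hypothetical health check
    private int next = 0;

    RoundRobinBalancer(List<S> replicas, Predicate<S> isHealthy) {
        this.replicas = List.copyOf(replicas);
        this.isHealthy = isHealthy;
    }

    // Returns the next healthy replica in rotation, or throws if none is healthy.
    synchronized S pick() {
        for (int tried = 0; tried < replicas.size(); tried++) {
            S candidate = replicas.get(next);
            next = (next + 1) % replicas.size();
            if (isHealthy.test(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no healthy replicas");
    }
}
```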

Data also needs replication
• You can control the level of replication
• Old-fashioned (traditional client-server) approach
  • Set up a “master” database server
  • Configure the master to copy its data to “slaves” (e.g., every night)
• Cloud-based approach
  • Let the infrastructure replicate data automatically
  • GAE: you have two options, the master/slave datastore and the high-replication datastore
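The old-fashioned master/slave setup can be sketched in memory. This is a hypothetical model, not a real database: all writes go to the master, and `replicate()` (imagine it run by a nightly job) copies a snapshot to every slave. It shows the weak spot: until the next copy, slaves are stale, and if the master dies, writes since the last copy are lost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory master/slave store.
class MasterSlaveStore {
    private final Map<String, String> master = new HashMap<>();
    private final List<Map<String, String>> slaves = new ArrayList<>();

    MasterSlaveStore(int slaveCount) {
        for (int i = 0; i < slaveCount; i++) {
            slaves.add(new HashMap<>());
        }
    }

    // All writes go to the master.
    void write(String key, String value) {
        master.put(key, value);
    }

    // A slave only knows what was copied at the last replicate() call.
    String readFromSlave(int slave, String key) {
        return slaves.get(slave).get(key);
    }

    // Copies the master's current contents onto each slave (e.g. nightly).
    void replicate() {
        for (Map<String, String> slave : slaves) {
            slave.clear();
            slave.putAll(master);
        }
    }
}
```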

High-replication datastore (HRD) vs. master/slave datastore (MSD)
• HRD makes backup copies across datacenters (and keeps more than 2 copies; MSD has only 2)
• HRD includes a more sophisticated algorithm for resolving errors on (some) servers
• MSD: all writes go to the master (if available); the master is copied to the slaves; all reads go to the master (if available) [Deprecated!]
• HRD: a more sophisticated algorithm in which the servers (with no master) reach a consensus
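To give a feel for "no master, servers agree", here is a deliberately simplified majority-quorum write. It is not Paxos and not GAE's actual HRD algorithm; the replica model and the `up` flag are assumptions for illustration. A write counts as committed only if more than half of the replicas accept it, so losing one datacenter out of three does not stop writes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified majority-quorum store (illustration only).
class QuorumStore {
    static class Replica {
        final Map<String, String> data = new HashMap<>();
        boolean up = true; // simulated availability of this replica's datacenter
    }

    final List<Replica> replicas = new ArrayList<>();

    QuorumStore(int n) {
        for (int i = 0; i < n; i++) {
            replicas.add(new Replica());
        }
    }

    // Returns true if a majority of replicas stored the value.
    // A real system must also undo or reconcile a write that reached only a minority;
    // this sketch does not.
    boolean write(String key, String value) {
        int acks = 0;
        for (Replica r : replicas) {
            if (r.up) {
                r.data.put(key, value);
                acks++;
            }
        }
        return acks > replicas.size() / 2;
    }
}
```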

Pros of using HRD
• Pro: reliability is vastly improved
  • Largely due to replication of data across datacenters
• Pro: support for cross-group transactions in Python
  • Apparently? Test before relying on it!
  • Maybe available in Java?
  • Config change needed?
  • https://developers.google.com/appengine/docs/python/datastore/overview#Cross_Group_Transactions

Cons: Latency and eventual consistency
• Con: latency can be pretty big (> 1 second)
  • Writes (and reads) go to multiple servers in multiple datacenters
• Con: data just written might not appear in a read
  • GAE might write to server X but then read from server Y
  • Data on X might not be copied to Y right away
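The "write to X, read from Y" problem can be reproduced with a toy two-server model. This hypothetical simulation queues each copy to Y and applies it only when `propagate()` runs, standing in for background replication:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of eventual consistency between two servers.
class EventuallyConsistentPair {
    private final Map<String, String> serverX = new HashMap<>();
    private final Map<String, String> serverY = new HashMap<>();
    private final Queue<String[]> pendingCopies = new ArrayDeque<>();

    // The write lands on X now; the copy to Y is queued for later.
    void writeToX(String key, String value) {
        serverX.put(key, value);
        pendingCopies.add(new String[] {key, value});
    }

    String readFromX(String key) { return serverX.get(key); }

    String readFromY(String key) { return serverY.get(key); }

    // Applies all queued copies to Y, as background replication eventually would.
    void propagate() {
        String[] copy;
        while ((copy = pendingCopies.poll()) != null) {
            serverY.put(copy[0], copy[1]);
        }
    }
}
```

Between `writeToX` and `propagate`, a reader routed to Y sees nothing; that window is what the next two slides work around.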

Coping with problem #1: latency
• Cache a copy of data on the client
  • Eliminates the need to hit the server
  • Bonus: improves reliability when the server is offline
• Write a copy to memcache
  • So you can read it back faster
  • Only do this for data you read a lot, of course
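A read-through cache captures the memcache pattern. In this hypothetical sketch a `HashMap` stands in for memcache and the loader function stands in for a slow datastore read; on GAE the cache role would be played by the memcache service, which can evict entries at any time, so the datastore must stay the source of truth.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Read-through cache: serve from the cache when possible, otherwise load and remember.
class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> slowLoader; // stand-in for a datastore read
    int misses = 0; // how often we had to go to the datastore

    ReadThroughCache(Function<K, V> slowLoader) {
        this.slowLoader = slowLoader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached; // fast path: no datastore round trip
        }
        misses++;
        V loaded = slowLoader.apply(key);
        if (loaded != null) {
            cache.put(key, loaded); // keep a copy for the next read
        }
        return loaded;
    }

    // Call after updating the datastore so the next read reloads fresh data.
    void invalidate(K key) {
        cache.remove(key);
    }
}
```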

Coping with problem #2: writes not appearing on read
• Don’t assume that an entity you just wrote will immediately appear in a query (in HRD)
• Wait a few seconds before reading it back
• Or automatically append the written entity to the query results if you don’t see it

Example pseudocode (must be fancier for sorted queries)
    Course mycourse = create a new entity
    pm.makePersistent(mycourse)
    List<Course> courses = copy of (query for courses)   // copy: query results may be unmodifiable
    boolean sawit = false
    foreach (Course course in courses)
        if (course.id.equals(mycourse.id)) { sawit = true; break; }
    if (!sawit) courses.add(mycourse)
    foreach (Course course in courses)
        do something with course
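The same "append it if you don't see it" check can be written as a small generic helper. The class and method names are hypothetical, and `idOf` is an assumed function that extracts an entity's key. Like the pseudocode, it handles unsorted results only; a sorted query would need to insert at the right position instead of appending.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical helper for read-your-own-write on an eventually consistent query.
final class ReadYourWrite {
    private ReadYourWrite() {}

    static <T, K> List<T> withWrittenEntity(List<T> queryResults, T written, Function<T, K> idOf) {
        List<T> results = new ArrayList<>(queryResults); // query results may be unmodifiable
        K writtenId = idOf.apply(written);
        for (T entity : results) {
            if (Objects.equals(idOf.apply(entity), writtenId)) {
                return results; // the datastore already shows the new entity
            }
        }
        results.add(written); // not visible yet: add our own copy
        return results;
    }
}
```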

Coping with another reliability problem (#3): exceptions on commit
• If you use transactions (locks), you will get exceptions when multiple writes happen simultaneously
  • True for MSD, HRD, or any other platform that relies on optimistic locking
• Use a try/catch/retry approach
  • Repeatedly try to write your updates if they fail on the first try

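Example pseudocode
    int retries = 10
    while (--retries >= 0) {
        try {
            start transaction
            Course mycourse = get the course entity
            make modifications to mycourse
            pm.makePersistent(mycourse)
            commit transaction
            break                              // success: stop retrying
        } catch (JDOException e) {
            log the exception
            if (retries == 0) rethrow e        // out of retries: give up loudly
        } finally {
            if (transaction is still active) roll it back
        }
    }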
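The retry loop can be factored into a reusable helper. In this hypothetical sketch the "commit" is any `Callable`; in a JDO app it would begin a transaction, modify the entity, and commit. The exponential backoff between attempts is an added assumption, not something from the slides; real code should also catch only retryable exceptions rather than every `Exception`.

```java
import java.util.concurrent.Callable;

// Hypothetical generic try/catch/retry helper with exponential backoff.
final class Retry {
    private Retry() {}

    static <T> T withRetries(Callable<T> attempt, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        for (int i = 1; ; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                if (i >= maxAttempts) {
                    throw e; // out of attempts: let the caller see the failure
                }
                System.err.println("attempt " + i + " failed: " + e);
                Thread.sleep(backoff); // give competing writers time to finish
                backoff *= 2;
            }
        }
    }
}
```

Backing off spreads out competing writers, so simultaneous retries are less likely to collide again.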

Monitoring
• You should provide a means of monitoring your system’s uptime
• Common approach: a script on a client somewhere else
  • Could be another cloud service (e.g., EC2)
  • The script accesses the server
  • The client tracks success rate + latency
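A monitoring client along these lines could use the JDK's built-in HTTP client (Java 11+). This is a hypothetical sketch: the class name, the 10-second timeout, and treating only HTTP 200 as success are assumptions. Bookkeeping is kept separate from the network call so it can be checked offline.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical uptime monitor, meant to run on a machine outside your cloud.
class UptimeMonitor {
    private int attempts = 0;
    private int successes = 0;
    private long totalLatencyMs = 0; // summed over successful probes only

    // Records one probe result.
    void record(boolean ok, long latencyMs) {
        attempts++;
        if (ok) {
            successes++;
            totalLatencyMs += latencyMs;
        }
    }

    double successRate() { return attempts == 0 ? 0.0 : (double) successes / attempts; }

    double averageLatencyMs() { return successes == 0 ? 0.0 : (double) totalLatencyMs / successes; }

    // Sends one GET and records whether it returned HTTP 200 before the timeout.
    void probe(HttpClient client, URI url) {
        long start = System.nanoTime();
        boolean ok;
        try {
            HttpRequest request = HttpRequest.newBuilder(url)
                    .timeout(Duration.ofSeconds(10))
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            ok = response.statusCode() == 200;
        } catch (IOException e) {
            ok = false; // timeouts and connection errors count as downtime
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            ok = false;
        }
        record(ok, (System.nanoTime() - start) / 1_000_000);
    }
}
```

A scheduler would call `probe(HttpClient.newHttpClient(), URI.create(...))` every minute or so against your app's URL, then report the two statistics.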

What to monitor
• The services of the application itself
  • You probably need to include some test data
• Also three other “dummy” services
  • One that just returns
  • One that reads from the datastore
  • One that writes to the datastore and reads back
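The bodies of the three dummy services are tiny. In this hypothetical sketch `SimpleStore` is an assumed stand-in interface for your datastore, and the key names are made up; in a deployed app each method would sit behind its own URL for the external monitor to call.

```java
import java.util.UUID;

// Assumed minimal datastore interface (not a real GAE API).
interface SimpleStore {
    void put(String key, String value);
    String get(String key);
}

class DummyServices {
    private final SimpleStore store;

    DummyServices(SimpleStore store) { this.store = store; }

    // 1. Just returns: shows the platform is serving requests at all.
    String ping() { return "ok"; }

    // 2. Reads from the datastore: assumes a known test entity was stored beforehand.
    boolean read() { return "expected".equals(store.get("monitor-test-entity")); }

    // 3. Writes a fresh random value and reads it back.
    boolean writeAndReadBack() {
        String nonce = UUID.randomUUID().toString();
        store.put("monitor-write-test", nonce);
        return nonce.equals(store.get("monitor-write-test"));
    }
}
```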

Things you can do with the data
• Detect when one or more of your application’s services have crashed
  • Or are getting slow
• Detect whether any problems are your fault
  • i.e., one of your own application’s services has failed but the dummy services are working
• Decide whether/when/how to redesign
  • Changes to your own application
  • Integrate a different cloud platform
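The "is it my fault?" reasoning can be written as a small triage rule. This is a hypothetical sketch: the enum names and the latency threshold parameter are assumptions, and `dummiesOk` means all three dummy services succeeded.

```java
// Hypothetical triage of monitoring results.
enum Diagnosis { HEALTHY, SLOW, LIKELY_YOUR_APP, LIKELY_PLATFORM }

final class Triage {
    private Triage() {}

    static Diagnosis diagnose(boolean appOk, long appLatencyMs, boolean dummiesOk, long slowThresholdMs) {
        if (!dummiesOk) {
            return Diagnosis.LIKELY_PLATFORM; // even trivial services are failing
        }
        if (!appOk) {
            return Diagnosis.LIKELY_YOUR_APP; // platform works, your service does not
        }
        return appLatencyMs > slowThresholdMs ? Diagnosis.SLOW : Diagnosis.HEALTHY;
    }
}
```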

Consider using a hybrid cloud
• Distributing code and data across platforms
  • Example: EC2 + GAE
  • Example: EC2 + your own servers
• Ways that a hybrid can help
  • Taking advantage of specialized APIs
  • Fail-over when one platform fails
  • Protecting access to data

Hybrid cloud scenario #1
• Your application analyzes some binary files. The analyzer code only runs on Windows. Unfortunately, Azure is very expensive.
• Solution:
  • Deploy the analyzer on Azure
  • Expose its functionality via network calls
  • Deploy most of the code on GAE (nice and cheap)
  • The GAE part of the application calls the Azure part and stores the result in GAE
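On the GAE side, storing results means each distinct file needs to go to the expensive Azure service only once. A hypothetical sketch: `RemoteAnalyzer` stands in for the network call to Azure, a `HashMap` stands in for the GAE datastore, and results are keyed by a SHA-256 hash of the file contents.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Assumed stand-in for the network call to the Windows-only analyzer on Azure.
interface RemoteAnalyzer {
    String analyze(byte[] file);
}

class AnalysisService {
    private final RemoteAnalyzer azure;
    private final Map<String, String> storedResults = new HashMap<>(); // stand-in for the datastore

    AnalysisService(RemoteAnalyzer azure) { this.azure = azure; }

    // Calls Azure only when no result is stored for identical file contents.
    String analyze(byte[] file) {
        return storedResults.computeIfAbsent(sha256(file), k -> azure.analyze(file));
    }

    private static String sha256(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be supported by every Java platform", e);
        }
    }
}
```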

Hybrid cloud scenario #2
• Your application is on EC2 and has demonstrated high performance + reliability. But the outage a few years back scared your manager.
• Solution:
  • Tweak the application to run on GoGrid (very similar to EC2)
  • But continue hosting on EC2, where your application has shown excellent performance
  • Tweak your client so that if your EC2 server stops responding, it calls GoGrid instead
  • Write scripts on GoGrid and EC2 to sync data
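The client tweak amounts to ordered fail-over. In this hypothetical sketch each backend (EC2 first, GoGrid second) is modeled as a function from request to response that throws when its server is down; a real client would wrap HTTP calls with timeouts.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical client-side fail-over across backends, tried in order.
final class FailoverClient {
    private FailoverClient() {}

    static String call(List<Function<String, String>> backends, String request) {
        RuntimeException last = null;
        for (Function<String, String> backend : backends) {
            try {
                return backend.apply(request);
            } catch (RuntimeException e) {
                last = e; // remember why this backend failed, then try the next one
            }
        }
        throw new IllegalStateException("all backends failed", last);
    }
}
```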

Hybrid cloud scenario #3
• Some of your data is very sensitive and cannot be entrusted to cloud providers. Other data and associated computations are not sensitive and have periodic demand spikes.
• Solution:
  • Deploy the sensitive data on your own server and the not-so-sensitive data + computation on the cloud
  • In your client, invoke the company server for computations on sensitive data and invoke cloud servers for not-so-sensitive data + computation
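The client-side split is a simple routing decision. A hypothetical sketch in which both backends are stand-in functions rather than real APIs, and the caller is assumed to know whether a request touches sensitive data:

```java
import java.util.function.Function;

// Hypothetical router: sensitive work stays in-house, the rest goes to the cloud.
final class SensitivityRouter {
    private final Function<String, String> companyServer;
    private final Function<String, String> cloud;

    SensitivityRouter(Function<String, String> companyServer, Function<String, String> cloud) {
        this.companyServer = companyServer;
        this.cloud = cloud;
    }

    String handle(String request, boolean sensitive) {
        // Sensitive requests never leave the company server; the cloud absorbs demand spikes.
        return sensitive ? companyServer.apply(request) : cloud.apply(request);
    }
}
```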

Key reliability principles
• Replicate
  • Replicate your code
  • Use the high-replication datastore
• Be prepared to cope with problems
  • Replicate data to the client and memcache
  • Detect and handle writes-not-appearing-on-read
  • Use a try/catch/retry approach to handle failures
• Provide a means for monitoring
• Consider using a hybrid cloud
  • For APIs, fail-over, securing data
