1 / 43

Introduction to Google App Engine

Introduction to Google App Engine. Google App Engine. Does one thing well: running web apps Simple app configuration Scalable Secure. Google App Engine. GAE is part of Google Cloud and is Platform As A Service cloud (PAAS).

aeaves
Download Presentation

Introduction to Google App Engine

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Google App Engine

  2. Google App Engine • Does one thing well: running web apps • Simple app configuration • Scalable • Secure

  3. Google App Engine

  4. GAE is part of Google Cloud and is Platform As A Service cloud (PAAS) • Using Googles Infrastructure to host and build your Web Applications • Free Account --- for limited bandwidth ....see http://code.google.com/appengine/ for details

  5. infrastructure vs. platform - What is “The Platform”? Application-specific code Libraries: shared by multiple applications Platform: same for all applications infrastructure: hidden by platform

  6. What does GAE Provide

  7. GAE provides ---why not on your own? • how many developers or architects have experience and the mindset to build applications that support 100s of thousands of concurrent users up all the time? • Scaling Big is Really Hard • "Commoditization of Software Architecture and Scaling Skills" • Horizontal scaling model • this is not a model that most web application developers have experience with. • instead of using more capable hardware (vertical scaling), you use more instances of less-capable hardware, each handling a slice of the work, often doing the same function (e.g. sliced between groups of users). • intent is to reduce centralization of resources • ultimate goal is to simply be able to add more instances of the hardware without limit to meet increased scale requirements.

  8. App Engine Does One Thing Well • App Engine handles HTTP(S) requests, nothing else • Think RPC: request in, processing, response out • Works well for the web and AJAX; also for other services • App configuration is dead simple • No performance tuning needed • Everything is built to scale • “infinite” number of apps, requests/sec, storage capacity • APIs are simple

  9. GAE has limitations with free account • What is Free and What is NOT • FREE: All applications have a default quota configuration, the "free quotas", which should allow for roughly 5 million pageviews a month for an efficient application. You can read more about system quotas in the quota documentation. • PAY FOR MORE: As your application grows, it may need a higher resource allocation than the default quota configuration provides. You can purchase additional computing resources by enabling billing for your application. Billing enables developers to raise the limits on all system resources and pay for even higher limits on CPU, bandwidth, storage, and email usage

  10. Services • URLFetch – fetch web resources/services • Images – manipulate images: resize, rotate, flip, crop • Google Accounts • Mail • XMPP – instant messages • Task Queue – message queue; allow integration with non-GAPPs • Datastore – managing data objects • Blobstore – large files, much larger than objects in datastore, use <key, object> to access

  11. What kind of “Apps” can Google App Engine support • What languages are supported:  python, java, php and Go

  12. App Engine Architecture (java) SDC: Secure data connector JDO: java data object JPA: java persistent API

  13. The App Engine Cron Service allows you to configure regularly scheduled tasks operate at defined times or regular intervals. tasks are commonly known as cron jobs. These cron jobs are automatically triggered by the App Engine Cron Service. For instance, you might use a cron job to send out an email report on a daily basis, or to update some cached data every 10 minutes, or refresh summary information once an hour. A cron job makes an HTTP GET request to a URL as scheduled. The handler for that URL executes the logic when it is called. A cron job request is subject to the same limits as those for push task queues. YAML is the abbreviated form of “YAML Ain’t markup language” is a data serialization language which is designed to be human -friendly and works well with other programming languages for everyday tasks.  Creating a cron job • Create the cron.yaml file in the root directory of your application (alongside app.yaml). • Add one or more <cron> entries to your file and define the necessary elements for your job, including the required <url> and <schedule> elements. The following example creates a basic cron job that runs daily: cron:- description: "daily summary job"  url: /tasks/summary  target: beta  schedule: every 24 hours

  14. The target specification is optional and is the name of a service/version. If present, the target is prepended to your app's hostname, causing the job to be routed to that service/version. If no target is specified, the job will run in the versions of the default service that are configured for traffic. • Create a handler for the cron job URL. The handler should execute any tasks that you want scheduled. • The handler should respond with an HTTP status code between 200 and 299 (inclusive) to indicate success. • Other status codes can be returned and can be used to retry the cron job. • Task queues let applications perform work, called tasks, asynchronously outside of a user request. If an app needs to execute work in the background, it adds tasks to task queues. The tasks are executed later, by worker services.

  15. JDO details • Developing applications is, in general, a complicated task, involving many components. Developing all of these components can be very time consuming. The Java Data Objects API (JDO) was designed to alleviate some of this time spent, providing an API to allow java developers to persist object-oriented data into any database, and providing a query language using the same Java syntax as the developer is already familiar with. • DataNucleus JDO provides an implementation of this JDO standard, allowing you, the user, to persist your object-oriented data to not only the RDBMS datastores the standard was intended for, but also to a wide range of other datastores. These include popular map stores such as Cassandra and HBase, the Neo4j graph store, spreadsheets in Excel or OpenDocument formats, JSON formatted Amazon and Google Storage options, the popular MongoDB JSON-like document store, as well as ubiquitous LDAP and more besides.

  16. JDO in GAE • Java Data Objects (JDO) is a standard interface for accessing databases in Java, providing a mapping between Java classes and database tables. There is an open-source plugin available for using JDO with Datastore. • The App Engine Java SDK includes version 2.x of the DataNucleus plugin for App Engine. This plugin corresponds to version 3.0 of the DataNucleus Access Platform, which enables you to use the App Engine Datastore via JDO 3.0. • See the Access Platform 3.0 documentation for more information about JDO. In particular, see JDO Mapping and JDO API.

  17. GAE architecture featuring java

  18. Blob store The Blobstore API allows your application to serve data objects, called blobs, that are much larger than the size allowed for objects in the Datastore service. Blobs are useful for serving large files, such as video or image files, and for allowing users to upload large data files. Blobs are created by uploading a file through an HTTP request. Typically, your applications will do this by presenting a form with a file upload field to the user. When the form is submitted, the Blobstore creates a blob from the file's contents and returns an opaque reference to the blob, called a blob key, which you can later use to serve the blob. The application can serve the complete blob value in response to a user request, or it can read the value directly using a streaming file-like interface.

  19. Blobstore Introducing the Blobstore Google App Engine includes the Blobstore service, which allows applications to serve data objects limited only by the amount of data that can be uploaded or downloaded over a single HTTP connection. These objects are calledBlobstore values,orblobs. Blobstore values are served as responses from request handlers and are created as uploads via web forms. Applications do not create blob data directly; instead, blobs are created indirectly, by a submitted web form or other HTTPPOSTrequest. Blobstore values can be served to the user, or accessed by the application in a file-like stream, using the Blobstore API.

  20. GAE: Java or python? Or how about php or Go • First Question: What do you or your developers know? • Benefit of Python: powerful python syntax, library, possibly shorter code • Benefit of Java: rich language that is mature, many packages, can use JDO/JPA • Better portability if you need to use Bigtable to store data

  21. Java on GAE • Java Servlets, JSPs on Google AppEngine • You provide your app's servlet classes, JavaServer Pages (JSPs), static files and data files, along with the deployment descriptor (the web.xml file) and other configuration files, in a standard WAR directory structure. • App Engine serves requests by invoking servlets according to the deployment descriptor.

  22. Scaling and resulting app limitations

  23. Scaling in GAE how is it achieved ---leads us to some limitations in coding • Low-usage apps: many apps per physical host • High-usage apps: multiple physical hosts per app • Stateless APIs are trivial to replicate • Datastore built on top of Bigtable; designed to scale well • Abstraction on top of Bigtable • API influenced by scalability • No joins • Recommendations: denormalize schema; precompute joins

  24. Restrictions on Java in GAE – a new way of thinking to get scalability • To allow App Engine to distribute requests for applications across multiple web servers, and to prevent one application from interfering with another, the application runs in a restricted "sandbox" environment. • The JVM runs in a secured "sandbox" environment to isolate your application for service and security. • The sandbox ensures that apps can only perform actions that do not interfere with the performance and scalability of other apps.

  25. Restrictions on Java in GAE – a new way of thinking to get scalability • An app cannot spawn threads, write data to the local file system or make arbitrary network connections. (cant allow to store to local file system when things are distributed ---could be problems...you need to use the datastore instead) • HOWEVER--Apps use the URL Fetch service to access resources over the web, and to communicate with other hosts using the HTTP and HTTPS protocols. Java apps can simply usejava.net.URLConnection and related classes from the Java standard library to access this service.  • app also cannot use JNI or other native code.

  26. Again What you CAN NOT DO!!! • write to the filesystem. SOLUTION --- Applications must use the App Engine datastore for storing persistent data. Reading from the filesystem is allowed, and all application files uploaded with the application are available. • open a socket or access another host directly. SOLTUION --- An application can use the App Engine URL fetch service to make HTTP and HTTPS requests to other hosts on ports 80 and 443, respectively. • spawn a sub-process or thread. CAVEAT A web request to an application must be handled in a single process within a few seconds. Processes that take a very long time to respond are terminated to avoid overloading the web server. • make other kinds of system calls.

  27. More on Scaling

  28. Automatic Scaling to Application Needs • You don’t need to configure your resource needs • One CPU can handle many requests per second • Apps are hashed onto CPUs: • One process per app, many apps per CPU • Creating a new process is a matter of cloning a generic “model” process and then loading the application code (in fact the clones are pre-created and sit in a queue) • The process hangs around to handle more requests (reuse) • Eventually old processes are killed (recycle) • Busy apps (many QPS query per sec) get assigned to multiple CPUs • This automatically adapts to the need • as long as CPUs are available

  29. Preserving Fairness Through Quotas • Everything an app does is limited by quotas, for example: • request count, bandwidth used, CPU usage, datastore call count, disk space used, emails sent, even errors! • If you run out of quota that particular operation is blocked (raising an exception) for a while (~10 min) until replenished • Free quotas are tuned so that a well-written app (light CPU/datastore use) can survive a moderate “slashdotting” • The point of quotas is to be able to support a very large number of small apps (analogy: baggage limit in air travel) • Large apps need raised quotas • currently this is a manual process (search FAQ for “quota”) • in the future you can buy more resources

  30. What do you mean can’t spawn new threads???

  31. No creating new threads!!! • A Java application cannot create a new java.lang.ThreadGroup nor a new java.lang.Thread. These restrictions also apply to JRE classes that make use of threads. For example, an application cannot create a new java.util.concurrent.ThreadPoolExecutor, or a java.util.Timer. An application canperform operations against the current thread, such as Thread.currentThread().dumpStack(). • SOLUTIONS -- here it is to rethink the use of threads • make the separate threads web services (other apps) you call and invoke and get back results • Task queues+ Task options • If an app needs to execute some background work, it can use the Task Queue API to organize that work into small, discrete units, called tasks. The app adds tasks to task queuesto be executed later.  • (PYTHON: http://code.google.com/appengine/docs/python/taskqueue/) • (JAVA: http://code.google.com/appengine/docs/java/taskqueue/overview.html) • MapReduce on Google AppEngine (Distributed Computing)-- http://code.google.com/p/appengine-mapreduce/ • PYTHON ONLY http://www.youtube.com/watch?v=EIxelKcyCC0 ,http://code.google.com/appengine/docs/python/dataprocessing/overview.html • JAVA documentation http://code.google.com/p/appengine-mapreduce/wiki/UserGuideJava

  32. The solutions to no new threads • SOLUTIONS -- here it is to rethink the use of threads • Solution 1: make the separate threads web services (other apps) you call and invoke and get back results • Solution 2: Task queues+ Task options • If an app needs to execute some background work, it can use the Task Queue API to organize that work into small, discrete units, called tasks. The app adds tasks to task queuesto be executed later.  • (PYTHON: http://code.google.com/appengine/docs/python/taskqueue/) • (JAVA: http://code.google.com/appengine/docs/java/taskqueue/overview.html) • MapReduce on Google AppEngine (Distributed Computing)-- http://code.google.com/p/appengine-mapreduce/ • PYTHON ONLY http://www.youtube.com/watch?v=EIxelKcyCC0 ,http://code.google.com/appengine/docs/python/dataprocessing/overview.html • JAVA documentation http://code.google.com/p/appengine-mapreduce/wiki/UserGuideJava

  33. Another Limitation ---your app must respond in a certain time

  34. If your app fails to respond in a certain time you will have problems –your app will be timed out by GAE • All requests (including tasks) in app engine have a time limit of XXX seconds (30 seconds --but, see current limits on google documentation). If your calculations will take longer than that, you will need to figure out a way to break them down into smaller chunks. App engine's sweet spot is web apps, not number crunching. • WHY are they doing this? They want to serve the most web apps they can on their platform • Solution: again task queues for longer processing needs\

  35. What about Data

  36. GAE Datastore (storage organization) • Data model • Property, entity, entity group • Schemeless: properties can have different types/meanings for different objects • Allow (1) object query (2) SQL-like query • Transaction • Can be applied to a group of operations • Persistent store (check BigTable paper) • Strongly consistent • Not relational database • Index built-in • Memcache • Caches objects from bigtable, to improve performance

  37. Hierarchical Datastore • Entities have a Kind, a Key, and Properties • Entity ~~ Record ~~ Python dict ~~ Python class instance • Key ~~ structured foreign key; includes Kind • Kind ~~ Table ~~ Python class • Property ~~ Column or Field; has a type • Dynamically typed: Property types are recorded per Entity • Key has either id or name • the id is auto-assigned; alternatively, the name is set by app • A key can be a path including the parent key, and so on • Paths define entity groups whichlimit transactions • A transaction locks the root entity (parentless ancestor key) • Recall the chubby lock service in bigtable paper

  38. Indexes • Properties are automatically indexed by type+value • There is an index for each Kind / property name combo • Whenever an entity is written all relevant indexes are updated • However Blob and Text properties are never indexed • This supports basic queries: AND on property equality • For more advanced query needs, create composite indexes • SDK auto-updates index.yaml based on queries executed • These support inequalities (<, <=, >, >=) and result ordering • Index building has to scan all entities due to parent keys

  39. Free tier • First 5GB • Daily limits see Online for CURRENT QUOTAS

  40. What happens when you exceed your budget (free or what you set) • When an application consumes all of an allocated resource, the resource becomes unavailable until the quota is replenished. • This may mean that your application will not work until the quota is replenished.

  41. What happens when you exceed your budget (free or what you set) • For resources that are required to initiate a request, when the resource is depleted, App Engine by default returns an HTTP 403 Forbidden status code for the request instead of calling a request handler. The following resources have this behavior: • Bandwidth, incoming and outgoing • For all other resources, when the resource is depleted, an attempt in the application to consume the resource results in an exception. This exception can be caught by the application and handled, such as by displaying a friendly error message to the user. In the Python API, this exception is apiproxy_errors.OverQuotaError. In the Java API, this exception is com.google.apphosting.api.ApiProxy.OverQuotaException.

  42. Pricing • The part exceeding the free quota • User defined budget • Look On line for CURRENT PRICES

  43. Security • Constrain direct OS functionality • no processes, threads, dynamic library loading • no sockets (use urlfetch API instead) • can’t write files (use datastore) • disallow unsafe Python extensions (e.g. ctypes) • Limit resource usage • Hard time limit of 30 seconds per request • Most requests must use less than 300 msec CPU time • Hard limit of 1MB on request/response size, API call size, etc. • Quota system for number of requests, API calls, emails sent, etc • Free use for 500MB data and 5M requests per month • 10 applications per account

More Related