Why migrate my apps to the Cloud? Application / Usage profiles Challenges Client / Server-side Technologies Examples

Developing Applications for Cloud Computing Platforms
Jeremy Cohen, Department of Computing, Imperial College London
The Influence and Impact of Web 2.0 on e-Research Infrastructure, Applications and Users
NeSC, Edinburgh, 24th March 2009

Outline • Why migrate my apps to the Cloud? • Application / Usage profiles • Challenges • Client / Server-side Technologies • Examples

Why migrate? • Need more compute power / storage than easily accessible locally / free up local resources • Avoid costs/problems of local resource hosting • Power, cooling, space, maintenance, … • Flexibility / Scalability • Discontinuous demand • Rapid growth / decline • Provisioning resources in-house takes too long

Why migrate? • Pay only for what you use • Local networking / bandwidth constraints • Move some/most costs from Capex to Opex • Greater control – firewalls, resource types, etc. • Transparent technology refresh

Why not migrate? • Unsuitable application model • Security concerns – confidential data / algorithms / … • Specific hardware/infrastructure requirements (e.g. high-performance inter-node linking) • Infrastructure location issues • Latency concerns • Resource/data storage locations • SLA guarantees not satisfactory

What services are on offer? • Limited number of raw infrastructure providers • Increasing numbers of higher level service providers • Infrastructure – dynamic DNS, load balancing, etc. • Brokering / Marketplace • Software toolkits • Simplified resource management – APIs, GUIs • Consultants / Application enablers • Different payment models

Application Profiles Where does your app fit in?

Application profiles • Batch applications – limited / no interactivity • HPC applications • Client / server – Web 2.0 apps, Software-as-a-Service • Standalone interactive applications [Diagram: Data in, Results out]

Application profiles • Batch applications • Code takes some input data and carries out processing, returning result data • Generally no interactivity • Individual tasks may be • Computationally intensive – long running • Computationally simple but high throughput • May require significant data to carry out processing – either as input or from third-party source • Likely to be produced as a native executable so may require a specific CPU type for execution ✔

Application profiles • Web 2.0 apps – client / server model • High throughput, interactivity • May be data intensive / processor intensive • Loosely-coupled, client/server design • Message-based communication between application components • Handle state / sessions for support of multiple concurrent clients • SaaS • Service enabled application core • Client-side (web) application provides remote GUI ✔
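As a rough illustration of the "handle state / sessions" point, the sketch below keeps per-client state in a thread-safe store keyed by a session ID. The class and method names (`SessionStore`, `create`, `put`, `get`, `close`) are hypothetical and not taken from the talk; a real deployment would more likely rely on the session support of its application server.

```python
import threading
import uuid
from typing import Any, Dict


class SessionStore:
    """Hypothetical in-memory store holding separate state per client session."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # guards access from concurrent request threads
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def create(self) -> str:
        """Create an empty session and return its ID."""
        session_id = uuid.uuid4().hex
        with self._lock:
            self._sessions[session_id] = {}
        return session_id

    def put(self, session_id: str, key: str, value: Any) -> None:
        """Store a value in one client's session."""
        with self._lock:
            self._sessions[session_id][key] = value

    def get(self, session_id: str, key: str, default: Any = None) -> Any:
        """Read a value from one client's session."""
        with self._lock:
            return self._sessions[session_id].get(key, default)

    def close(self, session_id: str) -> None:
        """Discard a session when the client is finished."""
        with self._lock:
            self._sessions.pop(session_id, None)
```

Each client only ever sees its own dictionary, so two concurrent users of the same service do not overwrite each other's state.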

Application profiles • Standalone interactive applications • Traditional desktop applications • Highly interactive but generally not highly processor intensive • Tight coupling between application functionality and user interface • Generally not designed for access by multiple (concurrent) users ✖ ?

Application profiles • HPC Applications • Processor/Memory intensive • Data intensive • Generally batch applications but may have elements of interactivity • May be parallelised – operation across multiple CPUs (e.g. MPI, OpenMP, Hadoop, …) • May require extensive communication between parallel nodes (high performance interconnects required) • Visualisation / steering of output often necessary ✓

Usage profiles • Frequency • How frequently an application is used • Is usage predictable? • Load • Does application require significant processing power? • Is the processing requirement similar for each application run? • Is it dependent on input data? • Can required processing capacity be identified programmatically in advance of an application run?
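One way to answer the last question above is a simple throughput estimate. The sketch below assumes the workload divides evenly across identical instances and that the per-instance throughput is known from earlier runs; the function name and parameters are illustrative only.

```python
import math


def instances_needed(input_items: int,
                     items_per_instance_hour: float,
                     deadline_hours: float) -> int:
    """Estimate how many identical instances finish the run before the deadline.

    Assumes perfectly divisible work and a measured, constant throughput.
    """
    if input_items < 0 or items_per_instance_hour <= 0 or deadline_hours <= 0:
        raise ValueError("inputs must be non-negative and rates/deadlines positive")
    total_instance_hours = input_items / items_per_instance_hour
    # Always keep at least one instance so the run can start.
    return max(1, math.ceil(total_instance_hours / deadline_hours))
```

If the processing requirement depends strongly on the input data, an estimate like this is only a starting point and the pool may still need adjusting during the run.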

Usage profiles • Data volume / proximity / coupling • How much data is involved in a run of the application? • Is data proximity of importance – if there is a lot of transfer of data between storage and execution resource, data should be stored close to where the app is run • How tightly coupled is the data – can data transfer be optimised? • Availability / Reliability – need SLA? • Are guarantees on uptime / reliability needed? • If the resources running the application go down, how long will it take / how complex will it be to restart it?
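The data proximity question can be made concrete with a back-of-the-envelope transfer estimate. This sketch assumes decimal gigabytes, a sustained link speed in Mbit/s and no protocol overhead; the function names and the figures in the usage note are hypothetical.

```python
def transfer_hours(data_gb: float, bandwidth_mbit_s: float) -> float:
    """Hours needed to move data_gb gigabytes over a sustained link."""
    if bandwidth_mbit_s <= 0:
        raise ValueError("bandwidth must be positive")
    bits = data_gb * 8 * 1000**3           # decimal GB to bits
    seconds = bits / (bandwidth_mbit_s * 1_000_000)
    return seconds / 3600


def worth_moving(data_gb: float, bandwidth_mbit_s: float,
                 local_hours: float, cloud_hours: float) -> bool:
    """True if transfer time plus remote run time beats running locally."""
    return transfer_hours(data_gb, bandwidth_mbit_s) + cloud_hours < local_hours
```

For example, 450 GB over a 100 Mbit/s link takes about 10 hours, so a job that runs in 6 hours remotely only pays off if the local run would take longer than 16 hours.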

Usage profiles • Information Security • How critical is data/code security? • IP in code (algorithms, etc.), data • Data protection issues – where can data be sent / stored? • Is third party data being used? Can this be transferred to another location for processing? • Latency requirements • Real time data processing applications • Are there specific requirements for latency on network connections? • Are these catered for under SLA?

Challenges – Preparing Your Application for the Cloud

Preparing your application • What are you aiming for? • One-off/occasional manual execution of an application on a remote resource from a terminal • e.g. long running HPC app, don’t want to hog CPU on local resource for a long period of time • Use a Cloud platform such as Amazon EC2 to create an instance of a Cloud resource and interact with it via a terminal to upload and run your application • Full remote deployment of application • Remote execution / interaction

Preparing your application • Batch applications (e.g. scientific HPC codes) • If native code, need to ensure CPU/OS requirements are supported • Same goes for apps based on JIT / interpreted languages • Does application have a GUI? • Data transfer issues – if very data intensive, data transfer may present problems • Dynamic deployment / wrapping?
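A native executable's CPU/OS requirements can be checked on the Cloud instance before the job is started. The sketch below uses Python's standard `platform` module; the function name and the example requirement values are assumptions for illustration.

```python
import platform
from typing import Iterable, Optional


def platform_matches(required_os: str,
                     required_archs: Iterable[str],
                     system: Optional[str] = None,
                     machine: Optional[str] = None) -> bool:
    """Check whether this host (or the given override) meets the app's needs.

    system/machine default to the values reported by the current host.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    return (system == required_os.lower()
            and machine in {arch.lower() for arch in required_archs})
```

A deployment script could call `platform_matches("Linux", ["x86_64", "amd64"])` and refuse to stage the executable when it returns `False`.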

Preparing your application • Web 2.0 / SaaS applications • Deploy necessary application server and server-side code • If supported by Cloud provider, bundle deployed system in platform wrapper for easy restart / creating additional nodes • Storage considerations • How much output data is there? • Where are you going to put it?

Preparing your application - Web 2.0 • Aim for loosely-coupled SOA model • Decoupling of GUI from backend [Diagram: Client Interface; Client/Server Messaging over HTTP Connection; Internet / Network; Server Application Logic; Application Components]

Preparing your application - Batch • Getting native executables onto remote platform and controlling execution • Deploy app at runtime – e.g. via job manager / middleware installed on Cloud instance • Lightweight application wrapping • Provide service interface for basic execution control of apps • e.g. start, getOutput, getError • Static deployment of application into Cloud instance [Diagram: Service Wrapper Interface; Messaging APIs; Native Code Executable; Native Libraries]
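The lightweight wrapping idea can be sketched as a small class around a native executable. The operation names follow the slide's start, getOutput and getError, rendered here as `start`, `get_output` and `get_error`; the class itself, the extra `wait` method and the way output is buffered are assumptions, not the wrapper used in the talk.

```python
import subprocess
from typing import Optional, Sequence


class AppWrapper:
    """Hypothetical wrapper giving basic execution control over an executable."""

    def __init__(self, command: Sequence[str]) -> None:
        self.command = list(command)
        self._process: Optional[subprocess.Popen] = None
        self._returncode: Optional[int] = None
        self._stdout = ""
        self._stderr = ""

    def start(self) -> None:
        """Launch the executable without blocking the caller."""
        self._process = subprocess.Popen(
            self.command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )

    def wait(self) -> int:
        """Block until the run finishes, collect its output, return the exit code."""
        if self._process is None:
            raise RuntimeError("start() must be called first")
        if self._returncode is None:
            self._stdout, self._stderr = self._process.communicate()
            self._returncode = self._process.returncode
        return self._returncode

    def get_output(self) -> str:
        """Standard output captured by wait()."""
        return self._stdout

    def get_error(self) -> str:
        """Standard error captured by wait()."""
        return self._stderr
```

Exposing these methods through a service interface would let a remote client control the run without a terminal session on the Cloud instance.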

Technologies – Server-side / Client-side Service-enabling your application

Server-side software / technologies • Cloud environments may provide a managed interface to physical hardware, or a virtualised platform on which you install your own OS/application image • An Application Server / Servlet Container may be needed to host your application and provide the messaging infrastructure to communicate with it • e.g. Apache Tomcat, Glassfish, JBoss, etc.

Server-side software / technologies • Services / Messaging / Transport – Getting messages to Cloud apps • Web Services (WSDL, SOAP) – e.g. Apache Axis, JAX-WS, … • HTTP GET/POST • JMS • Adobe BlazeDS • RMI • CORBA, … [Diagram: Client; Messaging (e.g. SOAP over HTTP); App Server; Service Description (e.g. WSDL)]
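Of the transports listed, plain HTTP POST is the simplest to illustrate. The sketch below uses only Python's standard library to run a toy service and call it with a JSON message. The handler, the JSON message shape and the summing "application" are invented for illustration; a real deployment would more likely sit behind one of the application servers or toolkits named in the slides.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


class SumHandler(BaseHTTPRequestHandler):
    """Toy application component: sums the numbers sent in a JSON message."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps({"sum": sum(request["values"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet


def start_server() -> HTTPServer:
    """Start the service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), SumHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(port: int, values: List[float]) -> dict:
    """Client side: send one JSON message and return the decoded reply."""
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        payload = json.dumps({"values": values})
        connection.request("POST", "/", body=payload,
                           headers={"Content-Type": "application/json"})
        return json.loads(connection.getresponse().read())
    finally:
        connection.close()
```

The client knows only the message format and the address, which is the loose coupling the earlier slides aim for.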

Client-side software / technologies • JavaScript Libraries – e.g. • Prototype, jQuery, Yahoo • Dojo, Script.aculo.us, … • Client-side tools / RIA Platforms • Web development – e.g. • HTML, JavaScript, AJAX, … • RIA platforms – e.g. • Adobe Flex • Sun JavaFX • Microsoft Silverlight • …

Examples – The MESSAGE Project Dynamic Application Deployment

The MESSAGE Project • Mobile Environmental Sensing System Across a Grid Environment • 3 year project starting October 2006 • Funded jointly by EPSRC and DfT (~£4m), under EPSRC’s e-Science demonstration programme • 5 Universities, 19 industrial partners • Pioneering combination and extension of leading edge grid, sensor, communication and positioning technologies • Create radically new sensing infrastructure based on combination of ad-hoc mobile and fixed sensors • www.message-project.org

MESSAGE Objectives • To extend existing e-Science, sensor, communication and modelling technologies to enable the integration of data from heterogeneous fixed and mobile environmental sensor grids in real time to provide dynamic estimates of pollutant and hazard concentrations. • To demonstrate how these data can be usefully correlated with a wide range of other complementary dynamic data on, for example, weather conditions, transport network performance, vehicle mix and performance, driver behaviour, travel demand, pollutant exposure and health outcomes. • To implement relevant e-Science tool sets and (fixed and mobile) sensor and communication systems in a number of selected real-world case study applications, involving close collaboration with business and the public sector, and thereby to demonstrate their value to the research and policy community.

Architecture Overview • Three Layer Architecture • Application Layer • Realtime Data Layer • Sensor Layer

MESSAGE Project – Data Capture • Data Capture Platform: reliable, efficient capture of data from an environment with an unreliable communications infrastructure and varying load • Multiple sensor and communications technologies • Different types of sensors, different pre-processing requirements • Different communications technologies • Real time streaming and intermittent burst • Scalable Cloud-based processing infrastructure • Multiple DBs distributed across several sites

Processing data from sensors • Sensors join and leave the network stochastically • Joining sensors need to know where to send their data – this information is provided by the Root Gateway • Difficult to know how many sensors are active at any time • Scalable infrastructure = more flexibility, less waste [Diagram: Root Gateway, Sensor Gateways, Sensors]
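The Root Gateway's role of telling joining sensors where to send their data can be sketched as a small registry. Everything here is a hypothetical illustration rather than the MESSAGE implementation: the class, the fixed per-gateway capacity, the least-loaded selection rule and the `launch_gateway` callback that stands in for starting a new Sensor Gateway.

```python
from typing import Callable, Dict


class RootGateway:
    """Hypothetical registry mapping sensors to Sensor Gateways."""

    def __init__(self, capacity_per_gateway: int,
                 launch_gateway: Callable[[], str]) -> None:
        if capacity_per_gateway < 1:
            raise ValueError("capacity_per_gateway must be at least 1")
        self._capacity = capacity_per_gateway
        self._launch_gateway = launch_gateway   # returns the new gateway's address
        self._load: Dict[str, int] = {}         # gateway address -> sensor count
        self._assignment: Dict[str, str] = {}   # sensor id -> gateway address

    def join(self, sensor_id: str) -> str:
        """Return the gateway a joining sensor should send its data to."""
        if sensor_id in self._assignment:
            return self._assignment[sensor_id]
        candidates = [g for g, n in self._load.items() if n < self._capacity]
        if candidates:
            gateway = min(candidates, key=lambda g: self._load[g])
        else:
            # No spare capacity: bring up another gateway on demand.
            gateway = self._launch_gateway()
            self._load[gateway] = 0
        self._load[gateway] += 1
        self._assignment[sensor_id] = gateway
        return gateway

    def leave(self, sensor_id: str) -> None:
        """Release the capacity held by a departing sensor."""
        gateway = self._assignment.pop(sensor_id, None)
        if gateway is not None:
            self._load[gateway] -= 1

    def gateway_count(self) -> int:
        return len(self._load)
```

Because gateways are only launched when existing ones are full, the pool grows with the number of active sensors instead of being sized for the worst case in advance.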

MESSAGE Project – Cloud Computing • Using Amazon EC2 (http://aws.amazon.com/ec2) to provide scalable computing infrastructure for MESSAGE • An Amazon Machine Image (AMI) has been prepared for the Sensor Gateway software • Sensor Gateway AMI is stored in the Amazon S3 Simple Storage Service • Resources based on this image can be started on-demand • Paid for on a CPU-hour basis
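Charging on a CPU-hour basis makes the cost of a gateway pool easy to estimate. The sketch below assumes that a partial hour is billed as a full hour and uses a made-up hourly price in its usage note; both are assumptions, so check the provider's current pricing before relying on the numbers.

```python
import math


def instance_cost(run_hours: float, instances: int,
                  price_per_instance_hour: float) -> float:
    """Estimated charge for running a pool of identical instances.

    Assumes each started hour is billed in full (an assumption, not a quoted rule).
    """
    if run_hours < 0 or instances < 0 or price_per_instance_hour < 0:
        raise ValueError("inputs must be non-negative")
    billed_hours = math.ceil(run_hours)
    return billed_hours * instances * price_per_instance_hour
```

With a hypothetical price of 0.10 per instance-hour, four gateways running for two and a half hours would be charged as 3 x 4 x 0.10 = 1.20.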

MESSAGE Project – Cloud Computing • Minimal Linux distribution to reduce image size and provide faster start up • Image contains only necessary software to run Sensor Gateway: • Java, Glassfish Application Server, Sensor Gateway Web Service • Start up scripts start application server and Sensor Gateway service when image boots up • Root Gateway Service uses an embedded client to start / stop Sensor Gateway instances as required • Pre-processing may be carried out by Sensor Gateway nodes, data then sent on to database for storage
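The decision to start or stop Sensor Gateway instances "as required" could be driven by a rule like the one sketched below. The sensors-per-gateway figure, the minimum pool size and the function names are assumptions for illustration; the calls that would actually start or stop instances through the Cloud provider's API are deliberately left out.

```python
import math
from typing import Tuple


def desired_gateways(active_sensors: int, sensors_per_gateway: int,
                     minimum: int = 1) -> int:
    """Number of gateway instances needed for the current sensor count."""
    if sensors_per_gateway < 1:
        raise ValueError("sensors_per_gateway must be at least 1")
    return max(minimum, math.ceil(active_sensors / sensors_per_gateway))


def scaling_action(running: int, active_sensors: int,
                   sensors_per_gateway: int) -> Tuple[str, int]:
    """Return ("start" | "stop" | "none", how many instances to change)."""
    target = desired_gateways(active_sensors, sensors_per_gateway)
    if target > running:
        return ("start", target - running)
    if target < running:
        return ("stop", running - target)
    return ("none", 0)
```

Keeping at least one instance running means a newly joining sensor always has somewhere to send data while further instances boot.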

MESSAGE Project – Cloud Computing [Diagram: Sensors; Scalable Sensor Gateway Pool on Cloud Computing Resources; Data Storage; Visualisation/Application Platform]

Dynamic application deployment

Dynamic application deployment • Have varying application requirements • Avoid preparing separate Cloud resources for each application • Use Cloud resources with a generic configuration • Use a deployment service to move application executables into execution environment as required, at runtime • Well suited to HPC, batch type applications that need to be run occasionally • Potential for automating workflow execution on Cloud resources

Dynamic application deployment • JSDL Job description sent to GridSAM service on execution resource • Application and input files staged onto execution resource for execution [Diagram: JSDL Job Descriptions; Application 1 and Application 2 (Executable, Libraries) with Input Data; Cloud Computing Resource with Service Interface and GridSAM Job Submission and Monitoring Service using local fork launcher]
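A JSDL job description is an XML document, so it can be generated programmatically before being submitted. The sketch below builds a minimal description with Python's `xml.etree.ElementTree`, using the JSDL 1.0 and JSDL POSIX application namespaces. The executable name, arguments and staging URL in the usage note are placeholders, and submitting the document to GridSAM is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Sequence

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(executable: str, arguments: Sequence[str],
               stage_in: Dict[str, str]) -> str:
    """Return a minimal JSDL job description as an XML string.

    stage_in maps a local file name to the URI it should be fetched from.
    """
    ET.register_namespace("jsdl", JSDL_NS)
    ET.register_namespace("jsdl-posix", POSIX_NS)

    job = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    description = ET.SubElement(job, f"{{{JSDL_NS}}}JobDescription")

    # What to run on the execution resource.
    application = ET.SubElement(description, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(application, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for argument in arguments:
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = argument

    # Files to stage onto the resource before execution.
    for file_name, uri in stage_in.items():
        staging = ET.SubElement(description, f"{{{JSDL_NS}}}DataStaging")
        ET.SubElement(staging, f"{{{JSDL_NS}}}FileName").text = file_name
        ET.SubElement(staging, f"{{{JSDL_NS}}}CreationFlag").text = "overwrite"
        source = ET.SubElement(staging, f"{{{JSDL_NS}}}Source")
        ET.SubElement(source, f"{{{JSDL_NS}}}URI").text = uri

    return ET.tostring(job, encoding="unicode")
```

For example, `build_jsdl("./simulate", ["--steps", "100"], {"input.dat": "http://example.org/input.dat"})` yields a description that stages one input file and runs a placeholder executable.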

Conclusions • Many different considerations when moving applications to a Cloud environment • Not necessarily suited to all apps but new models/services emerging • Use a deployment service to move application executables into execution environment as required, at runtime • Well suited to HPC, batch type applications that need to be run occasionally • Potential for automating workflow execution on Cloud resources

THANK YOU! jeremy.cohen@imperial.ac.uk
