0-60 cloud-native application development with the Netflix OSS stack • Ram Gopinathan • Principal Technology Architect, T-Mobile
About Me • Full stack engineer with 22+ years of experience building web, mobile and distributed applications. • Primary focus on containers, microservices and IoT applications • Multi-cloud experience (Azure, AWS, GCP) • Live in downtown Seattle • Avid runner
About T-Mobile • As America's Un-carrier, T-Mobile US, Inc. is redefining the way consumers and businesses buy wireless services through leading product and service innovation. • NASDAQ traded public company – TMUS • Two flagship brands: T-Mobile and MetroPCS • Based in Bellevue, Washington
Connecting with customers • Multiple ways for customers to connect with T-Mobile • Stay connected & listen to customers
Objective • Gain hands-on experience designing and developing a cloud-native application using the Netflix OSS stack
Use Case – Store Locator App • API • Query stores based on latitude/longitude and distance • Index store data from a CSV file • Create/Delete Index • Web user experience (Micro-app)
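The query and indexing use cases above can be sketched in plain Java. This is a minimal illustration, not the talk's implementation: the `Store` fields, the CSV layout `name,latitude,longitude`, and the choice of the haversine great-circle formula for distance are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical store record; the field names are illustrative only.
final class Store {
    final String name;
    final double lat;
    final double lon;

    Store(String name, double lat, double lon) {
        this.name = name;
        this.lat = lat;
        this.lon = lon;
    }
}

final class StoreLocatorSketch {
    private static final double EARTH_RADIUS_KM = 6371.0;

    // Parses one CSV line of the assumed form "name,latitude,longitude".
    static Store parseCsvLine(String line) {
        String[] parts = line.split(",");
        return new Store(parts[0].trim(),
                Double.parseDouble(parts[1].trim()),
                Double.parseDouble(parts[2].trim()));
    }

    // Great-circle distance in kilometres between two points (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Returns the stores within maxKm of the given point, nearest first.
    static List<Store> findWithin(List<Store> stores, double lat, double lon, double maxKm) {
        List<Store> result = new ArrayList<>();
        for (Store s : stores) {
            if (haversineKm(lat, lon, s.lat, s.lon) <= maxKm) {
                result.add(s);
            }
        }
        result.sort(Comparator.comparingDouble(s -> haversineKm(lat, lon, s.lat, s.lon)));
        return result;
    }
}
```

A linear scan like this is only reasonable for a small data set; a real service would more likely delegate the query to a search index with geospatial support.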
Outline • API Design using Swagger/OpenAPI • Create API specification in SwaggerHub • Generate Server Stub • Developing core platform components • Eureka Server • Spring Cloud Config Server • Zuul Gateway • Develop the store locator microservice • Implement the use cases for store locator • Wire up a Jenkins CI pipeline to build and deploy the store locator app.
Local development setup • Docker • Spring Tool Suite • JDK 8 or above • Git • Maven or Gradle • Spring Boot • Swagger tools
Cloud-Native Applications • Microservices, Containers and Service meshes • Loosely coupled • DevOps, Continuous Delivery, Canary releasing and automated canary analysis • Runs on modern dynamic environments • Public, Private and Hybrid clouds • Robust automation “Cloud native is an approach to building and running applications that fully exploit the advantages of the cloud computing model.” Source: What are cloud-native apps - Pivotal
API First Design • Strategy that puts developer interests first, before building the product. • Develop the API first, before building the product (Web or Mobile) Image Source: https://dzone.com/articles/an-api-first-development-approach-1
What is Swagger/OpenAPI? • A document or set of documents that defines or describes an API • Typically created before the API even exists • Conforms to the OpenAPI Specification • “yaml” or “json” • Advantages • API developers have a better understanding of what to build • Consumers know what to expect from the service • Enables Client and Service development in parallel OpenAPI mindmap: https://goo.gl/QARYk5
SwaggerHub • Author API specifications • Collaborative authoring • Share with consumers for comments and feedback • Generate client and server stubs • Log in with a GitHub account
Swagger Inspector • Test APIs in the cloud • Generate Documentation
Swagger Good Practices • Store in “git” based repository • Versioning • Collaboration with developers and consumers • Ensure API specification, API Documentation as well as API Implementation are all well aligned • Re-use & Minimize redundancies by breaking the API specification document into multiple documents
Store locator API design • Log in to SwaggerHub and create the API specification • Generate Server Stub
Overview of Netflix OSS projects • Hystrix • Zuul • Eureka
Healthy State • [Diagram: a user request enters the app container (Tomcat, Jetty, etc.), which calls Dependencies A through I] Source: https://github.com/Netflix/Hystrix/wiki
Blocked request due to latency • [Diagram: a user request enters the app container (Tomcat, Jetty, etc.), which calls Dependencies A through I] Source: https://github.com/Netflix/Hystrix/wiki
Resiliency Problem ... • [Diagram: many user requests enter the app container (Tomcat, Jetty, etc.), which calls Dependencies A through I] Source: https://github.com/Netflix/Hystrix/wiki
Circuit Breaker Pattern • Protection from latency and failure from dependencies • Prevent failure from re-occurring constantly • Prevents cascading system failures • Pattern for building resilient and fault tolerant systems
How does it work? • Wrap the function/service call you want to protect in a circuit breaker • Circuit breaker monitors for failures • Once failures reach a certain threshold, the circuit breaker trips • With the Hystrix library's defaults, the breaker trips when at least 20 requests arrive in a 10-second rolling window and at least 50% of them fail; it then stays open for 5 seconds before allowing a trial request • All further calls return a timeout, an error or a fallback response • Typically you’ll want to trigger an alert or emit a metric
Circuit Breaker States • Closed • All is well • Open • Failure threshold reached and circuit breaker tripped • Half Open • Breaker is ready to make a real call as trial to see if problem is fixed
Strategies for resetting the breaker • Simple method is to manually reset when things are working well again • Breaker itself can detect if underlying calls are working again • Implementing Health checks in backend dependencies can be useful here • Implement self resetting behavior by trying the protected call again after a suitable interval and resetting the breaker when calls succeed
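The three states and the self-resetting strategy can be sketched as a small state machine. This is a deliberately simplified illustration and not Hystrix's implementation: it counts consecutive failures rather than an error percentage over a rolling window, it lets every call through while half open, and the class name, constructor parameters and injected clock are assumptions made for the example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;      // consecutive failures before tripping
    private final long resetTimeoutMillis;   // how long to stay open before a trial call
    private final LongSupplier clock;        // injected so the timing is testable
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    CircuitBreakerSketch(int failureThreshold, long resetTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMillis = resetTimeoutMillis;
        this.clock = clock;
    }

    // Reports the current state, moving OPEN to HALF_OPEN once the timeout has elapsed.
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= resetTimeoutMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    // Runs the protected call unless the breaker is open; failures yield the fallback.
    <T> T call(Supplier<T> protectedCall, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get();           // short-circuit: do not touch the dependency
        }
        try {
            T result = protectedCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;                // a successful (trial) call resets the breaker
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;              // trip, or re-trip after a failed trial call
            openedAt = clock.getAsLong();
        }
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`, and state transitions are a natural place to log and emit metrics.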
Operational Practices • Log any change in circuit breaker states • Breakers should reveal details about their state for deeper monitoring • Allow operations staff to trip or reset breakers • Consider alerting operations when breaker state is open • Could also increment a metric for an operational dashboard
Client Considerations • Clients leveraging circuit breakers need to react to breaker failures • As with any remote call, consider what to do in case of failure • Queue the operation to retry later when things are working well again • Failure to retrieve data can be mitigated by returning stale data from cache
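The last point, serving stale data when a remote call fails, could look roughly like the sketch below. The class name and the `remoteCall` function are hypothetical; the example assumes failures surface as runtime exceptions and ignores cache expiry and size limits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class StaleCacheFallbackSketch<K, V> {
    private final Map<K, V> lastGood = new ConcurrentHashMap<>();
    private final Function<K, V> remoteCall;   // hypothetical remote lookup

    StaleCacheFallbackSketch(Function<K, V> remoteCall) {
        this.remoteCall = remoteCall;
    }

    // Returns fresh data when the remote call succeeds, otherwise the last good value.
    V get(K key) {
        try {
            V fresh = remoteCall.apply(key);
            if (fresh != null) {
                lastGood.put(key, fresh);      // remember the latest successful response
            }
            return fresh;
        } catch (RuntimeException e) {
            V stale = lastGood.get(key);
            if (stale == null) {
                throw e;                       // nothing cached, so surface the failure
            }
            return stale;
        }
    }
}
```

Whether stale data is acceptable depends on the use case; store addresses tolerate it far better than, say, account balances.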
Hystrix • Created by Netflix • https://github.com/Netflix/Hystrix • Protection from latency and failures from dependencies • Fallback & Gracefully degrade • Enables monitoring and alerting for operations “Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resiliency in complex distributed systems where failure is inevitable.” Source: https://github.com/Netflix/Hystrix
Zuul • Created by Netflix • https://github.com/Netflix/zuul/wiki • Front door for all requests from devices and web applications • How Netflix uses Zuul • Authentication & Security • Insights and Monitoring • Dynamic Routing • Stress Testing • Load Shedding • Static Response handling • Multi-region Resiliency “Edge service that enables dynamic routing, monitoring, security and resiliency”
Eureka • Created by Netflix • https://github.com/Netflix/eureka/wiki/ • Eureka Server • Eureka Client • Run a “sidecar” for non-Java services and clients Used for dynamically discovering service instances for the purposes of load balancing and failover
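To make the idea of discovery plus client-side load balancing concrete, here is an in-memory sketch. It does not use the Eureka API: the class, its methods and the round-robin policy are assumptions for illustration, and a real registry also handles heartbeats, lease expiry and replication.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

final class ServiceRegistrySketch {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    // Called by a service instance when it starts up.
    void register(String service, String url) {
        instances.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(url);
    }

    // Called when an instance shuts down or is considered unhealthy.
    void deregister(String service, String url) {
        List<String> urls = instances.get(service);
        if (urls != null) {
            urls.remove(url);
        }
    }

    // Picks the next instance of a service in round-robin order.
    String next(String service) {
        List<String> urls = instances.get(service);
        List<String> snapshot = urls == null ? new ArrayList<>() : new ArrayList<>(urls);
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no instances registered for " + service);
        }
        int i = cursors.computeIfAbsent(service, k -> new AtomicInteger()).getAndIncrement();
        return snapshot.get(Math.floorMod(i, snapshot.size()));
    }
}
```

Because callers ask for a service by name on each request, instances can come and go without clients holding hard-coded addresses, which is what makes failover possible.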
Externalizing configuration • Build application once and deploy to any environment • App gets configuration settings during startup based on environment • Configuration settings backed by git repository • Versioning • Traceability • Updates don’t require application rebuild and redeploy • [Diagram: App, Config Server and Config repo; Developers and Operators commit to the config repo]
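The "build once, configure per environment" idea comes down to layering environment-specific settings over shared ones at startup. The sketch below shows only that layering with `java.util.Properties`; it does not call a config server, and the property names are made up for the example.

```java
import java.util.Properties;

final class ConfigResolverSketch {
    // Layers environment-specific settings over shared defaults.
    static Properties resolve(Properties shared, Properties environmentSpecific) {
        Properties merged = new Properties(shared);   // shared values act as fallbacks
        for (String name : environmentSpecific.stringPropertyNames()) {
            merged.setProperty(name, environmentSpecific.getProperty(name));
        }
        return merged;
    }
}
```

The same application artifact can then be started against a different environment-specific set in each environment, with no rebuild.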
Developing core platform components • Eureka Server • Spring Cloud Config Server • Zuul Gateway
Develop the storelocator microservice • Implement the admin operations • Implement storelocator query operation
Pipeline as code with Jenkins • Implement build/test/deploy pipeline in Jenkinsfile • Pipeline definition “Jenkinsfile” lives in the same repository as your source code • Treat pipeline as another piece of code checked into source control • Versioning, History, Traceability etc.
Dynamic build slaves with Docker • Reduced cost • Greater control and flexibility for development teams • Install the tools, frameworks and libraries your application needs in the container • Add the Dockerfile to the repository for versioning and history • A variety of plugins are available that support running dynamic slaves in Docker, Mesos, Amazon ECS etc. • [Diagram: the Jenkins master executes a pipeline job by dynamically provisioning a slave container on the container platform and running the job inside it]
Wire up Jenkins Pipeline • Run Jenkins in a container • Implement dynamic build slaves to run pipeline jobs using Docker and Amazon ECS • Pipeline as code for continuous delivery
Q/A • Getting in Touch • Blog: http://rprakashg.io/blog • Twitter: http://twitter.com/rprakashg • LinkedIn: https://www.linkedin.com/in/rprakashg • Email: mailto:rprakashg@gmail.com