OpenStack Continuous Delivery Best Practices: Putting what IBM learned from this Community into practice
Andrew Trossman, Distinguished Engineer, Cloud Management, IBM
Kendall Lock, Director of Cloud Management, IBM
Breaking the Addiction of Long Cycle Times
• Inertia is very powerful
• OpenStack and Cloud technologies enable much of the change, but…
• Modifying team behavior is key!!!
Modifying Behavior…Leveraging Technology
The 12 Step Program to Continuous Delivery
Step 1: Admit you have a problem
“I am powerless over my long release cycles, and my life has become unmanageable”
The Emotional Reaction
• “Big Rocks Make Big Splashes – we need big releases to Market”
• “Since the Release is 9 months long, I need to fight to the death now to get my favorite feature in”
• “Be careful about customer feedback – we are already full”
The Truth
• You can still make a “Big Splash” by accumulating delivered features
• Delivering incrementally means being able to change your mind – react to customers and competitors
• Software is not meat – we do not price it by weight
Step 2: Accept a methodology greater than yourself
The Continuous Delivery Pipeline
The Ugly Truth…Part 0
• Ask anyone on Planet Earth “do you think you will do better with fewer features of high quality or more features with average quality?” and they will say “the former, of course”, yet Stakeholders have a hard time changing their own behavior
• They don’t want to take a month to put Continuous Delivery automation in place – it “doesn’t offer direct customer value”
• “Totally agree we should do this…just have these one or two urgent customer things to deal with first” – this can quickly put 6 business mortgages on the project that can never get paid off
The Ugly Truth…Part 1
• Our team had a distinct set of roles:
  • “I am the programmer – my job is to write features that someone could use”
  • “I am the tester – my job is to try to break stuff by using my hands and other deadly weapons”
• So…test automation must be the other guy’s job…
  • The developers thought test was beneath them
  • The testers didn’t have programming skills to do automation (but did have the pessimistic mentality to understand what things to try to break)
Step 3: Turn your life over to Continuous Delivery
Deal with Common Objections
• “We have deadlines to meet and won’t get all of our code done if we also have to write a bunch of automation.”
  • Deadlines can be like the call of the Siren – beware!
  • Quality can never be beaten in with a mallet
• “Talk to ‘the other guys’ whose job it is to make that stuff.”
  • The “Other Guys” aren’t coming…they are you!
  • Automation is not a role, it is an accountable behavior
• “Running test automation will slow down our end-to-end build and therefore our team’s productivity will go down.”
  • Builds have to be like breathing – you don’t even think about it.
• We had to force test automation into our culture – no new capability can be written without appropriate test automation in place
• Developers now write test automation to check in code, and Testers pair up in the Scrums to identify which kinds of tests are needed
The Ugly Truth…Part 2
• Our build process was taking 6 hours end-to-end
• Very serialized, with manual steps in the middle
• In our early stages, we found test blockers in more than 50% of our builds
• We used a manual “fire bucket brigade” to verify the build as green
• Without good test automation, we had to do manual testing of a bucket of scenarios to declare the build good (literally passing the baton from Beijing to North Carolina before declaring the answer)
Step 4: Take inventory of yourself
Determine the starting state of your pipe…and drive key ingredients!
• What kind of automation do you have?
  • Build, Deployment, and Test Automation are Must-Haves
  • We chose BuildForge and Jenkins (build), Chef (deployment), and Rational Performance Tester and Selenium (test)
• How upgradable is your stuff?
  • “Continuous Delivery” can’t mean “Continuously Down for Maintenance”, so you need zero or short downtimes
  • Learn from OpenStack design patterns – multiple service instances and separation of code from data – and leverage well-known HA techniques while you get there
• Do you have missionaries to spread the word?
  • Find a leader or leaders who “get it”, and empower them
  • The executive sponsors have to eliminate the question of “if”, so the invested leaders get to focus on the “how”
  • Those leaders may not be the ones that traditionally had the title – look for them and get behind them to demonstrate the value of the new behavior
Build and Test infrastructure and how information flows through the system
Step 5: Admit your wrongs (and enact a plan to work on them)
Design for Disaster First
• Fast Rollback is your new best friend – it is your safety net!
  • Insurance against disasters (if a mistake gets through to your users), and buys you time to build up your automation
  • Avoids the most common derail factor in the early stage (“Well, we had a bad problem in our app so we better go back to our long test cycles to make sure we never do that again”)
  • Developers are optimists – you want to keep that confidence even when the occasional mistake slips through
  • Images made this much easier – rebooting a pre-configured working system could be done in a few minutes
• Design for incrementals – feature switches, strangler patterns
  • Flip the switch back if the new thing is soft in the middle
  • May have side effect of customer acceptance testing
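The “flip the switch back” idea can be pictured with a minimal sketch. It assumes an in-memory switch table; the `FeatureSwitches` class, the `route` helper, and the checkout functions are hypothetical names for illustration, not the tooling the team actually used.

```python
from typing import Callable, Dict


class FeatureSwitches:
    """Hypothetical in-memory switch table; a real system would persist this."""

    def __init__(self) -> None:
        self._enabled: Dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._enabled[name] = enabled

    def is_on(self, name: str) -> bool:
        # Unknown switches default to off, so new code stays dark until enabled.
        return self._enabled.get(name, False)


def route(switches: FeatureSwitches, name: str,
          new_path: Callable[[], str], old_path: Callable[[], str]) -> str:
    """Strangler-style routing: use the new path only while its switch is on."""
    handler = new_path if switches.is_on(name) else old_path
    return handler()


def legacy_checkout() -> str:  # illustrative stand-in for the existing code path
    return "legacy checkout"


def new_checkout() -> str:  # illustrative stand-in for the new feature
    return "new checkout"


switches = FeatureSwitches()
switches.set("new_checkout", True)
first = route(switches, "new_checkout", new_checkout, legacy_checkout)

# The new thing turned out "soft in the middle": flip the switch back.
switches.set("new_checkout", False)
second = route(switches, "new_checkout", new_checkout, legacy_checkout)
```

Because the old path stays deployed next to the new one, rolling back is a configuration change rather than a redeploy.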
The Ugly Truth…Part 3
• We said “ok, now test automation is part of our lives – go do it”
• But…it turns out that wasn’t quite enough direction for a distributed team to know what to do and how to do it well
• The initial result:
  • We spent time up front deciding on the baseline of tests we wanted automated that covered our main use cases (Good)
  • It took twice as long as we had hoped to get the baseline of test automation in place (Bad)
  • Because our tests were designed and written by people who seemingly ranged in skill from “experienced professional” to “drunk guy hitting on your Mother”, we had about 1/3 of our builds initially marked “Failed” when, after investigation, the test automation was the part that was broken, not the code (Worse)
Step 6: Remove your defects
Test automation is hard to do well…Tempest is setting a course
• Lesson #1: Treat test automation like any other code
  • Review it, approve it, reject or promote it
  • We needed the idea of “non voting” test cases – those that don’t stop the build because they are still in hardening phase
• Lesson #2: Tests are atomic, they shouldn’t depend on previous results
  • Common test artifact repositories make it much easier to compose longer test scenarios from individual test cases
• Lesson #3: Tests can’t have allergies – being sensitive to code tweaks will greatly slow you down
  • Use “immutable markers” where possible – APIs, window IDs, configuration labels; enforce those to stay consistent
• Lesson #4: Understand what your users will vary, then design your tests to loop through new variations that you want to introduce
  • Different images/software, different environment configurations, different shopping cart contents, different pricing promotions…
• We introduced an “Automation Lead” who designed and reviewed the work
• Lesson #5: Load/stress test your system continuously
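The “non voting” idea from Lesson #1 can be sketched as a build gate that only counts failures from tests marked as voting. This is a minimal illustration, assuming a hypothetical `CaseResult` record; the field names and test names are made up and do not describe Tempest or the team's actual harness.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CaseResult:
    """Hypothetical record for one test case outcome."""
    name: str
    passed: bool
    voting: bool = True  # non-voting tests are still in their hardening phase


def build_is_green(results: Iterable[CaseResult]) -> bool:
    """Only voting tests can stop the build."""
    return all(r.passed for r in results if r.voting)


def non_voting_failures(results: Iterable[CaseResult]) -> List[str]:
    """Failures to report to the test authors without blocking the build."""
    return [r.name for r in results if not r.voting and not r.passed]


results = [
    CaseResult("deploy_image", passed=True),
    CaseResult("resize_instance", passed=False, voting=False),
]
green = build_is_green(results)
to_harden = non_voting_failures(results)
```

Promoting a test from non-voting to voting then becomes a reviewed change to one flag, which fits the “review it, approve it, reject or promote it” workflow.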
Step 7: Remove shortcomings with humility
Make it easy to do what you want, to avoid serializing on the pipeline
• Make it easy to perform a “sandbox build”
  • OpenStack makes this easy to do through quick spawning of a virtual system with an image that has the tools already set up
• Think about “Test-as-a-Service” to avoid waiting for the “Big Bang”
  • Deliver your test harness as a Cloud service – allow developers to request a subset of testing against their private builds
• Publicize your results – successes and failures
  • The “Build Website” should be at the top of your Browser Favorites
  • Use “peer pressure” to drive constant learning and tuning
• We dedicated some team members to build up our basic machinery – we rotated the “Ops” responsibility
• Builds must be like breathing…you do it repeatedly or you die
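One way to picture “request a subset of testing” is a request that names tags and gets back only the matching tests. A minimal sketch, assuming a hypothetical tag-based catalog; the test names, tags, and the `select_tests` helper are illustrative and not taken from the team's harness.

```python
from typing import Dict, List, Set

# Hypothetical catalog: test name -> tags describing what it exercises.
SUITE: Dict[str, Set[str]] = {
    "test_login": {"ui", "smoke"},
    "test_provision_vm": {"api", "smoke"},
    "test_load_1000_users": {"load"},
}


def select_tests(suite: Dict[str, Set[str]], requested_tags: Set[str]) -> List[str]:
    """Return the tests that carry at least one of the requested tags."""
    return sorted(name for name, tags in suite.items() if tags & requested_tags)


# A developer asks for a quick smoke run against a private build.
smoke_run = select_tests(SUITE, {"smoke"})
```

A service wrapper would take the private build identifier together with the tags, so developers do not have to wait for the full pipeline run.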
Step 8: Apologize to those you have harmed
Walk in your users’ shoes, and show them (quickly) that you care
• Use what you build before your customers do wherever possible
  • Become your own best reference/case study
  • We forced the current system to be used for dev/test – if it doesn’t work, everyone gets blocked so mistakes become personal
• Apply a publish/subscription model for the end of your Continuous Delivery Pipeline
  • Have your application “poll” for updates, and allow each user to update at the appropriate frequency
  • Internal sites update daily to force constant pressure on reliable, incremental delivery
  • Customer beta sites may update less frequently (weekly) since digestion is slower
  • Update your customer production sites as frequently as you can to make them happy (and bring their friends)
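The polling model above can be sketched as two small decisions: is this site due for an update, and is there a newer published build to take? This is a rough illustration only. The daily and weekly intervals come from the slide; the integer build numbers and function names are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Cadences from the slide: internal sites daily, customer beta sites weekly.
UPDATE_INTERVALS: Dict[str, timedelta] = {
    "internal": timedelta(days=1),
    "beta": timedelta(weeks=1),
}


def is_update_due(site_kind: str, last_update: datetime, now: datetime) -> bool:
    """A site polls for updates only once its own interval has elapsed."""
    return now - last_update >= UPDATE_INTERVALS[site_kind]


def pick_build(published_builds: List[int], current_build: int) -> Optional[int]:
    """Return the newest published build if it is newer than what the site runs."""
    newest = max(published_builds, default=current_build)
    return newest if newest > current_build else None


now = datetime(2024, 1, 8)
internal_due = is_update_due("internal", datetime(2024, 1, 7), now)
beta_due = is_update_due("beta", datetime(2024, 1, 7), now)
next_build = pick_build([101, 102, 103], current_build=102)
```

The pipeline only publishes; each subscriber decides when to pull, which is what lets internal, beta, and production sites move at different speeds.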
Step 9: Make direct amends, except when doing so would harm them
Make your feedback loop as tight as possible
• Build a channel for your users to talk to you
  • Facebook “Like” concepts
  • Comment inline to your app
  • Web analytics to see where users derail
• Talk back
  • Produce a “did you know” bubble or feed
  • Highlight the people who gave you your best ideas publicly – encourage participation
• Nobody has time – whatever you do, keep it to 5 minutes or less
• We implemented “why not try…” guides along with our live site to steer users to new things we wanted to try out
The Ugly Truth…Part 4
• The components of our product had different design heritage
  • Some had data and code together on one disk
  • Some had single instance services
  • Some maintained local state
• We lacked the OpenStack design philosophy – everything multi-instance, code and data clearly separated
• Refactoring the design all at once is very expensive and risky, so we adopted “traditional HA” techniques for parts of our application (such as System Automation heartbeating and DRBD file system replication)
Step 10: Continue to inventory yourself, and promptly deal with shortcomings
Make your application “Production Friendly”
• Roll forward whenever possible, roll backward when you must
• We used Chef for scripting this automation, and employed the three key Cloud ingredients: Compute, Network, and Storage
  • Launch new instances and install new code
  • If needed, update data schema in a backward compatible way (add columns but don’t remove or rename)
  • Switch IP to new instance (can switch back if failure is detected later)
• Work out automation around “lights out operation”
  • Dev/Test has no rules, but promoting things to Production incurs the overhead of all the IT management practices – integrating those tools and procedures automatically will eliminate a walk through some very thick mud
  • We used workflows attached to our provisioning steps to integrate the tools and procedures we have to follow
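The “add columns but don’t remove or rename” rule is what keeps the old instance usable after the IP switches back. A small sketch using Python's built-in `sqlite3` module shows the effect; the `orders` table, its columns, and the sample values are invented for illustration and say nothing about the product's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.execute("INSERT INTO orders (item) VALUES ('vm-small')")

# The query the OLD code version runs; it names its columns explicitly.
OLD_QUERY = "SELECT id, item FROM orders ORDER BY id"


def old_code_reads(db: sqlite3.Connection) -> list:
    return db.execute(OLD_QUERY).fetchall()


before = old_code_reads(db=conn)

# Roll forward: an additive, nullable column. Nothing is removed or renamed.
conn.execute("ALTER TABLE orders ADD COLUMN promo_code TEXT")

# The NEW code version writes and reads the extra column.
conn.execute("INSERT INTO orders (item, promo_code) VALUES ('vm-large', 'SPRING')")
new_rows = conn.execute("SELECT item, promo_code FROM orders ORDER BY id").fetchall()

# If traffic is switched back to the old instance, its query still works.
after = old_code_reads(db=conn)
```

Dropping or renaming `item` instead would have broken `OLD_QUERY`, turning a simple IP switch-back into a data migration.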
Step 11: Meditate to Continue Forward
Like always – inspect what you want to occur
• Shift tracking methodology from “pacing” metrics to “follow through” and “reactivity” metrics
  • Being on schedule becomes a much less interesting point of view
  • Customer acceptance and usage become front and center
• Root cause any escapes from your Continuous Delivery Pipeline and don’t allow them to persist
• Speed still matters…but it is more about efficiency
• Constantly ask what could be done to improve team effectiveness, and then do what you determine
Step 12: Once awakened, carry the message to others in despair
• Our experience in 6 months:
  • 7:1 labor reduction (compared to the normal effort required to deploy and test our version)
  • More than 3300 builds
  • 50% reduction in problem resolution time
• Thanks for coming…and good luck with your journey!
Thank You / Gracias / Grazie / Obrigado / Danke / Merci