3 Ways to Drive Fast Feedback with Continuous Delivery and Incident Response
Serhat Can is a Technical Evangelist at OpsGenie, makers of the OpsGenie incident response orchestration platform for DevOps teams.
One of the most impressive books on DevOps, The DevOps Handbook, emphasizes three fundamental principles underpinning DevOps: systems thinking, amplifying feedback loops, and continual experimentation and learning.
Gene Kim, co-author of The DevOps Handbook, has described amplifying feedback loops as creating right-to-left feedback loops, that is, feedback that flows from operations and customers back to development.
Why should we amplify feedback loops?
The first version of a new feature or product often doesn’t fully cover the needs of our customers. Even when we spend weeks or months building something, the final product is likely to miss important pieces. Most of the time, our customers don’t even know what they need! If we move fast enough, we cannot entirely avoid shipping something useless or broken. Instead of striving to avoid this, we should embrace the idea of shipping small pieces of value. If we ship faster, we can break and fix things while they are small. When things get bigger, they become harder to manage and refactor. Early feedback allows us to intervene and fix problems along the way, while we still can. It enables us to learn from our customers, and from our mistakes, at the right time.
All this sounds awesome, but it requires a lot of effort. Two important enablers that help us get the invaluable feedback we need are Continuous Delivery and incident response. We divide the benefits into three parts: ship often and small, recover fast, and learn from failures.
Ship often and small
Companies that practice Continuous Delivery can deploy their software with ease and in small pieces. Deployments don't require weeks or months. Ideally, manual processes and approvals are eliminated, or at least limited to the final step before going live. Whenever we have something new to show to our customers, we push the code to the release branch, run tests, and deploy our code. This ensures faster time to market, facilitates early feedback, and allows higher-quality, lower-risk releases. By leveraging techniques like blue-green or canary deployments in Continuous Delivery, teams can release with confidence. The beauty of techniques like canary deployments is that they let teams test their fixes or new features by experimenting on a small share of the customer base. This becomes especially important when serving a large number of users in production.
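To make the canary idea concrete, here is a minimal sketch of how a service might decide which users see a new version. It is only an illustration under stated assumptions: the function name `in_canary`, the `salt` parameter, and the hash-bucket approach are hypothetical and not taken from any particular deployment tool, which would normally handle this at the load balancer or feature-flag layer.

```python
import hashlib


def in_canary(user_id: str, canary_percent: float, salt: str = "example-release") -> bool:
    """Return True if this user should be routed to the canary version.

    Hypothetical helper: hashes the user ID into one of 10,000 buckets so the
    same user always gets the same answer for a given salt.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")

    # A stable hash keeps the assignment deterministic across processes,
    # unlike Python's built-in hash(), which is randomized per run for strings.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999

    # canary_percent=5 means buckets 0..499, i.e. roughly 5% of users.
    return bucket < canary_percent * 100


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    exposed = sum(in_canary(u, 5) for u in users)
    print(f"{exposed} of {len(users)} users would see the canary")
```

Because the bucket for a user never changes, raising `canary_percent` step by step only adds users to the canary group and never moves anyone back, which keeps the experience consistent while the rollout widens.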
Recover fast
MTTR (Mean Time to Resolve) is our metric for how fast we recover. Although it is simple to calculate (subtract each alert’s creation time from its resolve time and average the results), reducing this number is no easy job! Recovering quickly requires a lot of practice and preparation before the incident. Teams should have on-call schedules with multiple rotations showing who is on call, and when. Ideally, these teams manage their own on-call schedules to avoid dependencies that can slow down operations. Escalations are another key ingredient: they let us call for backup when the on-call responder is not available or doesn’t know how to fix the problem. The accuracy of alerts and the contextual information they carry are also key, but that topic is too deep to dive into in this post. To put it simply, actionable alerts are needed to triage and remediate issues faster. Getting there, however, requires a lot of effort and the right tooling. In the cloud era, automatic repair is possible, and actionable alerts, whose actions can trigger pieces of code, make automated remediation easier than ever.
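As a rough illustration of the MTTR arithmetic, the sketch below averages resolve time minus creation time over a list of alerts. The function name `mean_time_to_resolve` and the `(created, resolved)` tuple layout are assumptions made for this example, not the data model of any specific alerting product.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical alert record: (creation time, resolve time or None if still open).
Alert = Tuple[datetime, Optional[datetime]]


def mean_time_to_resolve(alerts: Iterable[Alert]) -> Optional[timedelta]:
    """Average of (resolved - created) over resolved alerts, or None if there are none."""
    durations = [
        resolved - created
        for created, resolved in alerts
        if resolved is not None  # open alerts have no resolve time yet
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    alerts = [
        (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),  # 30 minutes
        (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 15, 30)),  # 90 minutes
        (datetime(2024, 1, 3, 9, 0), None),                           # still open
    ]
    print(mean_time_to_resolve(alerts))  # 1:00:00
```

Open alerts are skipped here for simplicity; a team could reasonably choose to count them up to the current time instead, which would make the number more pessimistic during a long-running incident.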