8 IT Operations Metrics That Offer Vital Feedback to Developers
DevOps is supposed to represent a seamless partnership between development and operations, accelerating and simplifying the processes needed to deliver continued value. IT operations fills one of the most vital roles in DevOps: to supply feedback to developers and provide the final link in a continuous value chain.
Yet, too often, IT operations ends up in its own silo and is ignored by development. Problems not surfaced during pre-deployment testing are often tagged as something that's "IT's concern." Application SLA compliance, for instance, is often viewed as IT's domain, and not anyone else's.
One way to invite collaboration and encourage buy-in for IT priorities is by tracking change metrics. These metrics provide feedback that directs developer strategies and aligns them with organizational goals. They also keep every angle of value creation in front of developer teams, so ongoing problems get solved efficiently and effectively.
Choosing the right change metrics to track can be a challenge, as they have to reflect and align with development and product management priorities. A recent paper from the DevOps Enterprise Forum offers some guidance on this. It recommends choosing metrics that help make change constant, effortless, and targeted towards maximum value co-creation across DevOps. With this in mind, here are eight core change metrics, mostly involving the production environment, that IT can use to enable a value stream management approach across DevOps.
1. Change Failure Rate
Developers may be tempted to push off issues caused by their changes based on the type of issue those changes generate, but tracking the change failure rate makes these late-stage defects more visible. Data on failures can offer detailed feedback on each issue, explain why specific changes carry higher risks of failure, and hold developers accountable for changes that generate production issues. Machine learning (ML) models can help categorize change metadata (feature type, configuration item [CI], release build, etc.) and make it easier to discover the source of change defects on a categorical level.
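As a rough illustration, change failure rate can be computed as the share of deployed changes that caused a production failure, then broken down by a metadata category. The sketch below assumes a hypothetical `Change` record with a `category` field and a `caused_failure` flag; the field names and sample data are made up for the example, not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Change:
    change_id: str        # hypothetical identifier
    category: str         # hypothetical metadata, e.g. feature type or CI
    caused_failure: bool  # True if the change led to a production failure


def change_failure_rate(changes):
    """Fraction of changes that caused a production failure (0.0 if empty)."""
    if not changes:
        return 0.0
    return sum(c.caused_failure for c in changes) / len(changes)


def failure_rate_by_category(changes):
    """Change failure rate per category, to show where risk concentrates."""
    grouped = {}
    for change in changes:
        grouped.setdefault(change.category, []).append(change)
    return {cat: change_failure_rate(group) for cat, group in grouped.items()}


changes = [
    Change("CHG-1", "config", True),
    Change("CHG-2", "config", False),
    Change("CHG-3", "feature", False),
    Change("CHG-4", "feature", False),
]

print(change_failure_rate(changes))       # 0.25
print(failure_rate_by_category(changes))  # {'config': 0.5, 'feature': 0.0}
```

The per-category view is a simple stand-in for the ML-based categorization mentioned above: it only groups by a label that already exists on the change record.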
2. Lead Times
The lead time metric (also called cycle time) measures the time it takes for a proposed feature or request for change (RFC) to be deployed into production. It offers valuable insight into the efficiency of the entire DevOps process, and it demonstrates how quickly your organization can respond to your users' evolving needs.
Long lead times suggest harmful bottlenecks in the development or deployment process, while short ones indicate feedback is incorporated efficiently and effectively.
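A minimal sketch of the calculation, assuming each RFC is recorded as a pair of timestamps (when it was requested and when it reached production). The sample data is invented, and the median is used here only because it is less sensitive to a single very slow change than the mean.

```python
from datetime import datetime
from statistics import median


def lead_time_hours(requested_at, deployed_at):
    """Hours between an RFC being raised and its deployment to production."""
    return (deployed_at - requested_at).total_seconds() / 3600


# Hypothetical (requested_at, deployed_at) pairs for three RFCs.
rfcs = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 2, 9, 0)),   # 24 hours
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 21, 0)),  # 12 hours
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 8, 9, 0)),   # 72 hours
]

lead_times = [lead_time_hours(req, dep) for req, dep in rfcs]
median_lead_time = median(lead_times)
print(median_lead_time)  # 24.0
```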
3. Defect Escape Rate
Until you deploy into production, defects may be missed. These defects risk being ignored by development because operations found them during post-production activities. Or worse, they're found by end-users.
Errors and defects are a natural part of the development process and should be planned for throughout the entire project. A high defect escape rate reveals incomplete development practices or processes: defects stay hidden in pre-production and are released into production. It's a valuable measure of your overall product quality from the earliest stage to the last.
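One common way to express this metric, shown here as a sketch rather than a standard definition, is the number of defects found in production divided by all defects found in the same period. The function and parameter names are illustrative.

```python
def defect_escape_rate(found_in_production, found_before_release):
    """Share of all known defects that escaped into production."""
    total = found_in_production + found_before_release
    if total == 0:
        return 0.0  # no defects recorded, so nothing escaped
    return found_in_production / total


# Example: 5 defects surfaced in production, 45 were caught before release.
rate = defect_escape_rate(found_in_production=5, found_before_release=45)
print(rate)  # 0.1
```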
4. Unplanned Work Rate (UWR)
Understanding how much effort you spend on unplanned work (or "firefighting" activities) is essential for DevOps as a whole. Yet most DevOps teams measure UWR in isolation for each group (development, QA, operations, etc.) instead of aggregating it across the entire team.
When you measure UWR across DevOps, your organization is better able to track workflows and identify how they impact everyone as a whole. A high UWR may reveal inadequacies across defect tracking and fixing, which were not detected earlier in the workflow. Just as importantly, these sources of unplanned work are usually not just a problem with one working group. Your organization should aim for a UWR of 25% or lower, and ideally, a UWR of less than 20%.
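To make the aggregation concrete, the sketch below assumes each group logs planned and unplanned hours for a period. The group names, hour figures, and dictionary layout are hypothetical; the 25% threshold is the target mentioned above.

```python
# Hypothetical hours logged per group over one reporting period.
work_log = {
    "development": {"planned": 300, "unplanned": 100},
    "qa": {"planned": 180, "unplanned": 20},
    "operations": {"planned": 120, "unplanned": 80},
}


def unplanned_work_rate(log):
    """Unplanned hours as a fraction of all hours across the given groups."""
    unplanned = sum(entry["unplanned"] for entry in log.values())
    total = sum(entry["planned"] + entry["unplanned"] for entry in log.values())
    return unplanned / total if total else 0.0


# Aggregate across the whole DevOps team rather than per group.
overall_uwr = unplanned_work_rate(work_log)

# Per-group rates are still useful for seeing where firefighting lands.
per_group_uwr = {name: unplanned_work_rate({name: entry})
                 for name, entry in work_log.items()}

print(overall_uwr)          # 0.25
print(per_group_uwr)        # {'development': 0.25, 'qa': 0.1, 'operations': 0.4}
print(overall_uwr <= 0.25)  # True: meets the 25% target
```

In this made-up data the overall rate meets the target even though operations alone sits at 40%, which is the kind of imbalance an aggregated view plus a per-group breakdown can surface.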
5. Change-related Incident Volume
Sometimes, it's hard to track which change caused a given defect or incident. Tracking the ticket volume associated with a given change gives DevOps teams more visibility. ML can help create detailed causation models for defects and link them to trouble tickets and then specific change releases.
And since not all tickets can be attributed to a single change, it can help to expand this metric to track ticket volume in alternative ways. For example, you can measure ticket volume over a specific time window after a change, cluster incidents by keywords extracted with natural language processing, or attribute tickets through root cause analysis.
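The time-window variant is the simplest of these to sketch. The example below assumes you have the deployment timestamp and a list of ticket creation times; the 24-hour default window and the sample data are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def tickets_in_window(deployed_at, ticket_times, window=timedelta(hours=24)):
    """Count tickets opened from the deployment time up to (not including)
    the end of the window."""
    window_end = deployed_at + window
    return sum(1 for opened in ticket_times if deployed_at <= opened < window_end)


deployed_at = datetime(2024, 3, 1, 12, 0)
ticket_times = [
    datetime(2024, 3, 1, 11, 0),   # before the deployment: not counted
    datetime(2024, 3, 1, 13, 0),   # inside the window
    datetime(2024, 3, 2, 11, 59),  # inside the window
    datetime(2024, 3, 2, 12, 0),   # exactly at the window end: not counted
    datetime(2024, 3, 3, 9, 0),    # after the window
]

print(tickets_in_window(deployed_at, ticket_times))  # 2
```

A count like this shows correlation with a change, not causation, so it works best as a prompt for the root cause analysis mentioned above.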
6. Deployment Time
This metric measures the time between when IT operations receives a new change and when it's successfully deployed into production. Often tracked alongside deployment success rate, it's a helpful metric for DevOps teams who want to weigh the volume of pull requests in their code repository against how quickly those changes reach production.
7. Deployment Size
The larger a change is, the more chance there is for defects and incidents. Ideally, changes should be deployed in small groups to reduce the risk of introducing defects and decrease the implementation time. Some DevOps teams also track batch size and complexity metrics with each deployment to identify and mitigate risk to the business and ensure they're still delivering value to customers.
8. Mean Time to Detection (MTTD)
Not all change-associated problems are created equal. Even the smallest problem can have an outsized impact on a business. Tracking the mean time to detection can help DevOps identify the scale of risk introduced by particular changes, development teams, coding practices, or feature and CI categories. MTTD is often measured in combination with change failure rates to indicate the impact of changes on production deployments accurately.
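As a sketch, MTTD can be computed as the average gap between when a problem began and when it was detected. The example assumes each incident is recorded as a hypothetical (started_at, detected_at) pair; in practice the start time is often estimated after the fact.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_detection_minutes(incidents):
    """Average minutes between a problem starting and being detected."""
    gaps = [(detected - started).total_seconds() / 60
            for started, detected in incidents]
    return mean(gaps)


# Hypothetical (started_at, detected_at) pairs for two incidents.
incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 30)),  # 30 minutes
    (datetime(2024, 5, 2, 12, 0), datetime(2024, 5, 2, 13, 30)),  # 90 minutes
]

mttd = mean_time_to_detection_minutes(incidents)
print(mttd)  # 60.0
```

Grouping the incident pairs by change, team, or CI category before averaging gives the per-category comparison described above.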
IT Ops Metrics Hold DevOps Accountable
The whole point of having a DevOps team is to remove the barriers between groups working together to deploy products. Yet IT operations and development teams can still exist within silos inside of DevOps. It can be challenging to convince each group they need to communicate more and understand how their work impacts the other.
Data can help bridge those gaps and ensure that everyone considers all aspects of value creation. By tracking specific change metrics, IT operations can deliver feedback more effectively and complete the DevOps loop with development to reduce risk across the entire product lifecycle.
All members of DevOps want to avoid performance degradation and unplanned downtime, as they can negatively impact the business and dramatically increase the cost of a given change. Accounting for the value loss across the change lifecycle is the key to guiding more efficient practices and processes in development and across DevOps. The path toward more reliable code that consistently delivers value is through data. Choose the metrics that matter to your organization, and see how they can help you.