This post is from the Numerify blog and has not been updated since the original publish date.
How IT Operations Can Anticipate Change-Related Risks (Part 3)
Using data generated from key systems of record and analyzed using a System of Intelligence (SOI), IT Operations can identify change-related risks to production.
Achieving true change risk identification requires two main tasks:
- Exploration of Key Performance Indicators (KPIs) that score or indicate risk, and creation of an inventory of the most actionable ones
- Implementation of Change Advisory Board (CAB) policies that define what qualifies as a risk, what risks are acceptable, and which risks must be mitigated or avoided.
Dashboards with KPI visualization allow rapid assessment of risks, especially if they offer end-user drill-down capabilities. Risk factors can be color-coded, lighting up red when risk levels threaten the production environment. Each individual risk factor can then be explored in depth to formulate hypotheses on risk mitigation steps.
Combining data and CAB decision-making standardizes the process of identifying risks and codifies what must be done in response. The result is a much more agile process of identifying production risk. Leadership can rapidly assess and address risks with minimal delays, making IT change risk management more nimble overall.
How IT analytics dashboards help visualize, identify, and investigate risk
To rapidly identify and quantify risks, the metrics IT analytics tracks must be capable of communicating not just volume but context. Some metrics, such as incident volume, largely speak for themselves. But others, such as reassignments, are more abstract or less compelling when missing key context.
A solution is to combine standard IT operations health metrics with scoring KPIs, such as a weighted score representing an application's stability in production. Machine learning (ML) can tune these high-order KPIs so that they become more accurate and expressive over time.
An example high-order scoring KPI that can be used to indicate change-related risk is a "Change Credit Score." This KPI can measure past track record and team process adherence using factors like those highlighted in the dashboard above.
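As a rough illustration of how a weighted scoring KPI of this kind could be computed, here is a minimal Python sketch. The factor names, the weights, and the 0-100 scale are all hypothetical, chosen only to show the mechanics; they are not the factors or formula behind any actual "Change Credit Score."

```python
from typing import Mapping

# Hypothetical factors and weights, for illustration only.
# Each factor is a rate between 0.0 and 1.0, where higher is better.
WEIGHTS = {
    "change_success_rate": 0.5,  # share of past changes that caused no incident
    "approval_adherence": 0.3,   # share of changes that followed the approval process
    "test_evidence_rate": 0.2,   # share of changes submitted with test evidence
}


def change_credit_score(factors: Mapping[str, float]) -> float:
    """Return a weighted 0-100 score from factor rates in the range [0, 1]."""
    for name in WEIGHTS:
        value = factors[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    # Normalize by the total weight so the weights need not sum to exactly 1.
    return round(100 * weighted_sum / total_weight, 1)


team_a = {
    "change_success_rate": 0.9,
    "approval_adherence": 0.8,
    "test_evidence_rate": 0.5,
}
print(change_credit_score(team_a))  # 79.0
```

Keeping the weights in one table makes them easy to review and adjust, whether by hand or by a model that tunes them against incident history.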
Being able to drill down using a dashboard visualization makes these high-order KPIs more powerful and more meaningful. This allows for more specific diagnoses when identifying risk and more specific recommendations when responding to it.
For example, a highly functional IT analytics tool would allow you to drill down into a "Change Credit Score" to look at the specific teams causing the global indicator to turn red. In turn, you can drill down further to identify the specific people and then the specific factors that indicated risk.
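The drill-down path described here (global indicator, then team, then person) can be sketched as a pair of grouped averages. This is a simplified Python illustration with made-up records and a hypothetical threshold of 70; a real analytics tool would run the equivalent query against its own data model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-change scores (0-100, higher is better), for illustration only.
CHANGES = [
    {"team": "payments", "owner": "alice", "score": 92},
    {"team": "payments", "owner": "bob", "score": 41},
    {"team": "payments", "owner": "bob", "score": 38},
    {"team": "search", "owner": "carol", "score": 88},
    {"team": "search", "owner": "dan", "score": 85},
]


def average_by(records, key):
    """Average the 'score' field of the records, grouped by the given key."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["score"])
    return {group: mean(scores) for group, scores in groups.items()}


def drill_down(records, threshold=70):
    """Return {team: {owner: average}} for teams and owners averaging below threshold."""
    flagged = {}
    for team, team_average in average_by(records, "team").items():
        if team_average >= threshold:
            continue  # this team is not dragging the global indicator down
        team_records = [r for r in records if r["team"] == team]
        owner_averages = average_by(team_records, "owner")
        flagged[team] = {
            owner: avg for owner, avg in owner_averages.items() if avg < threshold
        }
    return flagged


print(drill_down(CHANGES))  # {'payments': {'bob': 39.5}}
```

A further level would group the flagged owner's changes by individual risk factor in the same way.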
As one Numerify client in the healthcare sector expressed: "The ability to take large sets of complex data and put them into a picture that is intuitive is a huge win. The picture may seem very simple, but behind that chart are complex formulas and metrics that make the ability to analyze complex trends easier and get to answers much faster."
Using AI and ML to score and highlight change-related risk
As illustrated in the dashboard above, high-risk changes can be identified based on historical trends, incident cluster analyses, and predictive models. An ML algorithm can be trained to flag high-risk changes or risk factors based on these analytics processes.
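To make the idea of training a model to flag high-risk changes concrete, the sketch below fits a small logistic regression in plain Python. The article does not say which model or features are used, so everything here is an assumption: logistic regression stands in for the model, and the three features and the historical records are invented for illustration.

```python
import math

# Hypothetical historical changes, for illustration only. Features per change:
# [is_emergency (0 or 1), team's past failure rate, share of systems touched]
HISTORY = [
    [1, 0.40, 0.9],
    [1, 0.35, 0.7],
    [0, 0.30, 0.8],
    [0, 0.05, 0.2],
    [0, 0.10, 0.1],
    [1, 0.05, 0.1],
    [0, 0.08, 0.3],
    [1, 0.45, 0.6],
]
# 1 if the change was later linked to a production incident, else 0.
CAUSED_INCIDENT = [1, 1, 1, 0, 0, 0, 0, 1]


def predict(weights, bias, features):
    """Return the modelled probability that a change causes an incident."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=2000, learning_rate=0.5):
    """Fit logistic-regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            for j in range(n_features):
                grad_w[j] += error * features[j]
            grad_b += error
        # Step against the average gradient of the log loss.
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def flag_high_risk(weights, bias, changes, threshold=0.5):
    """Return the IDs of pending changes whose predicted risk meets the threshold."""
    return [
        change_id
        for change_id, features in changes.items()
        if predict(weights, bias, features) >= threshold
    ]


weights, bias = train(HISTORY, CAUSED_INCIDENT)
pending = {
    "CHG-A": [1, 0.40, 0.8],  # emergency change from a team with a poor record
    "CHG-B": [0, 0.05, 0.1],  # routine, narrowly scoped change
}
print(flag_high_risk(weights, bias, pending))
```

In practice a team would likely use an established ML library and far more history. The point is only that labelled past changes let a model assign a risk probability to each pending change.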
High-order KPIs and ML model training are central to this approach because they communicate risk accurately and expressively. Results become more accurate over time as the ML model learns to associate incidents and problems with risk factors.
One potential roadblock to this ML-based IT analytics technique is that the creation of high-performing ML models often requires experience and expertise. Organizations that want to avoid painful trial-and-error or a long, costly wind-up phase can look to industry leaders that have pre-built ML models, which can be adapted to their unique setting and challenges.
Setting policy and procedure to enable intuitive, agile IT change risk management
To properly assess change-related risk, everyone has to agree on the terms that qualify a risk to the production environment. A change-related risk's threat to production can be evaluated based on its potential scope of impact, as well as the likelihood that those impacts will be felt.
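One common convention for combining these two dimensions is a risk matrix, where an impact rating is multiplied by a likelihood rating. The sketch below assumes hypothetical 1-5 scales for both; the article itself does not prescribe a formula, so treat this as one possible choice rather than the method.

```python
def risk_exposure(impact: int, likelihood: int) -> int:
    """Combine 1-5 impact and 1-5 likelihood ratings into a 1-25 exposure score.

    Assumed scales (hypothetical): impact 1 = a single minor service,
    5 = business-wide outage; likelihood 1 = rare, 5 = almost certain.
    """
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if value not in range(1, 6):
            raise ValueError(f"{name} must be an integer from 1 to 5, got {value}")
    return impact * likelihood


# A change with broad potential impact (4) but a low chance of failure (2):
print(risk_exposure(4, 2))  # 8
```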
A scoring model standardizes how risks are measured and assessed. CABs and stakeholders align their perception of the possible scope and impact of a risk, reaching a shared understanding in an environment that is often highly subjective. By classifying risk levels into defined ranges or categories, organizations can then codify their response to a risk according to ranges within KPI measurements.
Different levels of risk warrant different responses. For example, a CAB may decide to accept the threat of certain risks if they are minor, assign revisions to a change group if the risk should be mitigated, or freeze certain changes if the risks are so great that they must be avoided entirely.
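A codified policy of this kind can be as simple as a lookup from score bands to responses. The following Python sketch assumes a hypothetical 1-25 risk score (for instance, a 5x5 impact and likelihood rating) and made-up band boundaries; an actual CAB would set its own scale and cutoffs.

```python
from enum import Enum


class Response(Enum):
    ACCEPT = "accept the risk"
    MITIGATE = "assign revisions to the change group"
    FREEZE = "freeze the change"


# Hypothetical policy bands: (inclusive upper bound of the band, response).
# Any score above the last bound falls through to FREEZE.
POLICY_BANDS = [
    (5, Response.ACCEPT),
    (14, Response.MITIGATE),
]


def cab_response(risk_score: float) -> Response:
    """Map a risk score to the response the policy prescribes for its band."""
    for upper_bound, response in POLICY_BANDS:
        if risk_score <= upper_bound:
            return response
    return Response.FREEZE


for score in (3, 12, 20):
    print(score, "->", cab_response(score).value)
```

Because the bands live in data rather than in scattered conditionals, the CAB can revise a cutoff without touching the logic that applies it.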
Standardized scoring models describe risks transparently and in detail, giving the CAB and others the tools they need to set response protocols. Scoring can eventually be used to manage risky dev or ops behaviors and incentivize less risky ones. Standardization also creates an environment where predictive and prescriptive models can be more accurate, further streamlining risk management almost to the point of automation.
Organizations that use these analytics techniques spend less time disagreeing on risk or debating how to act. They simply respond to what the data indicates about how severe a risk might be and what can be done about it.
The overall effect is to make IT change risk management more agile, with fewer delays between risk identification and the decisive actions needed to address it.