This post is from the Numerify blog and has not been updated since the original publish date.
How to Prioritize IT Problems for Maximum Impact
Prioritizing which problems to tackle depends on your organizational objectives. For example, businesses worried about customer disruption would prioritize the biggest sources of user-facing service delays and outages. Organizations aiming to reduce costs would develop KPIs measuring the IT problems that have the largest financial impact. Those focused on agile transformation would prioritize anticipating change risks and reducing their impact so that updates roll out more smoothly.
Problems can be evaluated and prioritized using data through the following process:
- Determine appropriate metrics that directly measure a problem category's organizational impact
- Identify high-impact incident categories based on these metrics
- Assess how incidents related to the problem are typically managed and whether that handling contributes to the problem's continuing negative impact
- Discover and resolve the root cause, or determine a better process to address the incident type more efficiently
AI, analytics and machine learning (MLAI) can help reveal which problems have the most impact. MLAI techniques such as incident clustering and prescriptive assignment group delegation can be especially useful.
Identifying appropriate priorities allows IT leaders to make significant improvements quickly while keeping initiatives aligned with company-wide business requirements. This can lead to increased buy-in from stakeholders, along with more latitude for IT to innovate or pursue larger opportunities.
Start by measuring the scope of the problem
When measuring anomalies, you first need to establish a baseline. Teams must also agree on the definition of a high-impact problem and the relative significance of related incidents.
To do this, compile the total number of applicable IT tickets or incidents for the relevant period, say the past six months. A single count of overall incidents puts everyone in IT on the same page and gives them objective information to work from.
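As a rough illustration, the baseline can be a simple count of tickets opened inside the agreed window. The sketch below assumes ticket records are available as (ID, opened date) pairs; the ticket IDs, dates, and the `baseline_count` helper are all made up for the example and do not come from any particular ticketing tool.

```python
from datetime import date

# Hypothetical ticket records: (ticket_id, opened_on).
# Real data would come from your ticketing system's export.
tickets = [
    ("INC001", date(2023, 1, 15)),
    ("INC002", date(2023, 3, 2)),
    ("INC003", date(2023, 5, 20)),
    ("INC004", date(2023, 6, 30)),
    ("INC005", date(2022, 11, 5)),  # falls outside the window below
]


def baseline_count(records, start, end):
    """Count tickets opened within [start, end], both dates inclusive."""
    return sum(1 for _, opened_on in records if start <= opened_on <= end)


# Assumed six-month window: 1 January through 30 June 2023.
total = baseline_count(tickets, date(2023, 1, 1), date(2023, 6, 30))
print(total)
```

Whatever window you choose, keeping it fixed across teams is what makes the number a shared reference point.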
Problems can also be categorized in buckets based on their relation to and effect upon:
- Specific user groups
- Specific domains within the organization
- Mission-critical services and applications
Viewing problems in any of these contexts ensures that IT leaders focus on the negative issues that are most visible to key stakeholder groups and which have the most adverse effects on critical business functions.
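One way to sketch this bucketing is to count incidents along each dimension. The field names (`user_group`, `domain`, `service`) and the sample records below are hypothetical; your own incident data will likely use different labels.

```python
from collections import Counter

# Hypothetical incident records; field names are assumptions for illustration.
incidents = [
    {"user_group": "Sales", "domain": "EMEA", "service": "CRM"},
    {"user_group": "Sales", "domain": "EMEA", "service": "Email"},
    {"user_group": "Finance", "domain": "NA", "service": "ERP"},
    {"user_group": "Sales", "domain": "NA", "service": "CRM"},
]


def bucket_counts(records, field):
    """Count incidents per distinct value of the given field."""
    return Counter(record[field] for record in records)


by_group = bucket_counts(incidents, "user_group")
by_service = bucket_counts(incidents, "service")

# Largest buckets first, so the most affected groups surface at the top.
print(by_group.most_common())
print(by_service.most_common())
```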
Determine appropriate metrics to measure the business impact of IT problems
The right metric serves as a guide and can instantly reveal sources of pain points.
Our IT service delivery friction index (SDFI) KPI is a prime example. It measures business impact by looking at both the volume of an incident type and its mean time to resolution (MTTR). SDFI is highly informative because it highlights problems that both occur at a high rate and tend to take a long time to solve. It reveals opportunities to deliver business value quickly by targeting the incidents that cause the most friction within IT service delivery.
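The exact SDFI formula is not spelled out here, so the sketch below is only an assumption: it scores each incident category as volume multiplied by MTTR in hours, which is one simple way to combine the two inputs. The `friction_scores` helper and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical resolved incidents: (category, hours_to_resolve).
resolved = [
    ("password reset", 0.5),
    ("password reset", 0.5),
    ("password reset", 1.0),
    ("vpn outage", 6.0),
    ("vpn outage", 10.0),
    ("printer jam", 2.0),
]


def friction_scores(records):
    """Score each category by volume and MTTR.

    ASSUMPTION: score = volume * MTTR (hours). This is an illustrative
    stand-in, not the actual SDFI definition.
    """
    hours = defaultdict(list)
    for category, hours_to_resolve in records:
        hours[category].append(hours_to_resolve)

    scores = {}
    for category, values in hours.items():
        volume = len(values)
        mttr = sum(values) / volume
        scores[category] = volume * mttr
    return scores


scores = friction_scores(resolved)
# Categories ranked from most to least friction under the assumed formula.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Under this assumed formula the score is equivalent to total resolution hours per category. A real index might weight or normalize the two inputs differently.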
Other examples of possible KPIs to develop include:
- Cost of outages (business cost/hr x # hours affected x frequency of outage)
- Number of assignments to high-level major incident management (MIM) teams
- Change failure rate of specific systems or applications
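The cost-of-outages formula in the list above can be worked through directly. The sketch below assumes that "hours affected" means the average duration of each outage and that "frequency" is the number of outages in the reporting period; the dollar figures are invented for the example.

```python
def outage_cost(cost_per_hour, hours_affected, outages_per_period):
    """Cost of outages = business cost/hr x hours affected x frequency.

    ASSUMPTION: hours_affected is the average duration per outage, and
    outages_per_period is how many outages occurred in the period.
    """
    return cost_per_hour * hours_affected * outages_per_period


# Hypothetical inputs: $5,000 per hour, 2 hours per outage, 4 outages a year.
annual_cost = outage_cost(cost_per_hour=5000, hours_affected=2, outages_per_period=4)
print(annual_cost)  # 40000
```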
AI, analytics and machine learning can also surface indicators of negative business impact that you may not have considered. For example, natural language processing (NLP) incident clustering can surface ticket phrases like "unable to access" or "unable to login" that describe business disruptions but may not have been categorized as such.
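Full NLP clustering is beyond a short example, but a much simpler stand-in conveys the idea: count the three-word phrases that recur across ticket descriptions. This is plain phrase counting, not clustering, and the `top_phrases` helper and sample tickets are hypothetical.

```python
import re
from collections import Counter

# Hypothetical ticket descriptions.
descriptions = [
    "User unable to access shared drive",
    "Unable to login to VPN after password change",
    "Customer portal: unable to access invoices",
    "Laptop fan is loud",
    "unable to login to email on mobile",
]


def top_phrases(texts, n=3, min_count=2):
    """Return (phrase, ticket_count) pairs for n-word phrases that appear
    in at least min_count tickets, most frequent first."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        # Use a set so each phrase is counted at most once per ticket.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(grams)
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]


phrases = top_phrases(descriptions)
for phrase, count in phrases:
    print(phrase, count)
```

Recurring phrases found this way can then be reviewed by hand to decide whether they point to an uncategorized disruption.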
A useful strategy is to constrain metric views based upon specific locations, departments, or roles. These differing views can allow IT leaders to quickly reveal problems that have a disproportionate effect and tend to generate a greater negative impact. They can also help reveal the root cause of these problems, since the incidents behind them often cluster around certain types of IT systems or organizational activities.
Highlighting trending problems can also be useful because it can reveal incident categories that are on the rise and that have the potential to generate new forms of negative business impact.
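A minimal way to spot rising categories is to compare counts between two equal-length periods. The sketch below assumes incidents are already labeled by category; the 50% growth threshold and the `rising_categories` helper are arbitrary choices for illustration.

```python
from collections import Counter

# Hypothetical category labels for incidents in two equal-length periods.
previous_period = ["email", "email", "vpn", "printer", "printer", "printer"]
current_period = ["email", "vpn", "vpn", "vpn", "vpn", "printer", "printer"]


def rising_categories(previous, current, min_growth=0.5):
    """Return {category: growth} where growth = (current - previous) / previous
    and growth is at least min_growth (0.5 means +50%)."""
    prev_counts = Counter(previous)
    curr_counts = Counter(current)
    rising = {}
    for category, curr in curr_counts.items():
        prev = prev_counts.get(category, 0)
        if prev == 0:
            # Brand-new categories have no baseline; review them separately.
            continue
        growth = (curr - prev) / prev
        if growth >= min_growth:
            rising[category] = growth
    return rising


trending = rising_categories(previous_period, current_period)
print(trending)
```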
Consider how incidents related to a specific problem are typically handled to reveal the possible root cause
The root cause of a recurring problem can often relate to how its underlying incidents are managed day-to-day. Common incident resolution routines, or "band-aid" fixes, could be causing the problem to fester, creating deeper pain within the organization as it persists.
For instance, some incidents may be routinely closed as unsolved. Others may be ostensibly resolved quickly on first assignment, only to pop up again later. While these actions may be considered "business as usual" at the employee level, they could be contributing to the problem's persistence and its continued negative impacts.
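These handling patterns can be turned into simple rates per category. The sketch below assumes each incident record carries two flags, `closed_unsolved` and `reopened`; both field names and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical incident records with handling flags.
incidents = [
    {"category": "vpn", "closed_unsolved": False, "reopened": True},
    {"category": "vpn", "closed_unsolved": True, "reopened": False},
    {"category": "vpn", "closed_unsolved": False, "reopened": True},
    {"category": "email", "closed_unsolved": False, "reopened": False},
    {"category": "email", "closed_unsolved": False, "reopened": False},
]


def handling_rates(records):
    """Per category, compute the share of incidents closed as unsolved
    and the share that were reopened after being resolved."""
    totals = defaultdict(lambda: {"count": 0, "unsolved": 0, "reopened": 0})
    for record in records:
        entry = totals[record["category"]]
        entry["count"] += 1
        entry["unsolved"] += record["closed_unsolved"]  # True counts as 1
        entry["reopened"] += record["reopened"]
    return {
        category: {
            "unsolved_rate": entry["unsolved"] / entry["count"],
            "reopen_rate": entry["reopened"] / entry["count"],
        }
        for category, entry in totals.items()
    }


rates = handling_rates(incidents)
print(rates["vpn"])
```

A category with a high reopen rate is a candidate for a closer look at how its incidents are being resolved.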
Examining organizational processes, or how problem-related incidents are handled, can reveal further appropriate metrics to measure a problem's overall negative impact.
Use dashboards to visualize metrics and prioritize incident types with the most negative impact
Determine an appropriate visualization to reveal which incident categories move the needle the most on your chosen metrics. You can then use this information to identify and begin tackling IT problems that have the most negative impact.
Again, AI, analytics and machine learning can provide additional insights and value during this stage. NLP-driven incident clustering, mentioned above, can also improve team assignment practices based on a more logical grouping of IT incidents of the same type. Alternatively, a task force can oversee certain incidents within a cluster category to ensure they are being dealt with at a rate that meets goals and targets.
Machine learning can also analyze reassignment patterns to spot incident types that tend to result in troublesome resolution practices. If incidents related to a specific network application tend to be rerouted multiple times before they are ever addressed, IT leaders can identify them more readily through ML analysis and dashboard visualizations.
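Before reaching for machine learning, a plain average of reassignments per category can already flag the incident types that bounce between teams. The sketch below is that simpler stand-in, not an ML method; the threshold of two reassignments, the helper names, and the sample data are assumptions.

```python
from collections import defaultdict

# Hypothetical records: (category, number_of_reassignments before resolution).
history = [
    ("network app", 4),
    ("network app", 3),
    ("email", 0),
    ("email", 1),
    ("laptop", 0),
]


def avg_reassignments(records):
    """Average number of reassignments per incident, by category."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for category, reassignments in records:
        sums[category] += reassignments
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


def flag_bouncing(averages, threshold=2.0):
    """Categories whose average reassignments meet or exceed the threshold."""
    return sorted(c for c, avg in averages.items() if avg >= threshold)


averages = avg_reassignments(history)
flagged = flag_bouncing(averages)
print(flagged)
```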
Set goals for the reduction of your problem metrics so that you can measure the effectiveness of remedial actions. Assign accountability for metrics to the IT team leaders involved so that people understand their responsibilities as well as the consequences of success (rewards) and failure.
Possible solutions for priority IT problems include:
- Process changes, such as changing the way incident priority is self-assigned by the help desk
- Assigning a temporary special team to resolve the root cause
- Change risk assessment and mitigation
Prioritizing problem-solving methods in this way allows IT to derive more value from its actions while quickly and efficiently addressing gaps in service delivery. Early success means that IT leaders can then tackle smaller problems or reassess which incident types have the most impact now.
The overall effect is that IT stops wasting time chasing its tail, solving the same types of incidents every day. Instead, IT leaders can move on to more ambitious goals and embrace agile transformation without the dead weight of recurring pain points.
Learn more about how data can drive IT Operations efficiency in our recent webinar "Top 5 Ways to Reduce Incident Volume with AI & Analytics"