First, as a guest blogger, please allow me to introduce myself. I am responsible for IT reporting and analytics at Rogers Communications, a Canadian media and telecommunications company. In this role, my goal is to help management make informed decisions fast, using never-before-seen data delivered in near real time. Part of my mandate is to continuously expand reporting automation to all nooks and crannies of IT operations, revealing insights that drive operational efficiencies and reduce costs.
In this blog post, I’ll be providing some examples of how we use automation in reporting and analytics to drive continuous improvement within IT Operations.
So, what drove our need to automate reporting and analytics? In the not-too-distant past, our leadership struggled to gain awareness of the underlying issues within IT Operations. For example, those at the top of the organization always knew when their teams were firefighting a major incident. However, it was extremely difficult to get visibility into the larger picture, to understand what was going on and what to do about it. They would have to wait weeks, sometimes even a month, before our teams could scramble to put a report together for them. And by the time we had assembled the information, the request was obsolete.
Evolving the Role of IT Analysts
Labor-intensive, manual reporting is one of the most common barriers to implementing sweeping organizational changes. Individuals charged with reporting are forced to work long hours producing analyses for those who need information to act. Teams can get caught up in a cycle of reporting on activities, making the reports delivery-ready, and then starting on the next report when they could instead be solving the problems the reports reveal. Generating a global report, such as the patch status of every server touching an application a specific director oversees, could take weeks, sometimes even months.
This situation can easily lead to a culture of IT employee burnout, especially if employees feel they have no control over process improvements. Shifting to a self-service reporting culture, enabled by automated analytics, gives these individuals a near-real-time view of operations. Not only does this remove the need for labor-intensive manual reporting, it also lets them assemble new, previously impossible views that allow for rapid remedial action and collective buy-in.
In the process of rolling out analytics for continual service improvement (CSI), we discovered that analytics adoption improves when reports are highly relevant to individual people. When an individual can choose their name from a list, they instantly see a view of “their world”: the applications and teams they are responsible for. They can also rapidly obtain views for individual team members or analyze the health of operations within their specific domain.
Example Use Cases of How Analytics Can Empower CSI
Service Desk Trends
Aggregating IT service ticket data into a single source of truth gives team leaders visibility within their own domains. At a glance, they can determine which applications, departments, or service agents are generating the most pain: for example, which areas are slowing down incident resolution and which individual team members are responsible. They can also map ticket volume per hour, answering questions about when to staff and which technology areas tend to need the most support.
Alerts are supposed to offer meaningful information that can be used to resolve incidents, but they can often be vague when viewed in isolation. By aggregating alerts and clustering incidents by root cause, IT leaders can instantly see which performance areas the alerts are affecting, such as memory, CPU, or storage. Alerts can also be categorized by source, such as specific departments, apps, and agents. With enough alerts telling the same story (for example, repeated memory alerts after a change was implemented), teams know what targeted improvements to make to reduce the risk of future incidents.
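To make the per-hour and per-application views concrete, here is a minimal Python sketch. It is not the reporting system described in this post; the ticket records and field names (`id`, `application`, `opened_at`) are hypothetical and stand in for whatever a service desk export actually contains.

```python
from collections import Counter
from datetime import datetime

# Hypothetical ticket export; the field names are assumptions for illustration.
tickets = [
    {"id": "INC001", "application": "billing", "opened_at": "2020-03-02T09:15:00"},
    {"id": "INC002", "application": "billing", "opened_at": "2020-03-02T09:40:00"},
    {"id": "INC003", "application": "crm", "opened_at": "2020-03-02T14:05:00"},
    {"id": "INC004", "application": "billing", "opened_at": "2020-03-03T09:55:00"},
]


def volume_by_hour(records):
    """Count tickets per hour of day (0-23), pooled across all dates."""
    return Counter(datetime.fromisoformat(r["opened_at"]).hour for r in records)


def volume_by_field(records, field):
    """Count tickets per value of one field, such as application or department."""
    return Counter(r[field] for r in records)


hourly = volume_by_hour(tickets)
by_app = volume_by_field(tickets, "application")

print(hourly.most_common(1))  # busiest hour of day and its ticket count
print(by_app.most_common(1))  # application with the most tickets
```

The same `volume_by_field` helper could be pointed at a department or agent field to answer the "who is generating the most pain" question.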
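As a rough illustration of alerts "telling the same story," the sketch below groups alerts by source and performance area and keeps only the groups that repeat. This is plain counting, not a root-cause clustering algorithm, and the alert records, field names, and the `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical alert feed; the field names are assumptions for illustration.
alerts = [
    {"source": "billing-app", "category": "memory"},
    {"source": "billing-app", "category": "memory"},
    {"source": "billing-app", "category": "memory"},
    {"source": "crm-app", "category": "cpu"},
    {"source": "billing-app", "category": "storage"},
]


def repeated_alert_groups(records, min_count=3):
    """Return (source, category) groups that fired at least min_count times."""
    counts = Counter((a["source"], a["category"]) for a in records)
    return {group: n for group, n in counts.items() if n >= min_count}


hot_spots = repeated_alert_groups(alerts)
for (source, category), count in hot_spots.items():
    print(f"{source}: {count} {category} alerts")
```

Adding a timestamp filter (for instance, only alerts raised after a given change went live) would narrow the result to the post-change pattern mentioned above.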
Much like pulling out your refrigerator to find all the wayward crumbs hiding behind it, IT analytics can shine a light on data that reveals problem areas within processes. For instance, a report on incidents can instantly reveal tickets lying dormant. It can answer questions like: which agents are creating these tickets, and who should these tickets have been assigned to for resolution?
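A dormant-ticket report of this kind can be approximated with a simple filter. The following Python sketch assumes hypothetical ticket fields (`status`, `created_by`, `updated_at`) and an arbitrary 14-day idle threshold; neither comes from the system described here.

```python
from datetime import datetime, timedelta

# Hypothetical ticket records; the field names are assumptions for illustration.
tickets = [
    {"id": "INC101", "status": "open", "created_by": "agent_a",
     "updated_at": datetime(2020, 1, 5)},
    {"id": "INC102", "status": "open", "created_by": "agent_b",
     "updated_at": datetime(2020, 2, 27)},
    {"id": "INC103", "status": "closed", "created_by": "agent_a",
     "updated_at": datetime(2020, 1, 2)},
]


def dormant_tickets(records, now, max_idle=timedelta(days=14)):
    """Return open tickets that have not been updated within max_idle."""
    return [
        t for t in records
        if t["status"] == "open" and now - t["updated_at"] > max_idle
    ]


now = datetime(2020, 3, 1)
stale = dormant_tickets(tickets, now)
for t in stale:
    idle_days = (now - t["updated_at"]).days
    print(f'{t["id"]} created by {t["created_by"]}, idle for {idle_days} days')
```

Grouping the result by `created_by` or by assigned team would then show where dormant tickets originate and where they get stuck.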
In some cases, identifying languishing areas reveals opportunities for quick improvements. For example, Numerify’s telecommunications customer determined that many ignored tickets were generated by the same alert, and since the issue didn’t really need to be addressed, leadership simply disabled the alert altogether. In other cases, valuable alerts had been misconfigured and sent to the wrong team, which simply ignored them until a live dashboard revealed them to senior management.
One method of resolving incidents is simply to restart the server supporting the affected infrastructure. Numerify’s customer, in fact, had an SLA clause to restart servers preemptively every 180 to 270 days. Importing server monitoring data from BladeLogic into their automated reporting system revealed that certain servers had gone well past that window without a restart. It also revealed that some of these servers were dormant because they should have been decommissioned already. With this information, Numerify’s customer knew which servers were due for a restart and which weren’t supposed to exist at all.
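The restart check can be sketched as a small classification rule. The Python below is an illustration under stated assumptions: the server records and the `decommissioned` flag are hypothetical, it does not use any BladeLogic API, and it reads the 180 to 270 day clause as "due from day 180, overdue after day 270."

```python
from datetime import date

# Hypothetical monitoring export; the field names are assumptions for illustration.
servers = [
    {"name": "app-01", "last_restart": date(2019, 3, 1), "decommissioned": False},
    {"name": "app-02", "last_restart": date(2019, 8, 1), "decommissioned": False},
    {"name": "db-01", "last_restart": date(2020, 1, 15), "decommissioned": False},
    {"name": "old-07", "last_restart": date(2018, 6, 1), "decommissioned": True},
]


def restart_status(server, today, due_after=180, overdue_after=270):
    """Classify a server against the assumed 180 to 270 day restart window."""
    if server["decommissioned"]:
        # Still reporting data although flagged for decommissioning.
        return "should not exist"
    days_up = (today - server["last_restart"]).days
    if days_up > overdue_after:
        return "overdue"
    if days_up >= due_after:
        return "due"
    return "ok"


today = date(2020, 3, 1)
report = {s["name"]: restart_status(s, today) for s in servers}
for name, status in report.items():
    print(f"{name}: {status}")
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the rule easy to test against known dates.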
BladeLogic’s upcoming patch management feature can further enhance this level of monitoring. Using a person-centered dashboard view, a director with ownership of an application can instantly determine whether all servers supporting their needed infrastructure are compliant with the latest patches.
Main Benefits of Analytics, and Lessons Learned Along The Way
According to our own customers, the main benefits of AIOps include:
● Near-real-time insights available to senior management
● Never-before-seen insights and analytics
● The transformation of reporting specialists to business analysts
On the last point, self-directed efforts and access to accurate insights allowed employees to feel highly engaged, improving both job satisfaction and productivity.
Measurably, access to automated reporting allowed CSI initiatives to achieve the desired performance gains. We were able to significantly reduce the major incident count and Mean Time to Restore Service (MTRS), and to reduce the customer impact score by upwards of 50%.
However, the journey to these improvements was not without some lessons. For example, we discovered that input and collaboration were needed at the early stages of designing and testing the dashboards. By gathering requests from intended end users, we could not only improve the tool but also encourage adoption by those who helped us create it. Furthermore, the build phase could be accelerated thanks to the work put in to obtain approvals, gather input, and collaborate on testing.
Giving your teams access to true insights at every level of the organization, from a thousand-mile-high view to spotlights on individual team members, facilitates a shift in culture and perspective. Teams can begin to address lingering problems that dogged productivity, some of which they may never have known existed. More importantly, team members can engage with CSI initiatives on a personal level, allowing IT personnel to become a major part of the changing face of their business as it gets just a little bit better, day after day.
Learn more in my recent webinar: “Using AI Analytics to Optimize IT Service Management (ITSM) at Rogers Communications”