This post is from the Numerify blog and has not been updated since the original publish date.
Why Smart Incident Management needs NLP (Part 1)
This is the first of a two-part series on this topic. Part 1 introduces Natural Language Processing (NLP) and how it could transform Incident Management.
Better Incident Management is a critical area of focus for all IT organizations, given the potential of Incidents to disrupt business, harm brand reputation and reduce stakeholder confidence in IT teams. Consumerization of IT and a movement towards best-of-breed applications have resulted in an unprecedented volume and variety of Incidents. IT organizations not only have to be nimbler than ever to resolve Incidents and restore business-as-usual quickly, they also have to be proactive and prevent Incidents from being generated at all.
As analytics become more prominent, many companies are focusing on performance metrics, data quality and process automation to improve efficiency and reduce cost in ITSM processes. Incident Management is a key driver for that, but one question needs answering: "Are we leveraging all the data captured in the Incident process for Analytics?"
Turning Better Incident Management into Smart Incident Management
Your ITSM Systems of Record carry valuable information about process performance that you can leverage to identify opportunities for improvement. This information falls into one of the following categories:
- Structured: Most commonly used data for dashboards, reports and analytics
- Unstructured: Free-form text fields like Description, Work Notes, etc. Often analyzed manually.
- Semi-structured: Logs / Events, usually analyzed using Infrastructure and Application monitoring tools
Structured data is leveraged extensively for reporting and analytics, while semi-structured data is typically used to find the technical root cause. Conventional analytics, however, tend to underutilize Unstructured data. This post focuses on the ways in which Unstructured data can enhance Structured data analytics.
Unstructured text fields are rarely analyzed systematically because text analysis is hard to execute at scale. The "signal-to-noise" ratio in text data is generally very low (especially in longer text fields like Work Notes and Resolution Notes), and the business user must cope with some ambiguity when looking at text-based insights. When analyzing large volumes of Incidents, the sheer volume of text can also be a deterrent. Relational data formats and querying techniques are unsuitable for text analytics, so the work requires different skills and a different technology stack. Finally, it can be difficult to juxtapose text-based insights with structured data analysis to present an actionable picture of the data.
It is nevertheless worth overcoming these hurdles and applying text analytics to Incident data, because the text contains the following important information that is not available in structured fields:
- Exact nature of the Incident or the issue
- Root Cause or Resolution steps for the Incident
- Similarity between Incidents and Problems/Changes beyond standard fields like CI, Application, etc.
Because most description text in Incidents is human-generated, we need Natural Language Processing to parse, interpret and analyze this data.
What Is NLP and How Can You Apply It to Incident Data?
Natural Language Processing is a field of computer science that deals with algorithms and techniques that enable computers to process, understand and analyze human languages. Here's an example of how Numerify's System of Intelligence processes text and produces usable insights.
Key Capabilities of the Numerify NLP engine
- Keyword feature extraction: Keyword extraction for Incident text data needs to go beyond the standard tokenization and lemmatization generally used for text preprocessing. Terms such as IP addresses, email addresses, URLs and asset IDs are significant for analytics, so standard text preprocessing needs to be enhanced to handle such non-straightforward tokens. For example, terms like "192.168.67.45", "email@example.com", "/proj/mps33b/rev2" and "L343HH23" are quite common in Incident text and need special processing.
- Domain-based stopwords: Off-the-shelf text analytics packages and libraries typically work with a standard English stopwords list, such as the ones provided by Python NLTK or Stanford CoreNLP. Such a list is insufficient for ITSM and needs to be extended with domain context. For example, words like "issue" or "incident" occur very frequently across Incident text data but are not part of the English stopwords list. From an ITSM point of view they are nevertheless stopwords, because they add no new information to the analysis. Leveraging our deep domain expertise and experience across Fortune 500 clients, we have compiled an ITSM domain-specific stopwords list that is part of the Numerify NLP engine.
- Similarity identification: This is the core of the NLP engine. It performs the main task of isolating groups of similar Incidents based on text data. The core algorithm draws on industry-standard techniques from Topic Modelling, Entity Recognition and Information Retrieval, and has been fine-tuned for the ITSM context. It is generic in nature and can be applied to process areas beyond Incident, such as Problem or Change Request.
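The three capabilities above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the Numerify NLP engine: the regular expressions, the stopword sets, the sample Incident texts and the function names (`tokenize`, `tfidf_vectors`, `cosine`) are all hypothetical. Plain TF-IDF cosine similarity stands in for the similarity step, which the post attributes to a combination of topic modelling, entity recognition and information retrieval techniques.

```python
import math
import re
from collections import Counter

# Hypothetical patterns. More specific alternatives come first so that
# IP addresses, emails, URLs and paths survive as single tokens.
TOKEN_PATTERN = re.compile(
    r"""
    [\w.+-]+@[\w-]+(?:\.[\w-]+)+      # email address
  | https?://[^\s,;]+                 # URL
  | \b(?:\d{1,3}\.){3}\d{1,3}\b       # IPv4 address
  | (?:/[\w.-]+){2,}                  # file path with at least two segments
  | [a-z0-9]+                         # ordinary word or asset ID
    """,
    re.VERBOSE,
)

# Tiny illustrative English list (an assumption, not the NLTK list).
ENGLISH_STOPWORDS = {"a", "an", "and", "by", "for", "is", "of", "the", "to"}
# "issue" and "incident" come from the post; the rest are assumed examples.
DOMAIN_STOPWORDS = {"issue", "incident", "ticket", "reported", "raised"}
STOPWORDS = ENGLISH_STOPWORDS | DOMAIN_STOPWORDS


def tokenize(text):
    """Lowercase, keep special tokens whole, drop general and domain stopwords."""
    tokens = (m.group(0).rstrip(".,") for m in TOKEN_PATTERN.finditer(text.lower()))
    return [t for t in tokens if t and t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, with smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up Incident descriptions, reusing the example terms from the post.
incidents = [
    "User cannot reach 192.168.67.45, VPN issue reported by email@example.com",
    "VPN connection to 192.168.67.45 drops every hour",
    "Printer L343HH23 is out of toner, incident raised for the /proj/mps33b/rev2 team",
]

vectors = tfidf_vectors(incidents)
for i in range(1, len(incidents)):
    print(f"similarity(0, {i}) = {cosine(vectors[0], vectors[i]):.2f}")
```

In this sketch the first two descriptions share the IP address and the term "vpn", so they score as more similar to each other than either does to the printer Incident. Because "issue" and "incident" are filtered out as domain stopwords, they cannot inflate that similarity.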
Stay tuned for the second part where we discuss specific use cases of how to use NLP with Incident data.