This post is from the Numerify blog and has not been updated since the original publish date.
Design Principles for Packaged Machine Learning Solutions
One of the biggest advantages IT leaders gain by working with an AI-powered IT business analytics solution provider, such as Numerify, is being able to tap into the predictive power of Machine Learning (ML) without getting into the complex code and algorithms behind it. There are several use cases where IT teams benefit from the application of ML and advanced analytic techniques, including Change Risk Prediction.
While ML models are specialized in nature and cannot be applied using a one-size-fits-all approach, there are shared business problems among customers that can be solved with pre-defined ML pipelines and common algorithms. Doing so allows us to cost-effectively train and deploy these solutions so that customers can benefit from state-of-the-art analytic techniques without having to hire expensive and hard-to-find data scientists.
How does it work? Here's a peek behind the curtain at some of the core concepts behind Numerify's pre-packaged ML pipelines:
Business Problem Driven
The first step in building a successful productized ML model is identifying the right business problem. The key is to define business problems in a way that could apply to a broad set of customers and still derive high value.
The next step is to package all possible ML approaches that address the same objective. For instance, Incident Volume Reduction is a common objective of any large-scale IT organization. However, it cannot be addressed by a single ML solution.
Our approach is to package multiple ML techniques, such as root cause analysis, topic models that identify the key themes behind clusters of incidents, and detection of attribute combinations associated with high incident volume. A packaged solution, therefore, must be an end-to-end solution comprising multiple ML pipelines. The actual data then verifies the different hypotheses, and the verified ones are converted into actionable insights.
A domain-specific ML pipeline has built-in knowledge of common data problems such as imbalanced classes, high-cardinality attributes, short text, domain-specific common terms in textual data, and standard outliers. Pre-built rules and algorithms handle these issues to make an ML pipeline robust and production-ready.
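To make that concrete, here is a small sketch of two such pre-built rules: frequency-encoding a high-cardinality attribute and computing weights for an imbalanced label. The field names are hypothetical, not Numerify's actual schema:

```python
# Sketch of two pre-built data hygiene rules such a pipeline might apply
# (illustrative only): frequency-encoding a high-cardinality category and
# computing class weights for an imbalanced label.
from collections import Counter

records = [
    {"assignment_group": "network",  "caused_incident": 0},
    {"assignment_group": "network",  "caused_incident": 0},
    {"assignment_group": "database", "caused_incident": 1},
    {"assignment_group": "network",  "caused_incident": 0},
    {"assignment_group": "storage",  "caused_incident": 0},
]

# High cardinality: replace the raw category with how often it occurs,
# so rare values don't explode into thousands of one-hot columns.
freq = Counter(r["assignment_group"] for r in records)
for r in records:
    r["group_frequency"] = freq[r["assignment_group"]] / len(records)

# Imbalanced data: inverse-frequency class weights so the minority
# class (changes that caused incidents) is not drowned out in training.
labels = [r["caused_incident"] for r in records]
counts = Counter(labels)
class_weight = {c: len(labels) / (len(counts) * n) for c, n in counts.items()}
print(class_weight)  # the minority class gets the larger weight
```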
Functional User Control on Feature Engineering
Feature engineering is the most crucial component of any applied data science project. Robust feature engineering requires strong domain knowledge as well as data science skills. Domain knowledge helps in identifying potential relationships and interactions within the data. Data scientists apply this knowledge to create stronger predictors from raw data. While many tools aim at automating feature engineering, functional knowledge of data sources and business processes is a must to start feature engineering on the right data-sets.
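For example, a functional user who knows that overnight weekend windows and emergency wording are risk signals can turn that knowledge into explicit predictors. A minimal sketch, with a hypothetical change record (the fields are illustrative, not an actual schema):

```python
# Minimal sketch of domain-driven feature engineering on a raw change
# record (hypothetical field names).
from datetime import datetime

change = {
    "planned_start": "2020-03-14 22:30",
    "planned_end":   "2020-03-15 02:00",
    "description":   "Emergency patch to core router firmware",
}

start = datetime.strptime(change["planned_start"], "%Y-%m-%d %H:%M")
end = datetime.strptime(change["planned_end"], "%Y-%m-%d %H:%M")

# Domain knowledge: overnight weekend changes and emergency wording are
# known risk signals, so encode them as explicit predictors.
features = {
    "duration_hours": (end - start).total_seconds() / 3600,
    "starts_on_weekend": start.weekday() >= 5,       # Sat=5, Sun=6
    "after_hours": start.hour >= 18 or start.hour < 6,
    "is_emergency": "emergency" in change["description"].lower(),
}
print(features)
```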
Numerify's ML workbench is a core component in the Numerify Platform where functional users can customize or extend the data-set being fed into the predefined machine learning solutions. This enables the seamless application of domain knowledge into data science solutions.
Data and metadata profiling capabilities of the platform enable functional users to select the right data-set and seamlessly feed it into the downstream ML pipeline. With just a few clicks, they can also segment and branch off ML pipelines to fit separate models when different distributions are observed across business segments.
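The segment-and-branch idea can be sketched as routing each segment's records into its own independently fitted model. The data and scikit-learn stand-ins below are illustrative, not the actual workbench implementation:

```python
# Sketch of segment-and-branch: when two business segments show different
# (here, opposite) relationships between a feature and the label, fit one
# model per segment instead of a single global model.
from collections import defaultdict
from sklearn.linear_model import LogisticRegression

# (segment, planned_downtime_hours, caused_incident) -- hypothetical rows.
rows = [
    ("retail", 1.0, 0), ("retail", 2.0, 0), ("retail", 8.0, 1), ("retail", 9.0, 1),
    ("banking", 0.5, 1), ("banking", 1.0, 1), ("banking", 6.0, 0), ("banking", 7.0, 0),
]

# Branch the pipeline: one (X, y) bucket per business segment.
branches = defaultdict(lambda: ([], []))
for segment, downtime, label in rows:
    branches[segment][0].append([downtime])
    branches[segment][1].append(label)

# One independently fitted model per segment.
models = {
    segment: LogisticRegression().fit(X, y)
    for segment, (X, y) in branches.items()
}
```

A single global model would average away the opposite trends in the two segments; branching lets each model fit its own distribution.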
Segregation of Data-Driven and Data-Agnostic Phases
Segregating data-driven phases such as feature selection and feature engineering from model training, evaluation, and deployment allows the data-agnostic steps to be fully abstracted and automated. Customization can then be limited to the data-driven phases, significantly bringing down the time to customize and deliver an ML solution for a specific customer.
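One way to picture this separation, as a rough sketch: the customer-specific, data-driven piece is a pluggable feature function, while the train-and-evaluate code is fixed and data-agnostic. Names and structure here are illustrative, not Numerify's actual design:

```python
# Sketch: data-agnostic train/evaluate code shared by all customers,
# with only the data-driven feature function varying per customer.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def run_pipeline(raw_rows, labels, feature_fn):
    """Data-agnostic phase: identical training and evaluation for everyone."""
    X = [feature_fn(row) for row in raw_rows]    # only this step varies
    model = LogisticRegression().fit(X, labels)
    score = cross_val_score(model, X, labels, cv=2).mean()
    return model, score

# Data-driven phase: a customer-specific feature function.
def customer_features(row):
    return [row["duration"], float(row["emergency"])]

rows = [
    {"duration": 1, "emergency": False},
    {"duration": 2, "emergency": False},
    {"duration": 8, "emergency": True},
    {"duration": 9, "emergency": True},
]
model, score = run_pipeline(rows, [0, 0, 1, 1], customer_features)
```

Swapping in a different customer means swapping only `customer_features`; the automated training, evaluation, and deployment steps stay untouched.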
Automated Inference Delivery
Numerify's ML workbench allows users to train multiple models simultaneously and compare and evaluate them against predefined evaluation criteria. Users can then promote the approved model directly from development to production. A regular batch ETL process calls the deployed model via the scoring API to score incremental data and loads the inferences into a high-performance datastore. Interpretable insights from the ML model are generated with custom and open-source libraries such as LIME and surfaced as actionable insights in user dashboards.
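The train-compare-promote step might look roughly like the following sketch, using scikit-learn stand-ins on synthetic data; the actual workbench, scoring API, and LIME integration are not shown:

```python
# Illustrative sketch: train several candidate models side by side and
# promote the one that wins on a predefined evaluation criterion
# (here, mean cross-validated accuracy).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Evaluate every candidate against the same predefined criterion.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# "Promote" the approved model: refit the winner on all data for scoring.
best_name = max(scores, key=scores.get)
production_model = candidates[best_name].fit(X, y)
```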
Our mission is to expose the power of machine learning and advanced analytics to as many users as possible while handling the complex modeling involved on their behalf. Our ML pipeline and ML workbench are designed to do exactly this, so that IT leaders can easily and quickly harness the powerful abilities of the latest AI technology to improve their organizations.