CI/CD/CT - DevOps for Machine Learning and Best Practice in Production

Machine Learning is bridging the same chasm that IT spent the last decade traversing - between the development process and the operational process in software development.

The objective is to be able to rapidly, reliably and repeatedly deliver value in production, and the unification of dev and ops has become a technological revolution commonly referred to as (perhaps unsurprisingly) DevOps.

A more recent trend in industry has been MLOps - extending the philosophy of DevOps to Machine Learning, typically by unifying data science and operations teams, to achieve similar goals. Because of the way Machine Learning works, however, there are a few additional complexities and nuances to consider when planning a move to MLOps:

1. Experimental Focus

The Machine Learning process is inherently experiment-oriented - fundamentally we are attempting to prove or disprove whether ML can solve a business problem. This might mean, for example, beating the performance of the existing business process, or an R&D-heavy undertaking to understand whether ML can be applied to the problem at all. Although agile software delivery is more test-driven than traditional models, the intended outcomes of DevOps and MLOps are still different.

2. Regression Testing

Where normal, deterministic software can be easily regression tested, machine learning models are inherently expected to change their outputs as performance improves. As an example, if a previous model predicted customer churn with 85% accuracy and produced a score of 0.783 for a user, a new algorithm that predicts with improved accuracy might produce a score of 0.832 for the same user - regression testing against a moving target is inevitably challenging.
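One way to regression test against a moving target is to compare aggregate metrics on a fixed, labelled holdout set rather than individual scores. The following is a minimal sketch of that idea, not a prescribed method: the function names, the `tenure_months` feature, the toy models and the 0.5 cutoff are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[Dict[str, float], int]       # (features, true label: 1 = churned)
Model = Callable[[Dict[str, float]], float]  # returns a churn score in [0, 1]


def accuracy(model: Model, examples: Sequence[Example], cutoff: float = 0.5) -> float:
    """Fraction of labelled examples the model classifies correctly."""
    correct = sum(
        1 for features, label in examples
        if int(model(features) >= cutoff) == label
    )
    return correct / len(examples)


def passes_regression_gate(candidate: Model, baseline: Model,
                           examples: Sequence[Example],
                           tolerance: float = 0.0) -> bool:
    """Accept the candidate if it is no less accurate than the baseline.

    Individual scores are allowed to change; only the aggregate metric is gated.
    """
    return accuracy(candidate, examples) >= accuracy(baseline, examples) - tolerance


# Hypothetical holdout set and toy models, for illustration only.
holdout = [
    ({"tenure_months": 2}, 1),
    ({"tenure_months": 30}, 0),
    ({"tenure_months": 5}, 1),
    ({"tenure_months": 14}, 0),
]


def old_model(features: Dict[str, float]) -> float:
    return 0.783 if features["tenure_months"] < 4 else 0.2


def new_model(features: Dict[str, float]) -> float:
    return 0.832 if features["tenure_months"] < 12 else 0.1


if __name__ == "__main__":
    print("old accuracy:", accuracy(old_model, holdout))
    print("new accuracy:", accuracy(new_model, holdout))
    print("candidate accepted:", passes_regression_gate(new_model, old_model, holdout))
```

The score for the first user moves from 0.783 to 0.832, yet the gate still passes because it checks accuracy on the holdout set rather than exact outputs.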

3. Model and Data Drift

Data and models inevitably drift - for example, when customers alter their behaviour because of working from home or because of changing trends in retail. This is especially important to consider in models that deal with human behaviour, and it makes the ability to deploy new models at speed more critical than ever.
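Data drift can be quantified by comparing the distribution of a feature at training time with its distribution in production. The sketch below uses the Population Stability Index (PSI), which is one common choice and not something this post prescribes. The function name, the bin count and the sample data are assumptions, and any alerting threshold would need to be chosen for the problem at hand.

```python
import math
from typing import List, Sequence


def population_stability_index(reference: Sequence[float],
                               current: Sequence[float],
                               bins: int = 10) -> float:
    """PSI between a reference sample and a current sample of one feature.

    Bins are equal-width over the reference range; values outside that
    range are clamped into the first or last bin.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")

    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = fractions(reference)
    cur = fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    # Hypothetical feature values: training-time sample vs. a shifted sample.
    training_sample = [i / 100 for i in range(100)]
    shifted_sample = [v + 0.3 for v in training_sample]
    print("no drift:", population_stability_index(training_sample, training_sample))
    print("shifted :", population_stability_index(training_sample, shifted_sample))
```

A PSI of zero means the two binned distributions match; larger values indicate a bigger shift and could be used as a signal to retrain.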

4. Monitoring Loops

Monitoring loops extend beyond software monitoring and logging - depending on the problem, feedback loops can range from seconds to months (for example, in applications such as fraud detection). The choice of metrics will depend heavily on the context of the business problem.

While MLOps adapts two concepts from DevOps - Continuous Integration (CI) and Continuous Delivery (CD) - it also needs a further pillar to account for the differences between software development and machine learning. This is known as Continuous Training (CT) and allows for both testing and retraining of models, giving us the CI/CD/CT approach. By appreciating the subtle differences and by adopting CI/CD/CT, teams can realise significant improvements with MLOps.

Whilst innovation in this space is far from finished, and will continue to see the creation of new tools and ideas for many years to come, MLOps has been able to introduce some particularly useful real-world concepts and technological developments in recent years.

These include:

Using well-established historic cases of fraudulent and legitimate transactions to ensure the algorithm continues to correctly detect these in back testing while still learning.
Building effective monitoring metrics with appropriately defined Red, Amber and Green warning flags - for example, we might expect around 0.1% of all transactions to be fraudulent, or roughly 100 transactions per day to be flagged. Setting an Amber warning at 250 flagged transactions per day or 0.5% would then be a good start, and if fraudulent transactions exceed 10% of all transactions, the anti-fraud algorithm should be pulled into a fail-safe mode.
Building models that have a training process embedded into them, which can potentially be triggered by the monitoring tools - Continuous Training as an additional part of the Continuous Integration, Continuous Deployment loop is a critical part of ensuring models remain sustainable in the longer term and continue to deliver business value.
Adopting containers for Machine Learning applications to simplify pipelines and deployment processes, while giving additional advantages such as automated scaling.
Increasing awareness of the importance of data pipelines for ML models, which means many organisations are now moving towards embedding data quality monitoring as part of their pipeline.
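The Red, Amber and Green flags from the fraud example above can be sketched as a small status function. This is an illustrative sketch only: the thresholds are the ones quoted in the list, while the names (`Status`, `fraud_monitor_status`, `next_action`) and the choice to map Amber to a retraining trigger are assumptions made for this example.

```python
from enum import Enum


class Status(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


# Thresholds taken from the fraud example in the text.
AMBER_DAILY_FLAGS = 250   # flagged transactions per day
AMBER_FLAG_RATE = 0.005   # 0.5% of all transactions
RED_FLAG_RATE = 0.10      # 10% of all transactions


def fraud_monitor_status(flagged: int, total: int) -> Status:
    """Classify one day's monitoring figures as Green, Amber or Red."""
    if total <= 0:
        raise ValueError("total must be positive")
    rate = flagged / total
    if rate > RED_FLAG_RATE:
        return Status.RED
    if flagged >= AMBER_DAILY_FLAGS or rate >= AMBER_FLAG_RATE:
        return Status.AMBER
    return Status.GREEN


def next_action(status: Status) -> str:
    """Hypothetical mapping from status to pipeline action.

    Treating Amber as a retraining trigger is an assumption of this sketch.
    """
    return {
        Status.GREEN: "none",
        Status.AMBER: "trigger_retraining",
        Status.RED: "enter_fail_safe",
    }[status]


if __name__ == "__main__":
    for flagged, total in [(100, 100_000), (300, 100_000), (20_000, 100_000)]:
        status = fraud_monitor_status(flagged, total)
        print(flagged, total, status.value, next_action(status))
```

Keeping the thresholds as named constants makes them easy to review with the business owners of the fraud process, who are usually best placed to say what counts as abnormal.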

A substantial amount of effort has also gone into model explainability, which we will explore in subsequent blogs. Thanks for reading!

Don't forget to subscribe below to keep up to date with the latest on data & machine learning!

The Analytics Revolution blog series:

Ethical AI is not an Afterthought

The Machine Learning Lifecycle

Five Recommendations for the Successful Adoption of Machine Learning

CI/CD/CT - DevOps for Machine Learning and Best Practice in Production

Dataflow - Google Cloud's Best-Kept Secret
