How to Implement ML Operations (With Case Study)
Machine learning (ML) is a transformative force shaping today's data-driven world. It empowers businesses to gain valuable insights and make informed decisions.
ML enables us to design algorithms that help systems learn and improve themselves. However, deploying and managing ML models in production is a complex and challenging task.
ML Operations, or MLOps, is a set of techniques that facilitate efficient testing, deployment, and management of ML models. Let’s explore how ML Operations make the delivery of ML projects successful.
Understanding ML Operations (ML Ops)
ML Operations is an extension of DevOps. It transforms ML models from code to real-world solutions.
ML Operations covers the entire ML lifecycle, from model development and training to deployment and maintenance. It automates these stages and retrains or recalibrates models as data and requirements change.
It streamlines the ML lifecycle and helps organizations to:
- Accelerate time to market for ML models
- Improve the quality and reliability of ML models
- Reduce the cost of deploying and managing ML models
Importance of ML Operations
Any new ML model must pass through many stages before it can be deployed. ML Operations bridges the gap between development and production and promotes collaboration between different teams.
Traditionally, separate teams worked on ML models with different tools and frameworks. The model had to be converted between formats at each stage, and every team's parameter tweaks had to be tracked. ML Operations replaces this with a standardized, streamlined way to develop models.
Another advantage of implementing ML Operations is feature engineering, which is vital for model training. Feature engineering is the process of transforming raw data into features that the ML models can process. Generating these features is a long and complicated process that impacts model accuracy. Some ML Ops solutions provide a feature store to make feature engineering a more efficient process.
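As a rough illustration, the sketch below turns raw order rows into per-customer features with pandas; the data frame, column names, and derived features are purely hypothetical.

```python
# A minimal feature-engineering sketch using pandas.
# The raw_orders frame and the derived features are hypothetical examples.
import pandas as pd

raw_orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "order_value": [250.0, 410.0, 120.0, 90.0, 300.0],
    "is_veg": [1, 0, 1, 1, 1],
})

# Aggregate raw rows into per-customer features the model can consume.
features = raw_orders.groupby("customer_id").agg(
    avg_order_value=("order_value", "mean"),
    order_count=("order_value", "count"),
    veg_ratio=("is_veg", "mean"),
).reset_index()

print(features)
```

A feature store would persist tables like this so that training and serving read identical feature definitions.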
Thus, ML Operations can turn models into actionable insights, minimize risks, and optimize performance. Businesses are now more likely to adopt ML Ops solutions to unlock the true potential of machine learning. For example, Uber reinforced its operations with an internal ML-as-a-service platform called Michelangelo, which helps its teams build, deploy, and operate machine learning solutions at scale.
How to Implement MLOps?
Implementing ML Operations becomes efficient with a holistic approach. This approach includes people, processes, and technology. Here are some important steps to consider:
Setting Up Version Control
As with any software deployment process, version control catalogs all versions of the ML model and ensures that everyone is on the same page. Popular platforms like Git enable developers to track all changes and revert to previous versions if required.
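As a minimal sketch, assuming the training code lives in a Git repository, a script can tag the exact code state used for each trained model. The tag name and commit message below are hypothetical, and many teams pair Git with a dedicated data/model versioning tool.

```python
# A hedged sketch: tagging the code used to train a model, from Python.
import subprocess

def tag_model_version(version: str, message: str) -> None:
    """Commit the current state and create an annotated Git tag for it."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    subprocess.run(["git", "tag", "-a", version, "-m", message], check=True)

# Hypothetical usage after a successful training run.
tag_model_version("model-v1.2.0", "Retrained recommender on March data")
```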
Designing CI/CD Pipelines
There are three levels of ML Operations automation: the manual process, ML pipeline automation, and CI/CD (Continuous Integration/Continuous Deployment) pipeline automation.
A CI/CD system represents the final stage. It automates and accelerates the building, testing, and deployment of ML models, ensuring fast and reliable releases.
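One common building block is an automated quality gate that the pipeline runs before promoting a model. The sketch below shows one possible test; the dataset, model, and 0.85 accuracy threshold are all hypothetical stand-ins, and a real pipeline would wire this into its CI tool of choice.

```python
# A minimal sketch of a pre-deployment quality gate (pytest-style).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_THRESHOLD = 0.85  # hypothetical minimum bar for promotion

def test_candidate_model_meets_accuracy_bar():
    """Fail the pipeline if the candidate model underperforms on held-out data."""
    X, y = make_classification(n_samples=500, class_sep=2.0, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracy = accuracy_score(y_val, model.predict(X_val))
    assert accuracy >= ACCURACY_THRESHOLD, (
        f"Accuracy {accuracy:.3f} is below the {ACCURACY_THRESHOLD} gate"
    )
```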
Integrating a Model Registry
A model registry is a centralized repository for storing and managing different versions of ML models. This includes the models’ metadata, versions, and performance metrics. It also facilitates collaboration between different teams.
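As one concrete example, MLflow exposes a model registry API; the sketch below logs a toy model and registers it as a new version. The SQLite tracking store, metric value, and model name are illustrative assumptions, and other registries offer similar workflows.

```python
# A hedged sketch of registering a model version with MLflow's registry.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("sqlite:///mlflow.db")  # registry needs a DB-backed store

X, y = make_classification(n_samples=200, random_state=0)
trained_model = LogisticRegression(max_iter=1000).fit(X, y)

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(trained_model, artifact_path="model")
    mlflow.log_metric("val_accuracy", 0.91)  # placeholder metric

# Register the logged artifact as a new version under a named model.
mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="order-recommender",
)
```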
Fostering Collaboration Across Teams
One way to streamline the ML workflow is by fostering collaboration across teams. It breaks down silos and encourages open communication among all members.
Leveraging Cloud-Based Solutions
Cloud platforms simplify the deployment process. They allow easy scalability, flexibility, and accessibility of the deployed models. They also enable efficient resource management and access to cutting-edge tools and services. These benefits cut deployment costs and risks significantly.
Optimizing Model Inference Speed
A high model inference speed is critical for real-time applications. Quantization can reduce the computational costs of inference by representing weights in low-precision data types (like 8-bit integers).
The resulting model requires less memory and consumes less energy. Another way to enhance inference speed and responsiveness is hardware acceleration.
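As a minimal sketch of the quantization route, PyTorch's post-training dynamic quantization converts Linear-layer weights to 8-bit integers; the toy model below is illustrative, and actual speed and memory gains depend on the architecture and hardware.

```python
# Post-training dynamic quantization with PyTorch: Linear weights become int8.
# The toy model and input sizes are hypothetical.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

sample = torch.randn(1, 128)
print(quantized(sample).shape)  # same interface, smaller weights
```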
Implementing Monitoring and Continuous Improvement
Regular monitoring can maintain the ML model at its peak performance post-deployment. Monitoring can help proactively detect anomalies and prevent issues before they escalate. ML operations teams can even incorporate feedback loops to automate continuous improvement.
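A minimal sketch of such a feedback loop follows, assuming hypothetical latency and error-rate thresholds; a real deployment would emit these metrics to a monitoring stack and page the on-call team.

```python
# A hedged sketch of a post-deployment health check with a feedback hook.
# Thresholds, metric names, and the retraining trigger are assumptions.
def check_live_metrics(latency_ms: float, error_rate: float) -> list:
    """Return alert messages when live serving metrics cross thresholds."""
    alerts = []
    if latency_ms > 200:
        alerts.append(f"High inference latency: {latency_ms:.0f} ms")
    if error_rate > 0.02:
        alerts.append(f"Elevated error rate: {error_rate:.1%}")
    return alerts

def feedback_loop(latency_ms: float, error_rate: float) -> None:
    alerts = check_live_metrics(latency_ms, error_rate)
    for alert in alerts:
        print("ALERT:", alert)                         # in practice: notify on-call
    if alerts:
        print("Queueing model for review/retraining")  # hypothetical hook

feedback_loop(latency_ms=340.0, error_rate=0.05)
```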
Challenges in Implementing ML Operations
Even after a successful implementation, ML Operations presents certain challenges. Here’s a look at some of those post-implementation ML Ops challenges:
Lack of Collaboration
Teams that operate in isolation undermine the cooperation that ML Operations thrives on. It is essential to break down barriers both technologically and in person.
Insufficient Monitoring
Sporadic monitoring can cause teams to miss issues that degrade performance. Robust monitoring systems are required to detect performance problems and data drift. Drift is a change in the statistical properties of production data relative to the data the model was trained on, and it leads to model performance degradation.
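A minimal drift check, sketched with a two-sample Kolmogorov-Smirnov test on a single feature; the data and the 0.05 significance level are illustrative assumptions, and production systems typically track many features and more robust metrics.

```python
# Compare a live feature distribution against the training distribution.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
live_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)  # shifted: simulated drift

statistic, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.05:
    print(f"Possible data drift detected (KS statistic={statistic:.3f})")
```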
Resource Mismanagement
Inefficient resource allocation can starve important workloads of compute and memory. This can lead to degraded performance and even system failure.
Resistance to Change
It is very common for teams to resist adopting new processes. It is vital to address their concerns, provide training, and showcase the benefits of new processes.
Security Concerns
Security lapses and breaches have become too common in digital systems and software. It is paramount to implement robust security measures to protect the models and data throughout the pipeline.
Overlooking Ethical Considerations
The programming and training of ML models are susceptible to biases. The resulting social impact of these models can have severe ethical implications. Responsible AI requires that we check and control biases during model development and training.
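One simple, hedged check is to compare a model's accuracy across groups and flag large gaps; the labels, predictions, group column, and tolerance below are all hypothetical.

```python
# A minimal per-group performance check as a basic bias signal.
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 1])
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

per_group = {
    g: accuracy_score(y_true[group == g], y_pred[group == g])
    for g in np.unique(group)
}
gap = max(per_group.values()) - min(per_group.values())
print(per_group)
if gap > 0.1:  # hypothetical tolerance
    print(f"Accuracy gap of {gap:.2f} across groups: review for bias")
```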
Best Practices
For a successful implementation, consider these ML Ops best practices:
1. Establish Clear Communication Channels
Open and transparent communication is the cornerstone of a successful ML Ops implementation. Ensure that individual ownership and responsibilities are clearly defined and communicated across the project.
2. Automate Repetitive Tasks
Automation accelerates the ML workflow and reduces the risk of errors. You can incorporate tools that automate the testing, deployment, and monitoring processes.
3. Prioritize Documentation
Document every step of the ML pipeline. Thorough documentation eases the transfer of knowledge and facilitates troubleshooting.
4. Embrace Agile Methodologies
Iterative development cycles allow quick adjustments based on feedback. Embracing agile methodologies makes the ML workflow more adaptable and responsive.
Case Study – Swiggy Suggests
Swiggy, the popular food delivery platform, faced a common challenge: its customers were overwhelmed by an abundance of choices. Swiggy recognized the “paradox of choice” in this problem and decided to leverage ML Operations to simplify the decision-making process.
Swiggy envisioned Swiggy Suggests as the solution: a personalized, item-level recommendation engine that streamlines choices and facilitates quicker decision-making.
The Swiggy team broke the problem down into two components within the ML Operations framework:
1. Rule Generation for Valid Carts
- They used their Food Intelligence engine's categorization. This included data sets of dish families, co-order metrics, and ratings.
- They ensured uniformity in food preferences across cities while considering city-specific trends.
2. Retrieval and Ranking of Candidate Carts
- The developers decided to fetch candidate carts from both the overall platform orders and the customer’s order history.
- They created an Approximate Nearest Neighbour (ANN) index to efficiently retrieve the next closest relevant dish.
- They ranked carts based on factors such as similarity, pricing, and geographical distance (a simple weighted-score sketch follows this list).
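Swiggy has not published its exact ranking model, but a simple weighted-score sketch over hypothetical similarity, price-affinity, and distance signals illustrates the idea.

```python
# A hedged ranking sketch: weights, fields, and the sample carts are hypothetical.
from dataclasses import dataclass

@dataclass
class CandidateCart:
    cart_id: str
    similarity: float      # 0..1, closeness to the customer's taste
    price_affinity: float  # 0..1, closeness to the customer's usual spend
    distance_km: float     # restaurant-to-customer distance

WEIGHTS = {"similarity": 0.6, "price_affinity": 0.3, "distance": 0.1}

def score(cart: CandidateCart) -> float:
    # Turn distance into a 0..1 "closeness" signal (10 km as a rough cap).
    closeness = max(0.0, 1.0 - cart.distance_km / 10.0)
    return (WEIGHTS["similarity"] * cart.similarity
            + WEIGHTS["price_affinity"] * cart.price_affinity
            + WEIGHTS["distance"] * closeness)

candidates = [
    CandidateCart("cart-1", similarity=0.82, price_affinity=0.70, distance_km=2.5),
    CandidateCart("cart-2", similarity=0.90, price_affinity=0.40, distance_km=6.0),
]
ranked = sorted(candidates, key=score, reverse=True)
print([c.cart_id for c in ranked])
```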
Their ML Ops journey faced several problems typical of recommendation systems: scale and subjectivity, item availability, and the need for micro-personalization. To counter these, they focused on two-item carts and incorporated their cross-sell models.
Swiggy Suggests was implemented through a well-orchestrated ML Ops workflow. To cover all bases, their ML team created two different workflows.
Workflow 1:
- It runs at a lower frequency (once a week) as its components don’t change daily.
- Feature engineering is used to capture the ‘vegness’ of customers (their preference for vegetarian items).
- To cross-sell effectively, candidate carts are built from customers’ order histories.
Workflow 2:
- This workflow runs daily to ensure that the organic carts being prepared are fresh and up to date.
- Approximate Nearest Neighbour (ANN) indexes are built with Spotify’s Annoy library (a sketch follows this list).
- A feature store is used to rank candidate carts.
- The Annoy library does not have a native Spark implementation, so the ML team built custom wrappers that construct the indexes and run inference on them in parallel.
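A rough sketch of building and querying an Annoy index, as mentioned in Workflow 2; the embedding dimension, random vectors, and tree count are illustrative assumptions rather than Swiggy's actual configuration.

```python
# Build and query an Approximate Nearest Neighbour index with Spotify's Annoy.
import numpy as np
from annoy import AnnoyIndex

EMBEDDING_DIM = 32
index = AnnoyIndex(EMBEDDING_DIM, "angular")

rng = np.random.default_rng(0)
dish_embeddings = rng.normal(size=(1_000, EMBEDDING_DIM))
for item_id, vector in enumerate(dish_embeddings):
    index.add_item(item_id, vector.tolist())

index.build(10)          # more trees improve recall at the cost of build time
index.save("dishes.ann")

# Retrieve the closest candidate dishes for a query embedding.
query = rng.normal(size=EMBEDDING_DIM).tolist()
neighbours = index.get_nns_by_vector(query, 5)
print(neighbours)
```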
These workflows notably reduced the ‘time to order’ for Swiggy’s customers. The model will continue to be refined, including further improvements to the nearest-neighbour algorithm.
Final Word
ML Operations ensure that cutting-edge ML models translate into tangible outcomes. Navigating its challenges and embracing the best practices is key to a successful implementation.
There are many generic ML Ops implementation strategies available on the market, but it is prudent to use a tool that adapts to your needs and leads to successful model deployments.
MarkovML, a no-code platform, lets you build AI workflows in minutes without dealing with technical complexities. Harness the power of advanced machine learning models to streamline your business processes and focus on what truly matters. Leave the heavy lifting to us!
Get in touch to learn more.