With the massive growth of machine learning (ML)-based services, the term MLops has become a regular part of the conversation – and with good reason. Short for “machine learning operations,” MLops refers to a broad set of tools, job functions, and best practices that ensure machine learning models are deployed and maintained in production reliably and efficiently. The practice is at the heart of production ML: it enables rapid deployment, facilitates experiments to improve performance, and guards against model bias and loss of prediction quality. Without it, ML at scale becomes impossible.
As with any emerging practice, it’s easy to get confused about what MLops actually entails. To help, we’ve listed seven common MLops myths to avoid, so you can get on the right path to leveraging ML successfully at scale.
Myth #1: MLops terminates on launch
Reality: Launching an ML model is just one step in an ongoing process.
ML is an intrinsically experimental practice. Even after the initial launch, there is a need to test new hypotheses while refining signals and parameters. This allows the model to improve its accuracy and performance over time. MLops processes help engineers manage the experimentation process efficiently.
For example, a central component of MLops is versioning. This allows teams to track key metrics across a wide range of model variants to ensure the best one is selected, while allowing easy reversal in case of error.
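As an illustrative sketch, versioning and metric tracking can be as simple as a registry that records each variant’s evaluation metrics, promotes the best one, and restores an earlier version on error. The class, version labels, and metric names below are hypothetical, not a specific tool’s API:

```python
# Minimal model-registry sketch: each version is recorded with its
# evaluation metrics so the best variant can be promoted, and a
# known-good version can be reactivated if a deployment goes wrong.

class ModelRegistry:
    def __init__(self):
        self._versions = {}   # version label -> metrics dict
        self._active = None   # currently deployed version

    def register(self, version, metrics):
        self._versions[version] = metrics

    def promote_best(self, metric="accuracy"):
        # Select the version with the highest value for the given metric.
        self._active = max(self._versions,
                           key=lambda v: self._versions[v][metric])
        return self._active

    def rollback(self, version):
        # Easy reversal in case of error: reactivate an earlier version.
        if version not in self._versions:
            raise ValueError(f"unknown version: {version}")
        self._active = version


registry = ModelRegistry()
registry.register("v1", {"accuracy": 0.91})
registry.register("v2", {"accuracy": 0.94})
registry.register("v3", {"accuracy": 0.89})
best = registry.promote_best()   # "v2" has the highest accuracy
registry.rollback("v1")          # revert if v2 misbehaves in production
```

Production tools such as MLflow or a feature-complete model registry add storage, lineage, and access control on top of this idea, but the core mechanic is the same.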
It is also important to monitor model performance over time due to the risk of data drift. Data drift occurs when the data a model examines in production deviates significantly from the data the model was originally trained on, resulting in poor quality predictions. For example, many ML models that were trained for consumer behavior before the COVID-19 pandemic degraded sharply in quality after lockdowns changed the way we live. MLops works to respond to these scenarios by creating strong monitoring practices and building an infrastructure to adapt quickly if a major change occurs. It goes far beyond the launch of a model.
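One common way to quantify drift, sketched below in plain Python, is the Population Stability Index (PSI), which compares a feature’s production distribution against its training baseline. The bucket edges and the 0.2 alert threshold are illustrative rule-of-thumb choices, not universal constants:

```python
import math

def psi(baseline, production, edges):
    """Population Stability Index between two samples of one feature."""
    def fractions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                if edges[i] <= v < edges[i + 1]:
                    counts[i] += 1
                    break
        total = max(len(values), 1)
        # Small floor avoids log(0) for empty buckets.
        return [max(c / total, 1e-4) for c in counts]

    b, p = fractions(baseline), fractions(production)
    return sum((pi - bi) * math.log(pi / bi) for bi, pi in zip(b, p))

train = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6]   # training baseline
live  = [0.7, 0.8, 0.8, 0.9, 0.9, 0.9, 1.0]   # production, shifted upward
score = psi(train, live, edges=[0.0, 0.25, 0.5, 0.75, 1.01])
if score > 0.2:   # common rule-of-thumb cutoff for significant drift
    print("drift detected: investigate and consider retraining")
```

In practice this check would run on a schedule for every monitored feature, with alerts feeding the retraining workflow described above.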
Myth #2: MLops is the same as model development
Reality: MLops is the bridge between model development and the successful use of ML in production.
The process used to develop a model in a test environment is usually not the same as that which will allow it to succeed in production. Running models in production requires robust data pipelines to source, process, and train models, often spanning much larger datasets than those found in development.
Databases and computing power will typically need to move to distributed environments to handle the increased load. Much of this process needs to be automated to ensure reliable deployments and the ability to iterate rapidly at scale. Tracking also needs to be much more robust: production environments see data beyond what is available in testing, so the potential for the unexpected is much greater. MLops encompasses all of the practices needed to bring a model from development to launch.
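To make this concrete, here is a minimal sketch of an automated pipeline with a validation gate before promotion. The stage names, the toy linear “model,” and the error tolerance are assumptions for illustration, not a real framework’s API:

```python
# Toy automated training pipeline: extract -> train -> validate -> deploy.
# A model that fails the validation gate never reaches production.

def extract():
    # In production this would pull from a distributed data store;
    # here we fabricate points on the line y = 2x.
    return [(x, 2 * x) for x in range(100)]

def train(rows):
    # The "model" is just the least-squares slope of y = w * x.
    num = sum(x * y for x, y in rows)
    den = sum(x * x for x, _ in rows)
    return num / den

def validate(weight, rows, tolerance=0.05):
    # Quality gate: worst-case prediction error must stay within tolerance.
    errors = [abs(y - weight * x) for x, y in rows]
    return max(errors) <= tolerance

def run_pipeline():
    rows = extract()
    weight = train(rows)
    if not validate(weight, rows):
        raise RuntimeError("validation gate failed; model not deployed")
    return weight

model = run_pipeline()   # promoted only because validation passed
```

Real pipelines add orchestration, retries, and artifact storage, but the gate-before-deploy structure carries over directly.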
Myth #3: MLops is the same as devops
Reality: MLops pursues similar goals to devops, but its implementation differs in several respects.
While MLops and devops both strive to make deployment scalable and efficient, achieving this goal for ML systems requires a new set of practices. MLops puts more emphasis on experimentation than devops does. Unlike standard software deployment, ML models are often deployed with many variations at once, so the variants must be monitored and compared to select an optimal version. And on every redeploy, it’s not enough to ship the code: the models have to be retrained whenever there’s a change. This differs from standard devops deployments, as the pipeline must now include a retraining and validation phase.
For many common devops practices, MLops extends the scope to meet its specific needs. Continuous integration for MLops goes beyond code testing to include data quality checks and model validation. Continuous deployment is no longer just a matter of shipping software packages; it now includes a pipeline to modify or roll back models.
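Those extra CI checks might look something like the following sketch. The field names, range bounds, and the 0.9 accuracy threshold are illustrative assumptions:

```python
# Two checks an ML-aware CI run adds on top of ordinary unit tests:
# a data-quality (schema/range) validation and a model-quality gate.

def check_data_quality(rows):
    """Flag rows whose 'age' field is missing or implausible."""
    problems = []
    for i, row in enumerate(rows):
        if row.get("age") is None:
            problems.append(f"row {i}: missing age")
        elif not (0 <= row["age"] <= 120):
            problems.append(f"row {i}: age out of range")
    return problems

def check_model_quality(predictions, labels, threshold=0.9):
    """Pass only if accuracy on the holdout set meets the threshold."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels) >= threshold

data = [{"age": 34}, {"age": -5}, {"age": None}]
issues = check_data_quality(data)                      # two rows flagged
ok = check_model_quality([1, 0, 1, 1], [1, 0, 1, 0])   # 3/4 accuracy fails
```

In a CI system, a non-empty `issues` list or a failed quality gate would block the merge, just as a failing unit test would.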
Myth #4: Correcting an error is simply modifying lines of code
Reality: Fixing ML model errors in production requires advance planning and multiple fallbacks.
If a new deployment results in performance degradation or some other error, MLops teams should have a suite of options to resolve the issue. Simply rolling back to previous code is often not enough, as models need to be retrained before deployment. Instead, teams should keep multiple versions of models on hand, to ensure there’s always a production-ready version available in case an error occurs.
Additionally, in scenarios where there is data loss or a significant shift in the distribution of production data, teams should have a simple fallback heuristic so that the system can maintain at least some level of performance. All of this requires significant advance planning, which is an essential aspect of MLops.
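A fallback heuristic can be as simple as the sketch below: if the live model fails on broken or missing inputs, a crude rule-based prediction is served instead. The feature names and weights are hypothetical:

```python
# Graceful degradation: try the model first, fall back to a simple
# heuristic when inputs are missing or malformed.

def model_predict(features):
    # Stand-in for the real model; assumes a complete feature dict.
    return 0.3 * features["clicks"] + 0.7 * features["visits"]

def heuristic_predict(features):
    # Crude but robust rule: average whatever numeric signals survive.
    values = [v for v in features.values() if isinstance(v, (int, float))]
    return sum(values) / len(values) if values else 0.0

def predict_with_fallback(features):
    try:
        return model_predict(features)
    except (KeyError, TypeError):
        # Data loss or a schema change: degrade gracefully.
        return heuristic_predict(features)

good = predict_with_fallback({"clicks": 10, "visits": 4})  # model path
bad = predict_with_fallback({"clicks": 10})                # fallback path
```

The heuristic’s answer is worse than the model’s, but a degraded prediction plus an alert beats an outage while the team investigates.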
Myth #5: Governance is completely separate from MLops
Reality: Although governance has separate goals from MLops, much of MLops can help support governance goals.
Model governance manages regulatory compliance and the risks associated with using an ML system. This includes things like maintaining appropriate user data protection policies and preventing bias or discriminatory results in model predictions. Although MLops is generally seen as a way to ensure model performance, that is a narrow view of what it can deliver.
Tracking and monitoring of models in production can be supplemented with analysis to improve the explainability of the models and find biases in the results. Transparency in model training and deployment pipelines can facilitate data processing compliance goals. MLops should be viewed as a practice enabling scalable ML for all business purposes, including performance, governance, and model risk management.
Myth #6: Managing ML systems can be done in silos
Reality: Successful MLops systems require collaborative teams with hybrid skill sets.
ML model deployment spans many roles, including data scientists, data engineers, ML engineers, and devops engineers. Without collaboration and understanding of each other’s work, effective ML systems can become unwieldy at scale.
For example, a data scientist may develop models without much visibility or external input, which can then lead to deployment difficulties due to performance and scaling issues. Likewise, a devops team unaware of ML best practices may not build the tracking needed to support iterative experimentation with models.
This is why, at all levels, it is important that all team members have a comprehensive understanding of the model development pipeline and ML practices, with collaboration from day one.
Myth #7: Managing ML systems is risky and untenable
Reality: Any team can leverage ML at scale with the right tools and practices.
As MLops is still a growing field, it may seem like there is a lot of complexity. However, the ecosystem is maturing rapidly and there are a wealth of resources and tools available to help teams succeed at every stage of the MLops lifecycle.
With the right processes in place, you can unlock the full potential of ML at scale.
Krishnaram Kenthapadi is the chief scientist of Fiddler AI.