nasscom Community

ML Engineering and ML Ops – what’s the buzz about?


The term ‘ML Engineering’ has exploded in popularity over the past few years, and the role is often touted as the ‘hottest job’ in technology. The term is nebulous, however, and is used in a variety of ways; recently it has become commonplace to add it to a job description just to make the role appear more attractive. This post offers some clarification that we hope will help you understand what the term actually means and make decisions about developing your career in this domain.

Machine Learning (ML) is a branch of Artificial Intelligence in which algorithms learn to perform specific tasks (such as making predictions or decisions) purely from data in an automated manner, i.e., without explicit human guidance on how to perform those tasks. These are typically complex statistical algorithms that work with very large data sets, so making them successful in practice requires fairly elaborate engineering. The engineering associated with ML is what we call “ML Engineering”.

Figure: Machine Learning Project Lifecycle

However, this definition is too abstract, as there are different forms of engineering associated with ML. When exploring or developing a career in “ML Engineering”, it pays to split the term into two buckets, because the two require fairly different technical backgrounds and the nature of the engineering work also differs considerably. Hence, we categorize “ML Engineering” into the following two buckets:

  • Engineering for Development of ML Models: This is the engineering work required to train an ML model, experiment with different choices of models and hyperparameters (known as “hyperparameter tuning”), experiment with different choices of features used to train the model (known as “feature engineering”), and invest in the software engineering needed for model efficiency, reusability, and readability.

This engineering work is typically highly entangled with the mathematical and statistical aspects of ML model development. So, it’s impossible to do this type of engineering work without a deep understanding of the mathematics of ML (for example, cross-entropy, gradient descent, and embeddings).

Fortunately, there are several good books and videos (plus open-source code) for learning about this topic. This educational content typically teaches the mathematics and the engineering of ML model development together, because the two are so entangled.
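To make the first bucket a little more concrete, here is a minimal sketch of how the mathematics and the engineering meet in practice. It trains a tiny logistic-regression model with gradient descent on the cross-entropy loss, then tries a few learning rates and keeps the one with the lowest validation loss, which is a very simple form of hyperparameter tuning. The synthetic data set, the function names (make_data, train, tune_learning_rate), and the candidate learning rates are all our own illustrative assumptions; real projects would normally rely on an established ML library rather than hand-written loops.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def cross_entropy(weights, bias, xs, ys):
    # Mean binary cross-entropy; eps guards against log(0).
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict_proba(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def train(xs, ys, learning_rate, epochs=200):
    # Full-batch gradient descent on the cross-entropy loss.
    n = len(xs)
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(weights, bias, x) - y  # dLoss/dz for one example
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def make_data(n, seed):
    # Hypothetical toy data: label is 1 when the two features sum to > 0.
    rng = random.Random(seed)
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    ys = [1 if x[0] + x[1] > 0 else 0 for x in xs]
    return xs, ys


def tune_learning_rate(candidates, train_set, val_set):
    # Hyperparameter tuning: pick the learning rate with the lowest validation loss.
    results = {}
    for lr in candidates:
        weights, bias = train(train_set[0], train_set[1], learning_rate=lr)
        results[lr] = cross_entropy(weights, bias, val_set[0], val_set[1])
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = tune_learning_rate(
        [0.01, 0.1, 1.0], make_data(200, seed=0), make_data(100, seed=1)
    )
    print("validation loss per learning rate:", results)
    print("selected learning rate:", best)
```

Even in this toy version, deciding how to evaluate a candidate (a held-out validation set) and reading the result both require knowing what the loss means, which is why we describe the two skill sets as entangled.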

  • Engineering for Deployment of ML Models: This is the engineering work required to deploy and support ML model training pipelines and inference in production. This area is sometimes called “MLOps”. Although we use the somewhat narrow term “Deployment”, there are many aspects here involving real-time performance. These include caching for immediate inference, ensuring the right model version is used, debugging production errors reliably and quickly, and collecting and assessing performance metrics for model feedback.

Much of the engineering work involved here strongly resembles traditional software deployment, which an engineer without a background in ML should be very familiar with. In fact, this is why this area is a much easier entry point into the world of ML for an engineer. We’d argue that one can do the “MLOps” job with only a surface-level understanding of ML, as long as one has significant experience in the traditional engineering world of “DevOps”.
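To illustrate why this second bucket feels familiar to a traditional engineer, here is a minimal sketch of a serving layer that covers three of the concerns listed above: choosing the right model version, caching inference results, and collecting basic metrics. The ModelServer class, its method names, and the two stand-in “models” (plain Python functions) are hypothetical and purely illustrative; a production system would sit behind a real serving framework and a proper model registry.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Features = Tuple[float, ...]


@dataclass
class ModelServer:
    # Hypothetical in-memory serving layer, for illustration only.
    models: Dict[str, Callable[[Features], float]] = field(default_factory=dict)
    active_version: str = ""
    cache: Dict[Tuple[str, Features], float] = field(default_factory=dict)
    latencies_ms: List[float] = field(default_factory=list)
    requests: int = 0
    cache_hits: int = 0

    def register(self, version: str, model: Callable[[Features], float]) -> None:
        self.models[version] = model

    def activate(self, version: str) -> None:
        # Refuse to serve a version that was never registered.
        if version not in self.models:
            raise KeyError(f"unknown model version: {version}")
        self.active_version = version

    def predict(self, features: Features) -> dict:
        start = time.perf_counter()
        self.requests += 1
        # The cache key includes the version, so switching models
        # never returns a prediction made by an older model.
        key = (self.active_version, features)
        if key in self.cache:
            self.cache_hits += 1
            prediction = self.cache[key]
        else:
            prediction = self.models[self.active_version](features)
            self.cache[key] = prediction
        self.latencies_ms.append((time.perf_counter() - start) * 1000.0)
        # Returning the version with each prediction helps when debugging.
        return {"version": self.active_version, "prediction": prediction}

    def metrics(self) -> dict:
        n = self.requests
        return {
            "requests": n,
            "cache_hit_rate": self.cache_hits / n if n else 0.0,
            "mean_latency_ms": sum(self.latencies_ms) / n if n else 0.0,
        }


if __name__ == "__main__":
    server = ModelServer()
    server.register("v1", lambda f: sum(f))        # stand-in for a trained model
    server.register("v2", lambda f: 2.0 * sum(f))  # stand-in for a retrained model
    server.activate("v1")
    print(server.predict((1.0, 2.0)))
    print(server.predict((1.0, 2.0)))  # served from the cache
    server.activate("v2")
    print(server.predict((1.0, 2.0)))  # recomputed with the new version
    print(server.metrics())
```

Nothing here requires knowing how the model was trained: versioned artifacts, cache keys, and latency metrics are all standard “DevOps” concerns applied to a model instead of a conventional service.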

We’d now like to point to some reading and coding material for learning about the world of “MLOps”. We present it in three layers, starting with a quick introductory read and ending with an entire hands-on course that can train a traditional software engineer in “MLOps”.

  1. https://stackoverflow.blog/2020/10/12/how-to-put-machine-learning-models-into-production/ is a short blog post on deploying ML models in production (introductory content).
  2. https://mlinproduction.com/deploying-machine-learning-models/ is a series of blog posts that breaks the world of “MLOps” into its different aspects and explains each in some detail.
  3. We all know that an engineer truly understands a subject only by “doing”. Hence, we recommend a wonderful course taught in the Computer Science department at Stanford (Disclaimer: we might be a bit biased in our choice of university): https://stanford-cs329s.github.io/index.html. The good news is that you don’t have to be a Stanford student to learn this material, as all the lecture notes and slides are openly available. More importantly, this GitHub repo: https://github.com/mrdbourke/cs329s-ml-deployment-tutorial is the codebase that serves as the tutorial throughout the course. We want to emphasize that you can’t learn MLOps by simply reading a textbook. You have to write code to actually deploy and test an ML model in order to truly grasp this subject. We hope you enjoy this coding experience!

We hope this article has provided some clarity on ML Engineering and ML Operations and how you can get started on this journey. There is a world of information and resources available no matter which career path you take – it’s about understanding how this could make a difference to your engineering career and deciding what’s right for you.

Authors

Anupama Joshi, Senior Director, Technology, Target
Ashwin Rao, Vice President, AI, Target