MLOps & Deployment
Pipelines, monitoring, drift detection — notebook to production
Prerequisites
- inference-serving
Here's a dirty secret about AI: plenty of models that work brilliantly in a Jupyter notebook fall apart in production. Not because the model is bad, but because nobody thought about how to deploy it, monitor it, update it, or roll it back when things go wrong. MLOps is the discipline that bridges that gap: it's DevOps for machine learning.
In traditional software, you write code, test it, deploy it, and monitor it. In ML, you do all that plus manage datasets, track experiments, version models, detect when your production data starts looking different from your training data, and retrain when performance degrades. It's messier, more dynamic, and absolutely critical if you want your model to survive first contact with the real world.
This lesson walks you through the complete MLOps lifecycle — from experiment tracking and model registry to deployment pipelines and drift detection. By the end, you'll understand how production ML teams keep models running reliably at scale.
From notebook to production
The core challenge of MLOps is this: machine learning is inherently experimental. You try different features, different algorithms, different hyperparameters. Maybe 20 experiments later you get something that works. Now you need to remember which one it was, what data it trained on, how to reproduce it, and how to deploy it without breaking everything.
The MLOps lifecycle is a continuous loop. It's not a straight line from training to deployment — models degrade over time, data changes, and you'll be retraining and redeploying regularly. Here's the flow:
[Figure: The MLOps lifecycle]
When monitoring detects a problem, such as accuracy dropping or the input distribution shifting, you loop back to the start: gather new data, retrain, evaluate, and deploy a new version. This is why MLOps is more complex than traditional DevOps. The artifact you're deploying (the model) is produced by a stochastic training process, not a deterministic build step: random weight initialization, data shuffling, and even some GPU kernels introduce randomness. Rerunning the same code won't reproduce the same model unless you also pin the data, the random seeds, and the environment.
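To make the point about stochastic training concrete, here is a minimal sketch in plain Python with no ML framework. It fits a toy linear model with SGD, and a single seed controls both the random weight initialization and the shuffling order. `train_toy_model` and the synthetic data are hypothetical, chosen only for illustration. With the same seed, data, and code you get bit-identical weights. Change only the seed and the weights change, which is why experiment tracking records the seed.

```python
import random

def train_toy_model(data, seed, epochs=50, lr=0.05):
    """Fit y = w*x + b with SGD; randomness comes from init and shuffle order."""
    rng = random.Random(seed)                      # isolated RNG: the seed fully controls randomness
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initialization
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)                       # stochastic sample order each epoch
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x                      # gradient step for squared error
            b -= lr * err
    return w, b

# Hypothetical training data drawn from y = 2x + 1 plus a little noise
data_rng = random.Random(0)
data = [(x / 10, 2 * (x / 10) + 1 + data_rng.gauss(0, 0.1)) for x in range(20)]

run_a = train_toy_model(data, seed=42)
run_b = train_toy_model(data, seed=42)
run_c = train_toy_model(data, seed=7)

print(run_a == run_b)  # same data, code, and seed: identical weights
print(run_a == run_c)  # only the seed changed: different weights
```

Real frameworks have more sources of randomness, such as dropout, data-loader workers, and nondeterministic GPU kernels. Pinning a seed is therefore necessary but not always sufficient.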
The MLOps toolkit
MLOps isn't a single tool; it's a collection of practices and infrastructure that work together. Here are the four core pieces every production ML system needs: experiment tracking, a model registry, automated testing with CI/CD, and monitoring.
Experiment Tracking
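When you're training models, you need to log everything: hyperparameters, metrics, training time, dataset version, Git commit hash, and random seed. Tools like MLflow and Weights & Biases make this close to automatic. A few lines of setup, or MLflow's autologging for supported frameworks, capture most of it for every run.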
Why this matters: Two months from now, when someone asks "which model was that good one from Tuesday?" you can pull up the exact run, see what made it work, and reproduce it. Without experiment tracking, you're flying blind.
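Here is a minimal, stdlib-only sketch of what an experiment tracker records. It is a hypothetical stand-in, not MLflow's or W&B's API: `ExperimentTracker`, `log_run`, and `best_run` are names invented for illustration, and the parameter values and commit hashes are made up. Each run stores its hyperparameters, metrics, dataset version, Git commit, and seed in an append-only JSON Lines file. That lets you later query for the best run and see exactly how it was produced.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path

class ExperimentTracker:
    """Hypothetical minimal tracker: one JSON object per run, appended to a file."""

    def __init__(self, path):
        self.path = Path(path)

    def log_run(self, params, metrics, dataset_version, git_commit, seed):
        run = {
            "run_id": uuid.uuid4().hex[:8],
            "timestamp": time.time(),
            "params": params,                    # hyperparameters
            "metrics": metrics,                  # evaluation results
            "dataset_version": dataset_version,
            "git_commit": git_commit,
            "seed": seed,
        }
        with self.path.open("a") as f:           # append-only: history is never overwritten
            f.write(json.dumps(run) + "\n")
        return run["run_id"]

    def runs(self):
        if not self.path.exists():
            return []
        with self.path.open() as f:
            return [json.loads(line) for line in f if line.strip()]

    def best_run(self, metric):
        """Return the logged run with the highest value for `metric`, or None."""
        candidates = [r for r in self.runs() if metric in r["metrics"]]
        return max(candidates, key=lambda r: r["metrics"][metric], default=None)

# Usage with made-up runs
tracker = ExperimentTracker(Path(tempfile.mkdtemp()) / "runs.jsonl")
tracker.log_run({"lr": 0.01, "depth": 4}, {"val_acc": 0.91}, "data-v3", "a1b2c3d", seed=42)
tracker.log_run({"lr": 0.001, "depth": 8}, {"val_acc": 0.94}, "data-v3", "e4f5a6b", seed=42)

best = tracker.best_run("val_acc")
print(best["params"], best["dataset_version"], best["git_commit"], best["seed"])
```

The real tools cover the same ground with a UI and a server on top. MLflow, for example, offers `mlflow.log_param` and `mlflow.log_metric`, and W&B offers `wandb.init` and `wandb.log`.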
Production Monitoring
When input data drifts, monitoring should raise an alert once model accuracy drops below a set threshold, such as 90%. That's your signal to investigate, gather new data, and retrain. In a real production system, the alert would automatically notify the team and might even trigger a retraining job.
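A minimal sketch of such a monitor, assuming labeled feedback arrives in batches. It checks two things. First, whether accuracy in the current window falls below the 90% threshold mentioned above. Second, whether an input feature's distribution has shifted, measured with the Population Stability Index (PSI). PSI is one common drift statistic among several; KS tests and KL divergence are others. The 0.2 cutoff is a widely used rule of thumb (some teams use 0.25), not a universal constant. `check_batch` and the synthetic "age" data are hypothetical.

```python
import math
import random

def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index: sum of (cur - ref) * ln(cur / ref) over bins."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0            # guard against a constant reference

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)  # clamp outliers into edge bins
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]     # eps avoids log(0)

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

ACCURACY_THRESHOLD = 0.90
PSI_THRESHOLD = 0.2   # rule-of-thumb cutoff for a significant shift

def check_batch(reference_feature, batch_feature, predictions, labels):
    """Return alert messages for one monitoring window (hypothetical helper)."""
    alerts = []
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    if accuracy < ACCURACY_THRESHOLD:
        alerts.append(f"accuracy {accuracy:.2%} below {ACCURACY_THRESHOLD:.0%}")
    drift = psi(reference_feature, batch_feature)
    if drift > PSI_THRESHOLD:
        alerts.append(f"input drift: PSI={drift:.2f}")
    return alerts

rng = random.Random(0)
train_ages = [rng.gauss(35, 8) for _ in range(5000)]    # reference (training) distribution
stable_ages = [rng.gauss(35, 8) for _ in range(1000)]   # production batch, no drift
shifted_ages = [rng.gauss(50, 8) for _ in range(1000)]  # production batch, drifted

labels = [1] * 100
good_preds = [1] * 95 + [0] * 5   # 95% accuracy
bad_preds = [1] * 80 + [0] * 20   # 80% accuracy

print(check_batch(train_ages, stable_ages, good_preds, labels))   # no alerts
print(check_batch(train_ages, shifted_ages, bad_preds, labels))   # accuracy and drift alerts
```

Checking input drift separately matters because labels often arrive late. PSI can flag a shifted input distribution before enough ground truth exists to measure accuracy.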
Key Takeaways
- MLOps is DevOps for machine learning — it covers experiment tracking, model versioning, automated testing, deployment pipelines, and monitoring. Models are living artifacts that degrade over time, so you need infrastructure to retrain and redeploy continuously.
- Experiment tracking (MLflow, W&B) logs every training run so you can reproduce results and compare models. Without it, you lose track of what worked and why.
- Model registries version models like Git versions code. Tag models for staging/production, promote new versions, and roll back instantly if something breaks.
- Data drift (input distribution changes) and concept drift (input-output relationship changes) are the silent killers of production models. Monitoring tools detect these shifts so you know when to retrain.
- ML has unique tests: data validation (schema, ranges), model performance checks (accuracy thresholds), and integration tests (API latency). CI/CD for ML automates these and blocks bad deployments.
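The registry and CI/CD points fit together. A new model version is registered, must pass data-validation, performance, and latency checks, and only then is promoted to production, with the previous version kept for rollback. The sketch below is a hypothetical in-memory illustration: `ModelRegistry`, `run_gates`, the schema, and the thresholds are invented for the example. Real registries, such as MLflow's Model Registry, keep this state in a backing store and expose it through an API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    version: int
    val_accuracy: float
    p95_latency_ms: float

@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: versions plus a movable 'production' pointer."""
    versions: dict = field(default_factory=dict)
    history: list = field(default_factory=list)   # promoted versions, oldest first

    def register(self, accuracy, latency_ms):
        v = ModelVersion(len(self.versions) + 1, accuracy, latency_ms)
        self.versions[v.version] = v
        return v

    @property
    def production(self):
        return self.versions[self.history[-1]] if self.history else None

    def promote(self, version):
        self.history.append(version)

    def rollback(self):
        """Re-point production at the previously promoted version."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.production

def validate_schema(rows, schema):
    """Data validation: every row has the expected columns, types, and ranges."""
    for row in rows:
        for col, (typ, lo, hi) in schema.items():
            value = row.get(col)
            if not isinstance(value, typ) or not lo <= value <= hi:
                return False
    return True

def run_gates(candidate, rows, schema, min_accuracy=0.90, max_latency_ms=100):
    """CI/CD gate: return the failed checks (an empty list means safe to promote)."""
    failures = []
    if not validate_schema(rows, schema):
        failures.append("data validation")
    if candidate.val_accuracy < min_accuracy:
        failures.append("accuracy threshold")
    if candidate.p95_latency_ms > max_latency_ms:
        failures.append("latency")
    return failures

schema = {"age": (int, 0, 120), "amount": (float, 0.0, 1e6)}
rows = [{"age": 34, "amount": 120.5}, {"age": 51, "amount": 9.99}]

registry = ModelRegistry()
v1 = registry.register(accuracy=0.92, latency_ms=40)
v2 = registry.register(accuracy=0.94, latency_ms=45)
v3 = registry.register(accuracy=0.87, latency_ms=30)

for candidate in (v1, v2, v3):
    failed = run_gates(candidate, rows, schema)
    if not failed:
        registry.promote(candidate.version)
    status = "promoted" if not failed else "blocked: " + ", ".join(failed)
    print(f"v{candidate.version}: {status}")

print("production:", registry.production.version)       # v3 was blocked, so v2
print("after rollback:", registry.rollback().version)   # back to v1
```

Rollback is fast because it only moves a pointer. The old model artifact never left the registry.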
Common Misconceptions
- "You train a model once and deploy it forever." — Models degrade. Data changes, user behavior shifts, and performance drops. Production ML is a continuous loop of monitoring, retraining, and redeploying. If you treat your model like static code, it will fail.
- "If it works in the notebook, it will work in production." — The notebook has clean data, infinite time, and no latency constraints. Production has messy data, strict SLAs, and users who will find edge cases you never imagined. Deployment is where many ML projects die.
- "Monitoring is just tracking accuracy." — Accuracy can stay high even when the model is broken. You need to monitor input distributions, prediction distributions, latency, error rates, and business metrics. If only 0.1% of transactions are fraudulent, a model that always predicts "no fraud" scores 99.9% accuracy, yet it catches no fraud at all and is useless.
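A quick worked example of the fraud trap, using made-up numbers: 10,000 transactions, 10 of them fraudulent (0.1%), scored by a degenerate model that always predicts "no fraud". Accuracy looks excellent, but recall (the share of actual fraud the model flags) is zero. The share of predictions flagged as fraud is also zero, which is the kind of prediction-distribution signal monitoring should watch.

```python
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def recall(preds, labels, positive=1):
    """Fraction of actual positives that the model flags."""
    flagged = sum(p == positive and y == positive for p, y in zip(preds, labels))
    actual = sum(y == positive for y in labels)
    return flagged / actual if actual else 0.0

labels = [1] * 10 + [0] * 9_990        # 1 = fraud, 0 = legitimate (0.1% fraud rate)
always_no_fraud = [0] * len(labels)    # degenerate model

print(f"accuracy: {accuracy(always_no_fraud, labels):.1%}")   # 99.9%
print(f"recall:   {recall(always_no_fraud, labels):.1%}")     # 0.0%
flag_rate = sum(always_no_fraud) / len(always_no_fraud)
print(f"share of predictions flagged as fraud: {flag_rate:.1%}")  # 0.0%
```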