[Avg. reading time: 7 minutes]
Life Before MLOps
Challenges Faced by ML Teams
Moving Models from Dev → Staging → Prod
Models were often shared as .pkl or joblib files, passed around manually.
Problem: Dependency mismatches (Python, scikit-learn versions) and fragile handoffs.
Stopgap: Packaging models into Docker images, but still manual and inconsistent.
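A minimal sketch of that handoff, assuming a toy scikit-learn model and an invented file name: the artifact is just pickled bytes, with nothing recording the Python or library versions it depends on.

```python
import pickle
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a toy model on the data scientist's laptop.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Hand-off artifact: just the bytes, no record of Python or scikit-learn versions.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# On the serving side (possibly a different machine, months later):
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

# If scikit-learn differs between the two environments, this load (or a later
# predict call) can fail or quietly misbehave; nothing in the artifact checks it.
print("serving scikit-learn version:", sklearn.__version__)
print("prediction:", loaded.predict(X[:1]))
```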
Champion vs Challenger Deployment
Teams struggled to test a new (challenger) model against the current (champion).
Problem: No controlled A/B testing or shadow deployments → risky rollouts.
Stopgap: Manual canary releases or running offline comparisons.
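A hedged sketch of the "shadow" comparison teams improvised: the champion answers live requests while the challenger is scored silently and logged for offline comparison. The scoring functions are placeholders, not real models.

```python
import random

def score_champion(features):
    # Placeholder for the model currently serving production traffic.
    return 0.72

def score_challenger(features):
    # Placeholder for the candidate model under evaluation.
    return 0.65

shadow_log = []

def handle_request(features):
    # Champion answers the live request.
    champion_score = score_champion(features)
    # Challenger runs in "shadow": scored and logged, never returned to users.
    challenger_score = score_challenger(features)
    shadow_log.append({"champion": champion_score, "challenger": challenger_score})
    return champion_score

# Simulated traffic; the two models are compared offline afterwards.
for _ in range(100):
    handle_request({"feature": random.random()})

agree = sum(1 for r in shadow_log if abs(r["champion"] - r["challenger"]) < 0.1)
print(f"close agreement on {agree}/{len(shadow_log)} requests")
```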
Model Versioning Confusion
Models saved as model_final.pkl, model_final_v2.pkl, final_final.pkl.
Problem: Nobody knew which version was truly in production.
Stopgap: Git or S3 versioning for files, but no link to experiments/data.
Inference on Wrong Model Version
Even if multiple versions existed, production systems sometimes pointed to the wrong one.
Problem: Silent failures, misaligned experiments vs prod results.
Stopgap: Hardcoding file paths or timestamps — brittle and error-prone.
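A small sketch of the hardcoded-path stopgap; the paths and file names are invented for illustration. Nothing connects the file the service loads to the experiment or data that produced it, so pointing at the wrong version goes unnoticed.

```python
import glob
import os
import pickle

# Serving code quietly pins one file; updating the model means remembering to
# edit this constant (and every other copy of it).
MODEL_PATH = "models/model_final_v2.pkl"   # hypothetical path

def load_production_model():
    with open(MODEL_PATH, "rb") as f:
        return pickle.load(f)

# The "newest file wins" variant is just as brittle: a timestamp says nothing
# about which experiment, data snapshot, or code commit produced the artifact.
def load_latest_model():
    candidates = glob.glob("models/*.pkl")
    newest = max(candidates, key=os.path.getmtime)
    with open(newest, "rb") as f:
        return pickle.load(f)
```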
Train vs Serve Skew (Data-Model Mismatch)
Preprocessing done in notebooks was rewritten differently in prod code.
Problem: The same model behaved differently in production.
Stopgap: Copy-paste code snippets, but no guarantee of sync.
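A hedged illustration of the skew: the notebook standardizes with statistics computed on the training data, while the re-implemented production code scales differently, so the "same" model sees different inputs. The numbers and function names are made up for the example.

```python
import numpy as np

train = np.array([[10.0], [20.0], [30.0], [40.0]])

# Notebook version: standardize using training mean/std (what the model saw).
train_mean, train_std = train.mean(), train.std()

def preprocess_notebook(x):
    return (x - train_mean) / train_std

# Production re-implementation: someone "simplified" it to min-max scaling.
def preprocess_prod(x):
    return (x - train.min()) / (train.max() - train.min())

x = np.array([[15.0]])
print("features at training time:", preprocess_notebook(x))  # ~[[-0.89]]
print("features at serving time: ", preprocess_prod(x))      # ~[[0.17]]
# Same raw input, different model input -> predictions silently diverge.
```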
Experiment Tracking Chaos
Results scattered across notebooks, Slack messages, spreadsheets.
Problem: Couldn’t reproduce “that good accuracy we saw last week.”
Stopgap: Manually logging metrics in Excel or text files.
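A sketch of that manual stopgap: appending each run's metrics to a shared CSV by hand. The column names and values are illustrative; the point is that nothing ties a row back to the exact code, data, or environment that produced it.

```python
import csv
import os
from datetime import datetime, timezone

def log_run(path, params, accuracy):
    """Append one experiment's result to a shared CSV (the 'Excel' stopgap)."""
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "params": str(params),
        "accuracy": accuracy,
        # Missing: code commit, data version, environment, random seed ...
    }
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if write_header:
            writer.writeheader()
        writer.writerow(row)

log_run("experiments.csv", {"C": 1.0, "max_iter": 200}, accuracy=0.91)
```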
Reproducibility Issues
Same code/data gave different results on different machines.
Problem: No control of data versions, package dependencies, or random seeds.
Stopgap: Virtualenvs, requirements.txt — helped a bit but not full reproducibility.
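A sketch of those partial fixes, assuming an arbitrary seed and package list: pinning dependencies and fixing seeds narrows the gap but leaves data versions, GPU nondeterminism, and OS-level differences uncontrolled.

```python
import random

import numpy as np

# Partial fix 1: pin the environment in requirements.txt, e.g.
#   numpy==1.26.4
#   scikit-learn==1.3.2
# Partial fix 2: fix every random seed you can find.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
# Frameworks add their own seeds (e.g. torch.manual_seed), and data snapshots,
# GPU kernels, and OS/library differences remain outside this script's control.

sample = np.random.rand(3)
print(sample)  # identical across runs on the same machine and library versions
```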
Lack of Monitoring in Production
Once deployed, no one knew if the model degraded over time.
Problem: Models silently failed due to data drift or concept drift.
Stopgap: Occasional manual performance checks, but no automation.
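A minimal sketch of such a manual check: comparing one live feature's distribution against its training distribution with a two-sample Kolmogorov-Smirnov test from scipy. The data and threshold are illustrative, and before MLOps tooling this kind of check was typically run ad hoc rather than on a schedule.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Feature values the model was trained on vs. values seen in production.
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
live_feature = rng.normal(loc=0.4, scale=1.2, size=5_000)   # drifted

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"possible data drift (KS statistic={stat:.3f}, p={p_value:.2g})")
else:
    print("no drift detected on this feature")
```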
Scaling & Performance Gaps
Models trained in notebooks failed under production loads.
Problem: Couldn’t handle large-scale data or real-time inference.
Stopgap: Batch scoring jobs on cron — but too slow for real-time use cases.
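A sketch of the cron-driven stopgap: a script that loads the pickled model, scores a nightly dump, and writes results back. File names are invented; a crontab entry like the one in the comment ran it overnight, which works for reports but never for per-request latency.

```python
# batch_score.py -- run nightly via cron, e.g.:
#   0 2 * * * /usr/bin/python3 /opt/jobs/batch_score.py
import pickle

import pandas as pd

MODEL_PATH = "models/model_final_v2.pkl"     # hypothetical artifact
INPUT_PATH = "exports/customers_latest.csv"  # hypothetical nightly dump
OUTPUT_PATH = "exports/scores_latest.csv"

with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

features = pd.read_csv(INPUT_PATH)
features["score"] = model.predict_proba(features)[:, 1]
features.to_csv(OUTPUT_PATH, index=False)
# Scores are already hours old when anyone reads them; no path to real-time use.
```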
Collaboration Breakdowns
Data Scientists, Engineers, Ops worked in silos.
Problem: Miscommunication → wrong datasets, broken pipelines, delays.
Stopgap: Jira tickets and handovers — but still slow and error-prone.
Governance & Compliance Gaps
No audit trail of which model made which prediction.
Problem: Risky for regulated domains (finance, healthcare).
Stopgap: Manual logging of predictions — incomplete and unreliable.
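A sketch of that hand-rolled audit trail: one JSON line per prediction with whatever identifying information the service happened to have. Field names are illustrative, and the model_version field is only as trustworthy as whatever string someone remembered to bump.

```python
import json
from datetime import datetime, timezone

def log_prediction(request_id, features, prediction, model_version,
                   path="predictions.log"):
    """Append an audit record for one prediction (hand-rolled, easy to forget)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "model_version": model_version,   # free text, not tied to any registry
        "features": features,
        "prediction": prediction,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_prediction("req-001", {"age": 42, "income": 51000},
               prediction=1, model_version="final_v2")
```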