Machine Learning Engineering Standards
This skill covers the end-to-end lifecycle of building and deploying ML models, from simple regressions to complex neural networks.
1. Problem Framing
- Supervised vs Unsupervised: Do we have labeled data (e.g., patient outcomes)?
- Regression vs Classification: Are we predicting a number (blood sugar level) or a category (Risk: High/Low)?
- Baseline: Always establish a dump heuristic baseline (e.g., "Predict the average") before training a model. If your model doesn't beat the average, it's useless.
2. Data Engineering (Feature Store)
- Garbage In, Garbage Out: 80% of ML is data cleaning.
- Normalization: Scale inputs (0-1 or -1 to 1). Neural networks fail with unscaled data.
- Categorical Encoding: One-Hot Encoding vs Embeddings.
- Splitting: STRICT separation of Train / Validation / Test sets to avoid data leakage.
3. Model Selection Strategy
- Tabular Data (Excel, SQL): XGBoost / LightGBM / CatBoost usually beat Deep Learning.
- Unstructured Data (Images, Text): Deep Learning (Transformers, CNNs).
- Start Simple: Logistic Regression -> Random Forest -> Gradient Boosting -> Neural Net. Don't jump to Deep Learning immediately.
4. MLOps (Deployment)
- Model format: ONNX is the universal standard for portability.
- Serving:
- Realtime: API (FastAPI) wrapping the
model.predict(). - Batch: Nightly jobs processing thousands of rows.
- Realtime: API (FastAPI) wrapping the
- Drift Monitoring: Models rot. Monitor the input distribution. If inputs change (e.g., "Patient age range changed"), retrain.
5. Evaluation Metrics
- Accuracy is misleading (especially in imbalanced medical data).
- Use Precision (False Positives matter?) vs Recall (False Negatives matter?).
- For medical screening, Recall usually wins (better to have a false alarm than miss a diagnosis).
