ML-PRO Cheatsheet — Production ML on Databricks (Features, Registry, Deployment, Monitoring)

Last-mile ML-PRO review: feature pipeline patterns, MLflow registry and promotion workflows, batch vs online deployment pickers, monitoring/drift decision rules, and governance essentials.

Use this for last‑mile review. Pair it with the Syllabus for coverage and Practice to validate production judgment.


1) The “production ML loop” (what the exam is testing)

    flowchart LR
      FE["Feature pipeline"] --> TR["Train + evaluate"]
      TR --> RUN["MLflow run (params/metrics/artifacts)"]
      RUN --> REG["Registry version"]
      REG --> DEP["Deploy (batch/online)"]
      DEP --> MON["Monitor + drift"]
      MON -->|retrain| FE

Exam rule: if a solution lacks versioning, lineage, or rollback, it’s rarely correct.
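
A minimal sketch of the run-to-registry handoff in MLflow, assuming a scikit-learn model and an already-configured tracking server; the toy dataset and the registered model name `churn_model` are illustrative.

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Toy data stands in for the output of the feature pipeline.
    X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

    with mlflow.start_run():
        model = RandomForestClassifier(n_estimators=200, random_state=42)
        model.fit(X_train, y_train)

        # Params, metrics, and artifacts live on the run (lineage).
        mlflow.log_param("n_estimators", 200)
        mlflow.log_metric("val_auc", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))

        # Registering creates an auditable registry version tied to this run,
        # which is what deployment and rollback later operate on.
        mlflow.sklearn.log_model(model, "model", registered_model_name="churn_model")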


2) Feature pipelines: consistency beats cleverness

| Risk | Symptom | Mitigation |
| --- | --- | --- |
| Training/serving skew | Production metrics collapse | Shared transforms; enforce schema |
| Leakage | Unrealistically good offline metrics | Time-aware splits; careful feature design |
| Drift | Model degrades over time | Monitor distributions and outcomes |
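
One common way to prevent training/serving skew is to bundle the transforms with the model so both paths run identical code; a minimal sketch using a scikit-learn Pipeline logged to MLflow (column names and the toy training frame are illustrative).

    import mlflow
    import mlflow.sklearn
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    numeric_cols = ["tenure_days", "monthly_spend"]   # illustrative features
    categorical_cols = ["plan_type"]

    # Scaling and encoding live inside the artifact, not in ad hoc notebook code,
    # so batch and online scoring apply exactly what training applied.
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    pipeline = Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])

    # Toy frame standing in for the shared feature pipeline's output.
    train_df = pd.DataFrame({
        "tenure_days": [10, 200, 35, 400],
        "monthly_spend": [20.0, 55.5, 12.3, 80.0],
        "plan_type": ["basic", "pro", "basic", "pro"],
        "churned": [0, 1, 0, 1],
    })
    pipeline.fit(train_df[numeric_cols + categorical_cols], train_df["churned"])

    with mlflow.start_run():
        mlflow.sklearn.log_model(pipeline, "model")

Enforcing an input schema at the boundary (for example, via MLflow model signatures) catches the "wrong schema" failures from section 6 before they reach scoring.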

3) Registry and release workflows (high-yield)

| Concept | Why it matters |
| --- | --- |
| Registry versions | Stable, auditable artifacts |
| Stage transitions | Controlled promotion and rollback |
| Approval gates | Reduce “accidental production” |

One-sentence heuristic: runs are for experiments; the registry is for releases.
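
A minimal promotion/rollback sketch with the MLflow client; the model name, version numbers, and alias are illustrative. Recent MLflow releases favor aliases over the older stage transitions, which are deprecated but still common in legacy workspaces.

    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    name = "churn_model"  # illustrative registered model name

    # Inspect the candidate version (and its source run) before promoting it.
    candidate = client.get_model_version(name=name, version="3")
    print(candidate.run_id, candidate.status)

    # Alias-based promotion: point "champion" at version 3.
    client.set_registered_model_alias(name=name, alias="champion", version="3")

    # Rollback is just re-pointing the alias at the previous known-good version.
    client.set_registered_model_alias(name=name, alias="champion", version="2")

    # Older stage-based workflow, still seen in exam material:
    client.transition_model_version_stage(name=name, version="3", stage="Production")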


4) Deployment pickers (batch vs online)

| Requirement | Prefer | Why |
| --- | --- | --- |
| Low latency per request | Online serving | Synchronous request/response path |
| High-throughput scoring | Batch inference | Cost-efficient at scale |
| Frequent model updates | Managed rollout + rollback | Reduces release risk |
| Strict governance | Versioned registry releases | Auditability |
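
A minimal batch-scoring sketch on Spark, assuming a Databricks/Spark session and a released registry version; the model URI and table names are illustrative.

    import mlflow.pyfunc
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import struct

    spark = SparkSession.builder.getOrCreate()

    # Load the released version as a Spark UDF so scoring scales with the cluster.
    predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn_model/3")

    features = spark.table("ml.churn_features")          # illustrative feature table
    feature_cols = [c for c in features.columns if c != "customer_id"]

    scored = features.withColumn("prediction", predict_udf(struct(*feature_cols)))

    # Persist scores for downstream consumers; batch is the cost-efficient path
    # when per-row latency does not matter.
    scored.write.mode("overwrite").saveAsTable("ml.churn_scores")

For the low-latency row, the same registry version is instead deployed behind a request/response endpoint rather than scored as a job.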

5) Monitoring decision rules (what to do when metrics drop)

| Observation | First question | Likely action |
| --- | --- | --- |
| Gradual degradation | Data drift? Seasonality? | Retrain / update features |
| Sudden drop | Pipeline break? Schema change? | Roll back or fix upstream |
| Only one segment affected | Sampling bias? | Segment monitoring + targeted fix |
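
One simple way to operationalize the "gradual degradation" row is a per-feature drift check against a training-time baseline; a minimal sketch using a two-sample Kolmogorov–Smirnov test (the helper name, threshold, and synthetic data are illustrative, not an official rule).

    import numpy as np
    from scipy.stats import ks_2samp

    def drift_report(baseline: np.ndarray, current: np.ndarray, alpha: float = 0.01) -> dict:
        """Compare one feature's production window against its training baseline."""
        result = ks_2samp(baseline, current)
        return {
            "ks_stat": float(result.statistic),
            "p_value": float(result.pvalue),
            "drifted": result.pvalue < alpha,
        }

    rng = np.random.default_rng(0)
    baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time distribution
    current = rng.normal(loc=0.4, scale=1.0, size=5_000)   # shifted production window

    print(drift_report(baseline, current))
    # A drift flag alone is not a retrain trigger: rule out upstream pipeline or
    # schema changes first (section 6), then retrain if the shift is genuine.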

6) Fast troubleshooting pickers

  • Can’t reproduce model: missing data/code versioning, randomness, or preprocess mismatch (see the sketch below).
  • Model works in staging, fails in prod: feature skew, missing preprocessing, wrong schema.
  • Drift alarms firing: confirm data pipeline changes and validate feature distributions.
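
For the first bullet, a minimal sketch of what closing the reproducibility gap usually looks like in MLflow terms: pin seeds and record data/code identifiers on the run (the table name, data version, and commit below are illustrative).

    import random

    import mlflow
    import numpy as np

    SEED = 42
    random.seed(SEED)
    np.random.seed(SEED)

    with mlflow.start_run():
        # Log everything needed to rebuild this exact run later.
        mlflow.log_param("seed", SEED)
        mlflow.set_tags({
            "data.table": "ml.churn_features",  # illustrative source table
            "data.version": "17",               # e.g. a Delta table version
            "code.git_sha": "abc1234",          # illustrative commit
        })
        # ... train, evaluate, and log the model as in section 1 ...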