ML-ASSOC Cheatsheet — MLflow, Features, Training & Evaluation on Databricks

Last-mile ML-ASSOC review: feature engineering patterns, train/test discipline, MLflow tracking and registry concepts, and evaluation pickers. Includes code snippets, tables, and diagrams.

Pair it with the Syllabus for coverage and the Practice questions to validate instincts.


1) The MLflow mental model (what goes where)

| MLflow concept | What it stores | Why it matters |
|---|---|---|
| Run | one training/eval attempt | compare experiments reproducibly |
| Params | hyperparameters/config | explain how a run was produced |
| Metrics | evaluation numbers | rank candidates |
| Artifacts | model files, plots, reports | reproduce and deploy |
| Registry | model versions + lifecycle stages | controlled promotion to production |
```mermaid
flowchart LR
  D["Data"] --> FE["Feature engineering"]
  FE --> TR["Train"]
  TR --> R["MLflow run (params/metrics/artifacts)"]
  R --> REG["Model Registry"]
  REG --> DEP["Deploy (batch/real-time)"]
```
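How these pieces surface in code: once runs exist, you can query and rank them by metric. A minimal sketch, assuming runs have already been logged; the experiment name is a hypothetical placeholder.

```python
import mlflow

# Query logged runs as a DataFrame and rank candidates by a metric.
# "/Shared/mlassoc-demo" is a hypothetical experiment name.
runs = mlflow.search_runs(
    experiment_names=["/Shared/mlassoc-demo"],
    order_by=["metrics.auc DESC"],
)
print(runs[["run_id", "params.max_depth", "metrics.auc"]].head())
```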

2) Feature engineering quick rules (avoid leakage)

| Risk | What it looks like | Safer approach |
|---|---|---|
| Leakage | features use future info | compute features using only info available at prediction time |
| Label leakage | feature derived from target | drop/shift the feature; verify the pipeline |
| Train/test contamination | stats computed on full dataset | fit transforms on train only |
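One way to enforce "fit transforms on train only" is to wrap preprocessing and model in a single scikit-learn Pipeline, so transform statistics are learned from training data alone. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The scaler is fit on the training split only; test-set statistics
# never leak into the transform.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))
```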

3) Metrics pickers (high-yield)

| Task | Common metrics | Notes |
|---|---|---|
| Classification | accuracy, precision/recall, F1, AUC | beware class imbalance |
| Regression | RMSE, MAE, R² | choose based on error sensitivity |

Rule: If the prompt mentions imbalance or false positives/negatives, accuracy is rarely the right answer.
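To see why, here is a sketch of an imbalanced case where accuracy looks strong while recall collapses (synthetic labels, illustrative only):

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# 95 negatives, 5 positives; the model predicts "negative" for everything.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95, looks great
print(recall_score(y_true, y_pred))    # 0.0, misses every positive
print(f1_score(y_true, y_pred))        # 0.0
```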


4) MLflow tracking: minimal code pattern

```python
import mlflow
import mlflow.sklearn

# Assumes `model` is an already-fitted scikit-learn estimator.
with mlflow.start_run():
    mlflow.log_param("max_depth", 8)             # how the run was produced
    mlflow.log_metric("auc", 0.91)               # how the run ranks
    mlflow.log_artifact("confusion_matrix.png")  # supporting files
    mlflow.sklearn.log_model(model, "model")     # the deployable model
```

Exam cue: if you need reproducibility, log params + metrics + model artifact in the run.
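A related pattern worth knowing: autologging. For supported libraries, MLflow can capture params, metrics, and the model without explicit log calls. A minimal sketch, assuming scikit-learn:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Autologging instruments supported libraries so that fit() records
# params, training metrics, and the model automatically.
mlflow.sklearn.autolog()

X, y = make_classification(random_state=0)
with mlflow.start_run():
    RandomForestClassifier(max_depth=8, random_state=0).fit(X, y)
```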


5) Registry basics (versioning + promotion)

| Step | What happens | Why it matters |
|---|---|---|
| Register model | creates a named model with versions | stable reference |
| New version | produced from a run/model artifact | traceability |
| Promote stage | e.g., staging → production | controlled rollout |
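A minimal sketch of the register-then-promote flow, assuming a run has already logged a model; the model name is hypothetical and `<run_id>` is a placeholder you would fill in.

```python
import mlflow
from mlflow import MlflowClient

# Register the model logged under an existing run (placeholder run ID).
result = mlflow.register_model("runs:/<run_id>/model", "churn_model")

# Promote the new version using Workspace Model Registry stages.
client = MlflowClient()
client.transition_model_version_stage(
    name="churn_model",
    version=result.version,
    stage="Production",
)
```

On Unity Catalog-backed registries, lifecycle stages are replaced by model version aliases; the stage transition above reflects the classic Workspace Model Registry.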

6) Fast troubleshooting pickers

  • Can’t reproduce a result: missing params/artifacts, data version drift, or uncontrolled randomness (see the sketch after this list).
  • Metrics look too good: leakage, wrong split, target in features.
  • Model “works” offline but not in production: training/serving feature skew, missing preprocessing, inconsistent schema.
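For the first picker, a sketch of the usual reproducibility fixes: pin and log the seed, and log a fingerprint of the training data so data drift between runs is detectable. The hashing scheme here is an illustrative choice, not an MLflow requirement.

```python
import hashlib

import mlflow
import numpy as np
import pandas as pd

SEED = 42
np.random.seed(SEED)  # control randomness

df = pd.DataFrame({"x": np.random.rand(100)})  # stand-in training data

with mlflow.start_run():
    mlflow.log_param("seed", SEED)
    # Fingerprint the exact training data used for this run.
    digest = hashlib.sha256(pd.util.hash_pandas_object(df).values.tobytes())
    mlflow.log_param("data_sha256", digest.hexdigest())
```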