ML-ASSOC Cheatsheet — MLflow, Features, Training & Evaluation on Databricks

Last-mile ML-ASSOC review: feature engineering patterns, train/test discipline, MLflow tracking and registry concepts, and evaluation-metric pickers. Includes code snippets, tables, and diagrams.

Use this for last‑mile review. Pair it with the Syllabus for coverage and Practice to validate instincts.


1) The MLflow mental model (what goes where)

| MLflow concept | What it stores | Why it matters |
|---|---|---|
| Run | one training/eval attempt | compare experiments reproducibly |
| Params | hyperparameters/config | explain how a run was produced |
| Metrics | evaluation numbers | rank candidates |
| Artifacts | model files, plots, reports | reproduce and deploy |
| Registry | model versions + lifecycle stages | controlled promotion to production |

```mermaid
flowchart LR
  D["Data"] --> FE["Feature engineering"]
  FE --> TR["Train"]
  TR --> R["MLflow run (params/metrics/artifacts)"]
  R --> REG["Model Registry"]
  REG --> DEP["Deploy (batch/real-time)"]
```
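
To make the run/params/metrics split concrete, here is a minimal sketch of comparing logged runs with `mlflow.search_runs`; the experiment name, metric, and param column are hypothetical placeholders.

```python
import mlflow

# Assumes an experiment named "churn-model" already contains logged runs (hypothetical name).
runs = mlflow.search_runs(experiment_names=["churn-model"])

# Each row is one run; params and metrics come back as "params.*" / "metrics.*" columns.
best = runs.sort_values("metrics.auc", ascending=False).head(3)
print(best[["run_id", "params.max_depth", "metrics.auc"]])
```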

2) Feature engineering quick rules (avoid leakage)

| Risk | What it looks like | Safer approach |
|---|---|---|
| Leakage | features use future info | compute features using only info available at prediction time |
| Label leakage | feature derived from target | drop/shift feature; verify pipeline |
| Train/test contamination | stats computed on full dataset | fit transforms on train only |
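
As a sketch of the "fit transforms on train only" rule: the scaler below learns its statistics from the training split and only transforms the test split. The columns and data are illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny synthetic example; real feature tables come from your feature pipeline.
df = pd.DataFrame({
    "tenure": [1, 12, 24, 36, 48, 60],
    "monthly_charges": [70.0, 55.5, 80.2, 65.0, 90.1, 45.3],
    "churned": [1, 0, 0, 1, 0, 1],
})
X, y = df[["tenure", "monthly_charges"]], df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics learned on the train split only
X_test_scaled = scaler.transform(X_test)        # test data is transformed, never fitted
```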

3) Metrics pickers (high-yield)

| Task | Common metrics | Notes |
|---|---|---|
| Classification | accuracy, precision/recall, F1, AUC | beware class imbalance |
| Regression | RMSE, MAE, R² | choose based on error sensitivity |

Rule: If the prompt mentions imbalance or false positives/negatives, accuracy is rarely the right answer.
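
A minimal sketch of computing the classification metrics above with scikit-learn. The labels and predictions are illustrative: with this imbalance, accuracy looks high while recall exposes the weak minority-class performance.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Illustrative imbalanced labels: the model predicts the majority class almost every time.
y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_score = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3, 0.9, 0.4]  # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))   # high, but mostly reflects the majority class
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))     # only half of the positives are caught
print("f1       :", f1_score(y_true, y_pred))
print("auc      :", roc_auc_score(y_true, y_score))
```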


4) MLflow tracking: minimal code pattern

```python
import mlflow
import mlflow.sklearn

# Assumes `model` is an already-fitted scikit-learn estimator and that
# "confusion_matrix.png" exists on the local filesystem.
with mlflow.start_run():
    mlflow.log_param("max_depth", 8)
    mlflow.log_metric("auc", 0.91)
    mlflow.log_artifact("confusion_matrix.png")
    mlflow.sklearn.log_model(model, "model")
```

Exam cue: if you need reproducibility, log params + metrics + model artifact in the run.
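
To close the reproducibility loop, the model logged above can be reloaded straight from its run; the run ID is a placeholder, and `X_test` stands in for whatever held-out features you evaluate on.

```python
import mlflow.pyfunc

# "runs:/<run_id>/model" points at the "model" artifact logged above; <run_id> is a placeholder.
loaded = mlflow.pyfunc.load_model("runs:/<run_id>/model")
preds = loaded.predict(X_test)  # X_test: your held-out feature set
```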


5) Registry basics (versioning + promotion)

| Step | What happens | Why it matters |
|---|---|---|
| Register model | creates a named model with versions | stable reference |
| New version | produced from a run/model artifact | traceability |
| Promote stage | e.g., staging → production | controlled rollout |
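
A minimal sketch of registering a run's model and promoting a version, assuming the workspace Model Registry; the model name and run ID are placeholders, and Unity Catalog registries use aliases rather than stages.

```python
import mlflow
from mlflow.tracking import MlflowClient

# Register the model artifact from a finished run; <run_id> and "churn-model" are placeholders.
result = mlflow.register_model("runs:/<run_id>/model", "churn-model")

# Promote the new version (workspace registry stages; Unity Catalog uses aliases instead).
client = MlflowClient()
client.transition_model_version_stage(
    name="churn-model",
    version=result.version,
    stage="Production",
)
```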

6) Fast troubleshooting pickers

  • Can’t reproduce a result: missing params/artifacts, data version drift, or randomness not controlled (see the sketch after this list).
  • Metrics look too good: leakage, wrong split, target in features.
  • Model “works” offline but not in production: skewed features, missing preprocessing, inconsistent schema.
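
For the reproducibility bullet, a minimal sketch of pinning randomness and recording it in the run; the seed, data-version string, and params are illustrative.

```python
import random
import numpy as np
import mlflow

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

with mlflow.start_run():
    # Record everything needed to recreate the run: seed, data snapshot, and config.
    mlflow.log_params({
        "seed": SEED,
        "data_version": "2024-06-01",  # illustrative snapshot/Delta version identifier
        "max_depth": 8,
    })
```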