GENAI-ASSOC Cheatsheet — RAG, Vector Search, Evaluation & Deployment on Databricks

Last-mile GENAI-ASSOC review: embeddings and chunking, vector search relevance, RAG prompt patterns, evaluation loops, and production trade-offs (cost, latency, governance).

Use this for last-mile review. Pair it with the Syllabus and practice drills.


1) The RAG pipeline (the canonical mental model)

    flowchart LR
      DOC["Docs"] --> CH["Chunk + clean"]
      CH --> EMB["Embeddings"]
      EMB --> IDX["Vector index"]
      Q["User query"] --> QEMB["Query embedding"]
      QEMB --> RET["Retrieve top-k"]
      RET --> PROMPT["Prompt with context"]
      PROMPT --> LLM["LLM"]
      LLM --> OUT["Answer + citations"]

Rule: good RAG is mostly data + retrieval quality, not clever prompts.
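
The sketch below walks the same flow end to end with a toy in-memory index. The bag-of-words "embedding", the two-chunk corpus, and the prompt template are stand-ins; a real pipeline would call an embedding model, a managed vector index, and an LLM endpoint.

    # Toy, self-contained walk-through of the pipeline above.
    from collections import Counter
    import math

    def embed(text: str) -> Counter:
        # stand-in "embedding": lower-cased token counts
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # Chunk + embed docs, build the "index"
    chunks = ["Unity Catalog governs tables and models.",
              "Vector Search answers similarity queries over embeddings."]
    index = [(c, embed(c)) for c in chunks]

    # Embed the query, retrieve top-k, assemble the grounded prompt
    query = "How do I run similarity queries over embeddings?"
    q_emb = embed(query)
    top_k = sorted(index, key=lambda item: cosine(q_emb, item[1]), reverse=True)[:1]
    context = "\n".join(chunk for chunk, _ in top_k)
    prompt = f"Answer only from this context:\n{context}\n\nQuestion: {query}"
    print(prompt)  # in production this prompt is sent to the LLM endpoint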


2) Chunking and embeddings (high-yield decision points)

| Decision   | Trade-off              | Rule of thumb                                                   |
|------------|------------------------|-----------------------------------------------------------------|
| Chunk size | recall vs. precision   | chunks should fit the model context with room for instructions  |
| Overlap    | redundancy vs. cost    | small overlap helps continuity                                  |
| Metadata   | filtering and security | store source, date, tenant, access tags                         |
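
A minimal chunking sketch under those rules of thumb: fixed-size character chunks with a small overlap, each carrying the metadata fields from the table. The sizes, tenant value, and field names are placeholders to tune per corpus.

    def chunk_text(text, source, chunk_size=500, overlap=50):
        # Fixed-size chunks with `overlap` characters repeated between
        # neighbours so content split at a boundary stays recoverable.
        chunks = []
        step = chunk_size - overlap
        for start in range(0, len(text), step):
            piece = text[start:start + chunk_size]
            if piece.strip():
                chunks.append({
                    "text": piece,
                    "source": source,              # enables citations
                    "tenant": "acme",              # assumed access/filter tag
                    "ingested_at": "2025-01-01",   # assumed freshness field
                })
        return chunks

    print(len(chunk_text("lorem ipsum " * 200, source="handbook.pdf")), "chunks")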

3) Retrieval relevance (why results look wrong)

Common causes:

  • poor chunking (too big/too small)
  • missing metadata filters (wrong tenant/version; see the filter sketch after this list)
  • query mismatch (user question needs reformulation)
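
When missing filters are the cause, the fix is usually a metadata filter at query time. Below is a hedged sketch using the databricks-vectorsearch client; the endpoint, index, column, and filter names are placeholders, and the exact similarity_search signature (especially how filters are passed) should be confirmed against the SDK version in use.

    from databricks.vector_search.client import VectorSearchClient

    client = VectorSearchClient()
    index = client.get_index(
        endpoint_name="rag_endpoint",           # assumed endpoint name
        index_name="main.docs.chunks_index",    # assumed Unity Catalog index
    )

    results = index.similarity_search(
        query_text="How do I rotate service credentials?",
        columns=["text", "source", "tenant", "doc_version"],
        # missing/wrong filters here is what produces wrong-tenant or stale hits
        filters={"tenant": "acme", "doc_version": "v2"},
        num_results=5,
    )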

4) Evaluation loop (production-safe approach)

| What to test      | Examples                              |
|-------------------|---------------------------------------|
| Retrieval quality | top-k hit rate, groundedness          |
| Answer quality    | correctness, citation quality         |
| Safety            | leakage, prompt injection resilience  |
| Regression        | keep a fixed eval set                 |
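
A minimal version of the "fixed eval set + retrieval quality" rows: score a frozen list of question/expected-source pairs with a top-k hit rate on every pipeline change. retrieve() is a placeholder to swap for the real retriever.

    eval_set = [  # frozen regression set: question + the source that should be retrieved
        {"question": "How do I rotate credentials?", "expected_source": "security.md"},
        {"question": "What does Unity Catalog govern?", "expected_source": "governance.md"},
    ]

    def retrieve(question: str, k: int = 5) -> list[dict]:
        # placeholder retriever; mimics rows that carry a "source" column
        return [{"source": "governance.md"}, {"source": "security.md"}]

    def top_k_hit_rate(examples, k: int = 5) -> float:
        hits = 0
        for ex in examples:
            sources = {row["source"] for row in retrieve(ex["question"], k)}
            hits += ex["expected_source"] in sources
        return hits / len(examples)

    print(f"top-5 hit rate: {top_k_hit_rate(eval_set):.2f}")  # track per release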

5) Cost/latency controls (exam-friendly)

  • Cache embeddings and reuse indexes (see the cache sketch after this list).
  • Use metadata filters to reduce candidate set.
  • Limit top-k and context length intentionally.
  • Monitor token usage and tail latency.
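
A minimal version of the first bullet: key embeddings by a content hash so re-ingesting unchanged chunks never re-calls the billed embedding endpoint. embed_remote() is a stand-in for that call.

    import hashlib

    _cache: dict[str, list[float]] = {}

    def embed_remote(text: str) -> list[float]:
        # stand-in for the real, billed embedding-endpoint call
        return [float(len(text))]

    def embed_cached(text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in _cache:                 # only unseen content is embedded
            _cache[key] = embed_remote(text)
        return _cache[key]

    embed_cached("same chunk")
    embed_cached("same chunk")
    print("cache entries:", len(_cache))      # 1 -> the second call cost nothing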