GENAI-ASSOC Cheatsheet — RAG, Vector Search, Evaluation & Deployment on Databricks

Last-mile GENAI-ASSOC review: embeddings and chunking, vector search relevance, RAG prompt patterns, evaluation loops, and production trade-offs (cost, latency, governance).

Use this for last-mile review; pair it with the Syllabus and practice drills.


1) The RAG pipeline (the canonical mental model)

    flowchart LR
      DOC["Docs"] --> CH["Chunk + clean"]
      CH --> EMB["Embeddings"]
      EMB --> IDX["Vector index"]
      Q["User query"] --> QEMB["Query embedding"]
      QEMB --> RET["Retrieve top-k"]
      RET --> PROMPT["Prompt with context"]
      PROMPT --> LLM["LLM"]
      LLM --> OUT["Answer + citations"]

Rule: good RAG is mostly data + retrieval quality, not clever prompts.
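
A minimal retrieve-then-generate sketch of this flow is below; embed_fn, search_fn, and llm_fn are hypothetical callables standing in for whatever embedding model, vector index, and chat model you actually deploy.

    # Minimal sketch of the retrieve-then-generate flow above.
    # embed_fn, search_fn, and llm_fn are hypothetical placeholders for your
    # embedding model, vector index, and chat model.
    from typing import Callable, List

    def answer(question: str,
               embed_fn: Callable[[str], List[float]],
               search_fn: Callable[[List[float], int], List[dict]],
               llm_fn: Callable[[str], str],
               top_k: int = 5) -> str:
        q_vec = embed_fn(question)                        # query embedding
        hits = search_fn(q_vec, top_k)                    # retrieve top-k chunks
        context = "\n\n".join(h["text"] for h in hits)    # assemble grounded context
        prompt = ("Answer using only the context below and cite your sources.\n\n"
                  f"Context:\n{context}\n\nQuestion: {question}")
        return llm_fn(prompt)                             # answer + citations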


2) Chunking and embeddings (high-yield pickers)

Decision    | Trade-off              | Rule of thumb
Chunk size  | recall vs precision    | chunks should fit the model context with room for instructions
Overlap     | redundancy vs cost     | small overlap helps continuity
Metadata    | filtering and security | store source, date, tenant, access tags
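
A minimal sketch of the chunk size and overlap rows above: fixed-size chunking with a small overlap. The character-based sizes are illustrative placeholders; production splitters usually work on tokens or sentence boundaries and attach metadata to each chunk.

    # Fixed-size chunking with a small overlap. Sizes are in characters purely
    # for illustration; token- or sentence-aware splitting follows the same pattern.
    def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
        assert 0 <= overlap < chunk_size, "overlap must be smaller than chunk_size"
        step = chunk_size - overlap                 # small overlap preserves continuity
        chunks = []
        for start in range(0, len(text), step):
            chunk = text[start:start + chunk_size]
            if chunk.strip():                       # skip empty or whitespace-only tails
                chunks.append(chunk)
        return chunks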

3) Retrieval relevance (why results look wrong)

Common causes:

  • poor chunking (too big/too small)
  • missing metadata filters (wrong tenant/version; see the filter sketch after this list)
  • query mismatch (user question needs reformulation)
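
As an example of the second cause, here is a hedged sketch of tenant- and version-scoped retrieval with the databricks-vectorsearch client. The endpoint, index, and column names and the exact filters format are assumptions; verify them against the current SDK docs.

    # Sketch: scope retrieval to the right tenant and document version.
    # Endpoint/index/column names and the filters dict shape are assumptions;
    # check the current databricks-vectorsearch documentation.
    from databricks.vector_search.client import VectorSearchClient

    index = VectorSearchClient().get_index(
        endpoint_name="rag_endpoint",               # hypothetical endpoint
        index_name="main.rag.docs_index",           # hypothetical index
    )
    results = index.similarity_search(
        query_text="How do I rotate service credentials?",
        columns=["chunk_text", "source", "tenant"],
        filters={"tenant": "acme", "doc_version": "2024-06"},
        num_results=5,
    )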

4) Evaluation loop (production-safe approach)

What to test      | Examples
Retrieval quality | top-k hit rate, groundedness
Answer quality    | correctness, citation quality
Safety            | leakage, prompt injection resilience
Regression        | keep a fixed eval set
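
A minimal regression-style check for the first and last rows above: top-k hit rate over a fixed eval set. The eval item fields and the search_fn callable are hypothetical; in practice you track this alongside LLM-judged groundedness and correctness.

    # Top-k hit rate over a fixed, version-controlled eval set.
    # Each item maps a question to the chunk id it should retrieve; search_fn is a
    # hypothetical callable returning retrieved chunk ids in rank order.
    from typing import Callable, Dict, List

    def top_k_hit_rate(eval_set: List[Dict],
                       search_fn: Callable[[str, int], List[str]],
                       k: int = 5) -> float:
        hits = sum(1 for item in eval_set
                   if item["expected_chunk_id"] in search_fn(item["question"], k))
        return hits / len(eval_set)

    # Usage: fail the build if retrieval quality regresses.
    # assert top_k_hit_rate(eval_set, search_fn) >= 0.80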

5) Cost/latency controls (exam-friendly)

  • Cache embeddings and reuse indexes (see the sketch after this list).
  • Use metadata filters to reduce candidate set.
  • Limit top-k and context length intentionally.
  • Monitor token usage and tail latency.
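
A sketch of the first and third controls: cache query embeddings and cap top-k and context length before the prompt is built. embed_fn and the limits are illustrative placeholders.

    # Cache query embeddings and cap top-k / context length before prompting.
    # embed_fn is a hypothetical embedding callable; the limits are illustrative.
    from functools import lru_cache
    from typing import List

    MAX_TOP_K = 5
    MAX_CONTEXT_CHARS = 6000        # cap by tokens in practice

    @lru_cache(maxsize=10_000)
    def cached_query_embedding(query: str) -> tuple:
        return tuple(embed_fn(query))               # tuple so the result is hashable

    def build_context(chunks: List[str]) -> str:
        kept, used = [], 0
        for chunk in chunks[:MAX_TOP_K]:            # limit top-k intentionally
            if used + len(chunk) > MAX_CONTEXT_CHARS:
                break                               # stop before the prompt balloons
            kept.append(chunk)
            used += len(chunk)
        return "\n\n".join(kept)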