Retrieval-augmented generation (RAG) grounds a language model's answers in documents retrieved at query time. The clearest expert explanations walk through the whole pipeline (ingestion, chunking, embeddings, retrieval, and generation), not just model prompts.
There are several metrics, but the most important one for us is “Recall.” Each question has at least one required fact. If the retrieval step of the application found at least one context for every required fact, we mark that question as a success, and we report the result across a set of questions.
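A minimal sketch of this recall check, under stated assumptions: each question is labeled with the IDs of its required facts, and each retrieved context is labeled with the fact IDs it supports. The names `LabeledQuestion`, `is_hit`, and `recall` are hypothetical, and reporting recall as the fraction of successful questions is one reasonable reading of the description above, not a confirmed definition.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class LabeledQuestion:
    """One evaluation item (hypothetical structure, for illustration only)."""
    question: str
    required_facts: Set[str]             # fact IDs a correct answer depends on
    retrieved_fact_sets: List[Set[str]]  # fact IDs supported by each retrieved context


def is_hit(item: LabeledQuestion) -> bool:
    """True if every required fact appears in at least one retrieved context."""
    covered = set().union(*item.retrieved_fact_sets)
    return item.required_facts <= covered


def recall(items: List[LabeledQuestion]) -> float:
    """Fraction of questions whose required facts were all retrieved."""
    if not items:
        raise ValueError("recall is undefined for an empty question set")
    return sum(is_hit(item) for item in items) / len(items)


questions = [
    # Both required facts are covered, each by a different context: a hit.
    LabeledQuestion("How do refunds work?", {"f1", "f2"}, [{"f1"}, {"f2", "f9"}]),
    # The only required fact was never retrieved: a miss.
    LabeledQuestion("What is the SLA?", {"f3"}, [{"f1"}]),
]
score = recall(questions)
print(score)
```

Note that a question counts as a success only when all of its required facts are covered. A stricter or looser rule (for example, per-fact recall) would produce a different number from the same labels.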
•Picking an embedding model that mismatches domain vocabulary without offline recall checks.
Implementation tradeoffs
•Reranking: Cross-encoder or LLM rerankers improve top-k quality at higher latency and inference cost.
•Knowledge updates: Weigh RAG re-index cadence against fine-tune retrain cycles when policies or product facts change frequently.
•Regression testing: Fine-tune releases need behavior suites on fixed prompts; RAG releases need recall suites on labeled questions. Teams often test only one.
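The reranking tradeoff in the list above can be illustrated with a two-stage sketch: a cheap first-stage scorer narrows the corpus to `candidate_k` documents, then a more expensive reranker reorders only those candidates. Everything here is an assumption for illustration: `token_overlap` and `phrase_match` are toy stand-ins for a real vector search and a real cross-encoder or LLM reranker, and `retrieve_then_rerank` is a hypothetical helper, not any library's API.

```python
from typing import Callable, List

# A scorer maps (query, document) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    candidate_k: int,
    top_k: int,
) -> List[str]:
    """Select candidate_k docs with the cheap scorer, then rerank them."""
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)
    candidates = candidates[:candidate_k]
    # The reranker runs once per candidate, so candidate_k bounds its cost.
    reranked = sorted(candidates, key=lambda d: reranker(query, d), reverse=True)
    return reranked[:top_k]


def token_overlap(query: str, doc: str) -> float:
    """Toy first stage: number of distinct query tokens present in the doc."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def phrase_match(query: str, doc: str) -> float:
    """Toy reranker stand-in: rewards docs that contain the exact query phrase."""
    return 1.0 if query.lower() in doc.lower() else 0.0


docs = [
    "policy refund exceptions and annual plans billing",
    "refund policy for annual plans",
    "shipping times for hardware",
]
top = retrieve_then_rerank(
    "refund policy", docs, token_overlap, phrase_match, candidate_k=2, top_k=1
)
print(top)
```

In this toy data the first two documents tie on token overlap, and only the reranker separates them. Raising `candidate_k` gives the reranker more chances to recover a relevant document but adds one reranker call per extra candidate, which is where the latency and inference cost comes from.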
Themes repeated across indexed engineering talks and practitioner writeups — not a survey, vote count, or attributed quote roundup.