Retrieval miss after embedding deploy
trust 78%LangChain+pgvector
Symptom chain
- Expected chunk ranks below score threshold
- recall@10 drops post deploy
- retrieve span max_score below gate
Remediation
- Reindex with new embedding model hash
- Lower score threshold temporarily with eval gate
- Diff golden-set chunk IDs vs retrieve ranking
Evidence types
retrieve_span, eval_metrics, embedding_version
Explainability
score_threshold in trace · chunk rank position · embedding model tag
Uncertainty: Confirm namespace filter unchanged
Hybrid search regression (alpha=1)
trust 82%Qdrant+LangChain
Symptom chain
- recall@10 −18% after config deploy
- sparse leg cold-start in trace
- p95 latency +12ms
Remediation
- Restore alpha / RRF fusion from last known good
- Warm sparse index post deploy
- Benchmark dense-only vs hybrid on golden set
Evidence types
fusion_config, benchmark_delta, sparse_index_freshness
Explainability
alpha in retrieval config · RRF vs weighted sum mode
Metadata filter failure → empty context
trust 74%Pinecone+LangGraph
Symptom chain
- empty_context=true on retrieve spans
- faithfulness collapse post filter v2
- tenant filter over-constrains namespace
Remediation
- Rollback metadata filter deploy
- Audit filter JSON vs prior version
- Add eval gate on context_recall pre-prod
Evidence types
metadata_filter_diff, faithfulness_eval, trace_empty_context
Explainability
filter predicate in trace · candidate count pre/post filter
Uncertainty: Validate multi-tenant filter keys
Reranker failure / latency regression
trust 71%LangChain+OpenSearch
Symptom chain
- top_k candidates drop after rerank timeout
- cross-encoder batch queue depth spike
- boundary chunks demoted below threshold
Remediation
- Increase rerank timeout / shrink batch size
- Cache rerank scores for hot queries
- A/B reranker off on latency SLO breach
Evidence types
rerank_span, latency_histogram, top_k_before_after
Explainability
reranker model id · timeout events in trace
Eval regression on deploy window
trust 85%LlamaIndex+Weaviate
Symptom chain
- Ragas context_precision below gate
- golden set harness version drift
- faithfulness 0.91 → 0.54
Remediation
- Pin eval harness + dataset version
- Block deploy on faithfulness delta > ε
- Segment failures by query cluster
Evidence types
eval_harness_version, ragas_metrics, deploy_correlation
Explainability
eval gate config · metric deltas with timestamps
Hallucination spike after chunking change
trust 69%LangGraph+pgvector
Symptom chain
- hallucination rate 3.2× baseline
- chunk_overlap=32 drops boundary context
- answer faithfulness below SLO
Remediation
- Restore chunk_overlap ≥ 128 for policy docs
- Re-embed affected doc partition
- Run faithfulness gate on incident window queries
Evidence types
chunking_config, faithfulness_eval, generation_trace
Explainability
chunk_overlap in config · context length in generation span
Uncertainty: Confirm reranker unchanged