Operational incident library

Anonymized patterns for retrieval debugging — symptom chains, stack fingerprints, remediation, and trust/explainability envelopes. No customer PII.

Your investigations →RAG failure debugger →

Book technical walkthrough Apply for pilot Request architecture review

Retrieval miss after embedding deploy

trust 78%

LangChain+pgvector

Symptom chain

Expected chunk ranks below score threshold
recall@10 drops post deploy
retrieve span max_score below gate

Remediation

Reindex with new embedding model hash
Lower score threshold temporarily with eval gate
Diff golden-set chunk IDs vs retrieve ranking

Evidence types

retrieve_span, eval_metrics, embedding_version

Explainability

score_threshold in trace · chunk rank position · embedding model tag

Uncertainty: Confirm namespace filter unchanged

Hybrid search regression (alpha=1)

trust 82%

Qdrant+LangChain

Symptom chain

recall@10 −18% after config deploy
sparse leg cold-start in trace
p95 latency +12ms

Remediation

Restore alpha / RRF fusion from last known good
Warm sparse index post deploy
Benchmark dense-only vs hybrid on golden set

Evidence types

fusion_config, benchmark_delta, sparse_index_freshness

Explainability

alpha in retrieval config · RRF vs weighted sum mode

Metadata filter failure → empty context

trust 74%

Pinecone+LangGraph

Symptom chain

empty_context=true on retrieve spans
faithfulness collapse post filter v2
tenant filter over-constrains namespace

Remediation

Rollback metadata filter deploy
Audit filter JSON vs prior version
Add eval gate on context_recall pre-prod

Evidence types

metadata_filter_diff, faithfulness_eval, trace_empty_context

Explainability

filter predicate in trace · candidate count pre/post filter

Uncertainty: Validate multi-tenant filter keys

Reranker failure / latency regression

trust 71%

LangChain+OpenSearch

Symptom chain

top_k candidates drop after rerank timeout
cross-encoder batch queue depth spike
boundary chunks demoted below threshold

Remediation

Increase rerank timeout / shrink batch size
Cache rerank scores for hot queries
A/B reranker off on latency SLO breach

Evidence types

rerank_span, latency_histogram, top_k_before_after

Explainability

reranker model id · timeout events in trace

Eval regression on deploy window

trust 85%

LlamaIndex+Weaviate

Symptom chain

Ragas context_precision below gate
golden set harness version drift
faithfulness 0.91 → 0.54

Remediation

Pin eval harness + dataset version
Block deploy on faithfulness delta > ε
Segment failures by query cluster

Evidence types

eval_harness_version, ragas_metrics, deploy_correlation

Explainability

eval gate config · metric deltas with timestamps

Hallucination spike after chunking change

trust 69%

LangGraph+pgvector

Symptom chain

hallucination rate 3.2× baseline
chunk_overlap=32 drops boundary context
answer faithfulness below SLO

Remediation

Restore chunk_overlap ≥ 128 for policy docs
Re-embed affected doc partition
Run faithfulness gate on incident window queries

Evidence types

chunking_config, faithfulness_eval, generation_trace

Explainability

chunk_overlap in config · context length in generation span

Uncertainty: Confirm reranker unchanged