Engineering comparison · index boundaries vs post-retrieval ranking

Chunking vs reranking — index boundaries vs post-retrieval ordering

Core question

Should I fix chunking or add a reranker first?

Short answer

Chunking determines which text exists in the index at all. Reranking only reorders candidates already retrieved. If required facts never appear in any chunk, reranking cannot recover them.

Decision rule

Measure recall on required facts per question set first. If recall is low, change chunk size, overlap, and metadata boundaries before adding reranker latency and cost.
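
The recall check above can be sketched as follows. This is a minimal illustration, not a standard metric implementation: the eval-set layout (`required_facts`, `retrieved`) and the function names are assumptions made for this example, and a fact counts as found only if it appears verbatim (case-insensitive) in a top-k chunk.

```python
from typing import Dict, List


def fact_recall_at_k(required_facts: List[str], retrieved: List[str], k: int) -> float:
    """Fraction of required facts found verbatim in the top-k retrieved chunks."""
    if not required_facts:
        return 1.0
    top_k = [chunk.lower() for chunk in retrieved[:k]]
    hits = sum(1 for fact in required_facts if any(fact.lower() in c for c in top_k))
    return hits / len(required_facts)


def mean_fact_recall(eval_set: List[Dict], k: int) -> float:
    """Average fact recall over a question set (hypothetical record layout)."""
    scores = [fact_recall_at_k(q["required_facts"], q["retrieved"], k) for q in eval_set]
    return sum(scores) / len(scores) if scores else 0.0


# Toy question set: each record lists the facts an answer needs and the chunks retrieved.
eval_set = [
    {"required_facts": ["rate limit is 100", "per minute"],
     "retrieved": ["The rate limit is 100 requests.", "Billing is monthly."]},
    {"required_facts": ["oauth2"],
     "retrieved": ["Use an API key header.", "Keys rotate every 90 days."]},
]

print(mean_fact_recall(eval_set, k=2))  # 0.25
```

Substring matching is deliberately strict here. If recall stays low under a looser match as well, that points at chunk boundaries rather than ranking.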

Architecture differences

  • Chunking runs at ingest time and defines embedding inputs; reranking runs at query time on top-k candidates.
  • Rerankers cannot surface text that was never chunked into the index.

Choose Chunking

How documents are split, overlapped, and tagged before embedding — defines the searchable units.

  • Recall@k is low because passages are too large, too small, or split mid-thought.
  • You are still designing ingestion and have no stable baseline metrics.
  • Required facts are missing from every retrieved candidate in eval traces.
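
To make the size and overlap levers concrete, here is a minimal sketch of a fixed-window chunker. It splits on whitespace rather than model tokens, and the function name and parameters are assumptions for illustration only.

```python
from typing import List


def chunk_words(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_words("a b c d e f g", size=4, overlap=2))
```

Every change to `size` or `overlap` changes the set of searchable units, so each setting needs its own re-embed and a fresh recall measurement.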

Choose Reranking

A second stage that scores and reorders top-k hits from the first retrieval pass before generation.

  • Recall is acceptable but precision at top ranks is noisy.
  • You can afford extra latency per query for a cross-encoder or rerank model.
  • Top-10 hits contain the answer but the wrong passage ranks first.
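
A second stage of this kind can be sketched as below. The scorer is a toy term-overlap function standing in for a cross-encoder or hosted rerank model; it is an assumption for illustration and not any vendor's API. The point of the sketch is structural: the output is always a reordering of the input candidates.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: share of query terms that appear in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_n: int = 3,
) -> List[Tuple[float, str]]:
    """Score first-stage candidates and return the top_n, best first."""
    scored = [(score_fn(query, c), c) for c in candidates]
    # Python's sort is stable, so ties keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


candidates = [
    "Billing runs monthly.",
    "To reset your API key open settings.",
    "API key format is 32 chars.",
]
for score, passage in rerank("reset api key", candidates, top_n=2):
    print(round(score, 2), passage)
```

Swapping `score_fn` for a real model changes the ordering quality and the per-query cost, but not the constraint that nothing outside `candidates` can appear in the result.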

Where people confuse them

  • Adding a reranker when required facts are absent from every candidate chunk.
  • Tuning overlap while tables are still split mid-row.
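
One way to avoid the first confusion is to triage each failed question before choosing a fix. The sketch below is an assumption-laden illustration: the function name, the return labels, and the verbatim-substring test are all hypothetical choices for this example.

```python
from typing import List


def diagnose(required_fact: str, all_chunks: List[str], retrieved: List[str], top_n: int = 3) -> str:
    """Classify a failed question by where the required fact got lost."""
    fact = required_fact.lower()
    if not any(fact in c.lower() for c in all_chunks):
        return "fix_chunking"  # fact is not intact in any indexed chunk
    positions = [i for i, c in enumerate(retrieved) if fact in c.lower()]
    if not positions:
        return "fix_first_stage_recall"  # indexed, but never retrieved
    if positions[0] >= top_n:
        return "try_reranking"  # retrieved, but ranked below the prompt cutoff
    return "ok"


index = ["Rate limit is 100 requests per minute.", "Billing is monthly."]
print(diagnose("oauth2", index, retrieved=index))
```

Only the `try_reranking` outcome describes a problem a reranker can address; the first two outcomes are upstream of it.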

What experts agree on

Shared ground practitioners cite before choosing sides in this comparison.

  • Both affect which text the generator sees — chunking upstream, reranking immediately before the prompt.
  • Both should be logged in eval traces to debug failures.
  • Retrieval quality dominates many production failures; fixing prompts alone rarely fixes wrong or missing chunks.
  • Chunking, embedding model choice, and metadata boundaries materially affect what the model can see.

What experts disagree on

Open engineering debates — compare indexed explanations before you commit to an architecture.

  • Semantic chunking versus fixed token windows for technical docs.

  • Whether reranking belongs in-line or only in offline eval tooling.

Common mistakes

  • Buying reranker SaaS before baseline recall metrics exist.
  • Using fixed token windows on API reference docs with code blocks.
  • Assuming rerankers can recover answers missing from all chunks.
  • Assuming larger chunks always improve retrieval.
  • Tuning rerankers while answers never appear in any chunk.
  • Huge chunks that bury the single sentence containing the fact.

Implementation tradeoffs

  • Chunk changes require re-embed and re-index jobs; reranker deploys add latency and GPU cost per query.
  • Chunk bugs are fixed in ETL; rerank bugs are fixed in model serving.
  • Smaller chunks increase index size and query fan-out; rerankers scale with top-k size and cross-encoder throughput.
  • Huge chunks dilute embeddings and inflate generation token cost downstream.
  • Establish recall on required facts before reranker experiments — measure MRR shift only after recall plateaus.
  • Log both retrieved and reranked spans in eval traces.
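
The last two points can be sketched together: compute reciprocal rank before and after reranking, and write both orderings into one trace record. The record fields and function names are assumptions for illustration, and relevance is again approximated by a verbatim substring match.

```python
import json
from typing import List


def reciprocal_rank(passages: List[str], required_fact: str) -> float:
    """1/rank of the first passage containing the fact, or 0.0 if none does."""
    fact = required_fact.lower()
    for rank, passage in enumerate(passages, start=1):
        if fact in passage.lower():
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: List[List[str]], facts: List[str]) -> float:
    """MRR over paired (ranking, required fact) examples."""
    scores = [reciprocal_rank(r, f) for r, f in zip(rankings, facts)]
    return sum(scores) / len(scores) if scores else 0.0


def trace_record(question: str, required_fact: str, retrieved: List[str], reranked: List[str]) -> str:
    """Serialize both orderings so a failure can be attributed to the right stage."""
    return json.dumps({
        "question": question,
        "retrieved": retrieved,
        "reranked": reranked,
        "rr_before": reciprocal_rank(retrieved, required_fact),
        "rr_after": reciprocal_rank(reranked, required_fact),
    })


retrieved = ["Billing is monthly.", "Rate limit is 100 requests per minute."]
reranked = list(reversed(retrieved))  # stand-in for a real reranker's output
print(trace_record("What is the rate limit?", "rate limit is 100", retrieved, reranked))
```

If `rr_before` is 0.0 the fact was never retrieved, so any MRR shift from reranking on that question is not meaningful.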

Themes repeated across indexed engineering talks and practitioner writeups — not a survey, vote count, or attributed quote roundup.

Example use cases

  • API docs with split endpoints → structure-aware chunking.
  • Noisy top-10 support tickets → reranker before generation.
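
For the first use case, a minimal sketch of structure-aware chunking is shown below. It assumes the API docs are Markdown with one heading per endpoint, which the page does not state; the function name is hypothetical. Each heading starts a new chunk, and lines inside fenced code blocks are never treated as headings, so code samples stay with their endpoint.

```python
import re
from typing import List

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this example readable


def split_by_heading(markdown: str) -> List[str]:
    """Split a Markdown document into one chunk per heading, keeping code fences intact."""
    sections: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        is_heading = not in_fence and re.match(r"#{1,6} ", line) is not None
        if is_heading and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


doc = "\n".join([
    "## GET /users",
    "Lists users.",
    FENCE,
    "# not a heading, just a shell comment",
    "curl /users",
    FENCE,
    "## POST /users",
    "Creates a user.",
])
for section in split_by_heading(doc):
    print(repr(section.splitlines()[0]))
```

Sections that exceed the embedding model's input limit would still need a secondary split. This sketch leaves that out.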

Related engineering concepts

  • Chunking explained
  • Retrieval evaluation
  • RAG hallucination causes

Best expert explanation

Chosen for clarity and how directly it answers the question — not for views or hype.

"We're going to cover how to use Cohere's reranker to improve the final quality of your results"

Weights & Biases · Chunking and embedding tradeoffs · 0:23

Supporting explanations

"You're not actually returning the relevant chunks from your vector database — you're not going to be able to answer the question"

AI Engineer · Chunking and embedding tradeoffs · 3:15

"Cohere's embed rerank and chat APIs. When initializing our Weaviate client below we simply add the Cohere API key"

Weights & Biases · Chunking and embedding tradeoffs · 1:50

"functionality. And finally we're going to cover how to use Cohere's reranker to improve the"

Weights & Biases · Chunking and embedding tradeoffs · 0:21

FAQ

  • Which lever has more impact?

    Most practitioners fix chunk coverage and measure recall before investing in rerankers — reranking optimizes order, not missing text.