
Retrieval Pipeline

Before an agent generates a response, it first retrieves supporting passages from your library and from the literature. The model always receives those retrieved passages together with the question, so its answer can draw on them.
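A minimal sketch of how retrieved passages and the question might be combined into one prompt. It assumes each passage is a dictionary with hypothetical `source` and `text` keys. The function name `build_prompt` and the instruction wording are illustrative only and are not the product's actual prompt.

```python
from __future__ import annotations


def build_prompt(question: str, passages: list[dict]) -> str:
    """Combine retrieved passages and the question into a single prompt string."""
    # Number each passage and tag it with its source so the answer can cite it.
    context = "\n\n".join(
        f"[{i}] ({p['source']}) {p['text']}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using the passages below.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


passages = [
    {"source": "doc-a", "text": "First example passage."},
    {"source": "doc-b", "text": "Second example passage."},
]
prompt = build_prompt("What do the sources say?", passages)
print(prompt)
```

Tagging each passage with its source keeps provenance visible to the model, which makes it easier to ask for citations in the answer.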

Flow

  1. Chunk: split source documents into passages. Each chunk is small enough to embed but long enough to carry meaning (200–500 tokens works well).
  2. Embed: convert each chunk into a vector (an embedding) that encodes its meaning, so passages with similar meaning end up close together.
  3. Index: store the vectors with their metadata (source, date, author, scope).
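  4. Retrieve: embed the query with the same model, then find the top-k chunks whose vectors are closest to the query vector.
  5. Re-rank: rescore those candidates with a cross-encoder, which reads the query and each chunk together, then pass the highest-ranked chunks into the generation prompt as context.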
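The five steps can be sketched end to end in plain Python. This is an illustration under heavy assumptions, not the pipeline's implementation. Words stand in for tokens. A normalized bag-of-words vector stands in for a learned embedding model. An in-memory list stands in for the vector index. A simple term-overlap score stands in for the cross-encoder. All function names (`chunk_document`, `embed`, `build_index`, `retrieve`, `rerank`) are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict  # e.g. source, date, author, scope
    vector: dict[str, float] = field(default_factory=dict)


def chunk_document(text: str, metadata: dict, max_words: int = 300) -> list[Chunk]:
    """Step 1: split a document into fixed-size passages.
    Words stand in for tokens here; a real pipeline would count model tokens."""
    words = text.split()
    return [
        Chunk(" ".join(words[i:i + max_words]), dict(metadata))
        for i in range(0, len(words), max_words)
    ]


def embed(text: str) -> dict[str, float]:
    """Step 2 (toy stand-in): an L2-normalized bag-of-words vector.
    A real pipeline would call an embedding model here."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit-length, so the dot product equals the cosine.
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def build_index(chunks: list[Chunk]) -> list[Chunk]:
    """Step 3: store each chunk's vector alongside its metadata (in memory here)."""
    for c in chunks:
        c.vector = embed(c.text)
    return chunks


def retrieve(index: list[Chunk], query: str, k: int = 5) -> list[tuple[float, Chunk]]:
    """Step 4: embed the query the same way and return the k closest chunks."""
    q = embed(query)
    scored = [(cosine(q, c.vector), c) for c in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def rerank(query: str, candidates: list[tuple[float, Chunk]], n: int = 3) -> list[Chunk]:
    """Step 5 (stand-in): a cross-encoder would score each (query, chunk) pair jointly.
    Here the score is the fraction of query terms that appear in the chunk."""
    q_terms = set(re.findall(r"[a-z0-9]+", query.lower()))

    def score(chunk: Chunk) -> float:
        c_terms = set(re.findall(r"[a-z0-9]+", chunk.text.lower()))
        return len(q_terms & c_terms) / (len(q_terms) or 1)

    ranked = sorted((c for _, c in candidates), key=score, reverse=True)
    return ranked[:n]


docs = [
    ("Embeddings map text to vectors for similarity search.", {"source": "local"}),
    ("Cross-encoders rerank candidate passages for a query.", {"source": "public"}),
    ("Chunk size trades off context against precision.", {"source": "account"}),
]
index = build_index([c for text, meta in docs for c in chunk_document(text, meta)])
query = "how do cross-encoders rerank passages"
top = rerank(query, retrieve(index, query, k=2), n=1)
print(top[0].text)  # Cross-encoders rerank candidate passages for a query.
```

Moving to real components would mean replacing `embed` with an embedding model, storing vectors in a vector database, and scoring pairs with a cross-encoder. The shape of the pipeline would stay the same. The two-stage design lets a cheap vector search narrow the library to a few candidates before the more expensive re-ranker examines each one.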

Retrieval Controls

  • Source filters: limit retrieval to a specific library (local, account, public).
  • Date windows: limit to a time range, for example “guidelines from the last five years.”
  • Relevance thresholds: drop chunks below a minimum similarity score so weak matches don’t dilute the context.
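The three controls can be sketched as a filter over scored results. The metadata keys `scope` and `date`, the function name `apply_controls`, the 0.3 threshold, and the use of 365-day years for "five years" are all assumptions for illustration.

```python
from __future__ import annotations

from datetime import date, timedelta


def apply_controls(
    results: list[tuple[float, dict]],
    scopes: set[str] | None = None,
    since: date | None = None,
    min_score: float = 0.0,
) -> list[tuple[float, dict]]:
    """Keep only (score, metadata) results that pass every active control."""
    kept = []
    for score, meta in results:
        if scopes is not None and meta["scope"] not in scopes:
            continue  # source filter: chunk comes from an excluded library
        if since is not None and meta["date"] < since:
            continue  # date window: chunk is older than the cutoff
        if score < min_score:
            continue  # relevance threshold: weak match
        kept.append((score, meta))
    return kept


today = date(2024, 6, 1)
results = [
    (0.82, {"id": "a", "scope": "local", "date": date(2023, 3, 1)}),
    (0.75, {"id": "b", "scope": "account", "date": date(2015, 1, 1)}),
    (0.66, {"id": "c", "scope": "public", "date": date(2023, 11, 20)}),
    (0.41, {"id": "d", "scope": "account", "date": date(2022, 8, 15)}),
    (0.18, {"id": "e", "scope": "local", "date": date(2024, 1, 9)}),
]
kept = apply_controls(
    results,
    scopes={"local", "account"},
    since=today - timedelta(days=5 * 365),
    min_score=0.3,
)
print([meta["id"] for _, meta in kept])  # ['a', 'd']
```

This sketch filters after scoring for clarity. Vector stores often apply source and date filters inside the query itself, so excluded chunks do not use up the top-k slots.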