Retrieval Pipeline
Before an agent generates a response, it retrieves supporting passages from your library and the literature. The model always receives those passages together with the question.
Flow
- Chunk: split source documents into passages. Each chunk is small enough to embed but long enough to carry meaning (200–500 tokens works well).
- Embed: turn each chunk into a vector that encodes what it means.
- Index: store the vectors with their metadata (source, date, author, scope).
- Retrieve: find the top-k chunks whose vectors are closest to the query vector.
- Re-rank: rescore those chunks with a cross-encoder, which reads the query and each chunk together, then pass the top-ranked chunks into the generation prompt as context.
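The five steps above can be sketched end to end in a few lines of Python. This is a minimal illustration, not the product's implementation: the `embed` function uses normalised term counts as a stand-in for a real embedding model, `rerank` uses query-term overlap as a stand-in for a cross-encoder, chunks are split on whitespace words rather than model tokens, and every name here (`Entry`, `build_index`, `retrieve`, `rerank`) is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, max_tokens=300):
    """Chunk: split text into passages of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def embed(text):
    """Embed: toy sparse vector of unit-length term counts (assumption: stands in for a real model)."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(a, b):
    """Cosine similarity; both vectors are unit length, so the dot product suffices."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


@dataclass
class Entry:
    text: str
    vector: dict
    metadata: dict


def build_index(documents, max_tokens=300):
    """Index: store each chunk's vector alongside the metadata of its source document."""
    index = []
    for doc in documents:
        for passage in chunk(doc["text"], max_tokens):
            index.append(Entry(passage, embed(passage), doc["metadata"]))
    return index


def retrieve(index, query, k=3):
    """Retrieve: return the k (score, entry) pairs closest to the query vector."""
    q = embed(query)
    scored = [(cosine(q, entry.vector), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def rerank(query, candidates, keep=2):
    """Re-rank: placeholder scorer (fraction of query terms present in the passage).
    A real pipeline would score each (query, passage) pair with a cross-encoder here."""
    q_terms = set(tokenize(query))

    def score(entry):
        return len(q_terms & set(tokenize(entry.text))) / max(len(q_terms), 1)

    ranked = sorted((entry for _, entry in candidates), key=score, reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    docs = [
        {"text": "Aspirin reduces the risk of heart attack in adults.",
         "metadata": {"source": "local"}},
        {"text": "Bananas are rich in potassium and fiber.",
         "metadata": {"source": "public"}},
    ]
    index = build_index(docs)
    candidates = retrieve(index, "aspirin heart attack risk", k=2)
    for entry in rerank("aspirin heart attack risk", candidates, keep=1):
        print(entry.metadata["source"], entry.text)
```

Keeping retrieval and re-ranking as separate functions mirrors the flow: the first pass is cheap and runs over the whole index, while the second pass is more expensive and only runs over the k candidates.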
Retrieval Controls
- Source filters: limit retrieval to a specific library (local, account, public).
- Date windows: limit to a time range, for example “guidelines from the last five years.”
- Relevance thresholds: drop chunks below a minimum similarity score so weak matches don’t dilute the context.
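The three controls above can be read as filters over scored chunks. The sketch below is one hypothetical way to express them, assuming each retrieved chunk carries a similarity score, a source label, and a publication date; the `ScoredChunk` type and the `apply_controls` function and its parameters are illustrative names, not part of any documented API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScoredChunk:
    text: str
    score: float      # similarity to the query; higher means closer
    source: str       # assumed labels: "local", "account", or "public"
    published: date


def apply_controls(chunks, sources=None, since=None, until=None, min_score=None):
    """Keep only chunks that pass every control that is set; None disables a control."""
    kept = []
    for c in chunks:
        if sources is not None and c.source not in sources:
            continue  # source filter
        if since is not None and c.published < since:
            continue  # date window, lower bound
        if until is not None and c.published > until:
            continue  # date window, upper bound
        if min_score is not None and c.score < min_score:
            continue  # relevance threshold
        kept.append(c)
    # Return survivors ordered from most to least similar.
    return sorted(kept, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    chunks = [
        ScoredChunk("recent local guideline", 0.90, "local", date(2023, 1, 1)),
        ScoredChunk("weak local match", 0.40, "local", date(2023, 1, 1)),
        ScoredChunk("public article", 0.80, "public", date(2023, 1, 1)),
        ScoredChunk("old local guideline", 0.95, "local", date(2010, 1, 1)),
    ]
    for c in apply_controls(chunks, sources={"local"},
                            since=date(2019, 1, 1), min_score=0.5):
        print(c.score, c.text)
```

This sketch filters after scoring for simplicity. In many systems the source and date filters are applied to the index metadata before the vector search, so that excluded chunks never compete for the top-k slots; the relevance threshold necessarily runs after scoring.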