Episode Details
Source-Restricted vs. Open Retrieval: How to Lock Down Your LLM
Episode 3751
Published 3 days, 23 hours ago
Description
The terminology around "closed corpus" and "open world" LLM retrieval is a mess, and the stakes are high. A model blending your documents with its own training data can turn a contract review into a malpractice suit. This episode unpacks the real distinction, source-restricted versus open retrieval, and why choosing between them is a per-generation decision, not an application-level toggle. We walk through concrete cases in legal, medical, and compliance work, then survey how LangGraph, LlamaIndex, and the Anthropic and OpenAI SDKs handle, or fail to handle, the constraint. The verdict: there's no clean primitive, just system prompts and tool lists. Here's how to wire it up without getting burned.
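To make the "per-generation decision" idea concrete, here is a minimal Python sketch of what wiring the constraint through a system prompt and a tool list might look like. It calls no real SDK: `GenerationRequest`, `build_request`, the two prompt strings, and the tool names are all hypothetical, invented for illustration and not taken from the episode or from any of the libraries named above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical prompt wording; tune it for your own domain.
RESTRICTED_PROMPT = (
    "Answer only from the documents provided. "
    "If they do not contain the answer, say so instead of guessing."
)
OPEN_PROMPT = (
    "Use the provided documents where relevant. "
    "You may also draw on general knowledge and the available tools."
)


@dataclass
class GenerationRequest:
    """Hypothetical container for what one model call receives."""
    system: str
    tools: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)


def build_request(documents, available_tools, source_restricted):
    """Decide the retrieval mode for this single generation."""
    if source_restricted:
        # Restricted mode: strict prompt and no tools, so the call
        # cannot reach outside the supplied documents via a tool.
        return GenerationRequest(
            system=RESTRICTED_PROMPT, tools=[], documents=list(documents)
        )
    # Open mode: permissive prompt and the full tool list.
    return GenerationRequest(
        system=OPEN_PROMPT,
        tools=list(available_tools),
        documents=list(documents),
    )


# The same application can pick a different mode on each call.
contract_review = build_request(["clause 4.2 text"], ["web_search"], True)
brainstorm = build_request(["clause 4.2 text"], ["web_search"], False)
```

Because the flag is an argument to each call rather than application-wide configuration, one application can run a contract review in restricted mode and a brainstorm in open mode. Dropping the tool list removes outside lookups, but the system prompt can only ask the model not to lean on its training data; it does not enforce that.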