Zou et al. (2024), "PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models" (arXiv:2402.07867), demonstrate that injecting as few as five malicious passages per target question into a knowledge database containing millions of texts is enough to reach an attack success rate of roughly 90% on the targeted questions. In its black-box setting the attack does not require model access, training-data access, or any privileged position. It requires only the ability to write into the index, which, in practice, many ingest pipelines grant by default.
The reason this is worse than a SQL injection is geometry. A poisoned fragment does not just answer the exact query the attacker wrote; it sits in a semantic neighborhood, and every query whose embedding falls inside that neighborhood retrieves it. Carlini et al. (2023), "Poisoning Web-Scale Training Datasets is Practical" (arXiv:2302.10149), showed for training corpora that corrupting a tiny fraction of a web-scale dataset is cheap and practical; PoisonedRAG carries the same threat over to inference time, where the blast radius is wider because the defenses are thinner.
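The neighborhood effect can be illustrated with a toy example. The sketch below is not taken from PoisonedRAG: it assumes hand-written three-dimensional vectors standing in for real embeddings, a hypothetical `neighbors` helper, and an arbitrary cosine threshold of 0.95. It only shows that one planted passage can be retrieved by several differently worded queries while an unrelated query misses it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical 3-d "embeddings"; real retrievers use hundreds of dimensions.
corpus = {
    "legit_refund_policy": [0.9, 0.1, 0.0],
    "legit_shipping_faq": [0.1, 0.9, 0.1],
    "poisoned_refund_note": [0.85, 0.2, 0.05],  # planted next to the refund cluster
}

queries = {
    "how do I get a refund": [0.88, 0.15, 0.02],
    "refund eligibility rules": [0.8, 0.25, 0.1],
    "can I return my order for money back": [0.82, 0.3, 0.0],
    "when does my parcel ship": [0.05, 0.95, 0.1],
}

def neighbors(query_vec, threshold=0.95):
    """Return every document whose similarity to the query meets the threshold."""
    return [doc for doc, vec in corpus.items() if cosine(query_vec, vec) >= threshold]

hits = {text: neighbors(vec) for text, vec in queries.items()}
poisoned_hits = [text for text, docs in hits.items() if "poisoned_refund_note" in docs]

for text in poisoned_hits:
    print("retrieves poisoned passage:", text)
```

The attacker wrote one passage, yet all three refund-related phrasings land inside its neighborhood; only the shipping query stays clear of it.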
Then it metastasizes. A poisoned answer becomes a logged interaction. The logged interaction becomes a training example for the next fine-tune, or a citation in a downstream document, or a quoted snippet in an automated report that is itself re-ingested next quarter. Shumailov et al. (Nature, 2024), "AI Models Collapse When Trained on Recursively Generated Data," show that the recursive loop is not theoretical: outputs leak back into inputs, and the distribution drifts in ways the original poisoner never had to plan for.
The economics are unforgiving. Publicly reported RAG security audits in 2024–25 describe incident response to a confirmed corpus poisoning as weeks of engineering time: identification, blast-radius mapping, snapshot rollback, customer notification, and the durable provenance work that should have been there from the start. The poisoner spent five rows.
Detection is hard because the signal is faint. Embedding-space anomaly detection catches obvious outliers but misses crafted passages that sit inside the very cluster they are meant to bias. The honest controls are upstream: signed provenance on every ingest, an out-of-band record of retrieval-to-citation joins, and an undo window long enough to cover regressions that surface later in user behaviour rather than at write time. The cryptographic plumbing is the same append-only, hash-linked pattern that tamper-evident audit logs use; Certificate Transparency (RFC 6962), which uses a Merkle tree rather than a linear chain, is the canonical reference.
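A signed, hash-linked ingest record might look roughly like the following. This is a minimal sketch under stated assumptions, not a production design: it uses a simple linear hash chain (Certificate Transparency itself uses a Merkle tree), an HMAC with a hard-coded demo key in place of real signing infrastructure, and hypothetical names throughout (`append_entry`, `verify_chain`, the `cms://` source labels).

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-for-production"  # assumption: a real key would live in a KMS
GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
BODY_FIELDS = ("doc_id", "content_sha256", "source", "prev_hash")

def _digest(record: dict) -> str:
    """Hash a record deterministically by serialising with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def _sign(entry_hash: str) -> str:
    return hmac.new(SIGNING_KEY, entry_hash.encode("ascii"), hashlib.sha256).hexdigest()

def append_entry(log: list, doc_id: str, content: str, source: str) -> dict:
    """Append one ingest record that commits to the previous record's hash."""
    body = {
        "doc_id": doc_id,
        "content_sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "source": source,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry_hash = _digest(body)
    entry = {**body, "entry_hash": entry_hash, "signature": _sign(entry_hash)}
    log.append(entry)
    return entry

def verify_chain(log: list) -> bool:
    """Recompute every link, hash, and signature; any edit breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {field: entry[field] for field in BODY_FIELDS}
        if entry["prev_hash"] != prev:
            return False
        if _digest(body) != entry["entry_hash"]:
            return False
        if not hmac.compare_digest(_sign(entry["entry_hash"]), entry["signature"]):
            return False
        prev = entry["entry_hash"]
    return True

log = []
append_entry(log, "doc-1", "Refunds are issued within 14 days.", "cms://policies")
append_entry(log, "doc-2", "Shipping takes 3-5 days.", "cms://faq")
ok_before = verify_chain(log)

log[0]["source"] = "cms://tampered"  # simulate an after-the-fact edit to provenance
ok_after = verify_chain(log)
print(ok_before, ok_after)
```

Because each entry commits to its predecessor, rewriting the provenance of an old document invalidates its own hash and the link from every later record, which is what makes the log useful as evidence after an incident.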
No single tool prevents this. Provenance tooling (Sigstore, in-toto), retrieval observability layers (Ragas, TruLens, Buzo), and vector-DB snapshot features (Pinecone backups, Qdrant snapshots) each cover a piece of the surface; deploying any one of them in isolation is theatre. The durable answer is hygiene: knowing what entered your index, why, and when, and being able to walk every claim back to its source. The teams that survive a poisoning incident are the ones who can prove what they did not write into their corpus, not the ones who can prove what they did.
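Walking a claim back to its source presupposes a record of which passages fed which answers. The sketch below assumes a hypothetical `RetrievalEvent` log written out-of-band at query time and a hypothetical `blast_radius` helper; the field names and document IDs are illustrative only. Given a set of document IDs confirmed as poisoned, it lists the answers that retrieved them and, optionally, only those that actually cited them.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalEvent:
    answer_id: str  # the generated answer this retrieval fed into
    doc_id: str     # the passage that was retrieved
    cited: bool     # whether the final answer cited that passage

def blast_radius(events, poisoned_doc_ids, cited_only=False):
    """Map each affected answer to the poisoned documents it touched."""
    affected = defaultdict(set)
    for event in events:
        if event.doc_id not in poisoned_doc_ids:
            continue
        if cited_only and not event.cited:
            continue
        affected[event.answer_id].add(event.doc_id)
    return dict(affected)

# Hypothetical log rows for three answers.
events = [
    RetrievalEvent("ans-1", "doc-7", True),
    RetrievalEvent("ans-1", "doc-2", False),
    RetrievalEvent("ans-2", "doc-3", True),
    RetrievalEvent("ans-3", "doc-7", False),
    RetrievalEvent("ans-3", "doc-9", True),
]

poisoned = {"doc-7", "doc-9"}
retrieved = blast_radius(events, poisoned)
cited = blast_radius(events, poisoned, cited_only=True)

print("retrieved poisoned passages:", sorted(retrieved))
print("cited poisoned passages:", sorted(cited))
```

Keeping the retrieved set separate from the cited set matters for triage: an answer that merely retrieved a poisoned passage may still have been influenced by it, while an answer that cited it is the one a customer can point to.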
