What is a clean AI data path?

Definition

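A clean AI data path is one where every hop between the application and the model is described, and the description matches the implementation. "Clean" is shorthand for legible and reviewable, not for the absence of any third party. A path can include managed providers and still be clean if it can be described accurately.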

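Per-hop review vocabulary: five questions answered at every hop in the data path. What is sent, what is retained, who can see it, which sub-processors are involved, and how the flow can be changed.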

Why buyers ask for it

Enterprise security reviews moved on from "does the vendor claim to be secure?" to "show us the path." Reviewers want a diagram or a paragraph per hop, not a marketing one-liner. A clean data path is what makes the answer short and consistent across reviewers.

What buyers mean by "clean"

From real questionnaires, "clean" usually unpacks into:

The data sent at each hop is documented.
Retention is documented per hop, not just at the provider.
The list of sub-processors is reviewable.
The flow is reversible — a layer can be swapped or removed.
Observability is structural; raw bodies are not persisted by default.
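One way to make these criteria reviewable is to keep a structured record per hop and check it for gaps. The sketch below is a minimal illustration under assumptions: `HopRecord`, its field names, and `review_gaps` are hypothetical, not a standard questionnaire schema or a Privian API. Unanswered questions are `None`, so an empty sub-processor list (a documented "none") stays distinguishable from an undocumented one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HopRecord:
    # Hypothetical per-hop record; field names are illustrative, not a standard schema.
    # None means "not documented yet", which differs from an empty or False answer.
    name: str
    data_sent: str | None = None             # what crosses this hop
    retention: str | None = None             # retention at this hop, not just the provider's
    sub_processors: list[str] | None = None  # [] is a valid, documented answer
    removable: bool | None = None            # can this layer be swapped or removed?
    logs_raw_bodies: bool | None = None      # does observability keep raw bodies?


def review_gaps(path: list[HopRecord]) -> list[str]:
    """Return one message per undocumented item or raw-body logging, hop by hop."""
    gaps: list[str] = []
    for hop in path:
        checks = {
            "data sent": hop.data_sent,
            "retention": hop.retention,
            "sub-processor list": hop.sub_processors,
            "reversibility": hop.removable,
            "observability": hop.logs_raw_bodies,
        }
        for label, value in checks.items():
            if value is None:
                gaps.append(f"{hop.name}: {label} is undocumented")
        if hop.logs_raw_bodies:
            gaps.append(f"{hop.name}: raw bodies are logged")
    return gaps


# Hypothetical two-hop example: the provider hop has not been written up yet.
path = [
    HopRecord("gateway", data_sent="masked prompt", retention="usage rollups",
              sub_processors=[], removable=True, logs_raw_bodies=False),
    HopRecord("provider", data_sent="masked prompt"),
]
for gap in review_gaps(path):
    print(gap)
```

Here the gateway hop passes and the provider hop reports four gaps, which is the "provider claims cover the provider hop" problem in miniature: someone still has to write that hop down.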

An example AI data path

A common shape: application → gateway → managed model provider → gateway → application. At each arrow the question is what crosses it and what stays behind. For Privian specifically, the masked prompt crosses the BYOK boundary; the entity map and decrypted credential do not. The diagram lives at /data-path.
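The "what crosses, what stays behind" question can also be written down as data and checked mechanically. This is a sketch assuming illustrative artifact labels (`"masked prompt"`, `"entity map"`, and so on) and a hypothetical `violations` helper. It models the shape described above, not Privian's internals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    # One arrow in the path: what crosses it and what is kept on the sending side.
    src: str
    dst: str
    crosses: frozenset[str]
    stays_behind: frozenset[str]


# Hypothetical description of application -> gateway -> provider -> gateway -> application.
# Artifact names are illustrative labels, not fields of any real API.
PATH = [
    Hop("application", "gateway", frozenset({"raw prompt"}), frozenset()),
    Hop("gateway", "provider", frozenset({"masked prompt"}),
        frozenset({"entity map", "decrypted credential"})),
    Hop("provider", "gateway", frozenset({"masked response"}), frozenset()),
    Hop("gateway", "application", frozenset({"rehydrated response"}),
        frozenset({"entity map"})),
]


def violations(path: list[Hop], never_to: str, restricted: set[str]) -> list[str]:
    """List every hop into `never_to` that carries a restricted artifact."""
    found: list[str] = []
    for hop in path:
        if hop.dst == never_to:
            for artifact in sorted(hop.crosses & restricted):
                found.append(f"{hop.src} -> {hop.dst}: {artifact}")
    return found


print(violations(PATH, "provider", {"raw prompt", "entity map", "decrypted credential"}))
```

An empty list means the description, as written, never sends a restricted artifact to the provider. It checks the description, not the running system, so it is only as good as the per-hop write-up behind it.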

What enters the model vs. what does not

In a clean path you can answer this directly. For a masked gateway: the model sees a prompt with deterministic placeholders in place of supported sensitive values; it does not see the rehydrated form, the entity map, or the BYOK credential after decryption. For a path without masking, the model sees whatever the application sent — knowing this is itself a clean answer, even if it is not the desired one.
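To make the masked-gateway answer concrete, here is a minimal masking and rehydration sketch. It assumes email addresses are the only supported value type and uses a hypothetical `<EMAIL_n>` placeholder format; `mask` and `rehydrate` are illustrative names, not Privian's implementation. Placeholders are deterministic within a prompt: the same value always maps to the same placeholder, so the model can still tell that two mentions refer to one entity.

```python
from __future__ import annotations

import re

# Assumption: only email addresses are detected here; the placeholder format is hypothetical.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a placeholder; repeats reuse the same placeholder."""
    value_to_placeholder: dict[str, str] = {}

    def substitute(match: re.Match[str]) -> str:
        value = match.group(0)
        if value not in value_to_placeholder:
            value_to_placeholder[value] = f"<EMAIL_{len(value_to_placeholder) + 1}>"
        return value_to_placeholder[value]

    masked = EMAIL.sub(substitute, prompt)
    # Entity map: placeholder -> original value. In this sketch it never leaves the caller.
    entity_map = {ph: value for value, ph in value_to_placeholder.items()}
    return masked, entity_map


def rehydrate(text: str, entity_map: dict[str, str]) -> str:
    """Put original values back into the model's response, inside the gateway."""
    for placeholder, value in entity_map.items():
        text = text.replace(placeholder, value)
    return text


masked, entity_map = mask("Email ana@example.com, then cc ana@example.com and bo@corp.io.")
print(masked)  # Email <EMAIL_1>, then cc <EMAIL_1> and <EMAIL_2>.
print(rehydrate("Reply to <EMAIL_2>.", entity_map))  # Reply to bo@corp.io.
```

Only `masked` would be sent to the provider; `entity_map` is the piece that stays behind and is needed to rehydrate the response.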

Retention and visibility

Retention is per hop. The application may persist transcripts; the gateway may persist usage rollups; the provider has its own retention defaults that vary by product tier. A clean path documents all three rather than collapsing them into a single answer.

Managed models vs. self-hosted

Both can be part of a clean data path. Self-hosting reduces the number of parties and is the right answer for some workloads; managed models behind a masking gateway with BYOK are the right answer for many others. For a neutral comparison, see Managed vs. self-hosted LLMs; for the product view, see the comparison page Privian vs. self-hosted LLMs.

Technical controls that support a clean path

Upstream data minimization in the application.
Prompt-level masking before the prompt leaves.
BYOK so the provider relationship stays inside the org.
Allow-listed models at the gateway.
Sanitized, structural observability — no raw bodies by default.
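The last two controls can be sketched together: reject models that are not allow-listed, and log only structural metadata about each request. `ALLOWED_MODELS`, the model names, and the `UsageRecord` fields are hypothetical choices for illustration, not a documented gateway schema.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass

# Hypothetical allow-list; the model names are placeholders.
ALLOWED_MODELS = frozenset({"provider-a/model-x", "provider-b/model-y"})


@dataclass(frozen=True)
class UsageRecord:
    # Structural metadata only: sizes, counts, timing. No prompt or response text.
    model: str
    prompt_chars: int
    response_chars: int
    entities_masked: int
    latency_ms: int


def check_model(model: str) -> None:
    """Reject any model that is not on the gateway's allow-list."""
    if model not in ALLOWED_MODELS:
        raise PermissionError(f"model not allow-listed: {model}")


def usage_record(model: str, prompt: str, response: str,
                 entities_masked: int, latency_ms: int) -> UsageRecord:
    """Keep lengths and counts; the prompt and response bodies are not stored."""
    return UsageRecord(model, len(prompt), len(response), entities_masked, latency_ms)


check_model("provider-a/model-x")
record = usage_record("provider-a/model-x", "Summarize <EMAIL_1>'s ticket", "Done.",
                      entities_masked=1, latency_ms=420)
print(asdict(record))
```

Because the record type has no field for body text, "no raw bodies by default" is enforced by the shape of the data rather than by a logging setting someone might flip.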

Common misconceptions

"Clean" means no third party. Not necessarily — it means the third parties are documented.
"Clean" requires self-hosting. No; many clean paths use managed APIs behind a masking gateway.
Provider claims are enough. Provider claims cover the provider hop. The other hops are still on you.

Where Privian fits

Privian is the gateway hop in a clean data path: a privacy-first LLM gateway that masks supported sensitive values, supports BYOK end-to-end, and persists structural metrics rather than raw bodies. It is one layer in the path — see /resources/architecture for the broader picture and /resources/security for the security model.

Written under our editorial principles: implementation-grounded, honest about limitations, educational first.

Frequently asked questions

What is a clean AI data path?: An AI data path where every hop between the application and the model is described and where the description matches the implementation: what is sent, what is retained, who can see it, which sub-processors are involved, and how the flow can be changed.
Why do enterprise buyers ask about it?: Because vague answers no longer pass security review. Reviewers want to see the actual path data takes, not just provider claims. A clean data path is what lets a team answer those questions consistently.
Does a clean data path require self-hosted models?: No. Self-hosting is one way to keep data inside a controlled environment, but managed model APIs can be part of a clean data path when the layer in front of them masks supported sensitive values, supports BYOK and persists structural metadata only.
What is Privian's data path?: Documented end-to-end at /data-path: prompts are masked into placeholders before they leave, the masked prompt is sent via BYOK to the chosen provider, the response is rehydrated inside the gateway, and the entity map plus decrypted credential are discarded when the request ends. Raw prompt and response bodies are not persisted.
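The request-scoped lifetime in that answer (the entity map and decrypted credential are discarded when the request ends) maps naturally onto a context manager. This is a minimal sketch, assuming a caller-supplied `decrypt` function in place of a real key-management call; `RequestScope` and `request_scope` are hypothetical names, not Privian's code.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from contextlib import contextmanager


class RequestScope:
    # Hypothetical holder for per-request secrets.
    def __init__(self, decrypted_credential: str) -> None:
        self.decrypted_credential: str | None = decrypted_credential
        self.entity_map: dict[str, str] = {}


@contextmanager
def request_scope(decrypt: Callable[[bytes], str],
                  encrypted_credential: bytes) -> Iterator[RequestScope]:
    """Decrypt the BYOK credential for one request and drop both secrets when it ends."""
    scope = RequestScope(decrypt(encrypted_credential))
    try:
        yield scope
    finally:
        # Runs even if the request raised. This drops references only; Python
        # does not guarantee the underlying memory is zeroed.
        scope.entity_map.clear()
        scope.decrypted_credential = None


def fake_decrypt(blob: bytes) -> str:
    # Stand-in for a real key-management call; here it just decodes the bytes.
    return blob.decode()


with request_scope(fake_decrypt, b"sk-test") as scope:
    scope.entity_map["<EMAIL_1>"] = "ana@example.com"
    # ... send the masked prompt, rehydrate the response ...

print(scope.entity_map, scope.decrypted_credential)  # {} None
```

Tying cleanup to `finally` means a failed provider call cannot leave the entity map or credential attached to a long-lived object.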