The clean data path framework
A reusable framework for describing — and defending — an AI data path: what enters, what leaves, what is retained, who can see it, and what is deleted. The vocabulary enterprise reviewers are starting to expect.
"What is your data path?" is the question every serious enterprise AI review eventually arrives at. Most vendors do not have a crisp answer. The teams that do tend to win the review.
This article proposes a small, reusable framework — five questions, applied per hop — that turns a vague conversation into a defensible one. It is not new in its parts; it is just an explicit application of basic data-flow discipline to AI features. Privian uses it to describe its own architecture, but the framework belongs to whoever finds it useful.
The framework
For every hop in an AI data path, answer five questions:
- 01. What enters? The exact shape of the payload arriving at this hop.
- 02. What leaves? The exact shape of the payload forwarded to the next hop.
- 03. What is retained? What this hop stores, in what form, with what retention window.
- 04. Who can see it? Which roles, human or system, can read which of the above.
- 05. What is deleted? What disappears, automatically or on request, and how soon.
A "clean" data path is one where every hop has a short, defensible answer to each question. Cleanliness is a property of clarity, not of any particular technology.
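One way to make the five questions concrete is to treat each hop as a small record with one field per question. The sketch below is illustrative only: the `Hop` class, its field names, and the `unanswered` and `is_clean` helpers are assumptions made for this example, not part of the framework or of any Privian API. The sample values are taken from the gateway hop in the worked example later in this article. Note that the check only catches blank answers; whether an answer is defensible still needs a human reviewer.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Hop:
    """One hop in an AI data path: a name plus the five framework answers."""
    name: str
    enters: str
    leaves: str
    retained: str
    visible_to: str
    deleted: str


def unanswered(hop: Hop) -> list[str]:
    """Return the framework questions this hop leaves blank."""
    return [
        f.name
        for f in fields(hop)
        if f.name != "name" and not getattr(hop, f.name).strip()
    ]


def is_clean(path: list[Hop]) -> bool:
    """A path passes this (necessary, not sufficient) check if no answer is blank."""
    return all(not unanswered(hop) for hop in path)


gateway = Hop(
    name="Privian gateway",
    enters="prompt (may contain PII)",
    leaves="masked prompt (PII replaced with placeholders)",
    retained="nothing raw; sanitized observability counters only",
    visible_to="no humans on the request hot path",
    deleted="token map discarded with the request",
)

# A hypothetical hop where the retention decisions have not been made yet.
undecided = Hop(
    name="Undecided hop",
    enters="masked prompt",
    leaves="response",
    retained="",
    visible_to="",
    deleted="",
)

print(unanswered(undecided))
print(is_clean([gateway, undecided]))
```

Running the check against a draft path turns "we have not decided" into a visible list of open questions per hop.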
A clean AI data path is often more persuasive in a security review than any single AI security claim.
Why this framework
Three properties make it work:
- It is composable. A path with five hops has twenty-five answers. They can be written, reviewed and versioned.
- It surfaces ambiguity. Vendors who cannot answer "what is retained at hop 3?" cleanly almost always discover they have not made the decision.
- It is provider-agnostic. The same vocabulary works whether you use OpenAI, Anthropic, a self-hosted model or a routing gateway.
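The "written, reviewed and versioned" property can be illustrated with a short sketch that stores a path as plain data and serializes it deterministically, so that a change to any answer shows up as a small diff in version control. The key names, the `answer_count` and `dump_path` helpers, and the placeholder path are all assumptions made for this example.

```python
import json

# The five framework questions, used here as dictionary keys (an assumed naming).
QUESTIONS = ("enters", "leaves", "retained", "visible_to", "deleted")


def answer_count(path):
    """Number of answers a path needs: one per question per hop."""
    return len(path) * len(QUESTIONS)


def dump_path(path):
    """Serialize a path with stable key order so diffs between versions stay small."""
    for hop in path:
        missing = [q for q in QUESTIONS if q not in hop]
        if missing:
            raise ValueError(f"{hop.get('name', '?')}: missing {missing}")
    return json.dumps(path, indent=2, sort_keys=True, ensure_ascii=False)


# A hypothetical five-hop path with placeholder answers.
path = [{"name": f"hop {i}", **{q: "TBD" for q in QUESTIONS}} for i in range(1, 6)]

print(answer_count(path))
print(dump_path(path[:1]))
```

Because nothing in the structure names a provider, the same file format describes a path through a managed API, a self-hosted model or a routing gateway.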
Worked example — a basic AI feature
Consider a SaaS support-AI feature: agent pastes a ticket → app composes a prompt → gateway masks PII → provider answers → gateway rehydrates → app displays draft to agent.
Hop 1 — Agent browser
Enters: free text the agent typed/pasted
Leaves: same text, over TLS, to the app server
Retained: nothing persisted (browser memory only)
Visible: the agent
Deleted: on tab close
Hop 2 — App server
Enters: ticket text
Leaves: composed prompt to gateway
Retained: ticket text in the support DB (existing retention policy)
Visible: the application's own roles
Deleted: per existing customer-data retention
Hop 3 — Privian gateway
Enters: prompt (may contain PII)
Leaves: masked prompt (PII replaced with placeholders)
Retained: nothing raw — sanitized observability counters only
Visible: no humans on the request hot path
Deleted: token map discarded with the request
Hop 4 — LLM provider (via BYOK)
Enters: masked prompt
Leaves: raw response (may reference placeholders)
Retained: per provider tier; documented in vendor's subprocessor list
Visible: per provider's published controls
Deleted: per provider's retention window
Hop 5 — Privian gateway (return)
Enters: raw response with placeholders
Leaves: rehydrated response to app server
Retained: nothing raw
Visible: no humans on the request hot path
Deleted: immediately after return
Hop 6 — App server (return)
Enters: rehydrated draft
Leaves: draft to the agent's browser
Retained: per app's own product policy
Visible: the agent
Deleted: per app's own policy
How to use it in a security review
When a reviewer asks "what does your AI feature do with our data?", paste the table for your own data path. Almost every follow-up question they would have asked becomes self-answering. The remaining questions are usually about a specific hop, and they are answerable with one row.
This is the same dynamic that modern AI security reviews increasingly reward: clarity beats sophistication.
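If the path is kept as structured data, the table for a review can be generated rather than hand-written. The following is a minimal sketch under that assumption; the dictionary keys and the `render_table` and `row_for` helpers are hypothetical names chosen for this example, and the two sample rows reuse the first two hops of the worked example above.

```python
COLUMNS = ("name", "enters", "leaves", "retained", "visible", "deleted")
HEADER = ("Hop", "Enters", "Leaves", "Retained", "Visible", "Deleted")


def render_table(path):
    """Render hops as a fixed-width plain-text table, one row per hop."""
    rows = [HEADER] + [tuple(hop[c] for c in COLUMNS) for hop in path]
    widths = [max(len(row[i]) for row in rows) for i in range(len(HEADER))]
    lines = [
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)


def row_for(path, hop_name):
    """Answer a follow-up about one hop with just that hop's row."""
    return render_table([hop for hop in path if hop["name"] == hop_name])


path = [
    {
        "name": "Agent browser",
        "enters": "free text the agent typed/pasted",
        "leaves": "same text, over TLS, to the app server",
        "retained": "nothing persisted (browser memory only)",
        "visible": "the agent",
        "deleted": "on tab close",
    },
    {
        "name": "App server",
        "enters": "ticket text",
        "leaves": "composed prompt to gateway",
        "retained": "ticket text in the support DB",
        "visible": "the application's own roles",
        "deleted": "per existing customer-data retention",
    },
]

print(render_table(path))
```

`row_for(path, "App server")` returns the header plus the single matching row, which is usually all a hop-specific follow-up question needs.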
What the framework deliberately does not do
- It does not certify anything. It is a description, not a compliance regime.
- It does not specify any particular masking technique or provider choice — those are decisions the path documents, not the framework imposes.
- It does not protect against your own application's misuse of data downstream of the AI feature.
Where Privian sits
Privian occupies a single, deliberately narrow hop: take a prompt in, mask supported sensitive values, route to the provider via BYOK, rehydrate the response, retain nothing raw. The full hop is documented on the data path page and the architecture page.
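The mask, route and rehydrate round trip described above can be sketched in a few lines. This is not Privian's implementation: the email-only regular expression, the `<EMAIL_n>` placeholder format, and the `mask`, `rehydrate` and `handle` functions are all assumptions made for illustration, and a real gateway would detect many more kinds of sensitive value. The point is only to show how a per-request token map lets the provider see placeholders while the caller gets the original values back.

```python
import re

# Assumption: only email addresses are masked in this sketch.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask(prompt):
    """Replace each email with a placeholder; return the masked text and the token map."""
    token_map = {}   # placeholder -> original value
    seen = {}        # original value -> placeholder, so repeats share one placeholder

    def swap(match):
        value = match.group(0)
        if value not in seen:
            placeholder = f"<EMAIL_{len(seen) + 1}>"
            seen[value] = placeholder
            token_map[placeholder] = value
        return seen[value]

    return EMAIL.sub(swap, prompt), token_map


def rehydrate(response, token_map):
    """Put the original values back wherever the response references a placeholder."""
    for placeholder, value in token_map.items():
        response = response.replace(placeholder, value)
    return response


def handle(prompt, call_provider):
    """One request: mask, call the provider, rehydrate. The token map is local to this
    call and is never stored, so it is dropped when the function returns."""
    masked, token_map = mask(prompt)
    return rehydrate(call_provider(masked), token_map)


# A stand-in for the provider call, recording what it was actually sent.
sent = []


def fake_provider(masked_prompt):
    sent.append(masked_prompt)
    return "Suggested reply: write back to <EMAIL_1>."


print(handle("Customer jane@example.com asked for a refund.", fake_provider))
print(sent[0])
```

Keeping the token map as a local variable is the design choice that matters here: nothing raw outlives the request unless some other code deliberately persists it.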
Used as a hop inside this framework, Privian replaces the often hand-waved "and then we send it to the LLM" with five sentences that hold up in a review. That is the entire promise.
Written under our editorial principles: implementation-grounded, honest about limitations, educational first.
Frequently asked questions
- What is a 'data path' in AI?
- The ordered sequence of components a prompt passes through from the user, through your application, through any gateway or proxy, into the model, and back. A clean data path is one where each step has a clear, defensible answer to what it sees, what it stores and who can read it.
- What five questions does the framework answer?
- For each hop: what enters, what leaves, what is retained, who can see it, and what is deleted and when. Almost every enterprise AI privacy review reduces to some combination of these five.
- Is this a compliance framework?
- No. It is a description framework — a vocabulary for honestly explaining an AI data path to a reviewer. Compliance regimes (GDPR, SOC 2, HIPAA) sit on top of it but do not replace the underlying clarity.
- Where does Privian sit in this framework?
- Privian is a single, well-defined hop in the path: masking before egress, rehydration on the response, zero raw retention, BYOK for the provider relationship. The framework lets you describe that hop precisely instead of waving at it.