Article · AI Privacy

The clean data path framework

A reusable framework for describing — and defending — an AI data path: what enters, what leaves, what is retained, who can see it, and what is deleted. The vocabulary enterprise reviewers are starting to expect.

By Privian Team · Updated June 6, 2026 · 10 min read

"What is your data path?" is the question every serious enterprise AI review eventually arrives at. Most vendors do not have a crisp answer. The teams that do tend to win the review.

This article proposes a small, reusable framework — five questions, applied per hop — that turns a vague conversation into a defensible one. It is not new in its parts; it is just an explicit application of basic data-flow discipline to AI features. Privian uses it to describe its own architecture, but the framework belongs to whoever finds it useful.

The framework

For every hop in an AI data path, answer five questions:

  1. What enters?
     The exact shape of the payload arriving at this hop.

  2. What leaves?
     The exact shape of the payload forwarded to the next hop.

  3. What is retained?
     What this hop stores, in what form, with what retention window.

  4. Who can see it?
     Which roles, human or system, can read which of the above.

  5. What is deleted?
     What disappears, automatically or on request, and how soon.

Figure: Per-hop review vocabulary. Five questions answered at every hop in the data path: what enters, what leaves, what is retained, who can see it, and what is deleted.

A "clean" data path is one where every hop has a short, defensible answer to each question. Cleanliness is a property of clarity, not of any particular technology.

A clean AI data path is often more persuasive in a security review than any single AI security claim.
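One way to make the five questions concrete is to treat each hop as a small record that refuses to exist with a blank answer. The sketch below is illustrative only: the `Hop` class and its field names are assumptions made for this example, not part of any Privian API, and a non-empty check is only a floor. Whether an answer is short and defensible still takes a human reviewer.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Hop:
    """One hop in an AI data path, described by the five questions."""
    name: str
    enters: str    # what enters?
    leaves: str    # what leaves?
    retained: str  # what is retained?
    visible: str   # who can see it?
    deleted: str   # what is deleted?

    def __post_init__(self) -> None:
        # Reject a hop that leaves any field blank or whitespace-only.
        blank = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if blank:
            raise ValueError(f"hop {self.name!r} has no answer for: {', '.join(blank)}")


# A fully answered hop, using the gateway answers from this article.
gateway = Hop(
    name="Privian gateway",
    enters="prompt (may contain PII)",
    leaves="masked prompt (PII replaced with placeholders)",
    retained="nothing raw; sanitized observability counters only",
    visible="no humans on the request hot path",
    deleted="token map discarded with the request",
)

# A hop where the retention decision has not been made yet.
try:
    Hop(
        name="LLM provider",
        enters="masked prompt",
        leaves="raw response",
        retained="",
        visible="per provider's published controls",
        deleted="per provider's retention window",
    )
except ValueError as err:
    print(err)  # hop 'LLM provider' has no answer for: retained
```

Failing at construction time is a deliberate choice in this sketch: an undecided answer shows up while the path is being written, not in front of the reviewer.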

Why this framework

Three properties make it work:

  • It is composable. A path with five hops has twenty-five answers. They can be written, reviewed and versioned.
  • It surfaces ambiguity. Vendors who cannot answer "what is retained at hop 3?" cleanly almost always discover they have not made the decision.
  • It is provider-agnostic. The same vocabulary works whether you use OpenAI, Anthropic, a self-hosted model or a routing gateway.
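The "written, reviewed and versioned" property can be sketched with plain dictionaries. The example below counts the answers in a path, fingerprints a canonical serialization so a review can pin an exact version, and lists which answers changed between two versions. All function names and the `"documented"` placeholder answers are assumptions made for illustration.

```python
import copy
import hashlib
import json

QUESTIONS = ("enters", "leaves", "retained", "visible", "deleted")


def answer_count(path):
    """Number of non-empty answers across every hop in the path."""
    return sum(1 for hop in path for q in QUESTIONS if hop.get(q))


def fingerprint(path):
    """SHA-256 of a canonical JSON form, usable as a version identifier."""
    canonical = json.dumps(path, sort_keys=True, ensure_ascii=False,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_answers(old, new):
    """(hop name, question) pairs whose answer differs between two versions.

    Hops are matched by name; a hop that is new in `new` reports every
    question it answers as changed.
    """
    old_by_name = {hop["name"]: hop for hop in old}
    changes = []
    for hop in new:
        before = old_by_name.get(hop["name"], {})
        for q in QUESTIONS:
            if before.get(q) != hop.get(q):
                changes.append((hop["name"], q))
    return changes


# Five hops with placeholder answers: five questions each, 25 answers.
v1 = [{"name": f"hop {i}", **{q: "documented" for q in QUESTIONS}}
      for i in range(1, 6)]

# A second version in which one retention answer was revised.
v2 = copy.deepcopy(v1)
v2[2]["retained"] = "nothing raw"

print(answer_count(v1))                    # 25
print(fingerprint(v1) == fingerprint(v2))  # False
print(changed_answers(v1, v2))             # [('hop 3', 'retained')]
```

Because the vocabulary is only field names and free text, nothing in this sketch depends on which model provider or gateway sits in the path.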

Worked example — a basic AI feature

Consider a SaaS support-AI feature: an agent pastes a ticket → the app composes a prompt → the gateway masks PII → the provider answers → the gateway rehydrates the response → the app displays a draft to the agent.

Hop 1 — Agent browser
  Enters:  free text the agent typed/pasted
  Leaves:  same text, over TLS, to the app server
  Retained: nothing persisted (held in browser memory only)
  Visible:  the agent
  Deleted:  on tab close

Hop 2 — App server
  Enters:  ticket text
  Leaves:  composed prompt to gateway
  Retained: ticket text in the support DB (existing retention policy)
  Visible:  the application's own roles
  Deleted:  per existing customer-data retention

Hop 3 — Privian gateway
  Enters:  prompt (may contain PII)
  Leaves:  masked prompt (PII replaced with placeholders)
  Retained: nothing raw — sanitized observability counters only
  Visible:  no humans on the request hot path
  Deleted:  token map discarded with the request

Hop 4 — LLM provider (via BYOK)
  Enters:  masked prompt
  Leaves:  raw response (may reference placeholders)
  Retained: per provider tier; documented in vendor's subprocessor list
  Visible:  per provider's published controls
  Deleted:  per provider's retention window

Hop 5 — Privian gateway (return)
  Enters:  raw response with placeholders
  Leaves:  rehydrated response to app server
  Retained: nothing raw
  Visible:  no humans on the request hot path
  Deleted:  immediately after return

Hop 6 — App server (return)
  Enters:  rehydrated draft
  Leaves:  draft to the agent's browser
  Retained: per app's own product policy
  Visible:  the agent
  Deleted:  per app's own policy
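Hops 3 and 5 above describe a mask-then-rehydrate round trip. The sketch below shows the general idea for a single value type, email addresses, using a per-request token map. It is a minimal illustration under stated assumptions, not Privian's implementation: the regular expression, the `<EMAIL_n>` placeholder format, and the function names are all hypothetical, and a real gateway would need to cover many more value types.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask(prompt):
    """Replace each distinct email with a placeholder.

    Returns the masked prompt and the token map needed to reverse it.
    """
    token_map = {}  # placeholder -> original value
    seen = {}       # original value -> placeholder

    def replace(match):
        value = match.group(0)
        if value not in seen:
            placeholder = f"<EMAIL_{len(seen) + 1}>"
            seen[value] = placeholder
            token_map[placeholder] = value
        return seen[value]

    return EMAIL.sub(replace, prompt), token_map


def rehydrate(response, token_map):
    """Put the original values back wherever the response uses a placeholder."""
    for placeholder, value in token_map.items():
        response = response.replace(placeholder, value)
    return response


prompt = ("Customer jane@example.com says bob@example.org was charged twice; "
          "reply to jane@example.com.")
masked, token_map = mask(prompt)
print(masked)
# Customer <EMAIL_1> says <EMAIL_2> was charged twice; reply to <EMAIL_1>.

# Stand-in for the provider's answer, which only ever saw placeholders.
provider_response = "Hi, we refunded <EMAIL_2> and notified <EMAIL_1>."
print(rehydrate(provider_response, token_map))
# Hi, we refunded bob@example.org and notified jane@example.com.

# Hop 3's "Deleted" answer: the token map does not outlive the request.
del token_map
```

The token map is the only place the raw values live during the request, which is why its lifetime is the interesting part of the "retained" and "deleted" answers for the gateway hops.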

How to use it in a security review

When a reviewer asks "what does your AI feature do with our data?", paste the table for your own data path. Almost every follow-up question they would have asked becomes self-answering. The remaining questions are usually about a specific hop, and they are answerable with one row.
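If the answers are kept as structured data, the pasteable table can be generated rather than hand-maintained. A small sketch, assuming each hop is a name plus a dictionary keyed by the five questions; `render_hop` and `render_path` are hypothetical helper names, and the layout only approximates the worked example in this article.

```python
LABELS = ("Enters", "Leaves", "Retained", "Visible", "Deleted")


def render_hop(number, name, answers):
    """Render one hop as a plain-text block, one row per question."""
    lines = [f"Hop {number} - {name}"]
    for label in LABELS:
        heading = label + ":"
        lines.append(f"  {heading:<10}{answers[label.lower()]}")
    return "\n".join(lines)


def render_path(path):
    """Render every hop in order, separated by blank lines."""
    return "\n\n".join(
        render_hop(number, name, answers)
        for number, (name, answers) in enumerate(path, start=1)
    )


path = [
    ("App server", {
        "enters": "ticket text",
        "leaves": "composed prompt to gateway",
        "retained": "ticket text in the support DB",
        "visible": "the application's own roles",
        "deleted": "per existing customer-data retention",
    }),
    ("Privian gateway", {
        "enters": "prompt (may contain PII)",
        "leaves": "masked prompt",
        "retained": "nothing raw",
        "visible": "no humans on the request hot path",
        "deleted": "token map discarded with the request",
    }),
]

print(render_path(path))
```

A follow-up question about a single hop can then be answered by rendering just that hop, for example `render_hop(2, *path[1])`.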

This is the same dynamic that modern AI security reviews increasingly reward: clarity beats sophistication.

What the framework deliberately does not do

  • It does not certify anything. It is a description, not a compliance regime.
  • It does not specify any particular masking technique or provider choice. Those are decisions the path documents, not ones the framework imposes.
  • It does not protect against your own application's misuse of data downstream of the AI feature.

Where Privian sits

Privian occupies a single, deliberately narrow hop: take a prompt in, mask supported sensitive values, route to the provider via BYOK, rehydrate the response, retain nothing raw. The full hop is documented on the data path page and the architecture page.

Used as a hop inside this framework, Privian replaces the often hand-waved "and then we send it to the LLM" with five sentences that hold up in a review. That is the entire promise.

Written under our editorial principles: implementation-grounded, honest about limitations, educational first.

Frequently asked questions

What is a "data path" in AI?
The ordered sequence of components a prompt passes through: from the user, through your application, through any gateway or proxy, into the model, and back. A clean data path is one where each step has a clear, defensible answer to what it sees, what it stores and who can read it.

What five questions does the framework answer?
For each hop: what enters, what leaves, what is retained, who can see it, and what is deleted (and when). Almost every enterprise AI privacy review reduces to some combination of these five.

Is this a compliance framework?
No. It is a description framework: a vocabulary for honestly explaining an AI data path to a reviewer. Compliance regimes (GDPR, SOC 2, HIPAA) sit on top of it but do not replace the underlying clarity.

Where does Privian sit in this framework?
Privian is a single, well-defined hop in the path: masking before egress, rehydration on the response, zero raw retention, BYOK for the provider relationship. The framework lets you describe that hop precisely instead of waving at it.