How to prevent GPT from seeing customer data

A defense-in-depth approach to keeping customer identifiers, contact info and account data away from third-party model providers.

7 min read · Updated May 20, 2026

Where customer data leaks into prompts

Three patterns account for almost every accidental exposure:

  • A copilot wraps a customer record verbatim and sends it as context
  • A support tool forwards the customer's raw message into a summarization prompt
  • A debugging trace includes user identifiers and gets pasted into a chat

None of these are reckless. They are the natural shape of an LLM feature being built quickly. The control has to live somewhere that can catch all three.

Defense in depth

No single control is enough. The pattern that works:

  1. Minimize before you build the prompt. Pull only the fields the prompt actually needs.
  2. Mask at the gateway. Detection and substitution happen between your app and the provider, on every call.
  3. Avoid persisting raw prompts. Observability should record metadata, not bodies. See Zero retention.
  4. Limit who can call the gateway. Treat the gateway as production infrastructure and gate access accordingly.
  5. Plan for rotation. If a leak happens, you want a clear runbook.
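
Steps 1 and 3 can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the record, its field names, the allow-list and the log shape are all hypothetical. It shows a prompt built from an allow-list of fields, and a log entry that records metadata about the call rather than the prompt body.

```python
import json

# Hypothetical customer record; field names are illustrative only.
record = {
    "customer_id": "c_1029",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "plan": "pro",
    "open_tickets": 2,
    "card_last4": "4242",
}

# Step 1: allow-list only the fields this particular prompt needs.
PROMPT_FIELDS = ("plan", "open_tickets")


def minimize(rec: dict, allowed: tuple) -> dict:
    """Return a copy of rec containing only the allow-listed fields."""
    return {key: rec[key] for key in allowed if key in rec}


def build_prompt(rec: dict) -> str:
    """Build the prompt from the minimized record, never the full one."""
    context = json.dumps(minimize(rec, PROMPT_FIELDS), sort_keys=True)
    return f"Summarize this account for a support agent: {context}"


# Step 3: record metadata about the call, never the prompt body.
def log_entry(prompt: str, model: str, fields: tuple) -> dict:
    return {
        "model": model,
        "prompt_chars": len(prompt),
        "fields_sent": sorted(fields),
    }


prompt = build_prompt(record)
entry = log_entry(prompt, model="example-model", fields=PROMPT_FIELDS)
```

An allow-list fails closed: a field added to the record later stays out of the prompt until someone deliberately adds it to `PROMPT_FIELDS`.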

What "the provider never saw it" really means

If a prompt arrives at the provider already masked, the provider cannot see the original of any value the gateway detected and replaced. It cannot log that value, train on it, or expose it in a breach. The model still produces a useful response; it simply operates on placeholders that your gateway rehydrates on the return path.
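
The round trip described above can be sketched as follows. This is a simplified illustration that handles only email addresses with a basic regular expression; the function names and the pattern are assumptions for the example, and a production gateway would need far broader detection.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PLACEHOLDER_RE = re.compile(r"EMAIL_\d+")


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with EMAIL_n and return the mapping."""
    placeholders: dict[str, str] = {}  # original value -> placeholder

    def substitute(match: re.Match) -> str:
        value = match.group(0)
        if value not in placeholders:
            placeholders[value] = f"EMAIL_{len(placeholders) + 1}"
        return placeholders[value]

    masked = EMAIL_RE.sub(substitute, text)
    # Invert so the return path can look up originals by placeholder.
    return masked, {ph: value for value, ph in placeholders.items()}


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    """Swap known placeholders in a response back to the original values."""
    # Unknown placeholders are left untouched rather than guessed at.
    return PLACEHOLDER_RE.sub(
        lambda m: mapping.get(m.group(0), m.group(0)), text
    )


masked, mapping = mask(
    "Reply to jane@example.com and cc bob@example.org; "
    "jane@example.com is the owner."
)
# masked == "Reply to EMAIL_1 and cc EMAIL_2; EMAIL_1 is the owner."

# Only `masked` would be sent to the provider; `mapping` stays on your side.
restored = rehydrate("I emailed EMAIL_1 and EMAIL_2.", mapping)
```

The same address always maps to the same placeholder within a call, so the model can still reason about which person is which without seeing who they are.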

What this does not solve

Masking is one control. It does not stop a buggy application from rendering the rehydrated response to the wrong tenant. It does not stop a system prompt from leaking instructions. Even so, it is the single biggest reduction in third-party exposure you can make without changing the model.

How Privian fits

Privian sits between your application and the provider. Every prompt that goes through POST /v1/gateway is masked before it leaves your perimeter, and every response is rehydrated before it reaches your code. See Customer Support AI for a worked example.
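
As a rough sketch of what a call through the gateway might look like from application code: only the POST /v1/gateway path comes from this article. The host name, the Authorization header and the JSON body shape below are placeholders invented for illustration, not Privian's documented API.

```python
import json
import urllib.request

# Assumption: placeholder host. Only the /v1/gateway path is from the article.
GATEWAY_URL = "https://gateway.example.invalid/v1/gateway"


def build_gateway_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the raw prompt."""
    # Assumption: a JSON body with a single "prompt" field.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the real scheme may differ.
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_gateway_request("Draft a reply to jane@example.com", api_key="YOUR_KEY")
# Sending is left out on purpose. Against a real endpoint you would pass
# `req` to urllib.request.urlopen and read the rehydrated response body.
```

The point of the shape is that application code sends the unmasked prompt to the gateway, not to the provider, so masking and rehydration do not depend on each feature team remembering to do them.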

Try Privian during beta

Protect prompts before they reach GPT, Claude and other models.

BYOK · Zero retention · Provider-agnostic. Privian is currently in beta — pricing and limits may change.

Frequently asked questions

Is enterprise mode from the provider enough?

It helps: providers that promise zero retention or no training on your data reduce one risk. It does not change the fact that customer data still travels to that provider. Masking is complementary, not redundant.

Does this break personalized responses?

No. Because the gateway rehydrates placeholders on the return path, your end user sees a response that reads as if the model had access to the data the whole time.

What if the provider gets breached?

If the provider only ever saw EMAIL_1 instead of jane@example.com, a provider-side incident exposes the placeholder rather than the address, so the blast radius is far smaller.