How to remove personal data before sending to GPT
Practical strategies for stripping names, emails, account numbers and secrets out of a prompt — and the tradeoffs of each approach.
7 min read · Updated May 20, 2026
Three strategies, ranked
Most "remove the personal data" decisions fall into one of three patterns. Each has a different tradeoff between recall, capability, and integration cost.
1. Strip
Delete every recognized identifier. The model gets the cleanest prompt and the fewest hooks back to a real person. The downside is capability loss: the model can no longer answer "draft a reply to the customer" because it does not know which customer.
Best for: classification, sentiment analysis, summarization of non-identifying content.
2. Mask with deterministic placeholders
Replace each value with a typed token like EMAIL_1 or PERSON_2. The model can still tell which entity is which inside a single prompt, and your application can rehydrate the response with the real values.
Best for: chat, drafting, support automation, anywhere the response needs to feel personal.
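As a rough illustration of the masking step, the sketch below assigns typed, numbered tokens and reuses the same token when a value repeats within one prompt. The `PATTERNS` table and the `mask` function are hypothetical names introduced here, and the two regexes are deliberately simplified; this is not Privian's implementation.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def mask(text):
    """Return (masked_text, mapping), where mapping is token -> original value."""
    mapping = {}    # token -> original value, used later for rehydration
    seen = {}       # (kind, value) -> token, so a repeated value reuses its token
    counters = {}   # kind -> last number handed out

    def make_sub(kind):
        def sub(match):
            value = match.group(0)
            key = (kind, value)
            if key not in seen:
                counters[kind] = counters.get(kind, 0) + 1
                token = f"{kind}_{counters[kind]}"
                seen[key] = token
                mapping[token] = value
            return seen[key]
        return sub

    for kind, pattern in PATTERNS.items():
        text = pattern.sub(make_sub(kind), text)
    return text, mapping
```

Because the mapping lives only in the return value, the caller decides how long it exists; keeping it in memory for the duration of one request is enough to restore the response.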
3. Hash or tokenize
Replace values with stable hashes that survive across requests. This is useful when you need cross-request identity (e.g. for analytics) but it introduces a persistence problem: anything stable is eventually linkable.
Best for: aggregate analysis, not real-time chat. Privian deliberately does not do this in the gateway — see Zero retention.
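For readers who do need cross-request identity, one common way to build a stable token is a keyed hash (HMAC) rather than a plain hash, since unkeyed hashes of low-entropy values such as emails can be reversed by guessing. The sketch below is an assumption-laden illustration: `stable_token`, its parameters and the 16-character truncation are choices made here, not something the gateway does.

```python
import hashlib
import hmac

def stable_token(value, key, kind="ID"):
    """Derive a stable pseudonym for value using HMAC-SHA256 with a secret key."""
    # Normalize so "Jane@Example.com " and "jane@example.com" map to the same token.
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncation keeps tokens short at the cost of a higher collision chance.
    return f"{kind}_{digest[:16]}"
```

The same value and key always yield the same token, which is exactly what makes the output linkable: anyone holding the key, or enough requests, can correlate activity for one person.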
What good detection looks like
Detection is the hard part. A serious implementation usually combines:
- Regex for structured data — emails, phone numbers, IBANs, credit cards (with Luhn checks), known secret patterns.
- NER (named entity recognition) for free-text names, locations and organizations that regex cannot catch.
- Validators to reduce false positives — a Luhn-valid 16-digit number is much more likely to be a card.
- A fallback path for ambiguous cases so the system fails closed.
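The regex-plus-validator combination from the list above can be sketched for card numbers as follows. The candidate pattern and the function names are illustrative assumptions; NER and the fail-closed fallback are out of scope for this snippet.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0

def find_cards(text):
    """Return digit runs that look like card numbers and pass the Luhn check."""
    return [m.group(0) for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group(0))]
```

The regex alone would flag any long digit run, such as an order number; the validator discards the candidates whose checksum does not match.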
A working example
# Before: prompt that exposes a customer
Summarize the email from jane@example.com about IBAN DE89370400440532013000 and the failed payment.

# After: masked by the gateway, sent to the provider
Summarize the email from EMAIL_1 about IBAN IBAN_1 and the failed payment.

# Response from the provider (still uses tokens)
EMAIL_1 reported a failed payment on IBAN_1 and is asking for a refund.

# Rehydrated response returned to your app
jane@example.com reported a failed payment on DE89370400440532013000 and is asking for a refund.
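The return path in the example can be sketched as a single-pass substitution over the provider's response. `rehydrate` and the shape of `mapping` (token to original value) are assumptions made for this illustration, not a documented interface.

```python
import re

def rehydrate(text, mapping):
    """Replace placeholder tokens in text with their original values."""
    if not mapping:
        return text
    # Try longer tokens first so EMAIL_10 is not read as EMAIL_1 followed by "0".
    tokens = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in tokens))
    # One pass over the text, so restored values are never scanned again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

mapping = {
    "EMAIL_1": "jane@example.com",
    "IBAN_1": "DE89370400440532013000",
}
response = "EMAIL_1 reported a failed payment on IBAN_1 and is asking for a refund."
restored = rehydrate(response, mapping)
```

A chain of `str.replace` calls would also work for two tokens, but it can corrupt output once token numbers reach double digits or a restored value happens to contain another token.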
How Privian fits
Privian implements pattern #2 by default: deterministic masking with in-memory rehydration. The gateway handles detection, substitution, forwarding, and the return path. You change a base URL; the policy travels with the request. See the PII Masking page for the supported entity types and validators.
Frequently asked questions
- Should I strip personal data, or mask it?
- It depends on the workload. Stripping is safer but lossy, because the model has less context. Masking with deterministic placeholders preserves identity references and is the right default for most chat and drafting flows, and for summaries that must refer to specific people.
- Can I rely on regex alone?
- Not for free text. Regex catches structured data (emails, IBANs, card numbers) cleanly, but it struggles with unstructured names, addresses, and locations. A combined approach, with regex for structure and an NER model for free text, is what production systems typically use.
- Where should the stripping happen?
- At a gateway, not inside individual services. Centralizing the policy is the most reliable way to keep it consistent across teams, because every service otherwise has to implement and update the same rules separately.