PII masking for LLMs
What PII masking means in the context of LLM applications, how deterministic placeholders work, and how rehydration restores responses on the way back.
8 min read · Updated May 20, 2026
The problem
LLM applications routinely send customer data to third-party model providers. Even with TLS in transit and reasonable retention policies, the model sees the raw value — names, emails, account numbers — and that data is now in someone else's system. PII masking removes the data from the prompt without removing the ability to reason about it.
Definition
PII masking for LLMs is the practice of replacing personal data in a prompt with deterministic placeholders before forwarding to the model, then restoring the original values in the response on the way back. The model never sees the raw value; your application gets a usable response that references real identities.
How deterministic placeholders work
A consistent placeholder scheme is the key trick. Privian uses {TYPE}_{N} — for example EMAIL_1, PERSON_2. Within a single request, the same value always maps to the same placeholder. That lets the model preserve identity references across a long prompt without ever seeing the underlying string.
Input: "Email jane@example.com and let her know about ticket #4821."
Masked (sent to provider): "Email EMAIL_1 and let her know about ticket #4821."
Response (from provider): "I drafted an email to EMAIL_1 about ticket #4821."
Rehydrated (returned to your app): "I drafted an email to jane@example.com about ticket #4821."
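The round trip above can be sketched in a few lines of Python. This is a minimal illustration, not Privian's implementation: it handles only emails, the regular expression is deliberately simplified, and the function names `mask` and `rehydrate` are hypothetical.

```python
import re

# Simplified email pattern, for illustration only (an assumption, not Privian's detector).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches placeholders of the form TYPE_N, such as EMAIL_1 or PERSON_2.
PLACEHOLDER_RE = re.compile(r"\b[A-Z][A-Z_]*_\d+\b")


def mask(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each email with EMAIL_N; return the masked text and a placeholder -> value map."""
    value_to_placeholder: dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        value = match.group(0)
        # The same value always maps to the same placeholder within this call.
        if value not in value_to_placeholder:
            value_to_placeholder[value] = f"EMAIL_{len(value_to_placeholder) + 1}"
        return value_to_placeholder[value]

    masked = EMAIL_RE.sub(substitute, prompt)
    mapping = {ph: value for value, ph in value_to_placeholder.items()}
    return masked, mapping


def rehydrate(response: str, mapping: dict[str, str]) -> str:
    """Swap known placeholders back to their original values in a single pass."""
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), response)


masked, mapping = mask("Email jane@example.com and let her know about ticket #4821.")
print(masked)  # Email EMAIL_1 and let her know about ticket #4821.
print(rehydrate("I drafted an email to EMAIL_1 about ticket #4821.", mapping))
# I drafted an email to jane@example.com about ticket #4821.
```

Rehydrating with one regex pass, rather than repeated string replacement, avoids EMAIL_1 accidentally matching the prefix of EMAIL_10.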
Why deterministic placeholders beat generic redaction
Replacing every personal value with [REDACTED] works for legal disclosure but breaks LLM workloads. The model loses the ability to tell which entity is which. A masking scheme that preserves type and identity references keeps the model useful while still hiding the data.
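To make the difference concrete, here is a small comparison. It assumes the entities have already been detected and are passed in as (value, type) pairs; the names `redact_generic` and `mask_typed` are hypothetical and the sample sentence is invented for illustration.

```python
def redact_generic(text: str, entities: list[tuple[str, str]]) -> str:
    """Replace every detected value with the same generic marker."""
    for value, _entity_type in entities:
        text = text.replace(value, "[REDACTED]")
    return text


def mask_typed(text: str, entities: list[tuple[str, str]]) -> str:
    """Replace each distinct value with a typed, numbered placeholder."""
    counters: dict[str, int] = {}
    assigned: dict[str, str] = {}
    for value, entity_type in entities:
        if value not in assigned:
            counters[entity_type] = counters.get(entity_type, 0) + 1
            assigned[value] = f"{entity_type}_{counters[entity_type]}"
        text = text.replace(value, assigned[value])
    return text


text = "Alice Chen asked Bob Ortiz to refund Alice Chen's order."
entities = [("Alice Chen", "PERSON"), ("Bob Ortiz", "PERSON")]

print(redact_generic(text, entities))
# [REDACTED] asked [REDACTED] to refund [REDACTED]'s order.
print(mask_typed(text, entities))
# PERSON_1 asked PERSON_2 to refund PERSON_1's order.
```

In the redacted version there is no way to tell whose order is being refunded; the typed placeholders keep that relationship intact.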
What Privian detects today
Privian's beta detects more than 15 entity types, including:
- People: PERSON
- Contact info: EMAIL, PHONE_NUMBER, URL, IP_ADDRESS
- Financial: CREDIT_CARD (Luhn-validated), IBAN, SWIFT
- Locations and dates: LOCATION, DATE_OF_BIRTH
- Developer secrets: OPENAI_API_KEY, AWS_ACCESS_KEY_ID, JWT, SLACK_TOKEN
The full catalog and example transformations live on the PII Masking product page.
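Luhn validation is what lets a detector tell a plausible card number apart from an arbitrary run of digits. The sketch below shows the standard Luhn checksum; how Privian applies it internally is not documented here, and the 13 to 19 digit length bound and the name `luhn_valid` are assumptions made for this example.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    # Keep ASCII digits only, so spaces and dashes in formatted numbers are ignored.
    digits = [int(ch) for ch in candidate if ch.isascii() and ch.isdigit()]
    if not 13 <= len(digits) <= 19:  # assumed length bound for card numbers
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a widely used test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```

A checksum like this reduces false positives on order IDs and other digit strings, but it does not prove that a number belongs to a real card.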
Tradeoffs
- Recall. No detector catches 100% of personal data, especially in unstructured text. Combine masking with upstream data minimization.
- Determinism inside a request, not across requests. EMAIL_1 in request A and request B may refer to different values. This is intentional: keeping mappings across requests would require persistence we explicitly avoid.
- Performance. Detection adds a few milliseconds. That cost is usually invisible next to the model's own latency.
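The request-scoping tradeoff is easiest to see with one mapping object per request. The class below is a hypothetical sketch, not Privian's data structure: each instance numbers placeholders from 1, so nothing carries over between requests.

```python
class RequestMapping:
    """Hypothetical per-request placeholder table; create one per request and discard it afterwards."""

    def __init__(self) -> None:
        self._placeholders: dict[tuple[str, str], str] = {}
        self._counters: dict[str, int] = {}

    def placeholder_for(self, entity_type: str, value: str) -> str:
        key = (entity_type, value)
        if key not in self._placeholders:
            # Numbering is per entity type and starts at 1 for every new request.
            self._counters[entity_type] = self._counters.get(entity_type, 0) + 1
            self._placeholders[key] = f"{entity_type}_{self._counters[entity_type]}"
        return self._placeholders[key]


request_a = RequestMapping()
request_b = RequestMapping()

print(request_a.placeholder_for("EMAIL", "jane@example.com"))  # EMAIL_1
print(request_a.placeholder_for("EMAIL", "jane@example.com"))  # EMAIL_1 (stable within the request)
print(request_b.placeholder_for("EMAIL", "sam@example.org"))   # EMAIL_1 (a different value in request B)
```

If your application needs stable identifiers across requests, that correlation has to happen on your side, using the rehydrated values.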
How Privian fits
Send a prompt to POST /v1/gateway; the gateway applies masking, forwards the masked prompt, and rehydrates the response before returning it. No raw values are persisted. See Rehydration for the return-path detail.
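Conceptually, the gateway flow is mask, forward, rehydrate, with the mapping living only for the duration of one call. The sketch below models that lifecycle with a stubbed provider. It is not the real /v1/gateway implementation or its request format: `handle_gateway_request`, `fake_provider`, and the simplified email pattern are all assumptions made for illustration.

```python
import re
from typing import Callable

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")  # simplified
PLACEHOLDER_RE = re.compile(r"\bEMAIL_\d+\b")


def handle_gateway_request(prompt: str, forward: Callable[[str], str]) -> str:
    """Mask the prompt, forward it, and rehydrate the response."""
    mapping: dict[str, str] = {}  # placeholder -> raw value, local to this call
    seen: dict[str, str] = {}     # raw value -> placeholder

    def to_placeholder(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            seen[value] = f"EMAIL_{len(seen) + 1}"
            mapping[seen[value]] = value
        return seen[value]

    masked_prompt = EMAIL_RE.sub(to_placeholder, prompt)
    masked_response = forward(masked_prompt)  # the provider only receives masked text
    # The mapping goes out of scope when this function returns; nothing is stored.
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), masked_response)


provider_saw: list[str] = []


def fake_provider(masked_prompt: str) -> str:
    """Stand-in for a model provider; records what it was sent."""
    provider_saw.append(masked_prompt)
    return "I drafted an email to EMAIL_1 about ticket #4821."


result = handle_gateway_request(
    "Email jane@example.com and let her know about ticket #4821.", fake_provider
)
print(provider_saw[0])  # Email EMAIL_1 and let her know about ticket #4821.
print(result)           # I drafted an email to jane@example.com about ticket #4821.
```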
Frequently asked questions
- What is a deterministic placeholder?
- A token in the form {TYPE}_{N} (for example EMAIL_1, PERSON_2) that replaces a real value. The same value in the same request gets the same placeholder, which lets the model reason about identity without seeing the underlying string.
- Does the mapping ever leave the gateway?
- No. The mapping is held in memory for the lifetime of a single request and discarded after the response is rehydrated. It is not written to logs, not stored in a database, and not shared between requests.
- Will masking confuse the model?
- For most chat, summarization, and classification workloads the model handles placeholders well — it can still reason about 'EMAIL_1 sent a complaint about PERSON_1's order' even without the raw values. For tasks that require the real string (e.g. lookups), keep that work outside the model.
- How is this different from redaction?
- Redaction usually replaces sensitive values with a generic marker like [REDACTED], destroying structure. Masking preserves type and identity references so the model and your application can both still work with the output.