What is a deterministic placeholder?

A token in the form {TYPE}_{N} (for example EMAIL_1, PERSON_2) that replaces a real value. The same value in the same request gets the same placeholder, which lets the model reason about identity without seeing the underlying string.

Does the mapping ever leave the gateway?

No. The mapping is held in memory for the lifetime of a single request and discarded after the response is rehydrated. It is not written to logs, not stored in a database, and not shared between requests.

Will masking confuse the model?

For most chat, summarization, and classification workloads the model handles placeholders well — it can still reason about 'EMAIL_1 sent a complaint about PERSON_1's order' even without the raw values. For tasks that require the real string (e.g. lookups), keep that work outside the model.

How is this different from redaction?

Redaction usually replaces sensitive values with a generic marker like [REDACTED], destroying structure. Masking preserves type and identity references so the model and your application can both still work with the output.

PII masking for LLMs

The problem

LLM applications routinely send customer data to third-party model providers. Even with TLS in transit and reasonable retention policies, the model sees the raw value — names, emails, account numbers — and that data is now in someone else's system. PII masking removes the data from the prompt without removing the ability to reason about it.

Definition

PII masking for LLMs is the practice of replacing personal data in a prompt with deterministic placeholders before forwarding to the model, then restoring the original values in the response on the way back. The model never sees the raw value; your application gets a usable response that references real identities.

How deterministic placeholders work

A consistent placeholder scheme is the key trick. Privian uses {TYPE}_{N} — for example EMAIL_1, PERSON_2. Within a single request, the same value always maps to the same placeholder. That lets the model preserve identity references across a long prompt without ever seeing the underlying string.

Input:
  "Email jane@example.com and let her know about ticket #4821."

Masked (sent to provider):
  "Email EMAIL_1 and let her know about ticket #4821."

Response (from provider):
  "I drafted an email to EMAIL_1 about ticket #4821."

Rehydrated (returned to your app):
  "I drafted an email to jane@example.com about ticket #4821."

Why deterministic placeholders beat generic redaction

Replacing every personal value with [REDACTED] works for legal disclosure but breaks LLM workloads. The model loses the ability to tell which entity is which. A masking scheme that preserves type and identity references keeps the model useful while still hiding the data.

What Privian detects today

Privian's beta detects more than 15 entity types, including:

People — PERSON
Contact info — EMAIL, PHONE_NUMBER, URL, IP_ADDRESS
Financial — CREDIT_CARD (Luhn-validated), IBAN, SWIFT
Locations and dates — LOCATION, DATE_OF_BIRTH
Developer secrets — OPENAI_API_KEY, AWS_ACCESS_KEY_ID, JWT, SLACK_TOKEN

The full catalog and example transformations live on the PII Masking product page.

Tradeoffs

Recall. No detector catches 100% of personal data, especially in unstructured text. Combine masking with upstream data minimization.
Determinism inside a request, not across requests. EMAIL_1 in request A and request B may refer to different values. This is intentional — keeping mappings across requests would require persistence we explicitly avoid.
Performance. Detection adds a few milliseconds. That cost is usually invisible next to the model's own latency.

How Privian fits

Send a prompt to POST /v1/gateway; the gateway applies masking, forwards the masked prompt, and rehydrates the response before returning it. No raw values are persisted. See Rehydration for the return-path detail.

Written under our editorial principles: implementation-grounded, honest about limitations, educational first.

Frequently asked questions

What is a deterministic placeholder?: A token in the form {TYPE}_{N} (for example EMAIL_1, PERSON_2) that replaces a real value. The same value in the same request gets the same placeholder, which lets the model reason about identity without seeing the underlying string.
Does the mapping ever leave the gateway?: No. The mapping is held in memory for the lifetime of a single request and discarded after the response is rehydrated. It is not written to logs, not stored in a database, and not shared between requests.
Will masking confuse the model?: For most chat, summarization, and classification workloads the model handles placeholders well — it can still reason about 'EMAIL_1 sent a complaint about PERSON_1's order' even without the raw values. For tasks that require the real string (e.g. lookups), keep that work outside the model.
How is this different from redaction?: Redaction usually replaces sensitive values with a generic marker like [REDACTED], destroying structure. Masking preserves type and identity references so the model and your application can both still work with the output.