PII Masking
Deterministic detection and placeholder replacement of personal and sensitive data, performed at the edge before any provider call.
What it is
PII masking transforms a raw prompt into a semantically equivalent prompt where identifying values are replaced by stable placeholders. The LLM sees the structure of the request without the underlying personal data.
Supported entity types (beta)
The current deterministic detector covers:
- Identity: PERSON, EMAIL, PHONE, IP_ADDRESS
- National IDs: SSN_US, SIN_CA (Luhn-validated)
- Payment: CREDIT_CARD (Luhn-validated), IBAN (mod-97 validated)
- Secrets: JWT, OPENAI_API_KEY, GITHUB_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, GENERIC_API_KEY, ENV_SECRET, SECRET_TOKEN
Detectors are pure regex plus small validator helpers: no network calls, no external dependencies. Provider-specific tokens win over generic secrets when spans overlap.
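To make the validator helpers and the overlap rule concrete, here is a minimal sketch. It is an illustration, not the shipped detector. All names (luhn_ok, iban_ok, Span, GENERIC_SECRET_TYPES, resolve_overlaps) are hypothetical. Treating GENERIC_API_KEY, ENV_SECRET, and SECRET_TOKEN as the "generic" types is an assumption, and so is breaking ties by span length.

```python
import re
from typing import NamedTuple


def luhn_ok(number: str) -> bool:
    """Luhn checksum over the digits of `number`; separators are ignored."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_ok(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end,
    map A-Z to 10-35, and require the resulting integer mod 97 == 1."""
    s = re.sub(r"\s+", "", iban).upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    as_digits = "".join(str(int(c, 36)) for c in rearranged)
    return int(as_digits) % 97 == 1


class Span(NamedTuple):
    start: int  # inclusive
    end: int    # exclusive
    type: str


# Assumption: which entity types count as "generic" secrets.
GENERIC_SECRET_TYPES = {"GENERIC_API_KEY", "ENV_SECRET", "SECRET_TOKEN"}


def resolve_overlaps(spans):
    """Keep a non-overlapping subset, preferring specific types over generic ones.
    Ties are broken by longer span, then earlier start (an assumption)."""
    ranked = sorted(
        spans,
        key=lambda s: (s.type in GENERIC_SECRET_TYPES, -(s.end - s.start), s.start),
    )
    kept = []
    for s in ranked:
        if all(s.end <= k.start or s.start >= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)
```

Because both validators are pure functions over the matched string, a regex hit that fails its checksum can simply be discarded without any external lookup.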
Not yet supported as a first-class type
- Norwegian fødselsnummer (11-digit national ID). May currently be detected as PHONE; a dedicated NATIONAL_ID_NO type with mod-11 checksum is tracked as future work.
- Address blocks, free-form medical terms, and other domain-specific PII.
Placeholder format
Placeholders are {TYPE}_{N} where N is a per-type counter. Equivalent values (case-insensitive for PERSON/EMAIL, digits-only for PHONE, etc.) reuse the same token within a request:
Original: Email John Smith at john@acme.com. Cc john.smith@acme.com.
Masked: Email PERSON_1 at EMAIL_1. Cc EMAIL_2.
Determinism and request-scoping
- The token map exists only for the lifetime of a single request.
- It is never persisted, logged, or sent to providers.
- Two requests with the same prompt produce the same masking shape but new, ephemeral mappings.
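The placeholder scheme and the request-scoped token map can be sketched as below. This is a minimal illustration under stated assumptions: RequestMasker, normalize, and EMAIL_RE are hypothetical names, the email regex is deliberately simplistic, and the PERSON span is located by hand because name detection is out of scope for the sketch.

```python
import re
from collections import defaultdict

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize(entity_type: str, value: str) -> str:
    """Equivalence key: case-insensitive for PERSON/EMAIL, digits-only for PHONE."""
    if entity_type in ("PERSON", "EMAIL"):
        return value.casefold()
    if entity_type == "PHONE":
        return re.sub(r"\D", "", value)
    return value


class RequestMasker:
    """Holds the token map for one request; create a new instance per request
    and let it be garbage-collected afterwards (never persist or log it)."""

    def __init__(self):
        self._counters = defaultdict(int)  # per-type counter
        self._tokens = {}                  # (type, normalized value) -> placeholder

    def token_for(self, entity_type: str, value: str) -> str:
        key = (entity_type, normalize(entity_type, value))
        if key not in self._tokens:
            self._counters[entity_type] += 1
            self._tokens[key] = f"{entity_type}_{self._counters[entity_type]}"
        return self._tokens[key]

    def mask(self, text: str, spans) -> str:
        """`spans` is an iterable of non-overlapping (start, end, type) tuples."""
        out, cursor = [], 0
        for start, end, entity_type in sorted(spans):
            out.append(text[cursor:start])
            out.append(self.token_for(entity_type, text[start:end]))
            cursor = end
        out.append(text[cursor:])
        return "".join(out)


text = "Email John Smith at john@acme.com. Cc john.smith@acme.com."
spans = [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]
name = "John Smith"
spans.append((text.index(name), text.index(name) + len(name), "PERSON"))

masked = RequestMasker().mask(text, spans)
print(masked)  # Email PERSON_1 at EMAIL_1. Cc EMAIL_2.
```

Processing spans left to right is what makes the numbering deterministic: the same prompt always yields the same placeholder shape, while each new RequestMasker starts from an empty map.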
Confidence and fallback
When the deterministic detector accepts an entity, that match is the source of truth. A bounded LLM fallback may supplement low-confidence regions but can never overwrite, remap, or delete a deterministic match. The fallback is hardened against prompt injection and is rarely invoked on healthy traffic.
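One way to enforce the "supplement only" rule is to merge fallback candidates into the deterministic result and drop any candidate that touches an existing match. The sketch below assumes a hypothetical Match type and merge_fallback function; the real merge logic may differ.

```python
from typing import NamedTuple


class Match(NamedTuple):
    start: int   # inclusive
    end: int     # exclusive
    type: str
    source: str  # "deterministic" or "fallback"


def merge_fallback(deterministic, fallback):
    """Return deterministic matches unchanged, plus any fallback candidates
    that do not overlap a match that has already been accepted."""
    merged = list(deterministic)
    for cand in sorted(fallback, key=lambda m: m.start):
        overlaps = any(cand.start < m.end and m.start < cand.end for m in merged)
        if not overlaps:
            merged.append(cand)
    return sorted(merged, key=lambda m: m.start)
```

Since deterministic matches are copied in first and never modified, a fallback candidate can only add coverage in regions the deterministic pass left untouched.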
Limitations (beta)
- Detection is conservative: ordinary identifiers like EMAIL_HANDLER_22 are intentionally not masked.
- Free-form addresses and informal references (“my boss”) are out of scope.
- Image and audio modalities are not supported.