PII Masking

Deterministic detection and placeholder replacement of personal and sensitive data, performed at the edge before any provider call.

What it is

PII masking transforms a raw prompt into a semantically equivalent prompt where identifying values are replaced by stable placeholders. The LLM sees the structure of the request without the underlying personal data.

Supported entity types (beta)

The current deterministic detector covers:

  • Identity: PERSON, EMAIL, PHONE, IP_ADDRESS
  • National IDs: SSN_US, SIN_CA (Luhn-validated)
  • Payment: CREDIT_CARD (Luhn-validated), IBAN (mod-97 validated)
  • Secrets: JWT, OPENAI_API_KEY, GITHUB_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, GENERIC_API_KEY, ENV_SECRET, SECRET_TOKEN

Detectors are pure regex + small validator helpers — no network calls, no external dependencies. Provider-specific tokens win over generic secrets when spans overlap.

Not yet supported as a first-class type

  • Norwegian fødselsnummer (11-digit national ID). May currently be detected as PHONE; a dedicated NATIONAL_ID_NO type with mod-11 checksum is tracked as future work.
  • Address blocks, free-form medical terms, and other domain-specific PII.

Placeholder format

Placeholders are {TYPE}_{N} where N is a per-type counter. Equivalent values (case-insensitive for PERSON/EMAIL, digits-only for PHONE, etc.) reuse the same token within a request:

text
Original:  Email John Smith at john@acme.com. Cc john.smith@acme.com.
Masked:    Email PERSON_1 at EMAIL_1. Cc EMAIL_2.

Determinism and request-scoping

  • The token map exists only for the lifetime of a single request.
  • It is never persisted, logged, or sent to providers.
  • Two requests with the same prompt produce the same masking shape but new, ephemeral mappings.

Confidence and fallback

When the deterministic detector accepts an entity it is the source of truth. A bounded LLM fallback may supplement low-confidence regions but can never overwrite, remap, or delete a deterministic match. Fallback is hardened against prompt injection and is rarely invoked on healthy traffic.

Limitations (beta)

  • Detection is conservative — ordinary identifiers like EMAIL_HANDLER_22 are intentionally not masked.
  • Free-form addresses and informal references (“my boss”) are out of scope.
  • Image and audio modalities are not supported.

Related