
GDPR and LLMs

Reducing prompt-level sensitive-data exposure when teams use GPT, Claude and other managed models.

An educational reference for engineering, security and platform teams in the EU. Privian helps reduce one specific risk — sensitive values entering prompts that reach third-party providers. It does not provide legal advice or compliance guarantees.

Definition

What GDPR means for LLM usage

GDPR does not ban the use of large language models. It does require that any personal data processed through an AI workflow has a lawful basis, is covered by a data-processing arrangement with the provider, and is protected by appropriate technical and organizational measures.

For an LLM-powered application, that usually translates into a handful of concrete questions: what personal data ends up in the prompt, where that prompt goes, who retains what, and how quickly the data flow can be changed if something goes wrong.

This page is educational. It is not legal advice. It describes patterns teams use in practice and where a privacy-first LLM gateway like Privian fits in.

Why it matters

Why prompt privacy matters

The prompt is the new data-export surface. When the model is a managed service, anything in the prompt is, by definition, sent to a third party. For many teams, this is the first place where personal data crosses an organizational boundary without going through the usual review.

Prompts also have unusual properties: they are often built dynamically from records, they frequently include free text written by end users, and they are shaped by individual developers working under deadline pressure. A field that nobody intended to expose can end up in a prompt simply because it sat on the same object as a field that was needed.

Exposure surface

Sensitive-data exposure in prompts

  • Direct identifiers

    Names, emails, phone numbers, addresses — frequently included so the model can write a personalized reply.

  • Account identifiers

    Customer IDs, order numbers and internal references that map back to a specific person, such as the customer on the other end of a support workflow.

  • Free text from end users

    Support tickets, form submissions and chat transcripts that may contain anything from health details to payment context.

  • Internal metadata

    Employee names, internal hostnames, project codenames and other organizational data that leak business context.

  • Secrets in stack traces

    API keys, JWTs and tokens that end up in debugging or code-review prompts without anyone noticing.

  • Documents and attachments

    Whole documents pasted into a prompt to summarize, classify or extract, often containing more than the user realizes.

Organizational reality

Why policies alone are insufficient

Policy and training matter. They set expectations and create the shared vocabulary a team needs. In practice, teams that handle sensitive data also add technical controls — not because their colleagues are careless, but because people inevitably paste information into tools that help them move faster.

The same pattern shows up across other data-protection disciplines: secret scanning in source control, data loss prevention (DLP) in email, masking in analytics pipelines. Each one assumes that a written rule is necessary but not sufficient, and that a technical check is what catches the long tail.

Prompt-level controls follow the same logic. They do not replace policy or governance — they sit underneath them and handle the cases where intent and behavior diverge.

Controls

Technical controls teams use

There is no single answer. Most privacy-sensitive AI stacks combine several of the following, weighted to the team's constraints:

Self-hosted or on-premise models

What it is: Run an open-weight model on infrastructure the team controls.

Tradeoff: Strongest data-residency story; significant operational cost, narrower model selection and slower access to frontier capabilities. Suitable for teams with the engineering bandwidth and a strict residency requirement.

Provider-side zero-retention / enterprise terms

What it is: Use provider features and contractual terms that limit or disable provider-side retention.

Tradeoff: Reduces provider-side persistence and may satisfy procurement requirements, but does not affect what enters the prompt in the first place.

Prompt-level masking / redaction

What it is: Detect supported personal and sensitive values before the prompt leaves the application, and replace them with deterministic placeholders.

Tradeoff: Reduces the data sent to the provider for supported entity types. Does not catch what the detector does not recognize and does not address downstream provider behavior.

BYOK and provider isolation

What it is: Route requests with the organization's own provider API key (bring your own key, BYOK), so the provider contract and billing stay with the organization.

Tradeoff: Tightens the trust boundary and keeps key rotation under the organization's control; on its own, it does not prevent sensitive values from entering prompts.

Model and tool restrictions

What it is: Allow only specific models, allow-list which services can call the gateway, and limit which features can send free text from end users.

Tradeoff: Effective but requires inventory work and ongoing maintenance.
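
The allow-listing idea can be sketched as a per-request policy check. This is a minimal illustration, assuming the gateway knows which service is calling, which model it requests, and whether the request carries end-user free text; the `AllowList` type, the service names and the model names are hypothetical and do not reflect Privian's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowList:
    """Hypothetical gateway policy; not a real Privian configuration schema."""
    models: frozenset[str]
    services: frozenset[str]
    # Subset of services permitted to forward free text typed by end users.
    free_text_services: frozenset[str] = frozenset()


def check_request(policy: AllowList, service: str, model: str,
                  has_user_free_text: bool) -> tuple[bool, str]:
    """Return (allowed, reason) for a single request against the policy."""
    if service not in policy.services:
        return False, f"service {service!r} is not allow-listed"
    if model not in policy.models:
        return False, f"model {model!r} is not allow-listed"
    if has_user_free_text and service not in policy.free_text_services:
        return False, f"service {service!r} may not send end-user free text"
    return True, "ok"


policy = AllowList(
    models=frozenset({"example-model-a", "example-model-b"}),
    services=frozenset({"support-summarizer", "code-review-bot"}),
    free_text_services=frozenset({"support-summarizer"}),
)

print(check_request(policy, "support-summarizer", "example-model-a", True))
# (True, 'ok')
print(check_request(policy, "code-review-bot", "example-model-a", True))
# (False, "service 'code-review-bot' may not send end-user free text")
print(check_request(policy, "marketing-tool", "example-model-b", False))
# (False, "service 'marketing-tool' is not allow-listed")
```

Keeping the policy as data rather than scattered conditionals makes the inventory work reviewable: onboarding a new use case means adding a name to a list that someone can inspect.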

Policies, training and AI governance

What it is: Acceptable-use policies, employee training, incident processes, and an AI governance forum that reviews new use cases.

Tradeoff: Necessary foundation; insufficient on its own because individual prompts are not reviewed before they are sent.

Privian is designed for teams that want to keep using managed models — GPT, Claude, Gemini and others — while reducing sensitive-data exposure in the prompts they send. It is one layer in a broader stack, not a replacement for governance, self-hosting or provider-side controls.

Concept

Prompt-level data protection, explained

Prompt-level data protection is a narrow idea: apply controls to the prompt itself, at the moment it is built or sent, rather than relying on what happens after it reaches the provider.

The mechanics are simple:

  • Detect supported personal and sensitive values in the inbound prompt.
  • Mask each detected value with a deterministic placeholder, scoped to the request.
  • Route the masked prompt to the provider using the organization's own key.
  • Rehydrate placeholders in the response so the calling application sees the original values it submitted.
  • Retain nothing beyond structural counters — model, latency, masked entity counts.
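
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Privian's implementation: two regular expressions stand in for a real detector, the `<EMAIL_1>` placeholder format is an assumption, and the provider call is replaced by a hard-coded response so the example runs offline.

```python
import re
from collections import Counter

# Stand-in detector: two illustrative patterns. A real detector supports many more
# entity types (names, addresses, account IDs, secrets) and formats.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d ()-]{7,}\d"),
}


def mask(prompt: str) -> tuple[str, dict[str, str], Counter]:
    """Replace detected values with placeholders.

    Within one call (one request) the same value always gets the same placeholder;
    numbering starts again for every request.
    """
    value_to_placeholder: dict[str, str] = {}
    counts: Counter = Counter()  # distinct masked values per entity type
    masked = prompt
    for entity, pattern in PATTERNS.items():
        def substitute(match: re.Match, entity: str = entity) -> str:
            value = match.group(0)
            if value not in value_to_placeholder:
                counts[entity] += 1
                value_to_placeholder[value] = f"<{entity}_{counts[entity]}>"
            return value_to_placeholder[value]

        masked = pattern.sub(substitute, masked)
    # Reverse map, held in memory only for the lifetime of the request.
    placeholder_to_value = {p: v for v, p in value_to_placeholder.items()}
    return masked, placeholder_to_value, counts


def rehydrate(response: str, placeholder_to_value: dict[str, str]) -> str:
    """Swap placeholders in the model's response back to the original values."""
    for placeholder, value in placeholder_to_value.items():
        response = response.replace(placeholder, value)
    return response


prompt = "Reply to anna@example.com (tel +49 30 1234567) and cc anna@example.com."
masked, mapping, counts = mask(prompt)
print(masked)  # Reply to <EMAIL_1> (tel <PHONE_1>) and cc <EMAIL_1>.

# Simulated provider call: only `masked` would leave the application.
provider_response = "Draft sent to <EMAIL_1>; we will call <PHONE_1>."
print(rehydrate(provider_response, mapping))
# Draft sent to anna@example.com; we will call +49 30 1234567.

# Only structural counters survive the request.
print(dict(counts))  # {'EMAIL': 1, 'PHONE': 1}
```

Regex detectors like these both over-match (a long order number can look like a phone number) and under-match (names are not caught at all), which is the coverage limit that applies to any detector with a bounded entity set.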

None of this is a substitute for data minimization upstream of the prompt. The most reliable way to keep something out of a provider's hands is not to put it in the prompt at all.
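
A minimal sketch of minimization upstream of the prompt, assuming a hypothetical support-ticket record: serializing the whole object sends every field, while building the prompt from an explicit list of required fields keeps unneeded identifiers out entirely. Field names and values are made up for illustration.

```python
import json

# Hypothetical support-ticket record; field names and values are illustrative.
ticket = {
    "ticket_id": "T-1042",
    "customer_name": "Anna Schmidt",
    "customer_email": "anna@example.com",
    "iban": "DE89370400440532013000",
    "internal_notes": "VIP, escalated by account manager",
    "subject": "Invoice shows the wrong amount",
    "body": "My last invoice was charged twice.",
}

# Risky: serializing the whole record sends every field, needed or not.
risky_prompt = "Summarize this ticket:\n" + json.dumps(ticket)

# Minimized: only the fields the summarization task actually uses.
FIELDS_FOR_SUMMARY = ("subject", "body")


def build_summary_prompt(record: dict) -> str:
    """Build the prompt from an explicit allow-list of fields."""
    selected = {key: record[key] for key in FIELDS_FOR_SUMMARY if key in record}
    return "Summarize this ticket:\n" + json.dumps(selected)


minimized_prompt = build_summary_prompt(ticket)
print(minimized_prompt)
```

The `body` field can still carry personal data written by the customer, which is the gap prompt-level masking is meant to cover once minimization has removed everything the task does not need.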

Scope

What Privian helps with

  • Masking supported sensitive values

    Names, emails, phone numbers, addresses, account identifiers, payment data and developer secrets — replaced with deterministic placeholders before the prompt is forwarded.

  • Provider-agnostic BYOK routing

    Privian forwards requests using your provider API key. Your contract, your billing, your provider-side terms.

  • Limited retention at the gateway

    Privian persists structural metrics for observability — model, latency, masked entity counts — not raw prompt or response bodies.

  • A single enforcement point

    Routing all AI traffic through one endpoint means masking and routing policy do not depend on every client doing the right thing.

Honest scope

What Privian does NOT claim

Trust matters more than claims. Privian explicitly does not claim any of the following:

  • GDPR certification or "GDPR compliant AI".
  • Legal compliance guarantees or legal advice.
  • HIPAA, SOC 2 or PCI certification.
  • Prompt-injection or jailbreak prevention.
  • Control over downstream provider behavior or provider-side retention.
  • Isolation equivalent to a self-hosted or on-premise model.
  • Coverage of every possible sensitive value — detection is bounded to a supported entity set that evolves over time.

See the LLM security pillar for the broader picture, or the AI Security Layer category for the architectural framing.

FAQ

Frequently asked questions

Can I use GPT under GDPR?
Yes — many EU teams use GPT, Claude and other managed LLMs. GDPR does not ban LLM use; it requires that personal data is processed lawfully, transparently and with appropriate technical and organizational measures. In practice that means thinking carefully about what enters a prompt, where it is sent, what is retained and on what legal basis.
Is personal data in a prompt regulated under GDPR?
If a prompt contains information relating to an identified or identifiable natural person, such as a name, an email address, a phone number, an ID or free text written by a customer, that information is personal data. Sending it to a third-party model provider is a processing activity that needs a lawful basis, a data-processing arrangement with the provider, and appropriate safeguards.
What happens if employees paste customer data into ChatGPT?
It becomes a processing activity that the organization may not have controls or contracts for. Even where the provider offers enterprise terms, ad-hoc pasting bypasses the organization's data inventory, retention policy and access controls. Most teams respond with a combination of policy, training and technical controls — masking at the edge, restricting which tools can be used, or routing AI traffic through a gateway they control.
What is prompt-level data protection?
Controls applied at the moment a prompt is built or sent — typically detection and masking of sensitive values, retention rules at the gateway, and BYOK so credentials and billing remain inside the organization. It is one layer in a broader AI control stack alongside policy, training and governance.
Does Privian make my AI usage GDPR compliant?
No. Compliance is an organizational property, not a product property. Privian helps reduce one specific risk — sensitive data flowing into prompts that reach third-party providers — by masking supported entities, keeping no raw prompt or response bodies, and routing through your own provider keys. Legal compliance still depends on your data inventory, lawful basis, contracts, retention rules and broader controls.
What does Privian explicitly NOT claim?
Privian does not claim GDPR certification, legal compliance guarantees, HIPAA, SOC 2 or PCI certification, prompt-injection blocking, jailbreak prevention, guarantees about downstream provider behavior, or isolation equivalent to a self-hosted model. We focus on prompt-level masking, BYOK routing and limited retention at the gateway.
Is BYOK enough on its own?
BYOK improves the trust boundary — your provider key, your billing relationship, your provider-side terms — but it does not, by itself, prevent sensitive data from entering prompts. BYOK and prompt-level masking complement each other.