GDPR and LLMs, explained

What GDPR means for teams using GPT, Claude and other managed LLMs — personal data in prompts, provider boundaries, retention, and the technical controls teams adopt in practice.

9 min read · Updated June 2, 2026

This article is an educational reference. It is not legal advice. It describes how engineering and security teams in the EU typically think about LLM usage under GDPR, and what kind of technical controls show up in privacy-sensitive AI stacks.

What GDPR actually says about LLM usage

GDPR does not name large language models. It regulates the processing of personal data of individuals in the EU/EEA: a processing activity needs a lawful basis, transparency, appropriate contractual arrangements with processors, and appropriate technical and organizational measures to protect the data.

When an application sends a prompt that contains personal data to a managed LLM provider, that is a processing activity. The provider typically acts as a processor, or as a sub-processor when your organization is itself processing on behalf of a customer. The same questions you ask of any third-party data flow apply here: what is the lawful basis, what is the agreement, what is retained, where does the data go, and how quickly can the flow be changed or switched off.

What counts as personal data in a prompt

Personal data is any information relating to an identified or identifiable natural person. In a prompt, that commonly includes:

  • Names, email addresses, phone numbers, postal addresses.
  • Customer or employee identifiers that can be linked back to a person.
  • Free-text written by end users in support tickets, forms or chat transcripts.
  • Indirect identifiers that, combined, identify someone (job title + employer + city, for instance).

Special categories of personal data, such as health data, religious beliefs and biometric data used to identify a person, are regulated more strictly under Article 9. Many teams treat them as a separate class with stricter controls.

Why the prompt is the sensitive surface

Most data flows in a typical application are well known: database reads, log writes, third-party integrations. The prompt is the new one. It is built dynamically, often from database records and user input, and the people who write the prompt-construction code are not always the same people who run the data-protection program.

The result is a class of accidental exposure that traditional data-protection tooling does not see — there is no "send personal data to OpenAI" SQL query to audit, just an HTTP call that depends on what happened to be in scope at the time.
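One way to make that HTTP call easier to audit is to have the prompt builder take an explicit allow-list of fields rather than a whole record. The following is a minimal Python sketch under stated assumptions: the `ticket` record, its field names and the `build_prompt` helper are hypothetical and do not come from any particular framework.

```python
# Hypothetical allow-list: only the fields this summarization prompt needs.
PROMPT_FIELDS = ("product", "issue_text")


def build_prompt(record: dict, fields: tuple = PROMPT_FIELDS) -> str:
    """Build a prompt from an explicit subset of a record's fields."""
    # Anything not named in `fields` never reaches the prompt string.
    selected = {name: record[name] for name in fields if name in record}
    lines = [f"{name}: {value}" for name, value in selected.items()]
    return "Summarize this support ticket.\n" + "\n".join(lines)


# Illustrative record; all values are made up.
ticket = {
    "customer_name": "Jane Example",
    "email": "jane@example.com",
    "product": "Router X2",
    "issue_text": "Device reboots every hour.",
}

prompt = build_prompt(ticket)
```

With this shape, the set of fields that can leave the application is visible in one place and can be reviewed like any other configuration. The free-text field can still contain personal data typed by the end user, so field selection narrows the exposure rather than removing it.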

Provider boundaries and retention

Each managed model provider has its own data-usage terms, retention defaults and product tiers. They change. Two practical recommendations regardless of provider:

  • Read the data-usage terms of the specific product tier you use, and re-read them periodically.
  • Do not rely on provider-side controls as your only layer — what enters the prompt is something you can influence directly.

What teams do in practice

Privacy-sensitive teams typically combine several layers. None of them, on their own, is sufficient:

  • Data minimization upstream. Only fetch the fields a prompt actually needs. Most prompts need less than they are given.
  • Prompt-level masking. Detect supported personal and sensitive values before the prompt leaves the application, and replace them with deterministic placeholders.
  • BYOK and provider isolation. Keep the provider relationship — contract, key, billing — inside the org.
  • Retention rules at the gateway. Persist structural metrics for observability, not raw prompt or response bodies.
  • Policy and training. Acceptable-use policy, a documented decision process for new use cases, periodic review.
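The masking layer in the list above can be sketched in a few lines of Python. This is an illustration, not a description of how Privian or any other product works: the two regular expressions, the `[LABEL_n]` placeholder format and the `mask` / `unmask` helpers are all assumptions, and a real detector needs far broader coverage. "Deterministic" here means that, within one request, the same value always maps to the same placeholder, so the model can still tell that two mentions refer to the same thing.

```python
import re

# Illustrative patterns only; real detectors cover many more value types.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected values with placeholders; same value, same placeholder."""
    mapping: dict[str, str] = {}   # placeholder -> original value
    seen: dict[str, str] = {}      # original value -> placeholder
    counters: dict[str, int] = {}  # per-label running index

    for label, pattern in PATTERNS.items():
        def replace(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            if value not in seen:
                counters[label] = counters.get(label, 0) + 1
                placeholder = f"[{label}_{counters[label]}]"
                seen[value] = placeholder
                mapping[placeholder] = value
            return seen[value]

        text = pattern.sub(replace, text)
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore original values in a model response, inside the application."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


original = "Contact jane@example.com or +49 30 1234567. Reply to jane@example.com."
masked, mapping = mask(original)
# masked == "Contact [EMAIL_1] or [PHONE_1]. Reply to [EMAIL_1]."
```

The mapping stays inside the application and is used only to restore values in the response. A name such as "Jane Example" would pass through these two patterns untouched, which is why masking is treated as one layer among several rather than a guarantee.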

Privian is one option for the masking, routing and retention parts. See the GDPR and LLMs pillar for the broader framing and PII Masking for what is currently supported.

Honest limitations

Prompt-level masking does not catch what its detector does not recognize, does not protect against prompts that were never routed through the gateway, and does not change what a provider may retain. Compliance is a property of the organization, not of any single tool.

Frequently asked questions

Can I send personal data to ChatGPT under GDPR?
Sometimes, provided there is a lawful basis, an appropriate data-processing arrangement with the provider, and proportionate safeguards. GDPR does not ban LLM use; it expects you to be intentional about what is sent, why, and what protections exist.

Does GPT train on my prompts?
It depends on the product and account tier. Consumer ChatGPT, API access and enterprise tiers each have different defaults. Read the provider's current data-usage terms and confirm what applies to your account before assuming.

What does GDPR mean in practice for LLM workflows?
In practice it means knowing what personal data enters a prompt, who processes it downstream, what is retained, on what legal basis, and how quickly the flow can be changed. Most teams document this and add technical controls (masking, BYOK, gateway-level retention) to back the documentation up.

What are technical controls for AI privacy?
The common ones: prompt-level masking, retention controls at the gateway, BYOK so provider relationships stay inside the org, allow-listed models, and access controls on which services can call the gateway.
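
Two of these controls, allow-listed models and gateway-level retention, can be illustrated with a short Python sketch. Everything named here is hypothetical: the model identifiers are placeholders, `call_provider` stands in for whatever client the application uses, and `handle` is not the API of any real gateway. The point is only the shape: reject models that are not on the list, and return a metrics record that carries sizes and timings but no prompt or response text.

```python
import time
from dataclasses import asdict, dataclass
from typing import Callable

# Hypothetical allow-list; these are placeholder names, not real model IDs.
ALLOWED_MODELS = {"provider-a/model-small", "provider-b/model-large"}


@dataclass
class RequestMetrics:
    """Structural facts about one call; no prompt or response bodies."""
    model: str
    prompt_chars: int
    response_chars: int
    latency_ms: float


def handle(
    model: str,
    prompt: str,
    call_provider: Callable[[str, str], str],
) -> tuple[str, dict]:
    """Forward a prompt to an allow-listed model and build a metrics record."""
    if model not in ALLOWED_MODELS:
        raise PermissionError(f"model not allow-listed: {model}")

    started = time.monotonic()
    response = call_provider(model, prompt)
    metrics = RequestMetrics(
        model=model,
        prompt_chars=len(prompt),
        response_chars=len(response),
        latency_ms=(time.monotonic() - started) * 1000,
    )
    # Only the metrics dict is intended for persistence; the response is
    # returned to the caller and not stored by this function.
    return response, asdict(metrics)


# Stub provider for illustration; a real client call would go here.
response, metrics = handle(
    "provider-a/model-small",
    "Summarize: device reboots every hour.",
    lambda model, prompt: "The device restarts hourly.",
)
```

Keeping the persisted record to counts and timings means observability dashboards still work, while the stored data says nothing about what any individual prompt contained.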