
PII Detection — GDPR-Compliant Data Scanning

Free-text fields are a GDPR blind spot. A Comment column that starts as “Great product!” eventually contains “Call Jean Dupont at +32 478 123 456”. Your data map doesn’t list Comment as containing personal data — but it does.

Granit.Privacy.AI provides IAIPiiDetector: a service that scans arbitrary text and returns a structured list of detected PII types. It integrates with Granit.Privacy’s data provider registry and can be called anywhere you handle user-generated text.

| PiiType | Examples |
| --- | --- |
| PersonName | “Jean Dupont”, “Dr. Smith” |
| Email | jean@example.com |
| PhoneNumber | “+32 478 123 456”, “(555) 867-5309” |
| Address | “14 Rue de la Paix, Paris” |
| NationalId | SSN, NISS, BSN, NIE (country-agnostic) |
| DateOfBirth | “born 14/03/1985”, “DOB: 1985-03-14” |
| BankAccount | IBAN, routing numbers |
| CreditCard | 4-group card numbers |
| Other | Other identifiers not in the list above |

[DependsOn(
    typeof(GranitPrivacyAIModule),
    typeof(GranitAIOllamaModule))] // strongly recommended for GDPR
public class AppModule : GranitModule { }

public class CommentValidator : AbstractValidator<CreateCommentRequest>
{
    // The detector is injected through the constructor so the rule below can capture it.
    public CommentValidator(IAIPiiDetector piiDetector)
    {
        RuleFor(x => x.Body)
            .MustAsync(async (text, ct) =>
            {
                PiiDetectionResult result = await piiDetector
                    .ScanAsync(text, ct)
                    .ConfigureAwait(false);
                // The rule passes only when the scan reports no PII.
                return !result.ContainsPii;
            })
            .WithMessage("Comments must not contain personal information (names, phone numbers, etc.).");
    }
}

PiiDetectionResult result = await piiDetector.ScanAsync(userInput, ct).ConfigureAwait(false);
if (result.ContainsPii)
{
    foreach (DetectedPii item in result.Items)
    {
        // item.Type → PiiType.Email, PiiType.PhoneNumber, ...
        // item.Description → "email address detected in first sentence"
        // Note: Description never contains the actual PII value
        logger.LogWarning("PII detected: {Type} — {Description}", item.Type, item.Description);
    }
}

The Description field explains where and what kind of PII was found, but never contains the actual value — so it is safe to log.

Composing PII detection into your pipeline


IAIPiiDetector is a standalone building block — you call ScanAsync from your own validators, message handlers, or batch jobs and act on the result. The framework ships only the detector; the action you take on a finding (queue for review, pseudonymise, notify the DPO) is application code.

// Wolverine handler: your app subscribes after a record is created.
// IPiiReviewQueue is application-defined; the framework provides IAIPiiDetector only.
public static class CommentCreatedHandler
{
    public static async Task Handle(
        CommentCreatedEvent evt,
        IAIPiiDetector detector,
        IPiiReviewQueue reviewQueue,
        CancellationToken ct)
    {
        PiiDetectionResult result = await detector.ScanAsync(evt.Body, ct).ConfigureAwait(false);
        if (result.ContainsPii)
        {
            // item.Description carries the location/kind, never the raw value, so it is safe to persist.
            await reviewQueue.EnqueueAsync(
                evt.CommentId,
                result.Items.Select(i => (i.Type, i.Description)),
                ct).ConfigureAwait(false);
        }
    }
}

For PII detection, configure a separate AI workspace using a local model. This isolates PII traffic from other AI usage:

{
  "AI": {
    "Workspaces": [
      {
        "Name": "pii-detection",
        "Provider": "Ollama",
        "Model": "llama3.1",
        "SystemPrompt": "You are a GDPR compliance assistant. Detect personally identifiable information in text. Be conservative — only flag clear PII, not ambiguous terms.",
        "Temperature": 0.0
      }
    ],
    "Privacy": {
      "WorkspaceName": "pii-detection"
    }
  }
}

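Setting Temperature to 0.0 makes the model's output as repeatable as the model allows, so the same text is far less likely to be flagged differently from one run to the next. You want deterministic detection, not creative interpretation.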

Each ScanAsync call awaits a full LLM round trip, so calling it on the write path adds the model's latency to every request. For high-throughput pipelines, run detection out of band via a Wolverine handler (see the example above) so the write path does not wait on the model.

| Scenario | Recommended execution | Rationale |
| --- | --- | --- |
| Form validation (user-facing) | Synchronous, short timeout | Blocking is acceptable for submission; keep TimeoutSeconds low |
| Batch scan of existing data | Async Wolverine handler | Potentially millions of records |
| Import pipeline | Async, after persistence | Scan after save so the upload does not wait |

When the LLM is unavailable or times out, the detector returns a fallback result governed by FailMode:

  • Closed (default) — assumes PII is present (ContainsPii = true). A scanner outage therefore blocks a validation that gates on a clean result, rather than silently waving the text through. This is the conservative, GDPR-safe posture.
  • Open — assumes no PII (ContainsPii = false). Permissive, for development and testing only.

Every fallback is logged as a warning. FailMode = Open is rejected at startup outside the Development environment — a permissive PII scanner in production is treated as a misconfiguration, not a choice.
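
The two fallback modes can be sketched in a few lines. This is a minimal illustration of the semantics described above, not Granit's implementation: SketchFailMode, ScanOutcome, and ScanWithFallbackAsync are hypothetical names, and the scanner is modelled as a plain delegate that returns whether PII was found.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for illustration; the real Granit types may differ.
public enum SketchFailMode { Closed, Open }

public sealed record ScanOutcome(bool ContainsPii, bool IsFallback);

public static class FailModeSketch
{
    // Runs the scan under a timeout and maps any failure to a fallback outcome.
    // The timeout only takes effect if `scan` honours the token it is given.
    public static async Task<ScanOutcome> ScanWithFallbackAsync(
        Func<CancellationToken, Task<bool>> scan,
        TimeSpan timeout,
        SketchFailMode failMode,
        CancellationToken ct = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);
        try
        {
            bool containsPii = await scan(cts.Token).ConfigureAwait(false);
            return new ScanOutcome(containsPii, IsFallback: false);
        }
        // Cancellation requested by the caller propagates; timeouts and
        // scanner errors are converted into a fallback instead.
        catch (Exception ex) when (ex is not OperationCanceledException || !ct.IsCancellationRequested)
        {
            // Closed assumes PII is present; Open assumes there is none.
            return new ScanOutcome(
                ContainsPii: failMode == SketchFailMode.Closed,
                IsFallback: true);
        }
    }
}
```

Under SketchFailMode.Closed, a scanner that throws or exceeds the timeout yields ContainsPii = true, so a validator that requires a clean result rejects the input. Under SketchFailMode.Open, the same failure yields ContainsPii = false and the text passes.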

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| WorkspaceName | string | "default" | AI workspace for PII detection. Use an Ollama workspace in production |
| TimeoutSeconds | int | 15 | LLM call timeout. PII detection is allowed more time than moderation |
| FailMode | PiiDetectionFailMode | Closed | Behaviour on scanner failure. Closed assumes PII is present (safe default); Open assumes none (Development only; rejected at startup elsewhere) |
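
As a hedged illustration, the three options might be set together as follows. This assumes that TimeoutSeconds and FailMode bind under the same AI:Privacy section that holds WorkspaceName in the workspace example earlier on this page, and that FailMode is written as the enum member name. Neither assumption is confirmed here, so check the key names against your version.

```json
{
  "AI": {
    "Privacy": {
      "WorkspaceName": "pii-detection",
      "TimeoutSeconds": 15,
      "FailMode": "Closed"
    }
  }
}
```

The values shown for TimeoutSeconds and FailMode are the documented defaults. Only WorkspaceName differs from its default, pointing detection at the dedicated local-model workspace.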