Import Mapping — AI Column Matching

Every B2B application faces the same nightmare: data onboarding. Your client exports a CSV from their legacy system (Sage, SAP, AS/400) with columns named Nom_Clt_V2_Final, dt_crea, or simply F_003. Your system expects CustomerName, CreatedAt, AmountExclTax. Someone has to map each column — manually, every time.

Granit.DataExchange.AI solves this by adding an AI-powered Tier 4 to the existing mapping pipeline. When exact and fuzzy matching fail, the LLM analyzes the column names and target schema to suggest the mapping automatically.

Traditional approaches fail on real-world data:

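| Source column | Target property | Exact match? | Fuzzy match? | AI match? |
|---|---|---|---|---|
| Email | Email | Yes | | |
| Courriel | Email | No | Yes (alias) | |
| Nom_Clt_V2_Final | CustomerName | No | No | Yes (0.92) |
| MONTANT HT | AmountExclTax | No | No | Yes (0.95) |
| dt_crea | CreatedAt | No | No | Yes (0.88) |
| COL1 | ??? | No | No | Needs preview rows |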

The exact and fuzzy tiers handle clean data. The AI tier handles the rest: the abbreviated, versioned, or foreign-language headers that are routine in production B2B onboarding.

flowchart TD
    H[File headers] --> T1[Tier 1: Saved mappings]
    T1 -->|unmapped columns| T2[Tier 2: Exact match]
    T2 -->|unmapped columns| T3[Tier 3: Fuzzy match]
    T3 -->|unmapped columns| T4[Tier 4: AI Semantic]
    T4 --> R[Final mapping]
    T1 --> R
    T2 --> R
    T3 --> R

    style T4 fill:#e8f5e9,stroke:#4caf50

Each tier only processes columns that previous tiers couldn’t match. The best confidence wins per column. This means the AI is only called when necessary — most columns are matched by cheaper tiers first.
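The cascade described above can be sketched in a few lines. This is an illustration under assumptions, not Granit's implementation: `ColumnMatch`, `IMappingTier`, and `MappingCascade` are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; not part of the Granit API.
public sealed record ColumnMatch(string SourceColumn, string TargetProperty, double Confidence);

public interface IMappingTier
{
    // Returns matches for the columns this tier can resolve; may return none.
    IReadOnlyList<ColumnMatch> Match(IReadOnlyList<string> unmappedColumns);
}

public static class MappingCascade
{
    public static IReadOnlyDictionary<string, ColumnMatch> Run(
        IReadOnlyList<string> headers,
        IEnumerable<IMappingTier> tiersInOrder)
    {
        var result = new Dictionary<string, ColumnMatch>(StringComparer.OrdinalIgnoreCase);

        foreach (var tier in tiersInOrder)
        {
            // Only columns that earlier tiers left unmapped are passed on.
            var unmapped = headers.Where(h => !result.ContainsKey(h)).ToList();
            if (unmapped.Count == 0)
            {
                break; // Nothing left, so later (costlier) tiers are never called.
            }

            foreach (var match in tier.Match(unmapped))
            {
                // Keep the best-confidence candidate per column.
                if (!result.TryGetValue(match.SourceColumn, out var existing)
                    || match.Confidence > existing.Confidence)
                {
                    result[match.SourceColumn] = match;
                }
            }
        }

        return result;
    }
}
```

Passing the tiers in order (saved, exact, fuzzy, semantic) means the semantic tier only ever sees the leftover columns, which is what keeps LLM calls rare.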

| Tier | Confidence | Speed | Cost |
|---|---|---|---|
| Saved | Highest (user-confirmed) | Instant | Free |
| Exact | High (case-insensitive name/alias) | Instant | Free |
| Fuzzy | Medium (Levenshtein ≥ 0.8) | Instant | Free |
| Semantic (AI) | Variable (0.0–1.0) | ~200ms | LLM tokens |
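To make the fuzzy threshold concrete, here is a minimal sketch of a normalized Levenshtein similarity. The table only states "Levenshtein ≥ 0.8"; the normalization used here (1 minus distance divided by the longer length, compared case-insensitively) is an assumption, and `FuzzyMatch` is a hypothetical helper rather than a Granit type.

```csharp
using System;

public static class FuzzyMatch
{
    // Classic two-row dynamic-programming Levenshtein edit distance.
    public static int Distance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++) previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            (previous, current) = (current, previous); // reuse the two rows
        }
        return previous[b.Length];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    public static double Similarity(string a, string b)
    {
        a = a.ToLowerInvariant();
        b = b.ToLowerInvariant();
        var longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Distance(a, b) / longest;
    }

    public static bool IsMatch(string source, string target, double threshold = 0.8)
        => Similarity(source, target) >= threshold;
}
```

Under this normalization a single typo such as `CustomerNme` against `CustomerName` still passes, while `dt_crea` against `CreatedAt` cannot: the two-character length difference alone caps the similarity below 0.8, so the column falls through to the semantic tier.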
[DependsOn(
    typeof(GranitDataExchangeAIModule),
    typeof(GranitAIOllamaModule))]
public class AppModule : GranitModule { }

builder.AddGranitAI();
builder.AddGranitAIOllama(); // or any provider
builder.AddGranitDataExchangeAI();

{
  "AI": {
    "DataExchange": {
      "WorkspaceName": "default",
      "TimeoutSeconds": 10,
      "MinConfidenceScore": 0.6
    }
  }
}

That’s it. The AI tier is automatically registered as the ISemanticMappingService implementation, replacing the default no-op.

By default, the LLM receives only metadata, never row data. This is critical for GDPR compliance:

Source columns: Nom_Clt_V2_Final, dt_crea, MONTANT HT, TVA
Target properties:
| Property | Type | Display Name | Description | Required |
|----------------|------------------|------------------|----------------------|----------|
| CustomerName | String | Customer Name | Full customer name | Yes |
| CreatedAt | DateTimeOffset | Creation Date | Record creation date | Yes |
| AmountExclTax | Decimal | Amount excl. tax | Net amount | Yes |
| VatRate | Decimal | VAT Rate | VAT percentage | No |

The LLM never receives:

  • Row data (names, emails, amounts)
  • Database records
  • Tenant identifiers
  • Any business data whatsoever

When column headers are meaningless (COL1, F_003, FIELD_A), headers alone aren’t enough. You can opt in to sending the first few data rows to the LLM:

{
  "AI": {
    "DataExchange": {
      "IncludePreviewRows": true,
      "PreviewRowCount": 5
    }
  }
}

The LLM then sees:

Source columns: COL1, COL2, COL3
Sample data (first rows):
| COL1                   | COL2       | COL3    |
|------------------------|------------|---------|
| john.doe@example.com   | John Doe   | +32 123 |
| jane.smith@example.com | Jane Smith | +32 456 |

With this context, the LLM can infer that COL1 is an email, COL2 is a name, etc.
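The difference between the two modes can be sketched as a small prompt builder. This is a hypothetical helper (`MappingPromptSketch` is not a Granit type) that mirrors the prompt layout shown above; the parameter names follow the `IncludePreviewRows` and `PreviewRowCount` options, and the real prompt construction may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class MappingPromptSketch
{
    // Hypothetical helper: builds only the source-side part of the prompt.
    public static string Build(
        IReadOnlyList<string> headers,
        IReadOnlyList<IReadOnlyList<string>> rows,
        bool includePreviewRows = false,
        int previewRowCount = 5)
    {
        var sb = new StringBuilder();
        sb.Append("Source columns: ").AppendLine(string.Join(", ", headers));

        if (!includePreviewRows)
        {
            // Default mode: headers only, no row data is added to the prompt.
            return sb.ToString();
        }

        sb.AppendLine("Sample data (first rows):");
        sb.AppendLine("| " + string.Join(" | ", headers) + " |");
        sb.AppendLine("|" + string.Join("|", headers.Select(_ => "---")) + "|");

        // Take() caps the number of rows, bounding token usage.
        // A real implementation would also sanitize each cell before embedding it.
        foreach (var row in rows.Take(previewRowCount))
        {
            sb.AppendLine("| " + string.Join(" | ", row) + " |");
        }
        return sb.ToString();
    }
}
```

With the defaults, the returned string contains only the header line, no matter how many rows are passed in.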

Soft dependency — zero changes to DataExchange

Granit.DataExchange.AI is a purely additive package. The DataExchange module defines ISemanticMappingService with a null-object default:

Granit.DataExchange
    defines ISemanticMappingService
    registers NullSemanticMappingService (IsAvailable = false)

Granit.DataExchange.AI
    references Granit.DataExchange and Granit.AI
    implements AISemanticMappingService (IsAvailable = true)
    replaces the null implementation via DI

Without the AI package: Tier 4 is silently skipped. With it: Tier 4 activates. No if statements, no feature flags — just DI composition.
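The null-object arrangement can be sketched without a DI container. Only the names `ISemanticMappingService`, `NullSemanticMappingService`, and `IsAvailable` come from this page; the `Suggest` method, `StubAiMappingService`, and `ImportMapper` are assumptions made for illustration, and the real interface may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape: only the interface name and IsAvailable are documented.
public interface ISemanticMappingService
{
    bool IsAvailable { get; }
    IReadOnlyDictionary<string, string> Suggest(IReadOnlyList<string> unmappedColumns);
}

// Null object: always answers, never suggests anything.
public sealed class NullSemanticMappingService : ISemanticMappingService
{
    public bool IsAvailable => false;

    public IReadOnlyDictionary<string, string> Suggest(IReadOnlyList<string> unmappedColumns)
        => new Dictionary<string, string>();
}

// Hypothetical stand-in for the AI implementation; a real one would call the LLM.
public sealed class StubAiMappingService : ISemanticMappingService
{
    public bool IsAvailable => true;

    public IReadOnlyDictionary<string, string> Suggest(IReadOnlyList<string> unmappedColumns)
        => unmappedColumns.ToDictionary(c => c, _ => "CustomerName");
}

// The consumer calls whichever implementation it was given, with no branching.
public sealed class ImportMapper
{
    private readonly ISemanticMappingService _semantic;

    public ImportMapper(ISemanticMappingService semantic) => _semantic = semantic;

    public IReadOnlyDictionary<string, string> MapRemaining(IReadOnlyList<string> unmapped)
        => _semantic.Suggest(unmapped);
}
```

Constructing `ImportMapper` with the null service yields an empty result for Tier 4; swapping in the other implementation changes behavior without touching `ImportMapper`. In the real packages that swap is done by the DI registration rather than by hand.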

The AI mapping service applies multiple layers of defense:

  • Prompt injection protection: preview row cell values and target field metadata (DisplayName, Description) are sanitized before being embedded in the prompt. XML-like tags, pipe characters, and newlines are escaped to prevent markdown table breakout.
  • Output validation: LLM suggestions are validated against known ImportFieldMetadata property paths and source headers. Suggestions targeting unknown properties are discarded. Confidence scores are clamped to [0.0, 1.0] to prevent the LLM from inflating scores.
  • Preview row truncation: when IncludePreviewRows is enabled, the row set is truncated to PreviewRowCount rows before prompt construction, preventing unbounded token consumption.
  • Startup validation: DataExchangeAIOptions is validated at startup. Invalid configurations (TimeoutSeconds: 0, MinConfidenceScore: -1.0) are rejected immediately.
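The first two defenses can be sketched as follows. The exact escaping rules and types are not documented here, so the replacements chosen below (HTML-style entities, a backslash before pipes, a literal `\n` for newlines) and the names `Suggestion` and `MappingGuards` are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one LLM suggestion.
public sealed record Suggestion(string SourceColumn, string TargetProperty, double Confidence);

public static class MappingGuards
{
    // Escapes characters that could mimic tags or break out of a markdown table row.
    // "&" is escaped first so the entities added afterwards are not escaped twice.
    public static string SanitizeCell(string value) => value
        .Replace("&", "&amp;")
        .Replace("<", "&lt;")
        .Replace(">", "&gt;")
        .Replace("|", "\\|")
        .Replace("\r", "")
        .Replace("\n", "\\n");

    public static IReadOnlyList<Suggestion> Validate(
        IEnumerable<Suggestion> suggestions,
        IReadOnlyCollection<string> sourceHeaders,
        IReadOnlyCollection<string> knownProperties,
        double minConfidence = 0.6)
    {
        return suggestions
            // Drop unusable scores before clamping (clamping NaN yields NaN).
            .Where(s => !double.IsNaN(s.Confidence))
            // Discard anything that references an unknown header or property.
            .Where(s => sourceHeaders.Contains(s.SourceColumn)
                        && knownProperties.Contains(s.TargetProperty))
            // Clamp so an inflated score such as 1.7 becomes 1.0.
            .Select(s => s with { Confidence = Math.Clamp(s.Confidence, 0.0, 1.0) })
            .Where(s => s.Confidence >= minConfidence)
            .ToList();
    }
}
```

Validation treats the model's output as untrusted input: a suggestion survives only if both ends of the mapping are already known to the application.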
| Risk | Mitigation |
|---|---|
| LLM timeout | 10s timeout (configurable), fallback to empty result; Tiers 1-3 still work |
| Wrong mapping | Minimum confidence threshold (0.6), user validates in wizard before import |
| PII in preview rows | Disabled by default, explicit opt-in required |
| Cost | Only called for unmapped columns after 3 free tiers, typically 1-5 columns per import |
| LLM hallucination | Structured output (JSON), validated against known property paths |
| Prompt injection | Cell values and field metadata sanitized before prompt embedding |
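The timeout fallback in the first row could look roughly like this. It is a sketch assuming .NET 6 or later (for `Task.WaitAsync`); `LlmCallGuard` is a hypothetical helper, and how Granit actually enforces the timeout is not documented here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class LlmCallGuard
{
    // Runs the call with a deadline; on timeout or cancellation returns an empty list
    // so the mappings already found by the earlier tiers are kept.
    public static async Task<IReadOnlyList<T>> WithFallbackAsync<T>(
        Func<CancellationToken, Task<IReadOnlyList<T>>> call,
        TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            // The token lets a cooperative call stop early;
            // WaitAsync bounds the wait even if the call ignores the token.
            return await call(cts.Token).WaitAsync(timeout);
        }
        catch (Exception ex) when (ex is TimeoutException or OperationCanceledException)
        {
            return Array.Empty<T>();
        }
    }
}
```

Other exceptions are deliberately not swallowed in this sketch, so genuine failures stay visible to the caller.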
| Property | Type | Default | Description |
|---|---|---|---|
| WorkspaceName | string | "default" | AI workspace for mapping suggestions |
| TimeoutSeconds | int | 10 | LLM call timeout |
| MinConfidenceScore | double | 0.6 | Minimum score to accept a suggestion |
| IncludePreviewRows | bool | false | Include sample data rows (GDPR opt-in) |
| PreviewRowCount | int | 5 | Number of preview rows when enabled |
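As a rough picture of what startup validation over these options might check, here is a sketch. The property names and defaults come from the table above; the class name `DataExchangeAIOptionsSketch` and the exact rules (positive timeout, score within [0.0, 1.0], positive row count when preview rows are enabled) are assumptions consistent with the invalid examples mentioned earlier.

```csharp
using System;
using System.Collections.Generic;

public sealed class DataExchangeAIOptionsSketch
{
    public string WorkspaceName { get; set; } = "default";
    public int TimeoutSeconds { get; set; } = 10;
    public double MinConfidenceScore { get; set; } = 0.6;
    public bool IncludePreviewRows { get; set; } // false by default: GDPR opt-in
    public int PreviewRowCount { get; set; } = 5;

    // Returns a list of problems; an empty list means the options are usable.
    public IReadOnlyList<string> Validate()
    {
        var errors = new List<string>();

        if (string.IsNullOrWhiteSpace(WorkspaceName))
            errors.Add("WorkspaceName must not be empty.");

        if (TimeoutSeconds <= 0)
            errors.Add("TimeoutSeconds must be greater than 0.");

        if (double.IsNaN(MinConfidenceScore) || MinConfidenceScore < 0.0 || MinConfidenceScore > 1.0)
            errors.Add("MinConfidenceScore must be between 0.0 and 1.0.");

        if (IncludePreviewRows && PreviewRowCount <= 0)
            errors.Add("PreviewRowCount must be greater than 0 when IncludePreviewRows is enabled.");

        return errors;
    }
}
```

An application would typically run such a check once at startup and refuse to start if any error is returned, so that a misconfiguration surfaces before the first import.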