Granit.AI — Run LLMs without locking yourself into a vendor

LLM features look easy in a prototype and turn ugly in production. You start with one SDK. You add a second one for the EU region. The bill explodes because nothing tracks tokens. A compliance auditor asks who called which model with what data — and no one can answer. A vendor changes its pricing and you discover the integration is wired into every service.

Granit.AI solves the production problem, not the prototype problem. Your application code depends on IChatClient from Microsoft.Extensions.AI — never on an OpenAI or Azure SDK. Providers swap per environment. Workspaces are named, multi-tenant, and permission-scoped. Every call is metered. Every API key lives in a vault. Every model swap is a one-line config change.

| Pain | Granit.AI’s answer |
| --- | --- |
| Vendor lock-in | IChatClient abstraction — swap OpenAI ↔ Azure ↔ Ollama with one config setting |
| Hardcoded model parameters scattered across services | Workspaces — named configs (document-extraction, support-chat) bind provider, model, prompt, temperature |
| API keys in appsettings.json | First-class Vault integration with rotation |
| Untraceable LLM costs | Every call recorded with tenant, user, tokens, duration, estimated cost |
| Auditor: “who asked the LLM about patient X?” | ISO 27001 audit trail with tenant + user + workspace + model on every interaction |
| No way to cap a runaway tenant | Per-tenant hourly quota with Skip/Reject behaviour |
| Prompt injection | PromptBuilder separates instructions from user data with sanitization + delimiters |

Microsoft.Extensions.AI (Microsoft’s official .NET 10 AI abstraction) is the standard your code should depend on. It gives you the same middleware pipeline as Microsoft.Extensions.Logging: logging, caching, OpenTelemetry, function calling — all provider-agnostic.

| Approach | Pros | Cons |
| --- | --- | --- |
| Direct SDK (OpenAI, Anthropic.SDK…) | Full API access | Vendor lock-in, no middleware, each module reimplements integration |
| Custom abstraction (ITextGenerationService) | Full control | Isolated from .NET ecosystem, reinvents the wheel |
| Microsoft.Extensions.AI | .NET 10 standard, ~100 compatible packages, middleware pipeline | Newer API surface |

Granit.AI builds on top of it — adding the operational layer (workspaces, persistence, tenancy, audit, quota, cost) that the abstraction alone doesn’t provide.

  • Granit.AI/ Core abstractions, workspace + capability resolver, factories, quota guard
    • Granit.AI.OpenAI OpenAI provider (GPT-4o, o3/o4-mini, embeddings) + dynamic model catalog
    • Granit.AI.AzureOpenAI Azure OpenAI with DefaultAzureCredential + deployment catalog
    • Granit.AI.Anthropic Anthropic provider (Claude Opus/Sonnet/Haiku), chat completion only — no embeddings
    • Granit.AI.Ollama Local models via OllamaSharp (dev, on-premise, GDPR strict)
    • Granit.AI.EntityFrameworkCore Isolated DbContext for workspaces and usage records
    • Granit.AI.Extraction Structured document extraction (PDF → typed C# record)
    • Granit.AI.VectorData Multi-tenant vector storage for semantic search (RAG)
    • Granit.AI.Mcp Server-side MCP tool sources for AI agents
    • Granit.AI.Endpoints REST API: workspaces, provider discovery, usage query, chat & embedding proxy
  • Granit.DataExchange.AI AI-powered column mapping for CSV/Excel import
  • Granit.QueryEngine.AI Natural Language Query — phrases to structured filters

| Package | Role | Depends on |
| --- | --- | --- |
| Granit.AI | IAIChatClientFactory, IAIWorkspaceManager, IAIQuotaGuard, usage tracking | Granit |
| Granit.AI.OpenAI | OpenAI provider + IAIModelCatalog (live /v1/models listing) | Granit.AI |
| Granit.AI.AzureOpenAI | Azure OpenAI with API key or Managed Identity | Granit.AI |
| Granit.AI.Anthropic | Anthropic (Claude) provider, chat-only | Granit.AI |
| Granit.AI.Ollama | Local models, /api/show capability resolution, health check | Granit.AI |
| Granit.AI.EntityFrameworkCore | AIDbContext, EfAIWorkspaceStore, EfAIUsageStore | Granit.AI, Granit.Persistence |
| Granit.AI.Endpoints | Workspace CRUD, provider discovery, usage query, chat/embedding proxy | Granit.AI, Granit.Authorization, Granit.QueryEngine.AspNetCore |
| Granit.AI.Extraction | IDocumentExtractor<T> — structured output from documents | Granit.AI |
| Granit.AI.VectorData | IVectorCollection<T>, ISemanticSearchService for RAG | Granit.AI |
| Granit.AI.Mcp | IMcpToolSourceProvider — Model Context Protocol tools | Granit.AI |
| Granit.DataExchange.AI | ISemanticMappingService for import mapping | Granit.AI, Granit.DataExchange |
| Granit.QueryEngine.AI | INaturalLanguageQueryTranslator (NLQ → QueryRequest) | Granit.AI, Granit.QueryEngine |

Four providers ship in-box. They share the same IAIProviderFactory contract — pick based on what the deployment environment requires, not on what your code expects.

| Provider | Best for | Auth | Model catalog |
| --- | --- | --- | --- |
| Ollama | Dev laptops, on-premise, strict GDPR (data never leaves) | None | Live via GET /api/tags (30s cache); capabilities from /api/show |
| OpenAI | Cloud SaaS, fast iteration, frontier models | API key | Live via GET /v1/models, enriched with known capabilities |
| Azure OpenAI | EU data residency, HDS, enterprise compliance | API key or Managed Identity | Configured deployments |
| Anthropic | Claude Opus/Sonnet/Haiku, extended thinking | API key | Static (no catalog API) |

    [DependsOn(typeof(GranitAIOllamaModule))]
    public class AppModule : GranitModule;

    builder.AddGranitAI();
    builder.AddGranitAIOllama();

No configuration needed — connects to http://localhost:11434 with llama3.1 by default.

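    # Start the Ollama server (skip this if the desktop app or a service already runs it)
    ollama serve &
    # Pull the default model; this needs the server to be reachable
    ollama pull llama3.1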

For production, add Granit.AI.EntityFrameworkCore to persist workspaces and track usage:

    [DependsOn(
        typeof(GranitAIOllamaModule), // or any provider
        typeof(GranitAIEntityFrameworkCoreModule))]
    public class AppModule : GranitModule;

    builder.AddGranitAI();
    builder.AddGranitAIOllama();
    builder.AddGranitAIEntityFrameworkCore(options =>
        options.UseNpgsql(builder.Configuration.GetConnectionString("AI")));

Without this package, Granit.AI runs with null implementations: workspaces are code-only, usage records are discarded, the API still works. Add it the day you need persistence — no code change, just a module reference.

A workspace is a named AI configuration: which provider, which model, which prompt, which sampling parameters. Application code asks for a workspace by name. The factory returns a configured IChatClient. The day you swap OpenAI for Azure, you change the workspace — not the calling code.

flowchart LR
    App[Application code] -->|"CreateAsync('support-chat')"| F[IAIChatClientFactory]
    F --> WP[IAIWorkspaceProvider]
    WP --> SYS[System workspaces<br/>code-defined, immutable]
    WP --> DYN[Dynamic workspaces<br/>database, API-managed]
    F --> PF[IAIProviderFactory]
    PF --> OAI[OpenAI]
    PF --> AAI[Azure OpenAI]
    PF --> OL[Ollama]
    F -->|returns| CC[IChatClient]

Declare workspaces that should not drift between environments — extraction pipelines, internal classifiers, anything where a runtime change would be a defect:

    public class AppWorkspaceDefinitionProvider : IAIWorkspaceDefinitionProvider
    {
        public void Define(IAIWorkspaceDefinitionContext context)
        {
            context.Add(new AIWorkspace
            {
                Name = "document-extraction",
                Provider = "AzureOpenAI",
                Model = "gpt-4o",
                SystemPrompt = "Extract structured data from documents. Return valid JSON only.",
                Temperature = 0.0f,
                MaxOutputTokens = 4096,
            });

            context.Add(new AIWorkspace
            {
                Name = "invoice-summary",
                Provider = "OpenAI",
                Model = "gpt-4o-mini",
                SystemPrompt = "Summarize invoices in three bullet points.",
                Temperature = 0.3f,
            });
        }
    }

System workspaces take precedence over dynamic workspaces with the same name. The API rejects modifications and deletions on system workspaces with 422 Unprocessable Entity.
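
The precedence rule can be pictured as a two-level lookup. The sketch below is an illustration only, not Granit.AI's resolver: `WorkspaceSketch` and `WorkspaceResolverSketch` are hypothetical names, and exact (ordinal) name matching is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a workspace definition; not the Granit.AI type.
public sealed record WorkspaceSketch(string Name, string Provider, string Model);

public sealed class WorkspaceResolverSketch
{
    private readonly Dictionary<string, WorkspaceSketch> _system;
    private readonly Dictionary<string, WorkspaceSketch> _dynamic;

    public WorkspaceResolverSketch(
        IEnumerable<WorkspaceSketch> systemWorkspaces,
        IEnumerable<WorkspaceSketch> dynamicWorkspaces)
    {
        _system = Index(systemWorkspaces);
        _dynamic = Index(dynamicWorkspaces);
    }

    // Code-defined workspaces win; dynamic ones are consulted only on a miss.
    public WorkspaceSketch? Resolve(string name) =>
        _system.TryGetValue(name, out var sys) ? sys
        : _dynamic.TryGetValue(name, out var dyn) ? dyn
        : null;

    // System workspaces are immutable; an API would answer 422 when this is false.
    public bool CanModify(string name) => !_system.ContainsKey(name);

    private static Dictionary<string, WorkspaceSketch> Index(IEnumerable<WorkspaceSketch> items)
    {
        var map = new Dictionary<string, WorkspaceSketch>(StringComparer.Ordinal);
        foreach (var item in items) map[item.Name] = item;
        return map;
    }
}
```

With a system and a dynamic workspace both named `invoice-summary`, `Resolve` returns the system one and `CanModify` returns false for that name.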

For per-tenant customization or runtime tuning, dynamic workspaces are persisted via Granit.AI.EntityFrameworkCore and mutated through IAIWorkspaceManager — which centralizes invariants (Dynamic-only guard, duplicate-name check):

    public class TenantOnboardingService(IAIWorkspaceManager workspaceManager)
    {
        public async Task OnboardAsync(Tenant tenant, CancellationToken ct)
        {
            await workspaceManager.CreateAsync(new AIWorkspace
            {
                Name = $"tenant-{tenant.Slug}-chat",
                Provider = "AzureOpenAI",
                Model = "gpt-4o",
                SystemPrompt = tenant.SupportPersona,
                Temperature = 0.7f,
            }, ct).ConfigureAwait(false);
        }
    }

Duplicate names are caught at the database level (unique constraint, 409 Conflict). Deactivating a workspace via Activated = false keeps it visible in the admin API but prevents execution — chat and embedding factories throw AIWorkspaceNotActiveException (422).

    public class InvoiceSummaryService(IAIChatClientFactory chatClientFactory)
    {
        public async Task<string> SummarizeAsync(Invoice invoice, CancellationToken ct)
        {
            IChatClient client = await chatClientFactory
                .CreateAsync("invoice-summary", ct)
                .ConfigureAwait(false);

            ChatResponse response = await client.GetResponseAsync(
                $"Summarize this invoice in 3 bullet points: {invoice.Description}",
                cancellationToken: ct)
                .ConfigureAwait(false);

            return response.Text;
        }
    }

    public class PatientSearchService(IAIEmbeddingGeneratorFactory embeddingFactory)
    {
        public async Task<Embedding<float>> EmbedAsync(string query, CancellationToken ct)
        {
            IEmbeddingGenerator<string, Embedding<float>> generator =
                await embeddingFactory.CreateAsync("embeddings", ct).ConfigureAwait(false);

            return await generator
                .GenerateEmbeddingAsync(query, cancellationToken: ct)
                .ConfigureAwait(false);
        }
    }

  • No provider registered → AIProviderNotRegisteredException with a message naming the missing package.
  • No persistence adapter → null stores; workspaces come from code definitions and usage records are discarded.
  • Workspace deactivated → AIWorkspaceNotActiveException on execution (422).
  • Quota exceeded → either skipped (returns no enrichment) or rejected, depending on AIQuotaOptions.ExceededBehavior.

Not every model can do every job. gpt-4o does vision, text-embedding-3 does only embeddings, llama3.1:8b doesn’t do tool calling. Granit.AI resolves capabilities from the provider’s catalog at runtime and surfaces them on every workspace response — so the frontend can show “Upload image” only when the model supports it.

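| Capability | Type | Default |
| --- | --- | --- |
| Chat | bool | true |
| Embeddings | bool | false |
| Vision | bool | false |
| ImageGeneration | bool | false |
| Audio | bool | false |
| ToolUse | bool | false |
| Streaming | bool | true |
| StructuredOutput | bool | false |
| Extensions | IReadOnlySet<string> | empty |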

Extensions covers provider-specific features (web search, code interpreter, file upload, citations…). Use WellKnownAICapabilities constants for portable lookups.

Ollama queries /api/show per model and maps completion, embedding, vision, tools into the typed properties. OpenAI enriches its live /v1/models listing with known capabilities. Azure OpenAI inspects configured deployments.
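
To make the frontend gating concrete, here is a minimal sketch of how a client might turn resolved capabilities into UI actions. `ModelCapabilitiesSketch` and `CapabilityGate` are hypothetical names that mirror the capability table, not the Granit.AI types, and the `"web-search"` extension key is an assumed value (real code would use the WellKnownAICapabilities constants).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of the capability table; defaults follow that table.
public sealed record ModelCapabilitiesSketch
{
    public bool Chat { get; init; } = true;
    public bool Embeddings { get; init; }
    public bool Vision { get; init; }
    public bool ImageGeneration { get; init; }
    public bool Audio { get; init; }
    public bool ToolUse { get; init; }
    public bool Streaming { get; init; } = true;
    public bool StructuredOutput { get; init; }
    public IReadOnlySet<string> Extensions { get; init; } = new HashSet<string>();
}

public static class CapabilityGate
{
    // Returns the UI actions a client could enable for a workspace's model.
    public static IReadOnlyList<string> AllowedActions(ModelCapabilitiesSketch caps)
    {
        var actions = new List<string>();
        if (caps.Chat) actions.Add("send-message");
        if (caps.Vision) actions.Add("upload-image");
        if (caps.ToolUse) actions.Add("enable-tools");
        // "web-search" is an assumed extension key, used here for illustration.
        if (caps.Extensions.Contains("web-search")) actions.Add("web-search");
        return actions;
    }
}
```

A vision-capable chat model yields `send-message` and `upload-image`; an embedding-only model (Chat = false) yields no chat actions at all.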

Every IChatClient call is tracked when GranitAIOptions.EnableUsageTracking is true (default). The AIUsageRecord captures:

| Field | Description |
| --- | --- |
| TenantId | Tenant that made the request |
| UserId | User who triggered the AI call |
| WorkspaceName | Workspace used |
| ConversationId | Agentic-chat conversation the call belongs to (Guid?, null for non-chat calls) |
| Provider / Model | Provider and model identifier |
| InputTokens / OutputTokens | Token consumption |
| EstimatedCost / CostCurrency | Cost estimate and ISO 4217 currency code |
| Duration | Response time |
| Timestamp | When the interaction occurred (UTC) |

Streaming completions track usage at the end of the stream (Microsoft.Extensions.AI emits a UsageContent final chunk). Records are persisted by Granit.AI.EntityFrameworkCore and queryable through GET /ai/usage for billing dashboards, cost alerts, and showback per tenant.
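
The source does not say how EstimatedCost is derived. A common approach, sketched here under that assumption, multiplies token counts by a per-million-token price. `TokenPriceSketch`, `CostEstimatorSketch`, and the prices in the usage note are all hypothetical.

```csharp
using System;

// Hypothetical price entry: currency units per one million tokens.
public sealed record TokenPriceSketch(decimal InputPerMillion, decimal OutputPerMillion, string Currency);

public static class CostEstimatorSketch
{
    // Estimated cost = input tokens * input rate + output tokens * output rate,
    // with both rates expressed per one million tokens.
    public static decimal Estimate(int inputTokens, int outputTokens, TokenPriceSketch price)
    {
        if (inputTokens < 0 || outputTokens < 0)
            throw new ArgumentException("Token counts cannot be negative.");

        // decimal avoids binary floating-point drift in money calculations.
        return (inputTokens * price.InputPerMillion + outputTokens * price.OutputPerMillion) / 1_000_000m;
    }
}
```

For example, 1,200 input tokens at 2.50 per million plus 300 output tokens at 10.00 per million comes to 0.003 + 0.003 = 0.006 in the price's currency.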

When the call originates from the agentic chat surface, ConversationId links the usage record to its conversation, so tokens, latency, and cost can be billed or audited per conversation. The GET /ai/usage query can filter and group by it. Three constraints apply:

  • No migration ships with the framework. ConversationId is a new column on ai_usage_records; the consuming application owns the EF migration that adds it.
  • Not an OpenTelemetry tag. ConversationId is deliberately excluded from the metric tags — its cardinality is unbounded. Per-conversation slicing is a relational query, never a metric dimension.
  • No per-turn linkage yet. Records attach to the conversation, not to an individual message. Per-message (MessageId) linkage is a planned follow-up.
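
Since the application owns this migration, the following sketch shows roughly what it could contain. In practice you would generate it with `dotnet ef migrations add` so the designer file and snapshot stay in sync. The class name, the `ConversationId` column name, and the index are assumptions; check them against the entity mapping in your Granit.AI version (a snake_case naming convention would change the column name).

```csharp
using System;
using Microsoft.EntityFrameworkCore.Migrations;

// Sketch of an application-owned migration. Class, column, and index names
// are assumptions, not names shipped by Granit.AI.
public partial class AddConversationIdToAiUsageRecords : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // Nullable, because non-chat calls carry no conversation.
        migrationBuilder.AddColumn<Guid>(
            name: "ConversationId",
            table: "ai_usage_records",
            nullable: true);

        // Optional index to support filtering and grouping by conversation.
        migrationBuilder.CreateIndex(
            name: "ix_ai_usage_records_conversation",
            table: "ai_usage_records",
            column: "ConversationId");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.DropIndex(
            name: "ix_ai_usage_records_conversation",
            table: "ai_usage_records");

        migrationBuilder.DropColumn(
            name: "ConversationId",
            table: "ai_usage_records");
    }
}
```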

Quotas — fail safe before it costs you money

A runaway tenant or a buggy loop can run up four-figure bills in an hour. AIQuotaOptions caps requests per tenant per rolling hour. When the limit is hit, AI calls either:

  • Skip (default) — the call returns a default result; the host operation (e.g. blob classification) succeeds without AI enrichment.
  • Reject — the call throws, surfacing as 429 Too Many Requests. Use for strict enforcement.

    {
      "AI": {
        "Quota": {
          "MaxRequestsPerTenantPerHour": 500,
          "ExceededBehavior": "Skip"
        }
      }
    }

Set MaxRequestsPerTenantPerHour: 0 (default) for unlimited.
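
The Skip and Reject semantics can be illustrated with a small in-memory sliding-window guard. This is a sketch under stated assumptions, not the IAIQuotaGuard implementation: every type name is hypothetical, state lives in process memory (a multi-node deployment would need shared storage), and the clock is injected so the window is testable.

```csharp
using System;
using System.Collections.Generic;

public enum QuotaExceededBehaviorSketch { Skip, Reject }

public sealed class QuotaExceededSketchException : Exception
{
    public QuotaExceededSketchException(string tenantId)
        : base($"Tenant '{tenantId}' exceeded its hourly AI quota.") { }
}

// Hypothetical in-memory guard; not the Granit.AI IAIQuotaGuard implementation.
public sealed class RollingHourQuotaSketch
{
    private static readonly TimeSpan Window = TimeSpan.FromHours(1);
    private readonly Dictionary<string, Queue<DateTimeOffset>> _calls = new();
    private readonly object _gate = new();
    private readonly int _maxPerHour;
    private readonly QuotaExceededBehaviorSketch _behavior;
    private readonly Func<DateTimeOffset> _now;

    public RollingHourQuotaSketch(int maxPerHour, QuotaExceededBehaviorSketch behavior, Func<DateTimeOffset> now)
    {
        _maxPerHour = maxPerHour;
        _behavior = behavior;
        _now = now;
    }

    // Returns true when the call may proceed and false when it should be skipped.
    // Throws when the quota is exceeded and the behavior is Reject.
    public bool TryAcquire(string tenantId)
    {
        if (_maxPerHour <= 0) return true; // 0 means unlimited

        lock (_gate)
        {
            DateTimeOffset now = _now();
            if (!_calls.TryGetValue(tenantId, out var calls))
                _calls[tenantId] = calls = new Queue<DateTimeOffset>();

            // Drop timestamps that have left the rolling one-hour window.
            while (calls.Count > 0 && now - calls.Peek() >= Window)
                calls.Dequeue();

            if (calls.Count < _maxPerHour)
            {
                calls.Enqueue(now);
                return true;
            }
        }

        if (_behavior == QuotaExceededBehaviorSketch.Reject)
            throw new QuotaExceededSketchException(tenantId);
        return false;
    }
}
```

A caller in Skip mode checks the return value and falls back to its non-AI path; in Reject mode an API layer would translate the exception into 429 Too Many Requests.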

Each AI integration has its own page with examples, architecture diagrams, risk analysis, and GDPR considerations:

| Module | Pain it solves | Page |
| --- | --- | --- |
| DataExchange.AI | Cryptic CSV columns like Nom_Clt_V2_Final | AI: Import Mapping |
| AI.Extraction | “Re-keying every invoice PDF wastes hours” | AI: Document Extraction |
| QueryEngine.AI | Users don’t know how to build filters | AI: Natural Language Query |
| AI.VectorData | Keyword search misses synonyms and intent | AI: Semantic Search & RAG |

Every AI integration follows the same pattern — the existing module declares an interface with a null-object default; the .AI package replaces it via DI:

    Granit.{Module}     → defines IService + NullService (no-op default)
                          no reference to AI
    Granit.{Module}.AI  → implements AIService
                          references Granit.{Module}; replaces via DI
                          references Granit.AI

Without the AI package, the feature is silently skipped. With it, it activates. No if statements, no feature flags — just DI composition.
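
One way this composition could look with Microsoft.Extensions.DependencyInjection is sketched below. The interface, both implementations, and the registration helpers are hypothetical names chosen for illustration; Granit's actual module wiring may differ. `TryAddSingleton` and `Replace` are the standard container extensions.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// All type names below are hypothetical, chosen only to illustrate the pattern.
public interface IBlobClassifier
{
    string? Classify(string fileName);
}

// Lives in the base module: does nothing, so the feature is skipped.
public sealed class NullBlobClassifier : IBlobClassifier
{
    public string? Classify(string fileName) => null;
}

// Lives in the .AI package: a real version would call an IChatClient here.
public sealed class AIBlobClassifier : IBlobClassifier
{
    public string? Classify(string fileName) =>
        fileName.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) ? "document" : "other";
}

public static class ClassifierRegistration
{
    // Base module: register the no-op default unless something is already registered.
    public static IServiceCollection AddBlobClassification(IServiceCollection services)
    {
        services.TryAddSingleton<IBlobClassifier, NullBlobClassifier>();
        return services;
    }

    // AI package: swap the default for the AI-backed implementation.
    public static IServiceCollection AddBlobClassificationAI(IServiceCollection services)
    {
        services.Replace(ServiceDescriptor.Singleton<IBlobClassifier, AIBlobClassifier>());
        return services;
    }
}
```

Consumers inject `IBlobClassifier` and never branch on whether AI is installed: a null result simply means "no enrichment".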

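GranitAIOptions

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| DefaultWorkspace | string | "default" | Workspace used when none is specified |
| EnableUsageTracking | bool | true | Track token consumption and cost |
| EnableAuditTrail | bool | true | Log every AI interaction (ISO 27001) |

AIQuotaOptions (section: AI:Quota)

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| MaxRequestsPerTenantPerHour | int | 0 | Per-tenant hourly quota; 0 = unlimited |
| ExceededBehavior | enum | Skip | Skip (graceful no-op) or Reject (429) |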

OpenAIProviderOptions (section: AI:OpenAI)

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| ApiKey | string | | Required. Inject from Vault |
| Endpoint | string? | null | Custom base URL (OpenAI-compatible proxies) |
| DefaultModel | string | "gpt-4o" | Fallback chat model |
| DefaultEmbeddingModel | string | "text-embedding-3-small" | Fallback embedding model |

AzureOpenAIProviderOptions (section: AI:AzureOpenAI)

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| Endpoint | string | | Required. HTTPS URI of the Azure resource |
| ApiKey | string? | null | Leave empty in production → Managed Identity |
| DefaultDeployment | string | "gpt-4o" | Fallback deployment name |
| DefaultEmbeddingDeployment | string | "text-embedding-3-small" | Fallback embedding deployment |
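
As an illustration, a production appsettings.json fragment for this section might look like the following. The endpoint URL is a placeholder for your own Azure resource, and ApiKey is left out on purpose so that Managed Identity is used.

```json
{
  "AI": {
    "AzureOpenAI": {
      "Endpoint": "https://my-resource.openai.azure.com/",
      "DefaultDeployment": "gpt-4o",
      "DefaultEmbeddingDeployment": "text-embedding-3-small"
    }
  }
}
```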

OllamaProviderOptions (section: AI:Ollama)

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| Endpoint | string | "http://localhost:11434" | Ollama server URL |
| DefaultModel | string | "llama3.1" | Fallback model |

DataExchangeAIOptions (section: DataExchange:AI)

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| WorkspaceName | string | "default" | AI workspace for mapping suggestions |
| TimeoutSeconds | int | 10 | LLM call timeout |
| MinConfidenceScore | double | 0.6 | Minimum score to accept a suggestion |
| IncludePreviewRows | bool | false | Include sample data rows in prompt (GDPR opt-in) |
| PreviewRowCount | int | 5 | Number of preview rows when enabled |

ExtractionOptions (section: AI:Extraction)

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| WorkspaceName | string | "default" | AI workspace for extraction |
| ReviewThreshold | double | 0.7 | Below this confidence → NeedsReview status |
| TimeoutSeconds | int | 30 | Extraction timeout (OCR + LLM) |

QueryEngineAIOptions (section: QueryEngine:AI)

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| WorkspaceName | string | "default" | AI workspace for NLQ translation |
| TimeoutSeconds | int | 5 | NLQ should be fast — short timeout |

VectorDataOptions (section: AI:VectorData)

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| EmbeddingWorkspace | string | "default" | Workspace for embedding generation |
| DefaultSearchLimit | int | 10 | Default number of results |

  • Data minimization — workspaces control what reaches the LLM via system prompts; no business data is sent without an explicit IncludePreviewRows opt-in.
  • Right to erasure — workspace and usage records support soft delete; vector embeddings are deletable per tenant.
  • Data residency — Azure OpenAI supports EU-only deployments; Ollama keeps every byte on-premise.
  • PII protection — usage records store identifiers (tenant, user, workspace), never conversation content.
  • Audit trail — every AI interaction is recorded with tenant, user, workspace, model, tokens, and timestamp.
  • Immutable records — AIUsageRecord is CreationAuditedEntity; no update/delete surface.
  • Audited mutations — workspace changes write AuditedEntity (CreatedAt/By, ModifiedAt/By).

When Granit.AI.EntityFrameworkCore is registered, two tables ship:

| Table | Entity | Purpose |
| --- | --- | --- |
| ai_workspaces | AIWorkspaceEntity | Dynamic workspace configurations (soft-deletable, audited) |
| ai_usage_records | AIUsageRecordEntity | Token consumption and cost tracking (immutable) |

Key indexes:

  • uq_ai_workspaces_tenant_name — unique workspace name per tenant (enforces the 409 Conflict on create at the DB level, not just at the manager level)
  • ix_ai_usage_records_tenant_workspace_date — billing queries by workspace and period
  • ix_ai_usage_records_tenant_provider_model — cost aggregation by provider/model

PromptBuilder — defense against prompt injection

All AI integration modules use PromptBuilder to separate system instructions from user-controlled data. This mitigates OWASP LLM01 (Prompt Injection) by wrapping user input in structured <data> delimiters with sanitization:

    var pb = new PromptBuilder(maxInputLength: 50_000);

    // Developer-controlled instructions (not sanitized)
    pb.AppendInstruction("Analyze the following text for PII.");
    pb.AppendInstruction("Return ONLY valid JSON.");

    // User-controlled data (sanitized + truncated + delimited)
    pb.AppendUserData("User ID", userId);
    pb.AppendUserTextBlock("Document", documentText);

    string prompt = pb.Build();

Output:

    Analyze the following text for PII.
    Return ONLY valid JSON.
    User ID: <data>user-123</data>
    Document
    <data>
    The actual document text goes here, sanitized...
    </data>

| Method | Use for |
| --- | --- |
| AppendInstruction(text) | System instructions (developer-controlled, not sanitized) |
| AppendUserData(label, value) | Short user values wrapped in <data> |
| AppendUserTextBlock(label, text) | Long text blocks wrapped in <data> |
| AppendUserDataJson(label, value) | JSON-encoded values (prevents structural breakout) |
| AppendUserDataMap(label, pairs) | Key-value maps in a <data> block |

  1. Delimiter isolation — user data in <data> blocks, instructions outside.
  2. Control-character stripping — Unicode control characters, zero-width chars, and bidirectional overrides are stripped to defeat obfuscation-based injection.
  3. Tag neutralization — XML/HTML-like tags in user input are stripped to prevent delimiter spoofing (<data>, <system>, <tool>, …).
  4. Length truncation — configurable maxInputLength prevents denial-of-wallet.
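
The four defenses can be sketched in a few lines of standard-library C#. This is an illustration of the technique, not the PromptBuilder source: `PromptSanitizerSketch` is a hypothetical helper, the order of the steps is an assumption, and the choice to keep newlines and tabs is mine.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the four defenses; not the PromptBuilder source.
public static class PromptSanitizerSketch
{
    private static readonly Regex Tag = new Regex("<[^<>]*>", RegexOptions.Compiled);

    public static string Sanitize(string input, int maxInputLength)
    {
        // Strip control, zero-width, and bidirectional characters,
        // keeping ordinary newlines and tabs.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c == '\n' || c == '\t') { sb.Append(c); continue; }
            if (char.IsControl(c) || IsInvisibleOrBidi(c)) continue;
            sb.Append(c);
        }

        // Remove tag-like sequences until none remain, so nested input such
        // as "<<data>data>" cannot reassemble into a tag after one pass.
        string text = sb.ToString();
        string previous;
        do
        {
            previous = text;
            text = Tag.Replace(text, string.Empty);
        } while (text != previous);

        // Truncate to the configured budget.
        return text.Length <= maxInputLength ? text : text.Substring(0, maxInputLength);
    }

    // Wrap the sanitized value in delimiters; instructions stay outside them.
    public static string Wrap(string label, string value, int maxInputLength) =>
        $"{label}: <data>{Sanitize(value, maxInputLength)}</data>";

    private static bool IsInvisibleOrBidi(char c) =>
        c == '\u200B' || c == '\u200C' || c == '\u200D' || c == '\u2060' || c == '\uFEFF' // zero-width
        || c == '\u200E' || c == '\u200F'                                                    // directional marks
        || (c >= '\u202A' && c <= '\u202E')                                                  // embeddings and overrides
        || (c >= '\u2066' && c <= '\u2069');                                                 // isolates
}
```

Stripping tags in a loop matters: a single pass over `<<data>data>` leaves a fresh `<data>` behind, which is exactly the delimiter an attacker wants to spoof.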