AI Endpoints — Workspaces, Models, Usage, Inference

Building an admin UI on top of IChatClient means rebuilding the same five pieces in every project: a workspace CRUD form, a “which provider/model can I pick?” dropdown, a usage dashboard with filters and aggregates, a chat tester, and an embedding playground. Granit.AI.Endpoints ships all five — typed, permission-scoped, OpenAPI-described — so the admin frontend is a thin layer over a documented contract.

| You need | This package gives you |
|---|---|
| Admin UI for AI configuration | GET/POST/PUT/DELETE /ai/workspaces |
| Provider/model dropdown in your UI | GET /ai/providers, GET /ai/providers/{name}/models (live catalog) |
| Cost & usage dashboard | GET /ai/usage via Granit.QueryEngine: filtering, sorting, paging, groupBy with aggregates |
| Chat sandbox | POST /ai/chat/{workspace} (+ SSE stream variant) |
| Embedding playground | POST /ai/embeddings/{workspace} |
| Locked-down by default | Permission-policy authorization on every route group |
| OpenAPI for code generation | Tagged groups, Produces<T>, ProblemDetails |
Package layout:
  • Granit.AI.Endpoints/
    • Endpoints/
      • AIWorkspaceEndpoints.cs (workspace CRUD: list, get, create, update, delete)
      • AIProviderEndpoints.cs (provider and model discovery)
      • AIChatEndpoints.cs (chat completion, sync + SSE streaming)
      • AIEmbeddingEndpoints.cs (embedding generation proxy)
    • Queries/
      • AIUsageRecordQueryDefinition.cs (usage tracking via Granit.QueryEngine)
    • Dtos/ (request/response records)
    • Validators/ (FluentValidation, auto-applied via MapGranitGroup)
    • Permissions/ (AIPermissions constants)
    • Options/ (AIEndpointsOptions: route prefix, OpenAPI tags, limits)
    • Localization/ (18 cultures)
Program.cs
builder.AddGranitAI();                            // Core abstractions
builder.AddGranitAIEntityFrameworkCore(o => ...); // Persistence (workspaces + usage)
builder.AddGranitAIOpenAI();                      // Provider (or AzureOpenAI, Ollama…)

var app = builder.Build();

// Route registration with default options
app.MapGranitAI();

// Or, instead of the call above, with custom options
app.MapGranitAI(opts =>
{
    opts.RoutePrefix = "api/ai";
    opts.MaxChatMessages = 50;
    opts.MaxEmbeddingInputs = 25;
});

MapGranitAI() returns the parent RouteGroupBuilder, so you can chain endpoint conventions (CORS, rate-limit policies, additional metadata) across the entire AI surface.

Every route is gated by a Granit permission policy — not by a role string. Permissions are defined in AIPermissionDefinitionProvider and surface in the standard Granit.Authorization admin UI for grant management.

| Permission | Used by |
|---|---|
| AI.Workspaces.Read | GET /ai/providers, GET /ai/providers/{name}/models |
| AI.Workspaces.Manage | All /ai/workspaces CRUD endpoints |
| AI.Usage.Read | /ai/usage query endpoints |
| AI.Chat.Execute | POST /ai/chat/{workspace} (+ stream) |
| AI.Embeddings.Execute | POST /ai/embeddings/{workspace} |
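An admin frontend may want to hide panels the caller cannot use anyway. The TypeScript sketch below maps hypothetical admin panels to the permission names from the table; how your app obtains the caller's granted permissions is not covered on this page and is assumed here. Hiding a panel is only a UX nicety: the permission policies on the route groups remain the actual enforcement.

```typescript
// Permission names from the table above.
const AIPermissions = {
  WorkspacesRead: "AI.Workspaces.Read",
  WorkspacesManage: "AI.Workspaces.Manage",
  UsageRead: "AI.Usage.Read",
  ChatExecute: "AI.Chat.Execute",
  EmbeddingsExecute: "AI.Embeddings.Execute",
} as const;

// Hypothetical admin panels, one per route group.
type AdminPanel = "modelCatalog" | "workspaces" | "usage" | "chatSandbox" | "embeddingPlayground";

const requiredPermission: Record<AdminPanel, string> = {
  modelCatalog: AIPermissions.WorkspacesRead,
  workspaces: AIPermissions.WorkspacesManage,
  usage: AIPermissions.UsageRead,
  chatSandbox: AIPermissions.ChatExecute,
  embeddingPlayground: AIPermissions.EmbeddingsExecute,
};

// Returns the panels whose permission appears in the caller's granted list.
function visiblePanels(granted: readonly string[]): AdminPanel[] {
  const grantedSet = new Set(granted);
  return (Object.keys(requiredPermission) as AdminPanel[]).filter((panel) =>
    grantedSet.has(requiredPermission[panel]),
  );
}
```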

Two discovery endpoints make admin UIs possible: the frontend asks “which providers are wired in?” and “what can I pick inside each?”.

GET /ai/providers lists every IAIProviderFactory registered in DI, with a quick capability summary derived from the provider’s model catalog:

[
  { "name": "OpenAI", "supportsChat": true, "supportsEmbeddings": true },
  { "name": "AzureOpenAI", "supportsChat": true, "supportsEmbeddings": true },
  { "name": "Ollama", "supportsChat": true, "supportsEmbeddings": true }
]
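A minimal TypeScript sketch of how an admin frontend might turn this response into dropdown options for one feature. The AIProviderSummary type mirrors the sample JSON; providerOptions and its alphabetical sort are illustrative choices, not part of Granit.

```typescript
// One entry of the GET /ai/providers response, as in the sample above.
interface AIProviderSummary {
  name: string;
  supportsChat: boolean;
  supportsEmbeddings: boolean;
}

type Feature = "chat" | "embeddings";

// Hypothetical helper: provider names to offer for a feature, sorted alphabetically.
function providerOptions(providers: AIProviderSummary[], feature: Feature): string[] {
  return providers
    .filter((p) => (feature === "chat" ? p.supportsChat : p.supportsEmbeddings))
    .map((p) => p.name)
    .sort((a, b) => a.localeCompare(b));
}
```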

GET /ai/providers/{name}/models returns the live model catalog for a single provider:

  • OpenAI: GET /v1/models, enriched with known capabilities (gpt-4o → vision + tool use; text-embedding-3-* → embeddings only).
  • Ollama: GET /api/tags (30 s cache), then /api/show per model to map completion, embedding, vision, and tools into typed AIModelCapabilities.
  • Azure OpenAI: the deployments configured for the resource.
[
  {
    "id": "gpt-4o",
    "displayName": "GPT-4o",
    "maxContextTokens": 128000,
    "capabilities": {
      "chat": true,
      "embeddings": false,
      "vision": true,
      "imageGeneration": false,
      "audio": false,
      "toolUse": true,
      "streaming": true,
      "structuredOutput": true,
      "extensions": []
    }
  }
]
| HTTP | When |
|---|---|
| 200 OK | Catalog returned |
| 404 Not Found | No factory registered with that name |
| 502 Bad Gateway | Provider does not implement IAIModelCatalog, or the remote catalog call failed |

This is what powers the frontend “model dropdown” — it knows in advance whether to hide the “Upload image” toggle (because vision: false) or to disable streaming (because the underlying model doesn’t support it).
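The following TypeScript sketch shows that idea for a chat sandbox. The AIModelInfo and AIModelCapabilities types are transcribed from the sample response above (extensions is typed loosely because its element shape isn't shown); ChatSandboxControls and controlsFor are hypothetical names for your own UI state.

```typescript
// Transcribed from the sample model catalog response.
interface AIModelCapabilities {
  chat: boolean;
  embeddings: boolean;
  vision: boolean;
  imageGeneration: boolean;
  audio: boolean;
  toolUse: boolean;
  streaming: boolean;
  structuredOutput: boolean;
  extensions: unknown[]; // element shape not shown in the sample
}

interface AIModelInfo {
  id: string;
  displayName: string;
  maxContextTokens?: number;
  capabilities: AIModelCapabilities;
}

// Hypothetical UI state for a chat sandbox.
interface ChatSandboxControls {
  selectable: boolean;      // list the model in the chat dropdown at all
  showImageUpload: boolean; // the "Upload image" toggle
  allowStreaming: boolean;  // offer the SSE stream route
  showToolsPanel: boolean;
}

function controlsFor(model: AIModelInfo): ChatSandboxControls {
  const c = model.capabilities;
  return {
    selectable: c.chat, // embedding-only models never appear in a chat dropdown
    showImageUpload: c.chat && c.vision,
    allowStreaming: c.chat && c.streaming,
    showToolsPanel: c.chat && c.toolUse,
  };
}
```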

Workspace CRUD:

| Method | Route | Behavior |
|---|---|---|
| GET | /ai/workspaces | List all workspaces (system + dynamic) with resolved Capabilities |
| GET | /ai/workspaces/{name} | Get a workspace by name |
| POST | /ai/workspaces | Create a dynamic workspace |
| PUT | /ai/workspaces/{name} | Update a dynamic workspace |
| DELETE | /ai/workspaces/{name} | Delete a dynamic workspace |
// POST /ai/workspaces
{
  "name": "support-chat",
  "provider": "OpenAI",
  "model": "gpt-4o",
  "systemPrompt": "You are a helpful customer support assistant.",
  "temperature": 0.7,
  "maxOutputTokens": 1024
}

// 201 Created
{
  "name": "support-chat",
  "provider": "OpenAI",
  "model": "gpt-4o",
  "systemPrompt": "You are a helpful customer support assistant.",
  "temperature": 0.7,
  "maxOutputTokens": 1024,
  "kind": "Dynamic",
  "activated": true,
  "capabilities": {
    "chat": true, "vision": true, "toolUse": true, "streaming": true,
    "structuredOutput": true, "embeddings": false, "audio": false,
    "imageGeneration": false, "extensions": []
  }
}
| Failure | HTTP | Detail |
|---|---|---|
| Duplicate workspace name | 409 Conflict | Caught at the DB level, so it survives concurrent creates |
| System workspace modify/delete | 422 Unprocessable Entity | System workspaces are immutable |
| Workspace not found | 404 Not Found | |

The v1 contract used isActive. Since Granit 0.32, the field is named activated in both the response and the update request (consistent with IActive.Activated across the framework). If you generated clients before 0.32, plan a frontend migration.
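During a migration window, a client may talk to servers on both sides of 0.32. A small shim like the TypeScript sketch below (all names hypothetical) reads whichever flag is present and prefers the new one; update requests should send activated only.

```typescript
// A workspace payload from either side of the 0.32 rename.
interface WorkspaceWire {
  name: string;
  activated?: boolean; // Granit 0.32+
  isActive?: boolean;  // v1 contract (before 0.32)
}

// Hypothetical migration shim: prefer the new field, fall back to the legacy one.
function readActivated(ws: WorkspaceWire): boolean {
  if (typeof ws.activated === "boolean") return ws.activated;
  if (typeof ws.isActive === "boolean") return ws.isActive;
  throw new Error(`Workspace '${ws.name}' has neither 'activated' nor 'isActive'.`);
}
```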

PUT /ai/workspaces/{name} replaces provider, model, system prompt, temperature, maxOutputTokens, and the activated flag. Deactivating ("activated": false) keeps the workspace visible in the admin API but blocks execution — chat and embedding calls return 422 Unprocessable Entity (AIWorkspaceNotActiveException) until it’s reactivated.
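Because PUT replaces the whole set of fields, toggling activation from the UI means resending the other values unchanged. The TypeScript sketch below assumes the optional fields may be null or absent and that system workspaces report a kind other than "Dynamic"; both are assumptions drawn from the sample payloads, and toggleActivation is a hypothetical helper.

```typescript
// Body of PUT /ai/workspaces/{name}: everything except the name.
interface AIWorkspaceUpdateRequest {
  provider: string;
  model: string;
  systemPrompt?: string | null;
  temperature?: number | null;
  maxOutputTokens?: number | null;
  activated: boolean;
}

// The parts of the workspace response this helper reads.
interface AIWorkspaceResponse extends AIWorkspaceUpdateRequest {
  name: string;
  kind: string; // "Dynamic" in the sample; system workspaces cannot be updated
}

// Hypothetical helper: PUT replaces all fields, so resend the current values
// and change only the activation flag.
function toggleActivation(ws: AIWorkspaceResponse, activated: boolean): AIWorkspaceUpdateRequest {
  if (ws.kind !== "Dynamic") {
    throw new Error(`Workspace '${ws.name}' is not dynamic; the API would answer 422.`);
  }
  return {
    provider: ws.provider,
    model: ws.model,
    systemPrompt: ws.systemPrompt,
    temperature: ws.temperature,
    maxOutputTokens: ws.maxOutputTokens,
    activated,
  };
}
```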

Usage tracking — powered by Granit.QueryEngine

Instead of custom listing endpoints, usage records are exposed through MapGranitQuery<AIUsageRecord> — filtering, sorting, paging, groupBy with aggregates, saved views, and metadata, all from one query definition.

| Route | Purpose |
|---|---|
| GET /ai/usage | Paginated query with filters |
| GET /ai/usage/meta | Query metadata for the frontend (fields, operators, defaults) |
| GET /ai/usage/saved-views | Caller’s saved views |

Query definition schema:

| Feature | Fields |
|---|---|
| Filterable | WorkspaceName, Provider, Model, Timestamp, ConversationId |
| Sortable | Timestamp (default -timestamp), InputTokens, OutputTokens, EstimatedCost |
| GlobalSearch | WorkspaceName, Provider, Model |
| GroupBy | WorkspaceName, Provider, Model, ConversationId |
| Aggregates | Sum(InputTokens), Sum(OutputTokens), Sum(EstimatedCost) |
| Date filter | Timestamp (default ThisMonth) |

Cost summary by provider:

GET /ai/usage?groupBy=provider&dateRange=last30days

returns one row per provider with summed tokens and estimated cost — exactly what a billing dashboard needs, in a single request.
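A TypeScript sketch of building such URLs while respecting a customized RoutePrefix. Only the groupBy, dateRange, and filter parameter names come from the examples on this page; the camelCase group-by values and the usageUrl helper are assumptions.

```typescript
// Group-by fields from the query definition, assumed to be camelCase on the wire
// (as in groupBy=provider and groupBy=conversationId).
type UsageGroupBy = "workspaceName" | "provider" | "model" | "conversationId";

interface UsageQuery {
  groupBy?: UsageGroupBy;
  dateRange?: string; // e.g. "last30days"
  filter?: string;    // e.g. "conversationId eq <guid>"
}

// Hypothetical helper: builds a GET {prefix}/usage URL, where routePrefix
// matches AIEndpointsOptions.RoutePrefix ("ai" by default).
function usageUrl(baseUrl: string, routePrefix: string, q: UsageQuery): string {
  const prefix = routePrefix.replace(/^\/+|\/+$/g, "");
  const path = `${baseUrl.replace(/\/+$/, "")}/${prefix}/usage`;
  const params: string[] = [];
  for (const key of ["groupBy", "dateRange", "filter"] as const) {
    const value = q[key];
    if (value) params.push(`${key}=${encodeURIComponent(value)}`);
  }
  return params.length > 0 ? `${path}?${params.join("&")}` : path;
}
```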

Cost per agentic-chat conversation:

GET /ai/usage?filter=conversationId eq {id}&groupBy=conversationId

ConversationId (Guid?, null for non-chat calls) is the relational path for per-conversation billing and audit. It is intentionally not an OpenTelemetry metric tag — its cardinality is unbounded — so this query, not a metric dimension, is how you slice cost by conversation. The column lives on ai_usage_records; the consuming app owns the EF migration that adds it. See Usage tracking.
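The filter expression contains spaces, so it has to be URL-encoded when sent. A TypeScript sketch (hypothetical helper name) that also rejects values that are not GUIDs before issuing the request:

```typescript
const GUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: cost for one agentic-chat conversation, grouped by conversationId.
// usagePath is the usage route, e.g. "/ai/usage" with the default RoutePrefix.
function conversationCostUrl(usagePath: string, conversationId: string): string {
  if (!GUID_PATTERN.test(conversationId)) {
    throw new Error(`Not a GUID: ${conversationId}`);
  }
  // encodeURIComponent turns the spaces in the filter expression into %20.
  const filter = encodeURIComponent(`conversationId eq ${conversationId}`);
  return `${usagePath}?filter=${filter}&groupBy=conversationId`;
}
```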

Chat endpoints:

| Method | Route | Description |
|---|---|---|
| POST | /ai/chat/{workspaceName} | Synchronous chat completion |
| POST | /ai/chat/{workspaceName}/stream | SSE streaming completion |
POST /ai/chat/support-chat
Content-Type: application/json

{
  "messages": [
    { "role": "user", "content": "Where can I see my invoices?" }
  ]
}

// 200 OK
{
  "workspaceName": "support-chat",
  "model": "gpt-4o",
  "content": "Open the Billing section in the side menu…",
  "usage": {
    "inputTokens": 28,
    "outputTokens": 64,
    "estimatedCost": null,
    "costCurrency": null
  },
  "duration": "00:00:01.234"
}
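The duration field in the sample looks like a .NET TimeSpan in constant ("c") format; System.Text.Json writes TimeSpan values that way, typically with seven fractional digits (for example 00:00:01.2340000). Assuming the response uses that format, a frontend could convert it to milliseconds with a TypeScript helper like this hypothetical one:

```typescript
// Parses a .NET TimeSpan in constant ("c") format, [-][d.]hh:mm:ss[.fffffff],
// into milliseconds. Returns null when the string is not in that format.
function timeSpanToMs(value: string): number | null {
  const m = /^(-)?(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})(?:\.(\d{1,7}))?$/.exec(value);
  if (m === null) return null;
  const [, sign, days = "0", hours, minutes, seconds, fraction = ""] = m;
  const totalSeconds =
    ((Number(days) * 24 + Number(hours)) * 60 + Number(minutes)) * 60 + Number(seconds);
  // The fraction counts 100 ns ticks (up to 7 digits); 10000 ticks make one millisecond.
  const fractionMs = Number(fraction.padEnd(7, "0")) / 10000;
  const ms = totalSeconds * 1000 + fractionMs;
  return sign ? -ms : ms;
}
```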

For streaming responses, usage is recorded after the final chunk: Microsoft.Extensions.AI emits a final update carrying UsageContent, which is recorded via IAIUsageRecordFactory and persisted by the usage tracker. ISO 27001 audit and cost monitoring therefore keep working for streaming traffic too.
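On the client, note that the stream route is a POST, while the browser EventSource API only issues GET requests, so a frontend typically reads the response body with fetch and parses the text/event-stream framing itself. The TypeScript sketch below implements the basic SSE framing rules (data lines, event names, comments, blank-line dispatch) and returns each event's raw data string; the payload shape of the chunks Granit sends is not described on this page, so decoding it is left to you. It handles LF and CRLF line endings but not a lone CR, and SseParser is a hypothetical name.

```typescript
// One dispatched server-sent event.
interface SseEvent {
  event: string; // "message" unless an "event:" field was sent
  data: string;  // all "data:" lines of the event, joined with "\n"
}

// Minimal incremental parser for text/event-stream framing.
class SseParser {
  private buffer = "";
  private dataLines: string[] = [];
  private eventType = "";

  // Feed one decoded text chunk; returns the events it completed.
  push(chunk: string): SseEvent[] {
    this.buffer += chunk;
    const events: SseEvent[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline >= 0) {
      let line = this.buffer.slice(0, newline);
      this.buffer = this.buffer.slice(newline + 1);
      if (line.endsWith("\r")) line = line.slice(0, -1); // tolerate CRLF
      this.processLine(line, events);
      newline = this.buffer.indexOf("\n");
    }
    return events; // an incomplete trailing line stays buffered for the next chunk
  }

  private processLine(line: string, events: SseEvent[]): void {
    if (line === "") {
      // A blank line dispatches the pending event, if it carried any data.
      if (this.dataLines.length > 0) {
        events.push({ event: this.eventType || "message", data: this.dataLines.join("\n") });
      }
      this.dataLines = [];
      this.eventType = "";
      return;
    }
    if (line.startsWith(":")) return; // comment, often used as a keep-alive
    const colon = line.indexOf(":");
    const field = colon < 0 ? line : line.slice(0, colon);
    let value = colon < 0 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1);
    if (field === "data") this.dataLines.push(value);
    else if (field === "event") this.eventType = value;
    // "id" and "retry" fields are ignored in this sketch.
  }
}
```

In a browser, chunks would typically come from response.body.pipeThrough(new TextDecoderStream()), each passed to push.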

Embedding endpoint:

| Method | Route | Description |
|---|---|---|
| POST | /ai/embeddings/{workspaceName} | Generate embeddings for one or more inputs |
// In this example, the workspace itself is named "embeddings"
POST /ai/embeddings/embeddings
Content-Type: application/json

{ "inputs": ["Hello world", "Granit framework"] }

// 200 OK
{
  "workspaceName": "embeddings",
  "model": "text-embedding-3-small",
  "embeddings": [
    { "index": 0, "vector": [0.0123, -0.0456, ...] },
    { "index": 1, "vector": [0.0789, -0.0012, ...] }
  ],
  "usage": { "inputTokens": 8 }
}
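In an embedding playground, a common next step is comparing inputs by cosine similarity. The TypeScript sketch below pairs each vector with its input through the index field instead of trusting array order, then compares two vectors; alignByIndex and cosineSimilarity are illustrative helpers, not Granit APIs.

```typescript
// One entry of the "embeddings" array in the response above.
interface EmbeddingItem {
  index: number;
  vector: number[];
}

// Puts each vector at the position of the input it was generated for.
function alignByIndex(inputCount: number, items: EmbeddingItem[]): number[][] {
  const out: number[][] = new Array(inputCount);
  for (const item of items) {
    if (item.index < 0 || item.index >= inputCount) {
      throw new Error(`Unexpected index ${item.index}`);
    }
    out[item.index] = item.vector;
  }
  return out;
}

// Cosine similarity in [-1, 1]; returns 0 when either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```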

All errors follow RFC 7807 (ProblemDetails):

| Scenario | HTTP | Maps to |
|---|---|---|
| Validation failure | 400 | Auto via FluentValidationAutoEndpointFilter |
| Workspace not found | 404 | AIWorkspaceNotFoundException |
| Duplicate workspace name | 409 | DB unique-constraint violation, caught in EfAIWorkspaceStore |
| System workspace modify/delete | 422 | Immutable-kind guard |
| Workspace deactivated | 422 | AIWorkspaceNotActiveException |
| Quota exceeded (Reject mode) | 429 | AIQuotaOptions.ExceededBehavior = Reject |
| Provider unavailable | 502 | Network/timeout to OpenAI/Azure/Ollama |
| Streaming provider unavailable | 503 | Service-down during a stream attempt |
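A TypeScript sketch of turning these responses into UI decisions. The ProblemDetails members are the ones defined by RFC 7807; the ErrorAction categories and the toUiError helper are hypothetical, and the body is parsed defensively because a gateway or proxy in front of the API may return non-JSON errors.

```typescript
// RFC 7807 members; all are optional on the wire.
interface ProblemDetails {
  type?: string;
  title?: string;
  status?: number;
  detail?: string;
  instance?: string;
}

type ErrorAction =
  | "fix-input"      // 400 validation failure
  | "not-found"      // 404
  | "conflict"       // 409 duplicate workspace name
  | "blocked"        // 422 system workspace or deactivated workspace
  | "quota-exceeded" // 429 in Reject mode
  | "retry-later"    // 502 / 503 provider unavailable
  | "unknown";

function classify(status: number): ErrorAction {
  switch (status) {
    case 400: return "fix-input";
    case 404: return "not-found";
    case 409: return "conflict";
    case 422: return "blocked";
    case 429: return "quota-exceeded";
    case 502:
    case 503: return "retry-later";
    default: return "unknown";
  }
}

// Hypothetical helper: combine the HTTP status with whatever problem body was returned.
function toUiError(httpStatus: number, body: string): { action: ErrorAction; message: string } {
  let problem: ProblemDetails = {};
  try {
    const parsed: unknown = JSON.parse(body);
    if (parsed !== null && typeof parsed === "object") problem = parsed as ProblemDetails;
  } catch {
    // Not JSON (for example an HTML error page from a proxy): keep the empty problem.
  }
  return {
    action: classify(httpStatus),
    message: problem.detail ?? problem.title ?? `HTTP ${httpStatus}`,
  };
}
```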

All request DTOs are auto-validated via MapGranitGroup:

| DTO | Rules |
|---|---|
| AIWorkspaceCreateRequest | Name: ^[a-z0-9][a-z0-9-]*$, max 128; Provider/Model: required; Temperature: 0–2; MaxOutputTokens > 0 |
| AIWorkspaceUpdateRequest | Same as create (without Name); Activated: required |
| AIChatRequest | Messages: required, max 100; Role: user/assistant/system; Content: max 128 KB |
| AIEmbeddingRequest | Inputs: required, max 50; each max 32 KB |

Limits on chat messages and embedding inputs are overridable via AIEndpointsOptions.
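Mirroring the create rules in the admin form gives users immediate feedback, while the server-side validator stays authoritative. The TypeScript sketch below transcribes the AIWorkspaceCreateRequest rules from the table and assumes systemPrompt, temperature, and maxOutputTokens are optional; validateCreate is a hypothetical name. Chat and embedding limits are deliberately not hard-coded, since AIEndpointsOptions can override them per deployment.

```typescript
interface AIWorkspaceCreateRequest {
  name: string;
  provider: string;
  model: string;
  systemPrompt?: string;
  temperature?: number;
  maxOutputTokens?: number;
}

const NAME_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

// Returns human-readable errors; an empty array means the request passes these checks.
function validateCreate(req: AIWorkspaceCreateRequest): string[] {
  const errors: string[] = [];
  if (!NAME_PATTERN.test(req.name)) {
    errors.push("name: lowercase letters, digits and '-', not starting with '-'");
  }
  if (req.name.length > 128) errors.push("name: at most 128 characters");
  if (req.provider.trim() === "") errors.push("provider: required");
  if (req.model.trim() === "") errors.push("model: required");
  if (req.temperature !== undefined && (req.temperature < 0 || req.temperature > 2)) {
    errors.push("temperature: must be between 0 and 2");
  }
  if (req.maxOutputTokens !== undefined && !(req.maxOutputTokens > 0)) {
    errors.push("maxOutputTokens: must be greater than 0");
  }
  return errors;
}
```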

{
  "AI:Endpoints": {
    "RoutePrefix": "ai",
    "MaxChatMessages": 100,
    "MaxEmbeddingInputs": 50,
    "ProvidersTagName": "AI - Providers",
    "WorkspacesTagName": "AI - Workspaces",
    "UsageTagName": "AI - Usage",
    "InferenceTagName": "AI - Inference"
  }
}

OpenAPI tags split the surface into four logical groups — your generated client can emit AIProvidersApi, AIWorkspacesApi, AIUsageApi, AIInferenceApi instead of one monolithic class.