AI Endpoints — Workspaces, Models, Usage, Inference
Building an admin UI on top of IChatClient means rebuilding the same five pieces in
every project: a workspace CRUD form, a “which provider/model can I pick?” dropdown, a
usage dashboard with filters and aggregates, a chat tester, and an embedding playground.
Granit.AI.Endpoints ships all five — typed, permission-scoped, OpenAPI-described — so
the admin frontend is a thin layer over a documented contract.
| You need | This package gives you |
|---|---|
| Admin UI for AI configuration | GET/POST/PUT/DELETE /ai/workspaces |
| Provider/model dropdown in your UI | GET /ai/providers, GET /ai/providers/{name}/models (live catalog) |
| Cost & usage dashboard | GET /ai/usage via Granit.QueryEngine — filtering, sorting, paging, groupBy with aggregates |
| Chat sandbox | POST /ai/chat/{workspace} (+ SSE stream variant) |
| Embedding playground | POST /ai/embeddings/{workspace} |
| Locked-down by default | Permission-policy authorization on every route group |
| OpenAPI for code generation | Tagged groups, Produces<T>, ProblemDetails |
Package structure
- Granit.AI.Endpoints/
  - Endpoints/
    - AIWorkspaceEndpoints.cs: Workspace CRUD (list, get, create, update, delete)
    - AIProviderEndpoints.cs: Provider + model discovery
    - AIChatEndpoints.cs: Chat completion (sync + SSE streaming)
    - AIEmbeddingEndpoints.cs: Embedding generation proxy
  - Queries/
    - AIUsageRecordQueryDefinition.cs: Usage tracking via Granit.QueryEngine
  - Dtos/: Request/response records
    - …
  - Validators/: FluentValidation (auto-applied via MapGranitGroup)
    - …
  - Permissions/: AIPermissions constants
    - …
  - Options/: AIEndpointsOptions (route prefix, OpenAPI tags, limits)
    - …
  - Localization/: 18 cultures
    - …
Installation

    builder.AddGranitAI();                             // Core abstractions
    builder.AddGranitAIEntityFrameworkCore(o => ...);  // Persistence (workspaces + usage)
    builder.AddGranitAIOpenAI();                       // Provider (or AzureOpenAI, Ollama…)

    // Route registration
    app.MapGranitAI();

    // With custom options:
    app.MapGranitAI(opts =>
    {
        opts.RoutePrefix = "api/ai";
        opts.MaxChatMessages = 50;
        opts.MaxEmbeddingInputs = 25;
    });

MapGranitAI() returns the parent RouteGroupBuilder so you can chain framework
filters (CORS, rate-limit policies, additional metadata) on the entire AI surface.
Authorization model
Every route is gated by a Granit permission policy, not by a role string. Permissions
are defined in AIPermissionDefinitionProvider and surface in the standard
Granit.Authorization admin UI for grant management.
| Permission | Used by |
|---|---|
| AI.Workspaces.Read | GET /ai/providers, GET /ai/providers/{name}/models |
| AI.Workspaces.Manage | All /ai/workspaces CRUD endpoints |
| AI.Usage.Read | /ai/usage query endpoints |
| AI.Chat.Execute | POST /ai/chat/{workspace} (+ stream) |
| AI.Embeddings.Execute | POST /ai/embeddings/{workspace} |
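
A frontend can mirror the permission table to decide which admin sections to render. The following is a minimal sketch, not part of the package: `AiPermission`, `SECTION_PERMISSIONS`, and `visibleSections` are hypothetical names, and how the caller's granted permissions reach the client is assumed rather than defined here.

```typescript
// Permission names copied from the table above.
type AiPermission =
  | "AI.Workspaces.Read"
  | "AI.Workspaces.Manage"
  | "AI.Usage.Read"
  | "AI.Chat.Execute"
  | "AI.Embeddings.Execute";

// Hypothetical UI section names, each mapped to the permission its routes require.
const SECTION_PERMISSIONS: Record<string, AiPermission> = {
  providerCatalog: "AI.Workspaces.Read",
  workspaceAdmin: "AI.Workspaces.Manage",
  usageDashboard: "AI.Usage.Read",
  chatSandbox: "AI.Chat.Execute",
  embeddingPlayground: "AI.Embeddings.Execute",
};

// Returns the sections a caller may see, given the permissions granted to them.
function visibleSections(granted: readonly string[]): string[] {
  const grantedSet = new Set<string>(granted);
  return Object.keys(SECTION_PERMISSIONS).filter((section) =>
    grantedSet.has(SECTION_PERMISSIONS[section])
  );
}
```

Hiding a section is only a usability measure; the server-side permission policy remains the actual gate.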
Provider & model discovery
These two endpoints make admin UIs possible. The frontend asks “which providers are wired in?” and “what can I pick inside each?”
GET /ai/providers
Lists every IAIProviderFactory registered in DI, with a quick capability summary
derived from the provider’s model catalog.

    [
      { "name": "OpenAI", "supportsChat": true, "supportsEmbeddings": true },
      { "name": "AzureOpenAI", "supportsChat": true, "supportsEmbeddings": true },
      { "name": "Ollama", "supportsChat": true, "supportsEmbeddings": true }
    ]

GET /ai/providers/{providerName}/models
Returns the live model catalog for a single provider:
- OpenAI: GET /v1/models, enriched with known capabilities (gpt-4o → vision + tool use; text-embedding-3-* → embeddings only).
- Ollama: GET /api/tags (30 s cache), then /api/show per model to map completion, embedding, vision, tools into typed AIModelCapabilities.
- Azure OpenAI: the deployments configured for the resource.

Example response:

    [
      {
        "id": "gpt-4o",
        "displayName": "GPT-4o",
        "maxContextTokens": 128000,
        "capabilities": {
          "chat": true,
          "embeddings": false,
          "vision": true,
          "imageGeneration": false,
          "audio": false,
          "toolUse": true,
          "streaming": true,
          "structuredOutput": true,
          "extensions": []
        }
      }
    ]

| HTTP | When |
|---|---|
| 200 OK | Catalog returned |
| 404 Not Found | No factory registered with that name |
| 502 Bad Gateway | Provider does not implement IAIModelCatalog, or the remote catalog call failed |
This is what powers the frontend “model dropdown” — it knows in advance whether to
hide the “Upload image” toggle (because vision: false) or to disable streaming
(because the underlying model doesn’t support it).
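
As an illustration of that dropdown logic, the sketch below derives UI toggles from a model's capabilities object. The `AIModelCapabilities` fields follow the JSON shown in this section; `ChatUiToggles` and `togglesFor` are hypothetical frontend names, and the element type of `extensions` is not shown in the sample, so it is left as `unknown`.

```typescript
// Shape of the "capabilities" object in the catalog response.
interface AIModelCapabilities {
  chat: boolean;
  embeddings: boolean;
  vision: boolean;
  imageGeneration: boolean;
  audio: boolean;
  toolUse: boolean;
  streaming: boolean;
  structuredOutput: boolean;
  extensions: unknown[]; // element type not shown in the sample response
}

// Hypothetical UI state; the field names are illustrative only.
interface ChatUiToggles {
  showImageUpload: boolean;
  allowStreaming: boolean;
  selectableForChat: boolean;
}

// Maps capabilities to toggles: no vision hides image upload, no streaming disables it.
function togglesFor(capabilities: AIModelCapabilities): ChatUiToggles {
  return {
    showImageUpload: capabilities.vision,
    allowStreaming: capabilities.chat && capabilities.streaming,
    selectableForChat: capabilities.chat,
  };
}
```

Passing the `capabilities` object of the selected model to `togglesFor` gives the form everything it needs before the first chat request is sent.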
Workspace management
| Method | Route | Behavior |
|---|---|---|
| GET | /ai/workspaces | List all workspaces (system + dynamic) with resolved Capabilities |
| GET | /ai/workspaces/{name} | Get a workspace by name |
| POST | /ai/workspaces | Create a dynamic workspace |
| PUT | /ai/workspaces/{name} | Update a dynamic workspace |
| DELETE | /ai/workspaces/{name} | Delete a dynamic workspace |
Create

    // POST /ai/workspaces
    {
      "name": "support-chat",
      "provider": "OpenAI",
      "model": "gpt-4o",
      "systemPrompt": "You are a helpful customer support assistant.",
      "temperature": 0.7,
      "maxOutputTokens": 1024
    }

    // 201 Created
    {
      "name": "support-chat",
      "provider": "OpenAI",
      "model": "gpt-4o",
      "systemPrompt": "You are a helpful customer support assistant.",
      "temperature": 0.7,
      "maxOutputTokens": 1024,
      "kind": "Dynamic",
      "activated": true,
      "capabilities": {
        "chat": true,
        "vision": true,
        "toolUse": true,
        "streaming": true,
        "structuredOutput": true,
        "embeddings": false,
        "audio": false,
        "imageGeneration": false,
        "extensions": []
      }
    }

| Failure | HTTP | Detail |
|---|---|---|
| Duplicate workspace name | 409 Conflict | Caught at the DB level — survives concurrent creates |
| System workspace modify/delete | 422 Unprocessable Entity | System workspaces are immutable |
| Workspace not found | 404 Not Found | — |
The previous v1 contract used isActive. From Granit 0.32, the field is activated
in both the response and the update request (consistent with IActive.Activated across
the framework). Plan a frontend migration if you generated clients before 0.32.
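
During such a migration, a small shim can keep older frontend code working while clients are regenerated. This is a sketch under assumptions: `WorkspaceLike` and `migrateActivatedFlag` are hypothetical names, and the only thing taken from the contract is the rename from `isActive` to `activated`.

```typescript
// Minimal stand-in for a workspace payload; all other fields pass through untouched.
type WorkspaceLike = Record<string, unknown>;

// Renames a legacy `isActive` flag to `activated`, preferring `activated` when both exist.
function migrateActivatedFlag(workspace: WorkspaceLike): WorkspaceLike {
  const { isActive, ...rest } = workspace;
  if (rest["activated"] === undefined && typeof isActive === "boolean") {
    return { ...rest, activated: isActive };
  }
  return rest;
}
```

Applying the shim to payloads built by pre-0.32 code before sending an update means the request always carries `activated` and never the legacy field.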
Update
PUT /ai/workspaces/{name} replaces provider, model, system prompt, temperature,
maxOutputTokens, and the activated flag. Deactivating ("activated": false) keeps
the workspace visible in the admin API but blocks execution — chat and embedding calls
return 422 Unprocessable Entity (AIWorkspaceNotActiveException) until it’s
reactivated.
Usage tracking — powered by Granit.QueryEngine
Instead of custom listing endpoints, usage records are exposed through
MapGranitQuery<AIUsageRecord> — filtering, sorting, paging, groupBy with aggregates,
saved views, and metadata, all from one query definition.
| Route | Purpose |
|---|---|
| GET /ai/usage | Paginated query with filters |
| GET /ai/usage/meta | Query metadata for the frontend (fields, operators, defaults) |
| GET /ai/usage/saved-views | Caller’s saved views |
Query definition schema:
| Feature | Fields |
|---|---|
| Filterable | WorkspaceName, Provider, Model, Timestamp, ConversationId |
| Sortable | Timestamp (default -timestamp), InputTokens, OutputTokens, EstimatedCost |
| GlobalSearch | WorkspaceName, Provider, Model |
| GroupBy | WorkspaceName, Provider, Model, ConversationId |
| Aggregates | Sum(InputTokens), Sum(OutputTokens), Sum(EstimatedCost) |
| Date filter | Timestamp (default ThisMonth) |
Cost summary by provider:

    GET /ai/usage?groupBy=provider&dateRange=last30days

This returns one row per provider with summed tokens and estimated cost: exactly what a
billing dashboard needs, in a single request.

Cost per agentic-chat conversation:

    GET /ai/usage?filter=conversationId eq {id}&groupBy=conversationId

ConversationId (Guid?, null for non-chat calls) is the relational path for
per-conversation billing and audit. It is intentionally not an OpenTelemetry metric
tag, because its cardinality is unbounded, so this query, not a metric dimension, is how
you slice cost by conversation. The column lives on ai_usage_records; the consuming app
owns the EF migration that adds it. See Usage tracking.
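
The two queries in this section can be assembled with a small URL builder. The sketch below only covers the parameters that appear in those examples (`groupBy`, `dateRange`, `filter`); `UsageQuery`, `buildUsageUrl`, and `conversationFilter` are hypothetical helper names, and the default route prefix `ai` is assumed.

```typescript
// Only the query parameters shown in the examples above are modelled here.
interface UsageQuery {
  filter?: string;
  groupBy?: string;
  dateRange?: string;
}

// Builds a percent-encoded /{prefix}/usage URL from the given parameters.
function buildUsageUrl(query: UsageQuery, routePrefix: string = "ai"): string {
  const pairs: string[] = [];
  if (query.filter !== undefined) pairs.push("filter=" + encodeURIComponent(query.filter));
  if (query.groupBy !== undefined) pairs.push("groupBy=" + encodeURIComponent(query.groupBy));
  if (query.dateRange !== undefined) pairs.push("dateRange=" + encodeURIComponent(query.dateRange));
  const path = "/" + routePrefix + "/usage";
  return pairs.length > 0 ? path + "?" + pairs.join("&") : path;
}

// Filter expression for one conversation, following the "conversationId eq {id}" form.
function conversationFilter(conversationId: string): string {
  return "conversationId eq " + conversationId;
}
```

For example, `buildUsageUrl({ groupBy: "provider", dateRange: "last30days" })` yields the provider cost summary URL, and combining `conversationFilter(id)` with `groupBy: "conversationId"` yields the per-conversation one with the spaces in the filter encoded.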
Chat completion proxy
| Method | Route | Description |
|---|---|---|
| POST | /ai/chat/{workspaceName} | Synchronous chat completion |
| POST | /ai/chat/{workspaceName}/stream | SSE streaming completion |

    POST /ai/chat/support-chat
    Content-Type: application/json

    {
      "messages": [
        { "role": "user", "content": "Where can I see my invoices?" }
      ]
    }

    // 200 OK
    {
      "workspaceName": "support-chat",
      "model": "gpt-4o",
      "content": "Open the Billing section in the side menu…",
      "usage": { "inputTokens": 28, "outputTokens": 64, "estimatedCost": null, "costCurrency": null },
      "duration": "00:00:01.234"
    }

    POST /ai/chat/support-chat/stream
    Content-Type: application/json

    { "messages": [{ "role": "user", "content": "Summarize Granit.AI in one sentence." }] }

    HTTP/1.1 200 OK
    Content-Type: text/event-stream

    data: {"content":"Granit.AI"}

    data: {"content":" gives you"}

    data: {"content":" provider-agnostic"}

    …

    event: usage
    data: {"inputTokens":12,"outputTokens":24}

    data: [DONE]

If the provider errors after the stream has started, a final event: error is
emitted instead of switching the HTTP status code (headers are already flushed):

    event: error
    data: {"error":"Model 'gpt-4o-mini' not found on provider 'OpenAI'."}

For streaming responses, usage tracking happens after the final chunk: Microsoft.Extensions.AI
emits a UsageContent final update, which is recorded via IAIUsageRecordFactory and
persisted by the usage tracker. ISO 27001 audit and cost monitoring keep working even
for streaming traffic.
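
On the client, the stream shown in this section reduces to four kinds of events: content chunks, a usage event, an error event, and the `[DONE]` sentinel. The parser below is a sketch that assumes the whole response body is already available as a string; `StreamEvent` and `parseChatStream` are hypothetical names, and the payload fields are taken only from the samples in this section.

```typescript
type StreamEvent =
  | { kind: "content"; text: string }
  | { kind: "usage"; inputTokens: number; outputTokens: number }
  | { kind: "error"; message: string }
  | { kind: "done" };

// Parses a complete SSE body; events are separated by blank lines.
function parseChatStream(body: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const block of body.split(/\r?\n\r?\n/)) {
    let eventName = "message"; // SSE default when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) eventName = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
    }
    if (dataLines.length === 0) continue; // blank or comment-only block
    const data = dataLines.join("\n");
    if (data === "[DONE]") {
      events.push({ kind: "done" });
      continue;
    }
    const payload = JSON.parse(data);
    if (eventName === "usage") {
      events.push({ kind: "usage", inputTokens: payload.inputTokens, outputTokens: payload.outputTokens });
    } else if (eventName === "error") {
      events.push({ kind: "error", message: payload.error });
    } else {
      events.push({ kind: "content", text: payload.content });
    }
  }
  return events;
}
```

A production client would read the response incrementally and buffer partial events across network chunks; the per-event handling stays the same. Because an `error` event arrives with a 200 status, the client has to check the event kind rather than the HTTP status once streaming has begun.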
Embedding generation proxy
| Method | Route | Description |
|---|---|---|
| POST | /ai/embeddings/{workspaceName} | Generate embeddings for one or more inputs |

    POST /ai/embeddings/embeddings
    Content-Type: application/json

    { "inputs": ["Hello world", "Granit framework"] }

    // 200 OK
    {
      "workspaceName": "embeddings",
      "model": "text-embedding-3-small",
      "embeddings": [
        { "index": 0, "vector": [0.0123, -0.0456, ...] },
        { "index": 1, "vector": [0.0789, -0.0012, ...] }
      ],
      "usage": { "inputTokens": 8 }
    }

Error handling
All errors follow RFC 7807 (ProblemDetails):
| Scenario | HTTP | Maps to |
|---|---|---|
| Validation failure | 400 | Auto via FluentValidationAutoEndpointFilter |
| Workspace not found | 404 | AIWorkspaceNotFoundException |
| Duplicate workspace name | 409 | DB unique-constraint violation, caught in EfAIWorkspaceStore |
| System workspace modify/delete | 422 | Immutable-kind guard |
| Workspace deactivated | 422 | AIWorkspaceNotActiveException |
| Quota exceeded (Reject mode) | 429 | AIQuotaOptions.ExceededBehavior = Reject |
| Provider unavailable | 502 | Network/timeout to OpenAI/Azure/Ollama |
| Streaming provider unavailable | 503 | Service-down during a stream attempt |
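
A client can turn the status codes in this table into a small set of UI reactions. The sketch below is illustrative: the `ProblemDetails` members are the standard RFC 7807 ones, while `ErrorAction` and `actionFor` are hypothetical names and the grouping of statuses into actions is a frontend design choice, not something the package prescribes.

```typescript
// Standard RFC 7807 members; all are optional on the wire.
interface ProblemDetails {
  type?: string;
  title?: string;
  status?: number;
  detail?: string;
  instance?: string;
}

// Hypothetical UI reactions, one per group of status codes from the table.
type ErrorAction =
  | "fix-input"    // 400: validation failure
  | "not-found"    // 404: workspace not found
  | "conflict"     // 409: duplicate workspace name
  | "blocked"      // 422: system workspace or deactivated workspace
  | "back-off"     // 429: quota exceeded
  | "retry-later"  // 502 / 503: provider unavailable
  | "unknown";

function actionFor(problem: ProblemDetails): ErrorAction {
  switch (problem.status) {
    case 400: return "fix-input";
    case 404: return "not-found";
    case 409: return "conflict";
    case 422: return "blocked";
    case 429: return "back-off";
    case 502:
    case 503: return "retry-later";
    default: return "unknown";
  }
}
```

Keeping the mapping in one function means a new status code only needs handling in one place.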
Validation
All request DTOs are auto-validated via MapGranitGroup:
| DTO | Rules |
|---|---|
| AIWorkspaceCreateRequest | Name: ^[a-z0-9][a-z0-9-]*$, max 128; Provider/Model: required; Temperature: 0–2; MaxOutputTokens > 0 |
| AIWorkspaceUpdateRequest | Same as create (without Name); Activated: required |
| AIChatRequest | Messages: required, max 100; Role: user/assistant/system; Content: max 128 KB |
| AIEmbeddingRequest | Inputs: required, max 50; each max 32 KB |
Limits on chat messages and embedding inputs are overridable via AIEndpointsOptions.
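
The workspace-create rules from the table can be mirrored client-side for faster form feedback. This is a sketch, not the server's validator: `WorkspaceCreateDraft` and `validateWorkspaceDraft` are hypothetical names, and treating `temperature` and `maxOutputTokens` as optional with an inclusive 0 to 2 range is an assumption.

```typescript
interface WorkspaceCreateDraft {
  name: string;
  provider: string;
  model: string;
  temperature?: number;      // assumed optional
  maxOutputTokens?: number;  // assumed optional
}

// Name pattern copied from the AIWorkspaceCreateRequest row.
const WORKSPACE_NAME_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

// Returns the names of the fields that break a rule; an empty array means the draft passes.
function validateWorkspaceDraft(draft: WorkspaceCreateDraft): string[] {
  const errors: string[] = [];
  if (!WORKSPACE_NAME_PATTERN.test(draft.name) || draft.name.length > 128) errors.push("name");
  if (draft.provider.trim() === "") errors.push("provider");
  if (draft.model.trim() === "") errors.push("model");
  if (draft.temperature !== undefined && (draft.temperature < 0 || draft.temperature > 2)) {
    errors.push("temperature");
  }
  if (draft.maxOutputTokens !== undefined && draft.maxOutputTokens <= 0) {
    errors.push("maxOutputTokens");
  }
  return errors;
}
```

The server remains authoritative: a draft that passes this check can still be rejected with 400, for instance when limits have been overridden.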
Configuration

    {
      "AI:Endpoints": {
        "RoutePrefix": "ai",
        "MaxChatMessages": 100,
        "MaxEmbeddingInputs": 50,
        "ProvidersTagName": "AI - Providers",
        "WorkspacesTagName": "AI - Workspaces",
        "UsageTagName": "AI - Usage",
        "InferenceTagName": "AI - Inference"
      }
    }

OpenAPI tags split the surface into four logical groups, so your generated client can
emit AIProvidersApi, AIWorkspacesApi, AIUsageApi, AIInferenceApi instead of one
monolithic class.
See also
- AI: Setup (providers, workspaces, quotas, audit trail)
- Authorization (permission grant administration)
- QueryEngine (the engine powering /ai/usage)
- AI: Semantic Search (vector storage and RAG)
- AI: Document Extraction (structured extraction from PDFs)