AI Endpoints — Workspaces, Models, Usage, Inference

Building an admin UI on top of IChatClient means rebuilding the same five pieces in every project: a workspace CRUD form, a “which provider/model can I pick?” dropdown, a usage dashboard with filters and aggregates, a chat tester, and an embedding playground. Granit.AI.Endpoints ships all five — typed, permission-scoped, OpenAPI-described — so the admin frontend is a thin layer over a documented contract.

You need	This package gives you
Admin UI for AI configuration	`GET/POST/PUT/DELETE /ai/workspaces`
Provider/model dropdown in your UI	`GET /ai/providers`, `GET /ai/providers/{name}/models` (live catalog)
Cost & usage dashboard	`GET /ai/usage` via `Granit.QueryEngine` — filtering, sorting, paging, groupBy with aggregates
Chat sandbox	`POST /ai/chat/{workspace}` (+ SSE stream variant)
Embedding playground	`POST /ai/embeddings/{workspace}`
Locked-down by default	Permission-policy authorization on every route group
OpenAPI for code generation	Tagged groups, `Produces<T>`, `ProblemDetails`

Package structure

Granit.AI.Endpoints/
- Endpoints/
  - AIWorkspaceEndpoints.cs Workspace CRUD (list, get, create, update, delete)
  - AIProviderEndpoints.cs Provider + model discovery
  - AIChatEndpoints.cs Chat completion (sync + SSE streaming)
  - AIEmbeddingEndpoints.cs Embedding generation proxy
- Queries/
  - AIUsageRecordQueryDefinition.cs Usage tracking via Granit.QueryEngine
- Dtos/ Request/response records
  - …
- Validators/ FluentValidation (auto-applied via MapGranitGroup)
  - …
- Permissions/ AIPermissions constants
  - …
- Options/ AIEndpointsOptions (route prefix, OpenAPI tags, limits)
  - …
- Localization/ 18 cultures
  - …

Installation

builder.AddGranitAI();                            // Core abstractions
builder.AddGranitAIEntityFrameworkCore(o => ...); // Persistence (workspaces + usage)
builder.AddGranitAIOpenAI();                      // Provider (or AzureOpenAI, Ollama…)

// Route registration
app.MapGranitAI();

// Or, with custom options (use this instead of the call above):
app.MapGranitAI(opts =>
{
    opts.RoutePrefix = "api/ai";
    opts.MaxChatMessages = 50;
    opts.MaxEmbeddingInputs = 25;
});

MapGranitAI() returns the parent RouteGroupBuilder so you can chain framework filters (CORS, rate-limit policies, additional metadata) on the entire AI surface.

Authorization model

Every route is gated by a Granit permission policy — not by a role string. Permissions are defined in AIPermissionDefinitionProvider and surface in the standard Granit.Authorization admin UI for grant management.

Permission	Used by
`AI.Workspaces.Read`	`GET /ai/providers`, `GET /ai/providers/{name}/models`
`AI.Workspaces.Manage`	All `/ai/workspaces` CRUD endpoints
`AI.Usage.Read`	`/ai/usage` query endpoints
`AI.Chat.Execute`	`POST /ai/chat/{workspace}` (+ stream)
`AI.Embeddings.Execute`	`POST /ai/embeddings/{workspace}`
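
A frontend can mirror this table to decide which admin sections to render. The sketch below is illustrative only: it assumes the caller's granted permission names are already available as a list of strings (how they are obtained is not covered here), and `requiredPermission` and `visibleSections` are hypothetical helper names, not part of the package.

```typescript
// Permission names come from the table above; everything else is illustrative.
type AIPermission =
  | "AI.Workspaces.Read"
  | "AI.Workspaces.Manage"
  | "AI.Usage.Read"
  | "AI.Chat.Execute"
  | "AI.Embeddings.Execute";

type AdminSection = "providers" | "workspaces" | "usage" | "chat" | "embeddings";

// Hypothetical mapping from an admin UI section to the permission its routes require.
const requiredPermission: Record<AdminSection, AIPermission> = {
  providers: "AI.Workspaces.Read",
  workspaces: "AI.Workspaces.Manage",
  usage: "AI.Usage.Read",
  chat: "AI.Chat.Execute",
  embeddings: "AI.Embeddings.Execute",
};

// Returns the sections a user may see, given the permission names granted to them.
function visibleSections(granted: readonly string[]): AdminSection[] {
  const grantedSet = new Set(granted);
  return (Object.keys(requiredPermission) as AdminSection[]).filter((section) =>
    grantedSet.has(requiredPermission[section]),
  );
}
```

Hiding a section is only a convenience for the user; the server still enforces the policy on every route group.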

Provider & model discovery

These two endpoints make an admin UI possible. The frontend asks two questions: “which providers are wired in?” and “what can I pick inside each?”

`GET /ai/providers`

Lists every IAIProviderFactory registered in DI, with a quick capability summary derived from the provider’s model catalog.

[
  { "name": "OpenAI",      "supportsChat": true,  "supportsEmbeddings": true  },
  { "name": "AzureOpenAI", "supportsChat": true,  "supportsEmbeddings": true  },
  { "name": "Ollama",      "supportsChat": true,  "supportsEmbeddings": true  }
]

`GET /ai/providers/{providerName}/models`

Returns the live model catalog for a single provider:

- OpenAI: GET /v1/models, enriched with known capabilities (gpt-4o → vision + tool use; text-embedding-3-* → embeddings only).
- Ollama: GET /api/tags (30 s cache), then /api/show per model to map completion, embedding, vision, tools into typed AIModelCapabilities.
- Azure OpenAI: the deployments configured for the resource.

[
  {
    "id": "gpt-4o",
    "displayName": "GPT-4o",
    "maxContextTokens": 128000,
    "capabilities": {
      "chat": true,
      "embeddings": false,
      "vision": true,
      "imageGeneration": false,
      "audio": false,
      "toolUse": true,
      "streaming": true,
      "structuredOutput": true,
      "extensions": []
    }
  }
]

HTTP	When
`200 OK`	Catalog returned
`404 Not Found`	No factory registered with that name
`502 Bad Gateway`	Provider does not implement `IAIModelCatalog`, or the remote catalog call failed

This catalog powers the frontend “model dropdown”: the UI knows in advance whether to hide the “Upload image” toggle (vision: false) or to disable streaming (streaming: false) for the selected model.
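
As a rough sketch of that idea, the types below mirror the sample catalog response, and a small function derives form toggles from a model's capabilities. `ChatFormToggles`, `togglesFor`, and `chatModels` are hypothetical names chosen for illustration, and the element type of `extensions` is an assumption because the sample only shows an empty array.

```typescript
// Shape mirrors the "capabilities" object in the sample response above.
interface AIModelCapabilities {
  chat: boolean;
  embeddings: boolean;
  vision: boolean;
  imageGeneration: boolean;
  audio: boolean;
  toolUse: boolean;
  streaming: boolean;
  structuredOutput: boolean;
  extensions: unknown[]; // element type is not shown in the sample
}

interface AIModelInfo {
  id: string;
  displayName: string;
  maxContextTokens: number;
  capabilities: AIModelCapabilities;
}

// Hypothetical view model for a chat-tester form.
interface ChatFormToggles {
  showImageUpload: boolean;
  allowStreaming: boolean;
  allowTools: boolean;
}

// Derives which form controls to show for the selected model.
function togglesFor(model: AIModelInfo): ChatFormToggles {
  return {
    showImageUpload: model.capabilities.vision,
    allowStreaming: model.capabilities.streaming,
    allowTools: model.capabilities.toolUse,
  };
}

// Keeps only the models that make sense in a chat dropdown.
function chatModels(catalog: readonly AIModelInfo[]): AIModelInfo[] {
  return catalog.filter((model) => model.capabilities.chat);
}
```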

Workspace management

Method	Route	Behavior
`GET`	`/ai/workspaces`	List all workspaces (system + dynamic) with resolved `Capabilities`
`GET`	`/ai/workspaces/{name}`	Get a workspace by name
`POST`	`/ai/workspaces`	Create a dynamic workspace
`PUT`	`/ai/workspaces/{name}`	Update a dynamic workspace
`DELETE`	`/ai/workspaces/{name}`	Delete a dynamic workspace

Create

// POST /ai/workspaces
{
  "name": "support-chat",
  "provider": "OpenAI",
  "model": "gpt-4o",
  "systemPrompt": "You are a helpful customer support assistant.",
  "temperature": 0.7,
  "maxOutputTokens": 1024
}

// 201 Created
{
  "name": "support-chat",
  "provider": "OpenAI",
  "model": "gpt-4o",
  "systemPrompt": "You are a helpful customer support assistant.",
  "temperature": 0.7,
  "maxOutputTokens": 1024,
  "kind": "Dynamic",
  "activated": true,
  "capabilities": {
    "chat": true, "vision": true, "toolUse": true, "streaming": true,
    "structuredOutput": true, "embeddings": false, "audio": false,
    "imageGeneration": false, "extensions": []
  }
}

Failure	HTTP	Detail
Duplicate workspace name	`409 Conflict`	Caught at the DB level — survives concurrent creates
System workspace modify/delete	`422 Unprocessable Entity`	System workspaces are immutable
Workspace not found	`404 Not Found`	—

The previous v1 contract used isActive. From Granit 0.32, the field is activated on both response and update request (consistent with IActive.Activated across the framework). Plan a frontend migration if you generated clients before 0.32.
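
During such a migration, a small shim can keep older client code working while it is being updated. The sketch below is an assumption about how a frontend might handle it, not something the package provides: `normalizeActivated` is a hypothetical helper that rewrites a payload still carrying `isActive` so that it uses `activated` only.

```typescript
type WorkspacePayload = Record<string, unknown>;

// Returns a copy of the payload that uses `activated` and never `isActive`.
// If both fields are present, the newer `activated` value wins.
function normalizeActivated(payload: WorkspacePayload): WorkspacePayload {
  const { isActive, ...rest } = payload;
  if (typeof rest["activated"] === "boolean") {
    return rest; // already on the 0.32+ contract
  }
  if (typeof isActive === "boolean") {
    return { ...rest, activated: isActive }; // legacy field carried over
  }
  return rest; // neither flag present: leave the payload untouched
}
```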

Update

PUT /ai/workspaces/{name} replaces provider, model, system prompt, temperature, maxOutputTokens, and the activated flag. Deactivating ("activated": false) keeps the workspace visible in the admin API but blocks execution — chat and embedding calls return 422 Unprocessable Entity (AIWorkspaceNotActiveException) until it’s reactivated.

Usage tracking — powered by Granit.QueryEngine

Instead of custom listing endpoints, usage records are exposed through MapGranitQuery<AIUsageRecord> — filtering, sorting, paging, groupBy with aggregates, saved views, and metadata, all from one query definition.

Route	Purpose
`GET /ai/usage`	Paginated query with filters
`GET /ai/usage/meta`	Query metadata for the frontend (fields, operators, defaults)
`GET /ai/usage/saved-views`	Caller’s saved views

Query definition schema:

Feature	Fields
Filterable	`WorkspaceName`, `Provider`, `Model`, `Timestamp`, `ConversationId`
Sortable	`Timestamp` (default `-timestamp`), `InputTokens`, `OutputTokens`, `EstimatedCost`
GlobalSearch	`WorkspaceName`, `Provider`, `Model`
GroupBy	`WorkspaceName`, `Provider`, `Model`, `ConversationId`
Aggregates	`Sum(InputTokens)`, `Sum(OutputTokens)`, `Sum(EstimatedCost)`
Date filter	`Timestamp` (default `ThisMonth`)

Cost summary by provider:

GET /ai/usage?groupBy=provider&dateRange=last30days

returns one row per provider with summed tokens and estimated cost — exactly what a billing dashboard needs, in a single request.

Cost per agentic-chat conversation:

GET /ai/usage?filter=conversationId eq {id}&groupBy=conversationId

ConversationId (Guid?, null for non-chat calls) is the relational path for per-conversation billing and audit. It is intentionally not an OpenTelemetry metric tag — its cardinality is unbounded — so this query, not a metric dimension, is how you slice cost by conversation. The column lives on ai_usage_records; the consuming app owns the EF migration that adds it. See Usage tracking.
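
The filter expression in the second query contains spaces, so it should be URL-encoded before it is sent. A minimal sketch of building both queries follows; it reuses only the `filter`, `groupBy`, and `dateRange` parameters shown above, assumes the default `ai` route prefix, and `buildUsageUrl` and the base URL are hypothetical.

```typescript
interface UsageQuery {
  filter?: string;
  groupBy?: string;
  dateRange?: string;
}

// Builds a usage query URL; URLSearchParams takes care of the encoding.
function buildUsageUrl(baseUrl: string, query: UsageQuery): string {
  const params = new URLSearchParams();
  if (query.filter) params.set("filter", query.filter);
  if (query.groupBy) params.set("groupBy", query.groupBy);
  if (query.dateRange) params.set("dateRange", query.dateRange);
  const queryString = params.toString();
  return queryString ? `${baseUrl}/ai/usage?${queryString}` : `${baseUrl}/ai/usage`;
}

// Cost summary by provider over the last 30 days.
const byProvider = buildUsageUrl("https://example.test", {
  groupBy: "provider",
  dateRange: "last30days",
});

// Cost for one conversation; the id is a placeholder value.
const byConversation = buildUsageUrl("https://example.test", {
  filter: "conversationId eq 00000000-0000-0000-0000-000000000000",
  groupBy: "conversationId",
});
```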

Chat completion proxy

Method	Route	Description
`POST`	`/ai/chat/{workspaceName}`	Synchronous chat completion
`POST`	`/ai/chat/{workspaceName}/stream`	SSE streaming completion

Synchronous

POST /ai/chat/support-chat
Content-Type: application/json

{
  "messages": [
    { "role": "user", "content": "Where can I see my invoices?" }
  ]
}

// 200 OK
{
  "workspaceName": "support-chat",
  "model": "gpt-4o",
  "content": "Open the Billing section in the side menu…",
  "usage": {
    "inputTokens": 28,
    "outputTokens": 64,
    "estimatedCost": null,
    "costCurrency": null
  },
  "duration": "00:00:01.234"
}

SSE Streaming

POST /ai/chat/support-chat/stream
Content-Type: application/json

{ "messages": [{ "role": "user", "content": "Summarize Granit.AI in one sentence." }] }

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"content":"Granit.AI"}

data: {"content":" gives you"}

data: {"content":" provider-agnostic"}

…

event: usage
data: {"inputTokens":12,"outputTokens":24}

data: [DONE]

If the provider errors after the stream has started, a final event: error is emitted instead of switching the HTTP status code (headers are already flushed):

event: error
data: {"error":"Model 'gpt-4o-mini' not found on provider 'OpenAI'."}

For streaming responses, usage tracking happens after the final chunk: Microsoft.Extensions.AI emits a final update carrying UsageContent, which is recorded via IAIUsageRecordFactory and persisted by the usage tracker. ISO 27001 audit trails and cost monitoring therefore keep working for streaming traffic.
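
On the client side, the event framing shown above (unnamed `data:` events for content, `event: usage`, `event: error`, and the `[DONE]` sentinel) can be decoded roughly as follows. This is a simplified sketch that parses a complete text buffer; a real client would read the response body incrementally and buffer partial events. `ChatStreamEvent` and `parseChatStream` are hypothetical names.

```typescript
type ChatStreamEvent =
  | { kind: "delta"; content: string }
  | { kind: "usage"; inputTokens: number; outputTokens: number }
  | { kind: "error"; message: string }
  | { kind: "done" };

// Parses SSE text into typed events. Events are separated by a blank line.
function parseChatStream(raw: string): ChatStreamEvent[] {
  const events: ChatStreamEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let eventName = "message"; // SSE default when no "event:" line is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) eventName = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length === 0) continue; // blank or comment-only block
    const data = dataLines.join("\n");
    if (data === "[DONE]") {
      events.push({ kind: "done" });
      continue;
    }
    const payload = JSON.parse(data);
    if (eventName === "usage") {
      events.push({
        kind: "usage",
        inputTokens: Number(payload.inputTokens),
        outputTokens: Number(payload.outputTokens),
      });
    } else if (eventName === "error") {
      events.push({ kind: "error", message: String(payload.error) });
    } else {
      events.push({ kind: "delta", content: String(payload.content) });
    }
  }
  return events;
}

// Concatenates the content deltas into the full assistant message.
function joinContent(events: readonly ChatStreamEvent[]): string {
  return events.map((e) => (e.kind === "delta" ? e.content : "")).join("");
}
```

Trimming the `data:` value is safe here because every payload is JSON or the `[DONE]` sentinel; leading spaces inside a content string are preserved by the JSON encoding.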

Embedding generation proxy

Method	Route	Description
`POST`	`/ai/embeddings/{workspaceName}`	Generate embeddings for one or more inputs

POST /ai/embeddings/embeddings
Content-Type: application/json

{ "inputs": ["Hello world", "Granit framework"] }

// 200 OK
{
  "workspaceName": "embeddings",
  "model": "text-embedding-3-small",
  "embeddings": [
    { "index": 0, "vector": [0.0123, -0.0456, ...] },
    { "index": 1, "vector": [0.0789, -0.0012, ...] }
  ],
  "usage": { "inputTokens": 8 }
}
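
Because the number of inputs per request is capped (50 by default, see `MaxEmbeddingInputs`), a client embedding a larger corpus has to split it into several requests. The sketch below shows one way to do that and to restore vector order from a response; `toEmbeddingRequests` and `orderedVectors` are hypothetical helpers, and it assumes `index` refers to the position of the input within its own request.

```typescript
interface AIEmbeddingRequest {
  inputs: string[];
}

interface AIEmbeddingItem {
  index: number;
  vector: number[];
}

// Splits a list of texts into request bodies of at most `maxInputs` items each.
function toEmbeddingRequests(inputs: readonly string[], maxInputs = 50): AIEmbeddingRequest[] {
  if (!Number.isInteger(maxInputs) || maxInputs < 1) {
    throw new RangeError("maxInputs must be a positive integer");
  }
  const requests: AIEmbeddingRequest[] = [];
  for (let start = 0; start < inputs.length; start += maxInputs) {
    requests.push({ inputs: inputs.slice(start, start + maxInputs) });
  }
  return requests;
}

// Returns the vectors of one response sorted by their `index` field.
function orderedVectors(items: readonly AIEmbeddingItem[]): number[][] {
  return [...items].sort((a, b) => a.index - b.index).map((item) => item.vector);
}
```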

Error handling

All errors follow RFC 7807 (ProblemDetails):

Scenario	HTTP	Maps to
Validation failure	`400`	Auto via `FluentValidationAutoEndpointFilter`
Workspace not found	`404`	`AIWorkspaceNotFoundException`
Duplicate workspace name	`409`	DB unique-constraint violation, caught in `EfAIWorkspaceStore`
System workspace modify/delete	`422`	Immutable-kind guard
Workspace deactivated	`422`	`AIWorkspaceNotActiveException`
Quota exceeded (`Reject` mode)	`429`	`AIQuotaOptions.ExceededBehavior = Reject`
Provider unavailable	`502`	Network/timeout to OpenAI/Azure/Ollama
Streaming provider unavailable	`503`	Service-down during a stream attempt
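
A client can turn this table into a single decision point for its error handling. The following sketch classifies a ProblemDetails body by status code. The `ProblemDetails` fields are the standard RFC 7807 members, while the `ErrorAction` categories and `actionFor` are hypothetical and reflect one possible UI policy rather than anything the package prescribes.

```typescript
// Standard RFC 7807 members; extension members are omitted.
interface ProblemDetails {
  type?: string;
  title?: string;
  status?: number;
  detail?: string;
  instance?: string;
}

type ErrorAction =
  | "fix-request" // 400: show validation messages next to the form fields
  | "not-found" // 404: workspace or provider no longer exists
  | "conflict" // 409: duplicate workspace name
  | "blocked" // 422: immutable system workspace or deactivated workspace
  | "retry-later" // 429, 502, 503: quota or provider availability
  | "unknown";

// Maps the status codes from the table above to a UI-level action.
function actionFor(problem: ProblemDetails): ErrorAction {
  switch (problem.status) {
    case 400:
      return "fix-request";
    case 404:
      return "not-found";
    case 409:
      return "conflict";
    case 422:
      return "blocked";
    case 429:
    case 502:
    case 503:
      return "retry-later";
    default:
      return "unknown";
  }
}
```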

Validation

All request DTOs are auto-validated via MapGranitGroup:

DTO	Rules
`AIWorkspaceCreateRequest`	`Name`: `^[a-z0-9][a-z0-9-]*$`, max 128; `Provider`/`Model`: required; `Temperature`: 0–2; `MaxOutputTokens` > 0
`AIWorkspaceUpdateRequest`	Same as create (without `Name`); `Activated`: required
`AIChatRequest`	`Messages`: required, max 100; `Role`: `user`/`assistant`/`system`; `Content`: max 128 KB
`AIEmbeddingRequest`	`Inputs`: required, max 50; each max 32 KB

Limits on chat messages and embedding inputs are overridable via AIEndpointsOptions.
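
A frontend may want to mirror these rules to give feedback before a request is sent. The sketch below covers the create request only and is an approximation: it assumes the temperature bounds are inclusive and that `temperature` and `maxOutputTokens` are optional, neither of which is stated above. `validateWorkspaceCreate` is a hypothetical helper, and the server-side validators remain the source of truth.

```typescript
interface AIWorkspaceCreateRequest {
  name: string;
  provider: string;
  model: string;
  systemPrompt?: string;
  temperature?: number;
  maxOutputTokens?: number;
}

// Same pattern as the Name rule in the table above.
const WORKSPACE_NAME_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

// Returns a list of human-readable problems; an empty list means the request looks valid.
function validateWorkspaceCreate(req: AIWorkspaceCreateRequest): string[] {
  const errors: string[] = [];
  if (!WORKSPACE_NAME_PATTERN.test(req.name)) {
    errors.push("name: lowercase letters, digits and hyphens only, starting with a letter or digit");
  }
  if (req.name.length > 128) errors.push("name: at most 128 characters");
  if (req.provider.trim() === "") errors.push("provider: required");
  if (req.model.trim() === "") errors.push("model: required");
  if (req.temperature !== undefined && (req.temperature < 0 || req.temperature > 2)) {
    errors.push("temperature: must be between 0 and 2");
  }
  if (req.maxOutputTokens !== undefined && req.maxOutputTokens <= 0) {
    errors.push("maxOutputTokens: must be greater than 0");
  }
  return errors;
}
```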

Configuration

{
  "AI:Endpoints": {
    "RoutePrefix": "ai",
    "MaxChatMessages": 100,
    "MaxEmbeddingInputs": 50,
    "ProvidersTagName": "AI - Providers",
    "WorkspacesTagName": "AI - Workspaces",
    "UsageTagName": "AI - Usage",
    "InferenceTagName": "AI - Inference"
  }
}

OpenAPI tags split the surface into four logical groups — your generated client can emit AIProvidersApi, AIWorkspacesApi, AIUsageApi, AIInferenceApi instead of one monolithic class.