Agentic Chat — a streaming, tool-using conversation engine
Structured Completion turns one prompt into one
typed object. Agentic Chat is the other shape of AI: a long-lived conversation where
the model thinks, calls your tools, streams its answer token-by-token, and occasionally
stops to ask the user a question. Granit.AI.Chat is the engine behind that experience —
the Claude.ai / ChatGPT interaction, owner-scoped and multi-tenant, sitting on the same
provider-agnostic AI workspace as every other .AI module.
It is delivered as a family of packages so you compose only what you need: the core domain and chat service, plus opt-in satellites for HTTP, persistence, attachments, privacy, and retention.
The package family
- Granit.AI.Chat: core (domain, IChatService, extension seams)
- Granit.AI.Chat.Endpoints: REST + SSE streaming endpoints
- Granit.AI.Chat.EntityFrameworkCore: conversation persistence (isolated DbContext)
- Granit.AI.Chat.BlobStorage: resolve attachments from blob storage
- Granit.AI.Chat.Privacy: GDPR export + right-to-erasure
- Granit.AI.Chat.BackgroundJobs: retention cleanup (data minimization)
Each satellite references the core and activates a capability through DI composition — the same soft-dependency rule as the rest of the framework. Install only what the deployment needs.
A working chat backend is the core module, persistence, and the HTTP surface. The core module registers itself; the satellites expose explicit registration calls.
```csharp
builder.AddGranitAI();        // the provider-agnostic workspace layer
builder.AddGranitAIOllama();  // or OpenAI / Azure OpenAI / Anthropic

// Conversation persistence: isolated, tenant-aware DbContext
builder.Services.AddGranitAIChatEntityFrameworkCore(
    shared => shared.UseNpgsql(connectionString));

// Attachments backed by Granit.BlobStorage (optional)
builder.Services.AddBlobStorageChatAttachments(options =>
{
    options.MaxAttachments = 3;
    options.MaxAttachmentBytes = 5 * 1024 * 1024; // 5 MiB
});

// GDPR participation (optional)
builder.Services.AddGranitPrivacy()
    .AddGranitAIChatPrivacyProvider();

// Map the REST + SSE endpoints under /conversations
app.MapGranitConversations();
```

The core GranitAIChatModule registers IChatService, the workspace catalog, the
clarification tool, and the mention/suggestion/attachment resolvers, all scoped.
How a turn works
The chat service is two-phase by design: an awaited, non-streaming PrepareAsync that
validates and resolves context, then a StreamAsync that returns an IAsyncEnumerable and
runs the agentic loop. The split lets the endpoint map a bad request to a clean 404/422
before the SSE stream opens; once bytes are flowing, you can no longer change the status code.
```csharp
namespace Granit.AI.Chat;

public interface IChatService
{
    Task<ChatSendHandle> PrepareAsync(
        ChatSendRequest request, CancellationToken cancellationToken = default);

    IAsyncEnumerable<ChatTurnUpdate> StreamAsync(
        ChatSendHandle handle, CancellationToken cancellationToken = default);
}
```

PrepareAsync checks that the target workspace is chat-capable, that the caller owns the
conversation, and resolves mentions, attachments, and prompt references into the per-turn
context. The returned ChatSendHandle already knows its ConversationId (even for a
brand-new conversation), so the endpoint can flush it immediately.
```mermaid
sequenceDiagram
    participant C as Client
    participant E as Conversations endpoint
    participant S as IChatService
    participant L as FunctionInvokingChatClient
    C->>E: POST /conversations/messages
    E->>S: PrepareAsync(request)
    S-->>E: handle (ConversationId known)
    E-->>C: SSE: conversation frame (headers commit in ms)
    E->>S: StreamAsync(handle)
    S->>S: persist user message
    loop agentic loop
        S->>L: GetStreamingResponseAsync
        L-->>S: delta / tool_call / tool_result
        S-->>C: SSE frames
    end
    S->>S: persist assistant message
    S-->>C: SSE: persisted, usage, [suggestions]
```
StreamAsync persists the user turn before the loop runs (so a mid-stream crash never
loses the question) and appends the assistant turn once the loop settles. Usage is
stamped even if the client disconnects mid-stream.
The streaming updates
StreamAsync yields ChatTurnUpdate records, each tagged by ChatTurnUpdateKind:

| Kind | Carries | Meaning |
|---|---|---|
| Delta | Delta | One slice of assistant text |
| ToolCall | ToolName, ToolCallId | A tool started |
| ToolResult | ToolName, ToolCallId, Succeeded | A tool finished |
| Completed | Result (ChatSendResult) | Terminal: final answer, usage, suggestions, clarification |
The terminal ChatSendResult carries the final Content, InputTokens / OutputTokens,
the persisted message rows, any SuggestedActions, an optional Clarification, and
MaxIterationsReached.
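As a rough illustration of how a caller might drain one turn, the sketch below switches on the update kind. The record and enum definitions are hypothetical stand-ins that mirror only the fields named above (the persisted rows, suggestions, clarification, and MaxIterationsReached are left out), and TurnConsumer is an invented helper, not part of Granit.AI.Chat.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical stand-ins mirroring the table above; the real types may differ.
public enum ChatTurnUpdateKind { Delta, ToolCall, ToolResult, Completed }

public sealed record ChatSendResult(string Content, int InputTokens, int OutputTokens);

public sealed record ChatTurnUpdate(
    ChatTurnUpdateKind Kind,
    string? Delta = null,
    string? ToolName = null,
    string? ToolCallId = null,
    bool? Succeeded = null,
    ChatSendResult? Result = null);

public static class TurnConsumer
{
    // Drains one turn: concatenates deltas, logs tool activity,
    // and captures the terminal result when the Completed update arrives.
    public static async Task<(string Streamed, ChatSendResult? Result, List<string> ToolLog)> DrainAsync(
        IAsyncEnumerable<ChatTurnUpdate> updates)
    {
        var text = new StringBuilder();
        var toolLog = new List<string>();
        ChatSendResult? result = null;

        await foreach (var update in updates)
        {
            switch (update.Kind)
            {
                case ChatTurnUpdateKind.Delta:
                    text.Append(update.Delta);
                    break;
                case ChatTurnUpdateKind.ToolCall:
                    toolLog.Add($"started {update.ToolName} ({update.ToolCallId})");
                    break;
                case ChatTurnUpdateKind.ToolResult:
                    var outcome = update.Succeeded == true ? "ok" : "failed";
                    toolLog.Add($"finished {update.ToolName} ({update.ToolCallId}): {outcome}");
                    break;
                case ChatTurnUpdateKind.Completed:
                    result = update.Result;
                    break;
            }
        }

        return (text.ToString(), result, toolLog);
    }
}
```

In a real host the sequence would come from IChatService.StreamAsync(handle); the helper only shows the shape of the dispatch.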
The SSE wire protocol
The endpoint projects those updates onto Server-Sent Events. The conversation frame is
flushed first — its id comes from the handle — so time-to-first-byte is milliseconds,
not the full agent duration. Tool frames carry only the tool name and call id, never
arguments or raw results (privacy); the client maps the name to a localized label.
| Frame | Payload | When |
|---|---|---|
| conversation | conversationId | first, immediately |
| delta | content | per assistant token |
| tool_call | toolName, toolCallId | a tool started |
| tool_result | toolName, toolCallId, succeeded | a tool finished |
| persisted | messages | terminal: the turn’s two saved rows (user, then assistant) |
| usage | inputTokens, outputTokens | terminal |
| suggestions / clarification | … | terminal |
| error | code (rate_limit, provider_unavailable, server_error) | terminal, on failure |
Tool frames are de-duplicated by call id (streamed function-call fragments arrive in
pieces). No thinking frame is sent — the client derives a “Thinking…” indicator from a
tool_result not yet followed by a delta. Unknown frames are ignored by older clients,
so the protocol extends without breaking them.
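The two rules in the paragraph above (de-duplicating tool frames by call id, and deriving the “Thinking…” state on the client) can be sketched as small state holders. Both class names are hypothetical and this is only one plausible reading of the rules; the frame names are taken from the table.

```csharp
using System.Collections.Generic;

// Server side: emit each tool_call / tool_result at most once per call id,
// because streamed function-call fragments can repeat the same id.
public sealed class ToolFrameDeduplicator
{
    private readonly HashSet<string> _started = new();
    private readonly HashSet<string> _finished = new();

    // HashSet.Add returns false when the id was already seen.
    public bool ShouldEmitToolCall(string toolCallId) => _started.Add(toolCallId);
    public bool ShouldEmitToolResult(string toolCallId) => _finished.Add(toolCallId);
}

// Client side: "thinking" means a tool_result has arrived and no delta has followed yet.
public sealed class ThinkingIndicator
{
    public bool IsThinking { get; private set; }

    public void OnFrame(string frameName)
    {
        IsThinking = frameName switch
        {
            "tool_result" => true,
            "delta" => false,
            // Terminal frames end the turn, so the indicator is cleared.
            "persisted" or "usage" or "suggestions" or "clarification" or "error" => false,
            // tool_call, conversation, and unknown frames leave the state unchanged.
            _ => IsThinking
        };
    }
}
```

Leaving the state untouched for unrecognized frame names keeps the sketch in line with the forward-compatibility rule: a newer server can add frames without confusing an older client.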
REST endpoints
MapGranitConversations() maps the full surface under /conversations (configurable via
AIChatEndpointsOptions.RoutePrefix). Every route is owner-scoped — a caller only ever
sees their own conversations.
| Method | Path | Permission | Purpose |
|---|---|---|---|
| GET | / | AIChat.Conversations.Read | List the caller’s conversations (newest first) |
| POST | / | AIChat.Conversations.Manage | Create a conversation |
| GET | /{id} | AIChat.Conversations.Read | Conversation metadata (messages paged separately) |
| PUT | /{id}/title | AIChat.Conversations.Manage | Rename |
| PUT | /{id}/favorite | AIChat.Conversations.Manage | Toggle favorite (idempotent) |
| DELETE | /{id} | AIChat.Conversations.Delete | Soft-delete |
| GET | /{id}/messages | AIChat.Conversations.Read | Keyset-paginated message page (newest first) |
| POST | /messages | AIChat.Conversations.Send | Send a message, stream the answer over SSE |
| POST | /messages/{messageId}/report | AIChat.Conversations.Report | Flag a message for review |
| GET | /workspaces | AIChat.Conversations.Read | List selectable chat workspaces |
The send endpoint is bound to the rate-limit policy ai-chat-send. The module wires
the policy but enforces nothing until the host configures it, so set a per-user limit
to cap model spend and prevent “denial of wallet”:

```json
"RateLimiting": {
  "Policies": {
    "ai-chat-send": {
      "PermitLimit": 20,
      "Window": "00:01:00",
      "PartitionBy": "User"
    }
  }
}
```

SendMessageRequest is validated before it reaches the service: message non-empty and
≤ 16 000 chars, ≤ 25 mentions, ≤ 5 prompt refs, and attachment count/type/size within the
configured attachment limits.
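To make the limits concrete, here is a minimal sketch of the same checks as plain code. SendMessageDraft and SendMessageRules are hypothetical names used for illustration only; the real SendMessageRequest shape and its validator are not shown in this page, and the attachment checks are omitted because they depend on the configured options.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified view of a send request: only the fields the limits mention.
public sealed record SendMessageDraft(string Message, int MentionCount, int PromptRefCount);

public static class SendMessageRules
{
    public const int MaxMessageChars = 16_000;
    public const int MaxMentions = 25;
    public const int MaxPromptRefs = 5;

    // Returns one error per violated rule; an empty list means the draft passes.
    public static IReadOnlyList<string> Validate(SendMessageDraft draft)
    {
        var errors = new List<string>();

        if (string.IsNullOrEmpty(draft.Message))
            errors.Add("message must not be empty");
        else if (draft.Message.Length > MaxMessageChars)
            errors.Add($"message exceeds {MaxMessageChars} chars");

        if (draft.MentionCount > MaxMentions)
            errors.Add($"more than {MaxMentions} mentions");

        if (draft.PromptRefCount > MaxPromptRefs)
            errors.Add($"more than {MaxPromptRefs} prompt refs");

        return errors;
    }
}
```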
The domain model
Three aggregates, all multi-tenant and owner-stamped. Messages are append-only: there is no edit or in-place delete; a conversation is an immutable transcript.
```mermaid
classDiagram
    class Conversation {
        +Guid Id
        +string Title
        +Guid OwnerId
        +bool IsFavorite
        +string? WorkspaceKey
        +AddMessage(role, content)
        +Rename(title)
        +SetFavorite(flag)
    }
    class Message {
        +Guid Id
        +MessageRole Role
        +string Content
        +string? WorkspaceKey
        +DateTimeOffset CreatedAt
    }
    class MessageReport {
        +Guid Id
        +Guid MessageId
        +Guid OwnerId
        +string Reason
        +MessageReportCategory? Category
    }
    Conversation "1" --> "*" Message : append-only
    Message "1" --> "*" MessageReport : flagged by owner
```
MessageRole is User, Assistant, System, or Tool. MessageReportCategory is
Inaccurate, Harmful, or Other, and creating a report raises a
ChatMessageReportedEvent carrying only identifiers and the user’s reason — never the
message content or any tool data.
The EF Core satellite registers an isolated AIChatDbContext and maps three tables
(ai_chat_conversations, ai_chat_messages, ai_chat_message_reports, prefix
configurable via GranitAIChatDbProperties). Conversations are indexed by
(TenantId, OwnerId, CreatedAt) for the “my conversations, newest first” query;
messages by (ConversationId, CreatedAt) for thread pagination. Two store seams sit on
top: IConversationStore (owner-scoped CRUD) and IConversationDataManager
(privacy/retention operations that deliberately cross the owner boundary).
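The (ConversationId, CreatedAt) index exists to serve keyset pagination. The idea can be sketched over an in-memory list as follows; MessageRow and MessagePaging are hypothetical names, and using CreatedAt alone as the cursor is an assumption made for brevity (the real store may add a tiebreaker such as the message id).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: only the columns the pagination needs.
public sealed record MessageRow(Guid Id, Guid ConversationId, DateTimeOffset CreatedAt, string Content);

public static class MessagePaging
{
    // Newest-first page. "before" is the CreatedAt of the last row of the previous
    // page (null for the first page), so no OFFSET scan is needed.
    public static List<MessageRow> NextPage(
        IEnumerable<MessageRow> messages, Guid conversationId, DateTimeOffset? before, int pageSize) =>
        messages
            .Where(m => m.ConversationId == conversationId)
            .Where(m => before is null || m.CreatedAt < before.Value)
            .OrderByDescending(m => m.CreatedAt)
            .Take(pageSize)
            .ToList();
}
```

The caller passes the CreatedAt of the last row it received as the next cursor. With a cursor on CreatedAt alone, two messages sharing an identical timestamp across a page boundary could be skipped, which is why a production query would typically order and filter on a compound key.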
Extension points
The engine is generic; you make it useful by plugging in your application’s context. Four seams, all optional.
Attachments
Implement IAIAttachmentSource to resolve an opaque reference into bytes; the framework
extracts the text and injects it as an untrusted context block. Granit.AI.Chat.BlobStorage
ships a ready-made source over Granit.BlobStorage, registered with
AddBlobStorageChatAttachments. To wire your own:

```csharp
public sealed class InvoiceAttachmentSource(IInvoiceStore store) : IAIAttachmentSource
{
    public async Task<AIAttachmentData?> GetAsync(
        string reference, CancellationToken cancellationToken = default)
    {
        var invoice = await store.FindPdfAsync(reference, cancellationToken)
            .ConfigureAwait(false);

        return invoice is null
            ? null // absent or not accessible under the caller's ACLs
            : new AIAttachmentData(invoice.Bytes, "application/pdf", invoice.FileName);
    }
}

builder.Services.AddGranitChatAttachments<InvoiceAttachmentSource>(options =>
    options.AllowedContentTypes.Add("application/pdf"));
```

GranitAIChatAttachmentOptions (config section AI:Chat:Attachments) defaults to
5 attachments, 10 MiB each, and a permissive set of text/Office/PDF MIME types.
Mentions
When a user types @Dr Smith, the client sends an AIMention(Type, Id). Implement
IAIMentionContextResolver to turn resolved mentions into a context block, and to enforce
ACLs by returning null for references the caller may not see:

```csharp
public sealed class DoctorMentionResolver(IDoctorReader doctors) : IAIMentionContextResolver
{
    public async ValueTask<string?> ResolveContextAsync(
        IReadOnlyList<AIMention> mentions, CancellationToken cancellationToken = default)
    {
        var doctor = mentions.FirstOrDefault(m => m.Type == "doctor");
        if (doctor is null) return null;

        var profile = await doctors.GetAsync(Guid.Parse(doctor.Id), cancellationToken)
            .ConfigureAwait(false);

        return profile is null
            ? null
            : $"Mentioned doctor: {profile.FullName}, {profile.Specialty}.";
    }
}
```

Suggested actions
A provider returns declarative calls to action: deep links the UI renders as chips. They are never auto-invoked; the user decides.

```csharp
public sealed class ConnectCalendarSuggestion : IAISuggestionProvider
{
    public ValueTask<IReadOnlyList<AISuggestedAction>> GetSuggestionsAsync(
        AISuggestionContext context, CancellationToken cancellationToken = default)
    {
        IReadOnlyList<AISuggestedAction> actions = context.Message.Contains("appointment")
            ? [new AISuggestedAction("calendar.connect", "Connect your calendar", "/settings/calendar")]
            : [];

        return ValueTask.FromResult(actions);
    }
}

builder.Services.AddGranitChatSuggestions(b => b.Add<ConnectCalendarSuggestion>());
```

Clarifying questions
The engine registers a built-in request_clarification tool. When the model finds the
request ambiguous, it calls the tool, which halts the loop and surfaces an
AIClarificationRequest (a Question, discrete Options, and an AllowOther flag)
instead of a final answer. The client renders the choices; the user’s pick becomes the
next turn, and the loop resumes normally. No code is needed on your side, since the tool ships in the core.
Per-user settings
Three user-scoped settings (prefix Granit.AI.Chat., auto-discovered by the settings
module) shape every turn:

| Setting | Values | Effect |
|---|---|---|
| DefaultWorkspace | a workspace name, or Auto | Which workspace a new conversation uses |
| WebSearchPolicy | Deny · Allow · AlwaysAsk | Whether the agent may search the web (default Deny) |
| CustomContext | free text, ≤ 4000 chars | Persistent user context injected each turn |
IChatWorkspaceCatalog.GetSelectableWorkspacesAsync returns Auto first, then every
chat-capable workspace — the same list the /workspaces endpoint serves.
Privacy, retention & diagnostics
Granit.AI.Chat.Privacy plugs into the framework’s GDPR pipeline via
AddGranitAIChatPrivacyProvider(). It exports a user’s conversations, messages, and
report reasons as an ai-chat-conversations.json fragment (Articles 15/20) and
hard-deletes all of them on an erasure request (Article 17) — soft delete is forbidden
here, since a recoverable transcript is not erased. Attachment content is never exported;
it is transient and owned by blob storage.
Granit.AI.Chat.BackgroundJobs adds a distributed retention job. It is opt-in: set
AI:Chat:Retention:RetentionDays (default 0 disables cleanup) and conversations idle
past the window are purged in batches of CleanupBatchSize (default 500).
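The retention rule can be read as a cutoff plus a batch loop. The sketch below models it over a plain list of last-activity timestamps; RetentionSketch is a hypothetical name, and treating “idle” as “last activity older than now minus RetentionDays” is an assumption, not the job's confirmed logic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RetentionSketch
{
    // RetentionDays <= 0 disables cleanup, so there is no cutoff.
    public static DateTimeOffset? Cutoff(DateTimeOffset now, int retentionDays) =>
        retentionDays <= 0 ? (DateTimeOffset?)null : now.AddDays(-retentionDays);

    // Removes expired entries in batches and returns how many batches ran.
    public static int PurgeInBatches(
        List<DateTimeOffset> lastActivity, DateTimeOffset now, int retentionDays, int batchSize = 500)
    {
        var cutoff = Cutoff(now, retentionDays);
        if (cutoff is null) return 0;

        var batches = 0;
        while (true)
        {
            // Select at most one batch of entries older than the cutoff.
            var batch = lastActivity.Where(t => t < cutoff.Value).Take(batchSize).ToList();
            if (batch.Count == 0) return batches;

            foreach (var expired in batch) lastActivity.Remove(expired);
            batches++;
        }
    }
}
```

Working in bounded batches keeps each delete short, which matters when the purge shares a database with live traffic.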
The core emits OpenTelemetry under the Granit.AI.Chat activity source and meter:
| Metric | Meaning |
|---|---|
| granit.ai.chat.conversation.created | New conversations |
| granit.ai.chat.turn.completed | Completed turns |
| granit.ai.chat.tokens.input / .output | Token usage per turn |
All are tagged tenant_id (coalesced to global) and workspace_key.
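If the host wires OpenTelemetry directly, subscribing to that source and meter might look like the sketch below. It assumes the OpenTelemetry.Extensions.Hosting and OTLP exporter packages are referenced; whether another Granit module already performs this registration is not covered here, so treat it as an illustration rather than required setup.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    // Traces: listen to the chat engine's activity source.
    .WithTracing(tracing => tracing
        .AddSource("Granit.AI.Chat")
        .AddOtlpExporter())
    // Metrics: collect the granit.ai.chat.* instruments from the chat meter.
    .WithMetrics(metrics => metrics
        .AddMeter("Granit.AI.Chat")
        .AddOtlpExporter());

var app = builder.Build();
app.Run();
```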
See also
- Setup & Configuration: providers, workspaces, capability flags
- Structured Completion: typed single-shot output
- API Endpoints: the non-agentic workspace chat/embedding proxy
- ADR-067: conversation model, tool registry, prompt catalog
- ADR-071: agentic streaming over SSE