Agentic Chat — a streaming, tool-using conversation engine

Structured Completion turns one prompt into one typed object. Agentic Chat is the other shape of AI: a long-lived conversation where the model thinks, calls your tools, streams its answer token-by-token, and occasionally stops to ask the user a question. Granit.AI.Chat is the engine behind that experience — the Claude.ai / ChatGPT interaction, owner-scoped and multi-tenant, sitting on the same provider-agnostic AI workspace as every other .AI module.

It is delivered as a family of packages so you compose only what you need: the core domain and chat service, plus opt-in satellites for HTTP, persistence, attachments, privacy, and retention.

  • Granit.AI.Chat: core (domain, IChatService, extension seams)
  • Granit.AI.Chat.Endpoints: REST + SSE streaming endpoints
  • Granit.AI.Chat.EntityFrameworkCore: conversation persistence (isolated DbContext)
  • Granit.AI.Chat.BlobStorage: resolve attachments from blob storage
  • Granit.AI.Chat.Privacy: GDPR export + right-to-erasure
  • Granit.AI.Chat.BackgroundJobs: retention cleanup (data minimization)

Each satellite references the core and activates a capability through DI composition — the same soft-dependency rule as the rest of the framework. Install only what the deployment needs.

A working chat backend is the core module, persistence, and the HTTP surface. The core module registers itself; the satellites expose explicit registration calls.

Program.cs
builder.AddGranitAI();        // the provider-agnostic workspace layer
builder.AddGranitAIOllama();  // or OpenAI / Azure OpenAI / Anthropic

// Conversation persistence: isolated, tenant-aware DbContext
builder.Services.AddGranitAIChatEntityFrameworkCore(
    shared => shared.UseNpgsql(connectionString));

// Attachments backed by Granit.BlobStorage (optional)
builder.Services.AddBlobStorageChatAttachments(options =>
{
    options.MaxAttachments = 3;
    options.MaxAttachmentBytes = 5 * 1024 * 1024; // 5 MiB
});

// GDPR participation (optional)
builder.Services.AddGranitPrivacy()
    .AddGranitAIChatPrivacyProvider();

// Map the REST + SSE endpoints under /conversations
app.MapGranitConversations();

The core GranitAIChatModule registers IChatService, the workspace catalog, the clarification tool, and the mention/suggestion/attachment resolvers — all scoped.

The chat service is two-phase by design: PrepareAsync validates the request and resolves context up front, and is awaited to completion before any response bytes are written; StreamAsync then runs the agentic loop as an IAsyncEnumerable. The split lets the endpoint map a bad request to a clean 404/422 before the SSE stream opens. Once bytes are flowing, the status code can no longer change.

namespace Granit.AI.Chat;

public interface IChatService
{
    Task<ChatSendHandle> PrepareAsync(
        ChatSendRequest request, CancellationToken cancellationToken = default);

    IAsyncEnumerable<ChatTurnUpdate> StreamAsync(
        ChatSendHandle handle, CancellationToken cancellationToken = default);
}

PrepareAsync checks that the target workspace is chat-capable, that the caller owns the conversation, and resolves mentions, attachments, and prompt references into the per-turn context. The returned ChatSendHandle already knows its ConversationId — even for a brand-new conversation — so the endpoint can flush it immediately.
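The two-phase contract can be sketched as follows. This is a minimal illustration, not the shipped endpoint: the record shapes, FakeChatService, TwoPhaseSend, and the choice of ArgumentException as the failure signal are all assumptions made so the example runs on its own. Only the IChatService signature comes from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins: the real Granit records carry more members.
public sealed record ChatSendRequest(Guid? ConversationId, string Message);
public sealed record ChatSendHandle(Guid ConversationId);
public sealed record ChatTurnUpdate(string Text);

public interface IChatService
{
    Task<ChatSendHandle> PrepareAsync(
        ChatSendRequest request, CancellationToken cancellationToken = default);

    IAsyncEnumerable<ChatTurnUpdate> StreamAsync(
        ChatSendHandle handle, CancellationToken cancellationToken = default);
}

// Test double so the sketch runs without a model provider.
public sealed class FakeChatService : IChatService
{
    public Task<ChatSendHandle> PrepareAsync(
        ChatSendRequest request, CancellationToken cancellationToken = default)
    {
        // Assumption: a failed check surfaces as an exception the endpoint can map.
        if (string.IsNullOrWhiteSpace(request.Message))
            throw new ArgumentException("Message is required.", nameof(request));

        // The id is known here, even when the conversation is brand new.
        return Task.FromResult(new ChatSendHandle(request.ConversationId ?? Guid.NewGuid()));
    }

    public async IAsyncEnumerable<ChatTurnUpdate> StreamAsync(
        ChatSendHandle handle,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        foreach (var piece in new[] { "Hello", " ", "world" })
        {
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield();
            yield return new ChatTurnUpdate(piece);
        }
    }
}

public static class TwoPhaseSend
{
    // Returns the status the endpoint would commit to, plus the frames it would write.
    public static async Task<(int Status, List<string> Frames)> HandleAsync(
        IChatService chat, ChatSendRequest request, CancellationToken ct = default)
    {
        var frames = new List<string>();
        ChatSendHandle handle;
        try
        {
            // Phase 1: nothing has been written yet, so a failure can still pick a status.
            handle = await chat.PrepareAsync(request, ct);
        }
        catch (ArgumentException)
        {
            return (422, frames);
        }

        // Phase 2: the status is committed; the conversation frame goes out first.
        frames.Add($"conversation:{handle.ConversationId}");
        await foreach (var update in chat.StreamAsync(handle, ct))
        {
            frames.Add($"delta:{update.Text}");
        }
        return (200, frames);
    }
}
```

A 404 for a conversation the caller does not own would be mapped the same way, in the first phase, before any frame is written.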

sequenceDiagram
    participant C as Client
    participant E as Conversations endpoint
    participant S as IChatService
    participant L as FunctionInvokingChatClient
    C->>E: POST /conversations/messages
    E->>S: PrepareAsync(request)
    S-->>E: handle (ConversationId known)
    E-->>C: SSE: conversation frame (headers commit in ms)
    E->>S: StreamAsync(handle)
    S->>S: persist user message
    loop agentic loop
        S->>L: GetStreamingResponseAsync
        L-->>S: delta / tool_call / tool_result
        S-->>C: SSE frames
    end
    S->>S: persist assistant message
    S-->>C: SSE: persisted, usage, [suggestions]

StreamAsync persists the user turn before the loop runs (so a mid-stream crash never loses the question) and appends the assistant turn once the loop settles. Usage is stamped even if the client disconnects mid-stream.

StreamAsync yields ChatTurnUpdate records, each tagged by ChatTurnUpdateKind:

| Kind | Carries | Meaning |
| --- | --- | --- |
| Delta | Delta | One slice of assistant text |
| ToolCall | ToolName, ToolCallId | A tool started |
| ToolResult | ToolName, ToolCallId, Succeeded | A tool finished |
| Completed | Result (ChatSendResult) | Terminal: final answer, usage, suggestions, clarification |

The terminal ChatSendResult carries the final Content, InputTokens / OutputTokens, the persisted message rows, any SuggestedActions, an optional Clarification, and MaxIterationsReached.
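A consumer typically switches on the update kind. The sketch below folds a turn into its streamed text, a tool log, and the terminal result. The record and enum shapes are hypothetical stand-ins that mirror the members listed above (the real types may differ), and a plain IEnumerable replaces the IAsyncEnumerable to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-ins mirroring the kinds and members described above.
public enum ChatTurnUpdateKind { Delta, ToolCall, ToolResult, Completed }

public sealed record ChatSendResult(string Content, int InputTokens, int OutputTokens);

public sealed record ChatTurnUpdate(
    ChatTurnUpdateKind Kind,
    string? Delta = null,
    string? ToolName = null,
    string? ToolCallId = null,
    bool? Succeeded = null,
    ChatSendResult? Result = null);

public static class TurnConsumer
{
    // Folds a turn's updates into the streamed text, a tool log, and the terminal result.
    public static (string Text, List<string> ToolLog, ChatSendResult? Result) Fold(
        IEnumerable<ChatTurnUpdate> updates)
    {
        var text = new StringBuilder();
        var toolLog = new List<string>();
        ChatSendResult? result = null;

        foreach (var update in updates)
        {
            switch (update.Kind)
            {
                case ChatTurnUpdateKind.Delta:
                    text.Append(update.Delta);
                    break;
                case ChatTurnUpdateKind.ToolCall:
                    toolLog.Add($"{update.ToolName} started");
                    break;
                case ChatTurnUpdateKind.ToolResult:
                    var outcome = update.Succeeded == true ? "succeeded" : "failed";
                    toolLog.Add($"{update.ToolName} {outcome}");
                    break;
                case ChatTurnUpdateKind.Completed:
                    result = update.Result; // terminal: nothing follows this update
                    break;
            }
        }
        return (text.ToString(), toolLog, result);
    }
}
```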

The endpoint projects those updates onto Server-Sent Events. The conversation frame is flushed first — its id comes from the handle — so time-to-first-byte is milliseconds, not the full agent duration. Tool frames carry only the tool name and call id, never arguments or raw results (privacy); the client maps the name to a localized label.

| Frame | Payload | When |
| --- | --- | --- |
| conversation | conversationId | first, immediately |
| delta | content | per assistant token |
| tool_call | toolName, toolCallId | a tool started |
| tool_result | toolName, toolCallId, succeeded | a tool finished |
| persisted | messages | terminal: the turn’s two saved rows (user, then assistant) |
| usage | inputTokens, outputTokens | terminal |
| suggestions / clarification | | terminal |
| error | code (rate_limit, provider_unavailable, server_error) | terminal, on failure |
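On the wire, each frame is one Server-Sent Events message. The helper below is a sketch of how such a frame could be serialized. It assumes the frame name travels in the SSE event field and the payload is camelCase JSON in the data field; the text above does not confirm either choice, and SseFrame is a hypothetical name.

```csharp
using System.Text.Json;

public static class SseFrame
{
    // Web defaults give camelCase property names, matching the payload names above.
    private static readonly JsonSerializerOptions WebJson =
        new(JsonSerializerDefaults.Web);

    // Serializes one frame in text/event-stream format: an "event:" line naming the
    // frame, a "data:" line with the JSON payload, then a blank line as terminator.
    public static string Format<T>(string frame, T payload)
    {
        var json = JsonSerializer.Serialize(payload, WebJson);
        return $"event: {frame}\ndata: {json}\n\n";
    }
}
```

For example, SseFrame.Format("delta", new { Content = "Hi" }) produces an event line of `event: delta` followed by a data line of `data: {"content":"Hi"}`.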

Tool frames are de-duplicated by call id (streamed function-call fragments arrive in pieces). No thinking frame is sent — the client derives a “Thinking…” indicator from a tool_result not yet followed by a delta. Unknown frames are ignored by older clients, so the protocol extends without breaking them.
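Both rules are small state machines. The sketch below shows one possible shape for each: a server-side filter that lets a tool frame through once per call id, and a client-side indicator derived from the frame sequence. The class names are hypothetical, and clearing the indicator on persisted and error frames is an assumption beyond what the text states.

```csharp
using System;
using System.Collections.Generic;

// Tracks which tool frames have been sent for one turn, so that fragments of a
// streamed function call that share a call id produce a single frame.
public sealed class ToolFrameDeduplicator
{
    private readonly HashSet<(string Frame, string CallId)> _seen = new();

    // Returns true the first time a (frame, call id) pair is offered, false afterwards.
    public bool ShouldEmit(string frame, string toolCallId) =>
        _seen.Add((frame, toolCallId));
}

// Client-side: "thinking" means a tool_result has arrived with no delta after it.
public sealed class ThinkingIndicator
{
    public bool IsThinking { get; private set; }

    public void OnFrame(string frame)
    {
        switch (frame)
        {
            case "tool_result":
                IsThinking = true;   // a tool finished; the model is working on what comes next
                break;
            case "delta":
            case "persisted":        // assumption: terminal frames also clear the indicator
            case "error":
                IsThinking = false;
                break;
            // Any other frame, including unknown ones, leaves the state unchanged.
        }
    }
}
```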

MapGranitConversations() maps the full surface under /conversations (configurable via AIChatEndpointsOptions.RoutePrefix). Every route is owner-scoped — a caller only ever sees their own conversations.

MethodPathPermissionPurpose
GET/AIChat.Conversations.ReadList the caller’s conversations (newest first)
POST/AIChat.Conversations.ManageCreate a conversation
GET/{id}AIChat.Conversations.ReadConversation metadata (messages paged separately)
PUT/{id}/titleAIChat.Conversations.ManageRename
PUT/{id}/favoriteAIChat.Conversations.ManageToggle favorite (idempotent)
DELETE/{id}AIChat.Conversations.DeleteSoft-delete
GET/{id}/messagesAIChat.Conversations.ReadKeyset-paginated message page (newest first)
POST/messagesAIChat.Conversations.SendSend a message, stream the answer over SSE
POST/messages/{messageId}/reportAIChat.Conversations.ReportFlag a message for review
GET/workspacesAIChat.Conversations.ReadList selectable chat workspaces

The send endpoint is bound to the rate-limit policy ai-chat-send. The module wires the policy but enforces nothing until the host configures it, so set a per-user limit to cap model spend and prevent “denial of wallet”:

"RateLimiting": {
  "Policies": {
    "ai-chat-send": { "PermitLimit": 20, "Window": "00:01:00", "PartitionBy": "User" }
  }
}
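To make the semantics of that policy concrete, here is a minimal fixed-window counter partitioned by user. It is an illustration of what 20 permits per minute per user means, not the framework's rate limiter; FixedWindowPerUser is a hypothetical class, and starting each window at the user's first request is an assumption. The sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: a fixed-window counter with one partition per user id.
public sealed class FixedWindowPerUser(int permitLimit, TimeSpan window)
{
    private readonly Dictionary<string, (DateTimeOffset WindowStart, int Used)> _state = new();

    // Returns true if the user still has a permit left in the current window.
    public bool TryAcquire(string userId, DateTimeOffset now)
    {
        if (!_state.TryGetValue(userId, out var entry) || now - entry.WindowStart >= window)
        {
            entry = (now, 0); // first request, or the previous window has elapsed
        }

        if (entry.Used >= permitLimit)
        {
            return false; // over the limit: the endpoint would answer with a rate-limit error
        }

        _state[userId] = (entry.WindowStart, entry.Used + 1);
        return true;
    }
}
```

With the values from the configuration above, a user's 21st send inside one minute is rejected while other users are unaffected.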

SendMessageRequest is validated before it reaches the service: message non-empty and ≤ 16 000 chars, ≤ 25 mentions, ≤ 5 prompt refs, and attachment count/type/size within the configured attachment limits.
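The fixed limits above can be expressed as a small validator. This is a sketch under assumptions: the record below is a hypothetical shape holding only the fields those limits refer to, whitespace-only messages are treated as empty, and the attachment checks are left out because they depend on the configured attachment options.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape: only the fields the stated limits refer to.
public sealed record SendMessageRequest(
    string Message,
    IReadOnlyList<string> Mentions,
    IReadOnlyList<string> PromptRefs);

public static class SendMessageRequestValidator
{
    public const int MaxMessageLength = 16_000;
    public const int MaxMentions = 25;
    public const int MaxPromptRefs = 5;

    // Returns one error per violated rule; an empty list means the request is valid.
    public static IReadOnlyList<string> Validate(SendMessageRequest request)
    {
        var errors = new List<string>();

        if (string.IsNullOrWhiteSpace(request.Message))
            errors.Add("Message must not be empty.");
        else if (request.Message.Length > MaxMessageLength)
            errors.Add($"Message must be at most {MaxMessageLength} characters.");

        if (request.Mentions.Count > MaxMentions)
            errors.Add($"At most {MaxMentions} mentions are allowed.");

        if (request.PromptRefs.Count > MaxPromptRefs)
            errors.Add($"At most {MaxPromptRefs} prompt references are allowed.");

        return errors;
    }
}
```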

Three aggregates, all multi-tenant and owner-stamped. Messages are append-only — there is no edit or in-place delete; a conversation is an immutable transcript.

classDiagram
    class Conversation {
        +Guid Id
        +string Title
        +Guid OwnerId
        +bool IsFavorite
        +string? WorkspaceKey
        +AddMessage(role, content)
        +Rename(title)
        +SetFavorite(flag)
    }
    class Message {
        +Guid Id
        +MessageRole Role
        +string Content
        +string? WorkspaceKey
        +DateTimeOffset CreatedAt
    }
    class MessageReport {
        +Guid Id
        +Guid MessageId
        +Guid OwnerId
        +string Reason
        +MessageReportCategory? Category
    }
    Conversation "1" --> "*" Message : append-only
    Message "1" --> "*" MessageReport : flagged by owner

MessageRole is User, Assistant, System, or Tool. MessageReportCategory is Inaccurate, Harmful, or Other, and creating a report raises a ChatMessageReportedEvent carrying only identifiers and the user’s reason — never the message content or any tool data.

The EF Core satellite registers an isolated AIChatDbContext and maps three tables (ai_chat_conversations, ai_chat_messages, ai_chat_message_reports, prefix configurable via GranitAIChatDbProperties). Conversations are indexed by (TenantId, OwnerId, CreatedAt) for the “my conversations, newest first” query; messages by (ConversationId, CreatedAt) for thread pagination. Two store seams sit on top: IConversationStore (owner-scoped CRUD) and IConversationDataManager (privacy/retention operations that deliberately cross the owner boundary).
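The (ConversationId, CreatedAt) index is what makes keyset pagination cheap: each page is a range scan that starts at the cursor rather than an offset that skips rows. A minimal sketch of the query shape follows, with a hypothetical MessageRow and a cursor on CreatedAt alone. That assumes timestamps are unique within a conversation; a production query would usually add the id as a tie-breaker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; the real entity has more columns.
public sealed record MessageRow(Guid Id, Guid ConversationId, DateTimeOffset CreatedAt, string Content);

public static class MessagePaging
{
    // Returns one page, newest first, containing only rows strictly older than the cursor.
    // Pass null as the cursor to get the first page.
    public static List<MessageRow> NextPage(
        IQueryable<MessageRow> messages,
        Guid conversationId,
        DateTimeOffset? before,
        int pageSize)
    {
        var query = messages.Where(m => m.ConversationId == conversationId);

        if (before is { } cursor)
        {
            // Keyset predicate: seek past the last row of the previous page.
            query = query.Where(m => m.CreatedAt < cursor);
        }

        return query
            .OrderByDescending(m => m.CreatedAt)
            .Take(pageSize)
            .ToList();
    }
}
```

The caller passes the CreatedAt of the last row on the current page as the cursor for the next one.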

The engine is generic; you make it useful by plugging in your application’s context. Four seams, all optional.

Implement IAIAttachmentSource to resolve an opaque reference into bytes; the framework extracts the text and injects it as an untrusted context block. Granit.AI.Chat.BlobStorage ships a ready-made source over Granit.BlobStorage — register it with AddBlobStorageChatAttachments. To wire your own:

public sealed class InvoiceAttachmentSource(IInvoiceStore store) : IAIAttachmentSource
{
    public async Task<AIAttachmentData?> GetAsync(
        string reference, CancellationToken cancellationToken = default)
    {
        var invoice = await store.FindPdfAsync(reference, cancellationToken)
            .ConfigureAwait(false);
        return invoice is null
            ? null // absent or not accessible under the caller's ACLs
            : new AIAttachmentData(invoice.Bytes, "application/pdf", invoice.FileName);
    }
}

builder.Services.AddGranitChatAttachments<InvoiceAttachmentSource>(options =>
    options.AllowedContentTypes.Add("application/pdf"));

GranitAIChatAttachmentOptions (config section AI:Chat:Attachments) defaults to 5 attachments, 10 MiB each, and a permissive set of text/Office/PDF MIME types.

When a user types @Dr Smith, the client sends an AIMention(Type, Id). Implement IAIMentionContextResolver to turn resolved mentions into a context block — and to enforce ACLs by returning null for references the caller may not see:

public sealed class DoctorMentionResolver(IDoctorReader doctors) : IAIMentionContextResolver
{
    public async ValueTask<string?> ResolveContextAsync(
        IReadOnlyList<AIMention> mentions, CancellationToken cancellationToken = default)
    {
        var doctor = mentions.FirstOrDefault(m => m.Type == "doctor");
        if (doctor is null) return null;

        // The id comes from the client, so a malformed value resolves to nothing
        // instead of throwing a FormatException.
        if (!Guid.TryParse(doctor.Id, out var doctorId)) return null;

        var profile = await doctors.GetAsync(doctorId, cancellationToken)
            .ConfigureAwait(false);
        return profile is null ? null : $"Mentioned doctor: {profile.FullName}, {profile.Specialty}.";
    }
}

A provider returns declarative calls to action: deep links the UI renders as chips. They are never auto-invoked; the user decides.

public sealed class ConnectCalendarSuggestion : IAISuggestionProvider
{
    public ValueTask<IReadOnlyList<AISuggestedAction>> GetSuggestionsAsync(
        AISuggestionContext context, CancellationToken cancellationToken = default)
    {
        IReadOnlyList<AISuggestedAction> actions = context.Message.Contains("appointment")
            ? [new AISuggestedAction("calendar.connect", "Connect your calendar", "/settings/calendar")]
            : [];
        return ValueTask.FromResult(actions);
    }
}

builder.Services.AddGranitChatSuggestions(b => b.Add<ConnectCalendarSuggestion>());

The engine registers a built-in request_clarification tool. When the user’s request is ambiguous, the model calls the tool, which halts the loop and surfaces an AIClarificationRequest (a Question, discrete Options, and an AllowOther flag) instead of a final answer. The client renders the choices; the user’s pick becomes the next turn, and the loop resumes normally. This needs no code on your side, since the tool ships in the core.

Three user-scoped settings (prefix Granit.AI.Chat., auto-discovered by the settings module) shape every turn:

| Setting | Values | Effect |
| --- | --- | --- |
| DefaultWorkspace | a workspace name, or Auto | Which workspace a new conversation uses |
| WebSearchPolicy | Deny · Allow · AlwaysAsk | Whether the agent may search the web (default Deny) |
| CustomContext | free text, ≤ 4000 chars | Persistent user context injected each turn |

IChatWorkspaceCatalog.GetSelectableWorkspacesAsync returns Auto first, then every chat-capable workspace — the same list the /workspaces endpoint serves.

Granit.AI.Chat.Privacy plugs into the framework’s GDPR pipeline via AddGranitAIChatPrivacyProvider(). It exports a user’s conversations, messages, and report reasons as an ai-chat-conversations.json fragment (Articles 15/20) and hard-deletes all of them on an erasure request (Article 17) — soft delete is forbidden here, since a recoverable transcript is not erased. Attachment content is never exported; it is transient and owned by blob storage.

Granit.AI.Chat.BackgroundJobs adds a distributed retention job. It is opt-in: set AI:Chat:Retention:RetentionDays (default 0 disables cleanup) and conversations idle past the window are purged in batches of CleanupBatchSize (default 500).
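The batching logic can be sketched against an in-memory list. This is an illustration of the stated behavior (0 disables cleanup, purge in batches), not the shipped job: ConversationRow, its LastActivityAt field, and RetentionCleanup are hypothetical, and measuring idleness by last activity is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row: "idle" is measured from LastActivityAt in this sketch.
public sealed record ConversationRow(Guid Id, DateTimeOffset LastActivityAt);

public static class RetentionCleanup
{
    // Purges conversations idle past the retention window, one batch at a time.
    // Returns the number of conversations removed.
    public static int Run(
        List<ConversationRow> store, DateTimeOffset now, int retentionDays, int batchSize = 500)
    {
        if (retentionDays <= 0) return 0; // 0 disables cleanup

        var cutoff = now.AddDays(-retentionDays);
        var purged = 0;

        while (true)
        {
            // Select a bounded batch so each delete stays small and short-lived.
            var batch = store
                .Where(c => c.LastActivityAt < cutoff)
                .Take(batchSize)
                .Select(c => c.Id)
                .ToHashSet();

            if (batch.Count == 0) return purged;

            purged += store.RemoveAll(c => batch.Contains(c.Id));
        }
    }
}
```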

The core emits OpenTelemetry under the Granit.AI.Chat activity source and meter:

| Metric | Meaning |
| --- | --- |
| granit.ai.chat.conversation.created | New conversations |
| granit.ai.chat.turn.completed | Completed turns |
| granit.ai.chat.tokens.input / .output | Token usage per turn |

All are tagged with tenant_id (coalesced to global) and workspace_key.
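For a quick look at these measurements without an exporter, the base class library's MeterListener can subscribe to the meter by name. The probe below is a sketch: ChatMetricsProbe is a hypothetical helper, and it assumes the instruments report long values, which the text above does not state. A production host would more likely register the meter with its OpenTelemetry metrics pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Collects measurements published by the "Granit.AI.Chat" meter, in process.
public sealed class ChatMetricsProbe : IDisposable
{
    private readonly MeterListener _listener = new();

    public List<(string Instrument, long Value, string? TenantId)> Seen { get; } = new();

    public ChatMetricsProbe()
    {
        // Subscribe only to instruments that belong to the chat meter.
        _listener.InstrumentPublished = (instrument, listener) =>
        {
            if (instrument.Meter.Name == "Granit.AI.Chat")
                listener.EnableMeasurementEvents(instrument);
        };

        // Assumption: the instruments record long values.
        _listener.SetMeasurementEventCallback<long>((instrument, value, tags, _) =>
        {
            string? tenant = null;
            foreach (var tag in tags)
            {
                if (tag.Key == "tenant_id") tenant = tag.Value?.ToString();
            }
            Seen.Add((instrument.Name, value, tenant));
        });

        _listener.Start();
    }

    public void Dispose() => _listener.Dispose();
}
```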