ADR-071: Agentic chat streaming via FunctionInvokingChatClient
Date: 2026-06-18
Authors: Jean-Francois Meyers
Scope: Granit.AI.Tools (orchestrator), Granit.AI.Chat (chat service), Granit.AI.Chat.Endpoints (SSE endpoint). New dependency Microsoft.Extensions.AI (MIT). React surface → granit-front (handoff).
Context
The agentic chat endpoint (POST {basePath}/conversations/messages, ADR-067) streamed its
answer over Server-Sent Events, but the pipeline was buffered end to end: the endpoint
await-ed the whole agentic loop before returning the ServerSentEvents result, so the SSE
response headers only committed once the entire loop had finished. Time-to-first-byte therefore
equalled the full agent duration, and the front (axios, 10s) timed out on any non-trivial turn.
The StreamAsync iterator merely replayed the already-computed answer word-by-word — cosmetic
streaming.
Two things were wrong:
- No early flush. Nothing was written to the response until the loop settled.
- No real streaming. The orchestrator (AIToolOrchestrator) hand-rolled the think → call → execute → repeat loop on IChatClient.GetResponseAsync (buffered), not GetStreamingResponseAsync.
Decision
1. Adopt FunctionInvokingChatClient (Microsoft.Extensions.AI)
Replace the hand-rolled loop with M.E.AI’s built-in agentic loop: the idiomatic, long-term .NET
primitive. The orchestrator wraps the workspace IChatClient in a per-run
FunctionInvokingChatClient and drives it with GetStreamingResponseAsync. Granit-specific
concerns ride the client’s extension points rather than a custom loop:
- Per-tool authorization: only authorized tools are declared in ChatOptions.Tools.
- Truncation, metrics, audit, and the clarification interrupt: the FunctionInvoker hook resolves the IAITool, guards the result size, records metrics/outcomes, and turns a tool’s AIToolResult.Interrupt into a graceful loop stop via FunctionInvocationContext.Terminate.
- Iteration accounting: a thin counting DelegatingChatClient reports the number of model round-trips and whether the last one still wanted tools (so Iterations/MaxIterationsReached stay exact without mining synthesized streamed updates).
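The iteration-accounting idea can be sketched independently of .NET. The TypeScript below is a minimal illustration only, not the Microsoft.Extensions.AI API: `ModelClient`, `ModelReply`, `RoundTripStats`, and `withRoundTripCounter` are hypothetical names, and the model call is simplified to a synchronous function. It shows a delegating wrapper that leaves the inner client's behaviour untouched and merely observes each round-trip.

```typescript
// Hypothetical shapes for illustration only; these are not Microsoft.Extensions.AI types.
interface ModelReply {
  text: string;
  toolCalls: string[]; // names of the tools the model asked for in this round-trip
}

type ModelClient = (prompt: string) => ModelReply;

interface RoundTripStats {
  iterations: number;       // number of model round-trips observed so far
  lastWantedTools: boolean; // did the most recent reply still request tools?
}

// Wraps a client without changing what it returns; it only counts and inspects replies.
function withRoundTripCounter(inner: ModelClient): { client: ModelClient; stats: RoundTripStats } {
  const stats: RoundTripStats = { iterations: 0, lastWantedTools: false };
  const client: ModelClient = (prompt) => {
    const reply = inner(prompt);
    stats.iterations += 1;
    stats.lastWantedTools = reply.toolCalls.length > 0;
    return reply;
  };
  return { client, stats };
}
```

Because the counter sits around the model call itself, the numbers come from real round-trips rather than from inspecting the updates the loop synthesizes while streaming.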
IAIToolOrchestrator.RunStreamingAsync becomes the primitive (yields Delta / ToolCall /
ToolResult / Completed); the buffered RunAsync is a thin drain over it.
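The "buffered is a drain over streaming" relationship can be sketched as a fold. This is an illustrative TypeScript model, not the Granit implementation: the event kinds mirror the names above, while the field names, `RunResult`, `applyEvent`, and `runBuffered` are assumptions made for the example.

```typescript
// Event kinds mirror the text (Delta / ToolCall / ToolResult / Completed); fields are assumed.
type RunEvent =
  | { kind: "delta"; text: string }
  | { kind: "toolCall"; toolName: string; toolCallId: string }
  | { kind: "toolResult"; toolName: string; toolCallId: string; succeeded: boolean }
  | { kind: "completed"; iterations: number; maxIterationsReached: boolean };

interface RunResult {
  answer: string;
  toolCalls: number;
  iterations: number;
  maxIterationsReached: boolean;
}

const emptyResult: RunResult = { answer: "", toolCalls: 0, iterations: 0, maxIterationsReached: false };

// Pure fold step: how a single streamed event contributes to the buffered result.
function applyEvent(result: RunResult, event: RunEvent): RunResult {
  switch (event.kind) {
    case "delta":
      return { ...result, answer: result.answer + event.text };
    case "toolCall":
      return { ...result, toolCalls: result.toolCalls + 1 };
    case "toolResult":
      return result; // nothing to accumulate in this sketch
    case "completed":
      return { ...result, iterations: event.iterations, maxIterationsReached: event.maxIterationsReached };
  }
}

// The buffered entry point simply consumes the streaming primitive to the end.
async function runBuffered(stream: AsyncIterable<RunEvent>): Promise<RunResult> {
  let result = emptyResult;
  for await (const event of stream) {
    result = applyEvent(result, event);
  }
  return result;
}
```

Keeping a single streaming code path means the buffered and streamed variants cannot drift apart; the buffered one only differs in when the caller sees the result.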
2. Two-phase chat service (validate-then-stream)
IChatService.SendAsync is replaced by PrepareAsync + StreamAsync. PrepareAsync runs all
fast validation (workspace chat-capability, conversation ownership, mention/attachment/prompt
resolution) and returns a handle whose ConversationId is known up front — so the endpoint can
map failures to a clean 404/422 before the SSE stream opens. StreamAsync then streams the
loop and persists.
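The validate-then-stream split can be sketched as follows. This is a hedged TypeScript illustration, not the service's code: `SendRequest`, `PrepareOutcome`, `prepare`, and `firstFrame` are hypothetical, the choice of which failure maps to 404 versus 422 is an assumption, and the `event:`/`data:` framing is the standard SSE text format rather than something this ADR spells out.

```typescript
// Hypothetical request shape; the real PrepareAsync inputs are not shown in this ADR.
interface SendRequest {
  workspaceHasChat: boolean;
  conversationOwnedByCaller: boolean;
  prompt: string;
}

type PrepareOutcome =
  | { ok: true; handle: { conversationId: string; prompt: string } }
  | { ok: false; status: 404 | 422; reason: string };

// Phase 1: fast validation that can still fail with a plain HTTP status code.
// Which check yields 404 and which yields 422 is assumed here for illustration.
function prepare(request: SendRequest, conversationId: string): PrepareOutcome {
  if (!request.workspaceHasChat || !request.conversationOwnedByCaller) {
    return { ok: false, status: 404, reason: "conversation or chat workspace not found" };
  }
  if (request.prompt.trim() === "") {
    return { ok: false, status: 422, reason: "prompt is empty" };
  }
  return { ok: true, handle: { conversationId, prompt: request.prompt } };
}

// Phase 2 starts only after phase 1 succeeded, so the first frame can be written at once.
function firstFrame(outcome: PrepareOutcome): string {
  if (!outcome.ok) {
    return `HTTP ${outcome.status}: ${outcome.reason}`;
  }
  const payload = JSON.stringify({ conversationId: outcome.handle.conversationId });
  return `event: conversation\ndata: ${payload}\n\n`;
}
```

The point of the split is ordering: every failure that deserves a status code is decided before any byte of the stream is committed.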
3. Immediate conversation frame + extended SSE protocol
The endpoint flushes the conversation frame first (its ConversationId is known from the
handle), committing the response headers in milliseconds. The wire contract is extended with two
live tool-status frames (front-rendered as chips, à la Claude.ai/ChatGPT):
| Frame | Payload | Meaning |
|---|---|---|
| conversation | conversationId | flushed first |
| delta | content | assistant text token |
| tool_call | toolName, toolCallId | a tool started |
| tool_result | toolName, toolCallId, succeeded | a tool finished |
| persisted | messages | terminal (the turn’s saved rows) |
| usage | inputTokens, outputTokens | terminal |
| suggestions / clarification | … | terminal |
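On the consuming side, a frame from this table could be read as sketched below. This is an assumption-laden TypeScript example: it presumes each frame is a standard SSE block whose `event:` field carries the frame name and whose `data:` field carries the JSON payload, which the table does not state. `RawFrame`, `parseSseBlock`, and `isKnownFrame` are hypothetical helpers.

```typescript
interface RawFrame {
  event: string;  // frame name, e.g. "delta" or "tool_call"
  data: unknown;  // parsed JSON payload; callers validate the fields they need
}

// Frame names taken from the table, plus "error", which the text mentions elsewhere.
const KNOWN_EVENTS = new Set<string>([
  "conversation", "delta", "tool_call", "tool_result",
  "persisted", "usage", "suggestions", "clarification", "error",
]);

// Parses one SSE block (the lines between two blank lines) per the standard text format.
function parseSseBlock(block: string): RawFrame | null {
  let event = "message"; // SSE default when no event field is present
  const dataLines: string[] = [];
  for (const line of block.split(/\r?\n/)) {
    if (line === "" || line.startsWith(":")) continue; // blank line or comment
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  if (dataLines.length === 0) return null;
  return { event, data: JSON.parse(dataLines.join("\n")) };
}

// Clients skip frames they do not recognise instead of failing on them.
function isKnownFrame(frame: RawFrame): boolean {
  return KNOWN_EVENTS.has(frame.event);
}
```

Skipping unrecognised frame names is what lets new frames be added to the contract without breaking clients that predate them.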
Tool frames carry only the tool name and call id (plus the succeeded flag on tool_result), never arguments or raw results (privacy);
the front maps the name to a localized label. Tool frames are de-duplicated by call id (streamed
FunctionCallContent arrives fragmented). No thinking frame is emitted: the front derives the
“Thinking…” indicator from a tool_result not yet followed by a delta, keeping the wire minimal.
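The two rules in this paragraph, de-duplication by call id and the derived thinking state, can be expressed as one pure reducer. The TypeScript below is a sketch under assumptions: the ADR does not say on which side of the wire de-duplication runs, so the rule is shown in isolation, and `UiFrame`, `ToolChip`, `TurnView`, `initialView`, and `reduceFrame` are hypothetical names.

```typescript
type UiFrame =
  | { event: "delta"; content: string }
  | { event: "tool_call"; toolName: string; toolCallId: string }
  | { event: "tool_result"; toolName: string; toolCallId: string; succeeded: boolean };

interface ToolChip {
  toolName: string;
  status: "running" | "succeeded" | "failed";
}

interface TurnView {
  text: string;
  chips: Map<string, ToolChip>; // keyed by toolCallId, so repeated frames collapse
  thinking: boolean;            // true after a tool_result until the next delta
}

function initialView(): TurnView {
  return { text: "", chips: new Map(), thinking: false };
}

function reduceFrame(view: TurnView, frame: UiFrame): TurnView {
  switch (frame.event) {
    case "tool_call": {
      if (view.chips.has(frame.toolCallId)) return view; // duplicate call id: ignore
      const chips = new Map(view.chips);
      chips.set(frame.toolCallId, { toolName: frame.toolName, status: "running" });
      return { ...view, chips };
    }
    case "tool_result": {
      const chips = new Map(view.chips);
      chips.set(frame.toolCallId, {
        toolName: frame.toolName,
        status: frame.succeeded ? "succeeded" : "failed",
      });
      return { ...view, chips, thinking: true }; // tool finished, no text yet
    }
    case "delta":
      return { ...view, text: view.text + frame.content, thinking: false };
  }
}
```

Deriving the indicator from frames that already exist is what allows the protocol to go without a dedicated thinking frame.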
The persisted frame carries the turn’s two newly-saved message rows — the user message then the
assistant message, oldest-first — as the same messages (id, role, content, createdAt)
projection GET {basePath}/conversations/{id} returns. Emitted once on a successful turn, just
before usage, it lets the client render the real rows (server ids + timestamps) and report the
just-streamed assistant message immediately, instead of optimistically appending client-synthesised
ids. It is omitted on a clarification turn (no answer to append) and on the error frame; when
absent, the client keeps its optimistic, client-id append. Older clients ignore the unknown frame, so
the addition is backward compatible.
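A client could reconcile its optimistic rows with the persisted frame roughly as follows. This TypeScript is a minimal sketch, not granit-front code: `MessageRow` follows the (id, role, content, createdAt) projection named above, while `reconcile` and the idea of tracking optimistic ids in a list are assumptions for the example.

```typescript
interface MessageRow {
  id: string;
  role: string;
  content: string;
  createdAt: string;
}

// Swaps client-synthesised rows for the server rows when a persisted frame arrived.
// When the frame is absent (clarification or error turn), the thread is kept as-is.
function reconcile(
  thread: MessageRow[],
  optimisticIds: string[],
  persisted: MessageRow[] | undefined,
): MessageRow[] {
  if (persisted === undefined) return thread;
  const optimistic = new Set(optimisticIds);
  const confirmed = thread.filter((row) => !optimistic.has(row.id));
  return [...confirmed, ...persisted]; // persisted rows arrive oldest-first
}
```

After reconciliation every row in the thread carries a server id and timestamp, so follow-up actions on the assistant message can target a real row.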
4. Persist the user turn before the stream
StreamAsync persists the user message (and creates the conversation) before the loop runs,
and appends the assistant turn at the end. Usage is stamped with a 5-second guard token so it
records even if the client disconnects mid-stream, without leaking an orphaned write.
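One reading of the guard token is a cancellation signal with its own time budget that is deliberately not linked to the request's cancellation. The TypeScript sketch below illustrates that reading only; it is an assumption about the mechanism, not the service's code, and `createGuard`, `recordUsage`, and the `write` callback are hypothetical.

```typescript
// A guard signal aborts only after its own budget elapses. It is NOT derived from the
// request's signal, so a client disconnect cannot cancel the write it protects.
function createGuard(budgetMs: number): { signal: AbortSignal; dispose: () => void } {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), budgetMs);
  return { signal: controller.signal, dispose: () => clearTimeout(timer) };
}

// Records usage under a 5-second budget. The write is bounded, so it cannot linger
// indefinitely after the request that triggered it has gone away.
async function recordUsage(
  write: (signal: AbortSignal) => Promise<void>, // hypothetical persistence call
): Promise<void> {
  const guard = createGuard(5000);
  try {
    await write(guard.signal);
  } finally {
    guard.dispose(); // always release the timer, whether the write succeeded or not
  }
}
```

The trade-off is a small, bounded amount of work after disconnect in exchange for usage figures that do not depend on the client staying connected.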
Consequences
- Time-to-first-byte drops to milliseconds; the answer streams token-by-token. The front keeps timeout: 0 (gaps between tokens remain legitimate).
- Wire contract change: granit-front (@granit/ai-chat, @granit/ai) must render the new tool_call/tool_result frames and derive the thinking state. Handled as a separate front PR.
- New dependency Microsoft.Extensions.AI 10.7.0 (MIT), the middleware package, separate from the abstractions already referenced.
- Behaviour shifts vs. the hand-rolled loop, by design: an unknown tool call is handled by the function-invoking client itself (error result fed back, loop continues) rather than the Granit invoker; the iteration cap (MaxIterations → MaximumIterationsPerRequest) allows one final round-trip that still wants tools, which is the signal used to report MaxIterationsReached.
- Residual data-loss window (documented, accepted): the user turn is persisted before the stream, but the assistant turn is persisted only after it settles. A server crash between stream-end and persistence loses the assistant turn (regenerable); the user turn and conversation survive. Incremental assistant persistence is a possible future refinement.
Alternatives considered
- Keep the hand-rolled loop, add streaming. Lower risk, but perpetuates a bespoke agentic loop that duplicates FunctionInvokingChatClient. Rejected in favour of the standard primitive.
- Text-only streaming (no tool frames). Would not change the wire contract, but loses the live tool-status UX. Rejected: the product wants the Claude.ai-style experience.
- Migrate to a pipeline-wide UseFunctionInvocation. A shared pipeline can’t hold per-run capture state (interrupt, outcomes); the orchestrator wraps the client locally instead, leaving the non-agentic /ai/workspaces chat path untouched.
References
- ADR-067: Agentic chat, tool registry, prompt catalog
- ADR-067 §9, Reading a message thread (backwards keyset): thread history loads newest-first via the backwards-keyset messages endpoint; this ADR covers only the send/stream path.
- FunctionInvokingChatClient