Indexing — Background reindex with checkpoint resume

Granit.Indexing.BackgroundJobs ships RebuildIndexJob<TKey>, an on-demand job that iterates every key emitted by IIndexedEntrySource<TKey>.EnumerateKeysAsync and calls IIndexer<TKey>.IndexAsync once per entry. The default indexing pipeline keeps the index in sync via lifecycle events; reach for the rebuild job when:

A new tokenizer / analyzer is deployed and the corpus needs re-indexing.
A new tenant is onboarded and existing rows must be back-filled.
A new embedding model is wired and every vector needs regenerating.
An operator manually triggers a full rebuild after a data-quality incident.

Registration

builder.Services.AddGranitIndexing();
builder.Services.AddGranitBackgroundJobs();
builder.Services.AddGranitIndexingBackgroundJobs();

// One per TKey you want to rebuild:
builder.Services.AddGranitIndexingRebuildSource<Guid, MyDocumentSource>();

// Persistent checkpoints (recommended in prod — survives worker restarts):
builder.Services.AddGranitIndexingEntityFrameworkCoreCheckpointStore<Guid>();

Without AddGranitIndexingEntityFrameworkCoreCheckpointStore<TKey>() the framework falls back to InMemoryRebuildCheckpointStore<TKey> — fine for dev / single-process, but state is lost on restart.

Triggering a rebuild — permission at the dispatch site

RebuildIndexJob is on-demand and carries no [RecurringJob] attribute. Hosts that want a recurring rebuild wire their own cron-driven trigger that emits the job.

public sealed class RebuildController(
    IBackgroundJobDispatcher dispatcher,
    IPermissionChecker permissionChecker,
    ICurrentTenant currentTenant)
{
    public async Task TriggerAsync(CancellationToken ct)
    {
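        // Authorize at the dispatch site, before anything is enqueued.
        if (!await permissionChecker.IsGrantedAsync(IndexingRebuildPermissions.Rebuild.Execute))
            throw new ForbiddenException();

        // Enqueue the on-demand rebuild for the current tenant.
        await dispatcher.PublishAsync(
            new RebuildIndexJob<Guid>(currentTenant.Id), cancellationToken: ct);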
    }
}

Resource budget

The rebuild iterates every key emitted by the source and calls IIndexer<TKey>.IndexAsync once per entry — which in hosts wiring Granit.Indexing.Embeddings triggers one embedding call per entry. To bound the blast radius of a runaway or hostile dispatch:

Option	Default	Purpose
`MaxEntriesPerRun`	`null` (unbounded)	Hard cap on entries processed in a single run. Set this in production.
`MaxRunDuration`	`null` (unbounded)	Wall-clock budget. When it is exceeded, the service saves a checkpoint and throws `RebuildBudgetExceededException`; Wolverine retries from the checkpoint.
`MaxConsecutiveFailures`	`50`	Circuit-breaker on per-key faults.
`CheckpointBatchSize`	`100`	Persist a checkpoint every N entries.

{
  "Indexing": {
    "BackgroundJobs": {
      "CheckpointBatchSize": 100,
      "MaxConsecutiveFailures": 50,
      "MaxEntriesPerRun": 1000000,
      "MaxRunDurationSeconds": 3600
    }
  }
}

When either budget (MaxEntriesPerRun or MaxRunDuration) is hit, the checkpoint is preserved and a typed RebuildBudgetExceededException is raised. Hosts wire this to a Wolverine retry policy that re-dispatches the job after a cool-down.
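
As a rough sketch of such a retry policy: Wolverine's error-handling API can schedule a delayed retry for a specific exception type. The delays are illustrative rather than Granit defaults. The `builder.Host.UseWolverine` call assumes the host configures Wolverine itself, which Granit may already do for you. The namespace that exports RebuildBudgetExceededException is not shown here, so add the matching using directive for your version.

```csharp
using System;
using Wolverine;
using Wolverine.ErrorHandling;

// Assumption: this runs wherever your host configures Wolverine,
// with `builder` being the same WebApplicationBuilder used for registration.
builder.Host.UseWolverine(opts =>
{
    // Budget hit: the checkpoint is already saved, so wait out a
    // cool-down and let the retried job resume from it.
    opts.Policies
        .OnException<RebuildBudgetExceededException>()
        .ScheduleRetry(
            TimeSpan.FromMinutes(5),    // illustrative delays, not Granit defaults
            TimeSpan.FromMinutes(15),
            TimeSpan.FromMinutes(60));
});
```

Increasing delays keep a corpus that repeatedly exhausts its budget from monopolising the worker, while each retry still picks up from the last persisted checkpoint.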

Crash recovery

The rebuild service persists a checkpoint after every CheckpointBatchSize entries (default 100). After a crash, the next dispatch reads the checkpoint and resumes past it: no duplicate indexing and no missing rows, provided your IIndexedEntrySource<TKey>.EnumerateKeysAsync honours the resumeAfter contract, that is, it returns only keys strictly past the checkpoint and yields them in the same order on every call.
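
The resumeAfter contract can be illustrated with a minimal in-memory sketch. The class name `SortedGuidKeySource` and the method signature are assumptions made for illustration; the real IIndexedEntrySource<TKey> interface may declare different parameters, so treat this as a picture of the ordering rule, not of the interface itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical source: only the resumeAfter behaviour is the point here.
public sealed class SortedGuidKeySource
{
    private readonly Guid[] _sortedKeys;

    public SortedGuidKeySource(IEnumerable<Guid> keys)
    {
        // A total, stable order is what makes a checkpoint meaningful:
        // the same input always yields the same sequence.
        _sortedKeys = keys.Distinct().OrderBy(k => k).ToArray();
    }

    // Assumed shape; the framework's actual signature may differ.
    public async IAsyncEnumerable<Guid> EnumerateKeysAsync(
        Guid? resumeAfter,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        foreach (Guid key in _sortedKeys)
        {
            ct.ThrowIfCancellationRequested();

            // Strictly past the checkpoint: the checkpointed key itself
            // is skipped, so it is never indexed twice.
            if (resumeAfter is Guid last && key.CompareTo(last) <= 0)
                continue;

            await Task.Yield(); // stand-in for real asynchronous I/O
            yield return key;
        }
    }
}
```

A database-backed source would typically express the same rule as a keyset query (filter on key greater than the checkpoint, ordered by key) instead of filtering in memory. An offset-based query would not satisfy the contract if rows are inserted or deleted between runs.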

A run that exceeds MaxConsecutiveFailures aborts and the checkpoint is preserved so a follow-up trigger resumes from where the corruption started — rather than re-scanning the whole corpus and re-hitting the same bad row.

Operational events

Every rebuild raises in-process domain events on ILocalEventBus so audit / SIEM subscribers can correlate dispatch-time identity with the run lifecycle:

Event	Raised at
`IndexRebuildStartedEvent`	Very start, before reading the first key
`IndexRebuildCompletedEvent`	Successful end-of-stream
`IndexRebuildAbortedEvent`	The run was cut short by `MaxConsecutiveFailures`, `MaxEntriesPerRun`, or `MaxRunDuration`, or it was cancelled. Carries the abort reason.

Each event carries TenantId, SourceName, TKey discriminator, processed count, and (when available) the dispatching UserId — wire it into Granit.Auditing for a GDPR-grade processing log.