Full-Text & Semantic Search — Postgres & Elasticsearch

Every product team builds the same search feature twice. First a quick LIKE '%term%' on a single table. Then — when the corpus grows — a hasty rewrite onto tsvector or Elasticsearch, with a re-indexing batch glued together over a long weekend. Both attempts share the same blind spots: no per-resource ACL filter past tenant scope, no language-aware analysis, no GDPR-grade erasure when a data subject revokes consent, no graceful path from lexical-only to hybrid semantic retrieval, and every endpoint silently re-implements query parsing — usually forwarding raw operator syntax straight into to_tsquery.

Granit.Indexing is the horizontal full-text + semantic search framework. One write port (IIndexer<TKey>), one read port (ISearchService<TKey, TResult>), pluggable backends behind both. The default backend is Postgres tsvector (Granit.Indexing.EntityFrameworkCore) — zero new infrastructure for a Granit host. Elasticsearch 9.x is opt-in (Granit.Indexing.Elasticsearch) for low-millions-of-rows corpora or per-language analyzer needs. Embeddings + hybrid Reciprocal Rank Fusion ship as a separate opt-in (Granit.Indexing.Embeddings), as do AI summaries (Granit.Indexing.AI), background reindex with checkpoint resume (Granit.Indexing.BackgroundJobs), and the GDPR Art. 17 bridge (Granit.Indexing.Privacy).

Pain	This package’s answer
Hand-rolled `LIKE '%…%'` per endpoint with no ACL filter past tenant scope	One `ISearchService<TKey, TResult>` with consumer-supplied `ISearchResultAuthorizer<TKey>` on every hit
tsquery / Lucene injection from anonymous traffic	`plainto_tsquery` / `simple_query_string` (restricted flags) on the default path; advanced syntax gated on a `Search.Advanced.Execute` permission
Existence-oracle probing via empty-result pings	Per-principal sliding window — `MaxEmptyResultQueriesPerPrincipalPerMinute = 10` by default; 11th hits `429`
Long-tail “no results” page reveals row counts to restricted principals	Exponential over-fetch loop with a `MaxAuthorizationDepth = 5 000` ceiling; aggregated `HitAuthorizationLimit` hint, not a per-query oracle
Index outlives the source row after a GDPR Art. 17 request	`IIndexedDataEraser` + `Granit.Indexing.Privacy` bridge: one bulk statement per backend, atomic with the source delete
Embedding sidecar table de-syncs from `Content` on erasure	Embeddings live on the same row as `Content` — pinned by an architecture test (`Embeddings_must_live_on_the_same_row_as_Content_no_sidecar_entity_types`)
Reindex after a tokenizer change requires a custom batch	`RebuildIndexJob<TKey>` + checkpoint store; survives worker restarts
Switching from Postgres to ES means rewriting consumers	Same `IIndexer<TKey>` / `ISearchService<TKey, TResult>` contracts — backend swap is a DI line

Package structure

- Granit.Indexing/ Contracts: IIndexer<TKey>, ISearchService<TKey, TResult>, ISearchBackend<TKey, TResult>, ISearchResultAuthorizer<TKey>, IIndexedEntrySource<TKey>, IIndexedDataEraser, IndexedEntry<TKey>, SearchPage<TResult>
- Granit.Indexing.EntityFrameworkCore/ Default backend: Postgres tsvector (GENERATED ALWAYS … STORED) + GIN index, EfIndexedDataEraser
  - …
- Granit.Indexing.Elasticsearch/ ES 9.x backend, Shared / PerTenant strategies, restricted simple_query_string
  - …
- Granit.Indexing.AI/ ISummarizer: one-shot LLM call, JSON-schema-pinned, AIQuota-rate-limited
  - …
- Granit.Indexing.Embeddings/ Decorator: IEmbeddingGenerator write + RRF hybrid retriever (k=60 default, dense ranking)
  - …
- Granit.Indexing.BackgroundJobs/ RebuildIndexJob<TKey> + checkpoint store (in-memory or EF)
  - …
- Granit.Indexing.Privacy/ Wolverine handler bridging PersonalDataDeletionRequestedEto → every IIndexedDataEraser
  - …

Package	Role	Depends on
`Granit.Indexing`	Contract root — `IIndexer<TKey>`, `ISearchService<TKey, TResult>`, `ISearchBackend<TKey, TResult>`, `ISearchResultAuthorizer<TKey>`, `IIndexedEntrySource<TKey>`, `IIndexedDataEraser`, `IndexedEntry<TKey>`, `SearchPage<TResult>`, `DefaultSearchService`, empty-result rate limiter	`Granit`, `Granit.LanguageDetection`
`Granit.Indexing.EntityFrameworkCore`	Default Postgres `tsvector` backend — `IndexingDbContext`, `IndexedEntryRow<TKey>`, `HasGeneratedTsVectorColumn`, `EfIndexer<TKey>`, `EfSearchBackend<TKey, TResult>`, `EfIndexedDataEraser`	`Granit.Indexing`, `Granit.Persistence.EntityFrameworkCore`
`Granit.Indexing.Elasticsearch`	ES 9.x backend — BM25 multi-field per-language analyzers, `Shared` / `PerTenant` strategy, `delete_by_query` eraser	`Granit.Indexing`, `Elastic.Clients.Elasticsearch`
`Granit.Indexing.AI`	`ISummarizer` LLM provider — JSON-schema response, AIQuota counter, prompt-injection isolation	`Granit.Indexing`, `Granit.AI`
`Granit.Indexing.Embeddings`	Decorator pair: `IIndexer<TKey>` writer (embeds Content) + `ISearchBackend<TKey, TResult>` hybrid retriever (BM25 ∪ kNN → RRF)	`Granit.Indexing`, `Microsoft.Extensions.AI.Abstractions`
`Granit.Indexing.BackgroundJobs`	`RebuildIndexJob<TKey>` (on-demand, host-dispatched), `InMemoryRebuildCheckpointStore<TKey>` default, EF persistent checkpoint opt-in	`Granit.Indexing`, `Granit.BackgroundJobs`
`Granit.Indexing.Privacy`	`PersonalDataDeletionHandler` Wolverine handler — fans `PersonalDataDeletionRequestedEto` to every registered `IIndexedDataEraser`	`Granit.Indexing`, `Granit.Privacy`

Contracts

Write port — `IIndexer<TKey>`

public interface IIndexer<TKey>
{
    Task IndexAsync(IndexedEntry<TKey> entry, CancellationToken cancellationToken = default);
    Task RemoveAsync(TKey key, Guid? tenantId, CancellationToken cancellationToken = default);
}

public sealed record IndexedEntry<TKey>
{
    public required TKey Key { get; init; }
    public required Guid? TenantId { get; init; }
    public required string Content { get; init; }
    public string? Language { get; init; }          // ISO 639-1 — from ILanguageDetector
    public string? Summary { get; init; }           // optional — ISummarizer
    public IReadOnlyList<string>? Tags { get; init; }
    public ReadOnlyMemory<float>? Embedding { get; init; }
    public IReadOnlyDictionary<string, string>? Facets { get; init; }
    public bool IsTruncated { get; init; }          // from TextExtractionResult
    public int CharCount { get; init; }
    public Guid? DataSubjectId { get; init; }       // drives GDPR Art. 17 erasure
}

IndexAsync is idempotent: re-indexing (TenantId, Key) overwrites the existing row in place. RemoveAsync takes tenantId explicitly because background workers — which may not have ICurrentTenant in scope — must still be able to remove rows from any tenant without leaking through ambient state.
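The two guarantees above (overwrite on re-index, explicit tenant on removal) can be illustrated with a small in-memory stand-in. This is a sketch only: `InMemoryIndex<TKey>` and its members are hypothetical names, not Granit types, and a real backend persists far more than the content string.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in used for illustration; not a Granit type.
public sealed class InMemoryIndex<TKey> where TKey : notnull
{
    // The composite (TenantId, Key) is the row identity.
    private readonly ConcurrentDictionary<(Guid? TenantId, TKey Key), string> _rows = new();

    public int Count => _rows.Count;

    // Upsert: indexing the same (TenantId, Key) twice overwrites the row in place.
    public void Index(Guid? tenantId, TKey key, string content)
        => _rows[(tenantId, key)] = content;

    // The tenant is an explicit argument and is never read from ambient state,
    // so a background worker can target any tenant deliberately.
    public bool Remove(Guid? tenantId, TKey key)
        => _rows.TryRemove((tenantId, key), out _);

    public string? Find(Guid? tenantId, TKey key)
        => _rows.TryGetValue((tenantId, key), out string? content) ? content : null;
}
```

Because the tenant is part of the key, the same `Key` value under two tenants yields two independent rows, and a removal with the wrong tenant id is a no-op rather than a cross-tenant delete.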

Read port — `ISearchService<TKey, TResult>`

public interface ISearchService<TKey, TResult>
{
    Task<SearchPage<TResult>> SearchAsync(SearchRequest request, CancellationToken ct = default);
}

public sealed record SearchRequest(
    string Query,
    int Page = 1,
    int PageSize = 20,
    string? Language = null,
    string? PrincipalIdentifier = null,
    bool UseAdvancedSyntax = false);

SearchRequest.Query is treated as plain text by default: operator characters are matched as literals and never parsed as query syntax. UseAdvancedSyntax opts the backend into operator-aware parsing (to_tsquery on Postgres, Lucene query_string on ES). Endpoints MUST gate this on a dedicated Search.Advanced.Execute permission before forwarding the flag. PrincipalIdentifier (typically User.GetSubjectId()) is hashed before any log emission and used as the bucket key for the empty-result rate limiter.
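One way an endpoint might apply that gate is to normalise the request before it reaches the search service. The sketch below is an assumption, not framework code: `AdvancedSyntaxGate` is a hypothetical helper, the permission check itself is left to the caller, and the record is repeated only so the snippet compiles on its own.

```csharp
using System;

// Repeated from the contract above so the snippet is self-contained.
public sealed record SearchRequest(
    string Query,
    int Page = 1,
    int PageSize = 20,
    string? Language = null,
    string? PrincipalIdentifier = null,
    bool UseAdvancedSyntax = false);

// Hypothetical helper; the name and shape are illustrative.
public static class AdvancedSyntaxGate
{
    // The caller resolves the permission (for example Search.Advanced.Execute)
    // with whatever authorization service the host uses and passes the outcome in.
    public static SearchRequest Apply(SearchRequest request, bool callerMayUseAdvancedSyntax)
        => request.UseAdvancedSyntax && !callerMayUseAdvancedSyntax
            ? request with { UseAdvancedSyntax = false } // downgrade to the safe parser
            : request;
}
```

Silently downgrading is only one policy; rejecting the request with a 403 is an equally reasonable choice. Either way the flag never reaches the backend without the permission.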

Authorization boundary

The single most important contract in Granit.Indexing:

The framework enforces tenant isolation only. Per-resource ACL is the consumer module’s responsibility and is enforced at read time via ISearchResultAuthorizer<TKey> — never serialised into the index.

sequenceDiagram
    autonumber
    participant E as Endpoint
    participant S as ISearchService<TKey, TResult><br/>(DefaultSearchService)
    participant B as ISearchBackend<TKey, TResult><br/>(EfSearchBackend / EsSearchBackend)
    participant A as ISearchResultAuthorizer<TKey><br/>(consumer-supplied)
    E->>S: SearchAsync(request)
    loop Over-fetch loop
        S->>B: SearchAsync(request, offset, limit)<br/>limit = pageSize × multiplier × 2^i
        B-->>S: hits + HasMore (tenant-filtered)
        S->>A: FilterAsync(keys)
        A-->>S: AuthorizedResult(keys)
        Note over S: Stop when page full,<br/>backend exhausted,<br/>or MaxAuthorizationDepth hit
    end
    S-->>E: SearchPage<TResult>

Layer	Concern	Default
`ISearchBackend<TKey, TResult>`	Tenant isolation — every query scoped to `ICurrentTenant`, applied by the backend (never by the orchestrator)	EF: `GranitDbContext` parameterised tenant filter rewritten into every SQL statement. ES: mandatory `term tenant_id` on every read/write
`ISearchResultAuthorizer<TKey>`	Per-resource ACL — workspace, role-based row-level, public-link grants	`NullSearchResultAuthorizer<TKey>` (authorises every hit) — appropriate only when tenant isolation is the complete authorization story
`DefaultSearchService<TKey, TResult>`	Exponential over-fetch loop fills the requested page with authorised hits without leaking row counts	Iteration 1: `pageSize × RecommendedInitialMultiplier`. Doubles per iteration. Stops at `MaxAuthorizationDepth = 5 000`
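The loop in the diagram and table can be sketched as follows. This is a simplified illustration, not the `DefaultSearchService` implementation: it only fills the first page, it runs synchronously, and the `fetch` / `authorize` delegates are hypothetical stand-ins for the backend and the authorizer.

```csharp
using System;
using System.Collections.Generic;

public static class OverFetchSketch
{
    // fetch(offset, limit): tenant-filtered candidate keys in rank order.
    // authorize(keys): the subset the caller may read, order preserved.
    public static (List<TKey> Page, bool HitLimit) FillFirstPage<TKey>(
        Func<int, int, IReadOnlyList<TKey>> fetch,
        Func<IReadOnlyList<TKey>, IReadOnlyList<TKey>> authorize,
        int pageSize, int multiplier, int maxDepth)
    {
        var page = new List<TKey>(pageSize);
        int offset = 0;
        bool exhausted = false;

        for (int i = 0; page.Count < pageSize && !exhausted && offset < maxDepth; i++)
        {
            // pageSize x multiplier x 2^i, clamped so the scan never passes maxDepth.
            long wanted = ((long)pageSize * multiplier) << i;
            int limit = (int)Math.Min(wanted, (long)(maxDepth - offset));

            IReadOnlyList<TKey> batch = fetch(offset, limit);
            foreach (TKey key in authorize(batch))
            {
                if (page.Count == pageSize) break;
                page.Add(key);
            }

            offset += batch.Count;
            exhausted = batch.Count < limit; // a short batch means no more rows
        }

        // True only when the depth ceiling, not the corpus, stopped the scan.
        bool hitLimit = page.Count < pageSize && !exhausted;
        return (page, hitLimit);
    }
}
```

Note that the caller only ever receives the authorised page and a single boolean; the number of candidates scanned stays inside the loop.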

The over-fetch loop avoids a class of leaks:

- No per-page existence oracle. A restricted principal who never sees more than n rows cannot bisect-search a private term: the response always carries the same authorised-page shape. The only signal is the aggregated HitAuthorizationLimit flag, which endpoints MUST throttle to at most one display per principal per 60 s (the framework cannot enforce this because it has no UI state).
- No empty-result probing. A principal that exceeds MaxEmptyResultQueriesPerPrincipalPerMinute (default 10) inside a 60 s window gets EmptyResultRateLimitedException, which the endpoint adapter converts to Problem(429).
- No backend hit count leak. SearchPage<TResult>.BackendHitCount is marked [JsonIgnore] + [EditorBrowsable(Never)] so it never round-trips through HTTP; an architecture test (BackendHitCount_must_not_be_referenced_from_any_Endpoints_package) forbids cross-package access from .Endpoints projects.
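The empty-result budget is a per-principal sliding window. A minimal sketch of that idea follows, assuming an in-process store and a caller-supplied clock; `EmptyResultWindow` is a hypothetical name, and whether the framework counts rejected attempts against the window is not stated here (this sketch does not).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a per-principal sliding window; not a Granit type.
public sealed class EmptyResultWindow
{
    private readonly int _maxPerWindow;
    private readonly TimeSpan _window;
    private readonly Dictionary<string, Queue<DateTimeOffset>> _hits = new();
    private readonly object _gate = new();

    public EmptyResultWindow(int maxPerWindow, TimeSpan window)
    {
        _maxPerWindow = maxPerWindow;
        _window = window;
    }

    // Records one empty-result query for the (already hashed) principal key.
    // Returns false once the principal has used up its budget for the window.
    public bool TryRecord(string principalKey, DateTimeOffset now)
    {
        lock (_gate)
        {
            if (!_hits.TryGetValue(principalKey, out var recent))
            {
                recent = new Queue<DateTimeOffset>();
                _hits[principalKey] = recent;
            }

            // Drop timestamps that have slid out of the window.
            while (recent.Count > 0 && now - recent.Peek() >= _window)
                recent.Dequeue();

            if (recent.Count >= _maxPerWindow) return false;

            recent.Enqueue(now);
            return true;
        }
    }
}
```

With a budget of 10 and a 60 s window, the first ten empty-result queries pass and the eleventh is refused, which is the point at which an endpoint would answer 429.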

Writing an authorizer

public sealed class WorkspaceAclAuthorizer(IWorkspaceAccess access)
    : ISearchResultAuthorizer<Guid>
{
    // Restricted principal sees ~10 % of hits — over-fetch 10× on iteration 1.
    public int RecommendedInitialMultiplier => 10;

    public async Task<AuthorizedResult<Guid>> FilterAsync(
        IReadOnlyList<Guid> candidates, CancellationToken ct)
    {
        IReadOnlyList<Guid> allowed = await access
            .FilterReadableAsync(candidates, ct).ConfigureAwait(false);
        return new AuthorizedResult<Guid>(allowed);
    }
}

// Composition root
services.AddGranitIndexing();
services.AddSingleton<ISearchResultAuthorizer<Guid>, WorkspaceAclAuthorizer>();

Use the rule of thumb ceil(1 / expected_authorized_ratio) for the multiplier. Too low costs an extra round-trip on the common path; too high wastes backend rows on the rare path. Unknown principals default to 3 — covers admin and restricted alike within two iterations.
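The arithmetic behind the rule of thumb can be checked with two small helpers. Both are hypothetical illustrations; the second simply sums the doubling series described above (`pageSize × multiplier × 2^i`) and assumes authorised hits are spread evenly through the ranking.

```csharp
using System;

public static class OverFetchMath
{
    // ceil(candidates / authorized) in integer arithmetic; both counts must be positive.
    public static int InitialMultiplier(int candidates, int authorized)
        => (candidates + authorized - 1) / authorized;

    // Rows scanned after n doubling iterations: pageSize * multiplier * (2^n - 1).
    public static long RowsScanned(int pageSize, int multiplier, int iterations)
        => (long)pageSize * multiplier * ((1L << iterations) - 1);
}
```

For example, a principal who can read 10 of every 100 hits gets a multiplier of 10. With the default multiplier of 3 and a page size of 20, two iterations scan 60 + 120 = 180 candidates, which fills a page as long as roughly 11 % or more of the hits are readable.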

Backends

Postgres `tsvector` (default)

Granit.Indexing.EntityFrameworkCore is the default backend — zero new infrastructure for any Granit host that already runs Postgres.

Aspect	Behaviour
Storage	`IndexedEntryRow<TKey>` with `Content`, `SearchVector tsvector` (`GENERATED ALWAYS … STORED`), `Language`, `Summary`, `Tags string[]`, `IsTruncated`, `CharCount`, `DataSubjectId`. One physical table per registered `TKey`.
Query syntax	`plainto_tsquery` by default; `websearch_to_tsquery` via `IndexingEntityFrameworkCoreOptions.UseWebSearchSyntax`. `to_tsquery` is not reachable from the default path; operator characters (`&`, `|`, `!`, parentheses) are treated as literals.
Index	GIN over `SearchVector`, emitted by the `HasGeneratedTsVectorColumn(...)` `ModelBuilder` extension.
Tenant isolation	Inherited from `GranitDbContext` — parameterised tenant filter rewritten into every SQL statement at execution time (no closure-leak risk).
GDPR Art. 17	`EfIndexedDataEraser` fans out a single `ExecuteDelete()` per registered `TKey` filtered by `(TenantId, DataSubjectId)`.
Architecture pins	`Granit.Indexing.EntityFrameworkCore` is the only package allowed to reference EF Core NuGets (`EntityFrameworkCore_NuGets_only_in_the_EntityFrameworkCore_backend`). `IgnoreQueryFilters` usage is on an audit allowlist.

builder.Services.AddGranitIndexing();
builder.Services.AddGranitIndexingEntityFrameworkCore(
    opts => opts.UseNpgsql(connectionString),
    typeof(Guid));

builder.Services.AddGranitIndexingBackend<Guid, MyHitResponse>(
    row => new MyHitResponse(row.Key, row.Summary ?? string.Empty, row.Tags));

The package ships no EF migrations — the consumer host owns them:

dotnet ef migrations add InitIndexing \
    --context IndexingDbContext \
    --project YourHost/YourHost.csproj

Elasticsearch 9.x (opt-in)

Granit.Indexing.Elasticsearch swaps the backend wholesale: registering it strips any previously-registered IIndexer<TKey> and IIndexedDataEraser to guarantee the host runs a single backend.

Reach for it when:

- The corpus exceeds what a single Postgres tsvector index can comfortably serve (low-millions of rows or multi-GB content).
- Per-language analyzers, synonym maps, or phrase scoring are core to UX.
- An Elasticsearch cluster is already operated and consolidating full-text workloads makes sense.

builder.Services.AddGranitIndexing();
builder.Services.AddGranitIndexingElasticsearch(
    configureClient: null,
    typeof(Guid));
builder.Services.AddGranitIndexingElasticsearchBackend<Guid, MyResponse>(
    keyProjection: doc => Guid.Parse(doc.Key),
    resultProjection: doc => new MyResponse(doc.Key, doc.Summary, doc.Tags));

{
  "Indexing": {
    "Elasticsearch": {
      "Uri": "https://es.internal:9200",
      "ApiKey": "your-api-key",
      "Strategy": "Shared",
      "IndexPrefix": "granit-indexing",
      "BulkBatchSize": 500,
      "StoreFullContentInIndex": true,
      "UseSimpleQueryString": true,
      "DefaultAnalyzer": "standard"
    }
  }
}

Setting	Choice
`Strategy: Shared` (default)	One index per `TKey`; tenants isolated by mandatory `term tenant_id` filter on every read / write.
`Strategy: PerTenant`	One index per `(TKey, tenant)` pair. Stricter physical isolation, one extra index per tenant. The framework still applies the `tenant_id` filter as defence-in-depth for misrouted bulk imports.
`UseSimpleQueryString: true` (default)	`simple_query_string` with the restricted flag set `AND | OR | PHRASE | PREFIX`. Lucene’s full `query_string` (regex, fuzzy, field-targeted operators) is reachable only when the request carries `UseAdvancedSyntax = true` and the endpoint has gated on `Search.Advanced.Execute`.
`StoreFullContentInIndex`	Trade-off — see below.

Granit.Indexing.Elasticsearch ships an IIndexedDataEraser that fans out a single delete_by_query across every registered TKey. delete_by_query is a logical delete; physical disposal happens at the next segment merge or via an explicit forcemerge schedule — Article 17 is satisfied because the data is no longer addressable, but bit-level disposal depends on the host’s storage policy.

Backend comparison

Concern	Postgres `tsvector`	Elasticsearch 9.x
Infrastructure cost	None beyond Postgres	Dedicated cluster
Default query parser	`plainto_tsquery` (operator characters → literals)	`simple_query_string` with restricted flags
Advanced syntax	`to_tsquery` — gated on `Search.Advanced.Execute`	Full Lucene `query_string` — gated on `Search.Advanced.Execute`
Tenant isolation	`GranitDbContext` parameterised filter	Mandatory `term tenant_id` filter, plus optional `PerTenant` physical isolation
Per-language analyzers	One `tsvector` config per row (chosen from `Language`)	One sub-field per analyzer; synonym maps and phrase scoring built-in
GDPR Art. 17	`ExecuteDelete()` per `TKey` (synchronous, atomic)	`delete_by_query` (logical delete; physical disposal on next merge / `forcemerge`)
Embeddings	`vector(N)` pgvector column on the same row as `Content`	`dense_vector(dims: N)` field on the same document as `Content`

Language-aware analysis

IndexedEntry.Language is consumed by every backend at index time to pick the right analyzer (Postgres tsvector configuration, ES <lang>_<analyzer>). The value comes from Granit.LanguageDetection, a cross-cutting ILanguageDetector with a deterministic trigram default and optional priority-chain overrides. A typical IIndexedEntrySource<TKey> prefers the language reported by text extraction and falls back to the detector:

using System.Runtime.CompilerServices; // required for [EnumeratorCancellation]

public sealed class MyDocumentSource(
    IDocumentRepository repo,
    ITextExtractionPipeline extraction,
    ILanguageDetector languageDetector) : IIndexedEntrySource<Guid>
{
    public string Name => "document";

    public async IAsyncEnumerable<Guid> EnumerateKeysAsync(
        Guid? tenantId, Guid? resumeAfter, [EnumeratorCancellation] CancellationToken ct)
    {
        await foreach (Guid id in repo.EnumerateIdsAsync(tenantId, resumeAfter, ct))
            yield return id;
    }

    public async Task<IndexedEntry<Guid>?> BuildEntryAsync(Guid key, CancellationToken ct)
    {
        Document? doc = await repo.GetAsync(key, ct).ConfigureAwait(false);
        if (doc is null) return null;

        TextExtractionResult body = await extraction.ExtractAsync(
            doc.OpenRead(), doc.ContentType, ct).ConfigureAwait(false);

        string? language = body.DetectedLanguage
            ?? await languageDetector.DetectAsync(body.Content, ct).ConfigureAwait(false);

        return new IndexedEntry<Guid>
        {
            Key = key,
            TenantId = doc.TenantId,
            Content = body.Content,
            Language = language,
            IsTruncated = body.IsTruncated,
            CharCount = body.CharCount,
            DataSubjectId = doc.OwnerPartyId,    // GDPR Art. 17 hook
        };
    }

    public Task<Guid?> GetDataSubjectIdAsync(Guid key, CancellationToken ct)
        => repo.GetOwnerPartyIdAsync(key, ct);
}

AI providers (opt-in)

Every AI add-on is opt-in. The base Granit.Indexing pipeline runs fully lexical with the deterministic trigram detector — no network calls, no embedded LLM. Bring in AI providers package by package when the cost/quality trade-off makes sense.

flowchart LR
    subgraph base["Always-on baseline"]
      ext["Granit.TextExtraction<br/>bytes → text"] --> lang["Granit.LanguageDetection.Trigram<br/>ISO 639-1"]
      lang --> entry["IndexedEntry<TKey>"]
      entry --> ix["IIndexer<TKey>"]
    end
    subgraph ai["Opt-in AI providers"]
      lang -. higher priority .-> aiLang["Granit.LanguageDetection.AI<br/>(short / mixed corpora)"]
      entry -. before IndexAsync .-> sum["Granit.Indexing.AI<br/>ISummarizer (LLM snippet)"]
      ix -. decorator .-> emb["Granit.Indexing.Embeddings<br/>IEmbeddingGenerator + RRF retriever"]
    end

Package	Adds	Cost ceiling
`Granit.LanguageDetection.AI`	LLM-backed `ILanguageDetectorProvider` at priority 200 — disambiguates short or mixed-language inputs the trigram detector cannot reliably classify	Inherits `Granit.AI` `AIQuotaOptions.MaxRequestsPerTenantPerHour`
`Granit.Indexing.AI`	`ISummarizer` — one-shot LLM call producing a SERP-style snippet for `IndexedEntry.Summary`. JSON-schema-pinned, content wrapped in `<untrusted_document>...</untrusted_document>` (OWASP LLM01)	`MaxAICallsPerHourPerTenant` (default 1 000); on cap, returns `null` — the entry persists without a summary
`Granit.Indexing.Embeddings`	Decorator pair — embeds `Content` via `IEmbeddingGenerator` at write time, fuses BM25/tsvector + cosine kNN with Reciprocal Rank Fusion at read time	Wraps the host’s `IEmbeddingGenerator`; native cost ceiling tracked under follow-up (the cost-accounting contract is fleshed out in I-F3.2)

Hybrid retrieval — Reciprocal Rank Fusion

Granit.Indexing.Embeddings decorates both ports — write-time embedding of Content on the same row as the body (atomic GDPR erasure), read-time fusion of the lexical and dense channels via Reciprocal Rank Fusion (Cormack et al. 2009, k = 60, dense ranking on score ties). The full registration order, RrfFetchPoolSize / pagination-after-fusion contract, graceful lexical-only degradation, and the HNSW ghost-vector operations concern live on the dedicated page:

→ Indexing — Embeddings (RRF)
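As a quick illustration of the fusion step itself, Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over the channels that returned it, with ranks starting at 1. The sketch below is a generic rendering of that formula, not the package's retriever: `RrfSketch` is a hypothetical name, the inputs are assumed to be already ranked with no ties, and the dense-ranking tie rule mentioned above is not modelled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RrfSketch
{
    // rankings: one best-first list of keys per channel (for example lexical and kNN).
    public static List<(TKey Key, double Score)> Fuse<TKey>(
        IEnumerable<IReadOnlyList<TKey>> rankings, int k = 60) where TKey : notnull
    {
        var scores = new Dictionary<TKey, double>();
        foreach (IReadOnlyList<TKey> ranking in rankings)
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                // The rank is 1-based, so the top hit contributes 1 / (k + 1).
                scores.TryGetValue(ranking[i], out double current);
                scores[ranking[i]] = current + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(pair => pair.Value)
            .Select(pair => (Key: pair.Key, Score: pair.Value))
            .ToList();
    }
}
```

A document that appears near the top of both channels outranks one that tops a single channel, which is why pagination has to happen after fusion rather than per channel.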

GDPR Art. 17 — atomic erasure path

Indexed copies must not outlive the source row. Three pieces co-operate:

sequenceDiagram
    autonumber
    participant P as Granit.Privacy
    participant Bus as Wolverine bus
    participant H as PersonalDataDeletionHandler<br/>(Granit.Indexing.Privacy)
    participant E1 as EfIndexedDataEraser
    participant E2 as EsIndexedDataEraser
    P->>Bus: PersonalDataDeletionRequestedEto(tenantId, dataSubjectId)
    Bus->>H: handler picks it up
    par per backend, in parallel
        H->>E1: EraseAsync(tenantId, dataSubjectId)
        E1->>E1: ExecuteDelete()<br/>per registered TKey
        E1-->>H: rows deleted
    and
        H->>E2: EraseAsync(tenantId, dataSubjectId)
        E2->>E2: delete_by_query<br/>per registered TKey
        E2-->>H: rows deleted
    end

Layer	Role
Producer — `IIndexedEntrySource<TKey>.GetDataSubjectIdAsync`	Returns the natural-person id the indexed body refers to; `null` for non-personal data (system documents, public reference data). Populated into `IndexedEntry.DataSubjectId` at build time.
Backend — `IIndexedDataEraser`	One implementation per backend (`EfIndexedDataEraser`, `EsIndexedDataEraser`). Bulk-deletes rows filtered by `(TenantId, DataSubjectId)` in a single statement. Idempotent — Wolverine retries and manual replays converge safely.
Bridge — `Granit.Indexing.Privacy`	`PersonalDataDeletionHandler` subscribed to `PersonalDataDeletionRequestedEto`. Resolves every `IIndexedDataEraser` from DI and fans out the request. Skip the package on hosts that do not need the cascade.

Why a separate hook instead of IIndexer<TKey>.RemoveAsync? IIndexer<TKey>.RemoveAsync removes a single known (TenantId, Key) tuple. The GDPR cascade does not know the keys — only the DataSubjectId the entries reference. Backends store that value at index time and expose EraseAsync to erase by subject in one bulk statement; calling RemoveAsync in a loop would require enumerating every key first — slower and racier.
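The difference between key-based removal and subject-based erasure can be seen in a toy in-memory model. Everything here is hypothetical (`IndexedRow`, `InMemoryEraser`); a real eraser issues one bulk statement against the backend instead of filtering a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: only the fields needed to show the filter.
public sealed record IndexedRow(Guid? TenantId, Guid Key, Guid? DataSubjectId, string Content);

public sealed class InMemoryEraser
{
    private readonly List<IndexedRow> _rows;

    public InMemoryEraser(IEnumerable<IndexedRow> rows) => _rows = rows.ToList();

    public IReadOnlyList<IndexedRow> Rows => _rows;

    // One pass filtered by (TenantId, DataSubjectId); no key enumeration needed.
    // Returns the number of rows removed, so a replay simply returns 0.
    public int Erase(Guid? tenantId, Guid dataSubjectId)
        => _rows.RemoveAll(r => r.TenantId == tenantId && r.DataSubjectId == dataSubjectId);
}
```

Running the same erase twice leaves the store unchanged the second time, which is the property that makes retries and manual replays safe.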

When Granit.Indexing.Embeddings is wired, embeddings are persisted on the same row as Content in every storage backend — pinned by Embeddings_must_live_on_the_same_row_as_Content_no_sidecar_entity_types. When the eraser fires, both atoms vanish atomically in a single DELETE / delete_by_query. No sidecar table, no orphan vectors.

Background reindex with checkpoint resume

Granit.Indexing.BackgroundJobs ships RebuildIndexJob<TKey> — an on-demand job that iterates every key emitted by IIndexedEntrySource<TKey> and calls IIndexer<TKey>.IndexAsync per entry. Use it after a tokenizer / analyzer change, after a new tenant backfill, after wiring a new embedding model, or after a data-quality incident.

The job is on-demand only — the framework ships no [RecurringJob]. The permission check MUST live at the dispatch site (the Wolverine handler has no HTTP context), and a MaxEntriesPerRun / MaxRunDuration budget should be set in production to bound the blast radius of a runaway or hostile dispatch. Resource budget, lifecycle events, checkpoint store options, and the full dispatch-controller example live on the dedicated page:

→ Indexing — Background reindex

Configuration cookbook

Postgres `tsvector` (default) with the trigram language detector:

[DependsOn(
    typeof(GranitIndexingEntityFrameworkCoreModule),
    typeof(GranitLanguageDetectionTrigramModule))]
public sealed class SearchModule : GranitModule
{
    public override void ConfigureServices(ServiceConfigurationContext context)
    {
        context.Services.AddGranitIndexing();
        context.Services.AddGranitIndexingEntityFrameworkCore(
            opts => opts.UseNpgsql(context.Configuration.GetConnectionString("Indexing")!),
            typeof(Guid));

        context.Services.AddGranitIndexingBackend<Guid, MyHitResponse>(
            row => new MyHitResponse(row.Key, row.Summary ?? string.Empty, row.Tags));

        context.Services.AddSingleton<ISearchResultAuthorizer<Guid>, WorkspaceAclAuthorizer>();
        context.Services.AddScoped<IIndexedEntrySource<Guid>, MyDocumentSource>();
    }
}

{
  "Indexing": {
    "DefaultPageSize": 20,
    "MaxPageSize": 100,
    "MaxAuthorizationDepth": 5000,
    "MaxEmptyResultQueriesPerPrincipalPerMinute": 10,
    "AuthorizationOverfetchMultiplier": 3
  }
}

Elasticsearch backend:

[DependsOn(typeof(GranitIndexingElasticsearchModule))]
public sealed class SearchModule : GranitModule
{
    public override void ConfigureServices(ServiceConfigurationContext context)
    {
        context.Services.AddGranitIndexing();
        context.Services.AddGranitIndexingElasticsearch(
            configureClient: null,
            typeof(Guid));

        context.Services.AddGranitIndexingElasticsearchBackend<Guid, MyHitResponse>(
            keyProjection: doc => Guid.Parse(doc.Key),
            resultProjection: doc => new MyHitResponse(doc.Key, doc.Summary, doc.Tags));
    }
}

{
  "Indexing": {
    "Elasticsearch": {
      "Uri": "https://es.internal:9200",
      "ApiKey": "your-api-key",
      "Strategy": "Shared",
      "IndexPrefix": "granit-indexing",
      "StoreFullContentInIndex": false,
      "UseSimpleQueryString": true
    }
  }
}

AI summaries together with the GDPR erasure bridge:

[DependsOn(
    typeof(GranitIndexingAIModule),
    typeof(GranitIndexingPrivacyModule))]
public sealed class SearchAIModule : GranitModule
{
    public override void ConfigureServices(ServiceConfigurationContext context)
    {
        context.Services.AddGranitIndexingAISummarizer();
    }
}

{
  "Indexing": {
    "AI": {
      "WorkspaceName": "default",
      "MaxAICallsPerHourPerTenant": 1000,
      "RedactPIIBeforeLLMCall": true,
      "MaxContentLength": 8192,
      "MaxSummaryLength": 500,
      "TimeoutSeconds": 20
    }
  }
}

Consumers retrieve ISummarizer from DI and call SummarizeAsync(content) before building the IndexedEntry, then assign the result to entry.Summary. The framework does NOT auto-wire a decorator on IIndexer<TKey> — the call cost lives where the consumer can control it.

Architecture invariants

The following invariants are pinned by tests in IndexingArchitectureTests and run on every CI build:

Invariant	Rationale
`BackendHitCount_must_not_be_referenced_from_any_Endpoints_package`	The raw backend hit count is a tenant-wide row-count oracle. Endpoint adapters must read only `SearchPage.Items` + `HitAuthorizationLimit`.
`Embeddings_must_live_on_the_same_row_as_Content_no_sidecar_entity_types`	Atomic GDPR Art. 17 erasure — embeddings and content vanish in the same `DELETE`.
`IgnoreQueryFilters_calls_in_Granit_Indexing_EntityFrameworkCore_stay_within_the_audit_allowlist`	Bypassing tenant filters is allowed only at named, reviewed call sites.
`Granit_Indexing_packages_must_not_reference_Microsoft_AspNetCore`	Indexing is a horizontal framework — CLI tools and background workers consume it without dragging in the web stack.
`EntityFrameworkCore_NuGets_only_in_the_EntityFrameworkCore_backend`	EF Core stays confined to the EF backend; the base contract and other backends remain provider-pure.
`IIndexer_implementations_must_not_call_Console_or_Trace`	All telemetry flows through `IndexingMetrics` + `IndexingActivitySource` — never stdout.
