# Semantic Search & RAG
Traditional search relies on keywords: if the user types “connection problem”, your
`LIKE '%connection problem%'` query won’t find the document titled “Authentication error 401”.
The words are different, but the meaning is the same.
Semantic search solves this by comparing meaning, not text. Granit.AI.VectorData brings
this capability to your application with multi-tenant isolation and a clean abstraction.
## What are embeddings? (the 30-second explanation)

An embedding is a list of numbers (a vector) that represents the meaning of a text. Texts with similar meanings have similar vectors — even if they use completely different words.
"connection problem" → [0.82, 0.15, 0.67, 0.31, ...] ─┐"authentication error" → [0.79, 0.18, 0.64, 0.29, ...] ─┤ Similar vectors! │ (close in vector space)"chocolate cake recipe" → [0.12, 0.91, 0.03, 0.88, ...] ─┘ Different vector (far in vector space)The AI model (embedding model) converts text into these vectors. Then, to search, you convert the user’s query into a vector and find the stored vectors closest to it. That’s semantic search.
## How it works in Granit

```mermaid
flowchart LR
    subgraph Indexing
        D[Document text] --> EG[Embedding model]
        EG --> V1[Vector 0.82, 0.15, ...]
        V1 --> DB[(Vector store)]
    end
    subgraph Searching
        Q[User query] --> EG2[Embedding model]
        EG2 --> V2[Query vector]
        V2 --> DB
        DB -->|nearest neighbors| R[Ranked results]
    end
```
Two phases:
- Indexing — convert documents into vectors and store them
- Searching — convert the query into a vector and find the closest stored vectors
Granit.AI.VectorData handles both through ISemanticSearchService.
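The two phases can be sketched with a toy in-memory store. This is an illustration of the concept only, not the Granit API — `embed` stands in for the embedding model, and real stores use approximate nearest-neighbor indexes instead of a linear scan:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class ToyVectorStore:
    """Minimal in-memory vector store: index documents, search by similarity."""

    def __init__(self, embed):
        self.embed = embed      # callable: text -> vector (the embedding model)
        self.records = {}       # key -> (text, vector)

    def index(self, key, text):
        # Phase 1: indexing — embed the document and store the vector
        self.records[key] = (text, self.embed(text))

    def search(self, query, limit=5):
        # Phase 2: searching — embed the query, rank stored vectors by similarity
        qv = self.embed(query)
        scored = sorted(
            ((cosine(qv, vec), key, text) for key, (text, vec) in self.records.items()),
            reverse=True,
        )
        return [(key, text, round(score, 3)) for score, key, text in scored[:limit]]

# Demo with a toy "embedding model": a fixed lookup table
toy_model = {
    "connection problem": [0.82, 0.15],
    "auth error 401":     [0.79, 0.18],
    "cake recipe":        [0.12, 0.91],
}
store = ToyVectorStore(lambda text: toy_model[text])
store.index("doc-1", "auth error 401")
store.index("doc-2", "cake recipe")
top = store.search("connection problem", limit=1)  # "auth error 401" ranks first
```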
## What is RAG? (Retrieval-Augmented Generation)

RAG is a pattern that combines semantic search with LLM generation:

```mermaid
flowchart TD
    Q[User question] --> SS[Semantic Search]
    SS -->|top 5 relevant docs| CTX[Context builder]
    CTX --> LLM[IChatClient]
    LLM --> A[Answer grounded in your data]
    style SS fill:#e8f5e9
    style LLM fill:#e3f2fd
```
- Retrieve — find the most relevant documents using semantic search
- Augment — add those documents as context to the LLM prompt
- Generate — the LLM answers using your data, not its training data
This is how you build:
- FAQ bots that answer from your actual documentation
- Customer support assistants that know your product
- Internal knowledge bases that search across all company documents
The LLM doesn’t hallucinate (as much) because it’s grounded in real documents that you control.
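The augment step is plain string assembly: splice the retrieved documents into the prompt ahead of the question. A language-neutral sketch (the wording mirrors the full C# example later on this page):

```python
def build_rag_prompt(question, retrieved_docs):
    """Augment step of RAG: ground the LLM in retrieved documents."""
    context = "\n\n".join(retrieved_docs)
    return (
        "Answer the user's question based ONLY on the following context.\n"
        "If the answer is not in the context, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "How do I reset my password?",
    ["Q: How do I reset my password?\nA: Use the 'Forgot password' link."],
)
```

The “ONLY … say I don’t know” framing is what keeps the model from falling back on its training data when retrieval comes up empty.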
To enable semantic search, register the AI core, an embedding provider, and the vector data module:

```csharp
builder.AddGranitAI();
builder.AddGranitAIOllama();     // for embeddings (nomic-embed-text)
builder.AddGranitAIVectorData();
// + a vector store provider (coming soon: Granit.AI.VectorData.PgVector)
```

## High-level API: ISemanticSearchService
For most use cases, ISemanticSearchService is all you need. It combines embedding
generation + vector storage in two simple methods:
### Indexing documents

```csharp
public class FaqIndexingService(ISemanticSearchService search)
{
    public async Task IndexFaqAsync(
        FaqEntry faq,
        CancellationToken cancellationToken)
    {
        // Convert FAQ content to a vector and store it
        await search.IndexAsync(
            collectionName: "faq",
            key: faq.Id.ToString(),
            text: $"{faq.Question}\n{faq.Answer}",
            cancellationToken).ConfigureAwait(false);
    }
}
```

### Searching
```csharp
public class FaqSearchService(ISemanticSearchService search)
{
    public async Task<IReadOnlyList<SemanticSearchResult>> SearchAsync(
        string userQuestion,
        CancellationToken cancellationToken)
    {
        return await search.SearchAsync(
            collectionName: "faq",
            query: userQuestion,
            limit: 5,
            cancellationToken).ConfigureAwait(false);
    }
}
```

### Full RAG example
```csharp
public class FaqChatService(
    ISemanticSearchService search,
    IAIChatClientFactory chatFactory)
{
    public async Task<string> AskAsync(
        string question,
        CancellationToken cancellationToken)
    {
        // 1. Retrieve relevant FAQ entries
        IReadOnlyList<SemanticSearchResult> relevant = await search
            .SearchAsync("faq", question, limit: 3, cancellationToken)
            .ConfigureAwait(false);

        // 2. Build context from retrieved documents
        string context = string.Join("\n\n", relevant.Select(r => r.Text));

        // 3. Generate answer grounded in context
        IChatClient client = await chatFactory
            .CreateAsync("support-chat", cancellationToken)
            .ConfigureAwait(false);

        ChatResponse response = await client.GetResponseAsync(
            $"""
            Answer the user's question based ONLY on the following context.
            If the answer is not in the context, say "I don't know".

            Context:
            {context}

            Question: {question}
            """,
            cancellationToken: cancellationToken).ConfigureAwait(false);

        return response.Text;
    }
}
```

## Low-level API: IVectorCollection
For advanced use cases (custom record types, metadata filtering, batch operations),
use IVectorCollectionFactory directly:
```csharp
public class PatientSimilarityService(IVectorCollectionFactory factory)
{
    public async Task<IReadOnlyList<VectorSearchResult<PatientRecord>>> FindSimilarAsync(
        ReadOnlyMemory<float> queryVector,
        CancellationToken cancellationToken)
    {
        IVectorCollection<PatientRecord> collection =
            factory.GetCollection<PatientRecord>("patients");

        return await collection
            .SearchAsync(queryVector, limit: 10, cancellationToken)
            .ConfigureAwait(false);
    }
}
```

## Multi-tenancy
Vector collections are automatically scoped to the current tenant (same pattern as
IBlobKeyStrategy in blob storage). Tenant A’s documents are never returned in
Tenant B’s searches — enforced at the storage layer, not the application layer.
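How that storage-layer scoping is implemented is not spelled out here; one common strategy (an assumption for illustration, not the documented Granit mechanism) is to derive a physically separate collection name per tenant:

```python
def tenant_scoped_name(tenant_id: str, collection: str) -> str:
    """Hypothetical scoping strategy: one physical collection per tenant.

    Because tenant A's vectors live in a different collection than tenant B's,
    a search can never cross tenants. Isolation is structural, not a filter
    the application code must remember to apply on every query.
    """
    return f"tenant_{tenant_id}__{collection}"

# The same logical collection resolves to different physical collections
a_faq = tenant_scoped_name("tenant-a", "faq")
b_faq = tenant_scoped_name("tenant-b", "faq")
```

The alternative — a tenant-id metadata filter on a shared collection — saves collections but moves the isolation guarantee back into query code, which is exactly what storage-layer enforcement avoids.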
## Embedding models

The embedding model converts text to vectors. Different models produce different quality embeddings:
| Model | Provider | Dimensions | Quality | Speed |
|---|---|---|---|---|
| `text-embedding-3-small` | OpenAI | 1536 | Good | Fast |
| `text-embedding-3-large` | OpenAI | 3072 | Best | Slower |
| `nomic-embed-text` | Ollama (local) | 768 | Good | Fast, free |
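The Dimensions column drives storage cost: each dimension is typically stored as a 4-byte float. A back-of-envelope helper (illustrative only; real stores add index and metadata overhead on top):

```python
def vector_storage_bytes(dimensions: int, vector_count: int, bytes_per_dim: int = 4) -> int:
    """Raw vector payload size: dimensions × 4-byte floats × number of vectors."""
    return dimensions * bytes_per_dim * vector_count

# text-embedding-3-small: 1536 dims × 4 bytes = 6144 bytes (~6 KB) per vector
per_vector = vector_storage_bytes(1536, 1)

# One million documents at 1536 dims: ~6.1 GB of raw vectors
million_docs = vector_storage_bytes(1536, 1_000_000)
```

This is why dimension count matters when choosing a model: `nomic-embed-text` at 768 dims halves the storage of `text-embedding-3-small` at 1536.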
Configure the embedding model via a dedicated workspace:

```json
{
  "AI": {
    "VectorData": {
      "EmbeddingWorkspace": "embeddings"
    }
  }
}
```

Then define the workspace:

```csharp
context.Add(new AIWorkspace
{
    Name = "embeddings",
    Provider = "Ollama",
    Model = "nomic-embed-text",
});
```

## Vector store providers
Granit.AI.VectorData defines the abstractions. You need a provider to actually
store and search vectors:
| Provider | Package | Status | Best for |
|---|---|---|---|
| PgVector | Granit.AI.VectorData.PgVector | Coming soon | Zero extra infra (PostgreSQL) |
| Qdrant | Granit.AI.VectorData.Qdrant | Planned | High-performance, dedicated vector DB |
| Redis | Granit.AI.VectorData.Redis | Planned | Already using Redis for caching |
## Advantages and risks

### Advantages

- Search by meaning — “connection problem” finds “authentication error 401”
- Multilingual — French questions find English documents (most modern embedding models are multilingual)
- No reindexing on schema changes — vectors are schema-free
- Composable — combine with Granit.Querying for hybrid search (semantic + structured filters)
### Risks and mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| Embedding quality | Poor embeddings = poor search results | Use a good model (text-embedding-3-small minimum), test with real queries |
| Stale index | Document updated but vector not re-indexed | Re-index on entity change events (Wolverine integration event) |
| Cost | Embedding generation costs tokens | Batch indexing, cache embeddings, local models (Ollama) for dev |
| Storage size | 1536 floats × 4 bytes = 6 KB per vector | PgVector handles millions; Qdrant handles billions |
| GDPR | Embeddings are derived from data | Right to erasure applies — delete vectors when deleting source data |
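The stale-index and GDPR rows share one mitigation shape: treat the vector as derived data whose lifecycle follows the source entity. A toy sketch of that idea (Python, not the Granit API — the dict stands in for index/delete calls on the vector store, and the handler would be wired to entity change events):

```python
class FaqLifecycleHandler:
    """Keep the vector index in lockstep with the source entity."""

    def __init__(self, vector_index: dict):
        self.vector_index = vector_index  # key -> indexed text (store stand-in)

    def on_faq_updated(self, faq_id, question, answer):
        # Stale-index mitigation: re-index on every entity change event,
        # so the stored vector never drifts from the current content
        self.vector_index[str(faq_id)] = f"{question}\n{answer}"

    def on_faq_deleted(self, faq_id):
        # GDPR: embeddings are derived from the data, so erase them
        # together with the source record (idempotent delete)
        self.vector_index.pop(str(faq_id), None)
```

Running both paths through one handler means there is no code path that changes the entity without touching the index.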
## Configuration reference

| Property | Type | Default | Description |
|---|---|---|---|
| `EmbeddingWorkspace` | string | `"default"` | Workspace for embedding generation |
| `DefaultSearchLimit` | int | 10 | Default number of search results |
## See also

- Granit.AI overview — core module, providers, workspaces
- AI: Document Extraction — extract text from PDFs for indexing
- AI: Natural Language Query — structured filter search (complementary to semantic search)