Granit.Observability & Diagnostics

Granit.Observability wires Serilog structured logging and OpenTelemetry (traces + metrics) into a single AddGranitObservability() call. Granit.Diagnostics adds Kubernetes-native health check endpoints with stampede-protected caching and a JSON response writer for Grafana dashboards.

Package structure

Granit.Observability Serilog + OpenTelemetry OTLP export
Granit.Diagnostics Kubernetes health probes (liveness, readiness, startup)

Package	Role	Depends on
`Granit.Observability`	Serilog + OpenTelemetry (traces, metrics, logs)	`Granit.Core`
`Granit.Diagnostics`	Health check endpoints, response writer, caching	`Granit.Timing`

OTLP pipeline

graph LR
    App[ASP.NET Core App] --> Serilog
    App --> OTel[OpenTelemetry SDK]

    Serilog -->|WriteTo.OpenTelemetry| Collector[OTLP Collector :4317]
    OTel -->|OTLP gRPC| Collector

    Collector --> Loki[Loki — Logs]
    Collector --> Tempo[Tempo — Traces]
    Collector --> Mimir[Mimir — Metrics]

    Loki --> Grafana
    Tempo --> Grafana
    Mimir --> Grafana

Setup

Observability only:

[DependsOn(typeof(GranitObservabilityModule))]
public class AppModule : GranitModule { }

{
  "Observability": {
    "ServiceName": "my-backend",
    "ServiceVersion": "1.2.0",
    "OtlpEndpoint": "http://otel-collector:4317",
    "ServiceNamespace": "my-company",
    "Environment": "production"
  }
}

Diagnostics only:

[DependsOn(typeof(GranitDiagnosticsModule))]
public class AppModule : GranitModule { }

In Program.cs, after builder.Build():

app.MapGranitHealthChecks();

Both packages:

[DependsOn(
    typeof(GranitObservabilityModule),
    typeof(GranitDiagnosticsModule))]
public class AppModule : GranitModule { }

var app = builder.Build();
app.MapGranitHealthChecks();
app.Run();

Serilog configuration

AddGranitObservability() configures two Serilog sinks:

Sink	Purpose
`Console`	Local development, `[HH:mm:ss LEV] SourceContext Message`
`OpenTelemetry`	OTLP export to Loki via the collector

Every log entry is enriched with ServiceName, ServiceVersion, and Environment properties, matching the OpenTelemetry resource attributes for correlation.

Additional Serilog settings (minimum level, overrides, extra sinks) can be added via standard Serilog configuration in appsettings.json — ReadFrom.Configuration is called before the Granit enrichers.
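
For orientation, the two sinks and the enrichment described above can be approximated with plain Serilog. This is a minimal sketch, not Granit's actual implementation. It assumes the Serilog.Sinks.Console and Serilog.Sinks.OpenTelemetry packages. The output template is a guess at the console format shown in the table, and the resource attribute keys are assumed to follow the usual OpenTelemetry names.

```csharp
using System.Collections.Generic;
using Serilog;
using Serilog.Sinks.OpenTelemetry;

// Illustrative values; Granit reads these from the "Observability" section.
const string serviceName = "my-backend";
const string serviceVersion = "1.2.0";
const string environment = "production";

using var logger = new LoggerConfiguration()
    // The same three properties the section above says are attached to every entry.
    .Enrich.WithProperty("ServiceName", serviceName)
    .Enrich.WithProperty("ServiceVersion", serviceVersion)
    .Enrich.WithProperty("Environment", environment)
    // Assumed template approximating "[HH:mm:ss LEV] SourceContext Message".
    .WriteTo.Console(outputTemplate:
        "[{Timestamp:HH:mm:ss} {Level:u3}] {SourceContext} {Message:lj}{NewLine}{Exception}")
    .WriteTo.OpenTelemetry(options =>
    {
        options.Endpoint = "http://otel-collector:4317";
        options.Protocol = OtlpProtocol.Grpc;
        // Assumed attribute keys, chosen to line up with the OTel resource.
        options.ResourceAttributes = new Dictionary<string, object>
        {
            ["service.name"] = serviceName,
            ["service.version"] = serviceVersion,
            ["deployment.environment"] = environment,
        };
    })
    .CreateLogger();

logger.ForContext("SourceContext", "Demo").Information("Observability sketch started");
```

Keeping the enricher values identical to the resource attributes is what lets a log line in Loki be matched to a trace from the same service in Tempo.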

OpenTelemetry instrumentation

Three built-in instrumentations are registered automatically:

Instrumentation	What it captures
ASP.NET Core	Inbound HTTP requests (method, route, status code)
HttpClient	Outbound HTTP calls (dependency tracking)
EF Core	Database queries (command text, duration)

Health check endpoints (/health/*) are filtered out of traces to avoid noise.
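
A rough equivalent using the OpenTelemetry .NET SDK directly might look like the sketch below. It assumes the OpenTelemetry instrumentation packages for ASP.NET Core, HttpClient, and EF Core plus the OTLP exporter are referenced. `ShouldTrace` is a hypothetical helper name, and the filter logic is only one plausible way to exclude `/health/*`; the source does not show how Granit implements it.

```csharp
using System;
using Microsoft.AspNetCore.Http;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// Illustrative values; Granit reads them from the "Observability" section.
var otlpEndpoint = new Uri("http://otel-collector:4317");

using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .ConfigureResource(resource => resource.AddService(
        serviceName: "my-backend",
        serviceNamespace: "my-company",
        serviceVersion: "1.2.0"))
    .AddAspNetCoreInstrumentation(options =>
        // Requests for which the filter returns false never become spans.
        options.Filter = context => ShouldTrace(context.Request.Path))
    .AddHttpClientInstrumentation()
    .AddEntityFrameworkCoreInstrumentation()
    .AddOtlpExporter(options => options.Endpoint = otlpEndpoint)
    .Build();

Console.WriteLine(ShouldTrace("/health/ready")); // False
Console.WriteLine(ShouldTrace("/api/orders"));   // True

// Hypothetical helper: trace everything except paths under /health.
static bool ShouldTrace(PathString path) => !path.StartsWithSegments("/health");
```

`StartsWithSegments` matches whole path segments, so an unrelated route such as `/healthz` would still be traced under this sketch.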

Activity source auto-registration

Granit modules register their own ActivitySource names via GranitActivitySourceRegistry.Register() during host configuration. AddGranitObservability() reads the registry and calls AddSource() for each — no manual wiring needed.

// Inside a module's AddGranit*() extension method
GranitActivitySourceRegistry.Register("Granit.Workflow");

Registered activity sources

Modules that create spans register their ActivitySource name at startup. The table below lists every source and the span names it emits.

ActivitySource	Span names
`Granit.Vault`	`vault.encrypt`, `vault.decrypt`, `vault.get-secret`, `vault.check-rotation`
`Granit.Vault.Azure`	`akv.encrypt`, `akv.decrypt`, `akv.get-secret`, `akv.check-rotation`
`Granit.Wolverine`	`wolverine.send`, `wolverine.handle`
`Granit.Notifications`	`notification.dispatch`, `notification.deliver`
`Granit.Notifications.Email.Smtp`	`smtp.send`
`Granit.Notifications.Email.AwsSes`	`ses.send`
`Granit.Notifications.Email.AzureCommunicationServices`	`acs-email.send`
`Granit.Notifications.Sms.AzureCommunicationServices`	`acs-sms.send`
`Granit.Notifications.MobilePush.AzureNotificationHubs`	`anh.send`
`Granit.Notifications.Brevo`	`brevo.send`
`Granit.Notifications.Zulip`	`zulip.send`
`Granit.Workflow`	`workflow.transition`
`Granit.BlobStorage`	`blob.upload`, `blob.download`, `blob.delete`
`Granit.DataExchange`	`import.execute`, `export.execute`
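
To make the table concrete, here is a small sketch of how a module could emit one of these spans using only `System.Diagnostics`. The source and span names come from the table; the tag names are hypothetical, and the `ActivityListener` merely stands in for the OpenTelemetry SDK so the example runs without extra packages.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Source name and span name taken from the table above.
using var source = new ActivitySource("Granit.Workflow");
var recorded = new List<string>();

// Stand-in for the OpenTelemetry SDK. Without a listener (or AddSource),
// StartActivity returns null and nothing is recorded.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Granit.Workflow",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity => recorded.Add(activity.OperationName),
};
ActivitySource.AddActivityListener(listener);

using (var activity = source.StartActivity("workflow.transition"))
{
    // Hypothetical tag names, for illustration only.
    activity?.SetTag("workflow.from", "Draft");
    activity?.SetTag("workflow.to", "Published");
}

Console.WriteLine(string.Join(", ", recorded)); // workflow.transition
```

The null-conditional calls matter: when no listener or exporter subscribes to the source, `StartActivity` returns null and the instrumentation becomes effectively free.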

Kubernetes health probes

MapGranitHealthChecks() registers three endpoints, all AllowAnonymous (the kubelet cannot authenticate):

Probe	Path	Behavior	Failure effect
Liveness	`/health/live`	Always returns `200` — no dependency checks	Pod restart
Readiness	`/health/ready`	Checks tagged `"readiness"`	Pod removed from load balancer
Startup	`/health/startup`	Checks tagged `"startup"`	Liveness/readiness disabled until healthy

Status code mapping (readiness and startup)

HealthStatus	HTTP	Effect
`Healthy`	`200`	Pod receives traffic
`Degraded`	`200`	Pod stays in load balancer (non-critical degradation)
`Unhealthy`	`503`	Pod removed from load balancer
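
The probe table and the status mapping can be expressed with the standard ASP.NET Core health check middleware. The helper below is a sketch of what `MapGranitHealthChecks()` might do internally, not its actual code. `SketchHealthEndpoints`, `MapSketchHealthChecks`, and `ForTag` are hypothetical names, and the paths are the defaults from this page.

```csharp
using System.Collections.Generic;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper. Call app.MapSketchHealthChecks() after builder.Build();
// builder.Services.AddHealthChecks() must have been called beforehand.
public static class SketchHealthEndpoints
{
    public static void MapSketchHealthChecks(this IEndpointRouteBuilder endpoints)
    {
        // Liveness: the predicate rejects every check, so the result is always Healthy (200).
        endpoints.MapHealthChecks("/health/live",
            new HealthCheckOptions { Predicate = _ => false }).AllowAnonymous();

        endpoints.MapHealthChecks("/health/ready", ForTag("readiness")).AllowAnonymous();
        endpoints.MapHealthChecks("/health/startup", ForTag("startup")).AllowAnonymous();
    }

    // Runs only the checks carrying the given tag and applies the mapping above.
    public static HealthCheckOptions ForTag(string tag) => new()
    {
        Predicate = registration => registration.Tags.Contains(tag),
        ResultStatusCodes = new Dictionary<HealthStatus, int>
        {
            [HealthStatus.Healthy] = StatusCodes.Status200OK,
            [HealthStatus.Degraded] = StatusCodes.Status200OK,
            [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable,
        },
    };
}
```

Mapping `Degraded` to `200` is the important design choice: a slow but working dependency keeps the pod in rotation instead of draining traffic from every replica at once.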

Built-in health checks

Granit modules provide opt-in health checks via AddGranit*HealthCheck() extension methods on IHealthChecksBuilder. Each check follows the same pattern: sanitized error messages (never exposing credentials), structured data where applicable, and appropriate tags for Kubernetes probes.

Module	Extension method	Check performed	Tags
`Granit.Persistence`	`AddGranitDbContextHealthCheck()`	EF Core `CanConnectAsync`	readiness, startup
`Granit.Caching.StackExchangeRedis`	`AddGranitRedisHealthCheck()`	Redis `PING` with latency threshold	readiness, startup
`Granit.Vault`	`AddGranitVaultHealthCheck()`	Vault seal status + auth	readiness, startup
`Granit.Vault.Aws`	`AddGranitKmsHealthCheck()`	KMS `DescribeKey` (Degraded on PendingDeletion)	readiness, startup
`Granit.Identity.Keycloak`	`AddGranitKeycloakHealthCheck()`	`client_credentials` token request	readiness, startup
`Granit.Identity.EntraId`	`AddGranitEntraIdHealthCheck()`	`client_credentials` token request	readiness, startup
`Granit.BlobStorage.S3`	`AddGranitS3HealthCheck()`	`ListObjectsV2(MaxKeys=1)`	readiness, startup
`Granit.Notifications.Email.Smtp`	`AddGranitSmtpHealthCheck()`	EHLO handshake via MailKit	readiness
`Granit.Notifications.Email.AwsSes`	`AddGranitAwsSesHealthCheck()`	`GetAccount()` (Degraded if sending paused)	readiness, startup
`Granit.Notifications.Brevo`	`AddGranitBrevoHealthCheck()`	`GET /account`	readiness
`Granit.Notifications.Zulip`	`AddGranitZulipHealthCheck()`	`GET /api/v1/users/me`	readiness
`Granit.Vault.Azure`	`AddGranitAzureKeyVaultHealthCheck()`	GetKey probe	readiness
`Granit.Notifications.Email.AzureCommunicationServices`	`AddGranitAcsEmailHealthCheck()`	Send probe	readiness
`Granit.Notifications.Sms.AzureCommunicationServices`	`AddGranitAcsSmsHealthCheck()`	Send probe	readiness
`Granit.Notifications.MobilePush.AzureNotificationHubs`	`AddGranitAzureNotificationHubsHealthCheck()`	Hub description	readiness

Defensive timeout

All built-in health checks wrap their external call with .WaitAsync(TimeSpan.FromSeconds(10), cancellationToken). If the dependency does not respond within 10 seconds, the check returns Unhealthy instead of stalling the Kubernetes probe cycle.
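
The timeout pattern can be sketched with the base class library alone. The example returns a plain string instead of a `HealthCheckResult` to stay dependency-free; `CheckAsync` and `HangingPingAsync` are hypothetical names, and the 200 ms timeout only keeps the demo fast.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// 200 ms keeps the demo fast; the section above states 10 seconds.
var status = await CheckAsync(
    HangingPingAsync, TimeSpan.FromMilliseconds(200), CancellationToken.None);
Console.WriteLine(status); // Unhealthy

// Simulated dependency that never answers.
static Task HangingPingAsync(CancellationToken ct) => Task.Delay(Timeout.Infinite, ct);

static async Task<string> CheckAsync(
    Func<CancellationToken, Task> probe, TimeSpan timeout, CancellationToken ct)
{
    try
    {
        // WaitAsync stops waiting after the timeout; it does not cancel the probe itself.
        await probe(ct).WaitAsync(timeout, ct);
        return "Healthy";
    }
    catch (TimeoutException)
    {
        return "Unhealthy";
    }
}
```

Because `WaitAsync` only abandons the wait, a real check would typically also pass a linked, time-limited token to the dependency call so the underlying request is cancelled as well.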

Registering health checks

Tag your health checks with "readiness" and/or "startup" so they are picked up by the correct probe. Use the built-in extension methods when available:

builder.Services.AddHealthChecks()
    .AddGranitDbContextHealthCheck<AppDbContext>()
    .AddGranitRedisHealthCheck(degradedThreshold: TimeSpan.FromMilliseconds(100))
    .AddGranitKeycloakHealthCheck()
    .AddGranitAwsSesHealthCheck()
    .AddGranitBrevoHealthCheck()
    .AddGranitZulipHealthCheck()
    .AddGranitAzureKeyVaultHealthCheck()
    .AddGranitAcsEmailHealthCheck()
    .AddGranitAcsSmsHealthCheck()
    .AddGranitAzureNotificationHubsHealthCheck();
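
When no built-in extension method exists for a dependency, a custom check can follow the same conventions. The sketch below uses only the standard `Microsoft.Extensions.Diagnostics.HealthChecks` API; `QueueDepthHealthCheck`, `AddQueueDepthHealthCheck`, the threshold, and the placeholder `ReadDepthAsync` are all hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check for a dependency without a built-in extension method.
public sealed class QueueDepthHealthCheck : IHealthCheck
{
    private const int DegradedThreshold = 1_000;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            var depth = await ReadDepthAsync(cancellationToken);
            return depth > DegradedThreshold
                ? HealthCheckResult.Degraded($"Queue depth is {depth}.")
                : HealthCheckResult.Healthy();
        }
        catch (Exception)
        {
            // Sanitized message: no connection string or exception details.
            return HealthCheckResult.Unhealthy("Queue is unreachable.");
        }
    }

    // Placeholder for a real broker call.
    private static Task<int> ReadDepthAsync(CancellationToken ct) => Task.FromResult(42);
}

public static class SketchHealthCheckRegistration
{
    // Tagged "readiness" only, so the startup probe does not run it.
    public static IHealthChecksBuilder AddQueueDepthHealthCheck(this IHealthChecksBuilder builder) =>
        builder.AddCheck<QueueDepthHealthCheck>("queue-depth", tags: new[] { "readiness" });
}
```

It would then be chained like the built-in methods: `builder.Services.AddHealthChecks().AddQueueDepthHealthCheck();`.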

JSON response format

GranitHealthCheckWriter produces a structured JSON payload for Grafana/Loki dashboards. Kubernetes only reads the HTTP status code; the body is for operations teams.

{
  "status": "Healthy",
  "duration": 12.3,
  "checks": [
    {
      "name": "database",
      "status": "Healthy",
      "duration": 8.1,
      "tags": ["readiness", "startup"]
    }
  ]
}
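
A writer that produces this shape can be sketched against the standard `HealthReport` type. This is not the source of `GranitHealthCheckWriter`. `SketchHealthCheckWriter` and `ToPayload` are hypothetical names, and reporting durations in milliseconds is an assumption based on the sample values.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical writer; assign WriteAsync to HealthCheckOptions.ResponseWriter.
public static class SketchHealthCheckWriter
{
    public static object ToPayload(HealthReport report) => new
    {
        status = report.Status.ToString(),
        // Assumption: durations are reported in milliseconds.
        duration = report.TotalDuration.TotalMilliseconds,
        checks = report.Entries.Select(entry => new
        {
            name = entry.Key,
            status = entry.Value.Status.ToString(),
            duration = entry.Value.Duration.TotalMilliseconds,
            tags = entry.Value.Tags,
        }),
    };

    // Exceptions and descriptions are left out so no internal detail leaks.
    public static Task WriteAsync(HttpContext context, HealthReport report) =>
        context.Response.WriteAsJsonAsync(ToPayload(report));
}
```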

Health check caching

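CachedHealthCheck wraps any IHealthCheck with a SemaphoreSlim double-checked locking pattern. When the cached result expires, only one caller runs the inner check; concurrent probes wait on the lock and then read the fresh result. This prevents a stampede on the dependency when many pods are probed at once.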

sequenceDiagram
    participant P1 as Probe 1
    participant P2 as Probe 2
    participant C as CachedHealthCheck
    participant DB as Dependency

    P1->>C: CheckHealthAsync()
    P2->>C: CheckHealthAsync()
    C->>C: Cache expired
    C->>C: Acquire SemaphoreSlim
    Note over P2,C: P2 waits (lock held)
    C->>DB: inner.CheckHealthAsync()
    DB-->>C: Healthy
    C->>C: Cache result (10s default)
    C->>C: Release lock
    C-->>P1: Healthy
    C->>C: Double-check → cache hit
    C-->>P2: Healthy (from cache)

With 50 pods probed every 3 seconds, an uncached database check generates about 17 req/s (50 / 3). The cache reduces that to at most one request per DefaultCacheDuration per pod, or about 5 req/s across the fleet with the 10-second default.
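
The double-checked locking flow in the diagram can be sketched with `SemaphoreSlim` alone. `CachedProbe<T>` is a hypothetical stand-in for `CachedHealthCheck`, made generic over the result type so the example needs no extra packages; the real class may differ in its details.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var innerCalls = 0;
var cache = new CachedProbe<string>(
    async ct =>
    {
        Interlocked.Increment(ref innerCalls);
        await Task.Delay(50, ct); // simulated slow dependency
        return "Healthy";
    },
    TimeSpan.FromSeconds(10));

// Ten concurrent probes arrive while the cache is empty.
var results = await Task.WhenAll(
    Enumerable.Range(0, 10).Select(_ => cache.GetAsync(CancellationToken.None)));

Console.WriteLine($"{results.Length} probes, {innerCalls} inner call(s)"); // 10 probes, 1 inner call(s)

// Hypothetical stand-in for CachedHealthCheck.
sealed class CachedProbe<T>
{
    private sealed record Entry(T Value, DateTimeOffset ExpiresAt);

    private readonly Func<CancellationToken, Task<T>> _inner;
    private readonly TimeSpan _ttl;
    private readonly SemaphoreSlim _gate = new(1, 1);
    private volatile Entry? _entry;

    public CachedProbe(Func<CancellationToken, Task<T>> inner, TimeSpan ttl)
    {
        _inner = inner;
        _ttl = ttl;
    }

    public async Task<T> GetAsync(CancellationToken ct)
    {
        // First check, lock-free: serve a fresh cached value.
        var entry = _entry;
        if (entry is not null && entry.ExpiresAt > DateTimeOffset.UtcNow) return entry.Value;

        await _gate.WaitAsync(ct);
        try
        {
            // Second check: another caller may have refreshed while we waited.
            entry = _entry;
            if (entry is not null && entry.ExpiresAt > DateTimeOffset.UtcNow) return entry.Value;

            var value = await _inner(ct);
            _entry = new Entry(value, DateTimeOffset.UtcNow + _ttl);
            return value;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Storing the value and its expiry in one immutable object lets the lock-free first check read both consistently with a single reference read.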

Configuration reference

ObservabilityOptions (configuration section: Observability)

Property	Type	Default	Description
`ServiceName`	`string`	`"unknown-service"`	Service name for OTEL resource
`ServiceVersion`	`string`	`"0.0.0"`	Service version
`OtlpEndpoint`	`string`	`"http://localhost:4317"`	OTLP gRPC endpoint
`ServiceNamespace`	`string`	`"my-company"`	Service namespace
`Environment`	`string`	`"development"`	Deployment environment
`EnableTracing`	`bool`	`true`	Enable trace export via OTLP
`EnableMetrics`	`bool`	`true`	Enable metrics export via OTLP

DiagnosticsOptions (configuration section: DiagnosticsOptions)

Property	Type	Default	Description
`LivenessPath`	`string`	`"/health/live"`	Liveness probe endpoint path
`ReadinessPath`	`string`	`"/health/ready"`	Readiness probe endpoint path
`StartupPath`	`string`	`"/health/startup"`	Startup probe endpoint path
`DefaultCacheDuration`	`TimeSpan`	`00:00:10`	Cache TTL for `CachedHealthCheck`

Public API summary

Category	Key types	Package
Modules	`GranitObservabilityModule`, `GranitDiagnosticsModule`	—
Options	`ObservabilityOptions`, `DiagnosticsOptions`	—
Health checks	`CachedHealthCheck`, `GranitHealthCheckWriter`	`Granit.Diagnostics`
Activity sources	`GranitActivitySourceRegistry`	`Granit.Core`
Extensions	`AddGranitObservability()`, `AddGranitDiagnostics()`, `MapGranitHealthChecks()`	—