Diagnostics

Granit.Diagnostics adds Kubernetes-native health check endpoints with stampede-protected caching and a structured JSON response writer for Grafana dashboards.

| Package | Role | Depends on |
| --- | --- | --- |
| Granit.Diagnostics | Health check endpoints, response writer, caching | Granit.Timing |

[DependsOn(typeof(GranitDiagnosticsModule))]
public class AppModule : GranitModule { }

In Program.cs, after builder.Build():

app.MapGranitHealthChecks();

MapGranitHealthChecks() registers three endpoints, all AllowAnonymous (the kubelet cannot authenticate):

| Probe | Path | Behavior | Failure effect |
| --- | --- | --- | --- |
| Liveness | /health/live | Always returns 200 (no dependency checks) | Pod restart |
| Readiness | /health/ready | Checks tagged "readiness" | Pod removed from load balancer |
| Startup | /health/startup | Checks tagged "startup" | Liveness/readiness disabled until healthy |

Status code mapping (readiness and startup)

| HealthStatus | HTTP | Effect |
| --- | --- | --- |
| Healthy | 200 | Pod receives traffic |
| Degraded | 200 | Pod stays in load balancer (non-critical degradation) |
| Unhealthy | 503 | Pod removed from load balancer |
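
The same mapping can be expressed with the stock ASP.NET Core HealthCheckOptions type. The following is a minimal sketch, not Granit's implementation: ProbeOptions and ForTag are hypothetical names, and how MapGranitHealthChecks() builds its options internally is not shown here. The status codes listed are also the ASP.NET Core defaults.

```csharp
using System.Collections.Generic;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper (not part of Granit): builds the options for one probe.
// Possible usage in a plain ASP.NET Core app:
//   app.MapHealthChecks("/health/ready", ProbeOptions.ForTag("readiness")).AllowAnonymous();
public static class ProbeOptions
{
    public static HealthCheckOptions ForTag(string tag) => new()
    {
        // Run only the checks registered with the given tag.
        Predicate = registration => registration.Tags.Contains(tag),

        // Degraded still answers 200 so the pod stays in the load balancer.
        ResultStatusCodes = new Dictionary<HealthStatus, int>
        {
            [HealthStatus.Healthy] = StatusCodes.Status200OK,
            [HealthStatus.Degraded] = StatusCodes.Status200OK,
            [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable,
        },
    };
}
```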

Granit modules provide opt-in health checks via AddGranit*HealthCheck() extension methods on IHealthChecksBuilder. Each check follows the same pattern: sanitized error messages (never exposing credentials), structured data where applicable, and appropriate tags for Kubernetes probes.

| Module | Extension method | Probe | Tags |
| --- | --- | --- | --- |
| Granit.Persistence | AddGranitDbContextHealthCheck() | EF Core CanConnectAsync | readiness, startup |
| Granit.Caching.StackExchangeRedis | AddGranitRedisHealthCheck() | Redis PING with latency threshold | readiness, startup |
| Granit.Vault | AddGranitVaultHealthCheck() | Vault seal status + auth | readiness, startup |
| Granit.Vault.Aws | AddGranitKmsHealthCheck() | KMS DescribeKey (Degraded on PendingDeletion) | readiness, startup |
| Granit.Identity.Keycloak | AddGranitKeycloakHealthCheck() | client_credentials token request | readiness, startup |
| Granit.Identity.EntraId | AddGranitEntraIdHealthCheck() | client_credentials token request | readiness, startup |
| Granit.BlobStorage.S3 | AddGranitS3HealthCheck() | ListObjectsV2(MaxKeys=1) | readiness, startup |
| Granit.Notifications.Email.Smtp | AddGranitSmtpHealthCheck() | EHLO handshake via MailKit | readiness |
| Granit.Notifications.Email.AwsSes | AddGranitAwsSesHealthCheck() | GetAccount() (Degraded if sending paused) | readiness, startup |
| Granit.Notifications.Brevo | AddGranitBrevoHealthCheck() | GET /account | readiness |
| Granit.Notifications.Zulip | AddGranitZulipHealthCheck() | GET /api/v1/users/me | readiness |
| Granit.Vault.Azure | AddGranitAzureKeyVaultHealthCheck() | GetKey probe | readiness |
| Granit.Notifications.Email.AzureCommunicationServices | AddGranitAcsEmailHealthCheck() | Send probe | readiness |
| Granit.Notifications.Sms.AzureCommunicationServices | AddGranitAcsSmsHealthCheck() | Send probe | readiness |
| Granit.Notifications.MobilePush.AzureNotificationHubs | AddGranitAzureNotificationHubsHealthCheck() | Hub description | readiness |

All built-in health checks wrap their external call with .WaitAsync(10s, cancellationToken). If the dependency does not respond within 10 seconds, the check stops waiting and returns Unhealthy instead of blocking the Kubernetes probe cycle.
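
A custom check can apply the same timeout with Task.WaitAsync (available from .NET 6). This is a sketch under assumptions: PingHealthCheck and its delegate parameter are hypothetical, and the built-in checks may be structured differently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical custom check; the 10-second default mirrors the built-in checks.
public sealed class PingHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task> _ping;
    private readonly TimeSpan _timeout;

    public PingHealthCheck(Func<CancellationToken, Task> ping, TimeSpan? timeout = null)
    {
        _ping = ping;
        _timeout = timeout ?? TimeSpan.FromSeconds(10);
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            // WaitAsync stops waiting after the timeout; it does not cancel the underlying call.
            await _ping(cancellationToken).WaitAsync(_timeout, cancellationToken);
            return HealthCheckResult.Healthy();
        }
        catch (TimeoutException)
        {
            return HealthCheckResult.Unhealthy(
                $"Dependency did not respond within {_timeout.TotalSeconds:0.###} s.");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Report only the exception type so credentials in messages never leak.
            return HealthCheckResult.Unhealthy($"Dependency check failed: {ex.GetType().Name}");
        }
    }
}
```

Because WaitAsync only abandons the wait, the delegate should also honor the cancellation token it receives if the underlying call needs to be torn down.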

Tag your health checks with "readiness" and/or "startup" so they are picked up by the correct probe. Use the built-in extension methods when available:

builder.Services.AddHealthChecks()
    .AddGranitDbContextHealthCheck<AppDbContext>()
    .AddGranitRedisHealthCheck(degradedThreshold: TimeSpan.FromMilliseconds(100))
    .AddGranitKeycloakHealthCheck()
    .AddGranitAwsSesHealthCheck()
    .AddGranitBrevoHealthCheck()
    .AddGranitZulipHealthCheck()
    .AddGranitAzureKeyVaultHealthCheck()
    .AddGranitAcsEmailHealthCheck()
    .AddGranitAcsSmsHealthCheck()
    .AddGranitAzureNotificationHubsHealthCheck();
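
When no built-in extension exists for a dependency, the tags can be supplied through the standard AddCheck<T>() overload. A minimal sketch, assuming a hypothetical QueueHealthCheck and a hypothetical AddQueueHealthCheck() extension; neither is part of Granit:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check for a dependency without a built-in Granit extension.
public sealed class QueueHealthCheck : IHealthCheck
{
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
        // A real check would contact the dependency here.
        => Task.FromResult(HealthCheckResult.Healthy("Queue reachable."));
}

public static class QueueHealthCheckExtensions
{
    // The tags decide which probe endpoints evaluate the check.
    public static IHealthChecksBuilder AddQueueHealthCheck(this IHealthChecksBuilder builder)
        => builder.AddCheck<QueueHealthCheck>(
            name: "queue",
            failureStatus: HealthStatus.Unhealthy,
            tags: new[] { "readiness", "startup" });
}
```

The extension can then be chained after AddHealthChecks() in the same way as the built-in methods.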

GranitHealthCheckWriter produces a structured JSON payload for Grafana/Loki dashboards. Kubernetes only reads the HTTP status code; the body is for operations teams.

{
  "status": "Healthy",
  "duration": 12.3,
  "checks": [
    {
      "name": "database",
      "status": "Healthy",
      "duration": 8.1,
      "tags": ["readiness", "startup"]
    }
  ]
}
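
A payload of this shape can be produced from the standard HealthReport type. The sketch below is illustrative and is not the GranitHealthCheckWriter source: JsonHealthWriter is a hypothetical name, and it assumes the duration fields are milliseconds, which the sample above does not state.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical writer, shown only to illustrate the payload structure.
public static class JsonHealthWriter
{
    // Projects a HealthReport into status/duration/checks.
    // Assumption: durations are reported in milliseconds.
    public static object ToPayload(HealthReport report) => new
    {
        status = report.Status.ToString(),
        duration = report.TotalDuration.TotalMilliseconds,
        checks = report.Entries.Select(entry => new
        {
            name = entry.Key,
            status = entry.Value.Status.ToString(),
            duration = entry.Value.Duration.TotalMilliseconds,
            tags = entry.Value.Tags,
        }),
    };

    // Signature matches HealthCheckOptions.ResponseWriter.
    public static Task WriteAsync(HttpContext context, HealthReport report)
        => context.Response.WriteAsJsonAsync(ToPayload(report));
}
```

Exception details and the Data dictionary are left out on purpose here, which keeps the body free of anything a dependency might leak in an error message.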

CachedHealthCheck wraps any IHealthCheck with a SemaphoreSlim double-check locking pattern to prevent a stampede on the dependency when many pods are probed simultaneously.

sequenceDiagram
    participant P1 as Probe 1
    participant P2 as Probe 2
    participant C as CachedHealthCheck
    participant DB as Dependency

    P1->>C: CheckHealthAsync()
    P2->>C: CheckHealthAsync()
    C->>C: Cache expired
    C->>C: Acquire SemaphoreSlim
    Note over P2,C: P2 waits (lock held)
    C->>DB: inner.CheckHealthAsync()
    DB-->>C: Healthy
    C->>C: Cache result (10s default)
    C->>C: Release lock
    C-->>P1: Healthy
    C->>C: Double-check → cache hit
    C-->>P2: Healthy (from cache)

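With 50 pods probed every 3 seconds, an uncached database check generates about 17 requests per second (50 / 3 ≈ 16.7). With the cache, each pod runs the inner check at most once per DefaultCacheDuration, so the default of 10 seconds caps the load at about 5 requests per second (50 / 10).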
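
The double-check locking flow from the diagram can be sketched as follows. This is an illustration of the pattern, not the CachedHealthCheck source; CachingHealthCheck and its constructor parameters are hypothetical.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical wrapper illustrating SemaphoreSlim double-check locking.
public sealed class CachingHealthCheck : IHealthCheck
{
    private sealed record Snapshot(HealthCheckResult Result, DateTimeOffset ExpiresAt);

    private readonly IHealthCheck _inner;
    private readonly TimeSpan _ttl;
    private readonly SemaphoreSlim _gate = new(1, 1);
    private volatile Snapshot? _snapshot;

    public CachingHealthCheck(IHealthCheck inner, TimeSpan ttl)
    {
        _inner = inner;
        _ttl = ttl;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // First check: lock-free fast path while the cached result is fresh.
        var snapshot = _snapshot;
        if (snapshot is not null && DateTimeOffset.UtcNow < snapshot.ExpiresAt)
            return snapshot.Result;

        await _gate.WaitAsync(cancellationToken);
        try
        {
            // Second check: another probe may have refreshed the cache while this one waited.
            snapshot = _snapshot;
            if (snapshot is not null && DateTimeOffset.UtcNow < snapshot.ExpiresAt)
                return snapshot.Result;

            var result = await _inner.CheckHealthAsync(context, cancellationToken);
            _snapshot = new Snapshot(result, DateTimeOffset.UtcNow + _ttl);
            return result;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Publishing the result and its expiry together as one immutable snapshot lets the fast path read a consistent pair without taking the lock.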

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| LivenessPath | string | "/health/live" | Liveness probe endpoint path |
| ReadinessPath | string | "/health/ready" | Readiness probe endpoint path |
| StartupPath | string | "/health/startup" | Startup probe endpoint path |
| DefaultCacheDuration | TimeSpan | 00:00:10 | Cache TTL for CachedHealthCheck |

| Category | Key types | Package |
| --- | --- | --- |
| Module | GranitDiagnosticsModule | |
| Health checks | CachedHealthCheck, GranitHealthCheckWriter | Granit.Diagnostics |
| Options | DiagnosticsOptions | Granit.Diagnostics |
| Extensions | AddGranitDiagnostics(), MapGranitHealthChecks() | Granit.Diagnostics |