Background Jobs in .NET Without Hangfire: Wolverine + Cronos + Outbox
You run two pods of your .NET API behind a load balancer. Every night at 03:00, “send daily summary emails” runs twice and 5,000 customers get duplicate invoices. You add a distributed lock with Redis. It works until Redis fails over and you get duplicates again. You add idempotency keys to the email sender. The accountant asks why three customers got two emails on Wednesday.
The hard part of background jobs is not “run this in five minutes”. It is:
1. Exactly-once execution in a cluster: only one node runs the 03:00 job.
2. Atomic scheduling: if the handler succeeds, the next run is persisted. If it crashes, no orphan and no duplicate.
3. Transactional consistency: the job and the side effects (DB writes, outbound messages) commit together.
4. Cluster-safe leader election: works during rolling deploys, pod restarts, network partitions.
Hangfire handles (1) with a SQL-level distributed lock and is comfortable up to a single SQL server. For (2), (3) and (4), most teams discover the limits the hard way. Wolverine + Cronos gives you all four by piggybacking on the same outbox you already use for events. No second database. No second scheduler. One mental model.
What “background job” actually means in 2026
The label covers three quite different things:
| Kind | Trigger | Example | What it needs |
|---|---|---|---|
| Fire-and-forget | Inline (PublishAsync) | “send welcome email” | Outbox, retries |
| Delayed | One-shot future time | “remind in 24h if no reply” | Persistent scheduling |
| Recurring (cron) | Cron expression | “nightly export at 03:00” | Leader election, idempotency |
Hangfire blurs these. Wolverine separates them cleanly — and the boring ones (fire-and-forget, delayed) are already handled by its message bus and outbox. The interesting case is recurring jobs, which is where this article focuses.
The two failures nobody warns you about
Failure 1 — Two pods, one cron
Naive cron schedulers run on every node. Two nodes, one job at 03:00, two executions. You learn this on day two of horizontal scaling.
The fix is a singleton agent: one (and only one) pod runs the scheduler. If that pod dies, another takes over within seconds. Wolverine ships exactly this primitive: SingularAgent — a cluster-wide leader-elected service hosted by one node at a time.
```csharp
context.Services.AddSingularAgent<CronSchedulerAgent>();
```
Inside the agent, scheduling is a one-line call:
```csharp
DateTimeOffset next = cron.GetNextOccurrence(clock.Now, TimeZoneInfo.Utc) ?? DateTimeOffset.MaxValue;
await messageBus.ScheduleAsync(new HourlyCleanupJob(), next);
```
That’s the entire scheduler. Cronos parses the expression, Wolverine handles persistence, the singular agent guarantees one writer.
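To make the "one writer at a time, with takeover" idea concrete, here is a toy lease model. It is a sketch under stated assumptions, not how Wolverine implements its agents: `LeaseStore` and `TryAcquireOrRenew` are hypothetical names, and the in-memory lock stands in for what would be a single atomic update against a shared database row.

```csharp
using System;

// Toy model of lease-based leader election (hypothetical types, not Wolverine's API).
public sealed class LeaseStore
{
    private readonly object _gate = new();
    private string? _owner;
    private DateTimeOffset _expiresAt = DateTimeOffset.MinValue;

    // Returns true if nodeId holds the lease after this call.
    // A node keeps leadership by renewing before the lease expires;
    // any other node can take over once the lease has expired.
    public bool TryAcquireOrRenew(string nodeId, DateTimeOffset now, TimeSpan ttl)
    {
        lock (_gate)
        {
            bool free = _owner is null || now >= _expiresAt;
            if (free || _owner == nodeId)
            {
                _owner = nodeId;
                _expiresAt = now + ttl;
                return true;
            }
            return false;
        }
    }
}
```

In this model the time-to-live bounds how long the cluster can go without a scheduler after the leader dies: a shorter lease means faster takeover but more renewal traffic.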
Failure 2 — Scheduling outside the transaction
Look closely at any cron scheduler that is not transactional. The “schedule the next occurrence” call happens after the handler returns. If the node crashes between the handler success and the schedule call, the next occurrence is lost. The job stops firing. Silently.
The opposite mistake: schedule first, run second. Now the node crashes between scheduling and the actual work. The work never happened, but the next one will fire. Silent gap, no audit trail.
The only safe pattern is: schedule the next occurrence inside the same database transaction as the handler.
This is the textbook Outbox pattern. Wolverine implements it natively: IMessageContext.ScheduleAsync(...) inside a handler writes to the outbox table, in the handler’s DB transaction. Commit succeeds → next run is durable. Commit fails → next run never existed; the current message is redelivered.
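The commit-together semantic can be illustrated with a toy in-memory model. This is only a sketch: `ToyDatabase`, `ScheduledMessage` and `RunInTransaction` are hypothetical names, and a real outbox stages rows inside an actual database transaction rather than in lists.

```csharp
using System;
using System.Collections.Generic;

// Toy model of "work and next schedule commit together" (hypothetical types).
public sealed record ScheduledMessage(string JobName, DateTimeOffset RunAt);

public sealed class ToyDatabase
{
    public List<string> BusinessRows { get; } = new();
    public List<ScheduledMessage> Outbox { get; } = new();

    // Runs the handler against staged lists and applies them only if the
    // handler returns without throwing, so both lists change together or not at all.
    public bool RunInTransaction(Action<List<string>, List<ScheduledMessage>> handler)
    {
        var stagedRows = new List<string>();
        var stagedOutbox = new List<ScheduledMessage>();
        try
        {
            handler(stagedRows, stagedOutbox);
        }
        catch (Exception)
        {
            return false; // rollback: staged changes are discarded
        }

        // commit: business writes and the next scheduled run become visible together
        BusinessRows.AddRange(stagedRows);
        Outbox.AddRange(stagedOutbox);
        return true;
    }
}
```

If the handler throws after staging both the business write and the next run, neither is applied, which is the "next run never existed" case. The current message would then be redelivered and the handler would run again.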
The Granit ergonomics — [RecurringJob] and zero glue
The full pattern is fiddly enough that you do not want to write it per job. Granit collapses it to two declarations:
```csharp
[RecurringJob("0 * * * *", "billing-hourly-cleanup")]
public sealed record HourlyCleanupJob : IBackgroundJob;

internal static partial class HourlyCleanupHandler
{
    public static async Task HandleAsync(
        HourlyCleanupJob job,
        IBillingService billing,
        CancellationToken ct) =>
        await billing.CleanupStaleQuotesAsync(ct);
    // Rescheduling is injected automatically.
}
```
That’s it. No explicit scheduler call. No worker class. No IHostedService.
What’s happening behind the scenes:
- At startup, RecurringJobDiscovery scans for [RecurringJob] types and seeds them into the BackgroundJobDefinitions table.
- The Wolverine handler chain for each decorated message has RecurringJobSchedulingMiddleware injected. After every successful handler invocation, the middleware reads the current cron from the store, computes the next occurrence, and calls IMessageContext.ScheduleAsync, inside the handler’s transaction.
- The SingularAgent scheduler ensures the first occurrence (and the catch-up logic on startup) runs from one node only.
```csharp
public async Task AfterAsync(Envelope envelope, IMessageContext context, CancellationToken ct)
{
    RecurringJobAttribute? attr = envelope.Message?.GetType().GetCustomAttribute<RecurringJobAttribute>();
    if (attr is null) return;

    BackgroundJobDefinition? job = await storeReader.FindAsync(attr.Name, ct);
    if (job is not { IsEnabled: true }) return;

    DateTimeOffset? next = CronExpression.Parse(job.CronExpression).GetNextOccurrence(clock.Now, TimeZoneInfo.Utc);
    if (next is null) return;

    await context.ScheduleAsync(envelope.Message!, next.Value);
}
```
The handler can pause itself by flipping IsEnabled = false in the store: the next reschedule is skipped, and the cron resumes when an admin re-enables it. No code change, no redeploy.
Where Cronos fits
Cronos is the small, fast cron parser by Hangfire’s author. It supports:
- 5-field (min hour day month dow) and 6-field (sec min hour day month dow) expressions
- DST-aware time zones (TimeZoneInfo)
- Calculation of the next occurrence from any starting DateTime, including a range query for catch-up
Granit parses 6-field first and falls back to 5-field. You can write "*/30 * * * * *" for “every 30 seconds” or "0 3 * * 1-5" for “weekdays at 03:00”.
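The "6-field first, then 5-field" fallback and the catch-up range query can be sketched directly against Cronos. The helper names (`CronParsing`, `ParseFlexible`, `NextUtc`, `Between`) are hypothetical and this is not Granit's actual code. It assumes the Cronos NuGet package, and that a 5-field expression is rejected with `CronFormatException` when parsed in the seconds format.

```csharp
using System;
using System.Collections.Generic;
using Cronos; // NuGet package "Cronos"

public static class CronParsing
{
    // Tries the 6-field (seconds) format first, then the standard 5-field format.
    public static CronExpression ParseFlexible(string expression)
    {
        try
        {
            return CronExpression.Parse(expression, CronFormat.IncludeSeconds);
        }
        catch (CronFormatException)
        {
            return CronExpression.Parse(expression, CronFormat.Standard);
        }
    }

    // Next occurrence strictly after 'from', evaluated in UTC; null if there is none.
    public static DateTimeOffset? NextUtc(string expression, DateTimeOffset from) =>
        ParseFlexible(expression).GetNextOccurrence(from, TimeZoneInfo.Utc);

    // Occurrences in [from, to), for example runs missed while the scheduler was down.
    public static IEnumerable<DateTimeOffset> Between(
        string expression, DateTimeOffset from, DateTimeOffset to) =>
        ParseFlexible(expression).GetOccurrences(from, to, TimeZoneInfo.Utc);
}
```

For example, calling `NextUtc("0 3 * * 1-5", ...)` from a Friday afternoon skips the weekend and lands on Monday at 03:00 UTC. One design caveat of try-then-fallback: an invalid expression surfaces only the error from the 5-field attempt, so a stricter variant might count the fields up front instead.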
Comparison: Hangfire vs Wolverine + Cronos
| Concern | Hangfire | Wolverine + Cronos |
|---|---|---|
| Storage | Separate Hangfire schema | Your application DB (outbox) |
| Transaction with business logic | No (separate connection) | Yes (same DbContext + tx) |
| Cluster-safe singleton | SQL sp_getapplock (SQL Server) | SingularAgent (any backend) |
| Exactly-once on crash | Best-effort | Outbox-guaranteed |
| Dashboard UI | Bundled | Bring your own / Granit endpoints |
| License | LGPL (free) / Pro (paid) | MIT |
| Dependencies | Hangfire + storage adapter | Wolverine + Cronos (~2 MB) |
The killer column is “transaction with business logic”. Hangfire is near your DB; Wolverine writes the next-run schedule in the same transaction as the work it just did. That difference is what eliminates the duplicate-email class of bugs.
Operational ergonomics — what your SRE asks about
A scheduler that admins cannot inspect at 03:14 AM is a liability. The minimum surface every background job system needs:
- List of jobs with last run, last status, next occurrence.
- Pause / resume without redeploy.
- Trigger now for manual catch-up.
- Audit: who triggered what, when. Required for ISO 27001 (logging control A.12.4 in the 2013 edition, A.8.15 in the 2022 edition).
Granit exposes these as IBackgroundJobReader (CQRS query side) and IBackgroundJobWriter (write side, audited). The writer captures the X-Triggered-By header into the job definition so that manual runs are distinguishable from cron runs in the audit log.
```csharp
app.MapPost("/admin/jobs/{name}/trigger", async (
    string name,
    IBackgroundJobWriter writer,
    ICurrentUser user,
    CancellationToken ct) =>
{
    await writer.TriggerNowAsync(name, triggeredBy: user.Email, ct);
    return Results.Accepted();
});
```
Takeaways
- Background jobs in a cluster are an outbox problem, not a scheduling problem. Solve the outbox and the schedule comes for free.
- One scheduler, leader-elected. Many workers, all stateless. Anything else creates duplicates.
- Reschedule inside the handler’s transaction. Commit-or-rollback together is the only safe semantic.
- Pick a small cron library (Cronos), let your message bus (Wolverine) do persistence and retries.
- Expose pause/resume/trigger-now and audit from day one. Schedulers without admin surface become 03 AM pages.