Dynamic Consistency Boundary in Marten, Part 4: Production considerations

The first three parts of this series (Part 1, Part 2, Part 3) covered the problem DCB solves, the plain Marten implementation, and the Wolverine variant. This part covers what changes when the code leaves the laptop and reaches production.

Tag storage: TagTables vs HStore

Marten ships two physical layouts for tags:

TagTables (default): each tag type gets its own normalised table, with foreign keys to the event store. Easy to understand, easy to query ad-hoc with SQL, works on any Postgres install with no extensions required.
HStore (opt-in): tags live in an HSTORE column on the event row itself, indexed via a GIN index. Fewer rows, fewer joins, denser indexes. Faster reads on tag-heavy workloads. Requires the hstore extension. All major managed Postgres providers (RDS, Cloud SQL, Neon, Supabase) support it.

Rough guidance:

Just getting started or running moderate throughput: TagTables.
Multi-thousand-per-second appends with several tags per event: HStore.
DBA forbids Postgres extensions for compliance reasons: TagTables.
Ad-hoc reporting joins on tags from psql are useful to you: TagTables.

Switching modes after the fact is possible but not free. Both layouts will need to coexist during the migration. Pick once, deliberately, before reaching scale.

The global-lock failure mode

This is the most common self-inflicted production injury with DCB, and the section worth reading most carefully.

Consider what looks like a reasonable extension to the coupon system. You want to enforce a store-wide cap on Black Friday discount spend:

var query = new EventTagQuery()
    .Or<CouponCode>(new CouponCode(code))
    .Or<Promotion>(new Promotion("BLACK_FRIDAY"));

If 50,000 coupons run under the Black Friday promotion at peak, every redemption now contends on the same Promotion("BLACK_FRIDAY") tag. Every FetchForWritingByTags scans all events tagged with the promotion. Every SaveChangesAsync runs its consistency check against the same tag. The DCB query has become a global lock with a more elegant API.

The pattern to internalise: the more events a tag matches, the closer that tag is to a global lock.

A few defensive rules:

Per-instance identifiers are fine. Cohort identifiers are dangerous. CouponCode("SUMMER25") is one coupon. Promotion("BLACK_FRIDAY") is a cohort of many coupons.
If a tag matches an unbounded set of events, the rule probably belongs somewhere other than DCB. Likely a rate limiter, a monitor, or an out-of-band ceiling check.
When reaching for a third or fourth .Or(...), ask how many events the tag actually matches in steady state. If the answer is “a lot”, reconsider.

A useful litmus test: a healthy DCB query touches dozens of events, maybe a few hundred. A sick one touches millions. Roughly four orders of magnitude separate “doing exactly what it should” from “you reinvented row-level locking with extra steps.”
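To put a number on “how many events does this tag match”, a small audit helper can do the counting. This is a minimal sketch that assumes you can export one (tag type, tag value) pair per tagged event by whatever means suits your setup; TagHit and TagAudit are hypothetical names, not Marten types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical export row: one entry per (event, tag) pair, however you obtain it.
public sealed record TagHit(string TagType, string TagValue);

public static class TagAudit
{
    // Returns the tag values matched by more events than the threshold, busiest first.
    public static IReadOnlyList<(TagHit Tag, int Events)> FindBroadTags(
        IEnumerable<TagHit> hits, int threshold)
    {
        return hits
            .GroupBy(h => h)                              // records compare by value
            .Select(g => (Tag: g.Key, Events: g.Count()))
            .Where(x => x.Events > threshold)
            .OrderByDescending(x => x.Events)
            .ToList();
    }
}
```

Run it with a threshold of a few hundred. Anything it returns is a candidate cohort tag and deserves a second look before it goes into an .Or(...).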

Hard cap by default, and what it costs

Part 2 showed DCB defending an exact cap: fifty concurrent redemptions against a cap of five, and exactly five land. As of Marten 9.4.0 that exactness is the default, and it is worth understanding the mechanism behind it - because it is also the source of the cost.

At SaveChangesAsync the DCB check is backed by a real serializing constraint. Marten keeps a side table parallel to mt_events (one row per tag boundary, carrying a version); the save emits an INSERT … ON CONFLICT DO UPDATE … WHERE version = $captured RETURNING 1 against that row. Two transactions appending under the same tag now contend on a row-level write lock at the session’s default READ COMMITTED isolation: the first to commit bumps the version, the second blocks until that commit lands, then finds its captured version stale, matches no rows on the RETURNING, and throws DcbConcurrencyException. One wins, the other loses - exactly the way stream-based optimistic concurrency already worked via the (stream_id, version) unique index. The check shape no longer depends on DcbStorageMode; both TagTables and HStore share the same constraint. There is no SERIALIZABLE transaction and no advisory lock of your own - just one extra row touched per save.
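The capture-then-conditional-write shape described above can be modelled in memory. The sketch below is not Marten’s code: TagBoundary is a hypothetical stand-in for the per-tag version row, and a lock plays the part of the Postgres row lock. It only illustrates why a stale captured version loses and why the cap stays exact under retry.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for the per-tag version row; not Marten's code.
public sealed class TagBoundary
{
    private readonly object _gate = new();
    private long _version;
    private int _redemptions;

    // Read the state a decision is based on, together with the version it was read at.
    public (long Version, int Redemptions) Capture()
    {
        lock (_gate) return (_version, _redemptions);
    }

    // Mirrors "... WHERE version = $captured": succeeds only if nobody else
    // bumped the version since Capture().
    public bool TryAppend(long capturedVersion)
    {
        lock (_gate)
        {
            if (_version != capturedVersion) return false; // stale version: this writer loses
            _version++;
            _redemptions++;
            return true;
        }
    }
}

public static class RaceDemo
{
    // Fires concurrent redemption attempts at one boundary and returns how many landed.
    public static int Run(int cap, int attempts)
    {
        var boundary = new TagBoundary();
        Parallel.For(0, attempts, _ =>
        {
            while (true)
            {
                var (version, redemptions) = boundary.Capture();
                if (redemptions >= cap) return;          // the business rule rejects
                if (boundary.TryAppend(version)) return; // won the race
                // Lost the race: loop to re-read the state and decide again.
            }
        });
        return boundary.Capture().Redemptions;
    }
}
```

RaceDemo.Run(cap: 5, attempts: 50) returns 5: a writer can only append if the state it decided on is still current, so the count never passes the cap, and every loser retries until it either lands or sees the cap reached.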

NOTE

This landed in Marten 9.4.0. Marten issue #4591 reported that, before 9.4, the DCB check ran as a plain non-locking SELECT EXISTS(...) separate from the insert at READ COMMITTED. Two truly-concurrent same-tag appends could each run that SELECT before either committed, both see a clean field, and both insert - so the cap was a soft cap with bounded slack (the count could briefly sit at N+1). The fix gives the DCB path the serializing side table described above, making the cap hard by default: your existing retry loop rejects the loser cleanly, with no advisory lock or SERIALIZABLE of your own. It ships as a mandatory schema change, so deployments with AutoCreate.None must run db-patch / db-apply before rolling out 9.4+. The companion sample targets 9.5.0, which is why its race tests assert exactly the cap; on 9.3.x they would intermittently see cap+1.

Exactness is not free, and the cost is the same throughput ceiling you would have paid to make a soft cap hard by hand. By definition, an exact cap means every redemption of the same coupon serializes through one decision point: a coupon hammered by hundreds of concurrent requests now processes them one at a time at the constraint. For a single coupon code that is cheap - the contention is naturally scoped to one hot key. The danger is when the tag is broad. This is exactly the global-lock failure mode from earlier in this part, now with teeth: under 9.4+ a redemption tagged Promotion("BLACK_FRIDAY") doesn’t just scan a huge set of events, it serializes every writer through that promotion’s constraint row. A hot tag is a hot tag, and the new constraint makes a broad one a genuine bottleneck. The defence is the same: keep tags per-instance, not cohort-wide.

A useful way to hold it: the question used to be “soft or hard?” - a scalability-vs-exactness trade you made per command. On 9.4+ that trade collapses for Marten users; you get exactness, and the cost moves into Postgres’s row-lock contention plus your existing retry. The remaining lever is tag design, not isolation level. If a rule genuinely cannot tolerate serializing on its tag - the tag is unavoidably broad and the volume is high - that is a signal the invariant may belong somewhere other than DCB (a rate limiter, an out-of-band monitor), the same conclusion the global-lock section reaches.

Two footnotes for completeness:

If you’re on 9.3.x or a DCB implementation without an equivalent constraint, the cap is still soft and you close the gap by hand - open the session at IsolationLevel.Serializable (Postgres aborts the loser with 40001, which you retry) or take a pg_advisory_xact_lock(hashtext(code)) scoped to the coupon before the fetch. Both serialize same-key redemptions; the advisory lock has the narrower blast radius. On 9.4+ you need neither.
Do you even want exactness? For most rules yes, and now you get it by default. But it is worth noticing what one extra would have cost: for a marketing coupon, pennies, recoverable by voiding a redemption; for physical inventory or a regulated limit, an oversold seat or a compliance breach. The 9.4 default happens to be right for the strict cases and harmless for the lax ones - the only place it bites is the broad-tag bottleneck above.

Forgetting the Wolverine retry policy

This trap only applies if you adopted the Wolverine.HTTP pattern from Part 3. It is the second silent failure mode worth flagging.

Wolverine does not auto-retry on transient exceptions. That includes DcbConcurrencyException. Without an explicit retry policy, a single concurrent redemption that triggers the consistency check turns into a 500 Internal Server Error for the losing request. The client sees a generic error page, your log fills with concurrency exceptions, and the actual cause looks like an outage instead of the routine, expected outcome it is.

The fix is small. Add a Configure(HandlerChain) method to the endpoint class:

public static void Configure(HandlerChain chain)
{
    chain.OnException<DcbConcurrencyException>()
        .RetryWithCooldown(50.Milliseconds(), 100.Milliseconds(), 250.Milliseconds());
}

Or set it once in Program.cs if every DCB endpoint in the service should share the policy:

builder.Host.UseWolverine(opts =>
{
    opts.Policies.OnException<DcbConcurrencyException>()
        .RetryWithCooldown(50.Milliseconds(), 100.Milliseconds(), 250.Milliseconds());
});

Why this is easy to miss: in development you almost never see two redemptions for the same coupon land in the same millisecond, so the policy is effectively dead code locally. Under real load with two clicks racing, the policy is what stands between you and a 500 page. The plain-Marten implementation from Part 2 cannot have this bug because the retry is a for-loop you have to write. The trade Wolverine makes is “I will generate the loop for you, but you have to tell me to.”
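For comparison, here is roughly what a retry-with-cooldown policy amounts to when written by hand. This is a minimal sketch, not Wolverine’s implementation: BoundaryConflictException is a hypothetical stand-in for DcbConcurrencyException so the code compiles without Marten, and Retry.WithCooldownAsync is an illustrative helper name.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for DcbConcurrencyException, so the sketch has no Marten dependency.
public sealed class BoundaryConflictException : Exception { }

public static class Retry
{
    // Runs the action. On a conflict it waits for the next cooldown and tries again.
    // Once the cooldowns are used up, the exception propagates to the caller.
    public static async Task<T> WithCooldownAsync<T>(
        Func<Task<T>> action, params TimeSpan[] cooldowns)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (BoundaryConflictException) when (attempt < cooldowns.Length)
            {
                await Task.Delay(cooldowns[attempt]);
            }
        }
    }
}
```

With three cooldowns the action runs at most four times. The important detail is that each attempt must redo the whole fetch-decide-append cycle, so the decision is made against fresh state rather than the stale read that just lost.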

Wolverine 6 setup gotchas

A cluster of Wolverine traps, most of which throw loudly at app startup. They cost a confused half-hour the first time rather than a production incident, but they are easy to hit on a clean project. A couple of them stay quiet at startup and only show up at first request as phantom 404s, which is harder to notice and worth flagging separately.

Missing AddWolverineHttp(). WolverineFx.Http requires this registration in addition to UseWolverine(). Without it the host throws at startup:

Required usage of IServiceCollection.AddWolverineHttp() is necessary for Wolverine.HTTP to function correctly

The fix is one line in Program.cs:

builder.Host.UseWolverine();
builder.Services.AddWolverineHttp();

Missing WolverineFx.RuntimeCompilation. Wolverine 6 removed the Roslyn runtime compiler from core. Handler code is still generated and compiled at runtime by default, but the compiler itself ships as a separate package. Without it:

Wolverine is running in TypeLoadMode.Dynamic, which compiles handler/middleware code at runtime, but no IAssemblyGenerator (Roslyn) is registered.

The fix is a package reference. It auto-registers when present:

<PackageReference Include="WolverineFx.RuntimeCompilation" Version="6.*" />

For production deployments you may want the alternative path: pre-generate handler code with the Wolverine codegen write CLI and set opts.CodeGeneration.TypeLoadMode = TypeLoadMode.Static. That removes the runtime Roslyn dependency entirely (smaller deploy, AOT-friendlier) at the cost of a build step. For a sample or a small service, the runtime-compilation package is the path of least friction.

UseLightweightSessions() on older builds (historical). Worth knowing if you hit it in an older sample, though it no longer applies on the versions this series targets. OutboxedSessionFactory - Wolverine’s session source for handlers and HTTP endpoints - opens its sessions through the registered ISessionFactory. On earlier builds the default was a heavy session, so without .UseLightweightSessions() identity-map behaviour kicked in and FetchForWritingByTags<T>(...) returned null aggregates even when the events existed - every DCB endpoint quietly returned 404, silently at startup and only visible at first request. Lightweight sessions are the default now, so the explicit call is redundant (the companion sample omits it); if you’re on an older build and seeing phantom 404s, adding .UseLightweightSessions() to the AddMarten(...) chain is the fix.

Missing AddEventType<T>() under IntegrateWithWolverine. Under DocumentStore.For(...) Marten lazily registers event types the first time session.Events.BuildEvent(new T(...)) is called. Under AddMarten(...) + IntegrateWithWolverine() the EventGraph that backs the DCB query’s event-type filter is finalised at startup, before any BuildEvent runs, so that lazy registration never reaches it. The fetch then filters on an empty set of event types, quietly returns zero events, and the endpoint returns 404 again. Register every event type explicitly during configuration:

opts.Events.AddEventType<CouponDefined>();
opts.Events.AddEventType<CouponRedeemed>();

Don’t put state-dependent checks in Validate. Wolverine.HTTP’s Validate / Before middleware methods exist for input-shape validation - things that don’t depend on the state of the world. “Is the request body well-formed”, “is this customer ID parseable”, “is the order total positive” belong there. DCB-style checks like “does this coupon exist”, “is this customer under the cap” do not. They are the business decision the endpoint exists to make, and they read from the same boundary the endpoint will write to.

Splitting that read across a sibling Validate(... [BoundaryModel] T state) method and the main endpoint creates the illusion of two stages where there is one. There is also a practical obstacle: at least at time of writing, Wolverine’s source generator emits one fetch per [BoundaryModel] parameter with a type-derived variable name, so two of them collide as CS0128: A local variable '...' is already defined in this scope. A patch is in flight upstream, but even once it merges the recommendation stands: state-dependent decisions belong in the endpoint, not in middleware.
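The split between the two kinds of check can be sketched without any framework code. Every name below (RedeemCoupon, CouponBoundary, RedeemCouponRules) is a hypothetical illustration rather than a Wolverine or Marten type. The point is only which inputs each function is allowed to look at.

```csharp
using System;
using System.Collections.Generic;

// All names here are hypothetical illustrations, not Wolverine or Marten types.
public sealed record RedeemCoupon(string Code, string CustomerId);

public sealed record CouponBoundary(bool Exists, int Cap, int Redemptions);

public static class RedeemCouponRules
{
    // Input-shape validation: depends only on the request itself,
    // so it is safe to run in Validate / Before middleware.
    public static IReadOnlyList<string> ValidateShape(RedeemCoupon cmd)
    {
        var problems = new List<string>();
        if (string.IsNullOrWhiteSpace(cmd.Code)) problems.Add("Code is required.");
        if (string.IsNullOrWhiteSpace(cmd.CustomerId)) problems.Add("CustomerId is required.");
        return problems;
    }

    // State-dependent decision: reads the boundary the endpoint is about to
    // write to, so it belongs in the endpoint body, next to the append.
    public static string Decide(CouponBoundary state)
    {
        if (!state.Exists) return "NotFound";
        if (state.Redemptions >= state.Cap) return "CapReached";
        return "Redeem";
    }
}
```

ValidateShape never sees the boundary, and Decide never runs anywhere except alongside the single fetch that feeds the write. That keeps one read, one decision, and one append inside the same consistency check.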

Integration-test configuration timing

The WebApplication.CreateBuilder(args) minimal-host model has a subtle interaction with WebApplicationFactory<TProgram> (the test host from ASP.NET Core’s Microsoft.AspNetCore.Mvc.Testing package, which Alba builds on). It is not Wolverine-specific, but it bites Wolverine + Marten tests hard because the connection string is consumed eagerly.

Most production samples register Marten like this in Program.cs:

builder.Services.AddMarten(opts =>
{
    opts.Connection(builder.Configuration.GetConnectionString("Postgres")!);
    // ...
});

The opts => lambda runs synchronously during AddMarten(...), which is during service registration. It reads builder.Configuration right then. Anything added to configuration after that point cannot influence the connection string.

The natural test-side approach is to inject the connection string via Alba’s ConfigureAppConfiguration:

_host = await AlbaHost.For<Program>(b =>
{
    b.ConfigureAppConfiguration((_, cfg) =>
    {
        cfg.AddInMemoryCollection(new Dictionary<string, string?>
        {
            ["ConnectionStrings:Postgres"] = _postgres.GetConnectionString()
        });
    });
});

This does not work. ConfigureAppConfiguration registers an additional configuration source that is merged later in the host pipeline. By the time the test’s source is merged, AddMarten has already captured the original (default-only) configuration and pinned the connection string.

The symptom is usually a confusing Npgsql.PostgresException: 28P01: password authentication failed against whatever happens to be on the developer’s localhost:5432 - not against the Testcontainers Postgres the test actually started.

The reliable injection point is an environment variable, because WebApplication.CreateBuilder(args) adds environment-variable configuration up front, before Program.cs even gets a chance to read builder.Configuration. ASP.NET reads ConnectionStrings__Postgres (double underscore - the standard nested-key convention) as ConnectionStrings:Postgres:

public async Task InitializeAsync()
{
    await _postgres.StartAsync();
    Environment.SetEnvironmentVariable(
        "ConnectionStrings__Postgres", _postgres.GetConnectionString());
    _host = await AlbaHost.For<Program>(_ => { });
}

public async Task DisposeAsync()
{
    await _host.DisposeAsync();
    Environment.SetEnvironmentVariable("ConnectionStrings__Postgres", null);
    await _postgres.DisposeAsync();
}

Clear the variable on dispose, otherwise it leaks to other test classes in the same xUnit process. If you have several fixtures starting their own Testcontainers Postgres instances in parallel, you have a bigger problem to solve - the env-var approach is single-writer per process. The fix there is usually to refactor Program.cs to read the connection string lazily (deferred until first session), at which point Alba’s ConfigureAppConfiguration works again.
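The lazy-read refactor might look like the fragment below. Treat it as a sketch under one assumption: that your Marten version offers an AddMarten overload taking a Func<IServiceProvider, StoreOptions>. Check that against the version you run before relying on it. The fragment belongs in Program.cs and uses the same builder as the earlier registration.

```csharp
using Marten;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

// Program.cs fragment. Assumes an AddMarten overload that accepts a
// Func<IServiceProvider, StoreOptions>; verify it exists in your Marten version.
builder.Services.AddMarten(sp =>
{
    // Runs when the container first builds the store, not during service
    // registration, so it sees configuration sources added by the test host.
    var configuration = sp.GetRequiredService<IConfiguration>();

    var opts = new StoreOptions();
    opts.Connection(configuration.GetConnectionString("Postgres")!);
    // ... the rest of the existing Marten configuration
    return opts;
});
```

The only change from the eager version is where the connection string is read: from the IConfiguration resolved out of the container instead of builder.Configuration captured at registration time.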

Tag governance

In Part 2 we registered tags like this:

opts.Events.RegisterTagType<CouponCode>("coupon");

The string "coupon" is persisted with every event tagged with CouponCode. It is part of your event store schema. Treat it as such:

Renaming "coupon" to "coupon_code" in code means old events still carry "coupon" on disk. They will not match new queries.
Adding a new tag to an event type means only events appended after the change carry it. Old events do not match queries on the new tag, so do not trust the new tag in invariants that look at history.
Splitting a tag (one CustomerId into BuyerId and SellerId) requires a backfill, not just a code change.

The cleanest practice: pick tag names once during initial design, write them down, and treat changes to that list as database migrations. If a rename is unavoidable, dual-write to both old and new tags for a release, then drop the old one once nothing queries it.
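One lightweight way to “write them down” is to pin the names in code and guard them with a test. This is a sketch of one possible convention, not something Marten provides; TagNames and its members are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagNames
{
    // Persisted with every tagged event: changing this value is a data migration.
    public const string Coupon = "coupon";

    // Frozen copy of the names already on disk. Append only; never edit an
    // entry without a backfill or dual-write plan.
    public static readonly IReadOnlyCollection<string> Persisted = new[] { "coupon" };

    // Names that exist on disk but are no longer declared in code.
    public static IReadOnlyList<string> MissingFrom(IEnumerable<string> declared) =>
        Persisted.Except(declared).ToList();
}
```

Registration then reads opts.Events.RegisterTagType<CouponCode>(TagNames.Coupon), and a unit test asserts that TagNames.MissingFrom(...) is empty for the names the code currently declares. A careless rename of the constant then fails the build instead of silently orphaning old events.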

Event schema evolution under DCB

Event schema evolution under DCB is the same problem as under classic event sourcing (events live forever and must remain readable), with one extra wrinkle: a single FetchForWritingByTags pulls events of multiple types into one projection. The boundary aggregate must tolerate every shape every relevant event has ever had.

Practical guidelines:

Prefer additive changes. Add fields with sensible defaults rather than changing or removing them.
Use the Apply(...) method as an upcaster. If an old event lacks a field the guard needs, fill it in from a default. Do not try to “fix the past” by rewriting events.
Version events explicitly when you must. A new CouponRedeemedV2 event type alongside the old CouponRedeemed is uglier in code but easier in operations.
Be especially careful with tag changes on existing events. Old events do not get new tags retroactively, so queries depending on the new tag will under-match for historical data.

Marten’s projection tools (event upcasters, schema version negotiation) work the same under DCB as under aggregate-based event sourcing.
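The “Apply as upcaster” guideline can be sketched as follows. The event names come from this series, but their shapes here are hypothetical: assume Channel was added to CouponRedeemed after launch, so older stored events arrive with it unset.

```csharp
using System;

// Hypothetical event shapes. Channel is assumed to have been added later,
// so events stored before that change arrive with Channel == null.
public sealed record CouponDefined(string Code, int Cap);
public sealed record CouponRedeemed(string Code, string CustomerId, string? Channel = null);

public sealed class CouponBoundaryState
{
    public int Cap { get; private set; }
    public int Redemptions { get; private set; }
    public int WebRedemptions { get; private set; }

    public void Apply(CouponDefined e) => Cap = e.Cap;

    public void Apply(CouponRedeemed e)
    {
        // Upcast in place: old events lack Channel, so fall back to the only
        // channel assumed to exist when they were written.
        var channel = e.Channel ?? "web";

        Redemptions++;
        if (channel == "web") WebRedemptions++;
    }
}
```

The stored events are never rewritten. The default lives in one place, next to the guard that needs it, and the boundary state ends up identical whether an event was written before or after the field existed.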

Migrating from existing aggregates incrementally

You probably have a Marten codebase already with aggregates and streams everywhere. There is no need to rip it out. A reasonable incremental adoption:

Add tags to existing events on append, without using DCB yet. Start tagging events at the point of creation. No DCB queries, no behaviour change. The tags simply populate the tag tables or HSTORE columns for future use. Risk is near zero.
Identify the one rule that has been causing trouble. The saga that compensates for over-redemption. The eventual-consistency window that bothers Compliance. The rule that has caused two production incidents in six months. Pick one command.
Rewrite that one command to use DCB. Build the boundary aggregate, switch the command to FetchForWritingByTags. Leave the rest of the codebase alone. Verify the saga is now unnecessary and delete it.
Stop and measure. Has the operational burden gone down? Has anything new gotten slower? If the burden is down and nothing is slower, pick the next command. If not, fix the problem before introducing more DCB.

Two well-chosen DCB commands beat a codebase where everything has been converted to [BoundaryAggregate] because the team got excited about the pattern.

When DCB is the wrong tool

DCB fits when:

There is a real business invariant.
It spans two or more entities.
Eventual consistency would be a bug, not a feature.

Avoid it when:

The invariant lives inside one aggregate. A BankAccount balance is fine with plain aggregate event sourcing. Cheaper to reason about, cheaper to run.
The rule is advisory. Quotas, soft rate limits, fraud thresholds rarely need transactional enforcement. A monitor and a circuit breaker is usually the right answer.
The constraint is uniqueness. Username uniqueness is better served by a Postgres unique index than by DCB.
The decision depends on data outside the event store. External pricing, external inventory lookups, anything DCB cannot atomically see.
“Which entity owns this rule?” has no good answer because the rule has not been modelled properly, not because it is genuinely cross-entity. DCB will paper over the modelling gap. Have the modelling conversation first.

A blunter test: if DCB were removed and replaced with a saga, how often would the saga need to compensate per day? If “approximately never”, DCB is overkill. If “twice a day, every day, and each time costs a support ticket”, DCB is doing real work.

DCB is not a better aggregate. It is a tool for one specific failure mode of the aggregate pattern. Use it for that. Keep aggregates for the rest.

Wrapping the series

Across the four posts we covered why aggregate-based event sourcing has a structural blind spot, how Marten’s DCB API closes it, how Wolverine cuts the ceremony, and how to avoid the production traps that otherwise teach the same lessons the hard way.

You should now be able to:

Recognise a DCB-shaped problem in your own systems.
Build one against Marten from scratch.
Adopt Wolverine to remove the boilerplate when it adds up.
Avoid the most common production traps.

If only one DCB-backed command ships out of reading this, pick the one currently keeping a saga alive. That is where DCB earns its keep most visibly.

Companion code, with both plain Marten and Wolverine variants and the test suites that drive them, lives in the dcb-coupon-sample repo.