
Mastering Eventual Consistency With Event Sourcing

Event sourcing doesn’t necessitate introducing eventual consistency. I lay out some patterns and techniques to help sort this out!

Event sourcing is often misunderstood. One common misconception is that adopting it automatically means you must introduce eventual consistency everywhere in your system.

There are various techniques for dealing with eventual consistency. Some patterns can even eliminate it entirely!

In this article, I want to highlight some patterns and approaches that use event sourcing and enable flexibility around consistency & availability.

In other words – you don’t necessarily need eventual consistency.

CQRS != Eventual Consistency

Eventual consistency is the idea that your reads or queries are not immediately up-to-date with your writes. This can occur in non-event sourced systems due to database replication:

A replica read-only database is not necessarily up-to-date when queried.

In other words, your read storage is eventually up-to-date. That lag could be a couple of milliseconds, seconds, or minutes.

Many think that event sourcing automatically has to introduce eventual consistency everywhere. This is false.

Let’s look at some patterns that give event-sourced systems stronger consistency.

On-The-Fly Projections

The usual/default approach to event-sourced systems is to asynchronously update a different read store than your event (write) store:

Default: Write and read are different databases.

With this approach, you’ve asynchronously pre-calculated and cached your queries.

However, there’s no reason you can’t build a read model at run-time. This is what you do every time you write to your event store: you build a projection of your stream’s current state by replaying all of its events. Then, using that projected state, you decide whether to allow or reject the write operation:

The event stream’s current state is calculated on the fly by replaying all of its events.

There’s no reason you can’t return the resulting read model to the caller.

For example, this might work well if a given HTTP query maps to details that can be derived by replaying events from one stream (although you could read from multiple streams):

You can build a read model on the fly and return it to your UI, for example.

By taking this approach for a given UI component, API endpoint, etc., your query uses the most up-to-date data your system has.
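As a rough sketch, here's what an on-the-fly projection might look like in Python. The shopping-cart domain, event types, and function names are all illustrative assumptions, not from any particular library:

```python
from dataclasses import dataclass

# Hypothetical domain events for a shopping-cart stream.
@dataclass
class ItemAdded:
    sku: str
    price: float

@dataclass
class ItemRemoved:
    sku: str

def project_cart(events):
    """Build the cart's current state by replaying its events in order."""
    state = {"items": {}, "total": 0.0}
    for event in events:
        if isinstance(event, ItemAdded):
            state["items"][event.sku] = event.price
        elif isinstance(event, ItemRemoved):
            state["items"].pop(event.sku, None)
    state["total"] = sum(state["items"].values())
    return state

# The same projection can serve the write-side decision AND the query:
stream = [ItemAdded("book", 12.0), ItemAdded("pen", 2.5), ItemRemoved("pen")]
read_model = project_cart(stream)  # the most up-to-date state the system has
```

The point is that `project_cart` is just a fold over the stream's events, so nothing stops you from returning its result directly to the caller instead of (or in addition to) caching it in a separate read store.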

Inline Projections

This pattern depends on the technology you’ve decided to use as your event store.

An inline projection is when you write to a stream and push any generated domain events to one or many projections – all within the same atomic transaction.

Let’s say you’ve used PostgreSQL to store your streams and events. This affords you the ability to store any new events & updated read models within the same transaction.

This will give your read models immediate consistency.

Updating some read models within the same transaction as your writes can afford immediate consistency.
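To keep the sketch self-contained and runnable, the example below uses SQLite instead of PostgreSQL; the idea is identical, since both support wrapping multiple statements in one atomic transaction. Table names and the bank-account domain are assumptions for illustration:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (stream_id TEXT, type TEXT, data TEXT)")
conn.execute("CREATE TABLE account_balances (stream_id TEXT PRIMARY KEY, balance REAL)")

def append_with_inline_projection(stream_id, event_type, data):
    """Append an event AND update the read model in the same transaction."""
    with conn:  # commits both writes together, or rolls both back on error
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (stream_id, event_type, json.dumps(data)),
        )
        delta = data["amount"] if event_type == "Deposited" else -data["amount"]
        conn.execute(
            "INSERT INTO account_balances VALUES (?, ?) "
            "ON CONFLICT(stream_id) DO UPDATE SET balance = balance + ?",
            (stream_id, delta, delta),
        )

append_with_inline_projection("acct-1", "Deposited", {"amount": 100.0})
append_with_inline_projection("acct-1", "Withdrawn", {"amount": 30.0})
balance = conn.execute(
    "SELECT balance FROM account_balances WHERE stream_id = ?", ("acct-1",)
).fetchone()[0]  # immediately consistent with the event store
```

Because the event append and the read-model update succeed or fail as a unit, a query against `account_balances` can never observe a state the event store hasn't recorded.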

Of course, with all of these patterns come pros and cons.

Inline projections have a few trade-offs:

  • Your write transactions will take longer to process. This will grow the more inline projections you introduce.
  • Scaling your inline projections means having to scale both your write and read stores together (since they are the same database).
  • Rebuilding your projections must complete before you can write to your event store again (async rebuilds, by contrast, can occur while writes continue to append to the event store).

Given these trade-offs, inline projections should be used with care.

Notably, in use cases where high write traffic is expected, or where write latency and throughput matter, inline projections might not be the best option.

Read Your Writes

One of the big drawbacks of inline projections is that you have to scale your read models with your event store.

What if you really wanted to have the best of both worlds? That is, the ability to scale your read models independently of your event store but with immediate consistency?

One approach is to “read your writes”.

This pattern usually takes the following general form:

  1. Successfully write to an event store.
  2. For the next X seconds, your reads will use the same [system] as your event store.
  3. After X seconds, all reads switch to a set of highly-available asynchronously replicated read models.

“[system]” could be a database, data center, geographical region, etc.
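The three steps above could be sketched as a simple read router. This is a minimal, assumed implementation, with an injectable clock and string store names standing in for real connections, purely to show the routing decision:

```python
import time

WINDOW_SECONDS = 5.0  # "X seconds" from the steps above; tune per system

class ReadRouter:
    """After a client writes, pin their reads to the primary for a window,
    then fall back to the highly-available async replicas."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_write = {}  # client_id -> timestamp of last write

    def record_write(self, client_id):
        self._last_write[client_id] = self._clock()

    def choose_store(self, client_id):
        last = self._last_write.get(client_id)
        if last is not None and self._clock() - last < WINDOW_SECONDS:
            return "primary"   # same [system] as the event store
        return "replica"       # asynchronously replicated read model

# Usage, with a fake clock so the window is deterministic:
now = [0.0]
router = ReadRouter(clock=lambda: now[0])
router.record_write("user-1")
store_immediately = router.choose_store("user-1")  # "primary"
now[0] += 10.0
store_later = router.choose_store("user-1")        # "replica"
```

In practice the "last write" timestamp often travels with the client (e.g. in a cookie or session token) rather than living in server memory, so any node can make the routing decision.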

For that reason, this pattern can be implemented in a number of different ways. Let’s look at some examples.

Replicated Inline Projections

In this case, you’re using inline projections to ensure that some of your read models are immediately consistent with your event store.

However, you also have a set of replicas that asynchronously mirror those read models:

A mixture of patterns: reads are guaranteed to be consistent with your write store, while replicated read models provide high availability.

Replicated Local Async Read Models

Asynchronously updated read models have the concern of eventual consistency. Read models might take a while to catch up to the event store (“a while” for some systems can be measured in ms).

You can use the “read your writes” pattern and use dedicated instances of your read models which are updated very quickly within the first X seconds after your writes.

Usually, this is done by using a write & read store that are on the same network or physical hardware.

Reads immediately after writing to the event store will use the “local” read store, which has a higher guarantee of being up-to-date than the distributed replicas used for high availability.

You might even place the distributed replicas into various geographical regions:

All writes go to one master node. After a given time period from writing, a client’s traffic will switch over to one of many globally distributed replicas.

And yes, I lied. This technique does introduce eventual consistency… but it can still help in situations where you need high write throughput, for example!

Replicated Local Read Models Across Geography

Let’s take the same kind of “read your writes” techniques we covered, but replicate the entire configuration across multiple geographical regions:

Event stores are replicated across geographical regions.

In this configuration, geographical regions will asynchronously update each other over time. However, a user will interact with a cluster of stores that are “local” to them.

If a given region goes down, then a user’s traffic can be re-routed to another cluster.

If a cluster/region goes down, traffic can be re-routed to another region.

Also, each cluster might use a flavour of the “read your writes” technique. Other read models may use inline or on-the-fly projections. This still gives you the freedom to choose how each local cluster will function.

The trade-offs of this approach are:

  • The operational overhead is quite large – having to manage and configure all of this infrastructure.
  • The synchronization of write nodes in various regions may not be supported by certain storage technologies.

For example, PostgreSQL doesn’t support multi-master replication out-of-the-box, but 3rd-party tooling can enable this. Other cloud-native databases like Azure CosmosDB can support something like this.

Conclusion

Next time you are thinking about using event sourcing, remember that there are ways to reduce or even eliminate eventual consistency.

As with all software engineering patterns, don’t blindly apply a given pattern everywhere.

You have to decide based on considerations like:

  • Do any specific event streams require high throughput?
  • Do any specific read models need to be immediately consistent?
  • Are there use cases that will be okay with more lax guarantees?
  • Does my choice of storage technology allow transactional writes to an event store & read model?
  • What read models need to be highly available?
  • Can my engineering team manage the extra operational costs?
  • Does my choice of write storage support multi-master replication?
  • Do I really need this support?
