DDD & Data Modelling: How Do I Persist Aggregates?

Confused about how your DDD aggregates should be persisted? What are the trade-offs? What are the options?

How do we reconcile our aggregate models and data models? How does this affect how we persist aggregates?

In my last article, we looked at understanding domain-driven aggregates from a basic & simple point-of-view.

I want to continue the train-of-thought and dive a bit deeper into further topics.

One of the concerns developers seem to have once they learn about aggregates is: how does designing aggregates affect my data model, and how can I persist them?

Recap Of Our Model

Here’s what we came up with at the end of the previous article.

model

This design is based on business considerations like:

  • Each team can have multiple staff members associated with it
  • Staff members can be a part of multiple teams
  • Each team can have multiple projects
  • Projects can be orphaned for a period of time (i.e. no associated team)
  • A team member’s role can change per project

There are still many questions and aggregate specific design heuristics that we’ll look at in future articles. But for now, let’s look at how this might be persisted.

Persistence seems to be a concern that developers struggle with once we start modelling in this way. It’s very different from what the standard textbook web application design looks like (i.e. modelling your objects as a relational database).

Refining Our Model

Let’s refine our model a bit more to get a better understanding of what we’re dealing with. We’ll focus on the Team and Team Member relationships. We might call this the Team Aggregate.

refined model

Notice that I’ve modelled the team more closely to what you would expect in an object model. Our team “root” has a field TeamMembers that is a collection of Team Member entities.

Also, this is a simplified model for the sake of learning. In reality, we might have other pieces of data that include Role, for example.

A Team Member might have a specific Role within the team that dictates whether they are allowed to remove another Team Member from the Team, for example.
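
To make that concrete, here is a minimal sketch of what the Team aggregate could look like in code. It is only an illustration: the class names, the `remove_member` method and the role values "admin" and "member" are assumptions for this example, not something the model above prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TeamMember:
    team_member_id: str
    member_id: str  # reference to the Member aggregate, by id only
    role: str       # assumed values for this sketch: "admin" or "member"


@dataclass
class Team:
    """Aggregate root: all changes to team members go through the team."""
    team_id: str
    name: str
    team_members: list[TeamMember] = field(default_factory=list)

    def remove_member(self, acting_id: str, target_id: str) -> None:
        """Remove a team member, allowed only when the acting member is an admin."""
        acting = self._find(acting_id)
        if acting.role != "admin":
            raise PermissionError("only an admin can remove a team member")
        self.team_members.remove(self._find(target_id))

    def _find(self, team_member_id: str) -> TeamMember:
        for team_member in self.team_members:
            if team_member.team_member_id == team_member_id:
                return team_member
        raise KeyError(team_member_id)


team = Team("t1", "BlueJay", [
    TeamMember("tm1", "m1", "admin"),
    TeamMember("tm2", "m2", "member"),
])
team.remove_member(acting_id="tm1", target_id="tm2")
```

The rule about who may remove whom lives inside the aggregate root, so it can be checked in memory against the team's own data.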

Another example might be an Email Address. Let’s say John is part of Team BlueJay. John, as a staff Member, has an email address.

However, Team Members might have a different email address specifically for that Team.

John’s “normal” Email Address might be john@gmail.com. But on Team BlueJay, he might want to have a separate one like john@teambluejay.com.

The Issue Of Duplicate Data

With that in mind, our model might look like:

focused model

The concern of “we are duplicating John’s email address in two places!” comes up. It’s possible that John’s normal and team email addresses are the same:

duplication

Yes, it looks like duplication. But, is it?

No.

One is John’s Team Member Email Address and the other is his Member Email Address.

Should changing his Team Member Email Address change his Member Email Address? That would seem awfully strange.

Imagine John needs to update his Email Address for Team BlueJay. That shouldn’t change his Email Address in another part of the system…right?

The difference here is data duplication vs. conceptual duplication.
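
A tiny sketch can show why this isn't real duplication. The class and field names here are assumptions made for the example. The two email addresses start out with the same value, but they belong to different concepts and change independently.

```python
from dataclasses import dataclass


@dataclass
class Member:
    member_id: str
    email_address: str  # John's "normal" email address


@dataclass
class TeamMember:
    team_member_id: str
    member_id: str
    email_address: str  # John's email address on one specific team


john = Member("m1", "john@gmail.com")
john_on_bluejay = TeamMember("tm1", john.member_id, "john@gmail.com")

# Today the two values happen to be equal. Updating the team address
# is a change to the Team aggregate only.
john_on_bluejay.email_address = "john@teambluejay.com"

print(john.email_address)             # john@gmail.com
print(john_on_bluejay.email_address)  # john@teambluejay.com
```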

How Do I Persist Aggregates, Then?

Let’s move on to the “main event”, as it were.

Relational Model

Because we are so used to modelling everything as tables with foreign keys and relationships, the model we’ve created here can seem a bit odd.

However, we can model this using a relational model. It might look something like this.

relational model

Sure, we could combine the composite table and Team Member table. For the sake of demonstration, we’ll assume this was our first approach to a relational model.

You can see how there’s a mismatch between our domain model (aggregate) and our data model (relational model).

We’ve had to add a composite table Team / Team Member Table, for example.

Also, our code will have to manually iterate through records/rows for Team Member in order to re-build the domain object model in memory.

This is called impedance mismatch.
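
Here is a rough sketch of that manual re-building step, using Python's built-in `sqlite3` module. The table and column names are assumptions loosely based on the relational model described above (a team table, a team member table and a composite table linking the two). `load_team` is a hypothetical helper, not part of any library.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field


@dataclass
class TeamMember:
    team_member_id: str
    member_id: str
    role: str
    email_address: str


@dataclass
class Team:
    team_id: str
    name: str
    team_members: list[TeamMember] = field(default_factory=list)


# Assumed schema for this sketch, including the composite (link) table.
SCHEMA = """
CREATE TABLE team (team_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE team_member (
    team_member_id TEXT PRIMARY KEY,
    member_id TEXT NOT NULL,
    role TEXT NOT NULL,
    email_address TEXT NOT NULL
);
CREATE TABLE team_team_member (
    team_id TEXT NOT NULL REFERENCES team(team_id),
    team_member_id TEXT NOT NULL REFERENCES team_member(team_member_id),
    PRIMARY KEY (team_id, team_member_id)
);
"""


def load_team(conn: sqlite3.Connection, team_id: str) -> Team:
    """Re-build the Team aggregate in memory from rows in three tables."""
    row = conn.execute(
        "SELECT team_id, name FROM team WHERE team_id = ?", (team_id,)
    ).fetchone()
    if row is None:
        raise KeyError(team_id)
    team = Team(team_id=row[0], name=row[1])
    member_rows = conn.execute(
        """
        SELECT tm.team_member_id, tm.member_id, tm.role, tm.email_address
        FROM team_team_member AS link
        JOIN team_member AS tm ON tm.team_member_id = link.team_member_id
        WHERE link.team_id = ?
        ORDER BY tm.team_member_id
        """,
        (team_id,),
    )
    # Iterate the rows by hand to turn them back into entities.
    for member_row in member_rows:
        team.team_members.append(TeamMember(*member_row))
    return team


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO team VALUES ('t1', 'Blue Jay')")
conn.execute(
    "INSERT INTO team_member VALUES ('tm1', 'm1', 'admin', 'john@teambluejay.com')"
)
conn.execute("INSERT INTO team_team_member VALUES ('t1', 'tm1')")

team = load_team(conn, "t1")
```

The mapping code is not hard, but notice how much of it exists only to translate between the two shapes of the same data.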

An object-oriented model, or any conceptual model, won’t necessarily match the relational model required by a relational database.

This is often the case because relational databases are designed around goals such as avoiding duplication and optimizing complex set operations.

But, we’ve been trained to think about modelling data using only one approach… there are others!

This brings up a myriad of second-order concerns. For example, how do we now think about:

  • Modelling transactions to ensure consistency for all the data in one aggregate?
  • Mapping our database records to our objects in a simple & performant way?
  • Whether physically storing my aggregate’s data in multiple tables has any other negative trade-offs?

We won’t answer those questions now, but it’s food for thought.

Document Persistence

In my experience, document databases are often misunderstood. Document databases don’t magically enable you to build software faster, at no cost and at scale.

Let me repeat that: Document databases are not a silver bullet. They have trade-offs.

Phew 😅!

A document persistence model seems so simple. But it pushes the onus of a well-thought-out structure & design onto the domain model itself.

In other words, you really need to think about and design your conceptual/domain models well in order for a document model to fit.

Note: There are very simple scenarios where a document store might fit well with something like a simple domain model. Perhaps something that is by nature very CRUDy? But then, you wouldn’t be considering using DDD aggregates if that were the case.

If we are used to designing our domain models by using a relational model, then trying to shove that into a document database will pose many issues.

Instead, let’s take the aggregate model we’ve been working on and try to persist each aggregate as a document (excluding the Project aggregate).

Here are the two aggregates for reference:

domain model

Now, here’s what our two JSON documents might look like:

// Team
{
    "teamId": "someguid",
    "name": "Blue Jay",
    "teamMembers": [
        { 
            "teamMemberId": "someguid",
            "memberId": "someguid",
            "role": "admin",
            "emailAddress": "john@teambluejay.com"
        }
    ]
}

// Member
{
    "memberId": "someguid",
    "fullName": "John Doe",
    "emailAddress": "john@gmail.com",
    "phone": "4563452222"
}

Compare this to the relational model. Doesn’t this seem much simpler?

That’s because there’s practically no impedance mismatch between our domain model and data model.
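
As a sketch of how thin the mapping becomes, the code below turns a Team aggregate into a document shaped like the JSON above and back again. The function names are hypothetical, and a plain dictionary stands in for a document database collection, which is an assumption made only to keep the example self-contained.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class TeamMember:
    team_member_id: str
    member_id: str
    role: str
    email_address: str


@dataclass
class Team:
    team_id: str
    name: str
    team_members: list[TeamMember] = field(default_factory=list)


def team_to_document(team: Team) -> dict:
    """Map the aggregate to a document with the same keys as the JSON above."""
    return {
        "teamId": team.team_id,
        "name": team.name,
        "teamMembers": [
            {
                "teamMemberId": tm.team_member_id,
                "memberId": tm.member_id,
                "role": tm.role,
                "emailAddress": tm.email_address,
            }
            for tm in team.team_members
        ],
    }


def team_from_document(doc: dict) -> Team:
    """Re-build the whole aggregate from a single document."""
    return Team(
        team_id=doc["teamId"],
        name=doc["name"],
        team_members=[
            TeamMember(m["teamMemberId"], m["memberId"], m["role"], m["emailAddress"])
            for m in doc["teamMembers"]
        ],
    )


# Stand-in for a document collection keyed by aggregate id (assumption).
teams_collection: dict[str, str] = {}

team = Team("team-guid", "Blue Jay", [
    TeamMember("tm-guid", "member-guid", "admin", "john@teambluejay.com"),
])
teams_collection[team.team_id] = json.dumps(team_to_document(team))

restored = team_from_document(json.loads(teams_collection["team-guid"]))
```

One aggregate goes in as one document and comes back out as one aggregate, with no joins or row iteration in between.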

However, what happens when we need to execute complex queries and join data from multiple aggregates together?

There are many options:

  • CQRS is one approach
  • You might prefer using a relational model for this reason
  • You might emit notifications/messages from one part of the system to another and allow it to cache the values it needs for later
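
The last option in that list can be sketched roughly as follows. Everything here is hypothetical: `MemberNameChanged` is an assumed message, and `TeamRosterView` is an assumed read-side cache that copies the values it needs so that it never has to join across aggregates at query time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MemberNameChanged:
    """Assumed message emitted by the part of the system that owns Member."""
    member_id: str
    full_name: str


class TeamRosterView:
    """Read-side cache that stores the member names it needs for later."""

    def __init__(self) -> None:
        self._names: dict[str, str] = {}            # member_id -> full name
        self._rows: list[tuple[str, str]] = []      # (team name, member_id)

    def add_row(self, team_name: str, member_id: str) -> None:
        self._rows.append((team_name, member_id))

    def handle(self, message: MemberNameChanged) -> None:
        # Cache the value locally instead of querying Member later.
        self._names[message.member_id] = message.full_name

    def roster(self, team_name: str) -> list[str]:
        return [
            self._names.get(member_id, "(unknown)")
            for name, member_id in self._rows
            if name == team_name
        ]


view = TeamRosterView()
view.add_row("Blue Jay", "m1")
view.handle(MemberNameChanged("m1", "John Doe"))
names = view.roster("Blue Jay")
```

The trade-off is that the cached name is only as fresh as the last message the view has handled.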

Ultimately, it depends 😅.

Event Sourced Persistence

Event sourcing is definitely a more advanced persistence strategy. However, let’s have a quick look at what this might entail.

event sourced

This highlights some of the events you might store.

Notice that all events – even from entities inside the aggregate – are tied to the same event stream dedicated to that entire aggregate.

Whenever users of the system try to “write” or perform actions against the system, the results will eventually be stored as events.

The mechanics and implications of event sourcing are beyond the scope of this article, but suffice it to say that individual aggregates are generally associated with a particular event stream.

Event Store, for example, encourages you to go even further and create one event stream per aggregate instance.
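
To give a feel for the idea, here is a minimal in-memory sketch of one event stream per aggregate instance. The event names, the `team-<id>` stream naming and the `InMemoryEventStore` class are all assumptions made for illustration. This is not the API of Event Store or of any other product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class TeamCreated:
    team_id: str
    name: str


@dataclass(frozen=True)
class TeamMemberAdded:
    team_member_id: str
    member_id: str
    email_address: str


@dataclass(frozen=True)
class TeamMemberEmailChanged:
    team_member_id: str
    email_address: str


class InMemoryEventStore:
    """Toy store: one append-only list of events per stream id."""

    def __init__(self) -> None:
        self._streams: dict[str, list[object]] = {}

    def append(self, stream_id: str, event: object) -> None:
        self._streams.setdefault(stream_id, []).append(event)

    def read(self, stream_id: str) -> list[object]:
        return list(self._streams.get(stream_id, []))


@dataclass
class TeamState:
    name: str = ""
    emails: dict[str, str] = field(default_factory=dict)  # team_member_id -> email


def replay(events: list[object]) -> TeamState:
    """Re-build the aggregate's current state by applying its events in order."""
    state = TeamState()
    for event in events:
        if isinstance(event, TeamCreated):
            state.name = event.name
        elif isinstance(event, TeamMemberAdded):
            state.emails[event.team_member_id] = event.email_address
        elif isinstance(event, TeamMemberEmailChanged):
            state.emails[event.team_member_id] = event.email_address
    return state


store = InMemoryEventStore()
stream_id = "team-t1"  # events for the root and its entities share this stream
store.append(stream_id, TeamCreated("t1", "Blue Jay"))
store.append(stream_id, TeamMemberAdded("tm1", "m1", "john@gmail.com"))
store.append(stream_id, TeamMemberEmailChanged("tm1", "john@teambluejay.com"))

state = replay(store.read(stream_id))
```

Events raised by the Team Member entity land in the same stream as events raised by the Team root, which matches the point above about tying all events to the aggregate's stream.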

And More!

As with any focused look at more advanced design topics, there are always more options to choose from, such as key-value stores and graph databases.

If you’ve started to see that your data model design strategy can be different from your domain model design strategy, then try to think about applying the same ideas to these other kinds of persistence models!

Did you learn anything new from this article? Leave a comment and let me know!

Next Article

The next article in this series is where we begin looking at aggregates and transactional consistency!

10 replies on “DDD & Data Modelling: How Do I Persist Aggregates?”

Did I learn something new? Not much honestly. You started talking about more complex problems and then stopped, when it would get interesting (“We won’t answer those questions now, but …”). But I’ll definitely come back and check for new articles! Thanks

How about actual implementation example of mapping between efcore persistence entity to domain aggregate objects?

Thanks for the feedback. I might do a follow-up article on some of the points I alluded to in the article lol.

The question is why do you create these aggregates ? Your model is over complicated. Aggregate does not mean ‘cutting relationships’

Yes, the examples in the blog post series are by design very simple compared to what you would face in the real-world. DDD is really useful when you’re working in a domain that’s fairly complicated/complex.

Also, aggregates are driven by the invariants of the domain. The invariants for the “exact same” domain in two different businesses could be different (eg. in one business each member and team member share the same email address, while in another business it is expected that a team member has a separate email address than the related member).

There are also constraints related to performance too – is it okay if we kept most of the member data in the same database row instead? How would that affect the write/read performance in a high traffic system? For example, table locks would affect the write performance of writing the member email vs. the team member email.

So one goal is to split the model in a way that can maximize performance / avoid performance issues while also not losing the ability to make a consistency check (ideally) in-memory on all data that needs each other to verify consistency (I don’t think I’ve covered these reasons in the series yet?).

What do you think?

Thank you for this article!

“Instead, let’s take the aggregate model we’ve been working on and try to persist each aggregate as a document (excluding the Team aggregate).” I might misunderstand something, but I think you might have meant the Project aggregate here, instead of the Team aggregate.

Just came across this article. The only thing I didn’t understand was why we need a composite table Team / Team Member. Can’t we scrap that and just have FK TeamId in table TeamMember in place of its own TeamMemberId? Then TeamId + MemberId will be your composite PK for that table. Or we can keep TeamMemberId as PK, but define a UNIQUE constraint on the combination of the two FKs, TeamId + MemberId.
