What Are Domain-Driven Design Aggregates?

Aggregates are one of the most misunderstood concepts in domain-driven design. Is it just a clump of entities & objects? Or something more?

What is an aggregate? Sure, it’s a pattern that’s central to domain-driven design… but is it just a collection of objects?

Martin Fowler explains:

Aggregates are the basic element of transfer of data storage – you request to load or save whole aggregates. Transactions should not cross aggregate boundaries.

https://www.martinfowler.com/bliki/DDD_Aggregate.html

Those with experience in DDD might understand what that means and why it applies.

But for those starting to get familiar with aggregates, such an explanation might still be too detailed and nuanced.

Let’s start by looking at what an aggregate is not.

What An Aggregate Is Not

An aggregate is not:

  • Just a graph of entities
  • Merely a behaviour-rich object
  • An entity or collection of entities that you can dump into your database tables

So… what is it?

This is usually where people start talking about consistency boundaries, transactional consistency, eventual consistency, aggregate boundaries, invariants, aggregate roots, etc.

When all this jargon is thrown at us, it’s natural to grab onto a familiar term or idea. From there, we form a (false) picture of what this is all about.

Let’s try to keep things simple and practical.

Bubbles

I like to use a very simple idea to help people understand the essence of what aggregates are.

Bubbles.

Imagine your software project was not one massive codebase – but a collection of small bubbles. Each bubble can be worked on independently. That means you only need to think about what’s in the bubble at any given moment – not the entire system all at once.

Aggregates are the same. They are bubbles. Just on a smaller scale.

Use Case: Teams

Imagine we are building a new feature for our system. This new feature includes the concept of projects, teams and team members.

Each team can have multiple staff members associated with it.

Staff members can be a part of multiple teams.

Each team can have multiple projects.

[Image: team project]

Well, that’s simple enough.

Let’s add the fields that might exist on each object:

[Image: teams]

Database All The Things

Doesn’t that look just like an entity diagram? Don’t the database tables scream out at you?

“Obviously”, we need to have a composite table linking each team with each team member.

Just hold on.

Let’s think about behaviour instead and not treat these as code objects. Let’s treat them as business objects or concepts. What can these objects do?

At face value: You can create a team. Edit a team. Delete a team.

Deleting a team means that all the associated projects ought to cascade and be deleted too.

[Image: cascade delete]
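To make the "database first" instinct concrete, here is a minimal sketch of that naive object graph. Every name in it (`TeamStore`, the `name` fields, the method names) is a hypothetical chosen for illustration, not something taken from a real system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Member:
    name: str


@dataclass
class Project:
    name: str


@dataclass
class Team:
    # The whole graph hangs off the team, mirroring the table layout.
    name: str
    members: list[Member] = field(default_factory=list)
    projects: list[Project] = field(default_factory=list)


class TeamStore:
    """Hypothetical in-memory stand-in for the database tables."""

    def __init__(self) -> None:
        self._teams: dict[str, Team] = {}

    def create(self, team: Team) -> None:
        self._teams[team.name] = team

    def delete(self, team_name: str) -> None:
        # "Cascade delete": projects are only reachable through their team,
        # so removing the team removes its projects as well.
        del self._teams[team_name]

    def all_projects(self) -> list[Project]:
        return [p for team in self._teams.values() for p in team.projects]
```

In this shape a project cannot exist without a team, which is exactly the kind of assumption the business can overturn.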

Note: Let’s put aside the fact that these are not the real behaviours of our system. Anytime you see CRUDy language, it should be a red flag!

Real Business Is Not So Simple

But wait. You just found out from your users that this won’t work. The assumptions you made about the business were wrong…

There are times when projects are moved from one team to another.

There are also times when projects are orphaned for a period of time.

So now, other questions arise:

  • What should our model look like now?
  • When we write our code, should we load the entire graph of objects into memory?
  • What happens when a project is orphaned? Will the teams just have a reference to a null project object?

[Image: classes and objects]

More Requirements That Complicate Things

Now the business has a new requirement: a team member’s role can change per project.

So… do we just create another composite table to match each team member with each project they are on and the role of each project?

That’s what we usually do. Developers naturally think about systems in terms of database design first 🤦‍♂️.

Note: Yes, that’s a huge problem that domain-driven design tries to help avoid!

With each new requirement, our model gets more bloated. Over time, this might consume lots of memory in our system too.

Imagine a project whose team has 500 members. Yes, these are large projects we’re talking about.

With the model above, we need to load all 500 staff members into memory, and all their data too, just to work with one project!

That can lead to performance issues, starting with memory usage. Is there a better way?

Aggregates

Aggregates are what solve these kinds of problems.

They help:

  • Simplify our models when they start getting out of hand
  • Isolate complex business rules
  • Deal with performance issues when loading large object graphs into memory
  • Allow flexibility to more easily deal with future unexpected business requirements

That’s what aggregates are for.

But what are they?

Instead of telling you, I’ll show you what one might look like in this case (otherwise, we need to start talking about consistency, transactional boundaries, concurrency, etc!).

[Image: aggregates]

Notice that I split the original Member model into three?

The Team and the Project have their own dedicated “version” or “view” of the Member that only has the exact data it needs to make decisions about business rules and behaviours within its “bubble”.

For example, the Member’s role is not needed by the team bubble. Why keep it there when it doesn’t belong?

Instead, we have split our model into two “branches”. Two bubbles/aggregates, in this case.

To support assigning the same Member to multiple teams, we then have to create a dedicated, authoritative model of the member. Each aggregate’s own Member entity links back to it with a foreign key-like reference (again, we aren’t talking about databases).

Note: Notice I added specific IDs to the non-authoritative “Member” models (like “TeamMemberId”).
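As a rough sketch of the split described above, the three "versions" of a member might look like this in code. Apart from the idea of a `TeamMemberId` and a per-project role, the field names (`email`, `name`, `project_member_id` and so on) and the `change_role` method are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Member:
    """Authoritative member model (fields are assumed)."""
    member_id: int
    name: str
    email: str


@dataclass
class TeamMember:
    """The Team bubble's view of a member: no role, no profile data."""
    team_member_id: int
    member_id: int  # foreign key-like reference to Member


@dataclass
class Team:
    team_id: int
    name: str
    members: list[TeamMember] = field(default_factory=list)


@dataclass
class ProjectMember:
    """The Project bubble's view of a member: the role lives here."""
    project_member_id: int
    member_id: int  # foreign key-like reference to Member
    role: str


@dataclass
class Project:
    project_id: int
    name: str
    members: list[ProjectMember] = field(default_factory=list)

    def change_role(self, member_id: int, new_role: str) -> None:
        # A role change only touches the Project aggregate.
        for project_member in self.members:
            if project_member.member_id == member_id:
                project_member.role = new_role
                return
        raise ValueError(f"member {member_id} is not on project {self.project_id}")
```

Changing a role never loads a team or the full member profile, because neither is needed to make that decision.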

There’s much more to discuss. And many more improvements we could make to this model around using value objects, etc.

I think this is enough to help you see that aggregates are more than simply creating a graph of entities.

It’s all about allowing the domain rules to guide you.

Many times, the aggregates we discover are not the aggregates we thought we would need!

Next Article

If you found this article helpful, check out the next in the series about how to think about data persistence and aggregates!

13 replies on “What Are Domain-Driven Design Aggregates?”

Thank you very much for the great article, James!

I have one question regarding modeling of IDs, let’s say MemberId.
Would you model it as a separate class in all packages/namespaces, like team.MemberId, project.MemberId and member.MemberId? Or would you rather model it as a single common class shared among aggregates, like just member.MemberId or common.MemberId?

Thanks! Glad you enjoyed it.

I generally like to keep as much separated as possible (so different projects get their own class). Unless there’s some shared logic etc., in which case it might make sense to put that in a shared kernel.

But generally, the less “sharing” between contexts = the more de-coupled they are, which I def. prefer. Ya, it “feels” like more work, but having contexts de-coupled is a huge benefit in the long run.

I hate changing something in one context only to find out it broke another!
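A small sketch of what "each context gets its own class" could look like. The class names and the conversion helper are hypothetical; in a real codebase the two ID types would live in separate packages rather than side by side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamsMemberId:
    """Member ID as seen by the (hypothetical) teams context."""
    value: int


@dataclass(frozen=True)
class ProjectsMemberId:
    """Member ID as seen by the (hypothetical) projects context."""
    value: int


def to_projects_member_id(member_id: TeamsMemberId) -> ProjectsMemberId:
    # Crossing a context boundary is an explicit, visible conversion.
    return ProjectsMemberId(member_id.value)
```

The two types never compare equal, even with the same underlying value, so mixing them up by accident tends to surface early.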

Thank you for this well laid out article. I coincidentally am building a very similar domain as well and had a few questions.
1. How would you make a member part of a team and effectively a team member? Would you create a .join() method on team that would simply add another teamMember to the teamMembers array in the aggregate?
2. I guess the second part of my question is whether or not you’d load all members into memory when instantiating your aggregate root. How would you go about a situation where a team has a lot of members (say over 50,000 for instance)?

Thank you for this great article. I learned a lot and it all makes sense

Thanks Yazan 🙂 I think the next article in this series answers some of your questions -> https://www.jamesmichaelhickey.com/how-do-i-persist-ddd-aggregates/

Generally, I would create a method on the Team aggregate like you said. You may or may not opt to model the TeamMember object with its own email address, name, etc. (see the article I linked above) or you may simply use an Id that links to the Member object via foreign reference.

And yes, there would be a collection of TeamMember objects that only the Team aggregate has access to (e.g. a private collection). The Team aggregate might expose two methods like Join or Add (depending on the semantics of your domain) and Leave / Remove.
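Here is a minimal sketch of that idea: a Team aggregate that keeps its members private and only exposes `join` and `leave`. The "no duplicate members" rule and the way IDs are generated are assumptions added so the example has an invariant to protect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamMember:
    team_member_id: int
    member_id: int  # reference to the authoritative Member, by ID only


class Team:
    def __init__(self, team_id: int, name: str) -> None:
        self.team_id = team_id
        self.name = name
        # Private collection: only the aggregate itself mutates it.
        self._members = {}  # member_id -> TeamMember
        self._next_team_member_id = 1

    def join(self, member_id: int) -> TeamMember:
        # Assumed invariant: a member can be on a given team only once.
        if member_id in self._members:
            raise ValueError(f"member {member_id} is already on this team")
        team_member = TeamMember(self._next_team_member_id, member_id)
        self._next_team_member_id += 1
        self._members[member_id] = team_member
        return team_member

    def leave(self, member_id: int) -> None:
        if member_id not in self._members:
            raise ValueError(f"member {member_id} is not on this team")
        del self._members[member_id]

    def has_member(self, member_id: int) -> bool:
        return member_id in self._members
```

Callers go through `join` and `leave`, so any rule about membership has exactly one place to live.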

For your second question: there are a variety of ways to approach this. The next few articles in this series address some of the solutions – for example, your invariants affect how you model. You may opt to use a lazy-load approach for the team members collection. Or you may end up using raw SQL to perform whatever checks are needed against the team members, with methods like MemberExists(int memberId).

Some ORMs can help with managing this scenario, or it might make sense in your system to “manually” load only the team members that are needed to make some validation or invariant logic.
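As one possible shape for the raw SQL option, the sketch below answers "is this member on the team?" with a single-row query instead of loading the whole collection. It uses Python's built-in `sqlite3` module; the `team_members` table and its columns are assumptions for the example.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; a real schema would likely have more columns.
    conn.execute(
        "CREATE TABLE team_members ("
        " team_id INTEGER NOT NULL,"
        " member_id INTEGER NOT NULL,"
        " PRIMARY KEY (team_id, member_id))"
    )


def member_exists(conn: sqlite3.Connection, team_id: int, member_id: int) -> bool:
    # Let the database do the lookup; at most one row comes back,
    # no matter how many members the team has.
    row = conn.execute(
        "SELECT 1 FROM team_members WHERE team_id = ? AND member_id = ? LIMIT 1",
        (team_id, member_id),
    ).fetchone()
    return row is not None
```

A `join` operation could call `member_exists` before inserting, which keeps the check cheap even for very large teams.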

Hopefully that helps!

Thank you for this article.

I have some questions:

What happens if a Member is deleted?

The corresponding TeamMember and ProjectMember should be deleted too, right?

Do the Team and Project aggregates have to listen for a “MemberDeleted” event to delete the TeamMember and ProjectMember?

If yes, where should the listener go? In the domain layer or the application layer?

Thank you.

“It depends” is the answer lol. This really depends on your business requirements and what stakeholders expect to happen.

Should you delete all the data for everything related to team members? Does that include comments they made, work they’ve completed, etc.?
If not, then what should happen?

This discussion would most likely reveal more domain specific behaviour. Something like “deactivate” member might be appropriate. That might mean that the data still exists for the member. Maybe also the team member is deactivated and they can’t use other capabilities that team members can use (like chat, notifications, etc.)

Maybe the project member is also deactivated – their avatar/icon and past work will remain, but the member cannot work on any new piece of the project and perhaps can’t view the project details on whatever work management software system they use.

Or, maybe you want to treat each differently: soft-delete member and project member – but hard-delete team member.

This is a great example of why splitting these concepts up in the first place sets you up for more flexibility or more appropriate domain behaviours.

Do the Team and Project aggregates have to listen for a “MemberDeleted” event to delete the TeamMember and ProjectMember?
If yes, where should the listener go? In the domain layer or the application layer?

If a member is requested to be “deleted” or “deactivated”, then yes – other aggregates would somehow get notified of that action. Again, what they choose to do with that event is up to the context in question. Usually, you’d use some type of messaging system and would listen to these events via listeners for the given technology you use.

But the typical flow is “event sent on message bus” -> “listener gets notification on the bus” -> “listener will call an action to perform from the application layer” -> “application layer will orchestrate various domain operations that need to occur (via aggregates, etc.)”
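That flow can be sketched with a tiny in-process bus. This is only an illustration: a real system would use whatever messaging technology it already has, and every name here (`MessageBus`, `TeamApplicationService`, the `"MemberDeactivated"` event and its payload) is hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process bus standing in for a real messaging system."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, event_name, listener):
        self._listeners[event_name].append(listener)

    def publish(self, event_name, payload):
        for listener in self._listeners[event_name]:
            listener(payload)


class Team:
    """Aggregate: decides what deactivation means inside its own bubble."""

    def __init__(self, team_id, member_ids):
        self.team_id = team_id
        self._active_member_ids = set(member_ids)

    def deactivate_member(self, member_id):
        self._active_member_ids.discard(member_id)

    def is_active(self, member_id):
        return member_id in self._active_member_ids


class TeamApplicationService:
    """Application layer: orchestrates domain operations."""

    def __init__(self, teams):
        self._teams = teams  # dict of team_id -> Team, standing in for a repository

    def handle_member_deactivated(self, member_id):
        for team in self._teams.values():
            team.deactivate_member(member_id)


def make_listener(service):
    # Listener: translates the bus message into an application-layer call.
    def on_member_deactivated(payload):
        service.handle_member_deactivated(payload["member_id"])

    return on_member_deactivated
```

The listener stays thin and the domain layer knows nothing about the bus; only the application service sits between the two.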

What do you do when you need to “mix” aggregates? For example, from a UI perspective, I am listing teams on one screen and I want each team’s members’ names listed on the same screen as well. That can’t be achieved by returning just one aggregate from its repository (assuming the design from your drawings here).

How is that resolved? Data redundancy? (Having Member names in TeamMember as well) Or having independent query not related strictly to aggregates? Is there some common way for crossing aggregates boundaries?

(Of course, I have some ideas, but I want to see how you approach it, or what the DDD “by the book” way of doing it is)

My approach is to never use write-side models (e.g. aggregates) as read-side models (e.g. UI models). Aggregates are useful for write-side constraints, but once you try to use them as a means of displaying data on a UI or API all sorts of issues appear (as you outlined).

So the answer is, as you mentioned, “having independent queries not related strictly to aggregates”. Use different functions, classes, etc. in your code to handle write vs. read logic and those issues should disappear 👍.
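For the "teams with member names" screen, a read-side query might look like the sketch below. It goes straight to the data with a join and returns a plain dictionary, without touching the Team aggregate. The table and column names are assumptions, and `sqlite3` is used only to keep the example self-contained.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables for the example.
    conn.execute("CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE team_members (team_id INTEGER NOT NULL, member_id INTEGER NOT NULL)"
    )


def list_teams_with_member_names(conn: sqlite3.Connection) -> dict:
    """Read model for the UI: team name -> list of member names."""
    rows = conn.execute(
        "SELECT t.name, m.name"
        " FROM teams t"
        " LEFT JOIN team_members tm ON tm.team_id = t.team_id"
        " LEFT JOIN members m ON m.member_id = tm.member_id"
        " ORDER BY t.name, m.name"
    ).fetchall()
    result = {}
    for team_name, member_name in rows:
        names = result.setdefault(team_name, [])
        if member_name is not None:  # teams without members still appear
            names.append(member_name)
    return result
```

The write side can keep its aggregates small, because this query is free to join across whatever it needs for display.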
