I was just writing an email to the cqrs group http://groups.google.com/group/dddcqrs and figured it might be useful to put it up here as well, as it's a very common question I get.
The initial question was:
Here’s the example I’m using:
A system that handles user registration for 2 million+ active users. These users should be able to login with their email address and password. They should also be able to change their associated email.
I have the following design:
User(Guid Id, string email, string hashedPassword)
UserRegistered(Guid Id, string email, string hashedPassword)
UserEmailAddressUpdated(Guid Id, string email)
RegisterUser(Guid Id, string email, string hashedPassword)
UpdateEmailAddressForUser(Guid Id, string email)
RegisteredEmailAddresses(emailAddress) – Used for client side validation on email prior to sending a RegisterUser command
When processing a RegisterUser command, I need to validate that no other user has registered with that email. How can I do that without loading every user in the system? I could use a view cache like the client side, but then I would have business logic outside of my domain. Any suggestions?
This is a very common question. There were many responses with various suggestions. Mine follows.
I am just replying to the last one on the list after reading through.
To me the most important concepts have been completely missed in this thread, and they are a big part of why eventual consistency is so cool (it makes you think about things).
*What is the business impact of having a failure?*
This is the key question we need to ask, and it will drive our solution, as we have many choices of varying degrees of cost and complexity for handling this issue.
Most of the time the business impact of such a failure is low, and the probability of it happening is low. If we query the eventually consistent store at the time of submission (either from the client or from the server, as this is a big part of how one-way commands work), then our probability of receiving a duplicate is directly calculable based on the amount of eventual consistency. We can drive this probability down by lowering our SLA; very often this is enough.
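To make "directly calculable" concrete, here is a back-of-envelope sketch. Both the model and the numbers are illustrative assumptions of mine, not from the original discussion; the point is only that the expected duplicate rate scales linearly with the replication lag we accept in our SLA.

```python
# Rough model: a duplicate slips through only when two registrations
# for the same email both pass the read-model check inside the
# replication-lag window. All numbers below are hypothetical.

def duplicates_per_day(attempts_per_day, p_same_email, lag_seconds):
    """Rough upper bound on expected duplicates per day.

    attempts_per_day: registration attempts hitting the system
    p_same_email:     chance a given attempt targets an email some other
                      in-flight attempt is also trying to claim
    lag_seconds:      read-model replication lag (the SLA we control)
    """
    attempts_per_second = attempts_per_day / 86_400
    # conflicting attempts per day, times the expected number of other
    # attempts landing inside one lag window
    return attempts_per_day * p_same_email * attempts_per_second * lag_seconds

# 10,000 signups/day, 1-in-10,000 email collision rate, 1 second of lag
estimate = duplicates_per_day(10_000, 1e-4, 1.0)
# tightening the SLA to 100 ms cuts the expected rate tenfold
tighter = duplicates_per_day(10_000, 1e-4, 0.1)
```

Even with these pessimistic inputs the expected failure rate is a small fraction of one per day, which is what makes the "just alert an admin" option below so attractive.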
We can detect asynchronously if we broke our invariant. Imagine an event handler that inserts into a table with a unique constraint. If it gets an exception, we broke the constraint (note this is not really the “read model”, though the same db can be used if convenient; it is important to note the distinction because if we scale to have 5 read models we don’t have 5 of these …).
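A minimal sketch of that detection handler, using sqlite3 purely for brevity (any relational store with a unique constraint works the same way; the handler and table names are my own):

```python
import sqlite3

# the constrained detection table -- not a read model, just a set we
# can check invariants against after the fact
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE registered_emails (email TEXT PRIMARY KEY)")

duplicates = []  # stand-in for "raise an alert to an admin"

def on_user_registered(event):
    """Event handler feeding the constrained table; an IntegrityError
    here means two registrations slipped through with the same email."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO registered_emails VALUES (?)",
                       (event["email"],))
    except sqlite3.IntegrityError:
        duplicates.append(event["email"])

on_user_registered({"id": 1, "email": "a@example.com"})
on_user_registered({"id": 2, "email": "a@example.com"})  # breaks constraint
```

Note that the write model never consults this table; it exists only so the broken invariant is detected, after which we decide what to do about it.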
What do we do if we break the constraint? We need to come back to that business impact statement above. For most circumstances, just raising an alert to an admin etc. is enough; these things have a very low probability of happening and are often not worth the time/cost of implementing automatic recovery. Just imagine 1 username creation out of 1,000,000 failing this way. How long would it take to automate the process of handling the situation? Consider discussions with domain experts etc. Five minutes of admin time once a year is a much better ROI in most of these situations than a week of developer time to automate.
Continuing along, suppose it has now been decided that this has a large enough impact that it should be automated. The process that finds the duplicates could either raise an event DuplicateUsernameDetected or directly call a command ResolveDuplicateUsername (which involves more discussion). It is important to note that in either of these cases we are discussing the “what”, not the “how”; it would never issue a command like “DeleteUser”. How to handle these situations is core domain logic and should be modeled within the domain. In the username example, perhaps ResolveDuplicateUsername marks the user as not being able to login (and as a duplicate) and sends an email to the user saying “Hey, we screwed up, but it’s your lucky day! You get to create a new username.”
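A sketch of what that domain-side resolution might look like. Everything here (the `User` shape, the method and event names, the apology email) is illustrative; the only point carried over from the post is that the “how” lives inside the aggregate, while the detection process only says “what” happened:

```python
from dataclasses import dataclass, field

@dataclass
class User:
    id: int
    email: str
    login_disabled: bool = False
    marked_duplicate: bool = False
    events: list = field(default_factory=list)

    def resolve_duplicate(self):
        """Handles ResolveDuplicateUsername: the domain decides to keep
        the user but disable login, rather than e.g. deleting anything."""
        self.marked_duplicate = True
        self.login_disabled = True
        self.events.append(("DuplicateUsernameResolved", self.id))

def apology_email(user):
    # illustrative side effect: invite the user to pick a new name
    return f"To {user.email}: we screwed up, please create a new username."

u = User(id=2, email="a@example.com")
u.resolve_duplicate()
```

The detection process from the previous section would simply dispatch the command; it never reaches into the aggregate's state itself.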
But even after all of this, if from a business perspective the impact is too high, we can still make things consistent. We could drop a service into the domain that deals with a consistent set. This would of course be the last resort, as it's the most complicated of these solutions and brings with it many limiting factors in terms of our scalability.
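As a sketch of that last resort, here is a fully consistent reservation service inside the domain, with the set guarded by a lock (the class and method names are mine). The lock is exactly the limiting factor: every registration in the system now serializes through this one point:

```python
import threading

class EmailReservationService:
    """Consistent-set domain service: claims an email atomically so
    RegisterUser can be rejected up front instead of detected later."""

    def __init__(self):
        self._taken = set()
        self._lock = threading.Lock()

    def try_reserve(self, email):
        # every registration contends on this lock -- the scalability
        # cost of full consistency
        with self._lock:
            if email in self._taken:
                return False
            self._taken.add(email)
            return True

svc = EmailReservationService()
first = svc.try_reserve("a@example.com")
second = svc.try_reserve("a@example.com")  # rejected synchronously
```

In a real deployment this set would have to live in one place (or behind a distributed lock), which is why the earlier detect-and-resolve options are usually the better trade.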
Udi had a great example of this in his explanation of 1-way commands. It was an ATM that would spit out money having only read your balance from an eventually consistent read model. The reason this can work is that from a business perspective the risk is low (and it is built into the business model itself). You have a bank account with me; I know your SS# and all of your information. For people who overdraw their accounts I will recover at least 90% of the money that has been overdrawn. On top of that, I charge a fee for each overdraw that occurs. For these reasons the business impact of such a problem is low.
To sum up, I just want to reiterate that this is a *good* thing. Eventual consistency is forcing us to learn more about our domain. It is forcing us to ask questions that are otherwise often not asked.
Consistency is over-rated.