Why use Event Sourcing?

Udi and I agree on probably 95% of what we talk about. One of the places where we have differing opinions is the use of Event Sourcing. I use the term as described previously to mean the rebuilding of objects based on events, not the definition that is currently on the bliki. To me this is an important distinction, and I figured it would be worthwhile to write a post on why I feel the way I do. I explained parts of it in the previous post about CQRS and Event Sourcing, but I wanted to talk not just about how the patterns are symbiotic but also about some of the other reasons I use Event Sourcing.

 

Using a RDBMS

To start with, let’s go through the alternative architecture: rebuilding objects from something that saves current state on the write side. I say "something" because it could be many things: Bigtable, MongoDB, a relational database, XML files, photos of the objects that are then scanned in … It really doesn’t matter for the sake of this discussion; what matters is the storing of current state. Let’s imagine that it is a relational database, which Udi generally recommends. Here is the slide he generally uses to talk about it.

 

[Figure: CQRS architecture slide]

 

Data is stored as normal in the relational database, and the domain is instrumented to send events as well, which can then be used to build the read model. There are some really good things about this architecture that I would like to go through before I talk about some of the issues I have run into with it in the past.

 

First of all, this architecture is highly applicable to legacy systems that would like to move to a separated read model for some of their data (it is important to note that you may not move all of your data to a separated read model). It is also very familiar to development teams, operations, and management. We are well aware of how to deal with such a system; never underestimate the value of familiarity.

 

There are however some issues that exist with using something that is storing a snapshot of current state. The largest issue revolves around the fact that you have introduced two models to your data. You have an event model and a model representing current state. If you look in the above diagram you will see there are two operations, write and publish.

 

Any time that you represent things in more than one model you have to worry about keeping those two models in sync and this situation does not escape that. As an example, how do you rationalize that the data you saved to your database actually matches up with the events that you sent out? One can write unit tests to help try to keep things synchronized but they will eventually fall out of sync with each other and when they do, you have a problem.
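To make the synchronization risk concrete, here is a minimal Python sketch of the write-and-publish pair. Everything in it (the Customer class, change_contact_details, the in-memory database and published list) is hypothetical and exists only for illustration. The point is that nothing forces the two writes to agree.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    address: str
    phone: str


database = {}    # stands in for the current-state store (model 1)
published = []   # stands in for the events sent to the read side (model 2)


def change_contact_details(customer, address, phone):
    # Write: save the new current state.
    customer.address = address
    customer.phone = phone
    database[customer.customer_id] = customer

    # Publish: describe the change as an event. The phone change was
    # forgotten here, so the two models have now silently diverged.
    published.append({
        "type": "AddressChanged",
        "customer_id": customer.customer_id,
        "address": address,
    })


customer = Customer("c-1", "1 Old Road", "555-0100")
change_contact_details(customer, "2 New Street", "555-0199")

stored_phone = database["c-1"].phone
phones_in_events = [event.get("phone") for event in published]
```

The database knows the new phone number, but anyone building a model from the events never hears about it.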

 

Another problem with having two models is that it is necessarily more work. One must write the code to save the current state of the objects, and one must write the code to generate and publish the events. No matter how you go about doing these things, it cannot possibly be easier than only publishing events. Even if you had something that made storing current state completely trivial, say a document store, there is still the effort of bringing that into the project.

 

Beyond all of that, for me, the focus tends to be on the current-state model when the system is really event centric. This may sound like a nitpicky issue, but I find that teams doing this treat events as less important than teams who also use events as storage. Because the latter only use events, they are extremely event centric.

 

Using Event Sourcing

Using Event Sourcing does not change the architecture above much. The primary difference is that behind the domain we have an Event Store holding the events used to rebuild an object, as opposed to something storing the current state. There are however many interesting differences between the two architectures.
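As a rough illustration of rebuilding an object from its events, here is a minimal Python sketch. The Account example, the event classes, and the in-memory EventStore are all assumptions made up for this sketch; a real Event Store would persist the streams somewhere durable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountOpened:
    account_id: str


@dataclass(frozen=True)
class MoneyDeposited:
    amount: int


@dataclass(frozen=True)
class MoneyWithdrawn:
    amount: int


class EventStore:
    """In-memory stand-in for an event store: one append-only list per object."""

    def __init__(self):
        self._streams = {}

    def append(self, stream_id, event):
        self._streams.setdefault(stream_id, []).append(event)

    def load(self, stream_id):
        return list(self._streams.get(stream_id, []))


class Account:
    def __init__(self):
        self.account_id = None
        self.balance = 0
        self.version = 0

    def apply(self, event):
        # Each event type maps to exactly one state transition.
        if isinstance(event, AccountOpened):
            self.account_id = event.account_id
        elif isinstance(event, MoneyDeposited):
            self.balance += event.amount
        elif isinstance(event, MoneyWithdrawn):
            self.balance -= event.amount
        self.version += 1

    @classmethod
    def from_history(cls, events):
        # Current state is never stored; it is derived by replaying events.
        account = cls()
        for event in events:
            account.apply(event)
        return account


store = EventStore()
store.append("acc-1", AccountOpened("acc-1"))
store.append("acc-1", MoneyDeposited(100))
store.append("acc-1", MoneyWithdrawn(30))

account = Account.from_history(store.load("acc-1"))
print(account.balance, account.version)  # 70 3
```

The only thing persisted is the list of events; the balance exists only in memory after the replay.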

 

The first major difference is that it solves the problem of having two models. It removes the cost of having to deal with synchronization between the two and, most importantly, it removes the possibility that the two diverge. Having models diverge could be very bad, as we use the event model as our integration model so others can create parallel models of our model. If we have a synchronization issue, they will have bad data while we have correct data, ouch. With Event Sourcing we only save the event (that can even be called “publishing it”).

 

There are however other benefits to using Event Sourcing as opposed to storing current state. I went through some of them briefly in another post. One of those benefits is that we can avoid having to use a 2PC transaction between the data model and the message queue (if we are using one). The reason for this is that the event storage itself is also a queue: we could trail the event storage to place the items on the queue (or use the event storage directly as a queue).
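A minimal Python sketch of trailing the event storage might look like the following. The EventStore and TrailingPublisher classes and the checkpoint field are hypothetical names for this sketch, and the checkpoint is kept in memory only. In a real system it would have to be stored durably, and a crash between publishing and saving the checkpoint would mean the event is delivered again, so consumers would need to tolerate duplicates.

```python
class EventStore:
    """Append-only log; an event's position in the log is its sequence number."""

    def __init__(self):
        self._log = []

    def append(self, event):
        self._log.append(event)

    def read_from(self, position):
        return self._log[position:]


class TrailingPublisher:
    """Follows the store and forwards events it has not yet published."""

    def __init__(self, store, publish):
        self.store = store
        self.publish = publish
        self.checkpoint = 0  # position of the next event to publish

    def poll(self):
        new_events = self.store.read_from(self.checkpoint)
        for event in new_events:
            self.publish(event)
            self.checkpoint += 1
        return len(new_events)


queue = []  # stands in for a message queue
store = EventStore()
publisher = TrailingPublisher(store, queue.append)

# The write side only appends to the store; there is no second write
# to a queue that would need to share a transaction with the first.
store.append({"type": "AppointmentScheduled"})
store.append({"type": "AddressChanged"})

first_poll = publisher.poll()
second_poll = publisher.poll()  # nothing new, so nothing is republished
store.append({"type": "AppointmentCancelled"})
third_poll = publisher.poll()
```

Because the store is the single source of truth, the queue can always be refilled from it by resetting the checkpoint.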

 

Testing, however, is a big win with Event Sourcing. Since all of your state changes are done through events, you can simply test the events coming out the other side. This is particularly interesting when one considers that, because events make you explicit about what is changing, you are also testing what is not happening. Very few people write tests to show what doesn’t happen on behaviors (also a prime place where models can lose sync). For example, if I am calling ScheduleAppointment on an object, do you check to make sure the address hasn’t changed? With events, I would simply assert in the test that I only received an AppointmentScheduledEvent. Beyond that, since we only deal with events on the other side, we can view (and test) our domain as being a rather complex finite state machine, which offers some very cool possibilities.
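Here is a small Python sketch of what such a test could look like. The Patient class, the PatientRegistered event, and the uncommitted list are assumptions invented for the example; only ScheduleAppointment and AppointmentScheduledEvent come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRegistered:
    patient_id: str
    address: str


@dataclass(frozen=True)
class AppointmentScheduledEvent:
    patient_id: str
    when: str


class Patient:
    def __init__(self, history):
        self.patient_id = None
        self.address = None
        self.appointments = []
        self.uncommitted = []  # events produced by behaviors since loading
        for event in history:
            self._apply(event)

    def schedule_appointment(self, when):
        self._record(AppointmentScheduledEvent(self.patient_id, when))

    def _record(self, event):
        # Every state change goes through an event, so the list of
        # uncommitted events is a complete account of what changed.
        self._apply(event)
        self.uncommitted.append(event)

    def _apply(self, event):
        if isinstance(event, PatientRegistered):
            self.patient_id = event.patient_id
            self.address = event.address
        elif isinstance(event, AppointmentScheduledEvent):
            self.appointments.append(event.when)


def test_scheduling_emits_only_an_appointment_event():
    # Given: the history of the object
    patient = Patient([PatientRegistered("p-1", "1 Main St")])
    # When: the behavior is invoked
    patient.schedule_appointment("2010-03-01T09:00")
    # Then: exactly one event came out, so nothing else (such as the
    # address) can have changed.
    assert patient.uncommitted == [
        AppointmentScheduledEvent("p-1", "2010-03-01T09:00")
    ]


test_scheduling_emits_only_an_appointment_event()
```

The equality check on the whole list is what covers "what did not happen": an unexpected extra event would fail the test.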

 

Of course we have left out what I think is one of the key values of Event Sourcing: we actually have the changes. Let’s say I get a concurrency violation (the request is from version 5 while the current data is at version 12). With Event Sourcing I can ask for all of the events in between and see if any of them actually conflict with the command that I want to run (I will write a post with more about how this works soon). In other words, we provide intelligent merging…
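One possible shape for this check, sketched in Python under my own assumptions: the CONFLICTS table, the command and event names, and the rule "a command conflicts with certain event types" are all hypothetical, and a real implementation could decide conflicts quite differently.

```python
# Hypothetical rules: for each command type, the event types that make a
# stale request unsafe to apply.
CONFLICTS = {
    "ChangeAddress": {"AddressChanged", "CustomerDeactivated"},
    "ScheduleAppointment": {"CustomerDeactivated"},
}


def events_since(stream, expected_version):
    # stream[0] is version 1, so everything after the expected version
    # starts at index expected_version.
    return stream[expected_version:]


def conflicting_events(command_type, stream, expected_version):
    missed = events_since(stream, expected_version)
    blocking = CONFLICTS.get(command_type, set())
    return [event for event in missed if event["type"] in blocking]


# A stream at version 12; the address changed at version 6.
stream = (
    [{"type": "CustomerCreated"}]
    + [{"type": "AppointmentScheduled"}] * 4
    + [{"type": "AddressChanged"}]
    + [{"type": "AppointmentScheduled"}] * 6
)

# Both requests were issued against version 5.
schedule_conflicts = conflicting_events("ScheduleAppointment", stream, 5)
address_conflicts = conflicting_events("ChangeAddress", stream, 5)
```

In this sketch the stale ScheduleAppointment can still go through, because nothing it cares about happened between versions 5 and 12, while the stale ChangeAddress is rejected because of the AddressChanged at version 6.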

 

You will note that I have not talked about any of the general benefits of Event Sourcing, such as the business value of the events, the value of having a log, the fact that the Event Store is additive only. I am leaving these out because many of these can exist in both systems. In the former you can save all of your events historically as well.

 

I hope this explains a bit about the differences between the models. I also want to provide a quick list of some of the pros and cons of each.

 

Current State Backing

Pros

Easier to retrofit on a legacy project

Well known technology/tools/skill sets

Easier sell for an organization

Cons

Dual models cost more and contain risk

Non-Event Centric

Event Storage Backing

Pros

Single Event Centric Model

Simplified/Better Testing

Conflict Management

Cons

Harder to sell to an organization

Less known tools/technologies (though you can implement the Event Store in a RDBMS which kind of mitigates this)

Very difficult to migrate a legacy app to
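On the point about implementing the Event Store in an RDBMS, here is a minimal sketch using Python’s built-in sqlite3 module. The table layout (a single events table keyed by stream_id and version, with the event body serialized as JSON) and the append/load function names are assumptions for illustration, not a prescribed schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        stream_id TEXT    NOT NULL,
        version   INTEGER NOT NULL,
        type      TEXT    NOT NULL,
        data      TEXT    NOT NULL,
        PRIMARY KEY (stream_id, version)
    )
    """
)


def append(stream_id, expected_version, event_type, data):
    """Append one event; return False if another writer got there first."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events (stream_id, version, type, data) "
                "VALUES (?, ?, ?, ?)",
                (stream_id, expected_version + 1, event_type, json.dumps(data)),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejects a second event with the same version,
        # which gives optimistic concurrency for free.
        return False


def load(stream_id):
    rows = conn.execute(
        "SELECT type, data FROM events WHERE stream_id = ? ORDER BY version",
        (stream_id,),
    )
    return [(event_type, json.loads(data)) for event_type, data in rows]


first = append("customer-1", 0, "CustomerCreated", {"name": "Ann"})
second = append("customer-1", 1, "CustomerChangedAddress", {"address": "2 New St"})
stale = append("customer-1", 1, "CustomerChangedAddress", {"address": "3 Other St"})
history = load("customer-1")
```

The stale write fails because version 2 already exists for that stream, and the store is only ever inserted into, never updated.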

 

Hope this helps people.


31 Responses to Why use Event Sourcing?


  9. Greg says:

    http://dl.dropbox.com/u/9355756/course.zip feel free to put any commentary on it on the dddcqrs google group.

    Cheers,

    Greg

  10. Roy Oliver says:

    That’s great, Greg.

    Where is this material?

  11. Greg says:

    @Roy

    Writing an event store takes all of about an afternoon. I have already put out about 35 pages of material (free) showing in depth how to do it. Not sure what more you want.

  12. Roy Oliver says:

    Ok, I’m a fan of Event Sourcing coupled with CQRS now. Let me tell ya… setting up the initial plumbing is no fun, but once you have a structure in place that works for you, the skeleton is highly reusable for other projects.

    Event Sourcing coupled with CQRS is amazing and worth implementing. The read model is highly scalable. My system doesn’t need to use query joins any more and the 100% accurate audit trail is a plus.

    Perhaps when you’re done making money off those highly desirable training classes, you’ll share your setup (with code) for Event Sourcing w/ CQRS.

    Thanks Greg.

  13. Aaron Navarro says:

    Regarding refactoring of legacy systems, especially ones with active record style architectures (assuming that is not the right choice);
    as a first step it seems to me that in many cases you should be able to take your existing database and turn it into the reporting database, keeping your data access layer calls that read data and maybe turning your dumb entities into viewmodels.

    Now with the query side done, you can build a real domain model and stuff the event store behind it.

    You may even be able to use some of your old data access write methods as event handlers (to update the reporting db) for the domain events raised by your new domain.

    Just a thought really, I haven’t actually tried it ;)

  14. Greg says:

    I would suggest serializing objects to your database. What you suggest will be a disaster and offer no benefit in any way.

    Greg

  15. Roy Oliver says:

    I do not like the idea of storing serialized objects in a database. If I were going to use Event Sourcing with an RDBMS, what setup would you suggest?

    The best setup I can think of is creating an Event Store with tables named after the events they collect. So I would have a table named CustomerChangedAddress.

    Is that weird – what do you think ?

  16. I would be keen to hear of any OSS projects that show how event sourcing can be used. Certainly more used to a situation where everything is live in the database. Interesting stuff.

  17. Greg says:

    @Rikard no, Prevayler would not be a good option. Note that the events have domain meaning. Prevayler and other tools like it lose the intent of what was happening.

  18. Sounds like the old project Prevayler could take the driver seat for the event sourcing :)

    Nice post, but we need to find some common strategies for applying this to a legacy system.

  19. Very nice post, but Leonardo’s question is a good one. If there are 100000000 events that restore the state of my object, how do you optimize the loading process? How do you do that if you are not storing snapshots?

    @Leonardo: For my current system we are creating snapshots of our objects. Later, when restoring state, we first load the snapshot and then apply the remaining events that occurred after the snapshot was created.

  20. How about data that is only needed for reporting? Suppose I have a shipping domain where my cargo has a ‘delivery status’ which describes its current location, estimated arrival date and so on. I don’t need this data when processing transactions, I only display it to users.

    In CQRS with current state backing I would create a ‘Delivery’ object in my domain and publish it as part of state change notification. I wouldn’t have to persist it in the domain state. It would be only persisted at reporting side.

    When using event sourcing ‘Delivery’ is part of my event and, as such, is persisted both in the domain event store and in reporting database.

    Is this behavior correct? I am not telling that it is wrong, I only want to make sure I am not missing something obvious. It would be tempting to calculate ‘Delivery’ object when restoring state from events but, as far as I understand, restoring state shouldn’t involve any business logic, right?

  21. Leonardo says:

    How would you build an entity from the domain model with a huge list of events? I can’t accept the fact of rebuilding every time the entities when retrieving them for a command process.

    How would you implement event sourcing in a real application?

  22. I’ve found it necessary to define a partial ordering among the events. This makes it easy to query the event store directly, rather than applying the events to state-based storage. I’ve documented a series of patterns for doing this on http://historicmodeling.com.

  23. amir says:

    Can we have something like an introduction in code, starting from a very simple thing? For example: one domain object Person and the event PersonCreated, showing how event sourcing helps simplify our lives. I still don’t get how to use Event Sourcing even though I’m very interested, because the community still doesn’t have a so-called easiest way to implement the simple thing.

  24. Chris says:

    Thanks for this. Can you recommend any OSS projects to read that demonstrate the use of event sourcing, and particularly that illustrate the rebuilding of an object for the query model.

    As someone who thinks in terms of a legacy app and applying this, I find it hard to break strongly enough away from the comfortable model that I’m used to – namely of having everything live in the DB.

  25. @seagile says:

    Come to think of it the domain model is a statemachine and you guys are just arguing whether to store state or transitions. It’s true, you can compute the state from replaying the transitions, but that’s just an implementation detail. If you store both, there is no need to think of it as two models. Store both in the same state/event store and you can forgo several problems. I think storing both is something that should be explored and not dismissed because of personal preference.

    But, as is always the case, one should assess when it’s better to apply the one or the other.

  26. Jonathan says:

    It may be worthwhile to define well-named pattern for your flavor of event sourcing rather than having to distinguish between yours and Martin’s, much the same way you did with CQS and CQRS, something like event replay/rebuild/hydration, etc.

    Also, I think Udi’s warming up to event sourcing. He talked about it for a few minutes at his SOA course in Austin, TX when I was there. At the same time, virtually all of what Udi does is help people migrate their legacy systems to these newer patterns as fast as the respective client organizations can accept the changes in thinking. Event sourcing is a pretty big pill to swallow for an organization with a massive legacy codebase, so it’s no wonder that he’s not preaching it.

  27. Mike says:

    Great stuff!
    You mention the difficulty of migrating a legacy app to use Event Sourcing.
    To ease that on my current project, I initially adopted the state storage backend, effectively making the model event driven. From there, it was easier to port the whole thing to use Event storage backend.

  28. Ian Cooper says:

    @Greg I think that you and Udi may be even closer than you think. I have just come back from Udi’s course and the idea of the write just persisting the command (i.e. the causal event) was raised, as was the idea that an RDBMS might not be needed as a Db here.

  29. Very interesting set of articles, Greg. I haven’t used Event Sourcing and there are many questions that concern me. I know that the answers are domain dependent, but some general hints would really help me to understand.

    1) How often should write and read sides be synchronized ?
    2) How to transfer changes from write side to read side? If synch period is long enough one should make a DB lock or enlist to a very long transaction. If there are many read sides workload can be redistributed. But what if currently there is only one read data store?

    3) What is event data format? Is it fully customizable XML that one needs to parse to transfer data to the read side? Or there is one event -> one message mapping so that for every data field (group of fields) there should be a distinct message?
