Why Can’t I Update an Event?

Last week on a call the question came up of why you cannot update an event in the Event Store, and how to handle the case where you need to. The conversation that followed was rich in architectural insight into how the Event Store works, as well as into event sourcing in general, so I thought it would be worth spending a bit of time writing up where the constraint comes from.

An Event Sourcing Perspective

Let’s start with why you would want to update an event. An event is a fact that happened at a point in time, recorded with the understanding of it from that point in time. A new understanding of the fact would be a new fact (naturally this does not apply to politicians). Updating a previous event is generally a very bad idea. Many want to go back and update events to new versions; this is not the best way to handle versioning!

The preferred mechanism from an event sourcing perspective is to write a new fact that supersedes the old fact. As an example, I could write that event 7 was a mistake and that this is a correction; I might as well add a comment such as “this was due to bug #1342” (similar to a journal entry in accounting). There are a few reasons this is a better way of handling things.

The first is my ability to look back at my history. If I were to change the fact and I look back at that point in time I have changed what it means. What about others who made decisions at that point in time? I can no longer see what it was they made decisions off of. Beyond this I might have a very valid query to ask your event streams of “how long on average does it take us to find bugs in previous events”.

This second model, where a correction is written as a new fact, leads us to two types of queries supported on event streams (as-of vs as-at).
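A minimal sketch of the idea, assuming a hypothetical in-memory stream (the event types, field names and helper functions here are illustrative and are not part of the Event Store API). It also assumes one common reading of the two query types: "as-at" asks what we knew at a point in time, while "as-of" asks what we now believe was true at that point, including later corrections.

```javascript
// Hypothetical append-only stream. Nothing is ever updated in place.
const stream = [];

function append(type, data, recordedAt) {
  const event = Object.freeze({ number: stream.length, type, data: Object.freeze(data), recordedAt });
  stream.push(event);
  return event;
}

append("AmountRecorded", { amount: 100 }, 1);
append("AmountRecorded", { amount: 50 }, 2);
// Event 1 was a mistake. Write a new fact that supersedes it.
append("AmountCorrected", { corrects: 1, amount: 40, reason: "bug #1342" }, 5);

// Fold a list of events into a total, letting corrections supersede originals.
function total(events) {
  const amounts = new Map();
  for (const e of events) {
    if (e.type === "AmountRecorded") amounts.set(e.number, e.data.amount);
    if (e.type === "AmountCorrected") amounts.set(e.data.corrects, e.data.amount);
  }
  let sum = 0;
  for (const amount of amounts.values()) sum += amount;
  return sum;
}

// As-at: only what had been recorded by time t.
function asAt(t) {
  return total(stream.filter((e) => e.recordedAt <= t));
}

// As-of: facts from time t or earlier, with every correction known today applied.
function asOf(t) {
  return total(
    stream.filter((e) =>
      e.type === "AmountCorrected" ? stream[e.data.corrects].recordedAt <= t : e.recordedAt <= t
    )
  );
}

console.log(asAt(3), asOf(3));
```

Because the original event is still in the stream, both questions remain answerable. Had event 1 been updated in place, the as-at answer would be lost.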

Beyond that, with Event Sourcing the updating of an event can be inherently evil. How do you tell any projections that the update occurred? What about other subscribers who may be listening to the streams? An easy answer might be to fully replay everything involved with the stream, but this quickly falls apart.

These are the primary reasons why the Event Store does not support an update operation on an event. There are however some wonderful architectural benefits that come from this particular constraint.

Architectural Goodness

If we prevent an event from ever being updated, what would the cachability of that event be? Yes it would be infinite. The Event Store supports a RESTful API (ATOM). All events served from the event store have infinite cachability, what does that mean?

Imagine you have a projection updating into a SQL table that has been running for the past eight weeks. You make a change and need to restart it (replaying from event 0). When the replay occurs and it requests events from the Event Store where do they likely come from? Your hard drive! You don’t make requests to the Event Store for them.

Beyond the events being infinitely cachable, if you look through our Atom implementation, every single request we serve, with the exception of the head URI (http://somewhere.com/streams/{stream}), is also infinitely cachable. In other words, when you want to reread $all (say for 5m events) you will hit exactly one non-cachable request!
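The rule can be sketched as a small function that picks a Cache-Control value per URI. This is an illustration only: the function name, the one-year max-age standing in for "infinite", and the path layout below the head URI are assumptions, not the headers the Event Store actually emits.

```javascript
// Hypothetical helper: decide a Cache-Control header for a request path.
// Assumes "/streams/{stream}" is the head URI and anything deeper is immutable.
const ONE_YEAR_SECONDS = 31536000;

function cacheControlFor(path, headMaxAgeSeconds = 0) {
  const parts = path.split("/").filter(Boolean); // e.g. ["streams", "orders", "7"]
  const isHead = parts.length === 2 && parts[0] === "streams";
  if (isHead) {
    // The head changes whenever an event is appended, so it is either not
    // cached at all or cached only for a short, per-stream number of seconds.
    return headMaxAgeSeconds > 0 ? `public, max-age=${headMaxAgeSeconds}` : "no-cache";
  }
  // Events are never updated, so any cache may keep them.
  return `public, max-age=${ONE_YEAR_SECONDS}`;
}

console.log(cacheControlFor("/streams/orders"));
console.log(cacheControlFor("/streams/orders/7"));
console.log(cacheControlFor("/streams/orders", 15));
```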

This is very important when we start talking about scalability and performance. The Event Store can pretty easily serve 3-5k Atom requests/second on a laptop (per node in the clustered version), but how many will actually get to the Event Store? In order to scale, you focus on commoditized reverse proxies in front of the Event Store, not on scaling the Event Store itself. nginx or Varnish can easily saturate your network. Just drop them in front and only head calls make it through (and there is even a setting per stream to allow caching of head links for x seconds).

This is often a hard lesson to learn for developers. More often than not you should not try to scale your own software but instead prefer to scale commoditized things. Building performant and scalable things is hard, the smaller the surface area the better. Which is a more complex problem a basic reverse proxy or your business domain?

This also affects performance of replays for subscribers, as you can place proxies near the subscribers (a local http cache is a great start!). This is especially true for, say, an occasionally connected system. Gmail uses this trick to provide “offlining out of the box” for web apps. Since much of the data will already be in the http cache, most of your requests will be served from it; in many cases you can build, say, a mobile app with no further support.
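To make the effect on replays concrete, here is a small simulation, assuming a hypothetical origin function and a Map standing in for the local http cache (none of these names come from the Event Store client). It counts how many requests actually reach the origin on a first and a second replay.

```javascript
// Hypothetical origin: counts the requests that actually reach it.
let originHits = 0;
function fetchFromOrigin(uri) {
  originHits += 1;
  return { uri, body: `payload for ${uri}` };
}

// A Map stands in for the local http cache or a nearby reverse proxy.
const localCache = new Map();

function get(uri, immutable) {
  if (immutable && localCache.has(uri)) return localCache.get(uri);
  const response = fetchFromOrigin(uri);
  if (immutable) localCache.set(uri, response);
  return response;
}

function replay(streamName, eventCount) {
  get(`/streams/${streamName}`, false); // head: always goes to the origin
  for (let n = 0; n < eventCount; n++) {
    get(`/streams/${streamName}/${n}`, true); // events: cacheable forever
  }
}

replay("orders", 1000);
const firstReplayHits = originHits;
replay("orders", 1000);
const secondReplayHits = originHits - firstReplayHits;

console.log(firstReplayHits, secondReplayHits);
```

On the second replay only the head request reaches the origin; everything else is answered locally.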

Over Atom if we allowed updates, NO uris could be cachable!

This is all cool but I actually need to update!

Occasionally there might be valid reasons why an event actually needs to be updated. I am not sure what they are, but I imagine some must exist. There are a few ways this can actually be done.

The generally accepted way of handling the scenario while staying within the no-update constraint is to create an entirely new stream, copy all the events from the old stream (manipulating them as they are copied), and then delete the old stream. This may seem like a PITA, but remember all of the discussion above (especially about updating subscribers!).
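The copy-and-transform approach might look like the following sketch. It uses a hypothetical in-memory store (a Map from stream name to an array of events); the stream names, the misspelled field and the helper functions are made up for illustration, and a real migration would use the store's own read, write and delete operations.

```javascript
// Hypothetical in-memory store: stream name -> array of events.
const store = new Map([
  ["account-1", [
    { type: "Deposited", data: { amount: 100 } },
    { type: "Deposited", data: { amout: 50 } }, // misspelled field to repair
  ]],
]);

// Copy every event into a new stream, transforming on the way, then delete the old stream.
function migrateStream(store, oldName, newName, transform) {
  if (!store.has(oldName)) throw new Error(`stream ${oldName} does not exist`);
  if (store.has(newName)) throw new Error(`stream ${newName} already exists`);
  const copied = store.get(oldName).map(transform);
  store.set(newName, copied); // write the new stream first
  store.delete(oldName);      // only then remove the old one
}

// Transformation applied to each event while it is copied.
function fixAmount(event) {
  if (!("amout" in event.data)) return event;
  const { amout, ...rest } = event.data;
  return { ...event, data: { ...rest, amount: amout } };
}

migrateStream(store, "account-1", "account-1-v2", fixAmount);
console.log(store.get("account-1-v2"));
```

Subscribers and projections then see a brand new stream, so no already-served event has changed underneath them.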

Some, however, may be using the TCP API and are willing to take the pains and complexity that come from subscribers (you can imagine they have none). In this one case, updates would be acceptable and simpler than adding new events. We have been going back and forth on whether or not to support this. It would not be much work at all for us, but I imagine that it would be misused 1000 times for every 1 time it was used reasonably. I am reminded of the examples of being able to call .NET code from a BizTalk orchestration or being able to execute .NET code inside my SQL database: both have valid uses but should rarely be used. Perhaps we will make a command line parameter --run-with-scissors-updates or make people build from source to enable it.


Startups and TDD

Yesterday Uncle Bob put up a post on using TDD in startup environments, “The Startup Trap”. It’s a good read. Check it out.

Nate soon after posted:

then

I wanted to write a few comments about TDD in startups. Good code is the least of the risks in a startup. Sorry, but worrying about technical debt making us go slower when we have a two-month runway and will likely pivot four times? To quote Bob:

Captain Sulu when the Klingon power moon of Praxis exploded and a young Lieutenant asked whether they should notify Star-Fleet: “Are you kidding?” ARE YOU KIDDING?

One of the biggest mistakes in my career was building something appropriate…

It was just after Hurricane Katrina. I was living in a hotel. An acquaintance asked me if we could hack together a business idea he had for a trading system. He had the knowledge but not the know-how. I said sure, hell, I was living in a hotel!

In less than two weeks we had an algorithmic trading system. It was a monstrosity of a source base. It was literally a winforms app connected directly to the stock market. UI interactions happened off events directly from the feed! Everything was in code behinds (including the algos!) Due to the nature of the protocol if anything failed during the day and crashed the app (say bad parsing of a string?) the day for the trader was over as they could not restart.

But after two weeks we put it in front of a trader who started using it. We made about $70-80k the first month. We had blundered into the pit of success. A few months later I moved up with the company. We decided that we were going to “do things right”, while keeping the original version running and limping along as stably as we could and adding just a few features.

We ended up with a redundant multi-user architecture nine months or so later. It was really quite a beautiful system. If a client/server crashed, no big deal, just sign it back on. Multiple clients? No problem. We moved from a third party provider to a direct exchange link (faster and more information!). We had > 95% code coverage on our core stuff, and integration suites including a fake stock exchange that actually sent packets over UDP so we could force various problems with retries, reconnects, errors, etc. We were very stable and had a proper clean architecture.

In fact you could say that we were dealing with what Bob describes in:

As time passes your estimates will grow. You’ll find it harder and harder to add new features. You will find more and more bugs accumulating. You’ll start to parse the bugs into critical and acceptable (as if any bug is acceptable!) You’ll create modules that are so fragile you won’t trust yourself, or anyone else, to modify them; so you’ll work around them. You’ll build a festering pile of code that, with every passing week, requires more and more effort just to keep running. Forward progress will slow and falter. It may even reverse as each release becomes buggier and buggier, and less and less stable. Catastrophes will become more and more common as errors, that should never have happened, create corruptions and damage that take huge traunches of time to repair.

We had built a production prototype and were suffering all the pain described by Bob. We were paying down our debt in an “intelligent” way much the way many companies that start with production prototypes do.

However this is still a naive viewpoint. What really mattered was that after our nine months of beautiful architecture and coding work we were making approximately 10k/month more than what our stupid production prototype made for all of its shortcomings.

We would have been better off making 30 new production prototypes of different strategies and “throwing shit at the wall” to see what worked than spending any time beyond a bit of stabilization of the first. How many new business opportunities would we have found?

There are some lessons here.

1) If we had started with a nine month project it never would have been done

2) A Production Prototype is common as a Minimum Viable Product. Yes testing, engineering, or properly architecting will likely slow you down on a production prototype.

3) Even if you succeed you are often better to stabilize your Production Prototype than to “build it right”. Be very careful about taking the “build it right” point of view.

4) Context is important!

Never underestimate the value of working software.


Projections 4: Event Matching

In the “intermission” post we jumped ahead quite a bit in terms of the complexity of the projection we were building. Let’s jump back into our progression of learning bits.

The projections we have used so far have used a method called when(). This method allows you to match functions back to types of events. Up until now that has been a single match but you can also use more than one.

fromStream('test').when({
                           Event1: f1,
                           Event2: f2
                        });

This defines that every time an event of type Event1 is seen the function f1 should be called with that event and function f2 for events of type Event2. This is a very useful construct when trying to build out projections that require the ability to handle many different types of events.
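To see the matching in action without a running Event Store, the dispatch can be imitated locally. The runWhen function below is a hypothetical stand-in for when(), not the real implementation, and it assumes the state starts out as an empty object; f1 and f2 are made-up handlers that simply count what they see.

```javascript
// Hypothetical local stand-in for when(): calls the handler whose key matches
// each event's type and threads the state through from call to call.
function runWhen(handlers, events) {
  let state = {}; // assumption: state starts as an empty object
  for (const event of events) {
    const handler = handlers[event.eventType];
    if (!handler) continue; // no match registered for this event type
    const next = handler(state, event);
    if (next !== undefined) state = next; // handlers may mutate or return state
  }
  return state;
}

// Made-up handlers that count the events they are given.
const f1 = function (s, e) { s.event1Count = (s.event1Count || 0) + 1; };
const f2 = function (s, e) { s.event2Count = (s.event2Count || 0) + 1; };

const finalState = runWhen(
  { Event1: f1, Event2: f2 },
  [
    { eventType: "Event1" },
    { eventType: "Event2" },
    { eventType: "Event1" },
    { eventType: "Event3" }, // not matched, so neither function is called
  ]
);

console.log(finalState);
```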

There are also some special matches defined.

$any: $any will match all events to your function. This is useful for example when you want to build an index for all events. We will get into how this works later but you can imagine if I wanted to build an index based upon the user that created the event (stream per user) then the function would want to look at all events in the system.

It is important to remember that as of now $any cannot be used in conjunction with other filters.

$init: $init gets called before any other handler. The job of $init is to return the initial state that will be passed to the rest of your functions. In the intermission post this handler was used to set up initial state so the other handlers did not have to. The usage can also be seen by looking at the projection from the Projections 3 post:

fromStream('$stats-127.0.0.1:2113').
    when({
        "$stats-collected" : function(s,e) {
              var currentCpu = e.body["sys-cpu"];
              if(currentCpu > 40) {
                   if(!s.count) s.count = 0;
                   s.count += 1;
                   if(s.count >= 3)
                        emit("heavycpu", "heavyCpuFound", {"level" : currentCpu,
                                                           "count" : s.count});
              }
              else
                   s.count = 0;
         }
    });

In this projection the line if(!s.count) s.count = 0 is being used to initialize the state if it’s the first time into the function. This could also be implemented as

fromStream('$stats-127.0.0.1:2113').
    when({
        "$init" : function(s,e) { return {"count":0}; },
        "$stats-collected" : function(s,e) {
              var currentCpu = e.body["sys-cpu"];
              if(currentCpu > 40) {
                   s.count += 1;
                   if(s.count >= 3)
                        emit("heavycpu", "heavyCpuFound", {"level" : currentCpu,
                                                           "count" : s.count});
              }
              else
                   s.count = 0;
         }
    });

The two will work in the same way. In our next post we will start looking at how indexing works in the event store.
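A quick local check of that equivalence, as a sketch: the run function below is a hypothetical harness, not part of the Event Store, and it assumes that without $init the state starts as an empty object. emit is passed in so the handlers stay the same as in the projections above.

```javascript
// Variant 1: initialise the state lazily inside the handler.
function withoutInit(emit) {
  return {
    "$stats-collected": function (s, e) {
      var currentCpu = e.body["sys-cpu"];
      if (currentCpu > 40) {
        if (!s.count) s.count = 0;
        s.count += 1;
        if (s.count >= 3) emit("heavycpu", "heavyCpuFound", { level: currentCpu, count: s.count });
      } else {
        s.count = 0;
      }
    },
  };
}

// Variant 2: let $init provide the initial state.
function withInit(emit) {
  return {
    "$init": function () { return { count: 0 }; },
    "$stats-collected": function (s, e) {
      var currentCpu = e.body["sys-cpu"];
      if (currentCpu > 40) {
        s.count += 1;
        if (s.count >= 3) emit("heavycpu", "heavyCpuFound", { level: currentCpu, count: s.count });
      } else {
        s.count = 0;
      }
    },
  };
}

// Hypothetical harness: feeds CPU readings to a projection and collects what it emits.
function run(makeHandlers, readings) {
  const emitted = [];
  const handlers = makeHandlers((stream, type, data) => emitted.push({ stream, type, data }));
  const state = handlers["$init"] ? handlers["$init"]() : {}; // assumption: default state is {}
  for (const cpu of readings) {
    handlers["$stats-collected"](state, { body: { "sys-cpu": cpu } });
  }
  return emitted;
}

const readings = [10, 45, 50, 60, 70, 20, 90];
const emittedWithoutInit = run(withoutInit, readings);
const emittedWithInit = run(withInit, readings);
console.log(JSON.stringify(emittedWithoutInit) === JSON.stringify(emittedWithInit));
```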


Spring->Summer

Wow it feels weird to have a schedule that is pretty much completely locked in between now and August. I guess things change when you don’t only have your own schedule to worry about. A friend tells me that soon I will be planning nine months in advance.

I have updated most of my schedule from now through the summer on the “where am I” page. I have not included everything but most is up there.

One big change we made is that we are going to spend a month in Australia. She seems to have this crazy idea of driving through the outback with kangaroos jumping everywhere (might be the movie!). While there I will stop in Sydney to teach the new class and Perth to go through the CQRS + ES class.

But the next six months look pretty crazy! Hopefully we can make the best of it.


Projections 3: Using State

In Projections 2 we looked at creating a very simple projection that would analyze our statistics inside of the Event Store. The projection was:


fromStream('$stats-127.0.0.1:2113').
    when({
        "$stats-collected" : function(s,e) {
              var currentCpu = e.body["sys-cpu"];
              if(currentCpu > 40) {
                   emit("heavycpu", "heavyCpuFound", {"level" : currentCpu})
              }
         }
    });

This is a very common type of scenario we will find in event based systems. We can describe this as

“When this event happens and this information is on the event, trigger a new event to a different stream.”

Very often, however, it’s not just one event that will cause something to trigger. This is why the state variable exists. Very often we want to handle a question that is more akin to:

“When this event happens, then this event happens, then this event happens trigger an event to a different stream”.

Let’s try to change our problem from Projections 2 into one like this. I am only interested in high CPU scenarios where the CPU is over 40% for 3 or more samplings in a row. A single one could just be a fluke. In order to do this type of query we will have to use our state variable to tie together multiple function calls.


fromStream('$stats-127.0.0.1:2113').
    when({
        "$stats-collected" : function(s,e) {
              var currentCpu = e.body["sys-cpu"];
              if(currentCpu > 40) {
                   if(!s.count) s.count = 0;
                   s.count += 1;
                   if(s.count >= 3)
                        emit("heavycpu", "heavyCpuFound", {"level" : currentCpu,
                                                           "count" : s.count});
              }
              else
                   s.count = 0;
         }
    });

Note: if you are trying this at home you may want to change how often statistics are sampled. You can set this with --stats-period-sec=SECONDS.

Now we use our state that gets passed from call to call to correlate multiple events together. If we get three or more samples with a CPU usage greater than 40% in a row then we will produce a message to the heavycpu stream that looks like:

{
  "eventStreamId": "heavycpu",
  "eventNumber": 3,
  "eventType": "heavyCpuFound",
  "data": {
    "level": 41.896265,
    "count": 6
  },
  "metadata": {
    "streams": {
      "$stats-127.0.0.1:2113": 8
    }
  }
}

This is a very powerful paradigm, as the state variable allows me to bring state from one call to the next and so correlate multiple events together. Another example of this might be that I am looking for users on Twitter who said the words “coffee” and “happy” within 5 minutes of mentioning “starbucks”. This query would be implemented in the same way as the one we just tried.
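A rough sketch of how that Twitter query could use state, run locally rather than inside the Event Store. Everything about the event shape is an assumption (a body with user, text and a millisecond timestamp called at), as are the output stream and event names, and emit is passed in explicitly so the handler can be exercised on its own. It also reads "within 5 minutes" as "in the 5 minutes after the mention".

```javascript
const WINDOW_MS = 5 * 60 * 1000;

// Handler in the same (state, event) style as the projections above.
function onTweet(s, e, emit) {
  const { user, text, at } = e.body;
  const u = s[user] || (s[user] = { starbucksAt: null, coffee: false, happy: false });
  const words = text.toLowerCase();

  if (words.includes("starbucks")) {
    // A new mention restarts the window for this user.
    u.starbucksAt = at;
    u.coffee = false;
    u.happy = false;
  }
  if (u.starbucksAt !== null && at - u.starbucksAt <= WINDOW_MS) {
    if (words.includes("coffee")) u.coffee = true;
    if (words.includes("happy")) u.happy = true;
    if (u.coffee && u.happy) {
      emit("coffeefans", "coffeeFanFound", { user: user });
      u.starbucksAt = null; // wait for the next mention before matching again
    }
  }
  return s;
}

// Hypothetical sample data: alice matches, bob says "happy" too late.
const tweets = [
  { user: "alice", text: "Off to Starbucks", at: 0 },
  { user: "bob", text: "starbucks run", at: 0 },
  { user: "alice", text: "coffee time", at: 60000 },
  { user: "bob", text: "Coffee is fine", at: 60000 },
  { user: "alice", text: "so happy", at: 120000 },
  { user: "bob", text: "happy now", at: 400000 },
];

const found = [];
let tweetState = {};
for (const body of tweets) {
  tweetState = onTweet(tweetState, { body }, (stream, type, data) => found.push({ stream, type, data }));
}
console.log(found);
```

The state carries per-user progress from one call to the next in the same way as the CPU counter.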

As food for thought: could I now write another projection off of “heavycpu” that then looked for items with 5 measurements >80 and counts >10? You probably wouldn’t do this in practice, as you could put that logic in the first projection, but you can compose projections as well!

In our next post we will look at having multiple types of events.
