Very often people attempting to introduce eventual consistency into a system run into problems from the business side. A large part of the reason is that they use the word consistent or consistency when talking with domain experts and business stakeholders. A quick look-up of the word consistent shows where the confusion comes from.
S: (n) consistency (logical coherence and accordance with the facts) "a rambling argument that lacked any consistency"
S: (n) consistency ((logic) an attribute of a logical system that is so constituted that none of the propositions deducible from the axioms contradict one another)
Business users hear “consistency” and tend to think it means that the data will be wrong: incoherent and contradictory. This is not actually the case. Instead, try using the word “stale” or “old”. In discussions where the word stale is used, business people tend to realize that it just means someone could have changed the data, and that they may not have the latest copy of it.
If you can get this point understood, the discussion about introducing eventual consistency becomes a fairly simple one.
You can quantify the “cost” of eventual consistency mathematically; it can generally be defined by how many more concurrency problems are experienced. If no concurrency problem is experienced, then the end-user view of the data is essentially identical for most use cases. It is important to note, though, that while this is one way of thinking about cost, there are other aspects as well, including the added complexity for the development team.
Unless you are using pessimistic locking, all data is stale and optimistic concurrency failures are possible. There is some period of time that it takes to build the DTOs, put them on the wire, and for the client to receive them and draw them on the screen. There is also a period of time for a change to travel from the client back up to the server. In any of these periods the data could change, causing an optimistic concurrency failure. Let’s go with some numbers.
Get data from database – 10 ms
Build DTOs – 1 ms
Get data to client – 100 ms
Show on screen – 50 ms
Send back to server – 100 ms
Server validation of request – 1 ms
So we can quickly add these together and know that any request the server processes is operating on data that is at least 262 ms stale. Of course, we have left out the largest factor: the user! The human brain has roughly a 190 ms reaction time to visual stimulus, and that is just to register that the data has appeared on the screen; it is assumed the user is actually changing something as well. Do you measure the amount of time users take on various screens? Are you thinking it might be a good idea? Let’s go with a relatively quick time for the sake of discussion: a mean of 60 seconds on a given screen. Adding this in, the total is now 60.262 s.
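The arithmetic above can be sketched as a quick calculation. The latency figures are the illustrative ones from the list, and the 60-second `user_think_time_s` is the assumed mean screen time, not a measured value:

```python
# Illustrative latencies from the list above, in milliseconds.
pipeline_ms = {
    "get_data_from_db": 10,
    "build_dtos": 1,
    "send_to_client": 100,
    "show_on_screen": 50,
    "send_back_to_server": 100,
    "server_validation": 1,
}

# Staleness introduced by the request/response pipeline alone.
pipeline_staleness_ms = sum(pipeline_ms.values())

# Assumed mean time a user spends on the screen before submitting.
user_think_time_s = 60

# Total staleness of the data the server ends up validating.
total_staleness_s = user_think_time_s + pipeline_staleness_ms / 1000

print(pipeline_staleness_ms)  # 262
print(total_staleness_s)      # 60.262
```

Measuring these numbers for your own system (rather than assuming them) is the whole point: the pipeline portion is noise next to the human portion.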
Let’s imagine that we also tracked the number of optimistic concurrency failures (hint: this is another value you should be tracking). We could relatively easily define an equation representing the probability of a concurrency failure, P(t), given the period of time the data has been stale. Most data sets will follow a normal distribution; let’s assume that we get one (an example of where we may not would be a periodical update occurring every 62 seconds, where P(t) would approach 1 as t approaches 62).
If we were to add in 5 seconds of eventual consistency, assuming a normal distribution of changes, we would end up with 65.262 seconds of total staleness.
So we would have increased probability = P(65.262) – P(60.262).
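This difference can be sketched with the standard library’s `NormalDist`. The mean and standard deviation of the time-to-conflicting-write distribution here are hypothetical placeholders; in practice you would fit them from the concurrency failures you track:

```python
from statistics import NormalDist

# Hypothetical fit: conflicting writes cluster around 300 s after a read,
# with a 120 s spread. Replace with parameters fitted from tracked failures.
conflict_dist = NormalDist(mu=300, sigma=120)

def p_conflict(staleness_s: float) -> float:
    """P(t): probability a conflicting write has landed within t seconds."""
    return conflict_dist.cdf(staleness_s)

# Increased probability from adding 5 s of eventual consistency.
increased_probability = p_conflict(65.262) - p_conflict(60.262)
print(f"{increased_probability:.4f}")
```

The shape of the distribution matters: with the periodic-update counterexample above, the same 5 seconds could swing P(t) from near 0 to near 1.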
Now for the last step: let’s estimate the cost of an optimistic concurrency failure. It’s a user; they have to redo something because their request failed, and we can come up with a rough estimate of what that costs. The cost to the business of eventual consistency can at this point be estimated. It is important to note that for some transactions you may say “the value is high, so we will never give a consistency error”: say, for orders over $1000, it is profitable to accept the order no matter what and handle any problem later, even in a manual fashion. This is actually a very valuable insight to reach. You know how often the use case runs over a period of time, you have estimated the cost of a failure, and you know the increased probability of a failure due to n seconds of eventual consistency.
Estimated Cost = Number of Times * Increased Probability * Cost per time
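Plugging the pieces together is a one-liner. All three inputs below are hypothetical placeholders standing in for values you would track or estimate yourself:

```python
# Hypothetical inputs; substitute your own tracked and estimated numbers.
runs_per_month = 10_000          # how often this use case runs
increased_probability = 0.0024   # P(65.262) - P(60.262) from the fitted distribution
cost_per_failure = 5.00          # estimated cost ($) of a user redoing their work

# Estimated Cost = Number of Times * Increased Probability * Cost per time
estimated_cost = runs_per_month * increased_probability * cost_per_failure
print(f"${estimated_cost:.2f} per month")  # $120.00 per month
```

With real numbers in place, this gives a dollar figure to weigh against the operational benefits of relaxing consistency for that use case.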
I hope people will also see the value in tracking metrics like how long users stay on screens and the number of consistency errors reported; these metrics can help improve user experience drastically.