Do we dare to be data-driven?

We had set the system up for a test of how this A/B testing stuff could work for us. Would it be useful? Could we communicate it clearly to the others around us? Would the data be easy to read?

From http://www.mikebilleter.com/2009/08/impossible-captcha.html

In order to know what we were doing we picked a no-brainer feature to test it on: registration with or without CAPTCHA. You know, like the one to the left.

Since all of us (and probably you too) hate to type those strange, unreadable letters, we were confident that we knew which one would win. Hence we had some really good test data for how A/B testing would work. One registration page was created with CAPTCHA turned on and one with it turned off. We then used Google Analytics to direct 50% of the traffic to each page.
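For the curious, a 50/50 split like this can be sketched in a few lines. This is not how Google Analytics does it internally; it is a minimal illustration, assuming every visitor has some stable identifier. The function name `assign_variant`, the experiment key and the variant labels are all made up for this example.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str = "registration-captcha") -> str:
    """Deterministically place a visitor in one of two roughly equal buckets.

    Hypothetical helper: the names and labels are assumptions, not part of
    any real analytics tool.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).digest()
    # The first byte of a SHA-256 digest is close to uniformly distributed,
    # so even values go to one variant and odd values to the other.
    return "captcha_on" if digest[0] % 2 == 0 else "captcha_off"


if __name__ == "__main__":
    counts = {"captcha_on": 0, "captcha_off": 0}
    for n in range(10_000):
        counts[assign_variant(f"visitor-{n}")] += 1
    print(counts)  # the two counts should be close to each other
```

Hashing the identifier, rather than flipping a coin on every page view, means a returning visitor always sees the same version of the page.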

The only thing was… the result was puzzling. It started a really interesting process at the client, and everyone I have told this story to since has reasoned the same way. I wonder: do we really dare to be data-driven, or are we chickening out when it comes down to it?

This experiment was done at one of the big sites in Sweden, so there was plenty of traffic. We ran the experiment for a week in order to even out any peaks and valleys in traffic (Sunday is apparently the best day for online-shopping sites).

And yes, as you have probably guessed by now: CAPTCHA turned on won. On the winning page, 6.5% more people, or about 400 (sorry, the exact numbers escape me), succeeded in registering and becoming customers. No one, not a single person in the team or the entire company, had guessed that before the test. We were all baffled by the result.

What’s going through your mind right now?

I guess that it’s one of these:

  • “But that is too small a sample to tell. Run the experiment for a week or two more before you make up your mind.”
  • “There’s something wrong with their setup. Are you sure you’re doing it right?”
  • “There’s another thing on the page that trips up more people, which shows us the wrong result.”
  • “This post is not about curly braces, I’m out of here.”
  • “But for crying out loud! They cannot use GA to set up a proper A/B test. Product X is much better for that.”

I’m also betting that none of you thought:

Well, awesome! Keep the CAPTCHA turned on then and do something more useful. Like run the next experiment, for example.

At least that is what happened at this company. We spent hours laughing about the result, trying to explain it, tweaking the test setup, suggesting reruns of the test and of course blaming the tooling. And everyone I’ve told the story to since has done the same thing: they start questioning the measurement, the data, the way we did it etc.
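The “too small a sample” objection, at least, can be checked in minutes rather than debated for hours. Below is a minimal sketch of a two-proportion z-test, a standard way to ask whether a gap between two conversion rates could plausibly be noise. The exact numbers from our test escape me, so the visitor and registration counts in the example are purely hypothetical, chosen only to resemble a gap of roughly 400 registrations, or 6.5%. The function name is made up for this example too.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the conversion rate of variant A, conv_b / n_b of variant B.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the normal distribution: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # HYPOTHETICAL counts, not the real data from the experiment.
    visitors_per_page = 50_000
    registered_captcha_off = 6_150
    registered_captcha_on = 6_550  # about 400 more, roughly 6.5% more

    z, p = two_proportion_z_test(
        registered_captcha_off, visitors_per_page,
        registered_captcha_on, visitors_per_page,
    )
    print(f"z = {z:.2f}, p = {p:.5f}")
```

A small p-value says the difference is unlikely to be chance alone. It says nothing about why one page won, which is exactly the question we kept getting stuck on.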

You know what the client ended up doing? Yes, of course we removed the CAPTCHA. We knew that it was better without CAPTCHA. Everyone knows that.

The Obama story

This client actually contacted another company later, since they were getting more serious about A/B testing. The company is called Optimizely and they did a really interesting presentation about their product when it was introduced at this client. In this presentation, to underline the use of A/B testing, they told the story about the Obama campaign site in the run-up to Barack Obama being elected president. Pretty good customer case, if you ask me (“this guy used our product and now he’s PRESIDENT OF THE USA”). Optimizely blogged about all the details here and it’s a really interesting read.

The site was really simple with just a single page, single “media”-section and an email address form (one field, one button). The goal was of course to get as many people as possible to sign up for the newsletter and hence “Join the movement”, and some percentage of these people would contribute financially too. It’s all about money, you know. :)

They tested 24 different combinations of button text, button colors and different movies / pictures above them. Here’s part of their stats that shows sign-up rates per combination:

From http://blog.optimizely.com/2010/11/29/how-obama-raised-60-million-by-running-a-simple-experiment/

And the winning combination, combination 11, looks like this:

From http://blog.optimizely.com/2010/11/29/how-obama-raised-60-million-by-running-a-simple-experiment/

Yes, people. Red button with “Learn more” and a picture of the presidential family in a pastoral setting. 40.6% improvement in sign-up rate. That translates to 2,880,000 more email addresses, 288,000 more volunteers and, since each sign-up is likely to contribute $21 on average, $60 million more in donations. Read the blog, it’s fascinating stuff.
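The arithmetic behind those figures is easy to retrace. The sketch below uses only the numbers quoted above and assumes that the $21 average applies to each extra email address (not only to the volunteers), since that is the reading under which the total lands at about $60 million. The variable names are mine.

```python
# Figures quoted in the text above.
extra_email_addresses = 2_880_000
extra_volunteers = 288_000
average_donation_per_signup = 21  # dollars; assumed to apply per email address

# Share of the extra sign-ups that went on to volunteer.
volunteer_share = extra_volunteers / extra_email_addresses

# Extra donations if every extra sign-up gives the average amount.
extra_donations = extra_email_addresses * average_donation_per_signup

print(f"Volunteer share of extra sign-ups: {volunteer_share:.0%}")
print(f"Extra donations: ${extra_donations:,}")
```

So one in ten of the extra sign-ups became a volunteer, and 2,880,000 × $21 comes to $60,480,000, which is where the rounded “$60 million” comes from.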

What’s going through your mind right now?

If you’re anything like me it’s something like:

Ok? Yeah… but … WHY?! I mean, how can they know? Are they really sure about this? How did they dare run with that combo?

Because: they trusted their data and dared to be data-driven. Instead of wasting time discussing the result (“but why that color”, “maybe we should try some other pictures, just to be sure”, “is this really the right tool for this?”) they ran with what they had and pressed on, fulfilling the goal of the application in a great way.

Conclusion

So what am I saying: don’t think and trust your result blindly? No, of course not. But maybe, as an exercise, think about what you will do when the result comes in. Even an unexpected result. This will probably clear up a few things about your experiment too.

And as always, the longer your lead times and iterations are, the more is at stake and the more likely it is that you start questioning the data and go with your gut feeling instead. For some strange reason. Data over feeling should be an easy choice. But “we knew that without CAPTCHA is better. Come on!” If you’re concerned about risks, change more frequently. There’s a reason good boxers have excellent footwork.

The quicker you can change things, with the tools around continuous delivery etc., the less painful it is to be wrong. The only way to reach perfection is to fail fast and adjust fast. The entire Kaizen culture of Toyota is based on these ideas. Read more in the excellent Toyota Kata book.

Finally, acknowledge the fact that you simply don’t know. Nobody does. Not you, not the stakeholders, not the testers, not even the users. We have hunches and well-grounded guesses, but until the thing you’re building is being used, by real users, in production… you simply don’t know. In order to know, we have to test it out on them. Doing so will also guide us further in what to build. These ideas are not new either, and you should check out Lean Startup, if you haven’t, for more information about this.

So go forth and be more data-driven. I dare you.

About Marcus Hammarberg

I am a consultant with Aptitud working with coaching on Lean and Agile thingies. Right now I'm finishing writing a book: Kanban In Action (http://bit.ly/theKanbanBook). When I am not working, most of my time is taken up by the Salvation Army and playing my instrument, the euphonium. I have been married to Elin since July 2006 and have three sons: Albert, Arvid and Gustav.
  • marcusoftnet

    Hi Bertrand. Thanks for your comment.

    Exactly! Neither do I. But they did, or rather we did (have to take some responsibility now that I blogged it, haven’t I?). And that’s the whole idea behind the blog post. Why on earth did they remove the CAPTCHA when they had hard data that said that it worked better with it turned on?

    It seems stupid, right? But they KNEW that “no-one wants CAPTCHA turned on. It goes without saying.”

    In short, we didn’t dare to be data-driven. We thought that we knew better than the data and when confronted with data that contradicted our beliefs, we chose to run with our beliefs. Unwise, but not uncommon. In my experience.

    I think that the more traditional the company is, the more common it is to stick to what you “knew” before you started and disregard the data, unless it confirms your beliefs. In smaller, faster moving companies it’s probably more common to trust the data instead.

    And that ties in with my last point on how fast you can iterate. One big release every 4 months makes it painful to have your beliefs and knowledge challenged. If you release once a day it doesn’t hurt that much.

  • http://weblogs.asp.net/bleroy Bertrand Le Roy

    I still don’t understand why you removed the Captcha.

  • marcusoftnet

    Thanks for your comment, Kyle.

    And yes, I don’t do this without cringing either. But we need to go there in order to become really effective I think.

  • marcusoftnet

    Thanks for your comment, Richard.

    It’s entirely possible and was among the “explanations” that were suggested for the “strange” result we had.

    However, I mean that this is beside the point. We have hard data that show us that CAPTCHA on is better. Or Red “Join us”-button and family-picture is better. We don’t need to explain it. We just need to follow it.

    But we don’t. Not me. Not you. And not many people in our industry (that I’ve come across; YMMV). That puzzles me and I think it’s time we stop doing that and start becoming data-driven for real.

    Thanks for pushing me to clarify this.

  • Richard Gardiner

    Since this was a Shopping site, is it possible that enough people thought the site was more secure with a CAPTCHA to skew the results?

  • http://kyle.baley.org Kyle Baley

    Good read. We’re going through this now and I totally agree. Going where the data leads you is *really* tough when your intuition tells you differently.

  • marcusoftnet

    Thank you! Glad you liked it.

  • netsi1964

    Great post – thank you for sharing real life experiences.