May 22, 2013

Big wins: Multi-armed bandit testing.

I’m all about big wins. I couldn’t care less about incremental optimizations if there’s a major victory to be had instead. That’s why I’m way into multi-armed bandit testing right now.

Essentially, it’s A/B testing, but with two major differences:

You can test as many variants as you care to.
Your test will automatically adapt to show the version with the best conversion rate.

I call this a big win because it’s so simple to implement and to use that, if you’re not testing already, you can get started within the day. This method is so easy I was able to implement it in an hour before work this morning.

How it works

Like regular old A/B testing, you constantly measure the number of views and conversions of each of your variants. The difference: If one variant demonstrates a higher conversion rate than the others, it gets precedence as long as this is the case.
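
To make the bookkeeping concrete, here is a minimal Python sketch of the per-variant counters. The names (VariantStats, record_view, record_conversion, and the two button variants) are made up for illustration and are not taken from any particular library. Treating an unseen variant as having a 0% conversion rate is also just one assumption among several reasonable ones.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Running totals for one variant (hypothetical structure)."""
    views: int = 0
    conversions: int = 0

    def conversion_rate(self) -> float:
        # Assumption: a variant with no views yet counts as a 0% rate.
        return self.conversions / self.views if self.views else 0.0


# Hypothetical variants of a single button-color experiment.
stats = {"green-button": VariantStats(), "red-button": VariantStats()}


def record_view(name: str) -> None:
    """Call whenever a visitor is shown the named variant."""
    stats[name].views += 1


def record_conversion(name: str) -> None:
    """Call whenever a visitor who saw the named variant converts."""
    stats[name].conversions += 1
```

Two integers per variant is all the state the method needs, which is a big part of why it is so quick to wire up.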

The idea is that the highest-converting variant emerges as the victor and (here’s the kicker) is automatically shown most of the time. You never even have to check up on your results, although you can if you like. But if you leave it alone, your page will adaptively select the best-converting variant 90% of the time.

Why 90%? Epsilon-greedy testing

Showing the best-performing variant is known as a greedy strategy. The “epsilon” refers to an extra bit of spice: some percent of the time (epsilon), you just show a random variant. This helps keep your tests moving along, in case one of your variants is dominating the others unfairly, say because of a lucky early streak.

10% seems to be the default that people use as their epsilon, but other amounts should work as well. You’re basically creating a tradeoff: a smaller epsilon maximizes your conversion rate right now, while a larger one gets you to statistical significance sooner.
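
For a rough feel of the numbers, suppose the random pick is uniform over all K variants, current leader included. That is an assumption on my part, since implementations differ on this detail. Then the chance of a visitor seeing each variant works out to:

```latex
P(\text{best}) = (1 - \epsilon) + \frac{\epsilon}{K},
\qquad
P(\text{any other variant}) = \frac{\epsilon}{K}
```

With epsilon = 0.1 and K = 3, the leader is shown about 93.3% of the time and each of the other two about 3.3% of the time. Under this assumption the leader actually gets slightly more than 90% of the traffic, because the random pick sometimes lands on it too.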

Epsilon-greedy is probably not the best algorithm, but it probably is the simplest, and that’s good enough for me. Leave the optimizing to people for whom a 0.1% bump in conversion rate is worth real money, I say.

Recap

So, when a visitor comes to your page, here’s what your experiment does:

10% of the time, it shows a random variant
90% of the time, it shows the variant with the best ratio of conversions to views
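
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular library does it: choose_variant and the stats layout (variant name mapped to a conversions and views pair) are hypothetical, the random pick is uniform over all variants, and a variant with no views yet is treated as a 0% converter.

```python
import random


def choose_variant(stats, epsilon=0.1, rng=random):
    """Pick a variant name from stats, a dict of {name: (conversions, views)}.

    Hypothetical helper: explores with probability epsilon,
    otherwise exploits the best observed conversion rate.
    """
    names = list(stats)

    # Explore: epsilon of the time, show a uniformly random variant.
    if rng.random() < epsilon:
        return rng.choice(names)

    # Exploit: otherwise show the variant with the best conversions/views.
    def rate(name):
        conversions, views = stats[name]
        # Assumption: an unseen variant counts as a 0% rate.
        return conversions / views if views else 0.0

    return max(names, key=rate)
```

Calling choose_variant({"green": (20, 100), "red": (5, 100)}) returns "green" most of the time and occasionally "red". Passing a seeded random.Random instance as rng makes the choice reproducible, which is handy for testing.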

You have no excuse to not be doing this. Just change a button color or something; getting started is the hard part, but you’ll soon find yourself making new tests just to see if you can beat your high score.

Obligatory Clojure library

Yeah, I wrote a library to do this. It’s called bandito, and you can read all about it on GitHub.

References:

Here’s an edutaining point-counterpoint on the subject, in case you missed it last year. Read them in order: