In Praise of Train Wrecks

My college dorm’s rec-room TV was tuned to The Love Connection each weeknight (preceded by Star Trek, of course). Watching non-engineers report on their blind dates was a pretty good way to relax after a day spent learning about differential equations or thermodynamics. The best Connection shows had a train-wreck date: host Chuck Woolery would intercede between two people who had gone on a blind date, only to discover that they really and truly hated each other. Great stuff – an embryonic reality show. If you’ve never seen it, check out Love Connection Disasters sometime, just to get the flavor. No train wrecks, no Love Connection at all.

I suppose no one will ever make a show about quantitative analysis train wrecks – I’m not sure even Car-Talking Ray and Tom Magliozzi could have pulled that off  – but train wrecks are really where the action is, because that is when we really learn something about our data.

We’ve all been there. We’re handed a can’t-miss scenario –  all we have to do is line the data up, punch out a little cross-validated pattern categorization or regression, and quality predictions must inevitably flow. Except that the damn thing just won’t work – only cherry-picking data or some other evil act can make things right.  We have on our hands a major data, metric, or algorithmic malfunction.  Train wreck!
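
To make that “punch out a little cross-validated regression” step concrete, here’s a minimal sketch in Python with scikit-learn – not from any real project; X and y are pure stand-ins, and the noise target is there deliberately, to show what the scores look like when the damn thing just won’t work.

```python
# A stand-in for the "can't-miss" workflow: line up the data, run a
# cross-validated regression, and look at the scores.  X and y here are
# placeholders; the pure-noise target is deliberate, to show the failure mode.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # whatever features you were handed
y = rng.normal(size=200)         # pure noise standing in for the target

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores)   # R^2 hovering around zero (or negative) on every fold
```

When every fold comes back with an R² at or below zero, no amount of re-running the script will save you – that’s the train wreck, and it’s telling you something about the data.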

Although we don’t actually like failures, these scenarios are blessings in disguise. First, if everything works perfectly, not only do we learn nothing about our data, but our outcomes tend to be uninteresting or even trivial – in my experience it’s a good bet we didn’t need quantitative analytics in the first place. Second, if successful quantitative modeling were really that easy, everyone would already be doing it. We should be careful what we wish for. Third, unexpected failures – where we need to make a prediction but cannot – frame immediate value propositions for our community. Making the analytics work, whether that’s through data augmentation, additional metrics, or algorithmic adjustment, is just about guaranteed to add real value. And finally, admit it – you’d be bored if this never happened.

Some level of failure is a natural part of good analytics – it’s “cross-validation” at a high level. A zero failure rate equates to no learning or improvement in our systems, and of course a 100% failure rate means we’re getting nowhere. Understandably, though, from a narrow funding perspective these failures can look like a simple cost. We have a duty to educate our sponsors on why some level of failure is not only natural, but beneficial. As a research manager once told me, “The only experiments I really care about are the ones that don’t work. The rest is just waiting.”

I thought of this again when I was asked recently, “What’s a reasonable rate of quantitative analytics successes?” That’s inevitably a function of our work environments, but I personally like somewhere around 2/3. If our success rate is much lower than 2/3, our team can become discouraged and sponsors will become frustrated, regardless of their level of sophistication. If our success rate is much higher than 2/3, outcomes veer toward the uninteresting or even irrelevant.

So the next time you run into a quantitative analytics train wreck and someone starts to get irritable about it, please tell them that Chuck Woolery and I say hello.
