I was talking with a friend the other day about some of the blog entries, and he had an interesting remark: “People love to read the police blotter. You could have a section devoted entirely to crimes against data.”
He’s right, but having committed a few “crimes” myself, I want to be a little careful. As Linus van Pelt said to Lucy, when she presented him with a ten-foot scroll listing suggestions for personal improvement: “These aren’t faults! These are character traits!”
We can get carried away. Most analysts do a very decent job and try to ascertain reality as best they can. There’s no “crime” there, regardless of approach.
But I’m willing to apply a #truecrime tag to deliberate distortion or data fraud.
The most egregious distortion? It’s hard to go wrong with data cherry-picking. For example, cite only the cases in which a trained, armed person has stopped a terrorist attack, then conclude that the more weapons we have in our hands, the better. Or get a “better” model by tossing out selected data, for no other reason than that the data we removed were a pain in the ass. It’s the same principle, if not the same stage.
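To make the model-fitting version of cherry-picking concrete, here is a minimal Python sketch. The numbers are invented purely for illustration, and the helper names (`fit_line`, `r_squared`, `drop_worst`) are hypothetical, not taken from any real analysis. It fits a straight line, then tosses the two points that fit worst and refits.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def drop_worst(xs, ys, n_drop):
    """Remove the n_drop points with the largest residuals (the 'crime')."""
    slope, intercept = fit_line(xs, ys)
    by_residual = sorted(
        range(len(xs)),
        key=lambda i: abs(ys[i] - (slope * xs[i] + intercept)),
    )
    keep = sorted(by_residual[: len(xs) - n_drop])
    return [xs[i] for i in keep], [ys[i] for i in keep]

# Made-up data: roughly y = 2x, with two inconvenient points at x = 5 and x = 8.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 3.0, 12.1, 13.9, 22.0, 18.1, 19.9]

r2_full = r_squared(xs, ys)
xs_kept, ys_kept = drop_worst(xs, ys, n_drop=2)
r2_trimmed = r_squared(xs_kept, ys_kept)

print(f"R^2 with all data:      {r2_full:.3f}")
print(f"R^2 after cherry-pick:  {r2_trimmed:.3f}")
```

On this made-up data the R² climbs from below 0.9 to above 0.99, even though nothing about the underlying process changed. Dropping a point is not always wrong; the problem is dropping it for no reason other than convenience, and then not saying so.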
Real data fraud – cooking the books, false transactions, “augmenting” time cards? Not much to discuss, really – if we can use analysis to detect this, I’m all in favor.
Some other things might be sub-optimal practice, but to me the criterion for “true crime” is deliberate and willful misrepresentation. When we see that, I agree with my friend: we should take time to call out the perpetrators.