I was happy to see serious press being given to researchers questioning the bias and error endemic to studies of diet, exercise, and their potential health benefits. The problems are not really surprising: it’s not easy to quantify diet, exercise, or health. Study participants have trouble remembering what they ate last week, much less last month. Besides, how honest are we, really, when our doc asks us about our diet? Not least of all, researchers make no splash by confirming existing notions, or providing a detailed readout of their research technique. If you want to reach a wide audience, Coffee Kills! or Lettuce Saves! is definitely the way to go.
I’m not rooting for these studies to fail – on the contrary, I want them to succeed. However, I do support a focus on improved questions, reduced bias, and better predictors when current results are dubious, rather than just cranking out more correlations and predictions. Unbiased indicators for historical eating patterns, exercise activities, and overall health sound like a challenge – but then, it’s not my area.
It’s management of bias and error, good questions, and reliable predictors that make a prediction reliable. Correlations, machine learning, and the like are all important and good, but ultimately they are secondary. Just ask the diet/exercise guys mentioned in the article, or the Sabermetrics guys. Even ask me: I spent most of my time on a multi-year solid-state materials design project working on enhanced predictors – each new predictor was really an exercise in theoretical chemistry and physics. A lot of work, but without them there would have been nothing to correlate, and no prediction to make.
Crafting improved methods for reducing bias or improving predictors, while critically important, won’t be critically acclaimed by those seeking to understand what foods will let them live forever. Optimized Dietary, Exercise, and Health-related Predictive Metrics simply won’t sell like Eat This, And Live. But if we’re really going to figure out what to eat, and live, it’s more important. So it’s good that people working on these issues are getting a little public notice and support.
It’s reasonable to wonder, though, why so many empirical models make predictions that later turn out to be suspect or invalid. Part of the problem is certainly that bias can make the model, along the lines of what was described in the article. I think there is another partial answer. We readily presume that the “unexplained” error in our models has a simple, benign, and “random” character – in essence, that our prediction will have an error bar but otherwise be qualitatively valid. That isn’t necessarily true – I’ll post a short example that shows how one common type of randomness is just a limiting case of general, un-random interactions with our system. So if the outside world changes – not just the data we’re actively monitoring – our model can qualitatively change, too.
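Here’s a minimal sketch of that idea in Python (all names and numbers are hypothetical, just for illustration): the “unexplained” error is modeled as the sum of many small, structured, unmonitored influences. While those influences are stable, their sum looks like benign Gaussian noise (a central-limit effect) and a simple fit works fine. When the outside world drifts – the unmonitored influences shift, not the variable we watch – the same model goes systematically wrong, not just noisier.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_effects(n, shift=0.0):
    # The "unexplained" error: a sum of 50 small, unmonitored
    # influences. With shift=0 their sum looks like benign,
    # zero-mean Gaussian noise (central-limit behavior).
    factors = rng.uniform(-1, 1, size=(n, 50)) + shift
    return factors.sum(axis=1)

n = 10_000
x = rng.uniform(0, 10, n)          # the variable we actively monitor
y = 2.0 * x + hidden_effects(n)    # true slope is 2

# Fit a simple linear model while the world is stable:
slope, intercept = np.polyfit(x, y, 1)

# Now the outside world changes: each unmonitored factor drifts
# slightly (shift=0.2). The monitored variable x is unchanged.
y_new = 2.0 * x + hidden_effects(n, shift=0.2)
pred = slope * x + intercept
bias = np.mean(y_new - pred)   # systematic offset, not a wider error bar
```

The fitted slope is still close to 2, but every prediction is now off by roughly the accumulated drift of the hidden factors – a qualitative failure that no error bar estimated from the original data would have warned about.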