There is a great deal of information analysis writing available, and much of it is quite good. I do see one type of article, though, that I think does something of a disservice to its readers – what I call the “Stone Tablet” article. The format runs something like this: today I will tell you my correct and divinely-revealed procedure for processing information to achieve particular ends, as illustrated by data set D and software packages P. Following are many paragraphs and code snippets showing key steps, including load-the-thing, transform-a-lot, and present-o-rama. Thusly, we have the basis for any serious work in this area, Amen.
If I’ve learned anything about information applications, it might be this: don’t look for a procedure or a software package to solve your problems. Should we observe guidelines? Sure. But I’ve never seen a procedure or a package build an application, while I’ve seen plenty of procedures and packages cost my friends and clients unnecessary time, pain, and money. When an article claims a particular software package is necessary to your survival, that’s an especially good time to stand back.
One reason I urge people to avoid implementing default procedures is that in many cases you don’t need what the article demonstrates. One of my most successful analytics applications was prototyped in about a day, and really just summarized some information and put the results in a tidy-looking set of gauges. No models at all, really. Simple can be good. Speaking of models, it’s very common for people to build models with detail that goes unused, and not uncommon for a model to rely on detail that isn’t generally available to the user community, which renders it useless in practice. Choosing a simple “B”-quality model can be “A”-quality judgment.
A second reason I urge caution about Stone Tablets is that what you read may not generalize to your situation. These articles tend to be written for very specific data sets, rarely if ever include error handling or other production-code necessities, and can overreach in suggesting how general they are. Personally, I don’t copy code from articles. If I see an idea that looks interesting and works well, I’ll add it into the mix and note the reference, but only after implementing it in the notation and conventions of my current project. It really doesn’t take much longer than doing an extensive unit test on someone else’s code, it is legally and practically safer, and I know what I’ve got when I’m done. In fact, if you are like me and keep your own code “toolkit” around, it’s often faster to implement an idea your own way.
My perspective is that procedure- and package-oriented thinking can obscure what we’re really after. If we’ve made representative data available, developed suitable inferences from that data, and helped our users understand the interpretation and limits of those inferences, we’ve done our job. While I personally focus a great deal on “representative,” “interpretation,” and “limits,” the fact remains: there are many valid ways to deliver information value. When someone implies there is only one way, or one package, that can get the job done, I believe a healthy skepticism is appropriate and ultimately time-saving.