It’s more fun and practical, if less exact, to have short-take versions of things I’ve mentioned elsewhere. Here’s round one – less than 15 seconds a shot.
Analytics is a genuine “people” business – it assists the quintessential human activity of asking good questions, and interpreting the answers.
In engineering and science, we don’t know what we have until we know when it breaks. One of the first duties of analytics is to understand the limits of analytics.
And if the answer is “we really don’t have the information to answer your question,” that should be perfectly OK. However, it’s a lot easier to say that early in the game.
Show me a typical database, and I’ll show you tables filled with What, Where, When, Who, How Much, and How Many. But show me the questions to which people most want answers, and what I’ll see is usually How? and Why?
It is we people who fill the gap between the data we’re likely to have, and the answers we most desire.
A requirement explains to stakeholders what we propose to do, so they can explain to us what they might be reluctant to pay for.
We must often ask for analytics requirements without detailed data, but this is rather like asking an arborist to trim a tree, based only on a general notion of what trees should look like.
We have to start somewhere. But we should plan to have another go, after we learn how desires and data really align.
Luckily, rebuilding a database is usually easier than rebuilding a tree…
Programmers and analysts tend to be creatives, which may explain their intense dislike of documentation – it’s a little like asking an artist to explain the brushstrokes.
If there is a task more disliked by analysts than documentation, it might be uncertainty analysis, which operates at the tedious intersection of repetition, reduction of apparent impact, and differential calculus.
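For the curious, the calculus in question is usually first-order error propagation. A minimal sketch – the function (a rectangle’s area) and the uncertainties are invented for illustration – estimating the partial derivatives numerically rather than by hand:

```python
import math

def propagate_uncertainty(f, values, sigmas, h=1e-6):
    """First-order error propagation: sigma_f^2 = sum((df/dx_i)^2 * sigma_i^2).
    Partial derivatives are estimated by central finite differences."""
    variance = 0.0
    for i, (x, s) in enumerate(zip(values, sigmas)):
        lo, hi = list(values), list(values)
        lo[i] -= h
        hi[i] += h
        dfdx = (f(hi) - f(lo)) / (2 * h)  # numerical partial derivative
        variance += (dfdx * s) ** 2
    return math.sqrt(variance)

# Illustrative: area of a rectangle with sides 10 +/- 0.1 and 20 +/- 0.2
area_sigma = propagate_uncertainty(lambda v: v[0] * v[1], [10.0, 20.0], [0.1, 0.2])
```

Tedious or not, a few lines like these reveal which inputs actually drive the uncertainty of a result.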
Insight depends as much on those receiving the information as on the information itself.
A perfectly good insight might be well known but not widely known – the application assisting the information transfer is performing a real service.
That you can sing “Amazing Grace” to the theme from “The Beverly Hillbillies” is well-known to choristers, but for most of us, insight is hardly the word. (Oh, do try it…)
Some insights really are new to everyone, but that’s less common. The only analytics application not worth having may be the one that confirms what everyone pretty much knows already.
Things often assumed, which ideally would always be proven: data are certain; data are complete; metrics are meaningful; numbers can be meaningfully compared; rules and transforms have little impact on outcomes.
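None of these need to remain assumptions; most can be checked cheaply before they are trusted. A hypothetical sketch – field names and figures are invented – proving completeness and comparability on a toy data set:

```python
# Toy order records; one is incomplete. Fields and values are illustrative.
records = [
    {"store": "A", "units": 12, "revenue": 240.0},
    {"store": "B", "units": None, "revenue": 95.0},  # missing a field
    {"store": "C", "units": 7, "revenue": 140.0},
]

# Completeness: what fraction of records is actually usable?
complete = [r for r in records if all(v is not None for v in r.values())]
completeness = len(complete) / len(records)

# Comparability: compute revenue-per-unit only where both fields exist,
# so the resulting numbers can be meaningfully compared.
rev_per_unit = {r["store"]: r["revenue"] / r["units"] for r in complete}
```

The point is not the checks themselves but the habit: each assumption on the list above can be turned into a small, explicit test.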
Things that only seem to cost very little: storing data, and shareware.
Things that only seem expensive: design, testing, and analytics applied to either one.
If I have 10^15 records in a table, and I need 1000 of those records for a particular graph or table, it would take 31,710 years for me to query the whole table at a rate of one query per second.
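The arithmetic behind that figure, assuming each one-per-second query returns a 1,000-record batch:

```python
# Back-of-envelope for the claim above: 10**15 records fetched
# 1,000 at a time, one query per second.
records = 10**15
records_per_query = 1_000
queries = records // records_per_query  # 10**12 queries
seconds_per_year = 60 * 60 * 24 * 365   # ignoring leap days
years = queries / seconds_per_year      # roughly 31,710 years
```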
A great deal of big data is never meaningfully used. In many cases a good use of big data analytics would be to reduce the data set, rather than find algorithms to process a great deal of valueless data. It doesn’t take 10 million records to know there are no gold mines in Kansas.
Storage and processing of data costs money, whether the data are meaningful or not. Perhaps more importantly, the mechanics of big data storage can directly impact our ability to manipulate and model data, and therefore our analysis options.
Analytics isn’t supposed to be easy. Asking good questions, assembling and using data, and understanding the answers have always been challenging.
Those asking questions, interpreting answers, and building data systems – from infrastructure, to databases, to data engineering and science – do a remarkable job and nearly always deliver value.
Locate a technical expert bored with their job, and you’ve also found an opportunity to automate the task they would rather not be doing. With plenty of good and challenging problems to solve, why should we fear automation of the bad and simple ones?
I like to understand the code I see, except for regular expressions. Even after I write them I don’t understand them.
When interpreting data there are conclusions, which are supported by the data, and hypotheses, which deliberately reach beyond the data. Both are great in their separate realms, but it’s surprisingly easy to mix them together.
You time a car racing wildly down the street and conclude the driver is over the speed limit. You can only hypothesize the driver is not in control of his vehicle.
Each of five stores with new managers shows August sales below those of last year, while other stores have done well. It’s only a hypothesis that managerial inexperience is the cause. (In reality, the managers were new, but so were the stores, and the previous good sales were an artifact of “grand opening” sales.)
Data warehouses are often excellent, because they simplify and systematize asking and answering of common questions related to aggregation, particularly over hierarchies.
If you don’t want to do that, you probably don’t need a data warehouse.
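What “aggregation over hierarchies” amounts to can be sketched in a few lines – the region/store hierarchy and figures are invented:

```python
from collections import defaultdict

# Toy warehouse-style roll-up over a hierarchy (region -> store).
sales = [
    ("East", "Store 1", 120.0),
    ("East", "Store 2", 80.0),
    ("West", "Store 3", 200.0),
]

by_store = defaultdict(float)
by_region = defaultdict(float)
for region, store, amount in sales:
    by_store[(region, store)] += amount
    by_region[region] += amount  # roll up one level of the hierarchy

grand_total = sum(by_region.values())  # roll up to the top
```

A warehouse systematizes exactly this – consistent dimensions, pre-agreed hierarchies, and aggregates computed once rather than re-derived ad hoc in every report.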
There are scores of biases, but three I see most often in analysis work are confirmation bias – we want to be right; automation bias – what the computer says is valid; and sunk cost bias – the more we invest in something, the more we want to believe in its authenticity.
A rather brutal but honest kickoff meeting might be: ask each stakeholder what they expect, and then explain that 1) their information may not be able to deliver what they would like; 2) computers would as soon distort their information as present it; and 3) things can only look up from here, at no greater cost than discovering the disappointment later.
My mother, a perfectly nice computerphobe, called one day to ask my help with a data problem she was experiencing in Excel. I listened to her problem, I helped her, and then called my consultant friends with this message: prepare yourself – analytical mechanics are now in the mainstream. And that was 15 years ago. Increasingly, the best contribution for IT professionals will be beyond the detailed manipulation of code and data.
A friend of mine insists that all problems in analytics start with invalid reasoning from aggregates. Which he also offers as proof that you can pretty much be right, and pretty much be inconsistent, all at the same time.
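He has a point: Simpson’s paradox is the classic way reasoning from aggregates misleads. A sketch with figures adapted from the oft-cited kidney-stone example (treatment labels and subgroup names are illustrative):

```python
# Treatment A wins within every subgroup, yet loses in the pooled aggregate.
groups = {
    "small": {"A": (81, 87), "B": (234, 270)},    # (successes, trials)
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

# A beats B in each subgroup...
a_wins_each_group = all(rate(*g["A"]) > rate(*g["B"]) for g in groups.values())

# ...but pooling the subgroups flips the comparison.
total = {t: tuple(map(sum, zip(*(g[t] for g in groups.values())))) for t in "AB"}
b_wins_aggregate = rate(*total["B"]) > rate(*total["A"])
```

Group sizes differ across treatments, so the aggregate rate is a weighted mixture that need not agree with any subgroup – which is roughly my friend’s whole argument.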
The real crisis of information is less about having too much information, than about having too little that is reliable, relevant, and believable.
Is that a fact? In the absence of clear assumptions, a well-defined system, and a transparent characterization – the elements of a good scientific observation, but broadly applicable – we shouldn’t be required to answer “yes.”
That excludes many statements, but includes a great deal too.
Bad day? I hear you… But if it weren’t for problems, we would quickly be out of business.
Analytics is as old as people answering questions with information.
Until about 25 years ago analytical reasoning was limited by the data at hand. Each item was examined, reviewed, scrutinized, and conclusions were drawn.
Since then, we have gained in available data, but lost in data context, as we automatically ingest data and place them in tables and fields we can handily consume.
In big data systems, we’ve gone from comprehending a lot about a few items, to comprehending a little about a lot of items. And that’s fine, if that little is enough to characterize those items.
But often the little we have about each item is not enough. When it comes to people, or organizations, or economies, or healthcare, or many other complex things, it is difficult to know what information to collect in advance of inquiry, and harder still to comprehensively collect all of that information.
Meaning: for simpler systems we’ve moved ahead; for complex systems we are limited, as we always have been, by the data we have available.
In engineering and science, simplicity may be the most complex thing to achieve, while complexity – which is frequently mistaken for sophistication – may be the simplest.
Simplicity may be the most underrated property of good technical work, and the key to transparency, which gives us understanding, which leads to acceptance, which offers the chance to make an impact.