Automation and DIY Analytics

Ready or not, self-service analytics is here.

Great tools tolerate our ignorance and reward our expertise. Software and data engineering are both loaded with development tools that automate much of what we once did by hand. Ideally, that automation speeds the mundane and helps us as we’re starting out; at the same time, good automation can be overridden and extended when our expert judgment should prevail.

The line between some automation, which is often helpful, and too much automation, which creates more problems than it solves, is razor thin. I spent some time this week thinking about where the new-ish phenomenon of self-service analytics fits into this spectrum.

Automation is tremendous for tasks that are long but ultimately simple, with few creative or complex decisions. Self-driving cars? Bring it on. Software installation programs? Sign me up – no one wants to waste time on that crap. Build the initial prototype of a Windows or Web application? Fantastic! Give me four or five standard graphs to help me assess a predictive model? I am living the dream. These tools save us time, but if needed we can also run the show ourselves.

That’s great, but for every automated task there seems to be at least one that is over-automated.

Automatically assign relationships between data-model tables? Nice if it works, but it often does not. Checking and fixing is time-consuming and error-prone – it’s better to do this one ourselves.

Automatically set up and run a Monte Carlo simulation? The setup – perhaps; the automated run – rarely. Details like the time step and the length of the simulation are matters of experience and experimentation, in most cases well beyond simple automation.
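To make the time-step point concrete, here is a minimal sketch in Python. It is my own toy illustration, not taken from any particular simulation tool: it estimates how often a simple random path crosses a barrier, and the function name, the barrier, the horizon, and the path count are all hypothetical choices.

```python
import random
from math import sqrt


def crossing_probability(barrier, horizon, dt, n_paths, seed=0):
    """Estimate P(path reaches `barrier` by time `horizon`) for a driftless
    random walk with Gaussian steps, checked only at multiples of `dt`.
    All parameter names and defaults are illustrative assumptions."""
    rng = random.Random(seed)
    n_steps = max(1, round(horizon / dt))
    # Scale each step so the total variance over the horizon equals `horizon`.
    step_sd = sqrt(horizon / n_steps)
    hits = 0
    for _path in range(n_paths):
        level = 0.0
        for _step in range(n_steps):
            level += rng.gauss(0.0, step_sd)
            if level >= barrier:
                hits += 1
                break  # this path has crossed; move on to the next one
    return hits / n_paths


# Same question, two time steps: one check at the end vs. 100 checks along the way.
coarse = crossing_probability(barrier=1.0, horizon=1.0, dt=1.0, n_paths=2000)
fine = crossing_probability(barrier=1.0, horizon=1.0, dt=0.01, n_paths=2000)
print(f"dt=1.0  -> {coarse:.3f}")
print(f"dt=0.01 -> {fine:.3f}")
```

The coarse run can only see where the path ends, so it misses paths that cross the barrier and come back. For this toy process the continuous-time answer is roughly 0.317, about double what a single end-of-horizon check sees, and even the finer grid still undershoots it somewhat. A tool that picks the time step for us silently picks the answer as well.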

And what about automatically detecting insights in a data set? This is a hot topic in business intelligence and analytics – the idea that “self-service” analytics can be executed by users, offering data insights without requiring them to know the mechanics of data analysis.

You’re expecting me to say this is over-automation too, right? It’s terrible, irrelevant, and wrong.   After all, analytics is human Q&A with information in the middle, and in that process humans should prevail now and forever, Amen.

Well, I do think humans will prevail, but self-service analytics will likely change how all of us work with data.

With certain caveats, I think self-service analytics has real merit. When I helped a client evaluate Microsoft’s Power BI this week, I used its built-in facility to check for potentially interesting data features. It was not a perfect experience: Microsoft, with its talent for long-winded and irrelevant documentation, did not quite say what “quick insights” was doing, but it was something very close to univariate and bivariate correlation analysis and outlier detection, with nice little graphs showing what was deemed to be interesting. The first caveat is this: without knowing exactly what was automated, it’s hard to know what we should do next. The next caveat is that the “insights” were mostly trivial – in this data set the real insights (as I know from prior work) lie in correlations across a number of data dimensions, which a column-by-column or pair-by-pair scan does not surface.
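For readers who have not done this kind of first pass by hand, here is a rough sketch of what it might look like in plain Python. To be clear, this is my guess at the general flavor of such a scan, not Power BI’s actual algorithm, which the documentation does not spell out. The function names, the column names, the 1.5 × IQR outlier rule, and the 0.8 correlation cutoff are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)


def iqr_outliers(xs, k=1.5):
    """Values outside Tukey-style fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    ordered = sorted(xs)
    half = len(ordered) // 2
    q1 = median(ordered[:half])   # median of the lower half
    q3 = median(ordered[-half:])  # median of the upper half
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in xs if x < low or x > high]


def quick_scan(table, min_abs_r=0.8):
    """Univariate outliers plus strongly correlated column pairs.
    `table` maps column name -> list of numbers (hypothetical layout)."""
    findings = []
    for name, column in table.items():
        flagged = iqr_outliers(column)
        if flagged:
            findings.append(("outlier", name, flagged))
    for a, b in combinations(table, 2):
        r = pearson(table[a], table[b])
        if abs(r) >= min_abs_r:
            findings.append(("correlation", (a, b), round(r, 3)))
    return findings


# Made-up sample data, purely for illustration.
sample = {
    "ad_spend": [1, 2, 3, 4, 5, 6],
    "revenue": [2, 4, 6, 8, 10, 12],
    "returns": [5, 5, 6, 5, 5, 40],
}
for finding in quick_scan(sample):
    print(finding)
```

Note what the scan cannot do: it looks at one column or one pair at a time, so a pattern that only shows up across three or more dimensions never makes the list. That is the second caveat in a nutshell.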

That said, the potential is there.  I have performed the same analysis myself. It’s a boring and routine affair, so why not automate it? Power BI has a nice feature that lets people quickly add any worthwhile outcome onto a dashboard, so everyone can see it.    OK – what came back was not interesting, but that is not the fault of automation.   What I see is a facility that eliminates routine tasks and allows us to focus on more complex or hidden features – all to the good.

Many people agree, for they rightly see self-service analytics as empowering.   I believe we’ll only see more self-service analytics, because it’s what people want, and the technology is now available to deliver it.

Automated analytics: it was probably inevitable. As with other technologies that seem to catch us a little by surprise, our challenge now is to use this new tool well – so it helps when we know only a little and does not hinder when we come to know more. Being realistic about the tool’s limits and how we’ll use it may be the place to start.

First, auto-generated data “insights” are interesting, but that is far different from knowing that those insights are complete or entirely accurate – that’s still a job for experts.   Automation should not be seen as a panacea. It saves the labor in beginning the analysis job, but does not end the job.   And that’s OK. It’s just the right role for automation, and one of the reasons I think self-service analytics has the potential to succeed.

Second, experts and general users will soon experience evolving roles in data analysis. Human data experts now spend much of their time implementing the technical minutiae of data processes, such as presenting results and transforming data. Many of these data implementation tasks will decrease in importance, while analytics design will become critical. Experienced analytics practitioners know that successful analytics has to be designed, to manage everything from biases, to data models, to data incompleteness, to predictive models, to interpretations, to productionalization.

Empowering a more general class of users increases the importance of design. It’s one thing to craft a good design that can be implemented by technical experts who understand the relationship between design, development, and outcomes. As development tasks evolve to become routine user operations, analytics designs must anticipate a broader user experience. Self-service analytics ultimately means that users themselves will need to be aware of – and manage – expectations, biases, and uncertainties; ask correct and complete questions; formulate objective functions; prototype predictive models; and properly interpret outcomes. Assuring that this is a productive experience rather than an expensive fiasco will fall to analytics design and process formulation – essentially, to knowing the problems and challenges before users do. That is feasible, but it also entails a greater focus on design than is common practice now.

Good or bad – and it’s probably a little of both – the success of Power BI suggests that partially automated self-service analytics is well on its way. The roles of all kinds of data users will be changing too.
