We’ve all heard of big data, and many people now deal with it on a daily basis. It’s no wonder big data techniques are proving so useful: they automate pattern recognition, a task at which the human brain is still the best ‘machine’ in existence (so far!).
Within big data analytics, our skill in building machines that can notice deep patterns has opened up exciting new avenues, which are then used to map trends. But this is where we need to be careful: on such a large scale, we should be cautious about what is actually a pattern and what is merely noise (and a waste of time), and about which assumptions we are making from the outset.
The assumptions that we begin with could be numerous, maybe even in their millions, so if even one of them is slightly off, the results could be skewed. It is for this very reason that we need to test these assumptions before our analysis, so that we can have confidence in the results we produce.
1. Choose your variables carefully
One of the many downsides of large data volume and variety is that, while you have far more variables to analyse, you also have more chance of picking up statistical noise. One moment you could be evaluating what you think is a valuable variable, or a meaningful correlation, when it could just as easily be a distraction.
In this vein, narrowing down the number of variables you test at one time will reduce the number of ‘bogus’ correlations that appear significant purely by chance. As to which ones to choose, this is up to your judgement.
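To see why fewer variables means fewer bogus correlations, here is a minimal sketch using only the Python standard library. It fills every variable with independent random noise, so any pair that looks correlated is spurious by construction. The function names, the 30 observations and the 0.36 cut-off (roughly the correlation a 5% significance test would flag at that sample size) are illustrative assumptions, not figures from the article.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_vars, n_obs=30, threshold=0.36, seed=0):
    """Count variable pairs whose |r| exceeds the threshold in pure noise.

    Every variable is independent random noise (an assumption of this
    sketch), so each pair that clears the threshold is a bogus correlation.
    """
    rng = random.Random(seed)
    # Variables are generated one after another from the same seed, so a
    # smaller run is always a subset of a larger one.
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(
        1
        for a, b in itertools.combinations(data, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    for n_vars in (5, 10, 20, 40):
        n_pairs = n_vars * (n_vars - 1) // 2
        print(n_vars, "variables,", n_pairs, "pairs,",
              count_spurious(n_vars), "bogus correlations")
```

The number of pairs grows with the square of the number of variables (10 pairs for 5 variables, 780 for 40), so the count of chance "findings" climbs quickly even though nothing real is there to find.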
2. Know when (and when not) to nit-pick
Imagine a team of managers bickering over whether the strength of the Swiss Franc will change by one, two or three percent. If you take a step back and realise that any of these predictions will lead to the same outcome for your business (for example, because it’s the currency that only one of your small clients uses), it isn’t worth splitting hairs over which precise prediction is correct.
This is particularly relevant when using analytics to make forecasts or to track conditions over a period of time. In these instances, it’s necessary to test your assumptions about when a change is important or carries consequences. Bill Franks tells Forbes that we should turn to a common engineering technique, sensitivity analysis. Essentially, testing these sorts of assumptions early on with this method will show you how precise you need to be.
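A simple one-at-a-time sensitivity check for the Swiss Franc example might look like the sketch below. The revenue model, the figures and the 0.5% tolerance are all hypothetical, chosen only to illustrate the idea: run the competing predictions through the model and see whether the spread of outcomes is large enough to matter.

```python
def projected_revenue(chf_change_pct, chf_revenue, other_revenue):
    """Total revenue after the franc moves by chf_change_pct percent.

    Hypothetical model: only the franc-denominated share is affected.
    """
    return other_revenue + chf_revenue * (1 + chf_change_pct / 100)


def sensitivity(scenarios_pct, chf_revenue, other_revenue, tolerance_pct):
    """Return (is_material, spread_pct) across the competing predictions.

    spread_pct is the gap between the best and worst outcome, expressed
    as a percentage of the no-change baseline.
    """
    outcomes = [
        projected_revenue(s, chf_revenue, other_revenue) for s in scenarios_pct
    ]
    baseline = projected_revenue(0, chf_revenue, other_revenue)
    spread_pct = (max(outcomes) - min(outcomes)) / baseline * 100
    return spread_pct > tolerance_pct, spread_pct


if __name__ == "__main__":
    predictions = [1, 2, 3]  # the managers' competing forecasts, in percent

    # One small client pays in francs: 20,000 out of 1,000,000 in revenue.
    print(sensitivity(predictions, 20_000, 980_000, tolerance_pct=0.5))

    # Most revenue is in francs: now the same disagreement matters.
    print(sensitivity(predictions, 600_000, 400_000, tolerance_pct=0.5))
```

In the first case the three forecasts move total revenue by only a few hundredths of a percent, so the argument is not worth having. In the second, the same one-to-three percent disagreement shifts revenue by more than the tolerance, and precision starts to pay for itself.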
3. Log all your assumptions for the future
While the first two points are integral to analytical confidence, choosing the right variables and assessing your initial assumptions are of little use if you don’t document your decisions. If something were to go wrong with a product you are testing, for example, you can refer back to your previous judgement calls to understand which assumption caused the issue to be missed. After all, we’re not machines, and mistakes do get overlooked because we cannot possibly think of everything that could go wrong.
Even if the fault was missed because of a decision that seemed valid at the time, you can rest assured that it was a logical call, and because these decisions are documented, you are able to trace the mistake back to its root. By doing so, you can assign new importance to factors you hadn’t considered relevant before, rather than spending days or months looking for something you didn’t know existed.
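An assumption log does not need to be elaborate. As one possible sketch, the snippet below keeps each judgement call as a small record with its rationale, owner and date, and lets you search the log later. The class names and fields are hypothetical; a shared spreadsheet or wiki page would serve the same purpose.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class Assumption:
    """One documented judgement call (field names are illustrative)."""
    statement: str
    rationale: str
    owner: str
    logged_on: str = field(default_factory=lambda: date.today().isoformat())


class AssumptionLog:
    """In-memory log of assumptions that can be searched and exported."""

    def __init__(self):
        self.entries = []

    def record(self, statement, rationale, owner):
        """Add a new assumption and return the stored record."""
        entry = Assumption(statement, rationale, owner)
        self.entries.append(entry)
        return entry

    def search(self, keyword):
        """Return entries whose statement or rationale mentions the keyword."""
        kw = keyword.lower()
        return [
            e for e in self.entries
            if kw in e.statement.lower() or kw in e.rationale.lower()
        ]

    def to_json(self):
        """Serialise the whole log so it can be saved alongside the analysis."""
        return json.dumps([asdict(e) for e in self.entries], indent=2)


if __name__ == "__main__":
    log = AssumptionLog()
    log.record(
        "Swiss Franc movements of 1-3% are immaterial",
        "Only one small client pays in francs",
        "analytics team",
    )
    print(log.to_json())
```

When a fault surfaces later, a quick `log.search("franc")` shows what was assumed, why, and by whom, which is usually enough to tell whether the original call was reasonable.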