I’ve just come across this article, which claims that the combination of social media and big data will decide who wins the 2016 US Presidential election. It’s a bold assertion, and it follows many other opinion pieces that vaunt the major role big data played in the 2012 election. Alas, as is too often the case with journalism about big data, it is riddled with fundamental misunderstandings.
Gaffe #1: “Big Data just means data, right?”
Big data is not synonymous with data. If it were, there would be no need for the adjective. So when the author mentions how seismic the “7.6% unemployment” statistic was in the 2012 election, we are not dealing with big data at all. It’s just a simple statistic, and not even the aggregate result of a particularly big dataset. The temptation to say “big data” when all that’s really meant is “data” or “statistics” is the cardinal sin du jour for tech journalists.
Gaffe #2: Making up jobs
The world of business intelligence and data analysis already has its fair share of jargon, but journalists seem to find much of it inadequate for their needs. To sex up their copy, they like to tack random words onto the end of “data” and pass the results off as real job titles. In this piece we have “data crushers”, which is certainly not a title I’ve ever heard before. Elsewhere I’ve seen “data guru”, “data wizard”, “data hacker”, even “data priest” (OK, that last one isn’t serious). If in doubt, use “data scientist”: it’s relatively new, but it has the added virtue of actually existing in the real world.
Gaffe #3: “I’m on Twitter, so everyone is on Twitter!”
I could write an entirely separate piece on the demographics of Twitter, but consider this: only 15% of online Americans use Twitter. For all the hype and excitement it generates (and I am an active user myself), the fact remains that the overwhelming majority of people, and therefore of voters, do not use Twitter. It is quite a leap, then, to assume that combining big data (of some sort) with social media data will automatically deliver the next President of the United States. If most voters really were on Twitter, it’s far more likely that Ron Paul would be the current occupant of 1600 Pennsylvania Avenue.
Gaffe #4: What works for one industry works for all industries
The feats of ingenuity in the world of digital advertising are hugely impressive. Ad platforms use sophisticated algorithms that track your activity on the web and serve ads, in real time, that match what you’ve just been looking at or searching for. It’s cutting-edge tech, and it is exciting, so I don’t fault journalists for their enthusiasm. But it isn’t magic, and it only works for people who are (a) online and (b) able to click through and make a purchase. Sure, you can target online consumers, and therefore online voters, in much the same way. But the success of a targeted ad is invariably measured by whether or not somebody buys the product or service. Applying this to psephology and campaigning is not a like-for-like swap. Because the ballot is secret, you can never know how somebody actually voted, and that limits the feedback data available to train and validate your predictive models. This is not to say that it’s all fruitless, because it isn’t. Nate Silver showed what you can do with data (not necessarily big) and careful statistical modeling. But journalists have a tendency to get carried away, and end up making grandiose pronouncements like the one in this article.
It’s undoubtedly a good thing that big data – indeed, data generally – is getting the attention it deserves in the media. I would simply urge a certain degree of caution and ask our journalist friends to do a little more research before sharing their opinions with the world.
And yes, I am aware of the irony.