At its recent re:Invent partner conference, Amazon unveiled a cloud-based data warehousing solution, provisionally named Redshift. The Register has more details here.
This is not the first time that the concepts of “cloud” and “BI” have come together. What is interesting, though, is how Amazon is seeking to position this new technology (or, at least, new variation on a theme). According to The Register’s report, the USP of Redshift will be that it allows customers to effectively “dump” their data onto the cloud, and Redshift will do all the whizzy stuff to make that data accessible via standard SQL queries – or the BI reporting tool of your choice.
So far, so sexy. But even on this vague description, there are the usual alarm bells going off in my mind which I have yet to see any cloud-based BI solution silence. In no particular order:
1) How would such a solution handle the perennial bugbear of any warehouse project, namely that of messy, unstructured, inconsistent, duplicated data? It is one thing to compress it and throw a lot of hardware at it to get the results back – but what about those results? There aren’t very many customers out there who like to view their KPI reports with spelling errors, still less with the same figures duplicated – or worse. There is an industry-wide assumption, outside of BI circles, which goes something like “my biggest problem is all this data; if I could just crunch through it all, I’d be able to find the answer I want.” To be sure, processing power and query response time are key measures of any successful BI project. But they aren’t the sole reason for having a data warehouse. The process of ETL is all too often forgotten in the mad rush for fast query response times. The need for ETL is also the reason why so-called “real time” BI solutions have struggled to compete with the more traditional batch processing systems. Messy data is a real problem, and no amount of server power in the cloud, however limitless, is going to solve that.
2) The “intelligence” in business intelligence is there for a reason – and it isn’t just to sound fancy. The concept of a BI project emerges from a need to apply intelligence to what is basically a large, ever-growing pile of dumb data. Without taking the time to understand what is in that data, and what it can potentially answer, there really isn’t much point in having a data warehouse at all. The Kimball and Inmon approaches, and all the star and snowflake schemas out there, aren’t simply for improving query times. The modelling of a data warehouse has a lot more to do with making sense of the data – order out of chaos, if you will. While I would not dispute the technological prowess of a company like Amazon, there is only so much that a set of algorithms can do for you. I would certainly be interested to know, for example, how much customisation and client involvement there would be in any Redshift implementation.
3) Cloud-based platforms of all kinds still struggle to get buy-in from a variety of industries for a reason that can be summed up in one word: security. Again, I do not doubt that Amazon has some pretty strong encryption and authentication processes in place – for Redshift and for EC2. But non-technical executives who make decisions about where to host their data are notoriously nervous about the cloud. And who can blame them? If a technology giant like Sony can be vulnerable to a hack attack, why not Amazon? Furthermore, just because Amazon has an impressive amount of hardware at its disposal, with all manner of failovers and DR strategies in place, that doesn’t make it immune from unanticipated downtime. I would be the last person to suggest that an in-house data warehouse is, by definition, more secure – in my experience, that certainly isn’t the case in many companies. But you know what? In-house hosting certainly helps those non-technical executives sleep more easily at night.
Of course, it’s early days for Redshift, and Amazon may yet wow us all with automated processes that render BI guys like me wholly redundant. But on the facts so far, you’ll excuse me if I don’t start looking for a career change just yet.