For the uninitiated, the term “business intelligence” has the potential to generate quite a lot of questions. When is business ever unintelligent? How is it different from business analysis? What tools and skills does it employ? Why is it important? Would my business benefit from it? This first post in the “BI 101” series begins with the most elementary: what is BI?
Business intelligence is about enabling informed decisions based upon real data. That data is often unstructured, disparate, messy and large, which tends to make harnessing it (at least in its raw form) next to impossible for any business of scale. BI transforms that raw data into a usable format and delivers insight from it to the people within an organisation who can then use it to improve their business.
The two pillars of BI are data and reporting. The first pillar collects relevant business data into a single database (called a “data warehouse”), structures it intelligibly and then ensures that the database is “fed” from its various sources on a regular basis. The collection of routines which extract data from those sources, cleanse and re-structure it, and load it into the data warehouse is known as “ETL” (short for extract-transform-load). The second pillar focuses on presenting that data in a way which makes sense to its end users. This could be in the form of so-called “canned” reports, which have a boilerplate structure and are refreshed whenever the underlying data is refreshed, or else an interactive metadata layer that enables an end user to create their own ad-hoc reports on demand.
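To make the ETL idea concrete, here is a minimal sketch in Python. All table and column names are invented for illustration, and SQLite stands in for the data warehouse; a real ETL pipeline would use a dedicated tool or scheduler, but the three steps are the same.

```python
# A minimal, illustrative ETL routine. It extracts rows from a CSV export
# of a (hypothetical) source system, cleanses them, and loads them into a
# SQLite database standing in for the data warehouse.
import csv
import io
import sqlite3

# Stand-in for a raw export from a source system (note the messy values).
raw_export = io.StringIO(
    "store_id,sale_date,amount\n"
    "S001,2013-01-05, 19.99\n"
    "s002,2013-01-05,5.50\n"
)

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales (store_id TEXT, sale_date TEXT, amount REAL)"
)

# Extract: read the rows out of the source export.
rows = list(csv.DictReader(raw_export))

# Transform: cleanse and re-structure (trim whitespace, normalise case,
# convert the amount to a number).
cleansed = [
    (r["store_id"].strip().upper(), r["sale_date"].strip(), float(r["amount"]))
    for r in rows
]

# Load: insert the cleansed rows into the warehouse.
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", cleansed)
warehouse.commit()

total = warehouse.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
```

In practice each of the three steps grows considerably, but the shape of the routine stays the same: pull data out, clean it up, load it in.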
A typical enterprise BI solution is likely to use Oracle, DB2, Microsoft SQL Server or an open-source database like PostgreSQL for its underlying data warehouse. Recently there has been an increase in database technologies specifically tailored to data warehousing, such as HP’s Vertica and EMC’s Greenplum. Data inside the warehouse is then read and delivered by a reporting tool like IBM Cognos or SAP BusinessObjects. It is also common practice for a data warehouse to be made accessible to tools such as SAS, SPSS or WPS for statistical analysis. Various technology vendors also offer out-of-the-box BI solutions, delivered either in-house or through the cloud, which combine the data and reporting capabilities into one system. MicroStrategy, Oracle’s OBIEE and the recently launched Amazon Redshift are all good examples of this.
BI should not be confused with the process of data consolidation. Just as it would be inadvisable to run reports against a live production system for performance reasons, so it would be wrong to conceive of the data warehouse as a centralised operational database. Although it is the aim of every data warehouse to become the “single source of truth” within a business, it is not designed to replace existing online transactional processing (OLTP) systems. Databases are designed in different ways to meet different business needs. Reports that run against a data warehouse should run quickly, and it should be possible for an end user to “slice and dice” the data in a way that is meaningful to them. The BI data architect must therefore construct a data model that suits the dual reporting purposes of speed and intelligibility. Such a model would likely not be suitable for the kinds of rapid insert-update-delete operations that occur in transactional systems. Not only that, but a data warehouse needs to remain consistent throughout the duration of the report (or series of reports) being run – it would go against the goal of being the “single source of truth” if the underlying data were forever in flux.
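The kind of “slice and dice” a warehouse model is built for can be sketched with a tiny example. The fact/dimension structure and all the names below are hypothetical; the point is that the model is organised around aggregating a measure (sales amount) by descriptive attributes (here, a store’s region), not around rapid single-row updates.

```python
# A sketch of a warehouse-style query: aggregating a fact table
# (transactions) by an attribute on a dimension table (stores).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dim_store (store_id TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (store_id TEXT, amount REAL);
    INSERT INTO dim_store VALUES ('S001', 'North'), ('S002', 'South');
    INSERT INTO fact_sales VALUES ('S001', 10.0), ('S001', 5.0), ('S002', 7.5);
""")

# "Slice" the sales measure by the region attribute of the store dimension.
by_region = db.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store d ON d.store_id = f.store_id
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
# by_region → [('North', 15.0), ('South', 7.5)]
```

Swapping `region` for any other dimension attribute (week, product category, and so on) gives a different slice of the same facts, which is exactly the flexibility end users expect from warehouse reporting.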
Another feature of the BI world is that the data involved is often enormous – hence the popular expression “big data”. To put this into context, consider the following example. Business X operates a chain of supermarkets throughout the world. Each individual supermarket has electronic tills which process the transactions of each customer who shops in the store. Within each supermarket is a central database that manages the till transactions and tracks inventory, turnover and profit. If X wants to know which of its stores was the most profitable this week, it would need to report across all the transactions for all of the stores across the past seven days. Imagine that in a single day, one store generates 100MB of raw transactional data. Multiply that by 5,000 stores and 7 days: that’s 3.5TB. Now let’s say that X wants to know which of its stores was the most profitable this year: 182TB. What about over the past five years? You get the picture. A good BI solution will utilise database features like compression to reduce the amount of storage space needed to keep this amount of data; but that is only the beginning. The real challenge is being able to report across that much data with response times of no more than a few seconds.
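The back-of-envelope arithmetic above can be written out explicitly (figures taken straight from the example; a year is taken as 365 days):

```python
# Reproducing the storage arithmetic from the supermarket example.
MB, TB = 1, 1_000_000  # work in megabytes; 1TB = 1,000,000MB

per_store_per_day = 100 * MB
stores = 5_000

weekly = per_store_per_day * stores * 7 / TB       # 3.5TB
yearly = per_store_per_day * stores * 365 / TB     # 182.5TB
five_years = yearly * 5                            # 912.5TB
```

Even before reaching five years of history, the raw volumes are far beyond what a single store’s operational database is expected to handle.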
BI continues to grow in popularity across all industries, and hopefully this post has gone some way towards explaining what it is and how it works.