The idea for this post came to me yesterday after some email discussions at work about my company's options for business reporting: BigData or an Enterprise Data Warehouse (EDW). The discussion was about alternatives to the EDW, and a colleague proposed BigData as one. I am publishing here the thoughts I shared with him, slightly elaborated.
About BigData vs EDW
For me, these are not opposing concepts. EDW has been in place for quite a long time and is built on top of Relational DataBase Management Systems (RDBMS). This means that EDW is used for structured data. BigData is a relatively newer set of technologies, typically built on NoSQL stores, which does not require the data to be structured up front.
Each of these concepts has its pros and cons, and we need to be very careful about how we use them, especially when it comes to managing the business requirements of such an implementation. Below is a basic description of the different use cases of BigData and EDW and how they can interact.
As you can see, each of these concepts has different use cases and different ways to approach them. While BigData can leverage unstructured data, it is mainly built for developers, who then have to build things on top of its infrastructure to unleash its potential. EDW is quite different: structured data (known and qualified data) is leveraged automatically, meaning that once the infrastructure is available, minimal developer input is required, and most of the work goes into designing and building the reports that make the most of each available data asset.
BigData has an important part to play in the future analytics environment every company would want to put in place, because it helps analyse large volumes of data from many data sources. These sources, however, will often be unknown or outside the company's controlled environment: emails, social networks, mobile data, etc., which are fields most companies are getting into. For many, the absence of structure makes it possible to store any data, as it comes and whenever it comes, which is great. This is the concept of “Store first and define how to use it after”. The arguments against EDW are:
- Rigidity against Flexibility: EDW is said to be rigid because its data is too structured, while BigData is flexible because it has no imposed structure and anything can be stored.
- Cost effectiveness: EDW is said to be costly, while BigData can start small and grow later.
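To make the “Store first and define how to use it after” idea concrete, here is a minimal sketch in Python. It is only an illustration, not a real BigData stack: the in-memory list `raw_store`, the helper names `store` and `count_by_source`, and the sample events are all hypothetical. The point is that records of any shape are accepted as they come, and structure is applied only at read time.

```python
import json

# Hypothetical raw store: a plain list standing in for a data lake or NoSQL store.
raw_store = []


def store(event):
    """Store first: keep the record exactly as it arrived, with no schema check."""
    raw_store.append(json.dumps(event))


# Three made-up events with completely different shapes.
store({"source": "email", "from": "a@example.com", "subject": "Invoice"})
store({"source": "social", "user": "@someone", "text": "Great service", "likes": 12})
store({"source": "mobile", "device": "phone-1", "lat": 4.05, "lon": 9.7})


def count_by_source(lines):
    """Define the use later: the structure is imposed only when reading."""
    counts = {}
    for line in lines:
        record = json.loads(line)
        key = record.get("source", "unknown")
        counts[key] = counts.get(key, 0) + 1
    return counts


print(count_by_source(raw_store))
```

Nothing stopped the three events from being stored, but every question asked of the data later needs code like `count_by_source`, which is the developer effort mentioned above.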
Enterprise Data Warehouse
When it comes to EDW, data has to be structured in an RDBMS schema, but this does not mean that the schema has to be rigid or inflexible. Rigidity comes only from bad or inadequate architecture, or from rigid policies around the data (regulatory requirements, internal policies, IT policies, etc.). Structure is not a bad thing, especially when you need to ensure some level of quality in what you present to the business. I cannot see financial, sales, or customer base reports being built on unverified and unqualified data. This kind of requirement can only be met with structured data. The concept here is completely different: “Know first and store after”. Now about the arguments in BigData vs EDW:
- Rigidity: As I said, rigidity is required in some areas of analytics and business intelligence. BigData will not go back and add features to enforce structured data, because that would be reinventing the wheel and would also defeat its initial purpose.
- Cost Effectiveness: EDW can also start small and grow big with time; it has to be implemented in a phased approach. Building a BigData-capable team also takes time and resources.
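The “Know first and store after” approach can be sketched with Python's built-in `sqlite3` module standing in for the warehouse RDBMS. This is a minimal illustration under assumptions: the `sales` table, its columns, the `load_sale` helper, and the sample rows are all hypothetical. The schema is declared before any data arrives, and a row that does not satisfy it is rejected instead of reaching the reports.

```python
import sqlite3

# In-memory database standing in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Know first: the structure and quality rules are defined before loading.
conn.execute(
    """
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      REAL    NOT NULL CHECK (amount >= 0),
        sale_date   TEXT    NOT NULL
    )
    """
)


def load_sale(row):
    """Store after: return True if the row satisfied the schema and was stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sales (customer_id, amount, sale_date) VALUES (?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = load_sale((42, 199.90, "2024-01-15"))  # valid made-up row
rejected = load_sale((None, -5.0, "2024-01-16"))  # missing customer, negative amount

# Only qualified data is available to the report query.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(accepted, rejected, total)
```

The second row never enters the table, so a report built on `sales` only ever sees data that passed the declared rules. That is the quality guarantee structure buys, at the price of having to know the data before storing it.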
BigData and Enterprise Data Warehouse
As said earlier, these two concepts do not oppose each other; each one has its own use cases and scenarios. In some cases you may have mainly operational databases holding structured data, and you can rely solely on EDW, while in other scenarios or industries you need a large amount of raw data to run the show.
Building an EDW can also help to build open-standard APIs that allow flexibility in the implementation of future BigData initiatives, as it is easier to build one API from an approved and qualified source of data than from multiple operational sources.
To end: BigData and Enterprise Data Warehouse are not opposing concepts. They should rather work together to answer bigger questions, such as: what do the numbers in our structured reports relate to? Who are they? How do they live?
What do you think about it? Let’s discuss…