Need for the Data Warehouse
The enterprise data warehouse (EDW) is the backbone of analytics and business intelligence for most large organizations and many midsize firms. The tools and techniques are proven, the SQL query language is well known, and there's plenty of expertise available to keep EDWs humming.
The downside of many relational data warehousing approaches is that they're rigid and hard to change. You start by modeling the data and creating a schema, but this assumes you know all the questions you'll need to answer. When new data sources and new questions arise, the schema and related ETL and BI applications have to be updated, which usually requires an expensive, time-consuming effort.
Enter Hadoop, which lets you store data on a massive scale at low cost (compared with similarly scaled commercial databases). What's more, it easily handles variety, complexity and change because you don't have to conform all the data to a predefined schema.
That sounds great, but where do you find qualified people who know how to use Pig, Hive, Sqoop and the other tools needed to run Hadoop? More importantly, how do you get fast answers out of a batch-oriented platform that depends on slow, iterative MapReduce data processing?
Will Hadoop supplant the enterprise data warehouse and relegate relational databases to data mart roles? Or is Hadoop far too green and too slow to change the way most people work? In our debate, Scott Gnau of Teradata and Ben Werther of Platfora square off.
Ben Werther
Founder & CEO, Platfora
The EDW Is A Relic
The proposition of the enterprise data warehouse seems tantalizing: unifying all the data in your enterprise into one perfect database.
So you start an 18-month journey to find the important data sources, agree on the important business questions, map the business processes, and then architect and implement the one database to rule them all.
And when you are done, if you ever finish, you have a calcified relic of the world as it was 18 months earlier. If your world hasn't changed much in 18 months, that might be OK. But that isn't the reality in any large business I've encountered.
Why was Hadoop gaining so much momentum? Clearly it's cost-effective and scalable, and it's intimately linked in people's minds to companies like Google, Yahoo and Facebook. But there's more to it. Everywhere I looked, companies were generating more and more data: interactions, logs, views, purchases, clicks, etc. These were being linked with a growing number of new and interesting data sets: location data, purchased user demographics, Twitter sentiment, etc. The questions these swirling data sets might one day answer couldn't be known in advance. And yet to build a data warehouse, I'd be expected to predict perfectly, years ahead, what data would be important and how I'd want to question it, or else spend months rearchitecting every time I was wrong. This is actually considered "best practice."
The brilliance of what Hadoop does differently is that it doesn't ask for any of these decisions up front. You can land raw data, in any format and at any size, in Hadoop with virtually no friction. You don't have to think twice about how you are going to use the data when you write it. No more throwing away data because of cost, friction or politics.