Thoughts on Cloudera Hadoop and Netezza
Mike Olson recently said the following while announcing a partnership between Cloudera and Netezza:
> “Enterprises want to take structured data – customer and transaction data – and combine it will [sic] all the unstructured data coming off their websites…that might not fit into a tabular schema well.” […] “All of that activity is captured in web logs that can’t easily be digested using existing relational systems.”
I couldn’t agree more with the premise, and I’m glad Mike is prepared to mention the elephant in the room. Given who he works for, and given that Netezza is a relational database in a very fast box, that seems fair.
I think of this as lawnmowers and duct tape. If you strap enough lawnmowers together, they will go pretty quick. They won’t handle very well, the safety record’s spotty, and they burn a lot of two-stroke and keep running out of gas, but they still go fast in a straight line.
Again, Mike is right when he says customers want to be able to merge this data with other data, such as customer and transaction data. His conclusion, however, is just plain wrong.
The answer isn’t to process some data in the cloud and some in an appliance. The answer is to put it all in a common store that can process both kinds.
Hadoop is a great technology. Cloudera is doing a great job and adds a lot of value. But the particular problem Mike refers to would be much better handled by putting all the data in a single Column-Oriented Database Management System (CDBMS).
Why push your data into the cloud, analyze some of it, pass some of it down into an appliance, analyze some more of it, and then start all over again? Even putting aside questions of bandwidth, security, and performance, it’s simply not efficient. You can pound screws with a hammer and wrench rather than using a screwdriver, but why would you?
Given the costs involved and results demanded, enterprises are looking to use the right technology for the right job. The right technology for this job efficiently combines all the data in one place and is optimized for performance.
The right technology is SAND CDBMS.