Storage is getting cheaper but bandwidth isn't keeping up: how SAND CDBMS Nearline for SAS lightened dunnhumby's load
Dunnhumby handles retail analytics covering more than 350 million retail customers on behalf of companies like Tesco, Home Depot, and Kroger. When they first approached SAND, they were looking for a solution to help with the disaster recovery architecture for their SAS implementation. The main problem was the sheer size of the data involved: dunnhumby had terabytes of raw data in SAS that hundreds of analysts needed to access efficiently. What’s more, once this data was prepared for analytic use, it required three times as much storage as the raw data, on top of the need to maintain live disaster recovery capabilities.
SAND’s first step was to transfer the data to SAND CDBMS Nearline for SAS. Doing that immediately brought the data size down to *gigabytes* (over 98% compression). This greatly reduced the complexity of the overall disaster recovery architecture. It also became a far less expensive proposition, not only in terms of storage costs but also (and more importantly) from the point of view of network resources.
To put this issue in context, think about how many people now back up their personal data (photos, movies, music, etc.) using cloud computing services like Amazon’s. These services are relatively cheap and give users peace of mind by keeping copies of their valuable data off-site. However, moving even one terabyte of data over a broadband connection, or worse, recovering a terabyte of data, is so time-consuming and expensive as to be almost impossible. (There are even services that sound like a joke but really aren’t, in which backup and storage companies transfer your data by copying it onto a hard disk and sending it in the mail.) Scale such problems up to the corporate level, and the difficulties become easy to understand. Hard disks keep getting bigger and cheaper; bandwidth, not so much.
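To make the bandwidth point concrete, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical 20 Mbit/s broadband link, decimal terabytes, and a flat 98% size reduction (the compression figure quoted for dunnhumby’s data); it ignores protocol overhead, contention and retries, so real transfers would take longer. The function name `transfer_hours` is purely illustrative.

```python
def transfer_hours(size_bytes: float, link_mbps: float) -> float:
    """Idealized transfer time in hours for size_bytes over a link of link_mbps megabits/s.

    Ignores protocol overhead, contention and retries, so real transfers take longer.
    """
    bits = size_bytes * 8                        # bytes -> bits
    seconds = bits / (link_mbps * 1_000_000)     # megabits/s -> bits/s
    return seconds / 3600                        # seconds -> hours


TB = 10**12                    # decimal terabyte (assumption)
raw = 1 * TB
compressed = raw * (1 - 0.98)  # assumes exactly a 98% size reduction

link = 20.0                    # hypothetical 20 Mbit/s broadband link

print(f"raw:        {transfer_hours(raw, link):.1f} h")
print(f"compressed: {transfer_hours(compressed, link):.1f} h")
```

Under these assumptions, the raw terabyte takes roughly 111 hours (more than four and a half days), while the compressed copy takes a little over two hours. The ratio between the two is just the compression ratio, so the same 50x reduction in transfer time holds whatever the link speed.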
Beyond the size savings, SAND CDBMS Nearline has enabled dunnhumby’s SAS analysts to access all their data, both online and nearline, transparently. Analysts access the same data as before, and because the data footprint has been reduced by 98%, I/O performance is much better, which significantly increases the overall performance of the SAS environment. Query speeds alone have improved more than sevenfold.
The result is that dunnhumby’s SAS analysts actually get to be analysts, rather than spending most of their time preparing and modeling data. They get their work done faster and more efficiently: at dunnhumby, analyst productivity has tripled. Given how hard it is to find and train SAS analysts and how valuable their time is, this was a dramatic benefit for dunnhumby, as it would be for any enterprise.
Also, because SAND CDBMS Nearline for SAS created such a small data footprint, it became possible to execute certain data transformations that enhanced the value of the information inside the raw data, making it far more useful for analysts.
With SAND CDBMS Nearline for SAS, analysts’ data is now ready to go when they are: they simply access the data pool, shop for the data set they want, and slice away. When they want to do specific studies of specific data elements at specific times and in specific areas, the information is processed quickly to respond to their demands.
Access to more and higher quality data with far better efficiency and performance? That’s a huge win.
You can hear dunnhumby’s co-founder, Clive Humby, discuss this further in his BeyeNETWORK podcast with Claudia Imhoff.
And this is just the first piece of SAND CDBMS Nearline for SAS. I’ll tell you more about the second piece in my next post.