Primary Storage Deduplication is the Future
Until now, I’ve chosen to stay out of the little tempest in a teapot that’s going on over at Chuck Hollis’ blog, but it doesn’t seem to be quieting down. He basically says that data dedupe has no place on primary storage, which flies in the face of where the dedupe market is going… but it’s not a bad position to take when your company makes a lot of money off of very expensive primary storage.
His company’s biggest NAS competitor, NetApp, took the bait and jumped into the fray. In between the vitriol, some good points are made about why Chuck is wrong. For example, if deduplication concentrates access on common blocks, you’ll see much better cache efficiency, which offsets the additional load on the drives. The “boot storms” he talks about with many virtual machines hosted on the same storage are actually less likely to occur with deduplication than without!
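To make the cache argument concrete, here is a minimal sketch rather than a model of any vendor's array. It assumes a simple LRU block cache, virtual machines that all boot from identical images, and boots that happen one after another; the function names and the sizes (20 VMs, 1,000 blocks, a 1,000-block cache) are made up for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, cache_size):
    """Replay a sequence of block keys through an LRU cache; return the hit rate."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in accesses:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / total if total else 0.0


def boot_storm(num_vms, image_blocks, dedupe):
    """Yield the physical block key each VM reads while booting (hypothetical workload).

    Without dedupe, every VM has its own physical copy of each image block.
    With dedupe, identical blocks collapse to one shared physical block.
    """
    for vm in range(num_vms):
        for block in range(image_blocks):
            yield block if dedupe else (vm, block)


# Assumed sizes: 20 VMs, 1,000-block boot image, cache that holds 1,000 blocks.
rate_without = lru_hit_rate(boot_storm(20, 1000, dedupe=False), cache_size=1000)
rate_with = lru_hit_rate(boot_storm(20, 1000, dedupe=True), cache_size=1000)

print(f"hit rate without dedupe: {rate_without:.2f}")
print(f"hit rate with dedupe:    {rate_with:.2f}")
```

In this toy setup the non-deduplicated run never hits the cache, because every VM reads blocks nobody has read before, while the deduplicated run misses only on the first boot. Real workloads are messier, but the direction of the effect is the point.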
Now Hu Yoshida has weighed in on Chuck’s side, but then laid out the much more reasonable view that virtualization, tiering and dynamic provisioning are critical in an environment with costly top-tier primary storage. That’s correct, but it doesn’t mean that deduplication at that top tier isn’t a huge win as well.
Deduplication has seen its first successes in the D2D backup space, where it’s easy to get a lot of deduplication due to the data patterns and traditional backup schedule. Applying deduplication beyond backup is hard, because the opportunities for deduplication are fewer and farther between, and so these D2D backup devices have never been able to address archive or primary storage effectively. That doesn’t mean dedupe is bad for primary; it just means that it’s harder to do.
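A rough illustration of why backup is the easy case: repeated full backups of a slowly changing volume are mostly the same blocks over and over. The sketch below assumes fixed-size 4 KB blocks fingerprinted with SHA-256 and a synthetic volume where two blocks change per week. Those choices, and every name in the code, are assumptions for the example, not a description of how any particular product works.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def synthetic_block(label):
    """Build one 4 KB block whose content is determined by the label."""
    return hashlib.sha256(label.encode()).digest() * (BLOCK_SIZE // 32)


def dedupe_ratio(streams, block_size=BLOCK_SIZE):
    """Return logical blocks written divided by unique blocks actually stored."""
    seen = set()
    logical = 0
    for data in streams:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            logical += 1
            seen.add(hashlib.sha256(block).digest())  # content fingerprint
    return logical / len(seen) if seen else 1.0


def weekly_fulls(base, weeks, changed_per_week):
    """Simulate weekly full backups of a volume that changes a little each week."""
    backups = [base]
    current = bytearray(base)
    for week in range(1, weeks):
        for i in range(changed_per_week):
            start = i * BLOCK_SIZE
            current[start:start + BLOCK_SIZE] = synthetic_block(f"week{week}-{i}")
        backups.append(bytes(current))
    return backups


# Hypothetical 100-block volume with no internal duplicates.
volume = b"".join(synthetic_block(f"base-{i}") for i in range(100))

primary_ratio = dedupe_ratio([volume])
backup_ratio = dedupe_ratio(weekly_fulls(volume, weeks=5, changed_per_week=2))

print(f"primary copy only:   {primary_ratio:.2f}x")
print(f"five weekly fulls:   {backup_ratio:.2f}x")
```

The backup stream dedupes well simply because the schedule rewrites unchanged data. A single primary copy offers no such free redundancy, so the duplicates have to be found across files, users, and virtual machines instead, which is where the scalability and speed problems come in.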
At Permabit, we consider dedupe for backup to be Dedupe 1.0, and the future for innovation is in Dedupe 2.0, which includes dedupe for primary and cloud storage. We host a forum over at Dedupe 2.0 to discuss this further, and recently released our Permabit Cloud Storage product to address these new customer needs. I can’t give too much detail, but we’re constantly at work making our deduplication technology available to ever broader markets.
Dedupe for primary is a huge win for the storage consumer, but it’s taken us nearly a decade of extensive technology and patent development to solve the scalability and speed challenges needed for that market. I think it’s no coincidence that the two voices denouncing primary dedupe the most, HDS and EMC, have no products to offer that include a feature which will soon become a customer requirement.
If you’re going to be at Storage Networking World next week and would like to hear more on primary dedupe, Arun Taneja is moderating a panel, “Primary Storage: The New Frontier for Data Deduplication.” I’ll be there, along with Val Bercovici from NetApp, Carter George from Ocarina, and Peter Smails from Storwize. It should be a lively discussion! Perhaps Chuck will stop by?