Deduplication is Not a Feature
I’ve been writing an awful lot about deduplication lately: how it works, how it doesn’t, and how Permabit does it. I’ve been drumming it up a lot, so now I’m going to turn the tables and say something different: Deduplication doesn’t matter.
No, I’m not contradicting myself.
When you set out to buy an archive storage product, there are things that are features and things that are product characteristics. Examples of features are NFS protocol interface, unlimited volume size, low cost, and comes in blue, red or black. Examples of characteristics are Intel processor, SAS drives, number of gigabytes of RAM and, yes, deduplication.
These look like similar lists; what’s the difference?
Anyone in product management will tell you that the hardest thing about defining a product specification is getting the real requirements from the customer. People are very good at correlating characteristics they’ve seen before with qualities that they want in a product, and will mistakenly ask for the characteristic instead of the quality, or feature, that they want. This can result in disasters such as products that meet the specification 100%, but don’t solve the underlying problem.
Looking at the first list again, the requirement for “NFS protocol interface” probably comes from the desire “works with my existing software on my application servers”, and those application servers probably support NFS. NFS support is a feature. Similarly, I might need “works with my existing software without continual configuration changes”, which may lead to the requirement for “unlimited volume size”. These are clear features that support business requirements.
Reviewing the second list, however, what business driver would lead to a requirement for “uses an Intel processor”? Unless I have a business agreement with Intel, it doesn’t really matter if my storage appliance uses an Intel processor, an AMD processor, or a SPARC. The real requirement is probably “meets my performance needs”, and the customer has been conditioned to think that Intel processors, or SAS disks, or 32 GB of RAM are more likely to make that the case. But a system with those characteristics might just as easily fail to perform, while a system with different characteristics might well exceed the requirements. The job of a product manager is to extract the real requirements so that the product fulfills the customer needs.
Which brings us to deduplication. Deduplication is not a feature. It does not satisfy any underlying business requirement. The real requirement is the reduction of cost.
Permabit Enterprise Archive provides the most scalable sub-file data deduplication because that technology helps reduce the cost per gigabyte of storage for your business, while maintaining outstanding performance in terms of reliability, availability and scalability. That’s the reason we built in dedupe.
Data deduplication reduces cost in a number of different ways, and this has driven many vendors to scramble to find ways to integrate it into their products. Capital costs are clearly reduced because there’s less physical storage that must be purchased. Just as important are the operational cost savings: fewer drives to spin and cool for the same amount of data storage. Deduplication is an inherently “green” technology. Additionally, deduplication provides greater effective density, reducing rack space requirements in the data center.
There are other technologies that can reduce the cost of data storage, but only deduplication offers the potential to drive down the effective cost of online archive storage to below that of the hardware itself. That’s the real reason deduplication has so rapidly taken off as a technology in the marketplace, and one reason for the massive success of newer deduplicating VTL systems. We’re convinced that the next market to strongly realize the cost benefits from deduplication technology is the large-scale archive market, and that’s why we’ve built it into the core of our appliances, designed from the ground up to deduplicate data on ingestion.
The other lesson to learn from this is: don’t buy expensive dedupe! Expensive storage with deduplication is still expensive storage, and doesn’t meet the real customer requirement: cost reduction. Deduplication should never be treated as just a checkbox that needs to be on the data sheet for your next storage system; it’s only useful if it’s saving you money today.
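To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes the effective cost per gigabyte is simply the raw hardware cost per gigabyte divided by the deduplication ratio. The function name, the three systems, and every price and ratio below are hypothetical, made up purely for illustration; they are not figures for any real product.

```python
def effective_cost_per_gb(raw_cost_per_gb: float, dedupe_ratio: float) -> float:
    """Cost per gigabyte of data actually stored, after deduplication.

    raw_cost_per_gb: purchase price per physical gigabyte (hypothetical).
    dedupe_ratio: logical data stored / physical space used; 1.0 means no savings.
    """
    if dedupe_ratio < 1.0:
        raise ValueError("dedupe_ratio must be at least 1.0")
    return raw_cost_per_gb / dedupe_ratio


# Hypothetical systems: (raw cost per GB in dollars, deduplication ratio).
systems = {
    "premium array with dedupe": (10.00, 4.0),
    "commodity archive with dedupe": (2.00, 4.0),
    "commodity archive, no dedupe": (2.00, 1.0),
}

for name, (raw_cost, ratio) in systems.items():
    cost = effective_cost_per_gb(raw_cost, ratio)
    print(f"{name}: ${cost:.2f} per GB stored")
```

With these made-up inputs, the premium array still costs more per gigabyte stored than the commodity system with no deduplication at all, and only the inexpensive system with deduplication lands below its own raw hardware cost by a wide margin. The checkbox alone tells you nothing; the resulting cost does.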