Compression and Dedupe: Business Value and Data Safety
Many people are saying our Albireo embedded OEM deduplication changes the storage landscape. I am gratified by the response to Albireo from analysts and the press, and by recent OEM adoption. Albireo is becoming a standard in data optimization because the product provides maximum business value without compromising data safety. Albireo also works well in combination with compression, and over the past several months storage vendors have often asked us about the relative benefits of compression and deduplication as they consider these complementary data optimization choices. I want to share our view of these two technologies and how they measure up in two vital areas: OEM business value and enterprise data safety.
First let’s look at the basics of compression.
Compression works only in file-based storage, compressing each file individually; it does not find redundancy across files.
Compression identifies redundant data only within a very small window, usually 64 KB.
Compression produces data reduction rates of at most 2X for most data types.
Compression alters the underlying data structures and requires compression and decompression of data.
Compression operates in the data path and impacts read/write performance as a ‘bump in the wire’ (kudos to Storwize for their work to improve performance).
Compression is a potential single point of failure for data retrieval.
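To make the small-window point concrete, here is a minimal sketch using Python's standard zlib module. It is a stand-in, not any vendor's compression engine: zlib's DEFLATE window is at most 32 KB rather than the 64 KB mentioned above, and the helper name `repeat_ratio` and the block sizes are assumptions chosen for illustration. The sketch writes the same random block twice and compares the result when the repeat falls inside the window with the result when it falls outside.

```python
import random
import zlib


def repeat_ratio(block_len, seed=0):
    """Compress one random block written twice; return original/compressed size."""
    rng = random.Random(seed)
    # Random bytes are incompressible on their own, so any reduction
    # can only come from the compressor spotting the repeated block.
    block = bytes(rng.getrandbits(8) for _ in range(block_len))
    data = block + block
    return len(data) / len(zlib.compress(data, 6))


# Repeat is 16 KB back: inside zlib's 32 KB window, so it is found.
near = repeat_ratio(16 * 1024)
# Repeat is 40 KB back: outside the window, so it is missed.
far = repeat_ratio(40 * 1024)

print(f"duplicate 16 KB apart: {near:.2f}X")
print(f"duplicate 40 KB apart: {far:.2f}X")
```

In this sketch the nearby duplicate compresses to roughly half its size, while the distant duplicate is barely reduced at all, even though both inputs are exactly 50% redundant.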
Given these attributes, compression can provide a level of data optimization for NAS systems, but the only safe way to implement compression is as an embedded feature in the storage software – deployed, owned and managed by the storage vendor (not OEMed or deployed as a third-party stand-alone appliance in the read/write path). Embedded compression is starting to take hold with NAS OEMs (EMC and IBM speculation) and we expect to see widespread future adoption. Again, Albireo works well with compression, so we support this incremental data optimization move.
And now let’s look at Albireo embedded data deduplication. (See Jered Floyd’s detailed blog on the key attributes for a high performing data optimization solution here.) Here’s how Albireo stacks up:
Albireo supports block, file and unified storage architectures.
Albireo dedupes data across the entire storage pool, up to petabytes of data.
Albireo produces 3.75-100X data reduction for typical enterprise data types.
Albireo doesn’t alter the underlying data structures.
Albireo operates out of the data path with no impact to read and write performance.
Albireo operates as an inline, parallel or post-process operation and is never a failure point for the storage system.
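For readers who want a feel for what deduplicating across a whole pool means, the following is a generic sketch of fixed-block, hash-based deduplication. It is not Albireo's algorithm or API: the 4 KB block size, the SHA-256 fingerprint, and the names `dedupe` and `restore` are all assumptions made for illustration. The point it shows is that a block is stored once no matter which file, or how far away, its duplicate appears.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedupe(files, block_size=BLOCK_SIZE):
    """Split every file into fixed-size blocks and keep one copy of each.

    Returns (store, recipes): unique blocks keyed by fingerprint, and for
    each file the ordered list of fingerprints needed to rebuild it.
    """
    store = {}
    recipes = {}
    for name, data in files.items():
        recipe = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            fingerprint = hashlib.sha256(block).digest()
            store.setdefault(fingerprint, block)  # keep first copy only
            recipe.append(fingerprint)
        recipes[name] = recipe
    return store, recipes


def restore(store, recipe):
    """Rebuild a file's bytes from its list of block fingerprints."""
    return b"".join(store[fp] for fp in recipe)


# Hypothetical pool: three files that share the same 8 blocks of content.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(8 * BLOCK_SIZE))
extra = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))
pool = {"a.img": base, "b.img": base, "c.img": base + extra}

store, recipes = dedupe(pool)
logical = sum(len(d) for d in pool.values())
physical = sum(len(b) for b in store.values())
print(f"logical {logical} bytes, stored {physical} bytes")
```

The shared blocks here sit far more than 64 KB apart and in different files, which is exactly the redundancy a small-window, per-file compressor cannot see. A production system would also need to handle fingerprint indexing at scale, which this sketch ignores.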
So let’s summarize the comparison of Albireo embedded data dedupe and compression technologies in terms of data safety and business value.
| | Albireo Dedupe | Compression |
|---|---|---|
| Data Safety Impact | | |
| Alters Data | NO | YES |
| In Data Path | NO | YES |
| Requires De-/Re-Hydration | NO | YES |
| Business Value Impact | | |
| Optimizes Block | YES | NO |
| Optimizes File | YES | YES |
| Optimizes Unified | YES | YES |
| Reduction Range | 3.75-100X | 2X |
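Reduction ratios can be easier to compare when converted into the fraction of capacity saved, which is 1 - 1/ratio. The short sketch below does that arithmetic for the figures quoted in this post; the function name `space_saved` is just an illustrative choice.

```python
def space_saved(ratio):
    """Fraction of raw capacity saved at a given reduction ratio (e.g. 2.0 for 2X)."""
    if ratio < 1:
        raise ValueError("a reduction ratio below 1X means the data grew")
    return 1.0 - 1.0 / ratio


for ratio in (2.0, 3.75, 100.0):
    print(f"{ratio:g}X -> {space_saved(ratio):.1%} saved")
# prints:
# 2X -> 50.0% saved
# 3.75X -> 73.3% saved
# 100X -> 99.0% saved
```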
Albireo deduplication outperforms compression for data reduction by an order of magnitude and ensures enterprise-class data safety. Superior business value and data safety – that is Albireo data optimization for primary storage.
In my next post, I’ll discuss how combining Albireo embedded dedupe and traditional compression provides best-in-class data optimization.