Latency, Latency, Latency…
A few comments have been made regarding latency, most recently in a Search Storage piece from Dave Raffo that compared compression and deduplication applied to primary storage. The concern expressed is that deduplication may introduce latency in the write or read process for primary storage. That may have been an astute observation a year ago, but today it is simply not valid!
At issue is the performance of primary storage, which needs to be as efficient as possible because it houses the transactional data for core applications such as finance, manufacturing, CRM, sales operations and supply chain, to mention just a few. For these applications any significant delay in data access is simply not tolerable because businesses run on them! With today’s high performance data optimization (deduplication) tools such as Permabit’s Albireo, deduplication adds no latency on data access because Albireo is NOT IN THE READ PATH! And for that matter, Albireo is not in the write path for all but inline implementations.
In a recent blog post, NOT in the READ Path, I go into some of the details: ‘As we designed and developed Albireo, we specifically addressed the most critical of all performance criteria: data read. Why is it significant? Because data is usually written once but can be read hundreds, thousands or millions of times, which clearly affects the performance of the storage. As a result, Albireo is NOT in the read path, so there is no performance impact on read and no latency!
In the write process Albireo’s flexibility enables inline, parallel and post-process approaches (or any combination of the three) so the storage vendor can choose which approach they prefer. Let’s look at each of these:
* In an inline approach there is, of course, a very small write latency as Albireo creates a hash key and looks it up. When we say very small, we’re talking a few microseconds. Can a user detect a delay of a few microseconds on write? I think not!
* In a parallel approach, there is NO performance impact on write at all because the vendor continues to write the data as they normally do. Albireo provides duplicate advice to the storage and the vendor can apply it when cycles permit.
* In a post-process deduplication approach, there is NO write performance impact because the write is done as the vendor normally would and post-process deduplication can be initiated as cycles become available.’
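To make the distinction concrete, here is a minimal Python sketch of the general idea. It is not Albireo's API or implementation: the class `ToyDedupStore`, its method names, and the use of SHA-256 as the hash key are all assumptions chosen for illustration. The inline write does one hash-and-lookup before storing, the plain write (standing in for the parallel and post-process cases) does no index work at write time, and a deferred pass applies the duplicate advice later. Reads only follow the logical-to-physical map and never consult the index.

```python
import hashlib


class ToyDedupStore:
    """Toy block store. Hypothetical names throughout; not Albireo's API."""

    def __init__(self):
        self.blocks = {}        # physical block id -> data
        self.lba_map = {}       # logical block address -> physical block id
        self.index = {}         # hash key -> physical block id (dedup index)
        self.next_id = 0
        self.index_lookups = 0  # counts how often the dedup index is consulted

    def _store(self, data):
        pid = self.next_id
        self.next_id += 1
        self.blocks[pid] = data
        return pid

    def _lookup(self, data):
        # Hash key create and look-up: the only extra work deduplication adds.
        self.index_lookups += 1
        key = hashlib.sha256(data).digest()  # SHA-256 is an assumption here
        return key, self.index.get(key)

    def write_inline(self, lba, data):
        # Inline: one hash + lookup on the write path, then store if new.
        key, pid = self._lookup(data)
        if pid is None:
            pid = self._store(data)
            self.index[key] = pid
        self.lba_map[lba] = pid

    def write_plain(self, lba, data):
        # Parallel / post-process: write as usual, no index work on the write.
        self.lba_map[lba] = self._store(data)

    def dedupe_pass(self):
        # Deferred step, run "when cycles permit": remap duplicates, then
        # drop physical blocks and index entries that nothing refers to.
        for lba, pid in list(self.lba_map.items()):
            key, existing = self._lookup(self.blocks[pid])
            if existing is None:
                self.index[key] = pid
            elif existing != pid:
                self.lba_map[lba] = existing
        live = set(self.lba_map.values())
        self.blocks = {p: d for p, d in self.blocks.items() if p in live}
        self.index = {k: p for k, p in self.index.items() if p in live}

    def read(self, lba):
        # Read path: follow the map to the block. The dedup index is not used.
        return self.blocks[self.lba_map[lba]]
```

In this toy, `index_lookups` does not change across any number of `read` calls, which is the sense in which the index sits outside the read path. A real implementation would also have to handle concurrency, persistence and hash collisions, all of which the sketch ignores.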
Albireo is a new approach to data deduplication that is embedded into the storage stack. It uses patented indexing, memory utilization and hash key technologies to enable scale-out and performance that, until Albireo, had not been seen in deduplication implementations. Previous iterations of dedupe, which have primarily been used for backup, had their limitations in performance, latency, scale-out and memory usage, to name but a few. These do not apply to Albireo’s implementation because we solved these issues as we developed it.
In summary, using Albireo High Performance Data Optimization (deduplication) in primary storage does not introduce latency!