
NOT in the Read Path or ‘Performance Matters’

In Mike’s blog post a couple of weeks ago, ‘Get Out of My Way’, he accurately portrayed our Albireo offering as ‘not in the read path’. In primary storage implementations, it’s critical to ensure the most efficient access to data. ‘Out of the read path’ has been one of our key messages, and it bears repeating because it’s so significant. Albireo is embedded in the storage vendor’s stack, so we have been extremely careful not to impact their operations. As such, we have developed Albireo around three key principles: performance, simplicity of deployment and data safety. In this post, I’ll look at the most prevalent concern of our partners: performance.

The most significant concern raised as we discussed Albireo with our partners has been the potential impact on performance. In many ways it’s their stock in trade, and they protect it as any business should: ‘with their lives!’ The thought of embedding deduplication technology, even though there is a clear customer benefit in cost, space and power/cooling reduction, runs counter to their performance mantra, particularly in primary storage.

As we designed and developed Albireo, we specifically addressed the most critical of all performance criteria: data read. Why is it significant? Because data is usually written once but can be read hundreds, thousands or millions of times, so any slowdown on read clearly affects the performance of the storage. As a result, Albireo is NOT in the read path, so there is no performance impact on read!

In the write process, we were also careful to give our partners choices that let them decide whether to accept any performance impact. Albireo’s flexibility enables inline, parallel and post-process approaches (or any combination of the three), so the storage vendor can choose the approach they prefer. Let’s look at each of these:

* In an inline approach, there is of course a very small write latency as Albireo creates a hash key and looks it up. When we say very small, we’re talking mere microseconds across the entire data stream. Can a user detect a microsecond delay?  I think not!

* In a parallel approach, there is NO performance impact on write at all because the vendor continues to write the data as they normally do.  Albireo provides duplicate advice to the storage, and the vendor can apply it when cycles permit.

* In a post-process deduplication approach, there is NO write performance impact because the write is done as the vendor normally would, and post-process deduplication can be initiated as cycles become available.
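To make the difference between these approaches concrete, here is a minimal Python sketch of a toy block store. It is purely illustrative and is not Albireo’s API: the class `ToyDedupStore`, its method names, the use of SHA-256 as the hash key and the in-memory dictionaries are all assumptions made for the example, and it ignores reference counting and space reclamation on overwrite. It shows an inline write doing a hash create and lookup, a post-process write that defers that work, and a read that never touches the deduplication index. A parallel approach would issue the same advice call alongside the normal write; concurrency is not modeled here.

```python
import hashlib


class ToyDedupStore:
    """Toy block store (hypothetical; not Albireo's actual interface)."""

    def __init__(self, inline=True):
        self.inline = inline
        self.blocks = {}       # physical address -> block data
        self.block_map = {}    # logical address -> physical address
        self.index = {}        # content hash -> physical address (dedup index)
        self.pending = []      # logical addresses awaiting post-process dedup
        self.next_phys = 0
        self.index_lookups = 0  # counts every hash-and-lookup operation

    def _advise(self, data):
        """Hash a block and look it up; returns (digest, duplicate address or None)."""
        self.index_lookups += 1
        digest = hashlib.sha256(data).digest()  # assumed hash; the post names none
        return digest, self.index.get(digest)

    def _store(self, data):
        """Write a block to a fresh physical address, as the vendor normally would."""
        phys = self.next_phys
        self.next_phys += 1
        self.blocks[phys] = data
        return phys

    def write(self, logical, data):
        if self.inline:
            # Inline: pay for one hash create and lookup before the write completes.
            digest, phys = self._advise(data)
            if phys is None:
                phys = self._store(data)
                self.index[digest] = phys
            self.block_map[logical] = phys
        else:
            # Post-process: write normally and defer deduplication.
            self.block_map[logical] = self._store(data)
            self.pending.append(logical)

    def dedupe_pending(self):
        """Run deferred deduplication when cycles are available."""
        for logical in self.pending:
            phys = self.block_map[logical]
            digest, dup = self._advise(self.blocks[phys])
            if dup is None:
                self.index[digest] = phys
            elif dup != phys:
                self.block_map[logical] = dup  # point at the existing copy
                del self.blocks[phys]          # reclaim the duplicate block
        self.pending.clear()

    def read(self, logical):
        # Read path: block map and block data only; the index is never consulted.
        return self.blocks[self.block_map[logical]]
```

With `inline=True`, writing the same block to two logical addresses stores it once. With `inline=False`, both copies are stored until `dedupe_pending()` runs. In either case `read()` leaves `index_lookups` unchanged, which is the ‘not in the read path’ property in miniature.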

In summary, Albireo does NOT impact performance on read! There is also NO impact on write performance in parallel or post-process modes.  So, our vendors’ concerns about performance are addressed!

In my next post I’ll address implementation simplicity.
