
Albireo Flexibility

As we talked to potential OEMs about our Albireo Data Optimization offering, several key requirements kept emerging. The first was ‘don’t impact performance’, followed by ‘don’t impact my existing feature set (because I’ve invested $$$ in it)’ and finally ‘please make your deduplication easy and flexible to deploy.’ We kept these in mind as we developed Albireo, and we have succeeded in addressing each one!

I addressed ‘don’t impact performance’ in last week’s post, ‘Not in The Read Path’. Simply put, we don’t impact performance, and we provide a simple and elegant method of implementing deduplication. Our approach is advisory: Albireo only advises the OEM’s storage stack about duplicates, which also lets the OEM keep using all of their existing features!

When it comes to flexibility, we really took a step forward. We knew the concerns about deduplication technology revolved around three key areas:

* Hash key development/indexing and performance impact

* Resource utilization

* Deployment flexibility

Hash keys are the lifeblood of deduplication because they serve as the unique identifier of a data chunk. Once the key is computed for a chunk of data, the next question is ‘have I seen this data before?’ This is where it gets difficult, because this lookup is what brings most dedupe systems to a grinding halt. Big indexes and large keys (yes, we use SHA-256, so the key is large) create performance slowdowns and, in some lesser systems, scalability limits. We skinned that cat with patented indexing that can return an index lookup in a few microseconds, so it hardly impacts performance at all! Scale-out is also handled by patented memory-based indexing that is 99.5% memory resident, so there is no slowdown there either. You hardly ever see a disk fetch! Once the lookup is done, we return advice to the OEM storage stack that the data chunk is a duplicate. The OEM then creates a pointer for the duplicate.
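To make the hash, lookup and advise flow concrete, here is a minimal sketch in Python. It is not the Albireo SDK: the class name `DedupeAdvisor`, the `advise` method and the location strings are all hypothetical, and a plain dictionary stands in for the patented index. It only illustrates the idea of returning advice and leaving the pointer creation to the caller.

```python
import hashlib


class DedupeAdvisor:
    """Hypothetical in-memory advisory index (an illustration, not the Albireo API)."""

    def __init__(self):
        # Maps a SHA-256 digest to the location of the first chunk seen with it.
        self._index = {}

    def advise(self, chunk: bytes, location: str):
        """Return the location of an earlier identical chunk, or None if the chunk is new."""
        key = hashlib.sha256(chunk).digest()
        existing = self._index.get(key)
        if existing is None:
            # First time we see this data: remember where the caller stored it.
            self._index[key] = location
        return existing


advisor = DedupeAdvisor()
first = advisor.advise(b"hello world", "block-0")   # new data, no advice
second = advisor.advise(b"hello world", "block-7")  # duplicate of block-0
third = advisor.advise(b"other data", "block-8")    # new data, no advice
```

In this sketch the advisor never touches the stored data. When `advise` returns a location, the storage stack decides whether to create a pointer to it.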

Resource utilization, or how much of the processing power is consumed, is a significant issue for storage vendors. For example, newer systems have fast quad-core processors (sometimes two!), while older or lower-performance systems may have dual-core processors. In any case, a flexible dedupe engine should let the OEM define the number of processors used for Albireo, depending on their overall system performance criteria. We designed that flexibility into Albireo to ensure our deduplication does not consume resources that would impact our partners’ performance.
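As a rough illustration of capping the resources given to deduplication, the sketch below hashes chunks with a caller-chosen number of worker threads. The function name `hash_chunks` and the `max_workers` parameter are assumptions made for this example, and threads here are only a stand-in for the processor allocation an OEM would actually configure.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_chunks(chunks, max_workers):
    """Hash chunks using at most max_workers threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: hashlib.sha256(c).hexdigest(), chunks))


chunks = [b"a", b"b", b"a"]
# A smaller system might pass max_workers=1; a larger one could allow more.
digests = hash_chunks(chunks, max_workers=2)
```

The result is the same whatever the worker count. Only the share of the system spent on hashing changes.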

Flexibility, as you can see from the resource utilization comments, is already there, but we took an additional step by enabling our OEM partners to deploy Albireo in inline, post-process or parallel (hybrid) mode. For example, if the partner’s performance tests show that inline deduplication fits within the OEM’s performance characteristics, then they can implement Albireo inline! If they want to balance performance and deduplication efficiency, then parallel mode may be best for that partner. And finally, if high-performance storage is the dominant criterion, then post-process mode would work for that partner! We also give the OEM the option to use all three, so they can implement the appropriate method for each use case (all within one system). Their choice!
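The difference between inline and post-process deployment can be sketched with a toy block store. Everything here (`Mode`, `Store`, `write`, `post_process`) is hypothetical and written only for this illustration; parallel mode is left out to keep the example short.

```python
import hashlib
from enum import Enum


class Mode(Enum):
    INLINE = "inline"
    POST_PROCESS = "post-process"


class Store:
    """Toy block store; every name here is hypothetical."""

    def __init__(self, mode):
        self.mode = mode
        self.blocks = {}     # block id -> data physically stored
        self.pointers = {}   # duplicate block id -> id of the block holding the data
        self._index = {}     # SHA-256 digest -> first block id seen with that data
        self._pending = []   # block ids written but not yet checked (post-process)

    def _check(self, block_id, data):
        """Return the id of an earlier block with identical data, or None."""
        key = hashlib.sha256(data).digest()
        original = self._index.setdefault(key, block_id)
        return None if original == block_id else original

    def write(self, block_id, data):
        if self.mode is Mode.INLINE:
            # Inline: check before storing, so a duplicate is never written.
            original = self._check(block_id, data)
            if original is not None:
                self.pointers[block_id] = original
                return
            self.blocks[block_id] = data
        else:
            # Post-process: store immediately and defer the duplicate check.
            self.blocks[block_id] = data
            self._pending.append(block_id)

    def post_process(self):
        """Check blocks written since the last pass and replace duplicates with pointers."""
        for block_id in self._pending:
            original = self._check(block_id, self.blocks[block_id])
            if original is not None:
                del self.blocks[block_id]
                self.pointers[block_id] = original
        self._pending.clear()


inline = Store(Mode.INLINE)
deferred = Store(Mode.POST_PROCESS)
for store in (inline, deferred):
    store.write("a", b"same")
    store.write("b", b"same")
    store.write("c", b"different")
blocks_before_pass = len(deferred.blocks)
deferred.post_process()
```

Both stores end up with the same two physical blocks and one pointer. The inline store pays for the check on every write, while the post-process store temporarily holds all three blocks until the later pass runs.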

All this is done with an SDK that can be integrated with 6-8 API calls. Some of our partners have integrated it in as little as a few days. Albireo is a tool that OEM partners can deploy based on their own performance, resource and efficiency design criteria. We made it that way!

In my next posting I’ll discuss data protection.
