Permabits and Petabytes blog: OEM data optimization for next-generation storage

“Dedupe Everywhere” isn’t always dedupe “everywhere”

Dedupe is a hot topic, and the differences between dedupe implementations were the subject of my previous post. Those differences persist across vendor offerings, and marketing plays a role in defining who does what and how it is described. A few vendors have begun using the phrase “Dedupe Everywhere,” so I want to clarify what the dedupe-everywhere use case really means.

Data is everywhere, and the need to reduce the data footprint and storage consumption could not be more pressing. Advances in deduplication technology have optimized resource utilization, improved dedupe efficiency, broadened dedupe scalability and added deployment flexibility. As a result, dedupe now offers increasingly broad functionality, and “out of the box” approaches to dedupe can finally take hold.

A modern high-performance dedupe engine such as Permabit’s Albireo can be employed anywhere: primary storage, tier 2, archive, replication, backup or even the cloud. This means a version of the same dedupe engine can sit in each storage tier and, as a result, be deployed universally across all storage. With an IO-optimized ingest rate that runs circles around its backup-only predecessors, Albireo’s dedupe is approaching nearly limitless scalability. In addition, as unified storage evolves, a dedupe engine can sit inside the unified storage array and apply to as many tiers as the vendor needs. This does two dramatic things. First, it saves storage space at each layer of storage (primary, tier 2 and so on). Second, it reduces processing load: because deduplication has already occurred upstream, fewer and fewer cycles are needed to analyze the data at each subsequent tier and less and less storage is required, making the process increasingly efficient.
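To make the per-tier space saving concrete, here is a minimal Python sketch of a single deduplicating tier. It assumes fixed-size 4 KiB chunks and SHA-256 fingerprints, and the `DedupeTier` class and its methods are hypothetical illustrations, not Albireo’s interface; production engines typically use their own chunking and indexing schemes.

```python
import hashlib

# Hypothetical fixed chunk size; real engines may use different or variable-size chunking.
CHUNK_SIZE = 4096


class DedupeTier:
    """Toy storage tier (hypothetical) that keeps each unique chunk once, keyed by SHA-256."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.chunks: dict[str, bytes] = {}  # fingerprint -> chunk data, stored once
        self.logical_bytes = 0  # bytes written by clients, before dedupe

    def write(self, data: bytes) -> list[str]:
        """Split data into chunks, store only unseen chunks, return the fingerprint recipe."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # a duplicate chunk costs no extra space
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Reassemble the original data from its fingerprint recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)

    @property
    def physical_bytes(self) -> int:
        """Bytes actually held after dedupe."""
        return sum(len(c) for c in self.chunks.values())
```

For example, writing the same 8 KiB buffer (4 KiB of `A` followed by 4 KiB of `B`) twice records 16,384 logical bytes but only 8,192 physical bytes, because the second write finds both chunks already indexed.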

But let’s not stop at storage. Data is created higher up in the data stack, in applications, so why not apply deduplication there? Until now this was unheard of. Because Albireo’s deduplication engine can run inside the application, duplicate checks are made at the application layer, at the source where the data is created. As data moves, and before a chunk is sent to the next tier, that tier is queried to determine whether the chunk already exists there. This happens at very high speed, and the process is efficient enough not to impede application processing.
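The query-before-send step can be sketched as follows. This is a toy model under stated assumptions: chunks are identified by SHA-256 fingerprints, and `Tier.has` / `Tier.put` are hypothetical stand-ins for whatever lookup and write calls a real downstream tier exposes.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest (assumed fingerprint scheme)."""
    return hashlib.sha256(chunk).hexdigest()


class Tier:
    """Hypothetical downstream tier that can answer 'do you already hold this chunk?'"""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def has(self, fp: str) -> bool:
        return fp in self.store

    def put(self, fp: str, chunk: bytes) -> None:
        self.store[fp] = chunk


def move_chunks(chunks: list[bytes], target: Tier) -> tuple[list[str], int]:
    """Send chunks to the target tier, skipping any it already holds.

    Returns the fingerprint recipe and the number of bytes actually transferred.
    """
    recipe = []
    sent = 0
    for chunk in chunks:
        fp = fingerprint(chunk)
        if not target.has(fp):  # the duplicate check happens before any data moves
            target.put(fp, chunk)
            sent += len(chunk)
        recipe.append(fp)
    return recipe, sent
```

Moving `[b"header", b"payload"]` into an empty tier transfers 13 bytes; moving `[b"header", b"payload", b"new"]` afterwards transfers only the 3 new bytes, since the first two chunks are already present.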

These are completely new approaches to data deduplication. If you can optimize data at the point of creation, the efficiency benefits are realized across the entire data stack. This lowers CAPEX on storage purchases and OPEX on management and operations throughout the data lifecycle: there is no need for rehydration, with the performance penalty it brings, and no need to buy, manage and warehouse duplicate storage! The same approach could also be deployed in an operating system, adding to the overall data efficiency and financial impact!

At the other end of the data storage spectrum, cloud-based deployments also benefit from dedupe everywhere. With a deduplication engine on board, the uploading client queries the cloud storage to determine whether the data already exists there. If it does, the data is not resent, saving both communications bandwidth and storage space. Cloud implementations also cut storage costs, letting the enterprise capture the CAPEX and OPEX savings typically associated with external cloud storage.
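A cloud client can batch the same existence check so that one round trip covers many chunks. The sketch below assumes a hypothetical `CloudStore` with a `missing()` query and SHA-256 fingerprints; it is not the API of any real cloud provider or of Albireo.

```python
import hashlib


class CloudStore:
    """Hypothetical stand-in for a cloud object store keyed by chunk fingerprint."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def missing(self, fps: list[str]) -> set[str]:
        """Batched existence query: which of these fingerprints are not stored yet?"""
        return {fp for fp in fps if fp not in self.objects}

    def upload(self, fp: str, data: bytes) -> None:
        self.objects[fp] = data


def dedupe_upload(chunks: list[bytes], cloud: CloudStore) -> int:
    """Upload only the chunks the cloud lacks; return the bytes sent over the wire."""
    # Keying by fingerprint also collapses duplicates within this batch.
    by_fp = {hashlib.sha256(c).hexdigest(): c for c in chunks}
    to_send = cloud.missing(list(by_fp))
    for fp in to_send:
        cloud.upload(fp, by_fp[fp])
    return sum(len(by_fp[fp]) for fp in to_send)
```

Uploading 100 bytes of `x`, 50 bytes of `y` and a repeated 100 bytes of `x` sends 150 bytes; a later upload of the same `x` chunk plus 10 bytes of `z` sends only 10.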

Now, as I mentioned, other dedupe implementations are also using the phrase “dedupe everywhere.” As I see it, they employ a source-based approach for backup: the dedupe engine resides on the source, and the backup application is the target. That is simple enough, and it is a nice way to shift the dedupe processing load off the backup application or appliance and spread it across several other servers and applications to lessen the performance impact. But is it everywhere? No, not really! It’s really just everywhere…within backup!

Dedupe Everywhere starts at the application, database and operating system level, then moves to storage, usually primary storage first, and on across the storage tiers to archive and even to the cloud. So “Dedupe Everywhere,” as Albireo can be deployed, is a much broader application of dedupe with a much more significant financial and operational impact. “Dedupe Everywhere” isn’t always dedupe “everywhere!”

In my next post I’ll explore the Albireo enablers that deliver the most robust deduplication in the industry and can be deployed everywhere!
