The Case for VMware and Deduplication
We’ve all watched over the past several years as VMware virtualization has revolutionized IT, instigating the most significant consolidation of computer hardware assets in history. What you may not know is that VMware deployments offer one of the most attractive use cases for data optimization in a primary storage environment. A great example of this comes from our friends over at NetApp®. Shortly after NetApp released its deduplication feature, its customers began to report tremendous success with deduplicating VMware virtual machines (VMs). Users were seeing storage savings of 50% or more with little performance impact, and some were seeing savings approaching 90%. Let’s take a look at how primary storage deduplication works in VMware environments with Permabit’s Albireo™ OEM High Performance Data Optimization software.
Deduplication systems work by identifying and eliminating duplicate chunks of data. With Albireo, the OEM implements deduplication at the shared-storage, logical-volume or filesystem level; the larger the shared pool of data on the back end, the greater the potential savings from redundant data. Albireo’s flexible deployment options allow deduplication to run inline, in parallel or as a post-process step, to meet varying performance requirements. We’re working with some OEMs who are implementing all three options on the same systems. Regardless of where or when deduplication is implemented, the process is the same: duplicate chunks are eliminated, and data pointers are updated so that they share a single copy of the chunk on disk. Albireo’s job is to identify those duplicates as quickly as possible while using as few system resources as possible!
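To make the "eliminate duplicates, share one chunk through pointers" idea concrete, here is a minimal sketch of chunk-level deduplication in Python. It is not Albireo’s API or algorithm; the class name, the fixed 4 KiB chunk size and the use of SHA-256 fingerprints are all illustrative assumptions. Each write is split into chunks, each chunk is fingerprinted, and a chunk is stored only the first time its fingerprint is seen; every volume keeps just a list of pointers into the shared chunk pool.

```python
import hashlib


class DedupStore:
    """Toy block store: one physical copy per unique chunk, plus pointers.

    Illustrative sketch only: CHUNK_SIZE, SHA-256 fingerprints and all
    names here are assumptions, not Albireo's interface.
    """

    CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes

    def __init__(self):
        self.chunks = []   # physical storage: each unique chunk stored once
        self.index = {}    # fingerprint -> position in self.chunks
        self.volumes = {}  # volume name -> list of chunk pointers

    def write(self, name, data):
        pointers = []
        for off in range(0, len(data), self.CHUNK_SIZE):
            chunk = data[off:off + self.CHUNK_SIZE]
            fp = hashlib.sha256(chunk).digest()
            loc = self.index.get(fp)
            if loc is None:
                # First time we see this content: store it and index it.
                loc = len(self.chunks)
                self.chunks.append(chunk)
                self.index[fp] = loc
            # Duplicate or not, the volume records only a pointer.
            pointers.append(loc)
        self.volumes[name] = pointers

    def read(self, name):
        # Reassemble a volume by following its pointers.
        return b"".join(self.chunks[p] for p in self.volumes[name])

    def logical_bytes(self):
        # Space the volumes would occupy without deduplication.
        return sum(len(self.chunks[p])
                   for ptrs in self.volumes.values() for p in ptrs)

    def physical_bytes(self):
        # Space actually consumed by unique chunks.
        return sum(len(c) for c in self.chunks)


if __name__ == "__main__":
    store = DedupStore()
    # Four distinct 4 KiB chunks stand in for a template image.
    template = b"".join(bytes([i]) * DedupStore.CHUNK_SIZE for i in range(4))
    store.write("template", template)
    store.write("clone1", template)
    print(store.logical_bytes(), store.physical_bytes())
```

The sketch trusts the fingerprint alone; a production system might also compare bytes before sharing a chunk, and would use a far more memory-efficient index than a Python dict, which is exactly the kind of resource trade-off the paragraph above describes.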
To help simplify the management of consolidated environments, VMware server virtualization allows administrators to create ‘template operating environments’, each with a standardized operating system and application environment. These templates are then ‘cloned’ into separate VMware images and installed as ‘guests’ on a physical server, which saves effort by simplifying configuration management. Today, with current processors from AMD and Intel, I’m seeing VMware users run server environments with 12 VM guests per physical server, and I have heard of desktop virtualization environments with over 60 ‘guests’ per server. While VMware has substantially reduced server costs, it has done nothing to help storage administrators, who are left with the same number of operating system images on what is now a complex mixture of shared-storage resources.
It’s important to realize that while VMware provides a valuable cost benefit from consolidating IT servers, it offers no means of consolidating the storage used by VMware ‘clones’. That’s where deduplication comes in. It turns out that each ‘cloned’ VMware image starts off taking up as much space as the template from which it was created, and the set of data chunks making up a ‘template’ and each of its ‘clones’ is nearly identical. When Albireo sees these duplicate chunks, it identifies them and allows the storage system to replace them with pointers to the original data. It’s easy to see why, in a 12-guest environment, savings approach 12:1 even after accounting for the personalization of each guest. For a 60-guest desktop environment, there’s a potential for almost 60:1 savings!
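The 12:1 and 60:1 figures can be checked with a back-of-the-envelope model. As a simplifying assumption (not a description of any real deployment), suppose there are N guests, each the same size as the template, and a fraction p of each guest has been personalized into unique chunks. The shared (1 − p) portion is stored once and each guest’s unique portion is stored separately, so the logical-to-physical ratio is N / ((1 − p) + N·p). With no personalization the ratio is exactly N, which is why N guests give a potential of almost N:1; the function and parameter names below are hypothetical.

```python
def dedup_ratio(guests: int, personalized_fraction: float) -> float:
    """Logical-to-physical storage ratio for `guests` clones of one template.

    Simplified model (an assumption for illustration): every clone is the
    same size as the template, a fraction `personalized_fraction` of each
    clone is unique, and the remaining shared chunks are stored once.
    """
    if guests < 1 or not 0.0 <= personalized_fraction <= 1.0:
        raise ValueError("need guests >= 1 and 0 <= personalized_fraction <= 1")
    p = personalized_fraction
    logical = guests * 1.0               # sizes measured in units of one template
    physical = (1.0 - p) + guests * p    # one shared copy plus each guest's unique part
    return logical / physical


if __name__ == "__main__":
    for n in (12, 60):
        for p in (0.0, 0.01, 0.05):
            print(f"{n} guests, {p:.0%} personalized: {dedup_ratio(n, p):.1f}:1")
```

Under this model, 1% personalization still gives roughly 10.8:1 for 12 guests but roughly 37.7:1 for 60 guests, so the larger the guest count, the more the achievable savings depend on keeping per-guest changes small.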
Next week I’ll talk a little about how Albireo can actually help VMware administrators better meet their storage performance requirements.