In my previous post, I highlighted the massive savings that can be realized by leveraging deduplication technology with virtual server environments. In this post, I’ll take it one step further and discuss virtual desktops.
While virtual server deployments in larger enterprises number in the hundreds, virtual desktop deployments run to the thousands, and in some cases tens of thousands, of images! So, same scenario here – redundant operating systems, applications, and our favorite friend – service packs! When you leverage deduplication on the primary storage hosting these images, the savings can be even greater than with virtual servers.
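To see why the savings add up so quickly, here's some back-of-the-envelope arithmetic. The numbers below (desktop count, image size, unique data per desktop) are purely illustrative assumptions, not figures from any real deployment:

```python
# Illustrative numbers only (assumed, not measured):
# 5,000 virtual desktops, each cloned from a 20 GB OS image,
# with ~2 GB of truly unique data per desktop (profiles, temp files).
desktops = 5_000
image_gb = 20
unique_per_desktop_gb = 2

raw_gb = desktops * image_gb                               # logical capacity consumed
deduped_gb = image_gb + desktops * unique_per_desktop_gb   # one shared image + unique data

print(f"Logical: {raw_gb / 1024:.0f} TB")
print(f"After dedupe: {deduped_gb / 1024:.1f} TB")
print(f"Effective reduction: {round(raw_gb / deduped_gb)}:1")
```

Even with these conservative assumptions, nearly 100 TB of logical capacity collapses to about 10 TB on disk; the more identical the images, the better the ratio gets.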
The biggest concern with virtual desktop deployments is the ‘boot storm’ problem: everyone arrives at work at 9am and boots their virtual desktop at the same time. You have the same problem at the end of the workday, when everyone logs off (and flushes their cache) at once. Can a storage infrastructure with built-in deduplication handle that challenge? The answer is…it depends.
First off, the storage will need to scale. When you’re looking to deploy these centralized virtual desktop (or server) environments, you want a storage system that can scale to hundreds of terabytes, if not petabytes. This is especially true if you’re a cloud storage service provider (note the trend highlighted in the previous post)! Once you go down the path of virtual environments, requests for new servers/desktops will increase dramatically. After all, you’re saving the company a ton of money by not buying all those new physical servers and desktops, right? So, if you want to deploy storage with deduplication, you don’t want to be hindered by tiny volume sizes of 16TB or so. You need scale!
Next, you’ll need to determine whether deduplication will impact performance. Should it be an inline or post-process function (or some other implementation, such as our parallel process implementation)? What really drives dedupe performance is how quickly a digital fingerprint can be looked up in an index to determine whether a block is a duplicate. If a system can do this on the scale of microseconds (like Albireo), you could easily deploy dedupe as an inline solution without users ever noticing a difference in performance. If that’s not good enough, the parallel or post-process implementations are still options. It all depends on what storage you implement and what type of throughput is needed.
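The inline flow boils down to one fingerprint lookup per incoming block. Here’s a toy sketch of that idea in Python – the class and its index structure are my own invention for illustration, not Albireo’s actual implementation:

```python
import hashlib

class InlineDedupe:
    """Toy inline deduplication: one fingerprint lookup per incoming block."""

    def __init__(self):
        self.index = {}   # fingerprint -> physical block id
        self.store = []   # physical block storage

    def write(self, block: bytes) -> int:
        fp = hashlib.sha256(block).digest()   # the digital fingerprint
        if fp in self.index:
            # Duplicate: the fast index lookup means no new data is stored.
            return self.index[fp]
        # New data: store it once and remember its fingerprint.
        self.store.append(block)
        self.index[fp] = len(self.store) - 1
        return self.index[fp]

dedupe = InlineDedupe()
first = dedupe.write(b"service pack block")
second = dedupe.write(b"service pack block")   # duplicate, stored only once
assert first == second and len(dedupe.store) == 1
```

In a real system that index lives on a scale of billions of entries, which is why the microsecond-lookup question matters so much for inline throughput.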
The next question is how fast data can be read. Wouldn’t it be nice if data could be read off of disk without having to go back through the dedupe engine? Direct access to data that has already been deduped? Too good to be true? Not so! That is exactly what Albireo has been designed to do (based on unanimous demand from our storage partners). The storage vendor wants to (and should!) own access to data. If you ever want to disable or remove the dedupe engine, you always want access to your data! If the dedupe engine were required to read data back, then removing that engine would effectively leave you with a bunch of encrypted/inaccessible data. That is not acceptable! Do you want to be the one to tell thousands of users (or customers if you’re a cloud provider) they can’t access their virtual desktops or servers because you can’t read the data from disk? Not me!
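One way to picture how a direct read path can work: the volume’s block map points each logical block straight at a physical location on disk, so reads never touch the fingerprint index at all. This is a hypothetical sketch of that general design, not a description of any vendor’s internals:

```python
# Sketch: the block map holds plain physical offsets, not fingerprints,
# so reads work even if the dedupe engine is disabled or removed.
# All names here are hypothetical.

class Volume:
    def __init__(self):
        self.disk = []        # physical blocks
        self.block_map = {}   # logical block number -> physical offset

    def map_write(self, lbn: int, physical: int):
        self.block_map[lbn] = physical

    def read(self, lbn: int) -> bytes:
        # Direct lookup straight to disk; no dedupe engine in the read path.
        return self.disk[self.block_map[lbn]]

vol = Volume()
vol.disk.append(b"shared OS block")
vol.map_write(10, 0)
vol.map_write(11, 0)   # two logical blocks share one physical block
assert vol.read(10) == vol.read(11) == b"shared OS block"
```

The design choice is the point: because the map resolves to real disk locations, deduped data stays readable on its own, and the dedupe engine is only consulted on the write path.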
Whether you’re a customer looking to a cloud provider in the future to house your virtual environments or you’re an enterprise looking to deploy additional storage internally to support this, make sure you’re asking your vendor whether dedupe technology is included or not. If it’s not, see if they’ll give you a 100:1 discount on their pricing instead!