In my last post, I mentioned the results of the Permabit Labs six-week study of dedupe’s effectiveness with Oracle RMAN backups. In this post I want to talk about the methodology used in this study in a bit more detail.
The most important thing we learned in performing this study is that capacity savings are extremely dependent on how RMAN is used. To maximize deduplication efficiency when using RMAN, there are three things to keep in mind:
1. Multiplexing
Suppress Oracle multiplexing of more than one data file into a backup set. When multiplexing is suppressed, Oracle creates each backup set identically every time the backup runs, maximizing the dedupe rate.
2. Compression and Encryption
Both compression and encryption must be disabled in RMAN to see effective deduplication. However, compression should be left enabled when backing up transaction logs and control files, since they don’t see good deduplication rates anyway (see below).
3. Archive Logs and Control Files
By default, RMAN interleaves archive (transaction) logs and control files in with database content. This defeats deduplication because it changes the alignment of data. When used with deduplication, these additional files should be written to separate streams.
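To make the three guidelines above concrete, here is a minimal sketch of an RMAN run block driven from Python. It assumes the rman client is on the PATH; the connect target, backup paths, and channel names are illustrative placeholders, and this is not the exact configuration used in the Permabit Labs study.

```python
import subprocess
import tempfile

# Illustrative RMAN command file reflecting the three dedupe-friendly guidelines:
#   1. FILESPERSET 1 keeps each data file in its own backup set (no multiplexing)
#   2. plain (uncompressed, unencrypted) backup sets for the database itself
#   3. archive logs and the control file written by separate backup commands,
#      with compression left on because those files dedupe poorly anyway
RMAN_SCRIPT = """
CONFIGURE ENCRYPTION FOR DATABASE OFF;
CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO BACKUPSET;
RUN {
  ALLOCATE CHANNEL db_ch DEVICE TYPE DISK FORMAT '/backup/db/%U';
  BACKUP AS BACKUPSET DATABASE FILESPERSET 1;
  RELEASE CHANNEL db_ch;
  ALLOCATE CHANNEL log_ch DEVICE TYPE DISK FORMAT '/backup/logs/%U';
  BACKUP AS COMPRESSED BACKUPSET ARCHIVELOG ALL;
  BACKUP AS COMPRESSED BACKUPSET CURRENT CONTROLFILE;
  RELEASE CHANNEL log_ch;
}
"""

def run_backup() -> int:
    """Write the script to a temp file and hand it to rman (assumed on PATH)."""
    with tempfile.NamedTemporaryFile("w", suffix=".rman", delete=False) as f:
        f.write(RMAN_SCRIPT)
    return subprocess.call(["rman", "target", "/", f"cmdfile={f.name}"])

if __name__ == "__main__":
    raise SystemExit(run_backup())
```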
With these settings in place, our team performed two tests of dedupe efficiency with RMAN.
First, we tested incremental backups. We created an Oracle database, populated it with 1 GB of randomly generated data, and took an initial full backup. This was followed by 6 incremental backups (simulating a full week of backups), each taken after an additional 100 MB of data was added to the database. Finally, a second full backup was performed. The results were scanned using the albscan utility, the dedupe assessment tool we include with our Albireo SDK. For this test, we saw up to 91% capacity savings.
As a second test, we again populated an Oracle database with 1 GB of randomly generated data. This time, seven full backups of the database were performed (simulating seven weeks of full backups), each after an additional 100 MB of data had been added. For this second case, we saw a 94% capacity savings.
To put it another way, Permabit Labs saw a 10x increase in storage efficiency in just the first week of RMAN backups with deduplication. With retention of 7 full backups we saw a 16x increase. That’s some substantial short-term ROI!
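For readers who want to relate the percentage figures to the Nx figures, the conversion is simply ratio = 1 / (1 - savings). The short sketch below (plain Python, not part of the study tooling) shows, for example, that roughly 90% savings corresponds to 10x and about 94% to roughly 16x.

```python
def reduction_ratio(savings_pct: float) -> float:
    """Convert a capacity-savings percentage into an Nx data reduction ratio."""
    return 1.0 / (1.0 - savings_pct / 100.0)

# Roughly 90% savings is a 10x reduction; about 94% is roughly a 16x reduction.
for pct in (90.0, 91.0, 94.0):
    print(f"{pct:.0f}% capacity savings -> {reduction_ratio(pct):.1f}x reduction")
```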
With a long-term data protection strategy, it’s easy to project savings as high as 35x. For more information on data protection and Permabit’s Albireo High Performance Data Optimization products, check out http://permabit.com/oem/products-with-albireo/data-protection/
It’s been a while since I last wrote about Deduplication and the Enterprise Database. My previous posts focused on databases in the primary storage environment, but, of course, backup was really the first use case to reap the benefits of deduplication.
For this discussion, we will focus on Oracle database environments (but keep in mind that this applies to other database environments as well). Oracle backups are performed using the RMAN utility. When backing up Oracle databases, DBAs use a rotating schedule of full backups, which back up the entire database, and incremental backups, which back up only the changes made since the previous operation. Incremental backups are more efficient to store and transmit, but DBAs cannot rely on them forever because they complicate the restoration process: each incremental backup since the last full must be replayed back into the database, increasing the time it takes to restore. For each full backup performed, an entire copy of the database gets written to a backup device, typically consuming as much (or more) additional disk space as the previous full backup. Backup data is typically groomed over time, with older copies being deleted on a periodic basis depending on company policy. Many of those company policies are driven by a combination of factors that include both data retention objectives and storage costs.
If that same data is sent offsite, over the network (for disaster preparedness), it consumes not only more disk space at the remote site, but also significant bandwidth. In this case, if that full backup is large, the time it takes to replicate it offsite can extend beyond a company’s backup window, increasing exposure to data loss in the event of catastrophic failure.
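As a rough illustration of why full backups dominate the footprint, the sketch below tallies the space a simple weekly-full, daily-incremental rotation consumes before any deduplication. The 1 TB database size, 5% daily change rate, and four-week retention are assumptions chosen purely for illustration, not numbers from the study described below.

```python
# Illustrative assumptions only: a 1 TB database, 5% daily change rate,
# one full backup plus six daily incrementals per week, four weeks retained.
DB_SIZE_TB = 1.0
DAILY_CHANGE = 0.05
WEEKS_RETAINED = 4

full_tb = WEEKS_RETAINED * DB_SIZE_TB
incr_tb = WEEKS_RETAINED * 6 * DB_SIZE_TB * DAILY_CHANGE
total_tb = full_tb + incr_tb

print(f"Full backups retained:        {full_tb:.1f} TB")
print(f"Incremental backups retained: {incr_tb:.1f} TB")
print(f"Total backup storage:         {total_tb:.1f} TB for a {DB_SIZE_TB:.0f} TB database")
```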
Permabit Labs recently completed a six-week study on the effectiveness of the data deduplication technology provided by our Albireo data optimization product offerings when used with Oracle RMAN backups. In these tests, we were able to demonstrate data reduction rates of between 83% and 94%. This study shows that by taking advantage of Albireo technology, our OEM partners can help their customers save significant costs on storage while meeting their disaster recovery objectives.
In follow-on posts, I’ll discuss the methodologies used in performing these tests as well as the implications of applying Albireo deduplication technology in the database environment.
VMware announced the acquisition of storage hypervisor software maker Virsto this week. This is a really profound move by VMware that will have widespread impact on storage efficiency. Don’t confuse Virsto with the caching companies: Virsto has implemented VM-centric thin provisioning and storage virtualization on the client, and it can be used with any vendor’s block storage!
What does this mean for VMFS? Not sure, but it is clear that with Virsto thin provisioning, VMware has gained a great platform to advance software defined storage in VM environments.
The promise of cloud storage is that it delivers users seamless “just in time” storage scalability to handle growth and enables them to respond quickly to peak loads, delivering business agility. With data growth consistently in the 50% per year range, enterprises are seriously evaluating the use of cloud storage and are taking the cloud storage step with increasing frequency. The business impact of cloud storage is a compelling financial proposition in today’s budget-constrained IT environments. For the IT consumer, shifting what was a fixed-cost capital expense to a variable-cost operating expense is financially compelling, further strengthening an already strong business agility case for adopting cloud storage.
Data optimization is the most significant recent software innovation in the storage industry, enabling organizations to save more information in a smaller physical footprint. Data optimization incorporates deduplication, compression and thin provisioning technologies that maximize storage efficiency. Nowhere does the fundamental concept of storing less data dramatically change economics more than for providers of cloud storage infrastructure.
Data optimization’s impact on storage consumption in the cloud, or in the data center, yields operating efficiency and critical business advantages:
- Reduced capital expense – In any storage environment, media is a substantial expense. Disk drives, and now SSDs, are a significant cost because capacity needs to be available in anticipation of demand. Reducing the cost to store data has a direct impact on the IT expense budget bottom line. Data optimization with a 5X data reduction drives down capital storage costs by 80% or more (see the sketch after this list)!
- Data center operating efficiency – With a 5X increase in storage efficiency, the cloud or data center’s infrastructure requirements decrease proportionately: the same data sits on 80% less optimized storage and is delivered with roughly 20% of the existing floor space, power and cooling costs.
- Additional benefits – Data optimization also reduces network bandwidth consumption, manpower requirements, the operational systems needed to support the infrastructure, and the overall management burden for the cloud service provider or data center.
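The 80% figure in the first two bullets follows directly from a 5X reduction: the same data occupies one fifth of the media. The sketch below walks through the arithmetic; the 500 TB capacity and $300-per-TB media cost are assumed numbers used only to make the example concrete.

```python
def storage_cost(raw_tb: float, cost_per_tb: float, reduction: float = 1.0) -> float:
    """Media cost after applying a data reduction ratio (1.0 = no optimization)."""
    return raw_tb * cost_per_tb / reduction

# Assumed figures for illustration: 500 TB of data at $300 per usable TB.
before = storage_cost(500, 300)
after = storage_cost(500, 300, reduction=5.0)
print(f"Unoptimized: ${before:,.0f}")
print(f"With 5X data optimization: ${after:,.0f}")
print(f"Capital savings: {(1 - after / before) * 100:.0f}%")  # 80%
```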
Data optimization is one of those technologies that can make market leaders by clearly differentiating what they offer versus their competitors. In the case of data optimization for the cloud, it delivers both economic and technology leadership! This makes data optimization in the cloud a market-disruptive technology that will drive market share gains and enable rapid business growth.
In the case of cloud storage, the first storage systems vendor to bring cloud storage products to market with integrated data optimization technology will gain a huge advantage and become deeply entrenched in the cloud provider’s infrastructure, because its products reduce storage costs and increase overall operational efficiency. What a great combination!
I read a great piece yesterday by William Blair equity analyst Jason Ader, in which he called EMC’s 2013 growth prospects into question. He cited the following causes:
- Structural Changes – Flash arrays are eating VMAX for lunch. We are seeing this first hand at Permabit: we are helping a broad range of independent flash array makers with data efficiency and yes, they are replacing VMAX. Because $1 of flash replaces something like $10 of VMAX, the impact is massive.
- Impact of Storage Efficiency Adoption – “increased adoption of storage efficiency technologies such as deduplication, compression, and thin provisioning…resulted in 10% to 15% capacity expenditure savings and delays in customer product refreshes. We (William Blair) believe that the adoption of these technologies will only accelerate, making the prospects of acceleration in storage hardware expenditure less likely.”
Jason is right on. Not only is flash eating VMAX alive, but low-cost NAS companies (soon with dedupe) are also taking chunks out of VNX/VNXe sales. In addition, storage-efficient cloud companies are hitting Isilon and Atmos right in the nose with superior value.
EMC may have finally met a foe they cannot beat – themselves. By being late to the storage efficiency table, they have enabled dozens of companies with disruptive technologies to rake in many hundreds of millions in investment, achieve billions in market cap, and create data-efficient products that are beginning to eat EMC’s offerings for lunch.
10:1 efficiency wins every time!
Backup is a critical linchpin in any business data protection strategy. Unfortunately, today’s business dynamics aren’t well aligned with the IT objective of protecting business-critical data. Data growth rates of 30-50% annually continue to add to backup complexity. In addition, businesses are more 24/7 than ever before and, as a result, are shrinking the backup window even further! The conundrum of more data and shorter backup windows is driving the need for a better backup solution.
A recent study by IDG (5/12) cited some interesting results. IDG found that 27% of respondents were running one to more than four hours over their allotted backup window, with only 56% able to back up within the window. Gartner’s Dave Russell indicated that “Backup while very important, has become very brittle. One of the reasons why we have to modernize the backup infrastructure is that it’s been woefully under-invested in for a long period of time, and it’s being tasked with doing substantially more.”
Backup solutions have employed data deduplication engines for many years to save storage space in the backup data store. Some backup deployments have deduped in a post-process mode so that backups can complete within the window. However, the associated cost of the data cache storage required prior to deduplication needs to be factored into the overall TCO. Although the cost of storage continues to decrease, data growth has expanded faster, and as a result more and more storage is being consumed. Additionally, the backup windows mentioned above are more and more critical, and any performance hit that deduplication imposes further impacts the ability of the backup solution to complete within the allocated window.
Since deduplication was first used in backup solutions, the baseline dedupe technology has evolved significantly as breakthroughs have occurred and newer platforms have enabled more capabilities to emerge. Today it is possible to run deduplication inline without latency or performance impact, which eliminates the post-process data cache and its cost. With indexing techniques that make far more efficient use of RAM, and I/O rates that can run at flash memory speeds, the dedupe engine of today is orders of magnitude faster than the predecessors used in early backup solutions.
Today our Albireo implementations can run at nearly 4 GB/s I/O rates and can scale to multiple petabytes while delivering resource efficiency of 0.0024% for memory and storage at a 4K data chunk size. Each of these superlatives by itself would deliver significant advances to any backup solution, as well as competitive differentiation. Albireo, however, delivers all three together, and the result when applied to an existing backup offering is a game changer that will enable the OEM to gain market share and revenue. Backup windows will be met readily, backups will scale to meet and exceed data growth, and the resources needed are probably already available on the appliance or server the backup solution runs on today, enabling field upgrades rather than forklift upgrades.
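To put the 0.0024% figure in perspective, it is the ratio of index memory to indexed capacity. The back-of-the-envelope sketch below pairs it with the 256 MB per 10 TB of unique data figure quoted later on this blog; treat the pairing as an interpretation of the published numbers rather than an additional measurement.

```python
# Back-of-the-envelope check: ~256 MB of index RAM for 10 TB of unique
# data deduplicated at a 4 KB chunk size.
index_ram_bytes = 256 * 2**20    # 256 MiB
unique_data_bytes = 10 * 2**40   # 10 TiB
chunk_bytes = 4 * 2**10          # 4 KiB

chunks_indexed = unique_data_bytes // chunk_bytes
ram_to_data = index_ram_bytes / unique_data_bytes
ram_per_chunk = index_ram_bytes / chunks_indexed

print(f"Chunks indexed:      {chunks_indexed:,}")         # about 2.7 billion
print(f"RAM / data ratio:    {ram_to_data * 100:.4f}%")   # about 0.0024%
print(f"Index RAM per chunk: {ram_per_chunk:.2f} bytes")  # a tenth of a byte
```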
So isn’t it time for backup to have a tune-up?
We made a few public announcements today, the first being with one of our newest OEM partners, Cibecs. Cibecs is one of our first publicly announced partners in the data protection market segment. They and others will soon be in the DP market with dedupe that performs at scale and doesn’t require a platform on steroids. Thanks for choosing us, Cibecs!
Traction for Albireo continues to accelerate in enterprise storage, enterprise flash, SMB storage and data protection environments. Today, driven by customer demand, we announced Albireo data deduplication for Windows which enables our OEM customers to address a market that will exceed $8 billion by 2015, according to IDC.
So, while deduplication is becoming a “must have” in many markets, not all deduplication is the same. Albireo continues to lap the field in terms of I/O performance, PB scale and memory/processor efficiency.
If you build storage and are looking to add dedupe to your products, give us a ring. We’ll get you to market with the best dedupe available in your next product refresh.
This week we welcomed Big Data storage company SeaChange to our Albireo partner program. SeaChange is a leader in media-centric cloud storage and will embed our Albireo Data Optimization Software into the SeaChange Universal MediaLibrary (UML), the only media storage that supports both real-time play-to-air and high-performance production.
This announcement turned heads in the storage industry, as a commonly held belief is that video content, and Big Data in general, do not dedupe very well. However, what many don’t realize is that video editing workflows create multiple copies of footage. Just as in other areas of data storage, such as databases, virtual server and VDI environments, and analytics, multiple copies of data are a necessary part of the workflow in a video editing environment. Albireo data deduplication is an ideal technology to reduce storage footprint and improve operational efficiency 5 – 10x for Big Data environments.
SeaChange is the first Big Data vendor to announce embedded deduplication technology in their media storage solution. Their tests have shown 5x capacity reduction with Albireo deduplication technology. That is enormous savings for a Big Data storage company! Congratulations to SeaChange! We look forward to working with you to create a sea change in storage efficiency and massive storage savings for your media customers.
Recently, Permabit CEO Tom Cook and I had the opportunity to sit down with a VP of Research who specializes in storage deduplication at a leading global technology research firm. We talked for a while about Permabit’s data optimization technologies and all the success Permabit is seeing with OEMs in the Flash, NAS and SAN enterprise storage markets. Then we briefed the analyst on the developments in our ready-to-run Albireo VDO solution for Linux-based storage OEMs. It was at this point that the conversation turned in a direction I hadn’t expected – I was surprised to hear the analyst suggest we should take Albireo to… Enterprise Backup Vendors?
“Don’t they already have dedupe?” asked Tom Cook.
“Well yes,” replied the analyst, “but every one of them, even the largest and most established players, has issues with performance, resource utilization and/or scalability when compared to the leading data protection appliance companies. From what you’ve said about your technology, I can see a fit in any of these product offerings. You’d improve any product out there in at least one key metric by an order of magnitude.”
An order of magnitude? That’s interesting, I thought. This analyst talks with IT organizations AND vendors on a daily basis, and we’ve known each other for some time; I’ve come to trust his advice over the years. But it’s frankly difficult to check out the performance, resource utilization and scalability limitations of enterprise backup vendors without kicking off a massive research project. If you’re in that space, or even if you’re in another area where you think you’ve got dedupe covered, maybe you should be asking these types of questions:
- How fast is my solution with/without dedupe? If your ingest rates drop substantially due to dedupe – Permabit Albireo can help!
- How memory intensive is dedupe in your solution today? If you’re consuming more than 256 MB of memory to dedupe 10 TBs of unique data – Permabit Albireo can help!
- How scalable is my solution – can I dedupe across multi-system pools of storage or is my solution limited to addressing specific media silos? Can I scale beyond a single server? 2 servers? 16 servers? 128? If you can’t utilize commodity servers to scale to PBs of storage – Permabit Albireo can help!
To learn more about the capabilities of Permabit Albireo versus other deduplication technologies, check out this series of articles by Wayne Salpietro, Permabit Director of PMM and be sure to visit our newly redesigned website at www.permabit.com.
Keep an eye out for disruptive data-optimized storage products in 2012.
Solutions built with Albireo and delivered by leading hardware, software and service providers are on the way. From SOHO/SMB appliances to flash-based, high-performance enterprise storage and cloud, Permabit Albireo software massively improves the performance and efficiency of data creation, transmission and storage.
Explore the examples of Albireo deployments on our site, see how you can license our optimization products and let us know how we can help you to optimize your company’s products.