Permabit Albireo Data Optimization Software
Virtual Data Optimizer (VDO) for Linux
Rampant data growth is the IT industry’s single most important issue, affecting budgets, operating costs, floor space and of course capital expenditure through the amount of data created and its associated cost. According to IDC, the amount of electronic data created is expected to reach 35 zettabytes by 2020.¹ The conundrum facing every business is how to afford to store, analyze, manage and house this data.
With recent advancements in data optimization, IT organizations are exploring this technology as a more efficient means of housing information. In fact, another study by ESG shows that data efficiency is the number one priority of storage professionals in IT today. As a direct result of this increasing customer demand, storage manufacturers and online service providers are now offering data optimization capabilities. Of the available data optimization technologies, data deduplication is the one with the greatest potential to deliver substantial and recurring impact on the cost and manageability of data growth.
The challenge for many of these manufacturers and providers is to incorporate deduplication technology that can be leveraged across storage platforms while meeting market timing demands and not derailing other high priority R&D projects. Several have made investments in deduplication technology specifically to address problems associated with backup, only to find the resulting solutions were not suitable for primary storage workflows or were not extendable to new technologies.
Albireo Virtual Data Optimizer (Albireo VDO) provides the fastest route to market for deduplication on systems running the popular Linux operating system. Powered by Permabit’s Albireo Data Optimization Software, Albireo VDO is a complete, ready-to-run solution that offers block-layer deduplication services for both Linux-based storage OEMs and online service providers.
Unbridled information growth is forcing all organizations to re-think their storage strategies in light of flat IT budgets. As a result, the $25+ billion storage market is facing a sea change that promises to reset the competitive landscape. Technologies that substantially reduce overall storage costs are “must have” requirements. OEMs that provide the highest storage efficiency and most substantial top-line and bottom-line impacts are poised for the greatest success.
Low-End NAS Market — Big Storage is Coming to Town
The market for Linux-based OEM storage has grown 300% over a three-year period. “Big Storage” providers recognize this growth as an opportunity and are moving down-market, using data optimization technologies to deliver highly efficient storage at extremely low realized costs per GB. Gartner defines this as the Low-end Enterprise NAS market segment, where Linux-based storage dominates. Deduplication technology is allowing “Big Storage” to be increasingly competitive in this cost-sensitive market. The challenge facing today’s value-oriented, Linux-based storage OEMs is this: how can they continue to leverage open source to compete effectively with larger competitors who have data optimization capabilities?
Enterprise Flash — New Solutions, New Challenges
At the same time, on the high-end of the storage market, Enterprise Flash-based appliances have rapidly evolved to become the performance leader in IT. An array of Enterprise Flash typically accelerates overall application performance due to its I/O capabilities when compared to spinning disk. Many of these appliances also provide caching capabilities that automatically move inactive data to traditional hard drives. Mission-critical applications in the enterprise today such as database indexing, online transaction processing, desktop and server virtualization, front-end Web serving, and key infrastructure offerings (such as email and messaging) have been the targets of Enterprise Flash deployments.
Over the past few years, the cost of high-performance storage has dropped significantly relative to performance density. The main driver for this cost reduction has been the emergence of flash-based solid-state devices (SSDs) in enterprise storage appliance configurations. When deduplication technologies are applied to Enterprise Flash environments, the effective costs become even more aligned with spinning disk storage. In addition, deduplication optimizes flash write operations because less data is being written relative to the amount of data stored, providing incremental life cycles to flash and improving data safety. Flash vendors are beginning to offer deduplication in these environments today, and deduplication is the enabler that closes the cost and data safety gaps that have previously inhibited more widespread adoption.
Cloud Computing — A Growth Opportunity
As mentioned above, the amount of electronic data stored worldwide is on track to exceed 35 zettabytes by 2020, and as the data footprint expands, the process of storing and managing information becomes more complex. By 2015, nearly 20% of this information will be “touched” by (and as much as 10% maintained in) the cloud.² Cloud-based server and desktop Virtual Machine (VM) providers, in particular, stand to benefit from these growth trends. Gartner estimates that 5% of all VMs will be hosted by cloud providers by 2014.³ Infrastructure providers have emerged with solutions to help manage and protect this massive pool of desktops and servers.
Ready-to-Run Deduplication Software
Market dynamics justify storage manufacturer and service provider efforts to develop or integrate comprehensive, sub-file-level deduplication capabilities into their existing single-tier storage solutions, while providing a viable roadmap to tomorrow’s universal storage solutions. The overarching requirement is for data optimization to increase storage efficiency without incurring a performance penalty. All differentiating features of the storage platform must remain intact, with no compromises in functionality, data ingestion, or data access performance.
Key requirements involve the following areas:
- Performance — Data optimization must be extremely efficient and maintain a level of performance that does not impede overall storage performance on read and write operations. Storage vendors have made billion dollar R&D investments to optimize their storage performance as a means of differentiating their offerings.
- Feature Set Compatibility — Data optimization software must operate in conjunction with existing storage software and not interfere with or impede existing features. Storage vendors have invested millions into storage features that are vital to the operation and market value of their respective storage solutions.
- Resource Efficiency — Cost is king, particularly in the Low-end NAS appliance space. Accordingly, data optimization software cannot increase resource requirements that then impact that cost.
¹ Extracting Value from Chaos, IDC, June 2011
² Extracting Value from Chaos, IDC, June 2011
³ Virtual Machines Will Slow in the Enterprise, Grow in the Cloud, Gartner, March 2011
Albireo VDO Data Optimization Software
Albireo VDO provides ready-to-run data deduplication capabilities for Linux-based storage, enabling OEMs to continue leveraging all of their storage solutions’ existing features, including existing Linux file systems, storage virtualization features, and data protection capabilities. Because Albireo VDO uses Permabit’s patented Albireo deduplication technology, it avoids the costs associated with today’s high-end enterprise deduplication solutions, which typically require large amounts of system memory and proprietary PCI Express cards to achieve even a fraction of Albireo’s scalability and performance.
Albireo’s high performance data deduplication provides a truly competitive feature set for mixed applications and use cases. Albireo VDO’s straightforward block-level, content-agnostic approach to data optimization provides an effortless solution that is both transparent and non-disruptive to end-user customers. With Albireo’s record-breaking performance, Linux-based storage OEMs can extend their deduplication capabilities and out-compete even the high-end proprietary storage players by providing data optimization capabilities for mission-critical application storage while effectively leveraging Linux open source to maximize value. Since Albireo VDO is implemented in terms of the Linux device mapper, it provides the perfect solution for Linux-based storage providers who wish to leverage their existing Linux integration investments, increase margins, and accelerate time-to-market with leading-edge data optimization.
Albireo VDO Architecture
The Albireo index provides the foundation for the Albireo VDO solution. The single greatest challenge when implementing a deduplication system is in rapidly identifying duplicate information across a storage pool that can contain hundreds of billions of items. To achieve acceptable levels of performance the system must, for each new piece of data, quickly determine if that piece is identical to any previously stored piece of data. If a match is found, the storage system can then internally reference the existing item to avoid storing the same information a second time. The Albireo Index Engine can identify duplicates across large storage pools in memory more than 99.95% of the time, eliminating the largest deduplication bottleneck, disk-based fetches. Index lookup averages just 5 microseconds on flash or 10 microseconds on traditional hard drive-based storage – orders of magnitude faster than other deduplication solutions. This enables Albireo VDO to support sustainable ingestion rates of over 1 GB/sec with a single 6-core processor.
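The duplicate-identification step described above can be illustrated with a minimal sketch. This is not the Albireo Index Engine, whose internals the paper does not describe; it is a generic fingerprint lookup that assumes SHA-256 as the fingerprint and a 4 KB block size, and the name find_duplicates is hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the paper does not state one


def find_duplicates(blocks):
    """For each block, return the position of an earlier identical block, or None."""
    seen = {}      # fingerprint -> position of the first occurrence
    matches = []
    for position, block in enumerate(blocks):
        # SHA-256 is an assumption; any collision-resistant hash would do here.
        fingerprint = hashlib.sha256(block).digest()
        matches.append(seen.get(fingerprint))
        seen.setdefault(fingerprint, position)
    return matches


blocks = [b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE, b"a" * BLOCK_SIZE]
print(find_duplicates(blocks))  # [None, None, 0]
```

A plain dictionary holds every fingerprint in memory, which would not scale to the hundreds of billions of items mentioned above; that scaling problem is what a purpose-built index is meant to address.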
The Albireo VDO Linux kernel module is implemented in terms of the Linux device-mapper. In the Linux kernel, the device-mapper serves as a generic framework to map one block device onto another. It forms the foundation of LVM2 and EVMS, software RAIDs, dm-crypt disk encryption, and additional features such as file-system snapshots. Device-mapper works by processing data passed in from a virtual block device, in this case Albireo VDO, and then passing the resultant data on to another block device.
Albireo VDO provides deduplication and is exported as a normal block device that can be used directly as block storage or presented through one of the many available Linux file systems (such as ext3 and XFS). Before writing new blocks to the underlying disk device (or at scheduled intervals if post-processing is in effect), duplicates are found with the Albireo deduplication technology. Any unique blocks are then (optionally) compressed and written to disk. Data optimization occurs, increasing the overall capacity of the underlying device.
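The inline write path described above (find duplicates, then optionally compress and store only unique blocks) might look roughly like the following toy model. It is a sketch under stated assumptions, not Albireo VDO's implementation: the class DedupBlockStore, the use of SHA-256 and zlib, and the in-memory data structures are all illustrative choices, and post-processing mode is not modeled.

```python
import hashlib
import zlib


class DedupBlockStore:
    """Toy block store: deduplicate on write, optionally compress unique blocks."""

    def __init__(self, compress: bool = True):
        self.compress = compress
        self.by_fingerprint = {}  # fingerprint -> physical slot
        self.physical = []        # physical slots holding (is_compressed, payload)
        self.logical_map = {}     # logical block address -> physical slot

    def write(self, lba: int, block: bytes) -> None:
        fingerprint = hashlib.sha256(block).digest()
        slot = self.by_fingerprint.get(fingerprint)
        if slot is None:
            # Unique block: optionally compress it, then store it once.
            payload = zlib.compress(block) if self.compress else block
            self.physical.append((self.compress, payload))
            slot = len(self.physical) - 1
            self.by_fingerprint[fingerprint] = slot
        # Duplicate or not, the logical address now references the stored copy.
        self.logical_map[lba] = slot

    def read(self, lba: int) -> bytes:
        is_compressed, payload = self.physical[self.logical_map[lba]]
        return zlib.decompress(payload) if is_compressed else payload


store = DedupBlockStore()
store.write(0, b"x" * 4096)
store.write(1, b"y" * 4096)
store.write(2, b"x" * 4096)               # duplicate of the block at address 0
print(len(store.physical))                # 2
print(store.read(2) == b"x" * 4096)       # True
```

Three logical blocks occupy two physical slots. The sketch never reclaims a slot when a logical address is overwritten; a real system would need reference counting or a similar mechanism.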
In addition to deduplication, Albireo VDO provides thin provisioning services for Linux. Thin provisioning allocates physical volume or file system capacity as applications write data, rather than pre-allocating all physical capacity at the time of provisioning. This allows space savings to be realized from the deduplication process, effectively making more virtual space accessible than is physically available.
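As a rough illustration of allocate-on-write, the sketch below models a volume that advertises more logical blocks than it has physical blocks and assigns physical space only when an address is first written. The ThinVolume class and its sizes are hypothetical and are not drawn from Albireo VDO.

```python
class ThinVolume:
    """Toy thin-provisioned volume: physical blocks are allocated on first write."""

    def __init__(self, logical_blocks: int, physical_blocks: int):
        self.logical_blocks = logical_blocks
        self.physical_blocks = physical_blocks
        self.mapping = {}  # logical block address -> physical block number

    def write(self, lba: int) -> int:
        """Record a write to lba and return the physical block backing it."""
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("logical block address out of range")
        if lba not in self.mapping:
            if len(self.mapping) >= self.physical_blocks:
                raise RuntimeError("physical capacity exhausted")
            self.mapping[lba] = len(self.mapping)  # next free physical block
        return self.mapping[lba]

    def allocated(self) -> int:
        return len(self.mapping)


volume = ThinVolume(logical_blocks=1000, physical_blocks=100)
volume.write(0)
volume.write(999)
volume.write(0)            # rewriting an address allocates nothing new
print(volume.allocated())  # 2
```

The volume presents 1,000 addressable blocks while consuming only two physical blocks. The trade-off is that writes can fail once physical capacity runs out, so free space has to be monitored.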
Figure 1: Albireo VDO Architecture
Albireo VDO Deduplication Savings
Deduplication savings are highly dependent on the way that data is used (workflow) as well as the type of data being processed. Albireo has been tested on a wide range of popular data types including common office productivity files, backups, and VMware system images (Table 1). Albireo achieved the best results with VMware images, with a deduplication rate as high as 99%. Excellent results were also achieved with Exchange data and office files: Albireo reduced Exchange data by 86% and office files by 33%. Across the board, Albireo deduplication delivers massive cost savings.
Table 1: Albireo VDO Deduplication Savings
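The reduction percentages quoted above can be converted into an effective capacity multiplier, which is often easier to compare against storage cost. The helper below is a hypothetical illustration; it assumes each percentage is the fraction of data eliminated, which the text states for Exchange data and office files and which is assumed here for the VMware figure.

```python
def capacity_multiplier(reduction_percent: float) -> float:
    """Logical data that fits per unit of physical storage at a given reduction."""
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 100.0 / (100.0 - reduction_percent)


for name, reduction in [("VMware images", 99), ("Exchange data", 86), ("Office files", 33)]:
    print(f"{name}: {capacity_multiplier(reduction):.1f}x")
# VMware images: 100.0x
# Exchange data: 7.1x
# Office files: 1.5x
```

The relationship is strongly non-linear: moving from 86% to 99% reduction raises the multiplier from about 7x to 100x.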
Albireo VDO Resource Efficiency
Albireo VDO requires a single, dedicated Intel (or compatible) CPU core, 350 MB of memory and 52 GB of disk space to address deduplication requirements for a 1 TB storage partition. Efficiency is improved for larger configurations. For example, 32 GB of memory can be used to support a 256 TB storage partition (0.13 GB of RAM/TB of disk). Albireo VDO also requires 42 GB of physical storage for indexing along with 1 GB of physical storage per TB of logical storage for handling metadata.
*Assumes 10x logical storage
Table 2: Albireo VDO Resource Efficiency
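The resource figures above can be checked with a little arithmetic. The sketch below rests on one interpretation that the text does not state outright: that the 52 GB of disk for a 1 TB partition is the sum of the 42 GB index and 1 GB of metadata for each of 10 TB of logical storage (the 10x assumption in the table note). The function names are hypothetical, and the index size is evaluated only for the 1 TB case because the paper gives no figure for other sizes.

```python
INDEX_GB = 42                    # physical storage for the index (from the text)
METADATA_GB_PER_LOGICAL_TB = 1   # metadata per TB of logical storage (from the text)
LOGICAL_RATIO = 10               # "Assumes 10x logical storage" (table note)


def disk_overhead_gb_for_1tb() -> int:
    """Disk overhead for a 1 TB physical partition under the assumptions above."""
    logical_tb = 1 * LOGICAL_RATIO
    return INDEX_GB + METADATA_GB_PER_LOGICAL_TB * logical_tb


def ram_gb_per_tb(ram_gb: float, partition_tb: float) -> float:
    """Memory required per TB of storage."""
    return ram_gb / partition_tb


print(disk_overhead_gb_for_1tb())  # 52
print(ram_gb_per_tb(32, 256))      # 0.125
```

The first result matches the 52 GB quoted for a 1 TB partition, and the second rounds to the 0.13 GB of RAM per TB quoted for the 256 TB configuration.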
Only Permabit Albireo VDO enables OEMs to rapidly deliver high performance deduplication for Linux-based storage solutions. Albireo VDO is a plug-and-play OEM solution that flexibly integrates within the constraints of existing storage architectures and leverages existing significant R&D investments.
Permabit is an expert in the development of highly scalable, next-generation storage solutions that deploy full inline data deduplication. By offering the industry’s first embedded OEM data optimization solution, Permabit is enabling Linux-based storage OEMs to compete effectively with breakthrough technology. Emerging storage vendors can capitalize on this major market shift by introducing new storage solutions that take market share away from incumbents. Leading storage vendors can leverage Albireo to further solidify their market position.
The Permabit track record in storage expertise and innovation is without peer for a company of its age and size. Permabit has a total of 37 patents filed and 28 patents granted, all in the storage-related field. Its MIT-educated engineers have earned multiple awards for product innovation. Since 2000, Permabit has worked to develop the latest storage technology to address the challenges of highly scalable storage. With the release of Albireo, Permabit has made its core intellectual property for data optimization available for the first time as an OEM offering to other manufacturers and service providers. The Albireo architecture is a proven technology that has been implemented in production environments as a core technology in the Permabit Enterprise Archive and Cloud Storage.
Data centers are dealing with explosive data growth and flat budgets. As a result, IT organizations are making storage purchase decisions based on storage efficiency and total storage costs versus simply buying “cheap capacity.” Storage vendors and online service providers who will grow and flourish in today’s business environment must adapt their existing storage solutions and/or introduce new offerings that provide greater storage efficiency and reduced operating cost.
Albireo VDO delivers advanced data optimization technology that substantially reduces effective storage costs without sacrificing existing storage features. Once deployed, Albireo VDO provides unsurpassed performance and exceptional deduplication efficiency. In addition, the process of storage allocation is greatly simplified with the introduction of thin provisioning capabilities.
By delivering a virtual block device that “just works” out-of-the-box with existing file systems and data management features, Albireo VDO offers the fastest possible route to market for Linux-based storage OEMs, both manufacturers and online service providers. Whether the OEM is delivering NAS, SAN, or unified storage solutions based on traditional hard disks or flash-based storage, Albireo VDO provides the ideal ready-to-run data efficiency solution with leading capabilities for powerful competitive differentiation.
Permabit is a recognized leader in data efficiency technology. We enable OEMs to leverage their R&D investment, increase margin, accelerate time to market and achieve competitive advantage. Permabit Albireo software massively improves performance and efficiency of data creation, transmission and storage. Solutions built with Albireo are being delivered by leading hardware, software and service providers.
Find Out More
To learn more about the Permabit Albireo technology, or to license our products, visit our website at www.permabit.com or call us directly at 617.252.9600.