High Performance Index Engine
Albireo uses a hash function to identify duplicate segments of data. Searching the resulting content fingerprints, which can quickly number in the billions (or trillions), is the bottleneck for all data deduplication solutions. As the number of fingerprints increases, the associated hash table grows, quickly exceeding available memory and spilling over to disk, which causes large processing delays. This bottleneck is the Achilles' heel of every deduplication system, limiting both scalability and performance.
The Albireo index uniquely addresses this challenge with patented delta indexing, sparse indexing, and least recently used (LRU) discard technologies. Delta indexing is a hybrid memory and disk technology that requires just 0.1 bytes of RAM per hash fingerprint. Assuming a 64 KB chunk size, the Albireo index can deduplicate 640 TB of unique data with 1 GB of RAM. Deduplication capacity is further increased by sparse indexing and LRU algorithms, which keep recently used fingerprints in memory and discard the least recently used ones. As a result, the Albireo index can deduplicate an amazing 20 PB of data with just 16 GB of RAM at a 128 KB chunk size (Table 1).
Table 1. Albireo index dedupe capacity based on chunk size and available RAM
| Chunk Size | 1 GB RAM | 4 GB RAM | 8 GB RAM | 16 GB RAM |
|---|---|---|---|---|
| 4 KB Chunk | 40 TB | 160 TB | 320 TB | 640 TB |
| 64 KB Chunk | 640 TB | 2.6 PB | 5.1 PB | 10.2 PB |
| 128 KB Chunk | 1280 TB | 5.1 PB | 10.2 PB | 20.5 PB |
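The figures in Table 1 can be reproduced with simple arithmetic. Divide the RAM by the quoted 0.1 bytes per fingerprint to get the number of fingerprints the index can hold, then multiply by the chunk size to get the amount of unique data covered. The sketch below makes two assumptions: decimal units (1 KB = 10^3 bytes, 1 GB = 10^9 bytes) and one fingerprint per chunk. The function name is illustrative only.

```python
# Back-of-envelope check of Table 1.
# Assumptions: 0.1 bytes of RAM per fingerprint (i.e. 10 fingerprints per byte),
# one fingerprint per chunk, and decimal units.
KB, GB, TB = 10**3, 10**9, 10**12
FINGERPRINTS_PER_BYTE_OF_RAM = 10  # the inverse of 0.1 bytes per fingerprint

def dedupe_capacity_bytes(ram_bytes: int, chunk_bytes: int) -> int:
    """Unique data an index with this RAM budget can track (illustrative)."""
    fingerprints = ram_bytes * FINGERPRINTS_PER_BYTE_OF_RAM
    return fingerprints * chunk_bytes

for chunk_kb in (4, 64, 128):
    row = [dedupe_capacity_bytes(ram_gb * GB, chunk_kb * KB) // TB
           for ram_gb in (1, 4, 8, 16)]
    print(f"{chunk_kb:>3} KB chunk: " + ", ".join(f"{tb:,} TB" for tb in row))
```

For example, 16 GB of RAM at a 128 KB chunk size gives 20,480 TB, or roughly 20.5 PB, which matches the last cell of the table. The table's values appear to follow this linear model directly.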
The breakthroughs in the Albireo index result in an unparalleled combination of performance and extreme scalability. Even in the largest petabyte-scale storage environments, the Albireo index provides data deduplication with no performance penalty to the storage system.
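The least recently used (LRU) discard idea can be illustrated with a generic bounded fingerprint map. This is a textbook LRU policy built on Python's `collections.OrderedDict`, offered only as a sketch. It is not Albireo's patented discard algorithm, and `LRUFingerprintIndex` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Optional

class LRUFingerprintIndex:
    """Hypothetical bounded fingerprint -> location map with LRU discard.

    Generic LRU sketch for illustration; not Albireo's patented scheme.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[bytes, int]" = OrderedDict()

    def lookup(self, fingerprint: bytes) -> Optional[int]:
        """Return the stored location, refreshing recency on a hit."""
        location = self._entries.get(fingerprint)
        if location is not None:
            self._entries.move_to_end(fingerprint)  # now most recently used
        return location

    def insert(self, fingerprint: bytes, location: int) -> None:
        """Record a fingerprint, discarding the least recently used entry when full."""
        self._entries[fingerprint] = location
        self._entries.move_to_end(fingerprint)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # oldest entry is at the front

    def __len__(self) -> int:
        return len(self._entries)
```

This design has a trade-off. Once a fingerprint is discarded from memory, it can no longer be matched, so a later copy of that chunk is stored again. Bounding memory this way gives up some savings on rarely repeated data in exchange for keeping lookups in RAM.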
Data Safety
Albireo addresses the data safety issues traditionally associated with data optimization solutions by never altering the data. Unlike a point solution or appliance, Albireo acts as an advisory service to the storage stack, so it does not modify data or interfere with read/write processes. Albireo analyzes the data and advises the storage system whether it is a duplicate. If it is not, the storage system does nothing to the data. If it is, the storage system replaces the duplicate with a pointer to the original data, releasing storage space and reducing associated costs. The result is less storage consumed, lower power and cooling requirements, a smaller data center footprint, and fewer storage purchases to meet anticipated data growth. Together, these effects reduce CAPEX and OPEX, providing budget relief.
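The advisory exchange described above can be sketched as follows. The sketch assumes SHA-256 fingerprints and uses a plain in-memory dictionary in place of the real index. `Advice`, `AdvisoryIndex`, and `block_map` are hypothetical names, not Albireo's API. The index only returns advice; the storage-system side of the loop decides whether to repoint a block.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Advice:
    """Hypothetical answer returned to the storage system for one chunk."""
    is_duplicate: bool
    original_location: Optional[int] = None  # where the first copy lives

class AdvisoryIndex:
    """Hypothetical advisory index: it never reads or writes user data itself."""

    def __init__(self) -> None:
        self._locations: Dict[bytes, int] = {}  # fingerprint -> location

    def advise(self, chunk: bytes, location: int) -> Advice:
        fingerprint = hashlib.sha256(chunk).digest()
        original = self._locations.get(fingerprint)
        if original is not None:
            return Advice(True, original)
        # First time this content is seen: remember where the storage system put it.
        self._locations[fingerprint] = location
        return Advice(False)

# Storage-system side (hypothetical): it alone decides what to do with the advice.
index = AdvisoryIndex()
block_map: Dict[int, int] = {}  # logical block -> block holding the data
for logical, data in enumerate([b"alpha", b"beta", b"alpha"]):
    advice = index.advise(data, location=logical)
    if advice.is_duplicate:
        block_map[logical] = advice.original_location  # point at the original; copy can be freed
    else:
        block_map[logical] = logical  # keep the chunk where it was written
```

The chunk bytes are never rewritten: a duplicate only changes which stored copy a logical block points to. Production systems may also compare the bytes before freeing a copy to guard against hash collisions. That check is omitted here.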
With Albireo, the storage system writes data in its native form, just as it always would. It is never changed or modified, nor does it need an interpreter to reconstitute it. Albireo relies on the existing features of the OEM filesystem or block layer, with no third-party dependency. Data can still be read if the data optimization functionality is disabled, removed, or fails. Data remains safe and protected by the OEM storage system. Only Albireo can make these important data safety claims – an industry exclusive.
Summary
Comparing Figures 2 and 3, it is clear that Albireo does not function in the data read path. Integrating Albireo reduces the storage capacity consumed without endangering data safety or impacting read performance.
Performance
There are three aspects of performance that should be considered when looking to implement storage deduplication: deduplication throughput, deduplication latency, and deduplication savings. Albireo achieves unparalleled performance in each of these key metrics.
Deduplication Throughput
Albireo deduplication throughput was measured in a test environment at 11 GB/sec on a single processor core with a 64 KB chunk size and hardware-based hashing. While hardware hashing provides the best performance, software SHA-256 hashing can still exceed 700 MB/sec on a single modern multi-core processor, even with more granular 4 KB chunks. Performance is sustained as data is added to the system and scales linearly with additional cores to gigabytes per second. In a 16-node environment using grid (scale-out) technology, Albireo throughput was measured at 400 GB/sec with a 128 KB chunk size and hardware-based hashing. Albireo deduplication throughput is over five hundred times greater than that of other deduplication technologies, which achieve only 50-100 MB/sec.
Table 2: Albireo deduplication throughput with hardware hashing
| Chunk Size | Single Processor, 1 Core | 16-Node Grid |
|---|---|---|
| 4 KB | 700 MB/sec | 13 GB/sec |
| 64 KB | 11 GB/sec | 200 GB/sec |
| 128 KB | 22 GB/sec | 400 GB/sec |
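Dividing each single-core figure in Table 2 by its chunk size gives the number of chunks processed per second, which is also the number of index lookups. This sketch assumes decimal units and one fingerprint lookup per chunk.

```python
# Chunks (index lookups) per second implied by the single-core column of Table 2.
# Assumptions: decimal units and one fingerprint lookup per chunk.
KB, MB, GB = 10**3, 10**6, 10**9

SINGLE_CORE_THROUGHPUT = {  # chunk size (bytes) -> bytes/sec, from Table 2
    4 * KB: 700 * MB,
    64 * KB: 11 * GB,
    128 * KB: 22 * GB,
}

def chunks_per_second(throughput_bytes_per_sec: int, chunk_bytes: int) -> float:
    """Number of chunks the engine must fingerprint and look up each second."""
    return throughput_bytes_per_sec / chunk_bytes

for chunk, rate in SINGLE_CORE_THROUGHPUT.items():
    per_sec = chunks_per_second(rate, chunk)
    budget_us = 1e6 / per_sec  # microseconds of one core's time per chunk
    print(f"{chunk // KB:>3} KB: {per_sec:,.0f} chunks/sec, {budget_us:.1f} us per chunk")
```

All three single-core entries work out to roughly 172,000 to 175,000 chunks per second. This suggests the per-core chunk rate is roughly constant and the quoted throughput grows with chunk size. At these rates, each chunk gets about 5.7 to 5.8 microseconds of one core's time.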
Deduplication Latency
The Albireo index addresses perhaps the most challenging aspect of data optimization: the time it takes the deduplication engine to determine whether an entry is a duplicate. Because of its highly efficient memory use, the Albireo index identifies duplicates in memory more than 99.95% of the time. This eliminates the largest bottleneck in deduplication solutions, which is having to read from disk. Index lookups average less than 10 microseconds, orders of magnitude faster than other deduplication solutions. This enables sustained ingestion rates of 11 GB/sec on a single processor core (64 KB chunk size and hardware-based hashing), scaling linearly across multiple cluster nodes with Albireo grid technology.
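A simple expected-value model shows why the in-memory hit rate matters so much. Each lookup costs the memory time on a hit and a disk read on a miss. The 1 microsecond memory cost and 5 ms disk cost below are illustrative assumptions, not measured Albireo figures. The model also ignores queueing and caching effects.

```python
def mean_lookup_us(hit_rate: float, memory_us: float, disk_us: float) -> float:
    """Expected lookup time (microseconds) when misses fall through to a disk read."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * memory_us + (1.0 - hit_rate) * disk_us

# Illustrative costs only (assumptions, not Albireo measurements):
MEMORY_US = 1.0      # in-memory index probe
DISK_US = 5_000.0    # one random disk read, about 5 ms

for hit_rate in (0.90, 0.99, 0.9995):
    print(f"hit rate {hit_rate:.2%}: {mean_lookup_us(hit_rate, MEMORY_US, DISK_US):.1f} us")
```

Under these assumed costs, hit rates of 90%, 99%, and 99.95% give mean lookup times of about 501, 51, and 3.5 microseconds respectively. Even at a 1% miss rate, almost all of the average comes from the disk term.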
Deduplication Savings
The final key area of performance is the deduplication savings ratio, which is highly dependent on the data being processed. Albireo has been tested on a wide range of popular data types, including common office productivity files, Microsoft Exchange data, and VMware system images. Table 3 shows results for user directories, tar backups, and VMware images. Albireo achieved the best results with VMware images, reducing them by as much as 97%. Excellent results were also achieved with Exchange data and office files: Albireo reduced Exchange data by 86% and office files by 33%. Across the board, Albireo deduplication delivers massive cost savings.
Table 3: Permabit Optimization Results
| Sample Data | Dedupe Ratio, 4 KB Chunks | Dedupe Ratio, 64 KB Chunks |
|---|---|---|
| User Directories, Fixed Chunk | 2.8 : 1 | 2.7 : 1 |
| User Directories, Variable Chunk | 3.9 : 1 | 3.8 : 1 |
| Tar Backups, Fixed Chunk with LZ77 compression | 25.1 : 1 | 14 : 1 |
| VMware Images, Fixed Chunk | 36.3 : 1 | 26.4 : 1 |
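The percentages in the text and the N:1 ratios in Table 3 are two views of the same quantity. A ratio of N:1 means the stored data is 1/N of the original, so the fraction saved is 1 - 1/N. The sketch below converts in both directions; the function names are illustrative.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of capacity saved for an N:1 deduplication ratio (1 - 1/N)."""
    return 1.0 - 1.0 / ratio

def ratio_from_savings(savings: float) -> float:
    """N:1 ratio implied by a fractional reduction, e.g. 0.86 -> about 7.1."""
    return 1.0 / (1.0 - savings)

# Ratios from Table 3: (4 KB chunks, 64 KB chunks)
TABLE_3 = {
    "User Directories, Fixed Chunk": (2.8, 2.7),
    "User Directories, Variable Chunk": (3.9, 3.8),
    "Tar Backups, Fixed Chunk with LZ77 compression": (25.1, 14.0),
    "VMware Images, Fixed Chunk": (36.3, 26.4),
}

for name, (r4, r64) in TABLE_3.items():
    print(f"{name}: {savings_from_ratio(r4):.1%} (4 KB), {savings_from_ratio(r64):.1%} (64 KB)")

for label, pct in (("Exchange", 0.86), ("Office files", 0.33)):
    print(f"{label}: {ratio_from_savings(pct):.1f} : 1")
```

For example, the 36.3:1 ratio for VMware images works out to about 97.2% savings, consistent with the 97% quoted above. The 86% reduction for Exchange data corresponds to roughly 7.1:1, and the 33% reduction for office files to roughly 1.5:1.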
Process
The three integration options for Albireo are shown below. In each, it is important to note that Albireo always operates outside of the data read path (shown in yellow) and does not alter data written to disk. This avoids any reassembly performance penalty and protects against data corruption and lock-in. The OEM always controls the data write path, enabling them to manage data integrity and maximize performance.
Inline Processing
Albireo can operate 100% inline and process incoming segments in real time. Inline processing is a good fit for applications that benefit from immediate data optimization and for which slight latency can be masked by parallelism and write caching. When performing inline, Albireo intercepts the write path (shown in red) for only microseconds to determine whether data is a duplicate, and then passes advice back to the storage software. Remember, Albireo never operates in the read path (shown in yellow), so data access is never impaired.
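An inline integration can be sketched as a write path that waits for the dedupe lookup before committing a chunk. `InlineVolume` and its methods are hypothetical, SHA-256 is assumed as the fingerprint, and a dictionary stands in for the Albireo index. The sketch also shows that reads go straight to the block map without consulting the index.

```python
import hashlib
from typing import Dict, List

class InlineVolume:
    """Hypothetical storage volume that consults a dedupe index before each write."""

    def __init__(self) -> None:
        self.physical: List[bytes] = []      # chunks actually stored
        self.block_map: Dict[int, int] = {}  # logical block -> physical slot
        self._index: Dict[bytes, int] = {}   # stand-in for the dedupe index

    def write(self, logical: int, chunk: bytes) -> None:
        # Synchronous advice: the write waits for this lookup (the inline latency).
        fingerprint = hashlib.sha256(chunk).digest()
        slot = self._index.get(fingerprint)
        if slot is None:
            # New content: the storage system writes it in native form.
            self.physical.append(chunk)
            slot = len(self.physical) - 1
            self._index[fingerprint] = slot
        # Duplicate or not, the logical block points at a stored copy.
        self.block_map[logical] = slot

    def read(self, logical: int) -> bytes:
        # The read path never touches the index.
        return self.physical[self.block_map[logical]]
```

Because the advice arrives before the write completes, a duplicate chunk never needs to be written at all. The cost is the lookup time added to each write.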
Post-Process

Figure 2. A post-processing integration is a good match when application performance is an absolute must.
Albireo can also operate as a post-processing function, if desired by the OEM. In post-process mode, the storage stack reads data back from disk, either on demand or on a fixed schedule. Albireo identifies duplicates and provides update information to the storage application.
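A post-process pass can be sketched as a scan over data already on disk that returns remapping updates for the storage application to apply. `post_process_scan` is a hypothetical helper and SHA-256 is assumed as the fingerprint. The per-scan dictionary is a simplification; a real index would persist between scans.

```python
import hashlib
from typing import Dict, List, Tuple

def post_process_scan(physical: List[bytes], block_map: Dict[int, int]) -> List[Tuple[int, int]]:
    """Read stored chunks back and return (logical_block, original_slot) updates.

    Hypothetical helper: it only reports duplicates; the storage application
    decides whether to apply the updates and reclaim the freed slots.
    """
    first_slot: Dict[bytes, int] = {}  # fingerprint -> first slot seen (stand-in index)
    updates: List[Tuple[int, int]] = []
    for logical, slot in sorted(block_map.items()):
        fingerprint = hashlib.sha256(physical[slot]).digest()
        original = first_slot.setdefault(fingerprint, slot)
        if original != slot:
            updates.append((logical, original))
    return updates

# Storage application side (hypothetical): apply the updates, then find unreferenced slots.
physical = [b"report", b"photo", b"report"]
block_map = {0: 0, 1: 1, 2: 2}
for logical, slot in post_process_scan(physical, block_map):
    block_map[logical] = slot
freed = set(range(len(physical))) - set(block_map.values())
```

The caller decides whether to run the scan on demand or on a schedule. Nothing in the write path changes.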
The Albireo mode of integration (inline, post-process, or parallel) is selected by the storage vendor. Flexible integration options allow Albireo to perform optimally across all standard storage infrastructures, workloads, and content types.
Parallel
Parallel processing has the advantage of adding no latency to the write data flow. Duplicates are identified, and deduplication updates are managed, in parallel with the write operation. When performing in parallel, Albireo receives a copy of the data as it is written to disk, so the write is not delayed. If a duplicate chunk is identified, the update (shown in blue) is pushed asynchronously to the storage software, where it can often be applied while the data is still in the write cache. Parallel processing delivers real-time data optimization with no added write latency, which is perfect for a wide range of applications.
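A parallel integration can be sketched with a background worker that receives a copy of each written chunk through a queue, so `write()` returns without waiting for the index. `ParallelVolume` is hypothetical, SHA-256 is assumed as the fingerprint, and a dictionary stands in for the index. Advice that arrives after a block has been overwritten is discarded as stale.

```python
import hashlib
import queue
import threading
from typing import Dict, List, Optional, Tuple

class ParallelVolume:
    """Hypothetical volume: writes complete immediately; dedupe updates arrive later."""

    def __init__(self) -> None:
        self.physical: List[Optional[bytes]] = []  # None marks a reclaimed slot
        self.block_map: Dict[int, int] = {}        # logical block -> physical slot
        self._lock = threading.Lock()
        self._copies: "queue.Queue[Optional[Tuple[int, int, bytes]]]" = queue.Queue()
        self._index: Dict[bytes, int] = {}         # stand-in for the dedupe index
        self._worker = threading.Thread(target=self._advise_loop, daemon=True)
        self._worker.start()

    def write(self, logical: int, chunk: bytes) -> None:
        with self._lock:
            self.physical.append(chunk)
            slot = len(self.physical) - 1
            self.block_map[logical] = slot
        self._copies.put((logical, slot, chunk))  # hand a copy to the index; no waiting

    def read(self, logical: int) -> bytes:
        with self._lock:
            data = self.physical[self.block_map[logical]]
        assert data is not None  # block_map never points at a reclaimed slot
        return data

    def _advise_loop(self) -> None:
        while True:
            item = self._copies.get()
            if item is None:
                return  # shutdown signal from close()
            logical, slot, chunk = item
            fingerprint = hashlib.sha256(chunk).digest()  # hash outside the lock
            with self._lock:
                if self.block_map.get(logical) != slot:
                    continue  # block was overwritten since; advice is stale
                original = self._index.setdefault(fingerprint, slot)
                if original != slot:
                    # Asynchronous update: repoint the block and reclaim the duplicate.
                    self.block_map[logical] = original
                    self.physical[slot] = None

    def close(self) -> None:
        """Drain pending advice and stop the worker."""
        self._copies.put(None)
        self._worker.join()
```

The lock keeps the block map consistent between the writer and the worker. Hashing is done outside the lock so that it does not hold up concurrent writes.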