Deduplication Index

The Challenge

The single greatest challenge when developing a data deduplication solution is the need to rapidly identify duplicate items across enormous data sets without sacrificing performance, scale or resource efficiency. While many solutions exist, most only partially address this challenge. They either consume too many system resources to be economically viable, or they forsake performance, efficiency and/or scalability in order to squeeze into a tight resource footprint.

Our Solution

Permabit’s Deduplication Index technology offers the world’s only licensable index technology for duplicate identification. The index is used across Permabit’s Albireo product line to address the diverse requirements of Storage OEMs, ODMs, Cloud Service Providers, and Software Defined Storage vendors, and is deployed in thousands of customer environments worldwide. Our deduplication index is unique in being able to address the challenge while delivering high performance, extreme scalability and massive resource efficiency.

Permabit Advantage

How It Works

Delta-Master Index – The in-memory portion of the index is designed to deliver high performance with a minimal memory footprint. Because this portion of the index operates in RAM, it introduces an extremely low latency of 5 microseconds. This index keeps track of the records that are associated with each chunk of data. By storing data in a unique compressed form, the index is able to address over 20X as many records in RAM.

Albireo Volume – The on-disk representation of the index is designed to minimize I/O overhead by using a log-structured database format that it keeps organized for locality. The Albireo Volume format is designed to minimize seek overhead for the fewer than 2% of data inquiries that cannot be satisfied by the Delta-Master Index.
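To make the idea of a log-structured, locality-friendly layout concrete, here is a minimal Python sketch. It is not the Albireo Volume format, which is not described here. The `LogVolume` class, the 40-byte record layout, and the use of an in-memory `io.BytesIO` buffer as a stand-in for a disk file are all assumptions made for illustration. The point it shows is that records appended in arrival order can later be fetched as a neighbouring run with a single seek.

```python
import io
import struct

# Hypothetical record layout: 32-byte fingerprint plus 8-byte location.
RECORD = struct.Struct(">32sQ")


class LogVolume:
    """Append-only record log; an in-memory buffer stands in for the disk."""

    def __init__(self):
        self.log = io.BytesIO()
        self.count = 0

    def append(self, fingerprint, location):
        """Write one record at the end of the log and return its number."""
        self.log.seek(0, io.SEEK_END)
        self.log.write(RECORD.pack(fingerprint, location))
        self.count += 1
        return self.count - 1

    def read_run(self, start, length):
        """One seek, then a sequential read of neighbouring records."""
        length = max(0, min(length, self.count - start))
        self.log.seek(start * RECORD.size)
        data = self.log.read(length * RECORD.size)
        return [RECORD.unpack_from(data, i * RECORD.size) for i in range(length)]


volume = LogVolume()
fingerprints = [bytes([i]) * 32 for i in range(5)]
for i, fp in enumerate(fingerprints):
    volume.append(fp, i * 4096)

run = volume.read_run(1, 3)  # records 1, 2 and 3 in a single pass
```

Because chunks that were written together tend to be looked up together, reading a run of adjacent records is one plausible way for such a layout to keep seek counts low.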

LRU Deduplication Window – A key customer requirement is that the Delta-Master Index must operate within a fixed memory budget and that the Albireo Volume must never run out of space. To guarantee both, the index automatically discards the least recently used items. This has the added benefit that developers do not need to remove items from the index when data is deleted.
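The least-recently-used behavior described above can be sketched in a few lines of Python. This is a generic illustration rather than Permabit's implementation: the `LRUDedupWindow` class, its `observe` method, the use of SHA-256 as the chunk fingerprint, and the tiny capacity are all assumptions chosen to keep the example short.

```python
import hashlib
from collections import OrderedDict


class LRUDedupWindow:
    """Fixed-capacity map from chunk fingerprint to a storage location."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # least recently used entry first

    def observe(self, chunk, location):
        """Return the location of an earlier identical chunk, or None if new."""
        key = hashlib.sha256(chunk).digest()
        if key in self._entries:
            self._entries.move_to_end(key)  # a hit refreshes recency
            return self._entries[key]
        self._entries[key] = location
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return None

    def __len__(self):
        return len(self._entries)


window = LRUDedupWindow(capacity=2)
first = window.observe(b"alpha", 0)    # new chunk
second = window.observe(b"beta", 1)    # new chunk
repeat = window.observe(b"alpha", 2)   # duplicate of the chunk at location 0
third = window.observe(b"gamma", 3)    # new; evicts "beta", the LRU entry
evicted = window.observe(b"beta", 4)   # forgotten, so treated as new again
```

Note that the sketch has no delete operation: an entry for data that no longer exists simply ages out of the window, which mirrors the point that callers never need to free items explicitly.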

Sparse Indexing – A sparse index exploits locality to keep the most relevant non-recently accessed index entries in RAM. The implementation used with the Delta-Master Index provides 10X disk coverage in the same amount of memory. This technology is designed expressly for primary storage workloads: it supports the rapid update and deletion of existing data as well as LRU, discard, and grid scale-out capabilities.
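One common way to build a sparse index is to keep only a sample of fingerprints in RAM and let each sampled entry act as a hook that pulls in its neighbours from disk. The Python sketch below illustrates that general technique only. It is not a description of how the Delta-Master Index implements sparse indexing. The `SparseIndex` class, the `SAMPLE_RATE` value, and the use of plain integers in place of real chunk fingerprints are assumptions made for illustration.

```python
SAMPLE_RATE = 4  # hypothetical: roughly 1 in 4 fingerprints is kept in RAM


class SparseIndex:
    """Keeps sampled fingerprints in RAM; full segments stay 'on disk'."""

    def __init__(self):
        self.segments = []      # stand-in for disk: one set of fingerprints per segment
        self.hooks = {}         # sampled fingerprint -> segment number (in RAM)
        self.cached = set()     # fingerprints from segments already loaded
        self.segment_reads = 0  # counts simulated disk reads

    def add_segment(self, fingerprints):
        """Store fingerprints that were written together and sample hooks."""
        number = len(self.segments)
        self.segments.append(set(fingerprints))
        for fp in fingerprints:
            if fp % SAMPLE_RATE == 0:
                self.hooks[fp] = number
        return number

    def is_duplicate(self, fp):
        """Report whether fp is known; unsampled entries may be missed."""
        if fp in self.cached:
            return True
        number = self.hooks.get(fp)
        if number is None:
            return False
        # One read brings in the whole neighbourhood of the hook.
        self.cached |= self.segments[number]
        self.segment_reads += 1
        return True


index = SparseIndex()
index.add_segment([8, 9, 10, 11])     # only 8 is sampled (8 % 4 == 0)
miss = index.is_duplicate(9)          # not sampled, segment not loaded yet
hit = index.is_duplicate(8)           # hook match loads the whole segment
neighbour = index.is_duplicate(9)     # now answered from the loaded segment
unknown = index.is_duplicate(12)      # never stored
```

The trade-off is visible in the first lookup: a sparse index can miss a duplicate until a nearby hook is seen, in exchange for holding only a fraction of the entries in memory.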