Content-Aware Segmentation


Albireo data optimization technology delivers deduplication at the sub-file level, with facilities for both fixed-block and variable-block deduplication. Variable-block deduplication begins by segmenting data into chunks of variable length, with boundaries chosen by analyzing the content. This additional step can provide substantial savings over more traditional fixed-block schemes, because fixed-block deduplication cannot reclaim space when duplicate data is not aligned on block boundaries. For example, if container objects (such as ZIP archives) that share some files are broken into fixed-size blocks, the blocks will deduplicate only if the embedded files are stored with identical alignment within each container, which is unlikely. To deduplicate chunks with arbitrary alignment, Albireo supports content-aware deduplication APIs. These APIs perform variable segmentation based on the type of stream or file being processed.
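The alignment problem can be illustrated with a minimal sketch. It is not Albireo's algorithm or API: the function names, the window size, and the boundary mask are assumptions chosen for illustration, and the variable-length cutter is a generic content-defined chunker that places a boundary wherever a hash of the trailing bytes matches a pattern.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list[bytes]:
    """Cut wherever a hash of the trailing `window` bytes has its low bits clear.

    Boundaries depend only on nearby content, so they move with the data
    when it is shifted. (Illustrative only: a real chunker would use a
    rolling hash and enforce minimum and maximum chunk sizes.)
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        digest = hashlib.blake2b(data[end - window:end], digest_size=4).digest()
        if int.from_bytes(digest, "big") & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fingerprints(chunks: list[bytes]) -> set[str]:
    """Identify each chunk by the SHA-256 of its content."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(4096))
shifted = b"header!" + payload  # same payload, moved 7 bytes off alignment

fixed_shared = fingerprints(fixed_chunks(payload)) & fingerprints(fixed_chunks(shifted))
var_original = fingerprints(variable_chunks(payload))
var_shared = var_original & fingerprints(variable_chunks(shifted))

print("fixed-block chunks shared:", len(fixed_shared))
print("variable chunks shared:", len(var_shared), "of", len(var_original))
```

With fixed blocks, the 7-byte shift changes the content of every block, so nothing is shared. With content-defined boundaries, every chunk after the first boundary in the payload reappears unchanged in the shifted copy.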

Albireo’s content-aware segmentation breaks large container objects into variable-sized chunks as new data is pushed to Albireo. Content-aware scanners analyze the data stream and identify the chunk boundaries that give the best deduplication. This helps where duplicate data occurs in container objects with different block alignments (as when the same files appear within two ZIP archives). Content-aware segmentation ensures that the embedded files are located and deduplicated even if they appear at different offsets within the two container objects.
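The two-archive case can be sketched with Python's standard `zipfile` module. This is only an illustration of segmenting a container at its embedded-file boundaries; it does not show how Albireo's ZIP scanner is implemented, and the helper names `build_zip` and `zip_member_chunks` are hypothetical.

```python
import hashlib
import io
import zipfile


def build_zip(members: dict[str, bytes]) -> bytes:
    """Build an uncompressed (stored) ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def zip_member_chunks(archive: bytes) -> dict[str, tuple[int, str]]:
    """Map each member name to (header offset in the container, SHA-256 of its data)."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            digest = hashlib.sha256(zf.read(info)).hexdigest()
            result[info.filename] = (info.header_offset, digest)
    return result


report = b"quarterly report " * 200

# The same file is embedded in two archives at different positions.
first = zip_member_chunks(build_zip({"report.txt": report, "notes.txt": b"alpha"}))
second = zip_member_chunks(build_zip({"readme.txt": b"x" * 333, "report.txt": report}))

offset_a, digest_a = first["report.txt"]
offset_b, digest_b = second["report.txt"]
print("offsets differ:", offset_a != offset_b)
print("chunk fingerprints match:", digest_a == digest_b)
```

Because the chunk boundary follows the embedded file rather than a fixed block grid, the shared file produces the same fingerprint in both containers even though it sits at a different offset in each.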

Albireo provides a “plug-in architecture” for content-aware, variable-length segmentation, along with several scanner modules for popular formats. These content “scanners” identify and optimize deduplication of objects within specific compound data formats (e.g. Microsoft Office documents, ZIP, PDF, tar). The scanners analyze data in real time, rapidly identifying embedded objects and delivering the data in a form suited to efficient data optimization. The scanners that ship with Albireo cover the formats most often found in business data stores. An API is available for OEMs to create and implement their own application-specific scanners for further customization and savings.
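One way to picture a scanner plug-in arrangement is a registry of format detectors paired with segmenters, with fixed-block chunking as the fallback. The sketch below is an assumption-laden illustration, not the Albireo scanner API: `register_scanner`, `SCANNERS`, `segment`, and the tar scanner are all hypothetical names invented here.

```python
import io
import tarfile
from typing import Callable, Iterator

# Hypothetical plug-in registry; none of these names come from Albireo.
Detector = Callable[[bytes], bool]
Segmenter = Callable[[bytes], Iterator[bytes]]
SCANNERS: list[tuple[str, Detector, Segmenter]] = []


def register_scanner(name: str, detect: Detector):
    """Decorator that adds a (name, detector, segmenter) entry to the registry."""
    def decorator(scan: Segmenter) -> Segmenter:
        SCANNERS.append((name, detect, scan))
        return scan
    return decorator


# tar headers carry the magic string "ustar" at byte offset 257.
@register_scanner("tar", lambda data: data[257:262] == b"ustar")
def tar_scanner(data: bytes) -> Iterator[bytes]:
    """Yield the content of each regular file in a tar stream as one chunk."""
    with tarfile.open(fileobj=io.BytesIO(data)) as tf:
        for member in tf:
            if member.isfile():
                yield tf.extractfile(member).read()


def fixed_block_fallback(data: bytes, size: int = 4096) -> Iterator[bytes]:
    """Used when no registered scanner recognizes the format."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def segment(data: bytes) -> tuple[str, list[bytes]]:
    """Pick the first scanner whose detector matches, else fall back."""
    for name, detect, scan in SCANNERS:
        if detect(data):
            return name, list(scan(data))
    return "fixed", list(fixed_block_fallback(data))


# Usage: build a small tar archive in memory and segment it.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    info = tarfile.TarInfo("a.txt")
    info.size = 5
    tf.addfile(info, io.BytesIO(b"hello"))

print(segment(buf.getvalue()))
print(segment(b"unrecognized data"))
```

Under this design, supporting an application-specific format means registering one more detector and segmenter pair; the dispatch code does not change.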

About Permabit

Permabit is a recognized leader in data efficiency technology. We enable OEMs to leverage their R&D investment, increase margin, accelerate time to market and achieve competitive advantage. Permabit Albireo software massively improves performance and efficiency of data creation, transmission and storage. Solutions built with Albireo are being delivered by leading hardware, software and service providers.

Albireo

Permabit Albireo is the industry’s first purpose-built OEM data deduplication software designed to meet the needs of hardware, software, and service providers who wish to expand their existing solutions without negatively impacting differentiating capabilities or reducing performance. Albireo delivers deduplication at the sub-file level and can be flexibly integrated into existing or next-generation storage and platform architectures. Albireo deduplication is seamlessly deployed in primary, archive, and backup storage across the data center and the cloud. With Albireo, OEMs leverage their R&D investments while accelerating time to market for must-have, industry-leading data optimization capabilities.
