Deduplication; How It Works - HP StorageWorks 12000 User Manual

HP StorageWorks 12000 Gateway Virtual Library System User Guide (AH814-96015, September 2010)

6 Deduplication

Deduplication stores only a single copy of each data block on a device. Duplicate information is
removed, allowing you to store more data in a given amount of space and restore data using
lower-bandwidth links. The HP StorageWorks virtual library system uses HP Accelerated
deduplication.
NOTE:
The deduplication feature is available only on systems running VLS software version 3.0 or higher.

This section describes deduplication, including how to get it running on your system, how to
configure it, and how to view reports.

NOTE:
See the HP StorageWorks VLS and D2D Solutions Guide for more detailed information.

How It Works

HP Accelerated deduplication compares the most recent version of a backup to the previous version
using object-level differencing code. It places pointers in the earlier version that identify duplicated
content in the new version. Deduplication then eliminates the redundant data in the earlier version
while retaining the complete, new version. You can improve deduplication performance by adding
nodes.
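
The idea of keeping the newest backup whole and replacing duplicated content in the earlier version with pointers can be illustrated with a minimal sketch. This is not HP's object-level differencing code: the fixed block size, the use of SHA-256, and the names `split_blocks`, `dedupe_previous`, and `restore_previous` are all assumptions chosen for illustration.

```python
import hashlib

BLOCK = 4  # hypothetical block size in bytes, kept tiny for illustration


def split_blocks(data: bytes, size: int = BLOCK) -> list:
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedupe_previous(previous: bytes, newest: bytes) -> list:
    """Rewrite the previous backup as a list of entries.

    An entry is ("ptr", index) when the block also exists in the newest
    backup, or ("raw", block) when the block is unique to the previous
    backup. The newest backup itself is left complete and untouched.
    """
    new_blocks = split_blocks(newest)
    index = {}
    for i, block in enumerate(new_blocks):
        # Remember the first position of each distinct block.
        index.setdefault(hashlib.sha256(block).digest(), i)

    entries = []
    for block in split_blocks(previous):
        pos = index.get(hashlib.sha256(block).digest())
        # Confirm the hash match byte for byte before dropping the data.
        if pos is not None and new_blocks[pos] == block:
            entries.append(("ptr", pos))
        else:
            entries.append(("raw", block))
    return entries


def restore_previous(entries: list, newest: bytes) -> bytes:
    """Rebuild the previous backup by following pointers into the newest."""
    new_blocks = split_blocks(newest)
    return b"".join(new_blocks[v] if kind == "ptr" else v for kind, v in entries)
```

In this sketch, restoring the newest backup needs no pointer lookups at all, while restoring the earlier version follows pointers into the newest one.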
NOTE:
Deduplication takes place after the data has been written to the backup tapes. Therefore, any data
backed up to compression-enabled virtual tape drives (both software and hardware compression) is
compressed before it is deduplicated.

The following is an overview of the deduplication process. See the HP StorageWorks VLS and D2D
Solutions Guide for more detailed information.
1. When a backup runs, a data grooming exercise is performed on the fly. Using meta-data attached
   by the backup application, data grooming maps the content or "objects" of the backup, and
   assembles a content database. This process has minimal performance impact.
2. After the scheduled backups have completed, the content database is used to "delta-difference"
   (compare) objects in current and previous backups from the same hosts. There are different levels
   of comparison. For example, files may be compared using a strong hashing function, while other
   objects may be compared at a byte level.
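
The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the VLS implementation: the `BackupObject` fields, the `"file"` kind label, the choice of SHA-256 as the strong hash, and the function names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupObject:
    host: str    # host the backup came from
    name: str    # object name taken from backup-application meta-data
    kind: str    # "file" or any other object type (hypothetical labels)
    data: bytes  # object content


def build_content_db(objects):
    """Step 1: map each object by (host, name), standing in for the content database."""
    return {(o.host, o.name): o for o in objects}


def is_duplicate(cur: BackupObject, prev: BackupObject) -> bool:
    """Pick a comparison level by object kind."""
    if cur.kind == "file":
        # Files: compare strong hashes of the content.
        return hashlib.sha256(cur.data).digest() == hashlib.sha256(prev.data).digest()
    # Other objects: compare at the byte level.
    return cur.data == prev.data


def delta_difference(current_db, previous_db):
    """Step 2: list the objects whose content is unchanged since the previous backup.

    Keys include the host, so objects are only compared with objects
    from the same host.
    """
    duplicates = []
    for key, cur in current_db.items():
        prev = previous_db.get(key)
        if prev is not None and prev.kind == cur.kind and is_duplicate(cur, prev):
            duplicates.append(key)
    return sorted(duplicates)
```

The objects returned by `delta_difference` are the candidates whose redundant data in the earlier backup could be replaced with pointers to the new version.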