Deduplication


Introduction

Deduplication is a technique for reducing storage consumption by eliminating redundant copies of identical data. During deduplication, each file is divided into a number of sections. Each section is associated with a unique identifier, and these identifiers are stored in an index. The goal of deduplication is to store each unique section only once: a new instance of a section that is already present is not saved again, but is replaced by a pointer to the corresponding identifier. A simple example is a centralized corporate e-mail system. When an employee sends an e-mail with a 1 MB attachment to two colleagues, the message is stored:

1) In the sender's "Outbox" folder

2) In the "Inbox" folders of the two recipients

3) In every backup of the mail database (at least one copy)

With the help of deduplication, the total of 6 MB consumed by these copies could be reduced to 1 MB. This is an example of deduplication in an e-mail server database; in storage systems, the technology is implemented in a more complex way. Data deduplication is used most actively for storing backup data, both in hardware devices (NetApp NearStore VTL, EMC Data Domain) and in software solutions (Symantec Backup Exec 2010, among others), because successive backups typically leave the storage devices holding files that are virtually identical in content, with only minimal changes between them. Deduplication not only saves storage space but also increases the speed of storing and retrieving backups, because less data has to be read and written.
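The chunk-index mechanism and the e-mail example above can be sketched in a few lines. The following is a minimal illustrative model, not any specific product's implementation; the class name, block size, and method names are assumptions chosen for the example. Each file is split into fixed-size blocks, every block gets a SHA-256 identifier, and the store keeps one physical copy per unique identifier while files are recorded as lists of pointers.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # blocks of a power-of-two size, as is typical

class DedupStore:
    """Toy block-level deduplicating store (illustrative sketch)."""

    def __init__(self):
        self.index = {}   # block identifier (hash) -> block bytes, stored once
        self.files = {}   # file name -> list of block identifiers

    def put(self, name, data):
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # a block already in the index is not saved again;
            # the file simply keeps a pointer to its identifier
            self.index.setdefault(digest, block)
            refs.append(digest)
        self.files[name] = refs

    def get(self, name):
        return b"".join(self.index[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.index.values())

# The e-mail scenario: a 1 MB attachment kept in the sender's Outbox,
# two recipients' Inboxes, and retrievable from all three.
attachment = os.urandom(1024 * 1024)
store = DedupStore()
store.put("outbox/mail.eml", attachment)
one_copy = store.stored_bytes()
store.put("inbox/alice.eml", attachment)
store.put("inbox/bob.eml", attachment)

# three logical copies, but no additional physical storage was used
assert store.stored_bytes() == one_copy
assert store.get("inbox/bob.eml") == attachment
```

Logically the store now holds three 1 MB messages, but physically only the first copy's unique blocks were written; the second and third `put` calls added only pointers.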

Discussion

Data deduplication facilitates the storage and transfer of data across a network by removing redundancy, optimizing network bandwidth, and effectively increasing storage capacity. Storing data on disk more efficiently allows it to be retained for a longer period and lets backup applications keep protected data on disk, which increases the likelihood of recovering data quickly. Transferring less data through the network also improves performance. Reducing the amount of data sent over a wide-area network connection can allow organizations to consolidate the protection of remote sites or to extend disaster recovery to data that previously did not benefit from such protection. The basic idea of data deduplication is to save time and money by allowing more data to be recovered from disk while reducing the footprint, power consumption, and cooling requirements of secondary storage, along with improving data protection.

Operation of deduplication

Deduplication systems operate differently from traditional compression methods, which examine only small samples of data. Deduplication instead works at the so-called block level: files are divided into a number of blocks of a given size (usually a power of two), and these blocks are compared. Deduplication is also sometimes confused with Single Instance Storage (SIS), which eliminates only identical whole files. An important function of deduplication is "fingerprinting": files are broken down into segments of varying sizes (chunks), which are then analyzed at the byte level to determine which segments repeat most often ...
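The variable-size chunking behind "fingerprinting" is commonly implemented with content-defined boundaries: a simple running checksum is computed byte by byte, and a chunk boundary is declared whenever its low bits match a fixed pattern, so boundaries follow the content rather than fixed offsets. The sketch below is purely illustrative; the checksum, mask, and size limits are arbitrary assumptions, not any real product's algorithm.

```python
import hashlib
import os

def chunk(data, mask=0x3F, min_size=16, max_size=1024):
    """Split data into variable-size chunks at content-defined boundaries.

    A boundary is declared when the low bits of a simple running
    checksum are all zero (probability roughly 1/(mask+1) per byte),
    subject to minimum and maximum chunk sizes.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) ^ byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0  # checksum restarts at every boundary
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fingerprints(data):
    # one identifier per chunk, as would be stored in a dedup index
    return [hashlib.sha256(c).hexdigest() for c in chunk(data)]

base = os.urandom(20000)

# Because boundaries depend only on the bytes since the previous
# boundary, prepending one complete chunk of unrelated data leaves
# every following chunk unchanged, so those chunks deduplicate.
prefix = chunk(os.urandom(5000))[0]
assert chunk(prefix + base) == [prefix] + chunk(base)
```

This boundary-realignment property is what lets two nearly identical backup files share most of their chunk identifiers in the index, whereas fixed-offset blocking would shift every block after an insertion.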