Monday, June 30, 2014

New Backup Technology – Disk data deduplication and compression techniques

By: Paul Oh   Categories: Storage and Data Management, Data Protection




In today’s environment, where companies need to do more with less, capacity-optimization technologies play a critical role. With data growing exponentially, new methods of handling that growth have emerged in both the storage and backup worlds. Two of the most popular methods in the backup world are deduplication and compression.

While similar in purpose, these technologies provide data reduction for dissimilar data sets. It’s critical to understand how the two technologies operate and which application types benefit from each, but most importantly, how combining them can provide unmatched storage savings across the broadest set of use cases.

Data Deduplication

Data deduplication provides storage savings by eliminating redundant blocks of data: only the first unique instance of any data is actually retained, and subsequent copies are replaced with a pointer to the original. Deduplicated data is stored in the same format as undeduplicated data, except that multiple files share storage blocks between them. This design allows the storage system to serve data to a requesting host without any additional processing. A good example is backing up the operating system files for 1,000 servers: because the OS is largely identical across all of them, that workload benefits greatly from deduplication.
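
As a rough illustration, the toy Python sketch below mimics that pointer mechanism with SHA-256 fingerprints. The BlockStore class and the 4 KB block size are illustrative assumptions, not any vendor's actual implementation:

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size

class BlockStore:
    """Toy dedup store: keeps only the first unique instance of each block."""

    def __init__(self):
        self.blocks = {}   # digest -> block bytes
        self.logical = 0   # bytes written by clients
        self.physical = 0  # bytes actually stored on disk

    def write(self, data):
        """Split data into blocks; store new blocks, return pointers."""
        pointers = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:   # first unique instance: keep it
                self.blocks[digest] = block
                self.physical += len(block)
            self.logical += len(block)
            pointers.append(digest)         # later copies become pointers
        return pointers

    def read(self, pointers):
        """Reassemble the original data; no extra processing is needed."""
        return b"".join(self.blocks[d] for d in pointers)
```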

Additionally, data deduplication can operate at the file, block or bit level. In file-level deduplication, if two files are exactly alike, one copy of the file is stored and subsequent copies receive pointers to the saved file. However, file-level deduplication is not highly efficient, because changing even a single bit results in an entirely new copy of the file being stored.
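
That weakness is easy to demonstrate: since the whole file is the unit of comparison, flipping one byte changes the file's fingerprint entirely. A small illustration using SHA-256:

```python
import hashlib

original = b"x" * 10_000
modified = b"y" + original[1:]   # change a single byte

# The fingerprints differ completely, so a file-level dedup store
# must keep both 10,000-byte files in full.
print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(modified).hexdigest())
```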

In block-level and bit-level deduplication, the software looks within a file and saves unique iterations of each block. If a file is updated, only the changed data is saved, which is a far more efficient process than file-level deduplication. Block- and bit-level deduplication can achieve data reduction ratios ranging from 3:1 to 30:1. Block-level deduplication is the de facto standard in the backup world due to its greater efficiency.
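
Reusing the hypothetical BlockStore sketched above, backing up two nearly identical files shows where those ratios come from: shared blocks are stored once, and the achieved ratio is simply logical bytes divided by physical bytes:

```python
import os

store = BlockStore()
file_a = os.urandom(4096 * 100)                  # 100 unique 4 KB blocks
file_b = file_a[:4096 * 99] + os.urandom(4096)   # same file, one block changed

store.write(file_a)
store.write(file_b)

# Only the changed block is stored twice: ~101 physical blocks
# back ~200 logical blocks, roughly a 2:1 reduction.
print(f"dedup ratio {store.logical / store.physical:.1f}:1")
```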

Data Compression

Compression is the process of using algorithms to reduce the amount of physical space that a file takes up. Data compression provides storage savings by eliminating binary-level redundancy within a block of data. Unlike dedupe, compression is not concerned with whether a second copy of the same block exists; it simply stores each block on disk in the most space-efficient form. To keep data in a format denser than its native form, compression algorithms “deflate” data as it is written and “inflate” it as it is read.
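
Python's standard zlib module implements the DEFLATE algorithm, so the write/read cycle can be sketched in a few lines (the block contents below are made up purely for illustration):

```python
import zlib

block = b"AAAA BBBB AAAA BBBB " * 200   # a highly redundant 4,000-byte block

deflated = zlib.compress(block, 6)      # "deflate" as data is written
inflated = zlib.decompress(deflated)    # "inflate" as data is read back

assert inflated == block                # the round trip is lossless
print(f"{len(block)} bytes -> {len(deflated)} bytes on disk")
```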
Compression at the application layer, as with a SQL Server or Oracle database, is something of a balancing act: faster compression and decompression speeds usually come at the expense of smaller space savings. In general, deduplication doesn’t work well after compression, but the reverse (deduplication followed by compression) can provide additional storage savings and can be beneficial in certain situations.
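
That ordering can be sketched by combining the two techniques: deduplicate first, then compress only the unique blocks that survive. The helper below is an illustration of the ordering, not a production pipeline:

```python
import hashlib
import zlib

def dedupe_then_compress(data, block_size=4096):
    """Keep one compressed copy of each unique block; return stored bytes."""
    stored = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in stored:
            stored[digest] = zlib.compress(block)  # compress after dedup
    return sum(len(c) for c in stored.values())

# 50 identical blocks dedupe down to one block, which is then compressed.
data = (b"the same redundant text block " * 137)[:4096] * 50
print(f"{len(data)} logical bytes -> {dedupe_then_compress(data)} stored bytes")
```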

Faster Backups and Recovery Times

Deduplication and compression enable more data to be kept on disk. At an effective data reduction ratio of 5:1 to 10:1, 300 GB of backup data can be stored on 60 GB to 30 GB of disk space. It's easy to see how this leads to big savings: not only do fewer disks need to be purchased, but those disks also take longer to fill.

With less data to back up, backups complete faster, resulting in smaller backup windows, tighter (more recent) recovery point objectives (RPOs) and faster recovery time objectives (RTOs). Deduplication also speeds up remote backup, replication and disaster recovery. Data transfers finish sooner, freeing the network for other tasks, allowing additional data to be transferred, or reducing costs through the use of slower, less-expensive WAN links.

Because deduplication and compression add CPU and computational overhead, data reduction solutions need to be carefully designed around your data types, planned growth and other IT patterns in your environment. Various deduplication methods can be used: inline vs. post-process, source vs. target, and client- and/or server-side deduplication.
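
As one example of those design choices, source-side (client-side) deduplication sends fingerprints first and ships only the blocks the target does not already hold, which is what cuts backup traffic. The sketch below uses made-up function names to stand in for a real backup protocol:

```python
import hashlib

def source_side_backup(data, target_digests, block_size=4096):
    """Client hashes blocks locally and transfers only unseen ones."""
    to_send = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in target_digests:   # the target has never seen this
            to_send.append((digest, block))
            target_digests.add(digest)
    return to_send  # only this payload ever crosses the WAN

target = set()                      # digests already held by the target
night_one = source_side_backup(b"payload" * 10_000, target)
night_two = source_side_backup(b"payload" * 10_000, target)
print(len(night_one), len(night_two))   # the repeat backup sends 0 blocks
```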

Of course, having said all this, it is always better to save space at the source, i.e. at the user level, but that requires company policies and processes to enforce the practice. To assess which data reduction technology can best improve your data backup processes while meeting your SLA requirements and budget, talk to Sentia. An expert advisor can identify the optimal solution to fit your business objectives. Feel free to drop me a line or call me at 1-866-610-8489 to discuss.

Paul Oh, Vice President, Technical Services
Sentia
