Deduplication: Make a ‘DASH’ for It

Deduplication is a topic that could, and has, filled volumes. How ironic. By now just about everyone who has anything to do with data protection, management or storage has heard about dedupe, and most people have made sure the 'check box' for dedupe is included in their plans.
Most people have heard the arguments for how it will help them. It can:
Optimise current storage investments for backup due to the large, recurring and highly duplicative data sets
Enhance protection by enabling the ability to keep more backup copies using the same amount of storage
Delay the need to buy more storage, even though data is growing rapidly
Alleviate the headaches around protecting this expanding data within finite backup windows, which often could not be met even before the growth
Reduce the cost of tape and associated expenses
However, one aspect that often gets overlooked is the benefit that deduplication can provide when it comes to disaster recovery, information access and, at a more basic level, how copies of backup data are distributed across the environment (a.k.a. retention). It is equally fair to say that the limitations are often overlooked, and that some tried-and-tested practices used for years become impossible, or simply too impractical, given current processing power.
Instead of deliberating over the cost/benefit of tape versus disk-based storage, and viewing deduplication simply as a means to reduce your storage footprint, it is wiser to devise a company-wide archiving and retrieval strategy, and to put in place a deduplication strategy that delivers the speed and ease of data search and retrieval your business needs day to day, with business continuity and increasingly stringent e-discovery guidelines in mind.
The solution is to use global, source and target deduplication coupled with Deduplication Accelerated Streaming Hash, a.k.a. "DASH", to create independent retention periods for copies of backup data so that they can be efficiently distributed across the enterprise. This enables IT staff to centrally manage and move data to multiple locations, on different storage hardware and with different retention periods, to meet retention, access, recovery and protection requirements. This is good news because many dedupe systems can only hold data at multiple sites by replicating the whole disk library, often using proprietary replication software to the same type of dedupe disk.
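To make the idea of independent retention for deduplicated copies concrete, here is a minimal sketch in Python (purely illustrative, not Commvault's implementation): each unique block is stored once and reference-counted, while each copy is just a list of block hashes carrying its own retention period, so copies can age out independently without disturbing blocks that other copies still need.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 128 * 1024  # illustrative fixed-size chunking

@dataclass
class Copy:
    """A backup copy: a list of chunk hashes plus its own retention period."""
    name: str
    retention_days: int
    chunk_hashes: list = field(default_factory=list)

class DedupeStore:
    """Stores each unique chunk once; copies merely reference chunks by hash."""
    def __init__(self):
        self.chunks = {}      # hash -> raw chunk bytes
        self.refcounts = {}   # hash -> number of copies referencing the chunk

    def add_copy(self, name, data, retention_days):
        copy = Copy(name, retention_days)
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:   # store each unique chunk only once
                self.chunks[digest] = chunk
            self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
            copy.chunk_hashes.append(digest)
        return copy

    def expire_copy(self, copy):
        """Expiring one copy never touches chunks still referenced elsewhere."""
        for digest in copy.chunk_hashes:
            self.refcounts[digest] -= 1
            if self.refcounts[digest] == 0:
                del self.chunks[digest]
                del self.refcounts[digest]
```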
Until now, getting copies of deduplicated data without resorting to expensive combinations of dedupe disk and replication meant going through a series of time-consuming rehydrate/reprocess steps. In contrast, DASH Copy simply moves the changed blocks/segments relative to the deduplication database, so you can 'spin off' subsets of a dedupe data store without rehydration. This keeps the data footprint small for transport over wide area networks and also allows for incremental updates, which are themselves deduplicated.
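The changed-block idea can be sketched in the same illustrative style (an assumption-laden simplification that reuses the DedupeStore and Copy classes above, not the actual DASH wire protocol): before replicating a copy to a secondary site, the source checks which chunk hashes the destination already holds and ships only the missing ones, so nothing is rehydrated and only a deduplicated delta crosses the WAN.

```python
def dash_style_copy(copy, source_store, dest_store, retention_days):
    """Replicate a backup copy by shipping only chunks the destination lacks.

    Builds on the DedupeStore/Copy sketch above; illustrative only.
    """
    # 1. Compare metadata: which chunk hashes are missing at the destination?
    missing = [h for h in copy.chunk_hashes if h not in dest_store.chunks]

    # 2. Transfer only those chunks (the deduplicated delta) over the WAN.
    for digest in missing:
        dest_store.chunks[digest] = source_store.chunks[digest]

    # 3. Register the copy at the destination with its own retention period,
    #    bumping reference counts for every chunk it uses.
    remote = Copy(copy.name, retention_days, list(copy.chunk_hashes))
    for digest in remote.chunk_hashes:
        dest_store.refcounts[digest] = dest_store.refcounts.get(digest, 0) + 1
    return remote, len(missing)   # the secondary copy and the chunks actually sent
```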
Another trick DASH technology provides is the ability to create far less resource-intensive Synthetic Full backups in half the time. Essentially, synthetics provide 'incremental forever' backup, but crucially they also provide a 'DR ready' full backup without going back to the production server to get it. This has many benefits and is very desirable, especially for busy application servers such as those handling email and document management.
Dedupe makes the synthetic process incredibly processor intensive, with the task taking up to 50% longer than an actual full backup, even if media servers work with dedupe appliances to spread the load.
This is where DASH Full completely rearranges the process. Instead of using the actual backup data, the processing of a Synthetic Full is performed in the dedupe database. This means the entire process is completed with metadata: in effect, a map of what a new full should look like is created, rather than an actual new full backup. For many organisations the ability to go back to 'incremental forever' is very valuable indeed, and because the DASH Full process executes in just minutes it has never been easier or more cost-effective to keep DR copies of servers and application data ready to go if needed.
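Here is a simplified sketch of how a metadata-only synthetic full can be assembled (illustrative Python, not a description of Commvault's internals): the new full is just the previous full's file-to-chunk map with each incremental's changes applied on top, so no chunk data is ever read, rehydrated or rewritten.

```python
def synthetic_full_from_metadata(last_full, incrementals):
    """Build a new 'full' backup purely from metadata.

    `last_full` and each incremental are dicts mapping file path -> list of
    chunk hashes (None marks a file deleted since the last backup). The result
    is just a new map of what a full backup should look like; no chunk data
    is touched. Illustrative sketch only.
    """
    synthetic = dict(last_full)           # start from the previous full's map
    for incremental in incrementals:      # apply each incremental in order
        for path, chunk_hashes in incremental.items():
            if chunk_hashes is None:      # file was deleted
                synthetic.pop(path, None)
            else:                         # file was added or changed
                synthetic[path] = chunk_hashes
    return synthetic                      # a DR-ready full, built in metadata

# Example with hypothetical hashes:
# full = {"/etc/hosts": ["h1"], "/var/db": ["h2", "h3"]}
# inc1 = {"/var/db": ["h2", "h4"]}             # one block changed
# inc2 = {"/etc/hosts": None, "/tmp/x": ["h5"]}
# synthetic_full_from_metadata(full, [inc1, inc2])
# -> {"/var/db": ["h2", "h4"], "/tmp/x": ["h5"]}
```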
DASH functionality can also help you eliminate the added complexities of host or storage replication, such as breaking mirrors, creating clones, attaching volumes and getting bogged down in scripts, and help you meet recovery and retention requirements without paying the penalty of having to rehydrate data first and then deduplicate it again on the back end.
Ultimately what this means for business is that you can enhance data protection and recovery while at the same time reducing costs, adding flexibility and increasing access to your data.
Organisations with DASH-enabled dedupe are now in a position to select the most appropriate storage type for the data they wish to store based on its value, the length of time it needs to be archived, how often and how quickly it is likely to be required, and its associated restore times. Those scarred by the pain of retrieval from tape and its management often want it removed from their domain, but like it or not it still provides fantastic value for the medium to longer term. The ability to dedupe to tape means its footprint can be reduced by around 90%, and tape's low cost means keeping some of it still stacks up. The good news is that, used in this way, it has a role in storing non-essential data, where the likelihood of it actually being needed for a recovery is almost zero.
Synthetic backup in the guise of DASH Full allows you to slash backup windows and reduce the impact on production servers, whilst still maintaining all of the benefits of dedupe. Making use of this technology also minimises the need for infrastructure investment and the rollout of snap-enabled arrays outside the top tier of production systems. DASH Copy completes the story by keeping DR costs down while maximising recoverability.
Background Information
CommVault
Commvault is a recognized global leader in enterprise backup, recovery, and data management across any hybrid environment.
Commvault’s converged data management solution redefines what backup means for the progressive enterprise through solutions that protect, manage and use their most critical asset — their data.
Founded in 1996, Commvault is publicly traded (NASDAQ: CVLT) and headquartered in Tinton Falls, New Jersey.