To de-dupe or not to de-dupe?
Data protection, as we know it, is changing rapidly, and for the better. One of the most exciting technologies is "data de-duplication," otherwise known as "capacity optimization," "commonality factoring," or "single-instancing," depending on the vendor you talk to.
While data de-duplication has been available for the past year or so from start-ups (e.g., Asigra, Avamar, Data Domain, and Diligent) and OEM software providers (e.g., Rocksoft), it is now on virtually every data protection provider's radar screen. In fact, I'll go out on a limb and say that de-duplication will be one of this year's most talked-about and most widely implemented new technologies, far surpassing continuous data protection (CDP).
End-users can expect de-duplication capabilities to become much more widely available over the next few quarters, both as stand-alone products and as options for existing backup applications and appliances. In that time, de-duplication should also appear as an option for minimizing data redundancy on primary storage systems and as an enabler of various WAN services (e.g., replication becomes much more affordable when WAN traffic is minimized).
In the context of backup, data de-duplication goes one significant step farther than incremental backup. After an initial full backup, incremental backups minimize backup traffic by copying only the blocks of data that have changed since the previous backup; data de-duplication copies only those changed blocks that are unique, i.e., not already stored somewhere in the backup repository. Ideally, de-duplication is done before data is written to the storage system (e.g., Asigra and Avamar), but it can also be done while data is being written to the storage system (e.g., Data Domain and Diligent). It can even be done after data has been written to the storage system if you're concerned about data fidelity (i.e., accidental erasure of unique data).
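To make the difference concrete, here is a minimal Python sketch contrasting a simplified block-level incremental with inline de-duplication (fingerprinting chunks before they are written). It assumes fixed-size chunks (an unrealistically small 4 bytes, purely for readability) and SHA-256 fingerprints; the names `incremental_backup` and `DedupStore` are hypothetical. Commercial products generally use much larger, often variable-size chunks and their own indexing schemes, so treat this as an illustration of the idea, not as how any of the vendors above actually work.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, unrealistically small chunk size so the example stays readable


def chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def incremental_backup(previous: bytes, current: bytes) -> list[bytes]:
    """Simplified block-level incremental: copy every chunk that differs from
    the chunk at the same position in the previous version."""
    old = chunks(previous)
    return [c for i, c in enumerate(chunks(current)) if i >= len(old) or old[i] != c]


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.by_digest: dict[str, bytes] = {}

    def backup(self, data: bytes) -> tuple[list[str], int]:
        """Fingerprint each chunk before writing it (inline de-duplication).
        Returns the digests needed to rebuild `data` and the number of chunks newly written."""
        recipe: list[str] = []
        written = 0
        for c in chunks(data):
            digest = hashlib.sha256(c).hexdigest()
            if digest not in self.by_digest:  # write only chunks not already stored
                self.by_digest[digest] = c
                written += 1
            recipe.append(digest)
        return recipe, written

    def restore(self, recipe: list[str]) -> bytes:
        """Reassemble the original data from its chunk digests."""
        return b"".join(self.by_digest[d] for d in recipe)


if __name__ == "__main__":
    v1 = b"AAAABBBBCCCC"
    v2 = b"AAAACCCCBBBBAAAA"  # existing chunks reordered, with AAAA repeated

    print(len(incremental_backup(v1, v2)))  # 3

    store = DedupStore()
    _, written_v1 = store.backup(v1)
    recipe_v2, written_v2 = store.backup(v2)
    print(written_v1, written_v2)           # 3 0
    print(store.restore(recipe_v2) == v2)   # True
```

In this run the second version only rearranges data that already exists: the incremental still copies three changed blocks, while the de-duplicating store writes nothing new because every chunk is already stored, yet it can rebuild the second version exactly.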
Either way, de-duplication has the potential to save end-users significant IT dollars in recouped storage space; ongoing de-duping can reduce back-end media requirements significantly, and without sacrificing data protection. In fact, it is not uncommon for these types of solutions to provide a 20:1 reduction in backup data. In dollar terms, a 20:1 reduction can bring storage costs down from $30 per GB to $1.50 per GB!
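The dollar figures follow directly from the ratio: if each physical GB holds 20 GB of backup data, the cost per protected GB is the raw cost divided by 20. A minimal sketch of that arithmetic, assuming the $30 figure is the cost of raw backup capacity and ignoring overheads such as fingerprint indexes and metadata (the function name and the 10,000 GB workload are hypothetical):

```python
def effective_cost_per_gb(raw_cost_per_gb: float, reduction_ratio: float) -> float:
    """Cost per GB of backup data actually protected, given a de-duplication ratio (e.g. 20 for 20:1)."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


if __name__ == "__main__":
    ratio = 20.0                                # the 20:1 reduction cited above
    print(effective_cost_per_gb(30.0, ratio))   # 1.5

    logical_gb = 10_000.0                       # hypothetical amount of backup data
    physical_gb = logical_gb / ratio            # capacity actually consumed after de-duplication
    print(physical_gb)                          # 500.0
    print(physical_gb * 30.0)                   # 15000.0, versus 300000.0 without de-duplication
```

Real-world ratios vary widely with the data being protected and the retention policy, so the 20:1 figure is best treated as an input to measure in your own environment rather than a constant.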
What's the catch? There really isn't one, but if you're skeptical, try rolling it out in a controlled environment first. I'm pretty certain you'll be amazed at the results!



