Big data strategy: break free from legacy

Much has been said about the 'big data' phenomenon, but very little about how to deal with the trend effectively. Big data is a well-known issue for most CIOs, and there is no doubt that they are willing to act on it. What is causing them consternation is their ability to do so while constrained by traditional data management techniques.
It is extremely difficult to base a big data strategy on legacy solutions that duplicate and silo data across vendors' products operating independently. As a result, most organisations have to process and retain between 10 and 15 copies of their original primary data in backups, archives, disaster recovery copies and snapshots. This has a significant impact on both costs and operations, and it detracts from an organisation's ability to make the changes needed to resolve this vendor-made problem.
To plan an effective long-term strategy for big data, it is necessary first to look at the root of the problem. Stop for a moment and work out where the data is growing. Interestingly, most organisations, other than those dealing with media content, will find that the growth is not necessarily on the business systems or on those systems critical to their ongoing operation and success. Understanding the data that is being processed is the first step to overcoming the problem. By focusing on processing only the important data, a large proportion of the ongoing costs of data management can be avoided. Once an understanding of what is critical is in place, organisations can tier off redundant or unused data and build policies that stop the copying of irrelevant or unimportant data sets, such as employee media files.
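As a rough illustration of that first step, the sketch below walks a hypothetical file share and buckets capacity by file type, staleness and media content so that tiering and exclusion candidates become visible. The mount point, extension list and one-year threshold are assumptions chosen for the example, not recommendations from any particular product.

```python
import os
import time
from collections import defaultdict

# Assumptions for the sketch: which extensions count as media, and how long a
# file can sit untouched before it is considered a candidate for tiering off.
MEDIA_EXTENSIONS = {".mp3", ".mp4", ".mov", ".avi", ".jpg", ".png"}
STALE_AFTER_DAYS = 365

def survey(root):
    by_type = defaultdict(int)   # bytes held per file extension
    stale_bytes = 0              # bytes not accessed within STALE_AFTER_DAYS
    media_bytes = 0              # bytes held in common media formats
    cutoff = time.time() - STALE_AFTER_DAYS * 86400
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files we cannot read
            ext = os.path.splitext(name)[1].lower()
            by_type[ext] += st.st_size
            if st.st_atime < cutoff:
                stale_bytes += st.st_size
            if ext in MEDIA_EXTENSIONS:
                media_bytes += st.st_size
    return by_type, stale_bytes, media_bytes

if __name__ == "__main__":
    by_type, stale, media = survey("/data/file_share")  # hypothetical mount point
    top = sorted(by_type.items(), key=lambda kv: kv[1], reverse=True)[:10]
    for ext, size in top:
        print(f"{ext or '<none>':8} {size / 1e9:8.1f} GB")
    print(f"stale (> {STALE_AFTER_DAYS} days): {stale / 1e9:.1f} GB")
    print(f"media files: {media / 1e9:.1f} GB")
```

A report like this, however crude, is usually enough to show whether growth is coming from business-critical systems or from unmanaged file shares and user media.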
It must be remembered that archiving is not purely for compliance; it also reduces the amount of low value data being stored on high value storage. Once that data has been moved from primary storage to lower cost tiers, the amount of data processed is reduced by the percentage archived, and for many organisations this can be as high as 80 percent. That is 80 percent less data to scan, move over a network or SAN, process for backup and retain on media, which equates to a significant saving in time, manpower and infrastructure.
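The arithmetic behind that figure is simple enough to sketch. The numbers below are assumptions chosen purely to illustrate the effect of tiering 80 percent of data off primary storage; they are not measurements from any particular environment.

```python
# Back-of-the-envelope illustration of the archiving arithmetic above.
primary_tb = 50.0           # total data on primary storage (assumed)
archivable_fraction = 0.80  # proportion that is low value or rarely accessed (assumed)
copies_retained = 12        # backups, DR copies, snapshots etc. (assumed)

active_tb = primary_tb * (1 - archivable_fraction)
before = primary_tb * copies_retained   # data swept into protection without archiving
after = active_tb * copies_retained     # data swept into protection once 80% is tiered off

print(f"Data in each protection cycle: {primary_tb:.0f} TB -> {active_tb:.0f} TB")
print(f"Total copies under management: {before:.0f} TB -> {after:.0f} TB")
```

With these assumed figures, 600 TB of managed copies shrinks to 120 TB, which is where the savings in scanning, network traffic and media come from.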
Another important consideration when monitoring data growth is that growth should only be seen at the source, i.e. on the production systems. If it is not, it is highly likely that the organisation is deploying technologies that duplicate its data sets. This warrants a thorough review to work out why there are so many copies of the data, and for what purpose they are being created and retained.
For the production systems, consider, where possible, appropriate technologies such as source-side de-duplication. Most organisations still take a full backup of each system weekly, which means that every week they copy exactly the same files. By de-duplicating the data at the source instead, only unique data is processed, not just from a single machine but across all physical and virtual servers and their associated applications and data.
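The principle can be sketched in a few lines. The example below uses fixed-size chunks and an in-memory hash index to show how only previously unseen data needs to be transferred; production de-duplication engines use variable-size chunking and a shared index across clients, so this is an illustration of the idea rather than of any vendor's implementation.

```python
import hashlib

# Minimal sketch of source-side de-duplication: split data into chunks, hash
# each chunk, and only send chunks whose hashes have not been seen before.
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB fixed-size chunks, an arbitrary choice
known_chunks = set()          # stands in for the shared de-duplication index

def backup(path):
    sent = skipped = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in known_chunks:
                skipped += len(chunk)   # duplicate: only a reference is recorded
            else:
                known_chunks.add(digest)
                sent += len(chunk)      # unique: chunk is transferred and stored
    return sent, skipped
```

Because the index is shared across every client that backs up through it, the second weekly full of an unchanged system transfers almost nothing, which is the saving the paragraph above describes.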
As a final piece of advice, when selecting a data management vendor, be aware that there is no one-size-fits-all solution. Applying yet another new technology without a clear view of the goals and the existing infrastructure could simply create another silo and yet another headache.
Background Information
Commvault
Commvault is a recognized global leader in enterprise backup, recovery, and data management across any hybrid environment.
Commvault’s converged data management solution redefines what backup means for the progressive enterprise through solutions that protect, manage and use their most critical asset — their data.
Founded in 1996, Commvault is publicly traded (NASDAQ: CVLT) and headquartered in Tinton Falls, New Jersey.