The days when a company could store all its customer data in a single spreadsheet on a mainframe server are over three decades behind us. Data storage needs are increasing exponentially, leading some businesses to move to cloud storage. But what if your business needs local access to data? What if you need the fastest response times possible and the security of knowing all your data is safe behind your own firewall? Unless your IT budget is massive, you'll have to be selective with your data storage—not just with your hardware, but with how the data is handled on the physical storage medium, too. You need a way to implement things like deduplication, compression and thin provisioning.
Any company that deploys multiple virtual computers—whether servers or desktops—can benefit from implementing a mixture of data reduction methods. If your company's deploying a virtual desktop infrastructure or running multiple web servers, it'll be easy to save some space and money by handling your data more efficiently. These methods don't work as well for completely unique data, such as massive data sets collected from a variety of marketing surveys; but wherever you have similar or duplicate data—such as operating system files on multiple virtual machines—you've got a prime candidate.
The core technology in data reduction is deduplication. In the most efficient setups, storage controllers handle the heaviest deduplication tasks, ferreting out exact copies in data sets or portions of sets and replacing them with references to the identical data already stored. This can reduce stored data by a factor of 10 to 30, depending on how unique the overall data is. Dedupe technology is a key feature for any organization trying to make its data storage more efficient: it reduces the overall cost of data storage systems and makes options like flash storage arrays more affordable, because you need less physical capacity.
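The replace-copies-with-references idea can be sketched in a few lines. This is a minimal illustration, not a real storage controller: it chops data into fixed-size blocks, fingerprints each block with SHA-256, and stores each unique block only once (the class and method names here are invented for the example).

```python
import hashlib

BLOCK_SIZE = 4096  # fixed block size; a common choice in real arrays


class DedupStore:
    """Toy block-level deduplicating store."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes, stored once
        self.files = {}   # filename -> ordered list of fingerprints

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            # Store the block only if this content hasn't been seen before;
            # otherwise the existing copy is simply referenced again.
            self.blocks.setdefault(fp, block)
            refs.append(fp)
        self.files[name] = refs

    def read(self, name):
        # Reassemble the logical file by following the references.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def physical_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
os_image = b"\x90" * 8192  # stand-in for identical OS files on two VMs
store.write("vm1/kernel.bin", os_image)
store.write("vm2/kernel.bin", os_image)
# 16 KB of logical data, but only one unique 4 KB block is stored.
```

Real systems add plenty on top of this—variable-length chunking, hash collision handling, reference counting for deletes—but the core bookkeeping is the same: fingerprints in, references out.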
At the most basic level, deduplication and compression do the same thing: They make your logical data take up less physical storage space. In reality, however, they work on two different levels of data. Deduplication looks across whole files, or blocks of files, and replaces identical copies with references to a single stored instance; compression works within each file, replacing repeated sequences of data with shorter codes. Deduplication works especially well when you have multiple virtual computers all running the same operating system: the duplicate system files, like everything in Windows' System32 folder, get stored just once. Instantly, you have more space.
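The within-file side of the equation is easy to demonstrate with Python's standard-library zlib module. Repetitive content, like a config file full of near-identical lines, shrinks dramatically, and decompressing restores it byte for byte (the sample data here is invented for illustration):

```python
import zlib

# A file with lots of internal repetition: 1,000 identical lines.
config = b"setting=default\n" * 1000

# Compression replaces the repeated sequences with shorter codes...
compressed = zlib.compress(config)

# ...and decompression restores the original data exactly.
restored = zlib.decompress(compressed)
```

Note how this differs from the dedup case: compression would still help even if this file existed only once, because the redundancy is inside the file rather than between copies of it.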
Thin provisioning is less about shrinking the files you actually store and more about keeping your systems from claiming space they don't use. Normally, your various servers request more storage from your NAS than they'd use immediately; it's a bad thing when servers run out of space, so of course they maintain some overhead. Just the same, that can end up wasting a lot of capacity. With thin provisioning, your storage controller tells your servers, "Sure, have as much space as you need! I'll reserve whatever you want for you." Of course, it doesn't exactly do this. It allocates only as much space as the server is actually using, and provisions more as needed. This keeps greedy servers and programs from earmarking a bunch of storage space they won't need, keeping it free for other uses.
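That promise-big, allocate-small behavior can be sketched as a volume that reports a large provisioned size but only backs blocks with physical storage on first write. This is a simplified model assuming block-aligned I/O; the class name and sizes are made up for the example:

```python
BLOCK = 4096  # allocation unit


class ThinVolume:
    """Toy thin-provisioned volume: promises much, allocates on demand."""

    def __init__(self, provisioned_bytes):
        self.provisioned = provisioned_bytes  # capacity promised to the server
        self.blocks = {}  # block index -> bytes, allocated lazily on write

    def write(self, offset, data):
        # Assumes block-aligned writes for simplicity.
        for i in range(0, len(data), BLOCK):
            self.blocks[(offset + i) // BLOCK] = data[i:i + BLOCK]

    def read(self, offset, length):
        out = bytearray()
        last = (offset + length + BLOCK - 1) // BLOCK
        for idx in range(offset // BLOCK, last):
            # Never-written regions read back as zeros, no storage consumed.
            out += self.blocks.get(idx, b"\x00" * BLOCK)
        return bytes(out[:length])

    def allocated_bytes(self):
        return len(self.blocks) * BLOCK


# The server is "given" 100 GB but actually touches only 8 KB.
vol = ThinVolume(provisioned_bytes=100 * 1024**3)
vol.write(0, b"x" * 8192)
```

The gap between `provisioned` and `allocated_bytes()` is exactly the overhead the server would otherwise have earmarked; a real controller pools that slack across all its volumes.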
If you want to get the most out of your local data storage system, you can’t just set it and forget it, unless you buy specialized storage arrays. In order to save space and, hopefully, keep your company from requiring racks of new disks every year, you have to implement a mix of strategies to reduce underutilization and pare down the data you actually store to just the essentials. There’s a lot of duplicate data running around. Before you upgrade your NAS, check out some ways to make better use of what you have, or use your knowledge of data efficiency to inform your next hardware purchase.