Lately, when visiting customers and suggesting repositories for their Veeam backup data, I’ve noticed there is some haze around which option is best for them. Of course this is specific to each environment and I’m not taking sides here 🙂, but I think a small comparison between the three options is a good take on the subject, along with explanations of the statistics relevant to this battle of the species.
Let’s map the environment I will use for the test scenarios. This battle is about functionality only; the performance numbers from this lab are not representative of a live environment, but I’ll point them out as well. All the appliances I use are virtual appliances and sit on the same vSphere datastores, with the same number and type of drives. Below you can see the four Veeam repositories we will use:
These are the four (4) scenarios we are going to check:
- Veeam Backup & Replication Native – In this scenario we will see what the native compression and deduplication ratios of the software are on a Windows local repository located on the backup proxy server itself.
- EMC DataDomain DDBoost – Here we use EMC’s DDBoost drivers, which perform compression and deduplication on the backup server (client side) before sending the data, so only changed blocks travel over the network and bandwidth usage drops. Therefore we disable the Veeam Backup & Replication compression and deduplication mechanism for this specific job.
- EMC DataDomain CIFS – Here we use the EMC DataDomain as a “native CIFS” share. Compression and deduplication happen on the appliance (server side), which means all data is transferred over the network and the EMC DataDomain appliance compresses and deduplicates it before writing it to disk. As with the EMC DataDomain DDBoost job, we also disable the Veeam Backup & Replication compression and deduplication mechanism for this specific job.
- HP StoreOnce – We expect behavior similar to EMC DataDomain CIFS from this solution. HP StoreOnce currently integrates with Veeam B&R only via CIFS, not via HP StoreOnce Catalyst, which is the counterpart of EMC DataDomain DDBoost.
Data will be transferred in full over the network, and deduplication and compression will be done on the appliance itself.
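To make the difference between the DDBoost (source-side) and CIFS (target-side) behavior more concrete, here is a toy Python model. It is a minimal sketch, not how DDBoost or StoreOnce are actually implemented: it assumes fixed 4 KiB chunks fingerprinted with SHA-256 (real appliances use their own, often variable-size, segmenting), it ignores compression and the fingerprint traffic that source-side deduplication also sends, and all names (`DedupTarget`, `backup_source_side`, `backup_target_side`) are hypothetical.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real appliances segment data their own way


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupTarget:
    """Toy deduplicating target: stores each unique chunk once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.store = {}

    def has(self, digest: str) -> bool:
        return digest in self.store

    def put(self, digest: str, chunk: bytes) -> None:
        self.store.setdefault(digest, chunk)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.store.values())


def backup_target_side(target: DedupTarget, data: bytes) -> int:
    """CIFS-style: every byte crosses the network; the appliance dedupes on arrival."""
    sent = 0
    for c in chunks(data):
        sent += len(c)
        target.put(hashlib.sha256(c).hexdigest(), c)
    return sent


def backup_source_side(target: DedupTarget, data: bytes) -> int:
    """DDBoost-style: the client fingerprints chunks and sends only those the target lacks."""
    sent = 0
    for c in chunks(data):
        digest = hashlib.sha256(c).hexdigest()
        if not target.has(digest):
            sent += len(c)
            target.put(digest, c)
    return sent


if __name__ == "__main__":
    vm = random.Random(0).randbytes(4 * CHUNK_SIZE)  # one toy "VM disk" (16 KiB)
    vms = [vm] * 5  # five identical machines, as in the test jobs

    cifs, boost = DedupTarget(), DedupTarget()
    sent_cifs = sum(backup_target_side(cifs, v) for v in vms)
    sent_boost = sum(backup_source_side(boost, v) for v in vms)
    print("CIFS-style:   ", sent_cifs, "bytes sent,", cifs.stored_bytes(), "bytes stored")
    print("DDBoost-style:", sent_boost, "bytes sent,", boost.stored_bytes(), "bytes stored")
```

With five identical toy VMs, both targets end up storing the same unique bytes, but the CIFS-style path pushes every byte over the wire while the DDBoost-style path only sends chunks the target has not seen yet. `Random.randbytes` needs Python 3.9 or later.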
Note: For a deep dive into more features of each deduplication appliance, see the relevant section of the Veeam Backup & Replication Help Center.
For each scenario I created a job containing the same five (5) identical Windows machines, in order to get the highest ratio from each mechanism.
The first job is “Veeam Backup & Replication Native”, configured with the “Veeam B&R Windows Repository” we set up earlier; for all other settings I kept the defaults, the “Next-Next” method everyone likes 🙂. The second job, “Veeam Backup & Replication EMC DDboost”, has the same configuration as the first job except for the repository, which is “Veeam B&R EMC DDboost Repository”, and the “Advanced” section.
In the Advanced section we deselect “Enable inline data deduplication”, because we offload this part to the EMC DataDomain, and change the “Storage optimization” setting to “Local Target (16TB + backup files)”.
For the third job we configure “EMC DataDomain CIFS”, which has the same configuration as the “EMC DataDomain DDboost” job above, except that the repository is changed to “Veeam B&R EMC CIFS Repository”.
Note: You can right-click the job and choose Clone, then only edit the repository and the name.
The fourth and last job in our test is “HP StoreOnce”, configured like the EMC DataDomain jobs, because all deduplication appliances that Veeam supports should be configured the same way on the Veeam side. So this job is set up exactly like the “EMC DataDomain CIFS” job above, except that it uses the “Veeam B&R HP StoreOnce Repository”; all other settings stay the same.
Now that everything is aligned, let’s go to the findings from all the jobs and the conclusions.
I’m not a real expert in the backup appliance department and it’s not really my alley, but because of my Veeam Backup & Replication projects I simply want to point out the results here and the differences in the behavior observed.
For starters, as we can see, the Processing rate and Throughput Peak are almost the same in all cases, because the network connectivity is the same. I’ll also skip Data Processed and Data Read, which are obviously the raw capacity allocated on the vSphere VMFS datastore: 5 VMs × 50 GB VMDK files = 250 GB, before any kind of manipulation of the data. The Data Transferred value is where the scenarios start to differ.
Transferred data is the amount of data that actually moves from the Data Mover (Transport Service) on the source proxy to the target Data Mover, which acts as the data gateway for the repository. In our lab the source and target Data Movers are on the same server, but the mechanism works the same way. We always want to move as little data over the network as possible, so we can cut costs on WAN links and so on. We can see here that the Veeam B&R and EMC DD CIFS repositories move less data than the deduplication appliance methods. This is because the Veeam Data Mover service performs deduplication and compression before transferring the data, as mentioned above. When you choose a deduplication appliance, Veeam B&R bypasses its built-in mechanism and uses the vendor’s DLL libraries to deduplicate and compress the data over the network. Therefore the roughly 212 GB shown for the EMC DDboost method is not the data that was really transferred over the network; the amount actually transferred is the Written data.
With the HP StoreOnce method we can see the real-life numbers: no deduplication or compression is done before the data moves to the HP StoreOnce, but only unique blocks are written, similar to the EMC DataDomain appliance.
Note: For a deep dive into the Backup Job statistics report, see http://helpcenter.veeam.com/backup/80/vsphere/realtime_statistics.html
We can see here that the Veeam B&R Native repository stores the most data on disk of all the methods. In the last row of the chart table above we can see that the native method wrote 15.2% of the Data Read, which means savings of ~85% of the original capacity, which is great. The appliances are a different playing field. With both EMC DataDomain DDBoost and EMC DataDomain CIFS we got the same savings of ~93% of the original data, because both use the same compaction mechanism on the data itself. HP StoreOnce CIFS also gave great results; I was really surprised to see it hit the same mark of ~93%.
Note: Veeam B&R currently supports HP StoreOnce only via CIFS or VTL, so in this case it was the same battle as against EMC DataDomain CIFS. Rumor has it that the next release, Veeam v9, will support HP StoreOnce Catalyst, which is the counterpart of the EMC DataDomain DDBoost feature.
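The percentages above are simple arithmetic on the Data Read value. A minimal Python sketch, assuming Data Read equals the 250 GB raw capacity (5 × 50 GB) and treating the appliances’ “~93%” as a written fraction of roughly 7% (an approximation, not a measured value); `storage_savings` is a hypothetical helper:

```python
def storage_savings(data_read_gb, written_fraction):
    """Return (GB written to disk, percent of the read data saved)."""
    written_gb = data_read_gb * written_fraction
    saved_pct = (1 - written_fraction) * 100
    return written_gb, saved_pct


data_read_gb = 5 * 50  # five VMs x 50 GB VMDK each

# Veeam B&R native repository: 15.2% of Data Read was written
written, saved = storage_savings(data_read_gb, 0.152)
print(f"Native:     {written:.1f} GB written, {saved:.1f}% saved")

# Deduplication appliances: ~93% savings, i.e. roughly 7% written (approximate)
written, saved = storage_savings(data_read_gb, 0.07)
print(f"Appliances: {written:.1f} GB written, {saved:.1f}% saved")
```

Under these assumptions the native repository lands at about 38 GB on disk, while the appliances land at roughly 17.5 GB.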
I wanted to run another check with incremental backup sessions, which should also show differences, so let’s see what came up. Basically, I added 5.9 GB to the C:\ volume of each of the five (5) virtual machines. The data is a ZIP file containing an image of sorts: a VM appliance with multiple VMDK files, i.e. an extracted OVF.
I won’t go into the small details again, just a recap of what happened. We added ~30 GB of new data (5 × 5.9 GB = 29.5 GB), and in the current chart table we can see that Veeam Backup & Replication Native puts up a huge fight in how it reads the data and manipulates it to save disk space. As you can see, the percentage currently sits at 15%-20%, but as the incremental chains grow and a weekly/monthly retention policy is applied, the deduplication appliances will be far more efficient in data written to them than the Veeam Backup & Replication Native mechanism.
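The reasoning behind that last claim is that a plain repository keeps every retained full backup as its own file, while a deduplication appliance stores each unique block only once across the whole retention chain. Below is a toy Python model of that effect under loudly hypothetical assumptions: a 1 MiB dataset split into 4 KiB chunks, 8 chunks (~3%) rewritten between weekly fulls, and compression ignored on both sides (Veeam’s own compression and per-job deduplication would shrink the plain numbers, so only the trend matters). All names and numbers are illustrative, not measured.

```python
import hashlib
import random

CHUNK = 4096          # hypothetical chunk size
BASE_CHUNKS = 256     # toy dataset: 256 chunks = 1 MiB
CHANGED_PER_WEEK = 8  # hypothetical weekly change rate (~3% of chunks)


def weekly_fulls(weeks, seed=0):
    """Yield one full backup (a list of chunks) per week, rewriting a few chunks between weeks."""
    rng = random.Random(seed)
    data = [rng.randbytes(CHUNK) for _ in range(BASE_CHUNKS)]
    for _ in range(weeks):
        yield list(data)  # snapshot of this week's full
        for i in rng.sample(range(BASE_CHUNKS), CHANGED_PER_WEEK):
            data[i] = rng.randbytes(CHUNK)


def plain_repository_bytes(fulls):
    """Every retained full is stored as its own file (compression ignored in this toy)."""
    return sum(len(c) for full in fulls for c in full)


def dedup_appliance_bytes(fulls):
    """Each unique chunk across all retained fulls is stored once."""
    unique = {}
    for full in fulls:
        for c in full:
            unique.setdefault(hashlib.sha256(c).hexdigest(), len(c))
    return sum(unique.values())


if __name__ == "__main__":
    for weeks in (1, 4, 12):
        fulls = list(weekly_fulls(weeks))
        plain = plain_repository_bytes(fulls) // CHUNK
        dedup = dedup_appliance_bytes(fulls) // CHUNK
        print(f"{weeks:2d} weekly fulls: plain {plain} chunks, dedup {dedup} chunks")
```

With these assumptions, one full costs the same on both (256 chunks), but 12 weekly fulls take 3072 chunks on the plain repository versus 344 on the deduplicating store, which is the gap the retention argument relies on. `Random.randbytes` needs Python 3.9 or later.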
Well, I enjoyed this very much, and did a couple of test runs again and again to verify I’m not putting the wrong numbers out there. My opinion is split two ways. If a customer is in the lower capacity range and doesn’t need long-term retention, I wouldn’t go with deduplication appliances, because they are not cheap even for the starter kits, and you can be satisfied with some kind of external capacity via direct-attached storage, iSCSI LUNs or CIFS.
When you really go the whole nine yards and your organization understands that it needs dailies, weeklies and a couple of monthlies, then you should start considering the appliances as a possible solution.
Until next time 🙂 Have a great week!