Saturday, February 06, 2010

ZFS Compression Vs Deduplication (dedup)

I've been playing with ZFS dedupe for the last two weeks and wanted to share my findings.

Setup: OpenSolaris build 131
Sun X4200, 2 x dual-core Opteron 2.6 GHz, 8 GB RAM, 4 x 73 GB SAS 10K rpm

root@osol:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
compress                    72K  66.9G    21K  /compress
dedupe                      72K  66.9G    21K  /dedupe

root@osol:~# zfs set compression=on compress
root@osol:~# zfs set dedup=on dedupe

Wanted to see how much real data would dedupe.

I loaded my company's project/software folders: 68,000 files (Visio, PDF, Project, Word, OpenOffice, Excel, ISOs, ...), 38.9 GB in total.

Load times, copying the files from a local UFS filesystem to each ZFS dataset:

root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /dedupe/ ; tar xf - )
real    19:51.930407394
user        5.807881662
sys      1:48.025965013
38.8GB 0:19:51 [33.3MB/s]

root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /compress/ ; tar xf - )
real    18:46.544321180
user        3.368262960
sys      1:52.065809786
38.8GB 0:18:46 [35.3MB/s]

The dedupe dataset took about 65 seconds (roughly 6%) longer to load than the compress dataset.
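
As a rough check on that timing gap, the two "real" values from ptime can be converted to seconds and compared. This is only a sketch: the function names to_seconds and slowdown are made up for illustration, and it assumes the MM:SS.fraction format shown above (a run longer than an hour would need an extra field).

```shell
# Convert a ptime "real" value of the form MM:SS.fraction to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Print how much slower the first run was than the second,
# in seconds and as a percentage of the second run.
slowdown() {
    slow=$(to_seconds "$1")
    fast=$(to_seconds "$2")
    awk -v s="$slow" -v f="$fast" \
        'BEGIN { printf "%.1f s slower (%.1f%%)\n", s - f, (s - f) / f * 100 }'
}

# dedupe run vs compress run, values copied from the ptime output above
slowdown 19:51.930407394 18:46.544321180
```

With the figures above this prints "65.4 s slower (5.8%)".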

Let's see how much space each method saved.

root@osol:/ufs# zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
compress    68G  36.1G  31.9G    53%  1.00x  ONLINE  -
dedupe      68G  38.4G  29.6G    56%  1.02x  ONLINE  -
rpool       67G  49.8G  17.2G    74%  1.00x  ONLINE  -

root@osol:/ufs# zfs get compressratio compress
NAME      PROPERTY       VALUE  SOURCE
compress  compressratio  1.08x  -

Compression did a better job than dedupe, using about 6% less pool space (36.1G allocated versus 38.4G).
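
The 6% figure comes straight from the ALLOC column of zpool list. A minimal sketch of the arithmetic, where saving_pct is a hypothetical helper name and the two values are typed in by hand from the output above rather than read from the pool:

```shell
# Percentage by which allocation "a" is smaller than allocation "b".
saving_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

# compress ALLOC vs dedupe ALLOC, in GB
saving_pct 36.1 38.4
```

This prints 6.0, i.e. the compress pool holds the same data in about 6% less space than the dedupe pool.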

Conclusion
There isn't any advantage to dedupe on a general home file share: performance is slightly slower and less space is saved than with compression.

So why would you want to dedupe? Well, just look at my dedupe ratio of 2.28x for an NFS share used by VMware. Now this is exciting!

root@osol:~$ zpool list vm-dedupe
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vm-dedupe  68G   16.1G  51.9G    23%  2.28x  ONLINE  -
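
To put the 2.28x in perspective, multiplying ALLOC by the DEDUP ratio gives a rough idea of how much space the same data would need without dedupe. Treat this as an estimate only: it assumes the ratio applies to everything allocated in the pool, and logical_estimate is a hypothetical helper name.

```shell
# Estimate the un-deduplicated size from the ALLOC and DEDUP columns.
logical_estimate() {
    awk -v alloc="$1" -v ratio="$2" 'BEGIN { printf "%.1f\n", alloc * ratio }'
}

# vm-dedupe: 16.1G allocated at a 2.28x dedupe ratio
logical_estimate 16.1 2.28
```

This prints 36.7, so roughly 36.7G of VM data is being stored in 16.1G.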

Therefore I can only say "Some data is more equal than others."

Andy
 (Minor edit  7.02.10)
