ZFS Compression Vs Deduplication (dedup)

I've been playing with ZFS dedupe for the last two weeks and wanted to share my findings.

Setup: OpenSolaris build 131
Sun X4200, 2 x dual-core Opteron 2.6 GHz, 8 GB RAM, 4 x 73 GB SAS 10K RPM

root@osol:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
compress                    72K  66.9G    21K  /compress
dedupe                      72K  66.9G    21K  /dedupe

root@osol:~# zfs set compression=on compress
root@osol:~# zfs set dedup=on dedupe
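To make the difference between the two properties concrete, here is a toy Python model. It is not ZFS code, and it rests on several assumptions. The data is cut into fixed 128 KiB blocks, standing in for the default recordsize. Dedup fingerprints each block with SHA-256, the checksum `dedup=on` uses by default, and stores each unique block once. Compression shrinks each block on its own, with zlib standing in for lzjb, which `compression=on` selects on this build. The function names are hypothetical.

```python
import hashlib
import os
import zlib

# Assumption: fixed 128 KiB blocks, mirroring the default ZFS recordsize.
BLOCK_SIZE = 128 * 1024


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte stream into fixed-size blocks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def dedup_ratio(data: bytes) -> float:
    """Logical size divided by the size of the unique blocks only.

    Blocks are keyed by their SHA-256 digest, so identical blocks
    are counted (stored) once.
    """
    unique = {hashlib.sha256(b).digest(): len(b) for b in split_blocks(data)}
    return len(data) / sum(unique.values())


def compress_ratio(data: bytes) -> float:
    """Logical size divided by the size after compressing each block.

    zlib stands in for lzjb; a block that does not shrink is kept raw.
    """
    stored = sum(min(len(zlib.compress(b)), len(b)) for b in split_blocks(data))
    return len(data) / stored


# "VM-like": the same 1 MiB of random (incompressible) data written three times.
image = os.urandom(1024 * 1024)
vms = image * 3
# "Document-like": compressible text with no repeated blocks.
docs = b"".join(f"line {i}\n".encode() for i in range(200_000))

print(f"VMs:  dedup {dedup_ratio(vms):.2f}x  compress {compress_ratio(vms):.2f}x")
print(f"Docs: dedup {dedup_ratio(docs):.2f}x  compress {compress_ratio(docs):.2f}x")
```

Duplicated VM images dedupe perfectly but don't compress. Unique documents compress but don't dedupe. Real pools add metadata, variable block sizes and the dedup table on top, so treat this only as intuition.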

I wanted to see how well real-world data would dedupe.

I loaded my company's project and software folders: 68,000 files (Visio, PDF, Project, Word, OpenOffice, Excel, ISOs, ...), 38.9 GB in total.

Load times for copying the files from a local UFS file system to each ZFS dataset:

root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /dedupe/ ; tar xf - )
real    19:51.930407394
user        5.807881662
sys      1:48.025965013
38.8GB 0:19:51 [33.3MB/s]

root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /compress/ ; tar xf - )
real    18:46.544321180
user        3.368262960
sys      1:52.065809786
38.8GB 0:18:46 [35.3MB/s]

The dedupe pool took about 65 seconds longer to load than the compress pool (roughly 6% slower).
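The difference can be checked directly from the `real` times reported by `ptime`. Here is a small Python sketch. The helper name `ptime_to_seconds` is hypothetical, and it assumes the duration fields are colon-separated with the most significant field first.

```python
def ptime_to_seconds(value: str) -> float:
    """Convert a ptime duration such as '19:51.930407394' to seconds.

    Assumption: fields are colon-separated, most significant first
    (e.g. [hours:]minutes:seconds).
    """
    seconds = 0.0
    for field in value.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


dedupe_real = ptime_to_seconds("19:51.930407394")    # 'real' for the /dedupe copy
compress_real = ptime_to_seconds("18:46.544321180")  # 'real' for the /compress copy
diff = dedupe_real - compress_real
print(f"dedupe slower by {diff:.1f} s ({diff / compress_real:.1%})")
# prints "dedupe slower by 65.4 s (5.8%)"
```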

Let's see how much space each method saved:

root@osol:/ufs# zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
compress    68G  36.1G  31.9G    53%  1.00x  ONLINE  -
dedupe      68G  38.4G  29.6G    56%  1.02x  ONLINE  -
rpool       67G  49.8G  17.2G    74%  1.00x  ONLINE  -

root@osol:/ufs# zfs get compressratio compress
NAME      PROPERTY       VALUE  SOURCE
compress  compressratio  1.08x  -

The compressed pool did a better job than dedupe, using 2.3 GB (about 6%) less space.
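The saving comes straight from the ALLOC column. As a rough sanity check, multiplying each pool's allocation by its reported ratio should land near the amount of data loaded. This is an approximation: the ratios are rounded to two decimals, `compressratio` is a dataset property while ALLOC is pool-wide, and ALLOC also includes metadata. The variable names below are only illustrative.

```python
# ALLOC values (GB) from `zpool list`, and the ratios reported by ZFS
compress_alloc, compressratio = 36.1, 1.08
dedupe_alloc, dedupratio = 38.4, 1.02

saved = dedupe_alloc - compress_alloc
print(f"compression used {saved:.1f} GB less ({saved / dedupe_alloc:.1%})")
# prints "compression used 2.3 GB less (6.0%)"

# Rough check: allocated space times the ratio approximates the logical data written.
print(f"compress: ~{compress_alloc * compressratio:.1f} GB logical")  # ~39.0 GB
print(f"dedupe:   ~{dedupe_alloc * dedupratio:.1f} GB logical")       # ~39.2 GB
```

Both estimates come out close to the roughly 38.9 GB that was copied in.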

Conclusion
On a general home file share, dedupe offers no advantage: it is slightly slower and saves less space than compression.

So why would you want to dedupe? Just look at the 2.28x dedupe ratio on my NFS share for VMware. Now this is exciting!

root@osol:~$ zpool list vm-dedupe
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vm-dedupe  68G   16.1G  51.9G    23%  2.28x  ONLINE  -
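To put 2.28x in perspective, here is a back-of-the-envelope Python estimate of how much VM data those 16.1 GB represent. It assumes every block in the pool went through dedup, so logical size is roughly ALLOC times the ratio. Metadata and rounding make this approximate. The function name `dedup_savings` is hypothetical.

```python
def dedup_savings(alloc_gb: float, ratio: float) -> tuple[float, float]:
    """Estimate (logical data, space saved) in GB from ALLOC and the dedup ratio.

    Assumption: all data in the pool was written with dedup on, so
    logical size ~= ALLOC * ratio.
    """
    logical = alloc_gb * ratio
    return logical, logical - alloc_gb


logical, saved = dedup_savings(16.1, 2.28)  # vm-dedupe: ALLOC 16.1G, DEDUP 2.28x
print(f"~{logical:.1f} GB of VM data stored in 16.1 GB (~{saved:.1f} GB saved)")
# prints "~36.7 GB of VM data stored in 16.1 GB (~20.6 GB saved)"
```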

So all I can say is: "Some data is more equal than others."

Andy
 (Minor edit  7.02.10)
