Saturday, February 06, 2010

ZFS Compression Vs Deduplication (dedup)

I've been playing with ZFS dedupe for the last two weeks and wanted to share my findings.

Setup: OpenSolaris build 131
Sun X4200, 2 x Dual Core Opteron 2.6GHz, 8GB RAM, 4 x 73GB SAS 10K rpm

root@osol:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
compress                    72K  66.9G    21K  /compress
dedupe                      72K  66.9G    21K  /dedupe

root@osol:~# zfs set compression=on compress
root@osol:~# zfs set dedup=on dedupe

I wanted to see how much real data would dedupe.

I loaded my company's project/software folders: 68,000 files (Visio, PDF, Project, Word, OpenOffice, Excel, ISOs...), a total of 38.9GB.

Load times, copying the files from a local UFS filesystem to each ZFS dataset:

root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /dedupe/ ; tar xf - )
real    19:51.930407394
user        5.807881662
sys      1:48.025965013
38.8GB 0:19:51 [33.3MB/s]

root@osol:/ufs# ptime tar cf - iso projects software | pv | ( cd /compress/ ; tar xf - )
real    18:46.544321180
user        3.368262960
sys      1:52.065809786
38.8GB 0:18:46 [35.3MB/s]

The dedupe ZFS dataset took about 65 seconds longer to load than the compress dataset (19:51.9 versus 18:46.5).
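
For anyone who wants to check the arithmetic, here is a minimal Python sketch that turns the two ptime "real" values above into seconds and compares them. The helper name ptime_to_seconds is my own, not part of any tool.

```python
def ptime_to_seconds(value: str) -> float:
    """Convert a ptime-style 'MM:SS.fraction' (or 'H:MM:SS.fraction') string to seconds."""
    seconds = 0.0
    for part in value.split(":"):
        # Each colon-separated field is worth 60x the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds


# "real" values copied from the two ptime runs above.
dedupe_real = ptime_to_seconds("19:51.930407394")
compress_real = ptime_to_seconds("18:46.544321180")

extra_seconds = dedupe_real - compress_real
extra_percent = 100 * extra_seconds / compress_real

print(f"dedupe took {extra_seconds:.1f}s longer ({extra_percent:.1f}% slower)")
```

This is a single run of each load, so treat the gap as indicative rather than a proper benchmark.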

Let's see how much space each method saved.

root@osol:/ufs# zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
compress    68G  36.1G  31.9G    53%  1.00x  ONLINE  -
dedupe      68G  38.4G  29.6G    56%  1.02x  ONLINE  -
rpool       67G  49.8G  17.2G    74%  1.00x  ONLINE  -

root@osol:/ufs# zfs get compressratio compress
NAME      PROPERTY       VALUE  SOURCE
compress  compressratio  1.08x  -

Compression did a better job than dedupe: the compress pool allocated 36.1G against 38.4G for the dedupe pool, saving an extra 6% of storage.
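
The 6% comes from comparing the two ALLOC figures. A rough Python sketch of that calculation, plus the saving each reported ratio implies on its own, looks like this. The function name saving_from_ratio is mine, and it assumes a ratio of Nx means the data takes 1/N of its original space, which ignores metadata overhead.

```python
def saving_from_ratio(ratio: float) -> float:
    """Fraction of space saved for a ratio such as 1.08 (shown by ZFS as 1.08x).

    Assumes ratio = logical size / stored size, ignoring metadata overhead.
    """
    return 1 - 1 / ratio


# ALLOC values from the zpool list output above, in gigabytes.
compress_alloc_g = 36.1
dedupe_alloc_g = 38.4

# How much less the compress pool allocated, relative to the dedupe pool.
extra_saved = (dedupe_alloc_g - compress_alloc_g) / dedupe_alloc_g

print(f"compress allocated {100 * extra_saved:.1f}% less than dedupe")
print(f"compressratio 1.08x implies {100 * saving_from_ratio(1.08):.1f}% saved")
print(f"dedup ratio 1.02x implies {100 * saving_from_ratio(1.02):.1f}% saved")
```

The two views won't agree exactly, since zpool list counts everything allocated in the pool while compressratio is reported per dataset.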

Conclusion
There isn't any advantage to dedupe on a general home file share: slightly slower performance and less space saved when compared to compression.

So why would you want to dedupe? Well, just look at my dedupe ratio of 2.28x for an NFS share holding VMware images. Now this is exciting!

root@osol:~$ zpool list vm-dedupe
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vm-dedupe  68G   16.1G  51.9G    23%  2.28x  ONLINE  -
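
To put that ratio in perspective, here is a back-of-the-envelope Python sketch estimating how much data the pool would need without dedupe. It assumes the DEDUP ratio applies evenly to everything in ALLOC, which is only an approximation, and the function name is my own.

```python
def estimate_undeduped_g(alloc_g: float, dedup_ratio: float) -> float:
    """Rough size in GB the data would occupy without dedup.

    Assumption: the pool-wide DEDUP ratio applies uniformly to ALLOC.
    """
    return alloc_g * dedup_ratio


# ALLOC and DEDUP values from the zpool list vm-dedupe output above.
alloc_g = 16.1
dedup_ratio = 2.28

undeduped_g = estimate_undeduped_g(alloc_g, dedup_ratio)
saved_g = undeduped_g - alloc_g

print(f"roughly {undeduped_g:.1f}G of data stored in {alloc_g}G, about {saved_g:.1f}G saved")
```

On those assumptions the VMware share holds roughly 36.7G of data in 16.1G of disk.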

Therefore I can only say "Some data is more equal than others."

Andy
 (Minor edit  7.02.10)

2 comments:

Juan Fer Martins said...

Anyway, that doesn't mean dedup isn't a great ZFS feature.

Take a look.

Base File System

NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  696G  5.59G  690G   0%  1.00x  ONLINE  -

Result After Copying Up the OpenSolaris 2009.06 ISO

NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  696G  6.23G  690G   0%  1.00x  ONLINE  -

Result After Copying Up a 2nd OpenSolaris 2009.06 ISO

NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  696G  6.23G  690G   0%  1.92x  ONLINE  -

Result After Copying Up the OpenSolaris Build131 ISO

NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  696G  6.84G  689G   0%  1.50x  ONLINE  -

Result After Copying Up a 2nd OpenSolaris Build131 ISO

NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  696G  6.89G  689G   0%  2.00x  ONLINE  -

Source: http://www.lastoctet.com/index.php/zfs-dedup-results

A Paton said...

Juan, you're right, ZFS's dedupe is great, but not for all data types.

My tests on VMware images show some real savings.

Thanks for commenting.