zfs

basics

The two major programs you must learn to use are called zfs and zpool. They are as important to know as mkfs.ext3 and e2fsck are when using an ext file system. The syntax sometimes looks a bit inconsistent, because some commands given to these programs expect the name of a zpool, some the name of a dataset, and some need both, separated with a slash.
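A minimal sketch of that difference, using a made-up pool named tank with a dataset home:

```shell
zpool status tank             # zpool commands take a pool name
zfs list tank/home            # zfs commands take pool/dataset
zfs snapshot tank/home@today  # snapshot names go after an '@'
```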

DO NOT COPY AND PASTE, BUT UNDERSTAND AND TRY ONE BY ONE!

Enabling zfs during boot (zfsonlinux with systemd)

zfs.target can then be used in service files as a dependency, e.g. useful for libvirt-guest.service
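On my system that boils down to enabling units roughly like these (unit names may differ between zfsonlinux versions, so check `systemctl list-unit-files | grep zfs` first):

```shell
systemctl enable zfs-import-cache.service
systemctl enable zfs-mount.service
systemctl enable zfs-share.service
systemctl enable zfs.target
```

A service that needs the pools can then declare `Requires=zfs.target` and `After=zfs.target` in its unit file.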

what is what

For a better understanding of the features ZFS offers: I find some of the names rather confusing. There is also far too little information on how to use ZFS in the field. The internet is full of examples of how to do stuff with ZFS, but not of the scenarios in which these features actually make sense.

zfs clone should be called
zfs branch
because a clone gets created from a snapshot (like a commit). Unlike a snapshot, the clone can be modified (written to) and finally promoted back to a dataset, which can be renamed to replace the original dataset (which is like a ‘merge’)
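A sketch of that ‘branch and merge’ workflow (dataset names are made up):

```shell
zfs snapshot tank/www@stable             # the 'commit'
zfs clone tank/www@stable tank/www-wip   # the writable 'branch'
# ... modify files under /tank/www-wip ...
zfs promote tank/www-wip                 # detach the clone from its origin
zfs rename tank/www tank/www-old         # the 'merge': swap the datasets
zfs rename tank/www-wip tank/www
```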

Network file sharing

samba/cifs/windows

In order to share files among different devices and operating systems, samba has proven to be a solid solution, which is why it is widely used in industrial environments as well. ZFS relies on a relatively new samba feature called usershare. It is not enabled in default installations and must be explicitly set in /etc/samba/smb.conf, looking like this:
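A usershare section in /etc/samba/smb.conf typically looks like this (the path below is, afaik, the one zfs expects; the other values are examples):

```ini
[global]
    usershare path = /var/lib/samba/usershares
    usershare max shares = 100
    usershare allow guests = yes
    usershare owner only = no
```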

The folder in this configuration snippet must be created by hand and *afaik* its name is hard-coded in zfs. Other sites use different usershare paths in their examples, which I found not to work together with zfs.

If this does not work, you might get a better error message when trying to create a user share by hand. This is done with:
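For example (share name and path are placeholders):

```shell
net usershare add testshare /tank/data "test share" everyone:F guest_ok=y
net usershare list    # verify the share was registered
```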

NFS – the network file system

NFS generally offers better support for file system features than samba, can be tuned to be slightly faster and is sometimes easier to integrate in fstab.
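A client-side fstab entry could look like this (host and paths are made up):

```
filehost:/tank/data  /mnt/data  nfs  defaults,_netdev  0  0
```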

how2: restart nfs

If you restart the nfs daemon, you must reshare the zfs shares as well, or the exports might no longer represent your configuration. So get used to doing:
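Something along these lines (the nfs unit name depends on your distribution):

```shell
systemctl restart nfs-server.service  # or nfs-kernel-server on Debian
zfs share -a                          # re-export all zfs-managed shares
```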

nfs version issue

Be aware that there are NFS4 and NFS3 out there, which use different configuration file formats, although the configuration file usually has the same name, namely /etc/exports. ZFS uses the older nfs3 *afaik*, so you must take care yourself not to mix up both configuration file formats, but stick with nfs3, because mixing the two can lead to unexpected behaviour.
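On Linux the sharenfs property takes exportfs-style option strings, for example (network and dataset are placeholders):

```shell
zfs set sharenfs="rw=@192.168.1.0/24" tank/data
zfs get sharenfs tank/data
```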

FreeBSD

FreeBSD users can enable the required services in their /etc/rc.conf by adding a line
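like the following (a sketch; check rc.conf(5) for your release):

```shell
zfs_enable="YES"
nfs_server_enable="YES"
rpcbind_enable="YES"
mountd_enable="YES"
```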

for example. But that will cause trouble, because NFS3 and NFS4 can easily get mixed up by the configuration style, which makes both versions of NFS active and leads to unresponsive hosts or connection errors. As a rule of thumb: look which configuration file format is used by your ZFS and then append shares in that format accordingly.

SELinux: Shared file systems need the right context to be set

You must tell zfs in which selinux security context it shall mount the dataset or pool. It is much like the fstab-option I mentioned here.
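zfsonlinux exposes this through the context properties, e.g. (the context value and dataset name are just examples):

```shell
zfs set context="system_u:object_r:samba_share_t:s0" tank/shared
zfs umount tank/shared && zfs mount tank/shared  # remount so the context takes effect
```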

FreeBSD: Alignment and Sector size

There seem to be different implementations of ZFS, where some support setting a fixed sector size and others do not. The topic is somewhat complex, because it is hardware related: while some manufacturers have made their devices report a sector size of 4096 bytes, others fake 512 bytes for compatibility with some operating systems. But using 512 bytes makes things slower. In short: one needs a trick to make zfs use 4096 bytes, and that is done with gnop and gpart like so:
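A sketch (device names are examples; the nop device only exists until the next reboot):

```shell
gnop create -S 4096 /dev/ada0    # fake a 4096 byte sector size
zpool create tank /dev/ada0.nop  # the pool inherits the 4 KiB sectors
```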

This will use the temporary nop device to create the pool with the nop device's sector size. The same trick can also be used to attach a disk to an existing pool:
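For example (again with made-up device names):

```shell
gnop create -S 4096 /dev/ada1
zpool attach tank /dev/ada0 /dev/ada1.nop
```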

Which will initialize a resilver from /dev/ada0 to /dev/ada1.nop, effectively making ada1.nop a mirror of ada0.

As said before: we only wanted to use that nop device temporarily, so we can do
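a sequence like this (pool and device names as above):

```shell
zpool export tank
gnop destroy /dev/ada1.nop
zpool import tank
```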

which will reimport the pool without that nop device.

Backup strategy

zfs can easily be backed up using the zfs send and zfs recv commands. This makes incremental backups possible without scanning for changes in real time as rsync would. Basically you must have a snapshot lying around for each backup you want to make. I suggest using the current date in the form yyyy-mm-dd (see: ISO8601). The steps are:
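Sketched with made-up pool and host names:

```shell
TODAY=$(date +%F)    # yyyy-mm-dd, ISO8601
zfs snapshot tank/data@$TODAY
# first time: full transfer
zfs send tank/data@$TODAY | ssh backuphost zfs recv backup/data
# later: send only the difference to the previous snapshot ($PREVIOUS is a placeholder)
zfs send -i tank/data@$PREVIOUS tank/data@$TODAY | ssh backuphost zfs recv backup/data
```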

zfs send | zfs receive

The pv tool can be used to monitor the progress of the transfer. zfs send can also predict the amount of data to be transferred:
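For example (snapshot names made up):

```shell
zfs send -nv tank/data@2016-01-02  # -n: dry run, -v: print the estimated size
zfs send tank/data@2016-01-02 | pv | ssh backuphost zfs recv backup/data
```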

example

performance tuning

The performance of zfs can really suck when using the wrong block size for your devices, which can also waste disk space. These settings can only be set when creating a pool and cannot be changed afterwards.

* some devices, mostly hard disk drives, ‘lie’ about their block sizes in order to preserve compatibility with legacy operating systems
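On implementations that support it, the relevant knob is ashift, set at pool creation time (a sketch; device names are examples, and 2^12 = 4096 bytes):

```shell
zpool create -o ashift=12 tank mirror /dev/sda /dev/sdb
```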

Bugs and workarounds

No nfs shares after reboot

#2883: Currently a bug is preventing zfs from sharing datasets via nfs during boot. It has something to do with the order in which the zfs services are called by systemd, and someone has suggested dividing the existing systemd scripts into smaller portions in order to gain more control over the order in which commands are executed. I have had success working around this with this ugly script:
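I can only sketch the idea here; this is an assumption of what such a workaround looks like, not the exact script: run late in boot and force re-sharing.

```shell
#!/bin/sh
# ugly workaround for #2883: re-share everything once the pools are up
sleep 30        # crude: wait for pool import to finish
zfs share -a    # re-share all datasets with sharenfs/sharesmb set
exportfs -ra    # make the kernel re-read the exports
```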

Permanent errors after scrub with silly names

These hex numbers represent the inode numbers of the broken files. We can locate those with find, but find uses decimal numbers. However, Bash can convert those:
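For example, for an error entry like `<0x89a>` (number and path made up):

```shell
printf '%d\n' 0x89a   # printf converts hex to decimal: prints 2202
echo $((16#89a))      # bash arithmetic does the same: prints 2202
# then locate the file by its inode number:
# find /tank/data -xdev -inum 2202
```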

They must be deleted or restored, and the error message will be gone after the following scrub.

Command cheat sheet

To display the size currently occupied without snapshots (=refer), requires bc:
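Something like this (a sketch; -Hp makes zfs print plain byte values that bc can sum):

```shell
zfs list -Hp -o refer -r tank | paste -sd+ - | bc
```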

To be tried

Just as a side note for me

  • does wipefs do the same thing as zfs labelclear?
  • max,