ZFS: It’s that simple!
But first, a word about Linux
ZFS may have started on Solaris, but it is not limited to Solaris. The good folks at zfsonlinux.org have made it possible to install and run ZFS on a number of Linux distributions.
My test environment is CentOS. So, let's get ZFS up and going there:
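On CentOS this amounts to pulling in the packages from the zfsonlinux.org repository. The exact release RPM name changes with each CentOS point release, so treat this as a sketch rather than a copy-paste recipe:

```shell
# Sketch for CentOS 7; the zfs-release RPM filename varies by CentOS release.
sudo yum install -y epel-release
sudo yum localinstall -y http://download.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
sudo yum install -y kernel-devel zfs
```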
That will take a few minutes to install because it's doing some compiles along the way, but it does so without any issues or errors.
Setup ZFS to start up at boot:
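With systemd-based CentOS, that means enabling the units the ZFS packages ship with (the exact unit names can vary between package versions):

```shell
# Enable the ZFS systemd units so pools import and mount at boot.
sudo systemctl enable zfs-import-cache zfs-mount zfs.target
sudo modprobe zfs    # load the kernel module now rather than waiting for a reboot
```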
Disable SELinux, because this is not supported with ZFS yet:
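Disabling it takes two steps: one for the running system and one to make it stick across reboots:

```shell
sudo setenforce 0                                                  # stop enforcing immediately
sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config  # keep it off after reboot
```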
I then added five new 2 GB virtual disks to give me some devices to play with. (My test environment is a CentOS virtual machine running on VMware Workstation.)
Disk discovery can be tricky with Linux. There are some tricks you can use to locate those disks without a reboot, but for the purposes of this article, I will keep it simple and reboot. (The reboot also confirms that the software is installed properly and that SELinux is disabled.)
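For the curious, one common no-reboot trick is to ask each SCSI host adapter to rescan its bus, along these lines:

```shell
# Force a SCSI bus rescan on every host adapter; newly attached
# virtual disks usually show up without a reboot.
for host in /sys/class/scsi_host/host*; do
    echo "- - -" | sudo tee "$host/scan" > /dev/null
done
```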
Once the virtual machine is back up and running, I can see the new disks with the fdisk utility:
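The output will look something like this (the exact sizes and byte counts will differ on your system):

```shell
sudo fdisk -l | grep '^Disk /dev/sd'
# Disk /dev/sda: ...          <- the original system disk
# Disk /dev/sdb: 2147 MB ...  <- the first of the five new 2 GB disks
```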
The disks were discovered as /dev/sdb through /dev/sdf.
Building the ZPool
From here on out, the commands are no longer Linux specific. All of these commands work equally well on Solaris as they do on Linux. The only thing to be cautious of is the device names: in Linux, hard disks are sdb, sdc, etc.; in Solaris, they are c0t0d0, c0t0d1, etc.
Let's create the ZFS Pool with the command:
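Putting together the pool name, the raidz layout, the five disks, and the mount point (note that the `-m` option goes before the pool name):

```shell
# Create a raidz pool named data-pool from four disks plus a hot spare,
# mounted at /data. -f forces reuse of disks that may hold old data.
sudo zpool create -f -m /data data-pool raidz sdb sdc sdd sde spare sdf
```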
And you're done! You now have a file system in /data ready to use. It's just that simple!
But I know you overachievers out there aren't done yet.
Note that no special commands were required to create logical volumes; no mkfs was required to create the file system on the volume; no mkdir was required to create the mount point; no mount was required to mount the file system; and finally, no edits to /etc/[v]fstab were required to make the mount permanent. All of these steps are normally part of making a file system available. The one “zpool” command took care of everything.
Let's step through the command to understand what we did.
- create: As the command suggests, we are creating a ZFS Pool.
- -f: f is for force. The command is destructive to data that is in the underlying devices, and the -f tells zpool we know what we are doing.
- data-pool: The name we are assigning to the ZFS Pool.
- raidz: How we are going to protect the data from disk failures. Raid-Z is a variation of Raid-5 used by ZFS. Or you could “mirror.” If you had hardware RAID, you'd omit this parameter because you wouldn't need ZFS to protect you.
- sdb sdc sdd sde: The four disks we are using in this pool. Three disks' worth of space will hold data and one disk's worth will hold parity, even though the parity is actually distributed across all four.
- spare sdf: The fifth disk is a hot spare that kicks in when one of the others fails.
- -m /data: The mount point where this ZFS file system will be mounted.
File systems, quotas and reservations
Let's create a couple file systems with quotas and reservations:
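Something like the following; the file system names fs1 and fs2, and the sizes, are just examples for illustration:

```shell
# fs1/fs2 are example names; the quota and reservation sizes are illustrative.
sudo zfs create data-pool/fs1
sudo zfs create data-pool/fs2
sudo zfs set quota=1G data-pool/fs1          # fs1 can never use more than 1 GB
sudo zfs set reservation=500M data-pool/fs2  # 500 MB is held back for fs2 alone
```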
In the early days of UNIX, you would “partition” your hard drive into slices and create a file system in each of those slices. If you had 20 GBs of disk, you could create one file system with 10 GBs of space and a second with the other 10 GBs. But there was no easy way to adjust if you later needed 12 GBs on the first and only 2 on the second.
Then along came the Logical Volume Manager, and combined with a decent file system, you might be able to shrink one file system and its underlying volume, and then grow the second volume/file system.
But the shrinking was always a bit of a concern, because you had to move data from the back of the volume into unused blocks at the front, and you might even need an outage.
ZFS Pools address all those issues. When you create the pool and the file system, all the space in that pool is broken down into small “extents,” and those extents are made available to whatever file system needs it. So, you don't need to know in advance how big each file system needs to be.
With that said, you don't want one application in one file system using up all those free extents and putting everything else in jeopardy. So, the concepts of quotas and reservations come into play.
A quota is how large a file system is allowed to grow to. A reservation is how much space from the pool you want to set aside for that file system to use, to prevent other file systems from taking it. If you need to change those values, a simple command adjusts them without having to move a single block of data.
ZFS has built-in compression that shrinks the amount of data before it is written to disk. Not only does this save disk space, but because you are reducing the amount of data you write, you might actually speed up your application. Here is an extreme and unrealistic demonstration of this:
Turn on compression on one file system. Turn it off on the second one:
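Continuing with the example file systems fs1 and fs2 (names assumed, not from the original):

```shell
sudo zfs set compression=on data-pool/fs1
sudo zfs set compression=off data-pool/fs2
sudo zfs get compression data-pool/fs1 data-pool/fs2   # verify the settings
```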
Note both file systems are empty:
Write 500 MBs of data to each of them. The data is all zeros and very repetitive, so it compresses really well.
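A simple way to generate that data is dd from /dev/zero (again assuming the example file systems fs1 and fs2, mounted under the pool's /data mount point):

```shell
sudo dd if=/dev/zero of=/data/fs1/zeros.dat bs=1M count=500   # compressed file system
sudo dd if=/dev/zero of=/data/fs2/zeros.dat bs=1M count=500   # uncompressed file system
df -h /data/fs1 /data/fs2                                     # compare the space used
```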
In the compressed case, note that the speed is at 154 MB/s, and that the file system still looks empty.
However, when not compressed, the speed slows down to 53 MB/s as the system has to write a lot of zeros to the hard drive. Also note that the space used is actually showing up one for one in the df output.
Snapshots and clones
ZFS has the ability to create a read-only view of your data, called a snapshot, to help you recover from accidental data loss or corruption.
Start by taking a snapshot of your file system:
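Using the example file system from earlier, a snapshot named snap1 (name assumed) looks like this:

```shell
sudo zfs snapshot data-pool/fs1@snap1
sudo zfs list -t snapshot          # confirm the snapshot exists
```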
Now, you go on your way, and whoops! You accidentally delete a file.
The snapshot can be accessed by changing to the hidden .zfs/snapshot directory.
Let's bring back that file that you accidentally deleted.
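The file name here is purely illustrative; the point is that the snapshot directory holds the files exactly as they were when the snapshot was taken:

```shell
ls /data/fs1/.zfs/snapshot/snap1/                           # the files as they were
cp /data/fs1/.zfs/snapshot/snap1/important.txt /data/fs1/   # bring one back
```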
If you want a read-write version of your file system, you need a clone. Clones sit on top of a snapshot. Let's create one on the snapshot from above:
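With the assumed names from above, the clone is one command; it mounts under the pool's mount point like any other ZFS file system:

```shell
sudo zfs clone data-pool/fs1@snap1 data-pool/fs1-clone
```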
You now have two read-write copies of your file system that you can use as needed.
Let's clean up the clone and the underlying snapshot:
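Because the clone depends on the snapshot, the order matters:

```shell
sudo zfs destroy data-pool/fs1-clone    # the clone has to go first...
sudo zfs destroy data-pool/fs1@snap1    # ...then the snapshot it was built on
```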
File system sharing with NFS
Before trying to share anything, let's make sure we have all the pieces required to run an NFS server:
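On CentOS 7 that means installing nfs-utils and bringing up the NFS services (unit names may differ on older releases):

```shell
sudo yum install -y nfs-utils
sudo systemctl enable rpcbind nfs-server
sudo systemctl start rpcbind nfs-server
```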
The actual sharing is one single command:
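ZFS manages the export itself through the sharenfs property, so there is no /etc/exports entry to maintain:

```shell
sudo zfs set sharenfs=on data-pool/fs1
```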
Let's use that share on the NFS client:
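Something along these lines, where "zfs-server" stands in for your server's actual hostname or IP:

```shell
# "zfs-server" is a placeholder for the ZFS server's hostname or IP.
sudo mount -t nfs zfs-server:/data/fs1 /mnt
df -h /mnt
```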
So what happens when a disk in the Raid-Z pool fails? Fortunately, nothing very exciting. I emulated a disk failure by removing a virtual disk from the virtual machine. /var/log/messages started to report some error messages, which you should theoretically be monitoring from your monitoring host:
If you check the status, you will see the disk is in trouble, the pool is degraded, but the data is fine:
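The status check is one command; the failed disk will show as faulted and the pool as degraded, while the raidz vdev keeps serving data:

```shell
sudo zpool status data-pool
# The failed disk shows as UNAVAIL/FAULTED and the pool state as DEGRADED,
# but reads and writes continue to work.
```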
Let's replace the failed disk with the spare we added in the beginning:
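Here "sdc" stands in for whichever disk actually failed in your pool; sdf is the hot spare we defined at pool creation:

```shell
# Replace the failed disk (sdc here, as an example) with the hot spare sdf.
sudo zpool replace data-pool sdc sdf
```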
Check the status again. Notice it's rebuilding:
Wait a little longer. The rebuild is now complete, and the pool is running on the hot spare:
The pool is in a degraded state, but you are back to full redundancy using the spare.
At a later date, you replace the faulty drive and rediscover it, or reboot to rediscover it. Then, one last command to fix up the ZPool:
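Again using "sdc" as a stand-in for the disk that was physically replaced:

```shell
# Resilver onto the replacement disk; once that completes, the hot
# spare detaches and returns to the spare list automatically.
sudo zpool replace data-pool sdc
```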
The status shows the rebuild taking place again:
And eventually, everything goes back to normal:
Hopefully you now see the simplicity and power of ZFS, and hopefully you have the basic skills to start using it yourself. There are of course many more options that you can take advantage of, and you'll need to read up and experiment with them!