
Friday, January 29, 2010

Backup on ZFS, part 1

One of the nice things about having systems on ZFS was that the disk failures of the last few days didn't cost me any noticeable downtime per se. Pulling and replacing disks - without hot-swappable hardware - and the system upgrades that inspired still cost time, as do hardware failures that leave a system unbootable. But in general, disk problems with ZFS file systems are just minor problems: you notice the disk is no longer in service, decide how to deal with it, and then do so.

Part of that is having reliable backups, and ZFS makes even that easier. The best example is of course the OpenSolaris "Time Slider" tool, which uses the ZFS snapshot feature to let you recover old versions of files. Snapshots also make backups to other disks - suitable for taking offsite, for instance - easier to deal with.

As disks have gotten cheap, it's become common to keep backups online. A typical home-grown backup script will use something like rsync to copy files to the destination disk or file server. To make old versions available, it will then play games with a copy of the directory tree and hard links to create an image of the tree as of that backup, without duplicating files that haven't changed between backups.
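For comparison, here is a minimal sketch of that traditional approach, assuming a reasonably recent rsync with the --link-dest option, which hard-links files that are unchanged since the previous backup instead of copying them again. The function name link_dest_backup and the dated-directory layout are hypothetical. Running it with DEBUG=echo just prints the rsync command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: make a dated backup of SRC under DEST_ROOT, hard-linking
# files that haven't changed since the newest earlier backup.
# DEST_ROOT should be an absolute path; a relative --link-dest is resolved
# relative to the destination directory, not the current directory.
link_dest_backup() {
	src=$1
	dest_root=$2
	today=$(date +%F)
	# newest earlier YYYY-MM-DD directory, if there is one
	prev=$(ls -1d "$dest_root"/????-??-?? 2>/dev/null | grep -v "/$today\$" | tail -n 1)
	if [ -n "$prev" ]
	then
		$DEBUG rsync --archive --delete --link-dest="$prev" "$src/" "$dest_root/$today/"
	else
		$DEBUG rsync --archive --delete "$src/" "$dest_root/$today/"
	fi
}
```

Usage: "DEBUG=echo link_dest_backup /home /export/backups/home". Every run produces a full-looking tree, but the tree-copying and link games are exactly what ZFS snapshots make unnecessary.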

Snapshots can go one better. If your copy software writes just the changed blocks of a file instead of recreating the entire file, then the unchanged blocks of that file will also be shared across snapshots. Better yet, the snapshot can be created by running one command - "zfs snapshot backuppool/mybackup@2010-01-29" - on the system the backup resides on.
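A minimal sketch of that idea, assuming the backup copy lives in a ZFS dataset on the machine doing the copying; the helper name and its arguments are hypothetical. The detail that matters is rsync's --inplace option: without it, rsync writes a complete new copy of each changed file and renames it into place, so a changed file shares no blocks with earlier snapshots even when only a few bytes differ.

```shell
#!/bin/sh
# Hypothetical helper: refresh one backup copy of SRC in DEST, then snapshot
# the ZFS dataset DATASET that holds DEST, naming the snapshot by today's date.
zfs_snapshot_backup() {
	src=$1
	dest=$2
	dataset=$3
	# --no-whole-file forces rsync's delta algorithm even for local copies, and
	# --inplace writes updated data into the existing file instead of building
	# a new copy and renaming it over the old one.
	$DEBUG rsync --archive --delete --no-whole-file --inplace "$src/" "$dest/"
	# One command freezes this state of the backup.
	$DEBUG zfs snapshot "$dataset@$(date +%F)"
}
```

Unlike the hard-link version, there is only ever one tree to copy into; each snapshot is the "image of the tree at that time".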

The final nicety is that even systems without the hardware oomph for ZFS - it was designed for 64-bit CPUs with a gigabyte of RAM - or with an OS that doesn't support ZFS can take advantage of this in their backups. Here's the script I use for my local backups. While I use it in production, it's not a finished product, in that it's really intended for use by relatively astute system admins. In particular, there's no nice error reporting, and there are no simple tools for either complete restores or simple file recovery. Those shouldn't be hard to build on top of this, but it's good enough for my use.

As with the previous script, the goal is more to get people thinking about how to leverage ZFS for these types of chores. If you've already done that and have tools available, provide a link in the comments and I'll pull it into the body so you get the traffic. If you feel moved to productize this script - the same applies.
#!/bin/sh

BACKUP_DEST=/export/backups
BACKUP_FS=external/export/backups
BACKUP_HOST=backups
BACKUP_USER=operator

if [ "$DEBUG" = "" ]
then
ECHO=""
else
ECHO=echo
fi

# Pick the local file systems to back up, plus any OS-specific rsync flags
case $(uname) in
Darwin)
	dump_list=$(df -T ufs,hfs | awk 'NR != 1 { print $NF }')
	extra_flags="--extended-attributes"
	hostname=$(hostname -s) ;;
FreeBSD)
	dump_list=$(mount -p -t ufs,zfs | awk '{ print $2 }')
	extra_flags="--acls --xattrs"
	hostname=$(hostname -s) ;;
SunOS)
	# skip the external pool that holds the backups themselves
	dump_list=$(/usr/gnu/bin/df -P -t zfs -t ufs | awk 'NR != 1 && !/^external/ { print $NF }')
	extra_flags=""
	hostname=$(hostname) ;;
*)
	echo "$(uname): unsupported OS" >&2
	exit 1 ;;
esac

if [ $# -eq 0 ]
then
dump_name=$hostname
else
dump_name=$1; shift
dump_list="$@"
fi

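# Copy each file system into its own directory in the backup tree.
# The trailing slash on the source copies the contents of $dir rather than
# creating a second $dir inside the destination. --no-whole-file plus --inplace
# make rsync update changed files in place, so unchanged blocks can stay
# shared with earlier snapshots.
for dir in $dump_list
do
	case $dir in
	/tmp*) echo Skipping $dir ;;
	*) $ECHO rsync --verbose --archive --hard-links --delete --one-file-system \
		--no-whole-file --inplace $extra_flags --exclude /.zfs \
		"${dir%/}/" "$BACKUP_DEST/$dump_name$dir" ;;
	esac
done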

SNAPSHOT_COMMAND="/usr/sbin/zfs snapshot -r $BACKUP_FS/$dump_name@$(date +%F)"
if [ "$BACKUP_HOST" = "$hostname" ]
then
$ECHO $SNAPSHOT_COMMAND
else
$ECHO su $BACKUP_USER -c "ssh $BACKUP_HOST 'pfexec $SNAPSHOT_COMMAND'"
fi
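The script has no restore tool, but ZFS already exposes every snapshot read-only under the hidden .zfs/snapshot directory at the top of each file system. Here's a sketch of a simple file-recovery helper, assuming a host's backup is a single dataset mounted at $BACKUP_DEST/hostname (if the recursive snapshot covers child datasets, look under the .zfs directory of the child that holds the file). The function name list_versions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every snapshotted copy of RELPATH found under
# BACKUP_DIR/.zfs/snapshot. With date-named snapshots (YYYY-MM-DD), the
# glob's sorted order is also oldest-first.
list_versions() {
	backup_dir=$1
	relpath=$2
	for snap in "$backup_dir"/.zfs/snapshot/*
	do
		if [ -e "$snap/$relpath" ]
		then
			echo "$snap/$relpath"
		fi
	done
}
```

Usage: "list_versions /export/backups/myhost etc/hosts", then copy back whichever version you want.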

Monday, January 25, 2010

Some better practices for ZFS on FreeBSD

Rather than working on the Clojure web framework, I've been dealing with broken hardware, including some system reinstalls. So let's talk about that.

ZFS has been available in FreeBSD for a while now, and as of the recently released 8.0 it is considered production quality. There are a number of write-ups on the web about how to set up various configurations of FreeBSD on ZFS: with a UFS boot, on a GPT Mac drive, with no UFS at all, etc. Most seem to have one thing in common - they just duplicate a standard FreeBSD UFS file system configuration, without seeming to consider how ZFS has changed the game. That's not really the authors' fault; I did much the same when I set up my first ZFS system a few years ago. But those few years of experience - and seeing how the OpenSolaris folks set things up - indicate that there are better ways. I want to talk about that in hopes of getting others to spend more time thinking about this.

First, a note on terminology; those of you familiar with FreeBSD on x86 can skip this. Unix has had "partitions" since before there was a DOS, and FreeBSD continues to call them that. What the DOS folks - and most everyone else - call partitions, FreeBSD calls "slices". A FreeBSD installation typically has one slice for FreeBSD on the disk, with multiple partitions - one per file system - in that slice. Slices are numbered starting at 1. Partitions are lettered, usually a-h. A typical FreeBSD partition name is ad0s1a, meaning drive number 0 on the ATA controller, slice 1, partition a.
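To make the naming scheme concrete, here's a small sketch that splits a name of that form into its parts. The function name parse_devname is hypothetical, and it only handles the simple driver/unit/slice/partition pattern described above, not GPT partition names such as ad0p2.

```shell
#!/bin/sh
# Hypothetical helper: decode an MBR-style FreeBSD device name like ad0s1a.
# Prints nothing if the name doesn't match <driver><unit>s<slice><partition>.
parse_devname() {
	echo "$1" | sed -n 's/^\([a-z]*\)\([0-9][0-9]*\)s\([0-9][0-9]*\)\([a-h]\)$/driver=\1 unit=\2 slice=\3 partition=\4/p'
}
```

For example, "parse_devname ad0s1a" prints "driver=ad unit=0 slice=1 partition=a".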

Now a quick overview of how to set up FreeBSD with a ZFS root file system. Details are easy to find with Google if you need them:

  1. Partition the drive, providing a swap and a data partition. If you're using GPT for partitioning, you'll need a boot partition as well. Note that on OpenSolaris, giving ZFS a partition rather than a whole disk is a bad idea, as ZFS then won't enable write caching on the drive, because OpenSolaris has file systems that can't handle drive write caching. On FreeBSD, all the file systems handle drive write caching properly, so this isn't a problem.

  2. Create a zfs pool on that partition.

  3. Install the system onto a file system in that pool. Most people seem to like copying the files from a running system. I used the method documented in /usr/src/UPDATING to install to a fresh partition. For that to work cleanly, you'll want NO_FSCHG defined in /etc/make.conf, or -DNO_FSCHG on the command line, as FreeBSD's ZFS doesn't support file flags. You'll also need to make sure that /boot/loader was built with LOADER_ZFS_SUPPORT defined.

  4. Install a boot loader. Just install the appropriate ones for your partitioning scheme.

  5. Configure for ZFS. You may want to set the bootfs property on your pool to the root file system to tell the boot loader where to find /boot/loader. You'll want to set zfs_load="YES" and vfs.root.mountfrom="zfs:data/root/fs" in /boot/loader.conf to tell the loader where the root file system is. Set zfs_enable="YES" in /etc/rc.conf so the system knows to turn on ZFS. Finally, to prevent ZFS from trying to mount your root file system a second time, set the mountpoint property to "legacy" on that file system.

  6. Last step: export and import the resulting pool, then copy /boot/zfs/zpool.cache to the same location on your new system.
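Here's a rough, dry-run-friendly sketch of the ZFS-specific steps (2, 3, 5 and 6). It assumes the partition and boot code from steps 1 and 4 already exist, that world and kernel have already been built in /usr/src with LOADER_ZFS_SUPPORT set, and that /mnt is free. The function names zfs_root_setup and add_line are hypothetical; run it with DEBUG=echo to print the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a config file, or just show it when debugging.
add_line() {
	if [ -n "$DEBUG" ]
	then
		echo "append to $1: $2"
	else
		printf '%s\n' "$2" >> "$1"
	fi
}

# Hypothetical helper: zfs_root_setup DEVICE POOL ROOTFS
zfs_root_setup() {
	dev=$1 pool=$2 rootfs=$3
	# Step 2: create the pool on the data partition
	$DEBUG zpool create "$pool" "$dev"
	# Step 3: create the root file system (mounted at /$rootfs for now)
	# and install the already-built system into it, per /usr/src/UPDATING
	$DEBUG zfs create -p "$rootfs"
	for target in installkernel installworld distribution
	do
		$DEBUG make -C /usr/src -DNO_FSCHG DESTDIR="/$rootfs" $target
	done
	# Step 5: tell the loader and the kernel where root lives, then make the
	# root fs legacy so ZFS doesn't try to mount it a second time
	$DEBUG zpool set bootfs="$rootfs" "$pool"
	add_line "/$rootfs/boot/loader.conf" 'zfs_load="YES"'
	add_line "/$rootfs/boot/loader.conf" "vfs.root.mountfrom=\"zfs:$rootfs\""
	add_line "/$rootfs/etc/rc.conf" 'zfs_enable="YES"'
	$DEBUG zfs set mountpoint=legacy "$rootfs"
	# Step 6: re-import the pool and copy its cache file into the new system
	$DEBUG zpool export "$pool"
	$DEBUG zpool import "$pool"
	$DEBUG mount -t zfs "$rootfs" /mnt
	$DEBUG cp /boot/zfs/zpool.cache /mnt/boot/zfs/zpool.cache
	$DEBUG umount /mnt
}
```

The config lines are written before the mountpoint goes legacy, since after that the new root is no longer mounted at /$rootfs.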


Again, this is a quick overview. Google for details if you need them.

Now to the point - how to set up your filesystems under ZFS, considering how ZFS has changed the game.

For instance, ZFS is much more robust than UFS, so there's little point in creating partitions to protect things from corruption - though UFS has been solid enough for that for a while. Likewise, ZFS file systems aren't locked to a fixed allocation of blocks, so there's not much point in creating file systems to allocate disk space to different purposes - though you can put limits on a specific file system if you want to. Those are the classic reasons to set up new file systems.

With ZFS, file systems are cheap and easy to create. The reason for creating one is that you want different properties on it than on its parent, or that you want a group of file systems to inherit some set of properties. You might also want to use the spiffy ZFS snapshot/clone tools on some file system without using them on them all.

So, the first thing to notice is that it's easy to boot from a different root file system. Booting from a UFS file system needs the root on partition a - unless that's been changed and I didn't notice - meaning you need to create a new slice for it, and possibly put it on a new disk. With ZFS, you can boot from a new root file system by changing two settings - bootfs on the boot pool, and vfs.root.mountfrom in /boot/loader.conf (I'm sure one of those will vanish at some point) - and rebooting. So you could, in theory, have a couple of different versions of FreeBSD installed on the same pool, and boot between them.

In fact, that looks like it's worth automating, as it's trivial to do, and will cut down the number of places you can typo the pool name. So here's zfsbootfrom.sh:
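#!/bin/sh
# usage: zfsbootfrom.sh pool/path/to/rootfs
FS=$1
[ -n "$FS" ] || { echo "usage: $0 pool/path/to/rootfs" >&2; exit 1; }
POOL=$(echo $FS | sed 's;/.*;;')
# bootfs is a pool property, so it's set with zpool, not zfs
$DEBUG zpool set bootfs=$FS $POOL
# point the kernel at the new root; FreeBSD's sed -i keeps the old file as loader.conf.old
$DEBUG sed -i .old "/vfs.root.mountfrom/s;=.*;=\"zfs:$FS\";" /boot/loader.conf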
This is simple enough that I didn't add options; to debug it, just run it as "DEBUG=echo zfsbootfrom.sh newrootfs". Better yet, grab the tool cryx mentioned in the comments from http://anonsvn.h3q.com/projects/freebsd-patches/browser/manageBE/manageBE.

You can do this with your root file system as the top file system in the pool, but that's going to get confusing eventually. Best to sort things out into their own group. So my suggestion is that the root file system be something like "data/root/8.0". An 8-STABLE root might be "data/root/8-STABLE".

You can even avoid having to do a fresh install for each new version - unless you want to - by leveraging the power of ZFS. To wit:
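zfs snapshot data/root/8.0@NOW
zfs clone data/root/8.0@NOW data/root/8-STABLE
# a clone inherits its mountpoint from its parent, not its origin; make it legacy like the root
zfs set mountpoint=legacy data/root/8-STABLE
mount -t zfs data/root/8-STABLE /altroot
cd /altroot/usr/src
make update
# proceed with build and install to /altroot, rather than modifying your running system.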
You can now boot that and try a new version of FreeBSD without having to change your old file system. If it doesn't work, just point the boot settings back at the old root file system with zfsbootfrom.sh, delete the new file system, and try again later. Or ignore it until you feel like updating it and trying again.

So, what things would we want to share between two different versions of FreeBSD this way? Not /usr - the userland is pretty tightly tied to the kernel in FreeBSD. /usr/local? Certainly - packages and anything you build will work on updates to the current release, and on the next one (or more) with the appropriate compatibility options. For that matter, /usr/ports probably wants to be its own file system, since the ports project explicitly supports multiple versions. /etc? Maybe. The system knobs can probably be shared, but having some of them apply to one system and not the other will be confusing. On the other hand, the ports/package system writes to /etc/make.conf as well. If you're not running a mirrored root, you might consider making /etc a file system just to set "copies=2" to improve reliability. /home? It should certainly be shared. /var? Most likely, as ports put information in there as well as the normal spool things. In fact, enough different things go on there that you may want it to be a file system so you can create child file systems with the appropriate properties. If you're exporting file systems, you can create a file system to hold them, and set the sharenfs property on it to the most common value so you don't have to set it on its children. The file systems underneath it will then all be exported automatically.

With that in mind, you might do step 2 above something like so:
zpool create data ad0s1
zfs create -p data/root/8.0
zfs create -o mountpoint=/home data/home
zfs create -o mountpoint=/usr/ports -o compression=on data/ports
zfs create -o compression=off data/ports/distfiles
zfs create -o mountpoint=/export -o sharenfs="-network 192.168.195.0 -mask 255.255.255.0" data/export
zfs create -o mountpoint=/var data/var
zfs create -o copies=2 data/root/8.0/etc
You can also set the properties exec (allow - or not - execution of files on that file system) and setuid (honor - or not - the setuid bit) as appropriate for each of these file systems. /var, in particular, bears a closer look. You might consider turning off setuid and exec on it and most of its descendants. /var/log might be compressed unless you send all your logs elsewhere. /var/db/pkg is a candidate for compression. Some database packages install things in /var/db as well, in which case you might want to check the ZFS Best Practices wiki for that database.
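A sketch of those /var suggestions, assuming data/var was created with mountpoint=/var as above and that this is done on a fresh install (a new file system mounted over a populated directory hides what was there). The function name tune_var is hypothetical; if some descendant turns out to need executables, "zfs set exec=on" on just that file system overrides the inherited value.

```shell
#!/bin/sh
# Hypothetical helper: tighten and compress the /var tree in POOL.
tune_var() {
	pool=$1
	# no setuid programs or executables under /var; descendants inherit this
	$DEBUG zfs set setuid=off "$pool/var"
	$DEBUG zfs set exec=off "$pool/var"
	# logs and the package database compress well
	$DEBUG zfs create -o compression=on "$pool/var/log"
	$DEBUG zfs create "$pool/var/db"
	$DEBUG zfs create -o compression=on "$pool/var/db/pkg"
}
```

Because data/var is mounted on /var, the children land on /var/log, /var/db and /var/db/pkg without any mountpoint settings of their own.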

One final note. I mentioned a mirrored root pool. I run most of my systems this way, and recommend it. It meant that, even though the recent hardware failures did cost me time, it was time spent repairing them, not time the services in question were unavailable. Setting up the mirror is simple. You'll need to install the boot loaders on both disks - that's step 4 above for the second disk. You also need to use a mirror vdev on the zpool create command. That changes the command to something like "zpool create data mirror ad0s1 ad1s1". The rest of the zfs commands will be the same; the only thing that knows that the pool is mirrored is zpool.
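To illustrate that only the zpool command changes, here's a sketch that builds either a single-device pool or a mirror of all the devices it's given, then runs the same zfs commands either way. The function name make_pool is hypothetical, and it leaves out installing boot code on the second disk, which depends on your partitioning scheme.

```shell
#!/bin/sh
# Hypothetical helper: create POOL on one device, or as a mirror of all the
# devices given when there's more than one; the zfs commands are identical.
make_pool() {
	pool=$1
	shift
	if [ $# -ge 2 ]
	then
		$DEBUG zpool create "$pool" mirror "$@"
	else
		$DEBUG zpool create "$pool" "$1"
	fi
	$DEBUG zfs create -p "$pool/root/8.0"
	$DEBUG zfs create -o mountpoint=/home "$pool/home"
}
```

Usage: "DEBUG=echo make_pool data ad0s1 ad1s1" prints "zpool create data mirror ad0s1 ad1s1" followed by the same zfs commands you'd get with a single disk.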

Tuesday, January 12, 2010

There isn't an app for that

Or: Why the App Store isn't up to the Android Market.

A number of people have pointed out that using an Android phone compares to using an iPhone much like using a Windows box compares to using a Mac. I've used both fairly extensively: I bought an iPhone the first week they were available, and eventually upgraded to the 3G when it came out. I slowly grew more disappointed with the iPhone, and sold it to buy a G1 about the time that iPhone OS 3.0 launched. I installed 3.0 as a favor for the buyer, but never played with it. I've since upgraded to a Nexus One. I agree with that assessment - the iPhone is a better overall experience.

But that's only half the story. Apple achieves that experience by keeping an incredibly tight rein on the available applications. They ensure that the applications actually follow the UI guidelines, which exhibit Apple's typical high quality. The first problem is that they extend this to excluding things that might confuse the user by looking too much like Apple's own applications. Which means that you can only get the Apple experience via the App Store. If you have more advanced needs or higher standards, you lose. The net result was that I wasn't happy with the iPhone unless it was jailbroken, and hence sold it to buy a platform without those problems.

Examples abound. The best known is probably Google Voice, which Apple supposedly rejected because it looked too much like the built-in phone application. Before iPhone 3.0, users wanted MMS, push email from Exchange servers, and the ability to record video. Some were available if you jailbroke your phone - but not from the App Store. They all showed up in the 3.0 OS. They were all available from the Market - if not shipped on your phone - before the iPhone had them.

Multitasking is still an "Apple apps only" thing - unless you jailbreak your iPhone. I'm not sure if other apps can now use the headset buttons - they couldn't prior to 3.0, and it was really annoying. Android apps have been doing both for quite a while now.

As a potential application developer, this cuts even deeper. For the iPhone, you pretty much have to develop in Objective-C. Nothing wrong with that per se, but what if it doesn't fit your project? Google publishes a collection of scripting engines which don't yet have full access to the Android UI APIs, but can be used to write background tasks and plugins for some applications. Further, a number of interesting modern languages built on top of the JVM - Clojure and Scala, for instance - have been ported to the Android Java platform. If worst comes to worst, it's possible to write Android apps in C. This makes for a more diverse development ecosystem, and happier developers.

The other issue that results from Apple's tight control actually makes the iPhone experience - not just the applications - worse than Android, at least in this one small area. Apple has failed to provide a standardized way to move data to and from your phone. Net result - all the apps do it differently. Many have built-in web servers that you connect to from your desktop. Others use a web server elsewhere to exchange data. Some send email to get data out, with no way to update it. Some even have proprietary servers for their data.

Android, on the other hand, uses an SD card for data transport. Pretty much every app that uses real data can read it from and write it to the SD card, and Android will let you mount the card over the standard USB cable to exchange data. Apps in the Market provide Bluetooth file exchange if that's too cumbersome. While this is generally not as nice as any single iPhone application's solution, it's better than the conglomeration you get from an assortment of applications.

Bottom line - Apple's tight control means the overall iPhone experience is better than Android's, much like the Mac is better than Windows. But that same tight control means that, for any given task, the best application available for Android may well be better than anything available from the App Store, just as the huge number of applications available for Windows improves the odds that the best application for a given job runs on Windows. Except for apps that need to share data with a desktop, where Apple's tight control has left you hosed.