
Sunday, July 23, 2017

Some ZFS speed tests

Introduction

I just got my first new desktop in long enough that it's the first one with SSDs. Those were large enough to leave a lot of space for a level 2 adaptive replacement cache (L2ARC) for the spinning data drives. It also had an M2 slot, so I could add a small device as a separate intent log (SLOG). I hadn't used either of those before. Finally, I'm moving to a new operating system (Ubuntu from FreeBSD), so it seemed like a good time to do some disk benchmarks.


DISCLAIMER: This is not a rigorous or scientific test. The primary goal for my desktop storage subsystem is inexpensive reliability; if I can improve performance for little cost, cool. These tests are mostly to verify that I hadn't seriously decreased performance. If you are serious about disk performance, you should run your own tests, with a proper simulation of your workload.

Components

I've been running mirrored drives on my desktop for over a decade now, with the OS and my data on separate drives for about half of that. It's incredibly reliable - drive failures are at most just hiccups. Upgrades are trivial, and trivial to back out with boot environment support. I saw no reason to change this configuration, other than upgrading the components.

The system drive is a pair of Intel 240GB SSDs. They are partitioned: the root file system got 75G, there's a small EFI boot partition (mirrored with the Linux md driver), a swap partition on each drive, and the rest is L2ARC for the spinning drives that home is on. Those spinning drives are 2TB WD drives, and hold various non-system data files.

L2ARC is just a cache of recently used blocks from the cached pool. If it's on a device that's a lot faster than the disk, you should get better disk performance. Since it consumes RAM that would otherwise be used as a buffer cache, it's possible to have so much L2ARC that performance will suffer. It uses about 400 bytes of RAM per block of cache, and I use 128K blocks on my spinning drives, so 320G of L2ARC will use about a gigabyte of RAM, which shouldn't be a problem.
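A quick back-of-the-envelope check of that arithmetic, as a shell sketch using the round numbers above:

    # 320G of L2ARC divided into 128K blocks, at roughly 400 bytes of RAM
    # per block header (both sizes below expressed in KB):
    blocks=$(( 320 * 1024 * 1024 / 128 ))
    echo $(( blocks * 400 / 1024 / 1024 ))MB   # prints 1000MB - about a gigabyte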

When a process does a synchronous write to a file system, it expects that the write will be stored to disk before the system call returns. ZFS file systems do this with a log of pending writes. The write system call returns when the intent log is written, but before the actual write to the file system happens. Should the system be disrupted before that second write happens, the intent log will be used to complete the writes when the file system is mounted. Otherwise, it gets flushed when the writes finish without ever having been read. Putting your intent log on a separate device (aka a SLOG) that's faster than the devices in the pool increases the speed of the synchronous writes to the file system.
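As a concrete illustration, this is the kind of workload a SLOG helps: each block here has to reach stable storage before dd can move on, so the writes complete at intent-log speed rather than spinning-disk speed. The pool and file names are hypothetical:

    # oflag=sync forces a synchronous write for every output block;
    # with a SLOG, "stable storage" means the fast log device.
    dd if=/dev/zero of=/tank/synctest bs=128k count=1000 oflag=sync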

I bought a cheap 30GB M2 SATA card and put the intent logs for both pools on it, in two separate partitions, to see if I could get a cheap performance bump. Since the card uses the SATA interface, it isn't any faster than the system SSDs. On the other hand, a SLOG is never read from in normal use and is only written to for sync writes, so we'll see. Worst case, I can put swap on this device just to get it off the system drives.
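Attaching those partitions as intent logs is a one-liner per pool, something like this (pool and partition names are stand-ins, not my actual layout):

    # Give each pool its own slice of the M2 card as a separate intent log.
    zpool add rpool log /dev/sdc1   # system pool
    zpool add tank log /dev/sdc2    # spinning data pool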

Measurements

Simple checks

Just to make sure the test software (bonnie++) was returning sane results, I started with some tests where I had a rough idea of how they would behave.
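For reference, the runs below were all done with an invocation along these lines; the directory and user are stand-ins, and the size matches the 32GB tests mentioned later:

    # -d is the directory to test in, -s the test file size, -n 0 skips
    # the small-file creation tests, and -u is the user to run as
    # (bonnie++ won't run as root without it).
    bonnie++ -d /pool/bench -s 32g -n 0 -u someuser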

Effect of mirroring

Let's see how much difference mirroring the SSD makes:
SATA SSD             Sequential Output   Sequential Input
Single  Throughput   1470142K/sec        2663890K/sec
        Latency      2346us              634ms
Mirror  Throughput   1479113K/sec        2845500K/sec
        Latency      7686us              11106us
This is what I expected: latency on output to the mirror is much worse, because the data has to be written to both drives. Throughput is about the same, because the system can go on to the next write once the first one is done and catch up later when the drives aren't busy. Input latency to the mirror is much better than for the single drive, as we can read from both drives at once. Input throughput is only 10% faster, though. Since this is the system disk, writes are relatively rare, so this is a clear win.

Speed of the SSD

Now let's compare the SSD and spinning media in my standard mirrored configuration to see how fast the SSD really is:
Mirrored pools         Sequential Output   Sequential Input
Spinning  Throughput   153481K/sec         228104K/sec
          Latency      3016ms              633ms
SSD       Throughput   1479113K/sec        2845500K/sec
          Latency      7686us              11106us
And yeah, the SSD is FAST. Roughly an order of magnitude better throughput, and much shorter latency. Note that the latency units are milliseconds for the spinning disks but microseconds for the SSD, so output latency on the SSD is hundreds of times shorter. Seek times are a killer.

The two SSDs

One final simple test - how do the two different SSDs compare? This is a single SATA SSD vs. the inexpensive M2 SSD using the SATA interface.
SSDs               Sequential Output   Sequential Input
SATA  Throughput   1470142K/sec        2663890K/sec
      Latency      2346us              634ms
M2    Throughput   1503676K/sec        2925337K/sec
      Latency      3148us              21203us
Output to the M2 SSD is about the same, but with worse latency. Input is faster, especially on latency. This is probably because of the different technologies used by the drives - SLC vs. MLC.

Since a SLOG is almost exclusively written to, putting the SLOG for the system drives on the M2 SSD definitely needs more testing. Using the M2 SSD as an L2ARC for the system drives would help on reads, but nowhere near as much as an L2ARC helps spinning media, and it would use extra RAM.

System disks

Let's start by looking at the system disks. There's no point in putting either the intent log or the L2ARC on the same or slower media, though there might be some point in putting them on different media of the same speed. The two things that can go on the M2 drive give four cases:
System Pool                      Sequential Output   Sequential Input
Mirror              Throughput   1479113K/sec        2845500K/sec
                    Latency      7686us              11106us
With L2ARC          Throughput   1546149K/sec        2951646K/sec
                    Latency      3291us              10190us
With SLOG           Throughput   1571529K/sec        2855131K/sec
                    Latency      3601us              10643us
With L2ARC & SLOG   Throughput   1397074K/sec        2821671K/sec
                    Latency      2552us              13188us
There isn't a lot of difference, which isn't surprising considering the similarity of the two devices. Throughput differs by less than 10% in every case except output with both the SLOG and L2ARC on the M2 drive, which is the worst case. Output latency is worst without a second device, and best with both a SLOG and an L2ARC - but that configuration is the worst case for everything else, and output to the system files isn't a priority.

Using the M2 drive as an L2ARC for the system pool seems like the best use, but having a SLOG is nearly as good. Having both isn't a good idea.
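For what it's worth, cycling through those four cases doesn't require rebuilding the pool; cache and log devices can be attached and detached on a live pool, roughly like this (names hypothetical again):

    zpool add rpool cache /dev/sdc1   # test with an L2ARC on the M2 drive
    zpool add rpool log /dev/sdc2     # test with a SLOG as well
    zpool remove rpool /dev/sdc1      # back to SLOG only
    zpool remove rpool /dev/sdc2      # back to the plain mirror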

Data pool

Now let's look at the many different configurations for the spinning media pools.

SLOG & L2ARC on M2 drive

While I'm unlikely to put the L2ARC on the M2 drive - there's a lot more room on the system drives - let's check the four options using just the M2 drive:
Spinning using M2                Sequential Output   Sequential Input
Mirror              Throughput   153481K/sec         228104K/sec
                    Latency      3016ms              633ms
With L2ARC          Throughput   169588K/sec         218247K/sec
                    Latency      32133us             628ms
With SLOG           Throughput   143583K/sec         224642K/sec
                    Latency      38404us             947ms
With L2ARC & SLOG   Throughput   155064K/sec         241953K/sec
                    Latency      73543us             611ms
This shows quite a bit more variance than we got from the system disks. Adding an L2ARC is generally a win, except that output latency goes to pot with both L2ARC and the SLOG on the same device. Having just an L2ARC seems to be better than having just a SLOG.

With L2ARC on the system drives

Testing the L2ARC on the data drives is problematic. Normally, you want to arrange to run through enough data to flush the cache between tests. A test big enough to flush the L2ARC I planned on using would have taken forever, and while I could have used a partition small enough to flush with the 32GB tests I was using, the point is to test adding an L2ARC to the disk system, so flushing it seems to defeat the purpose. It's not clear that bonnie++ is the appropriate tool for this kind of test, but I couldn't find anything that seemed like it would do better.
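For completeness: the heavyweight way to start a run with a cold L2ARC is to export and re-import the pool, since the L2ARC contents don't survive an export (pool name hypothetical):

    # Exporting drops the ARC and L2ARC contents for this pool;
    # importing it again starts with empty caches.
    zpool export tank
    zpool import tank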
Spinning media                     Sequential Output   Sequential Input
Mirror                Throughput   153481K/sec         228104K/sec
                      Latency      3016ms              633ms
With L2ARC            Throughput   169711K/sec         203280K/sec
                      Latency      2483ms              1127ms
With SLOG             Throughput   153702K/sec         206575K/sec
                      Latency      5475ms              591ms
With L2ARC & SLOG     Throughput   156295K/sec         233175K/sec
                      Latency      54263us             1069ms
With M2 SLOG          Throughput   143583K/sec         224642K/sec
                      Latency      38404us             947ms
With L2ARC & M2 SLOG  Throughput   147467K/sec         213030K/sec
                      Latency      50493us             590ms
Input throughput doesn't change much, but latency gets bad with an L2ARC or L2ARC and SLOG on the system disks. The best input latency happens with the L2ARC and SLOG on different devices. Output is a bit more variable, but the best combination also seems to be splitting the L2ARC and SLOG across different devices.

Conclusions

Unfortunately, bonnie++ isn't really suitable for answering questions about how the various data streams interact with each other. For instance, when I'm doing a compile, does loading data from the L2ARC on the system drive interfere with loading compilers, libraries and so on from them? Even less fortunately, I couldn't find a tool that would really help answer that question. So the test would be to pick a long build and try it on all the configurations.

Maybe someday. For now, I'm satisfied that I haven't badly broken basic throughput somehow. Until then, the spinning disks will have their L2ARC on the system disks (not really any question about that) and their SLOG on the M2 disk. The system disks will also put their SLOG on the M2 drive: it's only a bit slower, doesn't use RAM that might interfere with other things, and is a bit safer, which is the goal. The remainder of the M2 drive will be used for swap, which isn't used all that often on this system. Moving swap off the system SSDs frees their swap partitions for a bit more L2ARC for the spinning disks (a win), and keeps swapping from interfering with system operation.
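Put together, the layout described above amounts to something like this (device names are again stand-ins for my actual partitions):

    # Spinning pool: L2ARC on the system SSDs, SLOG on the M2 drive.
    zpool add tank cache /dev/sda4 /dev/sdb4
    zpool add tank log /dev/sdc1
    # System pool: SLOG on the M2 drive.
    zpool add rpool log /dev/sdc2
    # The rest of the M2 drive becomes swap.
    mkswap /dev/sdc3 && swapon /dev/sdc3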

One final note: it's generally considered a poor investment to buy SSDs for L2ARC - the money is better spent on more RAM, which will be used as cache. But SSD prices were such that the larger drives I bought for the system cost LESS than smaller ones, so I got that L2ARC for free.