Peak ZFS

Monday, October 30th, 2017

I went to the open zfs developers conference last week and I learned a lot.

I’ve gone for a few years now, and I thought this year was the most interesting. Maybe it was only 2 years, I forget. Anyway, all the talks were technical and were about useful features and things that could make zfs better. I didn’t notice how quickly the whole day went by. It was a lot to take in, but it was all zfs all the time, and if you’re into that sorta thing, it was a lot of good.

First of all, I have to mention one very noteworthy moment. Every speaker got well-deserved applause after their talk, but there was one guy, lundman, who got applause just for saying what he did, and what he said was this: “I ported zfs to windows.” And he showed it and it worked. Truly a moment to behold. It was amazing.

But after listening to a few of the talks I started to notice something, and that something is that I think we’re approaching Peak ZFS.

We, as in not me, but the we of all the actual zfs developers, of which I am just a wannabe, so my opinion has no merit. But the web being what it is, and thinking of my favorite line from the movie Dark Star, “A concept is valid regardless of its origin,” you can choose to disagree with me, but you can’t tell me what I’m saying is wrong just because I’m not a zfs developer.

ZFS has had a good run so far. It is 10 to 15 years old, depending on how you count, it has come a long way, and it does a great many amazing things. But like all software, if you keep adding to it, you’re eventually going to end up with exception upon exception upon exception that wasn’t in the original design and that has to be taken into consideration when adding new features. And any new feature you add will be an encumbrance to any future features added, so you should be careful with what you add and how you add it, so as to minimize the future pain everybody’s going to have to suffer.

But that’s not what’s going on.

There are currently 3 prefetchers in zfs. Nothing wrong with prefetchers, nothing wrong with three of them either, but it is noteworthy that there are three and not one.

There are currently 2 log writing systems, one for the zil and one for the spacemaps. Matt suggested adding another one to make dedup faster (a welcome feature if there is one) and Sarah from Delphix suggested another log to optimize clone deletes. Possibly a less popular use case, but valid nonetheless.

Yet this will yield 4 separate and different log implementations. Is the zil anything like dedup? No, but a log is a log, and maybe it would be in somebody’s interest to save the future from the present and consolidate the logging concept into a subsystem that can be shared by all of the things that need to log things to disk for optimization purposes. All 4 of the logs (with the possible exception of spacemaps) are optimizations, and now there are 3 or 4 of them.
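
To make the consolidation idea a bit more concrete, here is a rough sketch in Python of what a shared log could look like. None of this is ZFS code: `SharedLog`, `register`, `append` and `replay` are made-up names, the consumer names are just labels, and a plain list stands in for the disk. The point is only that the append and replay machinery is written once, and each consumer supplies nothing but the code that understands its own records.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: one append-only log shared by several consumers.
# None of these names come from ZFS; a Python list stands in for the disk.

Record = Tuple[str, bytes]  # (consumer name, opaque payload)


class SharedLog:
    def __init__(self) -> None:
        self._records: List[Record] = []
        self._handlers: Dict[str, Callable[[bytes], None]] = {}

    def register(self, consumer: str, handler: Callable[[bytes], None]) -> None:
        """Each consumer supplies only the code that interprets its own records."""
        self._handlers[consumer] = handler

    def append(self, consumer: str, payload: bytes) -> int:
        """Append a record and return its sequence number."""
        if consumer not in self._handlers:
            raise KeyError(f"unregistered consumer: {consumer}")
        self._records.append((consumer, payload))
        return len(self._records) - 1

    def replay(self) -> None:
        """Replay every record, in order, through its owner's handler."""
        for consumer, payload in self._records:
            self._handlers[consumer](payload)


# Usage: four consumers, one log implementation.
replayed = defaultdict(list)
log = SharedLog()
for name in ("zil", "spacemap", "dedup", "clone_delete"):
    # name=name binds the current loop value into each handler
    log.register(name, lambda payload, name=name: replayed[name].append(payload))

log.append("zil", b"write inode 7")
log.append("dedup", b"refcount +1")
log.append("zil", b"write inode 9")
log.replay()
```

Whether the real logs are similar enough to share this much is exactly the question, but the shape of the payoff is that a fifth log would cost one new handler instead of one new implementation.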

But the icing on the cake was this one:

George was suggesting a feature to compensate for the performance hit caused by 512 byte sector emulation on 4k sector drives. If you know anything about the world of storage, you know this is a noticeable performance problem, and not just to zfs.
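
For anyone who hasn’t run into it, the hit comes from the drive having to read, patch and rewrite a whole 4k physical sector whenever a write covers only part of it. Here is a small Python sketch of that arithmetic. It is only my illustration of the general problem, not what George proposed, and `rmw_sectors` is a made-up helper name.

```python
LOGICAL = 512     # sector size the drive reports to the host
PHYSICAL = 4096   # sector size the drive actually writes


def rmw_sectors(offset: int, length: int) -> int:
    """Count the physical sectors a write only partially covers.

    Each of those forces a read-modify-write inside the drive.
    offset and length are in bytes and must be multiples of LOGICAL.
    """
    if offset % LOGICAL or length % LOGICAL:
        raise ValueError("offset and length must be multiples of 512")
    if length <= 0:
        return 0
    end = offset + length
    count = 0
    # Walk every physical sector the write touches.
    for sector in range(offset // PHYSICAL, (end - 1) // PHYSICAL + 1):
        start = sector * PHYSICAL
        # Partially covered if the write starts after the sector begins
        # or stops before the sector ends.
        if offset > start or end < start + PHYSICAL:
            count += 1
    return count


aligned = rmw_sectors(0, 4096)       # exactly one physical sector
small = rmw_sectors(0, 512)          # one logical sector
misaligned = rmw_sectors(512, 4096)  # 4k write shifted by one logical sector
```

An aligned 4k write touches no partial sectors, a lone 512 byte write touches one, and a 4k write that starts 512 bytes off alignment straddles two, so the drive does two read-modify-write cycles for what looks like a single write.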

But in my opinion, it’s also a problem that’s going to go away by itself, and it got me wondering if it’s really worth adding another bit of code that will probably be in zfs forever, to compensate for a temporary problem. And then I realized there were a few other features that fall into this category.

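Gang blocks exist because zfs doesn’t cope well when it’s running low on space: when no single free region is big enough for a block, the block gets split into smaller pieces that are allocated separately.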

Seems to me if you’re running a data storage system large enough to justify a filesystem that can store a zettabyte, it’s hard to imagine you running out of disk, and if you are, you’re probably not doing your job very well. But from now on and forevermore, every zfs developer has to work around gang blocks because it seemed like a worthy goal at some point to sub-allocate blocks of storage to deal with low-availability situations.

Somebody pointed out that the 512 emulation problem may go away, but someday it will be replaced by a 4k->16k emulation problem. Fair enough, but I say again, if you’re running an important enough system that you require zfs, you should be able to make sure your pool is filled with disks of the same type. And if you need to move to 16k sector disks, then you make a new pool of them and figure out a way to migrate your data, not make every future zfs installation suffer the cruft of dealing with this one edge case that most of the time, nobody will experience.

Hacking more and more exceptions into zfs isn’t going to help anybody in the long term, but that’s what’s happening, and there’s no Sun in control to keep it from getting out of hand, which, it seems to me, it already has.

I love zfs and probably always will, and it’s hard to imagine something cooler coming along anytime soon. But I’ve been doing this software stuff for 30+ years and I’ve seen it over and over and over. It’s inevitable, and there’s nothing you can do to stop it, but you can slow it down by taking a step back and thinking about what’s really worthwhile and what can be lived without, to make it last as long as possible.

Now the real answer is to “write one to throw away,” which means starting over with the knowledge of all the lessons learned, leaving out things you no longer need. Sendmail being able to send mail via carrier pigeon comes to mind. (By the way, last time I looked, sendmail’s main() function was 3000 lines long.)

But that can’t happen. It’s called btrfs and it didn’t fly, or is slowly heading towards a landing or something. You can’t replace zfs. If anything, I predict somebody will come along and fork zfs, remove all the stuff they don’t need, and maybe some other people will pick up on it and it will become the new popular zfs. But it seems unlikely a new upstart will come out of nowhere and win.

If you see how open source projects come together, it’s easy to see why zfs was awesome when managed by Sun and Eclipse was awesome when managed by IBM.

No disrespect to any of the current zfs developers, they’re probably the most brilliant collection of developers there is, but being open source there’s nobody running the ship with an iron fist like a company could, and in my opinion, it’s starting to show.