Periodic scrubs revisited

There were a number of questions about how you control the time at which a scrub runs, as some considered it a bad idea to start a scrub during business hours. I think this is a very good question. Well, you can’t do this via the scrub interval, at least not directly (more about this at the end). However, I don’t really think that this is something bad. I would like to explain why.

  1. Many customers don’t have something called “business hours”. Something is always happening on the systems, even if it’s not the regular load: backups, reports, analysis jobs, something like that. Many customers have their production load on the systems 24/7, because Solaris systems are often the most mission-critical ones. So when do you run scrubs on them outside business hours when there is no time outside business hours?
  2. The ZFS scrub is designed to stay in the background: it runs at a low priority.
  3. I am pretty opinionated that you should design your disk and processor layout in a way that it can cope with all events that have to be expected. Backups are events to be expected, for example, so you plan the necessary CPU and disk performance. You have to reasonably expect that disks fail during the lifetime of a system, solid state disks perhaps more seldom than rotating rust disks. But they fail. You have to expect that a disk will fail at the worst possible time, and you have to expect that your system will start resilvering to a hot spare at the worst possible time. You can’t postpone it to a convenient time, because without being mirrored, for example, your data is at risk and needs to get covered again soon. So if the performance impact of a scrub is too much for you, the impact of a resilvering is too much as well. However, if you plan accordingly so that you could do a resilvering at any time, you could do a scrub at any time as well. So it doesn’t really matter whether the scrubs are scheduled or periodic.
  4. Given that we talk more and more about storage on SSD or hybrid storage pools with flash to accelerate normal accesses, and that we usually talk about an IOPS budget vastly higher than the needed budget, I think that in these cases it’s not likely to be a problem either.
  5. If you really have a requirement to start the scrub at an exact moment, you can still go the cron route, either for all pools or just for one.
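
As a minimal sketch of the cron route from the last point: the small helper below only prints a crontab line, it does not install anything. The pool name tank, the Sunday 02:00 schedule and the helper name scrub_cron_entry are assumptions made up for this illustration.

```shell
#!/bin/sh
# Sketch: build a crontab entry that scrubs one pool every Sunday at 02:00.
# "tank" is a hypothetical pool name; the schedule is just an example.
scrub_cron_entry() {
    # crontab fields: minute hour day-of-month month day-of-week command
    printf '0 2 * * 0 /usr/sbin/zpool scrub %s\n' "$1"
}

# Print the entry for the hypothetical pool.
scrub_cron_entry tank
```

The printed line would go into root's crontab (for example via crontab -e). For several pools you would add one such line per pool.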

That said, there is a way to get your scrubs running at the time you want. According to the man page, scrubinterval is defined like this:

When scrubinterval is set to a time interval, a new scrub will be initiated after the time specified by this property had passed since the start of the last scrub, which had either completed successfully or been canceled explicitly via zpool scrub -s.

So start a scrub manually at a suitable time, and all follow-up scrubs will fall into convenient times based on the scrubinterval setting.
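
To make the arithmetic behind this a bit more tangible, here is a rough sketch. It assumes, simplifying the man page wording, that a follow-up scrub starts exactly one interval after the start of the previous one, and it ignores things like daylight saving time. The start time (Sunday 02:00), the two example intervals and the helper name followup are made up for this illustration.

```shell
#!/bin/sh
# Sketch of the rule: next start = start of last scrub + scrubinterval.
# Times are seconds since Sunday 00:00; all values are example assumptions.
DAY=86400
WEEK=$((7 * DAY))

# Print "weekday hour" (weekday 0 = Sunday) of the n-th follow-up scrub.
# $1 = first start (seconds into the week), $2 = interval in seconds, $3 = n
followup() {
    t=$(( ($1 + $3 * $2) % WEEK ))
    echo "$(( t / DAY )) $(( (t % DAY) / 3600 ))"
}

start=$((2 * 3600))                  # manual scrub started Sunday 02:00

followup "$start" $((7 * DAY)) 1     # weekly interval:  prints "0 2"
followup "$start" $((30 * DAY)) 1    # 30-day interval:  prints "2 2"
followup "$start" $((30 * DAY)) 2    # 30-day interval:  prints "4 2"
```

With an interval of whole weeks, both the weekday and the time of day stay put. With an interval of whole days that is not a multiple of seven, the time of day stays at 02:00 but the weekday wanders, so pick the interval with that in mind if the weekday matters to you.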