Over the years I have given many, many presentations. Whenever I talked with customers afterwards about what they would like to see in ZFS, one feature was always mentioned: removing devices. While it was no problem, for example, to remove a member disk of a mirror, you couldn’t remove a top-level vdev; you weren’t able to remove a mirror out of a stripe of mirrors. With Solaris 11.4 we finally have a feature that allows you to do exactly this. It’s really easy to use, so if I only wanted to show the feature, this would be a rather short entry. However, I would like to shed some light on the mechanism behind it.

Preparing an example

Let’s assume we have three devices and have created a striped pool out of them.

root@batou:/# zpool create testpool c1t2d0 c1t3d0 c1t4d0

We create some files in it:

root@batou:/# cd testpool 
root@batou:/testpool# mkfile 1g test1 test2 test3 test4 test5 test6

Let’s now check the structure of the pool. For this I’m using the zdb -L command. The real output is much longer than what is shown here.

root@batou:/testpool# zdb -L testpool
[...]
        name: 'testpool'
[...]
        hostname: 'batou'
        vdev_children: 3
[...]
            children[0]:
                guid: 1209395020087258815
                id: 0
                type: 'disk'
                path: '/dev/dsk/c1t2d0s0'
                devid: 'id1,sd@SATA_____VBOX_HARDDISK____VB96a218f1-27200143/a'
                phys_path: '/pci@0,0/pci8086,2829@d/disk@2,0:a'
[...]
            children[1]:
                guid: 5622741003370822611
                id: 1
                type: 'disk'
                path: '/dev/dsk/c1t3d0s0'
                devid: 'id1,sd@SATA_____VBOX_HARDDISK____VB9cc00131-8b8a0295/a'
                phys_path: '/pci@0,0/pci8086,2829@d/disk@3,0:a'
[...]
            children[2]:
                guid: 12149574521403767327
                id: 2
                type: 'disk'
                path: '/dev/dsk/c1t4d0s0'
                devid: 'id1,sd@SATA_____VBOX_HARDDISK____VB5b29f40e-f9bc48b9/a'
                phys_path: '/pci@0,0/pci8086,2829@d/disk@4,0:a'
[...]
                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
testpool                  5.85G 41.8G   745     0 80.9M     0     0     0     0
  /dev/dsk/c1t2d0s0       1.95G 13.9G   244     0 26.6M     0     0     0     0
  /dev/dsk/c1t3d0s0       1.95G 13.9G   255     0 27.1M     0     0     0     0
  /dev/dsk/c1t4d0s0       1.95G 13.9G   245     0 27.2M     0     0     0     0
[...]

We have 6 GB worth of data spread over three devices, thus roughly 2 GB per device. Before you ask, I honestly don’t know why zdb -L shows no writes. I still have to check this. Now let’s remove one of the top-level vdevs.

Removing the device

The removal process is really simple to trigger via the remove subcommand to zpool:

root@batou:/# zpool remove testpool c1t4d0 

The device you want to remove then enters the REMOVING state.

        NAME                        STATE      READ WRITE CKSUM
        testpool                    ONLINE        0     0     0
          c1t2d0                    ONLINE        0     0     0
          c1t3d0                    ONLINE        0     0     0
          c1t4d0                    REMOVING      0     0     0

After a while the device will disappear from the pool.

        NAME                      STATE      READ WRITE CKSUM
        testpool                  ONLINE        0     0     0
          c1t2d0                  ONLINE        0     0     0
          c1t3d0                  ONLINE        0     0     0
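
If you trigger the removal from a script, you may want to wait until the device has really left the pool before doing anything else. Here is a minimal sketch of how that could look. It assumes the zpool status layout shown above (device name in the first column, state in the second). The function names vdev_state and wait_for_removal are made up for this example, and the polling interval of 10 seconds is arbitrary.

```shell
#!/bin/sh
# Hypothetical helpers for illustration, not part of Solaris.

# Print the state column of device $1, reading `zpool status` text from stdin.
# Returns non-zero if the device is not listed at all.
vdev_state() {
    while read -r name state rest; do
        if [ "$name" = "$1" ]; then
            printf '%s\n' "$state"
            return 0
        fi
    done
    return 1
}

# Poll until device $2 no longer shows up in the status of pool $1.
# Note: the loop also ends if `zpool status` itself fails.
wait_for_removal() {
    while zpool status "$1" | vdev_state "$2" >/dev/null; do
        sleep 10
    done
}

# Usage (not executed here):
#   zpool remove testpool c1t4d0
#   wait_for_removal testpool c1t4d0
```

The parsing is kept separate from the polling so that you can try vdev_state against saved zpool status output without touching a real pool.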

If the top-level vdev you want to remove is a mirror, you have to use the name of the top-level vdev and not the name of one of its member disks. Let's assume a pool consisting of two mirrors.

          NAME        STATE      READ WRITE CKSUM
          testpool    ONLINE        0     0     0
            mirror-0  ONLINE        0     0     0
              c1t2d0  ONLINE        0     0     0
              c1t3d0  ONLINE        0     0     0
            mirror-1  ONLINE        0     0     0
              c1t4d0  ONLINE        0     0     0
              c1t5d0  ONLINE        0     0     0

To remove the top-level vdev you address it by its name, in this case mirror-0.

 root@sol114s1:~# zpool remove testpool mirror-0

Behind the curtain


So how does Oracle Solaris do this? Well, it is quite simple: it doesn't really reorganize the data. The pool still has three devices after the change. You just don’t see the third one. When you check with zdb -L testpool you will see that the third device changed to

            children[2]:
                guid: 14641473971126587410
                id: 2
                type: 'pseudo'
                path: '$VDEV-9DA81B2EED2E2E37'
                phys_path: 'testpool/$VDEV-9DA81B2EED2E2E37'
                removing: 1

The third device has been replaced by a virtual device. This virtual device resides on the disks remaining in the pool. You can see it quite nicely in the output of zdb:

description                used avail  read write  read write  read write cksum
testpool                  6.03G 25.7G 3.55K     0 3.87M     0     0     0     0
  /dev/dsk/c1t2d0s0       3.02G 12.9G 1.77K     0 1.92M     0     0     0     0
  /dev/dsk/c1t3d0s0       3.02G 12.9G 1.76K     0 1.91M     0     0     0     0
  $VDEV-9DA81B2EED2E2E37  2.00G 13.9G    20     0 28.9K     0     0     0     0

There is still a third device in it with 2 GB worth of data, but more interestingly, the remaining devices have now taken over the data, as indicated by the increased used column for both devices. As long as the data isn’t changed, it will stay on this virtual device. Please note that the system isn't simply blocking the full size of the removed vdev on disk; it only occupies the space needed for the data.
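
If you want to find out whether a pool carries such a leftover virtual device, you can look for children of type pseudo in the zdb output. The following is only a sketch under assumptions: it expects the type: and path: fields in the order shown above, with plain single quotes around the values, and list_pseudo_vdevs is a name I made up for this example.

```shell
#!/bin/sh
# Hypothetical helper for illustration, not part of Solaris.

# Read `zdb -L <pool>` text from stdin and print the path of every
# child whose type is 'pseudo'.
list_pseudo_vdevs() {
    is_pseudo=0
    while read -r key value; do
        case "$key" in
            type:)
                # Remember whether the child we are in is a pseudo device.
                case "$value" in
                    *pseudo*) is_pseudo=1 ;;
                    *)        is_pseudo=0 ;;
                esac
                ;;
            path:)
                # "phys_path:" is a different key and is not matched here.
                if [ "$is_pseudo" -eq 1 ]; then
                    # Strip the single quotes around the value.
                    printf '%s\n' "$value" | tr -d "'"
                fi
                ;;
        esac
    done
}

# Usage (not executed here):
#   zdb -L testpool | list_pseudo_vdevs
```

An empty result would mean that no removed top-level vdev is being kept alive as a virtual device, at least as far as this simple parsing can tell.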

Let’s now delete everything in the pool by issuing a rm /testpool/* command:

                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
testpool                   499K 31.7G   460     0 2.85M     0     0     0     0
  /dev/dsk/c1t2d0s0        316K 15.9G   203     0 1.51M     0     0     0     0
  /dev/dsk/c1t3d0s0        184K 15.9G   248     0 1.22M     0     0     0     0
  $VDEV-9DA81B2EED2E2E37  6.50K 15.9G     9     0  119K     0     0     0     0

The consumption has been significantly reduced. Let’s now recreate our datafiles.

root@batou:/testpool# mkfile 1g test1 test2 test3 test4 test5 test6

After this, zdb -L shows the following:

                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
testpool                  6.00G 25.7G 2.54K     0  194M     0     0     0     0
  /dev/dsk/c1t2d0s0       3.00G 12.9G 1.31K     0 96.7M     0     0     0     0
  /dev/dsk/c1t3d0s0       3.00G 12.9G 1.22K     0 97.7M     0     0     0     0
  $VDEV-9DA81B2EED2E2E37  6.50K 15.9G     9     0  119K     0     0     0     0

The virtual device isn’t used for new writes. However, all reads of data that resided on the removed disk are now serviced by the virtual device, which means, by proxy, by the remaining disks. Because the virtual device never gets any new data, it will be used less and less as you change the data in your pool, and eventually not at all. Of course, when the data is static and you never change it, it won't be migrated off the virtual device.
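
To watch how much data still sits on the virtual device over time, you could pull its row out of the capacity table that zdb prints. Again this is just a sketch: it assumes that the description of a virtual device starts with $VDEV- as in the output above and that the used value is the second column. The name pseudo_vdev_usage is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper for illustration, not part of Solaris.

# Read `zdb -L <pool>` text from stdin and print "<name> <used>" for
# every capacity-table row whose description starts with $VDEV-.
pseudo_vdev_usage() {
    while read -r name used rest; do
        case "$name" in
            '$VDEV-'*) printf '%s %s\n' "$name" "$used" ;;
        esac
    done
}

# Usage (not executed here):
#   zdb -L testpool | pseudo_vdev_usage
```

If the used value stays where it is for weeks, the data that came from the removed device is simply never rewritten.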

When you add a new device, it won’t substitute the virtual device acting as the third device:

root@batou:~# zpool add testpool c1t4d0

You will see a pool with four devices instead.

                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
testpool                  6.00G 41.6G 1.55K     0 4.07M     0     0     0     0
  /dev/dsk/c1t2d0s0       3.00G 12.9G   771     0 1.64M     0     0     0     0
  /dev/dsk/c1t3d0s0       3.00G 12.9G   774     0 1.62M     0     0     0     0
  $VDEV-9DA81B2EED2E2E37  6.50K 15.9G     9     0  119K     0     0     0     0
  /dev/dsk/c1t4d0s0       21.0K 15.9G    38     0  708K     0     0     0     0

Conclusion


After quite some time, ZFS finally has the ability to remove top-level vdevs. I think this will cut down on a lot of questions in presentations from now on.






1 Comment

  • Daniel  
    So we have a zpool with 4 devices and when these are full add a fifth one. That one becomes a hot device, as most reads and writes are centered on it and thus becomes a bottleneck for IO performance on this pool.
    Assuming we add four more disks to the pool and then remove the hot device, will its data get distributed evenly across these four new devices? Will it still remain a bottleneck due to the hidden internal vdev?
